Hands-On Data Science with R
eBook - ePub

Hands-On Data Science with R

Techniques to perform data manipulation and mining to build smart analytical models using R

Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias

  1. 420 pages
  2. English
  3. ePUB (mobile-friendly)
  4. Available on iOS and Android

Book information

A hands-on guide for professionals to perform various data science tasks in R

Key Features

  • Explore the popular R packages for data science
  • Use R for efficient data mining, text analytics and feature engineering
  • Become a thorough data science professional with the help of hands-on examples and use-cases in R

Book Description

R is one of the most widely used programming languages for data analysis, and when used for data science, this powerful combination can help solve the complexities involved with unstructured, real-world datasets. This book covers the entire data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to tackle real-world data science problems hands-on.

The book starts with an introduction to data science and introduces readers to popular R libraries for routine data science tasks. It covers all the important processes in data science, such as gathering data, cleaning it, and uncovering patterns in it. You will explore machine learning algorithms, predictive analytical models, and, finally, deep learning algorithms. You will learn to use the most powerful visualization packages available in R so that you can easily derive insights from your data.

Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.

What you will learn

  • Understand the R programming language and its ecosystem of packages for data science
  • Obtain and clean your data before processing
  • Master essential exploratory techniques for summarizing data
  • Examine various machine learning prediction models
  • Explore the H2O analytics platform in R for deep learning
  • Apply data mining techniques to available datasets
  • Work with interactive visualization packages in R
  • Integrate R with Spark and Hadoop for large-scale data analytics

Who this book is for

If you are a budding data scientist keen to learn about R's popular data science packages, or a developer looking to step into the world of data analysis with R, this book is the ideal resource to get started. Some programming experience in R will be helpful to get the most out of this book.

Information

Year
2018
ISBN
9781789135831
Edition
1
Category
Computer Science

Machine Learning with R

"What we want is a machine that can learn from experience."
– Alan Turing
Machine learning is an interdisciplinary field that involves computer science, neurocomputing, statistics, and more. The idea of machines actually learning can be traced back to Alan Turing and the beginnings of Artificial Intelligence (AI). Although the foundations of machine learning, and a vague idea of it, can be found earlier in the writings of the great Turing, it was not until 1959 that the term machine learning was coined by the computer scientist Arthur Samuel.
Although such ideas were circulating throughout the 20th century, machine learning only became popular in the first decades of the 21st century; since then, its reputation has skyrocketed. There are many reasons for this (machine learning is extremely useful), but I would mostly point to two.
First, there is data volume. Huge volumes of data are being produced every day, everywhere. Processing all this information required a much more efficient and novel approach, and machine learning methods aimed to provide it. Some of these methods are data-hungry, and practically all of them can handle both linear and non-linear relationships.
The second reason is feasibility. Algorithms and computing power have improved rapidly, allowing machines to learn from large datasets in a reasonable time. This chapter is designed to introduce readers to the world of machine learning while establishing some parallels with traditional statistics. The chapter also demonstrates how to fit several machine learning models in practice using R.
The reader may feel that too much attention is given to unsupervised learning rather than supervised learning. This is deliberate, since later chapters discuss supervised learning methods in more depth.
Here is what can be found in this chapter:
  • Which big companies are using machine learning
  • Linear regression with base R
  • Building decision trees with tree and rpart
  • Random forest, bagging, and boosting methods
  • Training support vector machines (SVM) with caret
  • Building feedforward neural networks using h2o
Many machine learning models are already available to R users, and quite a few of them will be discussed in a practical manner in this chapter. But what is machine learning? There are many definitions. The next section defines machine learning and briefly discusses its uses.

What is machine learning?

What do we mean by machine learning? It is an interdisciplinary subject concerned with the development, comprehension, and application of computational methods that learn and generalize from datasets; it is often associated with, but not limited to, big data. Machine learning encompasses an ever-growing family of methods suited to a wide range of problems.
I deeply appreciate how it has been used to fight junk email. The way it suggests replies to emails (which are hardly spam) has also proved enormously helpful.
Such a great ability to solve problems has certainly attracted big companies and tech enthusiasts all over the world.

Machine learning everywhere

Netflix uses machine learning to give you personal recommendations of content to watch; Amazon uses machine learning to recommend products to buy based on what you have already bought. These are the so-called recommenders. They are usually (but not only) built using clustering techniques.
Machine learning techniques have also been used to diagnose illnesses. Aside from the application of clustering to cancer diagnosis already mentioned in Chapter 4, KDD, Data Mining, and Text Mining, neural networks can be trained to interpret various medical exams and even to predict how likely a patient is to develop certain kinds of diseases. This field is called predictive medicine, and it benefits greatly from advances in machine learning.
Saving endangered species is yet another wonderful use of machine learning. Researchers from the University of Southern California Center for AI in Society have trained a neural network to detect poachers who set foot in national parks in Zimbabwe and Malawi. The system is designed to distinguish hunters from animals using heat signatures and was named Systematic POacher deTector (SPOT).
There are also unconventional uses of machine learning models: some people are using them to compose songs and poems and to draw pictures.
Tech workers such as Zach Lubarsky and Ethan Phelps-Goodman are actively engaging in data-driven campaigns to solve social issues. Lubarsky and Phelps-Goodman belong to the Seattle Tech 4 Housing organization, a community dedicated to improving housing affordability in Seattle.
A quick web search will tell you that there are as many real-world applications of machine learning as there are stars in the sky. Speaking of stars, how do you think the galaxy-sized datasets generated by astronomers are processed? That's right: machine learning.
Machine learning methods can be separated into two classes: unsupervised (unlabeled) and supervised (labeled) learning. In unsupervised learning, there is no target value to fit the model to; hierarchical clustering is a good example. The objective of unsupervised learning is usually, but not always, to extract features from data rather than to produce actual forecasts.
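To make the unsupervised case concrete, here is a minimal sketch in R. It assumes the built-in `USArrests` dataset and the base `stats` functions `dist()`, `hclust()`, and `cutree()`; the choice of complete linkage and of three clusters is illustrative only, not a method prescribed by this chapter. Note that no target column is ever passed to the algorithm: it only sees the features.

```r
# Unsupervised learning sketch: hierarchical clustering on the built-in
# USArrests data (50 US states, 4 numeric columns). No labels are used.
data(USArrests)

# Standardise the features so that no single variable dominates the distances
features <- scale(USArrests)

# Euclidean distances between states, merged bottom-up with complete linkage
hc <- hclust(dist(features, method = "euclidean"), method = "complete")

# Cut the dendrogram into 3 groups (k = 3 is an arbitrary, illustrative choice)
groups <- cutree(hc, k = 3)

# Cluster sizes and a few example assignments (state name -> cluster id)
print(table(groups))
print(head(groups))

# In an interactive session, plot(hc) draws the dendrogram
```

The cluster ids are discovered from the data alone, which is exactly what separates this from supervised learning, where a labeled target would drive the fit.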
Supervised learning models are trained to predict one or more target variables; hence, they need labeled data. Recurrent neural networks (RNNs) are one example of a supervised learning technique. Although practical examples of both classes are provided in this chapter, more attention is given to unsupervised learning, since supervised learning is the focus of later chapters such as Chapter 8, Neural Networks and Deep Learning.
Next, we will look at how traditional statistics connects to machine learning. There are many clear links between the two streams. To mention one, regression models from traditional statistics are also used in machine learning applications. Ronald Fisher, a renowned statistician, is recognized by some as one of the first individuals to use machine learning.
Although many concepts adopted in the machine learning field are essentially the same as those developed by traditional statisticians and forecasters, machine learning has a vocabulary of its own. These differences may have arisen because the field's main proponents were more closely tied to computing than to statistics.
There is no downside to learning this vocabulary, and a great way to do so is to relate machine learning terms to statistical ones. The next section shows how many core ideas from machine learning can be translated into statistical concepts.

Machine learning vocabulary

At the end of the last section, we hypothesized why machine learning's vocabulary diverged from that of statistics. Let me begin this section by discussing why the core ideas converge in essence. Many statistical methods strive to praevidere, Latin for seeing something before it actually happens, or, simply, to predict.
Prediction tasks, like other pattern recognition tasks, often require a sharp ability to comprehend data and to generalize well to unseen data. This shared goal drove the separate efforts of traditional statistics and machine learning toward a great deal of common ground. Also, the ability of statistics to describe all sorts of events in probabilistic terms makes it very useful to machine learning, which could be another source of shared ground across the two fields, not to mention the interdisciplinary nature of machine learning itself.
Whatever the reason, machine learning vocabulary can be adapted and understood through statistics. This translation makes it especially easy for lovers of statistics to master machine learning, and vice versa. The paper Neural Networks and Statistical Models, written by Warren S. Sarle and published in 1994, showed how machine learning jargon relates to statistical jargon. Here are some examples:
| Statistical jargon | Machine learning correspondent |
|---|---|
| Model estimation | Model training or learning |
| Estimation criteria | Cost function |
| Variables | Features |
| Independent variables | Inputs |
| Predicted values | Outputs |
| Dependent variables | Training or target values |
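As a rough illustration of this mapping, the sketch below fits a simple linear regression in base R and labels each piece with both vocabularies. It assumes the built-in `cars` dataset (speed and stopping distance); the hand-written `cost()` function and the closed-form least-squares check are added here for illustration and are not taken from the book.

```r
# One linear regression, described in both vocabularies.
data(cars)  # speed (mph) and stopping distance dist (ft), 50 rows

# Model estimation (statistics) = model training (machine learning)
fit <- lm(dist ~ speed, data = cars)

# Independent variables = features / inputs (plus an intercept column)
X <- cbind(intercept = 1, speed = cars$speed)
# Dependent variable = training / target values
y <- cars$dist

# Estimation criterion = cost function: here, the sum of squared errors
cost <- function(beta) sum((y - X %*% beta)^2)

# Ordinary least squares minimises that cost; solving the normal equations
# (X'X) beta = X'y gives the same coefficients that lm() estimated
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))

# Predicted values = outputs
print(head(fitted(fit)))

# Moving away from the fitted coefficients can only increase the cost
print(cost(coef(fit)) < cost(coef(fit) + c(0.5, 0)))
```

Solving the normal equations directly is fine for a two-parameter toy problem; `lm()` itself uses a QR decomposition, which is numerically more robust for larger or ill-conditioned designs.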
Now that we have established the link between statistics and machine learning, it is time to take a practical tour through the traditional linear regression methods from statistics using our beloved R, but not before examining the general tasks that machine learning takes on.

Generic problems solved by machine learning

Whether a problem can be solved through machine learning is only a matter of how much data, creativity, and computational power one has. Machine learning can be used to aid diagnosis, make recommendations, classify stellar objects, protect animal life, and tackle social issues.
It can likewise be used to detect frauds, such as fraudulent credit card t...
