# Hands-On Data Science with R

## Techniques to perform data manipulation and mining to build smart analytical models using R

## Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias

- 420 pages
- English

## About This Book

A hands-on guide for professionals to perform various data science tasks in R

Key Features

- Explore the popular R packages for data science
- Use R for efficient data mining, text analytics and feature engineering
- Become a thorough data science professional with the help of hands-on examples and use cases in R

Book Description

R is one of the most widely used programming languages for data science, and the combination helps tackle the complexities of unstructured, real-world datasets. This book covers the data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to get hands-on with real-world data science problems.

The book starts with an introduction to data science and introduces readers to popular R libraries for routine data science tasks. It covers the important processes in data science, such as gathering data, cleaning it, and then uncovering patterns in it. You will explore machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will learn to use powerful visualization packages available in R so that you can easily derive insights from your data.

Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.

What you will learn

- Understand the R programming language and its ecosystem of packages for data science
- Obtain and clean your data before processing
- Master essential exploratory techniques for summarizing data
- Examine various machine learning prediction models
- Explore the H2O analytics platform in R for deep learning
- Apply data mining techniques to available datasets
- Work with interactive visualization packages in R
- Integrate R with Spark and Hadoop for large-scale data analytics

Who this book is for

If you are a budding data scientist keen to learn about the popular R packages for data science, or a developer looking to step into the world of data analysis, this book is the ideal resource you need to get started. Some programming experience will be helpful to get the most out of this book.

# Machine Learning with R

– Alan Turing

**Artificial Intelligence** (**AI**). Although the foundations of machine learning, and a vague idea of it, can be found earlier in the sayings of the great Turing, it was not until 1959 that the term machine learning was coined by the computer scientist Arthur Samuel.

[…] century, it only became popular in the first decades of the 21st century; since then, its reputation has skyrocketed. There are many reasons for this (machine learning is extremely useful), but I would mostly point to two.

- Which big companies are using machine learning
- Linear regression with base R
- Building decision trees with tree and rpart
- Random forest, bagging, and boosting methods
- Training **support vector machines** (**SVM**) with caret
- Building feedforward neural networks using h2o
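
As a taste of one item from the list above, here is a minimal sketch of building a decision tree with rpart. The `iris` dataset, the 70/30 split, the seed, and all variable names are assumptions made for illustration; they are not taken from the book, and the chapter itself may approach the topic differently.

```r
# Illustrative sketch: the iris dataset, the 70/30 split, and the seed
# are assumptions made for this example, not taken from the book.
library(rpart)

set.seed(42)
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Fit a classification tree predicting Species from the four measurements
tree_fit <- rpart(Species ~ ., data = train_set, method = "class")

# Predict classes for the held-out rows and compute the share predicted correctly
predicted <- predict(tree_fit, newdata = test_set, type = "class")
accuracy  <- mean(predicted == test_set$Species)
print(accuracy)
```

Holding out part of the data gives a rough idea of how the tree behaves on rows it has not seen; a single split like this is only a quick check, not a thorough evaluation.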

# What is machine learning?

# Machine learning everywhere

**recommenders**. They are usually (but not only) built using clustering techniques.

*KDD, Data Mining, and Text Mining*, neural networks can be trained to read various exams and even predict how likely a patient is to develop certain kinds of diseases. This field is called **predictive medicine** and benefits greatly from machine learning advancements.

*University of Southern California Center for AI in Society* have trained a neural network to detect poachers who set foot in national parks in Zimbabwe and Malawi. This system is designed to distinguish hunters from animals using heat signatures and was named **Systematic POacher deTector** (**SPOT**).

**Recurrent neural networks** (**RNNs**) can be cited as a supervised learning technique. Although practical examples for both classes are provided in this chapter, more attention is given to unsupervised learning, since supervised learning is the focus of later chapters such as Chapter 8, *Neural Networks and Deep Learning*.

# Machine learning vocabulary

*prae e videre*, which is Latin for *to see something that has not happened yet before it actually does*, or simply, to predict.

*Neural Networks and Statistical Models*, written by Warren S. Sarle and published in 1994, showed how machine learning jargon could be related to statistical jargon. Here are some of the correspondences:

| Statistical jargon | Machine learning equivalent |
| --- | --- |
| Model estimation | Model training or learning |
| Estimation criteria | Cost function |
| Variables | Features |
| Independent variables | Inputs |
| Predicted values | Outputs |
| Dependent variables | Training or target values |
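
The correspondence in the table above can be made concrete with a small base R sketch. The built-in `mtcars` dataset, the chosen columns, and the variable names are assumptions made for illustration and do not come from the book; each comment pairs the statistical term with its machine learning counterpart.

```r
# Illustrative only: mtcars and the chosen columns are assumptions,
# not taken from the book.
data(mtcars)

# Independent variables (statistics) = inputs or features (machine learning)
inputs <- mtcars[, c("wt", "hp")]

# Dependent variable (statistics) = target values (machine learning)
target <- mtcars$mpg

# Model estimation (statistics) = model training (machine learning)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Predicted values (statistics) = outputs (machine learning)
outputs <- predict(fit, newdata = inputs)

# Estimation criterion (statistics) = cost function (machine learning).
# Ordinary least squares minimizes the sum of squared errors;
# it is reported here as a mean for readability.
mse <- mean((target - outputs)^2)
print(mse)
```

The same fitted object serves both vocabularies: a statistician would read `fit` as an estimated model, while a machine learning practitioner would call it a trained one.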

# Generic problems solved by machine learning

## Table of contents

- Title Page
- Copyright and Credits
- About Packt
- Contributors
- Preface
- Getting Started with Data Science and R
- Descriptive and Inferential Statistics
- Data Wrangling with R
- KDD, Data Mining, and Text Mining
- Data Analysis with R
- Machine Learning with R
- Forecasting and ML App with R
- Neural Networks and Deep Learning
- Markovian in R
- Visualizing Data
- Going to Production with R
- Large Scale Data Analytics with Hadoop
- R on Cloud
- The Road Ahead
- Other Books You May Enjoy