R for Data Science
eBook - ePub

Dan Toomey

Pages: 364
Language: English
Format: ePUB

About This Book

R is a powerful, open source, functional programming language. It can be used for a wide range of programming tasks and is best suited to producing data and visual analytics through customizable scripts and commands.

The purpose of this book is to explore the core topics that interest data scientists. It draws on a wide variety of data sources and analyzes that data using existing, publicly available R functions and packages. In many cases, the results can be displayed graphically, in a form that is easier to understand intuitively. You will also learn analysis techniques that are frequently used in industry.

By the end of the book, you will know how to apply a range of data science techniques with R.

Information

Year: 2014
ISBN: 9781784390860

Table of Contents

R for Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Data Mining Patterns
Cluster analysis
K-means clustering
Usage
Example
K-medoids clustering
Usage
Example
Hierarchical clustering
Usage
Example
Expectation-maximization
Usage
List of model names
Example
Density estimation
Usage
Example
Anomaly detection
Show outliers
Example
Example
Another anomaly detection example
Calculating anomalies
Usage
Example 1
Example 2
Association rules
Mine for associations
Usage
Example
Questions
Summary
2. Data Mining Sequences
Patterns
Eclat
Usage
Using eclat to find similarities in adult behavior
Finding frequent items in a dataset
An example focusing on highest frequency
arulesNBMiner
Usage
Mining the Agrawal data for frequent sets
Apriori
Usage
Evaluating associations in a shopping basket
Determining sequences using TraMineR
Usage
Determining sequences in training and careers
Similarities in the sequence
Sequence metrics
Usage
Example
Questions
Summary
3. Text Mining
Packages
Text processing
Example
Creating a corpus
Converting text to lowercase
Removing punctuation
Removing numbers
Removing words
Removing whitespaces
Word stems
Document term matrix
Using VectorSource
Text clusters
Word graphics
Analyzing the XML text
Questions
Summary
4. Data Analysis – Regression Analysis
Packages
Simple regression
Multiple regression
Multivariate regression analysis
Robust regression
Questions
Summary
5. Data Analysis – Correlation
Packages
Correlation
Example
Visualizing correlations
Covariance
Pearson correlation
Polychoric correlation
Tetrachoric correlation
A heterogeneous correlation matrix
Partial correlation
Questions
Summary
6. Data Analysis – Clustering
Packages
K-means clustering
Example
Optimal number of clusters
Medoids clusters
The cascadeKM function
Selecting clusters based on Bayesian information
Affinity propagation clustering
Gap statistic to estimate the number of clusters
Hierarchical clustering
Questions
Summary
7. Data Visualization – R Graphics
Packages
Interactive graphics
The latticist package
Bivariate binning display
Mapping
Plotting points on a map
Plotting points on a world map
Google Maps
The ggplot2 package
Questions
Summary
8. Data Visualization – Plotting
Packages
Scatter plots
Regression line
A lowess line
scatterplot
Scatterplot matrices
splom – display matrix data
cpairs – plot matrix data
Density scatter plots
Bar charts and plots
Bar plot
Usage
Bar chart
ggplot2
Word cloud
Questions
Summary
9. Data Visualization – 3D
Packages
Generating 3D graphics
Lattice Cloud – 3D scatterplot
scatterplot3d
scatter3d
cloud3d
RgoogleMaps
vrmlgenbar3D
Big Data
pbdR
Common global values
Distribute data across nodes
Distribute a matrix across nodes
bigmemory
pbdMPI
snow
More Big Data
Research areas
Rcpp
parallel
microbenchmark
pqR
SAP integration
roxygen2
bioconductor
swirl
pipes
Questions
Summary
10. Machine Learning in Action
Packages
Dataset
Data partitioning
Model
Linear model
Prediction
Logistic regression
Residuals
Least squares regression
Relative importance
Stepwise regression
The k-nearest neighbor classification
Naïve Bayes
The train Method
predict
Support vector machines
K-means clustering
Decision trees
AdaBoost
Neural network
Random forests
Questions
Summary
11. Predicting Events with Machine Learning
Automatic forecasting packages
Time series
The SMA function
The decompose function
Exponential smoothing
Forecast
Correlogram
Box test
Holt exponential smoothing
Automated forecasting
ARIMA
Automated ARIMA forecas...
