Mastering Data Analysis with R
eBook - ePub

  • 396 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About this book

Gain sharp insights into your data and solve real-world data science problems with R—from data munging to modeling and visualization

About This Book

  • Handle your data with precision and care for optimal business intelligence
  • Restructure and transform your data to inform decision-making
  • Packed with practical advice and tips to help you get to grips with data mining

Who This Book Is For

If you are a data scientist or R developer who wants to explore and optimize your use of R's advanced features and tools, this is the book for you. A basic knowledge of R is required, along with an understanding of database logic.

What You Will Learn

  • Connect to and load data from the wide range of databases that R supports
  • Successfully fetch and parse structured and unstructured data
  • Transform and restructure your data with efficient R packages
  • Define and build complex statistical models with glm
  • Develop and train machine learning algorithms
  • Visualize social networks and graph data
  • Deploy supervised and unsupervised classification algorithms
  • Discover how to visualize spatial data with R

In Detail

R is an essential language for sharp and successful data analysis. Its numerous features and ease of use make it a powerful way of mining, managing, and interpreting large sets of data. In a world where understanding big data has become key, by mastering R you will be able to deal with your data effectively and efficiently.

This book will give you the guidance you need to build and develop your knowledge and expertise. Bridging the gap between theory and practice, this book will help you to understand and use data for a competitive advantage.

The book begins with essential data mining and management tasks such as munging, fetching, cleaning, and restructuring, then explores different model designs and the core components of effective analysis. You will then discover how to optimize your use of machine learning algorithms for classification and recommendation systems, alongside traditional and more recent statistical methods.

Style and approach

Mastering Data Analysis with R covers the essential tasks and skills of data science and offers solutions to its common challenges. Each section gives you a theoretical overview before demonstrating how to put the theory to work with real-world use cases and hands-on examples.

Table of Contents

Mastering Data Analysis with R
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Hello, Data!
Loading text files of a reasonable size
Data files larger than the physical memory
Benchmarking text file parsers
Loading a subset of text files
Filtering flat files before loading to R
Loading data from databases
Setting up the test environment
MySQL and MariaDB
PostgreSQL
Oracle database
ODBC database access
Using a graphical user interface to connect to databases
Other database backends
Importing data from other statistical systems
Loading Excel spreadsheets
Summary
2. Getting Data from the Web
Loading datasets from the Internet
Other popular online data formats
Reading data from HTML tables
Reading tabular data from static Web pages
Scraping data from other online sources
R packages to interact with data source APIs
Socrata Open Data API
Finance APIs
Fetching time series with Quandl
Google documents and analytics
Online search trends
Historical weather data
Other online data sources
Summary
3. Filtering and Summarizing Data
Drop needless data
Drop needless data in an efficient way
Drop needless data in another efficient way
Aggregation
Quicker aggregation with base R commands
Convenient helper functions
High-performance helper functions
Aggregate with data.table
Running benchmarks
Summary functions
Adding up the number of cases in subgroups
Summary
4. Restructuring Data
Transposing matrices
Filtering data by string matching
Rearranging data
dplyr versus data.table
Computing new variables
Memory profiling
Creating multiple variables at a time
Computing new variables with dplyr
Merging datasets
Reshaping data in a flexible way
Converting wide tables to the long table format
Converting long tables to the wide table format
Tweaking performance
The evolution of the reshape packages
Summary
5. Building Models (authored by Renata Nemeth and Gergely Toth)
The motivation behind multivariate models
Linear regression with continuous predictors
Model interpretation
Multiple predictors
Model assumptions
How well does the line fit in the data?
Discrete predictors
Summary
6. Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
The modeling workflow
Logistic regression
Data considerations
Goodness of model fit
Model comparison
Models for count data
Poisson regression
Negative binomial regression
Multivariate non-linear models
Summary
7. Unstructured Data
Importing the corpus
Cleaning the corpus
Visualizing the most frequent words in the corpus
Further cleanup
Stemming words
Lemmatisation
Analyzing the associations among terms
Some other metrics
The segmentation of documents
Summary
8. Polishing Data
The types and origins of missing data
Identifying missing data
By-passing missing values
Overriding the default arguments of a function
Getting rid of missing data
Filtering missing data before or during the actual analysis
Data imputation
Modeling missing values
Comparing different imputation methods
Not imputing missing values
Multiple imputation
Extreme values and outliers
Testing extreme values
Using robust methods
Summary
9. From Big to Small Data
Adequacy tests
Normality
Multivariate normality
Dependence of variables
KMO and Bartlett's test
Principal Component Analysis
PCA algorithms
Determining the number of components
Interpreting components
Rotation methods
Outlier-detection with PCA
Factor analysis
Principal Component Analysis versus Factor Analysis
Multidimensional Scaling
Summary
10. Classification and Clustering
Cluster analysis
Hierarchical clustering
Determining the ideal number of clusters
K-means clustering
Visualizing clusters
Latent class models
Latent Class Analysis
LCR models
Discriminant analysis
Logistic regression
Machine learning algorithms
The K-Nearest Neighbors algorithm
Classification trees
Random forest
Other algorithms
Summary
11. Social Network Analysis of the R Ecosystem
Loading network data
Centrality measures of networks
Visualizing network data
Interactive network plots
Custom plot layouts
Analyzing R package dependencies with an R package
Further network analysis resources
Summary
12. Analyzing Time-series
Creating time-series objects
Visualizing time-series
Seasonal decomposition
Holt-Winters filtering
Autoregressive Integrated Moving Average models
Outlier detection
More complex time-series objects
Advanced time-series analysis
Summary
13. Data Around Us
Geocoding
Visualizing point data in space
Finding polygon overlays of point data
Plotting thematic maps
Rendering polygons around points
Contour lines
Voronoi diagrams
Satellite maps
Interactive maps
Querying Google Maps
JavaScript mapping libraries
Alternative map design...

Mastering Data Analysis with R by Gergely Daroczi is available in PDF and/or ePUB format, alongside other popular books in Computer Science & Databases.