Hands-On Exploratory Data Analysis with R

Become an expert in exploratory data analysis using R packages

  1. 266 pages
  2. English

About this book

Learn exploratory data analysis concepts using powerful R packages to enhance your R data analysis skills

Key Features

  • Speed up your data analysis projects using powerful R packages and techniques
  • Create multiple hands-on data analysis projects using real-world data
  • Discover and practice graphical exploratory analysis techniques across domains

Book Description

Hands-On Exploratory Data Analysis with R will help you build not just a foundation but also expertise in the elementary ways to analyze data. You will learn how to understand your data and summarize its main characteristics. You'll also uncover the structure of your data, and you'll learn graphical and numerical techniques using the R language.

This book covers the entire exploratory data analysis (EDA) process: data collection, generating statistics, examining distributions, and invalidating hypotheses. As you progress through the book, you will learn how to set up a data analysis environment with tools such as ggplot2, knitr, and R Markdown, and how to approach multifactor, optimization, and regression data problems, drawing on examples such as DOE Scatter Plot and SML2010.

By the end of this book, you will be able to successfully carry out a preliminary investigation on any dataset, identify hidden insights, and present your results in a business context.

What you will learn

  • Learn powerful R techniques to speed up your data analysis projects
  • Import, clean, and explore data using powerful R packages
  • Practice graphical exploratory analysis techniques
  • Create informative data analysis reports using ggplot2
  • Identify and clean missing and erroneous data
  • Explore data analysis techniques to analyze multi-factor datasets

Who this book is for

Hands-On Exploratory Data Analysis with R is for data enthusiasts who want to build a strong foundation for data analysis. If you are a data analyst, data engineer, software engineer, or product manager, this book will sharpen your skills in the complete workflow of exploratory data analysis.

Section 1: Setting Up Data Analysis Environment

We will start by setting up the R toolkit for exploratory data analysis. We will then dig deep into importing data into R, cleaning and manipulating it, and visualizing it, before producing reproducible data reports.
The following chapters will be covered in this section:
  • Chapter 1, Setting Up Our Data Analysis Environment
  • Chapter 2, Importing Diverse Datasets
  • Chapter 3, Examining, Cleaning, and Filtering
  • Chapter 4, Visualizing Data Graphically with ggplot2
  • Chapter 5, Creating Aesthetically Pleasing Reports with knitr and R Markdown

Setting Up Our Data Analysis Environment

In this chapter, we will look at how Exploratory Data Analysis (EDA) benefits businesses and has a significant impact on almost all vertical markets.
EDA is the practice of analyzing datasets to summarize their main features, mostly with visual methods. We will list the R packages and tools that are required for EDA, and then walk through installing them and setting up the EDA environment in R.
The following topics will be covered in this chapter:
  • The benefits of EDA across vertical markets
  • The most popular R packages for EDA
  • Installing the required R packages and tools
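As a minimal sketch of what summarizing the main features of a dataset looks like in practice, the following base R snippet produces a numerical summary, a structural overview, and a histogram. The mtcars dataset and the variable name mpg_summary are our own choices for illustration; they are not taken from this chapter.

```r
# mtcars ships with base R; it is used here only as an illustrative dataset.
data(mtcars)

# Numerical summary: minimum, quartiles, mean, and maximum of miles per gallon.
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Structure of the data: column names, types, and the first few values.
str(mtcars)

# Visual summary: a histogram of the same variable.
# Drawn only in interactive sessions such as RStudio.
if (interactive()) {
  hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
}
```

The numerical and visual views complement each other: the summary gives exact quartiles, while the histogram shows the shape of the distribution at a glance.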

Technical requirements

R is open source software that is platform independent. The following steps are used to set up R on your system:
  • You need to have the R language installed. Download the R installer from here: https://cran.r-project.org/.
  • We recommend using RStudio. If you don't already have it installed, you can get it from the following link: https://www.rstudio.com/products/rstudio/download.
  • Check that R and RStudio are working.
  • Install the R packages required for the workshop.
[Screenshot: the RStudio user interface as it appears the first time it is opened after installation]
In addition to the preceding steps, note the following:
  • You will also need to have prior knowledge of the R programming language. Packt has a wide range of books and video titles that are available for this purpose.
  • The code for this chapter is available at the following link:
    https://github.com/PacktPublishing/Hands-On-Exploratory-Data-Analysis-with-R.
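The last two setup steps, checking that R works and installing packages, could be sketched as follows. The package list is an assumption assembled from the packages named in this chapter and the book description (the text does not spell out the exact set required), and missing_packages is a hypothetical helper written for this example.

```r
# Assumed package list: names mentioned in this chapter and the book description.
eda_packages <- c("readr", "readxl", "jsonlite", "httr", "rvest", "DBI",
                  "ggplot2", "knitr", "rmarkdown")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Confirm that R itself is working by printing its version string.
print(R.version.string)

to_install <- missing_packages(eda_packages)

# Installation needs network access, so it only runs in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking for missing packages first avoids reinstalling packages that are already present each time the script is run.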

The benefits of EDA across vertical markets

Every organization today produces and relies on a lot of data in their everyday processes. Before making assumptions and decisions based on this data, organizations need to be able to understand it. EDA enables data analysts and data scientists to bring this information to the right people. It is the most important step on which a data-driven organization should focus its energy and resources.
Having practical tools in hand for carrying out EDA helps data analysts and data scientists produce reproducible, well-informed data analysis results. R is one of the most popular data analysis environments, so it makes sense to equip your data analysis teams with powerful R techniques to make the most of their EDA skills.
At the time of writing this book, there are more than 13,000 R packages available according to CRAN. You can get R packages for all kinds of tasks and domains. For our purpose, we will be concentrating on a particular set of R packages that are considered the best by the R community for the purpose of EDA. Some of the packages that we are going to cover may not be directly related to EDA, but they are relevant for other stages of dealing with the data, as indicated by the following diagram:
We will introduce these packages briefly in this chapter and go into more detail as the book progresses. The different stages are as follows:
  • Pre Modeling Stage: This stage involves manipulating the data frame through Data Visualization, Data Transformation, Missing Value Imputation, Outlier Detection, Feature Selection, and Dimension Reduction.
  • Modeling Stage: This is an intermediate stage that involves Continuous Regression, Ordinal Regression, Classification, Clustering, and Time Series with Survival.
  • Post Modeling Stage: This is the final stage, where interpreting the output takes priority. It includes the implementation of various algorithms such as clustering, classification, and regression.
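To give a feel for two of the pre-modeling steps named above, here is a minimal base R sketch of missing value imputation and outlier detection. The data values are made up, and median imputation and the 1.5 × IQR rule are simply common default choices that we assume for illustration; the book may use other techniques.

```r
# Hypothetical measurements with one missing value and one suspicious reading.
readings <- c(12.1, 11.8, NA, 12.4, 35.0, 12.0, 11.9)

# Missing value imputation: replace NA with the median of the observed values.
imputed <- readings
imputed[is.na(imputed)] <- median(readings, na.rm = TRUE)

# Outlier detection: flag points more than 1.5 * IQR outside the quartiles.
q <- quantile(imputed, c(0.25, 0.75))
fence <- 1.5 * (q[[2]] - q[[1]])
is_outlier <- imputed < q[[1]] - fence | imputed > q[[2]] + fence

# Show the positions and values of the flagged readings.
print(which(is_outlier))
print(imputed[is_outlier])
```

The median is used rather than the mean because it is not pulled toward extreme readings, which matters when outliers have not yet been removed.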

Manipulating data

Before you can start exploring your data, you first need to import it into your data analysis environment. There are many types of data, ranging from plain data in comma-separated value files to binary data in databases. Different R packages are equipped to handle these different kinds of data and to import them almost ready for use in our environment. Since we are using R and RStudio, we will describe some of the most powerful R packages for importing data in the following list:
  • readr: readr can be used to read flat, rectangular data into R. It works with both comma-separated and tab-separated values.
  • readxl: We can use the readxl package to read data from MS Excel files.
  • jsonlite: Web services have increasingly started to provide data in a JSON format. The jsonlite package is a good way to import this kind of data into R.
  • httr, rvest: httr and rvest are very good packages for getting data from the web, either from web APIs or by web scraping.
  • DBI: DBI is used to read data from relational databases into R.
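To make the preceding list concrete, here is a minimal sketch of importing flat and JSON data. The file contents, column names, and the variables csv_path, weather, json_text, and cities are made up for illustration, and the fallback to base R's read.csv is our own addition so that the snippet also runs where readr is not installed. readxl, httr, rvest, and DBI are not shown because they need an Excel file, network access, or a database connection.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,city,temp_c", "1,Pune,31.5", "2,Oslo,4.0"), csv_path)

# Prefer readr when it is installed; otherwise fall back to base R.
if (requireNamespace("readr", quietly = TRUE)) {
  weather <- readr::read_csv(
    csv_path,
    col_types = readr::cols(
      id = readr::col_integer(),
      city = readr::col_character(),
      temp_c = readr::col_double()
    )
  )
} else {
  weather <- utils::read.csv(csv_path, stringsAsFactors = FALSE)
}
print(weather)

# JSON import with jsonlite: an array of objects becomes a data frame.
json_text <- '[{"id": 1, "city": "Pune"}, {"id": 2, "city": "Oslo"}]'
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cities <- jsonlite::fromJSON(json_text)
  str(cities)
}
```

Declaring col_types explicitly documents what each column is expected to contain, rather than leaving the types to be guessed from the data.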

Examining, cleaning, and filtering data

The next steps after importing the data are to examine it and check for missing or erroneous data. We then need to clean the data and apply filters and s...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Dedication
  4. About Packt
  5. Contributors
  6. Preface
  7. Section 1: Setting Up Data Analysis Environment
  8. Setting Up Our Data Analysis Environment
  9. Importing Diverse Datasets
  10. Examining, Cleaning, and Filtering
  11. Visualizing Data Graphically with ggplot2
  12. Creating Aesthetically Pleasing Reports with knitr and R Markdown
  13. Section 2: Univariate, Time Series, and Multivariate Data
  14. Univariate and Control Datasets
  15. Time Series Datasets
  16. Multivariate Datasets
  17. Section 3: Multifactor, Optimization, and Regression Data Problems
  18. Multi-Factor Datasets
  19. Handling Optimization and Regression Data Problems
  20. Section 4: Conclusions
  21. Next Steps
  22. Other Books You May Enjoy

Hands-On Exploratory Data Analysis with R by Radhika Datar and Harish Kumar Garg is available in PDF and/or ePUB format, and is listed under Computer Science & Data Modelling & Design.