R Programming Fundamentals
eBook - ePub

R Programming Fundamentals

Deal with data using various modeling techniques

  1. 206 pages
  2. English

About this book

Study data analysis and visualization to successfully analyze data with R

Key Features

  • Get to grips with data cleaning methods
  • Explore statistical concepts and programming in R, including best practices
  • Build a data science project with real-world examples

Book Description

R Programming Fundamentals, focused on R and the R ecosystem, introduces you to the tools for working with data. To start with, you'll learn how to set up R and RStudio, and then you'll explore R packages, functions, data structures, control flow, and loops.

Once you have grasped the basics, you'll move on to studying data visualization and graphics. You'll learn how to build statistical and advanced plots using the powerful ggplot2 library. In addition to this, you'll discover data management concepts such as factoring, pivoting, aggregating, merging, and dealing with missing values.

By the end of this book, you'll have completed an entire data science project of your own for your portfolio or blog.

What you will learn

  • Use basic programming concepts of R such as loading packages, arithmetic functions, data structures, and flow control
  • Import data to R from various formats such as CSV, Excel, and SQL
  • Clean data by handling missing values and standardizing fields
  • Perform univariate and bivariate analysis using ggplot2
  • Create statistical summaries and advanced plots, such as histograms, scatter plots, box plots, and interaction plots
  • Apply data management techniques, such as factoring, pivoting, aggregating, merging, and dealing with missing values, on the example datasets

Who this book is for

R Programming Fundamentals is for you if you are an analyst who wants to grow in the field of data science and explore the latest tools.

R Programming Fundamentals is written by Kaelen Medeiros.


Data Visualization and Graphics

Data visualizations are very important in data science. They are used as a part of Exploratory Data Analysis (EDA), to familiarize yourself with data, to examine the distributions of variables, to identify outliers, and to help guide data cleaning and analysis. They are also used to communicate results to a variety of audiences, from other data scientists to customers.
EDA is the general name for the process of using numerical summaries, plots, and aggregating methods to explore a dataset to familiarize yourself with its contents. It will almost certainly involve you examining the distribution of variables in the dataset, looking at missingness, deciding whether there are any outliers or errors, and generally getting a feel for what is contained in your data.
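The EDA steps described above can be sketched with a few base R calls. This is a minimal illustration, assuming the built-in mtcars and airquality datasets (airquality is used here only because, unlike mtcars, it contains missing values); it is not a complete EDA workflow.

```r
# Numerical summary of one variable: minimum, quartiles, median, mean, maximum
summary(mtcars$mpg)

# Distribution of a variable, shown as a histogram
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Missingness: number of NA values in each column
colSums(is.na(airquality))

# Candidate outliers: the points boxplot() would draw beyond its whiskers
# (more than 1.5 times the box length away from the box)
boxplot.stats(mtcars$hp)$out
```

Each of these is only a starting point; the plots covered in the rest of this chapter let you look at the same questions visually.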
In this chapter, you'll learn about base plots and ggplot2, and you'll be briefly introduced to more advanced, interactive plotting with Shiny and Plotly.
By the end of this chapter, you will be able to:
  • Use base R for plotting, and identify when to do so
  • Create a variety of different data visualizations using the ggplot2 package
  • Explain different tools for interactive plotting in R

Creating Base Plots

R can plot data without installing any additional packages. This is commonly referred to as base plotting. It is called base plotting because, like the functions in the base package discussed in Chapter 1, Introduction to R, these plotting functions are built into R. The graphics package is installed along with R and enables you to plot data without installing any other packages.
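As a quick check of this, the following sketch confirms that the graphics package is attached in a default R session and lists a few of the plotting functions it provides. The particular functions checked here are just examples.

```r
# The graphics package is attached in every default R session,
# so its functions work without install.packages() or library()
"package:graphics" %in% search()

# A few base plotting functions exported by the graphics package
intersect(c("hist", "barplot", "boxplot", "par"), ls("package:graphics"))
```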
To see details on the graphics package, you can search for R graphics package in a search engine of your choice or navigate to the following URL: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/00Index.html.
Base plots are often not used outside of data cleaning and EDA work. Many data scientists use other, more aesthetically pleasing tools, such as ggplot2 or Plotly, for any plots or graphs that a customer may see. It is still important to know how to use plot() and create base plots, however, so let's dive in!

The plot() Function

The plot() function is the backbone of base plots in R. It provides generic X-Y plotting. It requires only one argument, x, which should be something to plot: a vector of numbers, one variable of a dataset, or a fitted model object, such as a linear or logistic regression. You can, of course, add a second variable, y, plus an assortment of options to customize the plot, but x is the only input required for the function to run successfully.
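To make the three kinds of x concrete, here is a minimal sketch using the built-in mtcars data. The choice of variables (mpg and wt) and of a linear model is illustrative only.

```r
# x alone: a numeric vector, plotted against its index
plot(mtcars$mpg)

# x and y: a classic X-Y scatter plot of car weight against fuel economy
plot(mtcars$wt, mtcars$mpg)

# x as a fitted model object: plot() dispatches to the method for lm objects,
# which draws a series of regression diagnostic plots
fit <- lm(mpg ~ wt, data = mtcars)
plot(fit)
```

In an interactive session, plot(fit) may prompt you to press Return between the diagnostic plots.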
For anything beyond the basic x and y arguments, you'll need to get very familiar with using ?plot or help(plot). The documentation describes options such as titles and axis labels, and it also points you to the documentation for other graphical parameters, found under the par() function in R. The options provided by par() are far more detailed and allow you to change the colors, fonts, positions of axis labels, and much more for your base plots.
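A short sketch of both kinds of customization follows: title and axis-label arguments passed directly to plot(), and lower-level settings changed through par(). The colors, point symbol, and layout chosen here are arbitrary examples; see ?par for the full list of parameters.

```r
# Title, axis labels, color, and point symbol passed straight to plot()
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     col  = "steelblue",
     pch  = 19)  # 19 = solid circle

# par() changes graphical parameters for the current device and
# returns the previous values, so they can be restored afterwards
old_par <- par(mfrow = c(1, 2), cex.axis = 0.8)  # two plots side by side
hist(mtcars$mpg, main = "mpg", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders")
par(old_par)  # back to one plot per page and the default axis text size
```

Saving and restoring the old par() values is a common habit, because par() settings otherwise persist for every later plot on the same device.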
Beyond knowing the basics about how to use plot(), you do not need to memorize all of the function's possible options. Realistically, you do not need to memorize all of the options for any function in R. Most of the time when you are doing your work, you will have access to documentation and help. Learning R is about learning both how to use functions and also how to look for help when you need it.
When you start out writing plots in base R, you may be interested to know that plot() accepts many other inputs besides the data you want to plot. You can access the R help documentation for the plot() function in the following ways:
  • ?plot
  • help("plot")
  • help(plot)
All of the preceding options take you directly to the help documentation, which is also available online at the following URL: https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.html.
In RStudio, a plot may sometimes look distorted or squished because it is constrained by the size of the plot pane (usually the bottom-right pane, under the Plots tab). You can click the Zoom button at any time to open the plot in a separate, usually larger, window for a better look.
The datasets package, which R attaches by default (running library(datasets) explicitly does no harm), gives us access to a number of built-in datasets that will be useful for both base plotting and ggplot2. To begin with, we'll use the mtcars dataset. mtcars is a very famous example dataset, and its description (accessed using ?mtcars) is as follows:
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
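Before plotting, it can help to look at the structure of mtcars. The following is a minimal sketch using standard base R inspection functions.

```r
library(datasets)  # already attached by default; calling it again is harmless

dim(mtcars)   # 32 rows (cars) and 11 columns (variables)
head(mtcars)  # the first six cars
str(mtcars)   # each variable and its type
```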
Minimally, we can plot just one variable of mtcars, for example mpg, the miles per gallon of each car. This generates a very basic plot with mpg on the y-axis and the index on the x-axis, where the index is literally the row number of each observation.
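In code, that single-variable plot is one call. The sketch below also draws the same plot with the index supplied explicitly, which shows what plot() does when it is given only x; the axis label strings are illustrative.

```r
# Given only x, plot() puts the values on the y-axis
# and their position (the row index) on the x-axis
plot(mtcars$mpg)

# Equivalent plot with the index supplied explicitly as x
plot(seq_along(mtcars$mpg), mtcars$mpg, xlab = "Index", ylab = "mtcars$mpg")
```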
This plot isn't very informative, but it is powerful in terms of seeing how well R can plot even when it is not installed on a particular mac...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Packt Upsell
  4. Contributors
  5. Preface
  6. Introduction to R
  7. Data Visualization and Graphics
  8. Data Management
  9. Solutions
  10. Other Books You May Enjoy