eBook - ePub

Using R for Introductory Statistics

Name: Using R for Introductory Statistics
ISBN: 9781315360300

John Verzani

518 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

The second edition of a bestselling textbook, Using R for Introductory Statistics guides students through the basics of R, helping them overcome the sometimes steep learning curve. The author does this by breaking the material down into small, task-oriented steps. The second edition maintains the features that made the first edition so popular, while updating the data and examples and reflecting changes to R in line with the current version.

See What's New in the Second Edition:

Increased emphasis on more idiomatic R provides a grounding in the functionality of base R.
Discussion of the use of RStudio helps new R users avoid as many pitfalls as possible.
Use of the knitr package makes code easier to read and therefore easier to reason about.
Additional information on computer-intensive approaches motivates the traditional approach.
Updated examples and data make the information current and topical.

The book has an accompanying package, UsingR, available from CRAN, R's repository of user-contributed packages. The package contains the data sets mentioned in the text (data(package="UsingR")), answers to selected problems (answers()), a few demonstrations (demo()), the errata (errata()), and sample code from the text.

The topics of this text line up closely with the traditional teaching progression; however, the book also highlights computer-intensive approaches to motivate the more traditional approach. The author emphasizes realistic data and examples and relies on visualization techniques to gather insight. The text introduces statistics and R seamlessly, giving students the tools they need to use R and the information they need to navigate the sometimes complex world of statistical computing.

Publisher: Chapman and Hall/CRC
Year: 2018
Print ISBN: 9781466590731
eBook ISBN: 9781315360300
Topic: Mathematics
Subtopic: Computer Science General

1 Getting started

1.1 What is data?

Data and their statistical summaries and interpretations are ubiquitous. For example, we found these four articles during a typical day reading the paper:

• Example 1.1: To compile evidence to establish cause and effect

In an opinion piece, Joe Nocera [46] discusses the prevalence of guns in the movies (in anticipation of yet another “Die Hard” movie). He quotes a spokesperson from the Motion Picture Association of America as saying:

“There is a predominance of findings that show there is no consistent or convincing evidence that exposure [to gun violence in movies] causes people to be more violent.”

However, Nocera immediately refutes this, quoting a professor from the University of Wisconsin: “There is tons of research on this.”

Clearly, the collection and interpretation of data are crucial when making policy decisions. This isn’t an easy task, of course. A casual reader may think the above differences of opinion are a matter of political motivation, but this need not be the case. Relationships between variables can exist even if there is no cause-and-effect relationship. Finding convincing evidence in data often requires that the data be collected carefully before conclusions can be drawn.

• Example 1.2: Price of a hip replacement

In a news piece, Elisabeth Rosenthal [51] describes the research of Jaime Rosenthal, who in the summer of 2012 called more than 100 hospitals, covering every state, seeking the price of a hip replacement for a hypothetical, uninsured, 62-year-old female. The results were surprising:

Only about half the institutions could provide an estimate
Of those that could, the range of prices went from $11,000 to $125,798

Commentary in the article urges people to place the price data in the context of many other factors such as infection rates and unexpected deaths. However, the article summarizes the primary researcher’s belief that there is little consistent correlation between higher prices and better quality in American health care.

Even in what is perhaps the most data-driven industry, there is a clear need for data and for context in which to place it. Further, this example hints at some other difficulties in data collection. One is the question of what to do with missing data, as it is often the case that some values will be unavailable. Another is that the actual mechanism for computing this value at a given hospital may vary from that of another.
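As a minimal sketch of how missing values surface in R, the following uses a made-up vector of four hospital responses. Only the two endpoint prices ($11,000 and $125,798) come from the article; the vector itself, the variable names, and the choice of two missing entries (echoing "about half") are assumptions made for illustration.

```r
# Hypothetical responses from four hospitals. Only the lowest and highest
# prices are taken from the article; NA marks a hospital with no estimate.
prices <- c(11000, NA, 125798, NA)

# Proportion of hospitals that could not provide an estimate
prop_missing <- mean(is.na(prices))

# By default a summary involving a missing value is itself missing
range(prices)

# Dropping the missing values must be requested explicitly
price_range <- range(prices, na.rm = TRUE)

# How many times larger the highest quote is than the lowest
price_ratio <- price_range[2] / price_range[1]
```

R's default of returning `NA` forces a deliberate decision about missing data rather than silently ignoring it.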

• Example 1.3: Safety of the airline industry

In a front-page article titled “Airline Industry at Its Safest Since the Dawn of the Jet Age,” authors Jad Mouawad and Christopher Drew [43] summarize the data collected by the Aviation Safety Network, pointing out that 2012 had only 23 deadly accidents and 475 fatalities. This may sound high, but putting it into a rate helps give context: this is a risk of one death per 45 million flights. That is, a person could fly daily for an average of 123,000 years before being in a fatal plane crash.
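The conversion from a rate per flight to years of daily flying can be checked with a line or two of R. This is only a sketch: the variable names are made up, and it assumes exactly one flight per day and a 365-day year.

```r
# From the article: a risk of one death per 45 million flights
flights_per_fatality <- 45e6

# Assumption: one flight every day, ignoring leap years
flights_per_year <- 365

# Years of daily flying needed to accumulate 45 million flights
years <- flights_per_fatality / flights_per_year

round(years)      # 123288
signif(years, 3)  # 123000, the figure quoted in the article
```

Using 365.25 days per year changes the result only slightly, and both versions round to the 123,000 years quoted.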

The improvements in safety are not due to advanced technologies alone: the industry (regulators, pilots, and airlines) has created a culture of sharing data about flying hazards with the goal of preventing accidents.

This example shows how a focus on understanding the many factors that can contribute to a given statistic can help improve an area. It wasn’t enough that the airlines kept statistics; they also used their findings to address shortcomings.

• Example 1.4: Networking

On the business page, Andrew Sorkin [53] reports on a database containing the names of over two million deal makers, power brokers, and business executives and, in many cases, the names of their spouses, children, and associates, along with political donations, charity work, and more. This information, held by a company called Relationship Science, is compiled by more than 800 people.

The goal, of course, is to sell this information to people who plan to leverage the network of relationships. Other companies, such as Facebook and LinkedIn, have such information on their users, and the NSA seemingly has all the data it could ever need, but in this case the information is scraped from web sites: a person need not be a member of a social network or have a security clearance.

How such large databases get mined and what this means for personal privacy will likely continue to be a major topic of conversation for years to come. Though the statistical techniques of working with so-called “big data” are outside the scope of this text, many of the computational skills will be developed.

In this sampling of articles, we see the analysis of data used in many different ways:

Under the name “studies,” data is used to make a case about social policy (in two different ways!).
To investigate variability in prices and transparency, data is collected and summarized.
In an industry, data demonstrates that forward looking practices can have a substantial effect.
Data, and the information it contains, is mined to establish a financial advantage.

Data analysis is a very wide topic, so wide we couldn’t begin to describe it all. In this text we narrow our focus, looking at data with an eye towards statistical inference. This is the process of drawing conclusions about populations based on data collected from these populations. To do this, we will use the language of probability. This will give us the flexibility to describe concrete things using data subject to random variation. Doing so will require us to make models for our data. This text is roughly organized into three areas: the first develops techniques for exploring data, the second covers the basics of statistical inference, and the third covers the beginnings of modeling with data.

The rest of this chapter is focused on getting started with using R. We save more statistically oriented examples for Chapter 2 and beyond.

1.2 Getting started with R

This section covers the basics of getting started with R, beginning with some notes on installation and continuing with the basics of interacting with R through the command line.

Installing R

Before you can begin working with R, it must be installed. R is available as source code from CRAN, http://cran.r-project.org/. However, most users will probably install R from a distributed binary; these are also available from CRAN. For example, the Microsoft Windows binary is distributed as a self-extracting .exe file. Simply download the file, then install it as you would any other download. For Microsoft Windows users, the standard installation will create a desktop icon and a Start menu item for opening R. If started this way, R will open to its standard Microsoft Windows GUI, but we suggest using RStudio®, as described next.

Figure 1.1 The RStudio development environment for R. Visible are the console, the source code editor, the plot pane, and the workspace pane....

Cover
Half Title
Title Page
Copyright Page
Contents
Preface
1 Getting started
2 Univariate data
3 Bivariate data
4 Multivariate data
5 Multivariate graphics
6 Populations
7 Statistical inference
8 Confidence intervals
9 Significance tests
10 Goodness of fit
11 Linear regression
12 Analysis of variance
13 Extensions of the linear model
A.1 Functions
Bibliography
Index
