Using R for Introductory Statistics
eBook - ePub

John Verzani

  1. 518 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

The second edition of a bestselling textbook, Using R for Introductory Statistics guides students through the basics of R, helping them overcome the sometimes steep learning curve. The author does this by breaking the material down into small, task-oriented steps. The second edition maintains the features that made the first edition so popular while updating the data and examples and reflecting changes to R in the current version.

See What's New in the Second Edition:



  • Increased emphasis on more idiomatic R provides a grounding in the functionality of base R.
  • Discussions of the use of RStudio help new R users avoid as many pitfalls as possible.
  • Use of the knitr package makes code easier to read and therefore easier to reason about.
  • Additional information on computer-intensive approaches motivates the traditional approach.
  • Updated examples and data make the information current and topical.

The book has an accompanying package, UsingR, available from CRAN, R's repository of user-contributed packages. The package contains the data sets mentioned in the text (data(package="UsingR")), answers to selected problems (answers()), a few demonstrations (demo()), the errata (errata()), and sample code from the text.
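As a minimal sketch of how the package might be explored, the following R code lists the first few data sets it provides. It assumes UsingR may not yet be installed, so the installation step is left as a comment and the listing is guarded by a check; the variable names `has_usingr` and `pkg_data` are illustrative only.

```r
# Assumption: installing from CRAN needs a network connection, so this step
# is commented out. Uncomment it to install the package once.
# install.packages("UsingR")

# TRUE if the package is available on this machine, FALSE otherwise
has_usingr <- requireNamespace("UsingR", quietly = TRUE)

if (has_usingr) {
  library(UsingR)
  # data(package = ...) returns an object whose `results` matrix has
  # one row per data set, with columns including "Item" and "Title"
  pkg_data <- data(package = "UsingR")$results
  print(head(pkg_data[, c("Item", "Title")]))
} else {
  message("UsingR is not installed; run install.packages(\"UsingR\") first.")
}
```

Once the package is loaded, the helper functions named above (answers(), errata(), demo()) can be called from the same session.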

The topics of this text line up closely with the traditional teaching progression; however, the book also highlights computer-intensive approaches to motivate the more traditional approach. The author emphasizes realistic data and examples and relies on visualization techniques to gather insight. The book introduces statistics and R seamlessly, giving students the tools they need to use R and the information they need to navigate the sometimes complex world of statistical computing.

Information

Year
2018
ISBN
9781315360300

1 Getting started

1.1 What is data?

Data and their statistical summaries and interpretations are ubiquitous. For example, we found these four articles during a typical day reading the paper:
• Example 1.1: To compile evidence to establish cause and effect
In an opinion piece, Joe Nocera [46] discusses the prevalence of guns in the movies (in anticipation of yet another “Die Hard” movie). He quotes a spokesperson from the Motion Picture Association of America as saying:
“There is a predominance of findings that show there is no consistent or convincing evidence that exposure [to gun violence in movies] causes people to be more violent.”
However, Nocera immediately refutes this, quoting a professor from the University of Wisconsin: “There is tons of research on this.”
Clearly the collection and interpretation of data are crucial when making policy decisions. This isn’t an easy task, of course. A casual reader may think the above differences of opinion are a matter of political motivation, but this need not be the case. Relationships between variables can exist even if there is not a cause-and-effect relationship. Finding convincing evidence in data often requires that the data be collected carefully before conclusions can be drawn.
• Example 1.2: Price of a hip replacement
In a news piece, Elisabeth Rosenthal [51] describes the research of Jaime Rosenthal, who called more than 100 hospitals, covering every state, in the summer of 2012, seeking the price of a hip replacement for a hypothetical, uninsured, 62-year-old female. The results were surprising:
  1. Only about half the institutions could provide an estimate
  2. Of those that could, the range of prices went from $11,000 to $125,798
Commentary in the article urges people to place the price data in the context of many other factors such as infection rates and unexpected deaths. However, the article summarizes the primary researcher’s belief that there is little consistent correlation between higher prices and better quality in American health care.
Even in what is perhaps the most data-driven industry, there is a clear need for data and for context in which to place it. Further, this example hints at some other difficulties in data collection, e.g., the question of what to do with missing data, as it is often the case that some values will be unavailable. There is also the issue that the mechanism for computing this value at one hospital may differ from that at another.
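To make the missing-data point concrete, here is a small R sketch. Only the two extreme prices come from the article; the vector `prices` is a made-up stand-in in which `NA` marks a hospital that could not give an estimate, and all variable names are illustrative.

```r
# Hypothetical vector: the two reported extremes plus NA for hospitals
# that could not quote a price (about half, according to the article)
prices <- c(11000, NA, 125798, NA)

# With missing values present, range() returns NA for both ends
range(prices)

# Dropping the missing values recovers the smallest and largest quotes
price_range <- range(prices, na.rm = TRUE)

# How many times larger is the highest quote than the lowest?
price_ratio <- price_range[2] / price_range[1]

# Proportion of hospitals with no estimate
share_missing <- mean(is.na(prices))

round(price_ratio, 1)
share_missing
```

The highest quote is more than eleven times the lowest, and the `na.rm = TRUE` argument shows one common, though not always appropriate, way of handling unavailable values.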
• Example 1.3: Safety of the airline industry
In a front page article titled “Airline Industry at Its Safest Since the Dawn of the Jet Age,” authors Jad Mouawad and Christopher Drew [43] summarize the data collected by the Aviation Safety Network, pointing out that 2012 had only 23 deadly accidents and 475 fatalities. This may sound high, but putting it into a rate helps give context: this is a risk of one death per 45 million flights. That is, a person could fly daily for an average of 123,000 years before being in a fatal plane crash.
The improvements in safety are not limited to advanced technologies, as the industry (regulators, pilots, and airlines) has created a culture of sharing data about flying hazards with the goal of preventing accidents.
This example shows how a focus on understanding the many factors that can contribute to a given statistic can help improve an area. It wasn’t enough that the airlines kept statistics; rather, they used their findings to address shortcomings.
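The conversion in Example 1.3 from a rate per flight to years of daily flying can be checked with a little arithmetic in R. This sketch assumes one flight per day and a 365-day year; the variable names are illustrative.

```r
flights_per_death <- 45e6  # one death per 45 million flights (from the article)
flights_per_year  <- 365   # assumption: one flight a day, ignoring leap years

# Average number of years of daily flying per fatal crash
years_of_daily_flying <- flights_per_death / flights_per_year

round(years_of_daily_flying)      # 123288
signif(years_of_daily_flying, 3)  # 123000, the figure quoted in the article
```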
• Example 1.4: Networking
On the business page, Andrew Sorkin [53] reports on a database containing the names of over two million deal makers, power brokers, and business executives, and in many cases the names of spouses, children, and associates, along with political donations, charity work, and more. This information, held by a company called Relationship Science, is compiled by more than 800 people.
The goal, of course, is to sell this information to people who plan to leverage the network of relationships. Other companies, such as Facebook and LinkedIn, have such information on their users, and the NSA seemingly has all the data it could ever need, but in this case the information is scraped from web sites: a person need not be a member of a social network or have a security clearance.
How such large databases get mined and what this means for personal privacy will likely continue to be a major topic of conversation for years to come. Though the statistical techniques of working with so-called “big data” are outside the scope of this text, many of the computational skills will be developed.
In this sampling of articles, we see the analysis of data used in many different ways:
  • Under the name “studies,” data is used to make a case about social policy (in two different ways!).
  • To investigate variability in prices and transparency, data is collected and summarized.
  • In an industry, data demonstrates that forward-looking practices can have a substantial effect.
  • Data and the information it contains are mined to establish a financial advantage.
Data and its analysis is a very wide topic, so wide we couldn’t begin to describe it all. In this text we narrow our focus, looking at data with an eye towards statistical inference. This is the process of drawing conclusions about populations based on data collected from these populations. To do this, we will use the language of probability. This will give us the flexibility to describe concrete things using data subject to random variation. Exactly how this will be used will require us to make models for our data. This text is roughly organized into three areas: the first develops techniques for exploring data, the second covers the basics of statistical inference, and the third covers the beginnings of modeling with data.
The rest of this chapter is focused on getting started with using R. We save more statistically oriented examples for Chapters 2 and beyond.

1.2 Getting started with R

This section covers the basics of getting started with R, beginning with some notes on installation and continuing with the basics of interacting with R through the command line.

Installing R

Before beginning with R, it must be installed. R is available as source code from CRAN, http://cran.r-project.org/. However, most users will probably install R from a distributed binary. These are also available from CRAN. For example, the Microsoft Windows binary is distributed as a self-extracting .exe file. Simply download the file, then install it as you would any other download. For Microsoft Windows users, the standard installation will create a desktop icon and start menu item for opening R. If started this way, R will open to its standard Microsoft Windows GUI, but we suggest using RSTUDIO®, as described next.
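Once R is installed, one quick way to confirm that it runs is to ask it for its own version at the command line. The following is a minimal sketch; the minimum version "3.0.0" in the comparison is an arbitrary example, not a requirement stated in the text.

```r
# Full description of the running R version
R.version.string

# The version as an object that can be compared against a minimum
r_version <- getRversion()
r_version >= "3.0.0"   # "3.0.0" is an arbitrary threshold for illustration
```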
Figure 1.1 The RSTUDIO development environment for R. Visible are the console, the source code editor, the plot pane, and the workspace pane....

Citation styles for Using R for Introductory Statistics

APA 6 Citation

Verzani, J. (2018). Using R for Introductory Statistics (2nd ed.). CRC Press. Retrieved from https://www.perlego.com/book/2193510/using-r-for-introductory-statistics-pdf (Original work published 2018)