Reproducible Research with R and RStudio

Author: Christopher Gandrud

276 pages
English

About This Book

Praise for previous editions:
"Gandrud has written a great outline of how a fully reproducible research project should look from start to finish, with brief explanations of each tool that he uses along the way … Advanced undergraduate students in mathematics, statistics, and similar fields as well as students just beginning their graduate studies would benefit the most from reading this book. Many more experienced R users or second-year graduate students might find themselves thinking, 'I wish I'd read this book at the start of my studies, when I was first learning R!' … This book could be used as the main text for a class on reproducible research …" (The American Statistician)

Reproducible Research with R and RStudio, Third Edition brings together the skills and tools needed for doing and presenting computational research. Using straightforward examples, the book takes you through an entire reproducible research workflow. This practical workflow enables you to gather and analyze data as well as dynamically present results in print and on the web. Supplementary materials and examples are available on the author's website.

New to the Third Edition

Updated package recommendations, examples, and URLs, with technologies no longer in regular use removed.
More advanced R Markdown (and less LaTeX) in discussions of markup languages and examples.
Stronger focus on reproducible working directory tools.
Updated discussion of cloud storage services and persistent reproducible material citation.
Added discussion of Jupyter notebooks and reproducible practices in industry.
Examples of data manipulation with Tidyverse tibbles (in addition to standard data frames) and pivot_longer() and pivot_wider() functions for pivoting data.

Features

Incorporates the most important advances that have been developed since the previous editions were published
Describes a complete reproducible research workflow, from data gathering to the presentation of results
Shows how to automatically generate tables and figures using R
Includes instructions on formatting a presentation document via markup languages
Discusses cloud storage and versioning services, particularly GitHub
Explains how to use Unix-like shell programs for working with large research projects

Frequently asked questions

How do I cancel my subscription?

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

Can/how do I download books?

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

What is the difference between the pricing plans?

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

What is Perlego?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Is Reproducible Research with R and RStudio an online PDF/ePUB?

Yes, you can access Reproducible Research with R and RStudio by Christopher Gandrud in PDF and/or ePUB format, as well as other popular books in Economics & Statistics for Business & Economics. We have over one million books available in our catalogue for you to explore.

Information

Publisher: Chapman and Hall/CRC
Year: 2020
ISBN: 9780429627958
Topic: Economics
Subtopic: Statistics for Business & Economics
Part I

Getting Started

1 Introducing Reproducible Research

Research is typically presented in very selective containers: slideshows, journal articles, books, or websites. These presentation documents announce a project’s findings and try to convince us that the results are correct (Mesirov, 2010). It’s important to remember that these documents are not the research. Especially in the computational and statistical sciences, these documents are the “advertising”. The research is the “full software environment, code, and data that produced the results” (Buckheit and Donoho, 1995; Donoho, 2010, 385). When we separate the research from its advertisement, we are making it difficult for others to verify the findings by reproducing them.

This book gives you the tools to dynamically combine your research with the presentation of your findings. The first tool is a workflow for reproducible research that weaves the principles of reproducibility throughout your entire research project, from data gathering to the statistical analysis, and the presentation of results. You will also learn how to use a number of computer tools that make this workflow easier and more robust. These tools include:

• the R statistical language that will allow you to gather data and analyze it;

• the LaTeX and Markdown markup languages that you can use to create documents (slideshows, articles, books, and webpages) for presenting your findings;

• the knitr and rmarkdown packages for R and other tools, including command-line programs like GNU Make and Git version control, for dynamically tying your data gathering, analysis, and presentation documents together so that they can be easily reproduced;

• RStudio, a program that brings all of these tools together.

1.1 What Is Reproducible Research?

Though there is some debate over the necessary and sufficient conditions for a full replication (Makel and Plucker, 2014, 2), research results are generally considered¹ replicable if there is sufficient information available for independent researchers to make the same findings using the same procedures with new data.² For research that relies on experiments, this can mean a researcher not involved in the original research being able to rerun the experiment, including sampling, and validate that the new results are comparable to the original results. In computational and quantitative empirical sciences, results are replicable if independent researchers can recreate findings by following the procedures originally used to gather the data and run the computer code. Of course, it is sometimes difficult to replicate the original data set because of issues such as limited resources to gather new data or because the original study already sampled the full universe of cases. So as a next-best standard, we can aim for “really reproducible research” (Peng, 2011, 1226).³ In computational sciences⁴ this means:

the data and code used to make a finding are available and they are sufficient for an independent researcher to recreate the finding.

In practice, research needs to be easy for independent researchers to reproduce (Ball and Medeiros, 2011). If a study is difficult to reproduce, it’s more likely that no one will reproduce it. If someone does attempt to reproduce this research, it will be difficult for them to tell if any errors they find were in the original research or problems they introduced during the reproduction. In this book, you will learn how to avoid these problems.

In particular, you will learn tools for dynamically “knitting”⁵ the data and the source code together with your presentation documents. Combined with well-organized source files and clearly and completely commented code, independent researchers will be able to understand how you obtained your results. This will make your computational research easily reproducible.
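
As a rough illustration of what knitting can look like, the sketch below builds a very small R Markdown source in memory and passes it to the `knit()` function from the knitr package. The report text, the `heights` values, and the object names are made up for this example and are not taken from the book. It assumes knitr is installed.

```r
library(knitr)

# Build the chunk delimiter programmatically so this listing stays readable.
fence <- strrep("`", 3)

# A tiny, hypothetical R Markdown source: prose, one R chunk, one inline result.
rmd_source <- c(
  "# Example report",
  "",
  paste0(fence, "{r}"),
  "heights <- c(150, 160, 170)  # made-up data for illustration",
  "mean(heights)",
  fence,
  "",
  "The mean height is `r mean(heights)` cm."
)

# knit() runs the code and weaves the results into the Markdown text.
# With the text argument, the knitted document is returned as a string.
report <- knit(text = rmd_source, quiet = TRUE)
cat(report)
```

Because the number in the final sentence is computed when the document is knitted, changing the data changes the reported result without any manual copying.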

1.2 Why Should Research Be Reproducible?

Reproducible research is one of the main components of science. If that’s not enough reason for you to make your research reproducible, consider that the tools of reproducible research also have direct benefits for you as a researcher.

1.2.1 For science

Replicability has been a key part of scientific inquiry from perhaps the 1200s (Bacon, 1859; Nosek et al., 2012). It has even been called the “demarcation between science and non-science” (Braude, 1979, 2). Why is replication so important for scientific inquiry?

Standard to judge scientific claims

Replication opens claims to scrutiny, allowing us to keep what works and discard what doesn’t. Science, according to the American Physical Society, “is the systematic enterprise of gathering knowledge … organizing and condensing that knowledge into testable laws and theories”. The “ultimate standard” for evaluating scientific claims is whether or not the claims can be replicated (Peng, 2011; Kelly, 2006). Research findings cannot even really be considered “genuine contributions to human knowledge” until they have been verified through replication (Stodden, 2009b, 38). Replication “requires the complete and open exchange of data, procedures, and materials”. Scientific conclusions that are not replicable should be abandoned or modified “when confronted with more complete or reliable … evidence”.⁶

Reproducibility enhances replicability. If other researchers are able to clearly understand how a finding was originally made, then they will be better able to conduct comparable research in meaningful attempts to replicate the original findings. Sometimes strict replicability is not feasible, for example, when it is only possible to gather one data set on a population of interest. In these cases reproducibility is a “minimum standard” for judging scientific claims (Peng, 2011).

It is important to note that though reproducibility is a minimum standard for judging scientific claims, “a study can be reproducible and still be wrong” (Peng, 2014). For example, a statistically significant finding in one study may remain statistically significant when reproduced using the original data and code, but when researchers try to replicate it using new data, and perhaps new methods, they are unable to find a similar result. The original finding could have been noise, even though it is fully reproducible.
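
The difference between reproducing and replicating a noisy finding can be sketched with a small simulation in base R. This is an illustration under stated assumptions, not an analysis from the book: the `run_study()` function, the sample size of 30, and the 200 replication attempts are all hypothetical choices. The two variables are unrelated by construction, so any significant correlation is noise.

```r
# One hypothetical "study": correlate two unrelated noise variables.
# The seed stands in for the data set; the same seed gives the same data.
run_study <- function(seed, n = 30) {
  set.seed(seed)
  x <- rnorm(n)
  y <- rnorm(n)
  cor.test(x, y)$p.value
}

# Search for a study that is "significant" purely by chance.
seed <- 1
while (run_study(seed) >= 0.05) {
  seed <- seed + 1
}

p_original   <- run_study(seed)  # the original finding
p_reproduced <- run_study(seed)  # same data, same code

# Replication attempts: same procedure, newly drawn data.
p_replications <- sapply(seed + 1:200, run_study)

identical(p_original, p_reproduced)  # reproduction matches exactly
mean(p_replications < 0.05)          # share of replications that are significant
```

The reproduction returns exactly the same p-value because it reuses the same data and code, while only a small share of the replications (roughly the 5% false-positive rate, on average) would be expected to find a significant result.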

Avoiding effort duplication and encouraging cumulative knowledge development

Not only is reproducibility important for evaluating scientific claims, it can also contribute to the cumulative growth of scienti...