Reproducible Research with R and RStudio
eBook - ePub

Reproducible Research with R and RStudio

Christopher Gandrud

Compartir libro
  1. 276 páginas
  2. English
  3. ePUB (apto para móviles)
  4. Disponible en iOS y Android
eBook - ePub

Reproducible Research with R and RStudio

Christopher Gandrud

Detalles del libro
Vista previa del libro
Índice
Citas

Información del libro

Praise for previous editions:
" Gandrud has written a great outline of how a fully reproducible research project should look from start to finish, with brief explanations of each tool that he uses along the way… Advanced undergraduate students in mathematics, statistics, and similar fields as well as students just beginning their graduate studies would benefit the most from reading this book. Many more experienced R users or second-year graduate students might find themselves thinking, 'I wish I'd read this book at the start of my studies, when I was first learning R!'…This book could be used as the main text for a class on reproducible research …" ( The American Statistician)

Reproducible Research with R and R Studio, Third Edition brings together the skills and tools needed for doing and presenting computational research. Using straightforward examples, the book takes you through an entire reproducible research workflow. This practical workflow enables you to gather and analyze data as well as dynamically present results in print and on the web. Supplementary materials and example are available on the author's website.

New to the Third Edition



  • Updated package recommendations, examples, URLs, and removed technologies no longer in regular use.


  • More advanced R Markdown (and less LaTeX) in discussions of markup languages and examples.


  • Stronger focus on reproducible working directory tools.


  • Updated discussion of cloud storage services and persistent reproducible material citation.


  • Added discussion of Jupyter notebooks and reproducible practices in industry.


  • Examples of data manipulation with Tidyverse tibbles (in addition to standard data frames) and pivot_longer() and pivot_wider() functions for pivoting data.

Features



  • Incorporates the most important advances that have been developed since the editions were published


  • Describes a complete reproducible research workflow, from data gathering to the presentation of results


  • Shows how to automatically generate tables and figures using R


  • Includes instructions on formatting a presentation document via markup languages


  • Discusses cloud storage and versioning services, particularly Github


  • Explains how to use Unix-like shell programs for working with large research projects

Preguntas frecuentes

¿Cómo cancelo mi suscripción?
Simplemente, dirígete a la sección ajustes de la cuenta y haz clic en «Cancelar suscripción». Así de sencillo. Después de cancelar tu suscripción, esta permanecerá activa el tiempo restante que hayas pagado. Obtén más información aquí.
¿Cómo descargo los libros?
Por el momento, todos nuestros libros ePub adaptables a dispositivos móviles se pueden descargar a través de la aplicación. La mayor parte de nuestros PDF también se puede descargar y ya estamos trabajando para que el resto también sea descargable. Obtén más información aquí.
¿En qué se diferencian los planes de precios?
Ambos planes te permiten acceder por completo a la biblioteca y a todas las funciones de Perlego. Las únicas diferencias son el precio y el período de suscripción: con el plan anual ahorrarás en torno a un 30 % en comparación con 12 meses de un plan mensual.
¿Qué es Perlego?
Somos un servicio de suscripción de libros de texto en línea que te permite acceder a toda una biblioteca en línea por menos de lo que cuesta un libro al mes. Con más de un millón de libros sobre más de 1000 categorías, ¡tenemos todo lo que necesitas! Obtén más información aquí.
¿Perlego ofrece la función de texto a voz?
Busca el símbolo de lectura en voz alta en tu próximo libro para ver si puedes escucharlo. La herramienta de lectura en voz alta lee el texto en voz alta por ti, resaltando el texto a medida que se lee. Puedes pausarla, acelerarla y ralentizarla. Obtén más información aquí.
¿Es Reproducible Research with R and RStudio un PDF/ePUB en línea?
Sí, puedes acceder a Reproducible Research with R and RStudio de Christopher Gandrud en formato PDF o ePUB, así como a otros libros populares de Economics y Statistics for Business & Economics. Tenemos más de un millón de libros disponibles en nuestro catálogo para que explores.

Información

Año
2020
ISBN
9780429627958

Part I

Getting Started

1

Introducing Reproducible Research

Research is typically presented in very selective containers: slideshows, journal articles, books, or websites. These presentation documents announce a project’s findings and try to convince us that the results are correct (Mesirov, 2010). It’s important to remember that these documents are not the research. Especially in the computational and statistical sciences, these documents are the “advertising”. The research is the “full software environment, code, and data that produced the results” (Buckheit and Donoho, 1995; Donoho, 2010, 385). When we separate the research from its advertisement, we are making it difficult for others to verify the findings by reproducing them.
This book gives you the tools to dynamically combine your research with the presentation of your findings. The first tool is a workflow for reproducible research that weaves the principles of reproducibility throughout your entire research project, from data gathering to the statistical analysis, and the presentation of results. You will also learn how to use a number of computer tools that make this workflow easier and more robust. These tools include:
• the R statistical language that will allow you to gather data and analyze it;
• the LaTeX and Markdown markup languages that you can use to create documents–slideshows, articles, books, and webpages–for presenting your findings;
• the knitr and rmarkdown packages for R and other tools, including command-line programs like GNU Make and Git version control, for dynamically tying your data gathering, analysis, and presentation documents together so that they can be easily reproduced;
RStudio, a program that brings all of these tools together.

1.1 What Is Reproducible Research?

Though there is some debate over the necessary and sufficient conditions for a full replication (Makel and Plucker, 2014, 2), research results are generally considered1 replicable if there is sufficient information available for independent researchers to make the same findings using the same procedures with new data.2 For research that relies on experiments, this can mean a researcher not involved in the original research being able to rerun the experiment, including sampling, and validate that the new results are comparable to the original results. In computational and quantitative empirical sciences, results are replicable if independent researchers can recreate findings by following the procedures originally used to gather the data and run the computer code. Of course, it is sometimes difficult to replicate the original data set because of issues such as limited resources to gather new data or because the original study already sampled the full universe of cases. So as a next-best standard, we can aim for “really reproducible research” (Peng, 2011, 1226).3 In computational sciences4 this means:
the data and code used to make a finding are available and they are sufficient for an independent researcher to recreate the finding.
In practice, research needs to be easy for independent researchers to reproduce (Ball and Medeiros, 2011). If a study is difficult to reproduce, it’s more likely that no one will reproduce it. If someone does attempt to reproduce this research, it will be difficult for them to tell if any errors they find were in the original research or problems they introduced during the reproduction. In this book, you will learn how to avoid these problems.
In particular, you will learn tools for dynamically “knitting5 the data and the source code together with your presentation documents. Combined with wellorganized source files and clearly and completely commented code, independent researchers will be able to understand how you obtained your results. This will make your computational research easily reproducible.

1.2 Why Should Research Be Reproducible?

Reproducible research is one of the main components of science. If that’s not enough reason for you to make your research reproducible, consider that the tools of reproducible research also have direct benefits for you as a researcher.

1.2.1 For science

Replicability has been a key part of scientific inquiry from perhaps the 1200s (Bacon, 1859; Nosek et al., 2012). It has even been called the “demarcation between science and non-science” (Braude, 1979, 2). Why is replication so important for scientific inquiry?

Standard to judge scientific claims

Replication opens claims to scrutiny, allowing us to keep what works and discard what doesn’t. Science, according to the American Physical Society, “is the systematic enterprise of gathering knowledge … organizing and condensing that knowledge into testable laws and theories”. The “ultimate standard” for evaluating scientific claims is whether or not the claims can be replicated (Peng, 2011; Kelly, 2006). Research findings cannot even really be considered “genuine contributions to human knowledge” until they have been verified through replication (Stodden, 2009b, 38). Replication “requires the complete and open exchange of data, procedures, and materials”. Scientific conclusions that are not replicable should be abandoned or modified “when confronted with more complete or reliable … evidence”.6
Reproducibility enhances replicability. If other researchers are able to clearly understand how a finding was originally made, then they will be better able to conduct comparable research in meaningful attempts to replicate the original findings. Sometimes strict replicability is not feasible, for example, when it is only possible to gather one data set on a population of interest. In these cases reproducibility is a “minimum standard” for judging scientific claims (Peng, 2011).
It is important to note that though reproducibility is a minimum standard for judging scientific claims, “a study can be reproducible and still be wrong” (Peng, 2014). For example, a statistically significant finding in one study may remain statistically significant when reproduced using the original data/code, but when researchers try to replicate it using new data and even methods, they are unable to find a similar result. The original finding could have been noise, even though it is fully reproducible.

Avoiding effort duplication and encouraging cumulative knowledge development

Not only is reproducibility important for evaluating scientific claims, it can also contribute to the cumulative growth of scienti...

Índice