A Primer in Biological Data Analysis and Visualization Using R

Gregg Hartvigsen

About This Book

R is the most widely used open-source statistical and programming environment for the analysis and visualization of biological data. Drawing on Gregg Hartvigsen's extensive experience teaching biostatistics and modeling biological systems, this text is an engaging, practical, and lab-oriented introduction to R for students in the life sciences.

Underscoring the importance of R and RStudio in organizing, computing, and visualizing biological statistics and data, Hartvigsen guides readers through the processes of entering data into R, working with data in R, and using R to visualize data using histograms, boxplots, barplots, scatterplots, and other common graph types. He covers testing data for normality, defining and identifying outliers, and working with non-normal data. Students are introduced to common one- and two-sample tests as well as one- and two-way analysis of variance (ANOVA), correlation, and linear and nonlinear regression analyses. This volume also includes a section on advanced procedures and a chapter introducing algorithms and the art of programming using R.

Information

Year: 2014
ISBN: 9780231537049

CHAPTER 1
INTRODUCING OUR SOFTWARE TEAM
In science we are interested in understanding systems that are complicated. Our use of quantitative approaches gives us the ability to not only understand these systems but also to predict how a system might behave in the future (or maybe even how it behaved in the past). As we work to understand and predict complex biological systems we need computational help. You probably have written lab reports using only a calculator. This should be avoided for a variety of important reasons:
1. Difficulty in verifying that you entered the data correctly. (I think the numbers are right.)
2. Difficulty in repeating the analysis. (I’m not doing it again because I might get a different answer!)
3. Inability to share your analytical approaches and results. (Sorry, I hit the all-clear button! You have to trust me.)
4. Inflexibility in how the data are analyzed. (You wanted me to do what?)
5. Inability to make and share appropriate graphs. (Can I take a picture of the graph on my calculator with my phone and incorporate that in my lab report?)
To solve these shortcomings we will use Excel and R.
You may be somewhat familiar with Excel but probably have little or no experience with R. Therefore, I welcome you to the world of R! I know this might be a scary place for you at first. I bet R is really different from all the programs you’ve used. Fortunately, this introduction is intended for newcomers, and as you proceed you will learn how to do some really amazing things with R. You’ll gain independence with practice. Learning R is like learning an instrument, a sport, or a foreign language: they all require practice. I have confidence that you are capable of using R to solve interesting problems. And the more time you spend at it the better you will get.
1.1 SOLVING PROBLEMS WITH EXCEL AND R
For many analytical problems we will be able to use just R. However, in biology, we often test our ideas, or hypotheses, with large amounts of data. We will therefore use Excel for what it does well (entering and organizing our data) but not for what it doesn’t do well (statistical analyses, modeling, and visualizing data). These core scientific skills are best done with R. If you love Excel then you’ll be happy to know we’re not abandoning it; Excel has its place.
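As a minimal sketch of this division of labor, the R code below reads a small table of the kind you might enter in Excel and save as a CSV file, then summarizes and graphs it in R. The column names, the measurements, and the temporary file are all made up for illustration; in practice you would point `read.csv()` at your own exported file.

```r
# Hypothetical data, as they might be typed into a spreadsheet and saved as CSV.
# A temporary file stands in for the file you would export from Excel.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plant,height_cm",
             "A,12.1",
             "B,14.3",
             "C,13.2",
             "D,15.0"),
           csv_path)

# Read the file into a data frame and check what arrived.
plants <- read.csv(csv_path)
str(plants)

# A first summary and a first graph, both done in R rather than in Excel.
mean_height <- mean(plants$height_cm)
print(mean_height)
barplot(plants$height_cm, names.arg = plants$plant,
        xlab = "Plant", ylab = "Height (cm)")
```

Because the steps are written down as code, anyone with the same file can rerun them and get the same summary and graph.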
It is important to recognize that doing things well is rarely easy. Writing a good poem, playing tennis well, or doing ballet well are all hard. And conducting hypothesis tests correctly and making professional-quality graphs are not simple, one-click operations.
At first you will likely think that making graphs and performing statistical tests in R are absolute nightmares. (And when you become a skilled R programmer you’ll still be challenged at times!) But the days of skipping an analysis or accepting an ugly or incorrect graph because “that’s the best I can do with Excel” are over. You can do it in R! Therefore, in this introduction we will discuss Excel but focus mainly on R. It is the combination of using Excel to organize our data and R for analyses and visualizations that will allow you to ask and answer questions in biology.
You still may be wondering why you can’t just do this all in Excel. Here is a sampling of reasons why R is clearly better than Excel for problem solving in biology. With R you can:
1. create professional, publication-quality visualizations;
2. conduct quantitative analyses, both analytical and statistical (e.g., do a t-test, solve systems of differential equations, conduct non-linear regression, use matrix algebra, conduct signal processing, perform wavelet analysis, analyze fMRI data, do genome analyses, and create phylogenetic reconstructions, to name a few);
3. build statistical tests that can be repeated easily and shared with anyone. These tests might rely on their own data, data read from a file, or data acquired directly from a website;
4. do the same thing and work the same way on computers running Mac, Windows, and Linux;
5. write computer programs, such as modeling a population growing over time, using an object-oriented language;
6. access modern analytical tools for biologists that are being developed right now, right here, and nowhere else;
7. use and receive widely available help from the R open-source community;
8. use open-source software that provides solutions that are “auditable,” meaning you can understand and explain to others how you got your results (there are no black boxes - it’s open software!);
9. write a document like this. This environment lets you combine words, mathematical equations, computer code, statistical tests and output, and professional-quality graphs in a single document, all within the free, open-source LaTeX typesetting environment;
10. carry a research project, its paper, all the data, and the entire software package for doing the analysis on a low-capacity flash drive;
11. rest assured that your investment in skill building will pay off well into the future. You don’t have to hope you’ll have access to the program when you move on to your next stage of life (which could be in a hospital in Ghana!);
12. enjoy these benefits because open-source means R is free!
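To make a few of these points concrete, here is a short sketch in base R covering a repeatable statistical test (item 3), a small population-growth program (item 5), and a simple graph (item 1). The measurements, the growth rate `lambda`, the starting population, and the number of time steps are invented for illustration only.

```r
# Item 3: a test that anyone can rerun. The measurements are made up.
control   <- c(4.2, 5.1, 4.8, 5.5, 4.9)
treatment <- c(5.9, 6.3, 5.7, 6.8, 6.1)
result <- t.test(control, treatment)  # Welch two-sample t-test by default
print(result)

# Item 5: a small program that models a population growing over time.
# Discrete exponential growth: N[t + 1] = N[t] * lambda (assumed values).
lambda  <- 1.2
n_steps <- 10
N <- numeric(n_steps + 1)
N[1] <- 100
for (t in 1:n_steps) {
  N[t + 1] <- N[t] * lambda
}

# Item 1: a quick graph of the modeled population.
plot(0:n_steps, N, type = "b",
     xlab = "Time step", ylab = "Population size")
```

Saving these lines in a script file is what makes the analysis repeatable and shareable: the data, the test, and the graph are all recorded in one place.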
Your ability to use R to make informed, evidence-based conclusions likely will provide you the most valuable set of skills you’ll learn as an undergraduate science major. If you keep this skill set you will be highly marketable. R helps you speak the language of science, which is written in mathematics, statistics, and data evaluation and visualization. This ability to answer scientific questions and present your results professionally is finally in your hands.
Your ability to use R helps fulfill an important goal that was synthesized in the report Scientific Foundations for Future Physicians, produced by the Association of American Medical Colleges and the Howard Hughes Medical Institute (2009). The authors of this report downplay the importance of memorizing facts and, instead, encourage students to learn to
apply quantitative reasoning and appropriate mathematics to describe or explain phenomena in the natural world.
Additionally, in the report Vision and Change in Undergraduate Biology: A Call to Action, produced jointly by the American Association for the Advancement of Science and the National Science Foundation (2009), six “core competencies” are advocated for undergraduates in biology. Below are four of th...
