In science we are interested in understanding systems that are complicated. Our use of quantitative approaches gives us the ability not only to understand these systems but also to predict how a system might behave in the future (or maybe even how it behaved in the past). As we work to understand and predict complex biological systems, we need computational help. You have probably written lab reports using only a calculator. This practice should be avoided for a variety of important reasons:
1. Difficulty in verifying that you entered the data correctly. (I think the numbers are right.)
2. Difficulty in repeating the analysis. (I’m not doing it again because I might get a different answer!)
3. Inability to share your analytical approaches and results. (Sorry, I hit the all-clear button! You have to trust me.)
4. Inflexibility in how the data are analyzed. (You wanted me to do what?)
5. Inability to make and share appropriate graphs. (Can I take a picture of the graph on my calculator with my phone and incorporate that in my lab report?)
To overcome these shortcomings we will use Excel and R.
You may be somewhat familiar with Excel but probably have little or no experience with R. Therefore, I welcome you to the world of R! I know this might be a scary place for you at first. I bet R is really different from all the programs you’ve used. Fortunately, this introduction is intended for newcomers, and as you proceed you will learn how to do some really amazing things with R. You’ll gain independence with practice. Learning R is like learning to play an instrument, learning a sport, or learning a foreign language: they all require practice. I have confidence that you are capable of using R to solve interesting problems, and the more time you spend at it, the better you will get.
For many analytical problems we will be able to use just R. However, in biology we often test our ideas, or hypotheses, with large amounts of data. We will therefore use Excel for what it does well (entering and organizing our data), but we will not use Excel for what it doesn’t do well (statistical analyses, modeling, and visualizing data). These core scientific skills are best done with R. If you love Excel, you’ll be happy to know we’re not abandoning it; Excel has its place.
It is important to recognize that doing things well is rarely easy. Writing a good poem, playing tennis well, or doing ballet well are all hard. And conducting hypothesis tests correctly and making professional-quality graphs are not simple, one-click operations.
At first you will likely think that making graphs and performing statistical tests in R are absolute nightmares. (And when you become a skilled R programmer you’ll still be challenged at times!) But the days of skipping an analysis or accepting an ugly or incorrect graph because “that’s the best I can do with Excel” are over. You can do it in R! Therefore, in this introduction we will discuss Excel but focus mainly on R. It is the combination of using Excel to organize our data and R for analyses and visualizations that will allow you to ask and answer questions in biology.
You still may be wondering why you can’t just do this all in Excel. Here is a sampling of reasons why R is clearly better than Excel for problem solving in biology. With R you can:
1. create professional, publication-quality visualizations;
2. conduct quantitative analyses, both analytical and statistical (e.g., do a t-test, solve systems of differential equations, conduct non-linear regression, use matrix algebra, conduct signal processing, perform wavelet analysis, analyze fMRI data, do genome analyses, and create phylogenetic reconstructions, to name a few);
3. build statistical tests that can be repeated easily and shared with anyone. These tests might rely on your own data, data read from a file, or data acquired directly from a website;
4. do the same thing and work the same way on computers running Mac, Windows, and Linux;
5. write computer programs, such as a model of a population growing over time, using an object-oriented language;
6. access modern analytical tools for biologists that are being developed right now, right here, and nowhere else;
7. use and receive widely available help from the R open-source community;
8. use open-source software that provides solutions that are “auditable,” meaning you can understand and explain to others how you got your results (there are no black boxes; it’s open software!);
9. write a document like this. R lets you combine words, mathematical equations, computer code, statistical tests and output, and professional-quality graphs in a single document, typeset with the free, open-source LaTeX system;
10. carry a research project, paper, all the data, AND carry the entire software package for doing the analysis on a low-capacity flash drive;
11. rest assured that your investment in skill building will pay off well into the future. You don’t have to hope you’ll have access to the program when you move on to your next stage of life (which could be in a hospital in Ghana!);
12. enjoy these benefits because open-source means R is free!
Your ability to use R to draw informed, evidence-based conclusions will likely be the most valuable set of skills you learn as an undergraduate science major. If you keep this skill set, you will be highly marketable. R helps you speak the language of science, which is written in mathematics, statistics, and data evaluation and visualization. The ability to answer scientific questions and present your results professionally is finally in your hands.
Your ability to use R helps fulfill an important goal that was synthesized in the report Scientific Foundations for Future Physicians, produced by the Association of American Medical Colleges and the Howard Hughes Medical Institute (2009). The authors of this report downplay the importance of memorizing facts and, instead, encourage students to learn to
apply quantitative reasoning and appropriate mathematics to describe or explain phenomena in the natural world.
Additionally, in the report Vision and Change in Undergraduate Biology: A Call to Action, produced jointly by the American Association for the Advancement of Science and the National Science Foundation (2009), six “core competencies” are advocated for undergraduates in biology. Below are four of th...