Molecular Data Analysis Using R

About this book

This book addresses the difficulties experienced by wet lab researchers with the statistical analysis of molecular biology related data. The authors explain how to use R and Bioconductor for the analysis of experimental data in the field of molecular biology. The content is based upon two university courses for bioinformatics and experimental biology students (Biological Data Analysis with R and High-throughput Data Analysis with R). The material is divided into chapters based upon the experimental methods used in the laboratories. Key features include:
• Broad appeal: the authors target their material to researchers at several levels, ensuring that the basics are always covered.
• First book to explain how to use R and Bioconductor for the analysis of several types of experimental data in the field of molecular biology.
• Focuses on R and Bioconductor, which are widely used for data analysis. One great benefit of R and Bioconductor is the vast user community and its very active discussion, in addition to the practice of sharing code. Further, R is the platform on which new analysis approaches are implemented, so novel methods are available early to R users.

Information

Authors
Csaba Ortutay, Zsuzsanna Ortutay
Year
2016
Print ISBN
9781119165026
eBook ISBN
9781119165040
Edition
1

CHAPTER 1
Introduction to R statistical environment

Why R?

If you work in biodata analysis, or if you are looking for a bioinformatics job, you will find a large number of related job advertisements targeting young professionals. One requirement recurs in those ads: they demand “a high degree of familiarity with R/Bioconductor.” (Here, I am quoting an actual recent ad from Monster.com.)
Besides, when we have to create and analyze large amounts of data during our bio‐researcher career, sooner or later we realize that simple approaches based on spreadsheets (such as Excel in MS Office) are no longer flexible enough to meet the needs of our projects. In these situations, we start to look for dedicated statistical software tools, and soon we encounter countless alternatives to choose from. The R statistical environment is one of these possibilities.
With the exponential spread of high‐throughput experimental methods, including microarray and next‐generation sequencing (NGS)-based experiments, skills in the large‐scale analysis of data from biological experiments are becoming more and more valuable. R and Bioconductor offer a free and flexible tool‐set for these types of analyses; therefore, many research groups and companies select them as their data analysis platform.
R is open‐source software licensed under the GNU General Public License (GPL). One advantage is that you can install R for free on your desktop computer, regardless of whether you use Windows, Mac OS X, or a Linux distribution.
A thorough general introduction to all the features of R exceeds the scope and purpose of this book, which focuses on molecular biology‐specific applications. For those who are interested in a deeper introduction to R itself, we suggest R for Beginners by Emmanuel Paradis as a reference guide. It is an excellent general guide, which can be found online (Paradis 2005). In the course, we use more biology‐oriented examples to illustrate the most important topics. The other recommended book for this chapter is R in a Nutshell by Joseph Adler (2012).

Installing R

The first task in analyzing data with R is to install R on the computer. There is a lively discussion on bioinformatics blogs about why people so seldom use the knowledge they acquire in short bioinformatics courses. One of the main explanations offered is that the greatest challenge is installing the software in question.
There is plenty of information on the web about how to install R, but the most authoritative source is the website of the R project itself. This site collects the official documentation, installers, and other related links from the developers of R themselves. The first step is to navigate to the download section of the site and find the mirror closest to the user's location.
The installation process differs somewhat depending on the operating system of the computer in use. Windows users should find the Windows installer for their system on the download pages. Be sure to pick the base installer, not the contributed libraries. In the case of a Linux distribution, R can be installed via the package manager. Several Linux distributions provide R (and many R libraries) as a part of their repositories. This way, the package manager can take care of the updates. Mac OS X users and Apple fans can find the pkg file containing the R framework, 64‐bit graphical user interface (GUI) (R.app), and Tcl/Tk 8.6.0 X11 libraries for installing the R base system on their computer. Brave users of other UNIX systems (e.g., FreeBSD or OpenBSD) can use R, but they have to compile it from source. This is not a beginner topic. On a computer owned by a company, university, or library, installing R (just like many other programs) most often requires superuser rights.
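Once R is installed, a quick check from inside R can confirm that the installation works. The following is only a minimal sketch using base R; the variable names r_version and has_stats are chosen here for illustration and do not come from the book.

```r
# Report the version of the R installation that is currently running.
r_version <- R.version.string
print(r_version)

# Show the platform (operating system and architecture) R was built for.
print(R.version$platform)

# List the directories where R looks for installed packages (libraries).
print(.libPaths())

# Check whether a package is available without attaching it.
# "stats" ships with every standard R installation.
has_stats <- requireNamespace("stats", quietly = TRUE)
print(has_stats)
```

If these commands run without an error message, the base system is in place and additional packages can be installed later.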

Interacting with R

The interface of R is somewhat different from other software used for statistics, such as SPSS, S‐plus, Prism, or MS Excel (which is not a statistical software tool!). There are neither icons nor sophisticated menus to perform analyses. Instead, commands are typed at the “command prompt” of R, which is marked with >. In this book, the commands to be typed at the prompt are set in a fixed‐width (monospaced) font:
> citation() 
After typing in a command (and hitting Enter), the results turn up either under the command or, in case of graphics, in a separate window. If the result of a command is nothing, the string NULL appears as a result. Mistyping or making an error in the parameters of a command leads to an error message with some information about what was wrong.
> c()
NULL
> a * 5
Error: object 'a' not found
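The same two behaviors can also be inspected programmatically. The sketch below assumes nothing beyond base R; the object name undefined_object_xyz is hypothetical and picked only because it should not exist in a fresh session.

```r
# An empty call to c() returns NULL, the "nothing" value of R.
empty <- c()
print(is.null(empty))

# Referring to an undefined object raises an error. tryCatch() captures
# the error message instead of stopping the script.
# undefined_object_xyz is a hypothetical name that is assumed not to exist.
msg <- tryCatch(
  {
    undefined_object_xyz * 5
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Capturing the message this way is mostly useful in scripts; at the interactive prompt, reading the error text that R prints is usually enough.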
From now on, we will omit the > prompt character from the code samples so you can just copy/paste the commands. Leaving R happens with the quit() function.
quit(save='no')
q()

Graphical interfaces and integrated development environment (IDE) integration

A command‐line interface is enough for working through the practical exercises. However, some prefer to have a GUI. There are multiple choices depending on the operating system in use. The Windows and Mac versions of R start with a very simple GUI, while Linux/UNIX versions start only with a command‐line interface. The Java GUI for R is available for any platform capable of running Java, and it sports simple, functional menus to perform the most basic tasks related to an analysis (Helbig, Urbanek, and Fellows 2013).
For a more advanced GUI, one can experiment with RStudio or R Commander (Fox 2005). There are also several plugins that integrate R into popular coding tools, such as Emacs (with the Emacs Speaks Statistics add‐on), Eclipse (by StatET for R), and many others.

Scripting and sourcing

Doing data analysis in R means typing in commands and experimenting with parameters suitable for the given set of data. At a later stage, the procedure will be repeated either on the same data with slight modifications in the course o...
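As a minimal sketch of the idea in this section's title, commands can be saved in a script file and run again with the base R function source(). The file content and the names script_path, script_env, values, and mean_value are hypothetical and only serve the illustration.

```r
# Write a two-line analysis script to a temporary file.
# The script content and all names here are hypothetical examples.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "values <- c(2.1, 3.4, 5.6)",
  "mean_value <- mean(values)"
), script_path)

# Run the script in a separate environment so that it does not overwrite
# objects in the current workspace.
script_env <- new.env()
source(script_path, local = script_env)

# The objects created by the script are now available in script_env.
print(script_env$mean_value)

# Remove the temporary file.
unlink(script_path)
```

In everyday use, the script would be an ordinary text file edited by hand, and calling source() with its path repeats the whole analysis in one step.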

Table of contents

  1. Cover
  2. Title Page
  3. Table of Contents
  4. Foreword
  5. Preface
  6. Acknowledgements
  7. About the Companion Website
  8. CHAPTER 1: Introduction to R statistical environment
  9. CHAPTER 2: Simple sequence analysis
  10. CHAPTER 3: Annotating gene groups
  11. CHAPTER 4: Next‐generation sequencing: introduction and genomic applications
  12. CHAPTER 5: Quantitative transcriptomics: qRT‐PCR
  13. CHAPTER 6: Advanced transcriptomics: gene expression microarrays
  14. CHAPTER 7: Next‐generation sequencing in transcriptomics: RNA‐seq experiments
  15. CHAPTER 8: Deciphering the regulome: from ChIP to ChIP‐seq
  16. CHAPTER 9: Inferring regulatory and other networks from gene expression data
  17. CHAPTER 10: Analysis of biological networks
  18. CHAPTER 11: Proteomics: mass spectrometry
  19. CHAPTER 12: Measuring protein abundance with ELISA
  20. CHAPTER 13: Flow cytometry: counting and sorting stained cells
  21. Glossary
  22. Index
  23. End User License Agreement