Learn R

As a Language

  1. 350 pages
  2. English

About this book

Learning a computer language like R can be either frustrating, fun, or boring. Having fun requires challenges that wake up the learner's curiosity but also provide an emotional reward on overcoming them. This book is designed so that it includes smaller and bigger challenges, in what I call playgrounds, in the hope that all readers will enjoy their path to R fluency. Fluency in the use of a language is a skill that is acquired through practice and exploration. Although rarely mentioned separately, fluency in a computer programming language involves both writing and reading. The parallels between natural and computer languages are many, but differences are also important. For students and professionals in the biological sciences, humanities, and many applied fields, recognizing the parallels between R and natural languages should help them feel at home with R. The approach I use is similar to that of a travel guide, encouraging exploration and describing the available alternatives and how to reach them. The intention is to guide the reader through the R landscape of 2020 and beyond.

Features



  • R as it is currently used
  • Few prescriptive rules: mostly the author's preferences together with alternatives
  • Explanation of the R grammar emphasizing the "R way of doing things"
  • Tutoring for "programming in the small" using scripts
  • The grammar of graphics and the grammar of data described as grammars
  • Examples of data exchange between R and the foreign world using common file formats
  • Coaching for becoming an independent R user, capable of both writing original code and solving future challenges

What makes this book different from others:



  • Tries to break the ice and help readers from all disciplines feel at home with R
  • Does not make assumptions about what the reader will use R for
  • Attempts to do only one thing well: guide readers into becoming fluent in the R language

Pedro J. Aphalo is a PhD graduate from the University of Edinburgh, and is currently a lecturer at the University of Helsinki. A plant biologist and agriculture scientist with a passion for data, electronics, computers, and photography, in addition to plants, Dr. Aphalo has been a user of R for 25 years. He first organized an R course for MSc students 18 years ago, and is the author of 13 R packages currently in CRAN.


1

R: The language and the program

In a world of … relentless pressure for more of everything, one can lose sight of the basic principles—simplicity, clarity, generality—that form the bedrock of good software.
Brian W. Kernighan and Rob Pike, The Practice of Programming, 1999

1.1 Aims of this chapter

In this chapter you will learn some facts about the history and design aims behind the R language, its implementation in the R program, and how it is used in actual practice when sitting at a computer. You will learn the difference between typing commands interactively, reading each partial response from R on the screen as you type, and using R scripts to execute a “job” that saves results for later inspection by the user.
I will describe the advantages and disadvantages of textual command languages such as R compared to menu-driven user interfaces as frequently used in other statistics software and occasionally also with R. I will discuss the role of textual languages in the very important question of reproducibility of data analyses.
Finally, you will learn about the different types and sources of help available to R users, and how best to make use of them.

1.2 R

1.2.1 What is R?

Most people think of R as a computer program. R is indeed a computer program (a piece of software), but it is also a computer language, implemented in the R program. Does this make a difference? Yes. Until recently we had only one mainstream implementation of R, the program R. Recently another implementation has gained some popularity, Microsoft R Open (MRO), which is directly based on the R program from The R Project for Statistical Computing. MRO is described as an enhanced distribution of R. These two very similar implementations are not the only ones available, but others are not in widespread use. In other words, the R language is not tied to the R program, and it is feasible that other implementations will be developed in the future.
The name “base R” is used to distinguish R itself, as in the R distribution, from R in a broader sense, which includes independently developed extensions that can be loaded from separately distributed extension packages.
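The distinction between base R and extension packages can be inspected from within R itself. The following is a minimal sketch, assuming a standard R installation; the package name "ggplot2" is only an illustrative choice and may not be installed on your computer.

```r
# Packages that form part of the base R distribution.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages attached to the search path in the current session.
print(search())

# Check whether an extension package is available before relying on it.
# "ggplot2" is an example name only; it is not part of base R.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
if (has_ggplot2) {
  message("ggplot2 is installed and can be attached with library(ggplot2)")
} else {
  message("ggplot2 is not installed; install.packages(\"ggplot2\") would fetch it")
}
```

Base packages such as stats and graphics are always present, while extension packages have to be installed separately before they can be loaded.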
Because R is essentially a command-line application, it can be used on what nowadays are frugal computing resources, equivalent to a personal computer of three decades ago. R can run even on the Raspberry Pi, a single-board computer with the processing power of a modest smartphone. At the other end of the spectrum, on really powerful servers, R can be used for the analysis of big data sets with millions of observations. How powerful a computer you will need will depend on the size of the data sets you want to analyze, on how patient you are, and on your ability to write “good” code.
One could think of R as a dialect of an earlier language, called S. S evolved into S-Plus (Becker et al. 1988). S and S-Plus are commercial programs, and variations in the language appeared only between versions. R started as a poor man’s home-brewed implementation of S, for use in teaching. Initially R, the program, implemented a subset of the S language. The R program evolved until only relatively few differences between S and R remained, and these differences are intentional, thought of as significant improvements. As R overtook S-Plus in popularity, some of the new features in R made their way back into S-Plus. R is free and open-source, and the name GNU S is sometimes used to refer to R.
What makes R different from SPSS, SAS, etc., is that S was designed from the start as a computer programming language. This may look unimportant to someone who does not need or want to write software for data analysis. However, in reality it makes a huge difference because R is easily extensible. By this we mean that new functionality can be easily added and shared, and that to the user this new functionality is indistinguishable from that built into R. In other words, instead of having to switch between different pieces of software to do different types of analyses or plots, one can usually find an R package that will provide the tools to do the job within R. For those routinely doing similar analyses, the ability to write a short program, sometimes just a handful of lines of code, allows automation of routine analyses. Those willing to spend time programming have the door open to building the tools they need when these do not already exist.
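As an illustration of how a handful of lines can automate a routine analysis, here is a minimal sketch of a user-defined function. The function name summarise_by_group and its arguments are hypothetical, made up for this example; iris and ToothGrowth are example data sets distributed with R.

```r
# A small reusable function: group size, mean and standard deviation
# of one numeric variable for each level of a grouping variable.
# (Hypothetical helper written for this sketch, not part of R.)
summarise_by_group <- function(data, value, group) {
  groups <- split(data[[value]], data[[group]])
  data.frame(group = names(groups),
             n     = vapply(groups, length, integer(1)),
             mean  = vapply(groups, mean, numeric(1)),
             sd    = vapply(groups, sd, numeric(1)),
             row.names = NULL)
}

# The same function is reused, unchanged, with two different data sets.
iris_summary  <- summarise_by_group(iris, "Sepal.Length", "Species")
tooth_summary <- summarise_by_group(ToothGrowth, "len", "supp")
print(iris_summary)
print(tooth_summary)
```

Once defined, such a function is called in exactly the same way as the functions built into R, which is what extensibility means in practice.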
However, the most important advantage of using a language like R is that it makes it easy to do data analyses in a way that ensures that they can be exactly repeated. In other words, the biggest advantage of using R, as a language, is not in communicating with the computer, but in communicating to other people what has been done, in a way that is unambiguous. Of course, other people may want to run the same commands on another computer. Even then, sharing the commands themselves avoids translating a set of instructions to the computer into text readable to humans (say, the materials and methods section of a paper) and back, together with the ambiguities that usually creep in.
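A short sketch may help make the idea of exact repeatability concrete. It assumes simulated data standing in for real measurements, and the seed value 42 is arbitrary. The commands themselves are the record of what was done.

```r
# Every step of the analysis is written down as code.
set.seed(42)                          # fix the random number stream (arbitrary seed)
x <- rnorm(20, mean = 10, sd = 2)     # simulated data standing in for measurements
x_mean <- mean(x)

# Running the same commands again reproduces exactly the same values.
set.seed(42)
x_again <- rnorm(20, mean = 10, sd = 2)
print(identical(x, x_again))

# Recording the R version and loaded packages documents the software used.
print(R.version.string)
print(sessionInfo())
```

Anyone receiving these lines can rerun them and compare results, with no need to reconstruct the analysis from a verbal description.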

1.2.2 R as a language

R is a computer language designed for data analysis and data visualization. However, in contrast to some other scripting languages, it is, from the point of view of computer programming, a complete language: it is not missing any important feature. In other words, no fundamental operations or data types are lacking (Chambers 2016). I attribute much of its success to the fact that its design achieves a very good balance between simplicity, clarity and generality. R excels at generality thanks to its extensibility at the cost of only a moderate loss of simplicity, while clarity is ensured by enforced documentation of extensions and support for both object-oriented and functional approaches to programming. The same three principles can also be easily respected by user code written in R.
As mentioned above, R started as a free and open-source implementation of the S language (Becker and Chambers 1984; Becker et al. 1988). We will describe the features of the R language in later chapters. Here I mention, for those with programming experience, that it does have some features that make it different from other frequently used programming languages. For example, R does not have the strict type checks of Pascal or C++. It has operators that can take vectors and matrices as operands, allowing more concise program statements for such operations than in other languages. Writing programs, especially reliable and fast code, requires familiarity with some of these idiosyncrasies of the R language. For those using R interactively, or writing short scripts, these idiosyncratic features make life a lot easier by saving typing.
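The two features mentioned above, operators that accept whole vectors and matrices and the absence of strict type checks, can be seen in a few lines. This is only a minimal sketch with made-up values; all variable names are arbitrary.

```r
# Arithmetic operators act on whole vectors, element by element.
a <- c(1, 2, 3)
b <- c(10, 20, 30)
sum_ab  <- a + b      # element-wise sum
doubled <- a * 2      # the shorter operand is recycled
print(sum_ab)
print(doubled)

# The same operators accept matrices; %*% is the matrix product.
m <- matrix(1:4, nrow = 2)
elementwise <- m * m
mat_prod    <- m %*% m
print(elementwise)
print(mat_prod)

# No strict type checks: mixed values are silently converted.
mixed <- c(1, "a", TRUE)   # everything becomes character
print(class(mixed))
one_more <- TRUE + 1L      # the logical value is treated as 1
print(one_more)
```

No explicit loops or type declarations are needed, which keeps statements short, although silent conversions such as the one in mixed can also hide mistakes.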
Some languages have been standardized, and their grammar has been formally defined. R, in contrast, is not standardized, and there is no formal grammar definition. So, the R language is defined by the behavior of the R program.

1.2.3 R as a computer program

The R program itself is open-source, and the source code is available for anybody to inspect, modify and use. Only a small fraction of users will directly contribute improvements to the R program itself, but it is possible, and those contributions are important in making R reliable. The executable, the R program we actually use, can be built for different operating systems and computer hardware. The members of the R development team make an important effort to keep the results obtained from calculations done on all the different builds and computer architectures as consistent as possible. The aim is to ensure that computations return consistent results not only across updates to R but also across different operating systems like Linux, Unix (including OS X), and MS-Windows, and computer hardware.
The R program does not have a graphical user interface (GUI), or menus from which to start different types of analyses. Instead, the user types the commands at the R console (Figure 1.1). The same textual commands can also be saved into a text file, line by line, and such a file, called a “script,” can substitute for repeated typing of the same sequence of commands. When we work at the console typing in commands one by one, we say that we use R interactively. When we run a script, we may say that we run a “batch job.”
FIGURE 1.1
The R console where the user can type textual commands one by one. Here the user has typed print("Hello") and entered it by ending the line of text by pressing the “enter” key. The result of running the command is displayed below the command. The character at the head of the input line, a “>” in this case, is called the command prompt, signaling where a command can be typed in. Commands entered by the user are displayed in red, while results returned by R are displayed in blue.
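The difference between interactive use and running a script can be tried out with a small example. In this sketch, commands that could have been typed one by one at the console are first saved to a file and then run together. The file is a temporary one created only for illustration, and the path in the final comment is a placeholder.

```r
# Save three commands, one per line, into a script file.
script_file <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)",
             "print(x_mean)"),
           script_file)

# Run all the commands in the file from within an R session.
# local = TRUE evaluates them in the environment source() is called from.
source(script_file, local = TRUE)

# The same file can be run as a batch job from the operating system's shell:
#   Rscript path/to/script.R
```

The commands in the file are exactly what would be typed at the prompt, so moving from interactive use to scripts requires learning nothing new about the language.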
The two approaches described above are part of the R program by itself. However, it is common to use a second program as a front-end or middleman between the user and the R program. Such a program allows more flexibility and has multiple features that make entering commands or writing scripts easier. Computations are still done by exactly the same R program. The simplest option is to use a text editor like Emacs to edit the scripts and then run the scripts in R from within the editor. With some editors like Emacs, rather good integration is possible. However, nowadays there are also Integrated Development Environments (IDEs) availabl...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. Preface
  7. 1 R: The language and the program
  8. 2 The R language: “Words” and “sentences”
  9. 3 The R language: “Paragraphs” and “essays”
  10. 4 The R language: Statistics
  11. 5 The R language: Adding new “words”
  12. 6 New grammars of data
  13. 7 Grammar of graphics
  14. 8 Data import and export
  15. Bibliography
  16. General index
  17. Index of R names by category
  18. Alphabetic index of R names