eBook - ePub

R Programming By Example

Name: R Programming By Example
Author: Omar Trejo Navarro, Peter C. Figliozzi

470 pages
English

About This Book

This step-by-step guide demonstrates how to build simple-to-advanced applications through examples in R using modern tools.

• Get a firm hold on the fundamentals of R through practical hands-on examples
• Get started with good R programming fundamentals for data science
• Exploit the different libraries of R to build interesting applications in R

Who This Book Is For

This book is for aspiring data science professionals or statisticians who would like to learn about the R programming language in a practical manner. Basic programming knowledge is assumed.

What You Will Learn

• Discover techniques to leverage R's features, and work with packages
• Perform a descriptive analysis and work with statistical models using R
• Work efficiently with objects without using loops
• Create diverse visualizations to gain better understanding of the data
• Understand ways to produce good visualizations and create reports for the results
• Read and write data from relational databases and REST APIs, both packaged and unpackaged
• Improve performance by writing better code, delegating that code to a more efficient programming language, or making it parallel

In Detail

R is a high-level statistical language and is widely used among statisticians and data miners to develop analytical applications. Often, data analysts with great analytical skills lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full representative examples, giving you a holistic view of R.

We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. It also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization.

By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Style and Approach

This is an easy-to-understand guide filled with real-world examples, giving you a holistic view of R and practical, hands-on experience.

Information

Publisher: Packt Publishing
Year: 2017
ISBN: 9781788291361
Topic: Computer Science
Subtopic: Programming Languages

Implementing an Efficient Simple Moving Average

During the last few decades, demand for computing power has steadily increased as data volumes have grown and models have become more complex. Minimizing the time needed for these calculations has therefore become an important task. The performance problems that need to be tackled arise from a mismatch between data volume and existing analytical methods. Eventually, a fundamental shift in data analysis techniques will be required, but for now, we must settle for improving the efficiency of our implementations.

R was designed as an interpreted language with high-level expressiveness, and that's one of the reasons why it lacks much of the fine-grained control and basic constructs needed to support highly performant code. As Arora puts it in Conquering Big Data with High Performance Computing (Springer, 2016), a book she edited: "While R is clearly a high productivity language, it has not necessarily been a high performance language."

It is not uncommon for the execution time of an R program to be measured in hours, or even in days. As the volume of data to be analyzed increases, the execution time can become prohibitively long, and data scientists and statisticians often get stuck with these bottlenecks. When this happens, and if they don't know much about performance optimization, they'll probably just settle for reduced amounts of data, which can hinder their analysis. However, fear not; R programs can be slow, but well-written R programs are usually fast enough, and we will look at various techniques you can use to increase the performance of your R code.

This chapter is not meant to make you a performance optimization expert, but rather to provide an overview that introduces you to the wide range of techniques that can be used when attempting to increase your code's performance. Each of these techniques could have chapters and even books dedicated to it, so we will have to look at them from a very high level. If you find yourself constantly restricted by computing resources, they are something you will want to look into further.

Some of the important topics covered in this chapter are as follows:

Deciding how fast an implementation must be
The importance of using good algorithms
Reasons why R can be slow or inefficient at times
The big performance impact small changes can have
Measuring your code's performance to find bottlenecks
Comparing different implementations among themselves
Getting the most from your computer by parallelizing
Improving performance by interfacing with other languages

Required packages

We have already worked with some of the packages required for this chapter, such as ggplot2 and lubridate. The other three packages are introduced to benchmark functions and compare their performance against each other, and to support advanced optimization techniques like delegation and parallelization, which will be explained in their respective sections.

To be able to replicate all the examples in this chapter, you also need working compilers for Fortran and C++ code. Refer to Appendix, Required Packages, for instructions on how to install them for your operating system.

Let's take a look at the following table depicting the uses of the required packages:

Package	Reason
ggplot2	High-quality graphs
lubridate	Easily transform dates
microbenchmark	Benchmark functions' performance
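
Before running the examples, it can help to check which of the packages in the table are already available. The following is a minimal sketch, not part of the book's own code; the variable names required_packages and missing_packages are illustrative assumptions. It only reports what is missing and does not install anything:

```r
# Packages listed in the table above
required_packages <- c("ggplot2", "lubridate", "microbenchmark")

# requireNamespace() returns TRUE when a package can be loaded;
# it does not attach the package to the search path
is_available <- vapply(
  required_packages,
  requireNamespace,
  logical(1),
  quietly = TRUE
)

missing_packages <- required_packages[!is_available]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

Any package reported as missing can then be installed with install.packages().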

Starting by using good algorithms

To communicate the ideas contained in this chapter clearly, I first need to provide some simple definitions. When I refer to an algorithm, I mean an abstract specification for a process. When I refer to an implementation, I mean the way an algorithm is actually programmed. Finally, when I refer to a program or an application, I mean a set of such algorithm implementations working together. Having said that, it's easy to see how an algorithm can be implemented in many different ways (for example, one implementation may use a list, while another may use an array). Each of these implementations will perform differently, and its performance is related, but not equivalent, to the algorithm's time complexity.
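
To make the distinction concrete, here is a minimal sketch of one algorithm (a running total) implemented in two ways. The function names running_total_list() and running_total_vector() are hypothetical and were chosen only for this illustration:

```r
# Implementation 1: grow a list one element at a time
running_total_list <- function(values) {
  totals <- list()
  total <- 0
  for (i in seq_along(values)) {
    total <- total + values[i]
    totals[[i]] <- total
  }
  unlist(totals)
}

# Implementation 2: fill a preallocated numeric vector
running_total_vector <- function(values) {
  totals <- numeric(length(values))
  total <- 0
  for (i in seq_along(values)) {
    total <- total + values[i]
    totals[i] <- total
  }
  totals
}

values <- c(3, 1, 4, 1, 5)
list_result <- running_total_list(values)      # 3 4 8 9 14
vector_result <- running_total_vector(values)  # 3 4 8 9 14
```

Both functions follow the same abstract steps and return the same result, so they are implementations of the same algorithm. How quickly each one runs on large inputs is a property of the implementation and may differ.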

For those unfamiliar with the last term, each algorithm has the following two basic properties:

Time complexity: This property refers to the number of calculations an algorithm needs to execute, in relation to the size of the input it receives. There are various mathematical tools to measure this complexity, the most common one being Big-O notation, which gives an upper bound on how that number grows and is most often used to describe the worst-case scenario for an algorithm.
Space complexity: This property refers to the amount of memory required to execute the algorithm, again in relation to the size of the input it receives, and it can be also measured with the same mathematical tools.
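
As an illustration of time complexity, the following sketch counts the steps two different search algorithms take in their worst case (the value being searched for is absent). The search functions are hypothetical helpers written for this example, and the input is assumed to be a sorted vector:

```r
# Linear search: inspects elements one by one, so the worst case
# takes as many steps as there are elements
linear_search_steps <- function(values, target) {
  steps <- 0
  for (value in values) {
    steps <- steps + 1
    if (value == target) break
  }
  steps
}

# Binary search: halves the remaining range on every step, so the
# worst case grows with the logarithm of the input size
binary_search_steps <- function(values, target) {
  low <- 1
  high <- length(values)
  steps <- 0
  while (low <= high) {
    steps <- steps + 1
    middle <- (low + high) %/% 2
    if (values[middle] == target) break
    if (values[middle] < target) {
      low <- middle + 1
    } else {
      high <- middle - 1
    }
  }
  steps
}

sorted_values <- 1:1024
missing_target <- 2000  # larger than every element: worst case for both

linear_steps <- linear_search_steps(sorted_values, missing_target)  # 1024
binary_steps <- binary_search_steps(sorted_values, missing_target)  # 11
```

Doubling the input to 2,048 elements doubles the steps of the linear search but adds only one step to the binary search, which is the kind of difference Big-O notation is designed to capture.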

It's a well-known fact that an inefficient algorithm implemented very efficiently can be orders of magnitude slower than an efficient algorithm implemented inefficiently. This means that most of the time, algorithm selection is much more important than implementation optimization.

There are many other things to consider when evaluating an algorithm besides the complexities mentioned previously, such as efficient resource usage (for example, internet bandwidth), as well as other properties such as security or implementation difficulty. We won't dig into these topics in this book. However, if you want your code to perform well, you must study data structures and algorithms formally. Great resources to get started on these topics are Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (MIT Press, 2009) and The Algorithm Design Manual by Skiena (Springer, 2008).

Just how much impact can algorithm selection have?

Calculating Fibonacci numbers is a traditional example when teaching recursion. Here, we will use it to compare the performance of two algorithms, one recursive and one sequential.

In case you are not familiar with them, Fibonacci numbers are defined recursively as a sequence in which each number is the sum of the previous two, and the first two numbers are ones (our base cases). The actual sequence is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, and so on. This is called the Fibonacci sequence, and it exhibits interesting properties, such as being related to the golden ratio, which you should definitely look up if you don't know what it is.
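
The definition and the link to the golden ratio can be checked directly on the numbers listed above. This is a small sketch with illustrative variable names; it relies on the known fact that the ratio of consecutive Fibonacci numbers approaches the golden ratio, (1 + sqrt(5)) / 2:

```r
fibonacci_numbers <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144)
n <- length(fibonacci_numbers)

# Every number from the third onwards is the sum of the previous two
follows_definition <- all(
  fibonacci_numbers[3:n] ==
    fibonacci_numbers[1:(n - 2)] + fibonacci_numbers[2:(n - 1)]
)

# Ratios of consecutive numbers: 1/1, 2/1, 3/2, 5/3, ...
ratios <- fibonacci_numbers[-1] / fibonacci_numbers[-n]

golden_ratio <- (1 + sqrt(5)) / 2

# Distance between the last ratio (144 / 89) and the golden ratio
last_ratio_error <- abs(ratios[length(ratios)] - golden_ratio)
```

With only twelve numbers, the last ratio already differs from the golden ratio by less than 0.0001.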

Our fibonacci_recursive() function receives the position of the Fibonacci number we want to calculate as n, restricted to integers greater than or equal to one. If n is a base case, that is, if it's less than or equal to 1, we simply return it (note that if we're computing the Fibonacci number at the second position, n - 2 is zero, which is not a valid position; that's why the check uses <= instead of ==, so that the call with zero returns zero). Otherwise, we return the sum of the recursive calls for the previous two positions, fibonacci_recursive(n - 1) and fibonacci_recursive(n - 2), as shown in the following code snippet:

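fibonacci_recursive <- function(n) {
    # Base cases: for n equal to 0 or 1, the result is n itself
    if (n <= 1) { return(n) }
    # Otherwise, add the Fibonacci numbers at the two previous positions
    return(fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2))
}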

As you can see in the following code snippet, our function works as expected. However, what happens when we want to retrieve the 35th or 40th Fibonacci number? As you may experience when running this code, the further the Fibonacci number is from the base cases, the more time it will take, and somewhere around the 30th position, it starts being noticeably slower. If you try to compute the 100th Fibonacci number, you'll be waiting for a long while before you get the result:

fibonacci_recursive(1) #> [1] 1 
fibonacci_recursive(2) #> [1] 1 
fibonacci_recursive(3) #> [1] 2 
fibonacci_recursive(4) #> [1] 3 
fibonacci_recursive(5) #> [1] 5 
fibonacci_recursive(35) #> [1] 9227465
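
To see how quickly the amount of work grows, the following sketch counts the function calls needed for a few positions and times some of them. The helper count_recursive_calls() is a hypothetical function written only for this illustration, and the measured times will depend on your machine:

```r
# Same recursive definition as above, repeated so the sketch is self-contained
fibonacci_recursive <- function(n) {
  if (n <= 1) { return(n) }
  return(fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2))
}

# Hypothetical helper: number of function calls needed for position n.
# It mirrors the recursion: one call for n itself, plus the calls
# made for n - 1 and n - 2.
count_recursive_calls <- function(n) {
  if (n <= 1) { return(1) }
  return(1 + count_recursive_calls(n - 1) + count_recursive_calls(n - 2))
}

positions <- c(5, 10, 15, 20)
calls <- sapply(positions, count_recursive_calls)
call_counts <- data.frame(position = positions, calls = calls)
print(call_counts)
# Calls per position: 15, 177, 1973, and 21891

# Elapsed time in seconds for a few positions (machine dependent)
elapsed <- sapply(c(15, 20, 25), function(n) {
  system.time(fibonacci_recursive(n))[["elapsed"]]
})
print(elapsed)
```

Moving five positions further multiplies the number of calls by roughly eleven, which is why the waiting time becomes noticeable so quickly.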

Why is this happening? The answer is that this algorithm is doing a lot of unnecessary work, making it a bad algorithm. To understand why, let's mentally go through the execution of the...