eBook - ePub

R Programming By Example

Name: R Programming By Example
Author: Omar Trejo Navarro, Peter C. Figliozzi


470 pages
English

About this book

This step-by-step guide demonstrates how to build simple-to-advanced applications through examples in R using modern tools.

• Get a firm hold on the fundamentals of R through practical hands-on examples
• Get started with good R programming fundamentals for data science
• Exploit the different libraries of R to build interesting applications in R

Who This Book Is For

This book is for aspiring data science professionals or statisticians who would like to learn about the R programming language in a practical manner. Basic programming knowledge is assumed.

What You Will Learn

• Discover techniques to leverage R's features, and work with packages
• Perform a descriptive analysis and work with statistical models using R
• Work efficiently with objects without using loops
• Create diverse visualizations to gain better understanding of the data
• Understand ways to produce good visualizations and create reports for the results
• Read and write data from relational databases and REST APIs, both packaged and unpackaged
• Improve performance by writing better code, delegating that code to a more efficient programming language, or making it parallel

In Detail

R is a high-level statistical language and is widely used among statisticians and data miners to develop analytical applications. Often, data analysts with great analytical skills lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full representative examples, giving you a holistic view of R.

We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages.

With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. It also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization.

By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Style and Approach

This is an easy-to-understand guide filled with real-world examples, giving you a holistic view of R and practical, hands-on experience.


Information

Publisher: Packt Publishing
Year: 2017
ISBN: 9781788291361
Subject: Computer Science
Sub-subject: Programming Languages

Implementing an Efficient Simple Moving Average

During the last few decades, demand for computing power has steadily increased as data volumes have grown and models have become more complex. Minimizing the time needed for these calculations has therefore become an important task, and there are clear performance problems that need to be tackled. These performance problems arise from a mismatch between data volume and existing analytical methods. Eventually, a fundamental shift in data analysis techniques will be required, but for now, we must settle for improving the efficiency of our implementations.

R was designed as an interpreted language with high-level expressiveness, and that's one of the reasons why it lacks much of the fine-grained control and basic constructs needed to support highly performant code. As Arora puts it in the book she edited, Conquering Big Data with High Performance Computing (Springer, 2016): "While R is clearly a high productivity language, it has not necessarily been a high performance language."

It is not uncommon for the execution time of an R program to be measured in hours, or even in days. As the volume of data to be analyzed increases, the execution time can become prohibitively long, and data scientists and statisticians often get stuck with these bottlenecks. When this happens, and if they don't know much about performance optimization, they'll probably just settle for reduced amounts of data, which can hinder their analysis. However, fear not; R programs can be slow, but well-written R programs are usually fast enough, and we will look at various techniques you can use to increase the performance of your R code.

This chapter is not meant to make you a performance optimization expert, but rather to provide an overview that introduces you to the vast number of techniques that can be used when attempting to increase your code's performance. We will look at many different techniques, each of which could have chapters and even books dedicated to it, so we will have to look at them from a very high level. If you find yourself constantly restricted by computing resources, they are something you will want to look into further.

Some of the important topics covered in this chapter are as follows:

Deciding how fast an implementation must be
The importance of using good algorithms
Reasons why R can be slow or inefficient at times
The big performance impact small changes can have
Measuring your code's performance to find bottlenecks
Comparing different implementations among themselves
Getting the most from your computer by parallelizing
Improving performance by interfacing with other languages

Required packages

We have already worked with some of the packages required for this chapter, such as ggplot2 and lubridate. The other three packages are introduced to benchmark functions and compare their performance against each other, and for advanced optimization techniques like delegation and parallelization, which will be explained in their respective sections.

To be able to replicate all the examples in this chapter, you also need working compilers for Fortran and C++ code. Refer to Appendix, Required Packages, for instructions on how to install them for your operating system.

Let's take a look at the following table depicting the uses of the required packages:

Package	Reason
ggplot2	High-quality graphs
lubridate	Easily transform dates
microbenchmark	Benchmark functions' performance

Starting by using good algorithms

To be able to communicate the ideas contained in this chapter clearly, first I need to provide some simple definitions. When I refer to an algorithm, I mean an abstract specification for a process. When I refer to an implementation, I mean the way an algorithm is actually programmed. Finally, when I refer to a program or an application, I mean a set of such algorithm implementations working together. Having said that, it's easy to see how an algorithm can be implemented in many different ways (for example, one implementation may use a list, while another may use an array). Each of these implementations will perform differently, and their performance is related, but not equivalent, to the algorithm's time complexity.
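To make the distinction between an algorithm and its implementations concrete, here is a minimal sketch. The function names sum_with_vector() and sum_with_list() are hypothetical and do not come from the book. Both implement the same algorithm (walk through the values and accumulate a total), but one stores the values in a numeric vector and the other in a list:

```r
# Same algorithm, two hypothetical implementations (names are illustrative).

# Implementation 1: the values live in a numeric vector.
sum_with_vector <- function(values) {
    total <- 0
    for (value in values) {
        total <- total + value
    }
    total
}

# Implementation 2: the values live in a list, accessed by position.
sum_with_list <- function(values) {
    total <- 0
    for (i in seq_along(values)) {
        total <- total + values[[i]]
    }
    total
}

numbers <- c(3, 1, 4, 1, 5)
sum_with_vector(numbers)         # 14
sum_with_list(as.list(numbers))  # 14
```

Both functions return the same result and perform the same number of additions, yet their running times may differ because of how each data structure is stored and accessed.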

For those unfamiliar with the last term, each algorithm has the following two basic properties:

Time complexity: This property refers to the number of calculations an algorithm needs to execute, in relation to the size of the input it receives. There are various mathematical tools to measure this complexity, the most common one being Big-O notation, which gives an upper bound on growth and is most often used to describe the worst-case scenario for an algorithm.
Space complexity: This property refers to the amount of memory required to execute the algorithm, again in relation to the size of the input it receives, and it can also be measured with the same mathematical tools.
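As a rough illustration of time complexity only, the following sketch counts the comparisons made by two ways of finding a value in a sorted vector. The functions linear_search_steps() and binary_search_steps() are hypothetical helpers written for this example, not part of the book or of any package. In the worst case, the first needs a number of steps proportional to the input size, while the second needs a number proportional to its logarithm:

```r
# Hypothetical helpers that return how many elements were inspected.

# Linear search: inspects elements one by one (worst case: n steps).
linear_search_steps <- function(x, target) {
    steps <- 0
    for (value in x) {
        steps <- steps + 1
        if (value == target) { break }
    }
    steps
}

# Binary search on a sorted vector: halves the search range on every step
# (worst case: about log2(n) steps).
binary_search_steps <- function(x, target) {
    steps <- 0
    low <- 1
    high <- length(x)
    while (low <= high) {
        steps <- steps + 1
        mid <- (low + high) %/% 2
        if (x[mid] == target) { break }
        if (x[mid] < target) {
            low <- mid + 1
        } else {
            high <- mid - 1
        }
    }
    steps
}

x <- 1:1024
linear_search_steps(x, 1024)  # 1024 steps
binary_search_steps(x, 1024)  # 11 steps
```

Doubling the length of x roughly doubles the worst-case steps of the linear version but adds only one step to the binary version, which is the kind of difference Big-O notation is meant to capture.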

It's a well-known fact that an inefficient algorithm implemented very efficiently can be orders of magnitude slower than an efficient algorithm implemented inefficiently. This means that most of the time, algorithm selection is much more important than implementation optimization.

There are many other things to consider when evaluating an algorithm besides the complexities mentioned previously, such as efficient resource usage (for example, internet bandwidth), as well as other properties such as security or implementation difficulty. We won't dig into these topics in this book. However, if you want your code to perform well, you must study data structures and algorithms formally. Great resources to get started on these topics are the book by Cormen, Leiserson, Rivest, and Stein, titled Introduction to Algorithms (MIT Press, 2009), and Skiena's The Algorithm Design Manual (Springer, 2008).

Just how much impact can algorithm selection have?

Calculating Fibonacci numbers is a traditional example when teaching recursion. Here, we will use it to compare the performance of two algorithms, one recursive and one sequential.

In case you are not familiar with them, Fibonacci numbers are defined recursively in a sequence where the next number is the sum of the previous two, and the first two numbers are ones (our base cases). The actual sequence is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, and so on. This is called a Fibonacci sequence, and it exhibits interesting properties, such as being related to the golden ratio, which you should definitely look up if you don't know what it is.

Our fibonacci_recursive() function receives the position of the Fibonacci number we want to calculate as n, restricted to integers greater than or equal to one. If n is a base case, that is, if it's less than or equal to 1, we simply return it. Note that when we compute the Fibonacci number at the second position, the call with n - 2 receives zero, which is not a valid position; that's why the check uses <= instead of ==, so that zero is returned as is and contributes nothing to the sum. Otherwise, we return the sum of the recursive calls for the previous two positions, fibonacci_recursive(n - 1) and fibonacci_recursive(n - 2), as shown in the following code snippet:

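fibonacci_recursive <- function(n) {
    # Base cases: positions 0 and 1 are returned as they are
    if (n <= 1) { return(n) }
    # Otherwise, add the Fibonacci numbers at the two previous positions
    return(fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2))
}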

As you can see in the following code snippet, our function works as expected. However, what happens when we want to retrieve the 35th or 40th Fibonacci number? As you may experience when running this code, the further the Fibonacci number is from the base cases, the more time it will take, and somewhere around the 30th position, it starts being noticeably slower. If you try to compute the 100th Fibonacci number, you'll be waiting for a long while before you get the result:

fibonacci_recursive(1)
#> [1] 1
fibonacci_recursive(2)
#> [1] 1
fibonacci_recursive(3)
#> [1] 2
fibonacci_recursive(4)
#> [1] 3
fibonacci_recursive(5)
#> [1] 5
fibonacci_recursive(35)
#> [1] 9227465
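One way to quantify this slowdown is to count how many times the recursive function is invoked. The sketch below is only an illustration: count_fibonacci_calls() is a hypothetical helper, not part of the book's code. It wraps the same recursive definition and increments a counter on every call:

```r
# Hypothetical helper (not from the book): runs the same recursive definition
# while counting how many times the function body is executed.
count_fibonacci_calls <- function(n) {
    calls <- 0
    fib <- function(k) {
        calls <<- calls + 1  # update the counter in the enclosing function
        if (k <= 1) { return(k) }
        fib(k - 1) + fib(k - 2)
    }
    value <- fib(n)
    c(value = value, calls = calls)
}

count_fibonacci_calls(5)   # value 5, calls 15
count_fibonacci_calls(10)  # value 55, calls 177
count_fibonacci_calls(20)  # value 6765, calls 21891
```

The number of calls grows roughly as fast as the Fibonacci numbers themselves, so every few extra positions multiply the amount of work instead of adding a fixed amount.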

Why is this happening? The answer is that this algorithm is doing a lot of unnecessary work, making it a bad algorithm. To understand why, let's mentally go through the execution of the...