R Programming By Example
eBook - ePub

  1. 470 pages
  2. English

About this book

This step-by-step guide demonstrates how to build simple-to-advanced applications through examples in R using modern tools.

About This Book

  • Get a firm hold on the fundamentals of R through practical hands-on examples
  • Get started with good R programming fundamentals for data science
  • Exploit the different libraries of R to build interesting applications in R

Who This Book Is For

This book is for aspiring data science professionals or statisticians who would like to learn about the R programming language in a practical manner. Basic programming knowledge is assumed.

What You Will Learn

  • Discover techniques to leverage R's features, and work with packages
  • Perform a descriptive analysis and work with statistical models using R
  • Work efficiently with objects without using loops
  • Create diverse visualizations to gain better understanding of the data
  • Understand ways to produce good visualizations and create reports for the results
  • Read and write data from relational databases and REST APIs, both packaged and unpackaged
  • Improve performance by writing better code, delegating that code to a more efficient programming language, or making it parallel

In Detail

R is a high-level statistical language and is widely used among statisticians and data miners to develop analytical applications. Often, data analysts with great analytical skills lack solid programming knowledge and are unfamiliar with the correct ways to use R. Based on version 3.4, this book will help you develop strong fundamentals when working with R by taking you through a series of full representative examples, giving you a holistic view of R.

We begin with the basic installation and configuration of the R environment. As you progress through the exercises, you'll become thoroughly acquainted with R's features and its packages. With this book, you will learn about the basic concepts of R programming, work efficiently with graphs, create publication-ready and interactive 3D graphs, and gain a better understanding of the data at hand. The detailed step-by-step instructions will enable you to get a clean set of data, produce good visualizations, and create reports for the results. It also teaches you various methods to perform code profiling and performance enhancement with good programming practices, delegation, and parallelization.

By the end of this book, you will know how to efficiently work with data, create quality visualizations and reports, and develop code that is modular, expressive, and maintainable.

Style and Approach

This is an easy-to-understand guide filled with real-world examples, giving you a holistic view of R and practical, hands-on experience.

R Programming By Example, by Omar Trejo Navarro, Peter C. Figliozzi, is available in PDF and/or ePUB format and is catalogued under Computer Science & Computer Science General.

Implementing an Efficient Simple Moving Average

During the last few decades, demand for computing power has steadily increased as data volumes have grown and models have become more complex. Minimizing the time needed for these calculations has therefore become an important task, and there are clear performance problems that need to be tackled. These performance problems arise from a mismatch between data volume and existing analytical methods. Eventually, a fundamental shift in data analysis techniques will be required, but for now, we must settle for improving the efficiency of our implementations.
R was designed as an interpreted language with high-level expressiveness, and that's one of the reasons why it lacks much of the fine-grained control and basic constructs needed to support highly performant code. As Arora puts it in the book she edited, Conquering Big Data with High Performance Computing, by Springer, 2016: "While R is clearly a high productivity language, it has not necessarily been a high performance language."
It is not uncommon for the execution time of an R program to be measured in hours, or even in days. As the volume of data to be analyzed increases, the execution time can become prohibitively long, and it's often the case that data scientists and statisticians get stuck with these bottlenecks. When this happens, and if they don't know much about performance optimization, they'll probably just settle for reduced amounts of data, which can hinder their analysis. However, fear not; R programs can be slow, but well-written R programs are usually fast enough, and we will look at various techniques you can use to increase the performance of your R code.
This chapter is not meant to make you a performance optimization expert, but rather to provide an overview that introduces you to the vast number of techniques that can be used when attempting to increase your code's performance. We will look at many different techniques, each of which could have chapters and even books dedicated to it, so we will have to look at them from a very high level. If you find yourself constantly restricted by computing resources, though, they are something you will want to look further into.
Some of the important topics covered in this chapter are as follows:
  • Deciding how fast an implementation must be
  • The importance of using good algorithms
  • Reasons why R can be slow or inefficient at times
  • The big performance impact small changes can have
  • Measuring your code's performance to find bottlenecks
  • Comparing different implementations among themselves
  • Getting the most from your computer by parallelizing
  • Improving performance by interfacing with other languages

Required packages

We have already worked with some of the packages required for this chapter, such as ggplot2 and lubridate. The other three packages are introduced to benchmark functions and compare their performance among themselves, and for advanced optimization techniques like delegation and parallelization, which will be explained in their respective sections.
To be able to replicate all the examples in this chapter, you also need working compilers for Fortran and C++ code. Refer to Appendix, Required Packages, for instructions on how to install them for your operating system.
Let's take a look at the following table depicting the uses of the required packages:
Package         Reason
ggplot2         High-quality graphs
lubridate       Easily transform dates
microbenchmark  Benchmark functions' performance
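As a rough illustration of what benchmarking with microbenchmark can look like, here is a minimal sketch. The function sum_with_loop() and the vector x are hypothetical names used only for this example, and the benchmark is skipped if the package is not installed:

```r
# Hypothetical example: sum a numeric vector with an explicit loop,
# to be compared against R's built-in sum().
sum_with_loop <- function(x) {
    total <- 0
    for (value in x) {
        total <- total + value
    }
    total
}

x <- as.numeric(1:10000)

# Only run the benchmark when the microbenchmark package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
    results <- microbenchmark::microbenchmark(
        loop    = sum_with_loop(x),
        builtin = sum(x),
        times   = 50
    )
    # Prints summary statistics of the execution times for each expression.
    print(results)
}
```

Each named expression is evaluated the number of times given by times, so the comparison is based on a distribution of timings rather than a single run.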

Starting by using good algorithms

To be able to communicate the ideas contained in this chapter clearly, first I need to provide some simple definitions. When I refer to an algorithm, I mean an abstract specification for a process. When I refer to an implementation, I refer to the way an algorithm is actually programmed. Finally, when I refer to a program or an application, I mean a set of such algorithm implementations working together. Having said that, it's easy to see how an algorithm can be implemented in many different ways (for example, one implementation may use a list, while another may use an array). Each of these implementations will perform differently, and their performance is related, but not equivalent, to the algorithm's time complexity.
For those unfamiliar with the last term, each algorithm has the following two basic properties:
  • Time complexity: This property refers to the number of calculations an algorithm needs to execute, in relation to the size of the input it receives. There are various mathematical tools to measure this complexity, the most common one being Big-O notation, which gives an upper bound on how that number grows and is typically used to describe an algorithm's worst-case scenario.
  • Space complexity: This property refers to the amount of memory required to execute the algorithm, again in relation to the size of the input it receives, and it can also be measured with the same mathematical tools.
It's a well-known fact that an inefficient algorithm implemented very efficiently can be orders of magnitude slower than an efficient algorithm implemented inefficiently. This means that most of the time, algorithm selection is much more important than implementation optimization.
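To make the idea of time complexity a bit more tangible, the following sketch counts the comparisons two search algorithms need to find a value in a sorted vector. It is only an illustration under my own assumptions: the function names linear_search_steps() and binary_search_steps() are hypothetical, and searching is just one convenient example of two algorithms with different growth rates.

```r
# Linear search: checks elements one by one, so the worst case
# needs a number of comparisons proportional to the input size, O(n).
linear_search_steps <- function(sorted_values, target) {
    steps <- 0
    for (value in sorted_values) {
        steps <- steps + 1
        if (value == target) { break }
    }
    steps
}

# Binary search: halves the remaining range at every step, so the
# worst case needs a number of comparisons proportional to log2(n).
binary_search_steps <- function(sorted_values, target) {
    low <- 1
    high <- length(sorted_values)
    steps <- 0
    while (low <= high) {
        steps <- steps + 1
        middle <- (low + high) %/% 2
        if (sorted_values[middle] == target) { break }
        if (sorted_values[middle] < target) {
            low <- middle + 1
        } else {
            high <- middle - 1
        }
    }
    steps
}

# Search for the last element, which is the worst case for linear search.
values <- 1:1000000
linear_steps <- linear_search_steps(values, 1000000)
binary_steps <- binary_search_steps(values, 1000000)
print(c(linear = linear_steps, binary = binary_steps))
```

With one million elements, the linear search needs one million comparisons, while the binary search needs at most 20. No amount of tuning of the linear version closes a gap of that size, which is the sense in which algorithm selection tends to matter more than implementation details.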
There are many other things to consider when evaluating an algorithm besides the complexities mentioned previously, such as efficient resource usage (for example, internet bandwidth), as well as other properties such as security or implementation difficulty. We won't dig into these topics in this book. However, if you want your code to perform well, you must study data structures and algorithms formally. Great resources to get started on these topics are the book by Cormen, Leiserson, Rivest, and Stein, titled Introduction to Algorithms, by MIT Press, 2009, and Skiena's The Algorithm Design Manual, by Springer, 2008.

Just how much impact can algorithm selection have?

Calculating Fibonacci numbers is a traditional example when teaching recursion. Here, we will use it to compare the performance of two algorithms, one recursive and one sequential.
In case you are not familiar with them, Fibonacci numbers are defined recursively in a sequence where the next is the sum of the previous two, and the first two numbers are ones (our base cases). The actual sequence is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, and so on. This is called a Fibonacci sequence, and it exhibits interesting properties, such as being related to the golden ratio, which you should definitely look up if you don't know what it is.
Our fibonacci_recursive() function receives the position of the Fibonacci number we want to calculate as n, restricted to integers greater than or equal to one. If n is a base case, that is, if it's less than or equal to 1, we will simply return it (note that if we're computing the Fibonacci number at the second position, our operation n - 2 would be zero, which is not a valid position; that's why we need to use <= instead of ==). Otherwise, we will return the sum of the recursive calls to the previous two with fibonacci_recursive(n - 1) and fibonacci_recursive(n - 2), as shown in the following code snippet:
fibonacci_recursive <- function(n) {
    if (n <= 1) { return(n) }
    return(fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2))
}
As you can see in the following code snippet, our function works as expected. However, what happens when we want to retrieve the 35th or 40th Fibonacci number? As you may experience when running this code, the further the Fibonacci number is from the base cases, the more time it will take, and somewhere around the 30th position, it starts being noticeably slower. If you try to compute the 100th Fibonacci number, you'll be waiting for a long while before you get the result:
fibonacci_recursive(1) #> [1] 1 
fibonacci_recursive(2) #> [1] 1
fibonacci_recursive(3) #> [1] 2
fibonacci_recursive(4) #> [1] 3
fibonacci_recursive(5) #> [1] 5
fibonacci_recursive(35) #> [1] 9227465
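One way to observe the slowdown for yourself is to time a few calls with base R's system.time(). This is only a sketch: the positions chosen and the variable names positions and timings are my own, and the actual timings depend entirely on your machine, so none are shown here.

```r
# Same recursive definition as in the text, repeated so that this
# block can be run on its own.
fibonacci_recursive <- function(n) {
    if (n <= 1) { return(n) }
    return(fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2))
}

# Time a single call for a few positions; "elapsed" is the
# wall-clock time in seconds reported by system.time().
positions <- c(10, 15, 20, 25)
timings <- sapply(positions, function(n) {
    system.time(fibonacci_recursive(n))[["elapsed"]]
})

print(data.frame(position = positions, seconds = timings))
```

The positions are kept small so the example finishes quickly; increasing them toward 30 and beyond should make the growth in execution time easy to notice.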
Why is this happening? The answer is that this algorithm is doing a lot of unnecessary work, making it a bad algorithm. To understand why, let's mentally go through the execution of the...

Table of contents

  1. Title Page
  2. Copyright
  3. Credits
  4. About the Author
  5. About the Reviewer
  6. www.PacktPub.com
  7. Customer Feedback
  8. Preface
  9. Introduction to R
  10. Understanding Votes with Descriptive Statistics
  11. Predicting Votes with Linear Models
  12. Simulating Sales Data and Working with Databases
  13. Communicating Sales with Visualizations
  14. Understanding Reviews with Text Analysis
  15. Developing Automatic Presentations
  16. Object-Oriented System to Track Cryptocurrencies
  17. Implementing an Efficient Simple Moving Average
  18. Adding Interactivity with Dashboards
  19. Required Packages