R High Performance Programming
eBook - ePub

Aloysius Lim, William Tjhi

  1. 176 pages
  2. English

Table of Contents

R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding R's Performance – Why Are R Programs Sometimes Slow?
Three constraints on computing performance – CPU, RAM, and disk I/O
R is interpreted on the fly
R is single-threaded
R requires all data to be loaded into memory
Algorithm design affects time and space complexity
Summary
2. Profiling – Measuring Code's Performance
Measuring total execution time
Measuring execution time with system.time()
Repeating time measurements with rbenchmark
Measuring distribution of execution time with microbenchmark
Profiling the execution time
Profiling a function with Rprof()
The profiling results
Profiling memory utilization
Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
Identifying and resolving bottlenecks
Summary
3. Simple Tweaks to Make R Run Faster
Vectorization
Use of built-in functions
Preallocating memory
Use of simpler data structures
Use of hash tables for frequent lookups on large data
Seeking fast alternative packages in CRAN
Summary
4. Using Compiled Code for Greater Speed
Compiling R code before execution
Compiling functions
Just-in-time (JIT) compilation of R code
Using compiled languages in R
Prerequisites
Including compiled code inline
Calling external compiled code
Considerations for using compiled code
R APIs
R data types versus native data types
Creating R objects and garbage collection
Allocating memory for non-R objects
Summary
5. Using GPUs to Run R Even Faster
General purpose computing on GPUs
R and GPUs
Installing gputools
Fast statistical modeling in R with gputools
Summary
6. Simple Tweaks to Use Less RAM
Reusing objects without taking up more memory
Removing intermediate data when it is no longer needed
Calculating values on the fly instead of storing them persistently
Swapping active and nonactive data
Summary
7. Processing Large Datasets with Limited RAM
Using memory-efficient data structures
Smaller data types
Sparse matrices
Symmetric matrices
Bit vectors
Using memory-mapped files and processing data in chunks
The bigmemory package
The ff package
Summary
8. Multiplying Performance with Parallel Computing
Data parallelism versus task parallelism
Implementing data parallel algorithms
Implementing task parallel algorithms
Running the same task on workers in a cluster
Running different tasks on workers in a cluster
Executing tasks in parallel on a cluster of computers
Shared memory versus distributed memory parallelism
Optimizing parallel performance
Summary
9. Offloading Data Processing to Database Systems
Extracting data into R versus processing data in a database
Preprocessing data in a relational database using SQL
Converting R expressions to SQL
Using dplyr
Using PivotalR
Running statistical and machine learning algorithms in a database
Using columnar databases for improved performance
Using array databases for maximum scientific-computing performance
Summary
10. R and Big Data
Understanding Hadoop
Setting up Hadoop on Amazon Web Services
Processing large datasets in batches using Hadoop
Uploading data to HDFS
Analyzing HDFS data with RHadoop
Other Hadoop packages for R
Summary
Index

R High Performance Programming

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2015
Production reference: 1230115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-926-3
www.packtpub.com

Credits

Authors
Aloysius Lim
William Tjhi
Reviewers
Richard Cotton
Kirill Müller
John Silberholz
Commissioning Editor
Kunal Parikh
Acquisition Editor
Richard Brookes-Bland
Content Development Editor
Susmita Sabat
Technical Editor
Shiny Poojary
Copy Editor
Neha Vyas
Project Coordinator
Milton Dsouza
Proofreaders
Ameesha Green
Clyde Jenkins
Jonathan Todd
Indexer
Tejal Soni
Graphics
Sheetal Aute
Valentina D'silva
Production Coordinator
Komal Ramchandani
Cover Work
Komal Ramchandani

About the Authors

Aloysius Lim has a knack for translating complex data and models into easy-to-understand insights. As cofounder of About People, a data science and design consultancy, he loves solving problems and helping others to find practical solutions to business challenges using data. His breadth of experience—7 years in the government, education, and retail industries—equips him with unique perspectives to find creative solutions.
William Tjhi is a data scientist with years of experience working in academia, government, and industry. He began his data science journey as a PhD candidate researching new algorithms to improve the robustness of high-dimensional data clustering. Upon receiving his doctorate, he moved from basic to applied research, using machine learning to solve problems in molecular biology and epidemiology, among other fields. He published some of his research in peer-reviewed journals and conferences. With the rise of Big Data, William left academia for industry, where he started practicing data science in both business and public sector settings. William is passionate about R and has been using it as his primary analysis tool since his research days. He was once part of Revolution Analytics, where he contributed to making R more suitable for Big Data.

About the Reviewers

Richard Cotton is a data scientist with a mixed background in proteomics, debt collection, and chemical health and safety, and he has worked extensively on tools to give nontechnical users access to stat...
