Simulation for Data Science with R
eBook - ePub

  • 398 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android
About this book

Harness actionable insights from your data with computational statistics and simulations using R

  • Learn five different simulation techniques (Monte Carlo, Discrete Event Simulation, System Dynamics, Agent-Based Modeling, and Resampling) in-depth using real-world case studies
  • A unique book that teaches you the essential and fundamental concepts in statistical modeling and simulation

Who This Book Is For

This book is for users who are familiar with computational methods. If you want to learn about the advanced features of R, including computer-intensive Monte Carlo methods and computational tools for statistical simulation, then this book is for you. Good knowledge of R programming is assumed.

What You Will Learn

  • Explore advanced R features for simulating data and extracting insights from it
  • Get to know the advanced features of R, including high-performance computing and advanced data manipulation
  • See how random number simulation is used to simulate distributions, data sets, and populations
  • Simulate close-to-reality populations as the basis for agent-based micro-, model- and design-based simulations
  • Design statistical solutions with R for solving scientific and real-world problems
  • Work with several R statistical packages, such as boot, simPop, VIM, data.table, dplyr, parallel, StatDA, simecol, simecolModels, deSolve, and many more

In Detail

Data Science with R aims to teach you how to begin performing data science tasks by taking advantage of R's powerful ecosystem of packages. R is widely used in data science, and the combination is a powerful way to handle the complexities of varied real-world data sets.

The book provides a computational and methodological framework for statistical simulation. Through this book, you will get to grips with the software environment R. After getting to know the background of popular methods in computational statistics, you will see applications in R that help you understand the methods and gain experience of working with real-world data and real-world problems. This book helps uncover the large-scale patterns in complex systems where interdependencies and variation are critical. An effective simulation is driven by data-generating processes that accurately reflect real physical populations. You will learn how to plan and structure a simulation project to aid the decision-making process and the presentation of results.

By the end of this book, you will be familiar with the software environment R, will know the background of popular methods in the area, and will have seen them applied in R to real-world data and real-world problems.

Style and approach

This book takes a practical, hands-on approach to explaining statistical computing methods, gives advice on their usage, and provides computational tools to help you solve common problems in statistical simulation and computer-intensive methods.

Table of Contents

Simulation for Data Science with R
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introduction
What is simulation and where is it applied?
Why use simulation?
Simulation and big data
Choosing the right simulation technique
Summary
References
2. R and High-Performance Computing
The R statistical environment
Basics in R
Some very basic stuff about R
Installation and updates
Help
The R workspace and the working directory
Data types
Vectors in R
Factors in R
list
data.frame
array
Missing values
Generic functions, methods, and classes
Data manipulation in R
Apply and friends with basic R
Basic data manipulation with the dplyr package
dplyr – creating a local data frame
dplyr – selecting lines
dplyr – order
dplyr – selecting columns
dplyr – uniqueness
dplyr – creating variables
dplyr – grouping and aggregates
dplyr – window functions
Data manipulation with the data.table package
data.table – variable construction
data.table – indexing or subsetting
data.table – keys
data.table – fast subsetting
data.table – calculations in groups
High performance computing
Profiling to detect computationally slow functions in code
Further benchmarking
Parallel computing
Interfaces to C++
Visualizing information
The graphics system in R
The graphics package
Warm-up example – a high-level plot
Control of graphics parameters
The ggplot2 package
References
3. The Discrepancy between Pencil-Driven Theory and Data-Driven Computational Solutions
Machine numbers and rounding problems
Example – the 64-bit representation of numbers
Convergence in the deterministic case
Example – convergence
Condition of problems
Summary
References
4. Simulation of Random Numbers
Real random numbers
Simulating pseudo random numbers
Congruential generators
Linear and multiplicative congruential generators
Lagged Fibonacci generators
More generators
Simulation of non-uniform distributed random variables
The inversion method
The alias method
Estimation of counts in tables with log-linear models
Rejection sampling
Simulating values from a normal distribution
Simulating random numbers from a Beta distribution
Truncated distributions
Metropolis-Hastings algorithm
A few words on Markov chains
The Metropolis sampler
The Gibbs sampler
The two-phase Gibbs sampler
The multiphase Gibbs sampler
Application in linear regression
The diagnosis of MCMC samples
Tests for random numbers
The evaluation of random numbers – an example of a test
Summary
References
5. Monte Carlo Methods for Optimization Problems
Numerical optimization
Gradient ascent/descent
Newton-Raphson methods
Further general-purpose optimization methods
Dealing with stochastic optimization
Simplified procedures (Star Trek, Spaceballs, and Spaceballs princess)
Metropolis-Hastings revisited
Gradient-based stochastic optimization
Summary
References
6. Probability Theory Shown by Simulation
Some basics on probability theory
Probability distributions
Discrete probability distributions
Continuous probability distributions
Winning the lottery
The weak law on large numbers
Emperor penguins and your boss
Limits and convergence of random variables
Convergence of the sample mean – weak law of large numbers
Showing the weak law of large numbers by simulation
The central limit theorem
Properties of estimators
Properties of estimators
Confidence intervals
A note on robust estimators
Summary
References
7. Resampling Methods
The bootstrap
A motivating example with odds ratios
Why the bootstrap works
A closer look at the bootstrap
The plug-in principle
Estimation of standard errors with bootstrapping
An example of a complex estimation using the bootstrap
The parametric bootstrap
Estimating bias with bootstrap
Confidence intervals by bootstrap
The jackknife
Disadvantages of the jackknife
The delete-d jackknife...

Simulation for Data Science with R by Matthias Templ is available in PDF and/or ePUB format, alongside other books in Computer Science & Computer Science General.