
Foundations of Statistics for Data Scientists
With R and Python
- 468 pages
- English
About this book
Foundations of Statistics for Data Scientists: With R and Python is designed as a textbook for a one- or two-term introduction to mathematical statistics for students training to become data scientists. It is an in-depth presentation of the topics in statistical science with which any data scientist should be familiar, including probability distributions, descriptive and inferential statistical methods, and linear modeling. The book assumes knowledge of basic calculus, so the presentation can focus on "why it works" as well as "how to do it." Compared to traditional "mathematical statistics" textbooks, however, the book has less emphasis on probability theory and more emphasis on using software to implement statistical methods and to conduct simulations to illustrate key concepts. All statistical analyses in the book use R software, with an appendix showing the same analyses with Python.
Key Features:
- Shows the elements of statistical science that are important for students who plan to become data scientists.
- Includes Bayesian and regularized fitting of models (e.g., showing an example using the lasso), classification and clustering, and implementing methods with modern software (R and Python).
- Contains nearly 500 exercises.
The book also introduces modern topics that do not normally appear in mathematical statistics texts but are highly relevant for data scientists, such as Bayesian inference, generalized linear models for non-normal responses (e.g., logistic regression and Poisson loglinear models), and regularized model fitting. The nearly 500 exercises are grouped into "Data Analysis and Applications" and "Methods and Concepts." Appendices introduce R and Python and contain solutions for odd-numbered exercises. The book's website (http://stat4ds.rwth-aachen.de/) has expanded R, Python, and Matlab appendices and all data sets from the examples and exercises.
Information
1 Introduction to Statistical Science
1.1 Statistical Science: Description and Inference
1.1.1 Design, Descriptive Statistics, and Inferential Statistics
- Design: Planning how to gather relevant data for the subject matter of interest.
- Description: Summarizing the data.
- Inference: Making evaluations, such as estimations and predictions, based on the data.
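The three aspects above can be illustrated with a minimal Python sketch. It is not taken from the book: the population is simulated, and the function name `describe_and_infer`, the seed, the sample size of 50, and the normal-approximation 95% interval (z = 1.96) are all assumptions made for illustration.

```python
import random
import statistics
from math import sqrt


def describe_and_infer(sample, z=1.96):
    """Summarize a sample and give an approximate 95% interval for the mean.

    Hypothetical helper: uses the large-sample normal approximation
    mean +/- z * sd / sqrt(n), which is only one of several possible choices.
    """
    n = len(sample)
    mean = statistics.mean(sample)   # description: center of the data
    sd = statistics.stdev(sample)    # description: spread (sample standard deviation)
    margin = z * sd / sqrt(n)        # inference: margin of error for the mean
    return {"n": n, "mean": mean, "sd": sd, "ci": (mean - margin, mean + margin)}


# Design: plan how the data are gathered. Here, a simple random sample
# of 50 units drawn from a simulated (hypothetical) population.
rng = random.Random(1)
population = [rng.gauss(100, 15) for _ in range(10_000)]
sample = rng.sample(population, 50)

# Description and inference on the sampled data.
result = describe_and_infer(sample)
print(result)
```

The dictionary returned by `describe_and_infer` separates the descriptive summaries (`mean`, `sd`) from the inferential statement (`ci`), which is an estimate of the population mean based on the sample alone.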
Table of contents
- Cover Page
- Half-Title Page
- Series Page
- Title Page
- Copyright Page
- Contents
- Preface
- 1 Introduction to Statistical Science
- 2 Probability Distributions
- 3 Sampling Distributions
- 4 Statistical Inference: Estimation
- 5 Statistical Inference: Significance Testing
- 6 Linear Models and Least Squares
- 7 Generalized Linear Models
- 8 Classification and Clustering
- 9 Statistical Science: A Historical Overview
- Appendix A Using R in Statistical Science
- Appendix B Using Python in Statistical Science
- Appendix C Brief Solutions to Exercises
- Bibliography
- Example Index
- Subject Index