Foundations of Statistics for Data Scientists
eBook - ePub

With R and Python

Alan Agresti, Maria Kateri

  1. 468 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

Foundations of Statistics for Data Scientists: With R and Python is designed as a textbook for a one- or two-term introduction to mathematical statistics for students training to become data scientists. It is an in-depth presentation of the topics in statistical science with which any data scientist should be familiar, including probability distributions, descriptive and inferential statistical methods, and linear modeling. The book assumes knowledge of basic calculus, so the presentation can focus on "why it works" as well as "how to do it." Compared to traditional "mathematical statistics" textbooks, however, the book has less emphasis on probability theory and more emphasis on using software to implement statistical methods and to conduct simulations to illustrate key concepts. All statistical analyses in the book use R software, with an appendix showing the same analyses with Python.

Key Features:

  • Shows the elements of statistical science that are important for students who plan to become data scientists.
  • Includes Bayesian and regularized fitting of models (e.g., showing an example using the lasso), classification and clustering, and implementing methods with modern software (R and Python).
  • Contains nearly 500 exercises.

The book also introduces modern topics that do not normally appear in mathematical statistics texts but are highly relevant for data scientists, such as Bayesian inference, generalized linear models for non-normal responses (e.g., logistic regression and Poisson loglinear models), and regularized model fitting. The nearly 500 exercises are grouped into "Data Analysis and Applications" and "Methods and Concepts." Appendices introduce R and Python and contain solutions for odd-numbered exercises. The book's website ( http://stat4ds.rwth-aachen.de/ ) has expanded R, Python, and Matlab appendices and all data sets from the examples and exercises.

Information

Year
2021
ISBN
9781000462937

1 Introduction to Statistical Science

DOI: 10.1201/9781003159834-1
Compared to mathematics and the physical and natural sciences, statistical science is quite young. The statistical methods that you'll learn about in this book were mainly developed within the past century. Modern computing power is causing a revolution in the sorts of data analyses that are possible, so new methods are continually being developed. In recent years, new statistical methods have resulted from challenges in analyzing data in diverse fields, such as medicine (e.g., genetic data relating to disease presence, data for personalized medical decisions) and business (e.g., data on consumer buying behavior, data from experiments comparing advertising strategies). This book presents the foundations underlying the methods of statistical science, explaining when and why these methods work, and shows how to use statistical software to apply them.
Statistical software also has become increasingly powerful and easily available. This has had benefits in the data analyses that are now possible, but a danger is that prospective data scientists might think that statistical science is merely a computational toolbox consisting of a variety of algorithms. A goal of this book, by contrast, is to show that the methods of statistical science result from a unified theory, although that theory itself has slight variations in the way it is implemented or interpreted. Another danger of the ubiquity of statistical software is that prospective data scientists might expect that software can automatically perform good data analyses without input from the user. We'll see, however, that careful thought is needed to decide which statistical methods are appropriate for any particular situation, as they all make certain assumptions, and some methods work poorly when the assumptions are violated. Moreover, a data scientist needs to be able to interpret and explain the results that software yields.
In this chapter, we introduce statistical science as a field that deals with describing data and using them to make inferences. We define types of variables that represent how measured characteristics can vary from observation to observation. We also introduce graphical and numerical methods for describing the data. When a study can use randomization in collecting the data or conducting an experiment, data analysts can exploit the random variation to make reliable estimations and predictions.

1.1 Statistical Science: Description and Inference

You already have a sense of what the word statistics means. You regularly hear statistics quoted about sports events, the economy, medical research, and opinions, beliefs, and behaviors of people. In this sense, a statistic is merely a number calculated from data, the observations that provide information about the subject matter. But statistical science has a much broader sense, as a field that gives us a way of gathering and analyzing data in an objective manner.
Statistical science
Statistical science is the science of developing and applying methods for collecting, analyzing, and interpreting data.
Many methods of statistical science incorporate reasoning using tools of probability. The methods enable us to deal with uncertainty and variability in virtually all scientific fields. With statistical methods, we learn from the data while measuring, controlling, and communicating uncertainty.

1.1.1 Design, Descriptive Statistics, and Inferential Statistics

Statistical science has three aspects:
  1. Design: Planning how to gather relevant data for the subject matter of interest.
  2. Description: Summarizing the data.
  3. Inference: Making evaluations, such as estimations and predictions, based on the data.
Design refers to planning a study so that it yields useful data. For example, for a poll taken to determine public opinion on some issue, the design specifies how to select the people to interview and how to construct the questionnaire for the interviews. For a research study comparing an experimental diet with a standard diet to address obesity, the design specifies how to obtain people for the study, how to determine which people use each diet, and which characteristics to measure to compare the diets.
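One common way to determine which people use each diet is random assignment, in line with the randomization mentioned in the chapter introduction. The following is a minimal Python sketch of that idea, not a procedure taken from the book: the function name `assign_diets`, the participant labels, and the seed are all hypothetical choices made for illustration.

```python
import random


def assign_diets(participants, seed=0):
    """Randomly split participants into two diet groups of (nearly) equal size.

    Hypothetical helper for illustration; the seed only makes the
    assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)           # random order removes any selection pattern
    half = len(shuffled) // 2
    return {
        "experimental": shuffled[:half],
        "standard": shuffled[half:],
    }


# Made-up participant identifiers P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = assign_diets(participants, seed=2021)

print("Experimental diet:", groups["experimental"])
print("Standard diet:    ", groups["standard"])
```

Because chance alone decides who receives which diet, the two groups should tend to be similar on characteristics that were not measured, which is what makes the later comparison fair.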
Description refers to summarizing the data, to mine the information that the data provide. For any study, the raw data are a complete listing of observations that can be overwhelming for comprehension. To present the results, we reduce the data to simpler and more understandable form without distorting or losing much information. Graphs, tables, and numerical summaries such as averages and percentages are called descriptive statistics.
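As a small illustration of reducing raw data to descriptive statistics, the sketch below summarizes a short list of weight losses with an average, a median, a standard deviation, and a percentage. The eight values are invented for this example (they are not data from the book), and the 5 kg threshold is an arbitrary assumption.

```python
from statistics import mean, median, stdev

# Hypothetical weight losses in kilograms for eight people (invented values).
weight_loss = [9.5, 7.0, 4.5, 8.0, 6.0, 10.0, 3.5, 7.5]

threshold = 5.0  # arbitrary cutoff chosen for this illustration
n = len(weight_loss)

summary = {
    "n": n,
    "mean": mean(weight_loss),      # average weight loss
    "median": median(weight_loss),  # middle value of the sorted data
    "sd": stdev(weight_loss),       # sample standard deviation
    # percentage of people who lost at least `threshold` kilograms
    "pct_at_least_5kg": 100 * sum(x >= threshold for x in weight_loss) / n,
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A handful of numbers like these stand in for the full listing of observations, at the cost of some detail about individual values.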
Inference refers to using the data to make estimations and other sorts of evaluations, such as predictions. These evaluations take into account random variability that occurs with the characteristics measured and the resulting uncertainty in decision-making. For instance, suppose that in the study comparing two diets, the people on the experimental diet had an average weight loss of 7.0 kilograms. What can we say about the average weight change if hypothetically all obese people used this diet? An inferential statistical method provides an interval of numbers within which we can predict that the average weight change would fall. The analysis...
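To make the idea of an interval for the average weight change concrete, here is a minimal Python sketch. It rests on stated assumptions: the data are simulated rather than real, the function name `mean_interval` is hypothetical, and the interval uses a large-sample normal approximation (sample mean plus or minus a normal quantile times the standard error), which may differ from the method the book goes on to develop.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_interval(values, confidence=0.95):
    """Approximate large-sample interval for a population mean.

    Hypothetical helper: sample mean +/- z * (sample sd / sqrt(n)),
    where z is a standard normal quantile.
    """
    n = len(values)
    center = mean(values)
    std_error = stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return center - z * std_error, center + z * std_error


# Simulated weight losses (kg) for 100 people; the mean of 7.0 echoes the
# example in the text, while the spread of 4.0 is an arbitrary assumption.
rng = random.Random(1)
weight_loss = [rng.gauss(7.0, 4.0) for _ in range(100)]

lower, upper = mean_interval(weight_loss)
print(f"Sample mean: {mean(weight_loss):.2f} kg")
print(f"Approximate 95% interval: ({lower:.2f}, {upper:.2f}) kg")
```

The width of the interval reflects the uncertainty in the estimate: it shrinks as the sample size grows and widens as the observations become more variable.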
