Although a central theme of the book is the use of statistical models to understand and interpret sports data, it is important to understand the basic properties of data before turning to the details of these methods. These properties are the subject of Chapter 2, which covers the fundamental methods of describing and summarizing data.
As noted in the previous section, the use of probability theory and statistical methodology to describe relationships and express conclusions is a crucial part of analytic methods. Chapter 3 covers those aspects of probability theory that are necessary to understand the randomness inherent in sports data. These concepts are applied to a number of scenarios in sports in which consideration of the underlying probabilities leads to useful insights. As noted previously, appreciating and understanding randomness is one of the main contributions of analytic methods.
Chapter 4 has several goals. One is to describe the statistical reasoning that underlies the analytic methods described in this book. Another is to present some basic statistical concepts, such as the margin of error and statistical significance, that play a central role in dealing with the randomness of sports data. Finally, Chapter 4 covers some basic statistical methods that are essential in studying sports data.
Chapters 5 through 7 develop the core statistical procedures for analyzing data based on sports results. Chapter 5 is concerned with detecting the presence of a relationship between variables and measuring the strength of such relationships. Several different methods are presented, designed to deal with different types of data and different goals for the analysis.
Chapter 6 takes the basic theme of Chapter 5, the relationship between variables, and goes a step further, covering methods for summarizing the relationship between two variables in a concise and useful way. These methods, known collectively as linear regression, use statistical methodology to find a function relating the two variables. The simplest method of this type yields a linear function for the variables; Chapter 6 also covers more sophisticated methods that are used when the relationship is nonlinear.
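As a rough illustration of what "finding a linear function relating two variables" can look like in practice, the following minimal sketch fits a line by ordinary least squares. The function name `fit_line`, the variable names, and the numbers are all hypothetical and invented for this example; they are not taken from the book or from real sports data.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented values for illustration only (not real sports data):
# x is a hypothetical predictor, y a hypothetical outcome.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [0.30, 0.40, 0.50, 0.60, 0.70]

intercept, slope = fit_line(x, y)
print(f"fitted line: y = {intercept:.3f} + {slope:.3f} * x")
```

The slope summarizes how much the outcome changes, on average, for a one-unit change in the predictor, which is the kind of concise summary of a relationship that the chapter is concerned with.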
In Chapter 7, these methods are extended to the case of several variables, in which we wish to describe one of the variables, known as the response variable, in terms of the others, known as predictors. These methods, also known as linear regression, are perhaps the most commonly used statistical procedures, with applications in a wide range of scientific fields. Chapter 7 contains a detailed discussion of the basic methodology, along with more advanced topics such as the use of categorical variables as predictors, methods for finding the most important predictor, and interaction, which occurs when the effect of one of the predictors depends on the values of other predictors. In addition to the descriptions of the relevant statistical methodology, Chapters 6 and 7 include important information on the strengths and limitations of these methods as well as on the implementation of the methodology and the interpretation of the results.
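To make the idea of interaction concrete, one common way to write a regression model with two predictors and an interaction term is sketched below. The symbols ($Y$ for the response, $X_1$ and $X_2$ for the predictors, $\beta_j$ for the coefficients, $\varepsilon$ for the error term) are assumptions chosen for this illustration and are not necessarily the notation used in the book.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon
% Expected change in Y for a one-unit increase in X_1, holding X_2 fixed:
\beta_1 + \beta_3 X_2
```

When $\beta_3 = 0$, the effect of $X_1$ is the same at every value of $X_2$; when $\beta_3 \neq 0$, the effect of $X_1$ depends on the value of $X_2$, which is what is meant by interaction.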
Chapter 8 discusses some more advanced methods that build on the topics covered in Chapters 5 through 7. Many of these methods are extensions of the regression methodology covered in Chapters 6 and 7, such as logistic regression for modeling the relationship between a binary response variable and predictor variables, and spline models for modeling highly nonlinear relationships. Other methods, such as using pooling to estimate team- and player-specific parameters, principal components analysis for summarizing data, and the use of random effects to analyze variability, introduce new concepts.
The topics covered in this book are similar to those that would be covered in courses on statistical methodology. However, they have been chosen specifically because of their importance and usefulness in analyzing sports data. Therefore, statistical methods that are not useful in analyzing sports data are not covered. Furthermore, many of the topics that are discussed are fairly advanced in the sense that they would not typically be covered in an introductory statistics course.