
Multivariate Statistics
Classical Foundations and Modern Machine Learning
- 504 pages
- English
About this book
This book explores multivariate statistics from both traditional and modern perspectives. The first section covers core topics like multivariate normality, MANOVA, discrimination, PCA, and canonical correlation analysis. The second section includes modern concepts such as gradient boosting, random forests, variable importance, and causal inference.
A key theme is leveraging classical multivariate statistics to explain advanced topics and prepare for contemporary methods. For example, linear models provide a foundation for understanding regularization with AIC and BIC, leading to a deeper analysis of regularization through generalization error and the VC theorem. Discriminant analysis introduces the weighted Bayes rule, which leads into modern classification techniques for class-imbalanced machine learning problems. Steepest descent serves as a precursor to matching pursuit and gradient boosting. Axis-aligned trees like CART, a classical tool, set the stage for more recent methods like super greedy trees.
Another central theme is training error. Introductory courses often caution that reducing training error too aggressively can lead to overfitting. At the same time, training error, also referred to as empirical risk, is a foundational concept in statistical learning theory. In regression, training error corresponds to the residual sum of squares, and minimizing it yields the least squares solution, which can overfit. Despite this concern, empirical risk plays a pivotal role in evaluating the potential for effective learning. The principle of empirical risk minimization demonstrates that minimizing training error can be advantageous when paired with regularization. This idea is further examined through techniques such as penalization, matching pursuit, gradient boosting, and super greedy tree constructions.
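The link between training error, least squares, and regularization can be illustrated with a small numerical sketch. The example below is not taken from the book: the simulated data, the polynomial degree, the penalty value `lam`, and the helper names `design`, `fit_penalized`, and `rss` are all assumptions chosen for illustration, and a ridge (squared-norm) penalty stands in for penalization in general. It fits the same flexible model twice, once by plain least squares and once with the penalty, and reports the residual sum of squares on the training sample and on a fresh sample.

```python
import numpy as np

def design(x, degree):
    # Polynomial design matrix with columns 1, x, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_penalized(X, y, lam):
    # Minimizes ||y - X b||^2 + lam * ||b||^2.
    # lam = 0 reduces to ordinary least squares (pure training-error minimization).
    # Simplification: the intercept is penalized along with the other coefficients.
    if lam == 0:
        return np.linalg.lstsq(X, y, rcond=None)[0]
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def rss(X, y, b):
    # Residual sum of squares, i.e. the (unnormalized) empirical risk.
    r = y - X @ b
    return float(r @ r)

# Simulated data (hypothetical): a smooth signal plus noise.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 20)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.3, 20)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.3, 200)

degree = 12  # deliberately flexible relative to 20 training points
X_train, X_test = design(x_train, degree), design(x_test, degree)

b_ls = fit_penalized(X_train, y_train, lam=0.0)
b_ridge = fit_penalized(X_train, y_train, lam=1e-2)

train_rss_ls = rss(X_train, y_train, b_ls)
train_rss_ridge = rss(X_train, y_train, b_ridge)
test_rss_ls = rss(X_test, y_test, b_ls)
test_rss_ridge = rss(X_test, y_test, b_ridge)

print(f"train RSS  least squares: {train_rss_ls:.3f}   ridge: {train_rss_ridge:.3f}")
print(f"test  RSS  least squares: {test_rss_ls:.3f}   ridge: {test_rss_ridge:.3f}")
```

By construction, least squares never has a larger training RSS than the penalized fit, since it minimizes exactly that quantity. Whether the penalized fit does better on the fresh sample depends on the data and on the penalty value, which is why the comparison is printed rather than asserted.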
Key Features:
• Covers both classical and contemporary multivariate statistics.
• Each chapter includes a carefully selected set of exercises that vary in degree of difficulty and are both applied and theoretical.
• The book can also serve as a reference for researchers due to the diverse topics covered, including new material on super greedy trees, rule-based variable selection, and machine learning for causal inference.
• Extensive treatment of trees that provides a comprehensive and unified approach to understanding them in terms of partitions and empirical risk minimization.
• New content on random forests, including random forest quantile classifiers for class-imbalanced problems, multivariate random forests, subsampling for confidence regions, and super greedy forests. An entire chapter is dedicated to random survival forests, featuring new material on random hazard forests, which extend survival forests to time-varying covariates.
Table of contents
- Cover Page
- Half-Title Page
- Title Page
- Copyright Page
- Contents
- Preface
- Author
- 1 Introduction
- 2 Properties of Random Vectors and Background Material
- 3 Multivariate Normal Distribution
- 4 Linear Regression
- 5 Multivariate Regression
- 6 Discriminant Analysis and Classification
- 7 Generalization Error
- 8 Principal Component Analysis
- 9 Canonical Correlation Analysis
- 10 Newton's Method
- 11 Steepest Descent
- 12 Gradient Boosting
- 13 Detailed Analysis of L2Boost
- 14 Coordinate Descent
- 15 Trees
- 16 Random Forests
- 17 Random Forests Variable Selection
- 18 Splitting Effect on Random Forests
- 19 Random Survival Forests
- 20 Causal Estimates using Machine Learning
- Index