
Statistical and Machine-Learning Data Mining:
Techniques for Better Predictive Modeling and Analysis of Big Data, Third Edition
- 656 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
Interest in predictive analytics of big data has grown exponentially in the four years since the publication of Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, Second Edition. In the third edition of this bestseller, the author has completely revised, reorganized, and repositioned the original chapters and produced 13 new chapters of creative and useful machine-learning data mining techniques. In sum, the 43 chapters of simple yet insightful quantitative techniques make this book unique in the field of data mining literature.
What is new in the Third Edition:
- The current chapters have been completely rewritten.
- The core content has been extended with strategies and methods for problems drawn from the top predictive analytics conference and statistical modeling workshops.
- Thirteen new chapters have been added, covering data science and its rise, market share estimation, share of wallet modeling without survey data, latent market segmentation, statistical regression modeling that deals with incomplete data, decile analysis assessment in terms of the predictive power of the data, and a user-friendly version of text mining that does not require an advanced background in natural language processing (NLP).
- SAS subroutines are included and can be easily converted to other languages.
As in the previous edition, this book offers detailed background, discussion, and illustration of specific methods for solving the most commonly experienced problems in predictive modeling and analysis of big data. The author addresses each methodology and assigns its application to a specific type of problem. To better ground readers, the book provides an in-depth discussion of the basic methodologies of predictive modeling and analysis. While this type of overview has been attempted before, this approach offers a truly nitty-gritty, step-by-step method that both tyros and experts in the field can enjoy playing with.
Information
- Definition of the problem: Determining the best way to tackle the problem is not always obvious. Management objectives are often expressed qualitatively, in which case the selection of the outcome or target (dependent) variable is subjectively biased. When the objectives are clearly stated, the appropriate dependent variable is often not available, in which case a surrogate must be used.
- Determining technique: The technique first selected is often the one with which the data analyst is most comfortable; it is not necessarily the best technique for solving the problem.
- Use of competing techniques: Applying alternative techniques increases the odds that a thorough analysis is conducted.
- Rough comparisons of efficacy: Comparing variability of results across techniques can suggest additional techniques or the deletion of alternative techniques.
- Comparison in terms of a precise (and thereby inadequate) criterion: An explicit criterion is difficult to define. Therefore, precise surrogates are often used.
- Optimization in terms of a precise and inadequate criterion: An explicit criterion is difficult to define. Therefore, precise surrogates are often used.
- Comparison in terms of several optimization criteria: This constitutes the final step in determining the best solution.
- Flexibility: Techniques with greater flexibility to delve into the data
- Practicality: Advice for procedures of analyzing data
- Innovation: Techniques for interpreting results
- Universality: Use all statistics that apply to analyzing data
- Simplicity: Above all, the belief that simplicity is the golden rule
Table of contents
- Cover
- Half Title
- Title
- Copyright
- Dedication
- Contents
- Preface to Third Edition
- Preface of Second Edition
- Acknowledgments
- Author
- 1. Introduction
- 2. Science Dealing with Data: Statistics and Data Science
- 3. Two Basic Data Mining Methods for Variable Assessment
- 4. CHAID-Based Data Mining for Paired-Variable Assessment
- 5. The Importance of Straight Data: Simplicity and Desirability for Good Model-Building Practice
- 6. Symmetrizing Ranked Data: A Statistical Data Mining Method for Improving the Predictive Power of Data
- 7. Principal Component Analysis: A Statistical Data Mining Method for Many-Variable Assessment
- 8. Market Share Estimation: Data Mining for an Exceptional Case
- 9. The Correlation Coefficient: Its Values Range between Plus and Minus 1, or Do They?
- 10. Logistic Regression: The Workhorse of Response Modeling
- 11. Predicting Share of Wallet without Survey Data
- 12. Ordinary Regression: The Workhorse of Profit Modeling
- 13. Variable Selection Methods in Regression: Ignorable Problem, Notable Solution
- 14. CHAID for Interpreting a Logistic Regression Model
- 15. The Importance of the Regression Coefficient
- 16. The Average Correlation: A Statistical Data Mining Measure for Assessment of Competing Predictive Models and the Importance of the Predictor Variables
- 17. CHAID for Specifying a Model with Interaction Variables
- 18. Market Segmentation Classification Modeling with Logistic Regression
- 19. Market Segmentation Based on Time-Series Data Using Latent Class Analysis
- 20. Market Segmentation: An Easy Way to Understand the Segments
- 21. The Statistical Regression Model: An Easy Way to Understand the Model
- 22. CHAID as a Method for Filling in Missing Values
- 23. Model Building with Big Complete and Incomplete Data
- 24. Art, Science, Numbers, and Poetry
- 25. Identifying Your Best Customers: Descriptive, Predictive, and Look-Alike Profiling
- 26. Assessment of Marketing Models
- 27. Decile Analysis: Perspective and Performance
- 28. Net T-C Lift Model: Assessing the Net Effects of Test and Control Campaigns
- 29. Bootstrapping in Marketing: A New Approach for Validating Models
- 30. Validating the Logistic Regression Model: Try Bootstrapping
- 31. Visualization of Marketing Models: Data Mining to Uncover Innards of a Model
- 32. The Predictive Contribution Coefficient: A Measure of Predictive Importance
- 33. Regression Modeling Involves Art, Science, and Poetry, Too
- 34. Opening the Dataset: A Twelve-Step Program for Dataholics
- 35. Genetic and Statistic Regression Models: A Comparison
- 36. Data Reuse: A Powerful Data Mining Effect of the GenIQ Model
- 37. A Data Mining Method for Moderating Outliers Instead of Discarding Them
- 38. Overfitting: Old Problem, New Solution
- 39. The Importance of Straight Data: Revisited
- 40. The GenIQ Model: Its Definition and an Application
- 41. Finding the Best Variables for Marketing Models
- 42. Interpretation of Coefficient-Free Models
- 43. Text Mining: Primer, Illustration, and TXTDM Software
- 44. Some of My Favorite Statistical Subroutines
- Index