
Discovering Knowledge in Data
An Introduction to Data Mining
Daniel T. Larose, Chantal D. Larose
About This Book
The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before.
This book provides the tools needed to thrive in today's big data world. The authors demonstrate how to leverage a company's existing databases to increase profits and market share, and carefully explain the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining.
- The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
- Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization
- Offers extensive coverage of the R statistical programming language
- Contains 280 end-of-chapter exercises
- Includes a companion website for university instructors who adopt the book
Chapter 1
An Introduction to Data Mining
- 1.1 What is Data Mining?
- 1.2 Wanted: Data Miners
- 1.3 The Need for Human Direction of Data Mining
- 1.4 The Cross-Industry Standard Practice for Data Mining
- 1.5 Fallacies of Data Mining
- 1.6 What Tasks Can Data Mining Accomplish?
1.1 What is Data Mining?
1.2 Wanted: Data Miners
- The explosive growth in data collection, as exemplified by the supermarket scanners above,
- The storing of the data in data warehouses, so that the entire enterprise has access to a reliable, current database,
- The availability of increased access to data from web navigation and intranets,
- The competitive pressure to increase market share in a globalized economy,
- The development of "off-the-shelf" commercial data mining software suites,
- The tremendous growth in computing power and storage capacity.
There will be a shortage of talent necessary for organizations to take advantage of big data. A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data . . . . We project that demand for deep analytical positions in a big data world could exceed the supply being produced on current trends by 140,000 to 190,000 positions. . . . In addition, we project a need for 1...