eBook - ePub

Discovering Knowledge in Data

Name: Discovering Knowledge in Data
ISBN: 9781118873571

An Introduction to Data Mining

Daniel T. Larose, Chantal D. Larose

English

About this book

The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before.

This book provides the tools needed to thrive in today's big data world. The authors demonstrate how to leverage a company's existing databases to increase profits and market share, and carefully explain the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining.

The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization
Offers extensive coverage of the R statistical programming language
Contains 280 end-of-chapter exercises
Includes a companion website for university instructors who adopt the book

Publisher: Wiley
Year: 2014
Print ISBN: 9780470908747
eBook ISBN: 9781118873571
Topic: Computer Science
Subtopic: Data Mining

Chapter 1
An Introduction to Data Mining

1.1 What is Data Mining?
1.2 Wanted: Data Miners
1.3 The Need for Human Direction of Data Mining
1.4 The Cross-Industry Standard Practice for Data Mining
1.5 Fallacies of Data Mining
1.6 What Tasks Can Data Mining Accomplish?
References
Exercises

1.1 What is Data Mining?

The McKinsey Global Institute (MGI) reports [1] that most American companies with more than 1000 employees had an average of at least 200 terabytes of stored data. MGI projects that the amount of data generated worldwide will increase by 40% annually, creating profitable opportunities for companies to leverage their data to reduce costs and increase their bottom line. For example, retailers harnessing this “big data” to best advantage could expect to realize an increase in their operating margin of more than 60%, according to the MGI report. And healthcare providers and health maintenance organizations (HMOs) that properly leverage their data storehouses could achieve $300 billion in cost savings annually, through improved efficiency and quality.
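To get a feel for what a 40% annual increase means, the growth can be compounded over several years. The following is a minimal sketch, assuming the 40% rate stays constant from year to year (a simplification; the function names are illustrative and not from the MGI report):

```python
import math

# Assumption: the 40% annual growth rate quoted above holds constant every year.
ANNUAL_GROWTH = 0.40


def growth_factor(years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Return the multiple of today's data volume after `years` of compounding."""
    return (1.0 + rate) ** years


def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Return the number of years needed for the data volume to double."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years: {growth_factor(years):.1f}x today's volume")
    print(f"doubling time: {doubling_time():.2f} years")
```

Under this assumption the volume roughly doubles every two years and grows more than fivefold in five years.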

The MIT Technology Review reports [2] that it was the Obama campaign's effective use of data mining that helped President Obama win the 2012 presidential election over Mitt Romney. The campaign first identified likely Obama voters using a data mining model, and then made sure that these voters actually got to the polls. The campaign also used a separate data mining model to predict the polling outcomes county-by-county. In the important swing county of Hamilton County, Ohio, the model predicted that Obama would receive 56.4% of the vote; the Obama share of the actual vote was 56.6%, so that the prediction was off by only 0.2 percentage points. Such precise predictive power allowed the campaign staff to allocate scarce resources more efficiently.
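The gap between the predicted and actual vote shares for Hamilton County can be checked with a few lines of arithmetic. This sketch uses only the two percentages quoted above; the function names are illustrative. It separates the absolute error (in percentage points) from the relative error (as a percentage of the actual share), since the two are easy to confuse:

```python
# Vote shares quoted in the text, in percent.
PREDICTED_SHARE = 56.4
ACTUAL_SHARE = 56.6


def absolute_error_points(predicted: float, actual: float) -> float:
    """Absolute error in percentage points."""
    return abs(actual - predicted)


def relative_error_percent(predicted: float, actual: float) -> float:
    """Error expressed as a percentage of the actual value."""
    return abs(actual - predicted) / actual * 100.0


if __name__ == "__main__":
    abs_err = absolute_error_points(PREDICTED_SHARE, ACTUAL_SHARE)
    rel_err = relative_error_percent(PREDICTED_SHARE, ACTUAL_SHARE)
    print(f"absolute error: {abs_err:.1f} percentage points")
    print(f"relative error: {rel_err:.2f}% of the actual share")
```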

About 13 million customers per month contact the West Coast customer service call center of the Bank of America, as reported by CIO Magazine [3]. In the past, each caller would have listened to the same marketing advertisement, whether or not it was relevant to the caller's interests. However, “rather than pitch the product of the week, we want to be as relevant as possible to each customer,” states Chris Kelly, vice president and director of database marketing at Bank of America in San Francisco. Thus Bank of America's customer service representatives have access to individual customer profiles, so that the customer can be informed of new products or services that may be of greatest interest to him or her. This is an example of mining customer data to help identify the type of marketing approach for a particular customer, based on the customer's individual profile.

So, what is data mining?

Data mining is the process of discovering useful patterns and trends in large data sets.

While waiting in line at a large supermarket, have you ever just closed your eyes and listened? You might hear the beep, beep, beep, of the supermarket scanners, reading the bar codes on the grocery items, ringing up on the register, and storing the data on company servers. Each beep indicates a new row in the database, a new “observation” in the information being collected about the shopping habits of your family, and the other families who are checking out.
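The idea that each beep adds a row, and that patterns can then be drawn from those rows, can be sketched in a few lines. The data below is entirely hypothetical (a made-up scanner log with a basket identifier and an item name per row), and counting item pairs is only one simple illustration of a "pattern", not the method the book prescribes:

```python
from collections import Counter
from itertools import combinations

# Hypothetical scanner log: one (basket_id, item) row per beep.
scanner_rows = [
    (1, "bread"), (1, "milk"), (1, "eggs"),
    (2, "bread"), (2, "milk"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "milk"), (4, "butter"),
]


def baskets_from_rows(rows):
    """Group the scanned rows into one set of items per basket."""
    baskets = {}
    for basket_id, item in rows:
        baskets.setdefault(basket_id, set()).add(item)
    return baskets


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for items in baskets.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    counts = pair_counts(baskets_from_rows(scanner_rows))
    for pair, n in counts.most_common(3):
        print(pair, n)
```

In this toy log, bread and milk appear together in three of the four baskets, which is the kind of regularity a retailer might want to surface from real checkout data.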

Clearly, a lot of data is being collected. However, what is being learned from all this data? What knowledge are we gaining from all this information? Probably not as much as you might think, because there is a serious shortage of skilled data analysts.

1.2 Wanted: Data Miners

As early as 1984, in his book Megatrends [4], John Naisbitt observed that “We are drowning in information but starved for knowledge.” The problem today is not that there is not enough data and information streaming in. We are in fact inundated with data in most fields. Rather, the problem is that there are not enough trained human analysts available who are skilled at translating all of these data into knowledge, and thence up the taxonomy tree into wisdom.

The ongoing remarkable growth in the field of data mining and knowledge discovery has been fueled by a fortunate confluence of a variety of factors:

The explosive growth in data collection, as exemplified by the supermarket scanners above,
The storing of the data in data warehouses, so that the entire enterprise has access to a reliable, current database,
The availability of increased access to data from web navigation and intranets,
The competitive pressure to increase market share in a globalized economy,
The development of “off-the-shelf” commercial data mining software suites,
The tremendous growth in computing power and storage capacity.

Unfortunately, according to the McKinsey report [1],

There will be a shortage of talent necessary for organizations to take advantage of big data. A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data . . . . We project that demand for deep analytical positions in a big data world could exceed the supply being produced on current trends by 140,000 to 190,000 positions. . . . In addition, we project a need for 1...

Cover
Series
Title Page
Copyright
Preface
Chapter 1: An Introduction to Data Mining
Chapter 2: Data Preprocessing
Chapter 3: Exploratory Data Analysis
Chapter 4: Univariate Statistical Analysis
Chapter 5: Multivariate Statistics
Chapter 6: Preparing to Model the Data
Chapter 7: k-Nearest Neighbor Algorithm
Chapter 8: Decision Trees
Chapter 9: Neural Networks
Chapter 10: Hierarchical and k-Means Clustering
Chapter 11: Kohonen Networks
Chapter 12: Association Rules
Chapter 13: Imputation of Missing Data
Chapter 14: Model Evaluation Techniques
Appendix: Data Summarization and Visualization
Index
End User License Agreement

Related ISBNs: 9781118879337, 9781801076050
