eBook - ePub

Data Mining Techniques

Name: Data Mining Techniques
ISBN: 9781118087459

For Marketing, Sales, and Customer Relationship Management

Gordon S. Linoff, Michael J. A. Berry

About this book

The leading introductory book on data mining, fully updated and revised!

When Berry and Linoff wrote the first edition of Data Mining Techniques in the late 1990s, data mining was just starting to move out of the lab and into the office. It has since grown to become an indispensable tool of modern business. This new edition, more than 50% new and revised, is a significant update from the previous one and shows you how to harness the newest data mining methods and techniques to solve common business problems. The authors share advice for improving response rates to direct marketing campaigns, identifying new customer segments, and estimating credit risk. In addition, they cover more advanced topics such as preparing data for analysis and creating the necessary infrastructure for data mining at your company.

Features significant updates since the previous edition, including current best practices for using data mining methods and techniques to solve common business problems
Covers a new data mining technique in every chapter along with clear, concise explanations on how to apply each technique immediately
Touches on core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, survival analysis, and more
Provides best practices for performing data mining using simple tools such as Excel

Data Mining Techniques, Third Edition covers a new data mining technique with each successive chapter and then demonstrates how you can apply that technique for improved marketing, sales, and customer support to get immediate results.

Publisher: Wiley
Year: 2011
Print ISBN: 9780470650936
eBook ISBN: 9781118087459
Topic: Computer Science
Subtopic: Data Warehousing

Chapter 1

What Is Data Mining and Why Do It?

In the first edition of this book, the first sentence of the first chapter began with the words, “Somerville, Massachusetts, home to one of the authors of this book…” and went on to tell of two small businesses in that town and how they had formed learning relationships with their customers. One of those businesses, a hair braider, no longer braids the hair of the little girl. In the years since the first edition, the little girl grew up, and moved away, and no longer wears her hair in cornrows. Her father, one of the authors, moved to nearby Cambridge. But one thing has not changed. The author is still a loyal customer of the Wine Cask, where some of the same people who first introduced him to cheap Algerian reds in 1978 and later to the wine-growing regions of France are now helping him to explore the wines of Italy and Germany.

Decades later, the Wine Cask still has a loyal customer. That loyalty is no accident. The staff learn the tastes of their customers and their price ranges. When asked for advice, they base their response on accumulated knowledge of that customer's tastes and budget as well as on their knowledge of their stock.

The people at the Wine Cask know a lot about wine. Although that knowledge is one reason to shop there rather than at a big discount liquor store, their intimate knowledge of each customer is what keeps customers coming back. Another wine shop could open across the street and hire a staff of expert oenophiles, but achieving the same level of intimate customer knowledge would take them months or years.

Well-run small businesses naturally form learning relationships with their customers. Over time, they learn more and more about their customers, and they use that knowledge to serve them better. The result is happy, loyal customers and profitable businesses.

Larger companies, with hundreds of thousands or millions of customers, do not enjoy the luxury of actual personal relationships with each one. Larger firms must rely on other means to form learning relationships with their customers. In particular, they must learn to take full advantage of something they have in abundance — the data produced by nearly every customer interaction. This book is about analytic techniques that can be used to turn customer data into customer knowledge.

What Is Data Mining?

Although some data mining techniques are quite new, data mining itself is not a new technology, in the sense that people have been analyzing data on computers since the first computers were invented — and without computers for centuries before that. Over the years, data mining has gone by many different names, such as knowledge discovery, business intelligence, predictive modeling, predictive analytics, and so on. The definition of data mining as used by the authors is:

Data mining is a business process for exploring large amounts of data to discover meaningful patterns and rules.

This definition has several parts, all of which are important.

Data Mining Is a Business Process

Data mining is a business process that interacts with other business processes. In particular, a process does not have a beginning and an end: it is ongoing. Data mining starts with data, then through analysis informs or inspires action, which, in turn, creates data that begets more data mining.

The practical consequence is that organizations that want to excel at using their data to improve their business do not view data mining as a sideshow. Instead, their business strategy includes collecting data, analyzing data for long-term benefit, and acting on the results.

At the same time, data mining readily fits in with other strategies for understanding markets and customers. Market research, customer panels, and other techniques are compatible with data mining and more intensive data analysis. The key is to recognize the focus on customers and the commonality of data across the enterprise.

Large Amounts of Data

One of the authors regularly asks his audiences, “How much is a lot of data?” Students give answers such as, “all the transactions for 10 million customers” or “terabytes of data.” His more modest answer, “65,536 rows,” still gets sighs of comprehension even though Microsoft has allowed more than one million rows in Excel spreadsheets since 2007.

A tool such as Excel is incredibly versatile for working with relatively small amounts of data. It allows a wide variety of computations on the values in each row or column; pivot tables are amazingly practical for understanding data and trends; and the charts offer a powerful mechanism for data visualization.

In the early days of data mining (the 1960s and 1970s), data was scarce. Some of the techniques described in this book were developed on data sets containing a few hundred records. Back then, a typical data set might have had a few attributes about mushrooms, and whether they are poisonous or edible. Another might have had attributes of cars, with the goal of estimating gas mileage. Whatever the particular data set, it is a testament to the strength of the techniques developed in those days that they still work on data that no longer fits in a spreadsheet.

Because computing power is readily available, a large amount of data is not a handicap; it is an advantage. Many of the techniques in this book work better on large amounts of data than on small amounts — you can substitute data for cleverness. In other words, data mining lets computers do what computers do best — dig through lots and lots of data. This, in turn, lets people do what people do best, which is set up the problem and understand the results.

That said, some case studies in this book still use relatively small data sizes. Perhaps the smallest is a clustering case study in Chapter 13. This case study finds demographically similar towns, among just a few hundred towns in New England. As powerful as Excel is, it does not have a built-in function that says “group these towns by similarity.”

That is where data mining comes in. Whether the goal is to find similar groups of New England towns, or to determine the causes of customer attrition, or any of a myriad of other goals sprinkled throughout the chapters, data mining techniques can leverage data where simpler desktop tools no longer work so well.
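To make "group these towns by similarity" concrete, here is a minimal sketch of one common way to do it: k-means clustering on standardized attributes. It is not the method or the data from the Chapter 13 case study; the town names, the two attributes (median household income and median age), the choice of two clusters, and the deterministic seeding are all assumptions made for illustration.

```python
import math

# Hypothetical towns: (name, median household income in $1000s, median age).
# These values are invented for illustration, not taken from the book.
TOWNS = [
    ("Town A", 52.0, 34.0),
    ("Town B", 55.0, 36.0),
    ("Town C", 98.0, 45.0),
    ("Town D", 102.0, 47.0),
    ("Town E", 50.0, 33.0),
    ("Town F", 95.0, 44.0),
]


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1,
    so that no single attribute dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]


def kmeans(points, k, iterations=20):
    """Plain k-means. The first k points seed the centroids, which keeps
    the sketch deterministic (an assumption; random seeding is more usual)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


features = standardize([town[1:] for town in TOWNS])
labels = kmeans(features, k=2)

# Collect town names by cluster label.
clusters = {}
for (name, *_), lab in zip(TOWNS, labels):
    clusters.setdefault(lab, []).append(name)

for lab, names in sorted(clusters.items()):
    print(lab, names)
```

On this toy data the lower-income, younger towns end up in one group and the higher-income, older towns in the other. With a few hundred real towns and many more attributes, the same two steps (assign, then update) apply unchanged.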

Meaningful Patterns and Rules

Perhaps the most important part of the definition of data mining is the part about meaningful patterns. Although data mining can certainly be fun, helping the business is more important than amusing the miner.

In many ways finding patterns in data is not tremendously difficult. The operational side of the business generates the data, necessarily generating patterns at the same time. However, the goal of data mining — at least as the authors use the term — is not to find just any patterns in data, but to find patterns that are useful for the business.

This can mean finding patterns to help routine business operations. Consider a call center application that assigns customers a color. “Green” means be very nice, because the caller is a valuable customer, worth the expense of keeping happy; “yellow” means use some caution because the customer may be valuable but also has signs of some risk; and “red” means do not give the customer any special treatment because the customer is highly risky. Finding patterns can also mean targeting retention campaigns to customers who are most likely to leave. It can mean optimizing customer acquisition both for the short-term gains in customer numbers and for the medium- and long-term benefit in customer value.
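The call center example can be sketched as a small scoring rule. This is only an illustration under stated assumptions: the function name, the two inputs (an estimated customer value and a risk score between 0 and 1), and every threshold are hypothetical. In practice the value and risk estimates would come from models built on customer data, and the text does not say how customers who are neither valuable nor risky are treated; the sketch assigns them "yellow" by default.

```python
def call_center_color(customer_value, risk_score,
                      value_threshold=1000.0, risk_low=0.2, risk_high=0.6):
    """Map a customer to a treatment color.

    customer_value: estimated value of the customer (hypothetical units).
    risk_score: estimated risk between 0 and 1 (hypothetical scale).
    All thresholds are illustrative assumptions, not values from the book.
    """
    if risk_score >= risk_high:
        # Highly risky: no special treatment.
        return "red"
    if customer_value >= value_threshold and risk_score < risk_low:
        # Valuable and low risk: worth the expense of keeping happy.
        return "green"
    # Possibly valuable but showing some risk, or simply not yet
    # known to be valuable (the default is an assumption of this sketch).
    return "yellow"


for value, risk in [(5000, 0.05), (5000, 0.3), (5000, 0.9)]:
    print(value, risk, call_center_color(value, risk))
```

The rule itself is trivial; the data mining work lies in producing value and risk estimates that are reliable enough for the business to act on.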

Increasingly, companies are developing business models centered around data mining — although they may not use that term. One company that the authors have worked with helps retailers make recommendations on the web; this company only gets paid when web shoppers click on its recommendations. That is only one example. Some companies aggregate data from different sources, bringing the data together to get a more complete customer picture. Some companies, such as LinkedIn, use information provided by some people to provide premium services to others — and everyone benefits when recruiters can find the right candidates for open job positions. In all these cases, the goal is to direct products and services to the people who are most likely to need them, making the process of buying and selling more efficient for everyone involved.

Data Mining and Customer Relationship Management

This book is not about data mining in general, but specifically about data mining for customer relationship management. Firms of all sizes need to learn to emulate what small, service-oriented businesses have always done well — creating one-to-one relationships with their customers. Customer relationship management is a broad topic that is the subject of many articles, books, and conferences. Everything from lead-tracking software to campaign management software to call center software gets labeled as a customer relationship management tool. The focus of this book is narrower — the role that data mining can play in improving customer relationship management by improving the company's ability to form learning relationships with its customers.

In every industry, forward-looking companies are moving toward the goal of understanding each customer individually and using that understanding to make it easier (and more profitable) for the customer to do business with them rather than with competitors. These same firms are learning to look at the value of each customer so that they know which ones are worth investing money and effort to hold on to and which ones should be allowed to depart. This change in focus from broad market segments to individual customers requires changes throughout the enterprise, and nowhere more so than in marketing, sales, and customer support.

Building a business around the customer relationship is a revolutionary change for most companies. Banks have traditionally focused on maintaining the spread between the rate they pay to bring money in and the rate they charge to lend money out. Telephone companies have concentrated on connecting calls through the network. Insurance companies have focused on processing claims, managing investments, and maintaining their loss ratio. Turning a product-focused organization into a customer-centric one takes more than data mining. A data mining result that suggests offering a particular customer a widget instead of a gizmo will be ignored if the manager's bonus depends on the number of gizmos sold this quarter and not on the number of widgets (even if the latter are more profitable or induce customers to be more profitable in the long term).

In a narrow sense, data mining is a collection of tools and techniques. It is one of several technologies required to support a customer-centric enterprise. In a broader sense, data mining is an attitude that business actions should be based on learning, that informed decisions are better than uninformed decisions, and that measuring results is beneficial to the business. Data mining is also a process and a methodology for applying analytic tools and techniques. For data mining to be effective, the other requirements for analytic CRM must also be in place. To form a learning relationship with its customers, a company must be able to

Notice what its customers are doing
Remember what it and its customers have done over time
Learn from what it has remembered
Act on what it has learned to make customers more profitable

Although the focus of this book is on the third bullet — learning from what has happened in the past — that learning cannot take place in a vacuum. There must be transaction processing systems to capture customer interactions, data warehouses to store historical customer behavior information, data mining to translate history into plans for future action, and a customer relationship strategy to put those plans into practice.

Data mining, to repeat the earlier definition, is a business process for exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. This book assumes that the goal of data mining is to allow a company to improve its marketing, sales, and customer support operations through a better understanding of its customers. Keep in mind, however, that the data mining techniques and tools described in this book are equally applicable in fields as varied as law enforcement, radio astronomy, medicine, and industrial process control.

Why Now?

Most data mining techniques have existed, at least as academic algorithms, for decades (the oldest, survival analysis, actually dates back centuries). Data mining has caught on in a big way, increasing dramatically since the 1990s. This is due to the convergence of several factors:

Data is being produced.
Data is being warehoused.
Computing power is affordable.
Interest in customer relationship management is strong.
Commercial data mining software products are readily available.

The combination of these factors means that data mining is increasingly appearing as a foundation of business strategies. Google was not the first search engine, but it was the first search engine to combine sophisticated algorithms for searching with a business model based on maximizing the value of click-through revenue. Across almost every business domain, companies are discovering that they have information: information about subscribers, Web visitors, and shippers, and about payment patterns, calling patterns, friends, and neighbors. Companies are increasingly turning to data analysis to leverage their information.

Data Is Being Produced

Data mining makes the most sense where large volumes of data are available. In fact, most data mining algorithms require somewhat large amounts of data to build and train models.

One of the underlying themes of this book is that data is everywhere and available in copious amounts. This is especially true for companies that have custo...

Cover
Title Page
Copyright
Dedication
About the Authors
Credits
Acknowledgments
Introduction
Chapter 1: What Is Data Mining and Why Do It?
Chapter 2: Data Mining Applications in Marketing and Customer Relationship Management
Chapter 3: The Data Mining Process
Chapter 4: What You Should Know About Data
Chapter 5: Descriptions and Prediction: Profiling and Predictive Modeling
Chapter 6: Data Mining Using Classic Statistical Techniques
Chapter 7: Decision Trees
Chapter 8: Artificial Neural Networks
Chapter 9: Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
Chapter 10: Knowing When to Worry: Using Survival Analysis to Understand Customers
Chapter 11: Genetic Algorithms and Swarm Intelligence
Chapter 12: Tell Me Something New: Pattern Discovery and Data Mining
Chapter 13: Finding Islands of Similarity: Automatic Cluster Detection
Chapter 14: Alternative Approaches to Cluster Detection
Chapter 15: Market Basket Analysis and Association Rules
Chapter 16: Link Analysis
Chapter 17: Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining
Chapter 18: Building Customer Signatures
Chapter 19: Derived Variables: Making the Data Mean More
Chapter 20: Too Much of a Good Thing? Techniques for Reducing the Number of Variables
Chapter 21: Listen Carefully to What Your Customers Say: Text Mining
Index
