eBook - ePub

IBM SPSS Modeler Essentials

Name: IBM SPSS Modeler Essentials
ISBN: 9781788296823

Keith McCormick,

Jesus Salcedo,

Bowen Wei,

238 pages
English
ePUB (mobile friendly)

About this book

Get to grips with the fundamentals of data mining and predictive analytics with IBM SPSS Modeler

Key Features

Get up and running with IBM SPSS Modeler without going into too much depth
Identify interesting relationships within your data and build effective data mining and predictive analytics solutions
A quick, easy-to-follow guide to give you a fundamental understanding of SPSS Modeler, written by the best in the business

Book Description

IBM SPSS Modeler allows users to quickly and efficiently use predictive analytics and gain insights from their data. With almost 25 years of history, Modeler is the most established and comprehensive data mining workbench available. Since it is popular in corporate settings, widely available in university settings, and highly compatible with all the latest technologies, it is the perfect way to start your data science and machine learning journey.

This book takes a detailed, step-by-step approach to introducing data mining using the de facto standard process, CRISP-DM, and Modeler's easy-to-learn "visual programming" style. You will learn how to read data into Modeler, assess data quality, prepare your data for modeling, find interesting patterns and relationships within your data, and export your predictions. Using a single case study throughout, this intentionally short and focused book sticks to the essentials. The authors have drawn upon their decades of teaching thousands of new users to choose those aspects of Modeler that you should learn first, so that you get off to a good start using proven best practices.

This book provides an overview of various popular data modeling techniques and presents a detailed case study of how to use CHAID, a decision tree model. Assessing a model's performance is as important as building it; this book will also show you how to do that. Finally, you will see how you can score new data and export your predictions. By the end of this book, you will have a firm understanding of the basics of data mining and how to effectively use Modeler to build predictive models.

What you will learn

Understand the basics of data mining and familiarize yourself with Modeler's visual programming interface
Import data into Modeler and learn how to properly declare metadata
Obtain summary statistics and audit the quality of your data
Prepare data for modeling by selecting and sorting cases, identifying and removing duplicates, combining data files, and modifying and creating fields
Assess simple relationships using various statistical and graphing techniques
Get an overview of the different types of models available in Modeler
Build a decision tree model and assess its results
Score new data and export predictions

Who this book is for

This book is ideal for those who are new to SPSS Modeler and want to start using it as quickly as possible, without going into too much detail. An understanding of basic data mining concepts will help you get the best out of the book.

Information

Topic: Computer Science
Subtopic: Data Mining

Model Assessment and Scoring

In the previous chapter, we built a model. In this chapter, we are going to discuss different ways of assessing and improving the results of a model. You will also begin to learn how to use your models in the real world. Specifically, we will cover the following topics:

Contrasting model assessment with the Evaluation phase
Model assessment using the Analysis node
Modifying CHAID settings
Model comparison using the Analysis node
Model assessment and comparison using the Evaluation node
Scoring new data
Exporting predictions

Contrasting model assessment with the Evaluation phase

As we briefly discussed in Chapter 1, Introduction to Data Mining and Predictive Analytics, model assessment (a modeling phase task) is quite different from the Evaluation phase. Some of the same tools can apply to both, but the stage of the project and the thought process are quite different. During model assessment, you are potentially comparing a large number of models. You may even try dozens of variations of algorithms, settings, and modifications to the data.

Therefore, you need easy, objective criteria on which to rank these models. Our colleagues and management simply will not have the time to be brought in to judge the efficacy of dozens of models, so we need to narrow it down to just a couple of models before the Evaluation phase begins. Tom Khabaza, one of the original authors of CRISP-DM, has written the Nine Laws of Data Mining (http://khabaza.codimension.net/index_files/9laws.htm), and the 8th Law of Data Mining is the Value Law, which states:

The value of data mining results is not determined by the accuracy or stability of predictive models.

Yet in this chapter, particularly in the Analysis node section, we are going to focus on the accuracy and stability of a model. So, what is this law getting at? An extended quote will help make the distinction:

Accuracy and stability are useful measures of how well a predictive model makes its predictions. Accuracy means how often the predictions are correct (where they are truly predictions) and stability means how much (or rather how little) the predictions would change if the data used to create the model were a different sample from the same population. Given the central role of the concept of prediction in data mining, the accuracy and stability of a predictive model might be expected to determine its value, but this is not the case.

The value of a predictive model arises in two ways:

The model's predictions drive improved (more effective) action.
The model delivers insight (new knowledge), which leads to improved strategy.

So the important thing here is that when we move beyond model assessment and into evaluation, we have to shift our focus from accuracy and stability to action and strategy. We need an intervention strategy that uses our predictions to drive action. The task of the Evaluation phase is to measure, as specifically as possible, whether the improved actions produced measurably better results. Better is usually measured in dollar terms, but not always. So, we have to return our project to the language of the business and compare our performance to the specific goals laid out in the Business Understanding phase.

Model assessment using the Analysis node

When a model seems satisfactory based on performance, fields included, and the relationships between the predictors and the target, the next step is model assessment. Formally, model assessment measures how a model performs on unseen data. Modeler makes this easy because of the Partition field.

We previously used the Partition node to split the data file into Training and Testing partitions. In the previous chapter, we were careful when studying the model not to use the Testing partition. Doing so would have compromised model testing, because we would already have seen how well the model performed on data that is supposed to remain unseen. In this chapter, we will use the Analysis and Evaluation nodes to further assess our model.
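To make the idea of Training and Testing partitions concrete, here is a minimal Python sketch of a random split. It is not how Modeler's Partition node is implemented; the function name `assign_partitions`, the 50/50 training fraction, and the seed are all assumptions chosen for illustration.

```python
import random


def assign_partitions(n_records, training_fraction=0.5, seed=42):
    """Randomly label each record as 'Training' or 'Testing'.

    Hypothetical helper: the fraction and seed are illustrative
    assumptions, not values taken from the book's stream.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    return [
        "Training" if rng.random() < training_fraction else "Testing"
        for _ in range(n_records)
    ]


partitions = assign_partitions(1000)
print("Training records:", partitions.count("Training"))
print("Testing records:", partitions.count("Testing"))
```

Fixing the seed matters for the same reason it does in a stream: the model must be assessed on exactly the records that were held back when it was built.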

The Analysis node allows you to evaluate the accuracy of a model, and it organizes output by the Partition field values. Analysis nodes perform various comparisons between predicted values and actual values for one or more generated models. The Analysis node is contained in the Output palette:

Open the Assessment stream.
Add an Analysis node from the Output palette.
Connect the generated CHAID model to the Analysis node.
Edit the Analysis node:

The Analysis node provides several types of output. Coincidence matrices (for symbolic targets) are cross tabulations between the predicted and actual values. The Confidence figures (if available) option provides summary information for models that produce confidence values.
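As a rough illustration of what a coincidence matrix contains, the following Python sketch cross-tabulates actual against predicted values. The toy labels and the function name `coincidence_matrix` are assumptions made for this example; the Analysis node produces the equivalent table for you.

```python
from collections import Counter


def coincidence_matrix(actual, predicted):
    """Cross-tabulate actual (rows) against predicted (columns) categories."""
    counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted))
    # Counter returns 0 for pairs that never occurred
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


# Toy data, invented for illustration only
actual = ["yes", "no", "no", "yes", "no", "no"]
predicted = ["yes", "no", "yes", "yes", "no", "no"]

matrix = coincidence_matrix(actual, predicted)
for actual_value, row in matrix.items():
    print(actual_value, row)
```

The diagonal cells (actual equals predicted) count correct predictions; every off-diagonal cell counts one specific kind of error.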

By default, the Analysis node will organize output by the Partition field. We can also ask that all the output be broken down by one or more categorical fields:

Click Coincidence matrices (for symbolic targets).
Click Run:

The table shows the overall accuracy of the model in the Training and Testing partitions. When we were examining the CHAID model, we only saw information on the Training dataset and rule-specific accuracy; here we are provided with the overall model accuracy.

As we can see from the output, the overall accuracy of the CHAID model on the Training data is 92.42%. Whether this level of accuracy is acceptable will depend on many factors. More important is how well the model performed on the unseen Testing dataset. The accuracy of the CHAID model for the Testing group is the fundamental overall test of the model. If the accuracy on the Testing data is acceptable, then we can deem the model validated.

For this data, the Testing dataset accuracy is 92.27%. The typical outcome when Testing data is passed through a model node is that the accuracy drops by some amount. If accuracy drops or changes by a large amount (about 5 percentage points), it suggests that the model overfit the Training data or that the Testing data differed in some systematic way from the Training data (although the random sampling done by the Partition node minimizes the chance of this). If accuracy drops or changes by only a small amount, it provides evidence that the model will work well in the future; that is, we have a reliable model that will generalize to new data. A model that behaves this way is described as being stable. The small change of 0.15 percentage points in accuracy from the Training to the Testing data indicates that the CHAID model is validated.

In summary, we are focused on two questions:

Is the test accuracy value sufficiently high?
Is the stability sufficient, as revealed by a small difference between train and test accuracy?
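The two questions above can be sketched as a small check in Python. This is an illustrative sketch, not part of Modeler: the function names, the toy records, and the 90% minimum accuracy are assumptions, while the 5-point gap reflects the rule of thumb mentioned earlier and the 92.42% and 92.27% figures come from the Analysis node output discussed above.

```python
def accuracy_by_partition(records):
    """Percent correct per partition.

    records: iterable of (partition, actual, predicted) tuples.
    """
    correct, total = {}, {}
    for partition, actual, predicted in records:
        total[partition] = total.get(partition, 0) + 1
        correct[partition] = correct.get(partition, 0) + (actual == predicted)
    return {p: 100.0 * correct[p] / total[p] for p in total}


def passes_assessment(train_acc, test_acc, min_test_acc, max_gap=5.0):
    """Answer both questions: is test accuracy high enough, and is the gap small?

    min_test_acc is a project-specific threshold (an assumption here);
    max_gap is in percentage points.
    """
    accurate_enough = test_acc >= min_test_acc
    stable = abs(train_acc - test_acc) < max_gap
    return accurate_enough and stable


# Toy records, invented for illustration only
toy = [
    ("Training", "yes", "yes"), ("Training", "no", "no"),
    ("Training", "no", "yes"), ("Training", "yes", "yes"),
    ("Testing", "no", "no"), ("Testing", "yes", "no"),
]
print(accuracy_by_partition(toy))

# Figures reported for the CHAID model in this chapter
gap = round(92.42 - 92.27, 2)
print("Gap in percentage points:", gap)
print("Passes:", passes_assessment(92.42, 92.27, min_test_acc=90.0))
```

What counts as sufficiently high depends on the project, so the minimum accuracy is left as a parameter rather than fixed in the function.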

The Coincidence Matrix table will be of special interest when there are target categories in which we ...

Title Page
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Dedication
Preface
Introduction to Data Mining and Predictive Analytics
The Basics of Using IBM SPSS Modeler
Importing Data into Modeler
Data Quality and Exploration
Cleaning and Selecting Data
Combining Data Files
Deriving New Fields
Looking for Relationships Between Fields
Introduction to Modeling Options in IBM SPSS Modeler
Decision Tree Models
Model Assessment and Scoring
