IBM SPSS Modeler Essentials
eBook - ePub

IBM SPSS Modeler Essentials

Keith McCormick, Jesus Salcedo, Bowen Wei

Share book
  1. 238 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

IBM SPSS Modeler Essentials

Keith McCormick, Jesus Salcedo, Bowen Wei

Book details
Book preview
Table of contents
Citations

About This Book

Get to grips with the fundamentals of data mining and predictive analytics with IBM SPSS Modeler

Key Features

  • Get up–and-running with IBM SPSS Modeler without going into too much depth.
  • Identify interesting relationships within your data and build effective data mining and predictive analytics solutions
  • A quick, easy–to-follow guide to give you a fundamental understanding of SPSS Modeler, written by the best in the business

Book Description

IBM SPSS Modeler allows users to quickly and efficiently use predictive analytics and gain insights from your data. With almost 25 years of history, Modeler is the most established and comprehensive Data Mining workbench available. Since it is popular in corporate settings, widely available in university settings, and highly compatible with all the latest technologies, it is the perfect way to start your Data Science and Machine Learning journey.

This book takes a detailed, step-by-step approach to introducing data mining using the de facto standard process, CRISP-DM, and Modeler's easy to learn "visual programming" style. You will learn how to read data into Modeler, assess data quality, prepare your data for modeling, find interesting patterns and relationships within your data, and export your predictions. Using a single case study throughout, this intentionally short and focused book sticks to the essentials. The authors have drawn upon their decades of teaching thousands of new users, to choose those aspects of Modeler that you should learn first, so that you get off to a good start using proven best practices.

This book provides an overview of various popular data modeling techniques and presents a detailed case study of how to use CHAID, a decision tree model. Assessing a model's performance is as important as building it; this book will also show you how to do that. Finally, you will see how you can score new data and export your predictions. By the end of this book, you will have a firm understanding of the basics of data mining and how to effectively use Modeler to build predictive models.

What you will learn

  • Understand the basics of data mining and familiarize yourself with Modeler's visual programming interface
  • Import data into Modeler and learn how to properly declare metadata
  • Obtain summary statistics and audit the quality of your data
  • Prepare data for modeling by selecting and sorting cases, identifying and removing duplicates, combining data files, and modifying and creating fields
  • Assess simple relationships using various statistical and graphing techniques
  • Get an overview of the different types of models available in Modeler
  • Build a decision tree model and assess its results
  • Score new data and export predictions

Who this book is for

This book is ideal for those who are new to SPSS Modeler and want to start using it as quickly as possible, without going into too much detail. An understanding of basic data mining concepts will be helpful, to get the best out of the book.

Frequently asked questions

How do I cancel my subscription?
Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
Can/how do I download books?
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
What is the difference between the pricing plans?
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
What is Perlego?
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Do you support text-to-speech?
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Is IBM SPSS Modeler Essentials an online PDF/ePUB?
Yes, you can access IBM SPSS Modeler Essentials by Keith McCormick, Jesus Salcedo, Bowen Wei in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Modelling & Design. We have over one million books available in our catalogue for you to explore.

Information

Year
2017
ISBN
9781788296823
Edition
1

Model Assessment and Scoring

In the previous chapter we built a model. In this chapter, we are going to discuss different ways of assessing and improving the results of a model. In addition, you will also begin to learn how to use your models in the real world. Specifically, we will cover the following topics:
  • Contrasting model assessment with the Evaluation phase
  • Model assessment using the Analysis node
  • Modifying CHAID settings
  • Model comparison using the Analysis node
  • Model assessment and comparison using the Evaluation node
  • Scoring new data
  • Exporting predictions

Contrasting model assessment with the Evaluation phase

As we briefly discussed in Chapter 1, Introduction to Data Mining and Predictive Analytics, model assessment (a modeling phase task) is quite different from the Evaluation phase. Some of the same tools can apply to both, but the stage of the project and the thought process is quite different. During model assessment, you are potentially comparing a large number of models. You may even try dozens of variations of algorithms, settings, and modifications to the data.
Therefore, you need easy, objective criteria on which to rank these models. Our colleagues and management simply will not have the time to be brought in to judge the efficacy of dozens of models so we need to narrow it down to just a couple of models before the Evaluation phase begins. Tom Khabaza, one of the original authors of CRISP-DM has written the Nine Laws of Data Mining (http://khabaza.codimension.net/index_files/9laws.htm) and the 8th Law of Data Mining is the Value Law, which states:
The value of data mining results is not determined by the accuracy or stability of predictive models.
Well, in this lesson, particularly in the Analysis node section, we are going to focus on the accuracy and stability of a model. So, what is this law getting at? An extended quote will help make the distinction:
Accuracy and stability are useful measures of how well a predictive model makes its predictions. Accuracy means how often the predictions are correct (where they are truly predictions) and stability means how much (or rather how little) the predictions would change if the data used to create the model were a different sample from the same population. Given the central role of the concept of prediction in data mining, the accuracy and stability of a predictive model might be expected to determine its value, but this is not the case.
The value of a predictive model arises in two ways:
  1. The model's predictions drive improved (more effective) action.
  2. The model delivers insight (new knowledge), which leads to improved strategy.
So the important thing here is that when we move beyond model assessment and into evaluation, we have to shift our focus from accuracy and stability to action and strategy. We need an intervention strategy that uses our predictions to drive action. The Evaluation phase will be to measure, as specifically as possible, how the improved actions produced measurably better results. Better is usually measured in dollar terms, but not always. So, we have to return our project to the language of the business and compare our performance to the specific goals laid out in the Business Understanding phase.

Model assessment using the Analysis node

When a model seems satisfactory based on performance, fields included, and the relationships between the predictors and the target, the next step is model assessment. Formally, model evaluation is the assessment of how a model performs on unseen data. Modeler makes this easy because of the Partition field.
We previously used the Partition node to split the data file into Testing and Training partitions. In the previous chapter, we were careful when studying the model not to use the Testing partition. Doing so would compromise model testing because we would learn how well the model performed on the unseen data. In this chapter, we will use the Analysis and Evaluation nodes to further assess our model.
The Analysis node allows you to evaluate the accuracy of a model, and it organizes output by the Partition field values. Analysis nodes perform various comparisons between predicted values and actual values for one or more generated models. The Analysis node is contained in the Output palette:
  1. Open the Assessment stream.
  2. Add an Analysis node from the Output palette.
  3. Connect the generated CHAID model to the Analysis node.
  4. Edit the Analysis node:
The Analysis node provides several types of output. Coincidence matrices (for symbolic targets) are cross tabulations between the predicted and actual values. The Confidence figures (if available) option provides summary information for models that produce confidence values.
By default, the Analysis node will organize output by the Partition field. We can also ask that all the output be broken down by one or more categorical fields:
  1. Click Coincidence matrices (for symbolic targets).
  2. Click Run:
The table shows the overall accuracy of the model in the Training and Testing partitions. When we were examining the CHAID model, we only saw information on the Training dataset and rule specify accuracy; here we are provided with the overall model accuracy.
As we can see from the output, the overall accuracy of the CHAID model on the Training data is 92.42%. Whether this level of accuracy is acceptable will depend on many factors. More important is how well the model performed on the unseen Testing dataset. The accuracy of the CHAID model for the Testing group is the fundamental overall test of the model. If the accuracy on the Testing data is acceptable, then we can deem the model validated.
For this data , the Testing dataset accuracy is 92.27%. The typical outcome when Testing data is passed through a model node is that the accuracy drops by some amount. If accuracy drops or changes by a large amount (about 5%), it suggests that the model overfit the Training data or that the validation data differed in some systematic way from the Training data (although the random sampling done by the Partition node minimizes the chance of this). If accuracy drops or changes by only a small amount, it provides evidence that the model will work well in the future; that is, we have a reliable model that will generalize to new data. When this is favorable, the model is described as being stable. The small change of 0.15% in accuracy from the Training to the Testing data indicates that the CHAID model is validated.
In summary, we are focused on two questions:
  • Is the test accuracy value sufficiently high
  • Is the stability sufficient—as revealed by a small difference between train and test accuracy
The Coincidence Matrix table will be of special interest when there are target categories in which we ...

Table of contents