Practical Predictive Analytics
eBook - ePub

Ralph Winters

  1. 576 pages
  2. English
  3. ePUB

About This Book

Make sense of your data and predict the unpredictable

  • A unique book that centers on six key practical skills needed to develop and implement predictive analytics
  • Apply the principles and techniques of predictive analytics to effectively interpret big data
  • Solve real-world analytical problems with the help of practical case studies and real-world scenarios taken from healthcare, marketing, and other business domains

Who This Book Is For

This book is for those with a mathematical/statistics background who wish to understand the concepts, techniques, and implementation of predictive analytics to resolve complex analytical issues. Basic familiarity with the R programming language is expected.

What You Will Learn

  • Master the core predictive analytics algorithms used today in business
  • Learn to implement the six steps of a successful analytics project
  • Identify the right algorithm for your requirements
  • Use and apply predictive analytics to research problems in healthcare
  • Implement predictive analytics to retain and acquire your customers
  • Use text mining to understand unstructured data
  • Develop models on your own PC or in Spark/Hadoop environments
  • Implement predictive analytics products for customers

In Detail

This is the go-to book for anyone interested in the steps needed to develop predictive analytics solutions, with examples from marketing, healthcare, and retail. We'll get started with a brief history of predictive analytics and learn about the different roles and functions people play within a predictive analytics project. Then we will learn about various ways of installing R, along with their pros and cons, followed by a step-by-step installation of RStudio and a description of best practices for organizing your projects.

On completing the installation, we will begin to acquire the skills necessary to input, clean, and prepare your data for modeling. We will learn the six specific steps needed to implement and successfully deploy a predictive model, starting from asking the right questions, through model development, and ending with deploying your predictive model into production. We will learn why collaboration is important and how agile, iterative modeling cycles can increase your chances of developing and deploying the most successful model.

We will continue the journey in the cloud by learning about Databricks and SparkR, which allow you to develop predictive models on many gigabytes of data.

Style and Approach

This book takes a practical, hands-on approach in which the algorithms are explained with the help of real-world use cases. It is written in a well-researched academic style that mixes theoretical and practical information. Code examples are supplied both for theoretical concepts and for the case studies. Key references and summaries are provided at the end of each chapter so that you can explore those topics on your own.

Information

Year: 2017
ISBN: 9781785880469
Edition: 1

Using Market Basket Analysis as a Recommender Engine

"It's not wise to violate the rules until you know how to observe them."
- T.S. Eliot
In this chapter, we will cover the following topics:
  • Market basket analysis using the arules package
  • Data transformation and cleaning techniques for semi-structured market basket transaction data
  • Transforming transaction objects into dataframes
  • Cluster analysis for prediction using the flexclust package
  • Text mining using the RTextTools and tm packages

What is market basket analysis?

If you have survived the last chapter, you will now be introduced to the world of market basket analysis (MBA). Market basket analysis (sometimes also called affinity analysis) is a predictive analytics technique used heavily in the retail industry to identify baskets of items that are purchased together. The typical use case is the supermarket shopping cart, in which a shopper purchases an assortment of items such as milk, bread, and cheese, and the algorithm predicts how purchasing certain items together will affect the purchase of other items. It is one of the methods retailers use to decide when to start sending you coupons and emails for things you didn't know you needed!
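To make "purchased together" concrete, here is a minimal base-R sketch that does not use arules. The baskets are hypothetical, made up from the items mentioned above. The sketch builds a binary basket-by-item matrix and takes its cross-product, so each off-diagonal cell counts the baskets in which two items appear together, which is the raw ingredient behind the association measures used in MBA.

```r
# Hypothetical shopping baskets (illustration only, not from the Groceries data)
baskets <- list(
  c("milk", "bread", "cheese"),
  c("milk", "bread"),
  c("bread", "cheese"),
  c("milk", "cereal")
)

# All distinct items, sorted so the matrix columns have a stable order
items <- sort(unique(unlist(baskets)))

# Binary incidence matrix: one row per basket, one column per item
incidence <- t(vapply(baskets, function(b) as.integer(items %in% b),
                      integer(length(items))))
colnames(incidence) <- items

# crossprod(X) computes t(X) %*% X: cell [i, j] counts baskets containing
# both item i and item j; the diagonal holds each item's own basket count
co_occurrence <- crossprod(incidence)
print(co_occurrence)
```

In this toy data, bread and milk share two baskets while milk and cheese share only one, so a pair count like this is a first, crude signal of which items tend to be bought together.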
One often quoted example of MBA is the relationship between diapers and beer:
"One super market chain discovered in its analysis that customers that bought diapers often bought beer as well, have put the diapers close to beer coolers, and their sales increased dramatically"
- http://en.wikipedia.org/wiki/Market_basket
However, MBA is not restricted to the retail industry. It can also be used in the insurance industry to look at the products an insured person currently has, such as car or home insurance, and suggest other possible products such as life, disability, or investment products.
MBA is generally considered an unsupervised learning algorithm, in that target variables are usually not specified. However, as you will see later, it is possible to constrain the association rules so that specific items are designated as target variables.
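As a rough illustration of treating one item as the target, the base-R sketch below scores every single-item rule {A} => {target} for one chosen consequent by its confidence. The helper `rules_for_target` and the baskets are hypothetical. In arules itself, this kind of restriction is expressed through the `appearance` argument of `apriori()` (for example, fixing the right-hand side of the rules); this sketch does not use that function.

```r
# Hypothetical baskets (illustration only)
baskets <- list(
  c("milk", "bread", "cheese"),
  c("milk", "bread"),
  c("bread", "cheese"),
  c("milk", "cereal"),
  c("bread", "butter")
)

# Hypothetical helper: score each single-item antecedent A for {A} => {target}
# confidence = (# baskets with A and target) / (# baskets with A)
rules_for_target <- function(baskets, target) {
  has_target <- vapply(baskets, function(b) target %in% b, logical(1))
  antecedents <- setdiff(sort(unique(unlist(baskets))), target)
  conf <- vapply(antecedents, function(a) {
    has_a <- vapply(baskets, function(b) a %in% b, logical(1))
    sum(has_a & has_target) / sum(has_a)
  }, numeric(1))
  # Highest-confidence antecedents first
  sort(conf, decreasing = TRUE)
}

print(rules_for_target(baskets, target = "milk"))
```

Here the only cereal basket also contains milk, so {cereal} => {milk} ranks first, while {butter} => {milk} never holds. Restricting the consequent this way turns an unsupervised search into a question about one item of interest.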
MBA is also considered a type of recommender engine, in which the purchase of one set of items implies the purchase of others. MBA and other recommender engines can certainly share the same types of input data. However, MBA was developed before the advent of collaborative filtering techniques, such as those pioneered by Amazon. Collaborative filtering is more suggestive of the integration of collected web data, while MBA is more associated with the bar code and RFID technologies found in checkout scanners. In both cases, however, the goal is to suggest future purchases based on past purchases.

Examining the groceries transaction file

Critical to the understanding of MBA are the concepts of support, confidence, and lift. These measures evaluate the goodness of fit of a set of association rules. You will also learn some specific definitions used in MBA, such as consequent, antecedent, and itemset.
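For reference, the standard textbook definitions of these measures for a rule X => Y (X the antecedent, Y the consequent) over N transactions are shown below. This is the usual formulation rather than notation taken from this chapter; support(X ∪ Y) is the fraction of transactions containing every item of both X and Y.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, t : X \subseteq t \,\} \rvert}{N} \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{aligned}
```

A lift above 1 means X and Y occur together more often than would be expected if they were purchased independently; a lift of exactly 1 is what independence would produce.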
To introduce these concepts, we will first illustrate these terms through a very simple example that uses only 10 of the transactions contained in the Groceries transaction file, which is included in the arules package:
 library(arules) 
After the arules library is loaded, you can see a short description of the Groceries dataset by entering ?Groceries at the command line. The following description appears in the help window:
"The Groceries data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories".
For more information about how this dataset was collected, refer to the original publication (Michael Hahsler, 2006).
Once the arules package is loaded, load the Groceries dataset into memory:
 data(Groceries) 

Format of the Groceries transaction file

Groceries is an object of class transactions, not a dataframe. This R object represents transaction data used for mining itemsets or rules. Logically, it is organized as a list of grocery receipts along with the items that were purchased together. Each individual receipt is referred to as a transaction, and each column of the transaction represents a specific item purchased.
For example, here are three transactions consisting of a varying number of purchases:
Transaction 1: Milk, Cereal
Transaction 2: Beef
Transaction 3: Butter, Sugar, Cream
However, a transactions object is not physically organized as a strict database table; it is stored in a special R object class known as transactions.
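The three example transactions above can be held in base R as a list of character vectors, one element per receipt. The sketch below is an illustration in plain base R, not the arules internals: it expands that list into a transaction-by-item logical matrix, matching the description of each column representing an item. With arules loaded, a list like this can also be coerced with `as(receipts, "transactions")`.

```r
# The three example receipts from the text, one character vector per transaction
receipts <- list(
  "Transaction 1" = c("Milk", "Cereal"),
  "Transaction 2" = c("Beef"),
  "Transaction 3" = c("Butter", "Sugar", "Cream")
)

# Every distinct item becomes one column
items <- sort(unique(unlist(receipts)))

# Logical matrix: TRUE where a transaction contains an item
incidence <- t(vapply(receipts, function(r) items %in% r,
                      logical(length(items))))
colnames(incidence) <- items

print(incidence)

# Basket sizes recovered from the matrix (2, 1 and 3 items)
print(rowSums(incidence))
```

Most cells are FALSE even in this tiny example; with 169 item categories and thousands of transactions that sparsity grows, which is one reason a dedicated transactions class is used instead of an ordinary dataframe.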
You can also run summary(Groceries), which gives some high-level information about the structure of the transactions object, as well as the most frequent individual items found in the market baskets.
You will see later on how to convert dataframes to transactions objects, and vice versa. For more information about this object, you can enter ?transactionInfo at the console.
To see an example of a simple market basket for the Groceries file, run the following code, which uses the inspect() function to print transactions 10 through 19:
 inspect(Groceries[10:19]) 
Examine the output, in which the 10 transactions are numbered [1] through [10]. Each transaction is printed as a list of purchased items. It can be a single purchased item (transaction [4], beef purchased by itself), or consist of multiple purchases (transaction [1], whole milk and cereals). This is the output:
     items
[1]  {whole milk,cereals}
[2]  {tropical fruit,other vegetables,white bread,bottled water,chocolate}
[3]  {citrus fruit,tropical fruit,whole milk,butter,curd,yogurt,flour,bottled water,dishes}
[4]  {beef}
[5]  {frankfurter,rolls/buns,soda}
[6]  {chicken,tropical fruit}
[7]  {butter,sugar,fruit/vegetable juice,newspapers}
[8]  {fruit/vegetable juice}
[9]  {packaged fruit/vegetables}
[10] {chocolate}

The sample market basket

Each transaction numbered 1-10 listed previously represents a basket of items purchased by a shopper. These are typically all items that are associated with a particular transaction or invoice. Each basket is enclosed within braces {}, and is referred to as an itemset. An itemset is a group of items that occur together.
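To connect itemsets back to support, confidence, and lift, the base-R sketch below re-enters by hand the 10 baskets printed by inspect() above and computes the three measures for the rule {bottled water} => {tropical fruit}. The helper names `itemset_support`, `rule_confidence`, and `rule_lift` are hypothetical (chosen so they do not mask arules functions), and this hand calculation is for illustration only; arules computes these measures for you when it mines rules.

```r
# The 10 baskets printed by inspect(Groceries[10:19]) above, re-entered by hand
baskets <- list(
  c("whole milk", "cereals"),
  c("tropical fruit", "other vegetables", "white bread", "bottled water", "chocolate"),
  c("citrus fruit", "tropical fruit", "whole milk", "butter", "curd", "yogurt",
    "flour", "bottled water", "dishes"),
  c("beef"),
  c("frankfurter", "rolls/buns", "soda"),
  c("chicken", "tropical fruit"),
  c("butter", "sugar", "fruit/vegetable juice", "newspapers"),
  c("fruit/vegetable juice"),
  c("packaged fruit/vegetables"),
  c("chocolate")
)

# Support: fraction of baskets that contain every item of the itemset
itemset_support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Confidence and lift for the rule lhs => rhs
rule_confidence <- function(lhs, rhs, baskets) {
  itemset_support(c(lhs, rhs), baskets) / itemset_support(lhs, baskets)
}
rule_lift <- function(lhs, rhs, baskets) {
  rule_confidence(lhs, rhs, baskets) / itemset_support(rhs, baskets)
}

lhs <- "bottled water"
rhs <- "tropical fruit"
measures <- c(
  support_lhs  = itemset_support(lhs, baskets),          # 2 of 10 baskets
  support_rule = itemset_support(c(lhs, rhs), baskets),  # 2 of 10 baskets
  confidence   = rule_confidence(lhs, rhs, baskets),     # both bottled-water baskets have tropical fruit
  lift         = rule_lift(lhs, rhs, baskets)            # 1 / 0.3, roughly 3.33
)
print(measures)
```

Because tropical fruit appears in 3 of the 10 baskets but in both bottled-water baskets, the lift is well above 1. With only 10 transactions this is a toy result, not evidence of a real association in the full Groceries data.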
Market basket algorithms construct rules in the fo...

Citation styles for Practical Predictive Analytics

APA 6 Citation

Winters, R. (2017). Practical Predictive Analytics (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/527033/practical-predictive-analytics-pdf (Original work published 2017)