eBook - ePub

Mastering Machine Learning with R - Second Edition

Name: Mastering Machine Learning with R - Second Edition
ISBN: 9781787284487

Cory Lesmeister

420 pages
English

About this book

Master machine learning techniques with R to deliver insights in complex projects.

About This Book

• Understand and apply machine learning methods using an extensive set of R packages such as XGBOOST
• Understand the benefits and potential pitfalls of using machine learning methods such as Multi-Class Classification and Unsupervised Learning
• Implement advanced concepts in machine learning with this example-rich guide

Who This Book Is For

This book is for data science professionals, data analysts, or anyone with a working knowledge of machine learning with R who now wants to take their skills to the next level and become an expert in the field.

What You Will Learn

• Gain deep insights into the application of machine learning tools in industry
• Manipulate data in R efficiently to prepare it for analysis
• Master the skill of recognizing techniques for effective visualization of data
• Understand why and how to create test and training data sets for analysis
• Master fundamental learning methods such as linear and logistic regression
• Comprehend advanced learning methods such as support vector machines
• Learn how to use R in a cloud service such as Amazon

In Detail

This book will teach you advanced techniques in machine learning with the latest code in R 3.3.2. You will delve into statistical learning theory and supervised learning; design efficient algorithms; learn about creating recommendation engines; use multi-class classification and deep learning; and more.

You will explore, in depth, topics such as data mining, classification, clustering, regression, predictive modeling, anomaly detection, boosted trees with XGBOOST, and more. More than just knowing the outcome, you'll understand how these concepts work and what they do.

With a gentle learning curve on topics such as neural networks, you will explore deep learning, and more. By the end of this book, you will be able to perform machine learning with R in the cloud using AWS in various scenarios with different datasets.

Style and approach

The book delivers practical, real-world solutions to problems and a variety of tasks such as complex recommendation systems. By the end of this book, you will have gained expertise in performing machine learning with R and will be able to build complex machine learning projects using R and its packages.

Information

Publisher: Packt Publishing
Year: 2017
Topic: Computer Science
Subtopic: Artificial Intelligence (AI) & Semantics

Market Basket Analysis, Recommendation Engines, and Sequential Analysis

It's much easier to double your business by doubling your conversion rate than by doubling your traffic.
- Jeff Eisenberg, CEO of BuyerLegends.com

I don't see smiles on the faces of people at Whole Foods.
- Warren Buffett

One would have to live on the dark side of the moon not to observe, each and every day, the results of the techniques that we are about to discuss in this chapter. If you visit www.amazon.com, watch movies on www.netflix.com, or visit any retail website, you will be exposed to terms such as "related products", "because you watched...", "customers who bought x also bought y", or "recommended for you" at every twist and turn. With large volumes of historical and real-time or near real-time data, retailers use the algorithms discussed here to try to increase both the quantity and the value of each buyer's purchases.

The techniques to do this can be broken down into two categories: association rules and recommendation engines. Association rule analysis is commonly referred to as market basket analysis, as one is trying to understand which items are purchased together. With recommendation engines, the goal is to provide customers with other items that they will enjoy, based on how they have rated previously viewed or purchased items.
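
To make the rating-based idea concrete, here is a minimal sketch of one common approach, user-based collaborative filtering. It predicts a user's rating for an unseen item as a similarity-weighted average of other users' ratings, using cosine similarity over co-rated items. The users, items, and ratings are hypothetical, and this Python illustration is not the R code used in the book.

```python
import math

# Hypothetical 1-5 star ratings; an item missing from a user's dict is unrated.
RATINGS = {
    "ann": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob": {"book_a": 4, "book_b": 2, "book_c": 5, "book_d": 4},
    "cara": {"book_a": 1, "book_b": 5, "book_d": 2},
}


def cosine_similarity(r1, r2):
    """Cosine similarity computed only over the items both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    norm1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    norm2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (norm1 * norm2)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave item."""
    num, den = 0.0, 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


def recommend(ratings, user, top_n=3):
    """Rank the items user has not rated by their predicted rating."""
    unrated = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = [(item, predict_rating(ratings, user, item)) for item in unrated]
    scored = [(item, s) for item, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


print(recommend(RATINGS, "ann"))
```

Because bob's ratings track ann's more closely than cara's do, bob's rating of book_d carries more weight in ann's predicted score.
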
Another technique a business can use is to understand the sequence in which you purchase or use their products and services. This is called sequential analysis. A very common implementation of this methodology is to understand how customers click through various webpages and/or links.
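
One simple way to start a clickstream analysis is to count which page follows which. The sketch below estimates first-order transition probabilities, P(next page | current page), from ordered sessions. This is a basic Markov-chain style summary and not necessarily the method used later in the chapter. The page names and sessions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: the ordered pages one visitor viewed.
SESSIONS = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "products", "cart"],
    ["home", "blog", "products", "cart", "checkout"],
]


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from consecutive page pairs."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # zip pairs each page with the page viewed immediately after it
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    probs = {}
    for page, nexts in counts.items():
        total = sum(nexts.values())
        probs[page] = {nxt: n / total for nxt, n in nexts.items()}
    return probs


probs = transition_probabilities(SESSIONS)
print(probs["products"])
```

Pages that end every session they appear in, such as checkout here, have no outgoing transitions and so do not appear as keys.
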

In the examples coming up, we will explore how R can be used to develop such algorithms. We will not cover deploying them in production, as that is outside the scope of this book. We will begin with a market basket analysis of purchasing habits at a grocery store, then dig into building a recommendation engine on website reviews, and finally, analyze the sequence in which web pages are visited.

An overview of a market basket analysis

Market basket analysis is a data mining technique that has the purpose of finding the optimal combination of products or services and allows marketers to exploit this knowledge to provide recommendations, optimize product placement, or develop marketing programs that take advantage of cross-selling. In short, the idea is to identify which items go well together, and profit from it.

You can think of the results of the analysis as an if...then statement. If a customer buys an airplane ticket, then there is a 46 percent probability that they will buy a hotel room, and if they go on to buy a hotel room, then there is a 33 percent probability that they will rent a car.

However, it is not just for sales and marketing. It is also being used in fraud detection and healthcare; for example, if a patient undergoes treatment A, then there is a 26 percent probability that they might exhibit symptom X. Before going into the details, we should have a look at some terminology, as it will be used in the example:

Itemset: This is a collection of one or more items in the dataset.
Support: This is the proportion of the transactions in the data that contain an itemset of interest.
Confidence: This is the conditional probability that if a person purchases or does x, they will purchase or do y; the act of doing x is referred to as the antecedent or Left-Hand Side (LHS), and y is the consequent or Right-Hand Side (RHS).

Lift: This is the ratio of the support of x occurring together with y to the probability that x and y would occur together if they were independent, that is, the probability of x times the probability of y. Equivalently, it is the confidence divided by the probability of y. For example, if the probability of x and y occurring together is 10 percent, the probability of x is 20 percent, and the probability of y is 30 percent, then the lift is 0.10 / (0.20 * 0.30), or about 1.67. A lift above 1 means x and y occur together more often than independence would predict.
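
The three measures can be checked directly on a small set of transactions. This minimal sketch uses ten hypothetical baskets chosen so that x appears in 2, y in 3, and both together in 1. That reproduces the 10, 20, and 30 percent figures in the lift example. The function names are illustrative and are not the arules API.

```python
# Ten hypothetical transactions: x in 2 of them, y in 3, both together in 1.
TRANSACTIONS = [
    {"x", "y"}, {"x", "z"}, {"y"}, {"y", "z"}, {"z"},
    {"z"}, {"w"}, {"w", "z"}, {"w"}, {"z"},
]


def support(transactions, itemset):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs) = support(lhs and rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence divided by support(rhs); 1.0 means no association beyond chance."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


print(support(TRANSACTIONS, {"x", "y"}))           # 0.1
print(confidence(TRANSACTIONS, {"x"}, {"y"}))      # 0.5
print(round(lift(TRANSACTIONS, {"x"}, {"y"}), 2))  # 1.67
```
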

The R package that you can use to perform a market basket analysis is arules: Mining Association Rules and Frequent Itemsets. The package offers two different methods of finding rules. Why would one need different methods? Quite simply, with massive datasets it can become computationally expensive to examine all possible combinations of products, and the algorithms search that space in different ways. The algorithms that the package supports are apriori and ECLAT. There are other algorithms for conducting a market basket analysis, but apriori is used most frequently, so that will be our focus.
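
For contrast with apriori, here is a minimal sketch of the Eclat idea. Each item is stored with the set of transaction ids that contain it (a "vertical" layout). The search then goes depth-first, and the support of a larger itemset comes from intersecting those id sets instead of rescanning the data. This is a simplified Python illustration with hypothetical baskets and not the arules implementation.

```python
# Hypothetical baskets, for illustration only.
BASKETS = [
    {"bread", "milk"},
    {"bread", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "beer"},
    {"bread", "milk", "cola"},
]


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    # Vertical layout: each item maps to the ids of transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tids): tids are the transactions containing
        # prefix plus item, and every candidate passed in is already frequent.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            # Depth-first step: intersect id sets rather than rescanning data.
            deeper = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) / n >= min_support:
                    deeper.append((other, shared))
            if deeper:
                extend(itemset, deeper)

    singles = [(item, tids)
               for item, tids in sorted(tidsets.items(), key=lambda pair: pair[0])
               if len(tids) / n >= min_support]
    extend(frozenset(), singles)
    return frequent


for itemset, supp in eclat(BASKETS, 0.4).items():
    print(sorted(itemset), supp)
```
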

With apriori, the principle is that, if an itemset is frequent, then all of its subsets must also be frequent. A minimum frequency (support) is determined by the analyst prior to executing the algorithm, and once established, the algorithm will run as follows:

Let k = 1 (the number of items in an itemset)
Generate the itemsets of length k whose support is equal to or greater than the specified minimum support
Increment k by 1, build candidate itemsets of length k from the frequent itemsets of length k - 1, and prune those that are infrequent (support below the minimum)
Stop the iteration when no new frequent itemsets are identified
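
The steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of the apriori loop on hypothetical baskets. It joins frequent (k - 1)-itemsets into k-item candidates, discards any candidate with an infrequent subset (the apriori principle), and keeps those that meet the minimum support. It is not the arules code.

```python
from itertools import combinations

# Hypothetical baskets, for illustration only.
BASKETS = [
    {"bread", "milk"},
    {"bread", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "beer"},
    {"bread", "milk", "cola"},
]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions) / n

    # k = 1: keep the single items that meet the minimum support.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:  # stop once a pass finds no new frequent itemsets
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: by the apriori principle, every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


print(len(apriori(BASKETS, 0.4)))
```

With a minimum support of 0.4, the loop finds four frequent single items and four frequent pairs. Its only three-item candidate that survives pruning, {bread, milk, beer}, appears in just one basket, so the iteration stops there.
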

Once you have an ordered summary of the most frequent itemsets, you can continue the analysis process by examining the confidence and lift in order to identify the associations of interest.
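
As a rough illustration of this step, the sketch below turns frequent itemsets into LHS -> RHS rules. It keeps the rules that meet a minimum confidence and ranks them by lift. It assumes the frequent itemsets and their supports have already been found. The FREQUENT dictionary hard-codes the supports for a hypothetical five-basket example, and the function name is illustrative rather than an arules function.

```python
from itertools import combinations

# Supports of the frequent itemsets from a hypothetical five-basket example.
FREQUENT = {
    frozenset({"bread"}): 0.8,
    frozenset({"milk"}): 0.8,
    frozenset({"beer"}): 0.6,
    frozenset({"cola"}): 0.4,
    frozenset({"bread", "milk"}): 0.6,
    frozenset({"bread", "beer"}): 0.4,
    frozenset({"milk", "beer"}): 0.4,
    frozenset({"milk", "cola"}): 0.4,
}


def generate_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) tuples sorted by lift."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        # Split the itemset into every non-empty LHS / RHS pair.
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so both lookups exist.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    lift = conf / frequent[rhs]
                    rules.append((set(lhs), set(rhs), supp, conf, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for lhs, rhs, supp, conf, lift in generate_rules(FREQUENT, 0.5):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2), round(lift, 2))
```

In this toy data, the milk and cola rules have the highest lift, even though bread and milk have the highest support. This is why confidence and lift are examined alongside support.
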

Business understanding

For our business case, we will focus on identifying the association rules for a grocery store. The dataset, called Groceries, comes with the arules package. It consists of 9,835 actual transactions recorded over a 30-day period at a real-world grocery store. Every purchased item is assigned to one of 169 categories, for example, bread, wine, and meat.
Let's say that we are a start-up microbrewery trying to make headway in this grocery outlet, and we want to understand what potential customers will purchase along with beer. This knowledge may help us identify the right product placement within the store or support a cross-selling campaign.

Data understanding and preparation

For this anal...

Table of contents

Title Page
Copyright
Credits
About the Author
About the Reviewers
Packt Upsell
Customer Feedback
Preface
A Process for Success
Linear Regression - The Blocking and Tackling of Machine Learning
Logistic Regression and Discriminant Analysis
Advanced Feature Selection in Linear Models
More Classification Techniques - K-Nearest Neighbors and Support Vector Machines
Classification and Regression Trees
Neural Networks and Deep Learning
Cluster Analysis
Principal Components Analysis
Market Basket Analysis, Recommendation Engines, and Sequential Analysis
Creating Ensembles and Multiclass Classification
Time Series and Causality
Text Mining
R on the Cloud
R Fundamentals
Sources
