Hands-On Recommendation Systems with Python
eBook - ePub

Start building powerful, personalized recommendation engines with Python

Rounak Banik

  1. 146 pages
  2. English
About This Book

With Hands-On Recommendation Systems with Python, learn the tools and techniques required to build various kinds of powerful recommendation systems (collaborative, knowledge-based, and content-based) and deploy them to the web.

Key Features

  • Build industry-standard recommender systems
  • Only familiarity with Python is required
  • No need to wade through complicated machine learning theory to use this book

Book Description

Recommendation systems are at the heart of almost every internet business today, from Facebook to Netflix to Amazon. Providing good recommendations, whether of friends, movies, or groceries, goes a long way toward defining the user experience and enticing your customers to use your platform.

This book shows you how to do just that. You will learn about the different kinds of recommenders used in the industry and see how to build them from scratch using Python. There's no need to wade through tons of machine learning theory: you'll get started with building and learning about recommenders as quickly as possible.

In this book, you will build an IMDB Top 250 clone and a content-based engine that works on movie metadata. You'll use collaborative filters to make use of customer behavior data, and build a hybrid recommender that incorporates content-based and collaborative filtering techniques.

With this book, all you need to get started building recommendation systems is familiarity with Python, and by the time you're finished, you will have a strong grasp of how recommenders work and be in a strong position to apply the techniques you learn to your own problem domains.

What you will learn

  • Get to grips with the different kinds of recommender systems
  • Master data-wrangling techniques using the pandas library
  • Build an IMDB Top 250 clone
  • Build a content-based engine to recommend movies based on movie metadata
  • Employ data-mining techniques used in building recommenders
  • Build industry-standard collaborative filters using powerful algorithms
  • Build hybrid recommenders that incorporate content-based and collaborative filtering

Who this book is for

If you are a Python developer and want to develop applications for social networking, news personalization, or smart advertising, this is the book for you. Basic knowledge of machine learning techniques will be helpful, but is not mandatory.


Getting Started with Data Mining Techniques

In 2003, Linden, Smith, and York of Amazon.com published a paper entitled Item-to-Item Collaborative Filtering, which explained how product recommendations at Amazon work. Since then, this class of algorithms has gone on to dominate the industry standard for recommendations. Every website or app with a sizeable user base, be it Netflix, Amazon, or Facebook, makes use of some form of collaborative filtering to suggest items (which may be movies, products, or friends).
As described in the first chapter, collaborative filters try to leverage the power of the community to give reliable, relevant, and sometimes even surprising recommendations. If Alice and Bob largely like the same movies (say, The Lion King, Aladdin, and Toy Story) and Alice also likes Finding Nemo, it is extremely likely that Bob, who hasn't watched Finding Nemo, will like it too.
We will be building powerful collaborative filters in the next chapter. However, before we do that, it is important that we have a good grasp of the underlying techniques, principles, and algorithms that go into building collaborative filters.
Therefore, in this chapter, we will cover the following topics:
  • Similarity measures: Given two items, how do we mathematically quantify how different or similar they are to each other? Similarity measures help us in answering this question.
    We have already made use of a similarity measure (the cosine score) while building our content recommendation engine. In this chapter, we will be looking at a few other popular similarity scores.
  • Dimensionality reduction: When building collaborative filters, we are usually dealing with millions of users rating millions of items. In such cases, our user and item vectors are going to be of a dimension in the order of millions. To improve performance, speed up calculations, and avoid the curse of dimensionality, it is often a good idea to reduce the number of dimensions considerably, while retaining most of the information. This section of the chapter will describe techniques that do just that.
  • Supervised learning: Supervised learning is a class of machine learning algorithms that makes use of labeled data to infer a mapping function, which can then be used to predict the label (or class) of unlabeled data. We will be looking at some of the most popular supervised learning algorithms, such as support vector machines, logistic regression, decision trees, and ensembling.
  • Clustering: Clustering is a type of unsupervised learning in which the algorithm tries to divide all the data points into a certain number of clusters. Therefore, without the use of a labeled dataset, the clustering algorithm is able to assign classes to all the unlabeled points. In this section, we will be looking at k-means clustering, a simple but powerful algorithm popularly used in collaborative filters.
  • Evaluation methods and metrics: We will take a look at a few evaluation metrics that are used to gauge the performance of these algorithms. The metrics include accuracy, precision, and recall.
The topics covered in this chapter merit an entire textbook. Since this is a hands-on recommendation engine tutorial, we will not be delving too deeply into the functioning of most of the algorithms. Nor will we code them up from scratch. What we will do is gain an understanding of how and when they work, their advantages and disadvantages, and their easy-to-use implementations using the scikit-learn library.
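As a small taste of the easy-to-use implementations to come, here is a minimal sketch (using plain NumPy rather than a library helper; the toy label arrays are invented for illustration) of the accuracy, precision, and recall metrics mentioned above:

```python
import numpy as np

# Toy ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# True positives, false positives, and false negatives
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy = np.mean(y_true == y_pred)   # fraction of all predictions that are correct
precision = tp / (tp + fp)             # of the predicted positives, how many are right
recall = tp / (tp + fn)                # of the actual positives, how many were found

print(accuracy, precision, recall)     # 0.75 0.75 0.75 for this toy data
```

The same three quantities are also available ready-made in scikit-learn's metrics module, which is what we will lean on later.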

Problem statement

Collaborative filtering algorithms try to solve the prediction problem (as described in Chapter 1, Getting Started with Recommender Systems). In other words, we are given a matrix of i users and j items. The value in the ith row and the jth column (denoted by rij) denotes the rating given by user i to item j:
Matrix of i users and j items
Our job is to complete this matrix. In other words, we need to predict all the cells in the matrix that we have no data for. For example, in the preceding diagram, we are asked to predict whether user E will like the music player item. To accomplish this task, some ratings are available (such as User A liking the music player and video games) whereas others are not (for instance, we do not know whether Users C and D like video games).
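To make the prediction problem concrete, here is a minimal sketch of such a ratings matrix in NumPy, with np.nan marking the cells we need to predict. The layout is only an assumption for illustration: the first column is taken to be the music player, the third column the video games item, and any ratings not mentioned in the text are invented:

```python
import numpy as np

# Rows are users A-E; columns are items (music player, item 2, video games, item 4).
# 1 = like, -1 = dislike, np.nan = missing rating to be predicted.
ratings = np.array([
    [1,      np.nan, 1,      1],       # User A likes the music player and video games
    [np.nan, 1,      -1,     -1],      # User B
    [-1,     1,      np.nan, 1],       # User C: video games rating unknown
    [1,      -1,     np.nan, np.nan],  # User D: video games rating unknown
    [np.nan, 1,      -1,     np.nan],  # User E: music player rating unknown
])

# Count how many cells the collaborative filter still has to fill in
print(np.isnan(ratings).sum())  # 7 missing ratings in this toy matrix
```

Completing the matrix then amounts to replacing each np.nan with a predicted rating.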

Similarity measures

From the rating matrix in the previous section, we see that every user can be represented as a j-dimensional vector where the kth dimension denotes the rating given by that user to the kth item. For instance, let 1 denote a like, -1 denote a dislike, and 0 denote no rating. Therefore, user B can be represented as (0, 1, -1, -1). Similarly, every item can also be represented as an i-dimensional vector where the kth dimension denotes the rating given to that item by the kth user. The video games item is therefore represented as (1, -1, 0, 0, -1).
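This dual view (users as rows, items as columns) can be sketched in NumPy. The row for user B and the column for video games match the vectors quoted above; all other entries are invented for illustration, and video games is assumed to be the third column:

```python
import numpy as np

# Rating matrix: rows are users A-E, columns are items.
# 1 = like, -1 = dislike, 0 = no rating.
R = np.array([
    [1,  0,  1,  1],    # User A
    [0,  1,  -1, -1],   # User B
    [-1, 1,  0,  1],    # User C
    [1,  -1, 0,  0],    # User D
    [0,  1,  -1, 0],    # User E
])

user_b = R[1]          # user B as a j-dimensional (here 4-dimensional) vector
video_games = R[:, 2]  # the video games item as an i-dimensional (here 5-dimensional) vector

print(user_b)       # [ 0  1 -1 -1]
print(video_games)  # [ 1 -1  0  0 -1]
```

Slicing a row gives a user vector and slicing a column gives an item vector, which is exactly the representation the similarity measures below operate on.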
We have already computed a similarity score for such vectors when we built our content-based recommendation engine. In this section, we will take a look at other similarity measures and also revisit the cosine similarity score in the context of those other scores.

Euclidean distance

The Euclidean distance can be defined as the length of the line segment joining two data points plotted on an n-dimensional Cartesian plane. For example, consider two points, (x1, y1) and (x2, y2), plotted in a 2D plane. The distance, d, between the two points gives us the Euclidean distance, and its formula in the 2D space is:
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
More generally, consider two n-dimensional points (or vectors):
  • v1: (q1, q2,...., qn)
  • v2: (r1, r2,....., rn)
Then, the Euclidean score is mathematically defined as:
d(v1, v2) = sqrt((q1 - r1)^2 + (q2 - r2)^2 + ... + (qn - rn)^2)
Euclidean scores can take any value between 0 and infinity. The lower the Euclidean score (or distance), the more similar the two vectors are to each other. Let's now define a simple function using NumPy, which allows us to compute the Euclidean distance between two n-dimensional vectors using the aforementioned formula:
import numpy as np

#Function to compute the Euclidean distance between two vectors
def euclidean(v1, v2):

    #Convert 1-D Python lists to numpy vectors
    v1 = np.array(v1)
    v2 = np.array(v2)

    #Compute vector which is the element-wise square of the difference
    diff = np.power(v1 - v2, 2)

    #Perform summation of the elements of the above vector
    sigma_val = np.sum(diff)

    #Compute square root and return final Euclidean score
    euclid_score = np.sqrt(sigma_val)

    return euclid_score
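As a quick sanity check (not from the book), the same result can be obtained with NumPy's built-in np.linalg.norm applied to the difference of the two vectors; a 3-4-5 right triangle makes the expected value easy to verify by hand:

```python
import numpy as np

# The Euclidean distance is simply the norm of the difference vector
v1 = np.array([0.0, 0.0])
v2 = np.array([3.0, 4.0])

dist = np.linalg.norm(v1 - v2)
print(dist)  # 5.0, the hypotenuse of a 3-4-5 right triangle

# Identical vectors are at distance 0, the minimum possible score
print(np.linalg.norm(v1 - v1))  # 0.0
```

In practice, np.linalg.norm is a convenient one-line alternative to hand-rolling the formula.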
Next, let's define three users who have rated...
