Hands-On Recommendation Systems with Python

Start building powerful and personalized, recommendation engines with Python

Rounak Banik

  1. 146 pages
  2. English

About this book

With Hands-On Recommendation Systems with Python, learn the tools and techniques required to build various kinds of powerful recommendation systems (collaborative, knowledge-based, and content-based) and deploy them to the web.

Key Features

  • Build industry-standard recommender systems
  • Only familiarity with Python is required
  • No need to wade through complicated machine learning theory to use this book

Book Description

Recommendation systems are at the heart of almost every internet business today, from Facebook to Netflix to Amazon. Providing good recommendations, whether it's friends, movies, or groceries, goes a long way in defining user experience and enticing your customers to use your platform.

This book shows you how to do just that. You will learn about the different kinds of recommenders used in the industry and see how to build them from scratch using Python. There is no need to wade through tons of machine learning theory; you'll get started with building and learning about recommenders as quickly as possible.

In this book, you will build an IMDB Top 250 clone and a content-based engine that works on movie metadata. You'll use collaborative filters to make use of customer behavior data, and build a hybrid recommender that incorporates content-based and collaborative filtering techniques.

With this book, all you need to get started with building recommendation systems is a familiarity with Python. By the time you're finished, you will have a solid grasp of how recommenders work and be in a strong position to apply the techniques you learn to your own problem domains.

What you will learn

  • Get to grips with the different kinds of recommender systems
  • Master data-wrangling techniques using the pandas library
  • Build an IMDB Top 250 clone
  • Build a content-based engine to recommend movies based on movie metadata
  • Employ data-mining techniques used in building recommenders
  • Build industry-standard collaborative filters using powerful algorithms
  • Build hybrid recommenders that incorporate content-based and collaborative filtering

Who this book is for

If you are a Python developer and want to develop applications for social networking, news personalization or smart advertising, this is the book for you. Basic knowledge of machine learning techniques will be helpful, but not mandatory.

Information

Year
2018
ISBN
9781788992534

Getting Started with Data Mining Techniques

In 2003, Linden, Smith, and York of Amazon.com published a paper entitled Item-to-Item Collaborative Filtering, which explained how product recommendations at Amazon work. Since then, this class of algorithm has gone on to become the industry standard for recommendations. Every website or app with a sizeable user base, be it Netflix, Amazon, or Facebook, makes use of some form of collaborative filtering to suggest items (which may be movies, products, or friends).
As described in the first chapter, collaborative filters try to leverage the power of the community to give reliable, relevant, and sometimes even surprising recommendations. If Alice and Bob largely like the same movies (say The Lion King, Aladdin, and Toy Story) and Alice also likes Finding Nemo, it is extremely likely that Bob, who hasn't watched Finding Nemo, will like it too.
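The Alice and Bob intuition can be sketched in a few lines of Python. This is only a toy illustration, assuming each user's tastes are stored as a set of liked titles; the `likes` dictionary and the `suggest_from_neighbor` helper are hypothetical names introduced here, not part of the book's code.

```python
# Toy data taken from the Alice and Bob example; storing it as sets is an assumption
likes = {
    "Alice": {"The Lion King", "Aladdin", "Toy Story", "Finding Nemo"},
    "Bob": {"The Lion King", "Aladdin", "Toy Story"},
}

def suggest_from_neighbor(user, neighbor, likes):
    """Return the titles both users like and the titles only the neighbor likes."""
    overlap = likes[user] & likes[neighbor]       # shared tastes
    suggestions = likes[neighbor] - likes[user]   # candidates the user has not seen
    return overlap, suggestions

overlap, suggestions = suggest_from_neighbor("Bob", "Alice", likes)
print(len(overlap), sorted(suggestions))
```

A real collaborative filter would weigh many neighbors by how similar they are rather than trusting a single one.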
We will be building powerful collaborative filters in the next chapter. However, before we do that, it is important that we have a good grasp of the underlying techniques, principles, and algorithms that go into building collaborative filters.
Therefore, in this chapter, we will cover the following topics:
  • Similarity measures: Given two items, how do we mathematically quantify how different or similar they are to each other? Similarity measures help us in answering this question.
    We have already made use of a similarity measure (the cosine score) while building our content recommendation engine. In this chapter, we will be looking at a few other popular similarity scores.
  • Dimensionality reduction: When building collaborative filters, we are usually dealing with millions of users rating millions of items. In such cases, our user and item vectors are going to be of a dimension in the order of millions. To improve performance, speed up calculations, and avoid the curse of dimensionality, it is often a good idea to reduce the number of dimensions considerably, while retaining most of the information. This section of the chapter will describe techniques that do just that.
  • Supervised learning: Supervised learning is a class of machine learning algorithms that makes use of labeled data to infer a mapping function, which can then be used to predict the label (or class) of unlabeled data. We will be looking at some of the most popular supervised learning algorithms, such as support vector machines, logistic regression, decision trees, and ensembling.
  • Clustering: Clustering is a type of unsupervised learning where the algorithm tries to divide all the data points into a certain number of clusters. Therefore, without the use of a labeled dataset, the clustering algorithm is able to assign classes to all the unlabeled points. In this section, we will be looking at k-means clustering, a simple but powerful algorithm popularly used in collaborative filters.
  • Evaluation methods and metrics: We will take a look at a few evaluation metrics that are used to gauge the performance of these algorithms. The metrics include accuracy, precision, and recall.
The topics covered in this chapter merit an entire textbook. Since this is a hands-on recommendation engine tutorial, we will not be delving too deeply into the functioning of most of the algorithms. Nor will we code them up from scratch. What we will do is gain an understanding of how and when they work, their advantages and disadvantages, and their easy-to-use implementations using the scikit-learn library.
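To give a feel for what "easy-to-use implementations" means in practice, here is a minimal sketch that touches three of the topics above: a supervised classifier, k-means clustering, and the accuracy, precision, and recall metrics. The six data points and their labels are hypothetical and chosen to be trivially separable; scoring on the training data is done only to keep the sketch short.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical 2-D points forming two well-separated groups
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised learning: infer a mapping from points to labels
clf = LogisticRegression().fit(X, y)
y_pred = clf.predict(X)

# Evaluation metrics (computed on the training data for brevity)
accuracy = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)

# Clustering: group the same points without looking at the labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

print(accuracy, precision, recall, cluster_ids)
```

Note that the cluster IDs produced by k-means are arbitrary: the first group may be numbered 0 or 1, so they should not be compared directly against the class labels.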

Problem statement

Collaborative filtering algorithms try to solve the prediction problem (as described in Chapter 1, Getting Started with Recommender Systems). In other words, we are given a matrix of i users and j items. The value in the ith row and the jth column (denoted by r_ij) is the rating given by user i to item j:
Matrix of i users and j items
Our job is to complete this matrix. In other words, we need to predict all the cells in the matrix that we have no data for. For example, in the preceding diagram, we are asked to predict whether user E will like the music player item. To accomplish this task, some ratings are available (such as User A liking the music player and video games) whereas others are not (for instance, we do not know whether Users C and D like video games).
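The matrix-completion view can be sketched with NumPy. In the sketch below, only the cells mentioned in the text (User A liking the music player and video games, and the unknown cells for Users C, D, and E) come from the example; the other item names, the column order, the remaining values, and the encoding (1 for a like, -1 for a dislike, `np.nan` for a missing rating) are assumptions made for illustration.

```python
import numpy as np

users = ["A", "B", "C", "D", "E"]
# "item_1" and "item_2" are hypothetical placeholders for the other items
items = ["item_1", "item_2", "music_player", "video_games"]

# 1 = like, -1 = dislike, np.nan = no rating available
ratings = np.array([
    [np.nan, -1.0,    1.0,    1.0],     # A likes the music player and video games
    [np.nan,  1.0,   -1.0,   -1.0],     # B
    [1.0,     np.nan, 1.0,    np.nan],  # C: video games unknown
    [-1.0,    1.0,    np.nan, np.nan],  # D: video games unknown
    [1.0,     np.nan, np.nan, -1.0],    # E: music player unknown
])

# The prediction problem: every NaN cell is a rating we need to estimate
missing = [(users[r], items[c]) for r, c in np.argwhere(np.isnan(ratings))]
print(missing)
```

Each `(user, item)` pair in `missing` is one cell that a collaborative filter is asked to fill in.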

Similarity measures

From the rating matrix in the previous section, we see that every user can be represented as a j-dimensional vector where the kth dimension denotes the rating given by that user to the kth item. For instance, let 1 denote a like, -1 denote a dislike, and 0 denote no rating. Therefore, user B can be represented as (0, 1, -1, -1). Similarly, every item can also be represented as an i-dimensional vector where the kth dimension denotes the rating given to that item by the kth user. The video games item is therefore represented as (1, -1, 0, 0, -1).
We have already computed a similarity score for like-dimensional vectors when we built our content-based recommendation engine. In this section, we will take a look at the other similarity measures and also revisit the cosine similarity score in the context of the other scores.
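As a rough sketch of these vector views, the matrix below is built so that its second row and last column reproduce the two vectors quoted above; every other value, and the column order, is hypothetical. The `cosine` helper is a plain NumPy restatement of the cosine score, not the book's implementation.

```python
import numpy as np

# Rows are users A to E, columns are four items (the last column is video games).
# 1 = like, -1 = dislike, 0 = no rating. Values not quoted in the text are made up.
ratings = np.array([
    [ 0, -1,  1,  1],   # user A (hypothetical apart from the two likes)
    [ 0,  1, -1, -1],   # user B, as given in the text
    [ 1,  0,  1,  0],   # user C
    [-1,  1,  0,  0],   # user D
    [ 1,  0,  0, -1],   # user E
])

user_a = ratings[0]          # a j-dimensional user vector (j = 4)
user_b = ratings[1]
video_games = ratings[:, 3]  # an i-dimensional item vector (i = 5)

def cosine(v1, v2):
    """Cosine of the angle between two vectors of the same dimension."""
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

score_ab = cosine(user_a, user_b)
print(user_b, video_games, score_ab)
```

With these made-up values, users A and B disagree on every item they have both rated, so their cosine score sits at the negative end of the scale.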

Euclidean distance

The Euclidean distance can be defined as the length of the line segment joining two data points plotted in an n-dimensional Cartesian space. For example, consider two points plotted in a 2D plane:
Euclidean distance
The distance, d, between the two points gives us the Euclidean distance and its formula in the 2D space is given in the preceding graph.
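As a worked example, take two hypothetical points (x_1, y_1) = (1, 2) and (x_2, y_2) = (4, 6); the coordinates and symbol names are chosen here only for illustration. Applying the standard 2D distance formula gives:

```latex
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
  = \sqrt{(4 - 1)^2 + (6 - 2)^2}
  = \sqrt{9 + 16}
  = 5
```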
More generally, consider two n-dimensional points (or vectors):
  • v1: (q1, q2, ..., qn)
  • v2: (r1, r2, ..., rn)
Then, the Euclidean score is mathematically defined as:
$$d(v_1, v_2) = \sqrt{\sum_{k=1}^{n} (q_k - r_k)^2}$$
Euclidean scores can take any value between 0 and infinity. The lower the Euclidean score (or distance), the more similar the two vectors are to each other. Let's now define a simple function using NumPy, which allows us to compute the Euclidean distance between two n-dimensional vectors using the aforementioned formula:
import numpy as np

# Function to compute the Euclidean distance
def euclidean(v1, v2):

    # Convert 1-D Python lists to NumPy vectors
    v1 = np.array(v1)
    v2 = np.array(v2)

    # Compute the vector of element-wise squared differences
    diff = np.power(v1 - v2, 2)

    # Sum the elements of the above vector
    sigma_val = np.sum(diff)

    # Compute the square root and return the final Euclidean score
    euclid_score = np.sqrt(sigma_val)

    return euclid_score
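The same quantity can also be obtained from NumPy's built-in norm, which may be handy as a cross-check. The sketch below is an alternative formulation, not the book's function; `euclidean_norm` is a hypothetical name and the sample vectors are made up.

```python
import numpy as np

def euclidean_norm(v1, v2):
    """Euclidean distance computed as the L2 norm of the difference vector."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return np.linalg.norm(v1 - v2)

# Hypothetical vectors: a 3-4-5 right triangle, and a vector compared with itself
d_far = euclidean_norm([1, 2], [4, 6])
d_same = euclidean_norm([0, 1, -1, -1], [0, 1, -1, -1])
print(d_far, d_same)
```

Identical vectors give a distance of 0, the lower bound of the score, which matches the reading that a lower score means more similar vectors.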
Next, let's define three users who have rat...
