Applied Unsupervised Learning with Python
eBook - ePub

Applied Unsupervised Learning with Python

Discover hidden patterns and relationships in unstructured data with Python

  1. 482 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Applied Unsupervised Learning with Python

Discover hidden patterns and relationships in unstructured data with Python

About this book

Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data

Key Features

  • Learn how to select the most suitable Python library to solve your problem
  • Compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them
  • Delve into the applications of neural networks using real-world datasets

Book Description

Unsupervised learning is a useful and practical solution in situations where labeled data is not available.

Applied Unsupervised Learning with Python guides you on the best practices for using unsupervised learning techniques in tandem with Python libraries and extracting meaningful information from unstructured data. The course begins by explaining how basic clustering works to find similar data points in a set. Once you are well versed with the k-means algorithm and how it operates, you'll learn what dimensionality reduction is and where to apply it. As you progress, you'll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also understand how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. You will complete the course by challenging yourself through various interesting activities such as performing a Market Basket Analysis and identifying relationships between different merchandises.

By the end of this course, you will have the skills you need to confidently build your own models using Python.

What you will learn

  • Understand the basics and importance of clustering
  • Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch with built-in packages
  • Explore dimensionality reduction and its applications
  • Use scikit-learn (sklearn) to implement and analyse principal component analysis (PCA)on the Iris dataset
  • Employ Keras to build autoencoder models for the CIFAR-10 dataset
  • Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data

Who this book is for

This course is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming along with basic knowledge of mathematical concepts including exponents, square roots, means, and medians will be beneficial.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Applied Unsupervised Learning with Python by Benjamin Johnston, Aaron Jones, Christopher Kruger in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Visualisation. We have over one million books available in our catalogue for you to explore.

Information

Chapter 1

Introduction to Clustering

Learning Objectives

By the end of this chapter, you will be able to:
  • Distinguish between supervised learning and unsupervised learning
  • Explain the concept of clustering
  • Implement k-means clustering algorithms using built-in Python packages
  • Calculate the Silhouette Score for your data
In this chapter, we will have a look at the concept of clustering.

Introduction

Have you ever been asked to take a look at some data and come up empty handed? Maybe you were not familiar with the dataset, or maybe you didn't even know where to start. This may have been extremely frustrating, and even embarrassing, depending on who asked you to take care of the task.
You are not alone, and, interestingly enough, there are many times the data itself is simply too confusing to be made sense of. As you try and figure out what all those numbers in your spreadsheet mean, you're most likely mimicking what many unsupervised algorithms do when they try to find meaning in data. The reality is that many datasets in the real world don't have any rhyme or reason to them. You will be tasked with analyzing them with little background preparation. Don't fret, however – this book will prepare you so that you'll never be frustrated again when dealing with data exploration tasks.
For this book, we have developed some best-in-class content to help you understand how unsupervised algorithms work and where to use them. We'll cover some of the foundations of finding clusters in your data, how to reduce the size of your data so it's easier to understand, and how each of these sides of unsupervised learning can be applied in the real world. We hope you will come away from this book with a strong real-world understanding of unsupervised learning, the problems that it can solve, and those it cannot.
Thanks for joining us and we hope you enjoy the ride!

Unsupervised Learning versus Supervised Learning

Unsupervised learning is one of the most exciting areas of development in machine learning today. If you have explored machine learning bookwork before, you are probably familiar with the common breakout of problems in either supervised or unsupervised learning. Supervised learning encompasses the problem set of having a labeled dataset that can be used to either classify (for example, predicting smokers and non-smokers if you're looking at a lung health dataset) or fit a regression line on (for example, predicting the sale price of a home based on how many bedrooms it has). This model most closely mirrors an intuitive human approach to learning.
If you wanted to learn how to not burn your food with a basic understanding of cooking, you could build a dataset by putting your food on the burner and seeing how long it takes (input) for your food to burn (output). Eventually, as you continue to burn your food, you will build a mental model of when burning will occur and avoid it in the future. Development in supervised learning was once fast-paced and valuable, but it has since simmered down in recent years – many of the obstacles to knowing your data have already been tackled:
Figure 1.1: Differences between unsupervised and supervised learning
Figure 1.1: Differences between unsupervised and supervised learning
Conversely, unsupervised learning encompasses the problem set of having a tremendous amount of data that is unlabeled. Labeled data, in this case, would be data that has a supplied "target" outcome that you are trying to find the correlation to with supplied data (you know that you are looking for whether your food was burned in the preceding example). Unlabeled data is when you do not know what the "target" outcome is, and you only have supplied input data.
Building upon the previous example, imagine you were just dropped on planet Earth with zero knowledge of how cooking works. You are given 100 days, a stove, and a fridge full of food without any instructions on what to do. Your initial exploration of a kitchen could go in infinite directions – on day 10, you may finally learn how to open the fridge; on day 30, you may learn that food can go on the stove; and after many more days, you may unwittingly make an edible meal. As you can see, trying to find meaning in a kitchen devoid of adequate informational structure leads to very noisy data that is completely irrelevant to actually preparing a meal.
Unsupervised learning can be an answer to this problem. By looking back at your 100 days of data, clustering can be used to find patterns of similar days where a meal was produced, and you can easily review what you did on those days. However, unsupervised learning isn't a magical answer –simply finding clusters can be just as likely to help you to find pockets of similar yet ultimately useless data.
This challenge is what makes unsupervised learning so exciting. How can we find smarter techniques to speed up the process of finding clusters of information that are beneficial to our end goals?

Clustering

Being able to find groups of similar data that exist in your dataset can be extremely valuable if you are trying to find its underlying meaning. If you were a store owner and you wanted to understand which customers are more valuable without a set idea of what valuable is, clustering would be a great place to start to find patterns in your data. You may have a few high-level ideas of what denotes a valuable customer, but you aren't entirely sure in the face of a large mountain of available data. Through clustering you can find commonalities among similar groups in your data. If you look more deeply at a cluster of similar people, you may learn that everyone in that group visits your website for longer periods of time than others. This can show you what the value is and also provides a clean sample size for future supervised learning experiments.

Identifying Clusters

The following figure shows two scatterplots:
Figures 1.2: Two distinct scatterplots
Figures 1.2: Two distinct scatterplots
The following figure separates the scatterplots into two distinct clusters:
Figure 1.3: Scatterplots clearly showing clusters that exist in a provided dataset
Figure 1.3: Scatterplots clearly showing clusters that exist in a provided dataset
Both figures display randomly generated number pairs (x,y coordinates) pulled from a Gaussian distribution. Simply by glancing at Figure 1.2, it should be plainly obvious where the clusters exist in your data – in real life, it will never be this easy. Now that you know that the data can be clearly separated into two clusters, you can start to understand what differences exist between the two groups.
Rewinding a bit from where unsupervised learning fits into the larger machine learning environmen...

Table of contents

  1. Preface
  2. Chapter 1
  3. Chapter 2
  4. Chapter 3
  5. Chapter 4
  6. Chapter 5
  7. Chapter 6
  8. Chapter 7
  9. Chapter 8
  10. Chapter 9
  11. Appendix