eBook - ePub

Mastering Machine Learning with scikit-learn - Second Edition

Name: Mastering Machine Learning with scikit-learn - Second Edition
ISBN: 9781788298490

Gavin Hackeling

254 pages
English
ePUB (mobile friendly)

About this book

Use scikit-learn to apply machine learning to real-world problems

About This Book

• Master popular machine learning models including k-nearest neighbors, random forests, logistic regression, k-means, naive Bayes, and artificial neural networks
• Learn how to build efficient models and evaluate their performance using scikit-learn
• Practical guide to master your basics and learn from real-life applications of machine learning

Who This Book Is For

This book is intended for software engineers who want to understand how common machine learning algorithms work and develop an intuition for how to use them, and for data scientists who want to learn about the scikit-learn API. Familiarity with machine learning fundamentals and Python is helpful, but not required.

What You Will Learn

• Review fundamental concepts such as bias and variance
• Extract features from categorical variables, text, and images
• Predict the values of continuous variables using linear regression and k-nearest neighbors
• Classify documents and images using logistic regression and support vector machines
• Create ensembles of estimators using bagging and boosting techniques
• Discover hidden structures in data using K-means clustering
• Evaluate the performance of machine learning systems in common tasks

In Detail

Machine learning is the buzzword bringing computer science and statistics together to build smart and efficient models. Using powerful algorithms and techniques offered by machine learning, you can automate any analytical model.

This book examines a variety of machine learning models including popular machine learning algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more.

You will learn to use scikit-learn's API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model's performance.

By the end of this book, you will have mastered the scikit-learn concepts required to build efficient models at work and to carry out advanced tasks with a practical approach.

Style and approach

This book is motivated by the belief that you do not understand something until you can describe it simply. Work through toy problems to develop your understanding of the learning algorithms and models, then apply what you have learned to real-life problems.

Publisher

Packt Publishing

Year

2017

eBook ISBN

9781788298490

Topic

Computer Science

Subtopic

Data Modelling & Design

K-means

In previous chapters, we discussed supervised learning tasks; we examined algorithms for regression and classification that learned from labeled training data. In this chapter, we will introduce our first unsupervised learning task: clustering. Clustering is used to find groups of similar observations within a set of unlabeled data. We will discuss the K-means clustering algorithm, apply it to an image compression problem, and learn to measure its performance. Finally, we will work through a semi-supervised learning problem that combines clustering with classification.

Clustering

Recall from Chapter 1, The Fundamentals of Machine Learning, that the goal of unsupervised learning is to discover hidden structures or patterns in unlabeled training data. Clustering, or cluster analysis, is the task of grouping observations so that members of the same group, or cluster, are more similar to each other by some metric than they are to members of other clusters. As with supervised learning, we will represent an observation as an n-dimensional vector.

For example, assume that your training data consists of the samples plotted in the following figure:

Clustering might produce the following two groups, indicated by squares and circles:

Clustering can also produce the following four groups:
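
The point that the same observations can be grouped in more than one way can be illustrated with a small sketch. The data below is hypothetical (four synthetic blobs of two-dimensional points generated for this example, not the samples from the book's figures), and scikit-learn's `KMeans` estimator is used only as a convenient way to request two groups and then four groups from the same unlabeled data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: four tight blobs of 2-D observations.
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0], [0, 5], [10, 0], [10, 5]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in blob_centers])

# Cluster the same unlabeled data twice, asking for a different
# number of groups each time.
labels_by_k = {}
for k in (2, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels_by_k[k] = model.fit_predict(X)
    print(f"k={k}: cluster sizes {np.bincount(labels_by_k[k])}")
```

Neither grouping is the correct one in any absolute sense; which is more useful depends on what the clusters will be used for.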

Clustering is commonly used to explore a dataset. Social networks can be clustered to identify communities and to suggest missing connections between people. In biology, clustering is used to find groups of genes with similar expression patterns. Recommendation systems sometimes employ clustering to identify products or media that might appeal to a user. In marketing, clustering is used to find segments of similar consumers. In the following sections, we will work through an example of using the K-means algorithm to cluster a dataset.

K-means

The K-means algorithm is a clustering method that is popular because of its speed and scalability. K-means is an iterative process of moving the centers of the clusters, called the centroids, to the mean position of their constituent instances and re-assigning instances to the clusters with the closest centroids. The titular k is a hyperparameter that specifies the number of clusters that should be created; K-means automatically assigns observations to clusters but cannot determine the appropriate number of clusters. k must be a positive integer that is l...
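
The iterative process described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's or scikit-learn's implementation: the function name `kmeans`, the random choice of initial centroids, the use of Euclidean distance, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Sketch of K-means: alternate between assigning instances
    to the closest centroid and moving centroids to cluster means."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every instance
        # to every centroid, then pick the closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its instances.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

Note that `k` is passed in by the caller: the procedure assigns instances to clusters but has no way of choosing the number of clusters itself.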

Table of contents

Title Page
Copyright
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
The Fundamentals of Machine Learning
Simple Linear Regression
Classification and Regression with k-Nearest Neighbors
Feature Extraction
From Simple Linear Regression to Multiple Linear Regression
From Linear Regression to Logistic Regression
Naive Bayes
Nonlinear Classification and Regression with Decision Trees
From Decision Trees to Random Forests and Other Ensemble Methods
The Perceptron
From the Perceptron to Support Vector Machines
From the Perceptron to Artificial Neural Networks
K-means
Dimensionality Reduction with Principal Component Analysis
