Mastering Machine Learning with scikit-learn - Second Edition
eBook - ePub


Gavin Hackeling

  1. 254 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Use scikit-learn to apply machine learning to real-world problems

• Master popular machine learning models, including k-nearest neighbors, random forests, logistic regression, K-means, naive Bayes, and artificial neural networks
• Learn how to build efficient models and evaluate their performance using scikit-learn
• A practical guide to mastering the basics and learning from real-life applications of machine learning

Who This Book Is For

This book is intended for software engineers who want to understand how common machine learning algorithms work and develop an intuition for how to use them, and for data scientists who want to learn about the scikit-learn API. Familiarity with machine learning fundamentals and Python is helpful, but not required.

What You Will Learn

• Review fundamental concepts such as bias and variance
• Extract features from categorical variables, text, and images
• Predict the values of continuous variables using linear regression and k-nearest neighbors
• Classify documents and images using logistic regression and support vector machines
• Create ensembles of estimators using bagging and boosting techniques
• Discover hidden structures in data using K-means clustering
• Evaluate the performance of machine learning systems in common tasks

In Detail

Machine learning brings computer science and statistics together to build smart, efficient models, and its algorithms and techniques let you automate the building of analytical models. This book examines a variety of machine learning models, including popular algorithms such as k-nearest neighbors, logistic regression, naive Bayes, K-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn's API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model's performance. By the end of this book, you will have mastered the scikit-learn concepts needed to build efficient models and carry out advanced tasks at work using a practical approach.

Style and approach

This book is motivated by the belief that you do not understand something until you can describe it simply. Work through toy problems to develop your understanding of the learning algorithms and models, then apply what you have learned to real-life problems.


Information

Year: 2017
ISBN: 9781788298490
Edition: 2

K-means

In previous chapters, we discussed supervised learning tasks; we examined algorithms for regression and classification that learned from labeled training data. In this chapter, we will introduce our first unsupervised learning task: clustering. Clustering is used to find groups of similar observations within a set of unlabeled data. We will discuss the K-means clustering algorithm, apply it to an image compression problem, and learn to measure its performance. Finally, we will work through a semi-supervised learning problem that combines clustering with classification.

Clustering

Recall from Chapter 1, The Fundamentals of Machine Learning, that the goal of unsupervised learning is to discover hidden structures or patterns in unlabeled training data. Clustering, or cluster analysis, is the task of grouping observations so that members of the same group, or cluster, are more similar to each other by some metric than they are to members of other clusters. As with supervised learning, we will represent an observation as an n-dimensional vector.
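To make "similar by some metric" concrete, here is a minimal sketch that assumes Euclidean distance as the metric. The text does not fix a particular metric at this point. Each observation is stored as an n-dimensional vector (n = 2 here), and a smaller distance means two observations are more similar. The three points are made-up toy values.

```python
import math

# Hypothetical observations, each an n-dimensional vector (n = 2 here)
a = (1.0, 2.0)
b = (1.5, 1.8)
c = (8.0, 8.0)


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors (Python 3.8+)."""
    return math.dist(p, q)


# a is much closer to b than to c, so a clustering should tend to group a with b
print("d(a, b) =", round(euclidean(a, b), 3))
print("d(a, c) =", round(euclidean(a, c), 3))
```

Other metrics, such as Manhattan or cosine distance, can be substituted. The choice of metric changes which observations count as "similar."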
For example, assume that your training data consists of the samples plotted in the following figure:
Clustering might produce the following two groups, indicated by squares and circles:
Clustering can also produce the following four groups:
Clustering is commonly used to explore a dataset. Social networks can be clustered to identify communities and to suggest missing connections between people. In biology, clustering is used to find groups of genes with similar expression patterns. Recommendation systems sometimes employ clustering to identify products or media that might appeal to a user. In marketing, clustering is used to find segments of similar consumers. In the following sections, we will work through an example of using the K-means algorithm to cluster a dataset.
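The two-group versus four-group outcome described above can be reproduced on synthetic data with scikit-learn's `KMeans` estimator. Its `n_clusters` parameter sets the number of groups. This is a sketch under stated assumptions: the blob centers, sample count, and `random_state` values are made up for illustration and are not the data plotted in the book's figures.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical toy data: four tight blobs arranged as two well-separated pairs,
# so both a 2-group and a 4-group clustering are plausible answers.
X, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [0, 3], [10, 0], [10, 3]],
    cluster_std=0.5,
    random_state=0,
)

# Cluster the same data twice, asking for a different number of groups each time
km2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
km4 = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

labels_2 = km2.labels_
labels_4 = km4.labels_

print("2 groups, sizes:", np.bincount(labels_2))
print("4 groups, sizes:", np.bincount(labels_4))
# inertia_ is the sum of squared distances from each point to its closest centroid
print("2 groups, inertia:", round(km2.inertia_, 2))
print("4 groups, inertia:", round(km4.inertia_, 2))
```

The four-group result has lower inertia. The best achievable inertia never increases as more clusters are allowed, so a lower value does not by itself show that four groups is the "right" answer. Deciding how many groups are meaningful is left to the analyst.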

K-means

The K-means algorithm is a clustering method that is popular because of its speed and scalability. K-means is an iterative process that alternates between two steps. First, it moves the centers of the clusters, called the centroids, to the mean position of their constituent instances. Second, it reassigns instances to the clusters with the closest centroids. The titular k is a hyperparameter that specifies the number of clusters that should be created; K-means automatically assigns observations to clusters but cannot determine the appropriate number of clusters. k must be a positive integer that is l...
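The iterative process described above can be sketched from scratch in NumPy. This is a minimal illustration, not scikit-learn's implementation, and it rests on several assumptions. It uses squared Euclidean distance. It initializes the centroids with k randomly chosen distinct observations. It stops when the centroids stop moving or after `max_iter` rounds. The function names `kmeans` and `assign_clusters` are hypothetical.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Return the index of the closest centroid for every row of X."""
    # Squared Euclidean distance between each instance and each centroid
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch: alternate reassignment and centroid updates."""
    X = np.asarray(X, dtype=float)
    # This sketch seeds the centroids with k distinct observations,
    # so it needs 1 <= k <= number of observations.
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of observations")
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Reassign every instance to the cluster with the closest centroid
        labels = assign_clusters(X, centroids)
        # Move each centroid to the mean of its constituent instances;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Recompute labels so they match the returned centroids
    return centroids, assign_clusters(X, centroids)


# Toy data: two obvious groups of three points each
X = np.array([[1, 1], [1.5, 2], [1, 1.5], [8, 8], [8.5, 9], [9, 8]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

The result can depend on the starting centroids, because K-means converges to a local optimum. In scikit-learn, `sklearn.cluster.KMeans` covers the same idea. By default it uses k-means++ initialization, and it can run several initializations (`n_init`) and keep the one with the lowest inertia.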
