eBook - ePub

Mastering Machine Learning with scikit-learn - Second Edition

Author: Gavin Hackeling


254 pages
English

Use scikit-learn to apply machine learning to real-world problems

About This Book

• Master popular machine learning models including k-nearest neighbors, random forests, logistic regression, k-means, naive Bayes, and artificial neural networks
• Learn how to build and evaluate the performance of efficient models using scikit-learn
• Practical guide to master your basics and learn from real-life applications of machine learning

Who This Book Is For

This book is intended for software engineers who want to understand how common machine learning algorithms work and develop an intuition for how to use them, and for data scientists who want to learn about the scikit-learn API. Familiarity with machine learning fundamentals and Python is helpful, but not required.

What You Will Learn

• Review fundamental concepts such as bias and variance
• Extract features from categorical variables, text, and images
• Predict the values of continuous variables using linear regression and K Nearest Neighbors
• Classify documents and images using logistic regression and support vector machines
• Create ensembles of estimators using bagging and boosting techniques
• Discover hidden structures in data using K-Means clustering
• Evaluate the performance of machine learning systems in common tasks

In Detail

Machine learning is the buzzword bringing computer science and statistics together to build smart and efficient models. Using powerful algorithms and techniques offered by machine learning, you can automate any analytical model.

This book examines a variety of machine learning models including popular machine learning algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn's API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model's performance.

By the end of this book, you will master all required concepts of scikit-learn to build efficient models at work to carry out advanced tasks with the practical approach.

Style and approach

This book is motivated by the belief that you do not understand something until you can describe it simply. Work through toy problems to develop your understanding of the learning algorithms and models, then apply your learnings to real-life problems.


Information

Publisher: Packt Publishing
Year: 2017
ISBN: 9781788298490
Categories: Computer Science, Data Processing

K-means

In previous chapters, we discussed supervised learning tasks; we examined algorithms for regression and classification that learned from labeled training data. In this chapter, we will introduce our first unsupervised learning task: clustering. Clustering is used to find groups of similar observations within a set of unlabeled data. We will discuss the K-means clustering algorithm, apply it to an image compression problem, and learn to measure its performance. Finally, we will work through a semi-supervised learning problem that combines clustering with classification.

Clustering

Recall from Chapter 1, The Fundamentals of Machine Learning, that the goal of unsupervised learning is to discover hidden structures or patterns in unlabeled training data. Clustering, or cluster analysis, is the task of grouping observations so that members of the same group, or cluster, are more similar to each other by some metric than they are to members of other clusters. As with supervised learning, we will represent an observation as an n-dimensional vector.

For example, assume that your training data consists of the samples plotted in the following figure:

Clustering might produce the following two groups, indicated by squares and circles:

Clustering can also produce the following four groups:
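To see how the same observations can be grouped in more than one way, the following minimal sketch clusters one dataset into two groups and then into four. It uses scikit-learn's KMeans estimator, the algorithm discussed later in this chapter. The toy data (four small blobs arranged as two well-separated pairs) and the variable names are assumptions made for illustration; they are not the samples plotted in the figures.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: four tight blobs arranged as two well-separated pairs.
rng = np.random.RandomState(0)
blob_centers = np.array([[0.0, 0.0], [0.0, 3.0], [10.0, 0.0], [10.0, 3.0]])
X = np.vstack([c + 0.3 * rng.randn(25, 2) for c in blob_centers])

# Cluster the same observations into two groups, then into four.
labels_by_k = {}
for k in (2, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels_by_k[k] = model.fit_predict(X)
    # Print how many observations fall into each cluster.
    print(k, np.bincount(labels_by_k[k]))
```

Neither grouping is the correct one in any absolute sense. Which is more useful depends on what the clusters will be used for.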

Clustering is commonly used to explore a dataset. Social networks can be clustered to identify communities and to suggest missing connections between people. In biology, clustering is used to find groups of genes with similar expression patterns. Recommendation systems sometimes employ clustering to identify products or media that might appeal to a user. In marketing, clustering is used to find segments of similar consumers. In the following sections, we will work through an example of using the K-means algorithm to cluster a dataset.

K-means

The K-means algorithm is a clustering method that is popular because of its speed and scalability. K-means is an iterative process of moving the centers of the clusters, called the centroids, to the mean position of their constituent instances and re-assigning instances to the clusters with the closest centroids. The titular k is a hyperparameter that specifies the number of clusters that should be created; K-means automatically assigns observations to clusters but cannot determine the appropriate number of clusters. k must be a positive integer that is l...
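The iterative process described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not scikit-learn's implementation: the function name k_means, the choice to initialize the centroids as k randomly selected observations, the use of Euclidean distance, and the toy data are all assumptions.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Naive K-means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centroids as k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each instance joins the cluster with the closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean position of its instances.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two visibly separate groups of three observations.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = k_means(X, k=2)
print(labels)
print(centroids)
```

Note that k is passed in by the caller. The sketch assigns observations to clusters on its own, but it has no way of deciding how many clusters there should be.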