Learning Data Mining with Python
eBook - ePub

Learning Data Mining with Python

Robert Layton

Buch teilen
  1. 344 Seiten
  2. English
  3. ePUB (handyfreundlich)
  4. Über iOS und Android verfügbar
eBook - ePub

Learning Data Mining with Python

Robert Layton

Angaben zum Buch
Buchvorschau
Inhaltsverzeichnis
Quellenangaben

Über dieses Buch

If you are a programmer who wants to get started with data mining, then this book is for you.

Häufig gestellte Fragen

Wie kann ich mein Abo kündigen?
Gehe einfach zum Kontobereich in den Einstellungen und klicke auf „Abo kündigen“ – ganz einfach. Nachdem du gekündigt hast, bleibt deine Mitgliedschaft für den verbleibenden Abozeitraum, den du bereits bezahlt hast, aktiv. Mehr Informationen hier.
(Wie) Kann ich Bücher herunterladen?
Derzeit stehen all unsere auf Mobilgeräte reagierenden ePub-Bücher zum Download über die App zur Verfügung. Die meisten unserer PDFs stehen ebenfalls zum Download bereit; wir arbeiten daran, auch die übrigen PDFs zum Download anzubieten, bei denen dies aktuell noch nicht möglich ist. Weitere Informationen hier.
Welcher Unterschied besteht bei den Preisen zwischen den Aboplänen?
Mit beiden Aboplänen erhältst du vollen Zugang zur Bibliothek und allen Funktionen von Perlego. Die einzigen Unterschiede bestehen im Preis und dem Abozeitraum: Mit dem Jahresabo sparst du auf 12 Monate gerechnet im Vergleich zum Monatsabo rund 30 %.
Was ist Perlego?
Wir sind ein Online-Abodienst für Lehrbücher, bei dem du für weniger als den Preis eines einzelnen Buches pro Monat Zugang zu einer ganzen Online-Bibliothek erhältst. Mit über 1 Million Büchern zu über 1.000 verschiedenen Themen haben wir bestimmt alles, was du brauchst! Weitere Informationen hier.
Unterstützt Perlego Text-zu-Sprache?
Achte auf das Symbol zum Vorlesen in deinem nächsten Buch, um zu sehen, ob du es dir auch anhören kannst. Bei diesem Tool wird dir Text laut vorgelesen, wobei der Text beim Vorlesen auch grafisch hervorgehoben wird. Du kannst das Vorlesen jederzeit anhalten, beschleunigen und verlangsamen. Weitere Informationen hier.
Ist Learning Data Mining with Python als Online-PDF/ePub verfügbar?
Ja, du hast Zugang zu Learning Data Mining with Python von Robert Layton im PDF- und/oder ePub-Format sowie zu anderen beliebten Büchern aus Commerce & Business Intelligence. Aus unserem Katalog stehen dir über 1 Million Bücher zur Verfügung.

Information

Jahr
2015
ISBN
9781784396053

Learning Data Mining with Python


Table of Contents

Learning Data Mining with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with Data Mining
Introducing data mining
Using Python and the IPython Notebook
Installing Python
Installing IPython
Installing scikit-learn
A simple affinity analysis example
What is affinity analysis?
Product recommendations
Loading the dataset with NumPy
Implementing a simple ranking of rules
Ranking to find the best rules
A simple classification example
What is classification?
Loading and preparing the dataset
Implementing the OneR algorithm
Testing the algorithm
Summary
2. Classifying with scikit-learn Estimators
scikit-learn estimators
Nearest neighbors
Distance metrics
Loading the dataset
Moving towards a standard workflow
Running the algorithm
Setting parameters
Preprocessing using pipelines
An example
Standard preprocessing
Putting it all together
Pipelines
Summary
3. Predicting Sports Winners with Decision Trees
Loading the dataset
Collecting the data
Using pandas to load the dataset
Cleaning up the dataset
Extracting new features
Decision trees
Parameters in decision trees
Using decision trees
Sports outcome prediction
Putting it all together
Random forests
How do ensembles work?
Parameters in Random forests
Applying Random forests
Engineering new features
Summary
4. Recommending Movies Using Affinity Analysis
Affinity analysis
Algorithms for affinity analysis
Choosing parameters
The movie recommendation problem
Obtaining the dataset
Loading with pandas
Sparse data formats
The Apriori implementation
The Apriori algorithm
Implementation
Extracting association rules
Evaluation
Summary
5. Extracting Features with Transformers
Feature extraction
Representing reality in models
Common feature patterns
Creating good features
Feature selection
Selecting the best individual features
Feature creation
Principal Component Analysis
Creating your own transformer
The transformer API
Implementation details
Unit testing
Putting it all together
Summary
6. Social Media Insight Using Naive Bayes
Disambiguation
Downloading data from a social network
Loading and classifying the dataset
Creating a replicable dataset from Twitter
Text transformers
Bag-of-words
N-grams
Other features
Naive Bayes
Bayes' theorem
Naive Bayes algorithm
How it works
Application
Extracting word counts
Converting dictionaries to a matrix
Training the Naive Bayes classifier
Putting it all together
Evaluation using the F1-score
Getting useful features from models
Summary
7. Discovering Accounts to Follow Using Graph Mining
Loading the dataset
Classifying with an existing model
Getting follower information from Twitter
Building the network
Creating a graph
Creating a similarity graph
Finding subgraphs
Connected components
Optimizing criteria
Summary
8. Beating CAPTCHAs with Neural Networks
Artificial neural networks
An introduction to neural networks
Creating the dataset
Drawing basic CAPTCHAs
Splitting the image into individual letters
Creating a training dataset
Adjusting our training dataset to our methodology
Training and classifying
Back propagation
Predicting words
Improving accuracy using a dictionary
Ranking mechanisms for words
Putting it all together
Summary
9. Authorship Attribution
Attributing documents to authors
Applications and use cases
Attributing authorship
Getting the data
Function words
Counting function words
Classifying with function words
Support vector machines
Classifying with SVMs
Kernels
Character n-grams
Extracting character n-grams
Using the Enron dataset
Accessing the Enron dataset
Creating a dataset loader
Putting it all together
Evaluation
Summary
10. Clustering News Articles
Obtaining news articles
Using a Web API to get data
Reddit as a data source
Getting the data
Extracting text from arbitrary websites
Finding the stories in arbitrary websites
Putting it all together
Grouping news articles
The k-means algorithm
Evaluating the results
Extracting topic information from clusters
Using clustering algorithms as transformers
Clustering ensembles
Evidence accumulation
How it works
Implementation
Online learning
An introduction to online learning
Implementation
Summary
11. Classifying Objects in Images Using Deep Learning
Object classification
Application scenario and goals
Use cases
Deep neural networks
Intuition
Implementation
An introduction to Theano
An introduction to Lasagne
Implementing neural networks with nolearn
GPU optimization
When to use GPUs for computation
Running our code on a GPU
Setting up the environment
Application
Getting the data
Creating the neural network
Putting it all together
Summary
12. Working with Big Data
Big data
Application scenario and goals
MapReduce
Intuition
A word count example
Hadoop MapReduce
Application
Getting the data
Naive Bayes prediction
The mrjob package
Extracting the blog posts
Training Naive Bayes
Putting it all together
Training on Amazon's EMR infrastructure
Summary
A. Next Steps…
Chapter 1 – Getting Started with Data Mining
Scikit-learn tutorials
Extending the IPython Notebook
Chapter 2 – Classifying with scikit-learn Estimators
Scalability with the nearest neighbor
More complex pipelines
Comparing classifiers
Chapter 3: Predicting Sports Winners with Decision Trees
More on pandas
More complex features
Chapter 4 – Recommending Movies Using Affinity Analysis
New datasets
The Eclat algorithm
Chapter 5 – Extracting Features with Transformers
Adding noise
Vowpal Wabbit
Chapter 6 – Social Media Insight Using Naive Bayes
Spam detection
Natural language processing and part-of-speech tagging
Chapter 7 – Discovering Accounts to Follow Using Graph Mining
More complex algorithms
NetworkX
Chapter 8 – Beating CAPTCHAs with Neural Networks
Better (worse?) CAPTCHAs
Deeper networks
Reinforcement learning
Chapter 9 – Authorship Attribution
Increasing the sample size
Blogs dataset
Local n-grams
Chapter 10 – Clustering News Articles
Evaluation
Temporal analysis
Real-time clusterings
Chapter 11: Classifying Objects in Images Using Deep Learning
Keras and Pylearn2
Mahotas
Chapter 12 – Working with Big Data
Courses on Hadoop
Pydoop
Recommendation engine
More resources
Index

Learning Data Mining with Python

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or...

Inhaltsverzeichnis