Learning Data Mining with Python
eBook - ePub

Learning Data Mining with Python

Robert Layton

  1. 344 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

If you are a programmer who wants to get started with data mining, then this book is for you.

Information

Year: 2015
ISBN: 9781784396053
Edition: 1

Table of Contents

Learning Data Mining with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with Data Mining
Introducing data mining
Using Python and the IPython Notebook
Installing Python
Installing IPython
Installing scikit-learn
A simple affinity analysis example
What is affinity analysis?
Product recommendations
Loading the dataset with NumPy
Implementing a simple ranking of rules
Ranking to find the best rules
A simple classification example
What is classification?
Loading and preparing the dataset
Implementing the OneR algorithm
Testing the algorithm
Summary
2. Classifying with scikit-learn Estimators
scikit-learn estimators
Nearest neighbors
Distance metrics
Loading the dataset
Moving towards a standard workflow
Running the algorithm
Setting parameters
Preprocessing using pipelines
An example
Standard preprocessing
Putting it all together
Pipelines
Summary
3. Predicting Sports Winners with Decision Trees
Loading the dataset
Collecting the data
Using pandas to load the dataset
Cleaning up the dataset
Extracting new features
Decision trees
Parameters in decision trees
Using decision trees
Sports outcome prediction
Putting it all together
Random forests
How do ensembles work?
Parameters in Random forests
Applying Random forests
Engineering new features
Summary
4. Recommending Movies Using Affinity Analysis
Affinity analysis
Algorithms for affinity analysis
Choosing parameters
The movie recommendation problem
Obtaining the dataset
Loading with pandas
Sparse data formats
The Apriori implementation
The Apriori algorithm
Implementation
Extracting association rules
Evaluation
Summary
5. Extracting Features with Transformers
Feature extraction
Representing reality in models
Common feature patterns
Creating good features
Feature selection
Selecting the best individual features
Feature creation
Principal Component Analysis
Creating your own transformer
The transformer API
Implementation details
Unit testing
Putting it all together
Summary
6. Social Media Insight Using Naive Bayes
Disambiguation
Downloading data from a social network
Loading and classifying the dataset
Creating a replicable dataset from Twitter
Text transformers
Bag-of-words
N-grams
Other features
Naive Bayes
Bayes' theorem
Naive Bayes algorithm
How it works
Application
Extracting word counts
Converting dictionaries to a matrix
Training the Naive Bayes classifier
Putting it all together
Evaluation using the F1-score
Getting useful features from models
Summary
7. Discovering Accounts to Follow Using Graph Mining
Loading the dataset
Classifying with an existing model
Getting follower information from Twitter
Building the network
Creating a graph
Creating a similarity graph
Finding subgraphs
Connected components
Optimizing criteria
Summary
8. Beating CAPTCHAs with Neural Networks
Artificial neural networks
An introduction to neural networks
Creating the dataset
Drawing basic CAPTCHAs
Splitting the image into individual letters
Creating a training dataset
Adjusting our training dataset to our methodology
Training and classifying
Back propagation
Predicting words
Improving accuracy using a dictionary
Ranking mechanisms for words
Putting it all together
Summary
9. Authorship Attribution
Attributing documents to authors
Applications and use cases
Attributing authorship
Getting the data
Function words
Counting function words
Classifying with function words
Support vector machines
Classifying with SVMs
Kernels
Character n-grams
Extracting character n-grams
Using the Enron dataset
Accessing the Enron dataset
Creating a dataset loader
Putting it all together
Evaluation
Summary
10. Clustering News Articles
Obtaining news articles
Using a Web API to get data
Reddit as a data source
Getting the data
Extracting text from arbitrary websites
Finding the stories in arbitrary websites
Putting it all together
Grouping news articles
The k-means algorithm
Evaluating the results
Extracting topic information from clusters
Using clustering algorithms as transformers
Clustering ensembles
Evidence accumulation
How it works
Implementation
Online learning
An introduction to online learning
Implementation
Summary
11. Classifying Objects in Images Using Deep Learning
Object classification
Application scenario and goals
Use cases
Deep neural networks
Intuition
Implementation
An introduction to Theano
An introduction to Lasagne
Implementing neural networks with nolearn
GPU optimization
When to use GPUs for computation
Running our code on a GPU
Setting up the environment
Application
Getting the data
Creating the neural network
Putting it all together
Summary
12. Working with Big Data
Big data
Application scenario and goals
MapReduce
Intuition
A word count example
Hadoop MapReduce
Application
Getting the data
Naive Bayes prediction
The mrjob package
Extracting the blog posts
Training Naive Bayes
Putting it all together
Training on Amazon's EMR infrastructure
Summary
A. Next Steps…
Chapter 1 – Getting Started with Data Mining
Scikit-learn tutorials
Extending the IPython Notebook
Chapter 2 – Classifying with scikit-learn Estimators
Scalability with the nearest neighbor
More complex pipelines
Comparing classifiers
Chapter 3 – Predicting Sports Winners with Decision Trees
More on pandas
More complex features
Chapter 4 – Recommending Movies Using Affinity Analysis
New datasets
The Eclat algorithm
Chapter 5 – Extracting Features with Transformers
Adding noise
Vowpal Wabbit
Chapter 6 – Social Media Insight Using Naive Bayes
Spam detection
Natural language processing and part-of-speech tagging
Chapter 7 – Discovering Accounts to Follow Using Graph Mining
More complex algorithms
NetworkX
Chapter 8 – Beating CAPTCHAs with Neural Networks
Better (worse?) CAPTCHAs
Deeper networks
Reinforcement learning
Chapter 9 – Authorship Attribution
Increasing the sample size
Blogs dataset
Local n-grams
Chapter 10 – Clustering News Articles
Evaluation
Temporal analysis
Real-time clusterings
Chapter 11 – Classifying Objects in Images Using Deep Learning
Keras and Pylearn2
Mahotas
Chapter 12 – Working with Big Data
Courses on Hadoop
Pydoop
Recommendation engine
More resources
Index

Learning Data Mining with Python

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or...
