Learning Apache Mahout
eBook - ePub

Chandramani Tiwary

  1. 250 pages
  2. English
  3. ePUB (mobile friendly)

Frequently asked questions

Is Learning Apache Mahout an online PDF/ePUB?
Yes, you can access Learning Apache Mahout by Chandramani Tiwary in PDF and/or ePUB format, as well as other popular books in Computer Science & Programming Algorithms.

Information

Year
2015
ISBN
9781783555215


Table of Contents

Learning Apache Mahout
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to Mahout
Why Mahout
Simple techniques and more data is better
Sampling is difficult
Community and license
When Mahout
Data too large for single machine
Data already on Hadoop
Algorithms implemented in Mahout
How Mahout
Setting up the development environment
Configuring Maven
Configuring Mahout
Configuring Eclipse with the Maven plugin and Mahout
Mahout command line
A clustering example
Reuter's raw data file
A classification example
Mahout API – a Java program example
The dataset
Parallel versus in-memory execution mode
Summary
2. Core Concepts in Machine Learning
Supervised learning
Determine the objective
Decide the training data
Create and clean the training set
Feature extraction
Train the models
Bagging
Boosting
Validation
Holdout-set validation
K-fold cross validation
Evaluation
Bias-variance trade-off
Function complexity and amount of training data
Dimensionality of the input space
Noise in data
Unsupervised learning
Cluster analysis
Objective
Feature representation
Feature normalization
Row normalization
Column normalization
Rescaling
Standardization
A notion of similarity and dissimilarity
Euclidean distance measure
Squared Euclidean distance measure
Manhattan distance measure
Cosine distance measure
Tanimoto distance measure
Algorithm for clustering
A stopping criteria
Frequent pattern mining
Measures for identifying interesting rules
Support
Confidence
Lift
Conviction
Things to consider
Actionable rules
What association to look for
Recommender system
Collaborative filtering
Cold start
Scalability
Sparsity
Content-based filtering
Model efficacy
Classification
Confusion matrix
ROC curve and AUC
Features of ROC graphs
Evaluating classifier using the ROC curve
Area-based accuracy measure
Euclidean distance comparison
Example
Regression
Mean absolute error
Root mean squared error
R-square
Adjusted R-square
Recommendation system
Score difference
Precision and recall
Clustering
The internal evaluation
The intra-cluster distance
The inter-cluster distance
The Davies–Bouldin index
The Dunn index
The external evaluation
The Rand index
F-measure
Summary
3. Feature Engineering
Feature engineering
Feature construction
Categorical features
Merging categories
Converting to binary variables
Converting to continuous variables
Continuous features
Binning
Binarization
Feature standardization
Rescaling
Mean standardization
Scaling to unit norm
Feature transformation derived from the problem domain
Ratios
Frequency
Aggregate transformations
Normalization
Mathematical transformations
Feature extraction
Feature selection
Filter-based feature selection
Wrapper-based feature selection
Backward selection
Forward selection
Recursive feature elimination
Embedded feature selection
Dimensionality reduction
Summary
4. Classification with Mahout
Classification
White box models
Black box models
Logistic regression
Mahout logistic regression command line
Getting the data
Model building via command line
Splitting the dataset
Train the model command line option
Interpreting the output
Testing the model
Prediction
Adaptive regression model
Code example with logistic regression
Train the model
The LogisticRegressionParameter and CsvRecordFactory classes
A code example without the parameter class
Testing the online regression model
Getting predictions from OnlineLogisticRegression
A CrossFoldLearner example
Random forest
Bagging
Random subsets of features
Out-of-bag error estimate
Random forest using the command line
Predictions from random forest
Naïve Bayes classifier
Numeric features with naïve Bayes
Command line
Summary
5. Frequent Pattern Mining and Topic Modeling
Frequent pattern mining
Building FP Tree
Constructing the tree
Identify...
