Apache Spark Machine Learning Blueprints
eBook - ePub

  • 252 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About this book

Develop a range of cutting-edge machine learning projects with Apache Spark using this actionable guide

About This Book

  • Customize Apache Spark and R to fit your analytical needs in customer research, fraud detection, risk analytics, and recommendation engine development
  • Develop a set of practical Machine Learning applications that can be implemented in real-life projects
  • A comprehensive, project-based guide to improve and refine your predictive models for practical implementation

Who This Book Is For

If you are a data scientist, a data analyst, or an R and SPSS user with a good understanding of machine learning concepts, algorithms, and techniques, then this is the book for you. Some basic understanding of Spark and its core elements and application is required.

What You Will Learn

  • Set up Apache Spark for machine learning and discover its impressive processing power
  • Combine Spark and R to unlock detailed business insights essential for decision making
  • Build machine learning systems with Spark that can detect fraud and analyze financial risks
  • Build predictive models focusing on customer scoring and service ranking
  • Build a recommendation system using SPSS on Apache Spark
  • Tackle parallel computing and find out how it can support your machine learning projects
  • Turn open data and communication data into actionable insights by making use of various forms of machine learning

In Detail

There's a reason why Apache Spark has become one of the most popular tools in Machine Learning – its ability to handle huge datasets at an impressive speed means you can be much more responsive to the data at your disposal. This book shows you Spark at its very best, demonstrating how to connect it with R and unlock maximum value not only from the tool but also from your data.

Packed with a range of project "blueprints" that demonstrate some of the most interesting challenges Spark can help you tackle, this book shows you how to use Spark notebooks and how to access, clean, and join different datasets. You then put that knowledge into practice in real-world projects, where you will see how Spark machine learning can help with everything from fraud detection to analyzing customer attrition. You'll also find out how to build a recommendation engine using Spark's parallel computing powers.

Style and approach

This book offers a step-by-step approach to setting up Apache Spark and using other analytical tools with it to process Big Data and build machine learning projects. The initial chapters focus more on the theory of machine learning with Spark, while each of the later chapters focuses on building a standalone project using Spark.


Table of Contents

Apache Spark Machine Learning Blueprints
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the color images of this book
Errata
Piracy
Questions
1. Spark for Machine Learning
Spark overview and Spark advantages
Spark overview
Spark advantages
Spark computing for machine learning
Machine learning algorithms
MLlib
Other ML libraries
Spark RDD and dataframes
Spark RDD
Spark dataframes
Dataframes API for R
ML frameworks, RM4Es and Spark computing
ML frameworks
RM4Es
The Spark computing framework
ML workflows and Spark pipelines
ML as a step-by-step workflow
ML workflow examples
Spark notebooks
Notebook approach for ML
Step 1: Getting the software ready
Step 2: Installing the Knitr package
Step 3: Creating a simple report
Spark notebooks
Summary
2. Data Preparation for Spark ML
Accessing and loading datasets
Accessing publicly available datasets
Loading datasets into Spark
Exploring and visualizing datasets
Data cleaning
Dealing with data incompleteness
Data cleaning in Spark
Data cleaning made easy
Identity matching
Identity issues
Identity matching on Spark
Entity resolution
Short string comparison
Long string comparison
Record deduplication
Identity matching made better
Crowdsourced deduplication
Configuring the crowd
Using the crowd
Dataset reorganizing
Dataset reorganizing tasks
Dataset reorganizing with Spark SQL
Dataset reorganizing with R on Spark
Dataset joining
Dataset joining and its tool – the Spark SQL
Dataset joining in Spark
Dataset joining with the R data table package
Feature extraction
Feature development challenges
Feature development with Spark MLlib
Feature development with R
Repeatability and automation
Dataset preprocessing workflows
Spark pipelines for dataset preprocessing
Dataset preprocessing automation
Summary
3. A Holistic View on Spark
Spark for a holistic view
The use case
Fast and easy computing
Methods for a holistic view
Regression modeling
The SEM approach
Decision trees
Feature preparation
PCA
Grouping by category to use subject knowledge
Feature selection
Model estimation
MLlib implementation
The R notebooks' implementation
Model evaluation
Quick evaluations
RMSE
ROC curves
Results explanation
Impact assessments
Deployment
Dashboard
Rules
Summary
4. Fraud Detection on Spark
Spark for fraud detection
The use case
Distributed computing
Methods for fraud detection
Random forest
Decision trees
Feature preparation
Feature extraction from LogFile
Data merging
Model estimation
MLlib implementation
R notebooks implementation
Model evaluation
A quick evaluation
Confusion matrix and false positive ratios
Results explanation
Big influencers and their impacts
Deploying fraud detection
Rules
Scoring
Summary
5. Risk Scoring on Spark
Spark for risk scoring
The use case
Apache Spark notebooks
Methods of risk scoring
Logistic regression
Preparing coding in R
Random forest and decision trees
Preparing coding
Data and feature preparation
OpenRefine
Model estimation
The DataScientistWorkbench for R notebooks
R notebooks implementation
Model evaluation
Confusion matrix
ROC
Kolmogorov-Smirnov
Results explanation
Big influencers and their impacts
Deployment
Scoring
Summary
6. Churn Prediction on Spark
Spark for churn prediction
The use case
Spark computing
Methods for churn prediction
Regression models
Decision trees and Random forest
Feature preparation
Feature extraction
Feature selection
Model estimation
Spark implementation with MLlib
Model evaluation
Results explanation
Calculating the impact of interventions
Deployment
Scoring
Intervention recommendations
Summary
7. Recommendations on Spark
Apache Spark for a recommendation engine
The use case
SPSS on Spark
Methods for recommendation
Collaborative filtering
Preparing coding
Data treatment with SPSS
Missing data nodes on SPSS modeler
Model estimation
SPSS on Spark – the SPSS Analytics server
Model evaluation
Recommendation deployment
Summary
8. Learning Analytics on Spark
Spark for attrition prediction
The use case
Spark computing
Methods of attrition prediction
Regression models
About regression
Preparing for coding
Decision trees
Preparing for coding
Feature preparation
Feature development
Feature selection
Principal components analysis
Subject knowledge aid
ML feature selection
Model estimation
Spark implementation with the Zeppelin notebook
Model evaluation
A quick evaluation
The confusion matrix and error ratios
Results explanation
Calculating the impact of interventions
Calculating the impact of main causes
Deployment
Rules
Scoring
Summary
9. City Analytics on Spark
Spark for service forecasting
The use case
Spark computing
Methods of service forecasting
Regression models
About regression
Preparing for coding
Time series modeling
About time series
Preparing for coding
Data and feature preparation
Data merging
Feature selection
Model estimation
Spark implementation with the Zeppelin notebook
Spark implementation with the R notebook
Model evaluation
RMSE calculation with MLlib
RMSE calculation with R
Explanations of the results
Biggest influencers
Visualizing trends
The rules of sending out alerts
Scores to rank city zones
Summary
10. Learning Telco Data on Spark
Spark for using Telco Data
The use case
Spark computing
Methods for learning from Telco Data
Descriptive statistics and visualization
Linear and logistic regression models
Decision tree and random...


Apache Spark Machine Learning Blueprints by Alex Liu is available in PDF and/or ePUB format, alongside other popular books in Computer Science & Data Processing.