Big Data Analytics with Java
eBook - ePub

Rajat Mehta

  1. 418 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Learn the basics of analytics on big data using Java, machine learning, and other big data tools.

• Acquire a real-world set of tools for building enterprise-level data science applications
• Surpass the barrier of other languages in data science and learn to create useful object-oriented code
• Make extensive use of Java-compliant big data tools such as Apache Spark, Hadoop, etc.

Who This Book Is For

This book is for Java developers who are looking to perform data analysis in a production environment. Those who wish to implement data analysis in their big data applications will find this book helpful.

What You Will Learn

• Start from simple analytic tasks on big data
• Get into more complex tasks with predictive analytics on big data using machine learning
• Learn real-time analytic tasks
• Understand the concepts with examples and case studies
• Prepare and refine data for analysis
• Create charts in order to understand the data
• See various real-world datasets

In Detail

This book covers case studies such as sentiment analysis on a tweet dataset, recommendations on a MovieLens dataset, customer segmentation on an e-commerce dataset, and graph analysis on an actual flights dataset.

This book is an end-to-end guide to implementing analytics on big data with Java. Java is the de facto language for major big data environments, including Hadoop. This book will teach you how to perform analytics on big data with production-friendly Java. The book is divided into two sections. The first part is an introduction that helps readers get acquainted with big data environments, whereas the second part contains an in-depth discussion of the concepts in analytics on big data. It will take you from data analysis and data visualization to the core concepts and advantages of machine learning, real-life usage of regression and classification using Naive Bayes, a deep discussion of the concepts of clustering, and a review of simple neural networks on big data using Deeplearning4j or plain Java Spark code. This book is a must-have for Java developers who want to start learning big data analytics and use it in the real world.

Style and approach

The approach of the book is to deliver practical learning modules in manageable content. Each chapter is a self-contained unit covering one concept in big data analytics. The book builds competency in big data analytics step by step. Examples use real-world case studies to give ideas of real applications and show how to use the techniques mentioned. The examples and case studies are shown using both theory and code.

Frequently asked questions

Is Big Data Analytics with Java an online PDF/ePUB?
Yes, you can access Big Data Analytics with Java by Rajat Mehta in PDF and/or ePUB format, as well as other popular books in Commerce & Business Intelligence. We have over one million books available in our catalogue for you to explore.

Information

Year
2017
ISBN
9781787282193
Edition
1

Table of Contents

Big Data Analytics with Java
Credits
About the Author
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Big Data Analytics with Java
Why data analytics on big data?
Big data for analytics
Big data – a bigger pay package for Java developers
Basics of Hadoop – a Java sub-project
Distributed computing on Hadoop
HDFS concepts
Design and architecture of HDFS
Main components of HDFS
HDFS simple commands
Apache Spark
Concepts
Transformations
Actions
Spark Java API
Spark samples using Java 8
Loading data
Data operations – cleansing and munging
Analyzing data – count, projection, grouping, aggregation, and max/min
Actions on RDDs
Paired RDDs
Transformations on paired RDDs
Saving data
Collecting and printing results
Executing Spark programs on Hadoop
Apache Spark sub-projects
Spark machine learning modules
MLlib Java API
Other machine learning libraries
Mahout – a popular Java ML library
Deeplearning4j – a deep learning library
Compressing data
Avro and Parquet
Summary
2. First Steps in Data Analysis
Datasets
Data cleaning and munging
Basic analysis of data with Spark SQL
Building SparkConf and context
Dataframe and datasets
Load and parse data
Analyzing data – the Spark-SQL way
Spark SQL for data exploration and analytics
Market basket analysis – Apriori algorithm
Full Apriori algorithm
Implementation of the Apriori algorithm in Apache Spark
Efficient market basket analysis using FP-Growth algorithm
Running FP-Growth on Apache Spark
Summary
3. Data Visualization
Data visualization with Java JFreeChart
Using charts in big data analytics
Time Series chart
All India seasonal and annual average temperature series dataset
Simple single Time Series chart
Multiple Time Series on a single chart window
Bar charts
Histograms
When would you use a histogram?
How to make histograms using JFreeChart?
Line charts
Scatter plots
Box plots
Advanced visualization technique
Prefuse
IVTK Graph toolkit
Other libraries
Summary
4. Basics of Machine Learning
What is machine learning?
Real-life examples of machine learning
Type of machine learning
A small sample case study of supervised and unsupervised learning
Steps for machine learning problems
Choosing the machine learning model
What are the feature types that can be extracted from the datasets?
How do you select the best features to train your models?
How do you run machine learning analytics on big data?
Getting and preparing data in Hadoop
Preparing the data
Formatting the data
Storing the data
Training and storing models on big data
Apache Spark machine learning API
The new Spark ML API
Summary
5. Regression on Big Data
Linear regression
What is simple linear regression?
Where is linear regression used?
Predicting house prices using linear regression
Dataset
Data cleaning and munging
Exploring the dataset
Running and testing the linear regression model
Logistic regression
Which mathematical functions does logistic regression use?
Where is logistic regression used?
Predicting heart disease using logistic regression
Dataset
Data cleaning and munging
Data exploration
Running and testing the logistic regression model
Summary
6. Naive Bayes and Sentiment Analysis
Conditional probability
Bayes theorem
Naive Bayes algorithm
Advantages of Naive Bayes
Disadvantages of Naive Bayes
Sentimental analysis
Concepts for sentimental analysis
Tokenization
Stop words removal
Stemming
N-grams
Term presence and Term Frequency
TF-IDF
Bag of words
Dataset
Data exploration of text data
Sentimental analysis on this dataset
SVM or Support Vector Machine
Summary
7. Decision Trees
What is a decision tree?
Building a decision tree
Choosing the best features for splitting the datasets
Advantages of using decision trees
Disadvantages of using decision trees
Dataset
Data exploration
Cleaning and munging the data
Training and testing the model
Summary
8. Ensembling on Big Data
Ensembling
Types of ensembling
Bagging
Boosting
Advantages and disadvantages of ensembling
Random forests
Gradient boosted trees (GBTs)
Classification problem and dataset used
Data exploration
Training and testing our random forest model
Training and testing our gradient boosted tree model
Summary
9. Recommendation Systems
Recommendation systems and their types
Content-based recommendation systems
Dataset
Content-based recommender on MovieLens dataset
Collaborative recommendation systems
Advantages
Disadvantages
Alternating least square – collaborative filtering
Summary
10. Clustering and Customer Segmentation on Big Data
Clustering
Types of clustering
Hierarchical clustering
K-means clustering
Bisecting k-means clustering
Customer segmentation
Dataset
Data exploration
Clustering for customer segmentation
Changing the clustering algorithm
Summary
11. Massive Graphs on Big Data
Refresher on graphs
Representing graphs
Common terminology on graphs
Common algorithms on graphs
Plotting graphs
Massive graphs on big data
Graph analytics
GraphFrames
Building a graph using GraphFrames
Graph analytics on airports and their flights
Datasets
Graph analytics on flights data
Summary
12. Real-Time Analytics on Big Data
Real-time analytics
Big data stack for real-time analytics
Real-time SQL queries on big data
Real-time data ingestion and storage
Real-time data processing
Real-time SQL queries using Impala
Flight delay analysis using Impala
Apache Kafka
Spark Streaming
Typical uses of Spark Streaming
Base project setup
Trending videos
Sentiment analysis in real time
Summary
13. Deep Learning Using Big Data
Introduction to neural networks
Perceptron
Problems with perceptrons
Sigmoid neuron
Multi-layer perceptrons
Accuracy of mult...

Citation styles for Big Data Analytics with Java

APA 6 Citation

Mehta, R. (2017). Big Data Analytics with Java (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/527063/big-data-analytics-with-java-pdf (Original work published 2017)

Chicago Citation

Mehta, Rajat. (2017) 2017. Big Data Analytics with Java. 1st ed. Packt Publishing. https://www.perlego.com/book/527063/big-data-analytics-with-java-pdf.

Harvard Citation

Mehta, R. (2017) Big Data Analytics with Java. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/527063/big-data-analytics-with-java-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Mehta, Rajat. Big Data Analytics with Java. 1st ed. Packt Publishing, 2017. Web. 14 Oct. 2022.