Real-time Analytics with Storm and Cassandra
eBook - ePub

Shilpi Saxena

  • 220 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About This Book

  • Create your own data processing topology and implement it in various real-time scenarios using Storm and Cassandra
  • Build highly available and linearly scalable applications using Storm and Cassandra that will process voluminous data at lightning speed
  • A pragmatic, example-oriented guide to implementing various applications built with Storm and Cassandra

Who This Book Is For

If you want to efficiently use Storm and Cassandra together and excel at developing production-grade, distributed real-time applications, then this book is for you. No prior knowledge of using Storm and Cassandra together is necessary. However, a background in Java is expected.

Information

Year: 2015
ISBN: 9781784390006

Table of Contents

Real-time Analytics with Storm and Cassandra
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Let's Understand Storm
Distributed computing problems
Real-time business solution for credit or debit card fraud detection
Aircraft Communications Addressing and Reporting system
Healthcare
Other applications
Solutions for complex distributed use cases
The Hadoop solution
A custom solution
Licensed proprietary solutions
Other real-time processing tools
A high-level view of various components of Storm
Delving into the internals of Storm
Quiz time
Summary
2. Getting Started with Your First Topology
Prerequisites for setting up Storm
Components of a Storm topology
Spouts
Bolts
Streams
Tuples – the data model in Storm
Executing a sample Storm topology – local mode
WordCount topology from the Storm-starter project
Executing the topology in the distributed mode
Set up Zookeeper (V 3.3.5) for Storm
Setting up Storm in the distributed mode
Launching Storm daemons
Executing the topology from Command Prompt
Tweaking the WordCount topology to customize it
Quiz time
Summary
3. Understanding Storm Internals by Examples
Customizing Storm spouts
Creating FileSpout
Tweaking WordCount topology to use FileSpout
The SocketSpout class
Anchoring and acking
The unreliable topology
Stream groupings
Local or shuffle grouping
Fields grouping
All grouping
Global grouping
Custom grouping
Direct grouping
Quiz time
Summary
4. Storm in a Clustered Mode
The Storm cluster setup
Zookeeper configurations
Cleaning up Zookeeper
Storm configurations
Storm logging configurations
The Storm UI
Section 1
Section 2
Section 3
Section 4
The visualization section
Storm monitoring tools
Quiz time
Summary
5. Storm High Availability and Failover
An overview of RabbitMQ
Installing the RabbitMQ cluster
Prerequisites for the setup of RabbitMQ
Setting up a RabbitMQ server
Testing the RabbitMQ server
Creating a RabbitMQ cluster
Enabling the RabbitMQ UI
Creating mirror queues for high availability
Integrating Storm with RabbitMQ
Creating a RabbitMQ feeder component
Wiring the topology for the AMQP spout
Building high availability of components
High availability of the Storm cluster
Guaranteed processing of the Storm cluster
The Storm isolation scheduler
Quiz time
Summary
6. Adding NoSQL Persistence to Storm
The advantages of Cassandra
Columnar database fundamentals
Types of column families
Types of columns
Setting up the Cassandra cluster
Installing Cassandra
Multiple data centers
Prerequisites for setting up multiple data centers
Installing Cassandra data centers
Introduction to CQLSH
Introduction to CLI
Using different client APIs to access Cassandra
Storm topology wired to the Cassandra store
The best practices for Storm/Cassandra applications
Quiz time
Summary
7. Cassandra Partitioning, High Availability, and Consistency
Consistent hashing
One or more node goes down
One or more node comes back up
Replication in Cassandra and strategies
Cassandra consistency
Write consistency
Read consistency
Consistency maintenance features
Quiz time
Summary
8. Cassandra Management and Maintenance
Cassandra – gossip protocol
Bootstrapping
Failure scenario handling – detection and recovery
Cassandra cluster scaling – adding a new node
Cassandra cluster – replacing a dead node
The replication factor
The nodetool commands
Cassandra fault tolerance
Cassandra monitoring systems
JMX monitoring
Datastax OpsCenter
Quiz time
Summary
9. Storm Management and Maintenance
Scaling the Storm cluster – adding new supervisor nodes
Scaling the Storm cluster and rebalancing the topology
Rebalancing using the GUI
Rebalancing using the CLI
Setting up workers and parallelism to enhance processing
Scenario 1
Scenario 2
Scenario 3
Storm troubleshooting
The Storm UI
Storm logs
Quiz time
Summary
10. Advance Concepts in Storm
Building a Trident topology
Understanding the Trident API
Local partition manipulation operation
Functions
Filters
partitionAggregate
Sum aggregate
CombinerAggregator
ReducerAggregator
Aggregator
Operations related to stream repartitioning
Data aggregations over the streams
Grouping over a field in a stream
Merge and join
Examples and illustrations
Quiz time
Summary
11. Distributed Cache and CEP with Storm
The need for distributed caching in Storm
Introduction to memcached
Setting up memcache
Building a topology with a cache
Introduction to the complex event processing engine
Esper
Getting started with Esper
Integrating Esper with Storm
Quiz time
Summary
A. Quiz Answers
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Index

Real-time Analytics with Storm and Cassandra

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book ma...
