Grokking Streaming Systems
eBook - ePub

Real-time event processing

Josh Fischer , Ning Wang

  1. 312 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

A friendly, framework-agnostic tutorial that will help you grok how streaming systems work and how to build your own!

In Grokking Streaming Systems you will learn how to:
  • Implement and troubleshoot streaming systems
  • Design streaming systems for complex functionalities
  • Assess parallelization requirements
  • Spot networking bottlenecks and resolve backpressure
  • Group data for high-performance systems
  • Handle delayed events in real-time systems

Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that's a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities!

About the technology
Streaming systems minimize the time between receiving and processing event data, so they can deliver responses in real time. For applications in finance, security, and IoT where milliseconds matter, streaming systems are a requirement. And streaming is hot! Skills on platforms like Spark, Heron, and Kafka are in high demand.

About the book
Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you'll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services.

What's inside
  • Implement and troubleshoot streaming systems
  • Design streaming systems for complex functionalities
  • Spot networking bottlenecks and resolve backpressure
  • Group data for high-performance systems

About the reader
No prior experience with streaming systems is assumed. Examples in Java.

About the author
Josh Fischer and Ning Wang are Apache Committers, and part of the committee for the Apache Heron distributed stream processing engine.

Table of Contents
PART 1 GETTING STARTED WITH STREAMING
1 Welcome to Grokking Streaming Systems
2 Hello, streaming systems!
3 Parallelization and data grouping
4 Stream graph
5 Delivery semantics
6 Streaming systems review and a glimpse ahead
PART 2 STEPPING UP
7 Windowed computations
8 Join operations
9 Backpressure
10 Stateful computation
11 Wrap-up: Advanced concepts in streaming systems

Information

Publisher: Manning
Year: 2022
ISBN: 9781638356493

Part 1. Getting started with streaming

Part 1 of this book drops you head-first into the world of streaming systems. It helps you answer questions such as “Why do streaming systems work this way?” and “Why would I ever use them?” Chapter 1 describes, at a high level, what sets streaming systems apart from other systems. Chapter 2 is the hello world of streaming, where we walk you through the fundamentals of how these streaming systems work. Chapter 3 describes how to scale out these systems, and chapter 4 shows you how data can traverse streaming jobs. Chapter 5 spells out how these systems can help you reliably deliver data in real time, and chapter 6 recaps the important points from each chapter. By the end of part 1, you will have the knowledge necessary to jump into any streaming framework of your choice and hit the ground running.

1 Welcome to Grokking Streaming Systems

In this chapter
  • an introduction to stream processing
  • differentiating between stream processing systems and other systems
“If it weren’t for the rocks in its bed, the stream would have no song.”
—Carl Perkins
In this chapter, we will try to answer a few basic questions about streaming systems, starting with “what is stream processing?” and “what are these stream processing systems, or streaming systems, used for?” The objective is to cover some basic ideas that will be discussed in later chapters.

What is stream processing?

Stream processing has been one of the most popular technologies in the big data domain in recent years. Streaming systems are the computer systems that process continuous event streams.
A key characteristic of stream processing is that the events are processed as soon as (or almost as soon as) they are available. This is to minimize the latency between the original event’s entrance into the streaming system and the end result from processing the event. In most cases, the latency varies from a few milliseconds to seconds, which can be considered real-time or near real-time; hence, stream processing is also called real-time processing. From the usage point of view, stream processing is typically used for analyzing different types of events. As a result, the terms real-time analytics, streaming analytics, and event processing might also be used to reference stream processing systems in different scenarios. In this book, stream processing is the chosen term, which is widely adopted in the industry.
Here are a few examples of events:
  • The mouse clicks on a computer
  • The taps and swipes on a cell phone
  • The trains arriving at and leaving a station
  • The messages and emails sent out by a person
  • The temperatures collected by sensors in a laboratory
  • The interactions on a website (page views, user logins, clicks, and so on) from all users
  • The logs generated by computer servers in a data center
  • The transactions of all accounts in a bank
Note that, typically, there isn’t a predetermined ending time for the events processed in streaming systems. You can think of them as never-ending; hence, the events are often considered continuous and unbounded. Events are everywhere—literally. We are living in the information age. A lot of data is generated, collected, and processed all the time.
Think about it
Stream processing systems are the computer systems designed to process continuous event streams.
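
To make the idea of processing events as soon as they are available more concrete, here is a minimal sketch in Java. It is not code from the book: the names EventLoopSketch, Event, and run are hypothetical, and the clock is passed in as a parameter only so the latency numbers are reproducible. A real event source would be unbounded; the loop below ends only because the demo source is a finite list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical sketch: handle each event as soon as it is available. */
final class EventLoopSketch {

    /** A minimal event: what happened and when it was created (in ms). */
    static final class Event {
        final String payload;
        final long createdAtMillis;

        Event(String payload, long createdAtMillis) {
            this.payload = payload;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /**
     * Pulls events one at a time and processes each one immediately,
     * recording the latency between event creation and processing.
     * The clock is an assumption made for testability.
     */
    static List<String> run(Iterator<Event> source, LongSupplier clock) {
        List<String> results = new ArrayList<>();
        while (source.hasNext()) {
            Event event = source.next();
            long latencyMillis = clock.getAsLong() - event.createdAtMillis;
            results.add(event.payload + " processed after " + latencyMillis + " ms");
        }
        return results;
    }

    public static void main(String[] args) {
        List<Event> demo = Arrays.asList(
            new Event("click", 1_000L),
            new Event("swipe", 1_040L));
        // A fixed clock keeps the demo deterministic.
        for (String line : run(demo.iterator(), () -> 1_050L)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is the shape of the loop: there is no step that waits for all the data to arrive before work starts, which is what keeps the latency per event low.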

Streaming system examples

Let’s look at two examples:
  • The first example is a temperature-monitoring system in a laboratory. Many sensors are installed in different locations to collect temperature data every second. The streaming system is built to process the collected data and display the real-time information in a dashboard. It can also trigger alerts when any anomaly is detected. Laboratory administrators use the system to monitor all the rooms and make sure the temperature is in the right range.
  • The second example is the monitoring and analyzing systems that process user interactions, such as page views, user logins, or button clicks on a website. When you visit a website, it is common that a lot of events are logged. These raw events often have many fields, so they are not efficient to digest directly. Also, some of the fields are not human-readable and need to be translated before they are consumed. In this context, streaming systems are very helpful for converting the raw event data into more useful information, such as the number of requests, active users, views on each page, and suspicious user behaviors.
In the examples above, streaming systems process a huge number of events to dig out, in real time, the information hidden in the data. This is what makes them valuable: the events carry a lot of information, and in many cases getting it in real time is critical.
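
As a rough illustration of the website example, the sketch below turns raw events into a running count of views per page, updating the count as each event arrives. It is a sketch under stated assumptions, not the book's implementation: the class PageViewCounter, the comma-separated event layout (timestamp, user id, event type, page), and the event type name page_view are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: convert raw website events into views per page. */
final class PageViewCounter {
    private final Map<String, Long> viewsPerPage = new HashMap<>();

    /**
     * Handles one raw event. The assumed (made-up) format is
     * "timestamp,userId,eventType,page".
     */
    void onEvent(String rawEvent) {
        String[] fields = rawEvent.split(",");
        if (fields.length != 4 || !"page_view".equals(fields[2])) {
            return; // ignore malformed events and other event types
        }
        // Update the running count for this page.
        viewsPerPage.merge(fields[3], 1L, Long::sum);
    }

    /** Returns the number of views seen so far for the given page. */
    long viewsFor(String page) {
        return viewsPerPage.getOrDefault(page, 0L);
    }

    public static void main(String[] args) {
        PageViewCounter counter = new PageViewCounter();
        counter.onEvent("1700000000,u1,page_view,/home");
        counter.onEvent("1700000001,u2,login,/login");
        counter.onEvent("1700000002,u2,page_view,/home");
        System.out.println("/home views: " + counter.viewsFor("/home"));
    }
}
```

Because the count is updated per event, a dashboard reading viewsFor always sees an up-to-date number without reprocessing the full event log.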

Streaming systems and real time

A streaming system refers to a system that extracts useful information from continuous streams of events. More specifically, as we mentioned at the beginning of this section, we would like streaming systems to process the events and generate results as soon as possible after the events are collected. This is desirable because it allows the results to be available with minimal delays and the proper reactions to be performed in time. Their real-time nature makes streaming systems very useful in many scenarios, such as the laboratory and the website, where low-latency results are desired.
In the laboratory, the monitoring system can trigger alerts, start backup devices automatically, and notify the administrators, when necessary. If failed equipment is not repaired or replaced in time and the temperature is not under control, the temperature-sensitive devices and samples could be affected or damaged. Some ongoing experiments may be interrupted as well. For a website, in addition to monitoring issues, charts and dashboards generated by streaming systems could be helpful for developers to understand how users engage with the website so they can improve their products accordingly.
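
The laboratory scenario can be sketched in the same spirit. The code below checks each temperature reading as it arrives and records an alert when the value leaves an allowed range. The class TemperatureMonitor, its method names, and the range used in the demo are hypothetical, and a range check is only one simple stand-in for anomaly detection; a real system would also notify administrators or start backup devices instead of collecting strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: raise an alert as soon as a reading is out of range. */
final class TemperatureMonitor {
    private final double minCelsius;
    private final double maxCelsius;
    private final List<String> alerts = new ArrayList<>();

    TemperatureMonitor(double minCelsius, double maxCelsius) {
        this.minCelsius = minCelsius;
        this.maxCelsius = maxCelsius;
    }

    /** Handles one sensor reading immediately, without batching. */
    void onReading(String sensorId, double celsius) {
        if (celsius < minCelsius || celsius > maxCelsius) {
            alerts.add(sensorId + " out of range: " + celsius + " C");
        }
    }

    /** Read-only view of the alerts raised so far. */
    List<String> alerts() {
        return Collections.unmodifiableList(alerts);
    }

    public static void main(String[] args) {
        TemperatureMonitor monitor = new TemperatureMonitor(2.0, 8.0);
        monitor.onReading("room-1", 4.0);
        monitor.onReading("room-2", 9.5);
        for (String alert : monitor.alerts()) {
            System.out.println(alert);
        }
    }
}
```

Since every reading is checked on arrival, the delay between a temperature problem and its alert is bounded by how often the sensors report, not by a batch schedule.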

How a streaming system works

After seeing some examples of events and streaming systems, you should now have some ideas about what streaming systems are. The next few pages will show you how streaming systems work from a very high level by comparing them with other types of systems.
Comparison of four typical computer systems
You’ll find that stream processing systems and other computer systems have many things in common. After all, a streaming system is still a computer system. Below are a few typical systems we chose to compare:
  • Applications
  • Backend service...
