Kafka in Action
eBook - ePub

Kafka in Action

Dylan Scott, Viktor Gamov, Dave Klein

  1. 272 pages
  2. English

About This Book

Master the wicked-fast Apache Kafka streaming platform through hands-on examples and real-world projects.

In Kafka in Action you will learn:
  • Understanding Apache Kafka concepts
  • Setting up and executing basic ETL tasks using Kafka Connect
  • Using Kafka as part of a large data project team
  • Performing administrative tasks
  • Producing and consuming event streams
  • Working with Kafka from Java applications
  • Implementing Kafka as a message queue

Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you'll soon be ready to use Kafka in your day-to-day workflow, and start digging into even more advanced Kafka topics.

About the technology
Think of Apache Kafka as a high performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large and small-scale applications.

About the book
Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you'll explore the most common use cases such as logging and managing streaming data. When you're done, you'll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.

What's inside
  • Kafka as an event streaming platform
  • Kafka producers and consumers from Java applications
  • Kafka as part of a large data project

About the reader
For intermediate Java developers or data engineers. No prior knowledge of Kafka required.

About the author
Dylan Scott is a software developer in the insurance industry. Viktor Gamov is a Kafka-focused developer advocate. At Confluent, Dave Klein helps developers, teams, and enterprises harness the power of event streaming with Apache Kafka.

Table of Contents
PART 1 GETTING STARTED
1 Introduction to Kafka
2 Getting to know Kafka
PART 2 APPLYING KAFKA
3 Designing a Kafka project
4 Producers: Sourcing data
5 Consumers: Unlocking data
6 Brokers
7 Topics and partitions
8 Kafka storage
9 Management: Tools and logging
PART 3 GOING FURTHER
10 Protecting Kafka
11 Schema registry
12 Stream processing with Kafka Streams and ksqlDB

Information

Publisher
Manning
Year
2022
ISBN
9781638356196

Part 1. Getting started

Part 1 of this book introduces you to Apache Kafka and starts to look at real use cases where Kafka might be a good fit to try out:
  • In chapter 1, we give a detailed description of why you would want to use Kafka, and we dispel some myths you might have heard about Kafka in relation to Hadoop.
  • In chapter 2, we focus on learning about the high-level architecture of Kafka as well as the various other parts that make up the Kafka ecosystem: Kafka Streams, Connect, and ksqlDB.
When you’re finished with this part, you’ll be ready to get started reading and writing messages to and from Kafka. Hopefully, you’ll have picked up some key terminology as well.

1 Introduction to Kafka

This chapter covers
  • Why you might want to use Kafka
  • Common myths of big data and message systems
  • Real-world use cases to help power messaging, streaming, and IoT data processing
As many developers face a world full of data produced from every angle, they often find that legacy systems might not be the best option moving forward. One of the foundational pieces of new data infrastructures that has taken over the IT landscape is Apache Kafka®. Kafka is changing the standards for data platforms. It is leading the way to move from extract, transform, load (ETL) and batch workflows (in which work was often held and processed in bulk at one predefined time) to near-real-time data feeds [1]. Batch processing, which was once the standard workhorse of enterprise data processing, might not be something to turn back to after seeing the powerful feature set that Kafka provides. In fact, you might not be able to handle the growing snowball of data rolling toward enterprises of all sizes unless you adopt a new approach.
With so much data, systems can easily get overloaded. Legacy systems might be faced with nightly processing windows that run into the next day. To keep up with this constant stream of ever-evolving data, processing information as it happens is a way to keep the system’s state current.
Kafka touches many of the newest and most practical trends in today’s IT fields and makes daily work easier. For example, Kafka has already made its way into microservice designs and the Internet of Things (IoT). As a de facto technology for more and more companies, Kafka is not only for super geeks or alpha-chasers. Let’s start by looking at Kafka’s features, introducing Kafka itself, and understanding more about the face of modern-day streaming platforms.

1.1 What is Kafka?

The Apache Kafka site (http://kafka.apache.org/intro) defines Kafka as a distributed streaming platform. It has three main capabilities:
  • Reading and writing records like a message queue
  • Storing records with fault tolerance
  • Processing streams as they occur [2]
Readers who are not as familiar with queues or message brokers in their daily work might need help when discussing the general purpose and flow of such a system. As a generalization, a core piece of Kafka can be thought of as providing the IT equivalent of a receiver that sits in a home entertainment system. Figure 1.1 shows the data flow between receivers and end users.
Figure 1.1 Producers, receivers, and data flow overview
As figure 1.1 shows, digital satellite, cable, and Blu-ray™ players can connect to a central receiver. You can think of those individual pieces as regularly sending data in a format that they know about. That flow of data can be thought of as nearly constant while a movie or CD is playing. The receiver deals with this constant stream of data and converts it into a usable format for the external devices attached to the other end (the receiver sends the video to your television and the audio to a decoder as well as to the speakers). So what does this have to do with Kafka exactly? Let’s look at the same relationship from Kafka’s perspective in figure 1.2.
Figure 1.2 Kafka’s flow from producers to consumers
Kafka includes clients to interface with other systems. One such client type is called a producer, which sends multiple data streams to the Kafka brokers. The brokers serve a similar function as the receiver in figure 1.1. Kafka also includes consumers, clients that can read data from the brokers and process it. Data does not have to be limited to only a single destination. The producers and consumers are completely decoupled, allowing each client to work independently. We’ll dig into the details of how this is done in later chapters.
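The decoupling described here can be illustrated with a small in-memory sketch. It does not use the Kafka client API: `ToyBroker`, `append`, and `readFrom` are hypothetical names invented for this illustration. The broker is modeled as an append-only list, and each consumer reads from its own position, so the producer never needs to know who is reading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a broker; not the Kafka API.
class ToyBroker {
    private final List<String> log = new ArrayList<>();

    // A producer appends a record and gets back its position in the log.
    int append(String record) {
        log.add(record);
        return log.size() - 1;
    }

    // A consumer reads everything from its own position onward.
    List<String> readFrom(int position) {
        return new ArrayList<>(log.subList(position, log.size()));
    }
}

class DecouplingSketch {
    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();

        // The producer writes without knowing whether anyone is reading.
        broker.append("page-view:home");
        broker.append("page-view:cart");

        // Two independent consumers read the same data from the start.
        List<String> analytics = broker.readFrom(0);
        List<String> audit = broker.readFrom(0);

        System.out.println("analytics sees " + analytics);
        System.out.println("audit sees " + audit);
    }
}
```

Because reading does not remove anything from the log, adding a second consumer requires no change on the producer side.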
As do other messaging platforms, Kafka acts (in reductionist terms) like a middleman for data coming into the system (from producers) and out of the system (for consumers or end users). This separation between the producer and the end user of the message is what makes the loose coupling possible. The producer can send whatever message it wants without knowing whether anyone is subscribed. Further, Kafka can deliver messages in various ways to fit your business case. It supports at least the following three delivery methods [3]:
  • At-least-once semantics—A message is sent as needed until it is acknowledged.
  • At-most-once semantics—A message is only sent once and not resent on failure.
  • Exactly-once semantics—A message is only seen once by the consumer of the message.
Let’s dig into what those messaging options mean, starting with at-least-once semantics (figure 1.3). In this case, Kafka can be configured to allow a producer of messages to send the same message more than once and have it written to the brokers. If the producer does not receive a guarantee that a message was written to the broker, it can resend the message [3]. For those cases where you can’t miss a message, say that someone has paid an invoice, this guarantee might take some filtering on the consumer end, but it is one of the safest delivery methods.
Figure 1.3 At-least-once message flow
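The resend-then-filter behavior of at-least-once delivery can be sketched as a plain Java simulation. Nothing here is the Kafka API: the class, the `loseFirstAck` flag, and the retry limit of 5 are all assumptions made for illustration. The write reaches the stand-in broker log, the acknowledgment is lost, and the producer resends, which leaves a duplicate that the consumer filters out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of at-least-once delivery; not the Kafka API.
class AtLeastOnceSketch {
    // Stand-in broker log: every write is stored, even if its ack is lost.
    static final List<String> brokerLog = new ArrayList<>();

    // Stores the message; returns false when the ack is "lost" on the
    // first attempt, so the producer cannot tell that the write succeeded.
    static boolean send(String message, int attempt, boolean loseFirstAck) {
        brokerLog.add(message);
        return !(loseFirstAck && attempt == 1);
    }

    // Resends until an acknowledgment arrives (bounded at 5 attempts).
    static int sendWithRetry(String message, boolean loseFirstAck) {
        int attempt = 0;
        boolean acked = false;
        while (!acked && attempt < 5) {
            attempt++;
            acked = send(message, attempt, loseFirstAck);
        }
        return attempt;
    }

    // Consumer-side filtering: keep only the first copy of each message.
    // A real system would more likely compare a unique message key.
    static List<String> deduplicate(List<String> records) {
        Set<String> seen = new LinkedHashSet<>(records);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        sendWithRetry("invoice-42:paid", true);   // ack lost once, so sent twice
        sendWithRetry("invoice-43:paid", false);  // acked on the first try
        System.out.println("broker log: " + brokerLog);
        System.out.println("after filtering: " + deduplicate(brokerLog));
    }
}
```

The trade-off is visible in the log: no message is lost, but the consumer has to tolerate or remove duplicates.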
At-most-once semantics (figure 1.4) is when a producer of messages might send a message once and never retry. In the event of a failure, the producer moves on and doesn’t attempt to send it again [3]. Why would someone be okay with losing a message? If a popular website is tracking page views for visitors, it might be okay with missing a few page view events out of the millions it processes each day. Keeping the system performing and not waiting on acknowledgments might outweigh any cost of lost data.
Figure 1.4 At-most-once message flow
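For contrast, at-most-once delivery can be sketched the same way. This is again a simulation, not the Kafka API, and the `failsAt` parameter is a hypothetical device for marking the one send that gets dropped. The producer makes a single attempt per event, never checks for an acknowledgment, and moves on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of at-most-once delivery; not the Kafka API.
class AtMostOnceSketch {
    // Sends each event a single time and never retries. The hypothetical
    // failsAt index marks the one send that is dropped in transit.
    static List<String> fireAndForget(List<String> events, int failsAt) {
        List<String> brokerLog = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            boolean delivered = (i != failsAt);
            if (delivered) {
                brokerLog.add(events.get(i));
            }
            // No acknowledgment check and no resend: the producer moves on.
        }
        return brokerLog;
    }

    public static void main(String[] args) {
        List<String> views = List.of("view:/home", "view:/pricing", "view:/docs");
        List<String> stored = fireAndForget(views, 1);
        System.out.println("sent " + views.size() + ", stored " + stored.size());
        System.out.println("stored: " + stored);
    }
}
```

The log never contains duplicates, but the dropped page view is gone for good, which is the cost accepted in exchange for not waiting on acknowledgments.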
Kafka added exactly-once semantics, also known as EOS, to its feature set in version 0.11.0. EOS generated a lot of mixed discussion with its release [3]. On the one hand, exactly-once semantics (figure 1.5) are ideal for a lot of use cases. This seemed like a logical guarantee for removing duplicate messages, making them a thing of the past. After all, most developers appreciate sending one message and receiving that same message on the consuming side as well.
Figure 1.5 Exactly-once message flow
Another discussion that followed the release of EOS was a debate on whether exactly once was even possible. Although this goes into deeper computer science theory, it is helpful to be aware of how Kafka defines its EOS feature [4]. If a producer sends a message more than once, it will still be delivered only once to the end consumer. EOS has touchpoints at all Kafka layers (producers, topics, brokers, and consumers) and will be tackled briefly later in this book.
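As a rough orientation, the sketch below collects client settings that are commonly associated with exactly-once processing in Kafka. The text above does not list them, so treat the selection as an assumption: `enable.idempotence`, `acks`, and `transactional.id` are producer configuration keys, `isolation.level` is a consumer configuration key, and the value `order-service-tx-1` is a made-up placeholder. The code only builds `java.util.Properties` objects and does not connect to a cluster.

```java
import java.util.Properties;

// Illustrative settings only; the chosen keys and values are assumptions
// about a typical exactly-once setup, not a complete configuration.
class ExactlyOnceConfigSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        // Lets the broker discard duplicates caused by producer retries.
        props.setProperty("enable.idempotence", "true");
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        // Placeholder ID (hypothetical) that identifies a transactional producer.
        props.setProperty("transactional.id", "order-service-tx-1");
        return props;
    }

    static Properties consumerProps() {
        Properties props = new Properties();
        // Read only messages from committed transactions.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
    }
}
```

Splitting the settings between producer and consumer mirrors the point that EOS touches more than one layer of the system.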
Besides various delivery options, another common message broker benefit is that if the consuming application is down due to errors or maintenance, the producer does not need to wait on the consumer to handle the message. When consumers start to come back online and process data, they should be able to pick up where they left off and not drop any messages.
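A minimal sketch of this pick-up-where-you-left-off behavior follows. It is not the Kafka client API: `ResumeSketch`, `produce`, and `poll` are hypothetical names, and a single class holds both the stand-in log and one consumer's saved position to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a consumer resuming from a saved position.
class ResumeSketch {
    private final List<String> log = new ArrayList<>();  // stand-in broker log
    private int nextOffset = 0;  // where this consumer will read next

    // The producer appends without waiting for the consumer.
    void produce(String record) {
        log.add(record);
    }

    // Returns all unread records and advances the saved position.
    List<String> poll() {
        List<String> batch = new ArrayList<>(log.subList(nextOffset, log.size()));
        nextOffset = log.size();
        return batch;
    }

    public static void main(String[] args) {
        ResumeSketch system = new ResumeSketch();

        system.produce("order-1");
        System.out.println("online: " + system.poll());

        // The consumer is down for maintenance; the producer keeps writing.
        system.produce("order-2");
        system.produce("order-3");

        // Back online: the consumer resumes after the last record it read.
        System.out.println("resumed: " + system.poll());
    }
}
```

Because the position is stored separately from the log, downtime on the consuming side neither blocks the producer nor causes records to be skipped.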

1.2 Kafka usage

With many traditional companies facing the challenges of becoming more and more technical and software driven, one question is foremost: how will they be prepared for the future? One possible answer is Kafka. Kafka is noted for being a high-p...
