Scalable Data Streaming with Amazon Kinesis
eBook - ePub

Scalable Data Streaming with Amazon Kinesis

Design and secure highly available, cost-effective data streaming applications with Amazon Kinesis

  1. 314 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Scalable Data Streaming with Amazon Kinesis

Design and secure highly available, cost-effective data streaming applications with Amazon Kinesis

About this book

Explore Kinesis managed services such as Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and Kinesis Video Streams with the help of practical use cases

Key Features

  • Get well versed with the capabilities of Amazon Kinesis
  • Explore the monitoring, scaling, security, and deployment patterns of various Amazon Kinesis services
  • Learn how other Amazon Web Services and third-party applications such as Splunk can be used as destinations for Kinesis data

Book Description

Amazon Kinesis is a collection of secure, serverless, durable, and highly available purpose-built data streaming services. This data streaming service provides APIs and client SDKs that enable you to produce and consume data at scale.

Scalable Data Streaming with Amazon Kinesis begins with a quick overview of the core concepts of data streams, along with the essentials of the AWS Kinesis landscape. You'll then explore the requirements of the use case shown through the book to help you get started and cover the key pain points encountered in the data stream life cycle. As you advance, you'll get to grips with the architectural components of Kinesis, understand how they are configured to build data pipelines, and delve into the applications that connect to them for consumption and processing. You'll also build a Kinesis data pipeline from scratch and learn how to implement and apply practical solutions. Moving on, you'll learn how to configure Kinesis on a cloud platform. Finally, you'll learn how other AWS services can be integrated into Kinesis. These services include Redshift, Dynamo Database, AWS S3, Elastic Search, and third-party applications such as Splunk.

By the end of this AWS book, you'll be able to build and deploy your own Kinesis data pipelines with Kinesis Data Streams (KDS), Kinesis Data Firehose (KFH), Kinesis Video Streams (KVS), and Kinesis Data Analytics (KDA).

What you will learn

  • Get to grips with data streams, decoupled design, and real-time stream processing
  • Understand the properties of KFH that differentiate it from other Kinesis services
  • Monitor and scale KDS using CloudWatch metrics
  • Secure KDA with identity and access management (IAM)
  • Deploy KVS as infrastructure as code (IaC)
  • Integrate services such as Redshift, Dynamo Database, and Splunk into Kinesis

Who this book is for

This book is for solutions architects, developers, system administrators, data engineers, and data scientists looking to evaluate and choose the most performant, secure, scalable, and cost-effective data streaming technology to overcome their data ingestion and processing challenges on AWS. Prior knowledge of cloud architectures on AWS, data streaming technologies, and architectures is expected.

Trusted by 375,005 students

Access to over 1.5 million titles for a fair monthly price.

Study more efficiently using our study tools.

Information

Year
2021
Print ISBN
9781800565401
Edition
1
eBook ISBN
9781800564336

Section 1: Introduction to Data Streaming and Amazon Kinesis

In this section, you will be introduced to the concept of data streams and how they are used to create scalable data solutions. 
This section comprises the following chapters:
  • Chapter 1, What Are Data Streams?
  • Chapter 2, Messaging and Data Streaming in AWS
  • Chapter 3, The SmartCity Bike-Sharing Service

Chapter 1: What Are Data Streams?

A data stream is a system where data continuously flows from multiple sources, just like water flows through a stream. The data is often produced and collected simultaneously in a continuous flow of many small files or records. Data streams are utilized by a wide range of business, medical, government, social media, and mobile applications. These applications include financial applications for the stock market and e-commerce ordering systems that collect orders and cover fulfillment of delivery.
In the entertainment space, live data is produced by sensing devices embedded in player equipment, video game players generate large amounts of data at a massive scale, and there are new social media posts thousands of times per second. Governments also leverage streaming data and geospatial services to monitor land, wildlife, and other activities.
Data volume and velocity are increasing at faster rates, creating new challenges in data processing and analytics. This book will detail these challenges and demonstrate how Amazon Kinesis can be used to address them. We will begin by discussing key concepts related to messaging in a technology-agnostic form to provide a solid foundation for building your Kinesis knowledge.
Incorporating data streams into your application architecture will allow you to deliver high-performance solutions that are secure, scalable, and fast. In this chapter, we will cover core streaming concepts so that you will have a detailed understanding of their application to distributed systems. You will learn what a data stream is, how to leverage data streams to scale, and examine a number of high-level use cases.
This chapter covers the following topics:
  • Introducing data streams
  • Challenges associated with distributed systems
  • Overview of messaging concepts
  • Examples of data streaming

Introducing data streams

Data streams are a way of storing a sequence of messages. They enable us to design systems where we think about state as a series of events instead of only entities and values, or rows and columns in a database. This shift in mindset and technology enables real-time analytics to extract the value from data by acting on it before it is stale. They also enable organizations to design and develop resilient software based on microservice architectures by helping them to decouple systems. We will begin with an overview of streaming data sources, why real-time data analysis is valuable, and how they can be used architecturally to decouple systems. We will then review the core challenges associated with distributed systems, and conclude with an overview of key messaging concepts and some high-level examples. Messages can contain a wide variety of information and come from different sources, so let's look at the primary sources and data formats.

Sources of data

The proliferation of data steadily increases from sources such as social media, IoT devices, web clickstreams, application logs, and video cameras. This data poses challenges to most systems, since it is typically high-velocity, intermittent, and bursty, making it difficult to adequately provision and design downstream systems. Payloads are generally small, except when containing audio or video data, and come in a variety of formats.
In this book, we will be focusing on three data formats. These formats include the following:
  • JavaScript Object Notation (JSON)
  • Log files
  • Time-encoded binary files such as video

JSON streams

JSON has become the dominant format for message serialization over the past 10 years. It is a lightweight data interchange format that is easy for humans to read and write and is based on the JavaScript object syntax. It has two data structures – hash tables and lists. A hash table consists of key-value pairs, {"key":"value"}, where the keys must be unique. A list is a set of values in a specific order, ["value 1", "value 2"]. The following code sample shows a sample IoT JSON message:
{
"deviceid" : "device001",
"eventTime": -192778200,
"temp" : 68.4,
"humidity" : 77.3,
"coords" : {
"latitude" : 32.779039,
"longitude" : -96.808660
}
}

Log file streams

Log files come in a variety of formats. Common ones include Apache Commons Logging, Apache Combined Log, Apache Error Log, and RFC3164 Syslog. They are plain text, and usually each line, delineated by a newline ('\n') character, is a separate log entry. In the following sample log, we see an HTTP GET request where the IP address is 10.13.37.01, the datetime of the request, the HTTP verb, the URL fragment, the HTTP version, the response code, and the size of the result.
The sample log line in Apache Commons Logging format is as follows:
10.13.37.01 - - [03/Sep/2017:12:00:01 +0830] "GET /mailman/listinfo/test HTTP/1.1" 200 2457

Time-encoded binary streams

Time-encoded binary streams consist of a time series of records where each record is related to the adjacent records (prior and subsequent records). These can be used for a wide variety of sensor data, from audio streams and RADAR signals to video streams. Throughout this book, the primary focus will be video streams and their applications.
Figure 1.1 – Time-encoded video data
Figure 1.1 – Time-encoded video data
As shown in Figure 1.1, video streams are composed of fragments, where each fragment is a self-contained sequence of media frames. There are no dependencies between fragments. We will discuss video streams in more detail in Chapter 7, Kinesis Video Streams. Now that we've covered the types of data that we'll be processing, let's take a step back to understand the value of real-time data in analytics.

The value of real-time data in analytics

Analysis is done to support decision making by individuals, organizations, or computer programs. Traditionally, data analysis has been done on batches of data, usually in long-running jobs that occur overnight and that happen periodically at predetermined times: nightly, weekly, quarterly, and so on. This not only limits the scope of actions available to decisions makers, but it is also only providing them with a representation of the past environment. Information is now available seconds after it is produced, so we need to design systems that provide decision makers with the freshest data available to make timely decisions.
The OODAObserve, Orient, Decide, Act – loop is a decision-making, conceptual framework that describes how decisions are made when reacting to an event. By breaking it down into these four components, we can optimize each to reduce the overall cycle time. The key idea is that if we make better decisions quicker than our opponent, we can outmaneuver them and win. By moving from batch to real-time analytics, we are reducing the observed portion of this cycle.
John Boyd
John Boyd was a USAF colonel and military strategist. He developed the OODA loop to better understand pilot combat operations. It has since been expanded and is used at a more strategic level by the military, sports teams, and businesses.
By reducing the OODA loop cycle time, new actions become available. They can be taken while events are unfolding and not merely responding to them after the event has occurred. These time-critica...

Table of contents

  1. Scalable Data Streaming with Amazon Kinesis
  2. Contributors
  3. Preface
  4. Section 1: Introduction to Data Streaming and Amazon Kinesis
  5. Chapter 1: What Are Data Streams?
  6. Chapter 2: Messaging and Data Streaming in AWS
  7. Chapter 3: The SmartCity Bike-Sharing Service
  8. Section 2: Deep Dive into Kinesis
  9. Chapter 4: Kinesis Data Streams
  10. Chapter 5: Kinesis Firehose
  11. Chapter 6: Kinesis Data Analytics
  12. Chapter 7: Amazon Kinesis Video Streams
  13. Section 3: Integrations
  14. Chapter 8: Kinesis Integrations
  15. Other Books You May Enjoy

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn how to download books offline
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.5M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1.5 million books across 990+ topics, we’ve got you covered! Learn about our mission
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more about Read Aloud
Yes! You can use the Perlego app on both iOS and Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app
Yes, you can access Scalable Data Streaming with Amazon Kinesis by Tarik Makota,Brian Maguire,Danny Gagne,Rajeev Chakrabarti in PDF and/or ePUB format, as well as other popular books in Computer Science & Business Intelligence. We have over 1.5 million books available in our catalogue for you to explore.