eBook - ePub

Hands-On Infrastructure Monitoring with Prometheus

Name: Hands-On Infrastructure Monitoring with Prometheus
ISBN: 9781789808032

Implement and scale queries, dashboards, and alerting across machines and containers

Joel Bastos, Pedro Araújo

430 pages
English

About this book

Build Prometheus ecosystems with metric-centric visualization, alerting, and querying

Key Features

Integrate Prometheus with Alertmanager and Grafana for building a complete monitoring system
Explore PromQL, Prometheus' functional query language, with easy-to-follow examples
Learn how to deploy Prometheus components using Kubernetes and traditional instances

Book Description

Prometheus is an open source monitoring system. It provides a modern time series database, a robust query language, several metric visualization possibilities, and a reliable alerting solution for traditional and cloud-native infrastructure. This book covers the fundamental concepts of monitoring and explores Prometheus architecture, its data model, and how metric aggregation works. Multiple test environments are included to help explore different configuration scenarios, such as the use of various exporters and integrations. You'll delve into PromQL, supported by several examples, and then apply that knowledge to alerting and recording rules, as well as how to test them. After that, alert routing with Alertmanager and creating visualizations with Grafana are thoroughly covered. In addition, this book covers several service discovery mechanisms and even provides an example of how to create your own. Finally, you'll learn about Prometheus federation, cross-sharding aggregation, and long-term storage with the help of Thanos. By the end of this book, you'll be able to implement and scale Prometheus as a full monitoring system on-premises, in cloud environments, in standalone instances, or using container orchestration with Kubernetes.

What you will learn

Grasp monitoring fundamentals and implement them using Prometheus
Discover how to extract metrics from common infrastructure services
Find out how to take full advantage of PromQL
Design a highly available, resilient, and scalable Prometheus stack
Explore the power of Kubernetes Prometheus Operator
Understand concepts such as federation and cross-shard aggregation
Unlock seamless global views and long-term retention in cloud-native apps with Thanos

Who this book is for

If you're a software developer, cloud administrator, site reliability engineer, DevOps enthusiast or system admin looking to set up a fail-safe monitoring and alerting system for sustaining infrastructure security and performance, this book is for you. Basic networking and infrastructure monitoring knowledge will help you understand the concepts covered in this book.

Publisher

Packt Publishing

Year

2019

Print ISBN

9781789612349

eBook ISBN

9781789808032

Topic

Computer Science

Subtopic

Software Development

Section 1: Introduction

On completion of this section, the reader will have the basic knowledge they need to proceed with a deep dive into the Prometheus stack.

The following chapters are included in this section:

Chapter 1, Monitoring Fundamentals
Chapter 2, An Overview of the Prometheus Ecosystem
Chapter 3, Setting Up a Test Environment

Monitoring Fundamentals

This chapter lays the foundation for several key concepts that will be used throughout this book. Starting with the definition of monitoring, we will explore various views and factors that emphasize why systematic analysis assumes different levels of importance and makes an impact on organizations. You will learn about the advantages and disadvantages of different monitoring mechanics, taking a closer look at the Prometheus approach regarding collecting metrics. Finally, we will discuss some of the controversial decisions that were vital for the design and architecture of the Prometheus stack and why you should take them into account when designing your own monitoring system.

We will be covering the following topics in this chapter:

Definition of monitoring
Whitebox versus blackbox monitoring
Understanding metrics collection

Definition of monitoring

A consensual definition of monitoring is hard to come by because it quickly shifts between industry- or even job-specific contexts. The diversity of viewpoints, the components comprising the monitoring system, and even how the data is collected or used are all factors that contribute to the struggle of reaching a clear definition.

Without common ground, it is difficult to sustain a discussion, and expectations usually end up mismatched. Therefore, in the following topics, we will outline a baseline aimed at reaching a definition of monitoring that will guide us throughout this book.

The value of monitoring

With the growing complexity of infrastructures, exponentially driven by the adoption of microservices-oriented architectures, it has become critical to attain a global view of all the different components of an infrastructure. It is unthinkable to manually validate the health of each instance, caching service, database, or load balancer. There are way too many moving pieces to count—let alone keep a close eye on.

Nowadays, it is expected that monitoring will keep track of data from those components. However, data might come in several forms, allowing it to be used for different purposes.

Alerting is one of the standard uses of monitoring data, but the application of such data can go far beyond it. You may require historical information to assist you in capacity planning or incident investigations, or you may need a higher resolution to drill down into a problem and even higher freshness to decrease the mean time to recovery during an outage.

You can look at monitoring as a source of information for maintaining healthy systems, production- and business-wise.

Organizational contexts

Within an organization, roles such as system administrators, quality assurance engineers, Site Reliability Engineers (SREs), and product owners expect different things from monitoring. Understanding the requirements each role brings makes it easier to see why context matters so much when discussing monitoring. Let's expand on each of the following statements with some examples:

System administrators are interested in high-resolution, low-latency, and high-diversity data. For a system administrator, the main objective of monitoring is to obtain visibility across the infrastructure and manage data from CPU usage to Hypertext Transfer Protocol (HTTP) request rate so that problems are quickly discovered and the root causes are identified as soon as possible. In this approach, exposing monitoring data in high resolution is critical to be able to drill down into the affected system. If a problem is occurring, you don't have the luxury of waiting several hours for your next data point, so data has to be provided in near real time or, in other words, with low latency. Lastly, since there is no easy way to identify or predict which systems are likely to be affected, we need to collect as much data as possible from all systems; namely, a high diversity of data.
Quality assurance engineers are interested in high-resolution, high-latency, and high-diversity data. High-resolution monitoring data matters to quality assurance engineers because it enables a deeper drill-down into the effects of a change, but latency is not as critical as it is for system administrators. In this case, historical data for comparing software releases is much more important than the freshness of the data. Since we can't wholly predict the ramifications of a new release, the available data needs to be spread across as much of the infrastructure as possible, touching every system that the software release might use, invoke, or otherwise interact with (directly or indirectly), so that we have as much data as possible.
SREs focused on capacity planning are interested in low-resolution, high-latency, and high-diversity data. In this scenario, historical data carries much more importance for SREs than the resolution that this data is presented in. For example, to predict infrastructure growth, it is not critical for an SRE to know that some months ago at 4 A.M., one of the nodes had a spike of CPU usage reaching 100% in 10 seconds, but it is useful to understand the trend of the load across the fleet of nodes to infer the number of nodes required to handle new scale requirements. As such, it is also important for SREs to have a broad view of all the different parts of the infrastructure that are affected by those requirements to predict, for example, the amount of storage for logs, network bandwidth increase, and so on, making the high diversity of monitoring data mandatory.
Product owners are interested in low-resolution, high-latency, and low-diversity data. Where product owners are concerned, monitoring data usually steps away from infrastructure to the realm of business. Product owners strive to understand the trends of specific software products, where historical data is fundamental and resolution is not so critical. Keeping in mind the objective of evaluating the impact of software releases on the customers, latency is not as essential for them as it is for system administrators. The product owner manages a specific set of products, so a low diversity of monitoring data is expected, consisting mostly of business metrics.

The following table sums up the previous examples in a much more condensed form:

Use case	Data resolution	Data latency	Data diversity
Infrastructure alerting	High	Low	High
Software release view	High	High	High
Capacity planning	Low	High	High
Product/business view	Low	High	Low
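
The resolution trade-off in the capacity planning example can be made concrete with a small sketch. It is only an illustration, not something taken from Prometheus itself: the `downsample` helper, the sample values, and the bucket width are all assumptions chosen for this example. It averages a hypothetical hour of 10-second CPU samples into a single low-resolution point, so the brief spike to 100% all but disappears while the overall load level is preserved.

```python
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = {}
    for ts, value in samples:
        # Align each sample to the start of the bucket it falls into.
        bucket_start = ts - (ts % bucket_seconds)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical CPU usage (%) sampled every 10 seconds for one hour:
# a steady 20% load with a single 10-second spike to 100%.
high_res = [(t, 20.0) for t in range(0, 3600, 10)]
high_res[180] = (1800, 100.0)

# One data point per hour: enough for trends, too coarse to see the spike.
low_res = downsample(high_res, 3600)

print("peak in high-resolution data:", max(v for _, v in high_res))
print("hourly average:", round(low_res[0][1], 2))
```

A system administrator drilling into an incident would need the high-resolution series, whereas the hourly average is the kind of view that is sufficient for spotting long-term load trends.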

Monitoring components

Just as the definition of monitoring changes across contexts, so do its components. Depending on how broadly you define it, a monitoring system includes some or all of the following components:

Metrics: A metric exposes a certain system resource, application action, or business characteristic as a value at a specific point in time. This information is obtained in an aggregated form; for example, you can find out how many requests per second were served but not the exact time of a specific request, and without context, you won't know the IDs of the requests.
Logging: Containing much more data than a metric, this manifests itself as an event from a system or application, containing all the information that's produced by such an event. This informat...
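
The difference between an event and an aggregated metric, as described above, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the event list, the request IDs, and the `requests_per_second` helper are hypothetical, and this is not how Prometheus collects or stores data. Each event carries an ID and an exact timestamp, as a log entry might; the aggregated view keeps only a count per second.

```python
from collections import Counter

# Hypothetical request events, as a log might record them:
# (request_id, unix_timestamp_in_seconds).
events = [
    ("req-a1", 100.12),
    ("req-b2", 100.57),
    ("req-c3", 100.93),
    ("req-d4", 101.40),
    ("req-e5", 103.05),
]


def requests_per_second(events):
    """Aggregate individual events into a count per whole second."""
    return Counter(int(ts) for _, ts in events)


rps = requests_per_second(events)

# The aggregate answers "how many requests were served in second 100?"
# but no longer knows which requests they were or exactly when they arrived.
for second in sorted(rps):
    print(second, rps[second])
```

The aggregated counts are far smaller than the event list, which is what makes metrics cheap to keep, but the request IDs and sub-second timestamps can only be recovered from the original events.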

Title Page
Copyright and Credits
About Packt
Contributors
Preface
Section 1: Introduction
Monitoring Fundamentals
An Overview of the Prometheus Ecosystem
Setting Up a Test Environment
Section 2: Getting Started with Prometheus
Prometheus Metrics Fundamentals
Running a Prometheus Server
Exporters and Integrations
Prometheus Query Language - PromQL
Troubleshooting and Validation
Section 3: Dashboards and Alerts
Defining Alerting and Recording Rules
Discovering and Creating Grafana Dashboards
Understanding and Extending Alertmanager
Section 4: Scalability, Resilience, and Maintainability
Choosing the Right Service Discovery
Scaling and Federating Prometheus
Integrating Long-Term Storage with Prometheus
Assessments
Other Books You May Enjoy
