Hands-On Infrastructure Monitoring with Prometheus
eBook - ePub

Hands-On Infrastructure Monitoring with Prometheus

Implement and scale queries, dashboards, and alerting across machines and containers

  1. 430 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Build Prometheus ecosystems with metric-centric visualization, alerting, and querying

Key Features

  • Integrate Prometheus with Alertmanager and Grafana for building a complete monitoring system
  • Explore PromQL, Prometheus' functional query language, with easy-to-follow examples
  • Learn how to deploy Prometheus components using Kubernetes and traditional instances

Book Description

Prometheus is an open source monitoring system. It provides a modern time series database, a robust query language, several metric visualization possibilities, and a reliable alerting solution for traditional and cloud-native infrastructure. This book covers the fundamental concepts of monitoring and explores Prometheus architecture, its data model, and how metric aggregation works. Multiple test environments are included to help explore different configuration scenarios, such as the use of various exporters and integrations. You'll delve into PromQL, supported by several examples, and then apply that knowledge to alerting and recording rules, as well as how to test them. After that, alert routing with Alertmanager and creating visualizations with Grafana are thoroughly covered. In addition, this book covers several service discovery mechanisms and even provides an example of how to create your own. Finally, you'll learn about Prometheus federation, cross-sharding aggregation, and also long-term storage with the help of Thanos. By the end of this book, you'll be able to implement and scale Prometheus as a full monitoring system on-premises, in cloud environments, in standalone instances, or using container orchestration with Kubernetes.

What you will learn

  • Grasp monitoring fundamentals and implement them using Prometheus
  • Discover how to extract metrics from common infrastructure services
  • Find out how to take full advantage of PromQL
  • Design a highly available, resilient, and scalable Prometheus stack
  • Explore the power of Kubernetes Prometheus Operator
  • Understand concepts such as federation and cross-shard aggregation
  • Unlock seamless global views and long-term retention in cloud-native apps with Thanos

Who this book is for

If you're a software developer, cloud administrator, site reliability engineer, DevOps enthusiast or system admin looking to set up a fail-safe monitoring and alerting system for sustaining infrastructure security and performance, this book is for you. Basic networking and infrastructure monitoring knowledge will help you understand the concepts covered in this book.

Information

Year: 2019
Print ISBN: 9781789612349
eBook ISBN: 9781789808032

Section 1: Introduction

On completion of this section, the reader will have the basic knowledge they need to proceed with a deep dive into the Prometheus stack.
The following chapters are included in this section:
  • Chapter 1, Monitoring Fundamentals
  • Chapter 2, An Overview of the Prometheus Ecosystem
  • Chapter 3, Setting Up a Test Environment

Monitoring Fundamentals

This chapter lays the foundation for several key concepts that will be used throughout this book. Starting with the definition of monitoring, we will explore various views and factors that emphasize why systematic analysis assumes different levels of importance and makes an impact on organizations. You will learn about the advantages and disadvantages of different monitoring mechanics, taking a closer look at the Prometheus approach regarding collecting metrics. Finally, we will discuss some of the controversial decisions that were vital for the design and architecture of the Prometheus stack and why you should take them into account when designing your own monitoring system.
We will be covering the following topics in this chapter:
  • Definition of monitoring
  • Whitebox versus blackbox monitoring
  • Understanding metrics collection

Definition of monitoring

A consensual definition of monitoring is hard to come by because it quickly shifts between industry- or even job-specific contexts. The diversity of viewpoints, the components comprising the monitoring system, and even how the data is collected or used are all factors that contribute to the struggle of reaching a clear definition.
Without common ground, it is difficult to sustain a discussion, and expectations usually end up mismatched. Therefore, in the following topics, we will outline a baseline aimed at a definition of monitoring that will guide us throughout this book.

The value of monitoring

With the growing complexity of infrastructures, exponentially driven by the adoption of microservices-oriented architectures, it has become critical to attain a global view of all the different components of an infrastructure. It is unthinkable to manually validate the health of each instance, caching service, database, or load balancer. There are way too many moving pieces to count—let alone keep a close eye on.
Nowadays, it is expected that monitoring will keep track of data from those components. However, data might come in several forms, allowing it to be used for different purposes.
Alerting is one of the standard uses of monitoring data, but the application of such data can go far beyond it. You may require historical information to assist you in capacity planning or incident investigations, or you may need a higher resolution to drill down into a problem and even higher freshness to decrease the mean time to recovery during an outage.
You can look at monitoring as a source of information for maintaining healthy systems, production- and business-wise.

Organizational contexts

Within an organization, roles such as system administrators, quality assurance engineers, Site Reliability Engineers (SREs), and product owners have different expectations of monitoring. Understanding the requirements that each role brings makes it easier to see why context matters so much when discussing monitoring. Let's expand on this with some examples:
  • System administrators are interested in high-resolution, low-latency, and high-diversity data. For a system administrator, the main objective of monitoring is to obtain visibility across the infrastructure and manage data from CPU usage to Hypertext Transfer Protocol (HTTP) request rate so that problems are quickly discovered and the root causes are identified as soon as possible. In this approach, exposing monitoring data in high resolution is critical to be able to drill down into the affected system. If a problem is occurring, you don't have the luxury of waiting several hours for your next data point, so data has to be provided in near real time or, in other words, with low latency. Lastly, since there is no easy way to identify or predict which systems are likely to be affected, we need to collect as much data as possible from all systems; namely, a high diversity of data.
  • Quality assurance engineers are interested in high-resolution, high-latency, and high-diversity data. High-resolution monitoring data is important for quality assurance engineers because it enables a deeper drill down into effects, but latency is not as critical as it is for system administrators. In this case, historical data matters much more for comparing software releases than the freshness of the data. Since we can't wholly predict the ramifications of a new release, the available data needs to be spread across as much of the infrastructure as possible, touching every system the software release might use, invoke, or otherwise interact with (directly or indirectly), so that we have as much data as possible.
  • SREs focused on capacity planning are interested in low-resolution, high-latency, and high-diversity data. In this scenario, historical data carries much more importance for SREs than the resolution that this data is presented in. For example, to predict the increase in infrastructure, it is not critical for an SRE to know that some months ago at 4 A.M., one of the nodes had a spike of CPU usage reaching 100% in 10 seconds, but it is useful to understand the trend of the load across the fleet of nodes to infer the number of nodes required to handle new scale requirements. As such, it is also important for SREs to have a broad view of all the different parts of the infrastructure that are affected by those requirements to predict, for example, the amount of storage for logs, network bandwidth increase, and so on, making the high diversity of monitoring data mandatory.
  • Product owners are interested in low-resolution, high-latency, and low-diversity data. Where product owners are concerned, monitoring data usually steps away from infrastructure into the realm of business. Product owners strive to understand the trends of specific software products, where historical data is fundamental and resolution is not so critical. Keeping in mind the objective of evaluating the impact of software releases on the customers, latency is not as essential for them as it is for system administrators. The product owner manages a specific set of products, so a low diversity of monitoring data is expected, consisting mostly of business metrics.
The following table sums up the previous examples in a much more condensed form:
                        | Data resolution | Data latency | Data diversity
Infrastructure alerting | High            | Low          | High
Software release view   | High            | High         | High
Capacity planning       | Low             | High         | High
Product/business view   | Low             | High         | Low
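To make the trade-off between high and low resolution more concrete, here is a minimal Python sketch. It is not taken from the book: the sample values, the 10-second interval, and the `downsample` helper are all assumptions made up for illustration. It shows how a short CPU spike that is visible at high resolution is averaged away at low resolution, while the overall load level, which is what matters for capacity planning, is preserved.

```python
from statistics import mean

# Hypothetical CPU usage samples (percent), one every 10 seconds for one hour.
# All values are invented for illustration only.
samples = [20.0] * 360      # 360 samples * 10 s = 1 hour of steady load
samples[180] = 100.0        # a single 10-second spike in the middle


def downsample(values, factor):
    """Average consecutive groups of `factor` samples into one data point."""
    return [mean(values[i:i + factor]) for i in range(0, len(values), factor)]


# High resolution (one point per 10 s): the spike is clearly visible.
high_res_peak = max(samples)

# Low resolution (one point per 10 min = 60 samples): the spike is
# smoothed out, but the general level of load is still there.
low_res = downsample(samples, 60)
low_res_peak = max(low_res)

print(high_res_peak)             # 100.0
print(round(low_res_peak, 2))    # 21.33
```

A system administrator investigating an incident would probably want the first view, whereas an SRE looking at long-term trends could likely work with the second one and store far fewer data points.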

Monitoring components

Just as the definition of monitoring changes across contexts, so do its components. Depending on how broad you want to be, a monitoring system can include some or all of the following components:
  • Metrics: A metric exposes a certain system resource, application action, or business characteristic as a value at a specific point in time. This information is obtained in an aggregated form; for example, you can find out how many requests per second were served but not the exact time of a specific request, and without context, you won't know the ID of the requests.
  • Logging: Containing much more data than a metric, this manifests itself as an event from a system or application, containing all the information that's produced by such an event. This informat...
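The aggregated nature of metrics described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sample` class, the `per_second_rate` helper, and the numbers are hypothetical and are not part of Prometheus or of the book. The point is that two readings of a cumulative request counter are enough to derive requests per second, yet nothing about any individual request (its exact time or ID) can be recovered from them.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A metric sample: a value observed at a specific point in time."""
    timestamp: float  # seconds since an arbitrary starting point
    value: float      # cumulative number of requests served so far


def per_second_rate(earlier: Sample, later: Sample) -> float:
    """Average requests per second between two samples of a counter."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (later.value - earlier.value) / elapsed


# Hypothetical readings of a request counter taken 15 seconds apart.
first = Sample(timestamp=0.0, value=1200.0)
second = Sample(timestamp=15.0, value=1500.0)

rate = per_second_rate(first, second)
print(rate)  # 20.0
```

The 300 requests served between the two readings are only known as a total; a log entry per request would be needed to answer questions about a specific one.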

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Section 1: Introduction
  7. Monitoring Fundamentals
  8. An Overview of the Prometheus Ecosystem
  9. Setting Up a Test Environment
  10. Section 2: Getting Started with Prometheus
  11. Prometheus Metrics Fundamentals
  12. Running a Prometheus Server
  13. Exporters and Integrations
  14. Prometheus Query Language - PromQL
  15. Troubleshooting and Validation
  16. Section 3: Dashboards and Alerts
  17. Defining Alerting and Recording Rules
  18. Discovering and Creating Grafana Dashboards
  19. Understanding and Extending Alertmanager
  20. Section 4: Scalability, Resilience, and Maintainability
  21. Choosing the Right Service Discovery
  22. Scaling and Federating Prometheus
  23. Integrating Long-Term Storage with Prometheus
  24. Assessments
  25. Other Books You May Enjoy

Hands-On Infrastructure Monitoring with Prometheus by Joel Bastos and Pedro Araújo is available in PDF and/or ePUB format.