The DevOps 2.5 Toolkit

Viktor Farcic

eBook - ePub
  1. 322 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

An advanced exploration of the skills and knowledge required for operating Kubernetes clusters, with a focus on metrics gathering and alerting, and with the goal of making clusters and the applications inside them autonomous through self-healing and self-adaptation.

Key Features
  • The sixth book of DevOps expert Viktor Farcic's bestselling DevOps Toolkit series, with an overview of advanced core Kubernetes techniques oriented towards monitoring and alerting.
  • Takes a deep dive into monitoring, alerting, logging, auto-scaling, and other subjects aimed at making clusters resilient, self-sufficient, and self-adaptive.
  • Discusses how to customise and create dashboards and alerts.

Book Description
Building on The DevOps 2.3 Toolkit: Kubernetes and The DevOps 2.4 Toolkit: Continuous Deployment to Kubernetes, Viktor Farcic brings his latest exploration of Kubernetes as he records his journey through monitoring, logging, and auto-scaling.
The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes: Making Resilient, Self-Adaptive, And Autonomous Kubernetes Clusters is the latest book in Viktor Farcic's series that helps you build a full DevOps Toolkit. It helps readers develop the skills needed to operate Kubernetes clusters, with a focus on metrics gathering and alerting, and with the goal of making clusters and the applications inside them autonomous through self-healing and self-adaptation.
Work with Viktor and dive into the creation of self-adaptive and self-healing systems within Kubernetes.

What you will learn
  • Autoscaling Deployments and StatefulSets based on resource usage
  • Autoscaling nodes of a Kubernetes cluster
  • Debugging issues discovered through metrics and alerts
  • Extending HorizontalPodAutoscaler with custom metrics
  • Visualizing metrics and alerts
  • Collecting and querying logs

Who this book is for
Readers with an advanced-level understanding of Kubernetes and hands-on experience.


Information

  • Year: 2019
  • ISBN: 9781838642631
  • Edition: 1

Collecting and Querying Metrics and Sending Alerts

Insufficient facts always invite danger.
- Spock
So far, we explored how to leverage some of Kubernetes' core features. We used HorizontalPodAutoscaler and Cluster Autoscaler. While the former relies on Metrics Server, the latter is not based on metrics, but on the Scheduler's inability to place Pods within the existing cluster capacity. Even though Metrics Server does provide some basic metrics, we are in desperate need of more.
We have to be able to monitor our cluster, and Metrics Server is just not enough. It contains a limited number of metrics, it keeps them for a very short period, and it does not allow us to execute anything but the simplest queries. I can't say that we are blind if we rely only on Metrics Server, but we are severely impaired. Without increasing the number of metrics we're collecting, as well as their retention, we get only a glimpse into what's going on in our Kubernetes clusters.
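To make that limitation concrete, the commands below (assuming Metrics Server is running in your cluster) are more or less the full extent of what we can ask of it: current usage, with no history and no query language.

# Assuming Metrics Server is installed, these are roughly the only
# questions it can answer: current CPU and memory usage, nothing more.
kubectl top nodes

kubectl top pods --all-namespaces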
Being able to fetch and store metrics cannot be the goal in itself. We also need to be able to query them in search of the cause of an issue. For that, we need metrics to be "rich" with information, and we need a powerful query language.
Finally, being able to find the cause of a problem is not worth much if we are not notified that there is an issue in the first place. That means we need a system that allows us to define alerts which, when certain thresholds are reached, send us notifications or, when appropriate, send them to other parts of the system that can automatically execute steps to remedy the issue.
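The exact format depends on the tool we'll pick later in this chapter but, just to make the idea concrete, here is a minimal sketch of such a threshold-based alert written in Prometheus' rule syntax; the metric, the 90% threshold, and all the names are illustrative assumptions rather than definitions we'll use later.

# A minimal sketch of a threshold-based alert in Prometheus rule syntax;
# the metric, the 90% threshold, and the names are illustrative assumptions.
cat > node-memory-alert.yml <<EOF
groups:
- name: node-alerts
  rules:
  - alert: NodeMemoryHigh
    expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
    for: 5m
    annotations:
      summary: A node has less than 10% of its memory available
EOF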
If we accomplish that, we'll be a step closer to having not only a self-healing system (Kubernetes already does that) but also a self-adaptive one that reacts to changed conditions. We might go even further and try to predict that "bad things" will happen and be proactive in resolving them before they arise.
All in all, we need a tool, or a set of tools, that allows us to fetch, store, and query "rich" metrics, and that notifies us when an issue happens or, even better, when a problem is about to occur.
We might not be able to build a self-adapting system in this chapter, but we can try to create a foundation. But, first things first, we need a cluster that will allow us to "play" with some new tools and concepts.

Creating a cluster

We'll continue using definitions from the vfarcic/k8s-specs (https://github.com/vfarcic/k8s-specs) repository. To be on the safe side, we'll pull the latest version first.
All the commands from this chapter are available in the 03-monitor.sh (https://gist.github.com/vfarcic/718886797a247f2f9ad4002f17e9ebd9) Gist.
cd k8s-specs

git pull
In this chapter, we'll need a few things that were not requirements before, even though you probably already used them.
We'll start using UIs, so we'll need the NGINX Ingress Controller to route traffic from outside the cluster. We'll also need the environment variable LB_IP set to the IP through which we can access the worker nodes. We'll use it to configure a few Ingress resources.
The Gists used to test the examples in this chapter are listed below. Please use them as they are, or as inspiration to create your own cluster, or to confirm whether the one you already have meets the requirements (a quick verification sketch follows the list). Due to the new requirements (Ingress and LB_IP), all the cluster setup Gists are new.
A note to Docker for Desktop users
You'll notice the LB_IP=[...] command at the end of the Gist. You'll have to replace [...] with the IP of your cluster. Probably the easiest way to find it is through the ifconfig command (a sketch follows this note). Just remember that it cannot be localhost, but must be the IP of your laptop (for example, 192.168.0.152).
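A minimal sketch of setting the variable is below; the interface name en0 is an assumption, so replace it with whichever interface ifconfig reports as holding your laptop's LAN address.

# A minimal sketch; en0 is an assumption, use the interface that
# ifconfig reports as holding your laptop's LAN address.
export LB_IP=$(ifconfig en0 | awk '/inet / {print $2}')

echo $LB_IP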
A note to minikube and Docker for Desktop users
We have to increase memory to 3 GB. Please keep that in mind in case you were planning only to skim through the Gist that matches your Kubernetes flavor.
The Gists are as follows.
  • gke-monitor.sh: GKE with 3 n1-standard-1 worker nodes, nginx Ingress, tiller, and cluster IP stored in environment variable LB_IP (https://gist.github.com/vfarcic/10e14bfbec466347d70d11a78fe7eec4).
  • eks-monitor.sh: EKS with 3 t2.small worker nodes, nginx Ingress, tiller, Metrics Server, and cluster IP stored in environment variable LB_IP (https://gist.github.com/vfarcic/211f8dbe204131f8109f417605dbddd5).
  • aks-monitor.sh: AKS with 3 Standard_B2s worker nodes, nginx Ingress, and tiller, and cluster IP stored in environment variable LB_IP (https://gist.github.com/vfarcic/5fe5c238047db39cb002cdfdadcfbad2).
  • docker-monitor.sh: Docker for Desktop with 2 CPUs, 3 GB RAM, nginx Ingress, tiller, Metrics Server, and cluster IP stored in environment variable LB_IP (https://gist.github.com/vfarcic/4d9ab04058cf00b9dd0faac11bda8f13).
  • minikube-monitor.sh: minikube with 2 CPUs, 3 GB RAM, ingress, storage-provisioner, default-storageclass, and metrics-server addons enabled, tiller, and cluster IP stored in environment variable LB_IP (https://gist.github.com/vfarcic/892c783bf51fc06dd7f31b939bc90248).
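Whichever Gist you used, a quick sanity check along the lines below should confirm that the requirements are in place; component namespaces and names differ between Kubernetes flavors, so treat this as a sketch rather than an exact recipe.

# A rough sanity check; namespaces and component names differ between
# flavors, so adjust the grep pattern if nothing shows up where expected.
kubectl get nodes

kubectl get pods --all-namespaces | grep -E "ingress|tiller|metrics-server"

echo $LB_IP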
Now that we have a cluster, we'll need to choose the tools we'll use to accomplish our goals.

Choosing the tools for storing and querying metrics and alerting

HorizontalPodAutoscaler (HPA) and Cluster Autoscaler (CA) provide essential, yet very rudimentary mechanisms to scale our Pods and clusters.
While they handle scaling decently well, they do not solve our need to be alerted when something is wrong, nor do they provide the information required to find the cause of an issue. We'll need to expand our setup with additional tools that will allow us to store and query metrics, as well as to receive notifications when there is an issue.
If we focus on tools that we can install and manage ourselves, there is very little doubt about what to use. If we look at the list of Cloud Native Computing Foundation (CNCF) projects (https://www.cncf.io/projects/), only two have graduated so far (October 2018). Those are Kubernetes and Prometheus (https://prometheus.io/). Given that we are looking for a tool that will allow us to store and query metrics, and that Prometheus fulfills that need, the choice is straightforward. That is not to say that there are no other similar tools worth considering. There are, but they are all service-based. We might explore them later but, for now, we're focused on those that we can run inside our cluster. So, we'll add Prometheus to the mix and try to answer a simple question. What is Prometheus?
Prometheus is a database (of sorts) designed to fetch (pull) and store highly dimensional time series data.
Time series are identified by a metric name and a set of key-value pairs. Data is stored both in memory and on disk. The former allows fast retrieval of information, while the latter exists for fault tolerance.
Prometheus' query language allows us to easily find data that can be used both for graphs and, more importantly, for alerting. It does not attempt to provide a "great" visualization experience. For that, it integrates with Grafana (https://grafana.com/).
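To get a feeling for that query language, here is a sketch of the kind of expression we'll be writing once Prometheus is up and running; the PROM_ADDR variable is a placeholder assumption for wherever Prometheus ends up being exposed, and the metric comes from cAdvisor.

# A sketch of a PromQL query executed through Prometheus' HTTP API;
# PROM_ADDR is a placeholder for the address where Prometheus is exposed.
curl -s -G "http://$PROM_ADDR/api/v1/query" \
    --data-urlencode "query=sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)"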
Unlike most other similar tools, we do not push data to Prometheus. Or, to be more precise, that is not the common way of getting metrics. Instead, Prometheus is a pull-based system that periodically fetches metrics from exporters. There are many third-party exporters we can use. But, in our case, the most crucial exporter is baked into Kubernetes. Prometheus can pull data from an exporter that transforms information from the Kube API. Through it, we can fetch (almost) everything we might need. Or, at least, that's where the bulk of the information will be coming from.
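If you're curious what an exporter's output looks like, the Kube API server itself serves metrics in the Prometheus text format, so we can peek at it even before installing anything: every non-comment line consists of a metric name, an optional set of key-value labels, and a value.

# The Kube API server exposes metrics in the Prometheus text format;
# each non-comment line is metric_name{label="value",...} sample_value.
kubectl get --raw /metrics | grep -v "^#" | head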
Finally, storing metrics in Prometheus would not be of much use if w...
