Chapter 1: Introduction to Kubernetes Infrastructure and Production-Readiness
With more and more organizations adopting Kubernetes for their infrastructure management, it is becoming the de facto industry standard for orchestrating and managing distributed applications, both in the cloud and on premises.
Whether you are an individual contributor who is migrating their company's applications to the cloud or you are a decision-maker leading a cloud transformation initiative, you should plan the journey to Kubernetes and understand its challenges.
If this book has a core purpose, it is guiding you through the journey of building a production-ready Kubernetes infrastructure while avoiding the common pitfalls. This is our reason for writing about this topic, as we have witnessed failures and successes through the years of building and operating Kubernetes clusters on different scales. We are sure that you can avoid a lot of these failures, saving time and money, increasing reliability, and fulfilling your business goals.
In this chapter, you will learn how to deploy Kubernetes production clusters following best practices. We will explain the roadmap that we will follow for the rest of the book and introduce the foundational concepts that are commonly used to design and implement Kubernetes clusters. Understanding these concepts and the related principles is the key to building and operating production infrastructure. We will also set your expectations about the book's scope.
We will go through the core problems that this book will solve and briefly cover topics such as Kubernetes production challenges, production-readiness characteristics, the cloud-native landscape, and infrastructure design and management principles.
We will cover the following topics in this chapter:
- The basics of Kubernetes infrastructure
- Why Kubernetes is challenging in production
- Kubernetes production-readiness
- Kubernetes infrastructure best practices
- Cloud-native approach
The basics of Kubernetes infrastructure
If you are reading this book, you have already made the decision to take your Kubernetes infrastructure to an advanced level, which means you are beyond the stage of evaluating the technology. Still, building production infrastructure remains a significant investment that needs a solid justification to the business and the leadership within your organization. In this section, we will be specific about why we need a reliable Kubernetes infrastructure and clarify the challenges you should expect in production.
Kubernetes adoption is growing rapidly across organizations all over the world, and we expect this growth to continue. The International Data Corporation (IDC) predicts that around 95 percent of new microservices will be deployed in containers by 2021. Most companies find that containers and Kubernetes help to optimize costs, simplify deployment and operations, and decrease time to market, and that they play a pivotal role in hybrid cloud strategies. Similarly, Gartner predicts that more than 70 percent of organizations will run two or more containerized applications in production by 2021, compared to less than 20 percent in 2019.
As this book is concerned with building a reliable Kubernetes cluster, we will give an overview of the Kubernetes cluster architecture and its components, and then you will learn about production challenges.
Kubernetes has a distributed systems architecture – specifically, a client-server one. There are one or more master nodes, and this is where Kubernetes runs its control plane components.
There are worker nodes where Kubernetes deploys the pods and the workloads. A single cluster can manage up to 5,000 nodes. The Kubernetes cluster architecture is shown in the following diagram:
Figure 1.1 – Kubernetes cluster architecture
The preceding diagram represents a typical highly available Kubernetes cluster architecture with the core components. It shows how the Kubernetes parts communicate with each other. Although you may already have a basic understanding of the Kubernetes cluster architecture, we will refresh this knowledge in the next section because we will interact with most of these components in greater detail when creating and tuning the cluster configuration.
Control plane components
Control plane components are the core software pieces that make up the Kubernetes master nodes. All of them belong to the Kubernetes project, except etcd, which is a separate project. These components follow a distributed systems architecture and can scale horizontally to increase cluster capacity and provide high availability:
- kube-apiserver: The API server is the manager of the cluster components. It is the interface responsible for handling and serving the management APIs and for mediating the communication between cluster components.
- etcd: This is a distributed, highly available key-value data store that acts as the backbone of the cluster and stores all of its data.
- kube-controller-manager: This runs the controller processes that regulate the state of the cluster – for example, the node controller that monitors the nodes, the replication controller that maintains the desired number of pod replicas, and the endpoints controller that populates the service endpoints exposed in the cluster.
- kube-scheduler: This component is responsible for scheduling the pods across the nodes. It decides which pod goes to which node according to the scheduling algorithm, available resources, and the placement configuration.
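To make the scheduling decision more concrete, the following is a minimal sketch of a filter-then-score placement, written in Python purely for illustration. It is not the kube-scheduler implementation: the `Node`, `PodRequest`, and `pick_node` names are hypothetical, only CPU and memory are considered, and the scoring rule (prefer the node left with the most free CPU) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical, simplified view of a worker node's unallocated capacity.
    name: str
    free_cpu_millis: int   # free CPU in millicores
    free_memory_mib: int   # free memory in MiB


@dataclass
class PodRequest:
    # Hypothetical, simplified view of a pod's resource requests.
    name: str
    cpu_millis: int
    memory_mib: int


def pick_node(pod: PodRequest, nodes: List[Node]) -> Optional[Node]:
    """Filter the nodes that can fit the pod, then return the best-scoring one."""
    # Filter step: keep only nodes with enough free CPU and memory.
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_millis
        and n.free_memory_mib >= pod.memory_mib
    ]
    if not feasible:
        return None  # no node fits, so the pod would remain pending

    # Score step (illustrative rule): prefer the node that keeps the most
    # free CPU after placement, using remaining memory as a tie-breaker.
    return max(
        feasible,
        key=lambda n: (n.free_cpu_millis - pod.cpu_millis,
                       n.free_memory_mib - pod.memory_mib),
    )


nodes = [
    Node("node-a", free_cpu_millis=500, free_memory_mib=1024),
    Node("node-b", free_cpu_millis=2000, free_memory_mib=4096),
    Node("node-c", free_cpu_millis=4000, free_memory_mib=512),
]
chosen = pick_node(PodRequest("web", cpu_millis=1000, memory_mib=1024), nodes)
print(chosen.name if chosen else "unschedulable")  # prints "node-b"
```

In this example, node-a lacks CPU and node-c lacks memory, so only node-b passes the filter. A real scheduler also weighs placement configuration such as affinity rules, taints, and tolerations, which this sketch leaves out.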
Node components
Node components are a set of software agents that run on every worker node to maintain the running pods and provide network proxy services and the base runtime environment for the containers:
- kubelet: This is an agent service that runs on each node in the cluster. It periodically takes a set of pod specs (manifests, typically written in YAML, that describe the desired pods) and ensures that the pods described by these specs are running properly. It is also responsible for reporting the health of its node to the master.
- kube-proxy: This is an agent service that runs on each node in the cluster to create, update, and delete network rules on the nodes, usually using Linux iptables. These network rules allow network communication with the pods from inside and outside of the Kubernetes cluster.
- Container runtime: This is a software component that runs on each node in the cluster, and it is responsible for running the containers. Docker is the most famous container runtime; however, Kubernetes supports other runtimes, such as CRI-O (an implementation of the Kubernetes Container Runtime Interface, or CRI) and containerd to run containers, and kubevirt and virtlet to run virtual machines.
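The way the kubelet keeps the running pods in line with the pod specs can be pictured as a reconciliation step: compare the desired state with the observed state and act on the difference. The following Python sketch illustrates that idea only. The `reconcile` and `apply_actions` functions are hypothetical names, pods are reduced to plain strings, and the real kubelet does far more (health probes, restarts, volume handling, and status reporting).

```python
from typing import List, Set, Tuple


def reconcile(desired: Set[str], running: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (pods to start, pods to stop) so that running converges on desired."""
    to_start = sorted(desired - running)  # described in the specs but not running
    to_stop = sorted(running - desired)   # running but no longer in the specs
    return to_start, to_stop


def apply_actions(running: Set[str], to_start: List[str], to_stop: List[str]) -> Set[str]:
    """Simulate the container runtime carrying out the start and stop actions."""
    return (running | set(to_start)) - set(to_stop)


desired_pods = {"web-1", "web-2", "cache-1"}
running_pods = {"web-1", "old-job"}

start, stop = reconcile(desired_pods, running_pods)
print(start, stop)  # prints "['cache-1', 'web-2'] ['old-job']"

running_pods = apply_actions(running_pods, start, stop)
print(reconcile(desired_pods, running_pods))  # prints "([], [])"
```

Once the observed state matches the desired state, the next pass finds nothing to do. That is why such a step can safely be repeated on every period.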
Why Kubernetes is challenging in production
Kubernetes can be easy to install, but it is complex to operate and maintain. Kubernetes in production brings challenges and difficulties along the way, from scaling, uptime, and security, to resilience, observability, resource utilization, and cost management. Kubernetes has succeeded in solving container management and orchestration, and it has created a standard layer above the compute services. However, Kubernetes still lacks proper or complete support for some essential services, such as Identity and Access Management (IAM), storage, and image registries.
Usually, a Kubernetes cluster is part of a company's larger production infrastructure, which includes databases, IAM, Lightweight Directory Access Protocol (LDAP), messaging, streaming, and others. Bringing a Kubernetes cluster to production requires connecting it to these external infrastructure parts.
Even during cloud transformation projects, we expect Kubernetes to manage and integrate with the on-premises infrastructure and services, and this takes production complexity to the next level.
Another challenge occurs when teams start adopting Kubernetes with the assumption that it will solve the scaling and uptime problems that their apps have, but they usually do not plan for day-2 issues. This ends up with catastrophic consequences regarding security, scaling, uptime, resource utilization, cluster migrations, upgrades, and performance tuning.
Besides the technical challenges, there are management challenges, especially when Kubernetes is used across large organizations with multiple teams and the organization does not have the right team structure in place to operate and manage its Kubernetes infrastructure. This can lead to teams struggling to align around standard tools, best practices, and delivery workflows.
Kubernetes production-readiness
Production-readiness is the goal we need to achieve throughout this book, although there is no precise definition for this buzzword. It could mean a cluster capable of serving production workloads and real traffic in a reliable and secure fashion. We can further extend this definition, but what many experts agree on is that there is a minimum set of requirements that you need to fulfill before you mark your cluster as production-ready.
We have gathered and categorized these readiness requirements according to the typical Kubernetes production layers (illustrated in the following diagram). We understand that production use cases differ for each organization, and that product growth and business objectives deeply affect these use cases and hence the production-readiness requirements. However, we can fairly consider the following production-ready checklist as an essential list for most mainstream use:
Figure 1.2 – Kubernetes infrastructure layers
This diagram describes the typical layers of Kubernetes infrastruct...