Managing Kubernetes day-2 operations
Day-2 operations
When you start running your software in Kubernetes, it can seem deceptively simple. You create a Pod, expose your ports, and presto, your app is up and running! You go to sleep, and the next day you wake up and want to check that it's still running properly. This is colloquially known as day-2 operations: the lifecycle of your app after it has been initially configured.
In this section, we cover some of the basics of day-2 operations:
- Monitoring your application's health and diagnosing failures
- Managing resource use
- Scaling your application
Monitoring your application's health and diagnosing failures
Monitoring helps you determine your application's health and diagnose failures, and there are several ways to do this.
Metrics
Though Kubernetes doesn't include any metrics tooling by default, its extensibility means you can install supplementary tools when they're needed. If you're using a managed Kubernetes provider like AKS, EKS, or GKE, you can usually enable metrics with a single click.
If you prefer open-source tools, there are many options. The most popular open-source tool for gathering metrics is Prometheus, with Grafana usually used to visualize the data it collects. An easy way to get started is the kube-prometheus-stack Helm chart, which ships with many sensible defaults already set up for you.
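Once something like kube-prometheus-stack is installed, you typically point Prometheus at your application by creating a ServiceMonitor. The sketch below assumes a hypothetical Service labeled app: foo that exposes metrics on a port named metrics; adjust the names and labels to your setup:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: foo                          # hypothetical name
  labels:
    release: kube-prometheus-stack   # often must match your Helm release name, depending on chart values
spec:
  selector:
    matchLabels:
      app: foo                       # selects the Service that exposes your metrics
  endpoints:
  - port: metrics                    # named port on the Service serving /metrics
    interval: 30s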
Logs
By default, logs from your containers in Kubernetes are kept for the lifetime of your Pod (up to a size limit, which is configurable and defaults to 10Mi per log file). The logs that Kubernetes collects are the standard output (stdout) and standard error (stderr) streams of your container. If your application logs to a file, you must redirect that output to stdout.
A good trick is that on Linux, /proc/1/fd/1 is the stdout of the container's main process, so if you write logs to that path, they show up in the Kubernetes logs. You can find more details on the default logging architecture in the Kubernetes documentation.
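As a minimal sketch of that trick (the Pod name and message are hypothetical), the container below writes a line to /proc/1/fd/1, and it then appears in kubectl logs:

apiVersion: v1
kind: Pod
metadata:
  name: log-trick                    # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    # /proc/1/fd/1 is the stdout of the container's main process (PID 1),
    # which is exactly what Kubernetes captures as the container's log stream.
    command: ["sh", "-c", "echo 'written to a file path, visible in kubectl logs' > /proc/1/fd/1 && sleep 3600"]

Running kubectl logs log-trick should then show that line.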
While logs that persist for the lifetime of your Pod can be useful for troubleshooting some issues, you'll often need to collect logs over a longer period and store them elsewhere. Much like metrics, managed Kubernetes providers often have log-collection tools you can enable. Open-source options include Loki and the Elastic Stack.
Health probes
Kubernetes comes with several options for checking the health of your application. These health probes are a way for Kubernetes to understand whether your application is running or not. There are 4 methods you can use to probe your application, each shown in the sketch after this list:
- Custom command: A command runs inside the Container and returns an exit code: 0 means success, and any non-zero code means failure.
- HTTP request: An HTTP request gets sent to an endpoint in the Container. Any response code greater than or equal to 200 and less than 400 is a success; any other response is a failure.
- TCP request: A TCP connection request gets sent to a port in the Container. If the connection is successful, the check is a success; otherwise, it's a failure.
- gRPC health check: Kubernetes uses the gRPC Health Checking Protocol to check the status of the application.
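As a sketch, here's what each mechanism looks like in YAML. These are fragments that would sit under a container's livenessProbe, readinessProbe, or startupProbe field, and the command, paths, and ports are hypothetical:

# Custom command: exit code 0 is a success, any non-zero code is a failure
exec:
  command: ["cat", "/tmp/healthy"]

# HTTP request: a response code from 200 up to (but not including) 400 is a success
httpGet:
  path: /healthz
  port: 8080

# TCP request: the check succeeds if the connection can be opened
tcpSocket:
  port: 8080

# gRPC health check, using the gRPC Health Checking Protocol
grpc:
  port: 9090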
You can define 3 types of health probes, and they help Kubernetes understand the workload's health at different stages of its lifecycle:
- Readiness probe
- Liveness probe
- Startup probe
Readiness probe
A readiness probe lets Kubernetes know whether the Container is ready to accept traffic. When this probe fails, Services that point to the Pod stop sending traffic to it. Once the probe succeeds again (for a configurable number of consecutive successes), traffic gets routed to the Pod once more. Generally, you should configure this probe to fail quickly so that inbound traffic isn't sent to an unhealthy Pod.
Liveness probe
A liveness probe lets Kubernetes know if the Container is in an unrecoverable failure state. When this probe fails, the Container restarts. This probe should fail only if the Container can't recover by itself and a restart is likely to help.
Startup probe
A startup probe runs while the Container is first starting up and indicates to Kubernetes when the Container has finished loading. This prevents traffic from reaching the Container until it has finished its initialization tasks. Additionally, liveness and readiness probes don't run until the startup probe has succeeded, which allows those probes to be configured to fail quickly even if the application takes a while to start.
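Putting the three probe types together, here's a sketch of a container that uses HTTP endpoints for all of them. The image, port, paths, and timings are hypothetical and should be tuned to your application:

apiVersion: v1
kind: Pod
metadata:
  name: probed-app                   # hypothetical name
spec:
  containers:
  - name: app
    image: my-app:1.0                # hypothetical image
    ports:
    - containerPort: 8080
    # Gives the app time to finish initializing; the liveness and readiness
    # probes don't start running until this probe succeeds.
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30           # allow up to 30 * 5s = 150s to start
      periodSeconds: 5
    # Stops Service traffic from reaching the Pod when it fails.
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 1            # fail fast so traffic is withdrawn quickly
    # Restarts the container when it can't recover by itself.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3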
Managing resource use for Kubernetes
When you run an application, you usually have an idea of a "normal" amount of resource consumption. But Kubernetes doesn't know this automatically. This is where resource requests and limits come into play. When you define a request on a container, Kubernetes uses this information to schedule the Pod onto an appropriate Node. When you define a limit, Kubernetes enforces that the Pod does not use any more resources than you defined.
You can set requests and limits for the following resources:
- CPU
  - Measured in CPU units
  - 1 vCPU = 1 CPU unit
  - 1 CPU unit = 1000 milliCPU (1000m)
  - When the limit is exceeded, CPU use is throttled
- Memory
  - Measured in bytes
  - Suffixes can be used, with an i indicating binary (power-of-two) versions:
    - 1Ki = 1024 bytes
    - 1Mi = 1024Ki
    - 1K = 1000 bytes
    - 1M = 1000K
  - When the limit is exceeded, the container is terminated (OOM killed)
- Ephemeral storage
  - Measured in bytes, using the same suffix convention as memory
  - When the limit is exceeded, the Pod is evicted (forcibly removed) from the Node
Requests in Kubernetes
When you set a request, Kubernetes reserves the resources you defined, ensuring that the Pod is always guaranteed at least that much CPU and memory. Here's an example:
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
  - name: foo
    image: hello-world
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
This definition is requesting 64Mi of memory and 250m of CPU. This ensures that the Node this Pod gets scheduled on has at least ¼ of a CPU and 64 mebibytes of memory unreserved for it.
Limits in Kubernetes
When you define a limit, Kubernetes ensures your Pod doesn't exceed the limits. Here's an example:
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
  - name: foo
    image: hello-world
    resources:
      limits:
        memory: "256Mi"
        cpu: "500m"
This definition sets the limit to 256Mi of memory and 500m of CPU. In simplified terms, if the Pod uses more than 500ms of CPU time in a 1000ms window, it won't be able to use any more until the next window. This is called throttling. If your workload gets throttled regularly, you should consider whether you've set your CPU limits correctly.
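In practice, you usually set requests and limits together on the same container. Combining the two earlier examples gives a sketch like this:

apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
  - name: foo
    image: hello-world
    resources:
      requests:
        memory: "64Mi"    # used for scheduling: the Node must have this much unreserved
        cpu: "250m"
      limits:
        memory: "256Mi"   # exceeding this gets the container OOM-killed
        cpu: "500m"       # exceeding this gets the container throttled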
Scaling your application in Kubernetes
Workload requirements often change, based on things like user load or scheduled tasks. When your application needs more capacity, there are 2 approaches. Classically, you scale vertically by allocating more CPU and memory. Alternatively, you can scale horizontally in Kubernetes by creating additional replicas of your application.
Vertical scaling
Vertical scaling usually doesn't require you to make many changes to your workload: almost universally, more CPU and memory increase the amount of work your workload can do at any given time. Unfortunately, vertical scaling has some downsides, including:
- Hitting limits with the size of the machine you can run on
- Degraded capacity during maintenance
- Hitting other limits (like network or disk speed) which can often be more expensive to scale
To scale vertically in Kubernetes, you can simply increase the requests and limits of your workloads manually. To do this automatically, you can use the Vertical Pod Autoscaler (VPA) to make recommendations, or even to scale your workloads for you. The VPA is available as an add-on to install, and you can find its documentation in the project's repository.
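As a sketch, assuming the VPA add-on is installed and you have a hypothetical Deployment named foo, a VerticalPodAutoscaler in recommendation-only mode looks roughly like this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: foo-vpa                      # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo                        # hypothetical Deployment to analyze
  updatePolicy:
    updateMode: "Off"                # only produce recommendations; "Auto" applies them

You can then read the recommendations from the object's status, for example with kubectl describe verticalpodautoscaler foo-vpa.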
Horizontal scaling in Kubernetes
Horizontal scaling solves many of the problems that vertical scaling brings. It also lets you easily scale your workload down when you don't need as much capacity. Running multiple replicas makes your workload more resilient to failure, too: a single Node crashing has less impact on capacity when you have multiple replicas running. The main issue is that your workload must support horizontal scaling.
To scale horizontally, the workload needs to be able to share work across replicas efficiently, which is a complex problem in computer science. Most open-source, cloud-native software can scale horizontally; be sure to check your workload's documentation to see how.
To scale horizontally in Kubernetes, you can simply increase the number of replicas in your Deployment specification. Alternatively, Kubernetes includes a built-in controller called the Horizontal Pod Autoscaler (HPA), which can automatically scale your workload based on utilization or other metrics. You can read more in the HPA documentation.
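Here's a sketch of an HPA that targets a hypothetical Deployment named foo and scales it between 2 and 10 replicas based on average CPU utilization. This assumes a metrics source such as the metrics-server is available in your cluster:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa                      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo                        # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70       # target 70% of the requested CPU on average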
Now that we've covered the basics, let's look at the benefits of Continuous Integration and Continuous Delivery to Kubernetes.