Hello everyone, and welcome to this article on health checks in Kubernetes.
In this article, we’ll discuss the types of probes, different types of health checks, how to troubleshoot, and best practices.
We perform Kubernetes health checks through probes. In Kubernetes, probes are mechanisms that monitor the health and status of containers. They ensure that your applications are running correctly and can detect issues to take appropriate actions.
There are three main types of probes: liveness, readiness, and startup.
1. Liveness Probe
The Liveness Probe is like a heartbeat check for your container. Its main job is to ensure that your application is still running and hasn't gone into a deadlock or some other irrecoverable state. If the liveness probe fails, Kubernetes will kill the container, and, depending on your restart policy, it may restart it. This is important for maintaining the health of your application over time.
2. Readiness Probe
The Readiness Probe is all about traffic management. It checks whether your container is ready to handle incoming requests. If the readiness probe fails, Kubernetes temporarily removes the Pod from the endpoints of the Services that select it, so it receives no traffic through those Services until the probe passes again. This ensures that only healthy instances of your application are serving requests. The readiness probe also matters for the Rolling Update deployment strategy. Without a probe, a container is considered ready as soon as it is running. With a probe, the rollout does not progress until the new container actually passes its readiness check.
3. Startup Probe
The Startup Probe is designed for applications that take a while to initialize. It checks whether your application has started up correctly. While the startup probe is running, the liveness and readiness probes are disabled; once it succeeds, they take over. If the startup probe fails, Kubernetes kills the container and may restart it based on the restart policy. Because this probe is only active during the startup phase of the container, it is ideal for applications with long or unpredictable initialization times. It can be seen as a dynamic alternative to the static initialDelaySeconds parameter of the two other probe types.
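To make the comparison with initialDelaySeconds concrete, here is a minimal sketch of a startup probe guarding a liveness probe. The Pod name, container name, image, and the /healthz endpoint are hypothetical placeholders, and the thresholds are illustrative values rather than recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo            # hypothetical name
spec:
  containers:
    - name: app                    # hypothetical container name
      image: example.com/app:1.0   # placeholder image
      startupProbe:
        httpGet:
          path: /healthz           # assumed health endpoint
          port: 8080
        periodSeconds: 10
        failureThreshold: 30       # up to 30 x 10 s = 300 s to finish starting
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10          # only begins after the startup probe succeeds
```

With these values the application gets up to about five minutes to start, but the liveness probe takes over as soon as the first startup check passes, instead of always waiting for a fixed delay.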
Types of k8s Health Checks
Kubernetes offers mechanisms to ensure that your applications are running smoothly and can recover from failures. A probe can perform its check in one of four ways: an HTTP request, a command, a TCP connection, or a gRPC health check. Each of these works with any of the three probe types. Let’s explore them using readiness probes, which are important for managing traffic and making sure only healthy instances of your application are serving requests:
1. HTTP Requests
HTTP-based readiness probes are used to check if an application is ready to handle incoming requests by sending HTTP GET requests to a specified endpoint. If the endpoint returns a success status code (2xx or 3xx), the container is considered ready to serve traffic. This ensures that the application is fully initialized and ready to process requests before it starts receiving them.
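A minimal sketch of such a probe is shown here. The path, port, and timings are the ones this section describes; the Pod name, container name, and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: http-readiness-demo        # hypothetical name
spec:
  containers:
    - name: web                    # hypothetical container name
      image: example.com/web:1.0   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5     # wait 5 s after the container starts
        periodSeconds: 5           # then probe every 5 s
```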
In this example, Kubernetes sends an HTTP GET request to /ready on port 8080 every 5 seconds after an initial delay of 5 seconds. If the endpoint returns a success code, the container is considered ready to serve traffic.
2. Commands
Command-based readiness probes execute a command inside the container. If the command returns a zero exit code, the container is considered ready. Otherwise, it is considered not ready. It's important to handle zombie processes when using exec probes. A zombie process is a child process that has finished executing but whose parent has not yet read its exit status. To avoid them, make sure the parent process calls wait() or waitpid() to retrieve the child's exit status. In a container, orphaned children are re-parented to the process running as PID 1, so that process must reap them as well.
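A minimal sketch of an exec-based readiness probe follows. The command and timings match the description in this section; the Pod name, container name, and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-readiness-demo        # hypothetical name
spec:
  containers:
    - name: app                    # hypothetical container name
      image: example.com/app:1.0   # placeholder image
      readinessProbe:
        exec:
          command:                 # exit code 0 means ready
            - cat
            - /tmp/ready
        initialDelaySeconds: 5
        periodSeconds: 5
```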
In this example, Kubernetes runs the command cat /tmp/ready every 5 seconds after an initial delay of 5 seconds. If the file /tmp/ready exists, the command succeeds, and the container is considered ready.
3. TCP Connections
TCP-based readiness probes check if an application is ready by attempting to open a TCP connection to a specified port. If the connection is successful, the container is considered ready.
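A minimal sketch of a TCP-based readiness probe follows. The port and timings match the description in this section; the Pod name, container name, and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tcp-readiness-demo         # hypothetical name
spec:
  containers:
    - name: app                    # hypothetical container name
      image: example.com/app:1.0   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:
        tcpSocket:
          port: 8080               # ready if the connection can be opened
        initialDelaySeconds: 15
        periodSeconds: 10
```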
In this example, Kubernetes attempts to open a TCP connection to port 8080 every 10 seconds after an initial delay of 15 seconds. If the connection is successful, the container is considered ready.
4. gRPC
gRPC-based readiness probes use the gRPC Health Checking Protocol to check if an application is ready. If the gRPC endpoint returns a healthy status, the container is considered ready.
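A minimal sketch of a gRPC-based readiness probe follows. The container name, port, and timings match the description in this section; the Pod name and image are hypothetical placeholders. It assumes the server implements the gRPC Health Checking Protocol and that the cluster runs a Kubernetes version in which gRPC probes are available.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grpc-readiness-demo                # hypothetical name
spec:
  containers:
    - name: grpc-server
      image: example.com/grpc-server:1.0   # placeholder image
      ports:
        - containerPort: 50051
      readinessProbe:
        grpc:
          port: 50051                      # must serve the gRPC health service
        initialDelaySeconds: 10
        periodSeconds: 10
```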
In this example, Kubernetes uses the gRPC Health Checking Protocol to check the readiness of the grpc-server container on port 50051 every 10 seconds after an initial delay of 10 seconds. The successThreshold and failureThreshold parameters determine how many consecutive successes or failures are required before the container is considered ready or not ready, respectively. The default is 1 for successThreshold and 3 for failureThreshold.
Using All Types for Kubernetes Health Check
You can combine different types of probes to cover different aspects of your application's health. For example, you can use an HTTP probe for readiness, a command probe for liveness, and a TCP probe for startup.
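A minimal sketch of such a combined configuration is shown here. All names, paths, ports, and timing values are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: combined-probes-demo       # hypothetical name
spec:
  containers:
    - name: app                    # hypothetical container name
      image: example.com/app:1.0   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:              # HTTP: is the app ready for traffic?
        httpGet:
          path: /ready             # assumed readiness endpoint
          port: 8080
        periodSeconds: 5
      livenessProbe:               # command: is the app still healthy?
        exec:
          command:
            - cat
            - /tmp/healthy         # assumed marker file kept by the app
        periodSeconds: 10
      startupProbe:                # TCP: has the app started listening?
        tcpSocket:
          port: 8080
        periodSeconds: 10
        failureThreshold: 30       # allow up to 30 x 10 s for startup
```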
The configuration defines:
- The readiness probe uses an HTTP GET request to check if the application is ready to serve traffic.
- The liveness probe uses a command to check if the application is still running correctly.
- The startup probe uses a TCP connection to check if the application has started successfully.
By configuring these probes, you can ensure that your application is robust and can recover from various failure scenarios.
Common Troubleshooting Steps
Understanding and effectively troubleshooting health checks is important for maintaining the reliability and performance of your applications. The steps below help you narrow down why a probe is failing so that you can resolve the issue faster.
1. Check Container Status: Run kubectl get pods to see the status of your pods and their containers.
This gives you an overview of which pods are running, pending, or failed. The READY and RESTARTS columns show how many containers are ready and how often they have been restarted.
2. Describe the Pod: For detailed information about a specific pod and its containers, run kubectl describe pod <pod-name>.
This command provides insights into the pod’s status, including health check failures, which are listed in the Events section of the output.
3. Check Container Logs: Logs can provide valuable information about why a container is failing health checks. Run kubectl logs <pod-name> -c <container-name>, and add --previous to read the logs of a container instance that has already been restarted.
Look for error messages that indicate why the application within the container is not healthy.
4. Check Events: Kubernetes events can provide a timeline of what happened to your pods and their containers. Run kubectl get events --sort-by=.metadata.creationTimestamp to list them in chronological order.
Events can help you understand the sequence of actions and identify where things went wrong.
5. Resource Limits: Check whether the pod is running out of resources (CPU, memory), which might cause it to fail health checks. A container that exceeds its memory limit is OOM-killed. CPU throttling can slow the application down enough that a probe times out.
6. Configuration Errors: Verify that the health check configurations (endpoints, commands, ports) are correct.
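For the resource-related failures in step 5, a sketch like the following shows the fields that are usually worth reviewing together: the container's requests and limits, and the probe's timeoutSeconds, which defaults to 1 second. All names and values here are hypothetical and only illustrate where the settings live.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-tuning-demo       # hypothetical name
spec:
  containers:
    - name: app                    # hypothetical container name
      image: example.com/app:1.0   # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m                # too low a CPU limit can cause throttling
          memory: 512Mi            # exceeding this gets the container OOM-killed
      livenessProbe:
        httpGet:
          path: /healthz           # assumed health endpoint
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 3          # raised from the 1 s default to tolerate slow replies
        failureThreshold: 3
```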
By following these steps and understanding the underlying mechanisms of health checks, you can quickly identify and resolve issues, ensuring that your applications remain healthy and responsive.
Best Practices for Kubernetes Health Checks
1. When configuring health checks in Kubernetes, it's important to choose the protocol that best suits your application's requirements. HTTP probes are perfect for web services as they can provide detailed status information via health endpoints, although they might be slower. TCP probes are more suitable for applications that don't support either HTTP or gRPC but use other protocols (like databases). Command probes are ideal for custom checks that need to execute specific commands within the container, while gRPC probes are optimal for applications using the gRPC protocol, offering built-in support for health checks. Security is also a key consideration; for example, HTTP probes might need SSL/TLS encryption for secure communication, whereas TCP probes typically do not require authentication or encryption but should be configured to minimize exposure.
2. Enhancing efficiency through connection reuse is beneficial. Utilizing connection pools to reuse existing connections reduces the overhead of establishing new ones. Monitoring and adjusting connection pool settings can ensure optimal performance. Enabling HTTP keep-alive allows the reuse of the same TCP connection for multiple requests, further improving efficiency.
3. For command probes, custom scripts can manage complex health checks that standard probes cannot handle. These scripts should return appropriate exit codes to accurately indicate health status. Using environment variables or command-line arguments can make these scripts configurable and reusable. Documenting custom scripts and storing them in a version-controlled repository ensures easy maintenance and sharing.
4. Utilizing HTTP/2 can offer advantages, including multiplexing, server push, and header compression. Configuring the HTTP/2 server to handle health check requests reliably can enhance the robustness of your health checks. Using HTTP/2 over TLS ensures secure and dependable health checks.
5. Accurate health status indication requires defining appropriate HTTP response codes. Kubernetes treats any code from 200 to 399 as success and anything else as failure, so return a code such as 200 (OK) when healthy and 503 (Service Unavailable) when not, and make sure your monitoring and alerting systems recognize these codes. Combining HTTP response codes with other metrics, such as response time or error rate, provides a more complete health assessment.
6. To avoid overloading the system, it's important to limit resource consumption. Reducing the frequency of resource-intensive operations like network requests or custom scripts is recommended. Whenever possible, use simpler methods like HTTP or TCP probes.
Note: HTTP calls can be expensive, so keep the handler behind a health endpoint lightweight.
7. For worker containers (not serving traffic), consider using a lease file mechanism. This involves touching a file during each iteration of the main loop and checking the timestamp of that file from an exec probe. This method is relatively easy and avoids the overhead of embedding an HTTP server. Writing logs and checking them from the probe also works.
8. Preventing common issues can save a lot of trouble. For example, using TCP health checks for HTTP applications can be misleading, as they might mark the application as healthy based solely on port binding. Implement proper health endpoints that check dependencies for HTTP applications. Always implement readiness checks to prevent the application from receiving traffic prematurely. For databases like Redis, avoid using TCP health checks; instead, use command probes to ensure the database is in the desired state. Lastly, avoid verifying unnecessary dependencies in health checks to prevent cascading failures.
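The lease file mechanism from item 7 can be sketched as an exec liveness probe. This is a minimal illustration under several assumptions: the worker touches /tmp/lease on every loop iteration, the image ships a shell plus date and a stat that supports -c %Y (GNU coreutils or BusyBox), and a 60-second staleness limit suits the workload. The names, path, and limit are all hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker-lease-demo             # hypothetical name
spec:
  containers:
    - name: worker                    # hypothetical container name
      image: example.com/worker:1.0   # placeholder image
      livenessProbe:
        exec:
          command:
            - sh
            - -c
            # Succeeds only if /tmp/lease was modified less than 60 s ago.
            # A missing file makes stat fail, so the check exits non-zero.
            - 'test $(( $(date +%s) - $(stat -c %Y /tmp/lease) )) -lt 60'
        periodSeconds: 30
        failureThreshold: 3
```

The staleness limit should comfortably exceed the longest normal loop iteration, otherwise a busy but healthy worker would be restarted.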
By following these best practices, you can ensure that your Kubernetes health checks are effective and contribute to the overall stability and reliability of your deployments.
As you’ve seen in the discussion above, health checks improve your app's reliability and uptime. But they can also be expensive, stealing resources from your application logic. To get maximum Kubernetes reliability at the lowest possible cost, check out PerfectScale. PerfectScale is designed to optimize and scale your Kubernetes environments effortlessly, ensuring that your clusters are always running at peak performance.
Our advanced algorithms and machine learning techniques ensure your services are precisely tuned to meet demand, cutting down on waste and optimizing every layer of your K8s stack. Join industry leaders like Paramount Pictures and Creditas who have already optimized their Kubernetes environments with PerfectScale. Sign up or Book a demo to experience the immediate benefits of automated Kubernetes cost optimization and management, ensuring your environment is always perfectly scalable.