Second-day Kubernetes operations begin the day your environment goes live and starts serving real customers. This is the beginning of a never-ending journey of supporting the day-to-day operations of your environment.
You see, day-to-day operations have a single purpose: "to provide your customers with the best possible experience using your applications".
However, as cloud expenses keep increasing, the executive perspective is more like, "the best possible experience, but at the lowest possible cost".
Resilience Pairs with Horizontal Pod and Cluster Autoscaler
A default way to “ensure performance and resilience while keeping costs low” is leveraging Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler.
Essentially, they dynamically scale your environments to meet the load of your application during peak times, then scale down as the load decreases. When both the HPA and Cluster Autoscaler are installed and configured, we expect our environment to have a high resilience level combined with a steady cost pattern that follows the demand fluctuations.
However, in most situations, you will still see resilience and performance issues and overall costs steadily increasing. How can this be? It is most likely due to your system not being sized properly, resulting in HPA and Cluster Autoscaler running inefficiently.
Let's dive in deeper.
Scheduling with Autoscaler
Let’s be honest, there is no magic here.
Kubernetes' horizontal scalability heavily relies on the proper vertical sizing definitions of pods and nodes. Request values define how many resources the node should allocate for the specific pod.
Both the Cluster Autoscaler and HPA are tightly coupled to the pod request definitions.
When a pod is assigned to a node (for example, a node with eight cores of CPU and 16 gigabytes of memory), the relevant fraction of the node's resources is reserved for the pod.
Now, when Kubernetes needs to schedule additional pods, it'll place them on the same node only if it has enough resources remaining to fit the pod's request.
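To make the reservation logic concrete, here is a minimal sketch of the fit check for the eight-core, 16-gigabyte node from the example above. The `Resources` type and the `fits` function are hypothetical names made up for this illustration, and the sketch assumes the whole node capacity is allocatable (a real node keeps some capacity back for the system). It is not the scheduler's actual code, only the arithmetic behind it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for CPU (millicores) and memory (MiB)."""
    cpu_millicores: int
    memory_mib: int


def fits(node_allocatable: Resources, reserved: List[Resources], pod_request: Resources) -> bool:
    """Return True if the pod's request fits into what is not yet reserved."""
    free_cpu = node_allocatable.cpu_millicores - sum(r.cpu_millicores for r in reserved)
    free_mem = node_allocatable.memory_mib - sum(r.memory_mib for r in reserved)
    return pod_request.cpu_millicores <= free_cpu and pod_request.memory_mib <= free_mem


# Assumption: all 8 cores and 16 GiB are allocatable to pods.
node = Resources(cpu_millicores=8000, memory_mib=16384)
pod = Resources(cpu_millicores=2000, memory_mib=4096)

print(fits(node, [pod] * 3, pod))  # True: the fourth pod still fits
print(fits(node, [pod] * 4, pod))  # False: the node is fully reserved
```

Notice that actual usage never appears in the calculation. A pod that requests two cores and uses a tenth of one still blocks two cores on the node.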
If not, the pod goes to the unschedulable queue, and the Cluster Autoscaler checks whether adding a node would give it enough capacity. Once a new node with enough capacity joins the cluster, the scheduler simply places the pod on it.
The Cluster Autoscaler will scale up the number of nodes only when pods can't be scheduled on the existing nodes. And it will scale down a particular node only if the sum of the requests of the pods on that node is less than a set threshold.
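The scale-down rule can be sketched in a few lines. This is a simplified illustration, not the Cluster Autoscaler's implementation: the function names are hypothetical, the 0.5 default assumes the autoscaler's `--scale-down-utilization-threshold` setting is left at its documented default, and the real autoscaler applies further checks (for example, whether the pods can be moved elsewhere) before removing a node.

```python
from typing import List, Tuple

# Each pod request is a (cpu_millicores, memory_mib) tuple.
PodRequest = Tuple[int, int]


def request_utilization(alloc_cpu_m: int, alloc_mem_mib: int, pods: List[PodRequest]) -> float:
    """Requested share of the node, taking the larger of the CPU and memory ratios."""
    cpu_ratio = sum(cpu for cpu, _ in pods) / alloc_cpu_m
    mem_ratio = sum(mem for _, mem in pods) / alloc_mem_mib
    return max(cpu_ratio, mem_ratio)


def is_scale_down_candidate(alloc_cpu_m: int, alloc_mem_mib: int,
                            pods: List[PodRequest], threshold: float = 0.5) -> bool:
    """A node is a candidate only when its requested share is below the threshold."""
    return request_utilization(alloc_cpu_m, alloc_mem_mib, pods) < threshold


# Lightly requested node: 18.75% of CPU and memory reserved.
print(is_scale_down_candidate(8000, 16384, [(1000, 2048), (500, 1024)]))   # True
# One pod with an inflated memory request keeps the node alive.
print(is_scale_down_candidate(8000, 16384, [(1000, 2048), (500, 10240)]))  # False
```

The second call shows the cost side of the problem: a single oversized request is enough to keep an otherwise idle node running.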
The same goes for the HPA, specifically the resource-based HPA. New replicas will start when the average utilization of the current pods exceeds a set percentage of the pod requests.
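The Kubernetes documentation describes the HPA calculation as the current replica count multiplied by the ratio of the current metric value to the target value, rounded up. The sketch below applies that formula to CPU utilization measured against requests. The function name and parameters are hypothetical, and the 10% tolerance is an assumption based on the documented default; real HPA behavior also involves stabilization windows and min/max replica bounds that are left out here.

```python
import math


def desired_replicas(current_replicas: int, avg_usage_m: int, request_m: int,
                     target_utilization_pct: int, tolerance: float = 0.1) -> int:
    """Replica count for a resource-based HPA, with utilization measured against the request."""
    # ratio = current utilization / target utilization
    ratio = (avg_usage_m * 100) / (request_m * target_utilization_pct)
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, no scaling
    return math.ceil(current_replicas * ratio)


# Pods use 900m on average, request 1000m, target 60%: utilization is 90%.
print(desired_replicas(4, 900, 1000, 60))  # 6

# Same real usage, but the request is inflated to 3000m: utilization reads 30%.
print(desired_replicas(4, 900, 3000, 60))  # 2
```

The workload is identical in both calls. Only the request changed, and the HPA went from scaling up to scaling down, which is why a wrong request makes the autoscaler react to the wrong signal.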
The Juggle: Under-Provisioned vs. Over-Provisioned Resources
A critical aspect of second-day operations is improving the efficiency and resiliency of your environment, which is often called optimizing or "right-sizing" your Kubernetes environment.
As we begin to better understand the importance of pod requests, how do we actually right-size our pods? What are the correct values for the request and limit?
This is the concept of right-sizing or continuously optimizing resources. The goal is to provision as few resources as possible without compromising performance. The request should guarantee enough resources for proper operation and the limit should protect the nodes from over-utilization.
Let's break down some scenarios:
If pod requests are too big, you're allocating too many resources. The resulting waste increases both your cloud bill and your applications' CO2 emissions, and it gets multiplied when the Cluster Autoscaler spins up new nodes to support pods whose requests are padded with unnecessary resources.
But if the requests are under-provisioned, Kubernetes will not guarantee that the pod will have enough resources to run. This can directly lead to OOM kills and CPU throttling, as well as potential pod evictions, which tend to have the biggest impact on performance at peak load times.
If we forget to set requests at all, Kubernetes will not reserve enough resources for the pod on the node during scheduling. This may cause unexpected pod eviction under node memory pressure, or CPU starvation when the node is busy.
Under-provisioned limits can cause services to underperform or fail during load bursts. Even if there are free resources available in the cluster or node, the pod can experience CPU throttling or OOM kills.
Over-provisioned limits set the cut-off threshold too high, which can end in the failure of the entire node. A node failing under a load spike can easily trigger a domino effect and cause a complete outage of the system.
This can also be the case if you fail to define limits. However, in some situations it is okay to remove CPU limits. Because CPU is a compressible resource, the Completely Fair Scheduler (CFS) will figure out how to properly distribute CPU time between the pods.
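The scenarios above can be turned into a rough checklist. The sketch below is only an illustration: the `diagnose` function, its labels, and the `waste_factor` of 2.0 are hypothetical choices made for this example, not Kubernetes behavior or an established rule. It compares one resource's request and limit against observed typical and peak usage, all in the same unit.

```python
from typing import List, Optional


def diagnose(request: Optional[int], limit: Optional[int],
             typical_usage: int, peak_usage: int,
             waste_factor: float = 2.0) -> List[str]:
    """Flag the provisioning scenarios that apply to a single resource of a container."""
    findings = []
    if request is None:
        findings.append("missing request")
    elif request > typical_usage * waste_factor:
        findings.append("over-provisioned request")
    elif request < typical_usage:
        findings.append("under-provisioned request")

    if limit is None:
        findings.append("missing limit")
    elif limit < peak_usage:
        findings.append("under-provisioned limit")
    return findings


# CPU in millicores: typical usage 400m, peak 900m.
print(diagnose(2000, None, 400, 900))  # ['over-provisioned request', 'missing limit']
print(diagnose(300, 800, 400, 900))    # ['under-provisioned request', 'under-provisioned limit']
print(diagnose(500, 1000, 400, 900))   # []
```

For CPU, a "missing limit" finding is not necessarily a problem, for the reason given above. For memory it usually deserves a closer look.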
Right-Sizing the Right Way (with Data)
Finally, our mission of right-sizing is clear.
Roll up your sleeves and set each and every pod with as few resources as possible, without compromising the performance.
But how do you actually decide what is the right amount? Is it a half-core or four cores? Is it 100 megabytes or one gigabyte?
It seems like an easy task: the service owners go workload by workload, look at all the metrics, and adjust the values accordingly. Unfortunately, this plan is not realistic. Most Kubernetes environments are highly distributed and constantly changing, which makes this task extremely complex and time-consuming.
This level of complexity requires a solution. In my experience, good DevOps solutions consist of 70% philosophy and 30% technology. In this situation, the philosophical part of such a solution is to establish an effective feedback loop to pinpoint, quantify, and address relevant problems.
The technology part is the shift from data to intelligence. What is the difference between data and intelligence? Data is not considered intelligence until it is something that can be applied or acted upon. In other words, humans are not good at analyzing massive amounts of data. Switching from data to actionable intelligence will streamline the decision-making process.
This approach will allow you to shift from continuous firefighting to proactively pinpointing, predicting, and fixing problems. You'll switch from guesstimation mode to data-driven decision-making. The end result of such an approach will be improved resilience, fewer SLA/SLO breaches, reduced waste and carbon footprint, and effective governance of the platform.
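As a small illustration of turning raw usage data into something actionable, the sketch below derives a request and a limit from a series of usage samples. Everything in it is an assumption made for the example: the function names, the choice of the 90th percentile for the request, the observed peak for the limit, and the 15% headroom. It is one possible heuristic, not a recommended policy and not how any particular product works.

```python
import math
from typing import List, Tuple


def percentile(samples: List[int], pct: int) -> int:
    """Nearest-rank percentile of the samples."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def recommend(samples: List[int], request_pct: int = 90,
              headroom_pct: int = 15) -> Tuple[int, int]:
    """Suggest (request, limit): a high percentile and the peak, each with headroom."""
    request = math.ceil(percentile(samples, request_pct) * (100 + headroom_pct) / 100)
    limit = math.ceil(max(samples) * (100 + headroom_pct) / 100)
    return request, limit


# Hypothetical memory usage samples in MiB, including one burst to 400.
memory_mib = [210, 220, 215, 230, 225, 240, 235, 250, 245, 400]
print(recommend(memory_mib))  # (288, 460)
```

The request covers normal operation without reserving for the rare burst, while the limit leaves room for it. Running a calculation like this continuously across every workload is the part that does not scale by hand.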
PerfectScale specializes in solutions built to help you right-size and continuously optimize your Kubernetes environment. To get an evaluation of your Kubernetes environment and see if it is running efficiently, book a demo today to start a free trial of our solution.
PerfectScale Co-Founder and CTO
About the author
Eli is a passionate technologist with a background in telecom (Comverse, Vonage), cyber (Imperva, Cyren), and storage (IBM). He has over six years of experience as a DevOps Manager, where he specialized in building large-scale SaaS systems based on Kubernetes. His current focus as PerfectScale's CTO is on building solutions that solve the pains of operating highly distributed Kubernetes systems, keeping them running both efficiently and resiliently.