In the current era of cloud computing, businesses heavily depend on container orchestration platforms such as Kubernetes to effectively handle their workloads. Kubernetes facilitates the scalability, adaptability, and robustness of applications. Nevertheless, there are situations where organizations find it advantageous to shut down their Kubernetes workloads outside regular hours. This could primarily be driven by cost-efficiency, security considerations, or the need for maintenance.
The off-hours challenge can be solved from multiple perspectives, but the two main ones that come to mind are:
- Powering off the Kubernetes nodes
- Powering off the Kubernetes applications
The former might be a good solution for some: it simply ensures the nodes are powered off, and nothing is more cost-effective or secure than that. However, it takes away the dynamic behavior of Kubernetes workloads.
In this article, I will explore three ways to automatically shut down Kubernetes applications, the last of which is a bonus for the tech-savvy:
- Cron Scaler
- Custom Metric Scaler
- Network Scaler*
Kubernetes KEDA for Event-Driven Autoscaling
KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed.
An event can be anything generated from an action: an API call, a message in a queue, a filesystem change, and so on. In our case, while evaluating ways to scale to and from 0, the "event" is either time itself or a sleep request. KEDA augments the functionality of the native Kubernetes Horizontal Pod Autoscaler (HPA) by managing it.
As you might be aware, however, the HPA cannot scale workloads to 0. KEDA can: it manages the workload's replica count itself when scaling between zero and one, and delegates to the HPA above that.
To get KEDA to scale your workloads to 0 during off-hours, we're going to explore a few approaches. The first is simply time-based, the second lets you control the sleep schedule of your precious workloads with more granularity, and the third implements an immature but clever solution to the problem, using a custom resource called a "Network Scaler".
NOTE: The code examples for the below guide are available in our GitHub Repository.
The Simple Solution — Cron Scaler
The main KEDA CRD (Custom Resource Definition) is called ScaledObject. A ScaledObject defines how many replicas a given workload (Deployment, StatefulSet, etc.) should have at a specific time. As of writing this article, the cron scaler has issues when specifying what the off-hours are, but it can dictate what the on-hours are, so your workloads know when to be awake and otherwise sleep.
Let's assume you have a simple deployment like the one below, which you apply with kubectl apply -f deployment.yaml:
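The actual manifest lives in the repository linked above; here is a minimal sketch of what such a deployment could look like (the sleep-workload name and the nginx image are illustrative, not taken from the original repo):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep-workload          # illustrative name, reused in the examples below
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep-workload
  template:
    metadata:
      labels:
        app: sleep-workload
    spec:
      containers:
        - name: app
          image: nginx:alpine   # any container will do for this demo
          ports:
            - containerPort: 80
```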
Now, let's add a ScaledObject KEDA CRD and attach it to our sleep workload (timezone names follow the tz database). Let's assume you want this workload alive from 9:00 to 17:00 New York time.
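A minimal sketch of such a ScaledObject, targeting the sleep-workload deployment above and using the cron trigger's documented timezone, start, end, and desiredReplicas fields:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sleep-workload-scaler
spec:
  scaleTargetRef:
    name: sleep-workload        # the Deployment defined earlier
  minReplicaCount: 0            # allow scaling all the way down to 0
  maxReplicaCount: 1
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 9 * * *        # wake up at 09:00
        end: 0 17 * * *         # go to sleep at 17:00
        desiredReplicas: "1"
```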
When you apply this in your namespace, you'll see your sleep-workload sleep and wake up on schedule. This solution is simple and elegant.
(KEDA calculates the replica count across all triggers with a MAX function. If you add another scaler to the triggers array, the cron trigger's desiredReplicas effectively becomes the floor during on-hours, and any additional trigger can scale above it.)
The Extensive Solution — Custom Metrics API
What if you wanted an external system to dictate when and how workloads sleep, and you wanted your workloads to be aware of that state and act accordingly? That is where the Metrics API scaler comes into play.
This scaler lets you define an external endpoint that KEDA queries for the number of replicas the workload should have.
In this use case, I have used AWS DynamoDB, AWS Lambda, and Jenkins as the automation server and cron scheduler. You could, however, swap any of these technologies for any other database, API server, and automation scheduler you choose.
The cron scaler is not part of this solution. Here, we implement our business logic behind a custom API endpoint, which the KEDA metrics-api scaler points at:
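A sketch of such a ScaledObject, assuming a deployment called sleepy-workload and a placeholder endpoint URL (swap in your own API Gateway or server address):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sleepy-workload-scaler
spec:
  scaleTargetRef:
    name: sleepy-workload
  minReplicaCount: 0
  maxReplicaCount: 3
  triggers:
    - type: metrics-api
      metadata:
        targetValue: "1"
        # placeholder URL; the workload and replicas query parameters are explained below
        url: "https://my-sleep-api.example.com/state?workload=sleepy-workload&replicas=3"
        valueLocation: "replicas"
```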
In this example, our sleepy-workload scaler points at an external URL and provides it with two query parameters:
- workload = the workload's name (you could use a namespace parameter instead to turn off an entire namespace, having all of its workloads point to the same endpoint with the same query value, their common namespace).
- replicas = the number of replicas the workload should have when it is not asleep.
The metrics scaler then expects the following JSON response from the GET request if the workload is Awake:
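The exact body is in the repository; with the valueLocation of replicas assumed in the sketch above, an "awake" response would look something like:

```json
{
  "replicas": 3
}
```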
And if the workload is considered Asleep:
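Again assuming the same response shape:

```json
{
  "replicas": 0
}
```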
Let’s assume we have the following DynamoDB table called state-of-my-workloads:
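The real schema is in the repository; for illustration, assume items keyed by workload name with a Boolean sleep attribute (the workload and sleep attribute names are assumptions):

```json
[
  { "workload": "sleepy-workload",  "sleep": false },
  { "workload": "another-workload", "sleep": true  }
]
```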
I have written pseudo-code that allows your workload to be awoken and put to sleep. Let's write a simple Python Lambda that answers the precise inquiry KEDA is making. You could implement and deploy any API server you're familiar with that suits your technology stack and use case:
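A minimal sketch of such a handler behind API Gateway (the event shape assumes a proxy integration; get_sleep_value is defined in the next snippet):

```python
import json

def lambda_handler(event, context):
    # API Gateway proxy integrations pass query parameters here
    params = event.get("queryStringParameters") or {}
    workload = params.get("workload", "")
    replicas = int(params.get("replicas", "1"))

    # If the workload is marked as asleep, report 0 replicas;
    # otherwise echo back the desired replica count from the query.
    desired = 0 if get_sleep_value(workload) else replicas

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"replicas": desired}),
    }
```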
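The get_sleep_value() function can be implemented as follows, for instance with boto3 against the state-of-my-workloads table from above (the workload key and sleep attribute names are assumptions):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("state-of-my-workloads")

def get_sleep_value(workload: str) -> bool:
    # Fetch the workload's item and return its Boolean sleep flag;
    # a missing item defaults to "awake".
    response = table.get_item(Key={"workload": workload})
    item = response.get("Item", {})
    return bool(item.get("sleep", False))
```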
This function returns the Boolean sleep status of your workload. The main handler, in turn, returns the number of replicas provided by the replicas query parameter, or 0 if the workload is asleep.
Now, you could use any automation solution to flip the Boolean sleep value in your workload's state table, or connect it to a self-service portal where people can turn their workloads on and off on demand.
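For instance, a scheduled Jenkins job (or any other scheduler) could flip the flag with the AWS CLI; a sketch, using the same assumed attribute names:

```shell
# Put sleepy-workload to sleep (sleep=true); a wake-up job would set it back to false
aws dynamodb update-item \
  --table-name state-of-my-workloads \
  --key '{"workload": {"S": "sleepy-workload"}}' \
  --update-expression "SET #s = :asleep" \
  --expression-attribute-names '{"#s": "sleep"}' \
  --expression-attribute-values '{":asleep": {"BOOL": true}}'
```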
The Clever Solution — Network Scaler
The KEDA HTTP add-on, the "network scaler" of this section, has been in beta for years. Regardless, it's an exciting add-on tackling one of the most common scaling problems out there: network traffic. By scaling on network traffic rather than just compute metrics, we open ourselves up to a whole new range of capabilities. One of those is, as you guessed, scaling down to zero. This is an example of implementing the "serverless" pattern on Kubernetes.
That means that if your workload has a network endpoint, it will scale from 0 to 1 upon being called. It's the perfect metaphor for knocking on the workload's door and waking it up.
To get started, you need to add the KEDA HTTP add-on to your existing KEDA installation in the cluster:
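Assuming KEDA was installed from the official kedacore Helm repository into the keda namespace, the add-on can be installed alongside it:

```shell
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
# Install the HTTP add-on into the same namespace as KEDA
helm install http-add-on kedacore/keda-add-ons-http --namespace keda
```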
While working with HTTP, you’d obviously need a Kubernetes Service to point to your workloads:
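A minimal ClusterIP Service for the sleep-workload deployment sketched earlier (port 80 matches the nginx container used there):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sleep-workload
spec:
  type: ClusterIP
  selector:
    app: sleep-workload   # must match the Deployment's pod labels
  ports:
    - port: 80
      targetPort: 80
```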
And now comes the magic. Because you installed the HTTP add-on, you now have the following CRD available. You can configure it as such (add-on version v0.7):
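A sketch of an HTTPScaledObject for our example, using the v0.7 field names (the hostname is a placeholder and the min/max replicas are illustrative):

```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: sleep-workload
spec:
  hosts:
    - sleep-workload.example.com   # placeholder hostname used for routing
  scaleTargetRef:
    deployment: sleep-workload
    service: sleep-workload        # the ClusterIP Service defined above
    port: 80
  replicas:
    min: 0                         # scale all the way down when idle
    max: 3
```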
What happens now is that the http-add-on operator will pick up the CRD, and once it's done configuring, you'll see a new Service that's ready to route HTTP traffic to your Deployment. To connect the add-on's service to your workload's service, there is one last action that needs doing.
It is important to note that the routing of the add-on's service behaves similarly to an ingress, and it can be combined with your ingress, as per the official docs. That means it routes traffic based on hostnames and paths. For our demo purposes, we'll mimic this capability using only a ClusterIP Service.
We can now test this mechanism by port-forwarding HTTP traffic to this service, and you'll see the workload waking up and auto-scaling as per the min/max replicas you've configured.
To do that, you first have to expose the new service, so we'll do that with a port-forward:
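The service name depends on your Helm release; with the installation above it is typically the add-on's interceptor proxy in the keda namespace:

```shell
# Forward local port 8080 to the add-on's interceptor proxy (the name may vary per release)
kubectl -n keda port-forward svc/keda-add-ons-http-interceptor-proxy 8080:8080
```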
And in another terminal:
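Call the workload, making sure the Host header matches one of the hosts in the HTTPScaledObject:

```shell
# The Host header tells the interceptor which HTTPScaledObject to route to
curl -H "Host: sleep-workload.example.com" http://localhost:8080/
```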
And voilà!
_____________________
This article has listed several ways of using KEDA to achieve down-scaling to 0. There are many more methods out there, and there is no right or wrong answer; these solutions simply use logic that already exists in open source to achieve the goal. If you have any other suggestions, I'd love to read them in the comments, alongside any questions you might have.
Amitai G is a freelance DevOps consultant with an extensive background in DevOps practices and Kubernetes' ongoing maintenance and operations. As the author of the @elementtech.dev Medium channel, he specializes in writing articles that provide engineers with a deeper understanding and best practice guidance across the various aspects of the Cloud Native landscape.
Ready to take your Kubernetes resource management to the next level? With PerfectScale, you can maximize cost savings by intelligently managing your Kubernetes resources. Our advanced algorithms and machine learning techniques ensure your workloads are optimally scaled, reducing waste and cutting costs without compromising performance. Join forward-thinking companies who have already optimized their Kubernetes environments with PerfectScale. Sign up and book a demo to experience the immediate benefits of automated Kubernetes cost optimization and resource management. Ensure your environment is always perfectly scaled and cost-efficient, even when demand is low.