Managing diverse node pools in Google Kubernetes Engine (GKE) can be complex and costly. Each node pool in GKE is a group of nodes with the same configuration, designed to satisfy specific workload requirements. While this flexibility allows for optimized performance, it also introduces challenges in terms of manual configuration and maintenance. Administrators need to create and manage multiple node pools to accommodate different workloads, which can lead to increased operational overhead and potential misconfigurations.
In Google Kubernetes Engine (GKE), the standard Cluster Autoscaler dynamically adjusts the size of existing node pools based on the resource requests of scheduled pods. However, it's important to note that this autoscaling is bounded by the minimum and maximum node counts defined for each node pool. If the demand exceeds the specified maximum, the autoscaler cannot add more nodes, potentially leading to unschedulable pods. Conversely, the autoscaler won't reduce the node count below the defined minimum, which might result in underutilized resources and increased costs. Also, the standard Cluster Autoscaler doesn't create new node pools; it only scales the existing ones. This limitation necessitates manual configuration and tuning to efficiently handle varied workload demands, which can be both time-consuming and error-prone.
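To make the bounds concrete, here is a sketch of how the standard autoscaler's per-pool limits are set with the gcloud CLI; the cluster, pool, and zone names are illustrative placeholders:

```shell
# Enable autoscaling on an existing node pool, bounded between 1 and 5 nodes.
# "my-cluster", "my-node-pool", and the zone are hypothetical placeholders.
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool my-node-pool \
  --min-nodes 1 \
  --max-nodes 5 \
  --zone us-central1-a
```

The autoscaler will never grow this pool past 5 nodes or shrink it below 1, regardless of demand, and it will never create a pool that doesn't already exist.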
To address these challenges, GKE offers Node Auto-Provisioning (NAP), which dynamically creates and scales node pools based on real-time workload needs.
In this article, you will learn what GKE NAP is, the pitfalls of manual node pool management, how GKE's NAP works, how Compute Classes work with NAP, the benefits of NAP, and how PerfectScale can help.
Let's dive in!
What is GKE NAP?
GKE's Node Auto-Provisioning (NAP) enhances the standard Cluster Autoscaler by not only adjusting the size of existing node pools but also automatically creating and deleting node pools based on real-time workload requirements. This dynamic scaling ensures that applications have the necessary resources without manual intervention.
NAP considers various factors, including CPU, memory, and ephemeral storage resource requests, as well as GPU needs and specific node affinities or taints defined in pod specifications. By analyzing these parameters, NAP provisions nodes that best fit the workload, optimizing resource utilization and reducing waste. NAP also supports Spot VMs, enabling cost-effective scaling on lower-priced, preemptible capacity for fault-tolerant workloads.
While NAP automates node provisioning, ensuring that workloads are right-sized is essential to maximize efficiency. PerfectScale addresses this by continuously analyzing Kubernetes environments to provide actionable insights for resource optimization.
Pitfalls of Manual Node Pool Management
Managing node pools manually in Google Kubernetes Engine (GKE) presents several challenges:
Operational Overhead
Manually managing node pools demands constant monitoring and adjustments to align with workload demands. Administrators must regularly assess resource utilization, scale nodes up or down, and reconfigure node pools to accommodate changing application requirements. This hands-on approach is time-consuming and increases the risk of human error.
Risk of Overprovisioning
To prevent performance issues, teams often overprovision resources, allocating more CPU and memory than necessary. While this approach aims to ensure application stability, it leads to increased costs due to underutilized resources.
PerfectScale helps here by providing data-driven insights, ensuring that the resources are right-sized, which helps eliminate waste and maintain performance without overspending.
Underutilization Concerns
Static configurations in manually managed node pools can result in underutilized resources during periods of low demand. Without dynamic scaling, nodes may remain idle, consuming resources and incurring costs without contributing to workload processing.
Complexity in Scaling
Scaling node pools manually is a complex and error-prone process, especially in dynamic environments where workload demands fluctuate rapidly. Administrators must predict resource requirements and adjust node pools accordingly, a task that becomes increasingly challenging as applications scale. This complexity can lead to delayed responses to workload changes.
>> Take a look at GKE Cost Optimization Best Practices
How Does GKE NAP Work?
When a pod cannot be scheduled due to insufficient resources, NAP analyzes the pod's specifications, such as CPU, memory, and storage requirements, and automatically provisions a new node pool tailored to the workload. This dynamic scaling ensures that applications receive the necessary resources promptly without manual configuration.
NAP's decision-making process considers various factors:
CPU and Memory Requirements: GKE’s NAP determines the optimal machine type to match the pod's resource needs.
Ephemeral Storage: It considers the storage demands of the pod to provision nodes with adequate disk space.
GPU & TPU Requests: If a pod requires GPU resources, NAP selects appropriate machine types equipped with GPUs. Similarly, for TPU workloads, GKE supports the creation of TPU-enabled node pools, allowing for the automatic provisioning of nodes with TPU capabilities when such resources are requested by pods.
Node Affinities and Taints: It respects pod specifications regarding node selection preferences and tolerations.
By assessing these parameters, NAP ensures that the newly provisioned nodes are well-suited to handle the specific workloads, enhancing performance and efficiency.
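As an illustration, a pod spec like the following gives NAP everything it uses to choose a machine type; the pod name, image path, and accelerator type are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload               # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4  # GPU type hint NAP reads
  tolerations:
  - key: nvidia.com/gpu            # tolerate the GPU node taint
    operator: Exists
    effect: NoSchedule
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/my-repo/trainer:latest  # placeholder
    resources:
      requests:
        cpu: "4"                   # drives machine-type selection
        memory: 16Gi
        ephemeral-storage: 10Gi    # drives disk sizing
        nvidia.com/gpu: "1"        # triggers a GPU-equipped node pool
      limits:
        nvidia.com/gpu: "1"
```

If no existing pool can satisfy these requests, NAP provisions a new pool whose machine type, disk, and accelerator match them.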
Enhancing NAP with Compute Classes
Using GKE's Node Auto-Provisioning (NAP) with custom Compute Classes lets you further fine-tune how your Kubernetes cluster scales, especially for workloads with specific infrastructure requirements. Compute Classes act as user-defined profiles that tell NAP what kind of nodes to provision, based on characteristics such as CPU platform (e.g., Intel Ice Lake), GPUs, local SSDs, machine families (such as E2, N2, or C2), and even Spot versus standard VM preferences.
These Compute Classes are incredibly valuable when you're dealing with workloads that have unique performance or cost constraints. For example, if you’re running a latency-sensitive application that benefits from a particular CPU type, you can define a Compute Class that ensures only nodes with that CPU platform are provisioned. Conversely, for cost-sensitive batch jobs, you might create a Compute Class that restricts provisioning to Spot VMs or less expensive machine families.
To use Compute Classes with NAP, ensure the following:
a. Enable Node Auto-Provisioning: Activate NAP in your GKE cluster settings.
b. Configure Compute Classes: Define Compute Classes with the nodePoolAutoCreation field set to enabled: true. This configuration allows GKE to create and delete node pools based on the specified priorities.
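A minimal Compute Class along these lines might look like the following sketch; the class and pool names are illustrative:

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: custom-compute-class
spec:
  priorities:
  - nodepools: [existing-pool]   # prefer the existing node pool first
  - machineFamily: e2            # then auto-provision e2 nodes
  - machineFamily: c3            # fall back to c3 if e2 is unavailable
  nodePoolAutoCreation:
    enabled: true                # let GKE create/delete node pools for this class
```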
The above configuration defines a custom compute class in GKE that prioritizes scheduling pods on an existing node pool named existing-pool. If that is not feasible, it lets GKE's Node Auto-Provisioning automatically create new node pools using the e2 machine family; if e2 resources are unavailable, it falls back to the c3 machine family. This ensures efficient and flexible resource allocation based on workload demands.
Role of PerfectScale in Workload Optimization
PerfectScale is a Kubernetes optimization platform designed to enhance resource efficiency and reduce costs in Kubernetes environments. It provides automated insights and actions to ensure that workloads are appropriately sized, leading to improved performance and cost savings.
Some Features of PerfectScale:
1. Pod Right-Sizing: PerfectScale continuously analyzes pod resource usage, offering recommendations to adjust CPU and memory requests and limits. This ensures that pods are neither over-provisioned (wasting resources) nor under-provisioned (risking performance issues).
2. Automated Recommendations: PerfectScale provides actionable insights to optimize workloads, which can be implemented with minimal effort. These recommendations help in maintaining an optimal balance between performance and cost.
3. Complementing Node Auto-Provisioning (NAP): While GKE's NAP handles the provisioning of nodes based on workload demands, PerfectScale ensures that the workloads themselves are right-sized. This synergy leads to a harmonious balance between supply (nodes) and demand (workloads), optimizing overall cluster efficiency.
Getting Started: Implementing NAP and PerfectScale
Here's a comprehensive guide to get you started:
1. Enabling Node Auto-Provisioning (NAP) in GKE
Via gcloud CLI:
To enable NAP using the gcloud CLI, execute the following command:
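The command takes the following general shape, with resource limits that cap the total CPU and memory NAP may provision across auto-created pools:

```shell
gcloud container clusters update CLUSTER_NAME \
  --enable-autoprovisioning \
  --min-cpu MINIMUM_CPU \
  --min-memory MINIMUM_MEMORY \
  --max-cpu MAXIMUM_CPU \
  --max-memory MAXIMUM_MEMORY
```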
Replace CLUSTER_NAME, MINIMUM_CPU, MINIMUM_MEMORY, MAXIMUM_CPU, and MAXIMUM_MEMORY with your cluster name and desired resource limits.
2. Defining Custom Compute Classes
Custom Compute Classes allow you to specify preferences for node provisioning, such as machine families, CPU platforms, and more.
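A Compute Class expressing these preferences might look like this sketch (the class and pool names are illustrative), applied with `kubectl apply -f`:

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: custom-compute-class
spec:
  priorities:
  - nodepools: [existing-pool]   # highest priority: reuse the existing pool
  - machineFamily: e2            # next: auto-provision e2 nodes
  - machineFamily: c3            # last resort: c3 nodes
  nodePoolAutoCreation:
    enabled: true                # required for NAP to act on this class
```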
In this configuration, GKE will first attempt to schedule pods on the existing-pool. If its resources are insufficient, GKE will auto-provision nodes from the e2 machine family, and if e2 resources are unavailable, it will fall back to the c3 machine family.
3. Assigning Compute Classes to Workloads
To ensure that specific workloads utilize the defined compute classes, label your pods accordingly.
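For example, a Deployment's pod template can request the class through a nodeSelector; the Deployment name and image are hypothetical, while the `cloud.google.com/compute-class` key is the selector GKE uses for Compute Classes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                   # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:
        cloud.google.com/compute-class: custom-compute-class  # bind to the class
      containers:
      - name: app
        image: us-docker.pkg.dev/my-project/my-repo/app:latest  # placeholder
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
```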
This labeling ensures that the pod uses the custom-compute-class for node provisioning.
4. Integrating PerfectScale with GKE
PerfectScale offers intelligent rightsizing and cost management capabilities for Kubernetes clusters.
These are the integration steps:
a. Sign up for PerfectScale.
b. Connect your GKE cluster by following the integration instructions provided by PerfectScale.
c. Utilize the PerfectScale dashboard to monitor resource usage, receive optimization recommendations, and implement changes to enhance performance and reduce costs.
PerfectScale's platform-agnostic nature ensures compatibility with various Kubernetes environments, including GKE.
Want to learn how to fully leverage GKE NAP and Compute Classes? Don’t miss our free Kubernetes Optimization Workshop on May 22.
Benefits of GKE NAP
Here are the advantages of implementing GKE NAP in your Kubernetes environment:
1. Reduced Manual Intervention
NAP automates the creation and scaling of node pools based on real-time workload demands. When unschedulable pods are detected due to insufficient resources, NAP evaluates the pod specifications and provisions appropriate node pools without requiring manual configuration.
2. Optimized Resource Utilization
By analyzing CPU, memory, and other resource requests, NAP ensures that the provisioned nodes match the specific requirements of the workloads. This precise allocation prevents overprovisioning and underutilization, leading to more efficient use of resources. Also, NAP can work in conjunction with compute classes to further tailor node configurations, enhancing performance and cost-effectiveness.
3. Cost Savings
When you implement NAP, it can lead to significant cost reductions. By avoiding overprovisioning and ensuring that resources are allocated based on actual workload needs, organizations can minimize unnecessary cloud expenses. Furthermore, NAP supports the use of Spot VMs for fault-tolerant workloads, offering lower pricing compared to standard VMs, thus providing additional avenues for cost savings.
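On GKE, fault-tolerant pods can opt into Spot capacity with a nodeSelector and a matching toleration, which NAP uses to auto-provision Spot node pools; this is a sketch with a hypothetical pod name and command:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job                    # hypothetical
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"   # ask NAP for Spot VM nodes
  tolerations:
  - key: cloud.google.com/gke-spot      # tolerate the Spot node taint
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "echo processing && sleep 30"]
```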
4. Enhanced Scalability and Flexibility
NAP enables clusters to scale dynamically in response to workload fluctuations. This elasticity ensures that applications maintain performance during peak demand periods and scale down during low-demand periods, optimizing resource usage.
5. Improved Operational Efficiency
By automating node pool management, NAP reduces the operational overhead associated with manual scaling and configuration. This efficiency allows DevOps teams to focus on higher-value tasks, such as application development and optimization, rather than infrastructure management.
6. Integration with PerfectScale for Enhanced Optimization
While NAP automates node provisioning, integrating it with tools like PerfectScale can further enhance workload optimization. PerfectScale provides insights and automation for workload optimization, ensuring efficient resource utilization and cost savings. By combining NAP's automated scaling with PerfectScale's optimization capabilities, organizations can achieve a more streamlined and cost-effective Kubernetes environment.
