GPUs have become an integral part of modern cloud infrastructure. As teams adopt AI, machine learning, and large language models (LLMs) to evolve their applications, GPU usage is growing rapidly, and with it the Kubernetes cloud costs of running these workloads. This trend introduces a new layer of complexity to Kubernetes resource management: GPU utilization optimization.
“Unlike other Kubernetes resources, GPU requests are treated differently: a pod either gets access to a whole GPU or none at all. Without fractional allocation, GPUs are often underutilized, resulting in GPU waste. Since GPUs can account for up to 75% of hourly infrastructure costs, any inefficiency leads to substantial financial waste. Eliminating overallocation can therefore have a major impact, unlocking significant cost savings,” said Eli Birger, CTO of PerfectScale by DoiT. “Without visibility into actual GPU utilization, engineers lack the insights to right-size workloads, implement advanced pod scheduling strategies, and make data-driven infrastructure decisions.”
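The all-or-nothing behavior comes from how the NVIDIA device plugin exposes GPUs to Kubernetes: the `nvidia.com/gpu` resource only accepts whole integers. A minimal pod spec illustrates this (the image name is just an example):

```yaml
# A pod requesting a GPU via the NVIDIA device plugin.
# nvidia.com/gpu only accepts whole integers -- a request like
# "0.5" is rejected, so this pod claims an entire GPU even if
# the workload needs only a fraction of it.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  containers:
    - name: trainer
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
      resources:
        limits:
          nvidia.com/gpu: 1   # all-or-nothing: one full GPU
```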
To help teams overcome these challenges, we’re excited to introduce GPU Visibility from PerfectScale by DoiT, a feature that gives you a clear picture of GPU usage across your Kubernetes clusters, helping you identify inefficiencies, right-size workloads, and take full advantage of your GPU resources.

Let’s take a closer look at how GPU Visibility helps uncover inefficiencies and optimize GPU utilization with ease.
What does GPU visibility provide?
PerfectScale’s GPU feature is a powerful solution that collects GPU utilization metrics and delivers deep visibility within the context of the entire cluster. This view enables precise analysis of the Kubernetes environment, unlocking accurate data-driven optimization.

Navigate to PerfectScale’s Infrafit to access detailed GPU utilization insights for your node groups. The utilization chart provides a clear breakdown of GPU usage and requests across your infrastructure. This view is particularly helpful for quickly identifying underutilized or idle GPU capacity, prioritizing optimization efforts on the most critical areas, reducing GPU waste, and enhancing overall cost-efficiency.
Once a problematic area is identified, simply click on the corresponding node group to drill down into a more detailed infrastructure view. This level of granularity provides a comprehensive breakdown of individual instances within the group, along with key metrics for each instance.

After spotting an issue, the next step is deciding how to fix it. The platform helps you understand what’s going wrong (GPUs not being fully used, workloads provisioned with more resources than they need, or instances running without doing much) and choose the right strategy to solve the problem. For example, you might right-size instances to better fit the workload, balance the load and improve bin-packing, or turn off instances that are not needed.
Exploring usage scenarios
Right-sizing GPU instances

In many cases, workloads don’t require the full capacity of the GPU machine. By identifying underutilized instances, you can replace expensive instances with smaller and more cost-effective ones that are better suited for the task. This helps reduce unnecessary spending while still meeting performance requirements.
To make confident, data-driven decisions, use PerfectScale’s GPU visibility to spot where this adjustment makes the most sense.
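Once visibility confirms that a node group never saturates its GPUs, the swap itself is usually a node-provisioning change. As a hedged sketch, if your cluster uses Karpenter, constraining a NodePool to a smaller GPU instance type might look like this (instance names and the pool name are illustrative, and the fragment omits fields like `nodeClassRef` that a full NodePool needs):

```yaml
# Sketch of a Karpenter NodePool fragment that pins GPU nodes to a
# smaller, cheaper single-GPU instance type after utilization data
# showed the larger multi-GPU instances were never saturated.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-small          # example name
spec:
  template:
    spec:
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.xlarge"]   # example: one GPU instead of a multi-GPU instance
```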
Maximizing utilization with GPU splitting
While switching to smaller GPU instances is often a great way to reduce costs, it's not always the most effective approach.
Some workloads, for example, shared ML pipelines, don’t scale well when split across separate machines and might require tighter resource control. In such cases, smaller instances alone may not provide the desired performance, efficiency, or flexibility.

This is where NVIDIA’s Multi-Instance GPU (MIG) might help. MIG lets you split one physical GPU into several smaller, isolated units, where each has its own compute, memory, and bandwidth. These pieces act like separate GPUs, so you can run multiple small or parallel jobs at the same time without conflicts.
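Once MIG is enabled on a node (for example, an A100 partitioned into 1g.5gb slices) and the NVIDIA device plugin runs with its MIG "mixed" strategy, each slice shows up as its own schedulable resource. A pod can then request a slice instead of a whole GPU:

```yaml
# With MIG enabled and the NVIDIA device plugin in "mixed" strategy,
# each slice is exposed as a distinct resource. Several such pods
# can share one physical GPU with hardware-level isolation.
apiVersion: v1
kind: Pod
metadata:
  name: inference-slice
spec:
  containers:
    - name: server
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1   # one isolated MIG slice, not a whole GPU
```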
PerfectScale makes it easy to pinpoint underutilized GPUs that run workloads that don’t need full capacity. Instead of switching to a smaller machine, you can keep the bigger GPU and divide it into MIG instances, matching each slice to specific workload needs.
Increasing efficiency with KAI Scheduler
Some tasks don’t need a GPU all the time, for example, ephemeral workloads, CI/CD jobs, or model training steps with long CPU waits. They use it only for short periods, and giving each of these jobs a dedicated GPU wastes significant resources and money.

With PerfectScale GPU visibility, you can easily identify such patterns as workloads that don’t fully use their allocated GPU. Once identified, you can implement KAI Scheduler to consolidate multiple jobs onto a single GPU, maximizing usage and reducing costs without sacrificing performance.
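As a rough sketch of what opting a workload into KAI Scheduler can look like (based on the project's documentation; the queue name and image are placeholders, and annotation details may differ across versions): the pod selects the scheduler by name, joins a scheduling queue, and requests a fraction of a GPU so that several jobs can share one device.

```yaml
# Sketch: a pod scheduled by KAI Scheduler requesting half a GPU.
# Two pods like this could be consolidated onto a single device.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-job
  labels:
    kai.scheduler/queue: default-queue   # example queue name
  annotations:
    gpu-fraction: "0.5"   # request half a GPU instead of a whole one
spec:
  schedulerName: kai-scheduler   # opt out of the default scheduler
  containers:
    - name: worker
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
```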
You’ve seen how PerfectScale helps uncover GPU waste and apply smart optimization strategies. Now it’s time to put that insight to work!
Dive deeper into our Documentation Portal, or schedule a technical session with our team for expert assistance.
Not using PerfectScale yet? Start for free today and simplify your K8s optimization journey.
