If you're working with LLMs or production AI workloads and want to leverage Kubernetes effectively, this session is for you. Join us for a deep dive into managing and scaling Generative AI on Kubernetes.
What we'll cover:
How to run AI model inference on Kubernetes in production: from packaging your model to scaling and performance monitoring
Kubernetes, GPUs, and quota management
How Kubernetes itself is evolving to better support LLM workloads (DRA, Gateway Extension, LeaderWorkerSet, Kueue)
The broader ecosystem for managing training and inference workloads (vLLM, Kubeflow, KServe, Llama Stack, llm-d)
This webinar is a practical companion to the book "Generative AI on Kubernetes", authored by our hosts, Roland Huß and Daniele Zonca, which offers hands-on strategies for running and optimizing your infrastructure to support these large-scale workloads.
Who is this webinar for:
DevOps engineers and platform teams looking to support AI/LLM workloads
ML/AI engineers deploying models in production environments
Kubernetes administrators and architects interested in AI scalability
Anyone curious about running or scaling LLMs using modern Kubernetes tools
About the Presenters
Roland Huß
Distinguished Engineer, Red Hat
Roland Huß is a Distinguished Engineer at Red Hat with over 25 years of programming experience. He currently works as the Llama Stack architect within Red Hat OpenShift AI (RHOAI), where he focuses on integrating the Llama Stack to advance AI-driven development workflows. He is also a co-author of Kubernetes Patterns (O’Reilly), sharing his extensive expertise in cloud-native architecture, AI integration, and serverless innovation.
Daniele Zonca
Senior Principal Software Engineer, Red Hat
Daniele Zonca is a Senior Principal Software Engineer at Red Hat and the architect of model serving for the Red Hat OpenShift AI product. He is one of the founders of the TrustyAI project and contributes to many open source projects such as KServe, vLLM, and Kubeflow. Before that, he led the Big Data development team at one of the major European banks, designing and implementing analytical engines.

Anton Weiss
Chief Storyteller, PerfectScale by DoiT
Anton has a storied career in creating engaging and informative content that helps practitioners navigate the complexities of ongoing Kubernetes operations. With previous experience as a CD Unit Leader, Head of DevOps, CTO, and CEO, he has worn many hats as a consultant, instructor, and public speaker. He is passionate about leveraging his expertise to support the needs of the DevOps, Platform Engineering, and Kubernetes communities.
