If you're working with LLMs or production AI workloads and want to leverage Kubernetes effectively, this session is for you. Join us for a deep dive into managing and scaling Generative AI on Kubernetes.
What we'll cover:
How to run AI model inference on Kubernetes in production: from packaging your model to scaling and performance monitoring
Kubernetes, GPUs, and quota management
How Kubernetes itself is evolving to better support LLM workloads (DRA, Gateway Extension, LeaderWorkerSet, Kueue)
The broader ecosystem for managing training and inference workloads (vLLM, Kubeflow, KServe, Llama Stack, llm-d)
This webinar is a practical companion to the book "Generative AI on Kubernetes", authored by our hosts, Roland Huß and Daniele Zonca, which offers hands-on strategies for running and optimizing your infrastructure to support these large-scale workloads.
Who is this webinar for:
DevOps engineers and platform teams looking to support AI/LLM workloads
ML/AI engineers deploying models in production environments
Kubernetes administrators and architects interested in AI scalability
Anyone curious about running or scaling LLMs using modern Kubernetes tools
About the Presenters
Roland Huß
Distinguished Engineer, Red Hat
Roland Huß is a Distinguished Engineer at Red Hat with over 25 years of programming experience. He currently works as the Llama Stack architect within Red Hat OpenShift AI (RHOAI), where he focuses on integrating the Llama Stack to advance AI-driven development workflows. He is also a co-author of Kubernetes Patterns (O’Reilly), sharing his extensive expertise in cloud-native architecture, AI integration, and serverless innovation.
Daniele Zonca
Senior Principal Software Engineer, Red Hat
Daniele Zonca is a Senior Principal Software Engineer at Red Hat and the architect of model serving for the Red Hat OpenShift AI product. He is one of the founders of the TrustyAI project and contributes to many open source projects such as KServe, vLLM, and Kubeflow. Before that, he led the Big Data development team at one of the major European banks, designing and implementing analytical engines.

Anton Weiss
Chief Storyteller, PerfectScale by DoiT
Anton has a storied career in creating engaging and informative content that helps practitioners navigate the complexities of ongoing Kubernetes operations. With previous experience as a CD Unit Leader, Head of DevOps, CTO, and CEO, he has worn many hats as a consultant, instructor, and public speaker. He is passionate about leveraging his expertise to support the needs of the DevOps, Platform Engineering, and Kubernetes communities.
