Kubernetes v1.34, codenamed “Of Wind & Will (O’ WaW),” introduces 58 enhancements: 23 graduating to stable, 22 entering beta, and 13 debuting in alpha. The name “Of Wind & Will” captures the spirit of this release: navigating the unpredictable “winds” of tooling and legacy constraints, guided forward by the community’s unwavering “will.”
These are the features we are most excited about in Kubernetes v1.34, including the DRA enhancements and pod-level resource requests and limits. These enhancements significantly improve Kubernetes' capabilities in performance optimization, efficient scaling, and effective resource management.
Let's discuss the major enhancements in Kubernetes v1.34:
Kubernetes v1.34 Stable Features
1. Delayed Creation of Job’s Replacement Pods
Feature Group: SIG Apps | KEP: #3939
In Kubernetes v1.34, the Job controller's setting that gives you control over when replacement Pods are created graduates to stable. Previously, the controller would launch a replacement Pod as soon as it noticed another Pod beginning to terminate. This led to two Pods running side by side for a short period, which could strain cluster resources, delay scheduling, or even trigger unwanted autoscaler scale-ups. For certain workloads, such as machine learning jobs with TensorFlow or JAX, this behavior was especially problematic because they require exactly one Pod per index at a time.
With the new .spec.podReplacementPolicy field, you can now choose to delay replacement until the original Pod has fully terminated. Setting it to Failed ensures the Job waits until the Pod reaches status.phase: Failed before creating a new one. This change helps avoid resource contention, reduces unnecessary scaling events, and ensures frameworks that depend on strict Pod sequencing run smoothly.
apiVersion: batch/v1
kind: Job
metadata:
  name: safe-replacement-job
spec:
  parallelism: 1
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
      - name: worker
        image: your-image
      restartPolicy: OnFailure
Here, the .spec.podReplacementPolicy: Failed line ensures Kubernetes waits for the current Pod to exit completely before launching a replacement, avoiding overlap.
2. Recovery from Volume Expansion Failure
Feature Group: SIG Storage | KEP: #1790
In Kubernetes v1.34, storage management becomes more resilient with the graduation of recovery from volume expansion failure to stable. Previously, when you tried to expand a PersistentVolumeClaim (PVC) to a larger size that wasn’t supported by the underlying storage system, the operation would simply fail and leave the volume stuck in a bad state. The only way forward was manual cleanup or recreating the volume, both time-consuming and risky for running workloads.
Now, Kubernetes lets you cancel failed expansion attempts and retry with a smaller size that the storage provider actually supports. This makes volume scaling far more flexible and reduces the risk of workloads being blocked by a single unsupported resize request. For example, if you attempted to expand a PVC from 10Gi to 500Gi and the storage backend doesn’t allow such a large jump, you can cancel and retry with 100Gi instead without recreating the volume.
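As a minimal sketch of the recovery flow (the PVC name and sizes here are illustrative), you can cancel the failed expansion by lowering the pending request back to a supported size:
# The expansion to 500Gi failed; retry with 100Gi instead. The new value
# must still be at least the PVC's last successfully allocated size.
kubectl patch pvc data-pvc --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'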
3. More Efficient Requeueing Through Plugin-Specific Callbacks
Feature Group: SIG Scheduling | KEP: #4247
Kubernetes v1.34 brings a major efficiency boost to how the scheduler handles Pods that cannot be placed on a node right away. In earlier versions, the scheduler would reattempt scheduling after a fixed backoff delay or when certain broad cluster events occurred. This often led to unnecessary retries where nothing had changed to make the Pod schedulable, wasting both CPU cycles and time.
With this update, each scheduling plugin can now register callback functions that guide the scheduler on when it’s actually worth retrying. For example, if a Pod was unschedulable because of missing GPU resources, the GPU plugin can signal the scheduler to only retry once a node with available GPUs appears, instead of blindly retrying on every event.
This targeted requeueing not only reduces wasted work but also improves scheduling throughput, especially in large clusters or environments using dynamic resource allocation. Another advantage is that certain plugins can now safely skip the usual backoff period, allowing Pods to be scheduled much faster when the right conditions are met.
By letting plugins decide when to trigger retries, Kubernetes achieves a smarter and more responsive scheduling process, resulting in better performance and faster Pod placement without unnecessary overhead.
4. Streaming List Responses
Feature Group: SIG API Machinery | KEP: #5116
Previously, when clients requested large resource lists such as thousands of Pods or Custom Resources, the API server had to build the entire response in a single memory buffer before sending it out. This approach created heavy memory pressure, slowed down response times, and could even affect the reliability of the whole cluster.
With the new streaming encoding, responses are sent incrementally instead of being built in one giant chunk. This change applies automatically to both JSON and Kubernetes Protobuf response formats, and the feature is now stable. The result is a much smaller and more predictable memory footprint on the API server, which in turn improves cluster performance, especially in large-scale environments where such large list requests are frequent.
By moving to streaming list responses, Kubernetes avoids costly memory spikes, reduces the chance of API server slowdowns, and ensures that even the largest workloads can fetch data efficiently without hurting cluster stability.
5. Resilient Watch Cache Initialization
Feature Group: SIG API Machinery & SIG Scalability | KEP: #4568
The watch cache is a crucial layer that maintains an eventually consistent view of cluster state stored in etcd, allowing controllers and clients to watch resources without overwhelming etcd directly.
Previously, if the watch cache wasn’t fully initialized during API server startup, or if it had to be re-initialized later due to errors, it could cause problems. Controllers and clients that relied on watches might fail to connect properly, leading to temporary instability in the control plane.
In Kubernetes v1.34, the initialization process has been hardened to better handle failures. The cache now recovers more gracefully, ensuring that even if there are startup hiccups or re-initialization events, the system can still establish and maintain watches reliably.
By stabilizing watch cache initialization, Kubernetes improves the overall robustness of the control plane, making clusters more resilient and ensuring workloads that depend on watches continue to function smoothly.
6. Relaxing DNS Search Path Validation
Feature Group: SIG Network | KEP: #4427
Previously, Kubernetes applied strict checks to the DNS search path defined in a Pod’s dnsConfig. While this ensured consistency, it created integration issues in complex or legacy networking environments. Administrators trying to mix internal Kubernetes DNS with external domains were forced to use awkward workarounds just to make workloads resolve names correctly.
A common example is when Pods need to reach both internal cluster services and external domains. In the old behavior, the system resolver would append the cluster’s internal search domains to every query, even for external hostnames. This not only generated unnecessary DNS lookups against the cluster DNS server but could also cause resolution errors.
With the new relaxed validation, administrators can configure a Pod’s DNS search path more freely. For example, setting a single dot (.) as the first entry in .spec.dnsConfig.searches tells the resolver not to append cluster search domains to external queries. This improves efficiency by reducing unnecessary DNS traffic and makes it easier to support hybrid networking setups without bending the rules.
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  containers:
  - name: app
    image: your-image
  dnsConfig:
    searches:
    - "."
    - svc.cluster.local
In this example, the leading . ensures external lookups are sent directly to the correct DNS servers, while still keeping cluster service resolution intact.
7. Support for Direct Server Return (DSR) in Windows kube-proxy
Feature Group: SIG Windows | KEP: #5100
In Kubernetes v1.34, Windows kube-proxy now fully supports Direct Server Return (DSR). Previously, when traffic came through a load balancer, both the request and the response passed through it. This created extra hops, added latency, and put unnecessary load on the balancer.
With DSR enabled, only the request enters through the load balancer. The response, however, skips the balancer and goes directly back to the client. This not only cuts down network latency but also reduces the processing overhead on the load balancer, making the entire path more efficient.
Although DSR support was first introduced for Windows back in Kubernetes v1.14, it was still considered an experimental feature. With v1.34, it has now reached stable status, meaning it is production-ready and officially recommended for Windows clusters.
For Windows operators running Kubernetes Services with heavy throughput, enabling DSR can result in smoother performance, faster response times, and better overall resource efficiency.
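As a hedged sketch of how this is switched on (field names follow the kube-proxy KubeProxyConfiguration API; the network name is cluster-specific and illustrative), a Windows node's kube-proxy configuration might look like this:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: kernelspace
winkernel:
  # Send responses directly back to clients, bypassing the load balancer
  enableDSR: true
  networkName: my-hns-network # illustrative HNS network name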
8. Sleep Action for Container Lifecycle Hooks
Feature Group: SIG Node | KEP: #3960, #4818
Kubernetes v1.34 introduces the Sleep action for container lifecycle hooks as a stable feature. Lifecycle hooks like PostStart (runs after a container starts) and PreStop (runs before a container shuts down) have always been available, but until now, developers had to write custom scripts or add sidecar logic just to pause for a few seconds during startup or shutdown.
With the new Sleep action, you can simply tell Kubernetes to pause the container for a specific duration without needing extra code. For example, you might want a container to wait a few seconds before receiving traffic so that all dependencies are ready, or to hold during shutdown so that ongoing requests can finish cleanly. If you specify a duration of zero, the action returns immediately and acts as a no-op.
This capability first appeared in v1.29 and was refined in v1.32 with support for zero values. Now in v1.34, it has officially reached stable status, meaning you can rely on it for production workloads. It’s a small but powerful improvement for ensuring graceful container lifecycle management.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sleep-hooks
spec:
  containers:
  - name: web
    image: your-image
    lifecycle:
      preStop:
        sleep:
          seconds: 10 # pause for 10s before shutting down
In this example, the container will delay shutdown by 10 seconds before the termination process completes.
9. Linux Node Swap Support
Feature Group: SIG Node | KEP: #2400
For a long time, Kubernetes did not allow nodes to use swap space. This meant that when a node ran out of memory, processes inside Pods could get killed suddenly, resulting in crashes and instability. Applications with big memory footprints where parts of the memory were rarely used were especially affected, since they couldn’t benefit from swap to keep running smoothly.
To fix this, Kubernetes introduced configurable swap support back in v1.22, and after a long journey through alpha and beta, it’s finally stable in v1.34. The main mode you’ll use is called LimitedSwap. In this mode, Pods can use swap, but only within their memory limits. That means swap won’t let a Pod use unlimited memory; it just gives the node more breathing room by moving less-used pages to disk instead of killing processes.
By default, the kubelet still runs in NoSwap mode, so unless you change the configuration, Pods won’t touch swap. But if you enable LimitedSwap, you can make workloads more resilient and efficient, especially in clusters with tight memory resources or in edge environments where every bit of hardware counts.
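As a minimal sketch (assuming the node already has swap provisioned), enabling LimitedSwap through the kubelet configuration file looks like this:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Let the kubelet start on a node that has swap enabled
failSwapOn: false
memorySwap:
  # Pods may use swap, but only within their memory limits
  swapBehavior: LimitedSwap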
10. Dynamic Resource Allocation (DRA) Reaches GA
Feature Group: SIG Node, WG Device Management | KEP: #4381
In Kubernetes v1.34, Dynamic Resource Allocation (DRA) has graduated to General Availability (GA). This marks a big milestone for workloads that rely on GPUs, TPUs, SmartNICs, or other specialized hardware.
Until recently, Kubernetes scheduling for these devices was rigid and often vendor-specific. DRA changes that by introducing a standardized way to request, allocate, and configure hardware through structured parameters similar to how dynamic provisioning works for PersistentVolumes. This makes device handling more flexible, portable, and future-proof.
DRA introduces new APIs under resource.k8s.io/v1, now stable and enabled by default. These include:
- ResourceClaim: lets Pods declare their need for a device.
- DeviceClass: defines how a type of device can be allocated.
- ResourceClaimTemplate: simplifies claiming devices across workloads.
- ResourceSlice: represents the actual capacity available for allocation.
Pods now include a new resourceClaims field in their .spec, which links them to device claims. This gives developers fine-grained control over how resources are bound, shared, or pre-allocated before Pods start running.
For teams running AI/ML training, high-performance computing, or network-intensive workloads, GA support means they can confidently rely on DRA for production.
# DeviceClass that describes how a GPU should be allocated
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: nvidia-gpu
spec:
  selectors:
  - cel:
      expression: device.driver == "nvidia"
---
# ResourceClaimTemplate for workloads needing a GPU
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: nvidia-gpu
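To round out the picture, here is a sketch of the Pod side (names are illustrative): the Pod declares the claim under .spec.resourceClaims, and the container states which claim it consumes:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: your-image
    resources:
      claims:
      - name: gpu # consume the claim declared below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu-claim-template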
Kubernetes v1.34 Beta Features
11. Pod-level Resource Requests and Limits
Feature Group: SIG Scheduling, SIG Autoscaling | KEP: #2837
In Kubernetes v1.34, you can now define resource requests and limits at the Pod level, not just for individual containers. Until now, developers had to set CPU and memory values separately for each container in a Pod. This meant either over-provisioning every container (to be safe) or splitting resources very carefully across them, which was time-consuming and inefficient.
With this new capability, you can simply declare a total “budget” of CPU and memory for the entire Pod. The containers inside that Pod then share from this pool, as long as the combined usage stays within the Pod’s defined limit. This makes resource planning much simpler, especially for multi-container Pods that work closely together, like sidecar patterns.
The change also improves scheduling and autoscaling behavior. Since Kubernetes now has a clearer picture of a Pod’s overall resource needs, the scheduler can make better placement decisions, and the Horizontal Pod Autoscaler (HPA) can react more accurately.
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-resources-example
spec:
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  containers:
  - name: service-a
    image: nginx
    resources:
      requests:
        cpu: "0.5"
        memory: "1Gi"
  - name: sidecar-b
    image: busybox
    # No container-level resources here - uses what's left in the Pod budget
In this example, the Pod as a whole requests 2 CPUs and 4Gi of memory, and its total usage is capped at 4 CPUs and 8Gi. service-a explicitly reserves 0.5 CPU and 1Gi of memory, while sidecar-b, having no individual resources declared, shares the remaining budget.
Pod-level resources are beta in v1.34, and the PodLevelResources feature gate is enabled by default.
12. Prioritized Alternatives for Device Allocation
Feature Group: WG Device Management | KEP: #4816
In Kubernetes v1.34, Dynamic Resource Allocation (DRA) introduces a new way to express flexible hardware preferences through prioritized alternatives. Many workloads can run on different types of devices; for example, a training job might prefer a single high-end GPU but could also run on a pair of mid-range GPUs if the preferred hardware isn’t available.
With the new firstAvailable field in ResourceClaims and ResourceClaimTemplates, users can now define an ordered list of acceptable device configurations. The scheduler tries the list in order and allocates the first alternative that can be satisfied by what’s available in the cluster. If none of the listed options can be satisfied, the workload may even proceed with no device allocation, if that’s explicitly allowed.
This feature gives developers and operators more control over balancing performance and availability without manually reworking manifests or writing complex scheduling logic. By encoding fallback strategies directly in resource claims, Kubernetes ensures workloads make optimal use of heterogeneous hardware while maintaining scheduling efficiency.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: prioritized-gpu-claim
spec:
  spec:
    devices:
      requests:
      - name: gpu-choice
        firstAvailable:
        # First preference: 1 high-end GPU
        - name: high-end
          deviceClassName: nvidia-a100
          count: 1
        # Second preference: 2 mid-range GPUs
        - name: mid-range
          deviceClassName: nvidia-v100
          count: 2
        # Third preference: no GPU at all (still let the Pod run)
        - name: none
          deviceClassName: ""
          count: 0
In this example, the workload first asks for one NVIDIA A100 GPU. If that’s not available, it will fall back to two NVIDIA V100 GPUs. If neither option exists, the scheduler will still let the Pod run without any GPU.
13. Kubelet Now Reports Allocated DRA Resources
Feature Group: WG Device Management | KEP: #3695
In Kubernetes v1.34, the kubelet has been enhanced to report resources allocated through the Dynamic Resource Allocation (DRA) framework. Until now, kubelet only exposed CPU, memory, and standard device resources through its PodResources API. With this change, node agents and monitoring tools can also see which DRA resources (like GPUs, FPGAs, or network interfaces) have been assigned to Pods on a node.
This is a big step for better observability and integration. For example, if a Pod has requested a special accelerator through a DRA driver, the kubelet will now make that visible through its API. Monitoring agents, schedulers, and other node-level components can query the PodResources API and act on this information whether that’s for debugging, fine-grained scheduling, or capacity planning.
Starting with v1.34, this feature is enabled by default, so you don’t need to flip any feature gates.
14. Non-Blocking API Calls in the Scheduler
Feature Group: SIG Scheduling | KEP: #5229
In Kubernetes v1.34, the scheduler takes a big step forward in handling API operations more efficiently. Until now, when the scheduler needed to make an API call such as checking node status or updating bindings, it would block the scheduling cycle until the API server responded. This meant if the API server was slow or overloaded, the scheduler could get stuck, causing delays in scheduling Pods across the cluster.
With the new non-blocking API mechanism, these operations are handled asynchronously. Instead of waiting, the scheduler puts requests into a prioritized queue where duplicates are removed and progress can continue in parallel. This allows the scheduler to keep working on other Pods while the API calls finish in the background.
The benefits are clear:
- Lower scheduling latency, because the scheduler doesn’t stall during API delays.
- No thread starvation, so scheduler workers remain free to handle new Pods.
- Faster retries for unschedulable Pods, since they can be re-evaluated immediately without waiting for earlier API calls to finish.
This design keeps backward compatibility, so existing clusters don’t need changes to benefit. To help operators, new metrics have been added to track pending API operations, making it easier to spot issues if the queue starts growing.
15. In-place Pod Resize Improvements
Feature Group: SIG Node, SIG Autoscaling | KEP: #1287
In Kubernetes v1.34, in-place Pod resizing takes another big step forward. Originally introduced as an alpha feature, it reached beta and became enabled by default in v1.33. Now in v1.34, it gets smarter with two important upgrades: support for decreasing memory allocations and integration with Pod-level resource settings.
Before this, you could only request more CPU or memory for a running container without deleting and recreating the Pod. That was already useful for stateful apps, databases, or long-running ML workloads where restarts were disruptive. However, memory could only grow, never shrink. This meant if an app no longer needed the extra memory, the node couldn’t reclaim it. With v1.34, Kubernetes now supports scaling memory both up and down, letting clusters make better use of available resources.
The second improvement is tighter integration with Pod-level resources. Instead of just managing requests and limits on individual containers, Kubernetes now coordinates these updates at the Pod level. This ensures smoother scheduling, fairer resource allocation, and better alignment with autoscaling decisions.
The feature remains in beta in v1.34, so it’s not yet fully GA, but it’s mature enough to use in most clusters. If you’re running workloads that change their resource usage over time, this update gives you more flexibility without the downtime of Pod restarts.
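As a hedged example (the Pod and container names are illustrative, and a recent kubectl version is assumed), a running container can be resized through the Pod's resize subresource:
# Shrink the trainer container's memory in place, without restarting the Pod
kubectl patch pod ml-worker --subresource resize --patch \
  '{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"memory":"2Gi"},"limits":{"memory":"4Gi"}}}]}}'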
16. Graceful Node Shutdown for Windows Nodes
Feature Group: SIG Windows | KEP: #4802
In Kubernetes v1.34, Windows nodes finally gain the same graceful shutdown handling that Linux nodes have had for some time. This means when a Windows machine is shut down or restarted, whether for maintenance, patching, or a scheduled reboot, the kubelet can detect the shutdown signal and begin safely terminating Pods instead of abruptly killing them.
The kubelet now hooks into Windows’ pre-shutdown notifications. As soon as the system signals it’s about to power down, the kubelet starts the usual Pod termination flow: it runs any lifecycle hooks you’ve defined, applies the configured terminationGracePeriodSeconds, and gives workloads the chance to exit cleanly. This avoids data corruption, stuck workloads, or failed restarts that happened before when Pods were simply cut off.
This improvement is enabled by default starting with Kubernetes v1.34 (beta feature). It’s especially valuable in production environments where Windows nodes are part of the cluster, as it makes updates, restarts, and maintenance far more reliable. Now, just like Linux nodes, your Windows workloads get a smoother shutdown process with respect for Pod lifecycles.
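The grace periods come from the same kubelet configuration fields used on Linux. A minimal sketch (the values are illustrative):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time reserved for terminating Pods during a node shutdown
shutdownGracePeriod: 60s
# Portion of that time reserved for critical Pods
shutdownGracePeriodCriticalPods: 10s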
Kubernetes v1.34 Alpha Features
17. Expressing Pod Placement with .status.nominatedNodeName
Feature Group: SIG Scheduling | KEP: #5278
In Kubernetes v1.34, the scheduler has become smarter about how it signals Pod placement. Until now, the .status.nominatedNodeName field was mainly used when preemption was happening; it told you which Node the Pod was expected to land on after higher-priority Pods were cleared.
The problem was that while the scheduler was still working to finalize a Pod’s binding, autoscalers and other external tools had no clue that this Node was already “spoken for.” As a result, an autoscaler might wrongly assume the Node was underutilized and delete it, even though the scheduler was about to place a Pod there.
To fix this, Kubernetes now lets the scheduler use .status.nominatedNodeName more broadly, not just for preemption. With the NominatedNodeNameForExpectation feature gate enabled, the field also shows scheduler intent for normal Pod placements. This means external components like cluster autoscalers can see these “internal reservations” and make better decisions about scaling or Node lifecycle management.
This change reduces wasted cycles, avoids Pods being disrupted unnecessarily, and makes scheduling more predictable across the whole cluster.
Example: when a Pod is pending and the scheduler has nominated a node, you can see the nominatedNodeName field in its status:
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
...
status:
  nominatedNodeName: ip-10-0-123-45.ec2.internal
Here, the nominatedNodeName indicates the scheduler’s current intent to place the Pod on that specific Node.
18. Support for KYAML, a Kubernetes Dialect of YAML
Feature Group: SIG CLI | KEP: #5295
In Kubernetes v1.34, a new alpha feature introduces KYAML, a safer and less ambiguous dialect of YAML designed specifically for Kubernetes resources. YAML has long been the default format for Kubernetes manifests, but it comes with pitfalls: whitespace-sensitive formatting, implicit type conversions, and inconsistent quoting rules that can confuse even experienced users. JSON avoids some of these issues but lacks comments and requires strict syntax, which often makes it less user-friendly.
KYAML strikes a balance by removing YAML’s ambiguous behaviors while still keeping it compatible with existing tools. Any KYAML file is also valid YAML, so you don’t need to rewrite your manifests to start using it. Beginning in v1.34, kubectl can emit output in KYAML format when you enable it by setting the environment variable KUBECTL_KYAML=true. You can then request output with kubectl get -o kyaml …, just like you do today with -o yaml or -o json. This allows developers to take advantage of KYAML’s strictness while keeping full compatibility with the YAML-based workflows they already rely on.
This change is particularly valuable for teams managing large sets of manifests or working in regulated environments, where predictability and avoiding hidden parsing issues is critical. While still alpha, KYAML lays the foundation for safer Kubernetes configuration management in the future.
# Example: using KYAML output with kubectl v1.34
export KUBECTL_KYAML=true
kubectl get pods -o kyaml
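To give a feel for the dialect, here is roughly what a simple Pod looks like when rendered as KYAML (illustrative output; the exact formatting comes from kubectl):
{
  apiVersion: "v1",
  kind: "Pod",
  metadata: {
    name: "example",
  },
  spec: {
    containers: [{
      name: "app",
      image: "nginx",
    }],
  },
}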
19. Dynamic Resource Allocation (DRA) Enhancements
Feature Group: WG Device Management | KEPs: #4680, #5004, #5075, #5007
In Kubernetes v1.34, several new alpha features extend Dynamic Resource Allocation (DRA), making it more powerful and flexible for clusters with advanced hardware.
Resource health status for DRA (#4680): Troubleshooting Pods that crash due to faulty GPUs, NICs, or other devices has always been tricky. With this feature, the kubelet can now expose the health status of allocated devices directly in the Pod’s status. This improves observability by letting you quickly see if a Pod is failing because its underlying device is unhealthy. To use it, enable the ResourceHealthStatus feature gate and run a DRA driver that implements the DRAResourceHealth gRPC service.
Extended resource mapping (#5004): Many workloads already use extended resources (.spec.resources.limits) for GPUs and other hardware. This feature makes DRA resources show up the same way, providing a simple compatibility layer. That means application developers can adopt DRA without changing their manifests, while cluster administrators can advertise DRA-managed resources seamlessly.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: nvidia-gpus
spec:
  # Use a CEL selector to match devices with a specific driver + type attribute
  selectors:
  - cel:
      expression: device.driver == "nvidia.com" && device.attributes["nvidia.com/model"] == "gpu"
  # Expose this DeviceClass as an extended resource name
  extendedResourceName: hardware.example.com/gpu
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
  - name: cuda-job
    image: nvidia/cuda:12.2.0-runtime-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      requests:
        hardware.example.com/gpu: 2
      limits:
        hardware.example.com/gpu: 2
DRA consumable capacity (#5075): Previously, device drivers could only expose whole devices or coarse “slices.” With the new DRAConsumableCapacity feature gate, drivers can define and share fine-grained portions of a device across multiple Pods, namespaces, or claims. This unlocks new use cases such as bandwidth-aware networking or multi-tenant GPU sharing, while ensuring the scheduler respects capacity limits.
Device binding conditions (#5007): To make scheduling more reliable, the scheduler now delays binding a Pod to a Node until it confirms that all external devices (like FPGAs or attachable NICs) are ready. This check happens in the PreBind phase, reducing the risk of Pods failing at startup due to missing hardware dependencies.
20. Container Restart Rules
Feature Group: SIG Node | KEP: #5307
In Kubernetes v1.34, Pods get more flexibility in how individual containers are restarted when they exit. Up until now, every container inside a Pod had to follow the same .spec.restartPolicy. This was fine for simple Pods but limiting for more complex setups. For example, if you’re using init containers for setup, you usually don’t want Kubernetes to keep retrying them endlessly when they fail. On the other hand, in workloads like machine learning training, you might want a failed container to restart quickly without forcing the entire Pod to be recreated; otherwise, you risk losing training progress.
The new ContainerRestartRules feature gate changes this. When enabled, you can now set a restartPolicy per container instead of only at the Pod level. Kubernetes also introduces a new field called restartPolicyRules, which lets you define restart behavior based on exit codes. This means you could, for example, restart a container only if it failed with certain retriable errors, while treating other failures as final.
This fine-grained control helps make better use of resources and improves reliability in scenarios where different containers in the same Pod have very different roles or lifecycles.
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
  labels:
    app: trainer
spec:
  # This is the fallback policy if a container doesn't specify its own
  restartPolicy: Never
  containers:
  - name: trainer
    image: myrepo/ml-trainer:latest
    command: ["python", "train.py"]
    # Per-container restart control: rules are checked first, and the
    # container-level policy applies when no rule matches
    restartPolicy: Never
    restartPolicyRules:
    # Retry only on exit code 42 (example: a transient, retriable error);
    # any other failure is final
    - action: Restart
      exitCodes:
        operator: In
        values: [42]
  - name: metrics-sidecar
    image: <img>
    args: ["--scrape-interval=10s"]
    restartPolicy: Always
  initContainers:
  - name: db-migrator
    image: <img>
    command: ["sh", "-c", "run-migrations.sh"]
    # No need to retry endlessly if migration fails
    restartPolicy: Never
This shows how different containers in the same Pod can have their own restart behaviors: retrying ML training only on a transient, retriable exit code, always keeping the metrics sidecar alive, and avoiding endless retries for a failing init container.
21. Load Environment Variables from Files Created at Runtime
Feature Group: SIG Node | KEP: #5307
In Kubernetes v1.34, you now have the option to load environment variables from files that are created at runtime, thanks to the new EnvFiles feature (behind a feature gate). Until now, environment variables were limited to values set directly in Pod specs, ConfigMaps, or Secrets. That worked fine for static values, but it was restrictive when you needed values that are only known at runtime.
With EnvFiles, you can let one container (often an init container) generate the required variable and write it to a file, while another container in the same Pod can automatically load that file as its environment variables. This means you don’t have to wrap your application’s entry point with a custom script just to inject runtime values.
This is especially useful for AI/ML training jobs, where each Pod needs initialization with values that aren’t known until the job actually starts. Instead of hardcoding or hacking around it, you can now declare the variable through a generated file and have Kubernetes handle the injection cleanly.
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-job
spec:
  # Init container generates runtime config
  initContainers:
  - name: generate-runtime-vars
    image: busybox:1.36
    command: ["sh", "-c", "echo 'EPOCHS=25' > /config/runtime.env && echo 'LR=0.001' >> /config/runtime.env"]
    volumeMounts:
    - name: config-volume
      mountPath: /config
  # Main container consumes keys from the generated file as env vars
  containers:
  - name: trainer
    image: python:3.12-slim
    command: ["python", "-c"]
    args:
    - |
      import os
      print("Training with", os.environ["EPOCHS"], "epochs at learning rate", os.environ["LR"])
    env:
    - name: EPOCHS
      valueFrom:
        fileKeyRef: # requires the EnvFiles feature gate
          volumeName: config-volume
          path: runtime.env
          key: EPOCHS
    - name: LR
      valueFrom:
        fileKeyRef:
          volumeName: config-volume
          path: runtime.env
          key: LR
  volumes:
  - name: config-volume
    emptyDir: {}
The init container generates /config/runtime.env in an emptyDir volume, and the main container references individual keys from that file through fileKeyRef, no wrapper scripts needed.
Kubernetes v1.34 brings a total of 58 Kubernetes Enhancement Proposals (KEPs). These enhancements improve Kubernetes functionality, flexibility, resource management, observability, and more.
Beyond the major changes we've discussed, there are many other features added by the Kubernetes team. We encourage you to have a look at the official Kubernetes v1.34 release notes for more details.
