How SNCF cut K8s waste and increased reliability at scale

Learn how PerfectScale by DoiT helped SNCF cut K8s costs by 30%, while improving the stability and sustainability of their environment.
Brendan Cooper
January 27, 2026

Meet SNCF

SNCF is one of Europe’s largest transportation groups, operating France’s national rail network and global mobility services through brands such as TGV, OUIGO, Eurostar, TER, Transilien and Keolis. With over 270,000 employees and €40B+ in annual revenue, SNCF relies on high-availability digital systems to power ticketing, timetables, onboard services, and real-time operational logistics. Kubernetes underpins many of these services across hundreds of clusters running in mission-critical environments.

As part of a company-wide push toward digital modernization and efficiency, SNCF needed to control the escalating cost of running these clusters without sacrificing resilience. Rightsizing had been attempted manually via traditional observability solutions (e.g., Datadog, Prometheus) and FinOps workshops; however, the approach could not scale across 200+ projects and up to 250 clusters, some of which hosted more than 1,000 workloads. Over-provisioning was widespread, as engineers understandably defaulted to safety in the face of uncertainty, particularly in production environments, where enforcing optimization carried a real risk of service disruption.

“Our challenge wasn’t just to reduce spend. We needed to reduce waste in a way that was safe for production services and sustainable at our scale.” Thomas Comtet, Senior Staff Engineer, SNCF

The Challenge

The platform team was caught between Kubernetes’ FinOps promises (strategic for adoption), developers’ limited risk tolerance (reliability, uptime, etc.), and increasing financial pressure from leadership. Manual rightsizing required tribal knowledge, lengthy meetings, and workload restarts, and every optimization effort had to start over each time ownership changed or engineers rotated. Datadog-based analysis often overestimated usage due to aggregation effects, leading to mistrust in the recommendations. The work was inconsistent, slow, and could never be completed across the full footprint.

SNCF needed a system that would:

  • Provide trustworthy, behavior-based rightsizing intelligence
  • Work safely in production environments
  • Reduce cost without compromising reliability
  • Operate continuously, not episodically
  • Scale across clusters and teams without friction

Following the 2023 Rugby World Cup and the 2024 Olympic Games, both held in France, SNCF was mandated to refocus on cost efficiency in digital operations without slowing modernization of its core rail platforms.

The Solution

Adopting PerfectScale by DoiT as a production-grade optimization control plane

SNCF discovered PerfectScale, now “PerfectScale by DoiT”, at KubeCon and began an engagement with the PerfectScale team. The key differentiator wasn’t another dashboard; it was the ability to generate risk-aware, in-place rightsizing recommendations that could be safely applied in live production environments.

“What convinced us was that PerfectScale did not ask us to trust theory. It showed us exactly what could change without hurting stability.” Thomas Comtet

Moving from recommendations to automation with ArgoCD and CR-based control

PerfectScale by DoiT integrates via Custom Resources and ArgoCD so that Autopilot can be activated at the namespace level as a feature flag. SNCF established a standard Autopilot configuration and deployed it across non-production environments automatically, while allowing fine-grained overrides in edge cases. For production, SNCF introduced automation gradually, starting with the entire cloud-native stack (Datadog, Kyverno, KEDA, AWS Load Balancer Controller, Karpenter), and validated reliability before extending to application namespaces.
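
The namespace-level activation described above can be pictured as a small configuration-resolution step. The following Python sketch is illustrative only: the field names (`enabled`, `cpu_headroom_pct`, `memory_headroom_pct`), the namespace names, the headroom values, and the `effective_config` helper are all hypothetical and do not reflect PerfectScale's actual Custom Resource schema or SNCF's ArgoCD setup. It assumes a standard configuration applied by default in non-production, an opt-in list for production that starts with the cloud-native stack, and per-namespace overrides for edge cases.

```python
from copy import deepcopy

# Hypothetical standard configuration; the field names and values are
# assumptions for illustration, not PerfectScale's real CR fields.
STANDARD_AUTOPILOT = {
    "enabled": True,
    "cpu_headroom_pct": 15,
    "memory_headroom_pct": 20,
}

# Production is opt-in. These names mirror the cloud-native stack components
# listed in the article; treating each as a namespace name is an assumption.
PRODUCTION_OPT_IN = {"datadog", "kyverno", "keda", "karpenter"}

# Hypothetical fine-grained overrides for edge cases.
NAMESPACE_OVERRIDES = {
    "team-a-batch": {"memory_headroom_pct": 40},
    "team-b-legacy": {"enabled": False},
}


def effective_config(namespace: str, environment: str) -> dict:
    """Resolve the Autopilot settings that would apply to one namespace."""
    config = deepcopy(STANDARD_AUTOPILOT)
    if environment == "production":
        # Off by default in production unless explicitly opted in.
        config["enabled"] = namespace in PRODUCTION_OPT_IN
    # Per-namespace overrides are applied last, so they always win.
    config.update(NAMESPACE_OVERRIDES.get(namespace, {}))
    return config


if __name__ == "__main__":
    for ns, env in [
        ("team-a-batch", "non-production"),
        ("team-b-legacy", "non-production"),
        ("karpenter", "production"),
        ("ticketing", "production"),
    ]:
        print(ns, env, effective_config(ns, env))
```

In a GitOps flow, a resolved configuration like this would presumably be rendered into a manifest and synced by ArgoCD, so enabling automation for a namespace becomes a reviewed change rather than a manual action.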

Embedding optimization as governance, not a one-time project

Rather than running a finite rightsizing initiative, SNCF turned optimization into a continuous operating behavior, enforced automatically through governance with PerfectScale by DoiT. Because recommendations are grounded in observed workload behavior and enforced through automation, they are no longer debated but executed as policy.

Anticipating in-place resizing and future gains

The PerfectScale team anticipated that in-place resizing would remove the hidden cost of restart-based optimizations. Without PerfectScale, SNCF would have needed to custom-engineer this capability or train dozens of teams to apply it safely. By standardizing with PerfectScale by DoiT, SNCF avoided the engineering burden and accelerated its roadmap toward safer optimization in production.

The Results

Sustained cost savings while scaling 30% more workloads
Since adoption, SNCF’s Kubernetes usage increased by roughly 30% without increasing cloud cost. In other words, without PerfectScale by DoiT, the bill would have risen significantly to support those workloads. Instead, actual cloud billing in September 2025 was lower than in January 2025, despite the higher volume.

“PerfectScale allowed us to grow capacity without growing cost. We effectively absorbed 30% more usage for free.” Thomas Comtet

Annualized savings are estimated at ~€500K per year, with the majority (~€350K) coming from non-production environments via automation. In production, SNCF chose to maintain conservative headroom policies, so automation there primarily benefits resilience and optimizes for cost only when it is safe to do so.
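
The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes cloud cost stayed exactly flat while usage grew 30%, which is conservative given that the September 2025 bill was reportedly lower than January's. It derives the implied change in cost per unit of usage and the non-production share of the estimated savings. The function and variable names are illustrative, not part of any PerfectScale tooling.

```python
def unit_cost_change(usage_growth: float, cost_growth: float = 0.0) -> float:
    """Relative change in cost per unit of usage.

    usage_growth and cost_growth are fractions (0.30 means +30%).
    """
    return (1.0 + cost_growth) / (1.0 + usage_growth) - 1.0


# 30% more usage at flat cost (flat cost is a simplifying assumption).
change = unit_cost_change(0.30)

# Annualized savings figures quoted in the article, in euros (approximate).
total_savings = 500_000
non_production_savings = 350_000

non_production_share = non_production_savings / total_savings
remainder = total_savings - non_production_savings

print(f"Cost per unit of usage: {change * 100:.1f}%")          # -23.1%
print(f"Non-production share of savings: {non_production_share:.0%}")  # 70%
print(f"Remainder of savings: ~EUR {remainder:,}")              # ~EUR 150,000
```

Under that assumption, each unit of usage costs roughly 23% less than before, and about 70% of the estimated savings come from non-production environments.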

Automation adoption across critical estate
SNCF currently activates automation on 45% of non-production namespaces, 1% of production namespaces, and 100% of the cloud-native stack in both environments. This means the most critical shared infrastructure across clusters is governed automatically.

“The real impact is cultural. Engineers stopped guessing. Optimization is no longer a negotiation, it’s a governed, automated behavior embedded into how teams work.” Thomas Comtet

Governance and stability, not just savings
Cost reduction alone would not have been acceptable if it degraded operations. Instead, cluster stability improved as PerfectScale by DoiT redistributed resources based on demand curves and failure risk, thereby reducing the probability of CPU starvation while eliminating excess capacity.

“The savings were real, but the stability gain is what got internal teams to trust it. We could optimize without fear.” Thomas Comtet

What's Next?

SNCF plans to continue expanding automation across additional non-production namespaces and gradually into selected production environments for early adopters. The team is also evaluating in-place pod rightsizing to further minimize restart-based disruptions and improve workload stability.

Beyond its own adoption roadmap, SNCF has become an active contributor to PerfectScale’s product evolution, regularly sharing feature requests, many of which have already been implemented. Recent enhancements, such as Java workload support and in-place pod rightsizing, were directly influenced by SNCF’s feedback and quickly adopted by their engineering teams.

This ongoing collaboration highlights not only the strength of the solution itself but also the responsiveness and partnership-driven approach of the PerfectScale team. “We proved it in production,” said Thomas Comtet. “Now we’re scaling what works and helping make it even better.”

“PerfectScale by DoiT gave us what dashboards and meetings never could: a safe, automated way to reduce Kubernetes waste without risking uptime. It enables us to scale 30% more projects without increasing spend, while making optimization a built-in behavior rather than a manual effort. We now treat optimization as governance, not guesswork.” Thomas Comtet, Senior Staff Engineer, SNCF

Reduce your cloud bill and improve application performance today

Install in minutes and instantly receive actionable intelligence.