TL;DR: To start, this is not going to be one of those traditional “Containers vs VMs” conversations. This series is more of a first-hand account of how application infrastructure evolved, told from the operations team’s point of view, from physical servers all the way to the rise of Kubernetes.
The rise in popularity of Kubernetes is a tale of overcoming the operational complexities of scaling application infrastructure to support the growing demand for applications and services.
I like to think of it as a story of abstraction, in which we have added flexibility and scalability by subtracting dependencies that slowed operations. We still have not removed all the complexities. Hell, you could easily argue things got more complex during this evolution, but this progression has driven results that have changed the way technology impacts the world we live in today.
Let’s dive deeper into what this means by walking through my own account of moving from manually configuring servers to managing DevOps operations at scale.
The architecture of a physical server was pretty straightforward. You had a server, and within that server you had an OS that was running your application’s services.
Physical servers came with a lot of operational burdens which were painful and tedious to deal with.
Every new server needed to be manually connected to power and the network. Then you needed to manually install the OS, networking, monitoring, firewall, basic libraries, and security patches. It took a lot of effort. Deploying an application on physical servers also required you to manually install, connect, and configure each machine.
Physical servers also did not scale on demand. Everything described above was repeated as you scaled. Every new server required a complete setup and deployment, which could easily take days, if not weeks or even months when you needed to order the hardware. You always needed reserve servers stashed away in case you had urgent production needs.
Physical servers were also not efficient from a capacity standpoint. Say you purchased a server for a mission-critical DB. You would purchase more capacity than you needed just to allocate some room for growth. This led to wasted, unused resources. Then, as your operation grew, this server soon wouldn’t be big enough for your DB, so you purchased a bigger one, then a bigger one, then a bigger one (an endless, painful cycle).
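To put rough numbers on that cycle, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `months_until_full`, the 1,000 GB server, the 250 GB starting dataset, and the 10% monthly growth rate are all hypothetical and not taken from any real deployment.

```python
def months_until_full(capacity_gb, initial_gb, monthly_growth):
    """Count whole months until the data no longer fits on a fixed-size server.

    Hypothetical helper: assumes steady compound growth, which real
    workloads rarely follow exactly.
    """
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    used = initial_gb
    months = 0
    while used <= capacity_gb:
        used *= 1 + monthly_growth
        months += 1
    return months


# Hypothetical numbers: a 1,000 GB server bought for a 250 GB database.
capacity_gb = 1000
initial_gb = 250

# Share of the purchased capacity that sits unused on day one.
idle_on_day_one = 1 - initial_gb / capacity_gb

# Months until the database outgrows the server at 10% growth per month.
lifetime_months = months_until_full(capacity_gb, initial_gb, 0.10)

print(f"Idle on day one: {idle_on_day_one:.0%}")
print(f"Outgrown after {lifetime_months} months")
```

Under these made-up numbers, three quarters of the server sits idle on day one, and yet the same server is too small a little over a year later. That is the trade-off described above: you pay for headroom up front and still end up buying the next box.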
Another problem was that you weren't just running the DB on that server. You also needed to run a monitoring agent, a configuration management agent, and some supervisor/nanny process to make sure your critical app was running properly. These processes weren’t guaranteed to play nice. A single monitoring process could suddenly take over all of the disk space, or some other process could have a memory leak and trigger an OS-level out-of-memory (OOM) condition.
Then there were the situations when you needed to upgrade a process (like a security fix for a monitoring agent), which in turn required an OS library upgrade. You would gladly do these upgrades, only to discover later that they caused an incompatibility with a mission-critical service that now couldn’t start…
Luckily for us, a new solution came along to remove some of these pains: