In our previous post, Kube Explained: Part 1, I described how the introduction of the cloud resulted in CI/CD, Microservices, and a massive amount of pressure to standardize backend infrastructure tooling.
In this post, we’ll cover the first, and most important, domino to fall in this wave of standardization: the container. For the first time, containers standardized the packaging, distribution, and lifecycle of backend services and in doing so, paved the way for container orchestration, Kubernetes, and the explosion of innovation that’s surrounded that ecosystem.
There are plenty of technical deep dives, step-by-step tutorials, and vendor pitches floating around the internet. In this post, Part 2 of our Kubernetes Explained series, I’ll instead attempt to explain the context surrounding the rise of containers – what they are, why they happened, and the key pieces of the ecosystem one must contend with.
Containers combine two key innovations: lightweight virtualization and standardized packaging and distribution.
Typically, when people speak of virtualization, they mean x86 virtual machines, as originally popularized by VMware and later by cloud providers like Amazon EC2. Virtual machines virtualize the CPU, RAM, and disk of a server, parceling these resources out amongst one or more virtual machines (VMs) and providing each of them the illusion of running on its own dedicated hardware.
Virtual machines have four key features that proved crucial in supporting their adoption and enabling the development of the cloud: programmatic control, multiplexing, heterogeneity, and multi-tenancy.
The key innovation of containers was to maintain the programmatic control and multiplexing capabilities of virtual machines while relaxing the heterogeneity and multi-tenancy features. As a result, they kept the flexibility VMs provided, with much better performance and efficiency.
Cloud providers need heterogeneity and multi-tenancy because they’re serving many customers who have different requirements, and who don’t trust one another. However, an individual software engineering organization building out their backend infrastructure doesn’t need either of these features. They don’t need multi-tenancy, because they are definitionally a single tenant. And they don’t need heterogeneity, because they can standardize on a single operating system.
Furthermore, virtual machines come with costs. They’re heavy, boot slowly, and require RAM and CPU resources for the OS. These costs are worth it if you need all of the features VMs provide, but if you don’t need heterogeneity and multi-tenancy, they’re not worth paying.
VMs work by virtualizing the CPU, RAM, and disk of a server. Containers work by virtualizing the system call interface of an operating system (typically Linux).
Over time, and long before the introduction of Docker, the Linux kernel added three key features:

- Namespaces, which give a process its own isolated view of system resources such as process IDs, network interfaces, and mount points.
- Control groups (cgroups), which limit the CPU, memory, and I/O that a group of processes can consume.
- chroot, which allows the root directory (/) for a particular process to be changed. In effect, this allows one to isolate parts of the filesystem to separate processes.
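These kernel features can be exercised directly, without any container runtime. As a rough sketch (assuming a Linux machine with the util-linux `unshare` tool, and a kernel that permits unprivileged user namespaces), the following launches a shell in its own PID and mount namespaces:

```shell
# Start a shell in fresh PID and mount namespaces; --map-root-user
# lets an unprivileged user create them (Linux + util-linux only).
unshare --map-root-user --pid --fork --mount-proc /bin/sh

# Inside the new namespaces, the shell sees an isolated process tree.
ps aux
```

From inside, `ps` reports only the processes in the new namespace – the same isolation primitive that container runtimes build on.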
Just like VMs, modern container runtimes provide a programmatic API, allowing containers to be started/stopped and generally managed with software. They also allow for multiplexing, in that many containers can run on a single Linux server. However, unlike VMs, they don’t have their own operating system and thus don't support the same degree of heterogeneity. Additionally, many consider containers less secure than virtual machines (I’m personally not 100% convinced on this point, but it’s the conventional wisdom and this discussion is best left for a future post).
A second key innovation in containers, less discussed but no less important, was the standardization of packaging and distribution.
Virtual machines don’t really have a practical, universal, and shareable packaging mechanism. Of course, each hypervisor has some way of storing configuration and disk images. However, in practice, VM images are so large, and so inconsistent between hypervisors, that VM image sharing never took off in the way one might have expected (Vagrant being a notable exception).
Instead of shipping around VM images, the standard approach before containers was to boot a clean new Linux VM and then use a tool like Chef or Puppet to install the software, libraries, configuration, and data needed on that VM. There are so many package managers, configuration managers, Linux distributions, libraries, programming languages, and other components required to make this work, each with its own idiosyncrasies and incompatibilities, that this approach never quite worked as smoothly as one would hope.
Docker fixed this issue by standardizing the container image. A container image is simply a Linux file hierarchy containing everything a particular container needs to run: everything from core OS tools (/bin/sh and the like), to the libraries the software depends on, to the actual application code and its configuration.
Container images are both completely standard (an image that runs in one place will run everywhere) and much smaller than virtual machine images, which makes them far easier to store and move around. The standardization of the container image allows a developer, for the first time, to build a container on their laptop and have complete confidence that it will work as expected at each step of the CI/CD process, from testing to production.
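To make this concrete, here’s a minimal sketch of a container image definition for a hypothetical Python service (the base image, file names, and start command are illustrative, not prescriptive):

```dockerfile
# Base image supplies the core OS file hierarchy and a Python runtime.
FROM python:3.11-slim

WORKDIR /app

# Install the libraries the service needs.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Add the application code and configuration.
COPY . .

# Record how the container should be started.
CMD ["python", "app.py"]
```

Building this on a laptop with `docker build -t my-service .` yields an image that behaves identically wherever it’s run.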
In this section I’ll briefly cover the various components of the ecosystem that’s grown up around containers. This is a massive and complex topic with many projects fulfilling various niches. My goal in this post is not to be comprehensive, but instead to sketch out the major components and suggest good places to start for those new to the area.
A container registry is a (usually SaaS) service that stores and distributes container images. In the standard cloud-native architecture, code is packaged into a container by CI/CD and then pushed to a registry where it sits until pulled by various test or production environments that may want to run it.
Everyone running containers uses a registry, so if you’re thinking about containerization, this is a key decision you’ll have to make. There are tons of registries out there, but the two options most teams should investigate first are Docker Hub and your cloud provider’s built-in registry.
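Whichever registry you pick, the workflow looks roughly the same. As a sketch (the image name and registry host below are hypothetical):

```shell
# Tag a locally built image with the registry's address, then push it.
docker tag my-service:1.0 registry.example.com/team/my-service:1.0
docker push registry.example.com/team/my-service:1.0

# Any test or production host can later pull it by the same name.
docker pull registry.example.com/team/my-service:1.0
```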
As mentioned, a container is a collection of operating system features that allow the system call interface to be virtualized. Those features, however, are extremely low level, so software is required to wrap them in a more usable abstraction. A large number of container runtimes (and runtime-related projects) perform this function – Docker, rkt, LXD, Podman, containerd, etc. Most of these projects are of interest to hard-core container enthusiasts and have their place. However, for most people, vanilla open-source Docker is likely the right place to start.
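As a quick sketch of the lifecycle a runtime exposes (using Docker’s CLI and a hypothetical image name):

```shell
# Start a container from an image, detached, with a known name.
docker run -d --name web my-service:1.0

# List, stop, and remove it -- the same primitive operations an
# orchestrator drives programmatically through the runtime's API.
docker ps
docker stop web
docker rm web
```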
Once containers arrived, a secondary problem of great importance immediately revealed itself: one needs software to manage containers – to start them, stop them, upgrade them, restart them when they fail, and scale them when more are needed. Software that performs these tasks is called a container orchestrator.
In the early days of containers, it looked like a number of popular container orchestrators would fight for market share over the long term. In a relatively short period of time, Docker Swarm, Mesos DC/OS, Nomad (HashiCorp), ECS (Amazon), and of course Kubernetes all appeared, solving essentially this problem.
While each of these options has advantages, disadvantages, and a significant install base, in recent years it’s become clear that Kubernetes has emerged as the industry-standard container orchestrator and will, over time, run the majority of container workloads. Kubernetes isn’t perfect, but as the standard, I recommend it for new deployments.
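To make the orchestrator’s job concrete, here’s a sketch of a Kubernetes Deployment manifest that asks the cluster to keep three replicas of a hypothetical service running, restarting containers when they fail:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3              # Kubernetes keeps three copies running.
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: my-service:1.0   # hypothetical image name
```

Applied with `kubectl apply -f deployment.yaml`, the cluster continuously reconciles the running state against this declared state.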
Containers are basically lightweight virtual machines that virtualize the Linux system call interface instead of the lower-level x86 architecture. There’s really no reason not to use them, even for your monolith: they simplify the deployment process at very little cost. When evaluating containers, start with Docker, Docker Hub or your cloud provider’s registry, and Kubernetes.
In future posts I will dig into Kubernetes itself, container networking, storage, and development. Stay tuned.
By: Ethan J. Jackson
Kelda gives each developer a personal development sandbox that's lightweight, fast, and easy to use. Kelda development environments run on Kubernetes in your cloud while allowing developers to continue using their favorite local development tools. Kelda combines a fantastic developer experience with the performance, stability, and sophisticated features possible when running in the cloud.
Learn about microservices’ biggest pain point – providing a good local development experience – and how you can get the best of both worlds.