I spent half a year deep-diving into the world of containers and their orchestration. I have been enjoying it very much and have learned a lot. Along the way, I had to tackle lots of interesting and rather specific concepts. But there is one commonality almost every project in this area shares: when it comes to containers, the Go programming language is ubiquitous!
All posts in category Containers:
In the previous articles, we discussed the scope of the container runtime shim and drafted a minimum viable version. Now it's time to move on and have some fun with more advanced scenarios! Have you ever wondered how docker run -i or kubectl run --stdin work? If so, this article is for you! We will try to replicate this piece of functionality in our experimental container manager. And as you have probably guessed, the container runtime shim will do a lot of the heavy lifting here again.
conman - interactive container demo
Well, at this point we already know what we are dealing with: to use runc from code, we need to implement our shim as a daemon, and this daemon has to be as long-lived as the underlying container process. In this article, we will try to develop a minimum viable runtime shim and integrate it with our experimental container manager.
The minimal shim implementation takes as its input a path to the container bundle (with the config.json) as well as a list of predefined locations (for the container log file, container pidfile, container exit status file, etc.). The shim needs to create a container by executing runc with the provided parameters and then serve the container process until its termination. The planned functionality of this shim version includes:
- Detaching the shim from the container manager process.
- Launching runc and handling container creation errors.
- Reporting the status of the container creation back to the manager.
- Streaming the container's stdout and stderr to the log file.
- Tracking and reporting the exit code of the container.
A container runtime shim is a piece of software that resides in between a container manager (containerd, cri-o, podman) and a container runtime (runc, crun), solving the integration problem between these two counterparts.
The easiest way to spot a shim is to inspect the process tree on a Linux host with a running docker container:
ps auxf output on a host running docker run -it ubuntu bash; notice the containerd-shim process in between
On the one hand, runtimes need shims to be able to survive manager restarts. On the other hand, shims help container managers deal with the quirky behavior of runtimes. As a part of the container manager implementation series, we will try to create our own shim and then integrate it with conman, an experimental container manager. Hopefully, during the development, we will gain an in-depth understanding of the topic.
However, before jumping into the shim development, we need to familiarize ourselves with the container runtime component of our choice. Unsurprisingly, conman uses runc as its container runtime, so I will start the article by covering basic runc use cases alongside its design quirks. Then I'll show a naive way to use runc from code and explain some related pitfalls. The final part of the article provides an overview of the shim's design.
When I'm diving into a new codebase, I always start with an analysis of the project structure, and my favorite tool for that is tree. However, not every project is perfectly balanced: some files and folders tend to be more popular and contain much more code than others. It seems like yet another incarnation of the Pareto principle.
So, when tree's capabilities aren't enough, I turn to cloc. This tool is much more powerful and can show nice textual statistics on the number of code lines and programming languages used, either for the whole project or per file.
However, some projects are really huge, and some lovely visualization would be truly helpful. And here FlameGraph comes to the rescue! What if we feed cloc's output for the Kubernetes codebase to FlameGraph? Thanks to the author of this article for the original cloc-to-flamegraph one-liner:
git clone https://github.com/brendangregg/FlameGraph
go get -d github.com/kubernetes/kubernetes
cd $(go env GOPATH)/src/github.com/kubernetes/kubernetes
cloc --csv-delimiter="$(printf '\t')" --by-file --quiet --csv . | \
    sed '1,2d' | \
    cut -f 2,5 | \
    sed 's/\//;/g' | \
    ~/FlameGraph/flamegraph.pl \
        --width=3600 \
        --height=32 \
        --fontsize=8 \
        --countname=lines \
        --nametype=package \
    > kubernetes.html
open kubernetes.html
With this article, I want to start a series about the implementation of a container manager. What the heck is a container manager? Some prominent examples would be containerd, cri-o, dockerd, and podman. People here and there keep calling them container runtimes, but I would like to reserve the term runtime for a lower-level thingy - the OCI runtime (de facto runc) - and call the higher-level component controlling multiple such runtime instances a container manager. In general, by a container manager, I mean a piece of software doing complete container lifecycle management on a single host. In the following series, I will try to guide you (and myself) through the challenge of creating yet another container manager. By no means is the implementation going to be feature-complete, correct, or safe to use. The goal is rather to prove the already proven concept. So, mostly for the sake of fun, let the show begin!
Containers gave birth to more advanced server-side architectures and sophisticated deployment techniques. Containers nowadays are so widespread that there is already a bunch of standard-like specifications (1, 2, 3, 4, ...) describing different aspects of the containers universe. Of course, at the lowest level lie Linux primitives such as namespaces and cgroups. But containerization software is already so massive that it would be barely possible to implement it without its own separation-of-concerns layers. What I'm trying to achieve in this ongoing effort is to guide myself from the lowest layers to the topmost ones, with as much practice (code, installation, configuration, integration, etc.) and, of course, fun as possible. The content of this page is going to change over time, reflecting my understanding of the topic.
Have you ever wondered how the docker (or kubectl) attach command is implemented under the hood? If so, you're in the right place! This article covers the basics of Linux pseudoterminal capabilities and goes on to show how an attach-like feature can be implemented in a ridiculously small amount of code.
Well, I don't see any practical applications of the approach I'm going to describe... However, I do think that messing about with things like this is the only way to gain extra knowledge of any system's internals. We are going to speak Docker and Linux here. What if we want to take a base Docker image, I mean a really base one, just an image made with a single-line Dockerfile like FROM debian:latest, and convert it to something launchable on a real or virtual machine? In other words, can we create a disk image with exactly the same Linux userland a running container has and then boot from it? For this, we would start by dumping the container's root filesystem; luckily, that is as simple as running docker export. However, to finally accomplish the task, a bunch of additional steps is needed...