Container networking is simple

Working with containers always feels like magic. In a good way for those who understand the internals and in a terrifying - for those who don't. Luckily, we've been looking under the hood of the containerization technology for quite some time already and even managed to uncover that containers are just isolated and restricted Linux processes, that images aren't really needed to run containers, and on the contrary - to build an image we need to run some containers.

Now comes a time to tackle the container networking problem. Or, more precisely, a single-host container networking problem. In this article, we are going to answer the following questions:

  • How to virtualize network resources to make containers think each of them has a dedicated network stack?
  • How to turn containers into friendly neighbors, prevent them from interfering, and teach to communicate well?
  • How to reach the outside world (e.g. the Internet) from inside the container?
  • How to reach containers running on a machine from the outside world (aka port publishing)?

As a result, it'll become apparent that the single-host container networking is nothing more than a simple combination of the well-known Linux facilities:

  • network namespaces;
  • virtual Ethernet devices (veth);
  • virtual network switches (bridge);
  • IP routing and network address translation (NAT).

And for better or worse, no code is required to make the networking magic happen...

Read more

Traefik: canary deployments with weighted load balancing

Traefik is The Cloud Native Edge Router yet another reverse proxy and load balancer. Omitting all the cloud-native buzzwords, what really makes Traefik different from Nginx, HAProxy, and alike is the automatic and dynamic configurability it provides out of the box. And the most prominent part of it is probably its ability to do automatic service discovery. If you put Traefik in front of Docker, Kubernetes, or even an old-fashioned VM/bare-metal deployment and show it how to fetch the information about the running services, it'll automagically expose them to the outside world. If you follow some conventions of course...

Read more

Service proxy, pod, sidecar, oh my!

Imagine you're developing a service... For certainty, let's call it A. It's going to provide some public HTTP API to its clients. However, to serve requests it needs to call another service. Let's call this upstream service - B.

Obviously, neither network nor service B is ideal. If service A wants to decrease the impact of the failing upstream requests on its public API success rate, it has to do something about errors. For instance, it could start retrying failed requests.

Read more

You need containers to build an image

...unless your Dockerfile has no RUN instructions, but that's rarely the case.

For people who found their way to containers through Docker (well, most of us I believe) it may seem like images are of somewhat primary nature. We've been taught to start from a Dockerfile, build an image using that file, and only then run a container from that image. Alternatively, we could run a container specifying an image from a registry, yet the main idea persists - an image comes first, and only then the container.

But what if I tell you that the actual workflow is reverse? Even when you are building your very first image using Docker, podman, or buildah, you are already, albeit implicitly, running containers under the hood!

Let's avoid any unfoundedness and take a closer look at the image building procedure.

Read more

You don't need an image to run a container

As we already know, containers are just isolated and restricted Linux processes. We also learned that it's fairly simple to create a container with a single executable file inside starting from scratch image (i.e. without putting a full Linux distribution in there). This time we will go even further and demonstrate that containers don't require images at all. And after that, we will try to justify the actual need for images and their place in the containerverse.

You might have heard that Docker uses a tool called runc to run containers. Well, to be more accurate, Docker depends on a lower-level piece of software called containerd which in turn relies on a standardized container runtime implementation. And in the wild, most of the time runc plays the role of such a component.

Read more

Not every container has an operating system inside

...but every one of them needs your Linux kernel.

Disclaimer 1: before going any further it's important to understand the difference between a kernel, an operating system, and a distribution.

  • Linux kernel is the core part of the Linux operating system. It's what originally Linus wrote.

  • Linux operating system is a combination of the kernel and a user-land (libraries, GNU utilities, config files, etc).

  • Linux distribution is a particular version of the Linux operating system like Debian, CentOS, or Alpine.

Disclaimer 2: the title of this article should have sounded like "Not every container has whole Linux distribution in it". But I personally find this wording a bit boring 🤪


The majority of Docker examples out there explicitly or implicitly rely on some flavor of the Linux operating system running inside a container. I tried to quickly compile a list of the most prominent samples:

Running an interactive shell in the debian jessie distribution:

$ docker run -it debian:jessie

Running an nginx web-sever in a container and examine its config using cat utility:

$ docker run -d -P --name nginx nginx:latest
$ docker exec -it nginx cat /etc/nginx/nginx.conf

Building an image based on Alpine Linux:

$ cat <<EOF > Dockerfile
FROM alpine:3.7
RUN apk add --no-cache mysql-client
ENTRYPOINT ["mysql"]
EOF

$ docker build -t mysql-alpine .
$ docker run mysql-alpine

And so forth and so on...

For the newcomers learning the containerization through hands-on experience, this may lead to a false impression that containers are somewhat indistinguishable from full-fledged operating systems and that they are always based on well-known and wide-spread Linux distributions like debian, centos, or alpine.

Read more

Working with container images in Go

I've been working on adding basic images support to my experimental container manager and to my surprise, the task turned to be more complex than I initially expected. I spent some time looking for ways to manage container images directly from my application code. There is plenty of tools out there (docker, containerd, podman, buildah, cri-o, etc) providing image management capabilities. However, if you don't want to have a dependency on an external daemon running in your system, or you don't feel like shelling out for exec-ing a command-line tool from the code, the options are at best limited.

I've reviewed a bunch of the said tools focusing on the underlying means they use to deal with images and at last, I found two appealing libraries. The first one is github.com/containers/image library "[...] aimed at working in various way with containers' images and container image registries". The second one is github.com/containers/storage "[...] which aims to provide methods for storing filesystem layers, container images, and containers". The libraries are meant to be used in conjunction and form a very powerful image management tandem. But unfortunately, I could not find a sufficient amount of documentation, especially how to get started kind of it.

Without the docs the only way to learn how to use the libraries for me was to analyze the code of their dependants (most prominently - buildah and cri-o). It took me a while to forge a working example which is capable of:

  • pulling images from remote repositories;
  • storing images locally;
  • creating and mounting containers (i.e. writable instances of images).

In the rest of the article, I'll try to show how to use the libraries to perform the said task and highlight the most interesting parts of this journey.

Disclaimer: This is by no means an attempt to fully or even partially document the libraries!

Read more

Master Go While Learning Containers

I spent half a year deep-diving into the world of containers and their orchestration. I have been enjoying it very much and learned a lot. On my journey, I need to tackle lots of interesting and specific concepts. But there is one commonality almost every project in this area possesses. When it comes to containers - the Go programming language is ubiquitous!

Read more

Implementing Container Runtime Shim: Interactive Containers

In the previous articles, we discussed the scope of the container runtime shim and drafted the minimum viable version. Now, it's time to move on and have some fun with more advanced scenarios! Have you ever wondered how docker run -i or kubectl run --stdin work? If so, this article is for you! We will try to replicate this piece of functionality in our experimental container manager. And as you have probably guessed, the container runtime shim will do a lot of heavy lifting here again.

conman - interactive container demo

Read more

How to use Flask with gevent (uWSGI and Gunicorn editions)

Python is booming and Flask is a pretty popular web-framework nowadays. Probably, quite some new projects are being started in Flask. But people should be aware, it's synchronous by design and ASGI is not a thing yet. So, if someday you realize that your project really needs asynchronous I/O but you already have a considerable codebase on top of Flask, this tutorial is for you. The charming gevent library will enable you to keep using Flask while start benefiting from all the I/O being asynchronous. In the tutorial we will see:

  • How to monkey patch a Flask app to make it asynchronous w/o changing its code.
  • How to run the patched application using gevent.pywsgi application server.
  • How to run the patched application using Gunicorn application server.
  • How to run the patched application using uWSGI application server.
  • How to configure Nginx proxy in front of the application server.

Read more

My 10 Years of Programming Experience

Regardless of whether it's the end of the calendar decade or not it's the end of a programming decade for me. I started early in 2010 and since then I've been programming almost every day, including weekends and vacations. This was a really exciting period in my life and I realized that it's been a while since 2010 only recently. So, I decided to put into words some of my learnings from that time. Warning: the content of this article is highly opinionated and extremely subjective.

Read more

Implementing Container Runtime Shim: First Code

Well, at this moment we already know what we need to deal with. In order to use runc from code we need to implement our shim as a daemon and this daemon has to be as long-lived as the underlying container process. In this article, we will try to develop a minimum viable runtime shim and integrate it with our experimental container manager.

The minimal shim implementation takes as its input a path to the container bundle (with the config.json) as well as the list of the predefined locations (for the container log file, container pidfile, container exit status file, etc). The shim needs to create a container by executing runc with the provided parameters and then serve the container process until its termination. The planned functionality of this shim version includes:

  • Detaching the shim from the container manager process.
  • Launching runc and handling container creation errors.
  • Reporting the status of the container creation back to the manager.
  • Streaming container's stdout and stderr to the log file.
  • Tracking and reporting the exit code of the container.

Read more

Implementing Container Runtime Shim: runc

A container runtime shim is a piece of software that resides in between a container manager (containerd, cri-o, podman) and a container runtime (runc, crun) solving the integration problem of these counterparts.

The easiest way to spot a shim is to inspect the process tree on a Linux host with a running docker container:

ps auxf output on a host running docker run -it ubuntu bash; notice containerd-shim process in between containerd and bash.

On the one hand, runtimes need shims to be able to survive managers restarts. On the other hand, shims are helping container managers to deal with the quirky behavior of runtimes. As a part of the container manager implementation series, we will try to create our own shim and then integrate it with conman, an experimental container manager. Hopefully, during the development, we will gain an in-depth understanding of the topic.

However, before jumping to the shim development, we need to familiarize ourselves with the container runtime component of the choice. Unsurprisingly, conman uses runc as a container runtime, so I will start the article by covering basic runc use cases alongside its design quirks. Then I'll show the naive way to use runc from code and explain some related pitfalls. The final part of the article will provide an overview of the shim's design.

Read more

Kubernetes Repository On Flame

When I'm diving into a new codebase, I always start from the project structure analysis. And my favorite tool is tree. However, not every project is perfectly balanced. Some files and folders tend to be more popular and contain much more code than others. Seems like yet another incarnation of the Pareto principle.

So, when the tree's capabilities aren't enough, I jump to cloc. This tool is much more powerful and can show nice textual statistics for the number of code lines and programming languages used per the whole project or per each file individually.

However, some projects are really huge and some lovely visualization would be truly helpful! And here the FlameGraph goes! What if we feed the cloc's output for the Kubernetes codebase to FlameGraph? Thanks to the author of this article for the original cloc-to-flamegraph one-liner:

git clone https://github.com/brendangregg/FlameGraph
go get -d github.com/kubernetes/kubernetes

cd $(go env GOPATH)/src/github.com/kubernetes/kubernetes

cloc --csv-delimiter="$(printf '\t')" --by-file --quiet --csv . | \
    sed '1,2d' | \
    cut -f 2,5 | \
    sed 's/\//;/g' | \
    ~/FlameGraph/flamegraph.pl \
        --width=3600 \
        --height=32 \
        --fontsize=8 \
        --countname=lines \
        --nametype=package \
    > kubernetes.html

open kubernetes.html

Read more

conman - [the] container manager: inception

With this article, I want to start a series about the implementation of a container manager. What the heck is a container manager? Some prominent examples would be containerd, cri-o, dockerd, and podman. People here and there keep calling them container runtimes, but I would like to reserve the term runtime for a lower-level thingy - the OCI runtime (de facto runc), and a higher-level component controlling multiple such runtime instances I'd like to call a container manager. In general, by a container manager, I mean a piece of software doing a complete container lifecycle management on a single host. In the following series, I will try to guide you myself through the challenge of the creation of yet another container manager. By no means, the implementation is going to be feature-complete, correct or safe to use. The goal is rather to prove the already proven concept. So, mostly for the sake of fun, let the show begin!

Read more

A journey from containerization to orchestration and beyond

Containers gave birth to more advanced server-side architectures and sophisticated deployment techniques. Containers nowadays are so widespread that there is already a bunch of standard-alike specifications (1, 2, 3, 4, ...) describing different aspects of the containers universe. Of course, on the lowest level lie Linux primitives such as namespaces and cgroups. But containerization software is already so massive that it would be barely possible to implement it without its own concern separation layers. What I'm trying to achieve in this ongoing effort is to guide myself starting from the lowest layers to the topmost ones, having as much practice (code, installation, configuration, integration, etc) and, of course, fun as possible. The content of this page is going to be changing over time, reflecting my understanding of the topic.

Read more

Layman's iptables 101

Gee, it's my turn to throw some gloom light on iptables! There are hundreds or even thousands of articles on the topic out there, including introductory ones. I'm not going to put either formal and boring definitions here nor long lists of useful commands. I would rather try to use layman's terms and scribbling as much as possible to give you some insights about the domain before going to all these tables, rules, targets, and policies. By the way, the first time I faced this tool I was pretty much confused by the terminology too!

Read more

From Docker Container to Bootable Linux Disk Image

Well, I don't see any practical applications of the approach I'm going to describe... However, I do think that messing about with things like this is the only way to gain extra knowledge of any system internals. We are going to speak Docker and Linux here. What if we want to take a base Docker image, I mean really base, just an image made with a single line Dockerfile like FROM debian:latest, and convert it to something launchable on a real or virtual machine? In other words, can we create a disk image having exactly the same Linux userland a running container has and then boot from it? For this we would start with dumping container's root file system, luckily it's as simple as just running docker export, however, to finally accomplish the task a bunch of additional steps is needed...

Read more

From Callback Hell to async/await Heaven

In the previous article, we learned how to implement a simple but workable event loop. However, programs which are supposed to be run by the event loop are full of callbacks. This is the usual problem of event-loop-driven environments. When business logic becomes reasonably complicated, callbacks make program's code hardly readable and painfully maintainable. And the callback hell begins! There is plenty of ways to deal with the artificial complexity arose due to callbacks, but the most impressive one is to make the code great flat again. And by flat, I mean callback-less and synchronous-like. Usually, it's done by introducing async/await syntactic feature. But every high-level abstraction is built on top of some basic and fundamental ideas. Let's check the async/await sugar out and see what exactly happens under the hood.



Read more

Explaining event loop in 100 lines of code

There is plenty of articles out there about the event loop. However, as a software engineer, I prefer to read code, not text. And there is no better way of learning a new concept than implementing it yourself. So, let's try to grasp the idea of the event loop by coding a new and shiny one.

NB: In the article, we will try to describe the idea of the event loop in general, not a specific implementation of the event loop in Node.js or Python, or some other language/library.

Read more

Truly optional scalar types in protobuf3 (with Go examples)

In contrast to protobuf2 there is no way in protobuf3 to mark some fields as optional and some other fields as required. Instead, any field might be omitted leading this field to be set to its default zero-value. I believe there were many good reasons for such a design decision. However, while this behavior might be superior to the proto2's explicit distinction between required and optional fields, it also has some unfortunate implications.

Read more

Node.js Writable streams distilled

Writable streams are an abstraction for a destination to which data is written... And this time it's a concise abstraction! Compared to vague readable streams (multiple operation modes behind single abstraction) writable streams implement only one mode and hence expose only one essential method write(chunk). Nevertheless, the idea of writable streams is not trivial and it's due to one nice feature it provides - backpressure. Let's try to wrap our mind around writable stream abstraction.

Read more

Node.js Readable streams distilled

Readable stream is an abstraction for some data source. Which could be hard to grasp and even harder to use...

Everybody knows that readable streams support two modes of operating (flowing and paused) and piping to writable streams. It's not that easy to understand the purposes of these mechanisms and behavioral differences though. Since one readable stream abstraction stands for multiple usage modes its public interface (i.e. the set of methods and events) is a bit inconsistent. Usage of readable streams might be totally confusing without the understanding of the underlying ideas. In this article, we will make an attempt to justify the abstraction of readable streams trying to implement our own file reader. Also, we will have a look at some nicer ways to consume readable streams.

Read more