OpenFaaS - Run Containerized Functions On Your Own Terms

For a long time, serverless has been just a synonym of AWS Lambda for me. Lambda offered a convenient way to attach an arbitrary piece of code to a platform event like a cloud instance changing its state, a DynamoDB record update, or a new SNS message. But also, from time to time, I would come up with a piece of logic that is not big enough to deserve its own service and at the same time doesn't fit well with the scope of any of the existing services. So, I'd often also put it into a function to invoke it later with a CLI command or an HTTP call.

I moved away from AWS a couple of years ago, and since then, I've been missing this ease of deploying serverless functions. So I was pleasantly surprised when I learned about the OpenFaaS project. It makes it simple to deploy functions to a Kubernetes cluster, or even to a VM, with nothing but containerd.

Intrigued? Then read on!

Read more

Learning Containers From The Bottom Up

When I started using containers back in 2015, my initial understanding was that they were just lightweight virtual machines with a subsecond startup time. With such a rough idea in my head, it was easy to follow tutorials from the Internet on how to put a Python or a Node.js application into a container. But pretty quickly, I realized that thinking of containers as of VMs is a risky oversimplification that doesn't allow me to judge:

  • What's doable with containers and what's not
  • What's an idiomatic use of containers and what's not
  • What's safe to run in containers and what's not.

Since the "container is a VM" abstraction turned out to be quite leaky, I had to start looking into the technology's internals to understand what containers really are, and Docker was the most obvious starting point. But Docker is a behemoth doing a wide variety of things, and the apparent simplicity of docker run nginx can be deceptive. There was plenty of materials on Docker, but most of them were:

  • Either shallow introductory tutorials
  • Or hard reads indigestible for a newbie.

So, it took me a while to pave my way through the containerverse.

I tried tackling the domain from different angles, and over the years, I managed to come up with a learning path that finally worked out for me. Some time ago, I shared this path on Twitter, and evidently, it resonated with a lot of people:

This article is not an attempt to explain containers in one go. Instead, it's a front-page for my multi-year study of the domain. It outlines the said learning path and then walks you through it, pointing to more in-depth write-ups on this same blog.

Mastering containers is no simple task, so take your time, and don't skip the hands-on parts!

Read more

Containers vs. Pods - Taking a Deeper Look

Containers could have become a lightweight VM replacement. However, the most widely used form of containers, standardized by Docker/OCI, encourages you to have just one process service per container. Such an approach has a bunch of pros - increased isolation, simplified horizontal scaling, higher reusability, etc. However, there is a big con - in the wild, virtual (or physical) machines rarely run just one service.

While Docker tries to offer some workarounds to create multi-service containers, Kubernetes makes a bolder step and chooses a group of cohesive containers, called a Pod, as the smallest deployable unit.

When I stumbled upon Kubernetes a few years ago, my prior VM and bare-metal experience allowed me to get the idea of Pods pretty quickly. Or so thought I... ๐Ÿ™ˆ

Starting working with Kubernetes, one of the first things you learn is that every pod gets a unique IP and hostname and that within a pod, containers can talk to each other via localhost. So, it's kinda obvious - a pod is like a tiny little server.

After a while, though, you realize that every container in a pod gets an isolated filesystem and that from inside one container, you don't see processes running in other containers of the same pod. Ok, fine! Maybe a pod is not a tiny little server but just a group of containers with a shared network stack.

But then you learn that containers in one pod can communicate via shared memory! So, probably the network namespace is not the only shared thing...

This last finding was the final straw for me. So, I decided to have a deep dive and see with my own eyes:

  • How Pods are implemented under the hood
  • What is the actual difference between a Pod and a Container
  • How one can create Pods using Docker.

And on the way, I hope it'll help me to solidify my Linux, Docker, and Kubernetes skills.

Read more

List of Hands-On Learning Resources to Master Containers, Kubernetes, and Clouds

There are many resources for people who want to learn Linux, Containers, or Kubernetes. However, most of these resources don't come with an interactive, hands-on learning experience. You can read tens of fine blog articles and watch hundreds of engaging YouTube videos, maybe even take some courses with theoretical quizzes at the end, but it's doubtful you'll master any of the above technologies without actively practicing them.

Theoretical-only knowledge of, say, Kubernetes doesn't really count. Hands-on exercises should be a must-have learning element. Some resources, including this blog, strive to provide reproducible instructions so that students can try out the new skills. However, for that, a running system is needed. Setting up such a system can make the learning curve substantially steeper or even make the task fully unbearable for inexperienced students.

So, where can a student practice the new skills?

One option is to experiment on a real staging (or production ๐Ÿ™ˆ) environment. But it can be quite harmful. Luckily, there is an alternative. Some learning platforms offer interactive playgrounds mimicking real-world setups. On these platforms, students can SSH into disposable Linux servers, or even access multi-server stages right from their browsers!

Experimenting with the new skills in such sandboxes makes the learning hands-on. At the same time, these platforms free students from the need for provisioning playgrounds. It brings students closer to real-world environments while keeping the learning process safe - playgrounds can always be destroyed and recreated without damaging any real production systems.

I got so fascinated by the idea of interactive playgrounds recently that I spent a week researching platforms that provide in-browser learn-by-doing experience. Below are my findings, alphabetically ordered:

Read more

Reverse Proxy, HTTP Keep-Alive Timeout, and sporadic HTTP 502s

These mysterious HTTP 502s happened to me already twice over the past few years. Since the amount of service-to-service communications every year goes only up, I expect more and more people to experience the same issue. So, sharing it here.

TL;DR: HTTP Keep-Alive between a reverse proxy and an upstream server combined with some misfortunate downstream- and upstream-side timeout settings can make clients receiving HTTP 502s from the proxy.

Read more

Why and How to Use containerd from the Command Line

containerd is a high-level container runtime, aka container manager. To put it simply, it's a daemon that manages the complete container lifecycle on a single host: creates, starts, stops containers, pulls and stores images, configures mounts, networking, etc.

containerd is designed to be easily embeddable into larger systems. Docker uses containerd under the hood to run containers. Kubernetes can use containerd via CRI to manage containers on a single node. But smaller projects also can benefit from the ease of integrating with containerd - for instance, faasd uses containerd (we need more d's!) to spin up a full-fledged Function-as-a-Service solution on a standalone server.

Docker and Kubernetes use containerd

However, using containerd programmatically is not the only option. It also can be used from the command line via one of the available clients. The resulting container UX may not be as comprehensive and user-friendly as the one provided by the docker client, but it still can be useful, for instance, for debugging or learning purposes.

containerd command-line clients (ctr, nerdctl, crictl)

Read more

What Is a Standard Container (2021 edition)

TL;DR Per OCI Runtime Specification:

  • Containers are isolated and restricted boxes for running processes ๐Ÿ“ฆ
  • Containers pack an app and all its dependencies (including OS libs) together
  • Containers are for portability - any compliant runtime can run standard containers
  • Containers can be implemented using Linux, Windows, and other OS-es
  • Virtual Machines also can be used as standard containers ๐Ÿค

There are many ways to create containers, especially on Linux and alike. Besides the super widespread Docker implementation, you may have heard about LXC, systemd-nspawn, or maybe even OpenVZ.

The general concept of the container is quite vague. What's true and what's not often depends on the context, but the context itself isn't always given explicitly. For instance, there is a common saying that containers are Linux processes or that containers aren't Virtual Machines. However, the first statement is just an oversimplified attempt to explain Linux containers. And the second statement simply isn't always true.

In this article, I'm not trying to review all possible ways of creating containers. Instead, the article is an analysis of the OCI Runtime Specification. The spec turned out to be an insightful read! For instance, it gives a definition of the standard container (and no, it's not a process) and sheds some light on when Virtual Machines can be considered containers.

Containers, as they brought to us by Docker and Podman, are OCI-compliant. Today we even use the terms container, Docker container, and Linux container interchangeably. However, this is just one type of OCI-compliant container. So, let's take a closer look at the OCI Runtime Specification.

Read more

How to Expose Multiple Containers On the Same Port

Disclaimer: In 2021, there is still a place for simple setups with just one machine serving all traffic. So, no Kubernetes and no cloud load balancers in this post. Just good old Docker and Podman.

Even when you have just one physical or virtual server, it's often a good idea to run multiple instances of your application on it. Luckily, when the application is containerized, it's actually relatively simple. With multiple application containers, you get horizontal scaling and a much-needed redundancy for a very little price. Thus, if there is a sudden need for handling more requests, you can adjust the number of containers accordingly. And if one of the containers dies, there are others to handle its traffic share, so your app isn't a SPOF anymore.

The tricky part here is how to expose such a multi-container application to the clients. Multiple containers mean multiple listening sockets. But most of the time, clients just want to have a single point of entry.

Benefits of exposing multiple Docker containers on the same port

Read more

How to Set Go net/http Socket Options - setsockopt() example

Go standard library makes it super easy to start an HTTP server:

package main

import "net/http"

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("Hello there!\n"))
    })

    http.ListenAndServe(":8080", nil)
}

...or send an HTTP request:

package main

import "net/http"

func main() {
    resp, err := http.Get("http://example.com/")
    body, err := io.ReadAll(resp.Body)
}

In just ~10 lines of code, I can get a server up and running or fetch a real web page! In contrast, creating a basic HTTP server in C would take hundreds of lines, and anything beyond basics would require third-party libraries.

The Go snippets from above are so short because they rely on powerful high-level abstractions of the net and net/http packages. Go pragmatically chooses to optimize for frequently used scenarios, and its standard library hides many internal socket details behind these abstractions, making lots of default choices on the way. And that's very handy, but...

What if I need to fine-tune net/http sockets before initiating the communication? For instance, how can I set some socket options like SO_REUSEPORT or TCP_QUICKACK?

Read more

Disposable Local Development Environments with Vagrant, Docker, and Arkade

I use a (rather oldish) MacBook for my day-to-day development tasks. But I prefer keeping my host operating system free of development stuff. This strategy has the following benefits:

  • Increasing reproducibility of my code - it often happened to me in the past that some code worked on my machine but didn't work on others; usually, it was due to missing dependencies. Developing multiple projects on the same machine makes it harder to track what libraries and packages are required for what project. So, now I always try to have an isolated environment per project.
  • Testing code on the target platform - most of my projects have something to do with server-side and infra stuff; hence the actual target platform is Linux. Since I use a MacBook, I spend a lot of time inside virtual machines running the same operating system as my servers do. So, I'd need to duplicate the development tools from my macOS on every Linux OS I happen to use.
  • Keeping the host operating system clean and slim - even if I work on something platform-agnostic like a command-line tool, I prefer not to pollute my workstation with the dev tools and packages anyway. Projects and domains change often, and installing all the required stuff right into the host operating system would make it messy real quick.
  • Decreasing time to recover in case of machine loss - a single multi-purpose machine quickly becomes a snowflake host. Coming up with the full list of things to reinstall in the case of a sudden machine loss would be hardly feasible.

Since I usually work on several projects at the same time, I need not one but many isolated development environments. And every environment should be project-tailored, easy to spin up, suspend, and, eventually, dispose. I figured a way to achieve that by using only a few tools installed on my host operating system, and I'm going to share it here.

The approach may be helpful for folks using macOS or Linux:

  • to work on server-side and full-stack projects
  • to do Linux systems programming
  • to play with Cloud Native stack
  • to build some cool command-line tools.
Language logos

Read more

DevOps, SRE, and Platform Engineering

I compiled this thread on Twitter, and all of a sudden, it got quite some attention. So here, I'll try to elaborate on the topic a bit more. Maybe it would be helpful for someone trying to make a career decision or just improve general understanding of the most hyped titles in the industry.

Read more

My Choice of Programming Languages

When I was a kid, I used to spend days tinkering with woodworking tools. I was lucky enough to have a wide set of tools at my disposal. However, there was no one around to give me a hint about what tool to use when. So, I quickly came up with a heuristic: if my fingers and a tool survived an exercise, I've used the right tool; if either the fingers or the tool got damaged, I'd try other tools for the same task until I find the right one. And it worked quite well for me! Since then, I'm an apologist of the idea that every tool is good only for a certain set of tasks.

A programming language is yet another kind of tool. When I became a software developer, I adapted my heuristic to the new reality: if, while solving a task using a certain language, I suffer too much (fingers damage) or I need to hack things more often than not (tool damage), it's a wrong choice of a language.

Since the language is just a tool, my programming toolbox is defined by the tasks I work on the most often. Since 2010, I've worked in many domains, starting from web UI development and ending with writing code for infrastructure components. I find pleasure in being a generalist (jack of all trades), but there is always a pitfall of spreading yourself too thin (master of none). So, for the past few years, I've been trying to limit my sphere of expertise with the server-side, distributed systems, and infrastructure. Hence, the following choice of languages.

Language logos

Read more

Prometheus Cheat Sheet - Basics (Metrics, Labels, Time Series, Scraping)

Here we focus on the most basic Prometheus concepts - metrics, labels, scrapes, and time series.

What is a metric?

In Prometheus, everything revolves around metrics. A metric is a feature (i.e., a characteristic) of a system that is being measured. Typical examples of metrics are:

  • http_requests_total
  • http_request_size_bytes
  • system_memory_used_bytes
  • node_network_receive_bytes_total
Prometheus metrics

Read more

How to learn PromQL with Prometheus Playground

Working with real metrics is hard. Metrics are needed to give you an understanding of how your service behaves. That is, by definition, you have some uncertainty about the said behavior. Therefore, you have to be hell certain about your observability part. Otherwise, all sorts of metric misinterpretations and false conclusions will follow.

Here are the things I'm always trying to get confident about as soon as possible:

  • How metric collection works - push vs. pull model, aggregation on the client- or server-side?
  • How metrics are stored - raw samples or aggregated data, rollup and retention strategies?
  • How to query metrics - is my mental model aligned with the actual query execution model?
  • How to plot query results - what approximation errors may be induced by the graphing tools?

And even if I have a solid understanding of all of the above stuff, there will be one thing I'm never entirely sure about - the correctness of my query logic. But this one becomes testable once other parts are known.

Recently, I've been through another round of this journey - I was making an acquaintance with Prometheus. Since it was already a third of fourth monitoring system I had to work with, at first, I thought I could skip all the said steps and jump into writing queries to production metrics and reading graphs... The hope was on the knowledge extrapolation. But nope, it didn't work out well. So, I gave up on the idea of cutting corners quickly. That's how I found myself setting up a Prometheus playground, feeding it with some known inputs, observing the outputs, and trying to draw some meaningful conclusion.

Read more

Prometheus Is Not a TSDB

Misconception - the right word to explain my early Prometheus experience. I came to Prometheus with vast Graphite and moderate InfluxDB experience. In my eyes, Graphite was a highly performant but fairly limited system. Metrics in Graphite are just strings (well, dotted), and the values are always stored aggregated with the lowest possible resolution of 1 second. But due to these limitations, Graphite is fast. In contrast, InfluxDB adopts Metrics 2.0 format with multiple tags and fields per metric. It also allows the storage of non-aggregated data points with impressive nanosecond precision. But this power needs to be used carefully. Otherwise, you'll get all sorts of performance issues.

For some reason, I expected Prometheus to reside somewhere in between these two systems. A kinda-sorta system that takes the best of both worlds: rich labeled metrics, non-aggregated values, and high query performance.

And at first, it indeed felt as such! But then I started noticing that I cannot really explain some of the query results. Like at all. Or sometimes, I couldn't find evidence in metrics that just had to be there. Like the metrics were showing me a different picture than I was observing with my eyes while analyzing raw data such as web server access logs.

So, I started looking for more details. I wanted to understand precisely how metrics are collected, how they are stored, what a query execution model is, et cetera, et cetera. And at first, I was shocked by my findings! Oftentimes, the Prometheus behavior didn't make any sense, especially comparing to Graphite or InfluxDB! But then it occurred to me that I was missing one important detail...

Both Graphite and InfluxDB are pure time-series databases (TSDB). Yes, they are often used as metric storage for monitoring purposes. But every particular setup of these systems comes with certain trade-offs and bolt-on additions addressing performance or reliability concerns. For instance, there is often a statsd-like daemon in front of your Graphite doing preaggregation; you use different rollup strategies for older data points, etc. But normally, you are aware of that. So when you query the last couple of days of metrics, you expect them to have a secondly precision. But when you query something a week or a month old, you already know that each data point represents a minute of aggregated data, not a second.

However, Prometheus is not a TSDB.

Prometheus is not a TSDB

Read more

Rust - Writing Parsers With nom Parser Combinator Framework

I've been working on my new Rust side-project for several months now, and I've got some learnings to share. The project is called pq - it's a command-line tool to parse and query log files as time series. It comes with its own domain-specific language that is highly influenced by PromQL. A typical pq usage may look like this:

tail -f /var/log/nginx/access.log | pq '
/...some fancy regex.../
| map {
    .0 as ip,
    .1:ts,
    .2 as method,
    .3:str as status_code,
    .4 as content_len
  }
| select topk(
      10,
      sum(
          count_over_time(
              __line__{method="GET", status_code="200"}[1s]
          )
      ) by (ip)
  )
| to_json
'

pq has many components, including various log parsing strategies and a pretty sophisticated query execution engine. But surprisingly or not, about half of the time I've put into this project so far was dedicated to writing the parser of the pq's own query language. To be honest, when I was starting the project, I didn't see that coming...

nom logo

Luckily, writing a parser in Rust was mostly a pleasant experience, thanks to a crate concisely named nom. Although learning how to write parsers with nom wasn't completely seamless. So here is my journey.

Read more

pq - parse and query log files as time series

I often find myself staring at Nginx or Envoy access logs flooding my screens with real-time data. My only wish at such moments is to be able to aggregate these lines somehow and analyze the output at a slower pace, ideally, with some familiar and concise query language. And to my surprise, I haven't met a tool satisfying all my requirements yet. Well, I should be honest here - I haven't done thorough research. But if there would be a tool as widely known as jq for JSON, I wouldn't miss it probably.

So, here we go - my attempt to write a full-fledged parsing and query engine and master Rust at the same time. Yes, I know, it's a bad idea. But who has time for good ones?

First things first - a usage preview:

Read more

Prometheus Cheat Sheet - Moving Average, Max, Min, etc (Aggregation Over Time)

When you have a long series of numbers, such as server memory consumption scraped 10-secondly, it's a natural desire to derive another, probably more meaningful series from it, by applying a moving window function. For instance, moving average or moving quantile can give you much more readable results by smoothing some spikes.

Prometheus has a bunch of functions called <smth>_over_time(). They can be applied only to range vectors. It essentially makes them window aggregation functions. Every such function takes in a range vector and produces an instant vector with elements being per-series aggregations.

For people like me who normally grasp code faster than text, here is some pseudocode of the aggregation logic:

# Input vector example.
range_vector = [
    ({"lab1": "val1", "lab2": "val2"}, [(12, 1624722138), (11, 1624722148), (17, 1624722158)]),
    ({"lab1": "val1", "lab2": "val2"}, [(14, 1624722138), (10, 1624722148), (13, 1624722158)]),
    ({"lab1": "val1", "lab2": "val2"}, [(16, 1624722138), (12, 1624722148), (15, 1624722158)]),
    ({"lab1": "val1", "lab2": "val2"}, [(12, 1624722138), (17, 1624722148), (18, 1624722158)]),
]

# agg_func examples: `sum`, `min`, `max`, `avg`, `last`, etc.

def agg_over_time(range_vector, agg_func, timestamp):
    # The future instant vector.
    instant_vector = {"timestamp": timestamp, "elements": []}

    for (labels, samples) in range_vector:
        # Every instant vector element is 
        # an aggregation of multiple samples.
        sample = agg_func(samples)
        instant_vector["elements"].append((labels, sample))

    # Notice, that the timestamp of the resulting instant vector 
    # is the timestamp of the query execution. I.e., it may not 
    # match any of the timestamps in the input range vector.
    return instant_vector

Read more

Prometheus Cheat Sheet - How to Join Multiple Metrics (Vector Matching)

PromQL looks neat and powerful. And at first sight, simple. But when you start using it for real, you'll quickly notice that it's far from being trivial. Searching the Internet for query explanation rarely helps - most articles focus on pretty high-level overviews of the language's most basic capabilities. For example, when I needed to match multiple metrics using the common labels, I quickly found myself reading the code implementing binary operations on vectors. Without a solid understanding of the matching rules, I constantly stumbled upon various query execution errors, such as complaints about missing group_left or group_right modifier. Reading the code, feeding my local Prometheus playground with artificial metrics, running test queries, and validating assumptions, finally helped me understand how multiple metrics can be joined together. Below are my findings.

Read more

The Need For Slimmer Containers

I was hacking containers recently and noticed, that Docker started featuring the docker scan command in the docker build output. I've been ignoring its existence for a while, so evidently, it was time to finally try it out.

Scanning official Python images

The docker scan command uses a third-party tool, called Snyk Container. Apparently, it's some sort of a vulnerability scanner. So, I decided, mostly for the sake of fun, to scan one of my images. And it just so happened that it was a fairly basic thing:

# latest stable at the time
FROM python:3.9

RUN pip install Flask

COPY server.py server.py

ENV FLASK_APP=server.py
ENV FLASK_RUN_PORT=5000
ENV FLASK_RUN_HOST=0.0.0.0

EXPOSE 5000

CMD ["flask", "run"]

I ran docker build -t python-flask . and then docker scan python-flask. To my utter surprise, the output was huge! Here is just an excerpt:

Read more

Understanding Rust Privacy and Visibility Model

I spent the last couple of months writing code in Rust. It was probably my third or fourth attempt to write something substantial in this language. And every time my level of understanding of things deepened. I'm by no means a Rust expert so probably I'll be extremely inaccurate in the terminology here. And likely I'll get lots of technical details wrong too. But I had this epiphany moment of how the visibility and privacy model works in Rust so I can't help but think of sharing it with someone else.

Read more

Bridge vs. Switch: What I Learned From a Data Center Tour

The difference between these two networking devices has been an unsolvable mystery to me for quite some time. For a while, I used to use the words "bridge" and "switch" interchangeably. But after getting more into networking, I started noticing that some people tend to see them as rather different devices... So, maybe I've been totally wrong? Maybe saying "bridge aka switch" is way too inaccurate?

Let's try to figure it out!

How network switch works

Read more

Computer Networking Introduction - Ethernet and IP (Heavily Illustrated)

As a software engineer, I need to deal with networking every now and then - be it configuring a SOHO network, setting up container networking, or troubleshooting connectivity between servers in a data center. The domain is pretty broad, and the terminology can get quite confusing quickly. This article is my layman's attempt to sort the basic things out with the minimum words and maximum drawings. The primary focus will be on the Data link layer (OSI L2) of wired networks where the Ethernet is the king nowadays. But I'll slightly touch upon its neighboring layers too.

Read more

Networking Lab - Ethernet Broadcast Domains

In Ethernet, all the nodes forming one L2 segment constitute a broadcast domain. Such nodes should be able to communicate using their L2 addresses (MAC) or by broadcasting frames. A broadcast domain is a logical division of a computer network. Multiple physical (L1) segments can be bridged to form a single broadcast domain. Multiple L2 segments can also be bridged to create a bigger broadcast domain.

Read more

Networking Lab - L3 to L2 Segments Mapping

It's pretty common for an L2 segment to have a single IP subnet running atop. However, technically it's possible to configure multiple IP subnets over a single L2 broadcast domain. And although more complicated, configuring a single IP subnet over multiple disjoint L2 segments is also doable. In this lab, we'll cover the first two scenarios while the more advanced third case deserves its own lab - Proxy ARP.

Read more

KiND - How I Wasted a Day Loading Local Docker Images

From time to time I use kind as a local Kubernetes playground. It's super-handy, real quick, and 100% disposable.

It just so happened that virtually all the scenarios I've been testing so far were based on publicly available images. But recently I found myself in a situation when I needed to run a pod with a container image that I've just built on my laptop.

One way of doing it would be pushing the image to a local or remote registry accessible from inside the kind Kubernetes cluster. However, kind still doesn't spin up a local registry out of the box (you can vote for the GitHub issue here) and I'm not a fan of sending stuff over the Internet without very good reasons.

Read more

Go, HTTP handlers, panic, and deadlocks

Maybe the scenario I'm going to describe is just a silly bug no seasoned Go developer would ever make, but it is what it is.

I'm not an expert in Go but I do write code in this language from time to time. My cumulative number of LOC is probably still below 100 000 but it's definitely not just a few hundred lines of code. Go always looked like a simple language to me. But also it looked safe. Apparently, it's not as simple and safe as I've thought...

Here is a synthetic piece of code illustrating the erroneous logic I stumbled upon recently:

// main.go
package main

import (
    "fmt"
    "sync"
)

func main() {
    mutex := &sync.Mutex{}

    f := func() {
        fmt.Println("In f()")

        defer func() {
            if r := recover(); r != nil {
                fmt.Println("Recovered", r)
            }
        }()

        dogs := []string{"Lucky"}

        mutex.Lock()
        fmt.Println("Last dog's name is", dogs[len(dogs)])
        mutex.Unlock()
    }

    f()

    fmt.Println("About to get a deadlock in main()")
    mutex.Lock()
}

Read more

Exploring Kubernetes Operator Pattern

I've been using Kubernetes for almost a year now and, to be honest, I like the experience so far. Most of my use cases were rather trivial and thanks to its declarative approach, Kubernetes makes deploying and scaling stateless services pretty simple. I usually just describe my application in a YAML file as a set of interconnected services, feed it to Kubernetes, and let the built-in control loops bring the state of the cluster closer to the desired state by creating or destroying some resources for me automagically.

However, many times I've heard that the real power of Kubernetes comes with its extensibility. Kubernetes designed for automation. It brings a lot of useful automation out of the box. But it also provides extension points that can be used to customize Kubernetes capabilities. The cleverness of the Kubernetes design is that it encourages you to keep the extensions feel native! So when I stumbled upon the first few Kubernetes Operators on my Ops way, I could not even recognize that I'm dealing with custom logic...

In this article, I'll try to take a closer look at the Operators pattern, see which Kubernetes parts are involved in operators implementation, and what makes operators feel like first-class Kubernetes citizens. Of course, with as many pictures as possible.

Read more

Writing Web Server in Python: sockets

What is a web server?

Let's start by answering the question: What is a web server?

First off, it's a server (no pun intended). A server is a process [sic] serving clients. Surprisingly or not, a server has nothing to do with hardware. It's just a regular piece of software run by an operating system. Like most other programs around, a server gets some data on its input, transforms data in accordance with some business logic, and then produces some output data. In the case of a web server, the input and output happen over the network via Hypertext Transfer Protocol (HTTP). For a web server, the input consists of HTTP requests from its clients - web browsers, mobile applications, IoT devices, or even other web services. And the output consists of HTTP responses, oftentimes in form of HTML pages, but other formats are also supported.

Client talking to server over network

Read more

Making sense out of cloud-native buzz

I've been trying to wrap my head around the tremendous growth of the cloud-native zoo for quite some time. But recently I was listening to a wonderful podcast episode with the Linkerd creator Thomas Rampelberg and he kindly reminded me one thing about... microservices. Long story short, despite the common belief that microservices solve technical problems, the most appealing part of the microservice architecture apparently has something to do with solving organisational problems such as allocating teams to development areas or tackling software modernisation campaigns. And on the contrary, while helping with the org problems, microservices rather create new technical challenges!

Disclaimer: This article is not about Microservices vs Monolith.

That made me rethink the need for all those projects constituting the cloud-native landscape. From now on I can't help but see an awful load of projects solving all kinds of technical problems originated by the transition to the microservice paradigm.

Read more

Service Discovery in Kubernetes - Combining the Best of Two Worlds

Before jumping to any Kubernetes specifics, let's talk about the service discovery problem in general.

What is Service Discovery

In the world of web service development, it's a common practice to run multiple copies of a service at the same time. Every such copy is a separate instance of the service represented by a network endpoint (i.e. some IP and port) exposing the service API. Traditionally, virtual or physical machines have been used to host such endpoints, with the shift towards containers in more recent times. Having multiple instances of the service running simultaneously increases its availability and helps to adjust the service capacity to meet the traffic demand. On the other hand, it also complicates the overall setup - before accessing the service, a client (the term client is intentionally used loosely here; oftentimes a client of some service is another service) needs to figure out the actual IP address and the port it should use. The situation becomes even more tricky if we add the ephemeral nature of instances to the equation. New instances come and existing instances go because of the non-zero failure rate, up- and downscaling, or maintenance. That's how a so-called service discovery problem arises.

Service discovery problem

Service discovery problem.

Read more

API Developers Never REST

Disclaimer: despite the controversial title, this article is not trying to show that RPC is a superior approach to REST, or GraphQL is superior to RPC. Instead, the goal of the article is to give you an overview of the approaches, their strengths and weaknesses. The final choice anyway will be a trade-off.

Even though HTTP is an application layer (i.e. L7), protocol, when it comes to API development, HTTP de facto plays the role of a lower-level transport mechanism.

There are multiple conceptually different approaches on how to implement an API on top of HTTP:

  • REST
  • RPC
  • GraphQL

...but the actual list of things an average API developer needs to be aware of is not limited by these three dudes. There are also JSON, gRPC, protobuf, and many other terms in the realm. Let's try to sort them out, once and for all!

What is REST? What is RPC? What is GraphQL? What is the difference between REST, RPC, and GraphQL?

Read more

Container Networking Is Simple!

Just kidding, it's not... But fear not and read on!

You can find a Russian translation of this article here.

Working with containers always feels like magic. In a good way for those who understand the internals and in a terrifying - for those who don't. Luckily, we've been looking under the hood of the containerization technology for quite some time already and even managed to uncover that containers are just isolated and restricted Linux processes, that images aren't really needed to run containers, and on the contrary - to build an image we need to run some containers.

Now comes a time to tackle the container networking problem. Or, more precisely, a single-host container networking problem. In this article, we are going to answer the following questions:

  • How to virtualize network resources to make containers think each of them has a dedicated network stack?
  • How to turn containers into friendly neighbors, prevent them from interfering, and teach to communicate well?
  • How to reach the outside world (e.g. the Internet) from inside the container?
  • How to reach containers running on a machine from the outside world (aka port publishing)?

While answering these questions, we'll setup a container networking from scratch using standard Linux tools. As a result, it'll become apparent that the single-host container networking is nothing more than a simple combination of the well-known Linux facilities:

  • network namespaces;
  • virtual Ethernet devices (veth);
  • virtual network switches (bridge);
  • IP routing and network address translation (NAT).

And for better or worse, no code is required to make the networking magic happen...

Read more

Traefik: canary deployments with weighted load balancing

Traefik is The Cloud Native Edge Router yet another reverse proxy and load balancer. Omitting all the cloud-native buzzwords, what really makes Traefik different from Nginx, HAProxy, and alike is the automatic and dynamic configurability it provides out of the box. And the most prominent part of it is probably its ability to do automatic service discovery. If you put Traefik in front of Docker, Kubernetes, or even an old-fashioned VM/bare-metal deployment and show it how to fetch the information about the running services, it'll automagically expose them to the outside world. If you follow some conventions of course...

Read more

Sidecar Proxy Pattern - The Basis Of Service Mesh

How services talk to each other?

Imagine you're developing a service... For certainty, let's call it A. It's going to provide some public HTTP API to its clients. However, to serve requests it needs to call another service. Let's call this upstream service - B.

Service A talks to Service B directly.

Obviously, neither network nor service B is ideal. If service A wants to decrease the impact of the failing upstream requests on its public API success rate, it has to do something about errors. For instance, it could start retrying failed requests.

Read more

How Docker Build Works Internally

You need containers to build images. Yes, you've heard it right. Not another way around.

For people who found their way to containers through Docker (well, most of us I believe) it may seem like images are of somewhat primary nature. We've been taught to start from a Dockerfile, build an image using that file, and only then run a container from that image. Alternatively, we could run a container specifying an image from a registry, yet the main idea persists - an image comes first, and only then the container.

But what if I tell you that the actual workflow is reverse? Even when you are building your very first image using Docker, podman, or buildah, you are already, albeit implicitly, running containers under the hood!

Read more

How to Run a Container Without an Image

As we already know, containers are just isolated and restricted Linux processes. We also learned that it's fairly simple to create a container with a single executable file inside starting from scratch image (i.e. without putting a full Linux distribution in there). This time we will go even further and demonstrate that containers don't require images at all. And after that, we will try to justify the actual need for images and their place in the containerverse.

How to Run Containers without Docker and... Images

You may have heard that Docker uses a tool called runc to run containers. Well, to be more accurate, Docker depends on a lower-level piece of software called containerd which in turn relies on a standardized container runtime implementation. And in the wild, most of the time runc plays the role of such a component.

Read more

Does Container Image Have an OS Inside

Not every container has an operating system inside, but every one of them needs your Linux kernel.

Disclaimer 1: before going any further it's important to understand the difference between a kernel, an operating system, and a distribution.

  • Linux kernel is the core part of the Linux operating system. It's what originally Linus wrote.

  • Linux operating system is a combination of the kernel and a user-land (libraries, GNU utilities, config files, etc).

  • Linux distribution is a particular version of the Linux operating system like Debian, CentOS, or Alpine.

Disclaimer 2: the title of this article should have sounded like "Does container image have a whole Linux distribution inside", but I personally find this wording a bit boring ๐Ÿคช


Does a Container have an Operating System inside?

The majority of Docker examples out there explicitly or implicitly rely on some flavor of the Linux operating system running inside a container. I tried to quickly compile a list of the most prominent samples:

Running an interactive shell in the debian jessie distribution:

$ docker run -it debian:jessie

Running an nginx web-sever in a container and examine its config using cat utility:

$ docker run -d -P --name nginx nginx:latest
$ docker exec -it nginx cat /etc/nginx/nginx.conf

Building an image based on Alpine Linux:

$ cat <<EOF > Dockerfile
FROM alpine:3.7
RUN apk add --no-cache mysql-client
ENTRYPOINT ["mysql"]
EOF

$ docker build -t mysql-alpine .
$ docker run mysql-alpine

And so forth and so on...

For the newcomers learning the containerization through hands-on experience, this may lead to a false impression that containers are somewhat indistinguishable from full-fledged operating systems and that they are always based on well-known and wide-spread Linux distributions like debian, centos, or alpine.

Read more

Working with container images in Go

I've been working on adding basic images support to my experimental container manager and to my surprise, the task turned to be more complex than I initially expected. I spent some time looking for ways to manage container images directly from my application code. There is plenty of tools out there (docker, containerd, podman, buildah, cri-o, etc) providing image management capabilities. However, if you don't want to have a dependency on an external daemon running in your system, or you don't feel like shelling out for exec-ing a command-line tool from the code, the options are at best limited.

I've reviewed a bunch of the said tools focusing on the underlying means they use to deal with images and at last, I found two appealing libraries. The first one is github.com/containers/image library "[...] aimed at working in various way with containers' images and container image registries". The second one is github.com/containers/storage "[...] which aims to provide methods for storing filesystem layers, container images, and containers". The libraries are meant to be used in conjunction and form a very powerful image management tandem. But unfortunately, I could not find a sufficient amount of documentation, especially how to get started kind of it.

Without the docs the only way to learn how to use the libraries for me was to analyze the code of their dependants (most prominently - buildah and cri-o). It took me a while to forge a working example which is capable of:

  • pulling images from remote repositories;
  • storing images locally;
  • creating and mounting containers (i.e. writable instances of images).

In the rest of the article, I'll try to show how to use the libraries to perform the said task and highlight the most interesting parts of this journey.

Disclaimer: This is by no means an attempt to fully or even partially document the libraries!

Read more

Master Go While Learning Containers

I spent half a year deep-diving into the world of containers and their orchestration. I have been enjoying it very much and learned a lot. On my journey, I need to tackle lots of interesting and specific concepts. But there is one commonality almost every project in this area possesses. When it comes to containers - the Go programming language is ubiquitous!

Read more

Implementing Container Runtime Shim: Interactive Containers

In the previous articles, we discussed the scope of the container runtime shim and drafted the minimum viable version. Now, it's time to move on and have some fun with more advanced scenarios! Have you ever wondered how docker run -i or kubectl run --stdin work? If so, this article is for you! We will try to replicate this piece of functionality in our experimental container manager. And as you have probably guessed, the container runtime shim will do a lot of heavy lifting here again.

conman - interactive container demo

Read more

How to use Flask with gevent (uWSGI and Gunicorn editions)

Python is booming and Flask is a pretty popular web-framework nowadays. Probably, quite some new projects are being started in Flask. But people should be aware, it's synchronous by design and ASGI is not a thing yet. So, if someday you realize that your project really needs asynchronous I/O but you already have a considerable codebase on top of Flask, this tutorial is for you. The charming gevent library will enable you to keep using Flask while start benefiting from all the I/O being asynchronous. In the tutorial we will see:

  • How to monkey patch a Flask app to make it asynchronous w/o changing its code.
  • How to run the patched application using gevent.pywsgi application server.
  • How to run the patched application using Gunicorn application server.
  • How to run the patched application using uWSGI application server.
  • How to configure Nginx proxy in front of the application server.

Read more

My 10 Years of Programming Experience

Regardless of whether it's the end of the calendar decade or not it's the end of a programming decade for me. I started early in 2010 and since then I've been programming almost every day, including weekends and vacations. This was a really exciting period in my life and I realized that it's been a while since 2010 only recently. So, I decided to put into words some of my learnings from that time. Warning: the content of this article is highly opinionated and extremely subjective.

Read more

Implementing Container Runtime Shim: First Code

Well, at this moment we already know what we need to deal with. In order to use runc from code we need to implement our shim as a daemon and this daemon has to be as long-lived as the underlying container process. In this article, we will try to develop a minimum viable runtime shim and integrate it with our experimental container manager.

The minimal shim implementation takes as its input a path to the container bundle (with the config.json) as well as the list of the predefined locations (for the container log file, container pidfile, container exit status file, etc). The shim needs to create a container by executing runc with the provided parameters and then serve the container process until its termination. The planned functionality of this shim version includes:

  • Detaching the shim from the container manager process.
  • Launching runc and handling container creation errors.
  • Reporting the status of the container creation back to the manager.
  • Streaming container's stdout and stderr to the log file.
  • Tracking and reporting the exit code of the container.

Read more

Implementing Container Runtime Shim: runc

A container runtime shim is a piece of software that resides in between a container manager (containerd, cri-o, podman) and a container runtime (runc, crun) solving the integration problem of these counterparts.


Layered Docker architecture: docker (cli) -> dockerd -> containerd -> containerd-shim -> runc

Layered Docker architecture

The easiest way to spot a shim is to inspect the process tree on a Linux host with a running docker container:

Spotting container runtime shim process

ps auxf output on a host running docker run -it ubuntu bash; notice containerd-shim process in between containerd and bash.

On the one hand, runtimes need shims to be able to survive managers restarts. On the other hand, shims are helping container managers to deal with the quirky behavior of runtimes. As a part of the container manager implementation series, we will try to create our own shim and then integrate it with conman, an experimental container manager. Hopefully, during the development, we will gain an in-depth understanding of the topic.

However, before jumping to the shim development, we need to familiarize ourselves with the container runtime component of the choice. Unsurprisingly, conman uses runc as a container runtime, so I will start the article by covering basic runc use cases alongside its design quirks. Then I'll show the naive way to use runc from code and explain some related pitfalls. The final part of the article will provide an overview of the shim's design.

Read more

Kubernetes Repository On Flame

When I'm diving into a new codebase, I always start from the project structure analysis. And my favorite tool is tree. However, not every project is perfectly balanced. Some files and folders tend to be more popular and contain much more code than others. Seems like yet another incarnation of the Pareto principle.

So, when the tree's capabilities aren't enough, I jump to cloc. This tool is much more powerful and can show nice textual statistics for the number of code lines and programming languages used per the whole project or per each file individually.

However, some projects are really huge and some lovely visualization would be truly helpful! And here the FlameGraph goes! What if we feed the cloc's output for the Kubernetes codebase to FlameGraph? Thanks to the author of this article for the original cloc-to-flamegraph one-liner:

git clone https://github.com/brendangregg/FlameGraph
go get -d github.com/kubernetes/kubernetes

cd $(go env GOPATH)/src/github.com/kubernetes/kubernetes

cloc --csv-delimiter="$(printf '\t')" --by-file --quiet --csv . | \
    sed '1,2d' | \
    cut -f 2,5 | \
    sed 's/\//;/g' | \
    ~/FlameGraph/flamegraph.pl \
        --width=3600 \
        --height=32 \
        --fontsize=8 \
        --countname=lines \
        --nametype=package \
    > kubernetes.html

open kubernetes.html

Read more

What is a Container Manager

With this article, I want to start a series about the implementation of a container manager. What the heck is a container manager? Some prominent examples would be containerd, cri-o, dockerd, and podman. People here and there keep calling them container runtimes, but I would like to reserve the term runtime for a lower-level thingy - the OCI runtime (de facto runc), and a higher-level component controlling multiple such runtime instances I'd like to call a container manager. In general, by a container manager, I mean a piece of software doing a complete container lifecycle management on a single host. In the following series, I will try to guide you myself through the challenge of the creation of yet another container manager. By no means, the implementation is going to be feature-complete, correct or safe to use. The goal is rather to prove the already proven concept. So, mostly for the sake of fun, let the show begin!

Read more

Explaining Containerization Software

Containers gave birth to more advanced server-side architectures and sophisticated deployment techniques. Containers nowadays are so widespread that there is already a bunch of standard-alike specifications (1, 2, 3, 4, ...) describing different aspects of the containers universe. Of course, on the lowest level lie Linux primitives such as namespaces and cgroups. But containerization software is already so massive that it would be barely possible to implement it without its own concern separation layers. What I'm trying to achieve in this ongoing effort is to guide myself starting from the lowest layers to the topmost ones, having as much practice (code, installation, configuration, integration, etc) and, of course, fun as possible. The content of this page is going to be changing over time, reflecting my understanding of the topic.

Read more

Illustrated introduction to Linux iptables

Gee, it's my turn to throw some gloom light on iptables! There are hundreds or even thousands of articles on the topic out there, including introductory ones. I'm not going to put either formal and boring definitions here nor long lists of useful commands. I would rather try to use layman's terms and scribbling as much as possible to give you some insights about the domain before going to all these tables, rules, targets, and policies. By the way, the first time I faced this tool I was pretty much confused by the terminology too!

Read more

From Docker Container to Bootable Linux Disk Image

Well, I don't see any practical applications of the approach I'm going to describe... However, I do think that messing about with things like this is the only way to gain extra knowledge of any system internals. We are going to speak Docker and Linux here. What if we want to take a base Docker image, I mean really base, just an image made with a single line Dockerfile like FROM debian:latest, and convert it to something launchable on a real or virtual machine? In other words, can we create a disk image having exactly the same Linux userland a running container has and then boot from it? For this we would start with dumping container's root file system, luckily it's as simple as just running docker export, however, to finally accomplish the task a bunch of additional steps is needed...

Read more

Explaining async/await in 200 lines of code

In the previous article, we learned how to implement a simple but workable event loop. However, programs which are supposed to be run by the event loop are full of callbacks. This is the usual problem of event-loop-driven environments. When business logic becomes reasonably complicated, callbacks make program's code hardly readable and painfully maintainable. And the callback hell begins! There is plenty of ways to deal with the artificial complexity arose due to callbacks, but the most impressive one is to make the code great flat again. And by flat, I mean callback-less and synchronous-like. Usually, it's done by introducing async/await syntactic feature. But every high-level abstraction is built on top of some basic and fundamental ideas. Let's check the async/await sugar out and see what exactly happens under the hood.



Read more

Explaining event loop in 100 lines of code

There is plenty of articles out there about the event loop. However, as a software engineer, I prefer to read code, not text. And there is no better way of learning a new concept than implementing it yourself. So, let's try to grasp the idea of the event loop by coding a new and shiny one.

NB: In the article, we will try to describe the idea of the event loop in general, not a specific implementation of the event loop in Node.js or Python, or some other language/library.

Roller coaster

Read more

Save the day with gevent

One day you were hired by a startup to build yet another web site service. The problem sounded similar to numerous back-ends you had been doing before. So, you decided to move with python to be conservative enough and pick flask to be lightweight but yet powerful enough. Nobody was fired for buying IBM starting things in python, wise man's choice.

Magister Yoda: Why am I called wise? I can barely speak English!

Read more

Truly optional scalar types in protobuf3 (with Go examples)

In contrast to protobuf2 there is no way in protobuf3 to mark some fields as optional and some other fields as required. Instead, any field might be omitted leading this field to be set to its default zero-value. I believe there were many good reasons for such a design decision. However, while this behavior might be superior to the proto2's explicit distinction between required and optional fields, it also has some unfortunate implications.

Read more

Node.js Writable streams distilled

Writable streams are an abstraction for a destination to which data is written... And this time it's a concise abstraction! Compared to vague readable streams (multiple operation modes behind single abstraction) writable streams implement only one mode and hence expose only one essential method write(chunk). Nevertheless, the idea of writable streams is not trivial and it's due to one nice feature it provides - backpressure. Let's try to wrap our mind around writable stream abstraction.

Read more

Node.js Readable streams distilled

Readable stream is an abstraction for some data source. Which could be hard to grasp and even harder to use...

Everybody knows that readable streams support two modes of operating (flowing and paused) and piping to writable streams. It's not that easy to understand the purposes of these mechanisms and behavioral differences though. Since one readable stream abstraction stands for multiple usage modes its public interface (i.e. the set of methods and events) is a bit inconsistent. Usage of readable streams might be totally confusing without the understanding of the underlying ideas. In this article, we will make an attempt to justify the abstraction of readable streams trying to implement our own file reader. Also, we will have a look at some nicer ways to consume readable streams.

Read more