Prometheus Cheat Sheet - Moving Average, Max, Min, etc (Aggregation Over Time)

When you have a long series of numbers, such as server memory consumption scraped 10-secondly, it's a natural desire to derive another, probably more meaningful series from it, by applying a moving window function. For instance, moving average or moving quantile can give you much more readable results by smoothing some spikes.

Prometheus has a bunch of functions called <smth>_over_time(). They can be applied only to range vectors. It essentially makes them window aggregation functions. Every such function takes in a range vector and produces an instant vector with elements being per-series aggregations.

For people like me who normally grasp code faster than text, here is some pseudocode of the aggregation logic:

# Input vector example.
range_vector = [
    ({"lab1": "val1", "lab2": "val2"}, [(12, 1624722138), (11, 1624722148), (17, 1624722158)]),
    ({"lab1": "val1", "lab2": "val2"}, [(14, 1624722138), (10, 1624722148), (13, 1624722158)]),
    ({"lab1": "val1", "lab2": "val2"}, [(16, 1624722138), (12, 1624722148), (15, 1624722158)]),
    ({"lab1": "val1", "lab2": "val2"}, [(12, 1624722138), (17, 1624722148), (18, 1624722158)]),
]

# agg_func examples: `sum`, `min`, `max`, `avg`, `last`, etc.

def agg_over_time(range_vector, agg_func, timestamp):
    # The future instant vector.
    instant_vector = {"timestamp": timestamp, "elements": []}

    for (labels, samples) in range_vector:
        # Every instant vector element is 
        # an aggregation of multiple samples.
        sample = agg_func(samples)
        instant_vector["elements"].append((labels, sample))

    # Notice, that the timestamp of the resulting instant vector 
    # is the timestamp of the query execution. I.e., it may not 
    # match any of the timestamps in the input range vector.
    return instant_vector

Read more

Prometheus Cheat Sheet - How to Join Multiple Metrics (Vector Matching)

PromQL looks neat and powerful. And at first sight, simple. But when you start using it for real, you'll quickly notice that it's far from being trivial. Searching the Internet for query explanation rarely helps - most articles focus on pretty high-level overviews of the language's most basic capabilities. For example, when I needed to match multiple metrics using the common labels, I quickly found myself reading the code implementing binary operations on vectors. Without a solid understanding of the matching rules, I constantly stumbled upon various query execution errors, such as complaints about missing group_left or group_right modifier. Reading the code, feeding my local Prometheus playground with artificial metrics, running test queries, and validating assumptions, finally helped me understand how multiple metrics can be joined together. Below are my findings.

Read more

The Need For Slimmer Containers

I was hacking containers recently and noticed, that Docker started featuring the docker scan command in the docker build output. I've been ignoring its existence for a while, so evidently, it was time to finally try it out.

Scanning official Python images

The docker scan command uses a third-party tool, called Snyk Container. Apparently, it's some sort of a vulnerability scanner. So, I decided, mostly for the sake of fun, to scan one of my images. And it just so happened that it was a fairly basic thing:

# latest stable at the time
FROM python:3.9

RUN pip install Flask

COPY server.py server.py

ENV FLASK_APP=server.py
ENV FLASK_RUN_PORT=5000
ENV FLASK_RUN_HOST=0.0.0.0

EXPOSE 5000

CMD ["flask", "run"]

I ran docker build -t python-flask . and then docker scan python-flask. To my utter surprise, the output was huge! Here is just an excerpt:

Read more

Understanding Rust Privacy and Visibility Model

I spent the last couple of months writing code in Rust. It was probably my third or fourth attempt to write something substantial in this language. And every time my level of understanding of things deepened. I'm by no means a Rust expert so probably I'll be extremely inaccurate in the terminology here. And likely I'll get lots of technical details wrong too. But I had this epiphany moment of how the visibility and privacy model works in Rust so I can't help but think of sharing it with someone else.

Read more

Bridge vs. Switch: What I Learned From a Data Center Tour

The difference between these two networking devices has been an unsolvable mystery to me for quite some time. For a while, I used to use the words "bridge" and "switch" interchangeably. But after getting more into networking, I started noticing that some people tend to see them as rather different devices... So, maybe I've been totally wrong? Maybe saying "bridge aka switch" is way too inaccurate?

Let's try to figure it out!

How network switch works

Read more