pq - parse and query log files as time series

I often find myself staring at Nginx or Envoy access logs flooding my screen with real-time data. My only wish at such moments is to somehow aggregate these lines and analyze the output at a slower pace, ideally with a familiar and concise query language. To my surprise, I haven't found a tool that satisfies all my requirements yet. Well, I should be honest here - I haven't done thorough research. But if there were a tool as widely known as jq is for JSON, I probably wouldn't have missed it.

So, here we go - my attempt to write a full-fledged parsing and query engine and master Rust at the same time. Yes, I know, it's a bad idea. But who has time for good ones?

First things first - a usage preview:

Prometheus Cheat Sheet - Moving Average, Max, Min, etc (Aggregation Over Time)

When you have a long series of numbers, such as server memory consumption scraped every 10 seconds, it's natural to want to derive another, possibly more meaningful, series from it by applying a moving window function. For instance, a moving average or a moving quantile can give you much more readable results by smoothing out spikes.
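As a rough illustration of the idea, here is a Python sketch that slides a time window over a single series and aggregates the values in every window, similar in spirit to evaluating avg_over_time() at each step of a range query. The function name, the (t - window, t] boundary choice, and the sample data are assumptions made for illustration, not Prometheus's exact semantics.

```python
from statistics import mean


def moving_window(samples, window, step, agg=mean):
    """Aggregate a series over a sliding time window (hypothetical helper).

    samples: (timestamp, value) pairs sorted by timestamp.
    window:  window length in seconds, like the [30s] in a range selector.
    step:    seconds between consecutive evaluation timestamps.
    agg:     aggregation applied to each window's values (mean by default).
    """
    if not samples:
        return []
    result = []
    t, end = samples[0][0], samples[-1][0]
    while t <= end:
        # Assumption: each window covers the half-open interval (t - window, t].
        values = [v for ts, v in samples if t - window < ts <= t]
        if values:
            result.append((t, agg(values)))
        t += step
    return result


# Made-up memory readings scraped every 10 seconds, with spikes at t=10 and t=40.
samples = [(0, 100), (10, 300), (20, 110), (30, 120), (40, 500), (50, 130)]
for ts, value in moving_window(samples, window=30, step=10):
    print(ts, round(value, 1))
```

Passing agg=max or agg=min instead of the default turns the same loop into a moving maximum or minimum; the spikes stay visible there, while the moving average damps them.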

Prometheus has a family of functions named <smth>_over_time(), such as avg_over_time(), min_over_time(), and max_over_time(). They can be applied only to range vectors, which essentially makes them window aggregation functions: every such function takes a range vector and produces an instant vector whose elements are per-series aggregations.

For people like me who normally grasp code faster than text, here is some pseudocode of the aggregation logic:

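from statistics import mean

# Input range vector example: one (labels, samples) pair per series.
# Each series has a unique label set; samples are (value, timestamp) pairs.
range_vector = [
    ({"lab1": "val1", "lab2": "val2"}, [(12, 1624722138), (11, 1624722148), (17, 1624722158)]),
    ({"lab1": "val1", "lab2": "val3"}, [(14, 1624722138), (10, 1624722148), (13, 1624722158)]),
    ({"lab1": "val2", "lab2": "val2"}, [(16, 1624722138), (12, 1624722148), (15, 1624722158)]),
    ({"lab1": "val2", "lab2": "val3"}, [(12, 1624722138), (17, 1624722148), (18, 1624722158)]),
]

# agg_func examples: `sum`, `min`, `max`, `avg`, `last`, etc.
# Each one maps a list of sample values to a single number.
AGG_FUNCS = {
    "sum": sum,
    "min": min,
    "max": max,
    "avg": mean,
    "last": lambda values: values[-1],  # assumes samples are time-ordered
}

def agg_over_time(range_vector, agg_func, timestamp):
    # The future instant vector.
    instant_vector = {"timestamp": timestamp, "elements": []}

    for labels, samples in range_vector:
        # Every instant vector element is an aggregation
        # of multiple samples of a single series.
        values = [value for value, _ in samples]
        instant_vector["elements"].append((labels, agg_func(values)))

    # Notice that the timestamp of the resulting instant vector
    # is the timestamp of the query execution, i.e., it may not
    # match any of the timestamps in the input range vector.
    return instant_vector

# Example: a max_over_time-like evaluation at a query timestamp.
# The per-series maxima are 17, 14, 16, and 18.
print(agg_over_time(range_vector, AGG_FUNCS["max"], 1624722160))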

Prometheus Cheat Sheet - How to Join Multiple Metrics (Vector Matching)

PromQL looks neat and powerful and, at first sight, simple. But when you start using it for real, you'll quickly notice that it's far from trivial. Searching the Internet for query explanations rarely helps - most articles give fairly high-level overviews of the language's most basic capabilities. For example, when I needed to match multiple metrics using their common labels, I quickly found myself reading the code that implements binary operations on vectors. Without a solid understanding of the matching rules, I kept stumbling upon query execution errors, such as complaints about a missing group_left or group_right modifier. Reading the code, feeding my local Prometheus playground with artificial metrics, running test queries, and validating assumptions finally helped me understand how multiple metrics can be joined together. Below are my findings.
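To make the matching rules more tangible, here is a simplified Python model of on(...) matching for arithmetic operators, covering one-to-one matching and many-to-one matching with group_left. It's a sketch under several assumptions, not Prometheus's implementation: only on(...) is modeled (no ignoring(...), no group_right, no extra labels copied via group_left(<labels>)), comparison and set operators are left out, and the helper names, error messages, and metric names are all hypothetical.

```python
import operator


def signature(labels, on):
    # Hypothetical helper: the match key is the subset of labels listed in on(...).
    return tuple(sorted((k, v) for k, v in labels.items() if k in on))


def match_vectors(lhs, rhs, op, on, group_left=False):
    """Simplified model of `lhs <op> on(<on>) [group_left] rhs`.

    lhs, rhs: instant vectors as lists of (labels: dict, value: float).
    op:       a binary arithmetic function, e.g. operator.mul.
    """
    # The right-hand side is the "one" side in both modes,
    # so its match keys must be unique.
    rhs_by_sig = {}
    for labels, value in rhs:
        sig = signature(labels, on)
        if sig in rhs_by_sig:
            raise ValueError(f"duplicate right-hand series for {dict(sig)}")
        rhs_by_sig[sig] = value

    result, seen = [], set()
    for labels, value in lhs:
        sig = signature(labels, on)
        if sig not in rhs_by_sig:
            continue  # Series without a match are dropped from the result.
        if group_left:
            # Many-to-one: keep the "many" side's labels, minus the metric name.
            out = {k: v for k, v in labels.items() if k != "__name__"}
        else:
            # One-to-one with on(...): only the matching labels survive.
            out = dict(sig)
        key = tuple(sorted(out.items()))
        if key in seen:
            if group_left:
                raise ValueError(f"result series {out} is not unique")
            raise ValueError(f"several left-hand series match {out}; group_left needed")
        seen.add(key)
        result.append((out, op(value, rhs_by_sig[sig])))
    return result


# Hypothetical metrics: per-instance request counters and a per-job weight.
lhs = [
    ({"__name__": "http_requests_total", "job": "api", "instance": "a"}, 10.0),
    ({"__name__": "http_requests_total", "job": "api", "instance": "b"}, 30.0),
    ({"__name__": "http_requests_total", "job": "db", "instance": "c"}, 5.0),
]
rhs = [({"__name__": "job_weight", "job": "api"}, 2.0)]

# Like `http_requests_total * on(job) group_left job_weight`; prints
# [({'job': 'api', 'instance': 'a'}, 20.0), ({'job': 'api', 'instance': 'b'}, 60.0)]
print(match_vectors(lhs, rhs, operator.mul, on={"job"}, group_left=True))
```

Calling it on the same input without group_left=True raises an error, because both api instances share the match key {"job": "api"}. That is the sketch's counterpart of Prometheus asking for an explicit group_left.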

The Need For Slimmer Containers

I was hacking on containers recently and noticed that Docker had started featuring the docker scan command in the docker build output. I'd been ignoring its existence for a while, so it was evidently time to finally try it out.

Scanning official Python images

The docker scan command uses a third-party tool called Snyk Container, which is apparently some sort of vulnerability scanner. So, mostly for the sake of fun, I decided to scan one of my images. It just so happened to be a fairly basic one:

# latest stable at the time
FROM python:3.9

RUN pip install Flask

COPY server.py server.py

ENV FLASK_APP=server.py
ENV FLASK_RUN_PORT=5000
ENV FLASK_RUN_HOST=0.0.0.0

EXPOSE 5000

CMD ["flask", "run"]

I ran docker build -t python-flask . and then docker scan python-flask. To my utter surprise, the output was huge! Here is just an excerpt:

Understanding Rust Privacy and Visibility Model

I've spent the last couple of months writing code in Rust. It was probably my third or fourth attempt to write something substantial in this language, and every time, my understanding deepened. I'm by no means a Rust expert, so I'll probably be extremely inaccurate with the terminology here, and I'll likely get lots of technical details wrong too. But I had an epiphany about how Rust's visibility and privacy model works, and I can't help but share it with someone else.
