Disclaimer 1: I work for Slim.AI as a software engineer, but this article is my personal take on the topic. No one at Slim.AI asked me to write it or somehow influenced the narrative.
Disclaimer 2: It's not an attempt to attack Alpine or any Alpine-based products. The article is about the trouble of producing decent container images, and the pros and cons of using Alpine are just one of the themes here.
Many of us these days seem to be in pursuit of better container images. And for good reasons! Bloated images with many (potentially unneeded) moving parts slow down development and give more space for a CVE to sneak in. Luckily, there are a number of ways to produce slim and secure images, and everyone just needs to pick a suitable one. But before doing so, it's good to become aware of a potential dissonance between what we say is important for us (securing our software supply chains) and what may actually drive our decisions (keeping our dev loops fast).
One of the ways to produce better containers is to carefully choose the base image. Alpine seems like a popular choice today, so I decided to ask folks on Twitter if they use Alpine for their containerized services:
If you develop or operate containerized services, do you use alpine for prod workloads?
— Ivan Velichko (@iximiuz) August 18, 2022
And the results came out even more extreme than Chainguard's recent "All About That Base Image" report forecasted. Assuming my improvised poll is representative enough, Alpine's usage is going through the roof!
Why?
There might be multiple reasons to choose Alpine - most notably, its insanely small size (just ~5MB!) and its increased focus on security (Alpine usually has either few or no reported CVEs). So, I followed up with my next question: What matters to you most - the size of the image (hence the development speed) or its secureness (measured in the number of reported CVEs)?
I care about container images' quality (total size, amount of packages, etc) because of
— Ivan Velichko (@iximiuz) August 19, 2022
And, despite popular belief, people care almost as much about the image size (hence speed) as about the number of CVEs in their images. But I'm not surprised! There is a lot of talk at the moment about vulnerability scanners and securing software supply chains. However, slow inner and outer dev loops, often caused by endless CI/CD pipelines, might be a more real problem for developers than the need to protect their services from hypothetical threats. So, without all that hype around scanners, I'd expect the results to be even more skewed toward the "size & speed" option.
Now comes the funniest part!
In the previous question, I intentionally didn't mention Alpine. Assuming at least half of us care about the number of CVEs in our images more than the sole image size (the remaining ~5% replied they care equally about both), and Alpine is excellent on both fronts, let's see why people decide to use Alpine for their containers:
I use alpine as a base container image because of
— Ivan Velichko (@iximiuz) August 19, 2022
Of course, the intersection of the respondents wasn't perfect, but it was more or less the same audience! Something doesn't compute, right? Almost 80% chose Alpine because of its size! But where are all those folks who care more about vulnerabilities than the convenience of DevEx?
I used Alpine in these polls to ensure enough respondents and make the secureness/speed dissonance more apparent (remember, Alpine is probably the most popular tiny base image and, at the same time, it's a security-focused one).
However, I'm not surprised again. Like many others, I also often choose a faster way over a more secure one.
But there might be a problem. As Kyle Quest, the creator of DockerSlim, rightly noted (even though it might sound a little snarky), the last poll should have included two more options: [I use alpine as a base container image] "because others said I should be using Alpine", and "because a bunch of hello world examples on GitHub use it". Well, I myself am guilty of doing it too! Way too often, my choice of Alpine wasn't really conscious. But it worked! And it was super fast and small! And these are the only things that matter (until you run into a breach, but that is also less likely with Alpine). So why would I reconsider?
Well, remember the results of the first poll?
A quarter of respondents said they started with Alpine but then moved away. The reasons (in the replies) mostly mention various compatibility and performance issues.
The root cause of the above issues probably lies in the fact that Alpine is not a drop-in replacement for other (good old) Linux distros. Debian, Ubuntu, CentOS, and even RHEL are all part of the GNU/Linux family. They use the same set of (GNU) tools and, most importantly, the same C standard library (glibc), while Alpine relies on BusyBox and musl. And that makes it, well, different. Sometimes (most of the time?) it's beneficial, but sometimes it's not. And you need to be aware of it because it may increase your operational costs (1, 2, 3, 4).
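To make the glibc/musl difference a bit more tangible, here is a minimal Python sketch that reports which C library the current interpreter appears to be linked against. It relies on the standard library's `platform.libc_ver()`, which mainly knows how to recognize glibc; `classify_libc` is a hypothetical helper name, and treating "glibc not detected" as a hint of musl is an assumption rather than a reliable detection method.

```python
import platform


def classify_libc(name: str, version: str) -> str:
    """Turn the (name, version) pair from platform.libc_ver() into a label."""
    if name == "glibc":
        return f"glibc {version}".strip()
    # Anything else means the probe did not find glibc. On Linux this often
    # points to musl (as on Alpine), but it is a hint, not proof.
    return "not glibc (possibly musl)"


if __name__ == "__main__":
    # Probe the running interpreter and print the coarse label.
    name, version = platform.libc_ver()
    print(classify_libc(name, version))
```

Running the same script in a Debian-based and in an Alpine-based Python image should typically produce different labels, which is a cheap way to see that the two bases are not interchangeable.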
Getting back to the problem of producing better container images, what are the alternatives to choosing Alpine as a base image? Here are the ones that I found:
- Google's distroless images: one of the first attempts to solve the problem of bloated container images. Debian-based, really small footprint, with out-of-the-box support for the most popular runtimes (static, Java, Node.js, and experimental Python support). But it has one big con: you need to learn Bazel to add a new dependency to such a distroless image. So, not so flexible.
- Canonical's chisel project: Ubuntu-based, provides a UX similar to Chainguard's distroless (which is just amazing). It's very early, but Microsoft has already used it to build slim .NET containers.
Off-topic: Haven't heard of Chainguard's distroless project yet? It's an attempt to rethink Google's approach to distroless leveraging Alpine's apk ecosystem and two new tools, apko and melange. Go check it out! If you're seriously into Alpine, you'll likely find it attractive.
- DockerSlim (and DockerSlim-powered Slim.AI SaaS): the tool uses a slightly different but also efficient approach of automatically converting fat images into optimized slim ones: You can start with your favorite (Debian, Ubuntu, Red Hat's UBI, etc) base image, carelessly throw a ton of packages in, and DockerSlim will (try to) convert it into an optimal and secure one.
So, with all that pluralism of approaches, what actually matters in the end - size, speed, or the number of CVEs in the image?
Well, I think there is no right or wrong answer to this question. However, I noticed one leitmotif that kept being brought up by different folks in replies. If it's generally beneficial to produce clean and efficient software, and containers are an integral part of it, why not try making our images better too? And the right size and good-enough security will follow.
Containers are the medium of software packaging. I want to make "good software" so I should intrinsically care about quality of my container images! And yes security and zippy CI jobs are part of that
— Charles Landau (@landau_charles) August 19, 2022
And much like with the rest of software development, keeping the complexity at bay is vital:
Runtime complexity, which translates to troubleshooting, security etc...
— Son Luong (@sluongng) August 19, 2022
The speed angle is solved through many different techniques: local caching, lazy layering, p2p download etc, file/block trees, etc.. that you don't need to solve it on image.
Ok, let's try to summarize what we learned:
- Alpine is super popular as a base image. It has a number of significant advantages, like a very small footprint and a low number of CVEs, but the operational costs caused by its peculiarities are likely underestimated.
- Alpine is not the only option to produce small and secure images; a number of alternatives are available, and they are getting better every day.
- We say we care about the number of CVEs in our images a lot, but we behave as if we care more about the physical size of the image, while managing image complexity (by limiting and knowing its moving parts) is probably a better thing to focus on.
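One way to start "knowing the moving parts" of an image is simply to list the packages it contains. Below is a minimal Python sketch, assuming an Alpine-based image whose apk database lives at `/lib/apk/db/installed` and uses `P:` (name) and `V:` (version) fields in blank-line-separated records. The function name `parse_apk_installed` and the sample versions are made up for illustration.

```python
def parse_apk_installed(db_text: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs from the text of an apk 'installed' database."""
    packages = []
    # Records are separated by blank lines; each line is "<key>:<value>".
    for record in db_text.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value
        if "P" in fields:
            packages.append((fields["P"], fields.get("V", "")))
    return packages


# Hypothetical sample, shaped like a fragment of /lib/apk/db/installed.
SAMPLE_DB = """\
P:musl
V:1.2.3-r0

P:busybox
V:1.35.0-r17
"""

if __name__ == "__main__":
    for name, version in parse_apk_installed(SAMPLE_DB):
        print(name, version)
```

Tracking how this list grows from one build to the next is a rough but cheap signal of image complexity, whichever base image you end up choosing.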