UPD: This post unexpectedly made it to the Hacker News front page and sparked a fruitful discussion there. I'll try to summarize the key takeaways:
- vulnerability scanners tend to have lots of false positives
- some of the reported findings can already be patched in the upstream and backported
- some can be simply unrelated because they are specific to some esoteric architectures
- official base images are never (or very rarely) updated in repos (e.g. Docker Hub)
- with the raise of containers, the burden of patching OS de facto moved from admin & ops people to developers
- ...but not every developer is aware of that yet
- some people suggest adding
RUN apt-get update && apt-get -y upgradein the beginning of every Dockerfile
- I tried and it gave a very minor reduction in the case of full-fledged Debian 10 distr
- ...but others argue back that it'll result in non-reproducible builds
- and potential regressions due to backports changing the default behavior of dependencies
- which leads to a fair suggestion to control the source repositories
- ...but of course, it complicates things a lot
- and that's why the most typical solution seems to be simply ignoring the problem 🙈
- some people suggest adding
- despite very good scan results, Alpine images not always work great
- because reportedly musl libc is slower than glibc
- not every dependency library provides builds for this platform
Below is the original article.
I was hacking containers recently and noticed, that Docker started featuring the
docker scan command in the
docker build output. I've been ignoring its existence for a while, so evidently, it was time to finally try it out.
Scanning official Python images
docker scan command uses a third-party tool, called Snyk Container. Apparently, it's some sort of a vulnerability scanner. So, I decided, mostly for the sake of fun, to scan one of my images. And it just so happened that it was a fairly basic thing:
# latest stable at the time FROM python:3.9 RUN pip install Flask COPY server.py server.py ENV FLASK_APP=server.py ENV FLASK_RUN_PORT=5000 ENV FLASK_RUN_HOST=0.0.0.0 EXPOSE 5000 CMD ["flask", "run"]
Testing python-flask... ✗ Low severity vulnerability found in unbound/libunbound8 Description: Improper Input Validation Info: https://snyk.io/vuln/SNYK-DEBIAN10-UNBOUND-534899 Introduced through: email@example.com From: firstname.lastname@example.org > mariadb-10.3/libmariadb-dev-compat@1:10.3.27-0+deb10u1 > mariadb-10.3/libmariadb-dev@1:10.3.27-0+deb10u1 > email@example.com+deb10u6 > firstname.lastname@example.org+deb10u6 > email@example.com+deb10u2 ✗ Low severity vulnerability found in tiff/libtiff5 Description: Out-of-Bounds Info: https://snyk.io/vuln/SNYK-DEBIAN10-TIFF-1079067 Introduced through: imagemagick@8:22.214.171.124+dfsg-2.1+deb10u1, imagemagick/libmagickcore-dev@8:126.96.36.199+dfsg-2.1+deb10u1 From: imagemagick@8:188.8.131.52+dfsg-2.1+deb10u1 > imagemagick/imagemagick-6.q16@8:184.108.40.206+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-6@8:220.127.116.11+dfsg-2.1+deb10u1 > firstname.lastname@example.org+git191117-2~deb10u2 From: imagemagick/libmagickcore-dev@8:18.104.22.168+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-dev@8:22.214.171.124+dfsg-2.1+deb10u1 > email@example.com+git191117-2~deb10u2 > firstname.lastname@example.org+git191117-2~deb10u2 From: imagemagick/libmagickcore-dev@8:126.96.36.199+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-dev@8:188.8.131.52+dfsg-2.1+deb10u1 > email@example.com+git191117-2~deb10u2 > firstname.lastname@example.org+git191117-2~deb10u2 > email@example.com+git191117-2~deb10u2 and 3 more... ... ✗ High severity vulnerability found in gcc-8 Description: Insufficient Entropy Info: https://snyk.io/vuln/SNYK-DEBIAN10-GCC8-469413 Introduced through: gcc-defaults/g++@4:8.3.0-1, firstname.lastname@example.org, imagemagick@8:184.108.40.206+dfsg-2.1+deb10u1, meta-common-packages@meta From: gcc-defaults/g++@4:8.3.0-1 > email@example.com From: firstname.lastname@example.org > email@example.com From: gcc-defaults/g++@4:8.3.0-1 > firstname.lastname@example.org > email@example.com and 23 more... ✗ High severity vulnerability found in djvulibre/libdjvulibre21 Description: NULL Pointer Dereference Info: https://snyk.io/vuln/SNYK-DEBIAN10-DJVULIBRE-481572 Introduced through: imagemagick/libmagickcore-dev@8:220.127.116.11+dfsg-2.1+deb10u1 From: imagemagick/libmagickcore-dev@8:18.104.22.168+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-dev@8:22.214.171.124+dfsg-2.1+deb10u1 > firstname.lastname@example.org > email@example.com From: imagemagick/libmagickcore-dev@8:126.96.36.199+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-dev@8:188.8.131.52+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-6-extra@8:184.108.40.206+dfsg-2.1+deb10u1 > firstname.lastname@example.org From: imagemagick/libmagickcore-dev@8:220.127.116.11+dfsg-2.1+deb10u1 > imagemagick/libmagickcore-6.q16-dev@8:18.104.22.168+dfsg-2.1+deb10u1 > email@example.com and 1 more... ✗ High severity vulnerability found in bluez/libbluetooth3 Description: Double Free Info: https://snyk.io/vuln/SNYK-DEBIAN10-BLUEZ-1018718 Introduced through: firstname.lastname@example.org~deb10u1 From: email@example.com~deb10u1 > firstname.lastname@example.org~deb10u1 From: email@example.com~deb10u1 Package manager: deb Project name: docker-image|python-flask Docker image: python-flask Platform: linux/amd64 Tested 431 dependencies for known vulnerabilities, found 358 vulnerabilities. For more free scans that keep your images secure, sign up to Snyk at https://dockr.ly/3ePqVcp
358 vulnerabilities were found in total: among them, 54 were high severity and 48 medium severity.
Taking a closer look at the scan report gave me a clue that the majority of the found vulnerabilities might have something to do with Debian (see the
Info: https://snyk.io/vuln/SNYK-DEBIAN10-... bit above, lots, if not all, of items in the report have it).
python:3.9 image is based on the full-fledged Debian 10 distributive:
114 MB of potential security flaws.
To be honest, it was mind-blowing! I knew that the thicker our containers are, the higher the potential attack surface. But anyway, I didn't expect such a big scale.
Ok, let's try to slim down the image...
FROM python:3.9-slim RUN pip install Flask COPY server.py server.py ENV FLASK_APP=server.py ENV FLASK_RUN_PORT=5000 ENV FLASK_RUN_HOST=0.0.0.0 EXPOSE 5000 CMD ["flask", "run"]
python:3.9-slim-based image gave me much better results:
Package manager: deb Project name: docker-image|python-flask-slim Docker image: python-flask-slim Platform: linux/amd64 Tested 94 dependencies for known vulnerabilities, found 69 vulnerabilities.
69 vulnerabilities were found in total: among them, 14 were high severity and 8 medium severity.
A vulnerability per megabyte of the base image...
Can we do better? Let's try
FROM python:3.9-alpine RUN pip install Flask COPY server.py server.py ENV FLASK_APP=server.py ENV FLASK_RUN_PORT=5000 ENV FLASK_RUN_HOST=0.0.0.0 EXPOSE 5000 CMD ["flask", "run"]
Finally! 0 known vulnerabilities!
Package manager: apk Project name: docker-image|python-flask-alpine Docker image: python-flask-alpine Platform: linux/amd64 ✓ Tested 37 dependencies for known issues, no vulnerable paths found.
5.6 MB - Alpine rocks!
Scanning distroless Python image
Another potential solution for the problem of bloated containers that looks promising is so-called "distroless" Docker images by Google. The project description states it's "Language focused docker images, minus the operating system". Although it'd be hard to achieve in the case of Python since its standard library relies on some higher-level OS capabilities.
It took me a while to come up with the working distroless Python image. Most of the examples out there show how to use some trivial scripts. However, installing Flask (or any other dependencies) turned out to be much tricker than I expected. Python distroless containers require multi-stage building process because the
gcr.io/distroless/python3 image has neither
pip nor even
easy_install. At first, I was trying to utilize a virtual environment in the build image, then copy it to the runtime image, and hack the
PATH variable. But it didn't work out well, because the base distroless image makes some opinionated choices on the filesystem layout:
Expectedly, a Linux distr is still there, but it's thinner than even debian-slim
So, I ended up with the following Dockerfile allowing me to install Flask in a distroless Python image:
# Build image FROM python:3.7-slim AS build-env RUN python -m pip install Flask # Runtime image FROM gcr.io/distroless/python3 COPY --from=build-env /usr/local/bin/flask /usr/local/bin/flask COPY --from=build-env /usr/local/lib/python3.7/site-packages /usr/local/lib/python3.7/site-packages WORKDIR /app COPY server.py server.py # Important line! ENV PYTHONPATH=/usr/local/lib/python3.7/site-packages ENV FLASK_APP=server.py ENV FLASK_RUN_PORT=5000 ENV FLASK_RUN_HOST=0.0.0.0 EXPOSE 5000 CMD ["/usr/local/bin/flask", "run"]
Scanning it with
docker scan gave me the following:
Package manager: deb Project name: docker-image|python-flask-distroless Docker image: python-flask-distroless Platform: linux/amd64 Tested 25 dependencies for known vulnerabilities, found 37 vulnerabilities.
So, it's just 37 vulnerabilities: 6 with high severity and 8 with medium. Sounds like a 90% reduction comparing to the original
python:3.9 image! And it's even fewer vulnerabilities than in the
Thus, our Alpine-based Python image is a winner!
Scanning scratch Go image
I love the idea of building container images completely from scratch, avoiding putting any distr stuff inside. But for that, you'd need a language with profound support of static builds, such as Go ❤️
FROM scratch COPY hello / CMD ["/hello"]
I got three
looks lines... AND THAT’S IT!
Testing go-scratch... Package manager: linux Project name: docker-image|go-scratch Docker image: go-scratch Platform: linux/amd64 ✓ Tested go-scratch for known vulnerabilities, no vulnerable paths found.
Well, no distr doesn't necessarily mean no vulnerabilities. But it does reduce the probability of having one.
Instead of conclusion
From my experience bloated images are usually the results of either using a default
FROM bloated_base or because people intentionally put extra tools to an image for the sake of future debugging/troubleshooting.
The first is probably a result of a vast marketing campaign done by Docker back in the day. To popularize containers, you'd need to give people something looking cool and handy. And being able to launch a full-fledged Linux distr (kind of) from your dev machine in under a second indeed looks very cool even today. Some local tinkering and/or experimentation benefit a lot from
docker run -it ubuntu. However, I hope it was never meant for production use. But there are simply too many examples of Dockerfiles around starting
FROM a full-fledged Debian, CentOS, or Ubuntu to avoid penetrating at least some of them to our production environments.
The second reason looks more legit. But I have a feeling that it's only at the first glance. Ideally, there should be another way to debug your containerized service. The way, that doesn't involve packing all the potentially needed tools into a container. Recent
kubectl debug addition is good proof of that hypothesis. It allows one to inject an ephemeral container into a running pod and such a container can have all the stuff needed for debugging.
Smaller containers don't just mean faster builds and smaller disk and networking utilization, they mean safer life.