Working with container images in Go

Preface

I've been working on adding basic image support to my experimental container manager and, to my surprise, the task turned out to be more complex than I initially expected. I spent some time looking for ways to manage container images directly from my application code. There are plenty of tools out there (docker, containerd, podman, buildah, cri-o, etc.) providing image management capabilities. However, if you don't want to depend on an external daemon running in your system, or you don't feel like shelling out to a command-line tool from the code, the options are limited at best.

I've reviewed a bunch of the said tools, focusing on the underlying means they use to deal with images, and at last I found two appealing libraries. The first one is the github.com/containers/image library "[...] aimed at working in various way with containers' images and container image registries". The second one is github.com/containers/storage "[...] which aims to provide methods for storing filesystem layers, container images, and containers". The libraries are meant to be used in conjunction and form a very powerful image management tandem. Unfortunately, I could not find much documentation, especially of the getting-started kind.

Without docs, the only way for me to learn how to use the libraries was to analyze the code of their dependents (most prominently, buildah and cri-o). It took me a while to put together a working example that is capable of:

  • pulling images from remote repositories;
  • storing images locally;
  • creating and mounting containers (i.e. writable instances of images).

In the rest of the article, I'll try to show how to use the libraries to perform the said task and highlight the most interesting parts of this journey.

Disclaimer: This is by no means an attempt to fully or even partially document the libraries!

Prerequisites

The storage library is responsible not only for storing things on disk but also for mounting overlay filesystems. It implements various storage drivers (overlay, btrfs, vfs, etc) and requires cgo for building. While the full list of system dependencies can be looked up here, for the set of use cases from the article only the following packages are needed (assuming pre-installed go and git):

# Debian 10
apt-get install pkg-config libgpgme-dev libdevmapper-dev

# CentOS 8 (with the EPEL repository enabled)
yum install libassuan-devel gpgme-devel device-mapper-devel

Creating storage

Both the storage and image libraries by default expect some configuration files to be present under the /etc/containers path. A reference set of config files can be obtained by installing buildah. However, most of the settings can be overridden in code:

package main

import (
    "fmt"

    "github.com/containers/storage"
)

func main() {
    // Reads /etc/containers/storage.conf
    options, err := storage.DefaultStoreOptions(false, 0)
    if err != nil {
        panic(err)
    }

    // options.RunRoot = "/path/to/root"
    // options.GraphRoot = "/path/to/graph/root"
    // options.GraphDriverName = "vfs" | "overlay" | etc
    // ...

    store, err := storage.GetStore(options)
    if err != nil {
        panic(err)
    }

    // Status returns driver-specific key/value pairs.
    status, err := store.Status()
    if err != nil {
        panic(err)
    }
    // fmt.Println prints the pairs; the builtin println would only
    // print the slice header.
    fmt.Println(status)
}

The storage.DefaultStoreOptions() function tries to support different defaults for root and rootless modes. However, rootless mode requires some extra hassle with user namespaces, and the set of supported drivers becomes pretty limited. Apparently, vfs is used out of the box, and it's not a true overlay filesystem. On systems with rootless overlay filesystem support (Ubuntu?) behavior may vary. Using fuse-overlayfs is probably another alternative for rootless mode. All the examples in the rest of the article require sudo privileges and mostly use the overlay driver.

The storage library uses a reexec trick to run some of its subtasks in separate processes. Thus, before going full-on with storage, we need to initialize it properly:

package main

import "github.com/containers/storage/pkg/reexec"

func main() {
    if reexec.Init() {
        return
    }

    // your code here
}
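To make the reexec idea less magical, here is a minimal standard-library sketch of the general pattern: handlers are registered under a pseudo-command name, and at startup the program checks whether it was launched under one of those names. This is an illustration of the technique only, not the library's code; the register and dispatch helpers and the "demo-subtask" name are made up for this example.

```go
package main

import (
	"fmt"
	"os"
)

// handlers maps a pseudo-command name to the function implementing it.
var handlers = map[string]func(){}

// register associates name with fn (hypothetical helper for this sketch).
func register(name string, fn func()) {
	handlers[name] = fn
}

// dispatch runs the handler registered under argv0, if any,
// and reports whether a handler was found and executed.
func dispatch(argv0 string) bool {
	fn, ok := handlers[argv0]
	if !ok {
		return false
	}
	fn()
	return true
}

func main() {
	register("demo-subtask", func() {
		fmt.Println("running as a re-executed subtask")
	})

	// A parent process would start its own binary again with argv[0]
	// set to "demo-subtask"; the child then lands in the handler.
	if dispatch(os.Args[0]) {
		return
	}

	fmt.Println("running as the main program")
}
```

This also shows why the check has to come first in main(): a re-executed child must run its handler and exit before any of the regular program logic starts.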

Pulling images

It's time to take a look at the image library. It offers lots of functionality, but here we are mostly interested in its transport and copy capabilities. To pull an image, we need to know the transport (docker will be used by default), the repository (hub.docker.com will be used by default), the image name, and the image tag (latest will be used by default). Obviously, the image library requires its storage counterpart to store pulled images. Here is how we can pull an image from a remote repository (see a working example here):

package main

import (
    "context"
    "os"

    "github.com/containers/image/copy"
    "github.com/containers/image/signature"
    "github.com/containers/image/storage"
    "github.com/containers/image/transports/alltransports"
    "github.com/containers/image/types"
)

func main() {
    imageName := "docker://alpine:latest"
    srcRef, _ := alltransports.ParseImageName(imageName)

    // Carries various default locations.
    systemCtx := &types.SystemContext{}
    policy, _ := signature.DefaultPolicy(systemCtx)
    policyCtx, _ := signature.NewPolicyContext(policy)

    dstName := imageName
    if srcRef.DockerReference() != nil {
        dstName = srcRef.DockerReference().String()
    }
    store := createStorage() // see previous section
    dstRef, _ := storage.Transport.ParseStoreReference(store, dstName)

    copyOptions := &copy.Options{ReportWriter: os.Stdout}
    manifest, _ := copy.Image(
        context.Background(),
        policyCtx,
        dstRef,
        srcRef,
        copyOptions,
    )
    println(string(manifest))
}
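The image name string above packs several pieces together. As a rough, standard-library-only sketch of how such a string can be taken apart, the code below splits off the transport and appends the latest tag when none is given. It is a deliberate simplification: the splitTransport and withDefaultTag helpers are hypothetical, a missing transport is reported as an error rather than defaulted, and the real parsing in the image library covers many more cases (other transports, registry defaults, and so on).

```go
package main

import (
	"fmt"
	"strings"
)

// splitTransport separates "transport:reference" at the first colon.
// Hypothetical helper for this sketch.
func splitTransport(name string) (transport, ref string, err error) {
	parts := strings.SplitN(name, ":", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("invalid image name %q: expected transport:reference", name)
	}
	return parts[0], parts[1], nil
}

// withDefaultTag appends ":latest" when ref has neither a tag nor a digest.
// A colon counts as a tag separator only if it comes after the last slash,
// so a registry port such as localhost:5000 is not mistaken for a tag.
func withDefaultTag(ref string) string {
	if strings.Contains(ref, "@") {
		return ref // pinned by digest, leave untouched
	}
	if strings.LastIndex(ref, ":") > strings.LastIndex(ref, "/") {
		return ref // already tagged
	}
	return ref + ":latest"
}

func main() {
	for _, name := range []string{
		"docker://alpine",
		"docker://alpine:3.11",
		"docker://localhost:5000/alpine",
	} {
		transport, ref, err := splitTransport(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("transport=%s reference=%s\n", transport, withDefaultTag(ref))
	}
}
```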

Listing layers and images

The storage library organizes images on disk in the well-known layered form originally popularized by docker. It's fairly simple to list the existing layers and images (and containers, see the next section) along with their location, digest, and size information:

package main

import "github.com/davecgh/go-spew/spew"

func main() {
    store := createStorage()

    layers, _ := store.Layers()
    spew.Dump(layers)

    images, _ := store.Images()
    spew.Dump(images)
}
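The dumped layer records reference each other through their parent IDs, which is what makes the form "layered". The sketch below shows how such a chain can be followed from a top layer down to the base. It uses a local stand-in struct that mirrors only the ID and Parent fields, not the library type, and the layer IDs in main() are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// layer is a local stand-in mirroring only the ID and Parent fields
// of a stored layer record; it is not the library's type.
type layer struct {
	ID     string
	Parent string // empty for a base layer
}

// chain returns the layer IDs from top down to the base layer
// by following Parent links.
func chain(layers []layer, top string) ([]string, error) {
	byID := make(map[string]layer, len(layers))
	for _, l := range layers {
		byID[l.ID] = l
	}

	var ids []string
	for id := top; id != ""; {
		l, ok := byID[id]
		if !ok {
			return nil, fmt.Errorf("layer %q not found", id)
		}
		if len(ids) == len(byID) {
			// Every known layer was already visited: the links loop.
			return nil, errors.New("cycle detected in parent links")
		}
		ids = append(ids, l.ID)
		id = l.Parent
	}
	return ids, nil
}

func main() {
	// Made-up IDs; real ones are 64-character hex strings.
	layers := []layer{
		{ID: "base"},
		{ID: "middle", Parent: "base"},
		{ID: "top", Parent: "middle"},
	}

	ids, err := chain(layers, "top")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```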

Creating containers

From the storage library README file:

A container is a read-write layer which is a child of an image's top layer, along with information which the library can manage for the convenience of its caller. This information typically includes configuration information for running the specific container. Multiple containers can be derived from a single image.

To create a container, we need to call store.CreateContainer(); the only required parameter is the image ID:

package main

import "github.com/davecgh/go-spew/spew"

func main() {
    store := createStorage()

    imageID := "<image ID goes here>"
    cont, _ := store.CreateContainer("", nil, imageID, "", "", nil)
    spew.Dump(cont)
}

Mounting containers

Once the container is created, we still need to mount it using one of the storage drivers. The result of mounting is a writable location somewhere in the filesystem:

package main

func main() {
    store := createStorage()

    // The second argument is the mount label; "" keeps the default.
    mountPoint, _ := store.Mount("<container ID goes here>", "")
    println(mountPoint)
}
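To double-check from Go that a returned path really is a mount, and which filesystem backs it, one option on Linux is to look it up in /proc/self/mountinfo. The sketch below is standard-library only; the fsTypeAt helper is hypothetical, it ignores the octal escaping that mountinfo applies to unusual characters in paths, and it is just one of several ways to perform this check.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// fsTypeAt scans mountinfo-formatted text and returns the filesystem
// type mounted at path. If several entries share the mount point,
// the last one wins, since later mounts shadow earlier ones.
func fsTypeAt(mountinfo, path string) (string, bool) {
	fsType, found := "", false

	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())

		// Fields 0-5 are fixed (the mount point is field 4). They are
		// followed by optional fields, a "-" separator, and then the
		// filesystem type.
		sep := -1
		for i := 6; i < len(fields); i++ {
			if fields[i] == "-" {
				sep = i
				break
			}
		}
		if sep < 0 || sep+1 >= len(fields) {
			continue // malformed line, skip it
		}
		if fields[4] == path {
			fsType, found = fields[sep+1], true
		}
	}
	return fsType, found
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Println("cannot read mountinfo:", err)
		return
	}

	// Replace "/" with the path returned by the mount call.
	fsType, ok := fsTypeAt(string(data), "/")
	fmt.Println(fsType, ok)
}
```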

Demo

Combining all the knowledge from above and using the demo program, we can come up with the following scenario:

# Pull image
$ ./goimagego pull docker://alpine:latest
Pulling image docker://alpine:latest
Getting image source signatures
Copying blob c9b1b535fdd9 skipped: already exists
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
Image pulled - {
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1511,
      "digest": "sha256:e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2802957,
         "digest": "sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9"
      }
   ]
}

# Create a new container using the image from above
$ ./goimagego container e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a
Container:
(*storage.Container)(0xc000089790)({
 ID: (string) (len=64) "f7c2136928fbe8e963b594833f0101b964edb5ec299444c2508ce1d3d15ef3c2",
 Names: ([]string) <nil>,
 ImageID: (string) (len=64) "e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a",
 LayerID: (string) (len=64) "5d531d0194a79ad4a40232cae1346012739380c0d7c354485d70ddc298ea0d9e",
 Metadata: (string) "",
 BigDataNames: ([]string) <nil>,
 BigDataSizes: (map[string]int64) {
 },
 BigDataDigests: (map[string]digest.Digest) {
 },
 Created: (time.Time) 2020-02-29 17:31:14.129769831 +0000 UTC,
 UIDMap: ([]idtools.IDMap) <nil>,
 GIDMap: ([]idtools.IDMap) <nil>,
 Flags: (map[string]interface {}) (len=2) {
  (string) (len=12) "ProcessLabel": (string) "",
  (string) (len=10) "MountLabel": (string) ""
 }
})

# Mount the created container
$ ./goimagego mount f7c2136928fbe8e963b594833f0101b964edb5ec299444c2508ce1d3d15ef3c2
/var/lib/containers/storage/overlay/5d531d0194a79ad4a40232cae1346012739380c0d7c354485d70ddc298ea0d9e/merged

$ df /var/lib/containers/storage/overlay/5d531d0194a79ad4a40232cae1346012739380c0d7c354485d70ddc298ea0d9e/merged
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         10474496 5666980   4807516  55% /var/lib/containers/storage/overlay/5d531d0194a79ad4a40232cae1346012739380c0d7c354485d70ddc298ea0d9e/merged

$ ./goimagego unmount f7c2136928fbe8e963b594833f0101b964edb5ec299444c2508ce1d3d15ef3c2
Unmounted!

After unmounting, the location /var/lib/containers/storage/overlay/5d531d0194a79ad4a40232cae1346012739380c0d7c354485d70ddc298ea0d9e/merged will not exist anymore. However, the topmost (i.e. the container's) layer will still be there, containing all the changes made to the filesystem. It's the so-called diff folder, and it usually resides next to the original merged folder. In my case it was /var/lib/containers/storage/overlay/5d531d0194a79ad4a40232cae1346012739380c0d7c354485d70ddc298ea0d9e/diff. Note that the directory is named after the container's LayerID, not the container ID. To clean up after the container properly, we need to wipe the whole /var/lib/containers/storage/overlay/<layer_id> hierarchy.
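Looking back at the pull step of the transcript above, the ID passed to the container command is the same hex string as the config digest in the printed manifest. The following sketch extracts that value with encoding/json. The struct types and the configID helper are local to this example and cover only the fields visible in the transcript; whether the config digest doubles as the image ID for other manifest formats is not something this sketch claims.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// descriptor covers the fields of a config or layer entry
// as they appear in the manifest printed by the demo.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Size      int64  `json:"size"`
	Digest    string `json:"digest"`
}

// manifest covers only the top-level fields shown in the demo output.
type manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        descriptor   `json:"config"`
	Layers        []descriptor `json:"layers"`
}

// configID returns the config digest without its "sha256:" prefix.
func configID(raw []byte) (string, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	if m.Config.Digest == "" {
		return "", errors.New("manifest has no config digest")
	}
	return strings.TrimPrefix(m.Config.Digest, "sha256:"), nil
}

// sample is the manifest from the demo transcript.
const sample = `{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1511,
      "digest": "sha256:e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2802957,
         "digest": "sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9"
      }
   ]
}`

func main() {
	id, err := configID([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```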

Miscellaneous

A lot of things could go wrong while working with images, containers, storage drivers, etc. So, never skip or suppress errors =) However, sometimes error messages aren't that helpful. In such situations, I found setting the log level to DEBUG extremely useful. Both image and storage use sirupsen/logrus, so putting logrus.SetLevel(logrus.DebugLevel) at the beginning of your program sheds tons of light on the actual execution problems.

And if you're getting completely desperate, even with DEBUG logs enabled, delve is there to save you!

Make code, not war!

See also

How are docker images built? A look into the Linux overlay file-systems and the OCI specification by Nicola Apicella