How and when to use Docker labels / OCI container annotations

November 3, 2021

Most container images are built using Dockerfiles, which contain combinations of instructions like FROM, RUN, COPY, ENTRYPOINT, etc. to build the layers of an OCI-compliant image. One instruction that is used surprisingly rarely, though, is LABEL. In this post, we’ll dig into labels ("annotations" in the OCI Image Specification): what they are, some standardized uses, and some practices you can use to enhance your container security posture.

For the remainder of this article, we’ll be referring to these as labels — as opposed to annotations — since that is the more commonly used term. Let's get started!

What are Docker image labels?

Docker image labels are a way for you to add key-value metadata to your image itself. This data is not exposed to a container running against the image, but rather, is valuable for codifying things like where the source code for the image is, who supports the image, or what CI build created it.
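
As a minimal sketch of what this looks like in practice, the script below writes a small Dockerfile that sets two labels. The base image, the label values, the com.mycorp.myteam.support key, and the labeldemo directory are all hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Sketch: write a small Dockerfile that attaches labels to an image.
# The base image, label values, and directory name are hypothetical.
mkdir -p labeldemo

cat > labeldemo/Dockerfile <<'EOF'
FROM alpine:3.14
# Labels are key-value metadata stored with the image, not in its filesystem
LABEL org.opencontainers.image.source="https://repo.mycorp.com/team/app" \
      com.mycorp.myteam.support="team@mycorp.com"
CMD ["echo", "hello"]
EOF

echo "Wrote labeldemo/Dockerfile"
```

With Docker available, building this with `docker build -t labeldemo:example labeldemo` and then running `docker image inspect labeldemo:example --format '{{ json .Config.Labels }}'` should show both labels.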

Docker / OCI image metadata explained

If you’ve built any kind of software package before, you know that — for the most part — their models include the software, configuration, and sometimes functional data as well as metadata about the package itself.

For example, while Java .jar files are basically just .zip archives, they all have a top-level META-INF directory that contains several files and directories that, per the Java 2 Platform spec, “... are recognized and interpreted by the Java 2 Platform to configure applications, extensions, class loaders, and services”. If we open up a .jar built by the popular Maven build tool, there will usually be (among other things) a maven directory that has content such as the effective Maven pom.xml and pom.properties used to build the .jar. (FYI “POM” stands for Maven’s Project Object Model)

RPM, APT, NPM, and most other packaging tools have similar metadata stored in them that is used by the tools in the process of installing or running the contained software or for utility purposes by repositories or runtime monitoring systems.

Container images carry metadata too. If you list the “history” of an image, you will often see zero-byte entries. These don’t contain filesystem changes; instead, they record metadata in the image configuration, commonly added by Dockerfile instructions such as:

USER: Which user to run the process under
ENV: Variable (and its value) to be set in the process’s environment
ARG: Build argument passed into the image build and used like an environment variable within the scope of the build
CMD: Command and/or parameters to use to start the process (ENTRYPOINT is similar)
LABEL: Key/value pairs that are not used by the runtime engine

Most of these instructions are well known and widely used, but since label metadata is not required to run your containers, it is often overlooked.

Why you should use container image labels

There are many reasons to label your images, such as documenting versions, including contact info for the project maintainers, or even providing runtime usage information. One of the most common use cases is to record how the image was built, which makes the image artifact traceable through your software supply chain.

Docker label / OCI image annotation metadata types

Standardized labels

Image creation/origin metadata is so commonly used that the OCI team publishes a standardized set of keys, all prefixed with “org.opencontainers.image.”, including:

source: URL to get source code for building the image
revision: Source control revision identifier for the packaged software
base.digest: Digest (hash) of the image this image is based on
base.name: Image reference of the image this image is based on
version: Version of the packaged software
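
Rather than hard-coding these values in a Dockerfile, a CI job can pass them at build time with the `--label` flag of `docker build`. The following is a rough sketch under stated assumptions: SOURCE_URL, VERSION, and the myimage tag are hypothetical placeholders that a CI system would normally supply, and the script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch: derive OCI standard label values and assemble a docker build command.
# SOURCE_URL, VERSION, and the image name are hypothetical placeholders.
SOURCE_URL=${SOURCE_URL:-https://repo.mycorp.com/team/app}
VERSION=${VERSION:-1.0.0}
# Use the current Git commit, or "unknown" outside a Git working tree
REVISION=$(git rev-parse --verify HEAD 2>/dev/null || echo unknown)

# Print the command rather than running it. Values containing spaces would
# need extra quoting, which this sketch does not handle.
build_cmd() {
  printf 'docker build'
  # printf repeats the format for each key/value pair of arguments
  printf ' --label org.opencontainers.image.%s=%s' \
    source "$SOURCE_URL" revision "$REVISION" version "$VERSION"
  printf ' -t myimage:%s .\n' "$VERSION"
}

build_cmd
```

Labels passed this way are merged with any LABEL instructions in the Dockerfile, so the two approaches can be combined.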

Custom labels

Since they are simply key-value pairs, your project/organization can specify pretty much anything they desire. Some possible ideas (all of which could be prefixed with something like “com.mycorp.myteam.”):

ci-build: URL to the CI project run that produced the image
releasenotes: Release notes for the packaged software
healthz: The HTTP endpoint for doing healthchecks
docker.run: A Docker command example for running from this image
k8s.deployment: Base64 YAML for Kubernetes deployment using this image

That last one is interesting because, yes, you can store pretty much anything you can base64 encode as the value of a label key. This means you could actually pull that image and then run something like the following to get a sample YAML deployment file, which you could then apply to a Kubernetes cluster:

$ docker image inspect myimage:tag | jq -r ".[].Config.Labels.\"com.mycorp.myteam.k8s.deployment\"" | base64 -d
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: snyk
  name: snyk
spec:
  replicas: 1
  selector:
    matchLabels:
      app: snyk
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: snyk
    spec:
      containers:
      - image: ericsmalling/snyklabeldemo:m
        name: snyklabeldemo
        resources: {}
status: {}

You could even pipe that straight into kubectl and deploy it right away, with no Helm charts or separate YAML files needed in your Git repo!

$ docker image inspect myimage:tag | jq -r ".[].Config.Labels.\"com.mycorp.myteam.k8s.deployment\"" | base64 -d | kubectl apply -f -

deployment.apps/snyk created
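
For the decoding commands above to work, the label value has to be encoded when the image is built. One possible way to do that is sketched below. The file names and the truncated manifest are hypothetical, and the build command is only printed, not executed.

```shell
#!/bin/sh
# Sketch: encode a manifest as a single-line base64 string for use as a label
# value, then decode it again to confirm the round trip.
# File names and the truncated manifest are hypothetical.
mkdir -p labeldemo
cat > labeldemo/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: snyk
EOF

# Remove line wrapping so the value is one line with both GNU and BSD base64
ENCODED=$(base64 < labeldemo/deployment.yaml | tr -d '\n')

# The encoded value could then be supplied at build time, for example:
echo "docker build --label com.mycorp.myteam.k8s.deployment=$ENCODED -t myimage:tag ."

# Decoding gives back the original manifest
printf '%s' "$ENCODED" | base64 -d > labeldemo/decoded.yaml
cmp -s labeldemo/deployment.yaml labeldemo/decoded.yaml && echo "round trip OK"
```

Keep in mind that label values become part of the image configuration, so anything stored this way is readable by anyone who can pull or inspect the image.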

Leveraging Docker labels / OCI annotations

As you can imagine, labels can be fed any values your CI system has access to, and they can be used to correlate images and/or running containers back to their sources, documentation, etc. simply by inspecting their labels. For example, let’s say your organization publishes images with the OCI standard org.opencontainers.image.source label, which holds the URL of the SCM repository that the image is built from. If you wanted to find out which repositories the containers running on a given Docker host came from, you could run something like this:

$ docker inspect $(docker ps -q) --format='{{ .Id }} {{ index .Config.Labels "org.opencontainers.image.source" }}'

17f4ee967870c49d3ffeb1c49973071c99c63377b2f9bbf987f7c3e4a21d331c https://repo.mycorp.com/team-volton/redlion
c958ffc87c2bd5af500d24eff1ccb3ee21992a5cbcb429fafcc651aa182b66ba <no value>

The output shows the IDs of the two running containers, one of which has the label, so its value is printed.
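
Output in this form is easy to post-process. As a small sketch, the hypothetical helper below (unlabeled is not part of Docker) reads "ID value" lines on standard input and prints the IDs of containers that are missing the label, which could feed a simple audit.

```shell
#!/bin/sh
# Sketch: print the IDs of containers that have no source label.
# Expects "<container id> <label value>" lines on stdin, in the format
# produced by the docker inspect command above.
unlabeled() {
  # A missing label appears as an empty second field or as "<no value>"
  awk 'NF && (NF < 2 || /<no value>$/) { print $1 }'
}

# Sample input copied from the output above
printf '%s\n' \
  '17f4ee967870c49d3ffeb1c49973071c99c63377b2f9bbf987f7c3e4a21d331c https://repo.mycorp.com/team-volton/redlion' \
  'c958ffc87c2bd5af500d24eff1ccb3ee21992a5cbcb429fafcc651aa182b66ba <no value>' |
  unlabeled
```

For the sample input, only the second container ID is printed. On a real host, the same function could be placed at the end of the docker inspect pipeline.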

Things get a little more complicated when running in a Kubernetes cluster, where you likely won’t have access to the container engine sockets on the cluster nodes. Unfortunately, there is no API to get access to these labels from kubectl, so we have to get a little more clever. The following bash script finds the images running in the current context’s namespace, then queries the image registry for each image’s metadata and returns the label information:

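$ cat labelgrep.sh
#!/bin/bash
# Usage: labelgrep.sh LABEL [VALUE]
FINDLABEL=$1
FINDVAL=$2

# Images used by pods in the current namespace, de-duplicated in their original
# order (uniq alone would only drop adjacent duplicates)
IMAGES=$(kubectl get pods -o json | jq -r ".items[].spec.containers[].image" | awk '!seen[$0]++')

for i in $IMAGES; do
  # Read the label from the registry without pulling the image
  VAL=$(regctl image inspect "${i}" --format '{{ index .Config.Labels "'"${FINDLABEL}"'" }}')
  # Print when the label is set and, if a value was given, when it matches
  if [[ "$VAL" != "" && ( "$FINDVAL" == "" || "$VAL" == "$FINDVAL" ) ]]; then
    echo "[${i}] ${FINDLABEL}=${VAL}"
  fi
done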

Note that I’m using the excellent regctl tool from the open source regclient project, which allows me to get information about images straight from a registry instead of having to pull an image to my local environment for inspection. This also allows me to run this script anywhere I want, without having to have a container runtime engine.

Now, let’s run this against a cluster and search for the repos for images running in my cluster:

$ ./labelgrep.sh org.opencontainers.image.source
[images.mycorp.com/voltron/redlion] org.opencontainers.image.source=https://repo.mycorp.com/team-volton/redlion
[images.mycorp.com/voltron/bluelion] org.opencontainers.image.source=https://repo.mycorp.com/team-volton/bluelion

As you can see, my cluster currently has two images running in pods that have the org.opencontainers.image.source label.

While these are relatively simple examples, I’m sure you can expand on the concepts to build your own scripts or API calls to fit your organization’s needs.

Snyk integration

One of the more interesting aspects, with regard to security, is the fact that Snyk image scanning now supports the automatic correlation of the image scanned to its Dockerfile via image labels.

Snyk’s source code repo integration already has the ability to statically detect and scan Dockerfiles found in your code, as well as images imported from your container registries. Until recently, however, cross-referencing them has been a task you had to manage yourself. You either had to manually add the Dockerfile reference to the image or have implemented your own automation to do so via API calls.

Now, all you have to do is simply include the OCI standard org.opencontainers.image.source label with the URL to the repository containing its Dockerfile, and Snyk will automatically do the cross-referencing for you when the image is imported.

Example of automatically linked image scan project

Wrapping things up

In summary, the often forgotten image label/annotation is a powerful tool that lets you insert metadata right into your images. Using labels with standardized keys not only documents where an image came from but also lets deployment and security tools use that information to give you a better understanding of your deployed landscape.

You can start leveraging the Snyk automatic image linking functionality today by simply adding the OCI standard org.opencontainers.image.source label to any Dockerfiles currently being scanned in your account. Don’t have a Snyk account? Sign up for free and start using this right away!

I’m curious how you are using labels. Are these ideas new or have your projects been using them already? What other interesting ways are you all annotating your images and are there any new integrations you’d like to see implemented, with Snyk or other tools? Tag me (@ericsmalling) with your ideas on Twitter, I’d love to hear your thoughts!
