When evaluating CI solutions for my homelab, I found it striking that many require you to describe how to build things three times:
- Once in the remote pipeline definition.
- Once in commands or scripts that you run locally.
- Once in a Dockerfile to build the image.
Since I will never go back to how I was deploying things before the third one came into existence, let's try to eliminate the first two. In a Dockerfile, you can write additional stages that you can build and run locally or in a CI pipeline, but there are several blockers:
- Whatever speed benefits you had from the incremental compilation done by the local or remote toolchain is gone.
- You have to send heavy images between build stages that contain the toolchain and build artifacts.
- Running tests with external dependencies like a database requires more control over how images are executed by the CI runners.
This article shows how to make this work in a simple Drone CI pipeline, leveraging various BuildKit caching mechanisms to keep things fast.
§Caching strategies
Let's start with a simple Go app to demonstrate common caching strategies:
package main
import (
"fmt"
"github.com/google/uuid"
)
func RunID() string {
return fmt.Sprintf("RunID-%s", uuid.NewString())
}
func main() {
fmt.Println(RunID())
}
Initialize the Go module and lock its dependencies (information saved in go.mod and go.sum):
$ go mod init example.com/hello
$ go mod tidy
§Layer-level
Here's a simple Dockerfile to build this app:
FROM golang:1.23-alpine
WORKDIR /src
COPY . .
RUN go build -x -o /app .
CMD ["/app"]
You can run the build command:
$ docker build --pull -t hello .
[+] Building 38.7s (9/9) FINISHED
=> [1/4] FROM docker.io/library/golang:1.23-alpine@sha256:2c49857f2295e89b23b28386e57e018a86620a8fede5003900f2d138ba9c4037
=> [2/4] WORKDIR /src
=> [3/4] COPY . .
=> [4/4] RUN go build -x -o /app .
§Reducing cache invalidation
If you run this command again, you will see that all the layers have been cached. Same input, same output:
$ docker build --pull -t hello .
[+] Building 1.7s (9/9) FINISHED
=> [1/4] FROM docker.io/library/golang:1.23-alpine@sha256:2c49857f2295e89b23b28386e57e018a86620a8fede5003900f2d138ba9c4037
=> CACHED [2/4] WORKDIR /src
=> CACHED [3/4] COPY . .
=> CACHED [4/4] RUN go build -x -o /app .
Now we can start making changes to the source file to test the cache invalidation in a typical development workflow.
After changing main.go (or any part of the source code), COPY . . will invalidate all further instructions, which means that go build will re-download the dependencies and recompile them each time you build the image, which can be quite time-consuming.
To solve this issue, you can add an intermediate step that re-downloads the dependencies only when go.mod or go.sum changes:
FROM golang:1.23-alpine
WORKDIR /src
COPY go.mod go.sum .
RUN go mod download -x
COPY . .
RUN go build -x -o /app .
CMD ["/app"]
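To see the effect, you can make a trivial change to the source and rebuild: the go mod download layer should stay cached, and only the final go build step should run again (a quick check, reusing the hello tag from above):
$ echo '// trigger a source change' >> main.go
$ docker build --pull -t hello .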
The most important thing to keep in mind is to put instructions that change often at the end. And this is pretty much all we can do with layer caching.
§Multi-stage builds
There is still some room for improvement if you want to reduce the final image size:
- FROM golang:1.23-alpine provides the Go development toolchain, which is no longer needed after building the executable.
- COPY . . adds source files that are not needed to run the app.
- RUN go build -o /app . downloads dependencies, produces intermediate build artifacts, and adds debugging symbols to the executable.
All of this can be avoided by leveraging multi-stage builds and a few compiler options to keep the final image as slim as possible:
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum .
RUN go mod download -x
COPY . .
RUN go build -x -ldflags='-s -w' -trimpath -o /app .
FROM scratch
COPY --from=build /app /app
CMD ["/app"]
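To see the difference, you can build the multi-stage version under a separate tag and compare sizes; the scratch-based image only contains the stripped binary, while the earlier single-stage image carries the whole toolchain (hello:slim is just an arbitrary tag for this check):
$ docker build -t hello:slim .
$ docker image ls hello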
§Distributing the layer cache
Now you may wonder how you could distribute the layer cache so it benefits the entire fleet of CI runners. BuildKit has features to import / export layer caches to a regular registry. When all runners are configured accordingly, this is a simple way to distribute the layer cache. You can test this feature with the --cache-from and --cache-to build options:
$ docker build \
--cache-from type=registry,ref=registry/app:buildcache \
--cache-to type=registry,ref=registry/app:buildcache \
-t registry/app:latest .
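Depending on your Docker version, the default builder may not support cache export, in which case a docker-container builder is needed; mode=max additionally exports layers from intermediate stages (a sketch, assuming you are allowed to push to registry/app):
$ docker buildx create --name ci --driver docker-container --use
$ docker buildx build \
    --cache-from type=registry,ref=registry/app:buildcache \
    --cache-to type=registry,ref=registry/app:buildcache,mode=max \
    -t registry/app:latest --push .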
Unfortunately, this technique is not as efficient as it looks, because there is a trade-off between the time gained from caching and the time lost to network transfers. For instance, a compilation step often produces intermediate build artifacts much faster than they can be retrieved from the network.
Also, it has all the limitations of local layer caching. It is likely that each build contains changes to the source files, which means that starting at some instruction (like COPY . .), the cache is useless because all the following layers must be rebuilt. This is exactly what prevents us from fully optimizing the recompilation step in our current Dockerfile. And you will pay the price of pushing these new cache layers after each build even if they are not reused.
§Builder-level
In the previous section, we eliminated the bottleneck of downloading the dependencies, so now we can tackle the bottleneck of their recompilation. For that, we need an orthogonal caching mechanism introduced with BuildKit, called builder caching.
The idea is that you can mount a cache volume into the image being built, which can be reused by subsequent builds on the same builder. Contrary to layer caches, these volumes cannot be exported. That means there is no way to actively distribute this cache: each builder has to create and maintain its own copy (which is not that inefficient, for the reasons outlined in the previous section).
The Go compiler relies on the following locations to improve recompilation times:
- /go/pkg/mod: downloaded dependencies.
- ~/.cache/go-build: intermediate build artifacts.
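You can confirm these locations for the image you are using with go env (the paths may differ if GOPATH or HOME are customized):
$ docker run --rm golang:1.23-alpine go env GOMODCACHE GOCACHE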
We can mount build caches at these locations using the --mount option for RUN:
FROM golang:1.23-alpine AS build
WORKDIR /src
COPY go.mod go.sum .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go mod download -x
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go build -x -ldflags='-s -w' -trimpath -o /app .
FROM scratch
COPY --from=build /app /app
CMD ["/app"]
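To exercise the cache mounts, you can add an arbitrary dependency and rebuild (go-cmp is just an example module here, not something the app actually needs):
$ go get github.com/google/go-cmp   # updates go.mod and go.sum
$ docker build -t hello .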
If you add a new dependency or make some changes to the source files, you should see that only missing dependencies are downloaded and that the compilation step is much faster. But there are a few things to keep in mind:
- You should not assume that these caches are always available; your build should work without them. An easy way to check that is to remove them with docker builder prune -a and build again.
- Each time you invoke the Go command (or any command that reads from this kind of cache), you have to explicitly mount all the necessary caches again.
- Build caches can be used concurrently by multiple build jobs running on the same builder. To prevent issues with concurrent access, BuildKit provides various sharing policies: shared / locked / private.
- Each cache volume is associated with a target location. A rogue Dockerfile could mount such a volume and poison it, so there are security considerations when sharing builder instances between untrusted and production builds.
- BuildKit's default GC policies are pretty aggressive, since they prune build caches older than 48 hours when they exceed 512 MB, but this is configurable (see the sketch after this list).
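If you use the BuildKit builder embedded in the Docker daemon, the GC policy can be tuned in /etc/docker/daemon.json (a minimal sketch; the storage limit shown is an arbitrary example):
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "20GB"
    }
  }
}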
Here are some additional examples:
FROM alpine:3 AS apk
RUN --mount=type=cache,sharing=locked,target=/var/cache/apk \
apk add -U curl
FROM debian:bullseye-slim AS apt
RUN --mount=type=cache,sharing=locked,target=/var/lib/apt \
--mount=type=cache,sharing=locked,target=/var/cache/apt \
    rm -f /etc/apt/apt.conf.d/docker-clean && \
apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends curl
FROM python:3.12-alpine AS pip
RUN python -m venv .venv
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
./.venv/bin/pip install -r requirements.txt
FROM node:20-slim AS yarn
COPY package.json yarn.lock .
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn \
yarn install --frozen-lockfile --production
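And since Rust comes up again at the end of this article, here is what the same pattern could look like with cargo (a sketch, assuming the official rust image where CARGO_HOME is /usr/local/cargo, a Cargo.toml at the project root, and a made-up binary name "app"):
FROM rust:1 AS cargo
WORKDIR /src
COPY . .
# The binary must be copied out of the cache mount, otherwise it would not end up in an image layer.
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/src/target \
    cargo build --release && \
    cp target/release/app /usr/local/bin/app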
§Multi-step pipelines
In some pipelines, building a Docker image is only one of many steps, typically among the final ones. So if you want to test your application, you will usually have a dedicated step where you will encounter exactly the same caching issues when downloading and building the dependencies prior to running the tests.
With GitHub Actions for example, these difficulties are mostly handled by
third-party packages which provide appropriate caching hooks.
Container-oriented solutions like Drone CI or GitLab's Docker-based runners
will typically have you use a base image like golang:1.23-alpine
for the
ephemeral test container and run go test
commands directly inside it, without
giving a good solution for caching.
In this section, you will see how you can reuse the caching strategies from the previous section by implementing intermediate stages directly in the Dockerfile, which are then used by each pipeline step.
§Building a test image
First, let's add some tests:
package main
import (
"strings"
"testing"
)
func TestRunID(t *testing.T) {
if !strings.HasPrefix(RunID(), "RunID-") {
t.Fatalf("RunID doesn't start with \"RunID-\"")
}
}
Executing this test locally is just a matter of running go test, which works the same way inside a Dockerfile:
FROM golang:1.23-alpine AS base
WORKDIR /src
COPY go.mod go.sum .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go mod download -x
COPY . .
FROM base AS test
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go test -v ./...
FROM base AS build
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go build -x -ldflags='-s -w' -trimpath -o /app .
FROM scratch
COPY --from=build /app /app
CMD ["/app"]
Since the final image doesn't depend on the test target, you have to invoke it
explicitly with the --target <name>
build option:
$ docker build -t foo . --target test
If you invalidate the layer cache (for instance by adding a dummy environment
variable before the RUN
instruction) and you re-run the build command, the
tests are considered cached by go test
since there is no change to the source
code and the previous run was saved to the cache volume:
$ docker build -t foo . --target test --progress=plain
...
#10 1.006 === RUN TestRunID
#10 1.006 --- PASS: TestRunID (0.00s)
#10 1.006 PASS
#10 1.006 ok example.com/hello (cached)
...
You could just as easily add a lint stage using the golangci-lint base image:
FROM golangci/golangci-lint:v1.63.4-alpine AS lint
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
--mount=type=cache,target=/root/.cache/golangci-lint \
golangci-lint run -v
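As with the test stage, this target is not part of the final image, so it has to be requested explicitly:
$ docker build -t foo . --target lint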
This works as long as your tests can be contained inside the image build process.
§Running a test image
What happens if your tests depend on an external system like a database?
Instead of running the tests directly with RUN, you could prepare everything inside the image and use CMD to define the runtime test command. You would then set relevant environment variables like DATABASE_HOST and run both the database and the test image you've just built to execute your tests.
There are two main problems with that:
- If you use CMD go test, the tests will always be rebuilt and re-executed at runtime because they cannot rely on the build cache volumes.
- In a CI pipeline, how do you run an image that you've just built without copying it to an external registry?
§Pre-building the tests
There are various ways to solve the test caching problem in Go:
- Mount the cache directories at runtime. It's a little bit inelegant since we already have cache directories at build time, and it means additional options to pass to docker run that are not documented in the Dockerfile.
- Run go test -v -run DummyTestTarget ./... so no tests are actually executed. You must be careful not to initialize global state that depends on external dependencies.
- Run go test -v -c ./... -o ./tests to pre-compile all the test executables into ./tests/ and run them with CMD.
Let's go with the third option:
FROM base AS test
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
mkdir -p tests; \
go test -v -c ./... -o ./tests
CMD set -x; \
for f in ./tests/*; do \
"$f" -test.v; \
done
Build and run the tests:
$ docker build -t test . --target test --progress=plain
$ docker run --rm -it test
+ ./tests/hello.test -test.v
=== RUN TestRunID
--- PASS: TestRunID (0.00s)
PASS
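For the database scenario mentioned earlier, the same image can be pointed at a throwaway database container (a sketch; DATABASE_HOST, the postgres image, and the credentials are assumptions, not something the example app actually uses):
$ docker network create testnet
$ docker run -d --rm --name db --network testnet -e POSTGRES_PASSWORD=secret postgres:16-alpine
$ docker run --rm --network testnet -e DATABASE_HOST=db test
$ docker stop db && docker network rm testnet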
If your CI pipeline allows direct access to docker, then this is all you need.
§Docker-in-Docker
Most often, CI pipelines can run Docker images but do not give access to the underlying Docker daemon (for obvious security reasons). Drone CI provides a plugin system that lets you build Docker images, so you could first push the test image to a registry and then run it:
kind: pipeline
name: default
steps:
- name: build
image: plugins/docker
settings:
target: test
    repo: foo/hello-test
    tags: $DRONE_COMMIT_SHA
- name: test
image: foo/hello-test:$DRONE_COMMIT_SHA
In the execution model of Drone CI, the runner is a container image that has
access to the host Docker daemon, so both plugins/docker
and foo/hello-test
are executed by the runner on the host Docker daemon. However, the Docker
plugin doesn't have access to the host's Docker daemon, so the images are built
using Docker-in-Docker (DinD). Unfortunately, that prevents running an image
that was just built without first pushing it to an external registry.
The obvious solution to this problem would be to share the host Docker daemon by mounting its socket into the steps, but that is a clear security risk. As an alternative, you can run a Docker-in-Docker service and persist its data on the host:
kind: pipeline
name: default
steps:
- name: build
image: docker
volumes:
- name: dockersock
path: /var/run
commands:
- until docker info &>/dev/null; do sleep 1; done
- docker build --target=test -t foo/hello-test:${DRONE_COMMIT_SHA} .
- name: test
image: docker
volumes:
- name: dockersock
path: /var/run
commands:
- docker run --rm -it foo/hello-test:${DRONE_COMMIT_SHA}
services:
- name: docker
image: docker:dind
privileged: true
volumes:
- name: docker
path: /var/lib/docker
- name: dockersock
path: /var/run
volumes:
- name: docker
host:
path: /srv/docker
- name: dockersock
temp: {}
The advantage is that we can build, tag, and run images locally without having to push them to an external registry, but there are two major issues:
- The data from the DinD service must be persisted, otherwise the build cache would be deleted at the end of the pipeline. Because it is bound to a single directory on the host, you cannot have two pipelines executing simultaneously, so the runner concurrency must not exceed 1 (see the runner configuration sketch after this list).
- This pipeline requires bypassing Drone's security model, since DinD requires privileged execution and its data must be persisted on the host. Unfortunately, as soon as you enable trusted builds, a rogue pull request can change the pipeline definition to mount any other path from the host.
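For reference, the concurrency of the Docker runner is controlled by the DRONE_RUNNER_CAPACITY variable (a compose-style sketch; the host name and secret are placeholders):
services:
  drone-runner:
    image: drone/drone-runner-docker:1
    environment:
      DRONE_RPC_PROTO: https
      DRONE_RPC_HOST: drone.example.com   # your Drone server
      DRONE_RPC_SECRET: changeme          # shared RPC secret
      DRONE_RUNNER_CAPACITY: 1            # at most one concurrent pipeline
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock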
§Runner-in-Docker-in-Docker
To better isolate the runner from the host, you can run it inside DinD. See drone-dind-runner for a sample compose.yml. There is no need to have an additional nested DinD daemon, so its socket can be mounted directly, which simplifies the pipeline configuration:
kind: pipeline
name: default
steps:
- name: build
image: docker
volumes:
- name: dockersock
path: /var/run/docker.sock
commands:
- docker build --target=test -t foo/hello-test:${DRONE_COMMIT_SHA} .
- name: test
image: foo/hello-test:${DRONE_COMMIT_SHA}
volumes:
- name: dockersock
host:
path: /var/run/docker.sock
Enabling trusted builds is still required, but the security implications are slightly less concerning, at least for a private runner. (For a public runner however, direct access to the Docker daemon cannot be allowed.)
§Further notes
This article is the result of trying to optimize Drone CI builds for the backend of this blog (written in Rust). Due to complex compile-time checks, Rust builds can be very slow, and when you add all the inefficiencies we've seen, it becomes a nightmare. Over time, I applied various optimizations to reduce the build time:
- Better layer caching using cargo-chef. The principle is the same as go mod download, although it goes a step further by also pre-building the dependencies, which saves considerable time (from 2 hours down to 20 minutes between a cold and a hot build). Of course, that requires being able to export the cache to an external registry so it can be reused between steps / builds. (.drone.yml, Dockerfile)
- Using a persistent builder to decrease layer cache download times. I used BuildKit directly, but this is conceptually identical to starting a Docker daemon. Not relying on Drone's Docker plugin allows downloading the necessary cache layers lazily, instead of always downloading everything. (.drone.yml, Dockerfile)
- Replacing cargo-chef with builder caching and using the "Runner-in-Docker-in-Docker" approach to persist both the layer and the builder caches, which allows reusing temporary images without having to push them to an external registry. Nowadays cold builds take 20 minutes and hot builds take 5 minutes, largely dominated by the final Rust release compilation and linking steps. (.drone.yml, Dockerfile)
Other resources on this topic:
- Optimize cache usage in builds for a reference to the various caching techniques applicable to Docker.
- Cache storage backends for an overview of the options available in BuildKit.
- Depot.dev for a remote container build service. I'm not affiliated with them nor have I ever used their service, but they happen to publish some interesting articles like Rust Dockerfile best practices and BuildKit in depth, and their service seems to provide a secure version of what I described in this article.