also translates high-level commands for containerd
Container runtime
= containerd [container-dee] and runc
2 parts:
high-level container runtime (containerd)
long-running daemon process, handles the full lifecycle of containers
low-level container runtime (runc)
provides the abstraction over syscalls and the OS in general
e.g. when a container is created, runc communicates with the OS kernel to create a separate process for the container
How do these interact?
user enters docker run -d nginx to Docker CLI
Docker CLI sends the run command to the Docker Daemon (= container engine)
Docker Daemon validates the request and prepares the environment (checking whether the nginx image is available locally, otherwise pulling it from the registry), then instructs containerd (= container runtime) to create the container
containerd-shim component is responsible for keeping the container process running (even when the container runtime restarts)
runc uses Linux namespaces and other kernel features to actually set up the container and then exits (the container's parent process becomes containerd-shim, which monitors it and reports back to the higher levels)
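The hand-off above can be sketched as a toy call chain in Python — all function names and return values here are illustrative, not real Docker/containerd APIs:

```python
# Toy model of the "docker run -d nginx" hand-off chain.
# All names are illustrative, not actual Docker/containerd APIs.

def runc_create(image):
    # low-level runtime: sets up namespaces/cgroups, starts the
    # container process, then exits
    return {"image": image, "pid": 4242, "status": "running"}

def containerd_shim(image):
    # shim stays alive as the container's parent process and keeps
    # monitoring it even if containerd itself restarts
    container = runc_create(image)
    container["monitored_by"] = "containerd-shim"
    return container

def containerd_create(image):
    # high-level runtime: manages the container lifecycle
    return containerd_shim(image)

def docker_daemon(command, image):
    # validates the request, ensures the image is present locally,
    # then instructs containerd
    assert command == "run"
    local_images = {"nginx"}            # pretend local image cache
    if image not in local_images:
        raise LookupError(f"would pull {image} from the registry")
    return containerd_create(image)

def docker_cli(argv):
    # "docker run -d nginx" -> forwards the run command to the daemon
    command, image = argv[1], argv[-1]
    return docker_daemon(command, image)

print(docker_cli(["docker", "run", "-d", "nginx"]))
```

Each layer only talks to the one directly below it, which is why the shim can keep the container alive across containerd restarts.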
Docker terminology
image
contains a union of layered filesystems stacked on top of each other
it’s immutable, can be distributed, and is used to create and run a container on any device
Network (net)
each container gets its own virtual eth0 Ethernet interface and its own IP address
User (user)
isolates user and group IDs (UID, GID) between processes
use case: a process runs as root inside the container but maps to a regular, unprivileged user on the host
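That UID shift can be illustrated with the kernel's uid_map format ("inside-start outside-start length", as in /proc/&lt;pid&gt;/uid_map); the mapping values below are a common but arbitrary example:

```python
def map_uid(container_uid, uid_map):
    """Translate a container UID to the host UID using entries in the
    /proc/<pid>/uid_map format: 'inside-start outside-start length'."""
    for line in uid_map.strip().splitlines():
        inside, outside, length = map(int, line.split())
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    raise ValueError("UID not mapped in this namespace")

# typical rootless-container mapping: container root (UID 0) maps to
# an unprivileged host UID such as 100000
uid_map = "0 100000 65536"

print(map_uid(0, uid_map))    # container root -> host UID 100000
print(map_uid(33, uid_map))   # container UID 33 -> host UID 100033
```

An unmapped UID simply does not exist in the namespace, which is what keeps container root harmless on the host.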
Cgroup (cgroup)
for limiting and measuring the process resource usage (CPU, memory, I/O etc.)
each process only sees its own usage, not the usage of all processes
so it cannot interfere with resources it technically does not have
the kernel provides the mechanisms to enforce these limits
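A toy sketch of the idea (not the kernel interface — in cgroup v2 the real controls are files like memory.max under /sys/fs/cgroup):

```python
class MemoryCgroup:
    """Toy model of a cgroup memory controller: tracks usage per group
    and enforces a hard limit, analogous to cgroup v2 memory.max."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.usage = 0          # this group only sees its own accounting

    def charge(self, nbytes):
        if self.usage + nbytes > self.limit:
            # the real kernel would try reclaim first and may then
            # invoke the OOM killer inside this cgroup
            raise MemoryError("cgroup memory limit exceeded")
        self.usage += nbytes

web = MemoryCgroup(limit_bytes=512 * 1024 * 1024)   # 512 MiB cap
web.charge(400 * 1024 * 1024)                       # fine, within limit

try:
    web.charge(200 * 1024 * 1024)                   # would exceed 512 MiB
except MemoryError as e:
    print(e)
```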
OverlayFS and image layering
each instruction in the Dockerfile “adds” one read-only layer, so an image/container can consist of many layers
the OverlayFS mechanism creates a unified “merged” view of all the layers - processes in the container see it as one writable filesystem and can work with it
it uses the “copy on write” mechanism in the background
reading files - OverlayFS first looks at the top (writable) layer and then descends through the lower layers, returning the first file found (so higher layers can “shadow” the lower ones)
writing files - OverlayFS looks for the file; if it lives in a lower image layer, it gets copied up into the writable layer and modified there (the original file remains untouched)
deleting files - OverlayFS looks for the file and then it creates a special “whiteout” file in the upper writable layer, signaling that this file is “deleted”
the original file remains untouched
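The three rules above can be mimicked in a few lines of Python — a toy model of the lookup, copy-up, and whiteout semantics, not the kernel filesystem:

```python
# Toy model of OverlayFS lookup, copy-up, and whiteout semantics.
# Real OverlayFS is a kernel filesystem; this only mimics its rules.

WHITEOUT = object()     # marker playing the role of a whiteout file

class Overlay:
    def __init__(self, *lower_layers):
        self.lowers = list(lower_layers)   # read-only image layers, top first
        self.upper = {}                    # per-container writable layer

    def read(self, path):
        # search the upper layer first, then the lower layers top-down;
        # a whiteout hides the file from all lower layers
        for layer in [self.upper, *self.lowers]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # copy-up: writes always land in the upper layer, so the
        # lower-layer original stays untouched
        self.upper[path] = data

    def delete(self, path):
        self.read(path)                    # must exist to be deleted
        self.upper[path] = WHITEOUT        # whiteout marks it deleted

base = {"/etc/nginx.conf": "worker_processes 1;"}   # read-only image layer
fs = Overlay(base)

fs.write("/etc/nginx.conf", "worker_processes 4;")  # copy-up + modify
print(fs.read("/etc/nginx.conf"))                   # upper shadows lower
print(base["/etc/nginx.conf"])                      # original untouched

fs.delete("/etc/nginx.conf")                        # whiteout, not erased
```

Because the lower layers are never mutated, any number of containers can safely share them — which is exactly the efficiency argument below.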
why?
storage efficiency - multiple containers can share the same image base (its layers are read-only)
and each container has its own “upper writable” FS layer
images are immutable and we can rely on that
speed - a newly created container only adds one empty upper writable layer (which takes almost no time)