Deep Dive into Docker Images
James Reed
Infrastructure Engineer · Leapcell

In-depth Analysis of Docker Images
I. Overview of Docker Images
As the foundation of containers, a Docker image represents the content of a container's file system. It is a read-only template used to create Docker containers. Technically, Docker images use a layered structure: except for the base image, every image is generated by overlaying new content on top of an existing image. The metadata of each layer is stored in a JSON file. This metadata describes not only the static content of the file system but also dynamic information, such as the creation time of the image and its build instructions.
1.1 Building Docker Images
Docker images are usually built from a Dockerfile, a text file containing a series of instructions that define the base environment of the image, install software packages, copy files, and so on. During the build, Docker creates the layers of the image step by step, in the order the instructions appear in the Dockerfile. Each executed instruction generates a new image layer, and these layers are cached for reuse in subsequent builds.
1.2 Methods to Improve Image Building Efficiency
- Make Rational Use of the Caching Mechanism: When building an image, Docker checks whether the layer that the current instruction would generate already exists in the local cache. If it exists and meets certain conditions (the same parent layer, the same instruction, unchanged file content, and so on), the cached layer is used directly and the step is not rebuilt. For example, for a RUN instruction Docker compares the instruction text against the cached layers built on the same parent; it does not run the command again to compare results, so an identical instruction reuses the cached layer.
- Optimize the Structure of the Dockerfile: Put the instructions that rarely change (such as installing basic software packages) at the front. When the Dockerfile is modified later, the existing layers can be reused as long as these early instructions remain unchanged, which reduces the build time.
- Build in Stages (Multi-stage Builds): Split a complex build into multiple stages, each responsible for a specific task. For example, when an image requires a compilation step, compile in one stage and then copy only the compiled result into the final image in another stage, which reduces the size of the final image.
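The effect of instruction ordering on cache reuse can be illustrated with a small model. This is only a sketch, not Docker's implementation: it assumes a layer is reusable when its instruction and all earlier instructions are unchanged, and it represents a changed source file by a changed instruction string. The function name and the sample Dockerfiles are hypothetical.

```python
def reusable_layers(previous_build, current_build):
    """Count the leading instructions whose cached layers can be reused.

    Simplified model: a layer is reusable only if its instruction and every
    instruction before it are unchanged since the previous build.
    """
    reused = 0
    for old, new in zip(previous_build, current_build):
        if old != new:
            break  # the first change invalidates this layer and all later ones
        reused += 1
    return reused


# Hypothetical Dockerfiles written as lists of instruction strings.
# "app-v1" / "app-v2" stand in for application files that changed between builds.
stable_first_v1 = ["FROM ubuntu:20.04", "RUN apt-get install -y curl", "COPY app-v1 /app"]
stable_first_v2 = ["FROM ubuntu:20.04", "RUN apt-get install -y curl", "COPY app-v2 /app"]

volatile_first_v1 = ["FROM ubuntu:20.04", "COPY app-v1 /app", "RUN apt-get install -y curl"]
volatile_first_v2 = ["FROM ubuntu:20.04", "COPY app-v2 /app", "RUN apt-get install -y curl"]

print(reusable_layers(stable_first_v1, stable_first_v2))      # 2
print(reusable_layers(volatile_first_v1, volatile_first_v2))  # 1
```

With the stable instruction first, the package installation layer survives the application change; with the volatile instruction first, it has to be rebuilt.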
II. Commands Related to Docker Images
The Docker client provides a rich set of commands to interact with the Docker daemon to complete various tasks related to images:
- List Images: The docker images command lists the images on the Docker host. You can use the -f parameter for filtering. For example, docker images -f dangling=true lists all the images without tags.
- Build Images: The docker build command builds a new image from a Dockerfile. For example, in docker build -t myimage:latest . the -t parameter specifies the tag of the image, and the trailing dot (.) selects the current directory as the build context, which is where the Dockerfile is looked up.
- View Image History: The docker history command lists the build history of an image, showing the creation time, the executed instruction, and the size of each layer.
- Import Images: The docker import command creates a new file system image from a tarball.
- Pull Images: The docker pull command pulls the specified image from an image registry to the local host.
- Push Images: The docker push command pushes a local image to the specified image registry.
- Delete Images: The docker rmi command deletes local images. If an image is referenced by multiple tags, you need to remove all tags first or use the -f parameter to force deletion.
- Save Images: The docker save command saves an image as a tar file, which is convenient for moving the image between environments.
- Search Images: The docker search command searches Docker Hub for images that match the given conditions.
- Tag Images: The docker tag command adds a tag to an image, which helps with version management and identification.
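When these commands are driven from a script, it helps to assemble the argument list separately from running it. The sketch below only builds the argument lists for the two examples given above (docker images -f dangling=true and docker build -t myimage:latest .); the helper names are hypothetical and nothing is executed.

```python
def images_argv(dangling_only=False):
    """Argument list for `docker images`, optionally filtered to untagged images."""
    argv = ["docker", "images"]
    if dangling_only:
        argv += ["-f", "dangling=true"]
    return argv


def build_argv(tag, context="."):
    """Argument list for `docker build -t <tag> <context>`."""
    return ["docker", "build", "-t", tag, context]


print(" ".join(images_argv(dangling_only=True)))  # docker images -f dangling=true
print(" ".join(build_argv("myimage:latest")))     # docker build -t myimage:latest .
```

On a host with Docker installed, such a list could be passed to subprocess.run(argv, check=True) from the standard library.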
III. The Download Process of Docker Images (pull Operation)
Docker adopts a typical C/S (Client/Server) architecture. Client commands such as docker pull are eventually sent to the Docker daemon (server side) for processing. When docker pull is executed, the process is as follows:
- The Docker client assembles the configuration and parameters and sends the pull instruction to the Docker server.
- After the server receives the instruction, it hands it over to the corresponding handler. The handler starts a CmdPull task, which was registered when the Docker daemon started.
- Based on the registry address, repository name (repo name), image name, and tag, the Docker daemon finds and downloads the image through the following steps (the endpoints shown belong to the legacy Registry v1 API):
  - Get all the image IDs under the repository: Through the GET /repositories/{repo}/images interface.
  - Get the information of all tags under the repository: Through the GET /repositories/{repo}/tags interface.
  - Find the image ID (UUID) that corresponds to the tag and download the image.
  - Get the history of the image and download its layers one by one: Through the GET /images/{image_id}/ancestry interface. If a layer already exists locally, its download is skipped; otherwise the following two requests are made for it:
    - Get the JSON information of the image layer: Through the GET /images/{image_id}/json interface.
    - Download the image content: Through the GET /images/{image_id}/layer interface.
- After the download is completed, store the image content in the local UnionFS (Union File System), and add the information of the newly downloaded image to the TagStore.
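The "skip layers that already exist locally" step of the pull process can be sketched as follows. This is a simplified in-memory model, not the daemon's code: the registry is a plain dictionary standing in for the HTTP endpoints, and the function name, layer IDs, and contents are hypothetical.

```python
def pull(ancestry, local_layers, fetch_layer):
    """Download only the layers that are missing locally.

    ancestry:     layer IDs that make up the image
    local_layers: dict of layer ID -> content already stored locally (mutated)
    fetch_layer:  callable that downloads one layer by ID
    Returns the list of layer IDs that were actually downloaded.
    """
    downloaded = []
    for layer_id in ancestry:
        if layer_id in local_layers:
            continue  # already present: skip the download
        local_layers[layer_id] = fetch_layer(layer_id)
        downloaded.append(layer_id)
    return downloaded


# Hypothetical in-memory "registry" standing in for the remote endpoints.
registry = {"base": b"rootfs", "deps": b"packages", "app": b"binary"}
local = {"base": b"rootfs"}  # the base layer was pulled earlier

fetched = pull(["app", "deps", "base"], local, registry.__getitem__)
print(fetched)        # ['app', 'deps']
print(sorted(local))  # ['app', 'base', 'deps']
```

Because the base layer is already stored, only the two layers above it are transferred, which is why pulling a second image built on the same base is usually much faster.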
IV. Storage of Docker Images
4.1 UnionFS and aufs
UnionFS is the basis for Docker to implement hierarchical images. It is a file system service that supports transparently overlaying multiple branches of file systems on systems such as Linux, FreeBSD, and NetBSD to form a unified file system. In Docker, images are stored in a layered form. The application layer sees a complete file system, while the underlying layer manages the content and relationships of each image layer through UnionFS.
aufs (Another UnionFS) is one of the storage drivers Docker has commonly used; others include devicemapper and overlay2. Users can choose an appropriate storage driver according to their needs, or even implement their own driver.
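The idea of overlaying branches into one unified view can be modelled in a few lines. This sketch treats each layer as a dictionary of path to content and uses a sentinel object to represent a deletion recorded in an upper layer; both are simplifications for illustration and do not reflect how aufs stores data on disk.

```python
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


def union_view(layers):
    """Merge layers (ordered bottom to top) into the file system a container sees."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                merged.pop(path, None)  # an upper layer hides the file
            else:
                merged[path] = content  # an upper layer adds or overrides the file
    return merged


# Hypothetical layers: a base layer and one layer stacked on top of it.
base = {"/etc/hostname": "base", "/bin/sh": "shell"}
patch = {"/etc/hostname": "patched", "/bin/sh": WHITEOUT, "/run.sh": "script"}

print(union_view([base, patch]))
# {'/etc/hostname': 'patched', '/run.sh': 'script'}
```

The lower layer is never modified; the upper layer only records differences, which is what allows many images to share the same base layers.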
4.2 The Storage Structure of aufs Images
Take the ubuntu:20.04 image as an example (assuming Docker version 20.10.0 with the aufs storage driver) and use docker history to view the image history:

    $ docker images
    REPOSITORY          TAG     IMAGE ID       CREATED        VIRTUAL SIZE
    myregistry/ubuntu   20.04   8b24f7a1cb23   2 months ago   256.3 MB

    $ docker history 8b24
    IMAGE          CREATED        CREATED BY                                      SIZE
    8b24f7a1cb23   2 months ago   /bin/sh -c #(nop) CMD ["bash"]                  0 B
    b17ee223aa89   2 months ago   /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.9 kB
    c18294cc5170   2 months ago   /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   195.5 kB
    d4fd76b09ce9   2 months ago   /bin/sh -c #(nop) ADD file:0018ff77d038472f52   256.1 MB
    511136ea3c5a   3 years ago                                                    0 B

It can be seen that the ubuntu:20.04 image contains multiple layers. The aufs data is stored in the /var/lib/docker/aufs directory, which contains three main folders:
- layers: Records which layers each image consists of.
- diff: Stores the difference content between each image and the previous image, that is, the actual data of the current image layer.
- mnt: As the mount point provided by the UnionFS to the outside, each running container has a corresponding folder in this directory, which is used to provide a unified file access interface.
In addition, Docker saves metadata in JSON format for each image layer. The exact on-disk layout depends on the Docker version and storage driver; in the layout used in this example it is stored in /var/lib/docker/graph/<image_id>/json, for example:

    {
      "id": "8b24f7a1cb23146e20erewtewtertewrwc0f82943f4ab8c097e7",
      "parent": "b17ee223aa89d1b136ea55eqweqweqwrewra6c88d93e1ad7c",
      "created": "2024-12-21T02:11:06.735146646Z",
      "container": "c9a3eda5951d28aa8dbe5qwrqwrewrtw886d0a8e7a710132a38ec",
      "container_config": {
        "Hostname": "43bd710ec89a",
        "Domainname": "",
        "User": "",
        "Memory": 0,
        "MemorySwap": 0,
        "CpuShares": 0,
        "Cpuset": "",
        "AttachStdin": false,
        "AttachStdout": false,
        "AttachStderr": false,
        "PortSpecs": null,
        "ExposedPorts": null,
        "Tty": false,
        "OpenStdin": false,
        "StdinOnce": false,
        "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
        "Cmd": ["/bin/sh", "-c", "#(nop) CMD [\"bash\"]"],
        "Image": "b17ee223aa89d1b136ea55e4421f4ce413dfc6c0cc6b2186dea6c88d93e1ad7c",
        "Volumes": null,
        "WorkingDir": "",
        "Entrypoint": null,
        "NetworkDisabled": false,
        "MacAddress": "",
        "OnBuild": [],
        "Labels": null
      },
      "docker_version": "20.10.0",
      "config": {
        "Hostname": "43bd710ec89a",
        "Domainname": "",
        "User": "",
        "Memory": 0,
        "MemorySwap": 0,
        "CpuShares": 0,
        "Cpuset": "",
        "AttachStdin": false,
        "AttachStdout": false,
        "AttachStderr": false,
        "PortSpecs": null,
        "ExposedPorts": null,
        "Tty": false,
        "OpenStdin": false,
        "StdinOnce": false,
        "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
        "Cmd": ["bash"],
        "Image": "b17ee223aa89qwrewtretgertwerewrq6dea6c88d93e1ad7c",
        "Volumes": null,
        "WorkingDir": "",
        "Entrypoint": null,
        "NetworkDisabled": false,
        "MacAddress": "",
        "OnBuild": [],
        "Labels": null
      },
      "architecture": "amd64",
      "os": "linux",
      "Size": 0
    }

At the same time, the /var/lib/docker/graph/<image_id>/layersize file saves the size information of the image layer.
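Since every layer's metadata names its parent, the full layer stack of an image can be recovered by following the "parent" field until a layer without one is reached. The sketch below does this over shortened, hypothetical metadata documents held in memory; the IDs and sizes are illustrative and only the "id", "parent", and "Size" fields from the example above are used.

```python
import json


def layer_chain(image_id, metadata):
    """Follow "parent" links from an image down to its base layer."""
    chain = []
    current = image_id
    while current is not None:
        chain.append(current)
        current = metadata[current].get("parent")  # None when there is no parent
    return chain


# Hypothetical, shortened metadata documents keyed by layer ID.
raw = {
    "8b24": '{"id": "8b24", "parent": "b17e", "Size": 0}',
    "b17e": '{"id": "b17e", "parent": "c182", "Size": 1900}',
    "c182": '{"id": "c182", "Size": 195500}',
}
metadata = {layer_id: json.loads(text) for layer_id, text in raw.items()}

chain = layer_chain("8b24", metadata)
print(chain)                                    # ['8b24', 'b17e', 'c182']
print(sum(metadata[i]["Size"] for i in chain))  # 197400
```

This is essentially the traversal that docker history presents: one row per layer, from the top layer down to the base.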
V. The Creation and Caching Mechanism of Docker Images
When using docker build to create an image, Docker will use the caching mechanism to improve the building efficiency. Take the following Dockerfile as an example:

    FROM ubuntu:20.04
    RUN apt-get update
    ADD run.sh /
    VOLUME /data
    CMD ["./run.sh"]

During the building process, Docker will execute according to the order of the instructions:
- Process the FROM Instruction: The Docker daemon first looks for the ubuntu:20.04 image locally. If it does not exist, the daemon pulls it from the image registry and obtains its JSON file containing metadata.
- Process the RUN Instruction: If no cache is available, Docker executes apt-get update, and the resulting file system changes (such as the updated package lists) are saved in the /var/lib/docker/aufs/diff/<image_id>/ directory. At the same time, the container_config.Cmd field in the JSON file records the executed instruction. On the next build, if the parent of the new layer is still ubuntu:20.04 and the command recorded in Cmd is the same, the layers are considered identical and the cached layer is reused without rebuilding.
- Process the ADD and COPY Instructions: For ADD or COPY, Docker determines whether the layers are the same by calculating a hash of the files involved. In the JSON file, the Cmd field corresponding to the ADD instruction records the hash string of the file. The cached layer is reused only when the file content, file name, and so on are exactly the same.
However, the caching mechanism has limitations. For commands that rely on external resources (such as apt-get update fetching from external package sources, or curl downloading external files), Docker cannot detect that the external content has changed, because the instruction text is still the same. In this case, you can pass the --no-cache parameter to docker build to disable the cache and rebuild the image. Therefore, when writing a Dockerfile, developers need to take the caching mechanism into account and follow the official Docker best practices to keep image builds both correct and efficient.
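The two comparison rules above (instruction text for RUN, file hash for ADD and COPY) can be illustrated by deriving a lookup key per layer. This is a sketch under stated assumptions: Docker does not necessarily compute a single key like this, and the function name, the SHA-256 choice, and the sample file contents are all hypothetical.

```python
import hashlib


def cache_key(parent_id, instruction, file_contents=None):
    """Build an illustrative cache lookup key for a layer.

    RUN-style instructions are keyed by parent + instruction text only.
    ADD/COPY-style instructions also mix in a hash of the copied bytes.
    """
    digest = hashlib.sha256()
    digest.update(parent_id.encode())
    digest.update(b"\0")
    digest.update(instruction.encode())
    if file_contents is not None:
        digest.update(b"\0")
        digest.update(hashlib.sha256(file_contents).digest())
    return digest.hexdigest()


# Same parent and same text: the key matches even if the package mirrors changed.
run_a = cache_key("ubuntu:20.04", "RUN apt-get update")
run_b = cache_key("ubuntu:20.04", "RUN apt-get update")
print(run_a == run_b)  # True

# Same instruction text, different file content: the key changes.
add_old = cache_key(run_a, "ADD run.sh /", b"echo v1")
add_new = cache_key(run_a, "ADD run.sh /", b"echo v2")
print(add_old == add_new)  # False
```

The first comparison shows why a stale apt-get update layer can be reused silently, and the second shows why editing run.sh invalidates the ADD layer and everything after it.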
VI. The Relationship between Docker Images and Containers
Docker containers are running instances of images. Images contain static file system content, and containers add dynamic runtime state on top of it. The runtime configuration of a container (everything except the file system content) is stored in the JSON file of the image. For example:
- Environment Variables: Such as ENV FOO=BAR, which defines an environment variable available while the container runs.
- Data Volumes: The data volumes declared by VOLUME /some/path are added dynamically when the container runs and are not part of the fixed content of the image layers.
- Exposed Ports: EXPOSE 80 records the ports that the container needs to expose while running.
- Execution Entry: CMD ["./myscript.sh"] defines the command to be executed when the container starts.
When starting a container, the Docker daemon uses the image content as the root file system (rootfs) of the container and reads the dynamic information in the JSON file to configure the container's runtime state. In the architecture described here, each running container is a child process of the Docker daemon, which manages the life cycle and resource allocation of the container (in current releases the daemon delegates this to containerd, so container processes are parented by a containerd shim rather than by the daemon itself).
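How the image's JSON configuration becomes a container's runtime settings can be sketched as a merge of image defaults with options supplied at start time. The field names Env, Cmd, ExposedPorts, and Volumes follow the JSON example shown earlier; the function, its parameters, and the idea of passing overrides this way are assumptions made for illustration, not the daemon's actual API.

```python
def container_settings(image_config, cmd_override=None, extra_env=None):
    """Combine image metadata with options given when the container starts."""
    env = list(image_config.get("Env") or [])
    env.extend(extra_env or [])
    return {
        "Env": env,
        "Cmd": cmd_override or image_config.get("Cmd"),
        "ExposedPorts": sorted(image_config.get("ExposedPorts") or {}),
        "Volumes": sorted(image_config.get("Volumes") or {}),
    }


# Hypothetical config produced by ENV FOO=BAR, VOLUME /some/path,
# EXPOSE 80 and CMD ["./myscript.sh"].
image_config = {
    "Env": ["FOO=BAR"],
    "Cmd": ["./myscript.sh"],
    "ExposedPorts": {"80/tcp": {}},
    "Volumes": {"/some/path": {}},
}

default = container_settings(image_config)
custom = container_settings(image_config, cmd_override=["bash"], extra_env=["DEBUG=1"])

print(default["Cmd"])  # ['./myscript.sh']
print(custom["Cmd"])   # ['bash']
print(custom["Env"])   # ['FOO=BAR', 'DEBUG=1']
```

The image itself is not changed by either call; each container gets its own settings derived from the same read-only metadata.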
VII. Deletion of Docker Images
Images are stored locally in the UnionFS format, and the docker rmi command can be used to delete images. The following points need to be noted when deleting:
- Image Reference Relationship: Images have a concept of "reference", that is, one image can be referenced by multiple tags. When deleting an image by tag, the tag is removed first (untag operation). If the image is still referenced by other tags, the image itself is kept; to delete it, remove all of its tags first or use the -f parameter to force deletion.
- Deletion of Multi-layer Images: If an image contains multiple layers and the middle layers are not referenced by other images, when deleting this image, all the unreferenced image layers will be deleted together.
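The two deletion rules above amount to reference counting over the layer chain: untag first, then walk down from the top layer and stop at the first layer that something else still needs. The following is a minimal in-memory sketch of that idea, not Docker's implementation; the function name, data structures, and layer names are hypothetical.

```python
def remove_tag(tag, tags, parents, layers):
    """Untag an image, then delete layers that nothing references any more.

    tags:    dict of tag -> top layer ID (mutated)
    parents: dict of layer ID -> parent layer ID (None for a base layer)
    layers:  set of layer IDs stored locally (mutated)
    Returns the layer IDs that were deleted, top-most first.
    """
    layer = tags.pop(tag)  # untag first
    deleted = []
    while layer is not None:
        still_tagged = layer in tags.values()
        has_child = any(parents[other] == layer for other in layers if other != layer)
        if still_tagged or has_child:
            break  # something else still needs this layer and its ancestors
        layers.discard(layer)
        deleted.append(layer)
        layer = parents[layer]
    return deleted


# Two hypothetical images sharing the "deps" and "base" layers.
parents = {"base": None, "deps": "base", "app1": "deps", "app2": "deps"}
tags = {"app:1": "app1", "app:2": "app2"}
layers = {"base", "deps", "app1", "app2"}

first = remove_tag("app:1", tags, parents, layers)
second = remove_tag("app:2", tags, parents, layers)
print(first)   # ['app1']
print(second)  # ['app2', 'deps', 'base']
```

Removing the first tag deletes only its top layer because the shared layers are still used by the second image; removing the second tag then releases the whole chain.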
