Welcome to Software Development on Codidact!
Unable to `mount` overlayfs in Docker container when inside an LXC with a ZFS pool
# Summary/Context

I'm currently working to improve the performance and throughput of our automation infrastructure, most of which is a combination of Bash/shell scripts, Python scripts, Docker, Jenkins, etc. We use Yocto to build embedded Linux distributions for specialized hardware, and we have a Docker image to define/run our build environment/process.

Because of how our Docker containers work, using the `-v` option to bind-mount the host file system into the container, there are race conditions whenever you want to run parallel jobs. To help remedy this, I'm using a Bash script to automate the setup of an overlay file system. That allows me to transparently present the environment to the Docker containers in the way they expect it to be, without them "realizing" that there are overlays underneath preventing actual data races.

This was tested on several Linux systems, including my laptop (Ubuntu 20.04) and several virtual machines (Ubuntu 20.04), without issues. However, I noticed that when the Docker containers exist inside an LXC-based system container (using Ubuntu 20.04 and an `ext4` file system), the `mount` command executed from inside the Docker container fails. (The Docker image has Ubuntu 14.04.5 LTS inside.)

**The question boils down to this:** How can I *successfully* run the `mount` command from within a Docker container, itself running within an LXC-based container, so that the Docker container can set up and use the overlay filesystem?

# Details

One of the servers I manage hosts several Jenkins nodes inside LXC-based system containers. All of the LXC-based Jenkins nodes run Ubuntu 20.04 LTS, exist within the same ZFS pool, and are kept up to date. (For environment details, please see the end of this post.)

The overlay setup step was written to execute as part of the "startup" process when the Docker container is launched.
The launch command looks basically as follows (with some actual data omitted/`<placeholders>`):

```
$ docker run --rm -it --privileged \
    -e USER_MOUNT_CACHE_OVERLAY=1 \
    -v <host work directory path>:/home/workdir \
    -v <host Git repositories path>:/home/localRepos \
    <image>:<tag> \
    bash
```

The Bash script uses `mktemp` to create the (work and read-write) directories that will be used for the overlay. A manual example of the `mount` command being used is:

```
$ sudo mount -t overlay sstate-cache \
    -o lowerdir=sstate-cache,upperdir=overlayfs/cache-rw,workdir=overlayfs/cache-work \
    sstate-cache
```

When I do this on my laptop or any other non-LXC node, everything works fine. However, when the Docker container running the `mount` command exists *inside an LXC node*, this error shows up:

```
mount: wrong fs type, bad option, bad superblock on /home/workdir/sstate-cache,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
```

The exit code returned is `32`, which is simply documented in the `man mount` pages as "mount failure". The contents of `/var/log/syslog` don't seem to have anything relevant. The `dmesg` command shows this:

```
overlayfs: filesystem on 'overlayfs/cache-rw' not supported as upperdir
```

I've been trying to fix this since last week, but I still have no idea why this error shows up nor how to fix it. Many search results have not been relevant to my specific case.

# Some Things I've Tried

I found that Docker required the `--privileged` option in order to allow the `mount` command to work, so that's the reason it's there. This fixed the original mount issue on my laptop and other VMs. (For the LXC nodes, it simply prevents a Docker crash; you'd see Go stack traces otherwise.)

But LXD/LXC has its own security options. I had already set its `security.nesting` option to `true` a few years back to let Docker containers run; this has not been an issue.
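For context, the `mktemp`-based setup step described above might look roughly like this minimal sketch. This is *not* the actual script: the variable names and the `OVERLAY_ROOT` layout are illustrative; only the `sstate-cache`/`overlayfs` path names come from the manual example. The final `mount` is echoed as a dry run here, since the real script executes it with root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the overlay setup step; names are illustrative.
set -euo pipefail

LOWER_DIR="sstate-cache"                    # shared, read-only cache layer
OVERLAY_ROOT="$(mktemp -d overlayfs.XXXXXX)"
UPPER_DIR="$OVERLAY_ROOT/cache-rw"          # per-job writes land here
WORK_DIR="$OVERLAY_ROOT/cache-work"         # overlayfs-internal scratch dir
mkdir -p "$LOWER_DIR" "$UPPER_DIR" "$WORK_DIR"

# Mount the overlay on top of the lower directory itself, so the build
# inside the container keeps using its usual path without noticing the
# overlay. (Echoed here as a dry run; the real script runs the command.)
echo sudo mount -t overlay "$LOWER_DIR" \
    -o "lowerdir=$LOWER_DIR,upperdir=$UPPER_DIR,workdir=$WORK_DIR" \
    "$LOWER_DIR"
```

Mounting the overlay over the lower directory itself is what keeps the setup transparent to the Docker containers.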
I tried making the LXC container itself privileged with:

```
$ lxc config set <node> security.privileged true
```

where `<node>` is the name of the LXC node, but it made no difference.

Note that replacing/destroying the ZFS pool and/or LXC itself are *not* valid options.

# Remarks (Could Be Wrong)

While the file system of the LXC-based node is `ext4`, as can be confirmed by looking at the filesystem table,

```
$ cat /etc/fstab
LABEL=cloudimg-rootfs   /        ext4   defaults        0 0
```

the entire LXC-based container is stored in a ZFS pool. A few years ago, I had enabled ZFS compression on the physical host, which should've been completely transparent not only to the LXCs, but also to the Docker containers. However, I observed issues with the `du` command, where it would calculate *incorrect* disk usage results, which then caused other parts of our build process to fail. While I can't be certain, and however unlikely this may be (I have no way to test/verify it), I have been asking myself whether there are some other ZFS options that could be affecting this. To me, it seems more likely that existing LXC options might do the trick, but I'm not sure which ones those could be.

I already took a look at [this question elsewhere](https://askubuntu.com/questions/376345/allow-loop-mounting-files-inside-lxc-containers), but I've not found any similar error messages.

# Environment Details (Host, LXCs, Docker)

**Operating System (Physical Host)**

```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
```

**ZFS Version (Physical Host)**

```
$ zfs -V
zfs-0.8.3-1ubuntu12.4
zfs-kmod-0.8.3-1ubuntu12.4
```

**LXC Version (Physical Host, Snap Package)**

```
$ lxc --version
4.7
```

**Docker Version (Inside LXC, Jenkins Node)**

```
$ docker --version
Docker version 19.03.12, build 48a66213fe
```

**Operating System (Inside Docker Container)**

```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty
```
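Related to the remarks about ZFS above: one generic check (not output from the affected node) would be to ask the kernel which filesystem type actually backs the would-be upper directory, since the `dmesg` message complains specifically about the upperdir. The path is taken as a parameter here; inside the LXC node it would be `overlayfs/cache-rw`.

```shell
# Print the filesystem type the kernel reports for a given path.
# Overlayfs rejects some filesystem types as upperdir, so this shows
# what the upper directory actually sits on from the kernel's view.
TARGET="${1:-.}"    # e.g. overlayfs/cache-rw inside the LXC node
stat -f -c %T "$TARGET"
```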