About two years ago I wrote a Rust crate to fulfill this promise:

If a crash occurs while updating a file, the file either contains the old contents, or the new contents, nothing in between.

This crate essentially solves the following problem: when you update a file, generally you open it, truncate it, write the new contents block-by-block, and eventually close it. The problem is that if a crash occurs at any point during this process (the program segfaults, the kernel panics, the machine loses power, …), your file is left in an intermediate state where the old contents are completely lost, and the new contents are only partially written (if at all). My crate solves this problem by making file updates atomic.
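
For reference, this is roughly what the naive, non-atomic update looks like (a minimal sketch, not code from the crate; the function name is made up):

use std::fs::File;
use std::io::Write;

fn update_naively(path: &str, contents: &[u8]) -> std::io::Result<()> {
    // File::create truncates the file immediately: the old contents are
    // already gone at this point.
    let mut file = File::create(path)?;
    // A crash anywhere in here leaves a partially written file behind.
    file.write_all(contents)?;
    Ok(())
}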

This crate is publicly available as atomic-write-file and you can read its description for more details (sorry, I’m very bad at naming things).

The way the crate works is simple:

  1. Instead of opening the target file directly, the crate opens a temporary file.
  2. You write to the temporary file.
  3. Once all changes are in, the temporary file's contents are synced to storage, and the temporary file is atomically renamed over the target file, replacing it.

This is not by any means a new technique that I invented; it's a very common strategy for solving this problem. It guarantees that if a crash occurs before the rename in step 3 takes effect, the file keeps the old contents, unchanged. If a crash occurs after that, the file has the new contents.
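
A minimal sketch of the technique in plain Rust might look like this (this is not the crate's actual implementation, and the ".tmp" suffix is just an example):

use std::fs::{self, File};
use std::io::Write;

fn update_atomically(path: &str, contents: &[u8]) -> std::io::Result<()> {
    // 1. Open a temporary file next to the target file.
    let tmp_path = format!("{}.tmp", path);
    let mut tmp = File::create(&tmp_path)?;

    // 2. Write the new contents to the temporary file.
    tmp.write_all(contents)?;

    // 3. Sync the contents to storage, then atomically rename the
    //    temporary file over the target file.
    tmp.sync_all()?;
    fs::rename(&tmp_path, path)?;
    Ok(())
}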

There’s a small caveat with this technique: if a crash occurs between steps 1 and 3, then the crate will leave behind a temporary file that occupies some space on the storage device with no purpose.

This is where Linux anonymous temporary files come into play: this Linux-specific feature (the O_TMPFILE flag of open) allows you to create temporary files that live on the filesystem but are not given a path in it. If a crash occurs between steps 1 and 3, the temporary file will simply be forgotten into oblivion.
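
Here is a rough sketch of how an anonymous temporary file can be created and later given a name (again, not the crate's actual code; it assumes the libc crate, and the /test paths and the ".tmp" name are just examples):

use std::ffi::CString;
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    // O_TMPFILE takes the *directory* of the target file: the new file
    // lives on the filesystem but has no path yet.
    let mut file = OpenOptions::new()
        .write(true)
        .mode(0o644)
        .custom_flags(libc::O_TMPFILE)
        .open("/test")?;
    file.write_all(b"hello")?;
    file.sync_all()?;

    // To replace the target file, the anonymous file must first be given a
    // name, by linking its file descriptor through /proc/self/fd. This is
    // the step that requires /proc and briefly leaves a named file behind.
    let src = CString::new(format!("/proc/self/fd/{}", file.as_raw_fd())).unwrap();
    let dst = CString::new("/test/file.tmp").unwrap();
    let ret = unsafe {
        libc::linkat(
            libc::AT_FDCWD,
            src.as_ptr(),
            libc::AT_FDCWD,
            dst.as_ptr(),
            libc::AT_SYMLINK_FOLLOW,
        )
    };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }

    // Finally, rename it over the target file as usual.
    std::fs::rename("/test/file.tmp", "/test/file")
}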

Meet the opponent: btrfs

Using Linux anonymous temporary files seemed very appealing for atomic-write-file to avoid leaving leftovers behind, although I knew that they're not the perfect solution:

  • there's limited support (they're supported only on Linux, and only on some filesystems);
  • they require Linux features that may not be available all the time (the /proc filesystem);
  • sooner or later, the anonymous file needs to be given a name if we want to be able to replace the target file, and this leaves a small time window during which we could leave some cruft behind.

But none of these are huge problems, and in fact atomic-write-file makes a best-effort attempt at using anonymous temporary files, falling back to regular named temporary files if that doesn't work.

One day I was notified about an issue with using my crate on the btrfs filesystem: after a crash, my crate could break its promise and leave the file contents empty. I did some investigation on the issue, which you can read about if you're interested, and then fixed it.

This problem was specific to btrfs, and I argue that it's actually caused by a btrfs bug rather than a bug in the crate itself, but nonetheless this issue revealed a critical flaw in my project: I did not have any tests to check whether my crate was fulfilling its promise or not!

Therefore I decided to create a small test suite for Linux that simulates crashes on a variety of filesystems and inspects the contents of files after those crashes, which is what this blog post is about. This test suite now runs on my laptop and on GitHub Actions, and could potentially run in any other CI environment.

Testing strategy

Here’s the idea I came up with: I can create a virtual machine, with a virtual storage device attached to it, and a filesystem initialized with a file. The virtual machine updates the file using atomic-write-file, and then triggers a kernel panic. After that, the file is inspected for consistency.

So here is how it works in practice. There are 3 main pieces:

  1. A test binary that is responsible for updating the test file using atomic-write-file.
  2. An init.sh script that runs inside the virtual machine.
  3. The run-tests.sh script that puts everything together and starts up the virtual machine.

The test binary

The test binary is a short Rust program that I can paste here in full:

use atomic_write_file::AtomicWriteFile;
use std::io::Write;

fn main() {
    let mut file = AtomicWriteFile::open("/test/file").expect("open failed");
    file.write_all(b"hello").expect("write failed");
    file.commit().expect("commit failed");
}

Nothing special to see here: this is the minimal code required to use atomic-write-file.

The init.sh script

The init.sh script is more interesting: this script is in fact meant to run twice. The first time it runs, init.sh runs the test binary above and triggers the kernel panic. The second time, it reads the test file contents and reports them back to the run-tests.sh script for inspection.

Let’s take a closer look at what it does: first, it mounts the /proc and /sys virtual filesystems. These are needed for some basic system functions, and also to enable the Linux anonymous temporary file feature.

mount -t proc none /proc
mount -t sysfs none /sys

Then it mounts the filesystem containing the test file. Before it can do that, however, it needs to load the correct kernel module for the filesystem. Nothing too fancy here: the filesystem type is simply specified on the kernel command line (read through /proc/cmdline):

fstype=$(grep -Eo 'test.fs=\w+' /proc/cmdline | cut -d= -f2)
modprobe "$fstype" || true
mount -t "$fstype" /dev/sdb /test

Then, if this script is run for the first time, it will run the test executable and quit:

echo 'Running test binary'
atomic-write-file-test

init.sh (as the name suggests) is run as the init script of the virtual machine. In Linux, if the init process quits without invoking the proper shutdown sequence, the kernel panics. So we don't actually need to do anything special to trigger a panic. I could have inserted an echo c > /proc/sysrq-trigger line to make it more explicit, but then I figured that having that line wouldn't be nice for people who want to try the script on their own system.

You might ask: how does init.sh know whether it's the first or the second time it gets called? Again, nothing fancy here: run-tests.sh gives that hint using a kernel command line argument:

if grep -q test.verify /proc/cmdline; then
  # ...
fi

If test.verify is specified on /proc/cmdline, then it means this is the second time this script is run. In that case, the file contents are just printed on the console:

if grep -q test.verify /proc/cmdline; then
  echo 'Verifying test file contents'
  echo '-----'
  xxd /test/file
  echo '-----'
  poweroff -f
fi

This uses xxd so that, in case there are garbled characters on screen, we can still comfortably analyze the output and figure out what happened. At the end, the system is shut down more gracefully using poweroff, to avoid another kernel panic (this is not really necessary).

The run-tests.sh script

And then there’s the run-tests.sh script, which creates the virtual storage, launches the virtual machine, and inspects the file contents. This is where most of the complexity is, so let’s go step-by-step (the code below has been simplified from the original for readability):

The first thing it does is compile the test binary above. The virtual machine runs in a very bare-bones environment, so I used static linking to avoid having to copy additional libraries like libc:

target=$(uname -m)-unknown-linux-gnu
RUSTFLAGS='-C target-feature=+crt-static' cargo build --release --features "$cargo_features" --target "$target"

Then run-tests.sh creates a minimal initramfs for the virtual machine. The initramfs is a filesystem that gets mounted by Linux early at boot. It’s generally used by Linux distributions to load the main operating system, but in my case the initramfs contains everything needed for the test. In particular, it contains:

  • The test binary.
  • The init.sh script.
  • BusyBox, to get the needed utilities like xxd and poweroff.
  • The kernel modules needed to mount the filesystems, along with their dependencies.

The first 3 items are easy to set up:

cp "init.sh" -T "$initramfs_build_dir/sbin/init"
cp "atomic-write-file-test" -t "$initramfs_build_dir/bin"
cp "$(which busybox)" -t "$initramfs_build_dir/bin"
"$busybox" --install -s "$initramfs_build_dir/bin"

The kernel modules are a bit more complicated. First of all: where can we get them from? Ideally, I would have liked to download them from some minimal Linux distribution, but this turned out to be more complicated than expected. In the end, I decided to just steal them from the host operating system:

modules=/usr/lib/modules/$(uname -r)
cp -a "$modules" -t "$initramfs_build_dir/lib/modules"

while read -r mod_path; do
  if [[ "$mod_path" = *.ko.gz ]]; then
    gunzip -c "$mod_path" > "$mod_path.uncompressed"
  elif [[ "$mod_path" = *.ko.xz ]]; then
    unxz -c "$mod_path" > "$mod_path.uncompressed"
  elif [[ "$mod_path" = *.ko.zst ]]; then
    unzstd -c "$mod_path" > "$mod_path.uncompressed"
  else
    continue
  fi
  mv "$mod_path.uncompressed" "$mod_path"
done < <(modprobe --dirname "$initramfs_build_dir" --show-depends "$filesystem_type" | grep ^insmod | cut -d' ' -f2)

What this does is copy all the module files from the host, get the list of modules required by the filesystem module (and their dependencies) using modprobe, and then uncompress those modules if they’re compressed (I found this much easier than adding support for compressed modules inside the virtual machine). An alternative approach would be to simply decompress all the modules, but that’s much slower.

Once everything has been copied over and uncompressed, the initramfs is created:

initramfs=rootfs.img
mksquashfs "$initramfs_build_dir" "$initramfs"

Now it’s time to create the virtual storage device with the filesystem, which is a trivial task:

testfs=testfs.img
dd if=/dev/zero of="$testfs" bs=1M count=300
"mkfs.$filesystem_type" "$testfs"

Finally, we can start our VM using QEMU:

qemu=qemu-system-$(uname -m)
kernel=/boot/vmlinuz-$(uname -r)

"$qemu" \
  -kernel "$kernel" \
  -append "root=/dev/sda ro panic=-1 console=ttyS0 quiet test.fs=$filesystem_type" \
  -drive "index=0,media=disk,format=raw,file=$initramfs" \
  -drive "index=1,media=disk,format=raw,file=$testfs" \
  -no-reboot \
  -nographic

Again we’re stealing the kernel from the host operating system, which is also where the kernel modules come from. The drive with index=0 is the initramfs and will be visible as /dev/sda in the guest; the other drive with index=1 is our test storage and will be visible as /dev/sdb.

root=/dev/sda tells the kernel to boot from our initramfs image. panic=-1 tells the kernel to reboot immediately in case of a panic (panic=N specifies the number of seconds to wait before rebooting after a kernel panic; a negative value means reboot immediately); together with QEMU’s -no-reboot option, this makes the virtual machine exit as soon as the panic occurs. console=ttyS0 allows us to capture the output printed by the init script. test.fs=... is the option that init.sh looks for to understand what filesystem type to use.

Once that runs, the virtual machine is expected to do its job and crash with a kernel panic. We then need to restart it to verify the file contents. The QEMU command is exactly the same except for the additional test.verify option:

output=output.txt

"$qemu" \
  -kernel "$kernel" \
  -append "root=/dev/sda ro panic=-1 console=ttyS0 quiet test.fs=$filesystem_type test.verify" \
  -drive "index=0,media=disk,format=raw,file=$initramfs" \
  -drive "index=1,media=disk,format=raw,file=$testfs" \
  -no-reboot \
  -nographic \
  | tee "$output"

The output from this command is captured in a file that can be analyzed. If you remember how init.sh is written, you might have noticed that it writes the test file contents between two markers:

echo '-----'
xxd /test/file
echo '-----'

So to verify the contents of the file we simply look for those two ----- markers, take the content in between, and feed it through xxd -r (which turns the hex dump back into the original bytes):

if [[ $(sed -n '/-----/, /-----/p' "$output" | xxd -r) = hello ]]; then
  echo "Success"
else
  echo "Failure"
  exit 1
fi

And that’s it!

Running in GitHub Actions

The next step for me was to set up some automation to make sure that my test script runs whenever I make any change to the crate. Because my project was already hosted on GitHub, I decided to go with GitHub Actions. I was a bit worried that I would have to struggle to make it work: my script steals the kernel from the host, and I thought that the GitHub runners could have some protections in place to prevent me from reading the kernel. To my surprise, there were no such restrictions. In fact, I did not have to modify a single line of code to make my tests work in GitHub Actions: check out the workflow file if you’re curious.

Result

In the end, the test suite was able to reproduce the issue with btrfs and anonymous temporary files, as well as show that the fix was working as intended. This is what it looks like (the output in the screenshots was trimmed down a bit):

[Screenshot: output from the run-tests.sh script before applying the fix, showing a failure.]

[Screenshot: output from the run-tests.sh script after applying the fix, showing a success.]

Future

I’m pretty satisfied with the general approach of running in a virtual machine and simulating a crash, but the implementation has a huge limitation: because it steals the kernel from the host, it cannot simulate crashes on other kernels, operating systems, or platforms.

I think in the future I’m going to take a look at cross-rs, a tool for cross-compiling and testing Rust crates on multiple platforms. cross-rs works in a somewhat similar way to my test suite, in that it uses QEMU emulation to run tests. Maybe I can reuse their QEMU images and tweak them to run my crash tests. If that is feasible, then I will be able to extend my test suite to all major platforms and operating systems.