securing nixos with sandboxing
One of the things I love about NixOS is how much you can over-engineer everything and share it between your different computers. Once I add something to my OS, it feels like it can stick with me for life, rather than being something I might accidentally lose all traces of when I reinstall my PC.
Because of this I decided to invest some time (ok maybe a lot of time) into trying to make my setup reasonably secure.
general strategy
I can think of two main strategies for implementing sandboxing:
- Create a sandbox with all my stuff in it. For each “project”, create a new instance of the sandbox. If malicious programs run they will only be able to steal data from that “project”.
- Create a sandbox per application.
I ended up picking option 2 because I figured it provides better isolation while also working well with Nix (Nix makes it really easy to control what ends up in your $PATH). Instead of doing:
{
  home.packages = with pkgs; [ vim ];
}
We can just do
{
  home.packages = with pkgs; [ (mySandboxingFunction vim) ];
}
and create a function called mySandboxingFunction that puts the argument inside a sandbox.
sandbox backend
There are a few options we have for actually implementing the sandbox:
- Docker containers. We can benefit from the familiar interface and the fairly sane defaults in terms of security.
- Bubblewrap. This provides some performance benefits over Docker because it crafts the namespaces manually. Docker has some overhead on top of the namespaces because it constructs an overlayfs so that a container can’t modify its image. Docker also sets up its own networking. We can skip both of these to gain performance.
- Flatpak. Mainly handy because other people have done the work for us.
- Virtual machines, specifically using QEMU because it’s quick and simple.
I experimented with all of the options and in the end decided to use each of them for different purposes. Let’s go into them one by one.
docker
I found that Docker works best for sandboxing command line tools that I don’t use every day, and also those that might require internet access. The reason I specified “don’t use every day” is that Docker containers take around 300ms to start up, which gets annoying to the point where I use Bubblewrap for those instead (see the bubblewrap section below). Anyway, the general idea is to avoid creating too many Docker images, and instead create a single really barebones “base image”. Then we mount the host /nix/store into the container read-only, and suddenly all the tools from our host become available inside the container.
A great side effect of this is a custom shell script I added to my path called box. When you run box, it starts a Docker container with the host /nix/store mounted as above, and I am free to mess around as much as I want inside the sandbox without fear of breaking things.
The question then becomes: what do we need in our minimal Docker image? What I found was that because the Nix store doesn’t allow setuid binaries, it turned out to be better to just use the Dockerfile to install sudo and set up a user. Originally I thought I wouldn’t need sudo, but I eventually found it’s a nice thing to have when messing around in the container.
The Dockerfile I used was as follows (note it is stored as a Nix variable):
let dockerFileDir = pkgs.writeTextDir "Dockerfile" ''
  FROM alpine@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
  RUN adduser -s ${pkgs.bash}/bin/bash -G users -D sprrw && \
      apk add sudo && \
      echo 'sprrw ALL=(ALL:ALL) NOPASSWD:SETENV: ALL' > /etc/sudoers && \
      mkdir -p /home/sprrw/.config /home/sprrw/.local/share /home/sprrw/.cache && chown -R sprrw: /home/sprrw
'';
The mkdir calls are a hack to deal with some directory ownership issues that I ran into later and have been procrastinating on fixing properly :)
Anyway, I also created an entrypoint script. This script is executed before the actual sandboxed binary runs and essentially applies my home manager dotfiles into the home directory of the Docker container. The main reason for this is so that tools like vim and yazi pick up my configs when debugging the container (particularly with the box command).
let dockerInit =
  execMode:
  pkgs.writeShellScript "dockerinit" ''
    set -e
    ${
      if execMode then
        ""
      else
        ''
          cp -r /etc/hm-package/home-files/.* ~/
          chmod -R u+w ~/.* &>/dev/null || true
        ''
    }
    export PATH="/etc/hm-package/home-path/bin:$PATH"
    exec "$@"
  '';
The dockerInit function takes an argument called execMode. The reason for this is that I don’t want to set up my dotfiles when using docker exec, but I do want to on the original docker run. The docker exec support matters because I want a command called box-enter which starts an interactive shell inside an already-running Docker container. That shell still needs the PATH set up correctly.
You might be wondering what /etc/hm-package is. Essentially I have the following in my system configuration:
environment.etc."hm-package" = {
  source = config.home-manager.users.sprrw.home.activationPackage;
};
This creates a symbolic link at /etc/hm-package to my home manager activation package. I pretty much just put it here for convenience to make it easy to access home manager generated stuff like dotfiles and programs.
Next up I need a script to:
- Build the Docker image if it doesn’t exist
- Run the Docker image with relevant mounts
- Have the Docker image call the entrypoint script (`dockerinit`)
- Have the entrypoint script call the sandboxed program (as the script ends with `exec "$@"`, this is the same as running `dockerinit sandboxprogram ...args`)
sprrw.sandboxing.runDocker = pkgs.writeShellScript "run-docker" ''
  if ! docker inspect usermapped-img &>/dev/null; then
    docker build -t usermapped-img ${dockerFileDir}
  fi
  ${pkgs.python3}/bin/python ${./start-sandbox.py} ${dockerInit false} "$@"
'';
Firstly note that I am setting sprrw.sandboxing.runDocker. I put this derivation as a NixOS module option so that I can easily access it from any NixOS module in my config by doing config.sprrw.sandboxing.runDocker.
As you can see, we first build the image if it doesn’t exist. Note that I should probably check the digest of the image against some expected value here to keep it deterministic with the Nix config, but I couldn’t be bothered. The side effect is that I need to delete the Docker image whenever I change the Dockerfile, otherwise the image will be out of date.
Secondly, we run a Python script, start-sandbox.py; the reason for this will be described soon. We pass it the dockerInit script with execMode set to false, as well as the provided arguments.
The reason for start-sandbox.py is that callers of the sandbox need to provide both arguments to docker run and the sandboxed program along with its own arguments. One way I could do this is to generate a new pkgs.writeShellScript per program and use a Nix function to bake in the desired docker run arguments. This is what I used to do, but it generated a new derivation for every sandboxed program when it really didn’t need to. Instead I settled on the single run-docker script seen above, whose arguments determine both the docker run arguments and the program arguments. That splitting is what start-sandbox.py is for.
start-sandbox.py is called like this:
$ start-sandbox.py /path/to/dockerinit dockerArg1 dockerArg2 DOCKERIMG sandboxProg arg1 arg2 arg3
Firstly we pass in the path to the dockerinit script: it is a Nix store path, so the easiest way for Python to find it is via a command line argument. After that come the arguments that will be passed to docker run. Eventually the literal argument DOCKERIMG must appear; it is a marker that separates the docker run arguments from the actual program arguments. Here is an example:
$ start-sandbox.py /path/to/dockerinit -it -v ~/.config/nvim:/home/sprrw/.config/nvim:ro DOCKERIMG /nix/store/.../nvim file.txt
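To illustrate, the argument splitting inside start-sandbox.py boils down to something like this (a sketch of the idea; the function name is mine, not from the actual script):

```python
MARKER = "DOCKERIMG"

def split_args(argv):
    """Split the script's arguments into (dockerinit path,
    docker-run args, program args). argv is everything after
    the script name: the dockerinit path, then docker-run flags,
    then the literal MARKER, then the sandboxed program and its
    arguments."""
    docker_init, rest = argv[0], argv[1:]
    i = rest.index(MARKER)  # raises ValueError if the marker is missing
    return docker_init, rest[:i], rest[i + 1:]

# Corresponding to the example invocation above:
init, docker_args, prog_args = split_args(
    ["/path/to/dockerinit", "-it", "DOCKERIMG", "/nix/store/xxx/nvim", "file.txt"]
)
```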
start-sandbox.py also supplies the default arguments to docker run. The full file can be found here, but the relevant part is below:
args = [
    "docker", "run",
    "--rm",
    "--hostname", "sandbox",
    "-v", "/nix:/nix:ro",
    "-v", "/etc/fonts:/etc/fonts:ro",
    "-v", "/etc/hm-package:/etc/hm-package:ro",
    "-v", f"{os.path.expanduser('~/nixos')}:/home/sprrw/nixos:ro",
    "-u", "1000:100",
    "-e", "TERM",
    *beforeTargetArgs,
    "usermapped-img",
    dockerInit,
    *afterTargetArgs,
]
I think the arguments are fairly self explanatory.
From here we can start sandboxing programs in Nix! The way we do so is as follows:
pkgs.writeShellApplication {
  name = "lolcat";
  text = ''
    ${config.sprrw.sandboxing.runDocker} -it DOCKERIMG ${pkgs.lolcat}/bin/lolcat "$@"
  '';
}
However what I found was that this pattern is common enough that it’s worth making a helper function to reduce boilerplate. This is the function I used:
sprrw.sandboxing.runDockerBin = { name, args }: (pkgs.writeShellApplication {
  inherit name;
  text = ''
    ${cfg.runDocker} ${args} "$@"
  '';
});
This shortens the above to
config.sprrw.sandboxing.runDockerBin { name = "lolcat"; args = "-it DOCKERIMG ${pkgs.lolcat}/bin/lolcat"; }
From here I can also easily implement my box command by simply doing:
cfg.runDockerBin { name = "box"; args = "-it -w /home/sprrw DOCKERIMG ${pkgs.bash}"; }
I also implemented some other helper scripts, for example box-cwd, which additionally shares the current directory with the sandbox.
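As a sketch of what box-cwd has to add (the mount point inside the container is my assumption, not taken from the actual script): it just extends the docker run arguments with a bind mount of the current directory, plus a -w so the shell starts there:

```python
import os

def box_cwd_args(cwd=None):
    """Build the extra docker-run flags that share the current
    directory into the sandbox and start the shell inside it."""
    cwd = cwd or os.getcwd()
    target = "/home/sprrw/cwd"  # assumed mount point inside the container
    return ["-v", f"{cwd}:{target}", "-w", target]
```

These flags would go before the DOCKERIMG marker, alongside the other docker run arguments.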
bubblewrap
The main reason I implemented Bubblewrap was that I was unhappy with how slowly Neovim started under Docker, and I use it for many day-to-day tasks. And I definitely want to sandbox Neovim, as it uses LSPs which can potentially load malicious libraries. The solution I settled on is to use Bubblewrap to create the Linux namespaces manually rather than going through Docker.
In practice it is quite similar to Docker. Here are the commands I settled on:
default_bwrap_args = [
    "bwrap",
    "--unshare-all",
    "--as-pid-1",
    *["--ro-bind", "/nix", "/nix"],
    *["--ro-bind", "/etc", "/etc"],
    *["--ro-bind", "/usr", "/usr"],
    *["--ro-bind", "/run/current-system/sw", "/run/current-system/sw"],
    *["--ro-bind", "/home/sprrw/.config/nvim", "/home/sprrw/.config/nvim"],
    *["--ro-bind", "/home/sprrw/.config/yazi", "/home/sprrw/.config/yazi"],
    *["--tmpfs", "/tmp"],
    *["--proc", "/proc"],
    *["--dev", "/dev"],
    *["--bind", f"{XDG_RUNTIME_DIR}/{WAYLAND_DISPLAY}", f"{XDG_RUNTIME_DIR}/{WAYLAND_DISPLAY}"],
]
I ended up forwarding the Wayland socket to get copy to system clipboard working. However this may open me up to Wayland socket based attacks. This is something I need to look into in the future, but I also need to not let perfect be the enemy of good.
Note that this also gets rid of all networking for Neovim. This is fine because I manage my Neovim plugins with Nix so I generally don’t need it.
Also note --as-pid-1, which is required to tear down the namespace when my sandboxed process ends; otherwise you can end up with dangling processes.
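Putting the pieces together, launching the sandboxed program is then just a matter of appending it to the flag list and exec'ing (a sketch of the idea, assuming bwrap's usual behaviour of treating the first non-option argument as the command; the actual script may differ):

```python
def build_bwrap_cmd(default_bwrap_args, prog_argv):
    """Combine the default bwrap flags with the sandboxed program.

    bwrap stops parsing its own options at the first non-option
    argument, so the program and its arguments simply go last."""
    return [*default_bwrap_args, *prog_argv]

# The real launcher would then replace itself with bwrap, e.g.:
#   import os, sys
#   cmd = build_bwrap_cmd(default_bwrap_args, sys.argv[1:])
#   os.execvp(cmd[0], cmd)
```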
flatpak
Flatpak is fairly self explanatory, as it is already built with great sandboxing support. The issue with Docker and Bubblewrap is that it’s hard to get the Wayland/X11 forwarding right for a lot of applications while also respecting things like GTK_THEME. If an application has good Flatpak support, this is pretty much all done for you. I chose to avoid Flatpak for command line tools though, as I believe Flatpak uses more disk space and may also have some additional overheads I am not aware of.
qemu
Sometimes Docker and Bubblewrap are not enough. Sure, there exist things like sysbox which can work around a lot of the shortcomings, but NixOS lends itself really nicely to building bootable ISO files, so I figured this would be the most pain-free and customizable approach. To recap, the things a VM lets us do that would normally cause issues are:
- Run Docker inside the sandbox (running Docker in Docker is normally difficult due to `SECCOMP` restricting the creation of namespaces)
- Install packages with Nix, because my current setup mounts the host Nix store read-only (could probably work around this with overlayfs but that seems like a lot of effort)
- Do crazy stuff like install kernel modules???
Anyway, Nix provides a nice convenient way to build an ISO for our configuration:
$ nixos-rebuild build-image --image-variant iso
In my case I created a new flake output for my sandbox and made that flake output enable only non-GUI stuff in my NixOS modules, essentially creating a headless build of my system. We then build it by specifying --flake in the same way as with normal nixos-rebuild:
$ nixos-rebuild build-image --flake .#sandbox --image-variant iso
We can then start QEMU with this ISO.
qemu-system-x86_64 -enable-kvm -m 16384 -smp 4 -cdrom ~/.local/vm.iso -boot d -nic user,hostfwd=tcp:127.0.0.1:"$open_port"-:22 -display none -daemonize -pidfile "$pidfile" ${qemu_args}
Things to note:
- We mount the ISO with `-cdrom` and use `-boot d` to boot into it. Technically it will show `systemd-boot` or equivalent, however I set the timeout to 1 second in my configuration so we don’t need to wait long.
- I use a port forward, integrated with my script, so I can SSH into the VM. I also enabled `openssh-server` in my Nix config for that flake output.
- We disable the display and do some other stuff useful for scripting.
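The "$open_port" in the command above needs to be a free local port. One way to pick one (a sketch of the idea, not the author's actual script) is to bind port 0 and let the kernel choose:

```python
import socket

def pick_free_port():
    """Ask the kernel for an unused local TCP port by binding port 0,
    then release it and return the number. There is a small race window
    before QEMU rebinds the port, which is acceptable for a personal
    sandbox script."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```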
This is not the whole story; the full file can be found here.
We can also share a host directory with the VM. This boils down to passing -virtfs to QEMU and then using SSH in a script to mount the share onto a path inside the VM. For more details see the example here.
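For reference, the sharing setup boils down to two pieces along these lines (the mount tag, security model, and mount point are my guesses at a typical QEMU 9p setup, not the exact flags from the linked file):

```python
def virtfs_args(host_dir, tag="hostshare"):
    """QEMU flag exposing host_dir to the guest as a 9p share."""
    return ["-virtfs", f"local,path={host_dir},mount_tag={tag},security_model=mapped-xattr"]

def guest_mount_cmd(tag="hostshare", mount_point="/mnt/share"):
    """Command to run inside the VM (e.g. over SSH) to mount the share."""
    return f"sudo mount -t 9p -o trans=virtio,version=9p2000.L {tag} {mount_point}"
```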
conclusion
If you just want to see the code check out my Nix repo here. Hope you enjoyed my first blog post :3