How to get dynamic VFIO working on NixOS

Hello, I am SpiderUnderYourBed, and I will go over how I got dynamic VFIO working on NixOS. First of all, what is dynamic VFIO? Dynamic VFIO is when you dynamically unload/load the VFIO drivers at runtime, without rebooting. If you don't know what VFIO is, there are many great tutorials/explanations online, but in summary, when someone is talking about VFIO they are usually referring to GPU passthrough. Now that we have the definitions out of the way, I will share my flake and dive into each component. Keep in mind your mileage may vary, so I recommend the VFIO Discord server or subreddit if you get stuck.

My flake:

{ pkgs, lib, ... }:
let 
  #script = pkgs.writeScriptBin "KWIN_DRM_DEVICES.sh" ''
  #      realpath /dev/dri/by-path/pci-0000\:00\:02.0-card
  #'';
#  tmpfilePath = "${builtins.getHome}/.config/plasma-workspace/env/export-vars";

#  moveScript = ''
#    mv $out/bin/KWIN_DRM_DEVICES.sh ${tmpfilePath}
#  '';
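  # resolves the iGPU's stable by-path name (PCI 0000:00:02.0) to its current /dev/dri/cardN node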
  coreutils = pkgs.writeShellApplication {
     name = "coreutils";
     runtimeInputs = [
        pkgs.coreutils
     ];
     text = ''
        realpath /dev/dri/by-path/pci-0000:00:02.0-card 
     '';
  };

in
{
  boot = {
    kernelParams = [
#      "module_blacklist=i915"
      "iommu=pt"
      "intel_iommu=on"
      "vfio_iommu_type1.allow_unsafe_interrupts=1"
      "kvm.ignore_msrs=1"
    ];
  };
  virtualisation = {
    libvirtd = {
      enable = true;
      qemu = {
#         package = pkgs.qemu_kvm.overrideAttrs (old: {
#          patches = old.patches ++ [
#                (builtins.toFile "qemu.diff" (builtins.readFile ./qemu-8.2.0.patch))
##              (builtins.readFile /etc/nixos/qemu-8.2.0.patch)
#          ];
##        });
        runAsRoot = true;
        ovmf.enable = true;
        verbatimConfig = ''
          user = "spiderunderurbed"
          group = "users"
          namespaces = []
        '';
      };
    };
  };
  environment = {
    systemPackages = [ pkgs.dmidecode ];
    shellAliases = {
      vm-start = "virsh start win11";
      vm-stop = "virsh shutdown win11";
    };
  };

  programs.virt-manager.enable = true;

  virtualisation.libvirtd.hooks.qemu = {
    "AAA" = lib.getExe (
      pkgs.writeShellApplication {
        name = "qemu-hook";

        runtimeInputs = [
          pkgs.libvirt
          pkgs.systemd
          pkgs.kmod
        ];

        text = ''
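          # libvirt invokes qemu hooks with: <guest name> <operation> <sub-operation> <extra>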
          GUEST_NAME="$1"
          OPERATION="$2"

#         echo "$1"
#         echo "$2"

          if [ "$GUEST_NAME" != "win11" ]; then
            exit 0
          fi

          if [ "$OPERATION" == "prepare" ]; then
              #systemctl stop sddm.service
              systemctl stop display-manager.service
#             systemctl set-environment KWIN_DRM_DEVICES=${coreutils}
              # drop a flag file; environment.extraInit (shown later in this post) checks for it
              echo true > /tmp/kwin_drm_devices_flag
              modprobe -r -a nvidia_uvm nvidia_drm nvidia nvidia_modeset
              virsh nodedev-detach pci_0000_01_00_0
              virsh nodedev-detach pci_0000_01_00_1
              #fix me
              #systemctl set-property --runtime -- user.slice AllowedCPUs=8-15,24-31
              #systemctl set-property --runtime -- system.slice AllowedCPUs=8-15,24-31
              #systemctl set-property --runtime -- init.scope AllowedCPUs=8-15,24-31
              systemctl start display-manager.service

              virsh net-start default
          fi

          if [ "$OPERATION" == "release" ]; then
            #systemctl stop sddm.service
            systemctl stop display-manager.service
            #fix me
            #systemctl set-property --runtime -- user.slice AllowedCPUs=0-31
            #systemctl set-property --runtime -- system.slice AllowedCPUs=0-31
            #systemctl set-property --runtime -- init.scope AllowedCPUs=0-31
            virsh nodedev-reattach pci_0000_01_00_0
            virsh nodedev-reattach pci_0000_01_00_1
            modprobe -a nvidia_uvm nvidia_drm nvidia nvidia_modeset
           # systemctl start sddm.service
            systemctl start display-manager.service
          fi

        '';
      }
    );
  };
  
  systemd.user.tmpfiles.users.spiderunderurbed.rules = [
   # "L+ %h/.config/plasma-workspace/env/ - - - - ${script}"
  ];
  systemd.tmpfiles.rules = [
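    # pre-create the Looking Glass shared-memory file with ownership both my user and libvirt's qemu can use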
  "f /dev/shm/looking-glass 0660 spiderunderurbed qemu-libvirtd -"
  ];
}

This is what I have on my personal laptop: an RTX 3070 (Nvidia) dGPU and an Intel iGPU with Optimus, and I am using KDE Plasma. Before you copy and paste, it's important that you know what might be different for your system; some of the choices I made are very specialized for mine.

So when I wanted to get dynamic VFIO working, the most important aspect was making sure the Nvidia (or AMD) GPU isn't in use before unloading its drivers. On NixOS, display-manager is the systemd service that manages things like SDDM, and with it most of what is rendered on your system. Stopping it, unloading the regular drivers, and loading the new ones surely works, right? Well, it's not that simple. There are processes that persist even after shutting down display-manager, which probably shouldn't be a thing. Here is how to identify them (run as root):

lsof -n | grep -e /dev/nvidia -e /dev/dri

^ this should display all the processes using your GPU. If you stop display-manager and run this command (preferably over SSH), you will see that some processes persist.
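If you have psmisc available, fuser gives a similar view (this is an alternative I'm suggesting, not part of my original setup):

fuser -v /dev/dri/* /dev/nvidia*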

What's the solution? Well, there is no "proper" way, because that would require this to be a common issue people have experienced. A lot of people prefer using NixOS specialisations for things like this. Specialisations let you boot into a slightly different instance of your existing configuration: with a bit of Nix code you could have, say, a battery-saving mode that disables your dGPU entirely (see the sketch below). But maybe you like switching between games that run on Linux and games that run on Windows (albeit with extra work needed to get some games running in a VM), or maybe you're an AI developer; there are many reasons why you might prefer not to reboot every time you change where your GPU is attached.
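For illustration, here is a minimal sketch of such a specialisation (the name "battery-save" and the exact module list are my assumptions; adapt them to your hardware):

specialisation.battery-save.configuration = {
  # separate boot entry that keeps the Nvidia dGPU's drivers from ever loading
  boot.blacklistedKernelModules = [ "nvidia" "nvidia_drm" "nvidia_modeset" "nvidia_uvm" ];
};

You pick the entry in the bootloader menu, so switching still means a reboot, which is exactly the friction dynamic VFIO avoids.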

So that's where dynamic VFIO comes in!

The solution to the issue I presented will vary by desktop environment. Add this to your configuration.nix if you're running KDE (an explanation of what the variable does comes later):

environment.extraInit = ''
  export KWIN_DRM_DEVICES=<the /dev/dri/cardN node of the GPU KWin should keep using, i.e. your iGPU>
'';
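To figure out which cardN that is, look at the stable by-path symlinks: each pci-<address>-card entry (the addresses come from lspci) points at the current /dev/dri/cardN.

ls -l /dev/dri/by-path/
realpath /dev/dri/by-path/pci-0000:00:02.0-card   # 0000:00:02.0 is my iGPU; yours may differ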

It's a bit worrisome to hardcode this, because the /dev/dri/cardN numbering can change between boots. If it does, you can resolve it dynamically:

let
  coreutils = pkgs.writeShellApplication {
    name = "coreutils";
    runtimeInputs = [ pkgs.coreutils ];
    # resolve the iGPU's stable by-path name to its current card node
    text = ''
      realpath /dev/dri/by-path/pci-0000:00:02.0-card
    '';
  };
in
{
  environment.extraInit = ''
    export KWIN_DRM_DEVICES=$(${coreutils}/bin/coreutils)
  '';
  ...
}

This resolves the stable by-path name to the current card node and exports the variable. Neat, but a little janky. The equivalent on Hyprland would be something like WLR_DRM_DEVICES, and for GNOME I do not know at the moment (maybe I'll update this blog post when I find out!).

Now, what does KWIN_DRM_DEVICES (or WLR_DRM_DEVICES) do? DRM stands for Direct Rendering Manager, the kernel subsystem compositors use to drive GPUs. The variable is an allowlist: the compositor will only open the devices you list, so listing just the iGPU effectively blocks the dGPU from the desktop session, a bit like hybrid graphics, while something like a video game can still seek the dGPU out via offload (I have not tested an AI workload).
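For reference, the variable is a colon-separated list of device nodes, so you can allow one or several:

export KWIN_DRM_DEVICES=/dev/dri/card0                    # KWin only touches card0
export KWIN_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1     # KWin may use both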

This has side effects, though: multi-monitor won't work for me. If you have Optimus, a MUX switch is what I'd prefer, as it lets you route displays to either the dGPU or the iGPU, whereas Optimus requires frames from the dGPU to pass through the iGPU. The way my laptop works, the HDMI ports belong to the dGPU, so if you block the dGPU, nothing shows on an external monitor.

What now?

Well, there is an approach I am trying and still working on. It involves the strange echo command in the QEMU hook above: the echo creates a flag file in /tmp, and in environment.extraInit you have this:

  environment.extraInit = ''  
    if [ -f /tmp/kwin_drm_devices_flag ]; then
      export KWIN_DRM_DEVICES=$(${coreutils}/bin/coreutils)
    else
      unset KWIN_DRM_DEVICES
    fi
  ''; 

However, there are still ongoing issues I need to resolve. As a side note, this setup is jank, but it could end up working. One issue I have is reloading the session (it may be the TTY session, or the session at boot): even when the variable is set, and regardless of whether display-manager is restarted, you need to get KDE to actually apply the change, and so far I have only gotten it to work during boot. There might be a command to reload the session properly, but as of the time I'm writing this, I don't know it.

What about the other solution? The other solution is to use specialisations, so now you might be wondering whether we have come full circle. Well, as of the time of writing, it seems I am limited to either dynamic VFIO without multi-monitor, or multi-monitor without dynamic VFIO. The latter could be friendlier to some workflows, and dare I say more elegant.

What about persistent state? I had an issue configuring it: while X11 and, more recently, Wayland support restoring windows, there might be edge cases, or more persistence might be ideal in your workflow. There are some things I intend to try with hibernation and headless hypervisors, and I'll be sure to update this page when I test them out. I encourage you to try to solve some of the problems I described here, and maybe contact me at [email protected] or in the comments (if they are enabled).
