b.splendous.net

adventures in unraid

July 2021 / tech

As part of my pandemic coping, I started upgrading a PC I'd had for a few years. I'd been interested in working on a homelab for a while, and when I accidentally bought a case that was way too large - a Fractal Design Define 7 - I began to think about expanding that machine instead of going to a rack.

Virtualization software like Unraid or Proxmox would let me combine a gaming PC with some homelab server interests. The graphics card passthrough support with Unraid was much clearer, and I had friends that liked it, so I went with that.

It wasn't all clear sailing, but I tried to record the steps I took and challenges I faced.

First Try, and Initial Setup

Since this was an idea I had after upgrading the machine, I hadn't looked into Unraid support when choosing hardware. I had a mix of new and old hardware without clear Unraid support, but decided to forge on.

Notably (foreshadowing!), I started with a ROG Strix B550-F and an old AMD graphics card.

Initial Setup

Weirdly (at least to me), Unraid boots off of a flash drive. Instructions and recommendations are in Unraid's docs. To make the flash drive boot on my machine I had to rename the "EFI-" folder to "EFI" (I'll undo this later). I enabled AMD-V and IOMMU in my BIOS.

References for initial setup:

Based on those resources, I went through these steps. At the time they felt throwaway, but I think they were helpful.

  • installed community applications
  • installed fix common problems
  • installed unassigned devices
  • installed dynamix system stats, s3 sleep, system info, temp (including required nerdpack gui, use tdie)
    • Much later, the array went to sleep while I was in a Windows VM. I didn't see a way to prevent that, so I've disabled S3 Sleep for now.
  • check s3 in bios
  • try wake on lan - yes! wakeonlan -i 192.168.6.255 aa:a1:61:23:01:dd
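For reference, here's a minimal sketch of what a tool like wakeonlan sends, assuming the standard magic packet layout (six 0xFF bytes followed by the MAC address repeated 16 times) delivered as a UDP broadcast. The function names and the default port 9 are my own choices for illustration, not anything Unraid-specific.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str, port: int = 9) -> None:
    """Send the magic packet to the subnet broadcast address over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting has to be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# Usage, with the address and MAC from the command above:
#   wake("aa:a1:61:23:01:dd", "192.168.6.255")
```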

A note for the future: I should preclear disks before adding them (see "how to preclear your disks"). Good to know for next time! If you don't, adding new disks takes hours.

Passthrough / Bare Metal Windows

Let's make a gaming VM! Spaceinvader One has great info: How to dual boot baremetal windows and unRAID then boot the same windows as a vm

  • update bios & device drivers
  • grab UUID onto flash drive
  • passthrough NVMe PCIe - just add vfio-pci.ids=1cc1:5350 to syslinux config (I did both regular & GUI boots) and rebooted
  • create a new windows 10 vm, add passthrough pcie (nvme) and graphics cards
  • passthrough usb devices?
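The syslinux edit above is just a matter of putting the parameter on each "append" line. Here's a rough sketch of that edit as a text transformation. The sample config is an assumption about what a stock Unraid syslinux.cfg looks like (yours may differ), and add_kernel_param is a made-up helper name.

```python
# Assumed shape of the boot entries in syslinux.cfg; check your own file.
SAMPLE = """\
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""


def add_kernel_param(cfg: str, param: str) -> str:
    """Add `param` to every `append` line that doesn't already have it."""
    out = []
    for line in cfg.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("append ") and param not in stripped.split():
            indent = line[: len(line) - len(stripped)]
            rest = stripped[len("append "):]
            line = f"{indent}append {param} {rest}"
        out.append(line)
    return "\n".join(out) + "\n"


print(add_kernel_param(SAMPLE, "vfio-pci.ids=1cc1:5350"))
```

Running it twice leaves the config unchanged the second time, which is handy if you script this and forget whether you already did it.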

Went through that and the machine was crashing on boot (vfio_region_write(...) failed: device or resource busy). Checked out advanced passthrough techniques: Advanced GPU passthrough techniques on Unraid

  • realized that I was only using half the reserved CPU threads
  • was making the same mapping mistake shown in Advanced GPU passthrough techniques on Unraid @ 336; I needed to remap the sound device to the same slot as the graphics card
  • added more things (graphics card, motherboard sound) to vfio. Realized there's a newer mechanism on the system devices page that lets you bind devices to vfio from there and stores the result in config/vfio-pci.cfg, so I removed the boot arg from syslinux.

At this point I got an NVidia card. Radeons are supposed to be a bit more challenging for passthrough, but the real reason is that I lucked into a 3070 in the Newegg Shuffle.

  • Tried reusing the old VM (updating the vfio passthroughs via the UI and config/vfio-pci.cfg, making sure the VM's configs have both graphics & audio on one slot with different functions and multifunction='on'), but that seemed to crash unraid entirely.
  • Try a new VM, starting over:
    • create a new windows 10 vm, add passthrough pcie (nvme) and graphics cards
      • try OVMF BIOS first
      • I passed through my mouse, keyboard, and USB headset, and "other PCIe devices" (my ADATA NVMe)
      • add the UUID, update the soundcard to multifunction (remember multifunction='on' goes on the target not the source; I just put it on the first entry for that slot, not the second)
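To make the multifunction note concrete, here's a sketch that builds the two libvirt hostdev entries with Python's standard XML library. The host address 07:00.0 is where my GPU shows up in the logs later on; the audio function at 07:00.1 and the guest slot 0x05 are assumptions for illustration, and hostdev is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def hostdev(host_bdf: str, guest_slot: int, guest_function: int,
            multifunction: bool = False) -> ET.Element:
    """Build a libvirt <hostdev> element for a PCI device like '07:00.0'."""
    bus, rest = host_bdf.split(":")
    slot, function = rest.split(".")
    dev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    ET.SubElement(dev, "driver", name="vfio")
    # <source> is the device's address on the host.
    source = ET.SubElement(dev, "source")
    ET.SubElement(source, "address", domain="0x0000", bus=f"0x{bus}",
                  slot=f"0x{slot}", function=f"0x{function}")
    # The second <address> is the target: where the guest sees the device.
    guest = ET.SubElement(dev, "address", type="pci", domain="0x0000",
                          bus="0x00", slot=f"0x{guest_slot:02x}",
                          function=f"0x{guest_function:x}")
    if multifunction:
        guest.set("multifunction", "on")
    return dev


# Graphics and audio share guest slot 0x05 as functions 0 and 1;
# multifunction='on' goes only on the first entry's target address.
gpu = hostdev("07:00.0", guest_slot=5, guest_function=0, multifunction=True)
audio = hostdev("07:00.1", guest_slot=5, guest_function=1)
for element in (gpu, audio):
    print(ET.tostring(element, encoding="unicode"))
```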

And hey, that worked!!

Network didn't work, so I installed the virtio drivers, following the order and steps in this bit: https://wiki.unraid.net/UnRAID_Manual_6#Step_5:_Install_the_VirtIO_drivers_from_inside_the_VM_.28Windows_Guests_Only.29.

I tried to just carry on without specifying the vbios, at least until I saw an error. Well, I saw some weird behavior pretty quickly, and my second boot went to a black screen, so I thought I'd better handle this.

I wasn't able to dump the vbios through the easy method: How to Easily Dump the vBios from any GPU for Passthrough. I got the "cat: rom: Input/output error" error, as seen in https://github.com/SpaceinvaderOne/Dump_GPU_vBIOS/issues/3.

Instead I rebooted into non-VM Windows and used GPU-Z (https://www.techpowerup.com/gpuz/) to dump the rom, saved it to the Unraid flash drive and rebooted. Now according to How to easily passthough a Nvidia GPU as primary without dumping your own vbios! in KVM unRAID or https://csandvik.com/unraid-windows-vm/#step-4-download-the-vbios-for-your-graphics-card you need to strip the extra header from the start of the dump, so I did that in vim. My first line read U.K7400L.wVI DEO, which seemed to match closely enough.
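If hand-editing a binary in vim feels sketchy, the same trim can be scripted. This is a sketch under my understanding of those guides: find the "VIDEO" text near the top of the dump, then keep everything from the "U" (the 0x55 0xAA ROM signature) just before it. The function name is made up, and you should compare the result against a hand-edited copy before trusting it.

```python
def strip_vbios_header(rom: bytes) -> bytes:
    """Drop everything before the 0x55 0xAA signature preceding 'VIDEO'."""
    marker = rom.find(b"VIDEO")
    if marker == -1:
        raise ValueError("no VIDEO marker found; is this a vbios dump?")
    # Search backwards from the marker for the ROM signature ("U" + 0xAA).
    start = rom.rfind(b"\x55\xaa", 0, marker)
    if start == -1:
        raise ValueError("no 0x55 0xAA signature before the VIDEO marker")
    return rom[start:]


# Usage (paths are placeholders):
#   data = open("dumped.rom", "rb").read()
#   open("trimmed.rom", "wb").write(strip_vbios_header(data))
```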

After adding that vbios to the VM configuration, my screen was still black with a bunch of errors:

2021-02-15T21:46:12.647785Z qemu-system-x86_64: vfio_region_write(0000:07:00.0:region1+0x156080, 0x0,8) failed: Device or resource busy
2021-02-15T21:46:12.647793Z qemu-system-x86_64: vfio_region_write(0000:07:00.0:region1+0x156078, 0x0,8) failed: Device or resource busy
2021-02-15T21:46:12.647800Z qemu-system-x86_64: vfio_region_write(0000:07:00.0:region1+0x156070, 0x0,8) failed: Device or resource busy

Per https://forums.unraid.net/topic/71371-resolved-primary-gpu-passthrough/ and https://listman.redhat.com/archives/vfio-users/2016-March/msg00088.html, this could be a conflict from efifb. That seemed to be borne out by less /proc/iomem:

d0000000-fec2ffff : PCI Bus 0000:00
  d0000000-e1ffffff : PCI Bus 0000:07
    d0000000-dfffffff : 0000:07:00.0
      d0000000-d095ffff : efifb
    e0000000-e1ffffff : 0000:07:00.0
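The thing to spot in that listing is that the efifb range sits inside a memory region belonging to the GPU at 0000:07:00.0. Here's a small sketch that checks for this kind of overlap automatically. It assumes the "start-end : name" line format shown above; the function names are made up. On a real system you'd feed it the contents of /proc/iomem read as root, since unprivileged reads show the addresses as zeros.

```python
import re

SAMPLE = """\
d0000000-fec2ffff : PCI Bus 0000:00
d0000000-e1ffffff : PCI Bus 0000:07
d0000000-dfffffff : 0000:07:00.0
d0000000-d095ffff : efifb
e0000000-e1ffffff : 0000:07:00.0
"""

# Matches a bare PCI address such as 0000:07:00.0
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def parse_iomem(text):
    """Return (start, end, name) tuples; indentation is ignored."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, name = line.strip().split(" : ", 1)
        start, end = (int(part, 16) for part in span.split("-"))
        entries.append((start, end, name))
    return entries


def efifb_conflicts(text):
    """List PCI devices whose memory regions overlap the efifb range."""
    entries = parse_iomem(text)
    conflicts = set()
    for fb_start, fb_end, fb_name in entries:
        if fb_name != "efifb":
            continue
        for start, end, name in entries:
            if PCI_ADDRESS.match(name) and start <= fb_end and fb_start <= end:
                conflicts.add(name)
    return sorted(conflicts)


print(efifb_conflicts(SAMPLE))  # ['0000:07:00.0']
```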

I ran the magic incantation from there (below for reference) and the VM started straightaway (no need for a reboot or anything).

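# Detach the virtual consoles from the framebuffer
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
# Unbind the EFI framebuffer so it lets go of the GPU's memory region
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind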

This got most everything into a working state. My VM would boot, with the graphics passthrough, and I could play games! I added those lines to a script that I set to run at boot.

Problems and the Second Try

This brought me to 90% of where I wanted to be, but the last 10% was a struggle.

"Demonic" sound (lots of clicks)

I'm using a USB soundcard & headset (which, btw, is crazy - the soundcard fits in this little 2"x1"x1/8" box; I remember my SoundBlaster AWE32 filling up my entire case). The audio in the VM was pretty bad, lots of clicks etc which matches what the community is calling "demonic sound."

The usual fix for that seems to be enabling MSI mode, like The best way to install and setup a windows 10 vm Part 2 Hardware Passthrough @ 467 or https://csandvik.com/unraid-windows-vm/#step-8-fix-demonic-sound or (the most complete version) https://forums.guru3d.com/threads/windows-line-based-vs-message-signaled-based-interrupts-msi-tool.378044/.

But even after I downloaded MSI_util_v2.exe and ran it, I couldn't select my headset. Of course: it's USB.

I wanted to switch the USB controller to passthrough like https://forums.unraid.net/topic/35112-guide-passthrough-entire-pci-usb-controller/ or How to easily pass through a USB Controller in unRAID. But since all my back-panel USB ports are on the same bus, there wasn't a way to get just the headset into its own IOMMU group, unless I was willing to put either the headset or the Unraid USB drive into the front-panel USB ports.

Now I really wish I'd gotten an X570 - those boards have a third bus, and that'd help. Also better SATA.

Anyways, this seemed solvable with money, so I bought a USB card (https://www.startech.com/en-us/cards-adapters/pexusb312eic) with an ASMedia chip (the ASM1142), which seemed to have good compatibility. And a new NVMe for cache, because if I'm going in I might as well do two things at the same time.

And! It didn't work. The USB card I got was x4, and I only have a single x16 slot that can handle it. That slot is in a group with all of my onboard USB controllers and some SATA channels. In fact, it looked as though all the PCIe lanes were in the same group, so a second x1 USB card wouldn't help (although I bought one and tried it too, to make sure).
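Unraid shows the IOMMU groups in its UI, but the same information lives under /sys/kernel/iommu_groups on Linux, with one directory per group. Here's a sketch for checking what a device shares its group with. The function names are made up, and the addresses in the usage comment are placeholders.

```python
import glob
from collections import defaultdict


def group_devices(paths):
    """Group sysfs paths like .../iommu_groups/<group>/devices/<address>."""
    groups = defaultdict(list)
    for path in paths:
        parts = path.strip("/").split("/")
        group, address = int(parts[-3]), parts[-1]
        groups[group].append(address)
    return {group: sorted(devs) for group, devs in sorted(groups.items())}


def shared_with(groups, address):
    """Other devices in the same IOMMU group as `address` (empty if alone)."""
    for devices in groups.values():
        if address in devices:
            return [dev for dev in devices if dev != address]
    return []


def read_iommu_groups():
    """Read the groups from sysfs on the running machine."""
    return group_devices(glob.glob("/sys/kernel/iommu_groups/*/devices/*"))


# Usage on the server itself (the address is a placeholder):
#   print(shared_with(read_iommu_groups(), "0000:05:00.0"))
```

A device can only be passed through cleanly when everything else in its group goes along with it, so an empty list is what you're hoping for.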

I read about ACS override mode, which (if memory serves) makes the kernel split devices into separate IOMMU groups even when the hardware doesn't report them as isolated from each other.

I enabled this in Unraid by going to Settings > VM Manager, enabling "Advanced View", and changing "PCIe ACS Override" to "Multi-function" (I also tried "Downstream" but that didn't help). With that, I got a separate IOMMU group for the USB card. I had to remap all devices and set my efi script back to run at array start (it'd somehow dropped off, so I re-added it and also re-ran it), and then everything seemed to work.

ACS mode was never stable, though. Windows kept crashing, and everything seemed messy. Maybe there was a way through but it seemed challenging.

New X570 Motherboard

So, I threw money & time at the problem, and got a board with better IOMMU groups: the ASRock X570 Taichi. A friend was running it, and there were good guides online like this one: https://forums.unraid.net/topic/87557-guide-asrock-x570-taichi-vm-w-hardware-passthrough/

I went through a bunch of steps again:

  • Replaced all the hardware, reactivated the bare-metal windows boot, grabbed the UUID, etc
  • Updated the BIOS to 4.0.0
  • Ran through setting up the BIOS per https://forums.unraid.net/topic/87557-guide-asrock-x570-taichi-vm-w-hardware-passthrough/. I didn't have the USB options that they mention, so I proceeded without for now.
  • Enabled other things: "Advanced\ACPI Configuration\PCIE Devices Power On" (for WoL), reset my boot order
  • Instead of using the framebuffer script, I tried adding video=efifb:off to the syslinux configuration. It worked!
  • I started getting "Error 31" in my NVidia drivers.
    • I heard this might be because I was booting UEFI, so I made sure that the non-UEFI unraid boot disk was selected in the BIOS boot options (there were two, one starting with "UEFI:", the other starting with "USB:"). I also renamed "EFI" back to "EFI-" on my Unraid USB, since the folder being named "EFI" is what allows UEFI boot, which I didn't want. You can check this on the flash device page, in the syslinux configuration section: it should say "Server boot mode: Legacy". Still no dice.
    • I made a few tweaks to the VM settings - changed the USB Controller back to "2.0 (EHCI)" and removed spaces from my vbios filename. Rewatched Advanced GPU passthrough techniques on Unraid to see if I'd missed anything.
    • Change to a Q35 machine type instead of the default (i440fx)? I didn't have to do this, as one of the steps above seemed to help.

After all that, the VM was stable.

Continued Setup

Now that Windows is stable, let's see what else this can do.

Cache

As mentioned above, I also bought a cache drive. I started with the steps at https://wiki.unraid.net/Cache_disk#Adding_a_cache_disk_to_the_array, but those are out of date. Between https://forums.unraid.net/topic/46802-faq-for-unraid-v6/, How to add a cache drive, replace a cache drive or create a cache pool, and just blindly making a pool named 'cache', I was able to add it.

Security

A few steps to lock things down a bit. I'm not storing anything terribly important, but the default-open unraid seems just too much. See http://kmwoley.com/blog/securing-a-new-unraid-installation/

  • Add a root password
  • Create a non-root user
  • Restrict shares to 'private', so that guests can't access them
  • Change the unraid flash to export=no
  • Disable SMB1
  • Disable telnet & ftp
  • Set up email notifications

Dockers

  • Plex - I used an image from linuxserver.io, but not for a great reason. Map your shares and you're off to the races.
  • syncthing - also a linuxserver.io image. I made a new 'cloud' share, and set the "Container Path: /sync" in the docker config to /mnt/user/cloud/syncthing. Within the syncthing GUI, I changed the default folder to /sync (instead of /config), to make sure that data was written out to the new 'cloud' share on spinning disks with redundancy, instead of my cache drives.

Debian VM

I think the hive mind seems to prefer dockers, but I like having a VM just to use. It was really easy to set up - I downloaded the Debian netinstall ISO to /mnt/user/isos, and started a new VM. Some settings that I changed: use emulated (logical) CPUs instead of a passthrough (I don't care about performance), a 10G disk (was maybe a little small, 20G might have been better).

Upgrading CPU

I upgraded my CPU to a Ryzen 7 5800X. This seemed to reset a few virtualization settings in the BIOS (like IOMMU), which borked my VM again. Ugh. My first clue should have been when unraid booted to a prompt instead of the vfio stuff taking over the graphics card partway through the boot.

I went through the list from https://forums.unraid.net/topic/87557-guide-asrock-x570-taichi-vm-w-hardware-passthrough/ again, and found:

  • "Advanced>AMD CBS>NBIO Common Options>IOMMU" was disabled again, so I enabled that, set "Enable AER Cap" to Enable, and "ACS Enable" to Auto.
  • "Advanced>ACPI Configuration>PCIE Devices Power On" was disabled again, so I set that to "Enabled" for WoL.

I'd recommend rechecking your CPU pinning after changing your CPU, both in "Settings > CPU Pinning" (CPU Pinning and CPU Isolation) and in your VM configuration.
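One thing worth checking while you're in there is whether you've pinned both hardware threads of each core (the "only using half the reserved CPU threads" mistake from earlier). Linux exposes thread pairs in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list. Here's a sketch that flags pinned threads whose sibling got left out. The function names are made up, and the pairing in the usage comment is for a hypothetical 4-core CPU, not my actual layout.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,8' or '0-1' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def missing_siblings(pinned, sibling_lists):
    """Threads that share a core with a pinned thread but aren't pinned."""
    pinned = set(pinned)
    missing = set()
    for text in sibling_lists:
        siblings = set(parse_cpu_list(text))
        if siblings & pinned:
            missing |= siblings - pinned
    return sorted(missing)


def read_sibling_lists():
    """Read every CPU's sibling list from sysfs on the running machine."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


# Hypothetical 4-core example where thread 7 was forgotten:
#   missing_siblings([2, 3, 6], ["0,4", "1,5", "2,6", "3,7"])
```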

Future/possible projects

  • /home shares? No, I don't think so - and this is actually a reason to keep my old Synology NAS. It's a focused NAS that handles multiple users really well, and I haven't seen anything else that does that nearly as well.
  • My photos are currently very Apple Photos based. It might be nice to do something with the local server, and it seems like there's the possibility for offline processing (names etc) that might be interesting. Does Plex have something like this? Ars Technica recently ran a feature on this.
  • Local hosting for my repos. I have a few up on Bitbucket or GitHub, but something local might be nice. GitLab and Gitea require databases, and that might be too fancy. Cgit or GitList might be interesting.
  • Do I need hardware encoding for Plex? More details here: How to use GPU transcoding in an Emby or Plex container on Unraid, and a recommendation for a 1030 (not like it's possible to buy an extra graphics card, even an old one, these days): https://forums.unraid.net/topic/81844-gpu-pass-thru-recommendation/