# Migration Plan: Arch Linux to NixOS on john-endesktop (ZFS/NFS Server)
## Overview
This document outlines the plan to migrate the john-endesktop server from Arch Linux to NixOS while maintaining the existing ZFS pools and NFS exports that serve your k3s cluster.
## Current System State
### Hardware
- **Boot disk**: nvme0n1
  - nvme0n1p3: 1000M EFI partition (UUID: F5C6-D570)
  - nvme0n1p4: 120GB ext4 / (current Arch root)
  - nvme0n1p5: 810GB - **Target for NixOS** (being removed from media pool)
- **Network**: enp0s31f6 @ 10.0.0.43/24 (DHCP)
### ZFS Pools
- **media**: ~3.5TB JBOD pool (2 drives after nvme0n1p5 removal)
  - wwn-0x50014ee2ba653d70-part2
  - ata-WDC_WD20EZBX-00AYRA0_WD-WX62D627X7Z8-part2
  - Contains: /media/media/nix (bind mounted to /nix on Arch)
  - NFS: Shared to 10.0.0.0/24 via ZFS sharenfs property
- **swarmvols**: 928GB mirror pool - **PRODUCTION DATA**
  - wwn-0x5002538f52707e2d-part2
  - wwn-0x5002538f52707e81-part2
  - Contains: iocage jails and k3s persistent volumes
  - NFS: Shared to 10.0.0.0/24 via ZFS sharenfs property
  - Backed up nightly to remote borg
### Services
- NFS server exporting /media and /swarmvols to k3s cluster
- ZFS managing pools with automatic exports via sharenfs property
## Prerequisites
### Before Starting
1. ✅ Ensure nvme0n1p5 removal from media pool is complete
```bash
ssh 10.0.0.43 "zpool status media"
# Should show no "removing" devices
```
2. ✅ Verify recent backups exist
```bash
# Verify swarmvols backup is recent (< 24 hours)
# Check your borg backup system
```
3. ✅ Notify k3s cluster users of planned maintenance window
- NFS shares will be unavailable during migration
- Estimate: 30-60 minutes downtime
4. ✅ Build NixOS configuration from your workstation
```bash
cd ~/nixos-configs
nix build .#nixosConfigurations.john-endesktop.config.system.build.toplevel
```
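The 24-hour freshness check in step 2 can be scripted once you have the timestamp of the latest archive. The sketch below is a minimal helper under that assumption: `backup_is_recent` is a hypothetical name, and how you obtain the timestamp depends on your borg setup, which this plan does not describe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a backup timestamp is newer than a threshold.
#   $1 = backup time in epoch seconds
#   $2 = maximum age in hours (default 24, the threshold used in this plan)
#   $3 = "now" in epoch seconds (default: current time; override for testing)
backup_is_recent() {
    local backup_epoch="$1"
    local max_age_hours="${2:-24}"
    local now_epoch="${3:-$(date +%s)}"
    local age_seconds=$(( now_epoch - backup_epoch ))
    # Reject timestamps in the future as well as stale ones
    (( age_seconds >= 0 && age_seconds < max_age_hours * 3600 ))
}

# Usage (LAST_BACKUP is an assumption: a timestamp copied from your borg
# archive listing, in a format GNU date can parse):
#   backup_is_recent "$(date -d "$LAST_BACKUP" +%s)" && echo "backup is fresh"
```

Passing "now" as an optional third argument keeps the function easy to test without touching the system clock.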
## Migration Steps
### Phase 1: Prepare NixOS Installation Media
1. **Download NixOS minimal ISO**
```bash
wget https://channels.nixos.org/nixos-25.11/latest-nixos-minimal-x86_64-linux.iso
```
2. **Create bootable USB**
```bash
# Identify USB device (e.g., /dev/sdb)
lsblk
# Write ISO to USB
sudo dd if=latest-nixos-minimal-x86_64-linux.iso of=/dev/sdX bs=4M status=progress
sudo sync
```
### Phase 2: Backup and Shutdown
1. **On the server, verify ZFS pool status**
```bash
ssh 10.0.0.43 "zpool status"
ssh 10.0.0.43 "zfs list"
```
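2. **Export ZFS pools cleanly**
```bash
# If an export fails because the pool is busy, first stop whatever is using it
# (for example NFS clients, the /nix bind mount, or swap on the media pool)
ssh 10.0.0.43 "sudo zpool export media"
ssh 10.0.0.43 "sudo zpool export swarmvols"
```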
3. **Shut down Arch Linux**
```bash
ssh 10.0.0.43 "sudo shutdown -h now"
```
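Before exporting, it may help to confirm programmatically that every pool is healthy rather than reading `zpool status` by eye. The sketch below assumes the tab-separated output of `zpool list -H -o name,health`; `unhealthy_pools` is a hypothetical helper name, not part of ZFS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name<TAB>health" lines from stdin (the format of
# `zpool list -H -o name,health`), print every pool that is not ONLINE, and
# return non-zero if any such pool was found.
unhealthy_pools() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        $2 != "ONLINE" { print $1 ": " $2; bad = 1 }
        END { exit bad }
    '
}

# Usage:
#   ssh 10.0.0.43 "zpool list -H -o name,health" | unhealthy_pools \
#       && echo "all pools ONLINE, safe to export"
```

The same helper can be reused after the migration to confirm both pools came back ONLINE.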
### Phase 3: Install NixOS
1. **Boot from NixOS USB**
- Insert USB drive
- Power on and select USB in boot menu
2. **Connect to network**
```bash
# If DHCP doesn't work automatically:
sudo systemctl start dhcpcd
ip a # Verify you have 10.0.0.43 or another IP
```
3. **Enable SSH for remote installation (recommended)**
```bash
# Set password for nixos user
sudo passwd nixos
# Start SSH
sudo systemctl start sshd
# From your workstation:
ssh nixos@10.0.0.43
```
4. **Format nvme0n1p5 as btrfs**
```bash
# Verify the device is clear
lsblk
sudo wipefs -a /dev/nvme0n1p5
# Create btrfs filesystem
sudo mkfs.btrfs -L nixos /dev/nvme0n1p5
# Mount and create subvolumes
sudo mount /dev/nvme0n1p5 /mnt
sudo btrfs subvolume create /mnt/@
sudo btrfs subvolume create /mnt/@home
sudo btrfs subvolume create /mnt/@nix
sudo btrfs subvolume create /mnt/@log
sudo umount /mnt
# Mount root subvolume
sudo mount -o subvol=@,compress=zstd,noatime /dev/nvme0n1p5 /mnt
# Create mount points
sudo mkdir -p /mnt/{boot,home,nix,var/log}
# Mount other subvolumes
sudo mount -o subvol=@home,compress=zstd,noatime /dev/nvme0n1p5 /mnt/home
sudo mount -o subvol=@nix,compress=zstd,noatime /dev/nvme0n1p5 /mnt/nix
sudo mount -o subvol=@log,compress=zstd,noatime /dev/nvme0n1p5 /mnt/var/log
# Mount EFI partition
sudo mount /dev/nvme0n1p3 /mnt/boot
```
5. **Import ZFS pools**
```bash
# Import pools (should be visible)
sudo zpool import
# Import with force if needed due to hostid
sudo zpool import -f media
sudo zpool import -f swarmvols
# Verify pools are mounted
zfs list
ls -la /media /swarmvols
```
6. **Generate initial hardware configuration**
```bash
sudo nixos-generate-config --root /mnt
```
7. **Get the new root filesystem UUID**
```bash
blkid /dev/nvme0n1p5
# Note the UUID for updating hardware-configuration.nix
```
8. **Copy your NixOS configuration to the server**
```bash
# From your workstation:
scp -r ~/nixos-configs/machines/john-endesktop/* nixos@10.0.0.43:/tmp/
# On server:
sudo mkdir -p /mnt/etc/nixos
sudo cp /tmp/configuration.nix /mnt/etc/nixos/
sudo cp /tmp/hardware-configuration.nix /mnt/etc/nixos/
# Edit hardware-configuration.nix to update the root filesystem UUID
sudo nano /mnt/etc/nixos/hardware-configuration.nix
# Change: device = "/dev/disk/by-uuid/CHANGE-THIS-TO-YOUR-UUID";
# To: device = "/dev/disk/by-uuid/[UUID from blkid]";
```
9. **Install NixOS**
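```bash
sudo nixos-install
# Set root password when prompted
# (pass --no-root-passwd instead if you do not want a root password prompt)
# Set the user password (assumes the configuration defines the johno user)
sudo nixos-enter --root /mnt -c 'passwd johno'
```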
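10. **Reboot into NixOS**
```bash
# Export the pools first so the installed system does not need a forced import
sudo zpool export media swarmvols
sudo reboot
# Remove the USB drive once the machine restarts
```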
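The manual edit in step 8 (replacing the `CHANGE-THIS-TO-YOUR-UUID` placeholder) can also be done non-interactively. This is a minimal sketch, not the only way to do it: `set_root_uuid` is a hypothetical helper, and it assumes the placeholder appears in the file exactly as shown in step 8.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the UUID placeholder in a hardware configuration.
#   $1 = path to hardware-configuration.nix
#   $2 = filesystem UUID to write
set_root_uuid() {
    local config_file="$1" uuid="$2" placeholder="CHANGE-THIS-TO-YOUR-UUID" tmp
    [ -n "$uuid" ] || { echo "empty UUID" >&2; return 1; }
    # Refuse to continue if there is nothing to replace
    grep -q "by-uuid/${placeholder}" "$config_file" \
        || { echo "placeholder not found in ${config_file}" >&2; return 1; }
    tmp="$(mktemp)" || return 1
    # Write via a temp file, then copy back so the original permissions are kept
    sed "s|by-uuid/${placeholder}|by-uuid/${uuid}|" "$config_file" > "$tmp" \
        && cat "$tmp" > "$config_file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (from a root shell, since the file under /mnt/etc/nixos is root-owned):
#   set_root_uuid /mnt/etc/nixos/hardware-configuration.nix \
#       "$(blkid -s UUID -o value /dev/nvme0n1p5)"
```

Failing when the placeholder is absent guards against silently leaving a stale UUID in place.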
### Phase 4: Post-Installation Verification
1. **Boot into NixOS and verify system**
```bash
ssh johno@10.0.0.43
# Check NixOS version
nixos-version
# Verify hostname
hostname # Should be: john-endesktop
```
2. **Verify ZFS pools imported correctly**
```bash
zpool status
zpool list
zfs list
# Check for hostid mismatch warnings (should be gone)
# Verify both pools show ONLINE status
```
3. **Verify NFS exports are active**
```bash
sudo exportfs -v
systemctl status nfs-server
# Should see /media and /swarmvols exported to 10.0.0.0/24
```
4. **Test NFS mount from another machine**
```bash
# From a k3s node or your workstation:
sudo mount -t nfs 10.0.0.43:/swarmvols /mnt
ls -la /mnt
sudo umount /mnt
sudo mount -t nfs 10.0.0.43:/media /mnt
ls -la /mnt
sudo umount /mnt
```
5. **Verify ZFS sharenfs properties preserved**
```bash
zfs get sharenfs media
zfs get sharenfs swarmvols
# Should show: sec=sys,mountpoint,no_subtree_check,no_root_squash,rw=@10.0.0.0/24
```
6. **Check swap device**
```bash
swapon --show
free -h
# Should show /dev/zvol/media/swap
```
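The export check in step 3 can be turned into a pass/fail test. The sketch below assumes that `exportfs -v` prints each exported path at the start of a line; `missing_exports` is a hypothetical helper name, and it expects plain paths without regex metacharacters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `exportfs -v` output from stdin and report which of
# the expected paths (given as arguments) are not exported.
missing_exports() {
    local exports path missing=0
    exports="$(cat)"
    for path in "$@"; do
        # Match the path only as a whole field at the start of a line, so
        # /media does not match /media/media/nix
        if ! grep -Eq "^${path}([[:space:]]|\$)" <<< "$exports"; then
            echo "missing export: ${path}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   sudo exportfs -v | missing_exports /media /swarmvols && echo "exports OK"
```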
### Phase 5: Restore k3s Cluster Access
1. **Restart k3s nodes or remount NFS shares**
```bash
# On each k3s node:
sudo systemctl restart k3s # or k3s-agent
```
2. **Verify k3s pods have access to persistent volumes**
```bash
# On k3s master:
kubectl get pv
kubectl get pvc
# Check that volumes are bound and accessible
```
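To avoid scanning the `kubectl get pv` table by hand, the volume phases can be filtered with a small script. This is a sketch under the assumption that every persistent volume in the cluster is expected to be `Bound`; `unbound_volumes` is a hypothetical helper name, and a volume in the `Available` phase may be perfectly fine in your setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name<TAB>phase" lines from stdin, print every
# volume whose phase is not Bound, and return non-zero if any were found.
unbound_volumes() {
    local name phase found=0
    while IFS=$'\t' read -r name phase; do
        [ -n "$name" ] || continue   # skip blank lines
        if [ "$phase" != "Bound" ]; then
            echo "${name}: ${phase}"
            found=1
        fi
    done
    return "$found"
}

# Usage (on the k3s master):
#   kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
#       | unbound_volumes && echo "all volumes Bound"
```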
## Rollback Plan
If something goes wrong during migration, you can roll back to Arch Linux:
### Quick Rollback (If NixOS won't boot)
1. **Boot from NixOS USB (or Arch USB)**
2. **Import ZFS pools**
```bash
sudo zpool import -f media
sudo zpool import -f swarmvols
```
3. **Start NFS manually (temporary)**
```bash
sudo mkdir -p /media /swarmvols
sudo systemctl start nfs-server
sudo exportfs -o rw,sync,no_subtree_check,no_root_squash 10.0.0.0/24:/media
sudo exportfs -o rw,sync,no_subtree_check,no_root_squash 10.0.0.0/24:/swarmvols
sudo exportfs -v
```
This will restore k3s cluster access immediately while you diagnose.
4. **Boot back into Arch Linux**
```bash
# Reboot and select nvme0n1p4 (Arch) in GRUB/boot menu
sudo reboot
```
5. **Verify Arch boots and services start**
```bash
ssh johno@10.0.0.43
zpool status
systemctl status nfs-server
```
### Full Rollback (If needed)
1. **Follow Quick Rollback steps above**
2. **Re-add nvme0n1p5 to media pool (if desired)**
```bash
# Only if you want to restore the original configuration
sudo zpool add media /dev/nvme0n1p5
```
3. **Clean up NixOS partition**
```bash
# If you want to reclaim nvme0n1p5 for other uses
sudo wipefs -a /dev/nvme0n1p5
```
## Risk Mitigation
### Data Safety
- ✅ **swarmvols** (production): Mirrored + nightly borg backups
- ⚠️ **media** (important): JBOD - no redundancy, but not catastrophic
- ✅ **NixOS install**: Separate partition, doesn't touch ZFS pools
- ✅ **Arch Linux**: Remains bootable on nvme0n1p4 until verified
### Service Continuity
- Downtime: 30-60 minutes expected
- k3s cluster: Will reconnect automatically when NFS returns
- Rollback time: < 10 minutes to restore Arch
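To watch for NFS coming back from a k3s node or your workstation, a simple retry loop is enough. The sketch below is one way to do it: `retry_until` is a hypothetical helper, and the `showmount` probe in the usage line assumes the server answers mount protocol queries, which an NFSv4-only setup may not.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it succeeds or the
# attempts run out.
#   $1 = maximum attempts, $2 = seconds to sleep between attempts, rest = command
retry_until() {
    local max_attempts="$1" delay="$2" attempt
    shift 2
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        if "$@"; then
            return 0
        fi
        # Skip the sleep after the final failed attempt
        (( attempt < max_attempts )) && sleep "$delay"
    done
    return 1
}

# Usage: poll every 10 seconds for up to 10 minutes
#   retry_until 60 10 showmount -e 10.0.0.43 && echo "NFS is back"
```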
### Testing Approach
1. Test NFS exports from NixOS live environment before installation
2. Test single NFS mount from k3s node before full cluster restart
3. Keep Arch Linux boot option until 24-48 hours of stable NixOS operation
## Post-Migration Tasks
After successful migration and 24-48 hours of stable operation:
1. **Update k3s NFS mounts (if needed)**
- Verify no hardcoded references to old system
2. **Optional: Repurpose Arch partition**
```bash
# After you're confident NixOS is stable
# You can wipe nvme0n1p4 and repurpose it
```
3. **Update documentation**
- Update infrastructure docs with NixOS configuration
- Document any deviations from this plan
4. **Consider setting up NixOS remote deployment**
```bash
# From your workstation:
nixos-rebuild switch --target-host johno@10.0.0.43 --flake .#john-endesktop
```
## Timeline
- **Preparation**: 1-2 hours (testing config build, downloading ISO)
- **Migration window**: 1-2 hours (installation + verification)
- **Verification period**: 24-48 hours (before removing Arch)
- **Total**: ~3 days from start to declaring success
## Emergency Contacts
- Borg backup location: [Document your borg repo location]
- K3s cluster nodes: [Document your k3s nodes]
- Critical services on k3s: [Document what's running that depends on these NFS shares]
## Checklist
Pre-migration:
- [x] nvme0n1p5 removal from media pool complete
- [ ] Recent backup verified (< 24 hours)
- [ ] Maintenance window scheduled
- [ ] NixOS ISO downloaded
- [ ] Bootable USB created
- [ ] NixOS config builds successfully
During migration:
- [ ] ZFS pools exported
- [ ] Arch Linux shut down cleanly
- [ ] Booted from NixOS USB
- [ ] nvme0n1p5 formatted with btrfs
- [ ] Btrfs subvolumes created
- [ ] ZFS pools imported
- [ ] NixOS installed
- [ ] Root password set
Post-migration:
- [ ] NixOS boots successfully
- [ ] ZFS pools mounted automatically
- [ ] NFS server running
- [ ] NFS exports verified
- [ ] Test mount from k3s node successful
- [ ] k3s cluster reconnected
- [ ] Persistent volumes accessible
- [ ] No hostid warnings in zpool status
- [ ] Arch Linux still bootable (for rollback)
Final verification (after 24-48 hours):
- [ ] All services stable
- [ ] No unexpected issues
- [ ] Performance acceptable
- [ ] Ready to remove Arch partition (optional)
- [ ] Ready to remove /swarmvols/media-backup (optional)