Remote deployments with NixOS-Anywhere

Expanding my NixOS configs to remote machines

· 8 min read

Introduction

I have been using SearXNG as my search front-end proxy for a while now. I originally set it up on the infamous Oracle free tier, in an x86 VM running Ubuntu, and it worked fine for the most part. Its resource usage was so light, though, that Oracle regularly tried to reclaim the VM for inactivity. At the same time, I wanted to use the full extent of the Oracle free tier, specifically an Ampere VM with 4 vCPUs and 24 GB of memory, to perhaps host some more services on it.

Since I was already using NixOS at that point, with multiple hosts in my config, I decided to try putting NixOS on OCI as well.

I terminated the previous Ubuntu VM and tried to create the bigger Ampere VM, but I was getting the dreaded “Out of Capacity” error message in my availability domain. This persisted for many days, and eventually I just switched to Oracle Pay As You Go (PAYG), with the intention to stay within the free tier limits. With PAYG I was finally able to create the Ampere VM.

I went with the Ubuntu image, created an SSH key pair and soon was able to log in as the ubuntu user.

NixOS-Anywhere

With the VM ready and waiting, I started to create a new host config for the oracle host in my nixos-config repository. Initially it was just the basics straight from the tutorial, including a default disk layout via disko.

After this initial configuration was ready, NixOS had to be installed on that Ampere VM still running Ubuntu. To do that I reached for a new tool: nixos-anywhere.

Quickstart - nixos-anywhere - install NixOS everywhere

This tool installs NixOS anywhere via SSH. With the help of Qwen I created a new script (an app in my config) that prompts for the SSH user, key and IP address of the target machine, and then executes the nixos-anywhere command:

nixos-anywhere \
  --flake ".#${FLAKE_TARGET}" \
  --target-host "${SSH_USER}@${ORACLE_HOST_IP}" \
  --build-on-remote \
  --ssh-option "IdentityFile=${SSH_KEY}" \
  --debug

With the script ready I just executed the command, and the whole system was supposed to come up running NixOS:

nix run .\#deploy-oracle

That was not the case, of course, as suddenly NixOS failed to boot:

[  468.857701] EXT4-fs (sda1): I/O error while writing superblock
[  468.860109] Buffer I/O error on device sda1, logical block 250880
[  468.862302] Buffer I/O error on device sda1, logical block 23742
[  468.864355] Buffer I/O error on device sda1, logical block 23746
[  468.865834] Buffer I/O error on device sda1, logical block 23748
[  468.867183] Buffer I/O error on device sda1, logical block 23749
[  468.868545] Buffer I/O error on device sda1, logical block 23753
[  468.870004] Buffer I/O error on device sda1, logical block 23756
[  468.872011] Buffer I/O error on device sda1, logical block 23760
[  468.874025] Buffer I/O error on device sda1, logical block 23762
[  468.875793] Buffer I/O error on device sda1, logical block 23763
[  474.199316] EXT4-fs error (device sda1): ext4_journal_check_start:84: comm kworker/u16:3: Detected aborted journal
[  474.202024] I/O error, dev sda, sector 2099200 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
[  474.204179] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[  474.205545] EXT4-fs error (device sda1): ext4_journal_check_start:84: comm systemd-journal: Detected aborted journal

I thought the issue was with the disko config, or partitioning and formatting the disk while the system was still running Ubuntu, or that somehow kexec was not able to do its job in this particular environment. Searching for similar issues and consulting Claude and Qwen did not really result in any breakthrough.

I thought perhaps the source Ubuntu was the problem, so I terminated the OCI VM and created a new one, this time with Oracle Linux. Running the same script with nixos-anywhere, the same disko config now worked without any issues and soon I was able to log in to a brand-new NixOS system!

Secrets Management

Since I would eventually need secrets to be decrypted and available on the new host, I needed to add the new host's key to my secrets repo. sops-nix supports two basic encryption methods, GPG and age. age keys can either be generated directly, or SSH host/user keys in Ed25519 format can be converted to them.

Normally, I would have to obtain the newly generated SSH host key on the newly installed NixOS machine, convert it to age, re-encrypt my secrets, and build-switch again to make them available. This is possible of course, but it would be better if this could be done automatically by my deploy-oracle script.
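That manual flow can be sketched as a dry run. The commands below are only printed, not executed; the host address and file names are examples, and nixos-rebuild stands in for my usual build-switch alias:

```shell
# Dry-run sketch of the manual flow. run() only prints each command;
# replace the echo with "$@" to actually execute them.
set -euo pipefail
run() { echo "+ $*"; }

# 1. Convert the target's freshly generated SSH host key to an age key
run ssh root@192.0.2.10 "cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age"

# 2. Add the printed recipient to .sops.yaml, then re-encrypt the secrets
run sops updatekeys --yes secrets/system.yaml

# 3. Rebuild the host so the secrets become available
run nixos-rebuild switch --flake ".#oracle" --target-host root@192.0.2.10
```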

nixos-anywhere offers the capability to copy files to the target machine using the --extra-files flag. You prepare the needed folder structure as a temporary directory on the initiating host, and that structure is copied over to the target. I decided to forgo the host SSH key and simply generate my own age key dedicated to the new host, copying it over during the nixos-anywhere deployment.

The updated script then:

  1. Generates a new age key for the new host.
  2. Adds the newly generated public key to .sops.yaml and stores the private key as a secret in the appropriate places in the encrypted YAML files.
  3. Updates keys/re-encrypts the secrets, shows the diff for review and commits to the repo.
  4. Updates the secrets flake input, so that the updated keys are available for nixos-anywhere and build-switch.
  5. Prepares the age keys in a temporary location.
  6. Deploys with nixos-anywhere using --extra-files to move the age keys to the appropriate place on the host.
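Steps 1, 5 and 6 above can be sketched roughly like this. The user name, flake target and address are illustrative, and the final nixos-anywhere command is echoed rather than run:

```shell
# Sketch: generate a dedicated age key, stage it in a temporary tree
# mirroring the target filesystem, and hand that tree to nixos-anywhere.
set -euo pipefail

REMOTE_USER="anarion"                     # example remote user
EXTRA_FILES="$(mktemp -d)"
KEY_DIR="$EXTRA_FILES/home/$REMOTE_USER/.config/sops/age"
mkdir -p "$KEY_DIR"

# Generate the host's age key; the public half is what goes into .sops.yaml.
# The placeholder branch keeps this sketch runnable when age is not installed.
if command -v age-keygen >/dev/null 2>&1; then
  age-keygen -o "$KEY_DIR/keys.txt" 2>/dev/null
else
  printf 'AGE-SECRET-KEY-PLACEHOLDER\n' > "$KEY_DIR/keys.txt"
fi
chmod 600 "$KEY_DIR/keys.txt"

# The temporary tree is copied onto / of the target, so the key lands at
# /home/$REMOTE_USER/.config/sops/age/keys.txt. Echoed instead of executed.
echo nixos-anywhere \
  --flake ".#oracle" \
  --extra-files "$EXTRA_FILES" \
  --target-host "root@192.0.2.10"
```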

Example of resulting .sops.yaml file:

---
# pub keys
keys:
  - &users
    - &anarion ageAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  - &hosts # nix-shell -p ssh-to-age --run 'cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age'
    - &server1 ageXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    - &server2 ageYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

creation_rules:
  - path_regex: .*.yaml$
    key_groups:
      - age:
          - *anarion
          - *server1
          - *server2

So far, so good. After a reboot, though, the system still couldn’t find the age key to decrypt the secrets. The key was there in the expected location ("/home/${user}/.config/sops/age/keys.txt"), but owned by root. It turns out --extra-files copies files as root, and they stay that way.

To fix that, we can use the systemd.tmpfiles.rules option in the new host's configuration to set the ownership and permissions we need:

  # Fix ownership of .config directory created by nixos-anywhere --extra-files
  systemd.tmpfiles.rules = [
    "d /home/${user}/.config 0755 ${user} ${user} -"
    "d /home/${user}/.config/sops 0755 ${user} ${user} -"
    "d /home/${user}/.config/sops/age 0755 ${user} ${user} -"
    "f /home/${user}/.config/sops/age/keys.txt 0600 ${user} ${user} -"
  ];

After that the secrets were decrypted successfully.

Making the Script Generic

With the basic framework in place, I expanded the script to be more generic, not just specific to Oracle. I wanted it to automatically check the hosts/nixos directory for suitable hosts and prompt the user to select one from the available host configurations.
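The discovery step might look roughly like this; a throwaway temporary tree stands in for my real hosts/nixos directory, and the host names mirror my repo:

```shell
# Hypothetical host discovery: each subdirectory of hosts/nixos is treated
# as one deployable host config.
set -euo pipefail

REPO="$(mktemp -d)"
mkdir -p "$REPO/hosts/nixos"/{hplaptop,nix-cache,nixos,oracle}   # demo layout

# Collect host names, one per subdirectory (glob expands in sorted order)
HOSTS=()
for dir in "$REPO/hosts/nixos"/*/; do
  HOSTS+=("$(basename "$dir")")
done

echo "Available hosts:"
for i in "${!HOSTS[@]}"; do
  printf '  %d. %s\n' "$((i + 1))" "${HOSTS[$i]}"
done
# A real script would then read the selection and validate it against HOSTS.
```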

Again, Qwen did the heavy lifting (typing) and expanded the script. It also added plenty of checks and backups for the secret files, which was very nice and prudent of it.

The script also includes checks for an age key already being present for a given host, default prompts, etc. In the end, the deploy-remote script was ready. This is how the interaction looks, all the way up to the actual execution of the nixos-anywhere command:

 nix run .\#deploy-remote
warning: Git tree '/home/anarion/Documents/Repos/nixos-config-test' is dirty
Running deploy-remote for x86_64-linux
================================
NixOS Anywhere - Remote Deploy
================================

Step 1: Configuration

Available hosts:
  1. hplaptop
  2. nix-cache
  3. nixos
  4. oracle

Select host to deploy (enter number or name): 2
 Selected host: nix-cache

SSH user [root]:

SSH Key Configuration
Enter the path to your SSH private key.
Examples:
  - /home/anarion/.ssh/id_rsa
  - /home/anarion/.ssh/oracle_ssh-key-2025-12-22.key
  - /home/anarion/.ssh/ed25519

SSH key path [/home/anarion/.ssh/id_rsa]:
 SSH key found: /home/anarion/.ssh/id_rsa

Remote User Configuration
This is the username that will be created on the remote system.
It will be used for home-manager and file paths plus age keys.

Remote username [anarion]:
 Remote user: anarion

Network Configuration
Remote server IP address: 192.168.2.200
 Target IP: 192.168.2.200

Step 2: Checking Prerequisites

 nixos-anywhere available
 sops available
 age available

Step 3: Locating Secrets Repository

 Using secrets from: /home/anarion/Documents/Repos/secrets

Step 4: Checking for Existing Age Key

 Public key already exists in .sops.yaml
 Secret key already exists in system.yaml

 Age key already fully configured

Step 5: Age Key Configuration

 Age key already fully configured, skipping modifications

Step 6: Preparing Age Keys for Deployment

Decrypting age key from system.yaml...
 Age key decrypted and prepared
  - Location: /home/anarion/.config/sops/age/keys.txt

Step 7: Deployment Summary

─────────────────────────────────────────────────────────────────────
  Host:      nix-cache
  IP:         192.168.2.200
  SSH User:   root
  SSH Key:    /home/anarion/.ssh/id_rsa
  Remote User: anarion
  Flake Target: .#nix-cache
─────────────────────────────────────────────────────────────────────

 WARNING: This will destroy all data on 192.168.2.200!

Type 'yes' to continue:

With this script, I can just prepare the new host config and run:

nix run .\#deploy-remote

And have the host ready, no further manual steps needed. This is pretty great.

It is somewhat dangerous to let a script change and commit to your secrets repo, but there is a diff review step before the commit, so the risk is mitigated to some extent.

Also, the script currently uses the secrets repo cloned locally and expects it at a certain location. For further automation, the repo location should be extracted from the flake input, then cloned, updated and cleaned up by the script itself. I am, however, very happy with how it turned out and how it works. I’ve used it to deploy a couple of servers and I’m sure I will use it again in the future.

The whole script is available as a gist here for anyone interested.
