Title: Managing a fleet of NixOS Part 1 - Design choices
Author: Solène
Date: 02 September 2022
Tags: bento nixos nix
Description: In this series of articles, I'll explain my steps toward
designing an infrastructure to centrally manage a fleet of NixOS
systems.
# Introduction
I have a grand project in mind, and I need to think about it before
starting any implementation. This blog is the right place for me to
explain what I want to do and the different solutions.
It's related to NixOS. I would like to ease the management of a fleet
of NixOS workstations that could be anywhere.
This could be useful for companies using NixOS for their employees, to
manage all the workstations remotely, but also for people who may
manage NixOS systems in various places (cloud, datacenter, house,
family computers).
With this central management, it makes sense not to give your users
root access: they would call their technical support to ask for a
change, and their system could be updated quickly to reflect the
request. This can be super useful for remote family computers when
they need an extra program that isn't currently installed, and you took
the responsibility of handling their system...
With NixOS, this setup totally makes sense: you can potentially
reproduce users' bugs as you have their configuration, stage new
changes for testing, and users can roll back to a previous working
state in case of a big regression.
The company Cachix made this possible before I figured out a solution.
It's still not too late to propose an open source alternative.
Cachix Deploy
# Defining the project
The purpose of this project is to have a central management system on
which you keep the configuration files for all the NixOS systems, and
to allow the administrator to make the remote NixOS systems pick up
the new configuration as soon as possible when required.
We can imagine three different implementations at the highest level:
* a scheduled job on each machine looking for changes in the source.
The source could be a git repository, a tarball or anything that could
be used to carry the configuration.
* NixOS systems could connect to something like a pub/sub and wait for
an event from the central management to trigger a rebuild, the event
may or may not contain information / sources.
* the central management system could connect to the remote NixOS to
trigger the build / push the build
These designs all have pros and cons. Let's look at them in more detail.
## Solution 1 - Scheduled job
In this scenario, the NixOS system would use a cron job or a systemd
timer to periodically check for changes and trigger the update.
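To make the idea more concrete, here is a minimal Python sketch of what one polling cycle could look like. Everything in it is an assumption for illustration: the function `check_for_update` and its parameters are hypothetical names, and the "revision" could be a git commit hash, a tarball checksum or anything that identifies the configuration. The timer itself (cron or systemd) would simply run this logic at each tick.

```python
from typing import Callable, Optional


def check_for_update(
    fetch_remote_revision: Callable[[], Optional[str]],
    applied_revision: Optional[str],
    rebuild: Callable[[str], bool],
) -> Optional[str]:
    """Run one polling cycle and return the revision that is now applied.

    fetch_remote_revision returns an identifier for the current
    configuration, or None when the source is unreachable.
    rebuild applies a revision and reports whether it succeeded.
    """
    remote = fetch_remote_revision()
    if remote is None or remote == applied_revision:
        # Source unreachable or nothing new: keep the current state.
        return applied_revision
    if rebuild(remote):
        return remote
    # The rebuild failed: stay on the previous revision, retry next run.
    return applied_revision


# Simulated run: the source stays at "abc" for two cycles, then moves
# to "def". The fake rebuild only records what it was asked to apply.
rebuilt = []


def fake_rebuild(revision: str) -> bool:
    rebuilt.append(revision)
    return True


state = check_for_update(lambda: "abc", None, fake_rebuild)
state = check_for_update(lambda: "abc", state, fake_rebuild)
state = check_for_update(lambda: "def", state, fake_rebuild)
print(state, rebuilt)
```

The second cycle does nothing because the revision didn't change, which is the part that makes frequent polling wasteful.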
### Pros
* low maintenance
* could interactively ask the user when they want to upgrade if not now
### Cons
* may not run at all if the system is not up at the scheduled time, or
could run late depending on the situation
* can't force an update as soon as possible
* not really bandwidth efficient if you poll often
* no feedback for the central management about who made/received the
update (except by adding a call to the server?)
## Solution 2 - Remote systems are listening for changes (publisher / subscribe…
In this scenario, the NixOS system would always be connected to the
central management, using some kind of protocol like MQTT, BOSH or
similar.
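As a rough illustration of the pattern, here is a toy Python sketch in which a tiny in-memory broker stands in for the real server. It doesn't use MQTT or any network library: the `Broker` class, the `make_client` helper, the topic name and the host names are all hypothetical, they only show the flow of one event and its acknowledgments.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], bool]


class Broker:
    """Toy in-memory stand-in for a pub/sub server (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber, return the ack count."""
        return sum(1 for handler in self._subscribers[topic] if handler(event))


def make_client(hostname: str, log: List[str]) -> Handler:
    """Build the handler a NixOS client would register on the broker."""

    def on_event(event: dict) -> bool:
        # A real client would trigger a rebuild here.
        log.append(f"{hostname} rebuilds to {event['revision']}")
        return True  # acknowledgment sent back to the broker

    return on_event


log: List[str] = []
broker = Broker()
broker.subscribe("fleet/rebuild", make_client("laptop-alice", log))
broker.subscribe("fleet/rebuild", make_client("desktop-bob", log))

# The central management publishes one event for the whole fleet.
acks = broker.publish("fleet/rebuild", {"revision": "abc123"})
print(acks, log)
```

The list of subscribers is exactly the information that tells you which systems are up, which is both a pro and a con in the lists below.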
### Pros
* you know which systems are up
* events from the central management are delivered instantly and
should be acknowledged by the clients
* updates should propagate very quickly
* could interactively ask the user when they want to upgrade if not now
### Cons
* this can lead to a privacy issue as you know when each host is
connected
* this adds complexity to the server
* this adds complexity on each client
* firewalls usually don't like long-lived connections, an HTTPS-based
solution would help bypass firewalls
## Solution 3 - The central management pushes the updates to the remote systems
In this scenario, the NixOS system would be reachable over a protocol
that allows running commands, like SSH. The central management system
would run a remote upgrade on it, or push the changes using tools like
deploy-rs, colmena, morph or similar...
Awesome-nix list: deployment-tools
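The simplest form of this workflow, running a remote upgrade over SSH, could be sketched in Python as below. This is only an assumption of how it might look, not how deploy-rs, colmena or morph work: the function names and host names are hypothetical, and the example runs in "dry run" mode with a fake runner so nothing is executed.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def upgrade_command(host: str, user: str = "root") -> List[str]:
    """Build the SSH command that runs an upgrade on one remote system."""
    return ["ssh", f"{user}@{host}", "nixos-rebuild", "switch"]


def run_with_subprocess(command: Sequence[str]) -> bool:
    """Execute the command for real; True when it exits with status 0."""
    return subprocess.run(list(command)).returncode == 0


def push_upgrade(
    hosts: Sequence[str], run: Callable[[Sequence[str]], bool]
) -> Dict[str, bool]:
    """Run the upgrade on every host and report success per host."""
    return {host: run(upgrade_command(host)) for host in hosts}


# Dry run: record the commands instead of executing them.
issued: List[List[str]] = []


def fake_run(command: Sequence[str]) -> bool:
    issued.append(list(command))
    return True


results = push_upgrade(["laptop-alice.example", "desktop-bob.example"], fake_run)
print(results)
print(issued[0])
```

Passing `run_with_subprocess` instead of `fake_run` would execute the commands, assuming the central machine has SSH access to each host.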
### Pros
* update is immediate
* SSH could be exposed over Tor or I2P for maximum firewall bypassing
capability
### Cons
* offline systems may be complicated to update, you would need to try
to connect to them often until they are reachable
* you can connect to the remote machine and potentially spy on the user.
In the alternatives above, you can potentially achieve the same by
reconfiguring the computer to allow this, but it would have to be done
on purpose
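The first point, retrying offline systems until they are reachable, could look like the following Python sketch. It is a simulation under assumptions: `deploy_until_done`, `try_deploy` and the host names are hypothetical, and the `wait` callback stands in for the pause a real tool would make between two rounds.

```python
from collections import Counter
from typing import Callable, Iterable, Set


def deploy_until_done(
    hosts: Iterable[str],
    try_deploy: Callable[[str], bool],
    max_rounds: int,
    wait: Callable[[], None] = lambda: None,
) -> Set[str]:
    """Retry unreachable hosts round after round.

    Returns the set of hosts still not updated after max_rounds.
    """
    pending = set(hosts)
    for _ in range(max_rounds):
        if not pending:
            break
        # Keep only the hosts whose deployment failed in this round.
        pending = {host for host in pending if not try_deploy(host)}
        if pending:
            wait()  # a real tool would sleep here before the next round
    return pending


# Simulation: "family-pc" only becomes reachable at the third attempt.
reachable_at_attempt = {"office-pc": 1, "family-pc": 3}
attempts: Counter = Counter()


def simulated_deploy(host: str) -> bool:
    attempts[host] += 1
    return attempts[host] >= reachable_at_attempt[host]


remaining = deploy_until_done(["office-pc", "family-pc"], simulated_deploy, max_rounds=3)
print(remaining, dict(attempts))
```

With a lower `max_rounds`, the offline host would stay in the returned set, and the central management would have to remember it for later.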
# Making a choice
I tried to state the pros and cons of each setup, but I can't see a
clear winner. However, I'm not convinced by Solution 1: as you don't
have any feedback or direct control over the systems, I prefer to
abandon it.
Solutions 2 and 3 are still in the competition: we basically ended up
with a choice between a PUSH and a PULL workflow.
# Conclusion
In order to choose between 2 and 3, I will need to experiment with the
Solution 2 technologies as I have never used them (MQTT, RabbitMQ, BOSH
etc…).