I’m working on starting up my first home server which I’m trying to make relatively foolproof and easily recoverable. What is some common maintenance people do to avoid dire problems, including those that accumulate over time, and what are ways to recover a server when issues pop up?
At first, I figured I’d just use debian with some kind of snapshot system and monitor changelogs to update manually when needed, but then I started hearing that immutable distros like microOS and coreOS have some benefits in terms of long term “os drift”, security, and recovering from botched updates or conflicts? I don’t even know if I’m going to install any native packages, I’m pretty certain every service I want to run has a docker image already, so does it matter? I should also mention, I’m going to use this as a file server with snapraid, so I’m trying to figure out if there will be conflicts to look out for there or with hardware acceleration for video transcoding.
Be cautious with the answers when asking things like this. In discussion boards like here many are (rightfully) very excited about selfhosting and eager to share what they learned, but may (“just”) have very few years of experience. There’s a LOT to learn in this space, and it also takes a very long time to find out what is truly foolproof and easily recoverable.
First of, you want your OS do be disposable. And just as the OS should be decoupled, all the things you run should be decoupled from one another. Don’t build complex structures that take effort to rebuild. When you build something, that is state. You want to minimize the amount of state you need to keep track of. Ideally, the only state you should have is your payload data. That is impossible of course, but you get the idea.
Immutable distros are indeed the way to go for long term reliability. And ideally you want immutability by booting images (like coreOS or Fedora IoT). Distros like microOS are not really immutable, they still use the regular package manager. They only make it a little more reliable by encouraging flatpak/docker/etc (and therefore cutting down on packages managed by the package manager) and a slightly more controlled update-procedure (making them transactional). But ultimately, once your system is in some way defect, the package manager will build on top of that defect. So you keep carrying along that fault. In that sense it is not immune to “os drift” (well expressed), it is just that drift happens slower. “Proper” immutable distros that work with images are self healing, because once you rebase to another image (could be an update or a completely different distribution, doesn’t matter), you have a fresh system that doesn’t have anything to do with the previous image. Furthermore the new image does not get composed on your computer, it gets put together upstream. You only run the final result with you know is bit for bit what was tested by the distro maintainers. So microOS is like receiving a puzzle and a manual how to put it together, and gluing it in a frame is the “immutability”. Updates are like losening the glue of specific pieces and gluing in new ones. In coreOS you receive the glued puzzle and do not have to do anything yourself. Updates are like receiving an entire new glued puzzle. This also comes down to the state idea: some mutable system that was set up a long time ago and even drifted a bit has a ton of state. A truly immutable distro has a very tiny state, it is merely the hash of the image you run, and your changes in /etc (which should be minimal and well documented!).
Also you want to steer clear from things like Proxmox and generally LXC-containers and VMs. All these are not immutable (let alone immune to drift), and you only burden yourself with maintaining more mutable things with tons of state.
Docker is a good way to run your stuff. Just make sure to put all the persistent data the belongs together in subfolders of a subvolume and snapshot that, and then backup these snapshots. That way you ensure that you meet the requirements for the data(base)'s ACID properties to work. Your “backups” will be corrupted otherwise, since they would be a wild mosaic from different points in time. To be able to roll back cleanly if an update goes wrong, you should also snapshot the image hash together with the persistent data. This way you can preserve the complete state of a docker service before updating. Here you also minimize the state: you only have your payload data, the image hash and your compose.yml.