Upgrading my infrastructure with Garage and PGBackWeb
I've seen a lot of projects offer configuration options for storing data in an S3 bucket (instead of just dumping them into the local filesystem), but I never saw the appeal. After all, one of the reasons I'm hosting things myself is to have the data under my control.
Many S3-compatible projects popped up as well, which would let me host the bucket myself, but again - why bother? That seemed to increase operational complexity for little gain.
And to be clear, I mean for my use case specifically. I'm the only person using most of the software I host, and even the apps that are used by others don't have more than 3 active users. I'm sure the software is rock-solid, but "exascale data store" simply isn't something I need.

My view on this started changing when I discovered Garage, whose stated goals seem to be aimed exactly at someone like me.
To quote from their website:
Our Goals:
We made it lightweight and kept the efficiency in mind:
- Self-contained - We ship a single dependency-free binary that runs on all Linux distributions
- Fast to deploy, safe to operate - We are sysadmins, we know the value of operator-friendly software
- Deploy everywhere on every machine - We do not have a dedicated backbone, and neither do you, so we made software that run over the Internet across multiple datacenters
- Highly resilient - to network failures, network latency, disk failures, sysadmin failures
If I could easily (i.e. without too much admin overhead, and without wondering if a misconfiguration fries all my data) have redundant storage replicated across multiple devices, I wouldn't have to back that data up myself! That might be worth looking into.
And the time for that came when I decided to improve my backup workflows. I had a few scripts that exported my databases to disk and copied those backups from one machine to another, along with some other data from the filesystems. But when I recently migrated from one server to another, I ditched them and wanted something else.
I opted for PG Back Web, which is currently being renamed to UFO Backup.
They boast "effortless PostgreSQL backups with a user-friendly web interface" - and S3 storage as a backend option.
So I decided to give both a try. I deployed PGBW on both machines that run Postgres databases. I deployed Garage on 3 machines - their recommended setup for maximum resilience. Then I configured PGBW to store the backups in an S3 bucket on Garage and voilà - I had found an effective way of losing all my backups at the same time!
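For anyone curious, the broad strokes of that Garage setup look roughly like the sketch below. The commands are paraphrased from Garage's quick start guide, and the node IDs, zone names, capacities and bucket/key names are placeholders of mine, so check the current docs rather than copying this verbatim.

```sh
# On one node: learn the node IDs, then connect the three nodes to each other
garage status
garage node connect <node_id_2>@<node_2_ip>:3901
garage node connect <node_id_3>@<node_3_ip>:3901

# Assign each node a zone and capacity in the cluster layout, then apply it
garage layout assign -z home -c 100G <node_id_1>
garage layout assign -z vps1 -c 100G <node_id_2>
garage layout assign -z vps2 -c 100G <node_id_3>
garage layout apply --version 1

# Create a bucket and an access key for PGBackWeb, and grant it read/write
garage bucket create pgbackweb-backups
garage key create pgbackweb-key   # older Garage versions: garage key new --name pgbackweb-key
garage bucket allow --read --write pgbackweb-backups --key pgbackweb-key
```

PGBW then only needs the bucket name, the key's ID and secret, and a Garage node's S3 endpoint (port 3900 by default) entered in its web interface.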
Everything seemed to work fine, but something felt off. Then I realized there's an oddity in the example configs on Garage's website: the docker-compose file mounts host volumes for data storage at certain paths, but the garage.toml config points Garage at different folders inside the container. That meant everything worked, but all the data was being kept inside the container and would be lost on the next update.
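To illustrate the kind of mismatch I mean (a simplified sketch with made-up paths and an example image tag, not the exact files from Garage's docs): the volume mounts in the compose file have to line up with the directories that garage.toml points Garage at inside the container.

```yaml
# docker-compose.yml (excerpt) - host folders mounted into the container
services:
  garage:
    image: dxflrs/garage:v1.0.1   # example tag, use whatever is current
    volumes:
      - ./garage.toml:/etc/garage.toml
      - ./meta:/var/lib/garage/meta
      - ./data:/var/lib/garage/data
```

```toml
# garage.toml (excerpt) - these must match the mount targets above,
# otherwise Garage writes into the container's own filesystem
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
```

If data_dir points somewhere that isn't a mount target, everything still works - it just silently stores the data inside the container, which is exactly the trap I walked into.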
At that point I already had several gigabytes of data replicated across all the machines. I took it as an opportunity for disaster recovery practice: instead of trying to move and fix the data, I just re-made the nodes one by one with the correct config, then used Garage's documentation for what to do when a node permanently fails and must be replaced. I waited for the automatic replication to copy data to the newly remade instance, then proceeded to do the same with the next node.
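For the record, the replacement procedure boils down to something like this - again paraphrased from memory of Garage's documentation on recovering from node failures, with placeholder IDs and values:

```sh
# Connect the freshly re-made node to a healthy member of the cluster
garage node connect <healthy_node_id>@<healthy_node_ip>:3901

# Give the new node a role in the layout, drop the dead node,
# then apply the new layout version
garage layout assign -z home -c 100G <new_node_id>
garage layout remove <old_node_id>
garage layout apply --version 2

# Keep an eye on the cluster while it re-replicates data to the new node
garage status
garage stats
```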
So that was on me, I guess. Since then, everything seems to be working: backups are being made, they're stored in the Garage bucket, and everything is synced across the three machines.
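A couple of CLI commands I found handy for sanity-checking that (the bucket name is the placeholder from earlier):

```sh
# All three nodes should be listed as healthy
garage status

# The bucket should exist, and its size should grow as backups land
garage bucket list
garage bucket info pgbackweb-backups
```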
When that had run smoothly for a while, I decided to move Karakeep (which I use for reading) to S3 as well, purely to have its assets automatically backed up. That required migrating the assets, because they're stored differently on the filesystem (with metadata kept in a separate JSON file) than in S3 (with metadata attached directly to the object). That step took a while to run, but after that, everything seems to be working fine.
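To give a rough idea of the difference (this is not Karakeep's actual migration tooling, just an illustration using the aws CLI pointed at a Garage endpoint, with made-up bucket, key and metadata names):

```sh
# On the filesystem, an asset and its metadata live side by side:
#   assets/abc123.bin
#   assets/abc123.json   (content type, original file name, ...)

# In S3, it becomes a single object with the metadata attached to it:
aws s3api put-object \
  --endpoint-url http://garage.internal:3900 \
  --bucket karakeep-assets \
  --key abc123 \
  --body assets/abc123.bin \
  --content-type image/png \
  --metadata original-filename=screenshot.png
```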
I'm mostly done for now. I will consider moving the Matrix server to S3 as well, but as that seems a bit more involved, it probably won't be quite so soon.
Oh, and one last thing: I made sure to work through everything with Garage's official CLI, following their docs, so I'd actually learn how it works. But there's also a nice dashboard/web UI for it, called Garage Web UI, that I like to use for checking the status and visually browsing the buckets' contents. I recommend that one as well.