Title: Securing backups using S3 storage
Author: Solène
Date: 19 October 2024
Tags: security network backup
Description: In this guide, you will learn how S3 storage can help you secure your backups
# Introduction
In this blog post, you will learn how to make secure backups using Restic and an S3-compatible object storage.

Backups are incredibly important: you may lose files that only existed on your computer, or lose access to encrypted accounts or drives. When you need backups, you need them to be reliable and secure.
There are two methods to handle backups:

* pull backups: a central server connects to each system and pulls the data to store it locally; this is how rsnapshot, BackupPC or Bacula work
* push backups: each system runs the backup software locally and stores the data in a backup repository (either local or remote); this is how most backup tools work
Both workflows have pros and cons. With pull backups, the data is not encrypted by the clients, and a single central server has access to everything, which is rather bad from a security point of view. Push backups handle encryption and repository access on the system where they run, but an attacker compromising that system could destroy the backup using the backup tool itself.

I will explain how to leverage S3 features to protect your backups from an attacker.
# Quick intro to object storage
S3 is the name of an AWS service used for object storage. Basically, it is a huge key-value store in which you can put data and retrieve it; there is very little metadata associated with an object. Objects are all stored in a "bucket", they have a path, and you can organize the bucket with directories and subdirectories.
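As an illustration, here is what basic object operations look like with the official AWS CLI (assuming it is installed and configured; the bucket and file names below are placeholders):

```
# Upload a local file as an object; its "key" is the path in the bucket
aws s3 cp ./notes.txt s3://my-bucket/documents/notes.txt

# List objects under a prefix, much like listing a directory
aws s3 ls s3://my-bucket/documents/

# Download the object back
aws s3 cp s3://my-bucket/documents/notes.txt ./notes-copy.txt
```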
Buckets can be encrypted, which is an important feature if you do not want your S3 provider to be able to access your data. However, most backup tools already encrypt their repository, so adding encryption to the bucket is not really useful. I will not explain how to use bucket encryption in this guide, although you can enable it if you want. Using it requires storing more secrets outside of the backup system if you want to restore, and it does not provide real benefits because the repository is already encrypted.
S3 was designed to be highly efficient for storing and retrieving data, but it is not a competitor to POSIX file systems. A bucket can be public or private; you can host your website in a public bucket (and it is rather common!). A bucket has permissions associated with it: you certainly do not want to allow random people to put files in your public bucket (or list its files), while you need to be able to do so yourself.

The protocol designed around S3 was reused for what we call "S3-compatible" services, on which you can directly plug any "S3-compatible" client, so you are not stuck with AWS.
This blog post exists because I wanted to share a cool S3 feature (not really S3-specific, but almost every implementation offers it) that goes well with backups: a bucket can be versioned, so every change happening in the bucket can be reverted. Now, think about an attacker escalating to root privileges: they can access the backup repository, delete all the files there, then destroy the server. With a backup on a versioned S3 storage, you could revert your bucket to just before the deletion happened and recover your backup. To defeat this protection, the attacker would also need access to the S3 storage's administrative credentials, which are different from the credentials required to use the bucket.
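As a sketch of what the recovery looks like through the standard S3 API (the endpoint, bucket, key and version id below are all placeholders): listing the versions shows the delete markers an attacker left behind, and deleting a marker brings the object back.

```
# Show every version of the objects, including delete markers
aws s3api list-object-versions \
    --endpoint-url https://s3.example.com \
    --bucket my-backup-bucket \
    --prefix restic/

# Removing a delete marker "undeletes" the object underneath it
aws s3api delete-object \
    --endpoint-url https://s3.example.com \
    --bucket my-backup-bucket \
    --key restic/data/ab/abcdef \
    --version-id "DELETE-MARKER-VERSION-ID"
```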
Finally, restic supports S3 as a backend, and this is what we want.
## Open source S3-compatible storage implementations
Here is a list of open source and free S3-compatible storage implementations. I played with them all; they have different goals and purposes, and they all worked well enough for me:

* SeaweedFS GitHub project page
* Garage official project page
* Minio official project page
A quick note about those:

* I consider SeaweedFS to be the Swiss army knife of storage: you can mix multiple storage backends and expose them over different protocols (like S3, HTTP, WebDAV), and it can also replicate data over remote instances. You can do tiering (based on last access time or speed) as well.
* Garage is a relatively new project; it is quite bare-bones in terms of features, but it works fine and supports high availability with multiple instances. It only offers S3.
* Minio is the big player; it has a paid version (which is extremely expensive), although the free version should be good enough for most users.
# Configure your S3
You need to pick an S3 provider; you can self-host it or use a paid service, it is up to you. I like Backblaze as it is super cheap, at $6/TB/month, but I also have a local Minio instance for some needs.
Create a bucket, enable versioning on it and define the retention duration for old versions; for the current scenario, I think a few days is enough.
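If your provider supports the standard S3 API, this can also be done from the command line; here is a sketch with the AWS CLI, where the endpoint, bucket name and the 5-day retention are placeholders to adapt:

```
# Enable versioning on the bucket
aws s3api put-bucket-versioning \
    --endpoint-url https://s3.example.com \
    --bucket my-backup-bucket \
    --versioning-configuration Status=Enabled

# Expire old (noncurrent) object versions after 5 days
aws s3api put-bucket-lifecycle-configuration \
    --endpoint-url https://s3.example.com \
    --bucket my-backup-bucket \
    --lifecycle-configuration '{"Rules": [{"ID": "purge-old-versions", "Status": "Enabled", "Filter": {}, "NoncurrentVersionExpiration": {"NoncurrentDays": 5}}]}'
```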
Create an application key for your restic client with the following permissions: "GetObject", "PutObject", "DeleteObject", "GetBucketLocation", "ListBucket". The names can change between providers, but the key needs to be able to put/delete/list data in the bucket (and only this bucket!). Once this process is done, you will get a pair of values: an identifier and a secret key.
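For providers using AWS-style policies, a key scoped to a single bucket could look like this sketch (the bucket name is a placeholder, and the exact action names vary between providers):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-backup-bucket"
    }
  ]
}
```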
Now, you will have to provide the following environment variables to restic when it runs:

* `AWS_DEFAULT_REGION`, which contains the region of the S3 storage; this information is given when you configure the bucket.
* `AWS_ACCESS_KEY_ID`, which contains the identifier generated when you created the application key.
* `AWS_SECRET_ACCESS_KEY`, which contains the secret key generated when you created the application key.
* `RESTIC_REPOSITORY`, which will look like `s3:https://$ENDPOINT/$BUCKET`, with $ENDPOINT being the bucket endpoint address and $BUCKET the bucket name.
* `RESTIC_PASSWORD`, which contains your backup repository passphrase used to encrypt it; make sure to write it down somewhere else because you need it to recover the backup.
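Putting the variables together, a minimal wrapper script might look like this; every value below is a placeholder to replace with your provider's details:

```
#!/bin/sh
# All values below are placeholders
export AWS_DEFAULT_REGION="us-west-001"
export AWS_ACCESS_KEY_ID="your-application-key-id"
export AWS_SECRET_ACCESS_KEY="your-application-key-secret"
export RESTIC_REPOSITORY="s3:https://s3.us-west-001.example.com/my-backup-bucket"
export RESTIC_PASSWORD="your-repository-passphrase"

# With the environment set, restic commands target the bucket
restic backup -x /home /etc /root /var
```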
If you want a simple script to back up some directories and remove old data, keeping 5 hourly, 2 daily, 2 weekly and 2 monthly backups:

```
restic backup -x /home /etc /root /var
restic forget --prune --keep-hourly 5 --keep-daily 2 --keep-weekly 2 --keep-monthly 2
```
Do not forget to run `restic init` the first time, to initialize the restic repository.
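To check or recover your data later, restic's usual commands work against the S3 repository with the same environment variables set; the restore target below is a placeholder:

```
# List the snapshots stored in the bucket
restic snapshots

# Restore the most recent snapshot into a scratch directory
restic restore latest --target /tmp/restore
```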
# Conclusion
I really like this backup system: it is cheap, very efficient, and provides a fallback in case of a problem with the repository (mistakes happen, you do not always need an attacker to lose data ^_^').

If you do not want to use S3 backends, you should know that Borg backup and Restic both support an "append-only" mode, which prevents an attacker from damaging or even reading the backup, but I always found it hard to use, and you need another system to do the prune/cleanup on a regular basis.
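As a sketch of that append-only route with Restic: its companion rest-server can be started in append-only mode, so clients can add data but not delete it (the storage path below is a placeholder):

```
# Serve a restic REST repository that refuses deletions
rest-server --path /srv/restic --append-only
```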
# Going further
This approach could work with any backend supporting snapshots, like BTRFS or ZFS. If you can recover the backup repository to a previous point in time, you will be able to access the working backup repository.
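As a sketch with ZFS (the dataset name and snapshot label below are placeholders): snapshot the dataset holding the repository after each good backup run, and roll back if it gets damaged.

```
# Snapshot the dataset holding the backup repository
zfs snapshot tank/backups@2024-10-19

# Roll back to that snapshot if the repository is damaged
zfs rollback tank/backups@2024-10-19
```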
You could also back up the backup repository itself, on the backend side, but you would waste a lot of disk space.