Title: Securing backups using S3 storage
Author: Solène
Date: 19 October 2024
Tags: security network backup
Description: In this guide, you will learn how S3 storage can help you
secure your backups
# Introduction
In this blog post, you will learn how to make secure backups using
Restic and an S3-compatible object storage.
Backups are incredibly important: you may lose important files that
only existed on your computer, or lose access to some encrypted
accounts or drives. When you need backups, you need them to be reliable
and secure.
There are two methods to handle backups:
* pull backups: a central server connects to the system and pulls data
to store it locally, this is how rsnapshot, backuppc or bacula work
* push backups: each system runs the backup software locally to store
its data in the backup repository (either local or remote), this is how
most backup tools work
Both workflows have pros and cons. Pull backups are typically not
encrypted, and a single central server has access to everything, which
is rather bad from a security point of view. Push backups handle
encryption and access on the system where they run, but an attacker who
compromises that system could destroy the backups using the backup tool
itself.
I will explain how to leverage S3 features to protect your backups from
such an attacker.
# Quick intro to object storage
S3 is the name of an AWS service used for object storage. Basically,
it is a huge key-value store in which you can put data and retrieve it,
with very little metadata associated with each object. Objects are
all stored in a "bucket", they have a path, and you can organize the
bucket with directories and subdirectories.
Buckets can be encrypted, which is an important feature if you do not
want your S3 provider to be able to access your data. However, most
backup tools already encrypt their repository, so adding encryption to
the bucket brings no real benefit, and it means more secrets to store
outside of the backup system if you want to restore. I will not explain
how to use bucket encryption in this guide, although you can enable it
if you want.
S3 was designed to be highly efficient for storing and retrieving data,
but it is not a competitor to POSIX file systems. A bucket can be
public or private, you can host your website in a public bucket (and it
is rather common!). A bucket has permissions associated with it: you
certainly do not want to allow random people to put files in your
public bucket (or list the files), but you need to be able to do so
yourself.
The protocol designed around S3 was reused for what we call
"S3-compatible" services, to which you can directly plug any
"S3-compatible" client, so you are not stuck with AWS.
This blog post exists because I wanted to share a cool S3 feature (not
really S3 specific, but almost everyone implemented it) that goes well
with backups: a bucket can be versioned, so every change happening in a
bucket can be reverted. Now, think about an attacker escalating to root
privileges: they can access the backup repository and delete all the
files there, then destroy the server. With a backup on a versioned S3
storage, you could revert your bucket to its state just before the
deletion happened and recover your backup. To prevent this recovery,
the attacker would also need the credentials of the S3 storage account,
which are different from the credentials used to access the bucket.
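To make the recovery step more concrete, here is a minimal sketch of undoing a mass deletion on a versioned bucket. It assumes the AWS CLI is installed and configured with credentials allowed to manage object versions (not the key used by the backup client), and that your provider implements the `ListObjectVersions` API. `ENDPOINT`, `BUCKET` and the `restore_deleted` function are placeholder names. In a versioned bucket, a deletion only adds a "delete marker" on top of the object, so removing that marker makes the previous version current again.

```shell
#!/bin/sh
# Placeholder values, not taken from a real setup.
ENDPOINT="s3.example.com"
BUCKET="my-backups"

# Remove every delete marker that is currently the latest version of its
# object, which brings back the version that existed before the deletion.
restore_deleted() {
    tab="$(printf '\t')"
    aws s3api list-object-versions \
        --endpoint-url "https://$ENDPOINT" --bucket "$BUCKET" \
        --query 'DeleteMarkers[?IsLatest].[Key,VersionId]' --output text |
    while IFS="$tab" read -r key version; do
        # The CLI prints "None" when there is no delete marker at all.
        [ "$key" = "None" ] && continue
        aws s3api delete-object \
            --endpoint-url "https://$ENDPOINT" --bucket "$BUCKET" \
            --key "$key" --version-id "$version"
    done
}
```

This only covers deleted objects, and it also brings back files that were deleted legitimately within the retention window. A real recovery would likely filter on the time of the incident, and overwritten objects would need their newest version removed instead.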
Finally, restic supports S3 as a backend, and this is what we want.
## Open source S3-compatible storage implementations
There are several free and open source S3-compatible storage servers. I
played with all of the following, they have different goals and
purposes, and they all worked well enough for me:
* Seaweedfs (GitHub project page)
* Garage (official project page)
* Minio (official project page)
A quick note about those:
* I consider seaweedfs to be the Swiss army knife of storage: you can
mix multiple storage backends and expose them over different protocols
(like S3, HTTP, WebDAV), and it can also replicate data to remote
instances. You can do tiering (based on last access time or speed) as
well.
* Garage is a relatively new project. It is quite bare-bones in terms
of features, but it works fine and supports high availability with
multiple instances. It only offers S3.
* Minio is the big player. It has a paid version (which is extremely
expensive), although the free version should be good enough for most
users.
# Configure your S3
You need to pick an S3 provider: you can self-host it or use a paid
service, it is up to you. I like backblaze as it is super cheap, at
$6/TB/month, but I also have a local minio instance for some needs.
Create a bucket, enable versioning on it and define the data
retention. For the current scenario, I think a few days is enough.
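As an illustration, here is a sketch of enabling versioning and a short retention with the AWS CLI. It assumes the bucket already exists and that your provider accepts these two S3 API calls; some providers only expose these settings in their web console. `ENDPOINT`, `BUCKET`, the `configure_bucket` function and the 3-day value are placeholders standing in for "a few days".

```shell
#!/bin/sh
# Placeholder values, not taken from a real setup.
ENDPOINT="s3.example.com"
BUCKET="my-backups"
RETENTION_DAYS=3

# Lifecycle rule: permanently remove object versions that have been
# replaced or deleted for more than RETENTION_DAYS days.
cat > lifecycle.json <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": $RETENTION_DAYS }
    }
  ]
}
EOF

configure_bucket() {
    # Keep previous versions of every object in the bucket.
    aws s3api put-bucket-versioning \
        --endpoint-url "https://$ENDPOINT" --bucket "$BUCKET" \
        --versioning-configuration Status=Enabled
    # Apply the retention rule written above.
    aws s3api put-bucket-lifecycle-configuration \
        --endpoint-url "https://$ENDPOINT" --bucket "$BUCKET" \
        --lifecycle-configuration file://lifecycle.json
}
```

Calling `configure_bucket` once, with the storage account credentials rather than the backup client key, should be enough. The retention must be long enough for you to notice that something went wrong.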
Create an application key for your restic client with the following
permissions: "GetObject", "PutObject", "DeleteObject",
"GetBucketLocation", "ListBucket". The names can change between
providers, but the key needs to be able to put/delete/list data in the
bucket (and only this bucket!). Once this is done, you will get a pair
of values: an identifier and a secret key.
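On providers using the AWS policy format (AWS itself, and Minio as far as I know), these permissions could be written as the policy document below. This is a sketch: the bucket name and the `restic-policy.json` file name are placeholders, and other providers may only offer checkboxes in their web console instead.

```shell
#!/bin/sh
# Placeholder bucket name, not taken from a real setup.
BUCKET="my-backups"

# Object-level actions apply to the bucket content ("/*"),
# bucket-level actions apply to the bucket itself.
cat > restic-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::$BUCKET/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::$BUCKET"
    }
  ]
}
EOF
```

Note that nothing in this policy allows managing object versions or the bucket configuration, which is the whole point: the backup client can delete files, but not their history.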
Now, you will have to provide the following environment variables to
restic when it runs:
* `AWS_DEFAULT_REGION` which contains the region of the S3 storage,
this information is given when you configure the bucket.
* `AWS_ACCESS_KEY_ID` which contains the identifier generated when you
created the application key.
* `AWS_SECRET_ACCESS_KEY` which contains the secret key generated when
you created the application key.
* `RESTIC_REPOSITORY` which will look like
`s3:https://$ENDPOINT/$BUCKET` with `$ENDPOINT` being the bucket
endpoint address and `$BUCKET` the bucket name.
* `RESTIC_PASSWORD` which contains the passphrase used to encrypt your
backup repository, make sure to write it down somewhere else because
you need it to recover the backup.
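For example, these variables could be set in a small file sourced before running restic. Every value below is a placeholder to replace with the ones from your provider, and the access key variable uses the `AWS_ACCESS_KEY_ID` name found in restic's documentation.

```shell
#!/bin/sh
# Placeholder values: replace them with the ones from your provider.
ENDPOINT="s3.example.com"
BUCKET="my-backups"

export AWS_DEFAULT_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="replace-with-key-identifier"
export AWS_SECRET_ACCESS_KEY="replace-with-secret-key"
export RESTIC_REPOSITORY="s3:https://$ENDPOINT/$BUCKET"
export RESTIC_PASSWORD="replace-with-repository-passphrase"
```

Since this file contains secrets, it should only be readable by the user running the backup.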
Here is a simple script to back up some directories and remove old
data, keeping 5 hourly, 2 daily, 2 weekly and 2 monthly backups:
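```
# Back up the listed directories without crossing file system boundaries (-x)
restic backup -x /home /etc /root /var
# Keep 5 hourly, 2 daily, 2 weekly and 2 monthly snapshots, prune the rest
restic forget --prune -H 5 -d 2 -w 2 -m 2
```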
Do not forget to run `restic init` the first time, to initialize the
restic repository.
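If you would rather not think about the initialization, the script can handle it. The sketch below assumes the environment variables described earlier are already exported. It relies on `restic cat config` failing when the repository does not exist yet, and the `ensure_repo` and `run_backup` function names are mine.

```shell
#!/bin/sh
# Stop at the first failing command, so a failed backup never
# gets followed by a cleanup of old snapshots.
set -eu

ensure_repo() {
    # Reading the repository configuration fails when the repository
    # has not been initialized (or cannot be reached).
    if ! restic cat config >/dev/null 2>&1; then
        restic init
    fi
}

run_backup() {
    ensure_repo
    restic backup -x /home /etc /root /var
    restic forget --prune -H 5 -d 2 -w 2 -m 2
}
```

Calling `run_backup` at the end of the script, for example from a cron job, runs the whole sequence. If the repository exists but is unreachable, `restic init` fails too and the script stops there.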
# Conclusion
I really like this backup system as it is cheap, very efficient and
provides a fallback in case of a problem with the repository (mistakes
happen, it does not always take an attacker to lose data ^_^').
If you do not want to use S3 backends, you should know that Borg backup
and Restic both support an "append-only" mode, which prevents an
attacker from damaging or even reading the backup, but I always found
it hard to use, and you need another system to do the prune/cleanup on
a regular basis.
# Going further
This approach could work on any backend supporting snapshots, like
BTRFS or ZFS. If you can roll back the backup repository to a previous
point in time, you will be able to access a working backup repository.
You could also do a backup of the backup repository, on the backend
side, but you would waste a lot of disk space.