Title: Securing backups using S3 storage
Author: Solène
Date: 19 October 2024
Tags: security network backup
Description: In this guide, you will learn how S3 storage can help you
secure your backups
# Introduction
In this blog post, you will learn how to make secure backups using
Restic and an S3-compatible object storage.
Backups are incredibly important: you may lose files that only existed
on your computer, or lose access to encrypted accounts or drives. When
you need backups, you need them to be reliable and secure.
There are two methods to handle backups:
* pull backups: a central server connects to each system and pulls the
data to store it locally; this is how rsnapshot, BackupPC or Bacula
work
* push backups: each system runs the backup software locally and stores
the result in the backup repository (either local or remote); this is
how most backup tools work
Both workflows have pros and cons. With pull backups, the data is not
encrypted by the client, and a single central server has access to
everything, which is rather bad from a security point of view. Push
backups handle encryption and access on the system where they run, but
an attacker compromising that system could destroy the backup using the
backup tool itself.
I will explain how to leverage S3 features to protect your backups from
an attacker.
# Quick intro to object storage
S3 is the name of an AWS service used for object storage. Basically,
it is a huge key-value store in which you can put data and retrieve it;
there is very little metadata associated with an object. Objects are
all stored in a "bucket": they have a path, and you can organize the
bucket with directories and subdirectories.
Buckets can be encrypted, which is an important feature if you do not
want your S3 provider to be able to access your data, however most
backup tools already encrypt their repository, so it is not really
useful to add encryption to the bucket. I will not explain how to use
encryption in the bucket in this guide, although you can enable it if
you want. Using encryption requires more secrets to store outside of
the backup system if you want to restore, and it does not provide real
benefits because the repository is already encrypted.
S3 was designed to be highly efficient for storing and retrieving
data, but it is not a competitor to POSIX file systems. A bucket can be
public or private; you can host your website in a public bucket (and it
is rather common!). A bucket has permissions associated with it: you
certainly do not want to allow random people to put files in your
public bucket (or list its files), while you still need to be able to
do so yourself.
The protocol designed around S3 was reused for what we call
"S3-compatible" services on which you can directly plug any
"S3-compatible" client, so you are not stuck with AWS.
This blog post exists because I wanted to share a cool S3 feature (not
really S3-specific, but almost every implementation offers it) that
goes well with backups: a bucket can be versioned, so every change
happening on a bucket can be reverted. Now, think about an attacker
escalating to root privileges: they can access the backup repository,
delete all the files there, then destroy the server. With a backup on a
versioned S3 storage, you could revert your bucket to just before the
deletion happened and recover your backup. To destroy the versions as
well, the attacker would also need the S3 management credentials, which
are different from the credentials required to use the bucket.
Finally, restic supports S3 as a backend, and this is what we want.
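As an illustration of that recovery path, here is roughly what
reverting a deletion looks like with the AWS CLI. The bucket name, key
and endpoint below are placeholder examples; S3-compatible providers
generally accept the same calls through `--endpoint-url`:

```shell
# list all versions and delete markers for the affected objects
aws s3api list-object-versions --bucket my-backup-bucket \
    --prefix restic/data/ --endpoint-url https://s3.example.org

# removing the "delete marker" makes the previous version current again,
# effectively undeleting the object
aws s3api delete-object --bucket my-backup-bucket \
    --key restic/data/somefile --version-id "DELETE_MARKER_VERSION_ID" \
    --endpoint-url https://s3.example.org
```

Deleting a specific version requires permissions beyond the ones given
to the backup client, which is exactly why this protects the backups.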
## Open source S3-compatible storage implementations
There is a list of open source and free S3-compatible storage
implementations. I played with them all; they have different goals and
purposes, but they all worked well enough for me:
* Seaweedfs (GitHub project page)
* Garage (official project page)
* Minio (official project page)
A quick note about those:
* I consider seaweedfs the Swiss army knife of storage: you can mix
multiple storage backends and expose them over different protocols
(like S3, HTTP, WebDAV), and it can also replicate data over remote
instances. You can do tiering (based on last access time or speed) as
well.
* Garage is a relatively new project; it is quite bare bones in terms
of features, but it works fine and supports high availability with
multiple instances. It only offers S3.
* Minio is the big player; it has a paid version (which is extremely
expensive), although the free version should be good enough for most
users.
# Configure your S3
You need to pick an S3 provider: you can self-host it or use a paid
service, it is up to you. I like Backblaze as it is super cheap, at
$6/TB/month, but I also have a local Minio instance for some needs.
Create a bucket, enable versioning on it and define the data
retention; for the current scenario I think a few days is enough.
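If your provider exposes the standard S3 API, enabling versioning and
a retention policy can be scripted instead of done through a web
console. A sketch with the AWS CLI, where the bucket name, endpoint
and 7-day retention are placeholder examples:

```shell
# keep older versions of deleted/overwritten objects
aws s3api put-bucket-versioning --bucket my-backup-bucket \
    --versioning-configuration Status=Enabled \
    --endpoint-url https://s3.example.org

# expire noncurrent versions after 7 days (adjust to your retention)
aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
    --lifecycle-configuration '{"Rules":[{"ID":"expire-old-versions",
      "Status":"Enabled","Filter":{},
      "NoncurrentVersionExpiration":{"NoncurrentDays":7}}]}' \
    --endpoint-url https://s3.example.org
```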
Create an application key for your restic client with the following
permissions: "GetObject", "PutObject", "DeleteObject",
"GetBucketLocation", "ListBucket". The names can change from one
provider to another, but the key needs to be able to put/delete/list
data in the bucket (and only this bucket!). Once this process is done,
you will get a pair of values: an identifier and a secret key.
Now, you will have to provide the following environment variables to
restic when it runs:
* `AWS_DEFAULT_REGION` which contains the region of the S3 storage,
this information is given when you configure the bucket.
* `AWS_ACCESS_KEY_ID` which contains the access key generated when you
created the application key.
* `AWS_SECRET_ACCESS_KEY` which contains the secret key generated when
you created the application key.
* `RESTIC_REPOSITORY` which will look like
`s3:https://$ENDPOINT/$BUCKET` with $ENDPOINT being the bucket endpoint
address and $BUCKET the bucket name.
* `RESTIC_PASSWORD` which contains your backup repository passphrase to
encrypt it, make sure to write it down somewhere else because you need
it to recover the backup.
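Putting the variables together, a minimal environment setup could look
like the following; every value here is a placeholder for your own
region, keys, endpoint and bucket:

```shell
# example values only: substitute your provider's region and endpoint,
# the application key pair, your bucket name and your passphrase
export AWS_DEFAULT_REGION="eu-west-1"
export AWS_ACCESS_KEY_ID="your-application-key-id"
export AWS_SECRET_ACCESS_KEY="your-application-secret-key"
export RESTIC_REPOSITORY="s3:https://s3.example.org/my-backup-bucket"
export RESTIC_PASSWORD="your-long-repository-passphrase"
```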
If you want a simple script to backup some directories, and remove old
data after a retention of 5 hourly, 2 daily, 2 weekly and 2 monthly
backups:
```
# snapshot the listed directories (-x: do not cross filesystem boundaries)
restic backup -x /home /etc /root /var
# apply the retention policy and prune data no longer referenced
restic forget --prune --keep-hourly 5 --keep-daily 2 --keep-weekly 2 --keep-monthly 2
```
Do not forget to run `restic init` the first time, to initialize the
restic repository.
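It is also worth checking regularly that the backups are usable. A
short sketch using standard restic commands, where the restore target
path is only an example:

```shell
# list the snapshots stored in the repository
restic snapshots
# verify the repository's integrity
restic check
# restore the most recent snapshot into /tmp/restore (example path)
restic restore latest --target /tmp/restore
```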
# Conclusion
I really like this backup system as it is cheap, very efficient, and
provides a fallback in case of a problem with the repository (mistakes
happen, it does not always take an attacker to lose data ^_^').
If you do not want to use S3 backends, note that Borg backup and
Restic both support an "append-only" mode, which prevents an attacker
from doing damage or even reading the backup, but I always found it
hard to use, and you need another system to do the prune/cleanup on a
regular basis.
# Going further
This approach could work on any backend supporting snapshots, like
BTRFS or ZFS: if you can recover the backup repository to a previous
point in time, you will be able to access a working backup
repository.
You could also do a backup of the backup repository, on the backend
side, but you would waste a lot of disk space.