Title: Securing backups using S3 storage

	Author: Solène
	Date: 19 October 2024
	Tags: security network backup
	Description: In this guide, you will learn how S3 storage can help you
	secure your backups

	# Introduction

	In this blog post, you will learn how to make secure backups using
	Restic and an S3-compatible object storage.

	Backups are incredibly important: you may lose important files that
	only existed on your computer, or lose access to some encrypted
	accounts or drives. When you need backups, you need them to be reliable
	and secure.

	There are two methods to handle backups:

	* pull backups: a central server connects to the system and pulls data
	to store it locally; this is how rsnapshot, backuppc or bacula work
	* push backups: each system runs the backup software locally to store
	data in the backup repository (either locally or remotely); this is how
	most backup tools work

	Both workflows have pros and cons. Pull backups are usually not
	encrypted, and a single central server owns everything, which is rather
	bad from a security point of view. Push backups handle all encryption
	and access on the system where they run, but an attacker who
	compromises that system could destroy the backups using the backup tool
	itself.

	I will explain how to leverage S3 features to protect your backups from
	an attacker.

	# Quick intro to object storage

	S3 is the name of an AWS service used for Object Storage. Basically,
	it is a huge key-value store in which you can put data and retrieve it;
	there is very little metadata associated with an object. Objects are
	all stored in a "bucket", they have a path, and you can organize the
	bucket with directories and subdirectories.

	Buckets can be encrypted, which is an important feature if you do not
	want your S3 provider to be able to access your data. However, most
	backup tools already encrypt their repository, so adding encryption to
	the bucket is not really useful. I will not explain how to use bucket
	encryption in this guide, although you can enable it if you want. It
	requires storing more secrets outside of the backup system to be able
	to restore, and it does not provide real benefits because the
	repository is already encrypted.

	S3 was designed to be highly efficient for storing and retrieving data,
	but it is not a competitor to POSIX file systems. A bucket can be
	public or private, and you can host your website in a public bucket (it
	is rather common!). A bucket has permissions associated with it: you
	certainly do not want to allow random people to put files in your
	public bucket (or list the files), but you need to be able to do so
	yourself.

	The protocol designed around S3 was reused by what we call
	"S3-compatible" services, to which you can directly connect any
	"S3-compatible" client, so you are not stuck with AWS.

	This blog post exists because I wanted to share a cool S3 feature (not
	really S3-specific, but almost everyone implemented it) that goes well
	with backups: a bucket can be versioned, so every change happening in
	the bucket can be reverted. Now, think about an attacker escalating to
	root privileges: they can access the backup repository, delete all the
	files there, then destroy the server. With a backup on a versioned S3
	storage, you could revert your bucket to just before the deletion
	happened and recover your backup. To prevent this, the attacker would
	also need the credentials of the S3 storage account itself, which are
	different from the credentials used to read and write the bucket.
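	Here is a minimal sketch of the "revert the bucket" idea, using the
	AWS CLI (`aws s3api`). It assumes that your provider implements the
	`ListObjectVersions` and `DeleteObject` calls. On a versioned bucket,
	deleting an object only adds a "delete marker" on top of it. If you
	remove that marker by its version ID, using the storage account
	credentials rather than the restic key, the previous version becomes
	visible again. The function name `restore_deleted_objects`, the bucket
	and the endpoint are hypothetical. The sketch only undoes deletions,
	not overwrites. It also brings back files that restic removed
	legitimately during the retention window, so running `restic check`
	afterwards seems wise.

	```sh
	#!/bin/sh
	# Hypothetical helper: undo deletions in a versioned bucket by removing
	# the delete markers that currently hide objects. Run it with the
	# storage account credentials, not with the restic application key.
	# Usage: restore_deleted_objects BUCKET ENDPOINT_URL
	restore_deleted_objects() {
	    bucket=$1
	    endpoint=$2
	    tab=$(printf '\t')

	    # List the delete markers that are the latest version of their key,
	    # printed as one "Key<TAB>VersionId" line per marker.
	    aws s3api list-object-versions \
	        --bucket "$bucket" --endpoint-url "$endpoint" \
	        --query 'DeleteMarkers[?IsLatest==`true`].[Key, VersionId]' \
	        --output text |
	    while IFS=$tab read -r key version; do
	        # The CLI may print "None" when no delete markers exist.
	        if [ -z "$key" ] || [ "$key" = "None" ]; then
	            continue
	        fi
	        # Deleting the marker itself (by version ID) makes the previous
	        # version of the object visible again.
	        aws s3api delete-object \
	            --bucket "$bucket" --endpoint-url "$endpoint" \
	            --key "$key" --version-id "$version"
	    done
	}
	```

	For example: `restore_deleted_objects my-backups https://s3.example.com`.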

	Finally, restic supports S3 as a backend, and this is what we want.

	## Open source S3-compatible storage implementations

	Here is a list of open source and free S3-compatible storage
	implementations. I played with all of them; they have different goals
	and purposes, and they all worked well enough for me:

	Seaweedfs GitHub project page
	Garage official project page
	Minio official project page

	A quick note about those:

	* I consider seaweedfs to be the Swiss army knife of storage, you can
	mix multiple storage backends and expose them over different protocols
	(like S3, HTTP, WebDAV), it can also replicate data over remote
	instances. You can do tiering (based on last access time or speed) as
	well.
	* Garage is a relatively new project. It is quite bare-bones in terms
	of features, but it works fine and supports high availability with
	multiple instances; it only offers S3.
	* Minio is the big player, it has a paid version (which is extremely
	expensive) although the free version should be good enough for most
	users.

	# Configure your S3

	You need to pick an S3 provider: you can self-host it or use a paid
	service, it is up to you. I like Backblaze as it is super cheap, at
	$6/TB/month, but I also have a local minio instance for some needs.

	Create a bucket, enable versioning on it and define the data retention
	for old versions; for the current scenario, I think a few days is
	enough.
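	If your provider accepts bucket configuration through the S3 API, a
	minimal sketch with the AWS CLI could look like this. Some providers
	only expose versioning and lifecycle rules in their web console. The
	sketch enables versioning, then installs a lifecycle rule that
	permanently deletes non-current versions after a given number of days.
	The helper names, the rule ID and the values are hypothetical. Check
	that your implementation supports `NoncurrentVersionExpiration`.

	```sh
	#!/bin/sh
	# Hypothetical helper: print a lifecycle configuration that permanently
	# removes non-current object versions DAYS days after they stopped being
	# current, and cleans up delete markers left without any version.
	# Usage: lifecycle_json DAYS
	lifecycle_json() {
	    printf '{
	  "Rules": [{
	    "ID": "expire-old-versions",
	    "Status": "Enabled",
	    "Filter": { "Prefix": "" },
	    "NoncurrentVersionExpiration": { "NoncurrentDays": %d },
	    "Expiration": { "ExpiredObjectDeleteMarker": true }
	  }]
	}\n' "$1"
	}

	# Usage: setup_bucket BUCKET ENDPOINT_URL DAYS
	# Run it with the storage account credentials, not the restic key.
	setup_bucket() {
	    bucket=$1
	    endpoint=$2
	    days=$3

	    # Turn on versioning: from now on, overwrites and deletions keep
	    # the previous version of each object.
	    aws s3api put-bucket-versioning \
	        --bucket "$bucket" --endpoint-url "$endpoint" \
	        --versioning-configuration Status=Enabled || return 1

	    # Write the lifecycle rule to a temporary file and apply it.
	    tmp=$(mktemp) || return 1
	    lifecycle_json "$days" > "$tmp"
	    aws s3api put-bucket-lifecycle-configuration \
	        --bucket "$bucket" --endpoint-url "$endpoint" \
	        --lifecycle-configuration "file://$tmp"
	    status=$?
	    rm -f "$tmp"
	    return "$status"
	}
	```

	For example, `setup_bucket my-backups https://s3.example.com 7` keeps
	replaced or deleted files for about a week before they are really gone.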

	Create an application key for your restic client with the following
	permissions: "GetObject", "PutObject", "DeleteObject",
	"GetBucketLocation", "ListBucket". The names can differ between
	providers, but the key needs to be able to read, put, delete and list
	data in the bucket (and only this bucket!). Once this is done, you will
	get a pair of values: an identifier and a secret key.

	Now, you will have to provide the following environment variables to
	restic when it runs:

	* `AWS_DEFAULT_REGION` which contains the region of the S3 storage,
	this information is given when you configure the bucket.
	* `AWS_ACCESS_KEY_ID` which contains the identifier (access key ID)
	generated when you created the application key.
	* `AWS_SECRET_ACCESS_KEY` which contains the secret key generated when
	you created the application key.
	* `RESTIC_REPOSITORY` which will look like
	`s3:https://$ENDPOINT/$BUCKET` with $ENDPOINT being the bucket endpoint
	address and $BUCKET the bucket name.
	* `RESTIC_PASSWORD` which contains the passphrase used to encrypt your
	backup repository; make sure to write it down somewhere else, because
	you need it to recover the backup.
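	As an illustration, the variables could be kept in a file readable
	only by root and loaded before each run. Note that restic reads the
	access key identifier from `AWS_ACCESS_KEY_ID`. All the values below
	are placeholders: the endpoint `s3.example.com`, the bucket
	`my-backups`, the region and the credentials are hypothetical.
	`check_restic_env` is a small hypothetical helper that reports any
	empty variable.

	```sh
	#!/bin/sh
	# Placeholder values: replace them with the ones from your provider.
	ENDPOINT="s3.example.com"
	BUCKET="my-backups"

	export AWS_DEFAULT_REGION="us-east-1"
	export AWS_ACCESS_KEY_ID="my-key-id"
	export AWS_SECRET_ACCESS_KEY="my-secret-key"
	export RESTIC_REPOSITORY="s3:https://$ENDPOINT/$BUCKET"
	export RESTIC_PASSWORD="my-long-repository-passphrase"

	# Hypothetical helper: print every required variable that is empty
	# and return non-zero if at least one is missing.
	check_restic_env() {
	    missing=0
	    for var in AWS_DEFAULT_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
	               RESTIC_REPOSITORY RESTIC_PASSWORD; do
	        eval "value=\${$var:-}"
	        if [ -z "$value" ]; then
	            echo "missing $var" >&2
	            missing=1
	        fi
	    done
	    return "$missing"
	}
	```

	A backup job could then start with something like
	`. /root/restic.env && check_restic_env`, where the file path is also a
	placeholder.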

	If you want a simple script to back up some directories and remove old
	data, keeping 5 hourly, 2 daily, 2 weekly and 2 monthly backups:

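	```
	# back up these directories without crossing into other file systems (-x)
	restic backup -x /home /etc /root /var
	# keep 5 hourly, 2 daily, 2 weekly, 2 monthly snapshots; forget the rest and prune
	restic forget --prune -H 5 -d 2 -w 2 -m 2
	```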

	Do not forget to run `restic init` the first time, to initialize the
	restic repository.
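	The one-time `restic init` and the two commands can be combined into a
	small wrapper, for instance to run from cron. This is a minimal sketch
	that assumes the environment variables listed above are already
	exported. The function name `run_backup` is hypothetical. It uses
	`restic cat config` to decide whether to initialize the repository.
	That command fails when the repository does not exist yet, and also on
	other errors, in which case `restic init` fails as well and the
	function stops.

	```sh
	#!/bin/sh
	# Hypothetical wrapper: initialize the repository on first use, then
	# back up and apply the retention policy from this article.
	run_backup() {
	    # `restic cat config` fails if the repository does not exist yet.
	    # If it failed for another reason (network, wrong password),
	    # `restic init` fails too and we stop there.
	    if ! restic cat config >/dev/null 2>&1; then
	        restic init || return 1
	    fi

	    restic backup -x /home /etc /root /var || return 1
	    restic forget --prune -H 5 -d 2 -w 2 -m 2
	}
	```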

	# Conclusion

	I really like this backup system as it is cheap, very efficient and
	provides a fallback in case of a problem with the repository (mistakes
	happen, you do not always need an attacker to lose data ^_^').

	If you do not want to use an S3 backend, you should know that Borg
	backup and Restic both support an "append-only" mode, which prevents an
	attacker from damaging the existing backups, but I always found it hard
	to use, and you need another system to do the prune/cleanup on a
	regular basis.

	# Going further

	This approach could work on any backend supporting snapshots, like
	BTRFS or ZFS. If you can restore the backup repository to a previous
	point in time, you will be able to access the working backup
	repository.
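	With ZFS, for example, the server hosting the repository could take
	periodic snapshots of the dataset that stores it, out of reach of the
	backup clients. Here is a minimal sketch. It assumes a hypothetical
	dataset named `tank/restic` and uses a snapshot naming scheme chosen
	for illustration. Deleting old snapshots is left out.

	```sh
	#!/bin/sh
	# Hypothetical helper, run on the storage server (e.g. from cron):
	# take a read-only ZFS snapshot of the dataset holding the repository.
	# Usage: snapshot_repository DATASET
	snapshot_repository() {
	    dataset=$1
	    # UTC timestamp so snapshot names sort chronologically.
	    stamp=$(date -u +%Y%m%dT%H%M%SZ)
	    zfs snapshot "$dataset@restic-$stamp"
	}

	# After an incident, list the snapshots and roll back to the last one
	# taken before the damage (-r destroys the snapshots taken after it):
	#   zfs list -t snapshot -r -o name,creation tank/restic
	#   zfs rollback -r tank/restic@restic-20241019T000000Z
	```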

	You could also do a backup of the backup repository, on the backend
	side, but you would waste a lot of disk space.