Title: My BTRFS cheatsheet
Author: Solène
Date: 29 August 2022
Tags: btrfs linux
Description:

# Introduction

I recently switched my home "NAS" (single disk!) to BTRFS. It's a
different ecosystem with many features and commands, so I had to write
a bit about it to remember the various possibilities...

BTRFS is an advanced file-system supported in Linux, somewhat
comparable to ZFS.

# Layout

A BTRFS file-system can be made of multiple disks, aggregated in
mirror or "concatenated", and it can be split into subvolumes which
may have specific settings.

Snapshots and quotas apply to subvolumes, so it's important to think
beforehand when creating BTRFS subvolumes: in most cases, one may want
dedicated subvolumes for /home and /var.
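
As a minimal sketch (the mount point /mnt and the subvolume names are
just examples), subvolumes are created and listed like this:

```
# create dedicated subvolumes on a BTRFS file-system mounted on /mnt
btrfs subvolume create /mnt/home
btrfs subvolume create /mnt/var

# list the existing subvolumes
btrfs subvolume list /mnt
```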

# Snapshots / Clones

It's possible to take an instant snapshot of a subvolume, which can be
used as a backup. Snapshots can be browsed like any other directory.

They exist in two flavors: read-only and writable. ZFS users will
recognize writable snapshots as "clones" and read-only ones as regular
ZFS snapshots.

Snapshots are an effective way to make a backup and to roll back
changes in a second.
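
Here is a sketch of both flavors; the paths and names are only
examples:

```
# writable snapshot of the subvolume mounted on /home
btrfs subvolume snapshot /home /home/.snapshots/home-20220829

# read-only snapshot (this is the kind "btrfs send" needs)
btrfs subvolume snapshot -r /home /home/.snapshots/home-20220829-ro

# delete a snapshot once it's no longer needed
btrfs subvolume delete /home/.snapshots/home-20220829
```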

# Send / Receive

A raw file-system stream can be sent / received over the network (or
anything supporting a pipe) to make incremental backups of the
differences. This is a very effective way to do incremental backups
without having to scan the entire file-system each time you run your
backup.
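
A sketch of a full send followed by an incremental one; the snapshot
names, the backup host and the target path are placeholders, and the
snapshots must be read-only to be sent:

```
# initial full copy to another machine
btrfs send /home/.snapshots/home-day1 | \
    ssh backup-host "btrfs receive /backup/home"

# later: only send the differences between day1 and day2
btrfs send -p /home/.snapshots/home-day1 /home/.snapshots/home-day2 | \
    ssh backup-host "btrfs receive /backup/home"
```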

# Deduplication

I covered deduplication with bees, but one can also use the program
"duperemove" (which works on XFS too!). They work a bit differently,
but in the end they have the same purpose: bees operates on the whole
BTRFS file-system while duperemove operates on files, so they cover
different use cases.

duperemove GitHub project page
Bees GitHub project page
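
A minimal duperemove invocation could look like this; the path and the
hash file location are examples:

```
# hash the files under /mnt/data recursively (-r) and submit the
# duplicated extents for deduplication (-d)
duperemove -dr /mnt/data

# keep the hashes in a file to speed up later runs
duperemove -dr --hashfile=/var/tmp/dupehash.db /mnt/data
```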

# Compression

BTRFS supports on-the-fly compression per subvolume, meaning the
content of each file is stored compressed and decompressed on demand.

Depending on the files, this can result in better performance because
you store less content on the disk and are less likely to be I/O
bound, but it also improves storage efficiency. This is really content
dependent: you can't really compress binary files like
pictures/videos/music any further, but if you have a lot of text and
source files, you can achieve great ratios.

From my experience, compression is always helpful for a regular user
workload, and newer algorithms are smart enough to not compress binary
data that wouldn't yield any benefit.

There is a program named compsize that reports compression statistics
for a file/directory. It's very handy to know if the compression is
beneficial and to what extent.

compsize GitHub project page
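
A sketch of enabling zstd compression through a mount option and
checking the result; the device, mount point and compression level are
examples:

```
# mount a BTRFS file-system with zstd compression, level 3
mount -o compress=zstd:3 /dev/sdb1 /home

# the same option in /etc/fstab:
# UUID=xxxx-xxxx  /home  btrfs  compress=zstd:3,noatime  0 0

# report how well a directory compresses
compsize /home
```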

# Defragmentation

Fragmentation is a real thing and not specific to Windows; it matters
a lot for mechanical hard drives but not really for SSDs.

Fragmentation happens when you create files on your file-system and
delete them: this happens very often due to cache directories, updates
and regular operations on a live file-system.

When you delete a file, this creates a "hole" of free space; after
some time, you may want to gather all these small parts of free space
into big chunks of free space. This matters for mechanical disks, as
the physical location of data is tied to the raw performance. The
defragmentation process is just physically reorganizing data to order
file chunks and free space into contiguous blocks.

Defragmentation can also be used to force compression in a subvolume,
for instance if you want to change the compression algorithm or if you
enabled compression after the files were written.

The command line is: btrfs filesystem defragment
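
For example (the path is an example, and -czstd is only needed to
force recompression):

```
# recursively defragment /home
btrfs filesystem defragment -r /home

# recursively rewrite /home and recompress its content with zstd
btrfs filesystem defragment -r -czstd /home
```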

# Scrubbing

The scrubbing feature is one of the most valuable features provided by
BTRFS and ZFS. Each file in these file-systems is associated with its
checksum in some metadata index, which means you can actually check
each file's integrity by comparing its current content with the
checksum known in the index.

Scrubbing costs a lot of I/O and CPU because you need to compute the
checksum of each file, but it's a guarantee for validating the stored
data. In case of a corrupted file, if the file-system is composed of
multiple disks (raid1 / raid5), it can be repaired from the mirrored
copies; this should work most of the time because such file corruption
is often related to the drive itself, thus other drives shouldn't be
affected.

Scrubbing can be started / paused / resumed, which is handy if you
need to run heavy I/O and don't want the scrubbing process to slow it
down. While the scrub commands can take a device or a path, the path
parameter is only used to find the related file-system; it won't just
scrub the files in that directory.

The command line is: btrfs scrub
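
For example, on a file-system mounted on /mnt/nas (an example path):

```
# start a scrub in the background
btrfs scrub start /mnt/nas

# check progress and error counters
btrfs scrub status /mnt/nas

# pause it around heavy I/O, then resume it
btrfs scrub pause /mnt/nas
btrfs scrub resume /mnt/nas
```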

# Rebalancing

When you are aggregating multiple disks into one BTRFS file-system,
some files are written to one disk and some others to another; after a
while, one disk may contain more data than the others.

The purpose of rebalancing is to redistribute data across the disks
more evenly.
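
A sketch with the usual commands; the mount point and the usage filter
value are examples:

```
# rebalance the whole file-system mounted on /mnt/nas
btrfs balance start /mnt/nas

# only rewrite data block groups that are less than 75% full,
# which is usually enough and much faster
btrfs balance start -dusage=75 /mnt/nas

# follow the progress
btrfs balance status /mnt/nas
```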

# Swap file

You can't create a swap file on a BTRFS disk without a tweak. You must
create the file in a directory carrying the special "no COW" attribute
set with "chattr +C /tmp/some_directory"; you can then move it
anywhere, as it will inherit the "no COW" flag.

If you try to use a swap file with COW enabled on it, swapon will
report a weird error, but you get more details in the dmesg output.
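
A sketch of the whole procedure; the directory and the size are
examples, and dd is used so the file has no holes:

```
# create a "no COW" directory: files created inside inherit the flag
mkdir /var/swap
chattr +C /var/swap

# allocate the swap file, restrict it, format and enable it
dd if=/dev/zero of=/var/swap/swapfile bs=1M count=4096
chmod 600 /var/swap/swapfile
mkswap /var/swap/swapfile
swapon /var/swap/swapfile
```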

# Converting

It's possible to convert an ext2/3/4 file-system into BTRFS; obviously
it must not be currently in use. The conversion can be rolled back
until you run certain operations on the new file-system, like
defragmenting or rebalancing.
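
A sketch using btrfs-convert from btrfs-progs; the device is an
example and the file-system must be clean and unmounted:

```
# check the ext4 file-system before converting it
e2fsck -f /dev/sdb1

# convert it in place to BTRFS
btrfs-convert /dev/sdb1

# roll back to ext4, only possible while the saved original image is
# still intact (before operations like defragmenting or rebalancing)
btrfs-convert -r /dev/sdb1
```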