Title: BTRFS deduplication using bees
Author: Solène
Date: 16 August 2022
Tags: nixos btrfs linux
Description: This explains how to use bees to enable offline
deduplication on a BTRFS file system
# Introduction
BTRFS is a Linux file system that uses a Copy On Write (COW) model. It
provides many features such as on-the-fly compression, volume
management, snapshots and clones.
Wikipedia page about Copy on write
However, BTRFS doesn't natively support deduplication, a feature that
looks for identical chunks across files: when two files share the same
chunk, only one copy of the data is stored and used for both files.
In some scenarios, this can drastically reduce disk space usage.
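To make the idea more concrete, here is a minimal Python sketch of chunk-based deduplication. It is only an illustration of the principle, not how bees or BTRFS are implemented: the 4096-byte block size, the use of SHA-256, and the names `dedup_blocks`, `store` and `layout` are all assumptions made up for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedup_blocks(files):
    """Store each distinct block once; each file becomes a list of block hashes."""
    store = {}   # block hash -> block content (a single copy per distinct block)
    layout = {}  # file name -> ordered list of block hashes
    for name, data in files.items():
        hashes = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy
            hashes.append(digest)
        layout[name] = hashes
    return store, layout


# Two hypothetical files sharing an identical block of "A" bytes.
files = {
    "a.bin": b"A" * 8192 + b"B" * 4096,
    "b.bin": b"A" * 4096 + b"C" * 4096,
}
store, layout = dedup_blocks(files)
logical = sum(len(data) for data in files.values())
physical = sum(len(block) for block in store.values())
print(logical, physical)  # 20480 12288
```

The five blocks of the two files collapse into three stored blocks, which is where the space saving comes from.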
This is where we can use "bees", a program that does offline
deduplication for BTRFS file systems. In this context, offline means
it's done when you run a command, as opposed to live (on-the-fly)
deduplication, which is applied instantly when data is written. The
HAMMER file system from DragonFly BSD does offline deduplication,
while ZFS does it live. There are pros and cons to both models: the
ZFS documentation recommends 1 GB of memory per terabyte of disk when
deduplication is enabled, because all the chunk hashes have to be kept
in memory.
Bees project page on GitHub
# Usage
Bees is a service you need to install and start on your system. It has
some documented limitations and caveats, but it should work for most
users.
You can define a BTRFS file system on which you want deduplication,
and a load target. Bees works silently while your system is below the
load threshold and stops when the load exceeds the limit. This is a
simple mechanism to prevent bees from eating all your system resources
when freshly modified or created files need to be scanned.
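The load threshold idea can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, not the code of bees, whose real scheduler may well be more gradual. The function `run_throttled` and its parameters are hypothetical names for this example; only `os.getloadavg()` is a real call, returning the 1, 5 and 15 minute load averages on Unix systems.

```python
import os
import time


def run_throttled(tasks, load_target=2.0,
                  get_load=lambda: os.getloadavg()[0],
                  sleep=time.sleep):
    """Run tasks one by one, pausing while the load is at or above the target."""
    results = []
    for task in tasks:
        # Wait until the 1-minute load average drops below the target.
        while get_load() >= load_target:
            sleep(1.0)
        results.append(task())
    return results
```

Passing `get_load` and `sleep` as parameters is only there to make the sketch easy to try with fake values instead of waiting for a real load spike.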
The first time you run bees on a file system that is not empty, it may
take a while to scan everything. After that it's really quiet, except
when you do heavy I/O operations like downloading big files, and even
then it does a good job of staying behind the scenes.
# Installation on NixOS
Add this code to /etc/nixos/configuration.nix and run "nixos-rebuild
switch" to apply the changes.
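```
services.beesd.filesystems = {
  root = {
    # file system to deduplicate, selected by its label
    spec = "LABEL=nixos";
    # size of the bees hash table, in MB
    hashTableSizeMB = 256;
    # only log critical messages
    verbosity = "crit";
    # don't work when the load average is above 2.0
    extraOptions = [ "--loadavg-target" "2.0" ];
  };
};
```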
The code assumes your root partition is labelled "nixos", that you
want a hash table of 256 MB (this memory will be used by bees) and that
you don't want bees to run when the system load is above 2.0.
You may want to tune the values, mostly the hash table size, depending
on your file system size. Bees is designed for multi-terabyte file
systems, but this doesn't mean you can't use it on an average user's
disk.
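As a rough guide for picking a hash table size, here is a small Python sketch. It rests on two assumptions that you should check against the bees and NixOS documentation: that each hash table entry takes about 16 bytes, and that the size must be a multiple of 16 MB. The function name `hash_table_size_mb` is made up for this example.

```python
ENTRY_BYTES = 16   # assumption: approximate size of one hash table entry
MIB = 1024 ** 2


def hash_table_size_mb(unique_data_bytes, extent_bytes, multiple_mb=16):
    """Estimate the hash table size (in MB) needed to track every extent
    of `extent_bytes` in `unique_data_bytes` of data, rounded up to a
    multiple of `multiple_mb`."""
    entries = unique_data_bytes // extent_bytes
    size_bytes = entries * ENTRY_BYTES
    step = multiple_mb * MIB
    steps = -(-size_bytes // step)  # ceiling division
    return max(1, steps) * multiple_mb


TIB = 1024 ** 4
print(hash_table_size_mb(TIB, 4 * 1024))   # 4096
print(hash_table_size_mb(TIB, 64 * 1024))  # 256
```

Under these assumptions, a smaller hash table still works, it just means bees can only track larger duplicate extents, so the trade-off is memory against how fine-grained the deduplication is.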
# Results
I tried it on my workstation, which holds a lot of build artifacts and
git repositories: bees reduced the disk usage from 160 GB to 124 GB
(about 22% less), so it's a huge win here.
Later, I tried again on some Steam games with a few Proton versions.
It didn't save much on the games, but it saved a lot on the Proton
installations.
On my local cache server, it saved nothing, but that is to be expected.
# Conclusion
BTRFS is a solid alternative to ZFS: it requires less memory while
providing volumes, snapshots and compression. The only thing it was
missing for me was deduplication, and I'm glad it's offline, so it
doesn't use too much memory.