Title: How to check your data integrity?
Author: Solène
Date: 17 March 2017
Tags: unix security
Description:
Today's topic is data degradation, bit rot, bit rotting, damaged files,
or whatever you want to call it: it's when your data gets corrupted
over time, due to a disk fault or some unknown reason.
# What is data degradation? #
I shamelessly paste one line from Wikipedia: "*Data degradation is the
gradual corruption of computer data due to an accumulation of
non-critical failures in a data storage device. The phenomenon is also
known as data decay or data rot.*"
[Data degradation on
Wikipedia](https://en.wikipedia.org/wiki/Data_degradation)
So, how do we know we have encountered bit rot?

    bit rot = (checksum changed) && NOT (modification time changed)

A legitimate update could be mistaken for bit rot, but there is a
difference:

    update = (checksum changed) && (modification time changed)
# How to check if we encounter bit rot? #
There is no way to prevent bit rot. But there are some ways to detect
it, so you can restore a corrupted file from a backup, or repair it
with the right tool (you can't repair a file with a hammer, except if
it's some kind of HammerFS! :D )
In the following I will describe software I found to check (or even
repair) bit rot. If you know other tools which are not in this list, I
would be happy to hear about them, please mail me.
In the following examples, I will use this method to generate bit rot
on a file:

    % touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
    % generate_checksum_database_with_tool
    % echo "a" >> my_data/some_file_that_will_be_corrupted
    % touch -d "2017-03-16T21:04:00" my_data/some_file_that_will_be_corrupted
    % start_tool_for_checking

We generate the checksum database, then we alter a file by appending an
"a" at the end of it, and we restore the modification and access time
of the file. Then, we start the tool to check for data corruption.
The first **touch** is only for convenience; we could get the
modification time with the **stat** command and pass the same value to
touch after modifying the file.
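The whole method can be sketched end to end with standard GNU coreutils (`sha256sum`, `stat`, `touch`); the file name `demo.txt` and the use of SHA-256 are only for illustration:

```shell
#!/bin/sh
# Simulate bit rot: change a file's content while keeping its mtime,
# then detect it with the rule (checksum changed) && !(mtime changed).
echo "hello" > demo.txt
old_sum=$(sha256sum demo.txt | cut -d' ' -f1)
old_mtime=$(stat -c %Y demo.txt)        # mtime as epoch seconds (GNU stat)

echo "a" >> demo.txt                    # corrupt the file
touch -d "@$old_mtime" demo.txt         # restore the original mtime

new_sum=$(sha256sum demo.txt | cut -d' ' -f1)
new_mtime=$(stat -c %Y demo.txt)
if [ "$old_sum" != "$new_sum" ] && [ "$old_mtime" = "$new_mtime" ]; then
    echo "bit rot detected on demo.txt"
fi
```

This prints "bit rot detected on demo.txt": the content changed but the timestamp did not, which is exactly what the checking tools below look for.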
## bitrot ##
This is a Python script and it's **very** easy to use. It will scan a
directory and create a database with the checksum of the files and
their modification date.
**Initialization usage:**

    % cd /home/my_data/
    % bitrot
    Finished. 199.41 MiB of data read. 0 errors found.
    189 entries in the database, 189 new, 0 updated, 0 renamed, 0 missing.
    Updating bitrot.sha512... done.
    % echo $?
    0
**Verify usage (case OK):**

    % cd /home/my_data/
    % bitrot
    Checking bitrot.db integrity... ok.
    Finished. 199.41 MiB of data read. 0 errors found.
    189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
    % echo $?
    0

Exit status is 0, so our data are not damaged.
**Verify usage (case Error):**

    % cd /home/my_data/
    % bitrot
    Checking bitrot.db integrity... ok.
    error: SHA1 mismatch for ./sometextfile.txt: expected
    17b4d7bf382057dc3344ea230a595064b579396f, got
    db4a8d7e27bb9ad02982c0686cab327b146ba80d. Last good hash checked on
    2017-03-16 21:04:39.
    Finished. 199.41 MiB of data read. 1 errors found.
    189 entries in the database, 0 new, 0 updated, 0 renamed, 0 missing.
    error: There were 1 errors found.
    % echo $?
    1
The exit status is not 0 when a check fails, so it's easy to write a
script running it every day/week/month.
[Github page](https://github.com/ambv/bitrot/)
bitrot is available in OpenBSD ports in sysutils/bitrot since the 6.1
release.
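Since a non-zero exit status means corruption, automating the check is a one-liner. Here is a hypothetical crontab(5) entry (the path and schedule are assumptions), relying on cron mailing the crontab owner whenever a job produces output:

```shell
# Run the check nightly at 03:00; stay silent on success, print (and
# therefore get mailed by cron) on failure.
0 3 * * * cd /home/my_data && bitrot >/dev/null || echo "bitrot check failed in /home/my_data"
```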
## par2cmdline ##
This tool works with PAR2 archives (see the links below for more
information about PAR) and, from them, it is able to check your data
integrity **AND** repair it.
While it has some pros, like being able to repair data, the con is
that it's not very easy to use. I would use it for checking the
integrity of long-term archives that won't change. The main drawback
comes from the PAR specifications: the archives are created from a
file list, so if you have a directory with your files and you add new
files, you will need to recompute ALL the PAR archives because the
file list changed, or create new PAR archives only for the new files,
which makes the verify process more complicated. It doesn't seem
suitable to create new archives for every bunch of files added to the
directory.
PAR2 lets you choose what percentage of a file you will be able to
repair; by default it will create the archives to be able to repair up
to 5% of each file. That means you don't need a whole backup of the
files (though relying only on this would be a bad idea), only an
extra of approximately 5% of your data to store.
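The redundancy level is chosen at creation time with par2cmdline's `-r` option; a quick sketch (the 10% figure, archive name and sample file are arbitrary, and it skips quietly where par2 isn't installed):

```shell
#!/bin/sh
# Create PAR2 archives able to repair up to 10% damage per file (-r10).
command -v par2 >/dev/null 2>&1 || exit 0   # requires archivers/par2cmdline

mkdir -p my_data && echo "some content" > my_data/file.txt
par2 create -r10 -a integrity_archive -R my_data
```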
**Create usage:**

    % cd /home/
    % par2 create -a integrity_archive -R my_data
    Skipping 0 byte file: /home/my_data/empty_file
    Source file count: 17
    Source block count: 2000
    Redundancy: 5%
    Recovery block count: 100
    Recovery file count: 7
    [text cut here]
    Opening: my_data/[....]
    Computing Reed Solomon matrix.
    Constructing: done.
    Wrote 381200 bytes to disk
    Writing recovery packets
    Writing verification packets
    Done
    % echo $?
    0
    % ls
    integrity_archive.par2
    integrity_archive.vol000+01.par2
    integrity_archive.vol001+02.par2
    integrity_archive.vol003+04.par2
    integrity_archive.vol007+08.par2
    integrity_archive.vol015+16.par2
    integrity_archive.vol031+32.par2
    integrity_archive.vol063+37.par2
    my_data
**Verify usage (OK):**

    % par2 verify integrity_archive.par2
    Loading "integrity_archive.par2".
    Loaded 36 new packets
    Loading "integrity_archive.vol000+01.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "integrity_archive.vol001+02.par2".
    Loaded 2 new packets including 2 recovery blocks
    Loading "integrity_archive.vol003+04.par2".
    Loaded 4 new packets including 4 recovery blocks
    Loading "integrity_archive.vol007+08.par2".
    Loaded 8 new packets including 8 recovery blocks
    Loading "integrity_archive.vol015+16.par2".
    Loaded 16 new packets including 16 recovery blocks
    Loading "integrity_archive.vol031+32.par2".
    Loaded 32 new packets including 32 recovery blocks
    Loading "integrity_archive.vol063+37.par2".
    Loaded 37 new packets including 37 recovery blocks
    Loading "integrity_archive.par2".
    No new packets found
    The block size used was 3812 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 7595275 bytes.
    [...cut here...]
    Target: "my_data/....." - found.
    % echo $?
    0
**Verify usage (with error):**

    % par2 verify integrity_archive.par.par2
    Loading "integrity_archive.par.par2".
    Loaded 36 new packets
    Loading "integrity_archive.par.vol000+01.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "integrity_archive.par.vol001+02.par2".
    Loaded 2 new packets including 2 recovery blocks
    Loading "integrity_archive.par.vol003+04.par2".
    Loaded 4 new packets including 4 recovery blocks
    Loading "integrity_archive.par.vol007+08.par2".
    Loaded 8 new packets including 8 recovery blocks
    Loading "integrity_archive.par.vol015+16.par2".
    Loaded 16 new packets including 16 recovery blocks
    Loading "integrity_archive.par.vol031+32.par2".
    Loaded 32 new packets including 32 recovery blocks
    Loading "integrity_archive.par.vol063+37.par2".
    Loaded 37 new packets including 37 recovery blocks
    Loading "integrity_archive.par.par2".
    No new packets found
    The block size used was 3812 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 7595275 bytes.
    [...cut here...]
    Target: "my_data/....." - found.
    Target: "my_data/Ebooks/Lovecraft/Quete Onirique de Kadath
    l'Inconnue.epub" - damaged. Found 95 of 95 data blocks.
    1 file(s) exist but are damaged.
    16 file(s) are ok.
    You have 2000 out of 2000 data blocks available.
    You have 100 recovery blocks available.
    Repair is possible.
    You have an excess of 100 recovery blocks.
    None of the recovery blocks will be used for the repair.
    % echo $?
    1
**Repair usage:**

    % par2 repair integrity_archive.par.par2
    Loading "integrity_archive.par.par2".
    Loaded 36 new packets
    Loading "integrity_archive.par.vol000+01.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "integrity_archive.par.vol001+02.par2".
    Loaded 2 new packets including 2 recovery blocks
    Loading "integrity_archive.par.vol003+04.par2".
    Loaded 4 new packets including 4 recovery blocks
    Loading "integrity_archive.par.vol007+08.par2".
    Loaded 8 new packets including 8 recovery blocks
    Loading "integrity_archive.par.vol015+16.par2".
    Loaded 16 new packets including 16 recovery blocks
    Loading "integrity_archive.par.vol031+32.par2".
    Loaded 32 new packets including 32 recovery blocks
    Loading "integrity_archive.par.vol063+37.par2".
    Loaded 37 new packets including 37 recovery blocks
    Loading "integrity_archive.par.par2".
    No new packets found
    The block size used was 3812 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 7595275 bytes.
    [...cut here...]
    Target: "my_data/....." - found.
    Target: "my_data/Ebooks/Lovecraft/Quete Onirique de Kadath
    l'Inconnue.epub" - damaged. Found 95 of 95 data blocks.
    1 file(s) exist but are damaged.
    16 file(s) are ok.
    You have 2000 out of 2000 data blocks available.
    You have 100 recovery blocks available.
    Repair is possible.
    You have an excess of 100 recovery blocks.
    None of the recovery blocks will be used for the repair.
    [...cut here...]
    Target: "my_data/Ebooks/Lovecraft/Quete Onirique de Kadath
    l'Inconnue.epub" - found.
    % echo $?
    0
Multiple tools working with PAR archives exist; they should all be
able to work with the same PAR files.
[Parchive on Wikipedia](https://en.wikipedia.org/wiki/Parchive)
[Github page](https://github.com/Parchive/par2cmdline)
par2cmdline is available in OpenBSD ports in archivers/par2cmdline.
If you find a way to add new files to existing archives, please mail
me.
## mtree ##
One can write a little script using **mtree** (in the base system on
OpenBSD and FreeBSD) which will create a file with the checksum of
every file in the specified directories. If the mtree output differs
from last time, we can send a mail with the difference. This process
is done in the base install of OpenBSD for /etc and some other files,
to warn you if they changed.
While it's suited for directories like /etc, in my opinion this is
not the best tool for doing integrity checks.
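A minimal sketch of that approach, using a local test directory (this assumes a BSD-style mtree; the `sha256digest` keyword is the FreeBSD name, and keyword sets differ slightly between systems):

```shell
#!/bin/sh
# Snapshot checksums with mtree, then re-run against the spec later:
# printed differences (or a non-zero status) mean something changed.
command -v mtree >/dev/null 2>&1 || exit 0   # mtree ships in BSD base

mkdir -p my_data && echo "content" > my_data/file.txt
mtree -c -K sha256digest -p my_data > my_data.spec   # record checksums
mtree -p my_data < my_data.spec                      # verify against the spec
```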
## ZFS ##
I would like to talk about ZFS and data integrity because this is
where ZFS is very good. If you are using ZFS, you may not need any
other software to take care of your data. When you write a file, ZFS
also stores its checksum as metadata. By default, the "checksum"
option is activated on a dataset, but you may want to disable it for
better performance.
There is a command to ask ZFS to check the integrity of the files.
Warning: scrub is very I/O intensive and can take from hours to days
or even weeks to complete, depending on your CPU, disks and the
amount of data to scrub:

    # zpool scrub zpool

The scrub command will recompute the checksum of everything on the ZFS
pool; if something is wrong, it will try to repair it if possible. A
repair is possible in the following cases:
If you have multiple disks, like raidz or a mirror (raid-1), ZFS will
look on the other disks for a non-corrupted version of the file; if it
finds one, it will restore it on the disk(s) where it's corrupted.
If you have set the ZFS option "copies" to 2 or 3 (1 = default), each
file of the dataset will be written 2 or 3 times on the disk, so take
care if you want to use it on a dataset containing heavy files! If
ZFS finds that one version of a file is corrupted, it will check the
other copies of it and try to restore the corrupted file if possible.
You can see the percentage of the pool already scrubbed with

    # zpool status zpool

and the scrub can be stopped with

    # zpool scrub -s zpool
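For reference, the two properties mentioned above can be inspected and set like this (a sketch only; the dataset name `tank/data` is an assumption, and this requires a ZFS-capable system):

```shell
# Hypothetical dataset name tank/data.
command -v zfs >/dev/null 2>&1 || exit 0   # skip quietly without ZFS

zfs get checksum tank/data   # the checksum property is "on" by default
zfs set copies=2 tank/data   # store two copies of every block in the dataset
```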
## BTRFS ##
Like ZFS, BTRFS is able to scrub its data, report bit rot, and repair
it if the data is available on another disk.
To start a scrub, run:

    # btrfs scrub start /

You can check progress using:

    # btrfs scrub status /

It's possible to use `btrfs scrub cancel /` to stop a scrub and resume
it later with `btrfs scrub resume /`; however, btrfs tries its best to
scrub the data without affecting the responsiveness of the system too
much.
## AIDE ##
Its name is an acronym for "Advanced Intrusion Detection Environment";
it's a complicated piece of software which can be used to check for
bit rot. I would not recommend using it if you only need bit rot
detection.
Here are a few hints if you want to use it for checking your file
integrity:
**/etc/aide.conf**

    /home/my_data/ R
    # Rule definition
    All=m+s+i+md5
    report_summarize_changes=yes

(R is for recursive.) The "All" line lists the checks we do on each
file. For bit rot checking, we want to check the modification time,
size, checksum and inode of the files. The `report_summarize_changes`
option displays a list of changes if something is wrong.
This is the most basic config file you can have. Then you will have
to run **aide** to create the database, and later run aide again to
create a new database and compare the two. It doesn't update its
database itself; you will have to move the old database and tell it
where to find the older database.
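The cycle looks roughly like this (a sketch only; the database file names and locations come from `database_in`/`database_out` in aide.conf and the `/var/lib/aide` path below is an assumption):

```shell
#!/bin/sh
# Typical AIDE cycle: initialize, promote the new database, check later.
command -v aide >/dev/null 2>&1 || exit 0   # skip quietly without AIDE

aide --init                                          # writes the new database (database_out)
mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db   # promote it (path is an assumption)
aide --check                                         # compare the filesystem against database_in
```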
# My use case #
I have different kinds of data. On one side, I have static data like
pictures, clips and music, things that won't change over time; on the
other side I have my mails, documents and folders where the content
changes regularly (creation, deletion, modification). I can afford a
backup of 100% of my data with a history over a few days, so I am not
interested in file repair.
I want to be warned quickly if a file gets corrupted, so I can still
find it in my backup history, because I don't keep every version of my
files for long. I chose to go with the Python tool **bitrot**; it's
very easy to use and it doesn't become a mess with my folders getting
updated often.
I would go with par2cmdline if I were not able to back up all my data.
Having 5% or 10% redundancy for my files *should* be enough to restore
them in case of corruption without taking too much space.