Title: Introduction to git-annex (Port Of The Week) | |
Author: Solène | |
Date: 12 May 2021 | |
Tags: git versioning openbsd | |
Description: | |
# Introduction | |
Now that git-annex is available as a package on OpenBSD, I can use it
again. I relied on it a few years ago, but compiling it was really
complicated for me and I gave up. Since I really missed it, I'm now
back to it and I think it's time to write about this wonderful piece of
software.
git-annex is meant to help you manage your data like you would manage
books in a library: you have a database telling you where the books
are, so you can find them on the shelves or at least know who borrowed
them. We are working with digital files that can be copied, so the
analogy doesn't fully work, but you may want to put some of your data
(not everything) on an external hard drive, and you may want to keep
some data on multiple devices for safety reasons. git-annex automates
this.
It works very well for files that don't change much. I call them
"static files": music, videos, pictures, documents. You don't really
want to use git-annex for files you edit every day, because the process
can be a bit tedious.
git-annex may not be easy to understand at first; I suggest you try it
locally to grasp its purpose.
git-annex official website | |
what git-annex is not | |
# Cheat sheet | |
Let's create a cheat sheet first. Most git-annex commands have a
dedicated man page, and you can also get a shorter help text with "git
annex help somecommand".
## Create the repository | |
The first step is to create a repository which is based on git, then we | |
will tell git-annex to init it too. | |
```command line example | |
mkdir ~/MyDataLibrary && cd ~/MyDataLibrary | |
git init | |
git annex init "my-computer" | |
``` | |
## Add a file | |
When you want to register a file in git-annex, you need to use "git
annex add" to add it and then "git commit" to make it permanent. The
file content is not stored in git itself: git only contains metadata,
while the content lives under .git/annex/objects.
```command line example | |
git annex add Something | |
git commit -m "I added something" | |
``` | |
Example: | |
```command line example | |
$ echo "hello there" > hello | |
$ ls -l hello | |
-rw-r--r-- 1 solene wheel 12 May 12 18:38 hello | |
$ git annex add hello | |
add hello | |
ok | |
(recording state in git...) | |
$ ls -l hello | |
lrwxr-xr-x 1 solene wheel 180 May 12 18:38 hello -> .git/annex/objects/qj/g5… | |
$ git status hello | |
On branch master | |
Changes to be committed: | |
(use "git restore --staged <file>..." to unstage) | |
new file: hello | |
``` | |
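To see why "ls -l" shows a symbolic link after the add, here is a toy
model of the idea using only standard shell tools. It is an
illustration under simplifying assumptions, not git-annex's real
implementation: the ".toy-annex" directory, the "toy_add" function and
the key format are all invented for this example, and it only handles
files in the current directory.

```shell
# Toy model only: ".toy-annex", "toy_add" and the key format are made up.
toy_add() {
    file=$1
    # Name the content after its checksum and size (cksum prints "CRC SIZE").
    set -- $(cksum < "$file")
    key="CKSUM-s$2--$1"
    mkdir -p .toy-annex/objects
    # Move the content into the object store and make it read-only...
    mv "$file" ".toy-annex/objects/$key"
    chmod a-w ".toy-annex/objects/$key"
    # ...then leave a symbolic link behind, which is what git would track.
    ln -s ".toy-annex/objects/$key" "$file"
}

# Try it in a scratch directory.
cd "$(mktemp -d)"
echo "hello there" > hello
toy_add hello
ls -l hello   # a symlink into .toy-annex/objects/
cat hello     # the content is still readable through the link
```

In a real repository, "git annex lookupkey hello" prints the key
git-annex actually assigned to the file.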
## Make changes to a file | |
If you want to make changes to a file, you first need to "unlock" it in
git-annex, which means the symbolic link is replaced by the file
itself, which is no longer read-only. Then, after your changes, you
need to add it again to git-annex and commit your changes.
```command line example | |
git annex unlock file | |
vi file | |
git annex add file | |
git commit -m "I changed something" file | |
``` | |
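Since these four steps always go together, they can be wrapped in a
small shell function. This is only a sketch: "annex_edit" is a
hypothetical helper name, not a git-annex command, and it assumes your
editor is set in $EDITOR (falling back to vi).

```shell
# Hypothetical helper, not part of git-annex: unlock, edit, re-add and
# commit a single annexed file.
annex_edit() {
    file=$1
    message=${2:-"Edit $file"}
    git annex unlock "$file" || return 1
    # Left unquoted on purpose so that EDITOR="emacs -nw" still works.
    ${EDITOR:-vi} "$file"
    git annex add "$file" &&
        git commit -m "$message" "$file"
}
```

Usage would look like: annex_edit file "I changed something"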
## Add a remote encrypted repository | |
If you want to store data (for duplication) on a remote server using
ssh, you can use a remote of type "rsync" and encrypt the data in
several ways (GPG in hybrid mode is the best). This allows you to store
data on untrusted remote devices.
```command line example | |
git annex initremote my-remote-server type=rsync rsyncurl=remote-server.com:/ho… | |
``` | |
After this command, I can send files to my-remote-server. | |
git-annex website about encryption | |
git-annex website about special remotes | |
## Manage data from multiple computers (with ssh) | |
**This is a way to have a central git repository for many computers;
it is not the best way to store data on remote servers**.
If you want to use a remote server through ssh, there are two ways:
mounting the remote file system with sshfs, or using plain ssh. With
sshfs, the remote behaves like a standard local file system, such as an
external USB drive, but over plain ssh it's different.
You need key-based authentication for the remote ssh and you also need
git-annex installed on the remote server. It's important to use a bare
git repo there. On the remote server:
```command line example | |
cd /home/data/ | |
git init --bare | |
git annex init "remote-server" | |
``` | |
On your computer: | |
```command line example | |
git remote add remote-server ssh://hostname:/home/data/ | |
git fetch remote-server | |
``` | |
You can now use the commands that work with remote repositories!
## List files and where they are stored | |
You can use the "git annex list" command to list where your files are
physically stored.
In the following example you can see which files are on my computer
("here") and which are available on my remote server called "network".
"web" and "bittorrent" are special remotes.
```command line example | |
here | |
|network | |
||web | |
|||bittorrent | |
|||| | |
X___ Documentation/Nim/Dominik Picheta - Nim in Action-Manning Publications (20… | |
X___ Documentation/ada/Ada-Distilled-24-January-2011-Ada-2005-Version.pdf | |
X___ Documentation/ada/courseada1.pdf | |
X___ Documentation/ada/courseada2.pdf | |
X___ Documentation/ada/courseada3.pdf | |
X___ Documentation/scheme/artanis.pdf | |
X___ Documentation/scheme/guix.pdf | |
X___ Documentation/scheme/manual_guix.pdf | |
X___ Documentation/skribilo/skribilo.pdf | |
X___ Documentation/uck2ep1.pdf | |
X___ Documentation/uck2ep2.pdf | |
X___ Documentation/usingckermit3e.pdf | |
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/01 - Daftendirekt.flac | |
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/02 - Wdpk 83.7 fm.flac | |
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/03 - Revolution 909.flac | |
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/04 - Da Funk.flac | |
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/05 - Phoenix.flac | |
_X__ Musique/Alan Walker/Alan Walker - Different World/01 - Alan Walker - Intro… | |
_X__ Musique/Alan Walker/Alan Walker - Different World/02 - Alan Walker, Sorana… | |
_X__ Musique/Alan Walker/Alan Walker - Different World/03 - Alan Walker, Julie … | |
``` | |
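With a large library this listing gets long. As a rough sketch, the
output can be piped through awk to count how many files each repository
holds. "annex_list_summary" is a hypothetical helper name, and the
parsing assumes the layout shown above: one header line per repository,
then one line per file starting with a column of "X" and "_" flags.

```shell
# Hypothetical helper: summarize "git annex list" output read from stdin.
annex_list_summary() {
    awk '
        /^[X_]+ / {
            # File line: the first field holds one flag per repository.
            for (i = 1; i <= length($1); i++)
                if (substr($1, i, 1) == "X") count[i]++
            next
        }
        /^[|]*[^|]/ {
            # Header line: leading pipes, then the repository name.
            name = $0
            sub(/^[|]+/, "", name)
            names[++n] = name
        }
        END {
            for (i = 1; i <= n; i++) printf "%s: %d\n", names[i], count[i]
        }
    '
}
```

Usage would look like: git annex list | annex_list_summary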
## List files locally available | |
If you want to list the files for which you have the content available
locally, you can use the "list" command from git-annex and restrict it
to "here", which represents your local repository.
```command line example | |
git annex list --in here | |
``` | |
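The opposite question, which files are known but not present locally,
can be asked with "git annex find" and the same kind of matching
options. A minimal sketch, where "annex_missing" is just a name made up
for this example:

```shell
# Hypothetical helper: list annexed files whose content is NOT in this
# repository. Extra arguments (paths) are passed to "git annex find".
annex_missing() {
    git annex find --not --in here "$@"
}
```

Usage would look like: annex_missing Music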
# Work with a remote repository | |
## Delete a repository | |
Simply mark it as "dead". | |
```command line example | |
git annex dead $repo_name | |
``` | |
## Add a GPG-encrypted remote repository
```command line example | |
git annex initremote $name type=rsync rsyncurl=remote-server:/home/solene/mydir… | |
``` | |
## Copy files to a remote | |
If you want to duplicate files between repositories to have multiple
copies, you can use "git annex copy".
```command line example | |
git annex copy Music -t remote-server | |
``` | |
## Move files to a remote | |
If you want to move files from one repository to another (removing the
content from the origin), you can use "git annex move", which copies to
the destination and then removes the content from the origin.
```command line example | |
git annex move Music -t remote-server | |
``` | |
## Get a file content | |
If you don't have a file locally, you can fetch it from a remote to get | |
the content. | |
```command line example | |
git annex get Music/Queen | |
``` | |
## Forget a file locally | |
If you don't want to have the file locally because you don't have disk | |
space or you simply don't want it, you can use the "drop" command. | |
Note that "drop" is safe because git-annex won't allow you to drop | |
files that have only one copy (except if you use --force of course). | |
```command line example | |
git annex drop Music/Queen | |
``` | |
Real life example: I have a huge music library but my laptop SSD is too
small, so I get the music I want and drop the files I don't want to
listen to for a while.
## Use mincopies to enforce multi repository data duplication | |
The numcopies and mincopies settings tell git-annex how many copies of
each file you want to keep: numcopies is the number of copies it tries
to preserve, and mincopies is a lower bound it refuses to go below.
With them, git-annex can protect you from accidental deletions and can
also help upload files to other repositories to meet the requirements.
### Enable per directory recursively | |
```command line example | |
echo "* annex.mincopies=2" > .gitattributes | |
``` | |
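Because a .gitattributes file applies to the directory it lives in and
everything below it, the same line can be placed in selected sub
directories instead of the repository root. The sketch below assumes
two directories named "documents" and "pictures" (adapt the names to
your own layout) and appends with ">>" so that an existing
.gitattributes is not overwritten.

```shell
# Assumed layout: "documents" and "pictures" at the top of the repository.
for dir in documents pictures; do
    mkdir -p "$dir"
    # Append the rule unless this exact line is already there.
    grep -qxF '* annex.mincopies=2' "$dir/.gitattributes" 2>/dev/null ||
        echo '* annex.mincopies=2' >> "$dir/.gitattributes"
done
```

To check which value applies to a given path, "git check-attr
annex.mincopies -- documents/somefile" can be used. Remember to add and
commit the .gitattributes files.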
### Only upload files not matching the number of copies
If you have multiple repositories and some files don't meet the copies
requirement, you can use the following command to push only the files
that are missing copies.
```command line example | |
git annex copy --auto -t remote-server | |
``` | |
Real life example: I want my salary PDFs to be really safe, so I can
ask for 2 copies of them and then run the copy to the remote server,
which uploads them only if there is still a single copy of the file.
## Verifying integrity and requirements | |
The "git annex fsck" command checks the integrity of every file in the
local repository and reports whether they are sane (or not). It also
tells you which files don't meet the mincopies requirement.
```command line example | |
git annex fsck | |
``` | |
# Reversibility | |
If for some reason you want to give up git-annex, you can easily get
all your files back as in a normal file system by running "git annex
unlock ." in the top directory of your repository: every local file
will be replaced by its physical copy instead of the symlink.
Reversibility is very important when you deal with your data because it | |
means you are not stuck forever with a tool in case it's broken or if | |
you want to switch to another process. | |
# My workflow | |
I have a ~/DATA/ directory with the sub directories
{documents,documentation,pictures,videos,music,images}. Documents are
papers or legal papers and documentation is mostly PDFs. Pictures are
family pictures, and images are wallpapers or stupid images I want to
keep.
I've set mincopies to 2 for documents and pictures. My music is not on
my computer but on a remote: I get the music files I want to listen to
when I'm on the same local network as the computer holding the files,
and I drop them locally when I'm bored of them.
# Conclusion | |
git-annex separates content from indexing. It can be used in many
ways, but it implies an archivist philosophy: redundancy, safety,
immutability (sort of). It is not meant for backup: you can back up a
directory managed by git-annex, but that only saves the data you have
locally, so you still have to back up your other data as well.
I love this tool, it's a very nice piece of software. It's unique: I
haven't found any other program that achieves this.
## More resources | |
git-annex official walkthrough | |
git-annex special remotes (S3, webdav, bittorrent, etc.)
git-annex encryption |