| Title: Introduction to git-annex (Port Of The Week) | |
| Author: Solène | |
| Date: 12 May 2021 | |
| Tags: git versioning openbsd | |
| Description: | |
| # Introduction | |
| Now that git-annex is available as a package on OpenBSD I can use it | |
| again. I relied on it a few years ago, but compiling it was really | |
| complicated for me and I gave up. Since I really missed it, I'm now | |
| back to it, and I think it's time to write about this wonderful piece | |
| of software. | |
| git-annex is meant to help you manage your data like you would manage | |
| books in a library: you have a database telling you where the books | |
| are, so you can find them on the shelves, or at least know who | |
| borrowed them. We are working with digital files that can be copied, | |
| so the analogy doesn't fully hold, but you may want to put some of | |
| your data (not all of it) on an external hard drive, and you may want | |
| to keep some data on multiple devices for safety reasons. git-annex | |
| automates this. | |
| It works very well for files that do not change much. I call them | |
| "static files": music, videos, pictures, documents. You don't really | |
| want to use git-annex with files you edit every day, because every | |
| change requires unlocking the file, adding it again and committing, | |
| which gets tedious. | |
| git-annex may not be easy to understand at first; I suggest you try | |
| it locally to grasp its purpose. | |
| git-annex official website | |
| what git-annex is not | |
| # Cheat sheet | |
| Let's create a cheat sheet first. Most git-annex commands have a | |
| dedicated man page, and a shorter help is available with "git annex | |
| help somecommand". | |
| ## Create the repository | |
| The first step is to create a regular git repository, then tell | |
| git-annex to initialize it too. | |
| ```command line example | |
| mkdir ~/MyDataLibrary && cd ~/MyDataLibrary | |
| git init | |
| git annex init "my-computer" | |
| ``` | |
| ## Add a file | |
| When you want to register a file in git-annex, you need to use "git | |
| annex add" to add it and then "git commit" to make it permanent. The | |
| file content is not stored in the git history; git only tracks | |
| metadata (a symbolic link pointing to the content under | |
| .git/annex/objects). | |
| ```command line example | |
| git annex add Something | |
| git commit -m "I added something" | |
| ``` | |
| Example: | |
| ```command line example | |
| $ echo "hello there" > hello | |
| $ ls -l hello | |
| -rw-r--r-- 1 solene wheel 12 May 12 18:38 hello | |
| $ git annex add hello | |
| add hello | |
| ok | |
| (recording state in git...) | |
| $ ls -l hello | |
| lrwxr-xr-x 1 solene wheel 180 May 12 18:38 hello -> .git/annex/objects/qj/g5… | |
| $ git status hello | |
| On branch master | |
| Changes to be committed: | |
| (use "git restore --staged <file>..." to unstage) | |
| new file: hello | |
| ``` | |
| ## Make changes to a file | |
| If you want to make changes to a file, you first need to "unlock" it in | |
| git-annex, which means the symbolic link is replaced by the file | |
| itself, which is no longer read-only. Then, after your changes, you | |
| need to add it again to git-annex and commit your changes. | |
| ```command line example | |
| git annex unlock file | |
| vi file | |
| git annex add file | |
| git commit -m "I changed something" file | |
| ``` | |
| ## Add a remote encrypted repository | |
| If you want to store data (for duplication) on a remote server using | |
| ssh, you can use a remote of type "rsync" and encrypt the data in | |
| several ways (GPG in hybrid mode is the best). This allows storing | |
| data on untrusted remote devices. | |
| ```command line example | |
| git annex initremote my-remote-server type=rsync rsyncurl=remote-server.com:/ho… | |
| ``` | |
| After this command, I can send files to my-remote-server. | |
| git-annex website about encryption | |
| git-annex website about special remotes | |
| ## Manage data from multiple computers (with ssh) | |
| **This is a way to have a central git repository for many computers; | |
| it is not the best way to store data on remote servers**. | |
| If you want to use a remote server through ssh, there are two ways: | |
| mounting the remote file system with sshfs, or using plain ssh. With | |
| sshfs, the remote behaves like a standard local file system, such as | |
| an external USB drive, but going through plain ssh is different. | |
| You need key-based authentication for the remote ssh server, and | |
| git-annex must also be installed on that server. It's important to | |
| use a bare git repository there. | |
| ```command line example | |
| cd /home/data/ | |
| git init --bare | |
| git annex init "remote-server" | |
| ``` | |
| On your computer: | |
| ```command line example | |
| git remote add remote-server ssh://hostname:/home/data/ | |
| git fetch remote-server | |
| ``` | |
| You can now use the commands that involve a remote repository (copy, | |
| move, get...)! | |
| ## List files and where they are stored | |
| You can use the "git annex list" command to list where your files are | |
| physically stored. | |
| In the following example you can see which files are on my computer | |
| ("here") and which are available on my remote server called | |
| "network"; "web" and "bittorrent" are special remotes. | |
| ```command line example | |
| here | |
| |network | |
| ||web | |
| |||bittorrent | |
| |||| | |
| X___ Documentation/Nim/Dominik Picheta - Nim in Action-Manning Publications (20… | |
| X___ Documentation/ada/Ada-Distilled-24-January-2011-Ada-2005-Version.pdf | |
| X___ Documentation/ada/courseada1.pdf | |
| X___ Documentation/ada/courseada2.pdf | |
| X___ Documentation/ada/courseada3.pdf | |
| X___ Documentation/scheme/artanis.pdf | |
| X___ Documentation/scheme/guix.pdf | |
| X___ Documentation/scheme/manual_guix.pdf | |
| X___ Documentation/skribilo/skribilo.pdf | |
| X___ Documentation/uck2ep1.pdf | |
| X___ Documentation/uck2ep2.pdf | |
| X___ Documentation/usingckermit3e.pdf | |
| XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/01 - Daftendirekt.flac | |
| XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/02 - Wdpk 83.7 fm.flac | |
| XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/03 - Revolution 909.flac | |
| XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/04 - Da Funk.flac | |
| XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/05 - Phoenix.flac | |
| _X__ Musique/Alan Walker/Alan Walker - Different World/01 - Alan Walker - Intro… | |
| _X__ Musique/Alan Walker/Alan Walker - Different World/02 - Alan Walker, Sorana… | |
| _X__ Musique/Alan Walker/Alan Walker - Different World/03 - Alan Walker, Julie … | |
| ``` | |
| ## List files locally available | |
| If you want to list the files for which you have the content available | |
| locally, you can use the git-annex "list" command and restrict it to | |
| "here", which represents your local repository. | |
| ```command line example | |
| git annex list --in here | |
| ``` | |
| # Work with a remote repository | |
| ## Delete a repository | |
| Simply mark it as "dead". | |
| ```command line example | |
| git annex dead $repo_name | |
| ``` | |
| ## Adding a remote repository GPG encrypted | |
| ```command line example | |
| git annex initremote $name type=rsync rsyncurl=remote-server:/home/solene/mydir… | |
| ``` | |
| ## Copy files to a remote | |
| If you want to duplicate files between repositories to have multiple | |
| copies, you can use "git annex copy". | |
| ```command line example | |
| git annex copy Music -t remote-server | |
| ``` | |
| ## Move files to a remote | |
| If you want to move files from one repository to another, you can use | |
| "git annex move", which copies the content to the destination and | |
| then removes it from the origin. | |
| ```command line example | |
| git annex move Music -t remote-server | |
| ``` | |
| ## Get a file content | |
| If the content of a file is not available locally, you can fetch it | |
| from a remote. | |
| ```command line example | |
| git annex get Music/Queen | |
| ``` | |
| ## Forget a file locally | |
| If you don't want to have the file locally because you don't have disk | |
| space or you simply don't want it, you can use the "drop" command. | |
| Note that "drop" is safe because git-annex won't allow you to drop | |
| files that have only one copy (except if you use --force of course). | |
| ```command line example | |
| git annex drop Music/Queen | |
| ``` | |
| Real life example: I have a huge music library but my laptop SSD is | |
| too small, so I get the music I want and drop the files I don't want | |
| to listen to for a while. | |
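To try this copy / drop / get cycle without a real server, here is a minimal sketch using two throwaway local repositories. All the names (laptop, usbdrive, song.txt) are made up for the example, a local directory stands in for the remote server, and the script does nothing if git-annex is not installed.

```shell
# Throwaway sandbox: every name below is hypothetical.
sandbox=$(mktemp -d)

if command -v git-annex >/dev/null 2>&1; then
    # Two plain git repositories, both initialized for git-annex.
    git init -q "$sandbox/laptop"
    git init -q "$sandbox/usbdrive"
    (cd "$sandbox/usbdrive" && git annex init "usbdrive")

    cd "$sandbox/laptop"
    git annex init "laptop"

    # Register one small file, as in the "Add a file" section.
    echo "la la la" > song.txt
    git annex add song.txt
    git -c user.name="Demo" -c user.email="demo@example.org" \
        commit -q -m "I added a song"

    # Declare the second repository as a remote and duplicate the file.
    git remote add usbdrive "$sandbox/usbdrive"
    git annex copy song.txt -t usbdrive

    # Show which repositories hold the content.
    git annex whereis song.txt

    # Dropping is accepted here because usbdrive still holds a copy.
    git annex drop song.txt

    # Bring the content back from usbdrive.
    git annex get song.txt
fi
```

Between the drop and the get, song.txt is still listed in the directory, but it is a symbolic link to content that is no longer present locally. The sandbox directory can simply be deleted afterwards.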
| ## Use mincopies to enforce multi repository data duplication | |
| The numcopies and mincopies settings tell git-annex how many copies | |
| ("n") of each file you want to keep, so it can protect you from | |
| accidental deletions and also help you upload files to other | |
| repositories to meet the requirement. | |
| ### Enable per directory recursively | |
| ```command line example | |
| echo "* annex.mincopies=2" > .gitattributes | |
| ``` | |
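The attribute file does not have to sit at the top of the repository. As a sketch, assuming two hypothetical sub directories named documents and pictures, each one can get its own .gitattributes file, which git applies to that directory and everything below it:

```shell
# Hypothetical directory names; run this from the top of the repository.
for dir in documents pictures; do
    mkdir -p "$dir"
    # Applies to every file below $dir, sub directories included.
    # Note: ">" overwrites an existing .gitattributes, use ">>" to append.
    echo "* annex.mincopies=2" > "$dir/.gitattributes"
done
```

Directories without such a file keep the repository default. The .gitattributes files are regular files and can be committed with git so that other clones get the same rules.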
| ### Only upload files not matching the num copies | |
| If you have multiple repositories and some files don't meet the | |
| copies requirement, you can use the following command to upload only | |
| the files that are missing copies. | |
| ```command line example | |
| git annex copy --auto -t remote-server | |
| ``` | |
| Real life example: I want my salary PDFs to be really safe. I can | |
| require 2 copies of them and then run the command above against the | |
| remote server, which will upload only the files that still have a | |
| single copy. | |
| ## Verifying integrity and requirements | |
| The "git annex fsck" command checks the integrity of every file in | |
| the local repository and reports whether they are sane (or not). It | |
| also tells you which files don't meet the mincopies requirement. | |
| ```command line example | |
| git annex fsck | |
| ``` | |
| # Reversibility | |
| If for some reason you want to give up git-annex, you can easily get | |
| all your files back as in a normal file system by running "git annex | |
| unlock ." in the top directory of your repository: every local file | |
| will be replaced by its physical copy instead of the symlink. | |
| Reversibility is very important when you deal with your data because it | |
| means you are not stuck forever with a tool in case it's broken or if | |
| you want to switch to another process. | |
| # My workflow | |
| I have a ~/DATA/ directory in which I have sub directories | |
| {documents,documentation,pictures,videos,music,images}. documents | |
| holds papers and legal papers, documentation is mostly PDFs, pictures | |
| holds family pictures, and images holds wallpapers or stupid images I | |
| want to keep. | |
| I've set mincopies to 2 for documents and pictures. My music is not | |
| on my computer but on a remote: I get the music files I want to | |
| listen to when I'm on the same local network as the computer holding | |
| the files, and I drop them locally when I'm bored of them. | |
| # Conclusion | |
| git-annex separates content from indexing. It can be used in many | |
| ways, but it implies an archivist philosophy: redundancy, safety, | |
| immutability (sort of). It is not meant for backup: you can back up | |
| a directory managed by git-annex, but that only saves the data you | |
| have locally, so you will still have to back up your other data as | |
| well. | |
| I love that tool, it's a very nice piece of software. It's unique: I | |
| haven't found any other program that achieves this. | |
| ## More resources | |
| git-annex official walkthrough | |
| git-annex special remotes (S3, webdav, bittorrent, etc.) | |
| git-annex encryption |