Title: Synchronization files software
Author: Solène
Date: 04 May 2021
Tags: unix
Description:
# Introduction
In this article I will introduce you to various open source file
synchronization programs and their respective workflows. I obviously
don't know all of them.
I can't give a full explanation of each one, but I will tell you
enough to decide whether it could be of interest to you.
# Software
There are many programs out there, each with pros and cons, to match
our file synchronization requirements.
## rsync
rsync is the leader for simple file replication: it can make sure that
the destination exactly matches the source data. It's available
almost everywhere, and when it uses ssh as a transport it's also secure.
rsync is really the reference for one-way synchronization.
rsync website
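To make "the destination exactly matches the source" concrete, here is a minimal Python sketch of a one-way mirror. It is only an illustration of the idea, not how rsync works internally: the `mirror` function is a hypothetical name, it compares whole file contents, and it only handles local directories. rsync itself does the same job far more efficiently, for example with `rsync -a --delete src/ dst/`.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """One-way sync: make dst match src (hypothetical helper, local paths only)."""
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir()}

    # Delete everything in dst that no longer exists in src.
    for entry in dst.iterdir():
        if entry.name not in src_names:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy new or changed files, recursing into directories.
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()  # a file is in the way of a directory
            mirror(entry, target)
        else:
            if target.is_dir():
                shutil.rmtree(target)  # a directory is in the way of a file
            if not target.exists() or not filecmp.cmp(entry, target, shallow=False):
                shutil.copy2(entry, target)
```

Note that the deletion step is what makes this a mirror and not a backup: a file removed from the source by mistake is also removed from the destination on the next run.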
## lsyncd
lsyncd is meant for near real-time synchronization. It watches the
monitored directories for changes and replicates them to a remote
system (using rsync by default).
lsyncd website
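The "watch for changes, then replicate" loop can be sketched in Python as follows. This is an assumption-laden illustration: it polls modification times and sizes, which is a simpler technique than the kernel change notifications a real-time tool would rely on, and the names `scan` and `changes` are hypothetical. A real setup would hand the detected paths to a transfer tool such as rsync.

```python
import os
from pathlib import Path
from typing import Dict, Set, Tuple

# Relative path -> (modification time in ns, size in bytes)
Snapshot = Dict[str, Tuple[int, int]]


def scan(root: Path) -> Snapshot:
    """Record the state of every regular file below root."""
    snapshot: Snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = Path(dirpath, name)
            stat = full.stat()
            key = full.relative_to(root).as_posix()
            snapshot[key] = (stat.st_mtime_ns, stat.st_size)
    return snapshot


def changes(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str]]:
    """Return (added or modified paths, removed paths) between two scans."""
    touched = {path for path, meta in new.items() if old.get(path) != meta}
    removed = set(old) - set(new)
    return touched, removed
```

Calling `scan` every few seconds and feeding two consecutive snapshots to `changes` gives the list of files to replicate since the previous pass.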
## unison
unison is like rsync but can synchronize in both directions, meaning you
can keep two directories synchronized without having to think about
which direction to transfer in. Obviously, in case of conflict you
have to resolve it and pick which file you want to keep. This is
well-established software that is very reliable.
unison website
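Why a two-way synchronization can end in a conflict is easier to see with a small sketch. The Python below assumes each side is described by a mapping from path to content digest, plus a copy of that mapping from the last successful sync. The function `plan_sync` and this three-way comparison are an illustration of the general idea, not unison's actual implementation.

```python
from typing import Dict, List, Tuple

# Relative path -> content digest; a missing key means the file is absent.
Snapshot = Dict[str, str]


def plan_sync(last: Snapshot, a: Snapshot, b: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (paths to propagate A->B, paths to propagate B->A, conflicts)."""
    to_b: List[str] = []
    to_a: List[str] = []
    conflicts: List[str] = []
    for path in sorted(set(last) | set(a) | set(b)):
        base, in_a, in_b = last.get(path), a.get(path), b.get(path)
        if in_a == in_b:
            continue  # both sides already agree, nothing to do
        if in_a == base:
            to_a.append(path)  # only B changed (or deleted) the file
        elif in_b == base:
            to_b.append(path)  # only A changed (or deleted) the file
        else:
            conflicts.append(path)  # both changed differently: a human must pick
    return to_b, to_a, conflicts
```

A file modified on one side only is propagated silently, deletions included. Only a file that changed differently on both sides since the last sync needs your decision.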
## rclone
rclone is like rsync but supports many backends instead of relying
on ssh to connect to a remote source. It's mostly used to transfer
files from or to cloud services, acting as the glue between the rclone
core and each service's API.
I covered rclone in a previous article if you want more information.
rclone website
## syncthing
syncthing is a fantastic tool to keep directories synchronized between
computers and phones. It's a service you run: you define which
directories you want to export, and other syncthing instances can add
those exports, which are then kept synchronized without any tuning. It
uses a public discovery service to find peers so you don't have to mess
with NAT or port redirections, and if you want full privacy you can use
direct IPs instead. Data is encrypted during transfers.
It has the advantage of working fully automatically, and it can
exchange in both directions within the same directory, with multiple
instances on the same share. It can also keep previous copies of
deleted or replaced files, and it supports many other features.
syncthing website
## sparkleshare
SparkleShare isn't well known but still does the job very efficiently.
It offers automatic synchronization of a directory with other peers,
based on a git repository: if you add a file or make a change, it's
committed and pushed to the remote repositories. If someone else makes
a change, you will receive it too.
While it works very well, it's mostly suited to non-binary data
because of the git backend. You can't really delete old data, so a
SparkleShare share will grow over time.
SparkleShare website
## nextcloud
Nextcloud has a file synchronization capability. It's mostly used to
upload your data to a remote server and access it remotely, but also
to share a file or a directory, read-only or read/write, with other
people. It's really a huge toolbox that requires a server running 24/7
but provides many features for sharing files. A lesser-known feature is
the ability to share a directory between Nextcloud instances.
Nextcloud has its core in PHP for web access, and also offers phone
and desktop applications.
Nextcloud can encrypt stored data.
Nextcloud website
## seafile
Seafile is a centralized server to store data, like Nextcloud. It's
more focused on file storage than Nextcloud, but provides solid
features as well as companion apps for phones and desktops.
seafile website
## git-annex
I kept the best for the end. git-annex is a special beast that would
have deserved a full article of its own, but I never found how to
approach it.
git-annex is a command line tool to manage a library of data, and it
delegates the actual transfers to the appropriate protocol.
WHAT DOES IT MEAN? Let's try an analogy.
You are in a house and you have many things in it: movies, music,
books, papers. If you want to keep track of where something is stored,
you need an inventory in which you record where you put this
paper, this DVD, this book, etc. This is what git-annex does.
git-annex allows you to manage all your data, spread it across
different locations (with redundancy if you want) and access it
natively (or at least it tells you where to get it). A real-life example
would be to use an external hard drive to store big files like music or
movies, and a remote server to back up important documents. But you
may want your documents to also be on the external hard drive, or even
on two hard drives: you can tell git-annex to manage that.
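The inventory analogy can be written down as a tiny data structure. The Python sketch below is purely illustrative: the `Inventory` class, its methods and the `min_copies` parameter are hypothetical names, not git-annex commands or its storage format. It only shows the idea of tracking where each file lives and refusing a deletion that would break the wanted redundancy.

```python
from collections import defaultdict
from typing import Dict, Set


class Inventory:
    """Tracks which locations hold a copy of each file (illustration only)."""

    def __init__(self) -> None:
        self.locations: Dict[str, Set[str]] = defaultdict(set)

    def add(self, name: str, location: str) -> None:
        """Record that a copy of name is stored at location."""
        self.locations[name].add(location)

    def where(self, name: str) -> Set[str]:
        """Return every known location of name (empty set if unknown)."""
        return set(self.locations.get(name, set()))

    def drop(self, name: str, location: str, min_copies: int = 1) -> bool:
        """Forget the copy at location unless fewer than min_copies would remain."""
        holders = self.locations.get(name, set())
        if location not in holders:
            return False
        if len(holders) - 1 < min_copies:
            return False  # refusing: this would lose the last required copy
        holders.remove(location)
        return True
```

With the documents recorded on both the external drive and the server, dropping one copy succeeds, while dropping the only copy of a movie is refused.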
git-annex can show you the current state of your library without having
the files locally: it replaces the whole hierarchy with symlinks that
point to the real files when they are on your computer. This means you
can fetch files when you need them, or simply work on that index to
remove files and then tell git-annex to proceed with the deletion if
possible (or when it can, like when you get internet access or connect
that external hard drive).
The drawback is that all tracked files are symbolic links to
potentially non-existent files, and that you need a specific workflow:
unlock a file in order to make changes, then store it again.
I've been using it for years for data that doesn't change much
(administrative documents, music, pictures) but it's certainly not
suitable for tracking logs or frequently modified files.
The name contains "git", but git-annex only uses git to store the
metadata; the data itself is not in git.
git-annex website
# Conclusion
There are different strategies to synchronize files between computers:
they can be one-way or two-way, allow other people to use them, work
at huge scale, in real time, etc.
From my experience, we all manage our files in very different ways, so
I'm glad we have that many ways to synchronize them.
PS: don't forget to back up. Replicating your data doesn't mean you
don't need backups; sometimes it's easy to destroy all the data
at once with a simple mistake.