| Title: File synchronization software | |
| Author: Solène | |
| Date: 04 May 2021 | |
| Tags: unix | |
| Description: | |
| # Introduction | |
| In this article I will introduce you to various open source file | |
| synchronization programs and their respective workflows. I may not know | |
| them all, obviously. | |
| I can't give a full explanation of each of them, but I will tell you | |
| enough for you to know whether it could be of any interest to you. | |
| # Software | |
| There are many programs out there, each with pros and cons, to match our | |
| file synchronization requirements. | |
| ## rsync | |
| rsync is the leader for simple file replication: it can make sure that | |
| the destination exactly matches the source data. It's available | |
| almost everywhere, and when using ssh as a transport it's also secure. | |
| rsync is really the reference for one-way synchronization. | |
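| To make "the destination will exactly match the source" concrete, here | |
| is a minimal Python sketch of a one-way mirror. It is not how rsync is | |
| implemented (rsync has its own delta-transfer algorithm), and the | |
| function name `mirror` is made up for this illustration. It gives | |
| roughly the same end result as `rsync -a --delete src/ dst/`. | |

```python
import filecmp
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """One-way sync: make dst match src (hypothetical helper, not rsync itself)."""
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir()}

    # Delete everything in dst that no longer exists in src.
    for entry in dst.iterdir():
        if entry.name not in src_names:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy new or changed files, recursing into directories.
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()
            mirror(entry, target)
        else:
            if target.is_dir():
                shutil.rmtree(target)
            # Compare file contents, not just size and timestamps.
            if not target.exists() or not filecmp.cmp(entry, target, shallow=False):
                shutil.copy2(entry, target)
```

| Note that changes made only in the destination are overwritten or | |
| deleted: that is what "one-way" means here. | |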
| rsync website | |
| ## lsyncd | |
| lsyncd is meant to be used in environments that need near real-time | |
| synchronization. It will check for changes in the monitored | |
| directories and will replicate the changes on a remote system (using | |
| rsync by default). | |
| lsyncd website | |
| ## unison | |
| unison is like rsync but can synchronize in both directions, meaning you can | |
| keep two directories synchronized without having to think about which | |
| direction you need to transfer in. Obviously, in case of conflict you will | |
| have to resolve it and pick which file you want to keep. This is | |
| well-established software that is very reliable. | |
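| The idea of a two-way synchronization with conflicts can be sketched | |
| in a few lines of Python. This is only a conceptual illustration and | |
| not unison's actual code: the function `plan_sync` and the dictionaries | |
| (mapping a path to some content fingerprint, with a missing key meaning | |
| "file absent") are assumptions made for the example. The trick is to | |
| compare each side against the state recorded at the last sync. | |

```python
def plan_sync(last, a, b):
    """Decide what to propagate between replicas A and B.

    last, a, b map path -> content fingerprint; a missing path means "absent".
    Returns (to_b, to_a, conflicts) as lists of paths in sorted order.
    """
    to_b, to_a, conflicts = [], [], []
    for path in sorted(set(last) | set(a) | set(b)):
        base, in_a, in_b = last.get(path), a.get(path), b.get(path)
        if in_a == in_b:
            continue  # both sides already agree
        if in_a == base:
            to_a.append(path)  # only B changed (or deleted) it
        elif in_b == base:
            to_b.append(path)  # only A changed (or deleted) it
        else:
            conflicts.append(path)  # both changed differently: ask the user
    return to_b, to_a, conflicts


last = {"notes.txt": "v1", "todo.txt": "v1", "old.txt": "v1"}
a = {"notes.txt": "v2", "todo.txt": "v1", "old.txt": "v1", "new.txt": "v1"}
b = {"notes.txt": "v3", "todo.txt": "v2"}
print(plan_sync(last, a, b))
# prints (['new.txt'], ['old.txt', 'todo.txt'], ['notes.txt'])
```

| Here `old.txt` was deleted on B, so the deletion is propagated to A, | |
| while `notes.txt` was edited on both sides and needs a human decision. | |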
| unison website | |
| ## rclone | |
| rclone is like rsync but supports many backends instead of relying | |
| on ssh to connect to a remote source. It's mostly used to transfer | |
| files from or to Cloud services, acting as the glue between the rclone core | |
| and the service API. | |
| I covered rclone in a previous article if you want more information. | |
| rclone website | |
| ## syncthing | |
| syncthing is a fantastic tool to keep directories synchronized between | |
| computers/phones. It's a service you run: you define which directories | |
| you want to export, and on other syncthing instances you can add those | |
| exports and they will be kept synchronized together without tuning. It | |
| uses a public tracker to find peers so you don't have to mess with NAT | |
| or redirections, and if you want full privacy you can use direct IPs. | |
| Data is encrypted during transfers. | |
| It has the advantage of working in fully automatic mode and can | |
| exchange in both directions in the same directory, with multiple instances on | |
| the same share. It can also keep previous copies of deleted / replaced | |
| files and supports many other features. | |
| syncthing website | |
| ## sparkleshare | |
| SparkleShare isn't well known but still does the job very efficiently. | |
| It offers automatic synchronization of a directory with other peers | |
| based on a git directory. Basically, if you add a file or make a | |
| change, it's committed and pushed to the remote repositories. If | |
| someone else makes a change, you will receive it too. | |
| While it works very well, it's mostly suited for non-binary data | |
| because of the git backend. You can't really delete old data, so the | |
| SparkleShare share will grow over time. | |
| SparkleShare website | |
| ## nextcloud | |
| Nextcloud has a file synchronization capability. It's mostly used to | |
| upload your data to a remote server and access it | |
| remotely, but also to share a file or a directory in read-only or read/write | |
| mode with other people. It's really a huge toolbox that requires a 24/7 | |
| server but provides many features for sharing files. A lesser-known | |
| feature is the ability to share a directory between Nextcloud | |
| instances. | |
| Nextcloud's core is written in PHP for web access, and it also has phone and | |
| desktop applications. | |
| Nextcloud can encrypt stored data. | |
| Nextcloud website | |
| ## seafile | |
| Seafile is a centralized server to store data, like Nextcloud. It's | |
| more focused on file storage than Nextcloud, but provides solid | |
| features and also companion apps for phones and desktops. | |
| seafile website | |
| ## git-annex | |
| I saved the best for last. Git-annex is a special beast that | |
| deserves a full article of its own, but I never figured out how to approach | |
| it. | |
| git-annex is a command line tool to manage a library of data, and it | |
| delegates the actual transfers to the appropriate protocol. | |
| WHAT DOES IT MEAN? Let's try an analogy. | |
| You are in a house, and you have many things in your house: movies, music, | |
| books, papers. If you want to keep track of where something is stored, | |
| you need an inventory, in which you record where you stored this | |
| paper, this DVD, this book, etc. This is what git-annex is doing. | |
| git-annex allows you to entirely manage data, spread it across | |
| different locations (with redundancy possible) and access it | |
| natively (or at least it tells you where to get it). A real life example | |
| would be to use an external hard drive to store big files like music or | |
| movies but use a remote server to back up important documents. But you | |
| may want your documents to also be on the external hard drive, or even on | |
| two hard drives, and you can tell git-annex to manage that. | |
| git-annex can give you the current state of your library without having | |
| the files locally. It replaces the whole hierarchy with symlinks, which | |
| point to the real files if they are on your computer, meaning you can get the | |
| files when you need them or simply work on that index to remove files | |
| and then tell git-annex to proceed with the deletion if possible (or when it | |
| can, like when you get internet access or you connect that external | |
| hard drive). | |
| The drawback is that all the tracked files are symbolic links to a | |
| potentially nonexistent file and that you need a specific workflow of | |
| unlocking a file in order to make changes, and then storing it again. | |
| I've been using it for years for data that doesn't change much | |
| (administrative documents, music, pictures) but it's certainly not | |
| suitable for tracking logs or often modified files. | |
| The name contains "git" but git-annex only uses git to store the | |
| metadata; the data itself is not in git. | |
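| The "symlinks to files that may not be there" idea can be illustrated | |
| with a toy Python sketch. It is a heavy simplification and not the real | |
| git-annex layout or API: the helper names `annex_add`, `annex_drop` and | |
| `is_present`, as well as the flat `store` directory keyed by SHA-256, | |
| are assumptions made for this example. | |

```python
import hashlib
from pathlib import Path


def annex_add(store: Path, file: Path) -> str:
    """Move the file's content into the store and leave a symlink in its place."""
    key = hashlib.sha256(file.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    obj = store / key
    if obj.exists():
        file.unlink()  # same content is already stored
    else:
        file.replace(obj)  # move the content into the store
    file.symlink_to(obj.resolve())
    return key


def is_present(file: Path) -> bool:
    """True if the content behind the symlink is available locally."""
    return file.is_symlink() and file.exists()


def annex_drop(store: Path, key: str) -> None:
    """Remove the local content; the symlink stays as an inventory entry."""
    (store / key).unlink()
```

| After `annex_drop`, the file name still shows up in the directory | |
| listing, but it is a dangling symlink until the content is fetched | |
| again from another location. | |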
| git-annex website | |
| # Conclusion | |
| There are different strategies to synchronize files between computers: | |
| they can be one-way or two-way, allow other people to use them, work | |
| at huge scale, run in real time, etc. | |
| From my experience, we all manage our files in very different ways so | |
| I'm glad we have that many ways to synchronize them. | |
| PS: don't forget to make backups. Replicating your data doesn't mean | |
| that you don't need backups: sometimes it's easy to destroy all the data | |
| at once with a simple mistake. | |