Title: Synchronization files software

	Author: Solène
	Date: 04 May 2021
	Tags: unix
	Description:

	# Introduction

	In this article, I will introduce various open source file
	synchronization programs and their respective workflows. Obviously,
	I don't know all of them.

	I can't give a full explanation of each of them, but I will tell you
	enough to decide whether it could be of interest to you.

	# Software

	There are many programs out there, each with pros and cons, to match
	our file synchronization requirements.

	## rsync

	rsync is the leader for simple file replication: it can ensure that
	the destination exactly matches the source data. It's available
	almost everywhere and, when using ssh as a transport, it's also
	secure.

	rsync is really the reference for one-way synchronization.
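	To give an idea of what one-way replication means, here is a minimal
	Python sketch of a mirror. It only illustrates the principle: real
	rsync also has a delta-transfer algorithm, the ssh transport and many
	options, none of which are reproduced here. Like rsync's default
	quick check, it compares size and modification time to decide
	whether a file must be copied, and it removes extra files from the
	destination, which rsync only does when asked with --delete. The
	function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """Quick check: copy when the destination is missing or differs in size or mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(src_root: Path, dst_root: Path) -> None:
    """Make dst_root an exact one-way copy of src_root (whole-file copies, no deltas)."""
    dst_root.mkdir(parents=True, exist_ok=True)
    wanted = set()  # relative paths that must exist in the destination
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = Path(dirpath).relative_to(src_root)
        (dst_root / rel_dir).mkdir(parents=True, exist_ok=True)
        wanted.add(rel_dir)
        for name in filenames:
            rel = rel_dir / name
            wanted.add(rel)
            if needs_copy(src_root / rel, dst_root / rel):
                # copy2 also preserves the mtime, so the next quick check matches
                shutil.copy2(src_root / rel, dst_root / rel)
    # Remove what no longer exists in the source; walking bottom-up
    # empties directories before they are removed.
    for dirpath, dirnames, filenames in os.walk(dst_root, topdown=False):
        rel_dir = Path(dirpath).relative_to(dst_root)
        for name in filenames:
            if rel_dir / name not in wanted:
                (dst_root / rel_dir / name).unlink()
        for name in dirnames:
            if rel_dir / name not in wanted:
                (dst_root / rel_dir / name).rmdir()
```

	Running mirror() a second time copies nothing, because every
	destination file now has the same size and mtime as its source.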

	rsync website

	## lsyncd

	lsyncd is meant for near-realtime synchronization. It watches the
	monitored directories for changes and replicates them on a remote
	system (using rsync by default).
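	lsyncd relies on kernel notifications (inotify on Linux) rather than
	rescanning, then aggregates events and spawns rsync. The Python
	sketch below swaps in simple polling to stay portable: it takes
	snapshots of file sizes and modification times, computes which paths
	changed, and hands each batch to a replicate() callback, a
	hypothetical hook where a real setup would run rsync.

```python
import os
import time
from pathlib import Path
from typing import Callable, Dict, Set, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # relative path -> (size, mtime in ns)


def snapshot(root: Path) -> Snapshot:
    """Record size and modification time of every file under root."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath, name)
            st = path.stat()
            state[str(path.relative_to(root))] = (st.st_size, st.st_mtime_ns)
    return state


def diff(old: Snapshot, new: Snapshot) -> Set[str]:
    """Paths created, modified or deleted between two snapshots."""
    changed = {p for p, meta in new.items() if old.get(p) != meta}
    deleted = set(old) - set(new)
    return changed | deleted


def watch(root: Path, replicate: Callable[[Set[str]], None],
          interval: float = 1.0, rounds: int = 3) -> None:
    """Poll root `rounds` times and pass each non-empty batch of changes to replicate().

    Grouping all changes seen during one interval into a single call
    mimics aggregating events before spawning one transfer.
    """
    previous = snapshot(root)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot(root)
        changes = diff(previous, current)
        if changes:
            replicate(changes)
        previous = current
```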

	lsyncd website

	## unison

	unison is like rsync but can synchronize in both directions, meaning
	you can keep two directories synchronized without having to think
	about the order in which you need to transfer. Obviously, in case of
	conflict you will have to resolve it by picking which file you want
	to keep. This is well-established software that is very reliable.
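	The key idea behind two-way synchronization is to remember the state
	of both sides after the last successful sync: a side that differs
	from that state has changed, and if both sides changed in different
	ways, it's a conflict. unison keeps such an archive between runs; the
	Python sketch below shows only the decision step, on plain
	dictionaries mapping paths to content hashes. It's a simplified model
	under my own assumptions, not unison's actual algorithm, and the names
	are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, str]  # path -> content hash (a missing key means the file is absent)


def reconcile(archive: State, left: State,
              right: State) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Decide what to propagate, using the state recorded at the last sync.

    Returns (actions, conflicts). Each action is (path, direction) with
    direction "left->right" or "right->left". Deletions propagate too:
    a file removed on one side and unchanged on the other gets deleted.
    """
    actions: List[Tuple[str, str]] = []
    conflicts: List[str] = []
    for path in sorted(set(archive) | set(left) | set(right)):
        base: Optional[str] = archive.get(path)
        left_hash, right_hash = left.get(path), right.get(path)
        if left_hash == right_hash:
            continue  # both replicas already agree (or both deleted it)
        left_changed = left_hash != base
        right_changed = right_hash != base
        if left_changed and not right_changed:
            actions.append((path, "left->right"))
        elif right_changed and not left_changed:
            actions.append((path, "right->left"))
        else:
            conflicts.append(path)  # both sides changed differently: the user picks
    return actions, conflicts
```

	After applying the actions, the new common state becomes the archive
	for the next run; conflicting paths stay out of it until resolved.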

	unison website

	## rclone

	rclone is like rsync but supports many backends instead of relying
	only on ssh to connect to a remote source. It's mostly used to
	transfer files from or to cloud services, with a glue layer between
	the rclone core and each service's API.
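	The "glue" approach can be pictured as one small interface that every
	storage service implements, so the copy logic doesn't need to know
	where the data lives. The Python sketch below is purely illustrative:
	Backend, LocalBackend, MemoryBackend and copy_all() are hypothetical
	names, the in-memory class stands in for a cloud API, and unlike
	rclone the copy doesn't skip files that are already identical.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class Backend(ABC):
    """Hypothetical minimal storage interface that each service would implement."""

    @abstractmethod
    def names(self) -> List[str]: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class LocalBackend(Backend):
    """Files in a local directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def names(self) -> List[str]:
        return [str(p.relative_to(self.root)) for p in self.root.rglob("*") if p.is_file()]

    def read(self, name: str) -> bytes:
        return (self.root / name).read_bytes()

    def write(self, name: str, data: bytes) -> None:
        target = self.root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


class MemoryBackend(Backend):
    """Stands in for a remote service API in this sketch."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def names(self) -> List[str]:
        return list(self.objects)

    def read(self, name: str) -> bytes:
        return self.objects[name]

    def write(self, name: str, data: bytes) -> None:
        self.objects[name] = data


def copy_all(src: Backend, dst: Backend) -> int:
    """Copy every object from src to dst; this core never knows which service it talks to."""
    count = 0
    for name in src.names():
        dst.write(name, src.read(name))
        count += 1
    return count
```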

	I covered rclone in a previous article if you want more information.

	rclone website

	## syncthing

	syncthing is a fantastic tool to keep directories synchronized between
	computers and phones. It's a service you run: you define which
	directories you want to export, then you add those exports on other
	syncthing instances, and they are kept synchronized without any
	tuning. It uses public discovery servers to find peers, so you don't
	have to mess with NAT or port redirections, and if you want full
	privacy you can use direct IPs. Data is encrypted during transfers.

	It has the advantage of working fully automatically and can exchange
	changes in both directions within the same directory, even with
	multiple instances on the same share. It can also keep previous
	copies of deleted or replaced files, and it supports many other
	features.
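	Keeping previous copies of replaced files boils down to moving the
	old file aside before writing the new one. syncthing offers several
	versioning strategies of its own; the Python sketch below only shows
	the general idea, using a hypothetical ".versions" folder, a ~N
	sequence suffix instead of a timestamp, and a limit on how many
	copies are retained.

```python
import shutil
from pathlib import Path
from typing import List

VERSIONS_DIR = ".versions"  # hypothetical folder name for this sketch


def versions_of(root: Path, rel: str) -> List[Path]:
    """Archived copies of rel, oldest first (the ~N suffix is a sequence number)."""
    folder = (root / VERSIONS_DIR / rel).parent
    prefix = Path(rel).name + "~"
    if not folder.is_dir():
        return []
    found = [p for p in folder.iterdir()
             if p.name.startswith(prefix) and p.name[len(prefix):].isdigit()]
    return sorted(found, key=lambda p: int(p.name[len(prefix):]))


def archive_version(root: Path, rel: str, keep: int = 3) -> None:
    """Move root/rel aside before it is replaced or deleted; keep at most `keep` copies."""
    current = root / rel
    if not current.is_file():
        return
    existing = versions_of(root, rel)
    prefix_len = len(Path(rel).name) + 1
    seq = int(existing[-1].name[prefix_len:]) + 1 if existing else 1
    target = (root / VERSIONS_DIR / rel).parent / f"{Path(rel).name}~{seq}"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(current), str(target))
    # Prune the oldest copies beyond the retention limit.
    for old in versions_of(root, rel)[:-keep]:
        old.unlink()


def apply_remote_update(root: Path, rel: str, data: bytes) -> None:
    """Write a new version received from a peer, archiving the old one first."""
    archive_version(root, rel)
    target = root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
```

	A deletion coming from a peer would simply call archive_version()
	without writing anything afterwards.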

	syncthing website

	## sparkleshare

	SparkleShare isn't well known but still does the job very efficiently.
	It offers automatic synchronization of a directory with other peers
	based on a git repository: basically, if you add a file or make a
	change, it's committed and pushed to the remote repositories. If
	someone makes a change, you will receive it too.

	While it works very well, it's mostly suited for non-binary data
	because of the git backend. You can't really delete old data, so the
	SparkleShare share will grow over time.
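	The share only grows because of git's object model: each version of a
	file is stored as an object named after the hash of its content, and
	every commit keeps referencing the objects it saw. Here is a toy
	Python model of that, not SparkleShare's code. The blob id is
	computed the way git does it (SHA-1 over a "blob <size>" header, a
	NUL byte and the content), but real git also compresses and packs
	objects, which this sketch ignores; the History class is hypothetical.

```python
import hashlib
from typing import Dict, List


def git_blob_id(data: bytes) -> str:
    """Id git gives a blob: SHA-1 of b"blob <size>" + NUL byte + content."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


class History:
    """Toy append-only store: every committed version stays reachable."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}        # blob id -> content
        self.commits: List[Dict[str, str]] = []    # each commit: path -> blob id

    def commit(self, tree: Dict[str, bytes]) -> None:
        snapshot: Dict[str, str] = {}
        for path, data in tree.items():
            blob = git_blob_id(data)
            self.objects.setdefault(blob, data)  # identical content is stored once
            snapshot[path] = blob
        self.commits.append(snapshot)

    def stored_bytes(self) -> int:
        return sum(len(data) for data in self.objects.values())
```

	Deleting a file in a later commit doesn't free anything: older
	commits still reference its blob, which is why binary files that
	change often make the share grow quickly.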

	SparkleShare website

	## nextcloud

	Nextcloud has a file synchronization capability. It's mostly used to
	upload your data to a remote server so you can access it remotely,
	but also to share a file or a directory with other people, read-only
	or read/write. It's really a huge toolbox that requires a 24/7
	server, but it provides many features for sharing files. A lesser
	known feature is the ability to share a directory between Nextcloud
	instances.

	Nextcloud's core is written in PHP and provides the web access, and
	there are also phone and desktop applications.

	Nextcloud can encrypt stored data.

	Nextcloud website

	## seafile

	Seafile is a centralized server to store data, like Nextcloud. It's
	more focused on file storage than Nextcloud, but it provides solid
	features as well as companion apps for phones and desktops.

	seafile website

	## git-annex

	I kept the best for last. git-annex is a special beast that would
	deserve a full article of its own, but I never found how to approach
	it.

	git-annex is a command line tool to manage a library of data; it
	delegates the actual transfers to the appropriate protocol.

	WHAT DOES IT MEAN? Let's try an analogy.

	You are in a house, you have many things in your house: movies, music,
	books, papers. If you want to keep track of where something is
	stored, you need an inventory in which you write down where you
	stored this paper, this DVD, this book, etc. This is what git-annex
	does.

	git-annex allows you to fully manage your data, spread it across
	different locations (with optional redundancy) and access it natively
	(or at least tells you where to get it). A real-life example would be
	to use an external hard drive to store big files like music or
	movies, and a remote server to back up important documents. But you
	may want your documents to also be on the external hard drive, or
	even on two hard drives: you can tell git-annex to manage that.
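	The inventory idea can be sketched as a location log mapping each
	file to the set of repositories holding a copy, plus a minimum number
	of copies that must remain elsewhere before a copy may be dropped
	(git-annex calls this setting numcopies). The Python class below is a
	hypothetical toy model of that bookkeeping, not git-annex's real data
	format or commands.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Inventory:
    """Hypothetical location log: which repository holds a copy of which file."""
    numcopies: int = 2  # minimum number of copies wanted for every file
    locations: Dict[str, Set[str]] = field(default_factory=dict)

    def record(self, path: str, repo: str) -> None:
        self.locations.setdefault(path, set()).add(repo)

    def whereis(self, path: str) -> List[str]:
        return sorted(self.locations.get(path, set()))

    def can_drop(self, path: str, repo: str) -> bool:
        """Allow dropping only if repo has a copy and enough copies remain elsewhere."""
        holders = self.locations.get(path, set())
        return repo in holders and len(holders - {repo}) >= self.numcopies

    def drop(self, path: str, repo: str) -> bool:
        if not self.can_drop(path, repo):
            return False
        self.locations[path].discard(repo)
        return True

    def under_replicated(self) -> List[str]:
        """Files that have fewer copies than wanted, e.g. music only on one disk."""
        return sorted(p for p, repos in self.locations.items() if len(repos) < self.numcopies)
```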

	git-annex can give you the current state of your library without
	having the files locally: it replaces the whole hierarchy with
	symlinks, which point to the real files when they are on your
	computer. This means you can fetch files when you need them, or
	simply work on that index to remove files and then tell git-annex to
	carry out the deletion if possible (or when it can, like when you get
	internet access or connect that external hard drive).

	The drawback is that all the tracked files are symbolic links to
	potentially non-existent files, and that you need a specific workflow
	to make changes: unlock the file, modify it, then store it again.
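	To make the symlink mechanism more concrete, here is a small Python
	sketch: adding a file moves its content into an object store under a
	key derived from its SHA-256 hash and leaves a relative symlink
	behind, dropping deletes the content but keeps the now dangling link,
	and unlocking swaps the link for a writable copy. The .annex/objects
	location, the key format and the function names are my own
	simplified assumptions, loosely modeled on git-annex, which keeps its
	objects under .git/annex/objects.

```python
import hashlib
import os
import shutil
from pathlib import Path

OBJECTS = Path(".annex") / "objects"  # hypothetical store location for this sketch


def key_for(data: bytes) -> str:
    """Content-derived key, loosely modeled on git-annex SHA256 keys."""
    return f"SHA256-s{len(data)}--{hashlib.sha256(data).hexdigest()}"


def annex_add(repo: Path, rel: str) -> str:
    """Move a file's content into the object store and leave a symlink in its place."""
    path = repo / rel
    data = path.read_bytes()
    key = key_for(data)
    obj = repo / OBJECTS / key
    obj.parent.mkdir(parents=True, exist_ok=True)
    if obj.exists():
        path.unlink()  # identical content is already stored once
    else:
        path.replace(obj)
    # A relative link keeps working if the whole repository is moved.
    os.symlink(os.path.relpath(obj, path.parent), path)
    return key


def annex_drop(repo: Path, rel: str) -> None:
    """Delete the local content; the symlink stays but now points nowhere."""
    target = Path(os.path.realpath(repo / rel))
    if target.exists():
        target.unlink()


def is_present(repo: Path, rel: str) -> bool:
    """True if the content behind the symlink is available locally."""
    return (repo / rel).exists()  # exists() follows the symlink


def annex_unlock(repo: Path, rel: str) -> None:
    """Replace the symlink with a regular, editable copy of the content."""
    path = repo / rel
    target = Path(os.path.realpath(path))
    path.unlink()  # removes the link only, the stored object stays
    shutil.copyfile(target, path)
```

	Editing then follows the workflow described above: annex_unlock(),
	modify the file, then annex_add() again. The old content stays in the
	store under its previous key until something explicitly removes it.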

	I've been using it for years for data that doesn't change much
	(administrative documents, music, pictures), but it's certainly not
	suitable for tracking logs or frequently modified files.

	The name contains "git", but git-annex only uses git to store the
	metadata; the data itself is not stored in git.

	git-annex website

	# Conclusion

	There are different strategies to synchronize files between
	computers: one-way, two-way, shared with other people, at huge scale,
	in realtime, etc.

	In my experience, we all manage our files in very different ways, so
	I'm glad we have so many ways to synchronize them.

	PS: don't forget to back up. Replicating your data doesn't mean you
	don't need backups: a simple mistake can easily destroy all the data
	at once, and synchronization will faithfully replicate that mistake
	on every copy.