Title: File synchronization software
Author: Solène
Date: 04 May 2021
Tags: unix
Description:
# Introduction
In this article I will introduce you to various open source file
synchronization programs and the workflows they fit. I obviously don't
know them all.
I can't give a full explanation of each of them, but I will tell you
enough to know whether one could be of interest to you.
# Software
There are many programs out there, each with pros and cons, to match
our file synchronization requirements.
## rsync
rsync is the leader for simple file replication: it can make sure that
the destination exactly matches the source data. It's available almost
everywhere, and when using ssh as a transport it's also secure.
rsync really is the reference for one-way synchronization.
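As a rough illustration of that one-way mirroring, here is a minimal
Python sketch that only builds an rsync command line for an exact
mirror over ssh; it does not run anything. The helper name
`mirror_command`, the host `backup.example.org` and the paths are
made up for this example. `-a`, `--delete`, `-n` and `-e ssh` are
standard rsync options.

```python
import shlex


def mirror_command(source_dir, destination, dry_run=True):
    """Build an rsync command that makes `destination` match `source_dir`.

    Hypothetical helper for illustration: it returns the argument list
    and never executes it.
    """
    # -a keeps permissions and timestamps, --delete removes files from the
    # destination that no longer exist in the source, -e ssh picks the transport.
    command = ["rsync", "-a", "--delete", "-e", "ssh"]
    if dry_run:
        # -n asks rsync to report what it would do without changing anything.
        command.append("-n")
    # A trailing slash on the source means "the content of this directory".
    command.append(source_dir.rstrip("/") + "/")
    command.append(destination)
    return command


if __name__ == "__main__":
    # Example host and paths are assumptions, not taken from the article.
    cmd = mirror_command("/home/user/documents", "backup.example.org:/srv/documents/")
    print(shlex.join(cmd))
```

Because `--delete` removes data on the destination, the sketch defaults
to a dry run so the planned changes can be reviewed first.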
rsync website
## lsyncd
lsyncd is meant for environments that need near real-time
synchronization. It watches the monitored directories for changes and
replicates them to a remote system (using rsync by default).
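To make the "detect changes, then replicate" idea concrete, here is a
small Python sketch that compares two snapshots of a directory. It is
only an approximation built on polling; it is not how lsyncd is
implemented, and the function names `snapshot` and `changed_paths` are
invented for this example.

```python
import os


def snapshot(root):
    """Map each file under `root` (relative path) to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            state[os.path.relpath(full, root)] = (info.st_mtime_ns, info.st_size)
    return state


def changed_paths(before, after):
    """Return the paths added, removed or modified between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

A monitoring loop would take a snapshot every few seconds and start a
replication (for example with rsync) whenever `changed_paths` returns
something. As far as I know, lsyncd reacts to file system events from
the kernel instead of polling, which is what makes it near real-time.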
lsyncd website
## unison
unison is like rsync but can synchronize in both directions, meaning
you can keep two directories synchronized without having to think about
which way you need to transfer. Obviously, in case of a conflict you
will have to resolve it and pick which file you want to keep. This is
well-established software that is very reliable.
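The two-way logic and the notion of conflict can be sketched in a few
lines of Python. This is not unison's code or its exact algorithm, only
a simplified model: I assume each side can be summarized as a
dictionary of path to content hash, and that the state from the last
successful synchronization (`base`) was remembered. The name
`plan_sync` is made up.

```python
def plan_sync(base, left, right):
    """Decide what to do with each path, given content hashes.

    base:  state recorded at the last successful synchronization
    left:  current state of the first replica
    right: current state of the second replica
    A missing key means the file does not exist in that state.
    """
    plan = {}
    for path in sorted(base.keys() | left.keys() | right.keys()):
        b, l, r = base.get(path), left.get(path), right.get(path)
        if l == r:
            continue  # both sides already agree, nothing to transfer
        if l == b:
            plan[path] = "right -> left"  # only the right side changed
        elif r == b:
            plan[path] = "left -> right"  # only the left side changed
        else:
            plan[path] = "conflict"  # both sides changed: the user must pick
    return plan
```

A deletion is handled like any other change: if a file disappeared on
one side and is untouched on the other, the deletion is propagated.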
unison website
## rclone
rclone is like rsync but supports many backends instead of relying on
ssh to connect to a remote source. It's mostly used to transfer files
from or to cloud services, acting as glue between the rclone core and
each service's API.
I covered rclone in a previous article if you want more information.
rclone website
## syncthing
syncthing is a fantastic tool to keep directories synchronized between
computers and phones. It's a service you run: you define which
directories you want to share, and on other syncthing instances you can
add those shares, which are then kept synchronized without any tuning.
It uses public discovery servers to find peers so you don't have to
mess with NAT or port redirections, and if you want full privacy you
can use direct IPs. Data is encrypted during transfers.
It has the advantage of working fully automatically, and it can
exchange files in both directions within the same directory, with
multiple instances on the same share. It can also keep previous copies
of deleted or replaced files, and supports many other features.
syncthing website
## sparkleshare
SparkleShare isn't well known but still does the job very efficiently.
It offers automatic synchronization of a directory with other peers,
based on a git repository: if you add a file or make a change, it's
committed and pushed to the remote repositories. If someone else makes
a change, you will receive it too.
While it works very well, it's mostly suited for non-binary data
because of the git backend. You can't really delete old data, so the
SparkleShare share will grow over time.
SparkleShare website
## nextcloud
Nextcloud has a file synchronization capability. It's mostly used to
upload your data to a remote server and access it remotely, but also to
share a file or a directory, read-only or read/write, with other
people. It's really a huge toolbox that requires a server running 24/7
but provides many features for sharing files. A not so well known
feature is the ability to share a directory between Nextcloud
instances.
Nextcloud's core is written in PHP and serves the web access, and there
are also phone and desktop applications.
Nextcloud can encrypt stored data.
Nextcloud website
## seafile
Seafile is a centralized server to store data, like Nextcloud. It's
more focused on file storage than Nextcloud, but provides solid
features and also companion apps for phones and desktops.
seafile website
## git-annex
I saved the best for last. git-annex is a special beast that would
deserve a full article of its own, but I never found how to approach
it.
git-annex is a command line tool to manage a library of data, and it
delegates the actual transfers to the appropriate protocol.
What does that mean? Let's try an analogy.
You are in a house, and you have many things in it: movies, music,
books, papers. If you want to keep track of where something is stored,
you need an inventory in which you record where you put this paper,
this DVD, this book, etc. This is what git-annex does.
git-annex lets you manage all your data, spread it across different
locations (with redundancy if you want) and access it natively (or at
least tells you where to get it). A real life example would be using an
external hard drive to store big files like music or movies, and a
remote server to back up important documents. But you may want your
documents to also be on the external hard drive, or even on two hard
drives: you can tell git-annex to manage that.
git-annex can show you the current state of your library without having
the files locally. It represents the whole hierarchy as symlinks, which
point to the real files when their content is on your computer. This
means you can fetch files when you need them, or simply work on that
index to remove files and then tell git-annex to proceed with the
deletion if possible (or when it can, like when you get internet access
or you connect that external hard drive).
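The inventory idea, including redundancy and "delete only if possible",
can be modeled with a toy Python class. This is only a mental model and
not git-annex's data format or code; the class `Inventory` and its
methods are invented for the example. It loosely mirrors what
`git annex whereis` and `git annex drop` let you do, with `min_copies`
playing the role of a minimum number of copies to keep.

```python
class Inventory:
    """Toy inventory: which location holds a copy of which file."""

    def __init__(self, min_copies=1):
        self.min_copies = min_copies
        self.locations = {}  # file name -> set of location names

    def add_copy(self, name, location):
        """Record that `location` now holds a copy of `name`."""
        self.locations.setdefault(name, set()).add(location)

    def where_is(self, name):
        """List the locations known to hold `name`."""
        return sorted(self.locations.get(name, set()))

    def drop(self, name, location):
        """Remove a copy only if enough copies remain elsewhere."""
        copies = self.locations.get(name, set())
        if location not in copies:
            return False  # nothing to drop at that location
        if len(copies) - 1 < self.min_copies:
            return False  # refused: this would leave too few copies
        copies.remove(location)
        return True


if __name__ == "__main__":
    library = Inventory(min_copies=1)
    library.add_copy("taxes.pdf", "laptop")
    print(library.drop("taxes.pdf", "laptop"))  # refused, it is the only copy
    library.add_copy("taxes.pdf", "usb-drive")
    print(library.drop("taxes.pdf", "laptop"))  # accepted, a copy remains
    print(library.where_is("taxes.pdf"))
```

The inventory itself is small and can be copied everywhere, which is
why you can browse your library even when the files are elsewhere.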
The drawback is that all the tracked files are symbolic links to
potentially non-existent files, and that you need a specific workflow
of unlocking a file in order to change it, and then storing it again.
I've been using it for years for data that doesn't change much
(administrative documents, music, pictures) but it's certainly not
suitable for tracking logs or often modified files.
The name contains "git", but git-annex only uses git to store the
metadata; the data itself is not in git.
git-annex website
# Conclusion
There are different strategies to synchronize files between computers:
they can be one-way or two-way, allow other people to use them, scale
to huge setups, work in real time, etc.
From my experience, we all manage our files in very different ways so
I'm glad we have that many ways to synchronize them.
PS: don't forget to back up. Replicating your data doesn't mean you
don't need backups: sometimes a simple mistake is enough to destroy all
the data at once.