(2023-04-09) On reliable timekeeping on slow networks
-----------------------------------------------------
I am very passionate about timekeeping. I have a nice collection of
watches, some of which are capable of syncing via longwave radio or even
Bluetooth LE, and the protocol to do this has already been
reverse-engineered, at least to the extent of performing basic time
synchronization tasks ([1]). I'll probably write a non-GUI tool for it
as well once I figure out the optimal stack to build it on top of. But,
of course, these tools also need some source of truth: something needs
to set the time on our own client devices before we can pass it further
or display it to the user.

Nowadays, time synchronization and coordination over the Internet is
usually done via NTP. It's really well-engineered, takes a lot of
factors into account and allows you to receive accurate time all over
the world. Can't really complain about that. One thing I can complain
about, though, is that it's too complex to reimplement from scratch, and
some CVE reports about NTP server or client vulnerabilities just confirm
that. On top of that, it carries a lot of overhead data in every
synchronization packet, which might not matter much in modern
conventional networks, but poses a significant problem once we're
talking about something like GPRS at 30 kbps, PSTN or CSD dialup at
9600 bps, AX.25 at 1200 bps or even slower transmission modes at
300 bps. In these conditions, every extra byte matters. The solution?
Ye goode olde Time protocol (RFC 868), which some timekeeping networks
(like time.nist.gov) still gracefully run on port 37 of their servers.
It returns a 4-byte (32-bit) timestamp on a TCP connection or as a
response to any UDP datagram, and that's it. The timestamp represents
the number of seconds since the beginning of the year 1900 UTC, and
rolls over every 136 years.
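
To illustrate how bare-bones it is, here is a minimal Python sketch of a
TCP query (naive, with no round-trip compensation, time.nist.gov as an
example server and an arbitrary 10-second timeout); the fuller UDP
approach is covered below:

  import socket, struct

  # Connect to TCP port 37, read the 4-byte big-endian timestamp
  # (seconds since 1900-01-01 UTC) and convert it to Unix time.
  with socket.create_connection(('time.nist.gov', 37), timeout=10) as s:
      data = s.recv(4)   # the server sends 4 bytes and closes
  ftime = struct.unpack('!I', data)[0]
  print('Unix time:', ftime - 2208988800)  # pre-2036 conversion only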

Now, I encourage you to use only the UDP mode of this protocol whenever
you need it, as TCP connections made just to retrieve a 4-byte payload
both pose significant overhead for our purposes and don't make server
admins happy either. And, just like with NTP, you still need some way to
measure elapsed time locally with roughly millisecond precision. Once
that is sorted, though, the algorithm to get more or less accurate time
(albeit with whole-second resolution) is very simple and
straightforward:

1. Prepare a tool to measure the execution time of steps 2 and 3 combined.
2. Send a random 32-bit datagram to the Time server.
3. Receive the 32-bit timestamp datagram FTIME from the Time server.
4. Record the execution time (in milliseconds) of steps 2 and 3 as ETIME.
5. Obtain the true Unix timestamp (in seconds) using the following formula:
  TRUETIME = |(1000*FTIME + ETIME/2 - 2208988799500) / 1000|
6. Emit the TRUETIME value for further processing. End of algorithm.
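
Here is how these steps might look in practice: a minimal Python sketch
over UDP, with a hypothetical time37() helper, time.nist.gov as an
example server and an arbitrary 10-second timeout:

  import os, socket, struct, time

  def time37(server='time.nist.gov', port=37, timeout=10):
      with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
          s.settimeout(timeout)
          # step 1: a monotonic millisecond timer around steps 2 and 3
          start = time.monotonic_ns()
          # step 2: send a random 32-bit datagram to the Time server
          s.sendto(os.urandom(4), (server, port))
          # step 3: receive the 32-bit timestamp datagram FTIME
          data, _ = s.recvfrom(4)
          # step 4: record the execution time of steps 2 and 3 as ETIME (ms)
          etime = (time.monotonic_ns() - start) // 1_000_000
      ftime = struct.unpack('!I', data)[0]
      # step 5: TRUETIME = |(1000*FTIME + ETIME/2 - 2208988799500) / 1000|
      return (1000 * ftime + etime // 2 - 2208988799500) // 1000

  # step 6: emit the TRUETIME value for further processing
  print(time37())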

There are a couple of things that might need explanation here. First,
the RFC says the Time protocol expects an empty datagram in UDP mode,
but IRL most, if not all, implementations accept any datagram and just
discard its contents. The 4-byte length of the outgoing datagram was
chosen so that the resulting IP packet has exactly the same length as
the one we're going to receive, which lets us safely divide the elapsed
time by 2 to get a more or less accurate correction. Second, the
constant 2208988799500 is the number of milliseconds between the start
of the year 1900 (where Time protocol timestamps start) and the start of
the year 1970 (where the Unix epoch starts), minus the 500 milliseconds
used for rounding the final result properly. So, starting from
February 7, 2036, when the 32-bit Time protocol counter rolls over, we
will be adding 2085978496500 here instead of subtracting 2208988799500:
that's the Unix timestamp of the rollover instant (2^32 - 2208988800 =
2085978496 seconds) expressed in milliseconds, plus the same 500 ms
rounding term. And this is something we need to know before applying the
formula, but I hope no one ever finds themselves in a situation where
they don't know whether or not the year 2036 has already come. But just
in case, here is a more future-proof version of the same algorithm with
a larger safety margin (until the year 2106):

1. Prepare a tool to measure the execution time of steps 2 and 3 combined.
2. Send a random 32-bit datagram to the Time server.
3. Receive the 32-bit timestamp datagram FTIME from the Time server.
4. Record the execution time (in milliseconds) of steps 2 and 3 as ETIME.
5. Obtain the true Unix timestamp (in seconds) using the following formula:
  TRUETIME =
    |(1000*FTIME + ETIME/2 - 2208988799500) / 1000| if FTIME > 2208988800,
    |(1000*FTIME + ETIME/2 + 2085978496500) / 1000| otherwise.
6. Emit the TRUETIME value for further processing. End of algorithm.
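
In code, only the final conversion changes compared to the sketch above;
a hypothetical helper for it might look like this:

  # Convert a raw Time protocol value to Unix time (valid until 2106):
  # before the 2036 rollover, subtract the 1900..1970 offset; after it,
  # add the Unix timestamp of the rollover instant (2^32 - 2208988800 s).
  def to_unix(ftime, etime_ms):
      if ftime > 2208988800:
          return (1000 * ftime + etime_ms // 2 - 2208988799500) // 1000
      return (1000 * ftime + etime_ms // 2 + 2085978496500) // 1000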

Somewhere between the years 2036 and 2106, you can switch to using the
second formula unconditionally, and this will extend the algorithm's
validity until the year 2172. After that, you adjust the offset again
(the 500 ms rounding term plus the number of milliseconds between the
Unix epoch and the second rollover, i.e. 2*2^32 - 2208988800 seconds),
and so on. This way, the 32-bit second counter can be reused forever.
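
In general, if ERA is the number of full 136-year rollovers you assume
have happened since 1900 (0 until 2036, 1 until 2172, and so on), the
whole family of formulas collapses into one:

  TRUETIME = |(1000*(FTIME + ERA*2^32) + ETIME/2 - 2208988799500) / 1000|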

This approach has the following advantages: only a single send-receive
round is required, there is no transactional overhead thanks to UDP
(4 bytes out, 4 bytes in), the local clock is only relied upon for
measuring elapsed time, and the compensation for the round-trip time is
simple but accurate enough. For really slow networks, I can't think of
anything better at the moment. This is why I hope that even when NIST
stops serving time using this protocol, someone else still will. Maybe
I'll set it up right here on hoi.st, who knows.

--- Luxferre ---

[1]: https://git.sr.ht/~luxferre/RCVD