Hybrid Logical Clocks: Summary

Hybrid Logical Clocks: Summary

This [paper](https://cse.buffalo.edu/tech-reports/2014-04.pdf) focuses on the idea of time synchronization. They compare several different methods, and ultimately introduce a new one which they term "Hybrid Logical Clocks". While the original title of the paper is **"Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases"**, I find this a bit too wordy to say more than once.

# Some Background

## Physical Clocks and NTP

Many distributed systems developers might be tempted to just use the physical clock on your local host. However, as anyone who has actually tried this knows, there is often a significant drift between the clocks of any two systems, *especially* as they increase in distance. The [Network Time Protocol](https://en.wikipedia.org/wiki/Network_Time_Protocol) (NTP) can usually keep nodes within ~10ms of each other, but that still leaves large windows of uncertainty when ordering events for, say, a video game. You also have to worry about things like leap seconds, or sysadmins changing the time on you.

## TrueTime

To try to address these problems, Google recently proposed a protocol called [TrueTime](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45855.pdf) (PDF warning, I wasn't able to find a useful article elsewhere, see page 5). What this allows is for you to get a strict bounding on clock error by using a combination of GPS and local atomic clocks to synchronize a "global clock". The problem with this, of course, is that you need to have your own personal atomic clock. That is highly problematic for the average user. An alternative might be to have a receiver for some of the publicly-broadcasting atomic clocks and just store some offset to it, but this is also problematic because it both requires extra hardware and would reduce TrueTime's error guarantees.

So while this is helpful, it is not *generally* useful.

## Lamport Clocks

One of the earliest methods to address the problems of using physical time was the [Lamport Clock](https://en.wikipedia.org/wiki/Lamport_timestamps) (LC). The idea here is to make a simple counter on each node. Start at 0, and with each event that node can see (purely local or otherwise), increment the counter. If it receives a message, that message will always come with LC timestamp from the originating node, and so you take max(yours, theirs) + 1 as your new LC time.

This makes two guarantees:

- Event $X$ **happens before** event $Y$ $\implies$ $X.lc < Y.lc$
- If $Y$ is directly related to event $X$, $X.lc < Y.lc$ $\implies$ $X$ **happens before** $Y$
- $Y$ is directly related to $X$ if they are a send/receive pair, or
- $Y$ is directly related to $X$ if they are directly next to each other on some node's timeline

Example: ![](https://www.cs.rutgers.edu/~pxk/rutgers/notes/clocks/images/clocks-lamport.png)

## Vector Clocks

[Vector Clocks](https://en.wikipedia.org/wiki/Vector_clock) are an extension of Lamport Clocks which allows you to make strong guarantees about ordering.

The idea behind it is that timestamps, rather than being a single number, are a vector of numbers where each item corresponds to the last known Lamport time of some other process.

Example: ![](https://8thlight.com/blog/assets/posts/2013-10-04-synchronization-in-a-distributed-system/vc-example1-d34e3688946b4be742026c606847d63c50fff0e144acc996605de95e99d34c7d.png)

The major benefit here is that event $X$ **happens before** event $Y$ $\iff$ $X.vc < Y.vc$, so long as you define comparisons by the rules given by their algorithm. They would specify: $X.vc < Y.vc$ $\iff$ $[(\forall i \; X.vc\_i \le Y.vc\_i)$ $\land$ $(\exists i \; X.vc\_i < Y.vc\_i)]$.

![](https://8thlight.com/blog/assets/posts/2013-10-04-synchronization-in-a-distributed-system/vc-rule1-f3d51eb85aa13b2b682c34ff403b4c60fb833d4e3977c9e0f04d2e9c1c1471cc.png)

![](https://8thlight.com/blog/assets/posts/2013-10-04-synchronization-in-a-distributed-system/vc-rule2-0e5f57ac611a872eb4d7f01382deee329c4298ae3262e3de7994aaa35f0e0906.png)

![](https://8thlight.com/blog/assets/posts/2013-10-04-synchronization-in-a-distributed-system/vc-rule3-b93582d0e0ec6f60a00c0459d8039cdd36337b56b04cd7370f34427176eb8d11.png)

This, however, comes with a number of drawbacks. The biggest of these is that you need to know how many nodes your network will have before runtime. A more minor one is that the size of timestamps grows proportional to the number of nodes in your network.

## HybridTime

This is a method which combines Vector Clocks with a set of optimizations based on loose synchronization of physical clocks. I can't find any info on this, except that I think they are referencing [this article](https://dl.acm.org/doi/10.5555/647272.722012) that I can't get access to. I'll ask my advisor if I remember to. They reference this in the paper, but not for much more than to say "hey, people have tried this".

# The Main Ideas

By the time we're done, the authors will have shown us an algorithm that:

1. Guarantees $X$ **h.b.** $Y$ $\implies$ $X.hlc < Y.hlc$
2. Can be used in a backwards-compatible way with NTP timestamps
3. Uses $O(1)$ memory
4. Is resistant both to "stragglers" and "rushers" who are significantly behind or ahead of the rest of the network
5. Is directly useful in identifying distributed snapshots

# The Algorithm

## The Components

A Hybrid Logical Clock consists of three elements:

- $l$, which represents the highest physical time this node has seen, truncated to $2^{-16}$ second precision
- $c$, which is a Lamport Clock that resets whenever $l$ increases
- $pt$, which is a local property representing the given node's physical time

## On a Send or Local Event

When a node has a local event or sends a message, they should do the following:

def on_local(l, c):
new_l = max((l, truncate(physical_clock())))
if new_l == l:
logical = c + 1
else:
logical = 0
return HybridLogicalClock(new_pt, logical)

## On Receiving a Message

When a node receives a message, they should do the following:

def on_receive(l, c, m_l, m_c):
new_l = max((l, m_l, truncate(physical_clock())))
if new_l in (l, m_l):
new_c = max([(l, c), (m_l, m_c)])[1] + 1
else:
new_c = 0
return HybridLogicalClock(new_l, new_c)

## Example Trace

![](./img/5e546f7694d40c0001755ee8_hlc_clock.png)

Note that the paper has a typo in this image. The cell which reads as "3, 13" should read as "3, 10, 3". This is because they recycled another figure from their paper but forgot to re-annotate that cell.

# Why it works

## Frozen PT = Lamport Clock

Let's suppose for a moment that $pt$ never increments for any node. In such a scenario, this will act exactly like a Lamport Clock. Fortunately, most systems *don't* have permanently frozen clocks.

## PT and L are Tightly Coupled

If $pt$ ever gets past $l$, then $l$ will change and $c$ will reset. So the way we compare two timestamps is just by using tuple comparisons of the form $\left\langle X.l, X.c\right\rangle$. For review, the way that works is that you first compare the first element. If they are not equal, return the result of the comparison. If they are equal, move on to the next element and repeat until the end. If you get past the end, they are equal (and in this case, explicitly concurrent). From this follows a very easy implication: $X$ **h.b.** $Y$ $\implies$ $\left\langle X.l, X.c\right\rangle$ $<$ $\left\langle Y.l, Y.c\right\rangle$.

This paper also shows that the gap between $pt$ and $l$ is bounded in any system without adversarial sysadmins. To get there we have to go through another theorem though. $X.l > X.pt$ $\implies$ $(\exists Y : Y$ **h.b.** $X$ $\land$ $Y.pt = X.l)$. What they do next is introduce a system parameter $\epsilon$, which is defined as the maximum clock drift in the system (the difference between the smallest and largest $pt$s on the network).

Given that, it is impossible to find some pair of events such that $X$ **h.b.** $Y$ and $X.pt > Y.pt + \epsilon$. Because of that, it must follow that $(\forall X)(|X.l - X.pt| \le \epsilon)$.

## C has a Maximum Value

This also means that we can put bounds on the value of $c$ (assuming a system where sysadmins don't intervene). If we assume that the minimum gap between events is some timestep $d$, then $(\forall X)(X.c le \tfrac{\epsilon}{d} + 1)$.

## Encoding as an NTP Timestamp

So this ends up creating a system with the causal guarantees of a Lamport Clock, but can we encode it as a traditional timestamp? It turns out the answer is yes, but mostly because NTP provides some *absurd* precision in its timestamps.

NTP timestamps are 64-bits, divided into two 32-bit sections. The first portion is for a whole number of Unix-time seconds. The second portion is for fractional seconds. To be clear here, that means that NTP allows for $2^{-32}$ second precision. This ends up being ~$\tfrac{1}{4}$ns precision, which is just silly. It's a little easier to show the change they made when expressed as a C struct:

#define FRAC_SECONDS_DENOM (1 << 32)

struct NTP_stamp { // note: NTP is big-endian
uint32_t seconds;
uint32_t frac_seconds;
};

What this paper does is instead allow for $2^{-16}$ second precision in the physical time portion by saying:

#define FRAC_SECONDS_DENOM (1 << 16)

struct HLC_stamp { // still big-endian
uint32_t seconds; // corresponds to l
uint16_t frac_seconds; // corresponds to l
uint16_t c;
};

Or, making use of the bare union extension:

#define FRAC_SECONDS_DENOM (1 << 32)

struct HLC_stamp { // still big-endian
union {
struct {
uint64_t l : 48;
uint16_t c : 16;
};
struct {
uint32_t seconds;
uint32_t frac_seconds; // always assign this *before* c, never after
};
};
};

Note that the latter version is more faithful to the original paper, since $c$ is fully incorporated into the fractional seconds field. This, in effect, means that the fractional seconds field has an *actual* precision of $2^{-16}$, but a *perceived* precision of $2^{-32}$, with the least significant bits corresponding entirely to the $c$ value.

From here they perform some experiments to show that there is essentially no overhead to doing this, as compared to physical time synchronization.

# My Thoughts

This is a very interesting way to go about things. Coming out of it, my biggest question is whether, given enough additional information, one could use this to replace Vector Clocks. As it currently stands, that would be a poor decision in small systems, but this stands as a very promising addition to peer-to-peer protocols.

This is a topic that my group in CSE 812 will be incorporating into our research. Sometime in the next few days I hope to have a post up about the other paper we are basing things on. It is significantly longer, but also can be condensed more easily if I approach things right.

Let me know what you think below!

tags: research-review, cse-812, advisee-group, msu