History of Replication Systems
This document describes the historical approaches to repository replication used by hg.mozilla.org. Reading it should leave you with an understanding of which replication approaches were used, what worked, what didn't, and why we switched strategies.
Push-Time Synchronous Replication
The push-time replication system was our first replication system. It was very crude yet surprisingly effective:
1. A hook fired during the changegroup and pushkey hooks as part of hg push operations.
2. This hook executed the repo-push.sh script, which iterated through all mirrors and effectively ran ssh -l hg <mirror> <repo>.
3. On each mirror, the SSH session effectively ran the mirror-pull script. This script essentially ran hg pull ssh://hg.mozilla.org/<repo>.
Each mirror performed its replication in parallel. So the number of mirrors could be scaled without increasing replication time proportionally.
Replication was performed synchronously with the push: the client's hg push command didn't finish until all mirrors had completed their replication. This added a few seconds of latency to pushes.
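To make the mechanism above concrete, the following is a minimal sketch of the hook-driven fan-out, written in Python for illustration. The mirror hostnames, repository argument, and function names are hypothetical assumptions; the actual system used the repo-push.sh and mirror-pull shell scripts, whose details differed.

```python
#!/usr/bin/env python
"""Illustrative sketch of the push-time synchronous fan-out.

NOT the original repo-push.sh / mirror-pull code: mirror hostnames,
paths, and function names here are hypothetical.
"""

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mirrors behind the load balancer.
MIRRORS = ["hgweb1.example.com", "hgweb2.example.com", "hgweb3.example.com"]


def replicate_to_mirror(mirror, repo):
    """SSH to a mirror, which then pulls the repository from the master."""
    # Equivalent in spirit to: ssh -l hg <mirror> <repo>
    return subprocess.call(["ssh", "-l", "hg", mirror, repo])


def main(repo):
    # Fan out to all mirrors in parallel so total replication time does not
    # grow linearly with the number of mirrors. Because this runs from the
    # push hook, the client's `hg push` blocks until every mirror finishes.
    with ThreadPoolExecutor(max_workers=len(MIRRORS)) as pool:
        codes = list(pool.map(lambda m: replicate_to_mirror(m, repo), MIRRORS))
    for mirror, code in zip(MIRRORS, codes):
        if code != 0:
            sys.stderr.write("replication to %s failed\n" % mirror)
    # The legacy system tolerated individual mirror failures rather than
    # rejecting the push.
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: push-fanout.py <repo>")
    sys.exit(main(sys.argv[1]))
```

The parallel fan-out is why adding mirrors didn't increase replication time proportionally, and the blocking call inside the push hook is why the client's hg push paid the latency of the slowest mirror.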
There were several downsides to this replication method:

- Replication was synchronous with the push, adding latency. This was felt most notably on the Try repository, which took 9-15s to replicate. Other repositories typically took 1-8s.
- If a mirror was slow, it was the long pole, slowing down replication for that push and adding yet more latency.
- If a mirror was down, the system was not intelligent enough to automatically remove it from the mirrors list. The master would retry several times before failing, adding latency to pushes.
- If a mirror was removed from the replication system, it didn't re-sync when it came back online. Instead, it had to be manually re-synced by running a script. If a server rebooted for whatever reason, it could become out of sync, and someone might or might not re-sync it promptly.
- Each mirror synced and subsequently exposed data at different times. There was a window during replication where mirror A would advertise data that mirror B did not yet have. This could lead to clients seeing inconsistent repository state due to hitting different servers behind the load balancer.
In addition:
- There was no mechanism for replicating repository creation or deletion events.
- There was no mechanism for replicating hgrc changes.
This replication system was optimized for a low-latency, high-availability intra-datacenter environment and wouldn’t work well with a future, globally distributed hg.mozilla.org service (which would be far more prone to network events such as loss of connectivity).
Despite all these downsides, the legacy replication system was surprisingly effective. Mirrors getting out of sync was rare. Historically the largest problem was the increased push latency due to synchronous replication.
Kafka-Based Replication System
We wanted a replication system with fewer deficiencies than the push-time synchronous replication system described in the section above. Notably, we wanted:
- Replication to be asynchronous with the hg push operation, so people wouldn't have to wait as long for their operation to complete.
- Mirrors that were down or slow wouldn't slow down hg push operations.
- Mirrors that went down would recover and catch up on replication backlog automatically when they returned to service.
- Repository creation and deletion events could be replicated.
- hgrc changes could be replicated.
- The window of inconsistency across the HTTP servers would be reduced.
To implement this, we devised a replication system built on top of message queues backed by Apache Kafka.
This system is described in detail at Replication.
Essentially, Kafka provides a distributed transaction log. During hg push
operations, the server writes messages into Kafka describing the changes that
were made to the repository. Daemons on mirrors react to new messages within
milliseconds, triggering the replication of repository data. The stream of
messages is ordered and consumers record their consumed offset. This allows
consumers to go offline and resume replication at the last consumed offset
when they come back online.
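As a rough illustration of the consumer side, the sketch below shows a daemon-style loop that reads ordered replication messages and commits its offset only after the repository has been updated, which is what lets a mirror resume from its last consumed offset after downtime. The topic name, group id, broker address, and message schema are assumptions for illustration; they are not the actual hg.mozilla.org message format.

```python
"""Illustrative sketch of a mirror-side replication consumer.

Requires the kafka-python package. Topic, group id, broker, and message
schema are hypothetical; the real daemons differ.
"""

import json
import subprocess

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "replication-events",               # hypothetical topic name
    bootstrap_servers=["kafka1:9092"],  # hypothetical broker
    group_id="mirror-hgweb1",           # one consumer group per mirror
    enable_auto_commit=False,           # commit only after replication succeeds
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Messages arrive in the order they were written during `hg push`. Because
# the group's offset is persisted, a mirror that goes offline resumes from
# its last committed offset when it comes back.
for message in consumer:
    event = message.value
    if event.get("type") == "changegroup":
        # Pull the new data for the repository named in the message.
        repo = event["repo"]
        subprocess.check_call(
            ["hg", "-R", "/repos/%s" % repo, "pull",
             "ssh://hg.mozilla.org/%s" % repo])
    # Record progress only once the repository update has been applied.
    consumer.commit()
```

Committing the offset only after a successful pull means a mirror that crashed or was offline simply re-processes any messages it had not yet fully applied when it returns to service.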