Replication

Why do we replicate state/data?

fault tolerance: prevent data loss
data locality (keep data close to clients that need it)
dividing the work

But:

higher cost
hard to keep state consistent

Informally, a replicated storage system is strongly consistent when you cannot tell that the data is replicated.

Primary-Backup Replication

First, pick a process to be the primary. All other processes are then backups. In this scenario, clients can only connect to the primary

When a client makes a write to the primary, the primary sends that write to all backups - when it then receives an ACK from all backups, that is the commit point and it returns OK to the client.

On a read, the primary just returns the value to the client without consulting with any backups.

Note

The client must ask the primary instead of a backup because there might be an uncommitted write in flight - a backup might know it, but the primary might not have committed it yet.

How well does it match our replication criteria?

[x] fault tolerance
[ ] data locality
[ ] dividing work

Chain Replication

Chain Replication for Supporting High Throughput and Availability - van Renesse, Schneider, 2004

Writes go to the head, which forwards to the next one in the chain, and so on to the tail. The tail ACKs the write.

Reads go to the tail.

Well, how well does it do?

[x] fault tolerance
[~] data locality
[x] dividing work
- slightly better - reads and writes go to different processes

throughput: number of actions per unit of time

Depending on the workload, CR could give you better throughput than PB.

For CR, the optimal workload ratio is about 15% writes to 85% reads.

So what’s the downside of CR? Well, CR has a higher write latency (depending on # of nodes in chain). (the two have about the same read latency)

latency: time between start and end of one action

Since PB broadcasts the write, it’s processed in parallel by the backups, and it can be ACKed as soon as all backups ACK. For CR, the write message is forwarded, and has to be processed by each process in the chain in series.

Note

Regardless of which replication scheme you choose, the client and replicas have to agree on who’s the primary/ head/tail/etc, or else you lose the guarantees of replication!

Total Order v. Determinism

Messages sent by different clients at the same time can arrive at different times:

In the second example, there’s no violation of TO delivery, but the result is not the same depending on which client’s message receives first!

determinism: On every run, the same outcome is achieved.

Bad Things

What happens if a client can tell that data is replicated (i.e. the replication is not strongly consistent)?

Read-Your-Writes Violation: A client’s written is not immediately returned on a subsequent read.

FIFO Consistency Violation

fifo consistency: Writes done by a single process are seen by all processes in the order they were issued

Causal Consistency Violation

causal consistency: Writes that are related by happens-before (i.e. potentially causally related) must be seen in the same causal order by all processes

Consistency

Actually, we can define different consistency models:

(aside: a model is the set of assumptions you keep in mind when building a system)

But maintaining stronger consistency requires more work, which means more latency and just being harder! Remember, replication/consistency usually involves duplicating messages too, so more bandwidth too

Coordination

Going back to our strongly consistent replication protocols (PB/CR) - both of these need some kind of coordinator process to know which process is the primary/head/tail/etc.

Chain Replication

CR uses the fail-stop fault model (i.e. crashes can occur and be detected by the environment), and requires that not all processes crash. There are some ways to implement this (like heartbeating), but sometimes you’ll have a false positive.

If the head process crashes, the coordinator makes the next process in line the new head
If the tail process crashes, the coordinator makes the preceding process the new tail
If a middle processes crashes, it just gets skipped over (although the clients do not have to be notified)

Additionally, when a failure happens, there has to be some handling of writes that are partway through the chain when the failure happened - out of the scope of this class though (van Renesse & Schneider, 2004).

Important

What if the coordinator fails?!?!?! Do we have to replicate the coordinator?

(Next: Consensus)

Active v. Passive Replication

Active replication: execute an operate on each replica (aka state machine replication)
Passive replication: state gets sent to backups

Example (primary-backup replication):

You might choose one over the other for the following reasons:

the updated state might be large (use active)
an operation might be expensive to do on each replica (use passive)
the operation might depend on local process state (use passive)