Two Phase Commit

The Objective

Two phase commit is a common replication algorithm used in distributed computing to ensure consistent replication of data to multiple nodes (ex: database primary/secondary replication).

The key goal is to ensure that a data change (ex: row insert) is either reflected on all nodes (operation succeeded) or rejected by all nodes (operation failed).

The Challenge

A key challenge with replicated data systems is keeping data nodes consistent. In a relational database that could mean that all the rows of all the tables are identical.

This can be difficult to guarantee in the face of network failures, etc. Two examples of failure scenarios which are difficult to differentiate between from the perspective of the primary node.

Scenario 1:

  1. A commit request arrives at a primary node
  2. It is committed to the table
  3. A network request to a secondary is attempted but fails to arrive to the secondary
  4. Primary should rollback the transaction?

Scenario 2:

  1. A commit request arrives at a primary node
  2. It is committed to the table
  3. A network request to a secondary is attempted. The request arrives to the secondary and is committed to the table.
  4. The response from the secondary back to the primary confirming success is undelivered.
  5. Primary should rollback the transaction?

The key difference between those two scenarios is that, if the primary doesn’t receive a response from the secondary, does that mean that the secondary committed the transaction or not? Should the primary rollback the transaction or not? Also note that the secondary is unaware of this dilemma in either scenario and would happily return whatever data row it has available.

It could theoretically retry to send/receive a request to the secondary, but for how long? Eventually we have to decide to rollback the transaction or not. And if the same record(s) are requested for read while in this state, should they return the original record value or new record value.

The Solution

Two phase commit solves this problem by coordinating between primary and replica in two phases:

Phase-1
1) Primary sends COMMIT_REQUEST to secondary(s).
2A) If secondary receives the COMMIT_REQUEST, then write it to its a transaction log for pending. Place a lock on required rows. Send back AGREED message to the coordinator.
2B) If secondary can not commit, then it will send ABORT message to the primary.
3A) If the primary did not receive any response from the secondary, it will re-send the node another COMMIT_REQUEST message. After sometime, it will give up and send an ABORT message to the secondary(s).
3B) If the primary receives AGREED from secondary(s) then proceed to Phase-2.

Phase-2
1) Primary sends COMMIT to secondary(s).
2) If the secondary receives a COMMIT message, it will commit and send back a COMMITTED message.
3A) The primary will complete a commit and remove the lock, after it received COMMITTED message from secondary(s).
3B) If the primary received ABORT from any of the secondary nodes, it will send an ABORT message to other secondaries.
3C) If the primary did not received response from a secondary node(s), it will re-send the node another COMMIT message.
4) At any time, if a secondary node receives an ABORT message, it will rollback the transaction.

The reason why this works better than a single phase commit is that it allows the primary to first indicate an intention to the secondary(s) to make them aware that a synchronization event is in progress, place any required locks, and be prepared to either commit or ignore in the next phase. The second phase then can just be used to coordinate with secondary(s). At any time, if any node is unavailable, then all the required locks will still be in place and can remain in pending state until reconciliation.

Disadvantages