Distributed Consistency

Full Title or Meme

Distributed Consistency in Identity Management is required when more than one process might be assigning identifiers that are required to be unique.

Context

  • Decentralized Consistency is an easier problem when each Registration Authority is given exclusive control of registration within its own portion of the name space.
  • With the advent of fully distributed systems, where any machine may need to create an identifier, this problem needs a practical resolution.
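
The partitioned case in the first bullet can be sketched as follows. This is only an illustration under stated assumptions: the `RegistrationAuthority` class, the `prefix:counter` identifier format, and the prefix names are hypothetical and do not come from any particular registry.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class RegistrationAuthority:
    """Hypothetical authority that owns one prefix of the name space."""
    prefix: str
    # Local counter; it is never shared with any other authority.
    _counter: count = field(default_factory=lambda: count(1), init=False, repr=False)

    def new_identifier(self) -> str:
        # No cross-authority coordination is needed as long as prefixes
        # are unique: the prefix disambiguates the local counter value.
        return f"{self.prefix}:{next(self._counter)}"


ra_east = RegistrationAuthority("east")
ra_west = RegistrationAuthority("west")
ids = [
    ra_east.new_identifier(),
    ra_west.new_identifier(),
    ra_east.new_identifier(),
]
```

The scheme stops working once prefixes are not assigned exclusively, which is the fully distributed situation described in the second bullet.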

Problems

  • Coordination is tricky to implement and expensive in terms of added latency.
  • It cannot be assumed that all systems are operating together, or even that some systems are not working to actively steal another's identifier.
  • Despite the lack of coordination between systems, there must be only a single outcome: when a request for an identifier completes, only one user is permitted to use that identifier in a transaction.
  • Resilience cannot depend on others working as expected. Maersk's network was devastated in 2017 when the NotPetya malware destroyed every online domain controller, including those that served as backups for one another. The company was saved only by the accident that one domain controller was offline due to a power failure.

Solutions

  • The first important way to avoid coordination is to change the architecture. If traffic is slow at an intersection, build an overpass for one path and the problem goes away.
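  • The CALM theorem[1] (Consistency As Logical Monotonicity) proves that a program has a consistent, coordination-free distributed implementation if and only if it is monotonic. Applied to identifiers: if two sites choose the same identifier, only one will get to proceed to committing a transaction.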
    • Step 1 - system A receives a request for an identifier. System B receives a request for the same identifier.
    • Step 2 - each system accepts the request and sends out the result.
    • Step 3 - a winner is chosen, perhaps because system C (or some majority vote) agrees with one of the two other systems.
    • Step 4 - a transaction is sent using the identifier; it cannot be accepted by the identifier's originating system.
    • Note that the above does not prevent forking, so that situation needs to be addressed by the solution.
    • Identifiers can never be deleted, only deactivated, which prevents race conditions between creation and deletion. This is called tombstoning in database systems.
  • Conflict-free replicated data types (CRDTs) provide an object-oriented framework for monotonic programming patterns like tombstones.
  • Google Spanner is a fully managed, transactional relational database that offers strong consistency and high availability at very large scale.
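
The claim steps and the tombstone rule above can be sketched as a small state-based replica, in the spirit of a two-phase set CRDT. Everything here is an assumption made for illustration: the `IdentifierRegistry` class and its method names are hypothetical, and step 3 is replaced by a deterministic tie-break (the smallest claimant name wins) rather than the system C or majority vote mentioned in the steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IdentifierRegistry:
    """Hypothetical replica state: grow-only claims plus grow-only tombstones."""
    claims: dict[str, set[str]] = field(default_factory=dict)  # identifier -> claimants
    tombstones: set[str] = field(default_factory=set)          # deactivated identifiers

    def claim(self, identifier: str, claimant: str) -> None:
        # Steps 1-2: accept the request locally, without coordination.
        self.claims.setdefault(identifier, set()).add(claimant)

    def deactivate(self, identifier: str) -> None:
        # Never delete: a tombstone stops a late "create" from reviving it.
        self.tombstones.add(identifier)

    def merge(self, other: IdentifierRegistry) -> None:
        # Set union is commutative, associative, and idempotent, so replicas
        # converge regardless of message order or duplication.
        for identifier, claimants in other.claims.items():
            self.claims.setdefault(identifier, set()).update(claimants)
        self.tombstones |= other.tombstones

    def owner(self, identifier: str) -> str | None:
        # Step 3 (assumed rule): the smallest claimant name wins the tie.
        if identifier in self.tombstones or identifier not in self.claims:
            return None
        return min(self.claims[identifier])


a, b = IdentifierRegistry(), IdentifierRegistry()
a.claim("alice", "system-A")
b.claim("alice", "system-B")
fork = (a.owner("alice"), b.owner("alice"))  # replicas disagree before merging

a.merge(b)
b.merge(a)
agreed = (a.owner("alice"), b.owner("alice"))

b.deactivate("alice")
a.merge(b)
after_tombstone = a.owner("alice")
```

Before the replicas exchange state, each one names a different owner, which is the forking situation noted above. The merge itself is monotonic, but the answer from `owner` is final only once every claim has been seen, so a real system would still need a rule for when a transaction may be committed.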

References

  1. Joseph M. Hellerstein and Peter Alvaro, Keeping CALM: When Distributed Consistency Is Easy, Communications of the ACM 63, no. 9 (2020-09), pp. 72-81