Distributed Systems - ID
We can enforce a single global master that takes care of unique ID generation. This can be a traditional SQL database that atomically increments a primary key column. But every increment has to be serialized through that one node, typically under a lock, so it soon becomes a scalability bottleneck.
For auto-increment IDs, the latest value needs to be persisted; otherwise the state is lost on restart.
If the IDs are not auto-incremented, every issued ID needs to be stored so that the same ID is never used again.
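As a minimal sketch of the single-master approach, the snippet below uses Python's built-in `sqlite3` module as a stand-in for the central SQL database. The table name `ids`, the `owner` column, and the helper names `open_id_store` and `next_id` are assumptions made for illustration only.

```python
import sqlite3


def open_id_store(path=":memory:"):
    """Open (or create) the central ID table. Pass a file path to persist state."""
    conn = sqlite3.connect(path)
    # AUTOINCREMENT makes SQLite remember the highest ID ever issued,
    # so an ID is not reused even if its row is later deleted.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ids ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, owner TEXT)"
    )
    conn.commit()
    return conn


def next_id(conn, owner):
    """Allocate the next ID. The insert runs in one transaction on the master."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO ids (owner) VALUES (?)", (owner,))
        return cur.lastrowid
```

Every caller has to go through this one database, which is exactly where the bottleneck comes from. With a file path instead of `:memory:`, the counter survives a restart because the latest value lives on disk.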
- completely stateless, no coordination needed, multiple nodes can generate IDs at the same time
- no hard guarantee of uniqueness: collisions are possible, just very unlikely
- UUID type 4: random
- UUID type 1: MAC address + time component
- nothing to store on disk, so generation is limited only by CPU speed
- GTID: Global Transaction Identifier
- Twitter Snowflake
- Cassandra TimeUUID
- K-ordering is a more precise way of saying roughly sorted: no element is more than k positions away from where it would be in a fully sorted sequence.
- KSUID on GitHub: https://github.com/segmentio/ksuid
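To make the roughly sorted idea concrete, here is a hedged sketch of a Snowflake-style generator. It packs a millisecond timestamp, a worker number, and a per-millisecond sequence into one integer, so IDs from one worker sort by creation time. The 41/10/12 bit split follows the commonly described Snowflake layout; the epoch value, the class name `SnowflakeLikeGenerator`, and the injectable `clock` parameter are assumptions for illustration, not Twitter's actual implementation.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
WORKER_BITS = 10              # up to 1024 workers
SEQ_BITS = 12                 # up to 4096 IDs per worker per millisecond
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeLikeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # returns the current time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self._seq
            )
```

Each worker needs a distinct `worker_id`, which is the only coordination required. IDs from different workers are only roughly ordered relative to each other, because the ordering is only as good as the agreement between the workers' clocks.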
Timestamps are not reliable in a distributed environment. A computer's clock constantly skews forwards or backwards away from the actual time (called "clock drift"), and every computer skews at a different rate.
Timestamps therefore cannot be used as IDs or for ordering unless the servers' clocks are perfectly synchronized.
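A small worked example shows how skew breaks ordering. The two nodes and their offsets (+30 ms and -20 ms) are made-up numbers for illustration; `Node` and `order_by_timestamp` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    offset_ms: int  # hypothetical skew of this node's clock vs. true time

    def stamp(self, true_time_ms):
        """Timestamp the node would record at the given true time."""
        return true_time_ms + self.offset_ms


def order_by_timestamp(events):
    """Return event labels sorted by their recorded timestamps."""
    return [label for _, label in sorted(events)]


a = Node("A", offset_ms=+30)  # clock runs 30 ms ahead
b = Node("B", offset_ms=-20)  # clock runs 20 ms behind

events = [
    (a.stamp(1000), "write x=1 on A"),  # really happens first, stamped 1030
    (b.stamp(1010), "write x=2 on B"),  # really happens second, stamped 990
]

print(order_by_timestamp(events))
```

Sorting by the recorded timestamps puts the write on B before the write on A, the reverse of what actually happened, even though the two clocks are only 50 ms apart.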
Generate Unique ID: http://antirez.com/news/99
A centralized clock system would be a single point of failure. The dedicated servers (called "time oracles") are normally equipped with an atomic clock or a GPS receiver to keep time as precise as possible. They are often used in trading platforms, where highly accurate time is important.
So it is not impossible, but it can be very expensive. Google has its own highly available, distributed clock called TrueTime. It reports the current time as an interval with bounded uncertainty, which lets systems built on it assign monotonically increasing timestamps. It serves Google's internal services and Cloud Spanner on GCP.