Distributed System Design - Examples

Twitter: Push vs Pull

  • Normal Account: push model
    • A new tweet is pushed to every follower; each follower has a home timeline cache that works like a mailbox. Fan-out example: 1 write by the poster becomes 100 cache writes for 100 followers, serving ~80 reads.
  • Large Account: pull model
    • A new tweet is inserted into the db once, and every home timeline query pulls it from the db: 1 write, x reads.
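
A rough sketch of the two models combined in one service. The class name, method names, and the tiny threshold are all made up for illustration; this is not Twitter's actual implementation.

```python
from collections import defaultdict

# Hypothetical cutoff for a "large" account; real systems would use a far larger value.
LARGE_ACCOUNT_THRESHOLD = 3


class TimelineService:
    def __init__(self):
        self.followers = defaultdict(set)    # author -> followers
        self.following = defaultdict(set)    # follower -> authors
        self.home_cache = defaultdict(list)  # follower -> pushed tweets (the "mailbox")
        self.tweets_db = defaultdict(list)   # author -> all tweets by that author
        self.seq = 0                         # monotonically increasing tweet id

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_large(self, author):
        return len(self.followers[author]) >= LARGE_ACCOUNT_THRESHOLD

    def post(self, author, text):
        self.seq += 1
        tweet = (self.seq, author, text)
        self.tweets_db[author].append(tweet)  # always exactly one db write
        if not self.is_large(author):
            # Push model: one cache write per follower (fan-out on write).
            for follower in self.followers[author]:
                self.home_cache[follower].append(tweet)
        return tweet

    def home_timeline(self, user):
        # Start from the pushed tweets, then pull from large accounts at read time.
        tweets = set(self.home_cache[user])
        for author in self.following[user]:
            if self.is_large(author):
                tweets.update(self.tweets_db[author])
        # Newest first; the set removes tweets seen via both paths.
        return [text for _, _, text in sorted(tweets, reverse=True)]


svc = TimelineService()
svc.follow("alice", "bob")
for user in ("alice", "carol", "dave"):
    svc.follow(user, "celeb")
svc.post("bob", "hi from bob")      # pushed into alice's cache
svc.post("celeb", "hi from celeb")  # stored once, pulled on read
print(svc.home_timeline("alice"))
```

The trade-off: the push path makes reads cheap but costs one write per follower, so it is skipped for accounts whose follower count would make that fan-out too expensive.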

Twitter Infrastructure

Twitter: https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale

Facebook's Graph

Every entity is a node ("Object"), e.g. a person, a page, an ad account, etc. Nodes are connected by edges ("Association"). The API:

  • Object: (id) -> (otype, (key, value))
  • Assoc.: (id1, atype, id2) -> (time, (key, value))
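
A minimal in-memory sketch of the two mappings above. `GraphStore`, `obj_add`, `assoc_add`, and `assoc_range` are names chosen for this example, not Facebook's real API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Obj:
    otype: str
    data: dict = field(default_factory=dict)  # the (key, value) pairs


@dataclass
class Assoc:
    time: float
    data: dict = field(default_factory=dict)  # the (key, value) pairs


class GraphStore:
    def __init__(self):
        self.objects = {}  # id -> Obj
        self.assocs = {}   # (id1, atype, id2) -> Assoc
        self._next_id = 1

    def obj_add(self, otype, **data):
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = Obj(otype, data)
        return oid

    def assoc_add(self, id1, atype, id2, ts=None, **data):
        stamp = time.time() if ts is None else ts
        self.assocs[(id1, atype, id2)] = Assoc(stamp, data)

    def assoc_range(self, id1, atype):
        """Return the id2 of every (id1, atype, *) edge, newest first."""
        matches = [
            (assoc.time, key[2])
            for key, assoc in self.assocs.items()
            if key[0] == id1 and key[1] == atype
        ]
        return [id2 for _, id2 in sorted(matches, reverse=True)]


store = GraphStore()
alice = store.obj_add("user", name="Alice")
page_a = store.obj_add("page", title="Distributed Systems")
page_b = store.obj_add("page", title="Caching")
store.assoc_add(alice, "likes", page_a, ts=1.0)
store.assoc_add(alice, "likes", page_b, ts=2.0)
print(store.assoc_range(alice, "likes"))
```

A real store would index edges by (id1, atype) instead of scanning every association; the linear scan just keeps the sketch short.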

Distributed key-value store

The simplest storage system for a distributed computing environment is a single-node key-value store. In this model, the API has two methods:

  • Put(key k, value v)
  • Get(key k)

All Put() and Get() requests go to the one and only server.
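
As a sketch, the single-node version is just a dictionary behind two methods. The `crash()` helper is a made-up stand-in for a failed disk; it is only there to show what is at risk.

```python
class SingleNodeKV:
    """One server, one copy of the data."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def crash(self):
        # Hypothetical: simulate losing the only disk.
        self._data = {}


kv = SingleNodeKV()
kv.put("user:1", "alice")
before = kv.get("user:1")
kv.crash()
after = kv.get("user:1")
print(before, after)
```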

With a single server, one failed hard drive loses all the data. You could fix this data loss problem by adding leader-replica replication, in which the leader writes the data to its replicas before considering a Put() operation complete. As a result, no single hard drive is a single point of failure (SPoF).
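
A toy sketch of synchronous replication, assuming in-process replicas that always acknowledge. `Leader`, `Replica`, and `apply` are illustrative names; a real system would also handle network failures, timeouts, and leader election.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledgement back to the leader


class Leader:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def put(self, key, value):
        self.data[key] = value
        # The Put() is complete only after every replica has acknowledged.
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("replication failed; Put() not complete")
        return True

    def get(self, key):
        return self.data.get(key)


replicas = [Replica(), Replica()]
leader = Leader(replicas)
leader.put("user:1", "alice")

# Even if the leader's disk is lost, each replica still holds the value.
leader.data.clear()
print([replica.data.get("user:1") for replica in replicas])
```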

However, as the service becomes more popular, the rate of requests may outstrip the leader's ability to handle them. At this point, a common distributed systems approach is to add caching proxy servers between the clients and the leader.
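
One way to picture the caching proxy: a small read-through LRU cache in front of the leader that invalidates an entry on write. `Backend` stands in for the leader, and the capacity of 2 is arbitrary; both are assumptions for this sketch.

```python
from collections import OrderedDict


class Backend:
    """Stand-in for the leader; counts how many reads reach it."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


class CachingProxy:
    def __init__(self, backend, capacity=2):
        self.backend = backend
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> value, least recently used first

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        value = self.backend.get(key)    # cache miss: ask the leader
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value

    def put(self, key, value):
        self.backend.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read sees the new value


backend = Backend()
proxy = CachingProxy(backend)
proxy.put("a", 1)
first = proxy.get("a")   # miss, reaches the backend
second = proxy.get("a")  # hit, served by the proxy
print(first, second, backend.reads)
```

With several proxies, a write through one proxy would not invalidate the others, so reads can be stale; entries usually carry a TTL for that reason.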

Reddit

When you vote, your vote isn’t instantly processed; instead, it’s placed into a queue. If you were to vote and quickly refresh the page, your vote may not have been processed yet, and it would appear that your vote had been reverted. To get around this, we cache your recent votes for a short period of time to display them back to you until they’re processed.
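
A hedged sketch of that idea: votes go into a queue, and a short-lived "recent votes" cache answers reads until the queue is drained. All names and the 60-second TTL are assumptions for illustration, not Reddit's code; the clock is passed in explicitly to keep the example deterministic.

```python
from collections import deque


class VoteService:
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.queue = deque()  # votes waiting to be processed
        self.recent = {}      # (user, post) -> (direction, expires_at)
        self.stored = {}      # (user, post) -> direction, after processing

    def vote(self, user, post, direction, now):
        self.queue.append((user, post, direction))
        # Remember the vote briefly so it can be shown back right away.
        self.recent[(user, post)] = (direction, now + self.ttl)

    def process_one(self):
        user, post, direction = self.queue.popleft()
        self.stored[(user, post)] = direction

    def displayed_vote(self, user, post, now):
        cached = self.recent.get((user, post))
        if cached is not None and cached[1] > now:
            return cached[0]  # unexpired recent vote wins
        return self.stored.get((user, post), 0)  # 0 means "no vote"


votes = VoteService()
votes.vote("alice", "post1", +1, now=0.0)
shown_after_refresh = votes.displayed_vote("alice", "post1", now=1.0)
votes.process_one()
shown_later = votes.displayed_vote("alice", "post1", now=100.0)
print(shown_after_refresh, shown_later)
```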

Reddit cache: https://redditblog.com/2017/01/17/caching-at-reddit/

One of the optimizations we make is splitting up our caches by workload type, rather than running them as one big pool.

When to scale up caches: when the caches start to evict more and the database slows down as a result.

Memcached works on a memory allocation system known as slab allocation.
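
A simplified sketch of the general idea behind slab allocation: memory is carved into classes of fixed-size chunks, and each item goes into the smallest class that fits it. The chunk sizes and growth factor below are illustrative values chosen for the example, not Memcached's actual configuration, and the functions are hypothetical.

```python
from bisect import bisect_left


def build_slab_classes(min_chunk=64, max_chunk=512, factor=2.0):
    """Return ascending chunk sizes, each roughly `factor` times the previous."""
    sizes = []
    size = min_chunk
    while size < max_chunk:
        sizes.append(size)
        # Always grow by at least 1 byte so the loop terminates.
        size = max(size + 1, int(size * factor))
    sizes.append(max_chunk)
    return sizes


def pick_class(sizes, item_size):
    """Index of the smallest chunk size that fits the item, or None if too big."""
    index = bisect_left(sizes, item_size)
    return index if index < len(sizes) else None


sizes = build_slab_classes()
index = pick_class(sizes, 100)
wasted = sizes[index] - 100  # bytes left unused inside the chunk
print(sizes, index, wasted)
```

Fixed-size chunks avoid fragmentation at the cost of some wasted space per item; a smaller growth factor means more classes and less waste per item.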

More Stacks