Graph

Last Updated: 2022-02-06

Why Graph

Imagine you are designing Facebook. From what you learned about OOP, you would probably create a class called User, and users can be friends; they may have Post, which would be another class; and posts may have Comment. That is probably enough to build the timeline. But what if the platform evolves to have Page, Group, or selling Item in the marketplace? Creating more classes and describing their relationships would soon become unmanageable. An alternative is to treat all those concepts as entities, or "nodes" of a graph, and their relationships as "edges".
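To make the node and edge idea concrete, here is a minimal sketch in Python. The Graph class, the edge type names (FRIEND, AUTHORED, HAS_COMMENT, MEMBER_OF) and the sample data are all made up for illustration; this is not how Facebook models its data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Graph:
    # node id -> {"type": ..., plus arbitrary properties}
    nodes: dict = field(default_factory=dict)
    # (source id, edge type) -> list of destination ids
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, **props}

    def add_edge(self, src, edge_type, dst):
        self.edges[(src, edge_type)].append(dst)

    def neighbors(self, src, edge_type):
        # .get avoids creating empty entries for unknown keys
        return list(self.edges.get((src, edge_type), []))


g = Graph()
g.add_node("u1", "User", name="Alice")
g.add_node("u2", "User", name="Bob")
g.add_node("p1", "Post", text="Hello")
g.add_node("c1", "Comment", text="Nice post")
g.add_edge("u1", "FRIEND", "u2")
g.add_edge("u2", "FRIEND", "u1")
g.add_edge("u1", "AUTHORED", "p1")
g.add_edge("p1", "HAS_COMMENT", "c1")

# A new concept needs no new class, only a new type label.
g.add_node("g1", "Group", name="Hikers")
g.add_edge("u2", "MEMBER_OF", "g1")

print(g.neighbors("u1", "FRIEND"))
```

Adding Page or Item later would follow the same pattern as Group: a new type label and new edge types, with no change to the storage structure.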

Another example is Airbnb's knowledge graph. Again, nodes are entities and edges are relationships, so queries such as fetching all landmarks close to a Home can be supported easily.

Google is also building its Knowledge Graph for structured data, in addition to webpage results.

How To Store The Graph

Graph databases

Neo4j: a graph database written in Java.
JanusGraph: started as a fork of TitanDB, which is now discontinued. Supported by Google.
Amazon Neptune
- graph model: Property Graph and W3C's RDF
- graph query: Apache TinkerPop Gremlin and SPARQL
Giraph: a graph processing framework based on Google's Pregel; however, Pregel is deprecated.
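The two graph models listed for Amazon Neptune describe the same kind of data in different shapes. The sketch below shows one small fact in both, using plain Python structures rather than any real database client; the identifiers, the "ex:" prefix, and the distance value are invented for illustration.

```python
# Property graph: nodes and edges are records that carry key/value properties.
pg_nodes = {
    "home1": {"label": "Home", "city": "Paris"},
    "lm1": {"label": "Landmark", "name": "Eiffel Tower"},
}
pg_edges = [
    # The edge itself can hold properties, such as a distance.
    {"src": "home1", "label": "NEAR", "dst": "lm1", "distance_km": 1.2},
]

# RDF: everything is a (subject, predicate, object) triple.
# Attaching a property to a relationship takes extra triples, omitted here.
triples = {
    ("ex:home1", "rdf:type", "ex:Home"),
    ("ex:home1", "ex:city", "Paris"),
    ("ex:lm1", "rdf:type", "ex:Landmark"),
    ("ex:lm1", "ex:name", "Eiffel Tower"),
    ("ex:home1", "ex:near", "ex:lm1"),
}


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(s="ex:home1", p="ex:near"))
```

The match helper is only a rough stand-in for the triple patterns a SPARQL query is built from; a Gremlin query would instead walk pg_nodes and pg_edges step by step.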

Home Grown

Some big companies do not use those off-the-shelf graph databases, and instead build their own on top of existing infrastructure.

For example, Facebook uses MySQL as the underlying database, with TAO as the cache layer, to store the whole graph.

Airbnb also uses a relational database, citing operational overhead as the reason for the choice: there was no production-ready graph database internally, and the in-house relational database was reliable and widely used.
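Storing a graph in a relational database usually comes down to a table of nodes and a table of edges. The following is a minimal sketch using SQLite from the Python standard library; the table layout, the NEAR edge type, and the sample rows are assumptions for illustration and do not reflect Facebook's or Airbnb's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL,
        name TEXT NOT NULL
    );
    CREATE TABLE edges (
        src  INTEGER NOT NULL REFERENCES nodes(id),
        type TEXT NOT NULL,
        dst  INTEGER NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (src, type, dst)
    );
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "Home", "Loft in Paris"),
    (2, "Landmark", "Eiffel Tower"),
    (3, "Landmark", "Louvre"),
    (4, "Home", "Cabin in Oslo"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, "NEAR", 2),
    (1, "NEAR", 3),
])


def landmarks_near(home_id):
    """Follow NEAR edges from a home to its landmark nodes."""
    rows = conn.execute(
        """
        SELECT n.name
        FROM edges AS e
        JOIN nodes AS n ON n.id = e.dst
        WHERE e.src = ? AND e.type = 'NEAR' AND n.type = 'Landmark'
        ORDER BY n.name
        """,
        (home_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(landmarks_near(1))  # ['Eiffel Tower', 'Louvre']
```

Each hop in the graph becomes one join, which is why deep traversals on this layout tend to need a cache or a purpose-built query layer on top.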

How to Query The Graph

Google has a public Knowledge Graph Search API, which uses standard schema.org types and is compliant with the JSON-LD specification.
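As a rough sketch of what querying it looks like, the snippet below builds a request URL and reads entity names and schema.org types out of a JSON-LD response. It assumes the Knowledge Graph Search API endpoint and the ItemList response shape as publicly documented, so check the current documentation before relying on it. The helper names are hypothetical, and the sample response is hand-written with placeholder values, not real API output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Knowledge Graph Search API.
ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_url(query, api_key, limit=3):
    """Build the search URL; api_key is your own key."""
    params = {"query": query, "key": api_key, "limit": limit}
    return ENDPOINT + "?" + urlencode(params)


def entity_names(response):
    """Pull (name, types) pairs out of a JSON-LD ItemList response."""
    return [
        (item["result"].get("name"), item["result"].get("@type", []))
        for item in response.get("itemListElement", [])
    ]


def search(query, api_key, limit=3):
    """Call the API over the network (needs a valid key)."""
    with urlopen(build_url(query, api_key, limit)) as resp:
        return entity_names(json.load(resp))


# Hand-written sample in the assumed response shape, so no network is needed.
sample = {
    "@context": {"@vocab": "http://schema.org/"},
    "@type": "ItemList",
    "itemListElement": [
        {
            "@type": "EntitySearchResult",
            "result": {
                "@id": "kg:/m/example",
                "name": "Example Entity",
                "@type": ["Thing", "Person"],
            },
            "resultScore": 1.0,
        }
    ],
}

print(entity_names(sample))
```

The "@type" values are schema.org type names, which is what lets a client treat results from different domains uniformly.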

Facebook also has a public Graph API; however, not the whole graph is available for query, due to privacy issues. Internally, GraphQL is used, mostly for mobile apps to talk to servers.

Airbnb's Knowledge Graph is private; an example can be found here.