Should I Use a Graph Database?

tl;dr

When in doubt, grab an RDBMS.

What is a graph database?

Where a relational database stores and represents data as relations, i.e. tables, a graph database uses graph structures (nodes and edges) to represent and store data.

Relational databases store each table separately and combine tables at query time with join algorithms such as nested loop join, sort-merge join, and hash join. Graph databases might not use indexes for traversal at all. Instead, each node is linked directly to its neighbors in storage (index-free adjacency). When a graph database runs a query, it follows these links in storage, so no index lookup is needed for each hop. The node-to-node traversal cost is O(1), independent of the total size of the graph.
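To make the difference concrete, here is a minimal in-memory sketch in Python. It is not how any particular database is implemented. The `follows` data, the `Node` class, and the function names are made up for illustration. The first half answers "who does X follow?" the way a hash join would, by building a hash table over the whole edge table and probing it. The second half follows direct references from node to node.

```python
from collections import defaultdict

# Relational view: a single edge table of (source, target) rows.
# The data is hypothetical and only serves the illustration.
follows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "dave"),
    ("carol", "dave"),
]


def neighbors_via_hash_join(edge_rows, start_nodes):
    """One hop expressed as a hash join: build on source, then probe."""
    build = defaultdict(list)
    for source, target in edge_rows:  # build phase touches every row
        build[source].append(target)
    result = []
    for node in start_nodes:          # probe phase, one lookup per input node
        result.extend(build.get(node, []))
    return result


# Graph view: every node holds direct references to its neighbours.
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbour Node objects


def build_graph(edge_rows):
    # The dict is only used while loading and to find a start node.
    nodes = {}
    for source, target in edge_rows:
        s = nodes.setdefault(source, Node(source))
        t = nodes.setdefault(target, Node(target))
        s.out.append(t)
    return nodes


graph = build_graph(follows)
alice = graph["alice"]

# Two hops with joins: each hop is another join over the edge table.
one_hop_join = neighbors_via_hash_join(follows, ["alice"])
two_hops_join = neighbors_via_hash_join(follows, one_hop_join)

# Two hops in the graph: just follow references, no lookup per hop.
two_hops_graph = [n2.name for n1 in alice.out for n2 in n1.out]
```

In the join version, every extra hop repeats work that depends on the size of the edge table (or on an index over it). In the graph version, each hop costs only as much as the number of neighbors of the current node.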

Here is a comparison to RDBMS. And here is a detailed explanation of how joins work.

Implementations

  • Dgraph (distributed)
    • Supports transactions
    • Comparisons to other DBs
    • Consistent replication with shard rebalancing
    • Built on its own key-value store, Badger
  • Neo4j (distributed)
    • ACID
    • Replication is available only to enterprise users
  • Cayley
    • Requires a backend store such as LevelDB, MongoDB, PostgreSQL, MySQL, or an in-memory store
  • Amazon’s Neptune
    • ACID
    • Backups to S3
    • Pricing: $7.20/mo (db.t2.medium), $252/mo

Query languages

Use cases for graph databases

  • Social networking
  • Recommendation engines
  • Fraud detection
  • Knowledge graphs
  • Life sciences
  • Network / IT operations

Comparison with relational databases

Advantages

  • Better performance on highly connected data. Such data requires a lot of joins, which are generally expensive. Beyond roughly 7 self/recursive joins, an RDBMS starts to get really slow compared to Neo4j and other native graph databases.
  • No schema required. Graph data is not forced into a structure like a relational table, and attributes can be added and removed easily. This is especially useful for semi-structured data, where a representation in a relational database would result in lots of NULL column values.
  • Simpler queries in a dedicated graph query language.
  • More efficient storage and lower query latency for datasets that contain far more attributed relations between entities than entities themselves.
  • More efficient for graph queries like “shortest path between two nodes”.
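As an illustration of the "shortest path between two nodes" query, here is a minimal breadth-first search over an adjacency list in Python. It is a sketch for unweighted graphs, not what a graph database runs internally. The `shortest_path` function and the sample `graph` are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical sample graph: node -> list of directly linked nodes.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

path = shortest_path(graph, "a", "e")
```

Each step only looks at the neighbors of the current node, which is the access pattern that index-free adjacency makes cheap. Expressing the same search in SQL typically needs a recursive query or one self-join per hop.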

Disadvantages

  • You have to learn an additional query language.
  • Not a general-purpose database. The use case is narrower: only graphs.
  • Not intended or proven to work across different kinds of environments and domains.

A couple of blog posts, such as From graph DB to postgresql, report that graph databases (Neo4j) get slower with complicated queries and growing data volumes. On the other hand, Amazon has unveiled its new graph database, Neptune.

There is also a trend back to SQL. It concerns the language, not the database type, although SQL is primarily used by RDBMSs.

Conclusion

Pick graph databases when

  • your data is not highly structured,
  • your data contains lots of null values,
  • there are lots of relations between entities,
  • the set of queries is well defined and not subject to change.

Pick relational databases when

  • you are running bulk queries over a single table or over tables that require few joins (details),
  • the data is highly structured in predefined columns, i.e. the column values are mostly not null,
  • the environment is not well defined or is volatile.