Should I Use a Graph Database?

Jan 3, 2018

tl;dr

When in doubt, grab an RDBMS.

What is a graph database?

Where a relational database stores and represents data as relations, i.e. tables, a graph database uses graph structures (nodes, edges) to represent and store data.

Relational databases store each table separately and combine tables with join techniques such as nested loop join, sort merge join and hash join. Graph databases might not use indexes at all. Instead, graph nodes are linked directly to their neighbors in storage (index-free adjacency). When a graph database performs a query, it follows those links in storage, so no index lookup is needed. The node-to-node traversal cost is O(1).
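The difference can be illustrated with a minimal Python sketch. This is not how any particular engine is implemented; the data and the names (`users`, `follows`, `Node`, `followees_via_hash_join`, `followees_via_adjacency`) are made up for illustration. The relational side finds neighbors by building and probing a hash index over a join table (a simplified hash join), while the graph side keeps direct references to neighbors, so each hop just follows a stored link.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Relational style: entities and relationships live in separate tables.
# (Hypothetical example data.)
users = {1: "alice", 2: "bob", 3: "carol"}
follows = [(1, 2), (1, 3), (2, 3)]  # (follower_id, followee_id)


def followees_via_hash_join(user_id):
    """Simplified hash join: build a hash index on the join column, then probe it."""
    index = defaultdict(list)
    for follower_id, followee_id in follows:  # build phase, O(len(follows))
        index[follower_id].append(followee_id)
    return [users[fid] for fid in index[user_id]]  # probe phase


# Graph style: each node holds direct references to its neighbors.
@dataclass
class Node:
    name: str
    out: list = field(default_factory=list)


nodes = {uid: Node(name) for uid, name in users.items()}
for follower_id, followee_id in follows:
    nodes[follower_id].out.append(nodes[followee_id])


def followees_via_adjacency(node):
    """No index lookup: follow the stored links, O(1) per hop."""
    return [neighbor.name for neighbor in node.out]


print(followees_via_hash_join(1))
print(followees_via_adjacency(nodes[1]))
```

Both calls return the same neighbors. What differs is the work done per query: the join touches the whole `follows` table (or an index over it), while the adjacency version only touches the links of the node it starts from.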

Here is a comparison to RDBMS, and here is a detailed explanation of how joins work.

Implementations

DGraph (distributed)
- supports transactions
- Comparisons to other DBs
- Consistently replicated with shard rebalancing.
- Based on its own key-value store, Badger
Neo4j (distributed)
- ACID
- Replication is available only to enterprise users.
Cayley
- Requires a backend store such as LevelDB, MongoDB, PostgreSQL, MySQL, or in-memory.
Amazon’s Neptune
- ACID
- Backups to S3
- Pricing: $7.20/mo (db.t2.medium), $252/mo

Query languages

Gremlin - graph traversal
SPARQL

Use cases for graph databases

Social networking
Recommendation engines
Fraud detection
Knowledge graphs
Life science
Network / IT operations

Comparison with relational databases

Advantages

Better performance – highly connected data can require a lot of joins, which are generally expensive. Beyond about 7 self/recursive joins, an RDBMS starts to get really slow compared to Neo4j and other native graph databases.
No schema required. Graph data is not forced into a structure like a relational table, and attributes can be added and removed easily. This is especially useful for semi-structured data where a representation in relational database would result in lots of NULL column values.
Simpler queries in a dedicated graph query language.
More efficient data storage and lower query latency for datasets that contain many more attributed relations between entities than entities themselves.
More efficient for graph queries like “shortest path between two nodes”.
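As an illustration of the "shortest path" kind of query, here is a minimal breadth-first search over an adjacency list in Python. It is a sketch under assumptions: the `graph` data and the `shortest_path` helper are invented for this example, and real graph databases use their own storage and traversal machinery. In plain SQL without recursive queries, each additional hop of a fixed-depth path query typically costs one more self-join on the relationship table, whereas here each hop is one step over a node's neighbor list.

```python
from collections import deque

# Hypothetical friendship graph as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(current, []):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


print(shortest_path(graph, "alice", "erin"))
# ['alice', 'bob', 'dave', 'erin']
```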

Disadvantages

You have to learn an additional language.
Not a general-purpose DB. The use case is narrower: only graphs.
Not intended/proven to work in different kinds of environments and domains.

Trends

There are a couple of blog posts, like From graph DB to postgresql, mentioning that graph databases (Neo4j) get slower with complicated queries and increased data scale. On the other hand, Amazon unveiled its new graph database, Neptune.

There is also a trend back to SQL. It is about the language, not the database type, although SQL is primarily used by RDBMS.

Conclusion

Pick graph databases when

your data is not highly structured,
your data contains lots of null values,
there are lots of relations between entities,
the set of queries is well defined and not subject to change.

Pick relational databases when

you are doing bulk queries over a single table or over tables requiring few joins (details),
the data is highly structured in predefined columns, i.e. the column values are not mostly null,
the environment is volatile and not well defined.
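The "mostly null" criterion can be checked roughly on sample data. The following Python sketch is only a heuristic under assumptions: the `rows` data and the helpers `null_ratio` and `to_properties` are invented for illustration, and this post does not suggest any particular threshold. It measures how many cells of a wide table are empty and shows the same records as node-style property maps that store only the attributes that are present.

```python
def null_ratio(rows):
    """Fraction of cells that are None across a list of dict rows."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    return sum(value is None for value in cells) / len(cells)


def to_properties(row):
    """Node-style property map: keep only the attributes that have a value."""
    return {key: value for key, value in row.items() if value is not None}


# Hypothetical wide table with one column per possible attribute.
rows = [
    {"name": "alice", "phone": None, "twitter": "@alice", "fax": None},
    {"name": "bob", "phone": "555-0100", "twitter": None, "fax": None},
]

print(null_ratio(rows))  # 0.5
print([to_properties(row) for row in rows])
```

A high ratio hints that the fixed-column layout is mostly padding, which is the situation where schema-free attributes tend to pay off. A low ratio points back to the relational model.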