I have a use case that essentially looks like…
- Millions (billions maybe? eventually?) of documents. Each document is a mix of various attributes (either simple or arrays) + full-text + location (optional), and so forth.
- From these documents, we are extracting entities & relationships and performing various enrichments. We generally expect the number of entities to grow quickly at first, then for the slope to level off.
- Relationships can be among/between entities, documents, etc. Queries include full-text search, geographic, graph, simple attribute comparison, etc.
One way to do this, of course, is to store the documents in something like Cassandra and store the graph in… a graph database. In this case each node in the graph database basically holds a pointer back to the document.
Another is to store everything in one database.
My question is how well Dgraph supports this latter use case and, if it does, what gotchas/design suggestions you'd recommend to minimize refactoring downstream.
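To make the question concrete, here's a rough sketch of how I imagine the data model might map onto Dgraph's schema, using its `@index` directives for full-text and geo predicates. All predicate/type names here are hypothetical, just to illustrate the shape:

```
# Hypothetical schema sketch — names are made up for illustration.
text:       string   @index(fulltext) .      # document full-text
location:   geo      @index(geo) .           # optional location
created_at: datetime @index(hour) .          # simple attribute
name:       string   @index(exact, term) .   # entity name
mentions:   [uid]    @reverse .              # document -> entity edges
related_to: [uid]    @reverse .              # entity <-> entity edges
```

And a query would presumably combine full-text, geo, and graph traversal in one round trip, something like (again, a sketch, assuming `alloftext` and `near` can be combined this way):

```
{
  docs(func: alloftext(text, "solar panels"))
       @filter(near(location, [-122.4, 37.8], 10000)) {
    uid
    text
    mentions {
      name
      related_to { name }
    }
  }
}
```

If this general approach is reasonable, I'd love to know where it breaks down at the document volumes above.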
My frame of reference on this sort of problem is ArangoDB & Neo4j.