Performance: storing large documents associated with nodes

I have a dataset which consists of 2 billion nodes.

Each node has a unique JSON document associated with it, averaging 50,000 characters in length. Each document has to be stored whole and cannot be broken up into separate fields, as there is no fixed schema. I will also need to run keyword searches against the JSON documents.

Do I store the document as an edge, or would it be better to use Dgraph to store the relationships between the nodes and a NoSQL database to store the documents? At this sort of scale, which would be the most performant?

Hmm, you can create a sort of "wrapper" for your documents: put each doc in its own node, inside a predicate/edge. For example, you could encode the document to base64 and store it in a String predicate. That should work fine.
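Something like this minimal sketch, assuming a local Dgraph Alpha on the default gRPC port and the pydgraph client; the predicate name `doc.raw` is just a placeholder, not anything built in:

```python
import base64
import json

import pydgraph

# Connect to a local Dgraph Alpha (default gRPC port); adjust as needed.
stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(stub)

def store_document(doc: dict) -> None:
    """Wrap a JSON document in its own node, base64-encoded in a String predicate."""
    encoded = base64.b64encode(json.dumps(doc).encode('utf-8')).decode('ascii')
    txn = client.txn()
    try:
        # 'doc.raw' is a placeholder predicate for the encoded payload.
        txn.mutate(set_obj={'uid': '_:doc', 'doc.raw': encoded})
        txn.commit()
    finally:
        txn.discard()

store_document({'invoice': 42, 'status': 'paid'})
```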

If you need extra references for your document, just add other attributes to the node where the document lives.
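For example, the wrapper node might look like this (hypothetical attribute names; `owner` links back to the node the document belongs to):

```python
# Hypothetical wrapper-node shape: extra attributes sit alongside the
# encoded document, and 'owner' is an edge back to the owning node.
doc_node = {
    'uid': '_:doc',
    'doc.raw': '<base64-encoded JSON>',
    'doc.created_at': '2019-06-01T00:00:00Z',
    'doc.source': 'ingest-pipeline',
    'owner': {'uid': '0x1'},
}
```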

If you want to have logical control over your document via GraphQL+-, you would need to map it into the Dgraph schema. Short of that, the best you can do is encode it to base64 (or similar) and add "tagging" attributes to each node. (Dgraph can't index inside the base64 blob itself, which is why the tags matter for your keyword search.)
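As a rough sketch of the tagging idea, again assuming pydgraph; `doc.tags` and the choice of a term index are my own naming and setup, not a fixed convention:

```python
import pydgraph

stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(stub)

# Give the tags a term index so keyword search works on them;
# the base64 payload itself stays un-indexed.
schema = '''
doc.raw: string .
doc.tags: [string] @index(term) .
'''
client.alter(pydgraph.Operation(schema=schema))

# Keyword search over the tags (GraphQL+- query).
query = '''
{
  docs(func: anyofterms(doc.tags, "invoice payment")) {
    uid
    doc.raw
  }
}
'''
resp = client.txn(read_only=True).query(query)
print(resp.json)
```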

@stevenayers would you mind sharing a basic structure (a JSON-like sketch) of how you imagine your graph will look? That would help me help you visualize it better.
