Performance: storing large documents associated with nodes

I have a dataset which consists of 2 billion nodes.

Each node has a unique JSON document associated with it, averaging 50,000 characters in length. Each document has to be stored whole and cannot be broken up into separate fields, as there is no fixed schema. I will also need to run keyword searches against the JSON documents.

Do I store the document as an edge, or would it be better to use Dgraph to store the relationships between the nodes and a NoSQL database to store the documents? At this sort of scale, which would be the most performant?

Hmm, you can create a sort of "wrapper" for your documents: put each doc in its own node, inside a predicate/edge. For example, you could encode the document to base64 and store it in a String predicate. That should work fine.
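Something like this minimal sketch, assuming a local Dgraph Alpha on the default gRPC port and the pydgraph client; the predicate name `doc.raw` is just a placeholder, not anything built in:

```python
import base64
import json

import pydgraph

# Connect to a local Dgraph Alpha (default gRPC port); adjust as needed.
stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(stub)

def store_document(doc: dict) -> None:
    """Wrap a JSON document in its own node, base64-encoded in a String predicate."""
    encoded = base64.b64encode(json.dumps(doc).encode('utf-8')).decode('ascii')
    txn = client.txn()
    try:
        # 'doc.raw' is a placeholder predicate for the encoded payload.
        txn.mutate(set_obj={'uid': '_:doc', 'doc.raw': encoded})
        txn.commit()
    finally:
        txn.discard()

store_document({'invoice': 42, 'status': 'paid'})
```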

If you need extra references for your document, just add other attributes to the node where the document lives.
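For example, the wrapper node might look like this (hypothetical attribute names; `owner` links back to the node the document belongs to):

```python
# Hypothetical wrapper-node shape: extra attributes sit alongside the
# encoded document, and 'owner' is an edge back to the owning node.
doc_node = {
    'uid': '_:doc',
    'doc.raw': '<base64-encoded JSON>',
    'doc.created_at': '2019-06-01T00:00:00Z',
    'doc.source': 'ingest-pipeline',
    'owner': {'uid': '0x1'},
}
```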

If you want to have logical control over your document via GraphQL+-, you would need to map it into the Dgraph schema. Short of that, the best you can do is encode it to base64 (or similar) and add "tagging" attributes to each node. (Dgraph can't index inside the base64 blob itself, which is why the tags matter for your keyword search.)
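As a rough sketch of the tagging idea, again assuming pydgraph; `doc.tags` and the choice of a term index are my own naming and setup, not a fixed convention:

```python
import pydgraph

stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(stub)

# Give the tags a term index so keyword search works on them;
# the base64 payload itself stays un-indexed.
schema = '''
doc.raw: string .
doc.tags: [string] @index(term) .
'''
client.alter(pydgraph.Operation(schema=schema))

# Keyword search over the tags (GraphQL+- query).
query = '''
{
  docs(func: anyofterms(doc.tags, "invoice payment")) {
    uid
    doc.raw
  }
}
'''
resp = client.txn(read_only=True).query(query)
print(resp.json)
```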

@stevenayers would you mind sharing a basic structure (a JSON-like sketch) of how you imagine your graph will look? That would help me help you visualize it better.
