I am currently researching graph database solutions for a large dataset, approximately 1TB. The data model includes entities such as:

    Schools have classes
    Classes have subjects
    Classes have students
    Subjects have teachers

The challenge is that each data source is different. My initial plan was to preprocess and join the data first, and then ingest the joined result into Dgraph. However, I came across a tutorial on YouTube which recommends creating all nodes and predicates first, and then establishing relationships/edges. I would appreciate expert opinions on this approach.

Additionally, I am seeking advice on the best way to ingest large datasets, considering my data is in JSON format.

Hey @tahseen,

I think either approach would work. One advantage of joining the data prior to loading is that you avoid the extra time needed to stitch edges together afterwards, a step that can also introduce errors.
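To make the "join first" option concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the source field names (`school_id`, `class_id`, `student_id`), the predicate names (`has_class`, `has_student`, `name`) and the type names are hypothetical, and your real sources will look different. It relies on Dgraph's JSON mutation format, where a `uid` of the form `_:label` is a blank node and nested objects become edges.

```python
import json
from collections import defaultdict

# Hypothetical records from three separate sources; field names are assumptions.
schools = [{"school_id": "s1", "name": "Northside High"}]
classes = [
    {"class_id": "c1", "school_id": "s1", "name": "Grade 9A"},
    {"class_id": "c2", "school_id": "s1", "name": "Grade 9B"},
]
students = [
    {"student_id": "st1", "class_id": "c1", "name": "Ada"},
    {"student_id": "st2", "class_id": "c1", "name": "Grace"},
]


def join_sources(schools, classes, students):
    """Nest students under classes and classes under schools.

    Each node gets a blank-node uid derived from its source key, so the
    same entity always maps to the same label.
    """
    students_by_class = defaultdict(list)
    for s in students:
        students_by_class[s["class_id"]].append({
            "uid": f"_:student.{s['student_id']}",
            "dgraph.type": "Student",
            "name": s["name"],
        })

    classes_by_school = defaultdict(list)
    for c in classes:
        node = {
            "uid": f"_:class.{c['class_id']}",
            "dgraph.type": "Class",
            "name": c["name"],
        }
        enrolled = students_by_class.get(c["class_id"])
        if enrolled:  # only emit the edge when there is something to link
            node["has_student"] = enrolled
        classes_by_school[c["school_id"]].append(node)

    docs = []
    for sch in schools:
        node = {
            "uid": f"_:school.{sch['school_id']}",
            "dgraph.type": "School",
            "name": sch["name"],
        }
        school_classes = classes_by_school.get(sch["school_id"])
        if school_classes:
            node["has_class"] = school_classes
        docs.append(node)
    return docs


docs = join_sources(schools, classes, students)
print(json.dumps(docs, indent=2))
```

Because the blank-node labels are built from the source keys, the same labels could also be used for the "nodes first, edges later" approach: emit flat nodes in one pass and objects containing only `uid` plus the edge predicate in a second pass. As far as I know, labels are resolved consistently within a single loader run, but please verify that against the docs for your Dgraph version.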

For a terabyte of data, you’ll definitely want to use the Bulk Loader. Have a look at the vlg repo (dgraph-io/vlg on GitHub), specifically the section on data loading: vlg/notes/3. Data. There may be other things in this repo that you find useful as you attempt your import.
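Since your data is already JSON, one practical step before running the Bulk Loader is to split it into several compressed files instead of one huge file. The sketch below is only an illustration: the function names, the `part-00000.json.gz` naming and the chunk size are my own choices, not anything the Bulk Loader requires. My understanding is that the Bulk Loader can read gzipped JSON files, but check the documentation for your version.

```python
import gzip
import json
from pathlib import Path


def _dump(chunk, out_dir, index):
    """Write one chunk as a gzipped JSON array and return its path."""
    path = out_dir / f"part-{index:05d}.json.gz"  # hypothetical naming scheme
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(chunk, fh)
    return path


def write_chunks(nodes, out_dir, chunk_size=100_000):
    """Stream an iterable of node dicts into numbered .json.gz files.

    `nodes` can be a generator, so the full dataset never has to fit
    in memory at once.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths, chunk = [], []
    for node in nodes:
        chunk.append(node)
        if len(chunk) == chunk_size:
            paths.append(_dump(chunk, out, len(paths)))
            chunk = []
    if chunk:  # flush the final, partially filled chunk
        paths.append(_dump(chunk, out, len(paths)))
    return paths
```

You would call it with something like `write_chunks(my_node_generator(), "bulk_input")`, where `my_node_generator` is whatever produces your joined objects. Keep in mind that the Bulk Loader is meant for the initial load into a new cluster, so it is worth getting the schema and the joined data right before you run it.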