Slow JSON Import

Hi,

I am trying to import blockchain data that is in JSON form. I have added some custom fields to the JSON, but for the most part each JSON file is roughly 3 MB, and I have a few hundred thousand of them, ~665 GB of data in total. Currently I am using the dgo client and committing each JSON file after a small preprocessing step. Preprocessing takes roughly 0.3 seconds, but then I commit using:

        // a is the ~3 MB JSON payload for one file.
        mu := &api.Mutation{
                CommitNow: true, // commit as soon as the mutation succeeds
                SetJson:   []byte(a),
        }
        ctx := context.Background()
        txn := dg.NewTxn()
        defer txn.Discard(ctx) // no-op after a successful CommitNow
        _, err := txn.Mutate(ctx, mu)

where a is a JSON string of ~3 MB that generates ~50k subjects. I have two indexes, based on the hashes of blocks, transactions, and addresses.

The core issue is that EACH JSON commit takes 2+ minutes (I have tried both the HTTP and dgo client imports). At that rate I can’t even get 0.0005% through my data in less than 3 minutes. I would like to bulk import all of these files, but the bulk loader requires RDF, and I can’t find the formatter in the source code. I would really like to convert JSON → RDF and then bulk import the RDF files.

Any help is appreciated.

You could try this: What rdf files? how can i create it - #3 by MichelDiz
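
To illustrate the idea from that thread, here is a minimal sketch that emits N-Quad RDF from one decoded file, assuming a hypothetical Block shape with a hash and a list of transactions; the field names, blank-node scheme, and file path are all placeholders to adapt to your actual JSON:

        package main

        import (
                "encoding/json"
                "fmt"
                "os"
        )

        // Block is a stand-in for the real JSON shape (hypothetical fields).
        type Block struct {
                Hash string `json:"hash"`
                Txs  []struct {
                        Hash string `json:"hash"`
                } `json:"txs"`
        }

        func main() {
                raw, err := os.ReadFile("block.json") // illustrative path
                if err != nil {
                        panic(err)
                }
                var b Block
                if err := json.Unmarshal(raw, &b); err != nil {
                        panic(err)
                }
                // One blank node per block and one per transaction; collect
                // the output into .rdf files and feed those to dgraph bulk.
                fmt.Printf("_:b%s <hash> %q .\n", b.Hash, b.Hash)
                for _, tx := range b.Txs {
                        fmt.Printf("_:t%s <hash> %q .\n", tx.Hash, tx.Hash)
                        fmt.Printf("_:b%s <transaction> _:t%s .\n", b.Hash, tx.Hash)
                }
        }

Hash-based blank-node labels mean that, within a single bulk-load run, the same block or transaction referenced from different files should resolve to the same node.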

Can you split up your JSON so that the number of potential subject-predicate-object triples is about 1,000 per transaction?
When a single JSON mutation translates to tens of thousands of SPO triples, Dgraph runs slowly. Change it to about 1k SPO triples per transaction and it flies.
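
For the dgo path, here is a minimal sketch of that chunking, assuming the preprocessed output can be treated as a top-level JSON array of records; the connection details, file path, and chunkSize are placeholders (tune chunkSize so each mutation stays near ~1k triples):

        package main

        import (
                "context"
                "encoding/json"
                "log"
                "os"

                "github.com/dgraph-io/dgo/v2"
                "github.com/dgraph-io/dgo/v2/protos/api"
                "google.golang.org/grpc"
        )

        const chunkSize = 100 // records per mutation; tune toward ~1k triples

        func main() {
                conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
                if err != nil {
                        log.Fatal(err)
                }
                defer conn.Close()
                dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
                ctx := context.Background()

                raw, err := os.ReadFile("block.json") // illustrative path
                if err != nil {
                        log.Fatal(err)
                }
                var records []json.RawMessage
                if err := json.Unmarshal(raw, &records); err != nil {
                        log.Fatal(err)
                }
                // Commit each small slice in its own transaction instead of
                // one huge mutation per file.
                for start := 0; start < len(records); start += chunkSize {
                        end := start + chunkSize
                        if end > len(records) {
                                end = len(records)
                        }
                        chunk, err := json.Marshal(records[start:end])
                        if err != nil {
                                log.Fatal(err)
                        }
                        mu := &api.Mutation{CommitNow: true, SetJson: chunk}
                        txn := dg.NewTxn()
                        if _, err := txn.Mutate(ctx, mu); err != nil {
                                txn.Discard(ctx)
                                log.Fatal(err)
                        }
                }
        }

Because each chunk commits independently, a failure only forces a retry of that chunk, and no single transaction ever has to index tens of thousands of triples at once.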
