Hi,
I am trying to import blockchain data that is in JSON form. I have added some custom fields to the JSON, but for the most part each JSON file is roughly 3 MB, and I have a few hundred thousand of them (~665 GB total). Currently I am using the dgo client and committing each JSON file after some light preprocessing. The preprocessing takes roughly 0.3 seconds per file, but then I commit using:
ctx := context.Background()
txn := dg.NewTxn()
defer txn.Discard(ctx) // no-op once CommitNow has committed

mu := &api.Mutation{
    CommitNow: true, // commit as part of this single Mutate call
    SetJson:   []byte(a),
}
_, err := txn.Mutate(ctx, mu)
where a is the ~3 MB JSON string, generating ~50k subjects. I have two indexes based on the hashes of blocks, transactions, and addresses.
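For context, I set those indexes up via dgo's Alter with something roughly like this (the predicate names are placeholders for my actual hash fields, not necessarily the right schema):

// Rough sketch of my schema setup; predicate names are placeholders.
op := &api.Operation{Schema: `
    block.hash:   string @index(hash) .
    tx.hash:      string @index(hash) .
    address.hash: string @index(hash) .
`}
if err := dg.Alter(context.Background(), op); err != nil {
    log.Fatal(err)
}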
The core issue is that EACH JSON commit takes 2+ minutes (I have tried both the HTTP and dgo client imports). At that rate I can't even get 0.0005% of the way through my data in under 3 minutes. I would like to bulk import all of these instead, but the bulk loader wants RDF, and I can't find a JSON-to-RDF formatter in the source code. I would really like to convert JSON → RDF → bulk import the RDF files; a sketch of what I'm imagining is below.
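To make the question concrete, here is a minimal sketch of the converter I have in mind, assuming a flat JSON object with only scalar fields (nested objects, arrays, and linking nodes by uid would need recursion and real blank-node management, and none of this is an existing Dgraph utility):

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

func main() {
    // Read one of my ~3 MB JSON files (path is just an example).
    raw, err := os.ReadFile("block.json")
    if err != nil {
        panic(err)
    }
    var obj map[string]interface{}
    if err := json.Unmarshal(raw, &obj); err != nil {
        panic(err)
    }
    // Emit one N-Quad per scalar field, all hanging off one blank node.
    for pred, v := range obj {
        switch val := v.(type) {
        case string:
            fmt.Printf("_:b <%s> %q .\n", pred, val)
        case float64, bool:
            fmt.Printf("_:b <%s> \"%v\" .\n", pred, val)
        }
    }
}

If something along these lines is the right direction, I would then feed the resulting RDF files to the bulk loader.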
Any help is appreciated.