Transactions, mutations and blank nodes

Hi,

How many mutations are recommended/optimal to add to a transaction before committing, and how many triples per mutation?

A related question is about the scope/lifespan of blank nodes. I was hoping to use blank nodes derived from our own entity IDs to insert a lot of data and have edges created “on the fly”. I can’t get this to work the way I had hoped, and I believe I read that blank nodes are only valid within individual mutations. Is that true? If so, how is this supposed to be done?

The recommendation is 1000 triples per batch (not mandatory). You can divide this value to get the number of mutations you can send in one batch, e.g. if each mutation has 4 fields, you can send 250 mutations in a single batch. Within a single transaction there is no hard limit; the practical ceiling is something to discover based on your workload and resources.
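
For concreteness, here is a minimal sketch of that batching pattern using the Go dgo client (v200, matching Dgraph v20.11). The endpoint, predicate names, and the data stream are assumptions; only the ~1000 triples per batch figure comes from the advice above.

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/dgraph-io/dgo/v200"
	"github.com/dgraph-io/dgo/v200/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Assumed local Alpha endpoint; adjust for your cluster.
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Stand-in for a large stream of N-Quad lines.
	quads := []string{
		`_:e1 <name> "entity-1" .`,
		`_:e2 <name> "entity-2" .`,
		// ... many more ...
	}

	const batchSize = 1000 // the suggested ~1000 triples per batch
	for start := 0; start < len(quads); start += batchSize {
		end := start + batchSize
		if end > len(quads) {
			end = len(quads)
		}
		mu := &api.Mutation{
			SetNquads: []byte(strings.Join(quads[start:end], "\n")),
			CommitNow: true, // one transaction per batch
		}
		if _, err := dg.NewTxn().Mutate(context.Background(), mu); err != nil {
			log.Fatal(err)
		}
	}
}
```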

Blank nodes live in the transaction context; you can’t use them in other transactions. There are ways to record them as XIDs, though. That way you can use the blank node as a fixed identifier, but I personally don’t recommend relying on this.
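
To make the scope issue concrete: with the Go client, the only durable handle you get is the Uids map in the mutation response, which maps each blank-node label to its assigned UID. A sketch continuing with the dg client from above (predicates are made up):

```go
// Blank node labels only resolve within the mutation/transaction that
// creates them; afterwards you must use the UID from resp.Uids.
func demoBlankNodeScope(ctx context.Context, dg *dgo.Dgraph) error {
	txn := dg.NewTxn()
	resp, err := txn.Mutate(ctx, &api.Mutation{
		SetNquads: []byte(`_:alice <name> "Alice" .`),
	})
	if err != nil {
		return err
	}
	if err := txn.Commit(ctx); err != nil {
		return err
	}
	aliceUID := resp.Uids["alice"] // e.g. "0x2711"

	// In a later transaction, _:alice would create a brand-new node,
	// so we reference the assigned UID instead.
	mu := &api.Mutation{
		SetNquads: []byte(`<` + aliceUID + `> <age> "30" .`),
		CommitNow: true,
	}
	_, err = dg.NewTxn().Mutate(ctx, mu)
	return err
}
```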

If you want to use it, please check this link: https://dgraph.io/docs/deploy/fast-data-loading/#other-bulk-loader-options. The flags you need are --xidmap or --store_xids.

It should be per transaction; I need to check.

Thanks for the quick reply!

That --xidmap option seems like what I had hoped for (even if you didn’t recommend it). It’s a pity that something like that is not available when inserting programmatically.

We plan to insert terabytes of data and we know all the nodes and edges upfront.

I would also combine it with --store_xids, because then you can use an upsert block, or even future live/bulk loader features that use that flag. As long as you save the XID, you are safer. And make absolutely sure that all your blank nodes are unique.
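
For reference, here is roughly what that upsert pattern looks like once the external ID is stored on a predicate. I’m assuming an xid predicate with an exact index, and the helper below is only an illustration:

```go
// Find-or-create a node by external ID via an upsert block.
// Assumes schema: xid: string @index(exact) .
// Note: real code should escape/validate xid rather than rely on
// naive string concatenation.
func upsertByXID(ctx context.Context, dg *dgo.Dgraph, xid, name string) error {
	req := &api.Request{
		Query: `query {
			node as var(func: eq(xid, "` + xid + `"))
		}`,
		Mutations: []*api.Mutation{{
			SetNquads: []byte(`
				uid(node) <xid> "` + xid + `" .
				uid(node) <name> "` + name + `" .`),
		}},
		CommitNow: true,
	}
	_, err := dg.NewTxn().Do(ctx, req)
	return err
}
```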

That’s nice, I hope you love it. If you are open to sharing your experience at the end of your journey, please post a showcase or something here.

Cheers.

I’ve been testing the Bulk Loader with that --xidmap option for some time now. It doesn’t work well: it consumes far too much memory. I never managed to load more than 4% of the total data; at that point the process had consumed 400 GB of RAM and crashed.

I’m running the Bulk Loader right now without the --xidmap option, and it seems to be progressing just fine.

Blank nodes live in the transaction context, right? What does this translate to with the Bulk Loader?

Our main entities have IDs that we use to construct their blank nodes. The entities have attributes that may reference other entities (via that ID/blank node). The entities also contain/reference nested complex types. These need to be modelled as separate nodes and therefore also need blank nodes assigned at insertion (we just generate something random); those are only needed temporarily to model the structure.
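
For what it’s worth, a sketch of how we express that inside a single mutation: the entity blank nodes are derived from our own IDs, the nested complex type gets a generated label, and everything links up on the fly (predicate names and values are illustrative; dg and ctx as in the sketches above):

```go
// Two entities (blank nodes built from our IDs) plus a nested complex
// type (randomly generated label), linked within one mutation.
nquads := `
	_:entity_42  <xid>       "entity-42" .
	_:entity_42  <refers_to> _:entity_99 .
	_:entity_42  <address>   _:addr_f3a1 .
	_:addr_f3a1  <street>    "Main St" .
	_:addr_f3a1  <city>      "Springfield" .
	_:entity_99  <xid>       "entity-99" .
`
mu := &api.Mutation{SetNquads: []byte(nquads), CommitNow: true}
if _, err := dg.NewTxn().Mutate(ctx, mu); err != nil {
	log.Fatal(err)
}
```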

I don’t understand what needs to be done after using the Bulk Loader without the --xidmap option.

Should --xidmap result in such high memory consumption? It seems a bit excessive.

Dgraph version : v20.11.0
Dgraph codename : tchalla
Dgraph SHA-256 : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1 : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true