Batch upserts in dgo

What I want to do

I’m wondering if there is any way I can do batch upserts into dgraph.

I understand the Dgraph live loader has that functionality, and I feel like it would be super handy in dgo, too.

I’m basically trying to upsert a bunch of predicates into Dgraph, and I don’t want to do it one by one. Mind you, we’re not talking about gazillions of entries. At the same time, it’s not a small amount, and doing one-by-one round trips is extremely inefficient.
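Whatever the per-request shape ends up being, the first step is usually to cut the input into fixed-size chunks so each chunk costs one round trip instead of one per item. A minimal sketch, using a hypothetical `batchRanges` helper (stdlib only, no generics so it also builds on older Go versions):

```go
package main

// batchRanges splits n items into consecutive [start, end) index ranges of
// at most size items each. Each range can then be sent to Dgraph as a single
// request instead of one request per item. (Hypothetical helper.)
func batchRanges(n, size int) [][2]int {
	if size <= 0 {
		panic("batch size must be positive")
	}
	var ranges [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n // the last batch may be shorter
		}
		ranges = append(ranges, [2]int{start, end})
	}
	return ranges
}
```

For example, `batchRanges(10, 4)` yields the ranges `[0,4)`, `[4,8)` and `[8,10)`, so `items[r[0]:r[1]]` gives each batch.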

Now, a single upsert is pretty simple, I do something like this:

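	// Inside a func returning error. Needs encoding/json,
	// github.com/dgraph-io/dgo/v200 and
	// dgapi "github.com/dgraph-io/dgo/v200/protos/api";
	// dg is a *dgo.Dgraph client and ctx a context.Context.

	// Look up the node whose xid matches and bind its uid (if any) to e.
	// Note: e.XID() is spliced in verbatim, so it must not contain quotes.
	query := `
	{
		entity(func: eq(xid, "` + e.XID() + `")) {
			e as uid
		}
	}
	`

	// uid(e) reuses the existing node, or creates a new one if e is empty.
	obj := &Entity{
		UID:   "uid(e)",
		XID:   e.XID(),
		Name:  e.Name(),
		DType: []string{"Entity"},
	}

	// Encode the object as JSON for the mutation.
	pb, err := json.Marshal(obj)
	if err != nil {
		return err
	}

	mu := &dgapi.Mutation{
		SetJson: pb,
	}

	req := &dgapi.Request{
		Query:     query,
		Mutations: []*dgapi.Mutation{mu},
		CommitNow: true,
	}

	// Execute the upsert (query + mutation) in its own transaction.
	if _, err := dg.NewTxn().Do(ctx, req); err != nil {
		return err
	}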

Now, this is all nice, but I’m not sure how I would go about doing a batch upsert; in particular, I’m not sure what the query should look like.

Creating a dgapi.Mutation for every item in the batch and appending it to the Mutations slice is indeed possible, but the problem is that the query is “global” per dgapi.Request, not per mutation.

Am I missing something, or is this not possible and I really do need to upsert one by one?

Dgraph metadata

	dgraph version
	[Decoder]: Using assembly version of decoder
	Page Size: 4096

	Dgraph version   : v20.11.0
	Dgraph codename  : tchalla
	Dgraph SHA-256   : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
	Commit SHA-1     : c4245ad55
	Commit timestamp : 2020-12-16 15:55:40 +0530
	Branch           : HEAD
	Go version       : go1.15.5
	jemalloc enabled : true

	For Dgraph official documentation, visit https://dgraph.io/docs/.
	For discussions about Dgraph     , visit https://discuss.dgraph.io.

	Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
	Copyright 2015-2020 Dgraph Labs, Inc.

I’m the current maintainer of dgo. I’m a bit swamped at the moment. If you send a PR, I’d be happy to accept it.

Yeah, I feel ya, time is my enemy, too :sweat_smile:

I should have some free time next week, so I’ll give it a proper think and hopefully hack something up :crossed_fingers:

Yeah, I do this by hashing the variables coming out of the query so they can be used across all mutations. My use case is streaming data into Dgraph from pubsub, and I want to batch unrelated data into fewer round trips to the server. My only query is a bunch of:

	query {
	  UidForX as var(func: eq(tenant.xid, "X"))
	  UidForY as var(func: eq(tenant.xid, "Y"))
	}

This is just to do an xid:uid lookup for every node in the mutations. My problem, of course, is that a failure in any part of the batch fails the whole batch, and I have to try again later. (This happens mostly because of aborts, since multiple pods are writing to Dgraph simultaneously.) My personal wish-list item for mutations is a gRPC streaming API… but I don’t know exactly what the semantics would be.

Just wanted to put my use case on here in case it helps with design.