Complete working example of an upsert operation

I’m looking for a complete, working example of an upsert operation that I can paste into the mutation section on Ratel. I’ve tried many queries, but every one has produced either duplicate values or errors.

Ultimately, I would like something like this:

Schema:

<foo>: string @index(exact) @upsert .
<is_a_bar>: uid @count @reverse .
<bar>: string @index(exact) @upsert .

and I’d like to be able to add is_a_bar to an existing foo (and add a new bar if necessary), without creating a duplicate.

Based on what I see in the documentation, it seems like this should be automatic: https://docs.dgraph.io/query-language/#upsert-directive . However, I’ve been searching comments here and see a lot of older discussions suggesting you need to do your upsert manually: first run a query to see if the value exists, then create it if it doesn’t, or else use the existing value. That would work, of course, but it means that @upsert isn’t useful (for me). I can upsert by hand, but I was hoping that the database would take care of it for me. I also found this, which seems to back up the comments: https://docs.dgraph.io/howto/#upserts

I’m hoping I’m just missing something obvious here. If upserts have to be done manually in client code, what does @upsert do?


Doing the upsert manually (query first, then create the value only if it doesn’t already exist) is what you should do.

The @upsert directive checks for conflicts for concurrently running txns mutating the same data. It does not check for existing values that have already been committed. This is useful to ensure that the first insert is unique.

The following example should clarify what @upsert does. TxnA and TxnB are concurrent transactions, both running a mutation on the same predicate and value (e.g., _:node <foo> "name" .). The txn operations can be represented serially over time. Whichever txn commits first succeeds.

With @upsert:

Time    TxnA     TxnB
1       NewTxn
2                NewTxn
3                mutate
4       mutate
5       commit            (success)
6                commit   (abort)

Without @upsert:

Time    TxnA     TxnB
1       NewTxn
2                NewTxn
3                mutate
4       mutate
5       commit            (success)
6                commit   (success)
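
The two timelines above can be reproduced with a small toy model. This is only a sketch of the behaviour described in this thread, not Dgraph's implementation: ToyStore, Txn, Aborted, run_concurrent and run_sequential are hypothetical names, and the conflict rule (an @upsert predicate conflicts on its indexed value, any other predicate only on the node it wrote) is a simplifying assumption.

```python
import itertools


class Aborted(Exception):
    """Raised when a transaction loses the commit-time conflict check."""


class ToyStore:
    """In-memory stand-in for a database (hypothetical, not Dgraph itself)."""

    def __init__(self, upsert_predicates=()):
        self.upsert_predicates = set(upsert_predicates)
        self.clock = itertools.count(1)  # logical timestamps
        self.uids = itertools.count(1)   # uid allocator
        self.last_commit = {}            # conflict key -> last commit timestamp
        self.triples = []                # committed (uid, predicate, value)

    def new_txn(self):
        return Txn(self, next(self.clock))


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = []

    def mutate_new_node(self, predicate, value):
        """Blind insert, like `_:node <predicate> "value" .` (no prior query)."""
        uid = next(self.store.uids)
        self.writes.append((uid, predicate, value))
        return uid

    def _conflict_key(self, uid, predicate, value):
        # Assumption: with @upsert the conflict is on the indexed value;
        # without it, only on the (brand new) node that was written.
        if predicate in self.store.upsert_predicates:
            return (predicate, "index", value)
        return (predicate, "data", uid)

    def commit(self):
        keys = [self._conflict_key(*w) for w in self.writes]
        # Abort if another txn committed one of these keys after we started.
        for key in keys:
            if self.store.last_commit.get(key, 0) > self.start_ts:
                raise Aborted(key)
        commit_ts = next(self.store.clock)
        for key in keys:
            self.store.last_commit[key] = commit_ts
        self.store.triples.extend(self.writes)
        return commit_ts


def run_concurrent(store):
    """Replays the timeline above and returns the outcome of TxnB's commit."""
    txn_a = store.new_txn()                 # time 1
    txn_b = store.new_txn()                 # time 2
    txn_b.mutate_new_node("foo", "name")    # time 3
    txn_a.mutate_new_node("foo", "name")    # time 4
    txn_a.commit()                          # time 5
    try:
        txn_b.commit()                      # time 6
        return "success"
    except Aborted:
        return "abort"


def run_sequential(store):
    """Two blind inserts, one txn after the other; returns the node count."""
    for _ in range(2):
        txn = store.new_txn()
        txn.mutate_new_node("foo", "name")
        txn.commit()
    return sum(1 for (_, p, v) in store.triples if p == "foo" and v == "name")


if __name__ == "__main__":
    print(run_concurrent(ToyStore(upsert_predicates={"foo"})))
    print(run_concurrent(ToyStore()))
    print(run_sequential(ToyStore(upsert_predicates={"foo"})))
```

In this model the second concurrent commit aborts only when foo is marked as an upsert predicate. Two blind inserts run one after the other both commit either way, because the check looks only at commits made after the transaction started, which matches the point that already-committed values are not checked.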

In v1.1, Dgraph will have an upsert block that allows you to do a query-and-mutate within a single call. See pull request #3412, "New upsert block" by mangalaman93, in dgraph-io/dgraph on GitHub.


I see, thanks. I’m glad upsert is being added as a feature in the future. If I may, I’d suggest renaming @upsert to something else, because it doesn’t seem to have anything to do with upserting. Better names would be @unique or @distinct.

You can do upserts with transactions today and the @upsert directive. The upcoming upsert block will simplify the operation to a single network call to Dgraph.

Appreciate the suggestion. Ultimately the directive does enable upserts. I think the confusion is how and when the directive works. The directive checks for index conflicts for concurrent transactions at commit-time.

The suggested names imply that already-committed data is automatically checked, which isn’t the case.

Oh, I misunderstood. I’m not really clear on how useful @upsert is, then, given how precise the scenario must be. If I’m now understanding you right, to put it concretely and in the format used earlier, you’re saying that this would result in duplicate values:

Time    TxnA     TxnB
1       NewTxn
2       mutate   NewTxn
3       commit            (success)
4                NewTxn
5                mutate
6                commit   (success)

and therefore you’d need to run some post-processing to clean up all of the duplicates?


With @upsert, there would be no duplicate values for the initial insertion.

TxnB should only have a single NewTxn call. Otherwise, it’s a new and separate transaction.

Any new txn after the first successful commit would see the existing UIDs to do an upsert mutation.

"Any new txn after the first successful commit would see the existing UIDs to do an upsert mutation."

But only if they also do a read query within the second transaction to figure out if there’s already a UID for the value, right?

Still trying to figure this out. I think I made things worse, though: I just noticed I messed up my example. I didn’t mean to include the NewTxn for TxnB on line 2.

Yes, a query-then-mutate is always necessary for upserting.
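
As a rough illustration of query-then-mutate against the schema from the original question, here is a minimal sketch over an in-memory list of triples. It does not talk to Dgraph: ToyGraph, get_or_create and link_foo_to_bar are hypothetical names, and a real client would run the query and the mutation inside one transaction and retry if the commit aborts.

```python
import itertools


class ToyGraph:
    """In-memory stand-in for the graph (hypothetical, not a Dgraph client)."""

    def __init__(self):
        self._uids = itertools.count(1)
        self.triples = []  # (uid, predicate, value or target uid)

    def query_eq(self, predicate, value):
        """Return the uids of nodes whose predicate equals value."""
        return [u for (u, p, v) in self.triples if p == predicate and v == value]

    def new_node(self, predicate, value):
        """Create a new node, like a blank-node insert."""
        uid = next(self._uids)
        self.triples.append((uid, predicate, value))
        return uid


def get_or_create(graph, predicate, value):
    """Query first; create the node only if no match exists."""
    found = graph.query_eq(predicate, value)
    if found:
        return found[0]
    return graph.new_node(predicate, value)


def link_foo_to_bar(graph, foo_value, bar_value):
    """Add is_a_bar from a foo to a bar, reusing existing nodes where possible."""
    foo_uid = get_or_create(graph, "foo", foo_value)
    bar_uid = get_or_create(graph, "bar", bar_value)
    edge = (foo_uid, "is_a_bar", bar_uid)
    if edge not in graph.triples:
        graph.triples.append(edge)
    return foo_uid, bar_uid


if __name__ == "__main__":
    g = ToyGraph()
    first = link_foo_to_bar(g, "some foo", "some bar")
    second = link_foo_to_bar(g, "some foo", "some bar")
    print(first == second, len(g.triples))
```

Running the same link twice reuses the uids found by the query, so no duplicate foo, bar, or is_a_bar edge is created. In a real deployment the @upsert directive covers the remaining gap, where two concurrent transactions both find nothing and both try to create the value.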

OK, thanks for explaining.
