Clarify scope of @upsert

damienburke · May 23, 2020, 9:06am

Hi, I found this article useful and mostly understand it: "Complete working example of an upsert operation".
One aspect I want to clarify / get some advice on:

"The  `@upsert`  directive checks for conflicts for concurrently running txns mutating the same data types.... 
 e.g.,  `_:node <foo> "name" .` . "

To further recap the linked example, we have the schema:

<foo>: string @index(exact) @upsert .

Q.1 So in this context, does “data” refer to the predicate <foo>? And therefore can we only have one active transaction writing any foo values?

To illustrate by way of example, the following concurrent transactions will (obviously) fail:

_:node <foo> "name" .
_:node <foo> "name" .

But what I was wondering is whether the following could work:

_:node_1 <foo> "name" .
_:node_2 <foo> "(other) name" .
_:node_3 <foo> "name" .

Based on my testing, it does not. For a high-volume system, I am therefore finding it hard to see whether the pros outweigh the cons for the @upsert directive.

Pros: No duplicate data
Cons: Potentially a lot of failures when trying to create different instances of the same data (in concurrent transactions)

Thanks

damienburke · May 24, 2020, 1:31pm

Working now…

My predicate had a count in it, i.e.

‘: string @index(exact) @upsert @count .’

, which seems to break @upsert?
I can do a bit more testing and refine my question, etc.

damienburke · June 3, 2020, 4:28pm

This issue is happening again. I thought my tests were verifying/clarifying things for me. It's possible I have inadvertently introduced a bug (but my code is very light at the moment). I can revert and test earlier code, but the question below still has a lot of value for me to understand…

I will summarise my use case. Firstly here is my schema:

ocode: string @index(term) @upsert .
userseed: string @index(term) @upsert .

Here are 2 concurrent upserts:

"{ data(func: eq(ocode, "21543710-96ac-11ea-b073-5b088a523211")) {v as uid, ocode} }"
mutations {
 set_json: "{"types":["OCODE"],"uid":"uid(v)","ocode":"21543710-96ac-11ea-b073-5b088a523211","browsed":null}"
}

"{ data(func: eq(ocode, "ed829200-96ac-11ea-b073-5b088a523213")) {v as uid, ocode} }" 
mutations {
set_json: "{"types":["OCODE"],"uid":"uid(v)","ocode":"ed829200-96ac-11ea-b073-5b088a523213","browsed":null}"
}

The first upsert works, the second one fails with the exception:

Exception with upsert for ID “ed829200-96ac-11ea-b073-5b088a523213”, Label ocode, Node OcodeIdentity(ocode=ed829200-96ac-11ea-b073-5b088a523213, browsed=null)
java.lang.RuntimeException: java.util.concurrent.CompletionException: io.dgraph.TxnConflictException: Transaction has been aborted. Please retry

Is this expected?
Thanks
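Since the exception message itself says "Please retry", one common way to cope with aborted txns is to wrap the upsert in a retry loop. Below is a minimal Python sketch of that idea. It assumes a hypothetical `TxnConflict` exception standing in for the client's aborted-transaction error (the Java client in this thread raises `io.dgraph.TxnConflictException`). `run_with_retry` and its parameters are made-up names for illustration, not part of any Dgraph client.

```python
import random
import time


class TxnConflict(Exception):
    """Hypothetical stand-in for the client's 'transaction aborted' error."""


def run_with_retry(upsert, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call upsert() until it succeeds or max_attempts is reached.

    Waits a random, exponentially growing delay between attempts so that
    competing txns are less likely to collide again at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upsert()
        except TxnConflict:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            sleep(base_delay * 2 ** (attempt - 1) * random.random())


# Usage: a fake upsert that is aborted twice before it commits.
attempts = []


def flaky_upsert():
    attempts.append(1)
    if len(attempts) < 3:
        raise TxnConflict("Transaction has been aborted. Please retry")
    return "committed"


result = run_with_retry(flaky_upsert, sleep=lambda seconds: None)
```

Retrying does not remove the conflicts; it only hides them from the caller, so it helps most when conflicts are occasional rather than constant.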

damienburke · June 3, 2020, 5:44pm

And working again - this time by changing my index types:

buffer.append("ocode: string @index(exact) @upsert . \n");
buffer.append("userseed: string @index(term) @upsert . \n");

So ocode is now exact
I left userseed as term
The upsert logic for both nodes is the same though, and I fully expect my userseed upserts to break if I throw enough concurrency at them… Can someone explain the relationship between upserts and indexes, and any other considerations? Thanks

minhaj · June 16, 2020, 6:06am

Thanks, @damienburke, for raising this. The @upsert directive checks for conflicts between concurrently running txns that mutate the same data. How that check is done depends on the type of index on the predicate: data is matched using a key that is a function of the predicate plus a value token (generated by the index), and the tokens depend on the type of index in use.

For example, take the following concurrent txns:
_:node_1 <foo> "name" .
_:node_2 <foo> "(other) name" .
If we use a hash index on the predicate foo, the keys generated for the two txns are different: the first depends on foo+"name" and the second on foo+"(other) name". But if we use a term index on foo, two keys are generated for the second txn (because of the two tokens "other" and "name"), which are functions of foo+"other" and foo+"name". The latter conflicts with the key of the first txn, so one of the txns will abort. I hope this helps you understand the relationship between upserts and indexes. Feel free to ask any follow-up questions.
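To make the key explanation above concrete, here is a toy Python model of it. This is only a sketch under simplifying assumptions, not Dgraph's actual implementation: the `tokens`, `conflict_keys` and `conflicts` helpers are hypothetical names, the "hash" and "exact" indexes are modelled as keeping the whole value as one token, and the "term" index is modelled as splitting the value into lowercase words.

```python
import re


def tokens(value, index):
    """Toy tokenizer: one token per value for hash/exact, one per word for term."""
    if index in ("hash", "exact"):
        return {value}
    if index == "term":
        return set(re.findall(r"\w+", value.lower()))
    raise ValueError(f"unsupported index in this sketch: {index}")


def conflict_keys(predicate, value, index):
    """Conflict keys modelled as (predicate, token) pairs."""
    return {(predicate, tok) for tok in tokens(value, index)}


def conflicts(txn_a, txn_b, index):
    """Two txns conflict in this model if they share at least one key."""
    keys_a = set().union(*(conflict_keys(p, v, index) for p, v in txn_a))
    keys_b = set().union(*(conflict_keys(p, v, index) for p, v in txn_b))
    return bool(keys_a & keys_b)


# The two concurrent txns from the example, as (predicate, value) pairs.
txn_1 = [("foo", "name")]
txn_2 = [("foo", "(other) name")]

print(conflicts(txn_1, txn_2, "hash"))  # False: the whole values differ
print(conflicts(txn_1, txn_2, "term"))  # True: both produce the token "name"
```

In this model, the more tokens an index produces per value, the more likely it is that two unrelated values share a key, which matches the behaviour reported earlier in the thread when switching between term and exact.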

system · July 16, 2020, 6:07am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
