Running the same 'set' mutation twice with the uids specified creates duplicate nodes

kevin.obrien · June 9, 2020, 9:02am

I am using the Python client. When I run the following code twice, two identical copies of the nodes are created in my graph, differing only in their uids, even though I have specified the uids in the mutation data. How can I ensure that the same data isn't added twice?

import pydgraph as dg

# Connect to a Dgraph Alpha on its default gRPC port
client_stub = dg.DgraphClientStub("localhost:9080")
client = dg.DgraphClient(client_stub)
txn = client.txn()

customer_id = 38544202
label = 'FRAUD'
email = 'abc@def.com'
phone_number = '0581011234'
device_ids = ['42DA931A-F58A-4BB7-A4DD-EEAAF10DAEFD']

# One customer node linked to email, phone and device nodes,
# each node identified by a "_:..." label
data = [
    {
        "uid": f"_:{customer_id}",
        "customer_id": customer_id,
        "label": label,
        "has_email_address": [{"uid": f"_:{email}", "email_address": email}],
        "has_phone_number": [
            {"uid": f"_:{phone_number}", "phone_number": phone_number}
        ],
        "has_device_id": [
            {"uid": f"_:{device_id}", "device_id": device_id}
            for device_id in device_ids
        ],
    }
]

try:
    txn.mutate(set_obj=data)
    txn.commit()
finally:
    txn.discard()  # releases the transaction if the commit did not happen
    client_stub.close()

Neeraj · June 9, 2020, 9:06am

You can first check whether a node with that customer_id already exists, and insert a new node only if it doesn't. Check this out.
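A minimal sketch of that check-then-insert idea with pydgraph, assuming `customer_id` is indexed (for example `customer_id: int @index(int) @upsert .` in the schema) so that `eq()` can find it. `FIND_CUSTOMER`, `existing_uids` and `insert_if_absent` are hypothetical helpers, not part of pydgraph. The query and the mutation run in the same transaction; the `@upsert` directive on the predicate is what should let Dgraph abort one of two concurrent inserts for the same id.

```python
import json

# DQL query with a typed variable; pydgraph expects variable values as strings
FIND_CUSTOMER = """
query find($cid: int) {
  existing(func: eq(customer_id, $cid)) {
    uid
  }
}
"""


def existing_uids(response_json):
    """Return the uids matched by FIND_CUSTOMER (empty list if none)."""
    return [node["uid"] for node in json.loads(response_json).get("existing", [])]


def insert_if_absent(client, customer_id, data):
    """Insert `data` only if no node with this customer_id exists yet.

    `client` is a pydgraph.DgraphClient. Returns True if the mutation ran.
    """
    txn = client.txn()
    try:
        resp = txn.query(FIND_CUSTOMER, variables={"$cid": str(customer_id)})
        if existing_uids(resp.json):
            return False  # already present: skip the insert
        txn.mutate(set_obj=data)
        txn.commit()
        return True
    finally:
        txn.discard()  # no-op once the transaction has been committed
```

With the question's `client`, `customer_id` and `data`, calling `insert_if_absent(client, customer_id, data)` a second time should return False instead of creating a second customer node.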

kevin.obrien · June 9, 2020, 10:46am

But I don’t understand - what is the point of me setting the UIDs if I can’t even use them then? Why is there an option to set the UID if it just seems to be overwritten by a UID generated by Dgraph when the node is created?

Neeraj · June 9, 2020, 11:42am

The UID option is for adding an edge between two nodes. For example:

{
 set {
  _:kevin <name> "Kevin" . 
  
  _:neeraj <name> "Neeraj" .

  _:neeraj <friends_with> _:kevin .
 }
}

Here friends_with is a predicate of type uid, so it adds an edge from the neeraj node to the kevin node. Check it here.
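For comparison, here is a sketch of the same two nodes and edge written as a JSON mutation with pydgraph, the style used in the question. A nested object becomes the target of the `friends_with` edge, and the mutation response's `uids` map reports which uid each blank-node label received. `friendship_payload` and `create_friendship` are hypothetical names.

```python
def friendship_payload():
    """JSON equivalent of the N-Quad example: two nodes joined by friends_with."""
    return {
        "uid": "_:neeraj",
        "name": "Neeraj",
        # Nested object: creates the kevin node and the neeraj -> kevin edge
        "friends_with": {"uid": "_:kevin", "name": "Kevin"},
    }


def create_friendship(client):
    """Run the mutation; `client` is a pydgraph.DgraphClient.

    Returns a dict mapping each blank-node label (without the "_:" prefix)
    to the uid Dgraph assigned, e.g. {"neeraj": "0x...", "kevin": "0x..."}.
    """
    txn = client.txn()
    try:
        response = txn.mutate(set_obj=friendship_payload())
        txn.commit()
        return dict(response.uids)
    finally:
        txn.discard()
```

Calling `create_friendship` twice should return a different uid for "kevin" each time, which is the same duplication seen in the question.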

ibrahim · June 9, 2020, 12:04pm

@kevin.obrien What you're looking for is called a conditional upsert: https://dgraph.io/docs/mutations/#conditional-upsert . In the upsert block you can check whether the item exists and create it only if it doesn't.

The uid you've set is only a blank-node label that Dgraph uses within that one mutation. Dgraph generates its own UID internally.
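A sketch of a conditional upsert for the data in the question, using pydgraph's `create_mutation`, `create_request` and `do_request`. It assumes `customer_id` and `email_address` are indexed (for example `@index(int)` and `@index(exact)`). The mutation runs only when no customer with that id exists; the email node is referenced as `uid(e)`, so an existing email node is reused, and when `e` is empty Dgraph creates a new node for it. The helper names are hypothetical, and the values are embedded in the query text (the id via `int()`, the email quoted with `json.dumps`, which should be adequate for ordinary addresses) rather than passed as query variables.

```python
import json


def build_customer_upsert(customer_id, label, email):
    """Return (query, cond, payload) for a conditional upsert."""
    cid = int(customer_id)
    query = (
        "{\n"
        f"  c as var(func: eq(customer_id, {cid}))\n"
        f"  e as var(func: eq(email_address, {json.dumps(email)}))\n"
        "}"
    )
    # Run the mutation only if no node has this customer_id yet
    cond = "@if(eq(len(c), 0))"
    payload = {
        "uid": "_:customer",
        "customer_id": cid,
        "label": label,
        # uid(e) reuses an existing email node; if e is empty a new node is created
        "has_email_address": [{"uid": "uid(e)", "email_address": email}],
    }
    return query, cond, payload


def upsert_customer(client, customer_id, label, email):
    """Send the upsert block; `client` is a pydgraph.DgraphClient."""
    query, cond, payload = build_customer_upsert(customer_id, label, email)
    txn = client.txn()
    try:
        mutation = txn.create_mutation(set_obj=payload, cond=cond)
        request = txn.create_request(query=query, mutations=[mutation], commit_now=True)
        return txn.do_request(request)
    finally:
        txn.discard()
```

Phone and device nodes would need their own query variables in the same way; running `upsert_customer` twice with the same id should leave a single customer node.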

Anurag · June 9, 2020, 12:31pm

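I suggest dropping the schema and data whenever you rerun your Python client, so that each run starts clean. You can add the function below to wipe all data and schema, rather than deleting your p, w and zw directories by hand.

import pydgraph

def drop_all(client):
    # Drops ALL data and schema from the cluster; use only on a dev instance
    return client.alter(pydgraph.Operation(drop_all=True))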

To your question of why it adds two nodes with the same information, the way you are using blank nodes is not going to achieve what you want it to. Let me try to explain how blank nodes work.

Blank-node syntax is used when you don't care which uid is assigned to the nodes you are creating. You use a blank-node label only to refer to the same node while adding more predicates to it, and the label is valid only within that one mutation: run the mutation again and the same label maps to a brand-new uid. Using @Neeraj's example, I added one more property to each node and referenced each node with the blank-node syntax:

{
 set {
  _:kevin <name> "Kevin" .
  _:kevin <lastname> "Brian" .

  _:neeraj <name> "Neeraj" .
  _:neeraj <lastname> "Battan" .

  _:neeraj <friends_with> _:kevin .
 }
}
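To make the scoping concrete, here is a toy Python model of how blank-node labels are resolved. It is not Dgraph's actual code: it only mimics the rule that each mutation gets a fresh label-to-uid map, so a label repeated inside one mutation points at one node, while the same label in a second mutation gets a new uid. `ToyUidAssigner` is purely illustrative.

```python
import itertools


class ToyUidAssigner:
    """Toy model of blank-node resolution (illustrative only, not Dgraph code)."""

    def __init__(self):
        self._counter = itertools.count(1)  # stands in for Zero's uid leases

    def apply_mutation(self, labels):
        """Resolve the blank-node labels used in ONE mutation to uids."""
        mapping = {}  # a new map for every mutation: labels do not persist
        for label in labels:
            if label not in mapping:
                mapping[label] = hex(next(self._counter))
        return mapping


# Labels in the order they appear in the N-Quads above
labels = ["kevin", "kevin", "neeraj", "neeraj", "neeraj", "kevin"]
assigner = ToyUidAssigner()
first = assigner.apply_mutation(labels)
second = assigner.apply_mutation(labels)
print(first)   # {'kevin': '0x1', 'neeraj': '0x2'}
print(second)  # {'kevin': '0x3', 'neeraj': '0x4'}
```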

Now, about giving your nodes specific uids: the syntax "uid": f"_:{customer_id}" does not do that, because the "_:" prefix makes it a blank-node label. What you would actually need is either

<38544202> <label> "Fraud" .
<38544202> <customer_id> "38544202" .
<38544202> <has_device_id> <uid-of-device> .

or

p = [
    {
        "uid": "38544202",
        "customer_id": 38544202,
        # a uid edge must point at an object with a "uid" key, not a bare string
        "has_device_id": {"uid": "uid-of-device"},
    },
    {
        "uid": "38544101",
        "customer_id": 38544101,
        "has_device_id": {"uid": "uid-of-device"},
    },
]

but there will be a couple of issues with that approach:

You don't know uid-of-device unless you create the device node the same way.
The bigger problem is that you can't assign uids without leasing them from Zero. You would have to hit the /assign endpoint before assigning the uids and make sure the uids you want to use fall inside the range you leased.
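If you do go the explicit-uid route, leasing a range from Zero and checking that your chosen uids fall inside it might look like the sketch below. It assumes Zero's HTTP port is the default 6080 and that the `/assign?what=uids&num=N` response is JSON with `startId` and `endId` fields describing an inclusive range; check both against your Dgraph version. The function names are hypothetical.

```python
import json
import urllib.request


def parse_lease(body):
    """Parse a Zero /assign response into a (start, end) uid range.

    Assumes JSON like {"startId": "...", "endId": "..."}, range assumed inclusive.
    """
    data = json.loads(body)
    return int(data["startId"]), int(data["endId"])


def in_lease(uid, lease):
    """True if `uid` (decimal or "0x..." hex string) lies inside the leased range."""
    start, end = lease
    return start <= int(uid, 0) <= end


def lease_uids(num, zero_http="http://localhost:6080"):
    """Ask Zero to lease `num` uids and return the (start, end) range."""
    url = f"{zero_http}/assign?what=uids&num={int(num)}"
    with urllib.request.urlopen(url) as resp:
        return parse_lease(resp.read())
```

Zero decides where the leased range starts, so an arbitrary id such as 38544202 may well fall outside it; that is part of why the upsert route is simpler.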

Since you already have fields like customer_id, phone_number, etc. in your schema, I would suggest using conditional upserts as @ibrahim suggested. Another way is to use xids. You can read about them here.
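Both the upsert and the check-then-insert approach rely on `eq()` lookups, which need an index on each key predicate. A possible schema for the predicates in the question, applied with pydgraph's `client.alter(pydgraph.Operation(schema=...))`, is sketched below. The index types and the `@upsert` directives are assumptions about how you want lookups and conflict detection to behave, not something stated in this thread.

```python
# Hypothetical schema for the predicates used in the question
SCHEMA = """
customer_id: int @index(int) @upsert .
label: string .
email_address: string @index(exact) @upsert .
phone_number: string @index(exact) @upsert .
device_id: string @index(exact) @upsert .
has_email_address: [uid] .
has_phone_number: [uid] .
has_device_id: [uid] .
"""


def apply_schema(client):
    """Apply SCHEMA; `client` is a pydgraph.DgraphClient."""
    import pydgraph  # imported here so SCHEMA can be inspected without pydgraph installed

    return client.alter(pydgraph.Operation(schema=SCHEMA))
```

The `exact` index is enough for equality lookups on strings; `@upsert` asks Dgraph to check index conflicts so two concurrent transactions cannot both insert the same key.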

Tell me if this helps.

Related topics

Two nodes are getting created even if uid is same in the mutation · Dgraph · 7 replies · 286 views · July 27, 2021
Set mutation returns two uids for one entity - can't tell which is correct one · Dgraph Clients (untagged, dgraph4j) · 3 replies · 634 views · July 11, 2020
Dgraph-http-client setJson creates duplicate nodes · GraphQL (kind:question, dgraph) · 2 replies · 498 views · April 26, 2023
Upsert resulting duplicate node · Dgraph (dgraph) · 8 replies · 570 views · January 29, 2022
Why does this only create 1 node? · Dgraph · 1 reply · 353 views · April 22, 2022