Foreign Key Integrity

mvpmvh · August 9, 2020, 5:42pm

It is my understanding that if you provide an arbitrary value to the uid function and there’s no matching uid for that value, Dgraph will interpret this as an upsert and create a new node with a new uid. For example:

uid(known_reference) <foo> "foo" .

Here, known_reference refers to an existing node, so Dgraph adds or updates predicate foo with the value “foo”.

vs

uid(unknown_reference) <foo> "foo" .

Here, unknown_reference does not match any existing node, so Dgraph creates a new node, assigns it a new uid, and sets predicate foo to the value “foo”.

Is there a way to force Dgraph to return an error instead of silently creating a new node?

Something like uid!(unknown_reference) <foo> "foo" that results in some sort of NodeNotFoundErr.

anand · August 10, 2020, 3:33am

Hi Michael, while we cannot change the behavior of the uid function, we can detect what the upsert has done and raise the error on the client side. As a practical approach, you may consider the following:

After executing the upsert, check the uids map in the response and look for an entry like this:

"uids": {
  "uid(unknown_reference)": "0xc"
}

This typically means that no match was found and a new node was created. If the uids map does contain the "uid(unknown_reference)" entry, you can treat it as an exception and handle it further, for example by adding the referred node and then re-executing the upsert.

If a matching node exists, no new node is created and that entry will not appear in the uids map.
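A minimal Go sketch of the check described above, assuming the response's uid map is available as a map[string]string (the Go client exposes it as resp.Uids). The helper name createdNodeFor is hypothetical; it only looks for the "uid(<var>)" key that, per the explanation above, appears when the referenced variable matched nothing.

```go
package main

import "fmt"

// createdNodeFor is a hypothetical helper. It reports whether the
// uid(<varName>) reference in an upsert fell through to node creation.
// uids is the response's uid map (resp.Uids in the Go client); the
// "uid(<var>)" key is assumed to appear only when a new node was created.
func createdNodeFor(uids map[string]string, varName string) (string, bool) {
	newUID, ok := uids[fmt.Sprintf("uid(%s)", varName)]
	return newUID, ok
}

func main() {
	// Response shape taken from the example above.
	uids := map[string]string{"uid(unknown_reference)": "0xc"}

	if newUID, created := createdNodeFor(uids, "unknown_reference"); created {
		fmt.Printf("unknown_reference matched nothing; new node %s was created\n", newUID)
	}
}
```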

mvpmvh · August 10, 2020, 12:27pm

To be clear, I’m not suggesting that you make a backwards-incompatible change like having uid(unknown_reference) raise an exception instead of silently performing an upsert (today’s behavior). What I am suggesting is adding a new function (e.g. uid!(unknown_reference)) that raises the exception at the db layer. Your suggested approach would require the client to do work that, arguably, should be the database’s responsibility. Additionally, if the client does check the response output and sees that a new node was created, the client then has to make another call to the database to try to delete all the data that was just erroneously created.
Please reconsider adding a new function (e.g. uid!) that makes it easier for developers to trust their database in production.

anand · August 10, 2020, 1:16pm

On this point:

In scenarios like master data management, a database cannot simply reject information because some of its attributes are of poor quality; such data exists in reality, for example with the reference key integrity issues you mentioned. In such cases we still want to detect (and perhaps even store) the poor-quality data and kick off some kind of fixing/standardization process. IMO, we might not want to remove this flexibility from the client side.

tagging @pawan for his thoughts on this topic.

dmai · August 10, 2020, 6:17pm

@mvpmvh You can use len(v) for a variable v together with @if so that the mutation runs only when it updates an existing node, rather than inserting a new one: https://dgraph.io/docs/mutations/conditional-upsert/. For example,

upsert {
  query {
    v as var(func: eq(name, "abc"))
  }
  mutation @if(gt(len(v), 0)) { # only runs if the node exists already
    set {
      uid(v) <name> "abc123" .
    }
  }
}
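With the Go client, the same conditional upsert is sent as a query plus a mutation carrying a Cond string. The sketch below assumes dgo's api.Request.Query, api.Mutation.Cond and api.Mutation.SetNquads fields and mirrors them in a local struct so it runs without dependencies; updateOnly is a hypothetical helper, and its quoting is only assumed safe for plain ASCII values.

```go
package main

import (
	"fmt"
	"strconv"
)

// upsertParts is a local stand-in mirroring what dgo sends for a
// conditional upsert: api.Request.Query, api.Mutation.Cond and
// api.Mutation.SetNquads.
type upsertParts struct {
	Query     string
	Cond      string
	SetNquads string
}

// updateOnly (hypothetical) builds an upsert that sets setPred on the
// node(s) whose matchPred equals matchVal and does nothing if none exist.
// strconv.Quote is assumed adequate only for plain ASCII values.
func updateOnly(matchPred, matchVal, setPred, setVal string) upsertParts {
	return upsertParts{
		Query: fmt.Sprintf("query { v as var(func: eq(%s, %s)) }", matchPred, strconv.Quote(matchVal)),
		// The mutation runs only if v matched at least one node.
		Cond:      "@if(gt(len(v), 0))",
		SetNquads: fmt.Sprintf("uid(v) <%s> %s .", setPred, strconv.Quote(setVal)),
	}
}

func main() {
	p := updateOnly("name", "abc", "name", "abc123")
	fmt.Println(p.Query)
	fmt.Println(p.Cond)
	fmt.Println(p.SetNquads)
}
```

When the condition is false, the mutation is simply skipped; no error is returned.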

mvpmvh · August 11, 2020, 10:24pm

If I’m understanding your response correctly, you’re saying that Dgraph shouldn’t raise an exception when there’s “bad data”, because the application may have a custom process to resolve the situation. If that’s what you’re saying, I agree, but it doesn’t really change anything. Clients can continue to handle that use case by using uid and checking the JSON response (like you mentioned earlier). Adding a new uid! function would support a separate (and common) use case without removing any use cases that exist today. It is strictly an additive change.

mvpmvh · August 11, 2020, 10:49pm

At best, that is a temporary workaround until a more concrete solution (e.g. uid!) is implemented. I do not consider your suggestion a viable long-term solution because it is not explicit. As a client, I have to read the response and assume that if no data was written, it must be due to some sort of data integrity violation. In your simple example above, that may be true, but in practice there could be a number of reasons why data was not written. I could have an @if condition that skips the write if the user’s age is even, or if the user’s email provider is not Gmail. These are arbitrary use cases, but my point is that there’s a difference between business logic and schema integrity. I would much prefer to have an explicit NodeNotFoundException returned from Dgraph instead of an upsert being silently ignored.
I’m using the go client. I would like to be able to do something like this:

// an error is returned when a foreign key constraint is not met
if _, err := txn.Do(ctx, req); err != nil {
  return err
}

I think your suggestion would require me to unmarshal the response (because no error would be returned) and make an assumption about whether or not I need to return an error. That feels odd to me. If I submit a malformed query, I get back an error; I don’t have to unmarshal the response to decide whether to return one. The database handles database issues for me.
