This is a follow-up to a couple of posts by @calummoore linked below.
I am working on writing several inbound integrations to Dgraph, and it would be extremely helpful to have a ‘unique’ tokenizer on string indexes that would guarantee that at most one node in the whole database has a given value for that predicate. @calummoore, hopefully I am accurately representing your original request. Is this something you guys can see implementing in the short term?
Hi @calummoore, makes sense. I was hoping to have the guarantee of uniqueness without having to write the extra check in #2 above. Do you think this is overkill or would this be a useful feature regardless?
Another reason I can think of that enforced uniqueness would be useful is if there are multiple people developing an app on top of a given instance, and someone creates a bug that fails to check uniqueness before writing. Having this rock-solid uniqueness guarantee at the database level would still be a useful feature to have, IMHO.
Thanks @mrjn, but I’m not sure we’re referring to the same thing. The @urnary predicate seems like it would apply to a Node → Predicate → Node relationship where there can be only one such relationship per origin node. But for the “unique index”, I was thinking:
Node -> Predicate with UIX -> "Unique literal value"
Where there is a database-wide guarantee that for “Predicate with UIX”, one and only one node has a given “Unique literal value”.
That way, if I have ten source systems all sending entities into Dgraph and populating “Predicate with UIX” with potentially conflicting values, I have a uniqueness guarantee at the database level that can never be violated, even if someone messes up by not checking first to see if that value already exists. Basically, this extends the enforced uniqueness that UIDs have to other predicates as well.
In this example, we check that the email is unique by using a hash index on the email attribute.
That’s the benefit of transactions: such checks are possible entirely in client code. If you have multiple people writing to Dgraph, they should all be running this logic transactionally.
I now reckon this is the same response that @calummoore gave.
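To make the transactional check concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Dgraph's client API: `ToyStore`, `ToyTxn`, `ConflictError`, and `get_or_create` are all hypothetical names, and it assumes the database aborts a transaction at commit if a value it read was written by another transaction that committed first.

```python
class ConflictError(Exception):
    """Raised at commit when a value this transaction read has since changed."""


class ToyStore:
    """In-memory stand-in for a transactional store (hypothetical, not Dgraph's API)."""

    def __init__(self):
        self.nodes = {}      # uid -> email
        self.versions = {}   # email -> number of commits that wrote it
        self.next_uid = 1

    def begin(self):
        return ToyTxn(self)


class ToyTxn:
    def __init__(self, store):
        self.store = store
        self.reads = {}    # email -> version observed by this transaction
        self.writes = []   # emails to create on commit

    def lookup(self, email):
        """Return the uid of a node with this email, or None; record the read."""
        self.reads[email] = self.store.versions.get(email, 0)
        for uid, value in self.store.nodes.items():
            if value == email:
                return uid
        return None

    def create(self, email):
        self.writes.append(email)

    def commit(self):
        s = self.store
        # Abort if anything we read was written by a transaction that committed first.
        for email, seen in self.reads.items():
            if s.versions.get(email, 0) != seen:
                raise ConflictError(email)
        uids = []
        for email in self.writes:
            uid = s.next_uid
            s.next_uid += 1
            s.nodes[uid] = email
            s.versions[email] = s.versions.get(email, 0) + 1
            uids.append(uid)
        return uids


def get_or_create(store, email):
    """Check-then-write inside one transaction; retry if the commit conflicts."""
    while True:
        txn = store.begin()
        uid = txn.lookup(email)
        if uid is not None:
            return uid, False      # already exists, nothing written
        txn.create(email)
        try:
            return txn.commit()[0], True
        except ConflictError:
            continue               # someone else won the race; look again
```

Two writers that both call `lookup` before either commits cannot both succeed: the second `commit` raises `ConflictError`, and the retry then finds the existing node. A writer that calls `create` without `lookup` records no read, so nothing stops it from adding a duplicate in this model.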
Thanks guys. As I understand it, transactions used in this way will ensure that uniqueness is not violated due to concurrent write operations conflicting with each other.
It doesn’t reach the original goal I had in mind of making it impossible to have a duplicate value for that predicate in the database, which could happen if someone wrote without checking to see if the value already exists. Maybe this is OK though, provided no one writes directly to the database but rather goes through an API that enforces this check for them.
My frame of reference for this request is the relational database world, where a unique index on a column prohibits a duplicate value from ever existing in that table for that column. I like the “security blanket” that this database-level check provides, especially for cases where a violation would break the application. Even if someone messes up a mutation, the constraint can never be violated, because the database would reject the transaction. But maybe this does not translate naturally to Dgraph or the use cases you anticipate?
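For anyone less familiar with the relational behavior I mean, here is a small sketch using Python's built-in `sqlite3` module. The table, index, and helper names (`person`, `person_email_uix`, `insert_person`) are made up for illustration. The writer does no existence check at all, and the database still refuses the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT)")
# The unique index is the database-level guarantee: no two rows may share an email.
conn.execute("CREATE UNIQUE INDEX person_email_uix ON person (email)")


def insert_person(conn, email):
    """Insert with no client-side check; return True if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO person (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected the duplicate value.
        return False


first = insert_person(conn, "a@example.com")
second = insert_person(conn, "a@example.com")
count = conn.execute(
    "SELECT COUNT(*) FROM person WHERE email = ?", ("a@example.com",)
).fetchone()[0]
```

Here the second insert fails inside the database itself, so a buggy client cannot create the duplicate no matter what it forgets to check.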
Either way, thanks for your responses, and thanks @calummoore for writing the client library for Node, which I expect we’ll be integrating into our code base within the next couple of months.