If my system has multiple types of nodes, each with its own unique ID, which would be better?
1. Describing the node with a uniquely named predicate. For example, a company's ID would use the predicate “company_id”, and a contact's ID would use the predicate “contact_id”.
2. Describing the node with a generic predicate. For example, a company's ID would use the predicate “id”, and a contact's ID would also use the predicate “id”.
Also, each ID predicate would have a hash index so that lookups can use an eq() comparison.
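For concreteness, here is a minimal sketch of how the two options might be declared in a Dgraph schema. The predicate names come from the question; the `string` type is an assumption, since the question does not say what the IDs look like.

```
# Option 1: one predicate per node type, each with its own hash index
company_id: string @index(hash) .
contact_id: string @index(hash) .

# Option 2: a single generic predicate shared by every node type
id: string @index(hash) .
```

Note that in Dgraph the index belongs to the predicate, not to an individual node, so option 2 ends up with one index covering companies and contacts together.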
I think option 1 might have better performance, but I’m not too sure. The use case is retrieving a specific node by its ID. I guess that if you already have the ID, Dgraph would only have to look through company nodes with option 1, whereas with option 2 there would be many nodes of different types with an “id” predicate, so it’d be slower. Am I correct?
Hi @gorillastanley, yes, this is correct. Having a type-specific ID predicate reduces the range of data to cover, because the index on “company_id” only contains company nodes.
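As a rough illustration of the difference, the lookups for the two options could be written as below. This is a sketch only: the ID value "c-42" is made up, and the `type(Company)` filter assumes the nodes were stored with a `dgraph.type` of `Company`, which the question does not state.

```
{
  # Option 1: the index on company_id only holds company nodes
  by_company_id(func: eq(company_id, "c-42")) {
    uid
    company_id
  }

  # Option 2: the index on id is shared by all node types, so an
  # extra type filter is needed to be sure a company comes back
  by_generic_id(func: eq(id, "c-42")) @filter(type(Company)) {
    uid
    id
  }
}
```

With the generic predicate, a contact that happens to have the same ID value would also match `eq(id, "c-42")`, which is why the second block needs the additional filter.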
Here’s another thought from a design perspective: what if two different transactions report the same company ID? This could happen if the data came in from different sources, with slight differences in company names and addresses (so they are not obvious duplicates). In these cases, blank nodes (Dgraph-created IDs) can help by allowing you to first store these duplicates and then merge them later on.
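To make the blank-node idea concrete, here is a sketch of an RDF mutation in which two sources report the same company ID with slightly different names. The `name` predicate, the blank-node labels, and all values are hypothetical and only serve the illustration.

```
{
  set {
    # Report from source A
    _:report_a <company_id> "c-42" .
    _:report_a <name> "Acme Inc" .

    # Report from source B: same company ID, slightly different name
    _:report_b <company_id> "c-42" .
    _:report_b <name> "ACME Incorporated" .
  }
}
```

Each blank node is assigned its own uid by Dgraph, so both records are kept as separate nodes. A later `eq(company_id, "c-42")` query would return both, and they can then be reviewed and merged into one.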
I’m confused by the example you gave. What do you mean by two different transactions reporting the same company ID? Only one node with that specific company ID should exist.
Agreed that only one node per company ID should exist. However, there could be periods of time when it's not clear whether a piece of information is a new node or has to be merged with an existing node. Imagine someone created the same company twice (with different company IDs) because the name or address looked slightly different. In these types of situations, where duplicates exist, one ID would have to be eliminated. This may not apply to your scenario, but I still wanted to raise the question. Please see this link for some more context.