Correct, `uid` is reserved but not `id`.
I'm curious how you will be testing 21.12; I figure an export/import. Since the differing nodes are identical, I also wonder whether this is an overflow problem somewhere. I really wish someone from the @core-devs would chime in here. This is above my pay grade (sarcastic idiom).
What I would like to know is:
- Is there any difference in storage on disk for the `dgraph.type` predicate vs. any other predicate?
- Since the `type(Foo)` function is equivalent to, and maybe even implemented with the same piece of code as, the `eq(dgraph.type,"Foo")` function, can this problem be duplicated with a different predicate where 5 million plus nodes have the same value? Which leads me to…
- Maybe this is related to an overflow on indexes. Is there any limit on how many predicates with the same data can be indexed? A reverse index would mean that the key is one value and the value contains a list of 5 million plus entries (uids?). I'm not sure whether an index points to the uid of the node or to something else.
- Or maybe there is an actual limit on how many items can be in a posting list, and this might have changed fundamentally in 21.12 with sroar. Did the sroar implementation fix any bugs, or was it truly all performance related?
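To make the first two questions concrete, here is a rough sketch of the comparison I have in mind. It only builds the DQL text and does not talk to a cluster; the helper name `build_count_queries` and the block names `byType` and `byEq` are made up for illustration, and `Foo` is a placeholder type.

```python
def build_count_queries(type_name: str) -> str:
    """Return one DQL query that counts nodes of a type in two ways.

    Hypothetical helper: the first block uses type(), the second uses
    eq() on dgraph.type. If the two are truly equivalent, both counts
    should match.
    """
    return (
        "{\n"
        f"  byType(func: type({type_name})) {{ count(uid) }}\n"
        f'  byEq(func: eq(dgraph.type, "{type_name}")) {{ count(uid) }}\n'
        "}\n"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into Ratel or sent with any client.
    print(build_count_queries("Foo"))
```

If the two counts differ, or both come back short of the real number of nodes, that would point at the index or posting list rather than at `type()` itself.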
I don’t know who to tag here to actually get eyes on this issue. It is really unfortunate that Dgraph support has been crippled so much now. @MichelDiz and I can only do so much. I don’t know if discuss is actually part of the job description for Michel, but I am here of my own free will and time, and I am not the golang expert needed to dig into problems like this. I am really hoping for @mrjn to address this concern here soon, as promised over on this thread: What is Dgraph lacking? - #78 by mrjn
I guess let me end with this question for @purist180: can you reproduce this problem in any way without having 5 million+ nodes of the same type? How could I replicate this problem in the simplest form? Would just creating 5 million+ nodes of a type suffice? I have a little bare-metal machine that I packed with RAM; I can install Ubuntu Server and Dgraph on it and try to replicate this with you, but I need to know a sure way to replicate it without having all of your dataset.
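If simply creating that many nodes of one type is enough, something like the following could generate the test data. This is only a sketch under that assumption: the function names, the `Foo` type, and the blank-node naming are placeholders of mine, and I have not confirmed that it triggers the problem.

```python
from typing import Iterator


def nquads(count: int, type_name: str = "Foo") -> Iterator[str]:
    """Yield one RDF N-Quad per node, each setting only dgraph.type.

    Assumption: nodes that carry nothing but a type are enough to
    reproduce the issue; the real dataset may need more predicates.
    """
    for i in range(count):
        yield f'_:n{i} <dgraph.type> "{type_name}" .'


def write_nquads(path: str, count: int, type_name: str = "Foo") -> None:
    """Write the generated N-Quads to a file, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in nquads(count, type_name):
            fh.write(line + "\n")


if __name__ == "__main__":
    # Preview only; call write_nquads("repro.rdf", 5_000_001) for the full file.
    for line in nquads(3):
        print(line)
```

The resulting file could then be loaded with the live or bulk loader and the node count for the type compared against the number of lines written.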
If you were a Dgraph Cloud user, I would suggest you open a support ticket for this, but I am assuming you are not, since you were using such an old version.
^ This may indicate that this problem is fixed, as I suspected above. Or maybe it breaks this even further and would return zero nodes for the type??
In v21.12, we have added a flag to forbid any key which has greater than 1000 splits