I haven’t been able to reproduce this exact issue, but it looks like something else is wrong. When calculating the sizes, the function skips all of the tables with the following errors.
The error happens when trying to read the biggest (right) key of the table and parse it into a Dgraph key. Note that the smallest (left) key can be read without issue. Also, the value is the same in all four tables. Maybe there’s a special key at the end of a badger table?
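For context, one possible explanation (an assumption on my part, not confirmed): badger stores keys internally with an 8-byte version suffix appended to the user key, so a table’s rightmost internal key would fail any parser that expects a plain user key. A minimal self-contained sketch — the helpers `keyWithTs` and `parseUserKey` are made up for illustration and are not badger’s or dgraph’s actual API:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// keyWithTs mimics how badger stores keys internally: the user key
// followed by an 8-byte big-endian version (commit timestamp).
// Hypothetical helper, not badger's exported API.
func keyWithTs(userKey []byte, ts uint64) []byte {
	out := make([]byte, len(userKey)+8)
	copy(out, userKey)
	binary.BigEndian.PutUint64(out[len(userKey):], ts)
	return out
}

// parseUserKey stands in for a strict application-side parser (like
// dgraph's key parsing): it expects an exact layout and fails on
// trailing bytes it does not recognize.
func parseUserKey(key []byte, wantLen int) ([]byte, error) {
	if len(key) != wantLen {
		return nil, errors.New("unexpected key length")
	}
	return key, nil
}

func main() {
	user := []byte("0-name") // hypothetical predicate key
	internal := keyWithTs(user, 42)

	// Parsing the internal key directly fails: the 8-byte suffix is extra.
	if _, err := parseUserKey(internal, len(user)); err != nil {
		fmt.Println("internal key:", err)
	}

	// Stripping the suffix first recovers a parseable user key.
	stripped := internal[:len(internal)-8]
	if k, err := parseUserKey(stripped, len(user)); err == nil {
		fmt.Printf("stripped key: %q\n", k)
	}
}
```

If the `Right` key returned for a table still carries an internal suffix like this, that would match the symptom: the left key parses fine while the right one always fails.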
I don’t think there’s an error with Dgraph itself because my cluster is working fine.
@jarifibrahim Do you have any insight into why the right keys of the tables might be different from what Dgraph expects?
@martinmr, the last time we spoke to @manishrjain he suggested that it’s okay to skip some tables. @parasssh would also remember this discussion.
So to deal with this, should I iterate through the table backwards until I find a valid key? Can I do something like dgraph-io/badger#1309 but from the dgraph side?
The tables are not accessible outside of badger. To perform a reverse iteration you would need access to the table and a table iterator, neither of which is exposed. The db.Tables(...) call returns TableInfo, not the actual tables. We could expose the tables from badger so that dgraph can iterate over them (however it needs to).
Correct. The tablet size is really just a rough estimate. Unless the entire table consists of keys from the same predicate, dgraph will skip it in the tablet size calculation.
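To make that skip rule concrete, here’s a minimal sketch of the calculation. The `TableInfo` struct and the predicate encoding (splitting on `|`) are simplified stand-ins for illustration, not badger’s real type or dgraph’s real key layout:

```go
package main

import (
	"bytes"
	"fmt"
)

// TableInfo is a simplified stand-in for badger's exported table
// metadata: just the boundary keys and an estimated size.
type TableInfo struct {
	Left, Right []byte
	EstimatedSz uint64
}

// predicate extracts the predicate portion of a key. Splitting on '|'
// is a hypothetical encoding for illustration only.
func predicate(key []byte) []byte {
	if i := bytes.IndexByte(key, '|'); i >= 0 {
		return key[:i]
	}
	return key
}

// tabletSize implements the skip rule described above: a table counts
// toward a predicate's size only when both its Left and Right keys
// belong to that predicate.
func tabletSize(tables []TableInfo, pred []byte) uint64 {
	var total uint64
	for _, t := range tables {
		if bytes.Equal(predicate(t.Left), pred) && bytes.Equal(predicate(t.Right), pred) {
			total += t.EstimatedSz
		}
	}
	return total
}

func main() {
	tables := []TableInfo{
		{Left: []byte("name|a"), Right: []byte("name|m"), EstimatedSz: 100},
		// This table spans two predicates, so it is skipped entirely.
		{Left: []byte("name|n"), Right: []byte("zipcode|a"), EstimatedSz: 50},
	}
	fmt.Println(tabletSize(tables, []byte("name"))) // 100, not 150
}
```

Note how the second table’s 50 bytes are dropped even though part of it belongs to `name` — hence “rough estimate.”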
Having said that, I think we should have TableInfo.Right point to the rightmost valid key instead of the badger-internal key, so the error is not seen on the dgraph side. After all, the Right field is exported, so applications may access it assuming it to be a valid key (and not an internal badger key).
Alternatively (or additionally), on the dgraph side we can make our tablet size calculation rely only on the Left field of each TableInfo entry. As long as two consecutive Left keys have the same predicate, we include the table in the calculation.
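A rough sketch of that Left-only variant, again with a simplified `TableInfo` and a hypothetical `|`-based predicate encoding (not dgraph’s real key format):

```go
package main

import (
	"bytes"
	"fmt"
)

// TableInfo is a simplified stand-in for badger's table metadata;
// only Left is consulted, which is the whole point of this variant.
type TableInfo struct {
	Left        []byte
	EstimatedSz uint64
}

// predicate extracts the predicate portion of a key. Splitting on '|'
// is a hypothetical encoding for illustration only.
func predicate(key []byte) []byte {
	if i := bytes.IndexByte(key, '|'); i >= 0 {
		return key[:i]
	}
	return key
}

// tabletSizeLeftOnly counts table i toward a predicate when its Left
// key and the next table's Left key share that predicate. The Right
// key (which may be a badger-internal key) is never parsed.
func tabletSizeLeftOnly(tables []TableInfo, pred []byte) uint64 {
	var total uint64
	for i := 0; i+1 < len(tables); i++ {
		if bytes.Equal(predicate(tables[i].Left), pred) &&
			bytes.Equal(predicate(tables[i+1].Left), pred) {
			total += tables[i].EstimatedSz
		}
	}
	return total
}

func main() {
	tables := []TableInfo{
		{Left: []byte("name|a"), EstimatedSz: 100},
		{Left: []byte("name|n"), EstimatedSz: 50},
		{Left: []byte("zipcode|a"), EstimatedSz: 70},
	}
	// Only the first table counts: the second table's successor starts
	// a different predicate, so its 50 bytes are dropped.
	fmt.Println(tabletSizeLeftOnly(tables, []byte("name"))) // 100
}
```

One consequence of this scheme: the last table of each predicate run can never be counted, so it systematically under-reports — which matches the under-reporting mentioned later in the thread.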
@jarifibrahim I implemented what @parasssh suggested above. When I load the 1 million dataset I get a total size of 3.4GB. However, the size of the p directory (in a cluster with only one alpha running for simplicity) is 210MB.
One thing I don’t know is whether the estimated size accounts for compression. Is it the size of the uncompressed or the compressed data? Maybe that could explain the difference I am seeing.
Otherwise, I think there’s something wrong with the values EstimatedSz is reporting. The logic on the dgraph side is fairly simple, and I haven’t seen any issue other than the one mentioned above (which in any case under-reports the numbers, so it doesn’t explain the situation the user is seeing)…
> When I load the 1 million dataset I get a total size of 3.4GB. However, the size of the p directory (in a cluster with only one alpha running for simplicity) is 210MB.
>
> One thing I don’t know is whether the estimated size accounts for compression. Is it the size of the uncompressed or the compressed data? Maybe that could explain the difference I am seeing.
@martinmr How did you test this? Do you have steps that I can follow? This could be a badger bug, maybe some issue with how we do estimates in badger. The size is the estimated size of the uncompressed data, but compression cannot make such a huge difference. This is definitely a bug. Let me know how you tested it and I can verify it in badger.