Something wrong with tablet size calculation

Moved from GitHub dgraph/5408

Posted by Willem520:

What version of Dgraph are you using?

Dgraph v20.03.1

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

CentOS (126 GB RAM)

Steps to reproduce the issue (command/config used to run Dgraph).

I noticed the tablet size is more than the disk capacity.

My machine's disk capacity is: [screenshot]

The p directory size is: [screenshot]

When I used the /state endpoint, I got this result: [screenshot]

From Ratel, I got this result: the md5id and gid tablet sizes are about 6.1 TB, but my disk capacity is 2.9 TB.

Expected behaviour and actual result.

Tablet sizes should be calculated correctly.

Related to http://discuss.dgraph.io/t/ratel-predicate-capactiy

martinmr commented :

I haven’t been able to reproduce this exact issue, but it looks like something else is wrong. When calculating the sizes, the function skips all the tables with the following errors:

alpha1    | I0611 23:25:56.138946      14 draft.go:1245] Calculating tablet sizes. Found 4 tables
alpha1    | I0611 23:25:56.139087      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139145      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139560      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139703      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139773      14 draft.go:1276] No tablets found.

The error happens when trying to read the biggest key of the table and parse it into a Dgraph key. Note that the smallest (left) key can be read without issue. Also, the value is the same in all four tables. Maybe there’s a special key at the end of a badger table?
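For illustration, here is a rough Go sketch of the shape of that loop. This is not the actual draft.go code: tableInfo and parsePredicate are stand-ins for badger.TableInfo and Dgraph's key parser (x.Parse).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// tableInfo is a stand-in for badger.TableInfo.
type tableInfo struct {
	Left, Right []byte // smallest and biggest keys in the table
	EstimatedSz uint64
}

// parsePredicate is a stand-in for Dgraph's key parser. It fails on keys
// that are not in Dgraph's key format, such as badger-internal keys.
func parsePredicate(key []byte) (string, error) {
	if bytes.HasPrefix(key, []byte("!badger!")) {
		return "", errors.New("invalid key")
	}
	return string(key), nil // real keys also encode a type byte, UID, etc.
}

// tabletSizes mirrors the skip-on-error behaviour seen in the logs above: a
// table whose biggest (Right) key is badger-internal is skipped entirely.
func tabletSizes(tables []tableInfo) map[string]uint64 {
	sizes := make(map[string]uint64)
	for _, t := range tables {
		left, errL := parsePredicate(t.Left)
		right, errR := parsePredicate(t.Right)
		if errL != nil || errR != nil {
			fmt.Println("Unable to parse key; skipping table")
			continue
		}
		if left == right { // whole table belongs to one predicate
			sizes[left] += t.EstimatedSz
		}
	}
	return sizes
}

func main() {
	fmt.Println(tabletSizes([]tableInfo{
		{Left: []byte("name"), Right: []byte("!badger!head"), EstimatedSz: 100},
		{Left: []byte("name"), Right: []byte("name"), EstimatedSz: 50},
	}))
}
```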

I don’t think there’s an error with Dgraph itself because my cluster is working fine.

@jarifibrahim Do you have any insight into why the right keys of the tables might be different from what Dgraph expects?

jarifibrahim commented :

@martinmr 33 98 97 100 103 101 114 33 104 101 97 100 is the !badger!head key. The table contains keys inserted by dgraph, but it also contains internal keys inserted by badger, so the biggest key could be an internal badger key. See also Unable to parse key: Invalid size for key · Issue #5026 · dgraph-io/dgraph · GitHub.

Also, the value is the same in all four tables. Maybe there’s a special key at the end of a badger table?

Each level 0 table has one !badger!head key.
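Decoding the bytes from the log confirms this; the trailing 0xFF…0xFE bytes are, as far as I can tell, the 8-byte version suffix badger appends to every stored key:

```go
package main

import "fmt"

func main() {
	// Key bytes copied from the log lines above.
	key := []byte{33, 98, 97, 100, 103, 101, 114, 33, 104, 101, 97, 100,
		255, 255, 255, 255, 255, 255, 255, 254}
	fmt.Printf("%q\n", key[:len(key)-8]) // prints "!badger!head"
}
```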

martinmr commented :

@jarifibrahim Ok. So to deal with this, should I iterate through the table backwards until I find a valid key? Can I do something like TableInfo: Return valid Left key by jarifibrahim · Pull Request #1309 · dgraph-io/badger · GitHub, but from the dgraph side?

jarifibrahim commented :

@martinmr, the last time we spoke to @manishrjain, he suggested that it’s okay to skip some tables. @parasssh would also remember this discussion.

So to deal with this should I iterate through the table backwards until I find a valid key? Can I do something like dgraph-io/badger#1309 but from the dgraph side?

The tables are not accessible outside of badger. To perform a reverse iteration you would need access to the table and the table iterator, but the tables are not exposed: the db.Tables(..) call returns TableInfo, not the actual tables. We can expose the tables from badger and then dgraph can iterate over them however it needs to.
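For context, this is roughly all a caller can get from badger today. The signature and fields below follow badger v2 as used by Dgraph v20.03, but treat the exact API as an assumption:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-demo"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Tables returns metadata only (TableInfo), not table handles, so a
	// caller cannot reverse-iterate a table to skip badger-internal keys.
	for _, ti := range db.Tables(false) {
		fmt.Printf("table %d (level %d): left=%q right=%q est=%dB\n",
			ti.ID, ti.Level, ti.Left, ti.Right, ti.EstimatedSz)
	}
}
```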

parasssh commented :

Correct. The tablet size is really just a rough estimate. Unless the entire table consists of keys from the same predicate, dgraph will skip it in the tablet size calculation.

Having said that, I think we should have TableInfo.Right point to the rightmost valid key instead of a badger-internal key, so the error is not seen on the dgraph side. After all, the Right field is exported, so applications may access it presuming it to be a valid key (and not an internal badger key).

Alternatively, or additionally, on the dgraph side we can make our tablet size calculation rely only on the Left field of each TableInfo entry: as long as two consecutive Left keys have the same predicate, we include the table in the calculation.
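A minimal sketch of that Left-keys-only heuristic, reusing the tableInfo and parsePredicate stand-ins from the earlier sketch (this is not the actual PR code):

```go
// tabletSizesFromLeftKeys attributes a table to a predicate only when its
// Left key and the next table's Left key parse to the same predicate.
func tabletSizesFromLeftKeys(tables []tableInfo) map[string]uint64 {
	sizes := make(map[string]uint64)
	for i := 0; i+1 < len(tables); i++ {
		cur, err := parsePredicate(tables[i].Left)
		if err != nil {
			continue
		}
		next, err := parsePredicate(tables[i+1].Left)
		if err != nil || cur != next {
			continue // cannot prove the whole table is one predicate
		}
		// Never touches Right, so badger-internal keys no longer trigger
		// "Unable to parse key" errors; the result is still an estimate.
		sizes[cur] += tables[i].EstimatedSz
	}
	return sizes
}
```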

martinmr commented :

@jarifibrahim I implemented what @parasssh suggested above. When I load the 1 million dataset, I get a total size of 3.4GB. However, the size of the p directory (in a cluster with only one alpha running, for simplicity) is 210MB.

One thing I don’t know is whether the estimated size accounts for compression: is the reported size that of the uncompressed or the compressed data? Maybe that could explain the difference I am seeing.

Otherwise, I think there’s something wrong with the values EstimatedSz is reporting. The logic on the dgraph side is fairly simple, and I haven’t seen any issue other than the one mentioned above (which in any case under-reports the numbers, so it doesn’t explain the situation the user is seeing).

jarifibrahim commented :

When I load the 1 million dataset, I get a total size of 3.4GB. However, the size of the p directory (in a cluster with only one alpha running, for simplicity) is 210MB.

One thing I don’t know is whether the estimated size accounts for compression: is the reported size that of the uncompressed or the compressed data? Maybe that could explain the difference I am seeing.

@martinmr How did you test this? Do you have steps that I can follow? This could be a badger bug, maybe some issue with how we do estimates in badger. The size is the estimated size of the uncompressed data, but compression cannot make such a huge difference. This is definitely a bug. Let me know how you tested it and I can verify it in badger.

martinmr commented :

  1. Use this branch: Fix: Change tablet size calculation to not depend on the right key. by martinmr · Pull Request #5656 · dgraph-io/dgraph · GitHub
  2. Change the tablet size calculation to happen once every minute instead of once every five minutes.
  3. Live load the 1 million dataset.
  4. Wait for the tablet sizes to be calculated.

For simplicity, I used a cluster with 1 alpha and 1 zero.

EDIT: master now contains all the changes you need.

jarifibrahim commented :

@martinmr can you look at the badger code and figure out what’s wrong? The calculations are done here: badger/builder.go at dd332b04e6e7fe06e4f213e16025128b1989c491 · dgraph-io/badger · GitHub
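For readers without the badger internals at hand, the following shows only the general shape of an SSTable builder's size estimate; it is not badger's actual builder.go. The point is that an estimate computed from uncompressed key/value bytes plus a per-entry overhead can diverge from the compressed on-disk size:

```go
// Illustrative only: not badger's actual code.
type sizeEstimatingBuilder struct {
	estimatedSz uint64
}

func (b *sizeEstimatingBuilder) add(key, value []byte) {
	// Hypothetical per-entry overhead for varint lengths, checksums, and
	// block offsets. If the real estimate over-counts overhead, or ignores
	// compression, /state will report sizes far larger than the p directory.
	const perEntryOverhead = 16
	b.estimatedSz += uint64(len(key)+len(value)) + perEntryOverhead
}
```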

iluminae commented :

As another data point here: I also have impossible tablet sizes being reported from the /state endpoint:

test2.xid - 1.1TB
test2.units - 531.1GB
test2.format - 320.6GB
avm.format - 305.2GB
test.units - 56.6GB
avm.has - 43.9GB

…but the disk itself is only 100 GiB (real sizes: p=3.3G, w=258M).

This is on master@501f33855aca5e1fa9b98ec75efb450d91e7b749.

@ibrahim Can you look into this?


ibrahim commented :

On it. @iluminae, let me run a few tests locally and get back.