Rollup - Error in readListPart()

Report a Dgraph Bug

What version of Dgraph are you using?

Dgraph Version
$ dgraph version
 
[Decoder]: Using assembly version of decoder
Page Size: 4096

Dgraph version   : v20.11.0
Dgraph codename  : tchalla
Dgraph SHA-256   : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1     : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : true

Have you tried reproducing the issue with the latest release?

no

What is the hardware spec (RAM, OS)?

kubernetes(gke) 3 alphas, 1 group - 10core, 10GB ram each, 512Gi ssds.

Steps to reproduce the issue (command/config used to run Dgraph).

Normal helm install from chart.
While troubleshooting slowness while inserting, seeing errors in the logs of all 3 alphas:

W0316 05:37:45.883308      21 mvcc.go:132] Error failed when calling List.rollup: while encoding: cannot iterate through the list: cannot initialize iterator when calling List.iterate: cannot read initial list part for list with base key 00000c6a622d68756e742e6e616d65020a633161: could not read list part with key 04000c6a622d68756e742e6e616d65020a6331610000000000000001: Key not found rolling up key [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 99 49 97]
W0316 05:37:48.406653      21 mvcc.go:132] Error failed when calling List.rollup: while encoding: cannot iterate through the list: cannot initialize iterator when calling List.iterate: cannot read initial list part for list with base key 00000c6a622d68756e742e6e616d65020a633239: could not read list part with key 04000c6a622d68756e742e6e616d65020a6332390000000000000001: Key not found rolling up key [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 99 50 57]

(I have given you a couple lines of it here to show its many keys) ~1800 of these log entries in 2hr.
I have traced this through the code in v20.11.0 and have come to this function. Seems to be a badgerdb error from txn.Get(). The cluster is 84 days old.

I am wondering if this means whatever is at that key is corrupt? I have seen this before on my development system and ended up just wiping it and trying again. This is in our production system for a live application so any help would be appreciated.

I also currently have a few of these:

E0316 06:28:51.040291      19 log.go:32] Unable to read: Key: [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 45 112 111], Version : 90863807, meta: 70, userMeta: 8 Error: file with ID: 1966 not found

which is a error I had reported with this issue in december of last year on the RC of v20.11

I have the Error failed when calling List.rollup on a cluster I spun up 2 weeks ago as well. From where this error comes from(reading the posting list) it would surprise me if it was benign.

Anyone else see this on their clusters?

@iluminae are you still having this issue? I have no idea what is it. But if you still have it, I can ping someone.

yep, ~4100 times in 8h on one of my alphas, 51 times on another (over 30m actually), 0 on the third (in the non-rotated logs anyway).

note this is the same cluster as this issue with an alpha being completely behind on raft applied index. The 4100 is not on the one with a super behind RAFT.