Report a Dgraph Bug
What version of Dgraph are you using?
$ dgraph version
[Decoder]: Using assembly version of decoder
Page Size: 4096

Dgraph version   : v20.11.0
Dgraph codename  : tchalla
Dgraph SHA-256   : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1     : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : true
Have you tried reproducing the issue with the latest release?
What is the hardware spec (RAM, OS)?
Kubernetes (GKE): 3 alphas in 1 group, each with 10 cores and 10 GB RAM, on 512Gi SSDs.
Steps to reproduce the issue (command/config used to run Dgraph).
A standard Helm install from the chart.
While troubleshooting slow inserts, I am seeing these errors in the logs of all 3 alphas:
W0316 05:37:45.883308 21 mvcc.go:132] Error failed when calling List.rollup: while encoding: cannot iterate through the list: cannot initialize iterator when calling List.iterate: cannot read initial list part for list with base key 00000c6a622d68756e742e6e616d65020a633161: could not read list part with key 04000c6a622d68756e742e6e616d65020a6331610000000000000001: Key not found rolling up key [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 99 49 97]
W0316 05:37:48.406653 21 mvcc.go:132] Error failed when calling List.rollup: while encoding: cannot iterate through the list: cannot initialize iterator when calling List.iterate: cannot read initial list part for list with base key 00000c6a622d68756e742e6e616d65020a633239: could not read list part with key 04000c6a622d68756e742e6e616d65020a6332390000000000000001: Key not found rolling up key [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 99 50 57]
(I have included only a couple of lines here to show that many different keys are affected.) There are ~1800 of these log entries in 2 hours.
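To check whether the ~1800 warnings keep hitting the same few keys or spread across many, the base keys can be tallied straight from the alpha logs. This is a small hypothetical helper (the name `countBaseKeys` and the file name `countkeys.go` are made up for illustration); it only assumes the warnings contain the text `base key <hex>` exactly as in the lines quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
)

// baseKeyRe captures the hex base key from the List.rollup warnings.
var baseKeyRe = regexp.MustCompile(`base key ([0-9a-f]+)`)

// countBaseKeys returns how many times each base key appears in the log lines.
func countBaseKeys(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		// One line may hold several warnings if the log was flattened.
		for _, m := range baseKeyRe.FindAllStringSubmatch(line, -1) {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // log lines can be long
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}

	counts := countBaseKeys(lines)
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable, readable output order

	for _, k := range keys {
		fmt.Printf("%6d  %s\n", counts[k], k)
	}
	fmt.Printf("%d distinct base keys\n", len(counts))
}
```

Usage, with the pod name left as a placeholder: `kubectl logs <alpha-pod> | go run countkeys.go`.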
I have traced this through the code in v20.11.0 and arrived at this function. It seems to be a Badger "Key not found" error returned from
txn.Get(). The cluster is 84 days old.
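For reference, the keys in these warnings can be decoded by hand to see what txn.Get() is failing to find. This Go sketch assumes a key layout based on my reading of x/keys.go in v20.11, not a documented format: a prefix byte, a 2-byte big-endian predicate length, the predicate name, a key-type byte (assumed 0x02 for index keys), then the rest (the term, for index keys). Split-part keys are assumed to replace the prefix with 0x04 and append an 8-byte big-endian start UID. `decodeKey` and `keyInfo` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Assumed prefix value for split-part keys (Dgraph's ByteSplit in v20.11).
const prefixSplit = 0x04

// keyInfo holds fields recovered from a key under the assumed layout.
type keyInfo struct {
	IsSplit  bool
	Attr     string // predicate name
	KeyType  byte   // assumed 0x02 = index key
	Rest     []byte // for index keys: tokenizer ID byte followed by the token
	StartUID uint64 // only set for split-part keys
}

// decodeKey parses a hex key as printed in the alpha logs.
func decodeKey(hexKey string) (keyInfo, error) {
	b, err := hex.DecodeString(hexKey)
	if err != nil {
		return keyInfo{}, err
	}
	if len(b) < 4 {
		return keyInfo{}, fmt.Errorf("key too short: %d bytes", len(b))
	}
	k := keyInfo{IsSplit: b[0] == prefixSplit}
	attrLen := int(binary.BigEndian.Uint16(b[1:3]))
	if len(b) < 4+attrLen {
		return keyInfo{}, fmt.Errorf("predicate length %d exceeds key", attrLen)
	}
	k.Attr = string(b[3 : 3+attrLen])
	k.KeyType = b[3+attrLen]
	rest := b[4+attrLen:]
	if k.IsSplit {
		// The last 8 bytes are the start UID of this list part.
		if len(rest) < 8 {
			return keyInfo{}, fmt.Errorf("split key missing start UID")
		}
		k.StartUID = binary.BigEndian.Uint64(rest[len(rest)-8:])
		rest = rest[:len(rest)-8]
	}
	k.Rest = rest
	return k, nil
}

func main() {
	for _, h := range []string{
		"00000c6a622d68756e742e6e616d65020a633161",                 // base key
		"04000c6a622d68756e742e6e616d65020a6331610000000000000001", // missing part
	} {
		k, err := decodeKey(h)
		if err != nil || len(k.Rest) == 0 {
			fmt.Println("could not decode:", h, err)
			continue
		}
		fmt.Printf("split=%v attr=%q type=0x%02x tokenizer=0x%02x token=%q startUID=%d\n",
			k.IsSplit, k.Attr, k.KeyType, k.Rest[0], k.Rest[1:], k.StartUID)
	}
}
```

Under these assumptions both keys decode to predicate `jb-hunt.name`, key type 0x02, term byte 0x0a followed by `c1a`, and the missing part has start UID 1. If 0x0a is the trigram tokenizer identifier, as I believe it is in Dgraph's tok package, that would explain why so many distinct three-character keys (`c1a`, `c29`, ...) show up.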
Does this mean that whatever is stored at that key is corrupt? I have seen this before on my development system and ended up wiping it and starting over. This is our production system for a live application, so any help would be appreciated.