Steps to reproduce the issue (command/config used to run Dgraph).
Normal Helm install from the chart.
While troubleshooting slow inserts, I am seeing errors in the logs of all 3 alphas:
W0316 05:37:45.883308 21 mvcc.go:132] Error failed when calling List.rollup: while encoding: cannot iterate through the list: cannot initialize iterator when calling List.iterate: cannot read initial list part for list with base key 00000c6a622d68756e742e6e616d65020a633161: could not read list part with key 04000c6a622d68756e742e6e616d65020a6331610000000000000001: Key not found rolling up key [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 99 49 97]
W0316 05:37:48.406653 21 mvcc.go:132] Error failed when calling List.rollup: while encoding: cannot iterate through the list: cannot initialize iterator when calling List.iterate: cannot read initial list part for list with base key 00000c6a622d68756e742e6e616d65020a633239: could not read list part with key 04000c6a622d68756e742e6e616d65020a6332390000000000000001: Key not found rolling up key [0 0 12 106 98 45 104 117 110 116 46 110 97 109 101 2 10 99 50 57]
(I have included a couple of lines here to show that many different keys are affected.) There are ~1800 of these log entries in 2 hours.
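As an aside, the predicate involved is readable straight out of the hex base keys. A quick throwaway decoder using only the Go standard library (just for illustration, not Dgraph code):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

func main() {
	// Base keys copied from the two warnings above.
	keys := []string{
		"00000c6a622d68756e742e6e616d65020a633161",
		"00000c6a622d68756e742e6e616d65020a633239",
	}
	for _, k := range keys {
		raw, err := hex.DecodeString(k)
		if err != nil {
			panic(err)
		}
		// Printing the bytes shows the predicate name embedded in the key
		// ("jb-hunt.name" here); the leading/trailing bytes are Dgraph's key
		// framing, which I am not trying to interpret precisely.
		fmt.Printf("%q\n", raw)
	}
}
```

Both sample lines decode to the same predicate, just with different trailing bytes.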
I have traced this through the v20.11.0 code and arrived at this function. It seems to be a BadgerDB error coming from txn.Get(). The cluster is 84 days old.
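For context, the "Key not found" text is Badger's ErrKeyNotFound. A minimal standalone Badger sketch (not Dgraph's actual rollup path; the store path and key below are made up) shows the same error surfacing from txn.Get() when the requested key simply is not there:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Open a throwaway Badger store; the path and options here are illustrative,
	// not what Dgraph configures internally.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-demo"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Reading a key that was never written (or has been dropped) returns
	// badger.ErrKeyNotFound, which is the exact text in the rollup warnings.
	err = db.View(func(txn *badger.Txn) error {
		_, err := txn.Get([]byte("missing-list-part-key"))
		return err
	})
	if err == badger.ErrKeyNotFound {
		fmt.Println("got:", err) // prints: got: Key not found
	}
}
```
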
Does this mean whatever is stored at that key is corrupt? I have seen this before on my development system and ended up just wiping it and starting over. This is our production system for a live application, so any help would be appreciated.
I am seeing the Error failed when calling List.rollup on a cluster I spun up 2 weeks ago as well. Given where this error originates (reading the posting list), it would surprise me if it were benign.
Yep, ~4100 times in 8 hours on one of my alphas, 51 times on another (over 30 minutes, actually), and 0 on the third (in the non-rotated logs, anyway).
Note this is the same cluster as this issue, where one alpha is completely behind on the Raft applied index. The ~4100 count is not on the alpha whose Raft index is far behind.
Hi @iluminae,
Not sure if you are still having this issue, but I was also running into it. In my case, the (corrupt?) keys narrowed down to certain types that had a reverse predicate pointing to them. I dropped the reverse edge index and recreated/reindexed it. Not an ideal solution, since I have no idea what caused it in the first place, but it got everything running as it should again.
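For reference, the drop/recreate amounted to re-posting the predicate's schema without @reverse and then adding it back, which triggers a reindex. A rough sketch via the Go client; the predicate name and address are placeholders, not from this thread:

```go
package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo/v210"
	"github.com/dgraph-io/dgo/v210/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Connect to one alpha; the address is only an example.
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	ctx := context.Background()

	// 1. Re-post the predicate schema without @reverse, dropping the reverse index.
	//    "linked_to" is a made-up predicate name; substitute your own.
	if err := dg.Alter(ctx, &api.Operation{Schema: `linked_to: [uid] .`}); err != nil {
		log.Fatal(err)
	}

	// 2. Add @reverse back, which rebuilds the reverse edges from scratch.
	if err := dg.Alter(ctx, &api.Operation{Schema: `linked_to: [uid] @reverse .`}); err != nil {
		log.Fatal(err)
	}
}
```
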
I am also on v21.03. That is interesting, though; I guess I wasn't quite as lucky with my upgrade.
Edited:
Hmm, looks like the same issue popped back up. I can do the same “trick” again and it seems to temporarily fix the issue.
If helpful:
What version of Dgraph are you using?
Dgraph codename : rocket
Dgraph SHA-256 : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true
For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud , visit https://dgraph.io/cloud.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.
Have you tried reproducing the issue with the latest release?
@balazshoranyi I actually just started seeing this again on my current cluster, exactly as originally posted.
Honestly, I do not even know whether it is causing real damage or not! It would be very difficult for me to tell if a few of my billions of values went missing.
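The closest thing I have to a sanity check is periodically counting the nodes that carry the predicate named in the warnings and watching for unexpected drops. A rough sketch with the Go client; the predicate name comes from my logged keys and the address is a placeholder:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v210"
	"github.com/dgraph-io/dgo/v210/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Count nodes carrying the predicate from the warnings; compare the number
	// across runs to spot unexplained drops. Swap in whichever predicate your
	// own warnings point at.
	const q = `{
		check(func: has(<jb-hunt.name>)) {
			total: count(uid)
		}
	}`
	resp, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Json))
}
```
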
Current cluster specs:
Dgraph version : v21.03.1
Dgraph codename : rocket-1
Dgraph SHA-256 : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1 : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true
For me there were no errors, just log entries. No idea whether it was affecting the result set or not. But I have rebuilt my cluster maybe 15 times since this issue was filed.