Dgraph fails to recover from crash: hangs in step "TryAbort selectively proposing only aborted txns"

After a crash during write-intensive operations, Dgraph can no longer recover. It is stuck in an infinite loop of “TryAbort” tasks that apparently never complete correctly: the same two transactions are picked up for abort again on every pass.

Logs without debug logging enabled:

I1029 13:46:32.730499      16 groups.go:888] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I1029 13:46:32.730542      16 groups.go:900] Got Zero leader: dgraph-zero-0.dgraph-zero-headless.parser.svc.cluster.local:5080
I1029 13:47:27.738407      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:48:27.754938      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:49:27.761498      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:50:27.750432      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:51:27.754296      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:52:27.755115      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:52:27.755368      16 draft.go:1381] Found 2 old transactions. Acting to abort them.
I1029 13:52:27.768884      16 draft.go:1342] TryAbort 2 txns with start ts. Error: <nil>
I1029 13:52:27.768914      16 draft.go:1365] TryAbort selectively proposing only aborted txns: txns:<start_ts:603187 > txns:<start_ts:603186 > 
I1029 13:53:27.767613      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:53:27.767784      16 draft.go:1381] Found 2 old transactions. Acting to abort them.
I1029 13:53:27.778257      16 draft.go:1342] TryAbort 2 txns with start ts. Error: <nil>
I1029 13:53:27.778282      16 draft.go:1365] TryAbort selectively proposing only aborted txns: txns:<start_ts:603187 > txns:<start_ts:603186 > 

This block then repeats forever, and the alpha hangs without answering any query or mutation:

I1029 13:53:27.767613      16 draft.go:1545] Skipping snapshot at index: 336574. Insufficient discard entries: 0. MinPendingStartTs: 603186
I1029 13:53:27.767784      16 draft.go:1381] Found 2 old transactions. Acting to abort them.
I1029 13:53:27.778257      16 draft.go:1342] TryAbort 2 txns with start ts. Error: <nil>
I1029 13:53:27.778282      16 draft.go:1365] TryAbort selectively proposing only aborted txns: txns:<start_ts:603187 > txns:<start_ts:603186 > 
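
To check from a saved log whether the alpha keeps proposing aborts for the same transactions instead of making progress, the "TryAbort selectively proposing" lines can be grouped by their set of start_ts values. The following is a minimal sketch, not part of Dgraph: the function name `find_stuck_txns` and both regular expressions are assumptions derived only from the log lines quoted above.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this report (an assumption,
# not an official description of Dgraph's log format).
PROPOSE_RE = re.compile(r"TryAbort selectively proposing only aborted txns:(.*)$")
START_TS_RE = re.compile(r"start_ts:(\d+)")


def find_stuck_txns(lines, min_repeats=2):
    """Count abort proposals per set of start_ts values.

    Returns a dict mapping each frozenset of start_ts values to the number
    of times it was proposed, keeping only sets seen at least min_repeats times.
    """
    seen = Counter()
    for line in lines:
        match = PROPOSE_RE.search(line)
        if match is None:
            continue  # not a "selectively proposing" line
        ts_set = frozenset(int(ts) for ts in START_TS_RE.findall(match.group(1)))
        if ts_set:
            seen[ts_set] += 1
    return {ts_set: count for ts_set, count in seen.items() if count >= min_repeats}
```

Passing the lines of the alpha log to `find_stuck_txns` returns the start_ts sets that were proposed more than once. A set whose count keeps growing across passes, like 603186 and 603187 here, suggests the abort is never applied.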

Logs with debug logging enabled:

node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
oracle.go:209] ProcessDelta: Max Assigned: 611018
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:213] ProcessDelta Aborted: 610933
oracle.go:215] ProcessDelta Committed: 610932 -> 610998
oracle.go:215] ProcessDelta Committed: 610915 -> 610999
oracle.go:215] ProcessDelta Committed: 610927 -> 611000
oracle.go:215] ProcessDelta Committed: 610926 -> 611001
oracle.go:215] ProcessDelta Committed: 610917 -> 611002
oracle.go:215] ProcessDelta Committed: 610924 -> 611003
oracle.go:215] ProcessDelta Committed: 610929 -> 611004
oracle.go:215] ProcessDelta Committed: 610909 -> 611005
oracle.go:215] ProcessDelta Committed: 610881 -> 611006
oracle.go:215] ProcessDelta Committed: 610912 -> 611007
oracle.go:215] ProcessDelta Committed: 610936 -> 611008
oracle.go:215] ProcessDelta Committed: 610962 -> 611009
oracle.go:215] ProcessDelta Committed: 610964 -> 611010
oracle.go:215] ProcessDelta Committed: 610959 -> 611011
oracle.go:215] ProcessDelta Committed: 610952 -> 611012
oracle.go:215] ProcessDelta Committed: 610958 -> 611013
oracle.go:215] ProcessDelta Committed: 610961 -> 611014
oracle.go:215] ProcessDelta Committed: 610960 -> 611015
oracle.go:215] ProcessDelta Committed: 610953 -> 611016
oracle.go:215] ProcessDelta Committed: 610951 -> 611017
oracle.go:209] ProcessDelta: Max Assigned: 611049
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 382 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 588 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 6 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 209 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 167 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 190 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 199 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 158 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 190 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 272 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 272 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 544 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 462 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 390 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 209 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 357 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 190 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 743 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 8 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 390 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 376 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 384 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 197 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 351 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 88 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 197 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 384 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 5218 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 82 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 16171 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 225 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 6101 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 81 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 23467 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 325 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611080
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:213] ProcessDelta Aborted: 611041
oracle.go:213] ProcessDelta Aborted: 611037
oracle.go:213] ProcessDelta Aborted: 611034
oracle.go:215] ProcessDelta Committed: 611031 -> 611050
oracle.go:215] ProcessDelta Committed: 611020 -> 611051
oracle.go:215] ProcessDelta Committed: 611044 -> 611053
oracle.go:215] ProcessDelta Committed: 611018 -> 611054
oracle.go:215] ProcessDelta Committed: 611038 -> 611055
oracle.go:215] ProcessDelta Committed: 611046 -> 611056
oracle.go:215] ProcessDelta Committed: 611043 -> 611057
oracle.go:215] ProcessDelta Committed: 611042 -> 611058
oracle.go:215] ProcessDelta Committed: 611028 -> 611059
oracle.go:215] ProcessDelta Committed: 611035 -> 611060
oracle.go:215] ProcessDelta Committed: 611033 -> 611061
oracle.go:215] ProcessDelta Committed: 611027 -> 611062
oracle.go:215] ProcessDelta Committed: 611029 -> 611063
oracle.go:215] ProcessDelta Committed: 611040 -> 611064
oracle.go:215] ProcessDelta Committed: 611048 -> 611065
oracle.go:215] ProcessDelta Committed: 611045 -> 611066
oracle.go:215] ProcessDelta Committed: 611025 -> 611067
oracle.go:215] ProcessDelta Committed: 611026 -> 611068
oracle.go:215] ProcessDelta Committed: 611024 -> 611069
oracle.go:215] ProcessDelta Committed: 611047 -> 611070
oracle.go:215] ProcessDelta Committed: 611039 -> 611071
oracle.go:215] ProcessDelta Committed: 611023 -> 611072
oracle.go:215] ProcessDelta Committed: 611019 -> 611073
oracle.go:215] ProcessDelta Committed: 611030 -> 611074
oracle.go:215] ProcessDelta Committed: 611021 -> 611075
oracle.go:215] ProcessDelta Committed: 611049 -> 611076
oracle.go:215] ProcessDelta Committed: 611036 -> 611077
oracle.go:215] ProcessDelta Committed: 611032 -> 611078
oracle.go:215] ProcessDelta Committed: 611022 -> 611079
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1280 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 18 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 4314 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 64 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611082
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611080 -> 611081
oracle.go:215] ProcessDelta Committed: 611052 -> 611082
oracle.go:209] ProcessDelta: Max Assigned: 611083
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 6411 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 127 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611085
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611083 -> 611084
oracle.go:209] ProcessDelta: Max Assigned: 611086
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:209] ProcessDelta: Max Assigned: 611088
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:209] ProcessDelta: Max Assigned: 611094
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611100
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 199 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 158 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1791 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 23 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 23381 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 337 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611143
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611086 -> 611101
oracle.go:215] ProcessDelta Committed: 611088 -> 611102
oracle.go:215] ProcessDelta Committed: 611087 -> 611103
oracle.go:215] ProcessDelta Committed: 611092 -> 611104
oracle.go:215] ProcessDelta Committed: 611089 -> 611105
oracle.go:215] ProcessDelta Committed: 611093 -> 611106
oracle.go:215] ProcessDelta Committed: 611090 -> 611107
oracle.go:215] ProcessDelta Committed: 611091 -> 611108
oracle.go:215] ProcessDelta Committed: 611095 -> 611110
oracle.go:215] ProcessDelta Committed: 611098 -> 611111
oracle.go:215] ProcessDelta Committed: 611097 -> 611128
oracle.go:215] ProcessDelta Committed: 611096 -> 611129
oracle.go:215] ProcessDelta Committed: 611099 -> 611130
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 400 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 199 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 158 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1499 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 16 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 382 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 578 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 6 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 384 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
groups.go:333] group 1 checksum: 10693782793514404551
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1117 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 14 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 48369 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 725 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611171
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611112 -> 611144
oracle.go:215] ProcessDelta Committed: 611137 -> 611145
oracle.go:215] ProcessDelta Committed: 611114 -> 611146
oracle.go:215] ProcessDelta Committed: 611140 -> 611147
oracle.go:215] ProcessDelta Committed: 611121 -> 611148
oracle.go:215] ProcessDelta Committed: 611125 -> 611149
oracle.go:215] ProcessDelta Committed: 611124 -> 611150
oracle.go:215] ProcessDelta Committed: 611118 -> 611151
oracle.go:215] ProcessDelta Committed: 611113 -> 611152
oracle.go:215] ProcessDelta Committed: 611139 -> 611153
oracle.go:215] ProcessDelta Committed: 611117 -> 611154
oracle.go:215] ProcessDelta Committed: 611119 -> 611155
oracle.go:215] ProcessDelta Committed: 611142 -> 611156
oracle.go:215] ProcessDelta Committed: 611136 -> 611157
oracle.go:215] ProcessDelta Committed: 611116 -> 611158
oracle.go:215] ProcessDelta Committed: 611115 -> 611159
oracle.go:215] ProcessDelta Committed: 611126 -> 611160
oracle.go:215] ProcessDelta Committed: 611122 -> 611161
oracle.go:215] ProcessDelta Committed: 611123 -> 611162
oracle.go:215] ProcessDelta Committed: 611138 -> 611163
oracle.go:215] ProcessDelta Committed: 611135 -> 611164
oracle.go:215] ProcessDelta Committed: 611141 -> 611165
oracle.go:215] ProcessDelta Committed: 611131 -> 611166
oracle.go:215] ProcessDelta Committed: 611133 -> 611167
oracle.go:215] ProcessDelta Committed: 611134 -> 611168
oracle.go:215] ProcessDelta Committed: 611132 -> 611169
oracle.go:215] ProcessDelta Committed: 611120 -> 611170
oracle.go:209] ProcessDelta: Max Assigned: 611177
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611184
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:209] ProcessDelta: Max Assigned: 611187
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1514 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 20 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 23848 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 463 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611196
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611172 -> 611188
oracle.go:215] ProcessDelta Committed: 611176 -> 611189
oracle.go:215] ProcessDelta Committed: 611175 -> 611191
oracle.go:215] ProcessDelta Committed: 611173 -> 611192
oracle.go:215] ProcessDelta Committed: 611181 -> 611196
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1211 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 17 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 6358 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 88 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611249
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611180 -> 611197
oracle.go:215] ProcessDelta Committed: 611178 -> 611198
oracle.go:215] ProcessDelta Committed: 611182 -> 611199
oracle.go:215] ProcessDelta Committed: 611179 -> 611200
oracle.go:215] ProcessDelta Committed: 611174 -> 611201
oracle.go:215] ProcessDelta Committed: 611187 -> 611202
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 163 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1779 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 18 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1211 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 17 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 31236 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 704 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611252
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611185 -> 611250
oracle.go:215] ProcessDelta Committed: 611183 -> 611251
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 2386 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 24 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 272 to vlog
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1206 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 10 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1514 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 20 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 4239 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 57 entries written
oracle.go:209] ProcessDelta: Max Assigned: 611258
oracle.go:210] ProcessDelta: Group checksum: map[1:10693782793514404551]
oracle.go:215] ProcessDelta Committed: 611195 -> 611254
oracle.go:215] ProcessDelta Committed: 611193 -> 611255
oracle.go:215] ProcessDelta Committed: 611186 -> 611256
oracle.go:215] ProcessDelta Committed: 611194 -> 611257
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 586 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 6 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 578 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 6 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 272 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
groups.go:333] group 1 checksum: 10693782793514404551
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 924 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 10 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 188 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 194 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 196 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 2 entries written
node.go:164] RaftComm: [0x1] Heartbeats out: 0, in: 0
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 384 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
groups.go:333] group 1 checksum: 10693782793514404551
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 384 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 4 entries written
log.go:30] writeRequests called. Writing to value log
log.go:30] Flushing buffer of size 1168 to vlog
log.go:30] Done
log.go:30] Sending updates to subscribers
log.go:30] Writing to memtable
log.go:30] 12 entries written
groups.go:333] group 1 checksum: 10693782793514404551

This is what we observed almost an hour after the first TryAbort (same issue as described before):

I1029 17:04:44.158894      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.158975      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.159034      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.159726      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160001      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160288      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160585      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160617      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160867      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160915      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160943      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160960      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160978      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.160999      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161025      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161064      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161097      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161163      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161645      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161689      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.161751      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.162138      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.162517      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.162821      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.163041      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
I1029 17:04:44.163199      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>
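
To put a number on how tightly this loops, the lines can be bucketed per second. This is only a minimal sketch, assuming the Alpha logs are saved as plain glog-style text like the excerpts above; `GLOG_LINE` and `count_per_second` are hypothetical helper names for illustration, not part of Dgraph.

```python
import re
from collections import Counter

# glog-style line: severity+MMDD, HH:MM:SS.micros, thread id, file:line] message
# (pattern inferred from the log excerpts in this thread)
GLOG_LINE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2})\.\d{6}\s+\d+ ([\w.]+):(\d+)\] (.*)$"
)


def count_per_second(lines, needle):
    """Count log lines whose message contains `needle`, bucketed by (MMDD, HH:MM:SS)."""
    counts = Counter()
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if m and needle in m.group(5):
            counts[(m.group(1), m.group(2))] += 1
    return counts


# Three lines copied from the logs in this thread
sample = [
    "I1029 17:04:44.158894      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>",
    "I1029 17:04:44.158975      15 draft.go:1384] Done abortOldTransactions for 4 txns. Error: <nil>",
    "I1029 13:53:27.767613      16 draft.go:1545] Skipping snapshot at index: 336574. "
    "Insufficient discard entries: 0. MinPendingStartTs: 603186",
]
result = count_per_second(sample, "Done abortOldTransactions")
for (day, second), n in sorted(result.items()):
    print(day, second, n)
```

Run against the full log file, a count that keeps growing within a single second would support the observation that the abort task is being triggered repeatedly rather than once per minute.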

Hi @christian-roggia

What version of Dgraph are you running? This appears to be some kind of bug, so I am marking it as such.

@ibrahim ideas?

Sorry for leaving out basic information. The following is the configuration we are using for Dgraph on Helm:

image:
  tag: v20.07.2

fullnameOverride: dgraph

alpha:
  replicaCount: 1

  persistence:
    size: 64Gi
    storageClass: "ssd"
  
  lru_mb: 24576

  nodeSelector:
    reserved: "dgraph"

zero:
  replicaCount: 1

  persistence:
    size: 8Gi
    storageClass: "ssd"
  
  nodeSelector:
    reserved: "dgraph"

ratel:
  enabled: false