$ dgraph version
Dgraph version   : v21.12.0
Dgraph codename  : zion
Dgraph SHA-256   : 078c75df9fa1057447c8c8afc10ea57cb0a29dfb22f9e61d8c334882b4b4eb37
Commit SHA-1     : d62ed5f15
Commit timestamp : 2021-12-02 21:20:09 +0530
Branch           : HEAD
Go version       : go1.17.3
jemalloc enabled : true
Kubernetes on GKE, 16 cores, 64 GiB RAM
Run a tablet move on a large tablet (~40 GiB) while ingestion is in progress.
I first did a test move of a smaller tablet, which took ~9m, and then wanted to move a bigger tablet too. Mutations were happening at the time.
The target group of the move is now completely broken and is crash looping with the following panic:
I1220 17:13:42.463401 1 schema.go:496] Setting schema for attr 0-XXXXXXXXX: int, tokenizer: , directive: NONE, count: false
2021/12/20 17:13:42 Unable to find txn with start ts: 1419795
github.com/dgraph-io/dgraph/x.AssertTruef
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:107
github.com/dgraph-io/dgraph/worker.(*node).applyMutations
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:707
github.com/dgraph-io/dgraph/worker.(*node).applyCommitted
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:744
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:931
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:1020
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1581
This comes during setting the schema for the new tablet: it is looking for a transaction start timestamp that does not exist in the pendingTransactions map. So maybe it was removed from the map for some reason? The tablet move had been running for 20m but was not nearly complete. Here was the last "Sending predicate" log message from the source group:
Sending predicate: [0-XXXXXXXXX] [19m44s] Scan (8): ~10.0 GiB/39 GiB at 11 MiB/sec. Sent: 10.0 GiB at 12 MiB/sec
Maybe the process that periodically aborts old transactions picked up the transaction used for the predicate move (or for its schema update), aborted it, and crashed the whole group as a result.
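
To make the suspected sequence concrete, here is a minimal Go sketch of it. This is only an illustration of my guess, not Dgraph's actual code: the pendingTxns type and the start, abortOlderThan, and apply functions are hypothetical names I made up, and the real worker asserts (and panics) where this sketch returns an error.

```go
package main

import (
	"fmt"
	"sync"
)

// pendingTxns is a hypothetical stand-in for a map of in-flight
// transactions keyed by start timestamp. It is NOT Dgraph's real type.
type pendingTxns struct {
	mu   sync.Mutex
	txns map[uint64]struct{}
}

func newPendingTxns() *pendingTxns {
	return &pendingTxns{txns: make(map[uint64]struct{})}
}

// start registers a transaction under its start timestamp.
func (p *pendingTxns) start(startTs uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.txns[startTs] = struct{}{}
}

// abortOlderThan models a periodic cleanup: it drops every transaction
// whose start timestamp is below cutoff and returns how many were dropped.
func (p *pendingTxns) abortOlderThan(cutoff uint64) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	removed := 0
	for ts := range p.txns {
		if ts < cutoff {
			delete(p.txns, ts)
			removed++
		}
	}
	return removed
}

// apply models applying a mutation for startTs. It returns an error when
// the transaction is missing; the real code appears to assert here instead.
func (p *pendingTxns) apply(startTs uint64) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.txns[startTs]; !ok {
		return fmt.Errorf("unable to find txn with start ts: %d", startTs)
	}
	return nil
}

func main() {
	p := newPendingTxns()
	const moveTs = 1419795 // start ts taken from the panic message

	p.start(moveTs)
	fmt.Println("before cleanup:", p.apply(moveTs))

	// A long-running move outlives the cleanup cutoff, so its txn is dropped.
	p.abortOlderThan(moveTs + 1)
	fmt.Println("after cleanup:", p.apply(moveTs))
}
```

If something like this is what happens, any move that runs longer than the cleanup interval would hit the assertion on the target group, which would fit with the ~9m move succeeding and the 20m+ move failing.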