Moved from GitHub dgraph/4960
Posted by JimWen:
What version of Dgraph are you using?
Dgraph version : v1.2.1
Dgraph SHA-256 : 3f18ff84570b2944f4d75f6f508d55d902715c7ca2310799cc2991064eb046f8
Commit SHA-1 : ddcda9296
Commit timestamp : 2020-02-06 15:31:05 -0800
Branch : HEAD
Go version : go1.13.5
Have you tried reproducing the issue with the latest release?
With v1.2.1
What is the hardware spec (RAM, OS)?
128G mem & 1.8T SSD
Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019
Steps to reproduce the issue (command/config used to run Dgraph).
3 nodes cluster
zero rebalance_interval is set here to disable auto rebalance
nohup $dgpath zero --my=$local_ip:$port -o $offset --idx $index --replicas $replica --config $cfpath --cwd zero$index --rebalance_interval 12000h > ./log/zero$index.log 2>&1 &
alpha
nohup $dgpath alpha --snapshot_after=1000 --abort_older_than 15m --my=$local_ip:$port -o $offset --zero=$zero_1:5080 --config $cfpath --cwd alpha$index > ./log/alpha$index.log 2>&1 &
mutation through api or liveload with about 10 billion nquards
comand line is like :
./bin/dgraph live -b 1000 -c 20 -f ./data/2020-03-01 --format rdf -a "172.21.46.35:9080,172.21.46.39:9080,172.21.46.40:9080"
Expected behaviour and actual result.
The server works fine at first, and then it is in a suspended state for a long time , then it resumes, and then suspends like this…
The log is as followings:
I0318 12:14:37.877077 272392 log.go:34] Rolling up Sent 4905402 keys
I0318 12:14:37.877101 272392 draft.go:1102] Rolled up 4905396 keys. Done
I0318 12:14:37.877110 272392 draft.go:445] List rollup at Ts 10799123: OK.
I0318 12:14:37.878363 272392 predicate_move.go:195] Was instructed to delete tablet: dgraph.password
I0318 12:14:37.878618 272392 index.go:1083] Dropping predicate: [dgraph.password]
I0318 12:14:37.878632 272392 log.go:34] Writes flushed. Stopping compactions now…
E0318 12:15:07.114528 272392 log.go:32] unable to process discardstats with error: failed to push discard stats to write channel: Writes are blocked, possibly due to DropAll or Close
I0318 12:15:07.293204 272392 log.go:34] LOG Compact 1->2, del 9 tables, add 9 tables, took 41.192207806s
I0318 12:15:07.293262 272392 log.go:34] Compaction for level: 1 DONE
I0318 12:15:07.293311 272392 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:}
I0318 12:15:07.293339 272392 log.go:34] Running for level: 0
I0318 12:15:24.745589 272392 log.go:34] LOG Compact 0->1, del 9 tables, add 4 tables, took 17.45222122s
I0318 12:15:24.745642 272392 log.go:34] Compaction for level: 0 DONE
I0318 12:15:24.802984 272392 log.go:34] Got compaction priority: {level:0 score:1.74 dropPrefix:[0 0 15 100 103 114 97 112 104 46 112 97 115 115 119 111 114 100]}
I0318 12:15:24.803061 272392 log.go:34] Running for level: 0
I0318 12:15:39.769220 272392 draft.go:1353] Skipping snapshot at index: 10672779. Insufficient discard entries: 1. MinPendingStartTs: 10825054
I0318 12:15:41.545892 272392 log.go:34] LOG Compact 0->1, del 6 tables, add 4 tables, took 16.742809109s
I0318 12:15:41.545935 272392 log.go:34] Compaction for level: 0 DONE
I0318 12:15:43.236355 272392 log.go:34] LOG Compact 1->1, del 1 tables, add 1 tables, took 1.690368584s
I0318 12:15:46.814592 272392 log.go:34] LOG Compact 2->2, del 1 tables, add 1 tables, took 3.578099591s
I0318 12:15:51.135498 272392 log.go:34] LOG Compact 3->3, del 1 tables, add 1 tables, took 4.320559547s
I0318 12:15:51.135522 272392 log.go:34] DropPrefix done
I0318 12:15:51.135540 272392 log.go:34] Resuming writes
It seems to be an old problem, but it is still there, and it totally affetct the online business and performace.