What version of Dgraph are you using?
20.07.02
java client: 20.03.2
Have you tried reproducing the issue with the latest release?
This is the latest release
What is the hardware spec (RAM, OS)?
os: linux
cluster info : 3 machines with 140G memory and 16 cores
Steps to reproduce the issue (command/config used to run Dgraph).
zero
dgraph zero --my xxxx:5080 --replicas 3
alpha
dgraph alpha --whitelist 10.4.0.0:10.5.0.0 --lru_mb 42000 --my 10.4.31.22:7080 --zero 10.4.31.22:5080 --badger.vlog=disk --pending_proposals 32
I have 600 millions users and users relations will to insert dgraph. I have run my program 3 days. But I find mutations very slow. I use 5 threads and batch size 100 to upsert dgraph.
I finds some logs below.
A machine
I1120 09:47:41.694395 7160 log.go:34] 1 is starting a new election at term 606
I1120 09:47:41.694437 7160 log.go:34] 1 became pre-candidate at term 606
I1120 09:47:41.694447 7160 log.go:34] 1 received MsgPreVoteResp from 1 at term 606
I1120 09:47:41.694597 7160 log.go:34] 1 [logterm: 606, index: 2791343] sent MsgPreVote request to 2 at term 606
I1120 09:47:41.694705 7160 log.go:34] 1 [logterm: 606, index: 2791343] sent MsgPreVote request to 3 at term 606
I1120 09:47:41.694767 7160 log.go:34] raft.node: 1 lost leader 3 at term 606
I1120 09:47:41.762206 7160 log.go:34] 1 received MsgPreVoteResp from 2 at term 606
I1120 09:47:41.762257 7160 log.go:34] 1 [quorum:2] has received 2 MsgPreVoteResp votes and 0 vote rejections
I1120 09:47:41.762284 7160 log.go:34] 1 became candidate at term 607
I1120 09:47:41.762294 7160 log.go:34] 1 received MsgVoteResp from 1 at term 607
I1120 09:47:41.762469 7160 log.go:34] 1 [logterm: 606, index: 2791343] sent MsgVote request to 2 at term 607
I1120 09:47:41.762589 7160 log.go:34] 1 [logterm: 606, index: 2791343] sent MsgVote request to 3 at term 607
W1120 09:47:42.090461 7160 draft.go:1245] Raft.Ready took too long to process: Timer Total: 328ms. Breakdown: [{sync 9ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: true
I1120 09:47:42.114617 7160 log.go:34] 1 received MsgVoteResp from 2 at term 607
I1120 09:47:42.114911 7160 log.go:34] 1 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
I1120 09:47:42.115081 7160 log.go:34] 1 became leader at term 607
I1120 09:47:42.115675 7160 log.go:34] raft.node: 1 elected leader 1 at term 607
I1120 09:47:42.594558 7160 groups.go:888] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I1120 09:47:43.022847 7160 groups.go:900] Got Zero leader: 10.4.31.22:5080
W1120 09:47:50.196476 7160 draft.go:1245] Raft.Ready took too long to process: Timer Total: 202ms. Breakdown: [{disk 202ms} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: false
W1120 09:48:24.337345 7160 draft.go:1245] Raft.Ready took too long to process: Timer Total: 346ms. Breakdown: [{proposals 342ms} {disk 0s} {advance 0s}] Num entries: 0. MustSync: false
I1120 09:48:29.668724 7160 log.go:34] 1 [logterm: 607, index: 2791418, vote: 1] rejected MsgPreVote from 3 [logterm: 607, index: 2791399] at term 607
W1120 09:48:37.431686 7160 draft.go:1245] Raft.Ready took too long to process: Timer Total: 525ms. Breakdown: [{disk 524ms} {sync 1ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I1120 09:48:38.657856 7160 groups.go:956] Zero leadership changed. Renewing oracle delta stream.
E1120 09:48:38.697401 7160 groups.go:932] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
I1120 09:48:39.582348 7160 groups.go:888] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I1120 09:48:39.582391 7160 groups.go:900] Got Zero leader: 10.4.31.22:5080
I1120 09:48:45.909440 7160 log.go:34] 1 [logterm: 607, index: 2791454, vote: 1] cast MsgPreVote for 2 [logterm: 607, index: 2791454] at term 607
W1120 09:48:53.373391 7160 draft.go:1245] Raft.Ready took too long to process: Timer Total: 689ms. Breakdown: [{proposals 689ms} {disk 0s} {advance 0s}] Num entries: 0. MustSync: false
W1120 09:49:28.170497 7160 draft.go:1245] Raft.Ready took too long to process: Timer Total: 534ms. Breakdown: [{advance 461ms} {disk 73ms} {proposals 0s}] Num entries: 0. MustSync: false
B machine
W1120 09:47:03.353028 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 299ms. Breakdown: [{disk 159ms} {proposals 139ms} {advance 0s}] Num entries: 0. MustSync: false
I1120 09:47:10.948928 1889 log.go:34] 2 [logterm: 605, index: 2791287, vote: 2] rejected MsgPreVote from 3 [logterm: 605, index: 2791285] at term 605
W1120 09:47:26.909217 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 234ms. Breakdown: [{disk 234ms} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: false
I1120 09:47:30.088970 1889 log.go:34] 2 [logterm: 605, index: 2791323, vote: 2] cast MsgPreVote for 3 [logterm: 605, index: 2791323] at term 605
I1120 09:47:30.089031 1889 log.go:34] 2 [term: 605] received a MsgVote message with higher term from 3 [term: 606]
I1120 09:47:30.089053 1889 log.go:34] 2 became follower at term 606
I1120 09:47:30.089272 1889 log.go:34] 2 [logterm: 605, index: 2791323, vote: 0] cast MsgVote for 3 [logterm: 605, index: 2791323] at term 606
I1120 09:47:30.089294 1889 log.go:34] raft.node: 2 lost leader 2 at term 606
I1120 09:47:30.089529 1889 log.go:34] raft.node: 2 elected leader 3 at term 606
E1120 09:47:30.294109 1889 groups.go:995] No longer the leader of group 1. Exiting
E1120 09:47:30.336331 1889 groups.go:932] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
I1120 09:47:41.722897 1889 log.go:34] 2 [logterm: 606, index: 2791343, vote: 3] cast MsgPreVote for 1 [logterm: 606, index: 2791343] at term 606
I1120 09:47:42.057312 1889 log.go:34] 2 [term: 606] received a MsgVote message with higher term from 1 [term: 607]
I1120 09:47:42.057361 1889 log.go:34] 2 became follower at term 607
I1120 09:47:42.057510 1889 log.go:34] 2 [logterm: 606, index: 2791343, vote: 0] cast MsgVote for 1 [logterm: 606, index: 2791343] at term 607
I1120 09:47:42.057523 1889 log.go:34] raft.node: 2 lost leader 3 at term 607
I1120 09:47:42.166533 1889 log.go:34] raft.node: 2 elected leader 1 at term 607
W1120 09:48:01.292605 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 700ms. Breakdown: [{disk 692ms} {sync 8ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I1120 09:48:29.668536 1889 log.go:34] 2 [logterm: 607, index: 2791418, vote: 1] rejected MsgPreVote from 3 [logterm: 607, index: 2791399] at term 607
W1120 09:48:42.081009 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 378ms. Breakdown: [{proposals 338ms} {disk 40ms} {advance 0s}] Num entries: 0. MustSync: false
I1120 09:48:45.168502 1889 log.go:34] 2 is starting a new election at term 607
I1120 09:48:45.168544 1889 log.go:34] 2 became pre-candidate at term 607
I1120 09:48:45.168561 1889 log.go:34] 2 received MsgPreVoteResp from 2 at term 607
I1120 09:48:45.293819 1889 log.go:34] 2 [logterm: 607, index: 2791454] sent MsgPreVote request to 1 at term 607
I1120 09:48:45.293926 1889 log.go:34] 2 [logterm: 607, index: 2791454] sent MsgPreVote request to 3 at term 607
I1120 09:48:45.293984 1889 log.go:34] raft.node: 2 lost leader 1 at term 607
I1120 09:48:45.294093 1889 log.go:34] 2 became follower at term 607
I1120 09:48:45.294168 1889 log.go:34] raft.node: 2 elected leader 1 at term 607
W1120 09:48:57.260837 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 344ms. Breakdown: [{disk 304ms} {sync 39ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1120 09:49:16.897298 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 306ms. Breakdown: [{advance 11ms} {disk 0s} {proposals 0s}] Num entries: 0. MustSync: false
W1120 09:49:26.763856 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 218ms. Breakdown: [{proposals 203ms} {sync 15ms} {disk 0s} {advance 0s}] Num entries: 1. MustSync: true
W1120 09:50:06.468996 1889 draft.go:1245] Raft.Ready took too long to process: Timer Total: 227ms. Breakdown: [{disk 215ms} {sync 12ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I1120 09:50:18.471931 1889 draft.go:1545] Skipping snapshot at index: 2791442. Insufficient discard entries: 1. MinPendingStartTs: 3201118
C Machine
I1120 09:55:02.204864 3675 log.go:34] 3 [term: 609] ignored a MsgPreVoteResp message with lower term from 1 [term: 608]
I1120 09:55:02.204944 3675 log.go:34] 3 [term: 609] ignored a MsgVoteResp message with lower term from 1 [term: 608]
I1120 09:55:02.204985 3675 log.go:34] 3 [term: 609] ignored a MsgAppResp message with lower term from 1 [term: 608]
I1120 09:55:02.204997 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205049 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205067 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205097 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205109 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205146 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205158 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205200 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205213 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205243 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205254 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205296 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205310 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205348 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205360 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205397 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205409 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205448 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205460 3675 log.go:34] 3 [term: 609] ignored a MsgHeartbeatResp message with lower term from 1 [term: 608]
I1120 09:55:02.205677 3675 log.go:34] 3 [logterm: 609, index: 2792596, vote: 2] rejected MsgPreVote from 1 [logterm: 608, index: 2792593] at term 609
E1120 09:55:02.248389 3675 groups.go:995] No longer the leader of group 1. Exiting
E1120 09:55:02.250683 3675 groups.go:932] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
W1120 09:55:19.835225 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 649ms. Breakdown: [{disk 565ms} {sync 84ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I1120 09:55:25.305845 3675 log.go:34] 3 is starting a new election at term 609
I1120 09:55:25.305914 3675 log.go:34] 3 became pre-candidate at term 609
I1120 09:55:25.305925 3675 log.go:34] 3 received MsgPreVoteResp from 3 at term 609
I1120 09:55:25.306072 3675 log.go:34] 3 [logterm: 609, index: 2792638] sent MsgPreVote request to 2 at term 609
I1120 09:55:25.306159 3675 log.go:34] 3 [logterm: 609, index: 2792638] sent MsgPreVote request to 1 at term 609
I1120 09:55:25.306213 3675 log.go:34] raft.node: 3 lost leader 2 at term 609
I1120 09:55:25.721993 3675 log.go:34] 3 received MsgPreVoteResp from 1 at term 609
I1120 09:55:25.722050 3675 log.go:34] 3 [quorum:2] has received 2 MsgPreVoteResp votes and 0 vote rejections
I1120 09:55:25.722077 3675 log.go:34] 3 became candidate at term 610
I1120 09:55:25.722085 3675 log.go:34] 3 received MsgVoteResp from 3 at term 610
I1120 09:55:25.722193 3675 log.go:34] 3 [logterm: 609, index: 2792638] sent MsgVote request to 1 at term 610
I1120 09:55:25.722279 3675 log.go:34] 3 [logterm: 609, index: 2792638] sent MsgVote request to 2 at term 610
I1120 09:55:25.985965 3675 log.go:34] 3 received MsgVoteResp from 1 at term 610
I1120 09:55:25.986018 3675 log.go:34] 3 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
I1120 09:55:25.986048 3675 log.go:34] 3 became leader at term 610
I1120 09:55:25.986193 3675 log.go:34] raft.node: 3 elected leader 3 at term 610
W1120 09:55:30.034861 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 3.973s. Breakdown: [{advance 3.749s} {proposals 216ms} {sync 8ms} {disk 0s}] Num entries: 1. MustSync: true
I1120 09:55:30.035166 3675 groups.go:888] Leader idx=0x3 of group=1 is connecting to Zero for txn updates
I1120 09:55:30.035182 3675 groups.go:900] Got Zero leader: 10.4.31.22:5080
I1120 09:55:30.038150 3675 log.go:34] 3 [logterm: 610, index: 2792639, vote: 3] cast MsgPreVote for 2 [logterm: 610, index: 2792639] at term 610
I1120 09:55:30.038189 3675 log.go:34] 3 [term: 610] received a MsgVote message with higher term from 2 [term: 611]
I1120 09:55:30.038211 3675 log.go:34] 3 became follower at term 611
I1120 09:55:30.038339 3675 log.go:34] 3 [logterm: 610, index: 2792639, vote: 0] cast MsgVote for 2 [logterm: 610, index: 2792639] at term 611
I1120 09:55:30.038360 3675 log.go:34] raft.node: 3 lost leader 3 at term 611
I1120 09:55:30.074023 3675 log.go:34] raft.node: 3 elected leader 2 at term 611
E1120 09:55:30.263403 3675 groups.go:995] No longer the leader of group 1. Exiting
E1120 09:55:30.263834 3675 groups.go:932] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
W1120 09:55:30.350105 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 312ms. Breakdown: [{sync 240ms} {disk 72ms} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: true
W1120 09:56:15.523927 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 261ms. Breakdown: [{disk 212ms} {sync 49ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1120 09:57:04.805294 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 207ms. Breakdown: [{sync 205ms} {proposals 1ms} {disk 0s} {advance 0s}] Num entries: 2. MustSync: true
I1120 09:58:21.062441 3675 log.go:34] Storing value log head: {Fid:1215 Len:29 Offset:9139140}
W1120 09:58:21.090172 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 416ms. Breakdown: [{disk 416ms} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: false
W1120 09:58:40.452997 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 515ms. Breakdown: [{disk 510ms} {sync 5ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1120 09:58:51.801853 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 735ms. Breakdown: [{disk 726ms} {sync 9ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1120 09:59:08.857891 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 201ms. Breakdown: [{sync 201ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W1120 09:59:27.014183 3675 draft.go:1245] Raft.Ready took too long to process: Timer Total: 2.028s. Breakdown: [{disk 2.024s} {sync 4ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
from my program log I feel obvious pause. how can i do?