Zero rebalance_interval server write error predicate_move

example

(X Bee) #1

The log shows: predicate_move.go:228: Proposed 5680330 keys. Error: While proposing error: context deadline exceeded

Steps to reproduce:
1. Bulk-load 21million.rdf.gz to generate the postings (p) directories for the servers.
Command: dgraph bulk -r 21million.rdf.gz -s 21million.schema --map_shards=6 --reduce_shards=2 --http localhost:19000 --zero=localhost:15080
P.S. Only one Zero was used for the bulk load.
2. Set up a cluster with 3 Zeros and 3 servers.
Copy the bulk output into the postings directory of one server only.
Wait for rebalance_interval to elapse.

zero_1 logs

2018/10/30 12:57:34 tablet.go:179:
Groups sorted by size: [{gid:1 size:0} {gid:2 size:0} {gid:3 size:1421860684}]
2018/10/30 12:57:34 tablet.go:184: size_diff 1421860684
2018/10/30 12:57:34 tablet.go:73: Going to move predicate: [name], size: [691 MB] from group 3 to 1
2018/10/30 13:09:00 tablet.go:221: Got error during move: While calling MovePredicate: rpc error: code = Unknown desc = While proposing error: context deadline exceeded

gid 1 logs

2018/10/30 13:08:13 predicate_move.go:41: Writing 277629 keys
2018/10/30 13:08:27 predicate_move.go:41: Writing 277629 keys
2018/10/30 13:08:43 predicate_move.go:41: Writing 277629 keys
2018/10/30 13:09:00 predicate_move.go:228: Proposed 5680330 keys. Error: While proposing error: context deadline exceeded

After that failure, Zero tried to rebalance from gid 3 to gid 2 instead.

zero_1 logs

2018/10/30 13:09:00 tablet.go:179:
Groups sorted by size: [{gid:2 size:0} {gid:1 size:31} {gid:3 size:1421866726}]
2018/10/30 13:09:00 tablet.go:184: size_diff 1421866726
2018/10/30 13:09:00 tablet.go:73: Going to move predicate: [name], size: [691 MB] from group 3 to 2
2018/10/30 13:12:48 raft.go:763: INFO: 1 [logterm: 2, index: 280, vote: 1] cast MsgPreVote for 3 [logterm: 2, index: 280] at term 2

gid 2 logs

2018/10/30 13:09:01 predicate_move.go:194: Got ReceivePredicate. Group: 2. Am leader: true
2018/10/30 13:09:01 predicate_move.go:143: Receiving predicate. Batching and proposing key values
2018/10/30 13:09:05 predicate_move.go:163: Predicate being received: name
2018/10/30 13:09:05 index.go:929: Dropping predicate: [name]
2018/10/30 13:09:05 schema.go:63: Deleting schema for predicate: [name]
2018/10/30 13:09:11 predicate_move.go:41: Writing 245493 keys
...
2018/10/30 13:28:59 predicate_move.go:41: Writing 717868 keys
2018/10/30 13:29:00 predicate_move.go:228: Proposed 5680330 keys. Error: context canceled
2018/10/30 13:29:01 predicate_move.go:194: Got ReceivePredicate. Group: 2. Am leader: true
2018/10/30 13:29:01 predicate_move.go:143: Receiving predicate. Batching and proposing key values
2018/10/30 13:29:04 predicate_move.go:163: Predicate being received: name

One hour later, zero_1 logged:

2018/10/30 14:09:01 tablet.go:179:
Groups sorted by size: [{gid:2 size:0} {gid:1 size:31} {gid:3 size:1421866726}]

The group sizes have not changed!
gid 2 also logged:

2018/10/30 14:09:01 predicate_move.go:228: Proposed 5680330 keys. Error: While proposing error: context deadline exceeded
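
For anyone comparing these Zero logs over time, here is a minimal Python sketch that extracts the group sizes from a "Groups sorted by size" line. The function names are hypothetical. The size_diff calculation (largest group minus smallest) is an assumption inferred from the two samples in this post, not taken from Zero's source.

```python
import re

# Matches entries such as {gid:3 size:1421860684} in a Zero log line.
GROUP_RE = re.compile(r"\{gid:(\d+) size:(\d+)\}")


def parse_group_sizes(line):
    """Return {gid: size} parsed from a 'Groups sorted by size' log line."""
    return {int(gid): int(size) for gid, size in GROUP_RE.findall(line)}


def size_diff(sizes):
    """Assumed definition: largest group size minus smallest group size."""
    return max(sizes.values()) - min(sizes.values())


line = "Groups sorted by size: [{gid:1 size:0} {gid:2 size:0} {gid:3 size:1421860684}]"
sizes = parse_group_sizes(line)
print(sizes, size_diff(sizes))
```

If the printed difference stays the same across rebalance intervals, as it does between 13:09 and 14:09 above, no predicate has actually moved.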

(Daniel Mai) #2

For the bulk loader, you should set the --reduce_shards option to the number of groups in your cluster. If you have 3 Alphas (Servers) with a replication of 1, then you have 3 groups. If there’s a replication of 3, there’s 1 group.

Once the bulk load is finished you must copy the p directories to each member of the group for the correct group. The directory ./out/0/p/ is the p directory for the first group, ./out/1/p/ for the second group, and so on.
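
As a rough illustration of that rule, the Python sketch below computes the group count and the p directory for each group. The helper names `group_count` and `shard_dirs` are hypothetical and not part of Dgraph. It assumes the number of Alphas is an exact multiple of the replication factor and that group IDs start at 1.

```python
def group_count(alphas, replicas):
    """Number of Alpha groups, i.e. the value to use for --reduce_shards."""
    if alphas <= 0 or replicas <= 0:
        raise ValueError("alphas and replicas must be positive")
    if alphas % replicas != 0:
        # Assumption of this sketch: every group is fully populated.
        raise ValueError("this sketch assumes alphas is a multiple of replicas")
    return alphas // replicas


def shard_dirs(groups, out_dir="./out"):
    """Map each group ID (assumed to start at 1) to its bulk-loader p directory."""
    return {gid: f"{out_dir}/{gid - 1}/p/" for gid in range(1, groups + 1)}


print(group_count(3, 1))              # 3 Alphas, replication 1
print(group_count(3, 3))              # 3 Alphas, replication 3
print(shard_dirs(group_count(3, 1)))
```

With 3 Alphas and a replication of 3 this gives a single group, so the bulk load should produce a single shard.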


(X Bee) #3

From your description and this article, I generally understand the relationship between replicas and groups. My question is about the mapping between Alphas and groups: can an Alpha be a member of only one group, set by the --idx parameter? If it is not specified, Raft assigns it, so that 3 Alphas with replicas set to 1 are divided into 3 groups.
My understanding is that a cluster of three Alphas with 3 replicas and 2 groups is not feasible. But with 4 Alphas, could that requirement be met? Is that correct?
That would mean bulk loading with reduce_shards=2:
2 Alphas form groups_1 (Alphas_1, Alphas_2). The directory ./out/0/p/ becomes the postings of Alphas_1 in the first group; Alphas_2 gets no postings.
2 Alphas form groups_2 (Alphas_3, Alphas_4). The directory ./out/1/p/ becomes the postings of Alphas_3 in the second group; Alphas_4 gets no postings.
The 3 Zeros are set with --replicas=3,4.


(Daniel Mai) #4

An Alpha can only be part of one group at a time, and each group must have an odd number of members to form a quorum.

The bulk-load p directories must be copied over to each replica. For example, if you have three Alpha replicas in Group 1, you’d make three copies of out/0/p—one for each Alpha.
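
The copy step could look like the following Python sketch. The function name and the replica directory layout are hypothetical; it only assumes that each Alpha reads its postings from a p directory inside its own data directory.

```python
import shutil
from pathlib import Path


def copy_shard_to_replicas(shard_p, replica_dirs):
    """Copy one bulk-loader p directory into every replica's data directory."""
    if len(replica_dirs) % 2 == 0:
        raise ValueError("a group needs an odd number of members to form a quorum")
    copied = []
    for replica in replica_dirs:
        target = Path(replica) / "p"
        # copytree raises FileExistsError if the target p directory already exists,
        # which protects against overwriting an Alpha that already has data.
        shutil.copytree(shard_p, target)
        copied.append(target)
    return copied
```

For Group 1 with three replicas, this would be called as `copy_shard_to_replicas("out/0/p", ["alpha1", "alpha2", "alpha3"])`, where the three directory names are placeholders for wherever each Alpha keeps its data.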


(X Bee) #5

Thanks. I generally understand now: I had put one postings directory from the bulk output into a single Alpha's p directory. But when I set replicas to 3 so that the three Alphas form 1 group, and then kill one or even two of the processes, I can still get the data by querying the last Alpha from Ratel. Is there a problem with the way I am accessing it? So far I have not used a client to access the Dgraph cluster. Does the client connect to the Zero cluster or to the Alpha cluster?


(Arthur Wiebe) #6

I just got caught by this wondering why I got a different schema depending on which server I connected to.
I was using the bulk loader for the first time and assumed that since I have 3 replicas, --reduce_shards should be 3. But I don’t even have the --idx flag set, so all 3 replicas are in the same group. From what you’re saying, I should reduce to 1 shard and copy that same directory to each of the replicas.

Would it make sense to add a note about this to the documentation on bulk loader?


(Daniel Mai) #7

The docs will be updated to make it clearer that the --reduce_shards option determines the number of Alpha groups, not the total number of Alpha instances running.


(Daniel Mai) #8

@artooro The bulk loader docs have been updated (currently the master docs): https://docs.dgraph.io/master/deploy/#bulk-loader