Zero rebalance_interval server write error predicate_move

xesdiny · October 30, 2018, 6:32am

The log shows: predicate_move.go:228: Proposed 5680330 keys. Error: While proposing error: context deadline exceeded

Steps to reproduce:
1. Bulk load 21million.rdf.gz to produce the postings files the server needs.
Command: dgraph bulk -r 21million.rdf.gz -s 21million.schema --map_shards=6 --reduce_shards=2 --http localhost:19000 --zero=localhost:15080
P.S. Only one Zero was in use at this point.
2. Set up a cluster of 3 Zeros and 3 servers.
Use the bulk load result as the postings of one server.
Wait for rebalance_interval to elapse.

zero_1 logs

2018/10/30 12:57:34 tablet.go:179:
Groups sorted by size: [{gid:1 size:0} {gid:2 size:0} {gid:3 size:1421860684}]
2018/10/30 12:57:34 tablet.go:184: size_diff 1421860684
2018/10/30 12:57:34 tablet.go:73: Going to move predicate: [name], size: [691 MB] from group 3 to 1
2018/10/30 13:09:00 tablet.go:221: Got error during move: While calling MovePredicate: rpc error: code = Unknown desc = While proposing error: context deadline exceeded

gid 1 logs

2018/10/30 13:08:13 predicate_move.go:41: Writing 277629 keys
2018/10/30 13:08:27 predicate_move.go:41: Writing 277629 keys
2018/10/30 13:08:43 predicate_move.go:41: Writing 277629 keys
2018/10/30 13:09:00 predicate_move.go:228: Proposed 5680330 keys. Error: While proposing error: context deadline exceeded

After that, the rebalance was attempted from gid 3 to gid 2.

zero_1 logs

2018/10/30 13:09:00 tablet.go:179:
Groups sorted by size: [{gid:2 size:0} {gid:1 size:31} {gid:3 size:1421866726}]
2018/10/30 13:09:00 tablet.go:184: size_diff 1421866726
2018/10/30 13:09:00 tablet.go:73: Going to move predicate: [name], size: [691 MB] from group 3 to 2
2018/10/30 13:12:48 raft.go:763: INFO: 1 [logterm: 2, index: 280, vote: 1] cast MsgPreVote for 3 [logterm: 2, index: 280] at term 2

gid 2 logs

2018/10/30 13:09:01 predicate_move.go:194: Got ReceivePredicate. Group: 2. Am leader: true
2018/10/30 13:09:01 predicate_move.go:143: Receiving predicate. Batching and proposing key values
2018/10/30 13:09:05 predicate_move.go:163: Predicate being received: name
2018/10/30 13:09:05 index.go:929: Dropping predicate: [name]
2018/10/30 13:09:05 schema.go:63: Deleting schema for predicate: [name]
2018/10/30 13:09:11 predicate_move.go:41: Writing 245493 keys
...
2018/10/30 13:28:59 predicate_move.go:41: Writing 717868 keys
2018/10/30 13:29:00 predicate_move.go:228: Proposed 5680330 keys. Error: context canceled
2018/10/30 13:29:01 predicate_move.go:194: Got ReceivePredicate. Group: 2. Am leader: true
2018/10/30 13:29:01 predicate_move.go:143: Receiving predicate. Batching and proposing key values
2018/10/30 13:29:04 predicate_move.go:163: Predicate being received: name

One hour later, zero_1 logged:

2018/10/30 14:09:01 tablet.go:179:
Groups sorted by size: [{gid:2 size:0} {gid:1 size:31} {gid:3 size:1421866726}]

The data has not changed!
gid 2 also logged:

2018/10/30 14:09:01 predicate_move.go:228: Proposed 5680330 keys. Error: While proposing error: context deadline exceeded
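
The size_diff values in the zero_1 logs above are consistent with "largest group size minus smallest group size". The following is a minimal sketch for checking that against a pasted log line. The helper names are hypothetical, and the formula is an assumption inferred from the logged numbers, not taken from the Dgraph source.

```python
import re


def parse_group_sizes(line):
    """Return {gid: size} parsed from a 'Groups sorted by size' log line."""
    pairs = re.findall(r"gid:(\d+) size:(\d+)", line)
    return {int(gid): int(size) for gid, size in pairs}


def size_diff(sizes):
    # Assumption: the logged size_diff is max size minus min size,
    # which matches the values shown in this thread.
    return max(sizes.values()) - min(sizes.values())


line = "Groups sorted by size: [{gid:1 size:0} {gid:2 size:0} {gid:3 size:1421860684}]"
sizes = parse_group_sizes(line)
print(sizes, size_diff(sizes))
```

An unchanged size_diff between two rebalance attempts, as in the 13:09 and 14:09 entries, suggests that no predicate was actually moved in between.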

dmai · October 30, 2018, 10:08pm

For the bulk loader, you should set the --reduce_shards option to the number of groups in your cluster. If you have 3 Alphas (Servers) with a replication of 1, then you have 3 groups. If there’s a replication of 3, there’s 1 group.
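
As a rough illustration of that arithmetic, here is a minimal sketch that derives the group count (and so the value to pass to --reduce_shards) from the number of Alphas and the replication setting. The function name is hypothetical, and it assumes the Alphas divide evenly into groups; it rejects other layouts rather than guessing how they would be assigned.

```python
def alpha_groups(num_alphas, replicas):
    """Number of Alpha groups for a cluster, assuming an even split."""
    if num_alphas < 1 or replicas < 1:
        raise ValueError("num_alphas and replicas must be positive")
    if num_alphas % replicas != 0:
        # Simplification: uneven layouts are not modelled here.
        raise ValueError("Alphas do not divide evenly into groups")
    return num_alphas // replicas


print(alpha_groups(3, 1))  # 3 Alphas, replication 1
print(alpha_groups(3, 3))  # 3 Alphas, replication 3
```

For the cluster in this thread (3 Alphas, no replication), the sketch gives 3 groups, whereas the bulk load was run with --reduce_shards=2.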

Once the bulk load is finished you must copy the p directories to each member of the group for the correct group. The directory ./out/0/p/ is the p directory for the first group, ./out/1/p/ for the second group, and so on.

xesdiny · October 31, 2018, 3:56am

From your description, together with this article, I generally understand the relationship between replicas and groups. But about the mapping between Alphas and groups: can an Alpha be a member of only one group, set by the --idx parameter? And if it is not specified, will Raft assign the groups, so that 3 Alphas with 1 replica are divided into 3 groups?
My understanding is that a cluster of three Alphas with 3 replicas and 2 groups is not feasible. But with 4 Alphas, could that requirement be met? Is the following correct?
It would mean bulk loading with reduce_shards=2:
2 Alphas form group_1 (Alpha_1, Alpha_2). The directory ./out/0/p/ is the postings for Alpha_1 in the first group; Alpha_2 gets no postings.
2 Alphas form group_2 (Alpha_3, Alpha_4). The directory ./out/1/p/ is the postings for Alpha_3 in the second group; Alpha_4 gets no postings.
3 Zeros, set with --replicas=3,4.

dmai · October 31, 2018, 8:42pm

An Alpha can only be part of one group at a time, and each group must have an odd number of members to form a quorum.
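
To make the quorum point concrete, here is a minimal sketch of the usual majority arithmetic for a Raft-style group. The function names are hypothetical; the only assumption is that a quorum is a strict majority of the group's members.

```python
def quorum(members):
    """Smallest strict majority of a group with `members` members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be down while a majority is still available."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Under this arithmetic a 4-member group tolerates no more failures than a 3-member group, which is why odd group sizes are the natural choice.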

The bulk-load p directories must be copied over to each replica. For example, if you have three Alpha replicas in Group 1, you’d make three copies of out/0/p—one for each Alpha.
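
The copy step can be sketched as a plan of (source, destination) pairs: shard i of the bulk output goes to every Alpha in group i+1. This is only an illustration; the function name and the Alpha data directory paths are hypothetical, and the actual copying (scp, rsync, and so on) is left out.

```python
from pathlib import PurePosixPath


def copy_plan(out_dir, groups):
    """Build (src, dst) pairs for bulk output.

    groups[i] lists the data directories (hypothetical paths) of the
    Alphas in group i+1; each of them receives a copy of out_dir/i/p.
    """
    plan = []
    for shard, alpha_dirs in enumerate(groups):
        src = PurePosixPath(out_dir) / str(shard) / "p"
        for alpha_dir in alpha_dirs:
            plan.append((str(src), str(PurePosixPath(alpha_dir) / "p")))
    return plan


# One group with three replicas: out/0/p is copied three times.
for src, dst in copy_plan("out", [["/data/alpha1", "/data/alpha2", "/data/alpha3"]]):
    print(src, "->", dst)
```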

xesdiny · November 1, 2018, 3:39pm

Thanks, I now generally understand that the postings from the bulk load go into an Alpha’s p directory. But when I set replicas to 3 so that the three Alphas form 1 group, and then kill one or even two of the processes, I can still get the data by querying the last remaining Alpha through Ratel. Is there a problem with the way I am querying? So far I have not used a client to access the Dgraph cluster. Should the client connect to the Zero cluster, or keep accessing the Alpha cluster?

artooro · November 2, 2018, 9:46pm

I just got caught by this wondering why I got a different schema depending on which server I connected to.
I was using the bulk loader for the first time, and I assumed that since I have 3 replicas, --reduce_shards should be 3. But I don’t even have the --idx flag set, so all 3 replicas are in the same group. From what you’re saying, I should reduce to 1 shard and copy that same directory to each of the replicas.

Would it make sense to add a note about this to the documentation on bulk loader?

dmai · November 2, 2018, 10:21pm

The docs will be updated to be clearer that the --reduce_shards option determines the number of Alpha groups, not the total number of Alpha instances running.

dmai · November 5, 2018, 5:32pm

@artooro The bulk loader docs have been updated (currently the master docs): https://docs.dgraph.io/master/deploy/#bulk-loader
