What is the correct process for importing data using the bulk loader?

Dgraph version : v20.07.2
Dgraph codename : shuri-2
Dgraph SHA-256 : a927845127dab735c24727d5a24af411168771b55236aec50f0b987e8c0ac910
Commit SHA-1 : a7bc16d56
Commit timestamp : 2020-10-22 10:17:53 -0700
Branch : HEAD
Go version : go1.14.4

The bulk loader generated the out directory, and I copied the data to the Alphas.
After starting all the Alphas, I see that Zero is deleting my predicates.

I1211 07:01:29.490233      19 node.go:327] Group 0 found 0 entries
I1211 07:01:29.490305      19 log.go:34] 1 became follower at term 0
I1211 07:01:29.490386      19 log.go:34] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I1211 07:01:29.490411      19 log.go:34] 1 became follower at term 1
I1211 07:01:29.490622      19 run.go:374] Running Dgraph Zero...
E1211 07:01:29.491082      19 raft.go:456] While proposing CID: Not Zero leader. Aborting proposal: cid:"38255841-af27-4e2a-adc4-2b733dbc8211" . Retrying...
I1211 07:01:29.491132      19 node.go:186] Setting conf state to nodes:1
I1211 07:01:29.491842      19 raft.go:757] Done applying conf change at 0x1
I1211 07:01:30.355162      19 pool.go:160] CONNECTING to 192.168.3.11:5080
I1211 07:01:30.355182      19 node.go:587] Trying to add 0x2 to cluster. Addr: 192.168.3.11:5080
I1211 07:01:30.355190      19 node.go:588] Current confstate at 0x1: nodes:1
I1211 07:01:30.490894      19 log.go:34] 1 no leader at term 1; dropping index reading msg
I1211 07:01:32.090746      19 log.go:34] 1 is starting a new election at term 1
I1211 07:01:32.090774      19 log.go:34] 1 became pre-candidate at term 1
I1211 07:01:32.090829      19 log.go:34] 1 received MsgPreVoteResp from 1 at term 1
I1211 07:01:32.090863      19 log.go:34] 1 became candidate at term 2
I1211 07:01:32.090874      19 log.go:34] 1 received MsgVoteResp from 1 at term 2
I1211 07:01:32.090902      19 log.go:34] 1 became leader at term 2
I1211 07:01:32.090925      19 log.go:34] raft.node: 1 elected leader 1 at term 2
I1211 07:01:32.091001      19 raft.go:722] I've become the leader, updating leases.
I1211 07:01:32.091028      19 assign.go:42] Updated Lease id: 1. Txn Ts: 1
I1211 07:01:32.091860      19 node.go:186] Setting conf state to nodes:1 nodes:2
I1211 07:01:32.091919      19 raft.go:757] Done applying conf change at 0x1
I1211 07:01:32.091923      19 node.go:750] [0x2] Done joining cluster with err: <nil>
W1211 07:01:32.490897      19 node.go:675] [0x1] Read index context timed out
E1211 07:01:32.492094      19 raft.go:456] While proposing CID: Not Zero leader. Aborting proposal: cid:"092d68f8-0b34-4076-8e1c-9fef69f77fed" . Retrying...
I1211 07:01:35.493880      19 raft.go:449] CID set for cluster: 909e2059-ae11-4824-86fb-d02e16b59e62
I1211 07:01:35.495225      19 license_ee.go:45] Enterprise trial license proposed to the cluster: key:"z1-6233833293846793503" license:<maxNodes:18446744073709551615 expiryTs:1610262095 >
I1211 07:01:44.068743      19 pool.go:160] CONNECTING to 192.168.3.12:5080
I1211 07:01:44.068812      19 node.go:587] Trying to add 0x3 to cluster. Addr: 192.168.3.12:5080
I1211 07:01:44.068827      19 node.go:588] Current confstate at 0x1: nodes:1 nodes:2
I1211 07:01:44.070964      19 node.go:186] Setting conf state to nodes:1 nodes:2 nodes:3
I1211 07:01:44.071018      19 node.go:750] [0x3] Done joining cluster with err: <nil>
I1211 07:01:44.071030      19 raft.go:757] Done applying conf change at 0x1
I1211 07:02:56.168820      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:02:56.169947      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:02:56.171520      19 zero.go:422] Got connection request: group_id:1 addr:"192.168.3.9:7085" force_group_id:true
I1211 07:02:56.172500      19 pool.go:160] CONNECTING to 192.168.3.9:7085
I1211 07:02:56.174247      19 zero.go:574] Connected: id:1 group_id:1 addr:"192.168.3.9:7085" force_group_id:true
W1211 07:02:56.174549      19 pool.go:254] Connection lost with 192.168.3.9:7085. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:02:58.380906      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:02:58.381870      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:02:58.383003      19 zero.go:422] Got connection request: group_id:1 addr:"192.168.3.12:7085" force_group_id:true
I1211 07:02:58.383865      19 pool.go:160] CONNECTING to 192.168.3.12:7085
I1211 07:02:58.385285      19 zero.go:574] Connected: id:2 group_id:1 addr:"192.168.3.12:7085" force_group_id:true
W1211 07:02:58.385346      19 pool.go:254] Connection lost with 192.168.3.12:7085. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:03:13.387470      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:03:13.388543      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:03:13.390325      19 zero.go:422] Got connection request: group_id:1 addr:"192.168.3.11:7085" force_group_id:true
I1211 07:03:13.391422      19 pool.go:160] CONNECTING to 192.168.3.11:7085
I1211 07:03:13.392958      19 zero.go:574] Connected: id:3 group_id:1 addr:"192.168.3.11:7085" force_group_id:true
W1211 07:03:13.393311      19 pool.go:254] Connection lost with 192.168.3.11:7085. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:22.502151      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:04:22.503127      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:04:22.504500      19 zero.go:422] Got connection request: group_id:2 addr:"192.168.3.9:7086" force_group_id:true
I1211 07:04:22.505394      19 pool.go:160] CONNECTING to 192.168.3.9:7086
I1211 07:04:22.506839      19 zero.go:574] Connected: id:4 group_id:2 addr:"192.168.3.9:7086" force_group_id:true
W1211 07:04:22.507300      19 pool.go:254] Connection lost with 192.168.3.9:7086. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:30.207414      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:04:30.208426      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:04:30.210127      19 zero.go:422] Got connection request: group_id:3 addr:"192.168.3.9:7087" force_group_id:true
I1211 07:04:30.211150      19 pool.go:160] CONNECTING to 192.168.3.9:7087
I1211 07:04:30.212692      19 zero.go:574] Connected: id:5 group_id:3 addr:"192.168.3.9:7087" force_group_id:true
W1211 07:04:30.213012      19 pool.go:254] Connection lost with 192.168.3.9:7087. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:36.136624      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:04:36.137736      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:04:36.139102      19 zero.go:422] Got connection request: group_id:2 addr:"192.168.3.12:7086" force_group_id:true
I1211 07:04:36.140076      19 pool.go:160] CONNECTING to 192.168.3.12:7086
I1211 07:04:36.141496      19 zero.go:574] Connected: id:6 group_id:2 addr:"192.168.3.12:7086" force_group_id:true
W1211 07:04:36.141566      19 pool.go:254] Connection lost with 192.168.3.12:7086. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:37.032653      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:04:37.033780      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:04:37.035476      19 zero.go:422] Got connection request: group_id:2 addr:"192.168.3.11:7086" force_group_id:true
I1211 07:04:37.036415      19 pool.go:160] CONNECTING to 192.168.3.11:7086
W1211 07:04:37.037890      19 pool.go:254] Connection lost with 192.168.3.11:7086. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:37.037934      19 zero.go:574] Connected: id:7 group_id:2 addr:"192.168.3.11:7086" force_group_id:true
I1211 07:04:46.745022      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:04:46.746037      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:04:46.747465      19 zero.go:422] Got connection request: group_id:3 addr:"192.168.3.11:7087" force_group_id:true
I1211 07:04:46.748614      19 pool.go:160] CONNECTING to 192.168.3.11:7087
I1211 07:04:46.750215      19 zero.go:574] Connected: id:8 group_id:3 addr:"192.168.3.11:7087" force_group_id:true
W1211 07:04:46.750302      19 pool.go:254] Connection lost with 192.168.3.11:7087. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:51.074087      19 zero.go:422] Got connection request: cluster_info_only:true
I1211 07:04:51.074986      19 zero.go:440] Connected: cluster_info_only:true
I1211 07:04:51.076373      19 zero.go:422] Got connection request: group_id:3 addr:"192.168.3.12:7087" force_group_id:true
I1211 07:04:51.077396      19 pool.go:160] CONNECTING to 192.168.3.12:7087
W1211 07:04:51.079036      19 pool.go:254] Connection lost with 192.168.3.12:7087. Error: rpc error: code = Unknown desc = No node has been set up yet
I1211 07:04:51.079096      19 zero.go:574] Connected: id:9 group_id:3 addr:"192.168.3.12:7087" force_group_id:true
I1211 07:07:56.183525      19 zero.go:709] Tablet: Company.invest does not belong to group: 1. Sending delete instruction.
I1211 07:07:56.401964      19 zero.go:709] Tablet: Person.ownId does not belong to group: 1. Sending delete instruction.
I1211 07:08:00.718691      19 zero.go:709] Tablet: RlNode.rlid does not belong to group: 1. Sending delete instruction.
W1211 07:08:06.181112      19 zero.go:673] While deleting predicates: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1211 07:09:22.512973      19 zero.go:709] Tablet: Company.wasManager does not belong to group: 2. Sending delete instruction.
I1211 07:09:24.530768      19 zero.go:709] Tablet: Company.invested does not belong to group: 2. Sending delete instruction.
I1211 07:09:24.745519      19 zero.go:709] Tablet: Person.manager does not belong to group: 2. Sending delete instruction.
I1211 07:09:26.899405      19 zero.go:709] Tablet: Person.legal does not belong to group: 2. Sending delete instruction.
I1211 07:09:29.490426      19 tablet.go:208]

Groups sorted by size: [{gid:2 size:0} {gid:3 size:0} {gid:1 size:4443749634}]

I1211 07:09:29.490489      19 tablet.go:213] size_diff 4443749634
I1211 07:09:29.490500      19 tablet.go:213] size_diff 0
I1211 07:09:29.648343      19 zero.go:709] Tablet: Person.name does not belong to group: 2. Sending delete instruction.
W1211 07:09:32.511499      19 zero.go:673] While deleting predicates: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1211 07:17:29.490365      19 tablet.go:208]

Groups sorted by size: [{gid:2 size:0} {gid:3 size:0} {gid:1 size:4443749634}]

I1211 07:17:29.490405      19 tablet.go:213] size_diff 4443749634
I1211 07:17:29.490421      19 tablet.go:213] size_diff 0
I1211 07:25:29.490393      19 tablet.go:208]

Groups sorted by size: [{gid:2 size:0} {gid:3 size:0} {gid:1 size:4443749634}]

I1211 07:25:29.490440      19 tablet.go:213] size_diff 4443749634
I1211 07:25:29.490472      19 tablet.go:213] size_diff 0
I1211 07:33:29.490455      19 tablet.go:208]

Zero's /state endpoint shows that my schema is missing.
Should I wait, or is there a problem with the cluster?
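As a side note, Zero's /state endpoint (HTTP port 6080 by default) lists the tablets it tracks per group, so you can check whether your predicates survived without Ratel. A crude, jq-free sketch is below; the JSON here is a made-up illustration of the response shape, not real cluster state:

```shell
# In a real check you would first fetch the state, e.g.:
#   curl -s http://192.168.3.11:6080/state > state.json
# Illustrative sample of the response shape (two tablets in group 1):
cat > state.json <<'EOF'
{"groups":{"1":{"tablets":{"Company.name":{"predicate":"Company.name","groupId":1},"Company.invest":{"predicate":"Company.invest","groupId":1}}}}}
EOF
# List the predicates Zero tracks (plain grep, no jq required):
tr ',' '\n' < state.json | grep -o '"predicate":"[^"]*"' | sort -u
```

If none of your Type.field predicates show up here after the Alphas join, the schema really is gone from Zero's view, not just hidden in Ratel.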

dgraph bulk -f ./ --reduce_shards=3 --map_shards=6 --zero=192.168.3.11:5080 -s empty.schema -g graphql.schema --xidmap ./map --ignore_errors

I copied the p directory under the out/0, out/1, and out/2 shard directories to the alpha directories on nine nodes, one group for every three nodes, and then started the Alphas with the following command:

dgraph alpha --my=192.168.3.11:7080 --zero=192.168.3.9:5080,192.168.3.11:5080,192.168.3.12:5080 
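The copy step can be sketched as below. The hostnames, SSH user, and data path are placeholders for illustration only (the script just prints the commands rather than running them), and it follows the one-shard-per-group layout: out/0/p seeds group 1, out/1/p seeds group 2, and so on.

```shell
set -eu

# shard index -> target host; placeholders, substitute your own cluster
targets="0:192.168.3.9 1:192.168.3.11 2:192.168.3.12"

for pair in $targets; do
    shard=${pair%%:*}   # reduce shard index (0, 1, 2)
    host=${pair##*:}    # an Alpha belonging to groups 1, 2, 3 respectively
    # Each Alpha must find the p directory in its working dir before first start.
    echo "scp -r out/$shard/p dgraph@$host:/data/dgraph/p"
done
```

The key point is that a given reduce shard must not be split across groups; mixing shards within a group is what leads Zero to conclude tablets are in the wrong place.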

Please share the steps you followed for the bulk load, and describe in detail how you started the cluster.

I found that using an empty DQL schema together with the GraphQL schema causes data loss: all nodes end up empty, with only UIDs assigned.

With RDF data like the following, a load of around 400,000 triples works fine, but when the full 40 million are imported with the bulk loader, many empty nodes appear.

<997b85ba196f8dd29ae594911276302f> <RlNode.rlid> "997b85ba196f8dd29ae594911276302f" .
<997b85ba196f8dd29ae594911276302f> <Company.name> "恒润集团(香港)有限公司" .
<997b85ba196f8dd29ae594911276302f> <dgraph.type> "Company" .
<cb95ee9ab75d96710be5e601d8776d0e> <RlNode.rlid> "cb95ee9ab75d96710be5e601d8776d0e" .
<cb95ee9ab75d96710be5e601d8776d0e> <Company.name> "中瑞鑫达(深圳)科技有限公司" .
<cb95ee9ab75d96710be5e601d8776d0e> <dgraph.type> "Company" .
<cb95ee9ab75d96710be5e601d8776d0e> <Company.status> "存续" .
<cb95ee9ab75d96710be5e601d8776d0e> <Company.province> "44" .
<997b85ba196f8dd29ae594911276302f> <Company.invest> <cb95ee9ab75d96710be5e601d8776d0e> (role="投资",guquan="100.00%") .
<cb95ee9ab75d96710be5e601d8776d0e> <Company.invested> <997b85ba196f8dd29ae594911276302f> (role="投资",guquan="100.00%") .
<e5771e636aedf4764170311809d6cdaf> <RlNode.rlid> "e5771e636aedf4764170311809d6cdaf" .
<e5771e636aedf4764170311809d6cdaf> <Company.name> "武汉造纸厂" .
<e5771e636aedf4764170311809d6cdaf> <dgraph.type> "Company" .
<cb70a6fe12bbe432474bd588c6bcde7c> <RlNode.rlid> "cb70a6fe12bbe432474bd588c6bcde7c" .
<cb70a6fe12bbe432474bd588c6bcde7c> <Company.name> "武汉远东纸制品有限公司" .
<cb70a6fe12bbe432474bd588c6bcde7c> <dgraph.type> "Company" .
<cb70a6fe12bbe432474bd588c6bcde7c> <Company.status> "吊销" .
<e5771e636aedf4764170311809d6cdaf> <Company.invest> <cb70a6fe12bbe432474bd588c6bcde7c> (role="投资",guquan="45.00%") .
<cb70a6fe12bbe432474bd588c6bcde7c> <Company.invested> <e5771e636aedf4764170311809d6cdaf> (role="投资",guquan="45.00%") .

Here is my GraphQL schema:

interface RlNode {
    id: ID!
    rlid: String! @id @search(by: [hash])
}
type Company implements RlNode {
    name: String! @search(by: [fulltext])
    status: String
    province: String
    invest: [Company] @hasInverse(field: invested)
    invested: [Company] 
    wasLegal: Person 
    wasManager: [Person] 
    wasShareholder: [Person] 
    partner: [Company]
    wasPartner: Company
}
type Person implements RlNode {
    name: String! @search(by: [hash])
    legal:[Company] @hasInverse(field: wasLegal)
    manager: [Company] @hasInverse(field: wasManager)
    shareholder: [Company] @hasInverse(field: wasShareholder)
    pid: String @search(by: [hash])
    ownId: [String] @search(by: [hash])
    avatar: String
    introduction: String
}
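For reference, passing this schema via -g makes the bulk loader derive Type.field DQL predicates from it. A few of the entries it should produce are sketched below, based on the usual GraphQL-to-DQL mapping (@id fields get a hash index plus @upsert, @search(by: [fulltext]) becomes @index(fulltext)); this is not copied from the cluster, so verify against your cluster's actual schema:

```
# Sketch of the derived DQL schema (illustrative, not authoritative):
RlNode.rlid: string @index(hash) @upsert .
Company.name: string @index(fulltext) .
Company.invest: [uid] .
Person.ownId: [string] @index(hash) .
```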

I see that you have updated your original comment. Please let us know when you do that, or post it as a new answer, so we don't miss it. Once we have read the first posts, we tend not to read them again. You can simply say "I have updated my comment."

I’ll go through it and get back to you.

Are you using replicas 3? In that case you need 9 Alphas.
Update:
BTW, you should copy the p directory to a single Alpha in each group.

Yes, Zero is running with replicas set to 3.

After you start the cluster, does Dgraph create the DQL schema?

I cannot find my schema in Ratel.

Here is part of the Alpha log:

I1210 16:37:26.651327      18 groups.go:900] Got Zero leader: 192.168.3.9:5080
I1210 16:37:26.657526      18 groups.go:483] Serving tablet for: dgraph.graphql.schema
I1210 16:37:26.659825      18 groups.go:483] Serving tablet for: dgraph.rule.predicate
I1210 16:37:26.662378      18 groups.go:483] Serving tablet for: dgraph.rule.permission
I1210 16:37:26.664246      18 groups.go:483] Serving tablet for: dgraph.xid
I1210 16:37:26.666446      18 groups.go:483] Serving tablet for: dgraph.type
I1210 16:37:26.668254      18 groups.go:483] Serving tablet for: dgraph.acl.rule
I1210 16:37:26.670207      18 groups.go:483] Serving tablet for: dgraph.graphql.xid
I1210 16:37:26.672156      18 access_ee.go:372] ResetAcl closed
I1210 16:37:26.672170      18 access_ee.go:309] RefreshAcls closed
I1210 16:37:29.779685      18 log.go:34] LOG Compact 2->3, del 1 tables, add 2 tables, took 5.058106496s
I1210 16:37:29.779768      18 log.go:34] [Compactor: 1] Compaction for level: 2 DONE
I1210 16:37:30.550110      18 admin.go:616] Successfully loaded GraphQL schema.  Serving GraphQL API.
I1210 16:39:18.448662      18 pool.go:160] CONNECTING to 192.168.3.11:7085
I1210 16:39:18.448726      18 node.go:587] Trying to add 0x2 to cluster. Addr: 192.168.3.11:7085
I1210 16:39:18.448736      18 node.go:588] Current confstate at 0x1: nodes:1
I1210 16:39:18.449274      18 node.go:186] Setting conf state to nodes:1 nodes:2
I1210 16:39:18.449334      18 node.go:750] [0x2] Done joining cluster with err: <nil>
I1210 16:39:20.459249      18 draft.go:174] Operation started with id: opIndexing
I1210 16:39:20.459420      18 draft.go:118] Operation completed with id: opRollup
I1210 16:39:20.459472      18 draft.go:118] Operation completed with id: opIndexing
I1210 16:39:24.023059      18 draft.go:174] Operation started with id: opIndexing
I1210 16:39:24.023222      18 draft.go:118] Operation completed with id: opIndexing
I1210 16:39:24.142293      18 draft.go:174] Operation started with id: opIndexing
I1210 16:39:24.142515      18 draft.go:118] Operation completed with id: opIndexing
I1210 16:39:24.146986      18 draft.go:174] Operation started with id: opIndexing
I1210 16:39:24.147159      18 draft.go:118] Operation completed with id: opIndexing
I1210 16:39:30.459648      18 draft.go:174] Operation started with id: opRollup
I1210 16:40:58.156050      18 pool.go:160] CONNECTING to 192.168.3.12:7085
I1210 16:40:58.156081      18 node.go:587] Trying to add 0x3 to cluster. Addr: 192.168.3.12:7085
I1210 16:40:58.156090      18 node.go:588] Current confstate at 0x1: nodes:1 nodes:2
I1210 16:40:58.157930      18 node.go:186] Setting conf state to nodes:1 nodes:2 nodes:3
I1210 16:40:58.157998      18 node.go:750] [0x3] Done joining cluster with err: <nil>
I1210 16:41:02.990685      18 draft.go:174] Operation started with id: opIndexing
I1210 16:41:02.990758      18 draft.go:118] Operation completed with id: opRollup
I1210 16:41:02.990920      18 draft.go:118] Operation completed with id: opIndexing
I1210 16:41:04.031013      18 draft.go:174] Operation started with id: opIndexing
I1210 16:41:04.031169      18 draft.go:118] Operation completed with id: opIndexing
I1210 16:41:12.991180      18 draft.go:174] Operation started with id: opRollup
I1210 16:42:25.656875      18 predicate_move.go:195] Was instructed to delete tablet: Person.ownId
I1210 16:42:25.658649      18 index.go:1238] Dropping predicate: [Person.ownId]
I1210 16:42:25.658700      18 log.go:34] DropPrefix Called
I1210 16:42:25.658762      18 log.go:34] Writes flushed. Stopping compactions now...
I1210 16:42:25.658785      18 log.go:34] Storing value log head: {Fid:0 Len:29 Offset:548339152}
I1210 16:42:25.739420      18 log.go:34] Dropping prefix at level 3 (1 tableGroups)
I1210 16:42:30.029897      18 log.go:34] LOG Compact 3->3, del 91 tables, add 0 tables, took 4.290438424s
I1210 16:42:30.030107      18 log.go:34] [Compactor: 174] Running compaction: {level:0 score:1.74 dropPrefixes:[[0 0 12 80 101 114 115 111 110 46 111 119 110 73 100] [33 98 97 100 103 101 114 33 109 111 118 101 0 0 12 80 101 114 115 111 110 46 111 119 110 73 100]]} for level: 0
I1210 16:42:30.035270      18 log.go:34] LOG Compact 0->1, del 2 tables, add 1 tables, took 5.002157ms
I1210 16:42:30.035313      18 log.go:34] [Compactor: 174] Compaction for level: 0 DONE
I1210 16:42:30.035437      18 log.go:34] DropPrefix done
I1210 16:42:30.035452      18 log.go:34] Resuming writes
I1210 16:42:30.035494      18 schema.go:103] Deleting schema for predicate: [Person.ownId]
I1210 16:42:30.036721      18 predicate_move.go:195] Was instructed to delete tablet: RlNode.rlid
I1210 16:42:30.038190      18 index.go:1238] Dropping predicate: [RlNode.rlid]
I1210 16:42:30.038205      18 log.go:34] DropPrefix Called

I missed an important detail.
The reduce phase was interrupted by an out-of-memory error, so I re-ran the bulk loader skipping the map phase, which successfully generated the data.
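The resumed run can be sketched as below; --skip_map_phase reuses the existing map output and redoes only the reduce phase (confirm the flag name with `dgraph bulk --help` on your version). The script just prints the command rather than executing it:

```shell
# Resume sketch: same flags as the original run, plus --skip_map_phase.
cmd="dgraph bulk -f ./ --reduce_shards=3 --map_shards=6 \
--zero=192.168.3.11:5080 -s empty.schema -g graphql.schema \
--xidmap ./map --skip_map_phase --ignore_errors"
echo "$cmd"
```

Be aware that an interrupted reduce phase followed by a resumed run is exactly the kind of step worth mentioning up front, since a partially written out directory could explain the empty nodes.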

Do you still have the issue?

The problem is not solved.