Cannot find the data after a bulk load

I bulk loaded 1million.rdf.gz as follows. First, I started one Zero:

./dgraph zero  --my master:5080 --replicas 3

Then I bulk loaded the data with this command:

./dgraph bulk -f 1million.rdf.gz -s 1million.schema

The output is:

[syy@master Dgraph]$ ./dgraph bulk -f 1million.rdf.gz -s 1million.schema 
[Decoder]: Using assembly version of decoder
Page Size: 4096
I0712 10:23:36.247798   22040 init.go:107] 

Dgraph version   : v20.11.0
Dgraph codename  : tchalla
Dgraph SHA-256   : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1     : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph     , visit https://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.


I0712 10:23:36.248197   22040 util_ee.go:126] KeyReader instantiated of type <nil>
Encrypted input: false; Encrypted output: false
{
	"DataFiles": "1million.rdf.gz",
	"DataFormat": "",
	"SchemaFile": "1million.schema",
	"GqlSchemaFile": "",
	"OutDir": "./out",
	"ReplaceOutDir": false,
	"TmpDir": "tmp",
	"NumGoroutines": 1,
	"MapBufSize": 2147483648,
	"PartitionBufSize": 4194304,
	"SkipMapPhase": false,
	"CleanupTmp": true,
	"NumReducers": 1,
	"Version": false,
	"StoreXids": false,
	"ZeroAddr": "localhost:5080",
	"HttpAddr": "localhost:8080",
	"IgnoreErrors": false,
	"CustomTokenizers": "",
	"NewUids": false,
	"ClientDir": "",
	"Encrypted": false,
	"EncryptedOut": false,
	"MapShards": 1,
	"ReduceShards": 1,
	"EncryptionKey": null,
	"BadgerCompression": 1,
	"BadgerCompressionLevel": 0,
	"BlockCacheSize": 46976204,
	"IndexCacheSize": 20132659
}

The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Current max open files limit: 204800

Connecting to zero at localhost:5080
___ Begin jemalloc statistics ___
Version: "5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756"
Build-time option settings
  config.cache_oblivious: true
  config.debug: false
  config.fill: true
  config.lazy_lock: false
  config.malloc_conf: "background_thread:true,metadata_thp:auto"
  config.opt_safety_checks: false
  config.prof: true
  config.prof_libgcc: true
  config.prof_libunwind: false
  config.stats: true
  config.utrace: false
  config.xmalloc: false
Run-time option settings
  opt.abort: false
  opt.abort_conf: false
  opt.confirm_conf: false
  opt.retain: true
  opt.dss: "secondary"
  opt.narenas: 1
  opt.percpu_arena: "disabled"
  opt.oversize_threshold: 8388608
  opt.metadata_thp: "auto"
  opt.background_thread: true (background_thread: true)
  opt.dirty_decay_ms: 10000 (arenas.dirty_decay_ms: 10000)
  opt.muzzy_decay_ms: 0 (arenas.muzzy_decay_ms: 0)
  opt.lg_extent_max_active_fit: 6
  opt.junk: "false"
  opt.zero: false
  opt.tcache: true
  opt.lg_tcache_max: 15
  opt.thp: "default"
  opt.prof: false
  opt.prof_prefix: "jeprof"
  opt.prof_active: true (prof.active: false)
  opt.prof_thread_active_init: true (prof.thread_active_init: false)
  opt.lg_prof_sample: 19 (prof.lg_sample: 0)
  opt.prof_accum: false
  opt.lg_prof_interval: -1
  opt.prof_gdump: false
  opt.prof_final: false
  opt.prof_leak: false
  opt.stats_print: false
  opt.stats_print_opts: ""
Profiling settings
  prof.thread_active_init: false
  prof.active: false
  prof.gdump: false
  prof.interval: 0
  prof.lg_sample: 0
Arenas: 2
Quantum size: 16
Page size: 4096
Maximum thread-cached size class: 32768
Number of bin size classes: 36
Number of thread-cache bin size classes: 41
Number of large size classes: 196
Allocated: 58168, active: 73728, metadata: 2369000 (n_thp 0), resident: 2392064, mapped: 6365184, retained: 2023424
Background threads: 1, num_runs: 1, run_interval: 0 ns
--- End jemalloc statistics ---
Processing file (1 out of 1): 1million.rdf.gz
[10:23:37+0800] MAP 01s nquad_count:45.97k err_count:0.000 nquad_speed:45.61k/sec edge_count:808.7k edge_speed:802.3k/sec jemalloc: 0 B 
[10:23:38+0800] MAP 02s nquad_count:197.0k err_count:0.000 nquad_speed:95.16k/sec edge_count:1.742M edge_speed:841.3k/sec jemalloc: 0 B 
[10:23:39+0800] MAP 03s nquad_count:331.1k err_count:0.000 nquad_speed:108.8k/sec edge_count:2.285M edge_speed:751.0k/sec jemalloc: 0 B 
[10:23:40+0800] MAP 04s nquad_count:477.9k err_count:0.000 nquad_speed:118.8k/sec edge_count:2.870M edge_speed:713.3k/sec jemalloc: 0 B 
[10:23:41+0800] MAP 05s nquad_count:601.3k err_count:0.000 nquad_speed:120.0k/sec edge_count:3.357M edge_speed:670.2k/sec jemalloc: 0 B 
[10:23:42+0800] MAP 06s nquad_count:731.4k err_count:0.000 nquad_speed:121.7k/sec edge_count:3.876M edge_speed:645.0k/sec jemalloc: 0 B 
[10:23:43+0800] MAP 07s nquad_count:875.2k err_count:0.000 nquad_speed:124.8k/sec edge_count:4.386M edge_speed:625.5k/sec jemalloc: 0 B 
[10:23:44+0800] MAP 08s nquad_count:1.042M err_count:0.000 nquad_speed:130.1k/sec edge_count:4.719M edge_speed:589.3k/sec jemalloc: 97 MiB 
[10:23:45+0800] MAP 09s nquad_count:1.042M err_count:0.000 nquad_speed:115.6k/sec edge_count:4.719M edge_speed:523.5k/sec jemalloc: 97 MiB 
[10:23:46+0800] MAP 10s nquad_count:1.042M err_count:0.000 nquad_speed:104.1k/sec edge_count:4.719M edge_speed:471.7k/sec jemalloc: 97 MiB 
[10:23:47+0800] MAP 11s nquad_count:1.042M err_count:0.000 nquad_speed:94.64k/sec edge_count:4.719M edge_speed:428.7k/sec jemalloc: 0 B 
[10:23:48+0800] MAP 12s nquad_count:1.042M err_count:0.000 nquad_speed:86.71k/sec edge_count:4.719M edge_speed:392.8k/sec jemalloc: 0 B 
[10:23:49+0800] MAP 13s nquad_count:1.042M err_count:0.000 nquad_speed:80.07k/sec edge_count:4.719M edge_speed:362.7k/sec jemalloc: 0 B 
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
badger 2021/07/12 10:23:49 INFO: All 0 tables opened in 0s
badger 2021/07/12 10:23:49 INFO: Discard stats nextEmptySlot: 0
badger 2021/07/12 10:23:49 INFO: Set nextTxnTs to 0
badger 2021/07/12 10:23:49 INFO: All 0 tables opened in 0s
badger 2021/07/12 10:23:49 INFO: Discard stats nextEmptySlot: 0
badger 2021/07/12 10:23:49 INFO: Set nextTxnTs to 0
badger 2021/07/12 10:23:49 INFO: DropAll called. Blocking writes...
badger 2021/07/12 10:23:49 INFO: Writes flushed. Stopping compactions now...
badger 2021/07/12 10:23:49 INFO: Deleted 0 SSTables. Now deleting value logs...
badger 2021/07/12 10:23:49 INFO: Value logs deleted. Creating value log file: 1
badger 2021/07/12 10:23:49 INFO: Deleted 1 value log files. DropAll done.
Num Encoders: 1
[10:23:50+0800] REDUCE 14s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 0. jemalloc: 192 MiB 
Final Histogram of buffer sizes: 
 -- Histogram: 
Min value: 185112336 
Max value: 185112336 
Mean: 185112336.00 
Count: 1 
[128 MiB, 256 MiB) 1 100.00% 
 --

[10:23:51+0800] REDUCE 15s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 176. jemalloc: 481 MiB 
[10:23:52+0800] REDUCE 16s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 176. jemalloc: 481 MiB 
[10:23:53+0800] REDUCE 17s 15.48% edge_count:730.5k edge_speed:243.5k/sec plist_count:277.2k plist_speed:92.41k/sec. Num Encoding MBs: 176. jemalloc: 644 MiB 
[10:23:54+0800] REDUCE 18s 76.42% edge_count:3.606M edge_speed:901.5k/sec plist_count:489.8k plist_speed:122.5k/sec. Num Encoding MBs: 176. jemalloc: 644 MiB 
[10:23:55+0800] REDUCE 19s 87.17% edge_count:4.114M edge_speed:820.9k/sec plist_count:784.3k plist_speed:156.5k/sec. Num Encoding MBs: 176. jemalloc: 644 MiB 
[10:23:56+0800] REDUCE 20s 98.11% edge_count:4.630M edge_speed:772.2k/sec plist_count:1.090M plist_speed:181.8k/sec. Num Encoding MBs: 176. jemalloc: 644 MiB 
[10:23:57+0800] REDUCE 21s 100.00% edge_count:4.719M edge_speed:669.4k/sec plist_count:1.179M plist_speed:167.2k/sec. Num Encoding MBs: 0. jemalloc: 1.5 GiB 
Finishing stream id: 1
Finishing stream id: 2
Finishing stream id: 3
badger 2021/07/12 10:23:58 INFO: Table created: 2 at level: 6 for stream: 2. Size: 613 kB
Finishing stream id: 4
badger 2021/07/12 10:23:58 INFO: Table created: 3 at level: 6 for stream: 3. Size: 464 kB
badger 2021/07/12 10:23:58 INFO: Table created: 4 at level: 6 for stream: 4. Size: 2.0 MB
badger 2021/07/12 10:23:58 INFO: Table created: 1 at level: 6 for stream: 1. Size: 41 MB
Finishing stream id: 5
Finishing stream id: 6
badger 2021/07/12 10:23:58 INFO: Table created: 6 at level: 6 for stream: 6. Size: 349 kB
badger 2021/07/12 10:23:58 INFO: Table created: 5 at level: 6 for stream: 5. Size: 3.6 MB
Finishing stream id: 7
[10:23:58+0800] REDUCE 22s 100.00% edge_count:4.719M edge_speed:588.2k/sec plist_count:1.179M plist_speed:147.0k/sec. Num Encoding MBs: 0. jemalloc: 1.9 GiB 
Finishing stream id: 8
Finishing stream id: 9
badger 2021/07/12 10:23:58 INFO: Table created: 7 at level: 6 for stream: 7. Size: 2.3 MB
badger 2021/07/12 10:23:58 INFO: Table created: 8 at level: 6 for stream: 8. Size: 3.0 MB
badger 2021/07/12 10:23:58 INFO: Table created: 9 at level: 6 for stream: 9. Size: 269 kB
Writing count index for "genre" rev=false
Writing count index for "starring" rev=false
Writing count index for "director.film" rev=false
Writing count index for "actor.film" rev=false
Writing split lists back to the main DB now
badger 2021/07/12 10:23:58 INFO: copying split keys to main DB Sending batch of size: 1.5 MB.
badger 2021/07/12 10:23:58 INFO: copying split keys to main DB Sent data of size 1.5 MiB
badger 2021/07/12 10:23:58 INFO: Table created: 15 at level: 6 for stream: 12. Size: 85 kB
badger 2021/07/12 10:23:58 INFO: Table created: 14 at level: 6 for stream: 16. Size: 204 kB
badger 2021/07/12 10:23:58 INFO: Table created: 13 at level: 6 for stream: 10. Size: 2.2 MB
badger 2021/07/12 10:23:58 INFO: Table created: 10 at level: 6 for stream: 13. Size: 222 kB
badger 2021/07/12 10:23:58 INFO: Table created: 11 at level: 6 for stream: 17. Size: 612 kB
badger 2021/07/12 10:23:58 INFO: Table created: 16 at level: 6 for stream: 14. Size: 99 kB
badger 2021/07/12 10:23:58 INFO: Table created: 12 at level: 6 for stream: 11. Size: 84 kB
badger 2021/07/12 10:23:58 INFO: Resuming writes
badger 2021/07/12 10:23:58 INFO: Lifetime L0 stalled for: 0s
badger 2021/07/12 10:23:58 INFO: 
Level 0 [ ]: NumTables: 01. Size: 996 B of 0 B. Score: 0.00->0.00 Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 5 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 6 [ ]: NumTables: 16. Size: 55 MiB of 55 MiB. Score: 0.00->0.00 Target FileSize: 4.0 MiB
Level Done
badger 2021/07/12 10:23:58 INFO: Lifetime L0 stalled for: 0s
badger 2021/07/12 10:23:58 INFO: 
Level 0 [ ]: NumTables: 01. Size: 1.5 MiB of 0 B. Score: 0.00->0.00 Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level Done
[10:23:58+0800] REDUCE 22s 100.00% edge_count:4.719M edge_speed:562.4k/sec plist_count:1.179M plist_speed:140.5k/sec. Num Encoding MBs: 0. jemalloc: 0 B 
Total: 22s

This generated an out directory containing a 0 directory. I then copied the out directory to the other two Alpha nodes and started all three Alphas. All three Alphas show the message:

Creating snapshot at Index: 91, ReadTs: 41

It seems the data has been bulk loaded into Dgraph, but I cannot retrieve it with a query. Is my load process correct?
The Dgraph version is 20.11.1, and I referred to this page.

Sounds like you copied the 0 directory, not the underlying p directory.

The 0 is the group number; if you had 4 groups, there would be 4 directories (0, 1, 2, 3), each containing a p directory.

The p directory is what should be copied into the working directory of dgraph alpha when it starts.

See here for more on bulk loading process.
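The copy step can be sketched like this (the directory names and the .sst file are stand-ins for your actual bulk output and Alpha working directories, not real paths from your setup):

```shell
# With a single group, the bulk loader writes ./out/0/p.
# Simulate that layout here so the copy step is concrete:
mkdir -p out/0/p && touch out/0/p/000001.sst

# Copy the p directory itself -- NOT out/0 -- into each replica's
# working directory (all replicas of group 0 get the same p):
for a in alpha1 alpha2 alpha3; do
  mkdir -p "$a"
  cp -r out/0/p "$a/p"
done

ls alpha1/p   # -> 000001.sst
```

Each Alpha is then started with its working directory pointing at that p directory (in v20.11 the postings directory flag is, if I recall correctly, `-p`/`--postings`).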


Thank you very much; that was the reason. I have another question: how do I bulk load triples with a datetime facet? I haven't found a sample. What is the format of the RDF file, and how do I write the schema?

This page includes information about RDF types and a date example, with a link to the data-types section of the RDF standard: https://dgraph.io/docs/mutations/language-rdf-types/

This page also includes an example of using datetimes in a facet: https://dgraph.io/docs/query-language/facets/

Facets are not included in the schema at all.
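For example (the predicate names here are hypothetical, not from the 1million set), a datetime facet is written on the edge in parentheses using RFC 3339 syntax without quotes, while the schema declares only the predicate:

```rdf
# Schema file -- the facet is NOT declared here:
#   friend: [uid] .
#
# RDF file -- the facet rides on the edge:
_:alice <name> "Alice" .
_:alice <friend> _:bob (since=2021-07-12T10:23:00) .
```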

For next time, I believe the forum maintainers would prefer that you open a new topic for each new question, so others can search for it and find the answer in the future.
