ogreso
(Ogre)
May 23, 2018, 2:13am
1
When I import data using dgraph bulk, I get the following errors, with about 2 TB of data in the tmp directory. What is the problem?
MAP 04m04s rdf_count:35.61M rdf_speed:145.9k/sec edge_count:618.2M edge_speed:2.532M/sec
MAP 04m05s rdf_count:35.68M rdf_speed:145.5k/sec edge_count:619.6M edge_speed:2.527M/sec
MAP 04m06s rdf_count:35.86M rdf_speed:145.7k/sec edge_count:622.9M edge_speed:2.530M/sec
MAP 04m07s rdf_count:36.03M rdf_speed:145.8k/sec edge_count:626.2M edge_speed:2.533M/sec
...skipping...
/home/travis/gopath/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:44 +0xf1
goroutine 5275814 [chan send]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xca713ab980, 0x21, 0xc8962751a0)
/home/travis/gopath/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:95 +0x359
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
/home/travis/gopath/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:44 +0xf1
goroutine 5275815 [chan send]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.readMapOutput(0xca713ab9e0, 0x21, 0xc896275200)
/home/travis/gopath/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:95 +0x359
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shuffler).run.func1
/home/travis/gopath/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/shuffle.go:44 +0xf1
rax 0x0
rbx 0x7f84a2b2b868
rcx 0xffffffffffffffff
rdx 0x6
rdi 0x44d5
rsi 0x44d6
rbp 0x145a8de
rsp 0x7f84a2762928
r8 0xa
r9 0x7f84a2763700
r10 0x8
r11 0x202
r12 0x7f82c80008c0
r13 0xf1
r14 0x11
r15 0x0
rip 0x7f84a279a277
rflags 0x202
cs 0x33
fs 0x0
gs 0x0
@mrjn @MichelDiz
ogreso
(Ogre)
May 24, 2018, 11:03am
2
Does anyone know what’s going on?
MichelDiz
(Michel Diz)
May 25, 2018, 10:53pm
3
Sorry for the delay. Please share more information:
your specs, the settings you used, your commands, and so on.
How many shufflers are you using in the bulk load? Do you have enough memory?
ogreso
(Ogre)
May 28, 2018, 2:26am
4
Hard disk capacity: 12 TB
Memory: 64 GB
Data: 171 GB
Data example:
<EB855DE> <Name> "0991j@**.com" .
<EB855DE> <Email> "0991j@**.com" .
<EB855DE> <EmailPassword> "******" .
<EB85BEF> <Name> "0992j@**.com" .
<EB85BEF> <Email> "0992j@**.com" .
<EB85BEF> <EmailPassword> "******" .
<EB85C07> <Name> "0993jb@**.com" .
<EB85C07> <Email> "0993jb@**.com" .
<EB85C07> <EmailPassword> "******" .
<EB85C0E> <Name> "0994jb@**.com" .
<EB85C0E> <Email> "0994jb@**.com" .
<EB85C0E> <EmailPassword> "******" .
Schema:
Name: string @index(term,fulltext,trigram) .
Email: string @index(term,fulltext,trigram) .
EmailPassword: string .
Command:
dgraph bulk -r all.rdf -s goldendata.schema --http localhost:8000 --zero=localhost:5080
MichelDiz
(Michel Diz)
May 28, 2018, 4:48am
5
Thanks for the information. I’ll try to investigate and will flag this to an engineer. I can’t promise a specific result on my part, because I’m not familiar with the Dgraph core code, and I don’t have access to your data to dig into (and it’s pretty big).
PS: Please let me know how you are running Dgraph Zero.
I can see where the problem happens, but I can’t tell whether it comes from Dgraph (which seems unlikely) or from a corrupted file (your RDF).
At first I thought it might be a capacity/specs problem. Dgraph needs free disk space of at least about two and a half times your data size for temporary files. Also, increasing the number of shufflers can cause high memory usage and processing load; a configuration error there could corrupt the load (in theory).
Could you write a Python script to split your RDF into smaller pieces? (I don’t know if there is already a tool available for this, but it’s a tip.) That would make it easier to find where the problem comes from, in case of corruption, and you could then run the bulk load in chunks as usual. Something like the sketch below might work.
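A minimal sketch of such a splitter, assuming the RDF is line-oriented N-Triples like the sample above (one triple per line); the chunk size, output file names, and the split_rdf helper are just illustrative choices, not an official Dgraph tool:

# Hypothetical helper script (not part of Dgraph): split a large,
# line-oriented RDF file into fixed-size chunks so each chunk can be
# loaded or checked on its own.
import os

def split_rdf(path, lines_per_chunk=10_000_000, out_dir="chunks"):
    # Write every lines_per_chunk lines of the source file into a new
    # part_XXXX.rdf file inside out_dir.
    os.makedirs(out_dir, exist_ok=True)
    out = None
    chunk_idx = 0
    line_count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as src:
        for line in src:
            if out is None or line_count >= lines_per_chunk:
                if out is not None:
                    out.close()
                chunk_idx += 1
                line_count = 0
                out = open(os.path.join(out_dir, "part_%04d.rdf" % chunk_idx),
                           "w", encoding="utf-8")
            out.write(line)
            line_count += 1
    if out is not None:
        out.close()

if __name__ == "__main__":
    split_rdf("all.rdf")

The standard Unix split tool would do the same job without a script, e.g. split -l 10000000 all.rdf part_ , if you prefer.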
But I’ll see who can shed some light on this for you.
Cheers.
ogreso
(Ogre)
May 28, 2018, 6:21am
6
I run Dgraph Zero just as in the docs:
nohup dgraph zero > zero.log 2>&1 &
I’ve split the files into small pieces and imported them using dgraph live, but it’s too slow: importing 2 GB of data has taken 10 hours.
MichelDiz
(Michel Diz)
May 28, 2018, 3:38pm
7
You can import the smaller parts (file by file) via bulk load, as long as you never start the servers before all of the loads have completed.
system
(system)
Closed
June 27, 2018, 3:38pm
8
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.