How should I load data?

Loading RDF data always goes wrong for me!

I want to load some RDF data.
First, I ran:

docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph dgraph/standalone:v20.03.0

Second, I tried running:

dgraph bulk -f 1million.rdf -s 1million.schema --map_shards=4 --reduce_shards=1 --zero=localhost:5080

Then it reported this error:

\cf4 \cell 
\cf2 \cb3 email                : string @index(exact) @upsert .\cell \lastrow\row
} at line 1 column 1: Invalid schema. Unexpected \*Lexer).ValidateResult

So I tried another way:

dgraph live -f 1million.rdf

Then it reported:

Error while mutating: Uid: [11975888102877534808] cannot be greater than lease: [10000] s.Code Unknown

I tried:

curl "http://localhost:6080/assign?what=uids&num=10000000000000000000"

The result was:

{"startId":"9997475252299124063","endId":"1550731178589572446"}

Then I tried again:

dgraph live -f 1million.rdf

It still said "cannot be greater than lease: [10000] s.Code Unknown".
Could someone tell me why? I just want to load data!

Hey @youyin123,

In the bulk loader command, you’re using port 5080, but you haven’t exposed it via Docker. You might need to add -p 5080:5080 before you’re able to access that port.
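For example, a sketch of the same docker run command with the Zero ports added (6080 is included as well, since that's the HTTP endpoint your /assign call targets):

```shell
# Same standalone image, but with Zero's gRPC (5080) and HTTP (6080) ports exposed
docker run --rm -it \
  -p 5080:5080 -p 6080:6080 \
  -p 8080:8080 -p 9080:9080 -p 8000:8000 \
  -v ~/dgraph:/dgraph dgraph/standalone:v20.03.0
```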

Additionally, you’re using the Dgraph standalone image, which starts up a Zero as well as an Alpha instance. The bulk loader is supposed to be run before the cluster is set up (that is, before the Alpha is started). If your cluster is already set up, you should use the Live Loader instead.

Here’s the bulk loader documentation:

The error you’ve shown, though, seems to be a parse error which shouldn’t be happening unless the schema file has been modified - could you redownload the schema and check?

In the live loader command you’ve used, you’re missing the schema file (-s 1million.schema).

Sorry, I have no idea what you mean.
I'm not a professional.
Could you tell me which terminal commands to run, starting from zero…

If you are running Docker, you should load the data inside the container. You can't (well, you can, but it will complicate the whole process) run a bulk load from an external place against a "remote" (in this case, Docker is our "remote", since it lives inside a virtual machine).

What you have to do is to copy the data into the container. There are two ways to do it.

  1. Use docker cp (see the Docker documentation):
docker cp 1million.rdf mycontainer:/1million.rdf
docker cp 1million.schema mycontainer:/1million.schema

That way you have the files transferred from the host to the docker container.

  2. Use a download tool like curl. In the case of Dgraph's Docker image, only curl is available. You can do the following:
curl -L -o 1million.rdf.gz ""

As Ajeet said, you can't run a Dgraph Alpha at the same time as the bulk loader, so you have to start the Alpha after the load. Also, the bulk loader writes the posting-list files to out/${a number starting from zero}/p/*. You can either use this path or move the p directory to a path of your choice, and then start the Alpha pointing at that path.
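Putting those steps together, a rough sketch of the bulk-load path ("mycontainer" is a placeholder for your container's name, and this assumes you've arranged for no Alpha to be running yet, which the standalone image does not give you by default):

```shell
# Run the bulk loader inside the container, against the files copied in above
docker exec -it mycontainer dgraph bulk \
  -f /1million.rdf -s /1million.schema \
  --map_shards=4 --reduce_shards=1 --zero=localhost:5080

# Then start the Alpha pointing at the generated posting lists
# (assuming the loader ran from /dgraph, so output landed in /dgraph/out/0/p)
docker exec -it mycontainer dgraph alpha \
  --postings /dgraph/out/0/p --lru_mb 1024 --zero localhost:5080
```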

If you want to make things easy, you should use the live loader instead. With the live loader you can just start your cluster and load the data "remotely": from the host to the container, pointing at localhost or (in the case of a virtual machine) at the VM's IP.

If your Docker setup is native and runs on localhost:

dgraph live -a localhost:9080 -z localhost:5080 -f 1million.rdf -s 1million.schema

If your docker isn’t native, you have to find its IP. In general, is something like

dgraph live -a <container-IP>:9080 -z <container-IP>:5080 -f 1million.rdf -s 1million.schema
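If you're not sure what the container's IP is, one way to look it up (again assuming the container is named "mycontainer"):

```shell
# Print the container's IP address on its Docker network
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mycontainer
```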

This is probably happening due to some confusion between the two tools (live and bulk). If you use the bulk loader with a Zero that has already been used, this error can happen. You should start from scratch in both cases. Once you are more used to Dgraph, you can try other ways of loading data into an already-running cluster.


Thank you!
Finally I used curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST -d $' to load the data, and it worked!

But now I have another question:
when I load the full data, maybe 1,000,000 rows (I don't know the exact count), the load fails.
But when I load 10,000 rows, it works… why?

Maybe I loaded too much data; now I can't run Dgraph.
I used

docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph dgraph/standalone:v20.03.0

to run Dgraph, but there were errors:

time" > > tablets:<key:"starring" value:<group_id:1 predicate:"starring" > > tablets:<key:"studyAt" value:<group_id:1 predicate:"studyAt" > > tablets:<key:"tag_name" value:<group_id:1 predicate:"tag_name" > > tablets:<key:"tagged" value:<group_id:1 predicate:"tagged" > > tablets:<key:"tagline" value:<group_id:1 predicate:"tagline" > > tablets:<key:"title" value:<group_id:1 predicate:"title" > > tablets:<key:"type" value:<group_id:1 predicate:"type" > > tablets:<key:"url" value:<group_id:1 predicate:"url" > > tablets:<key:"workAt" value:<group_id:1 predicate:"workAt" > > tablets:<key:"workFrom" value:<group_id:1 predicate:"workFrom" > > snapshot_ts:392 checksum:17010297955161069008 > > zeros:<key:1 value:<id:1 addr:"localhost:5080" leader:true > > maxLeaseId:18446055125929158704 maxTxnTs:270000 maxRaftId:1 cid:"e327aa51-9f33-40af-9a52-3dd9e1bfaa73" license:<maxNodes:18446744073709551615 expiryTs:1601690748 enabled:true > 
I0906 04:21:04.080012      14 log.go:34] 1 is starting a new election at term 31
I0906 04:21:04.080052      14 log.go:34] 1 became pre-candidate at term 31
I0906 04:21:04.080059      14 log.go:34] 1 received MsgPreVoteResp from 1 at term 31
I0906 04:21:04.080209      14 log.go:34] 1 became candidate at term 32
I0906 04:21:04.080243      14 log.go:34] 1 received MsgVoteResp from 1 at term 32
I0906 04:21:04.080571      14 log.go:34] 1 became leader at term 32
I0906 04:21:04.080734      14 log.go:34] raft.node: 1 elected leader 1 at term 32
I0906 04:21:04.080919      14 groups.go:856] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0906 04:21:04.080950      14 groups.go:865] Got Zero leader: localhost:5080
W0906 04:21:04.082166      14 draft.go:1146] Raft.Ready took too long to process: Timer Total: 5.162s. Breakdown: [{advance 5.162s} {disk 0s} {proposals 0s}] Num entries: 0. MustSync: false
W0906 04:21:04.348876      14 draft.go:1146] Raft.Ready took too long to process: Timer Total: 219ms. Breakdown: [{proposals 219ms} {disk 1ms} {sync 0s} {advance 0s}] Num entries: 1. MustSync: true
W0906 04:21:04.647158      14 draft.go:1146] Raft.Ready took too long to process: Timer Total: 274ms. Breakdown: [{proposals 274ms} {disk 0s} {advance 0s}] Num entries: 0. MustSync: false

I restarted the computer, but it doesn't work.
Do you know why?

For clusters with few resources, we recommend that you create transaction batches of between 1k and 5k N-Quads. One million is too much for a single transaction. If you use cURL directly, you end up creating one huge transaction that your instances can't handle.
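A minimal sketch of that batching with plain shell tools: it splits an N-Quads file (one triple per line) into 5,000-line batches and wraps each one as its own mutation. The synthetic data.rdf stands in for your real export, and the curl line is the same call that worked for you, left commented so the script can be dry-run without a cluster:

```shell
# Generate a small synthetic N-Quads file so the sketch is self-contained;
# replace it with your real RDF export.
seq 1 12000 | awk '{printf "_:n%d <name> \"node %d\" .\n", $1, $1}' > data.rdf

# Split into 5,000-triple batches: batch_aa, batch_ab, batch_ac, ...
split -l 5000 data.rdf batch_

for f in batch_*; do
  # Wrap each batch in the RDF mutation body that /mutate expects.
  { printf '{ set {\n'; cat "$f"; printf '} }\n'; } > "$f.mut"
  # One transaction per batch:
  # curl -s -H "Content-Type: application/rdf" -X POST \
  #      "localhost:8080/mutate?commitNow=true" --data-binary "@$f.mut"
done
```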

You could do this if you have a lot of resources, or if you use ludicrous mode. Otherwise, please use the live loader with the default values.



It sounds like you're not exposing the Zero port 5080. You need this port to use the live loader remotely.

thank you