Getting an error while doing live loading

What I want to do

I want to do a live load into a single-node cluster. Although I have done the same thing many times, this time I am getting an error.
Below are the steps I am following.
Live load:

Step 1: Start the Zero node.
  ./dgraph zero --my=localhost:5080

Step 2: Start the Alpha node.
  ./dgraph alpha --my=localhost:7080 --zero=localhost:5080

Step 3: Live load the data into Dgraph.
  ./dgraph live -f rdf-file-path -s schema-file-path --zero=localhost:5080 --alpha=localhost:9080

After starting the live load, it begins processing files, but after 1-2 minutes of processing it gets killed automatically. I don't even get any error on the terminal. Please refer to the attached screenshot.

I tried the same steps on 2-3 different servers but got the same issue.

Can you try with half of the dataset? Have you checked the open-file limit in the OS?

Yes, I have also checked the limit. Earlier I got the error "too many open files", so I increased the limit from 1,024 to 1,000,000 (10 lakh) as well; see the sketch below.
I have already completed a bulk load with the same dataset and it works fine, but I am getting this issue while doing the live load.
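For reference, raising the limit for the shell that starts the loader looks roughly like this (a sketch, assuming Linux; making the limit persistent also needs a "nofile" entry in /etc/security/limits.conf):

  # check the current open-file limit for this shell
  ulimit -n
  # raise it to 1,000,000 for this session, then start the loader from here
  ulimit -n 1000000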

Can anyone please help?

Please share the dataset so I can try to reproduce.

The dataset is 256 GB and it's confidential; I can't share it.
Please give me any suggestions.

What are the stats of your machine? It looks like that's where your problem is.

1024 GB disk and 128 GB RAM.
I have completed a bulk load with the same data on a machine with the same configuration.

No Alpha and Zero logs? Without a way to reproduce and collect data, we can’t find a solution.

Give us some debug logs: https://dgraph.io/docs/howto/retrieving-debug-information/#sidebar - even so, I think we need a way to reproduce it.
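If it helps, Alpha and Zero write their logs to stderr, so one way to capture them is to redirect when starting the nodes (a sketch, not the only way; run each in its own terminal):

  # capture Zero and Alpha logs to files while still printing them
  ./dgraph zero --my=localhost:5080 2>&1 | tee zero.log
  ./dgraph alpha --my=localhost:7080 --zero=localhost:5080 2>&1 | tee alpha.log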

Actually, the live load is consuming 124 GB of memory and then it gets killed. Let me know one thing: if we live load data from a folder of RDF files (i.e. 256 GB) versus from a single rdf.gz file (i.e. 24 GB), will there be any difference in the live load, or do both ways behave the same?

Because if I live load data from the compressed file (i.e. rdf.gz format), it starts loading the data right away, showing "elapsed". Please refer to the screenshot below. It does not consume much memory; only 4-5 GB is used.

If I live load data from a directory of RDF files, it starts processing in a different way: first it shows "processing data". Please refer to the screenshot below.
It takes almost all the memory of my system, and the process is killed after consuming all of it. In my case I have a 124 GiB server, and it took all 124 GiB of memory before the process was killed.

Please let me know the difference between the two approaches: why does it take 124 GiB of memory while loading data from a folder of RDF files, but only 4-5 GiB when reading from an rdf.gz file?

It might be related to the concurrency of loading multiple files at once in the latter case.
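If that is the cause, one crude workaround is to feed the loader one file at a time instead of pointing -f at the whole directory (just a sketch, assuming the files sit directly under a my_data directory):

  # load files sequentially so only one file's buffers are held at a time;
  # re-applying the same schema on each run is harmless
  for f in my_data/*.rdf; do
    ./dgraph live -f "$f" -s schema-file-path --zero=localhost:5080 --alpha=localhost:9080
  done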

Can you give us a memory profile? https://dgraph.io/docs/howto/retrieving-debug-information/#memory-profile
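Something like this should grab a heap profile from a running Alpha (assuming it is listening on the default HTTP port 8080):

  # download a heap profile from the Alpha's pprof endpoint
  curl http://localhost:8080/debug/pprof/heap --output alpha.heap
  # inspect it with Go's pprof tool
  go tool pprof alpha.heap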

Hi,
Yes, this is the concurrency issue.
Can you help me with compressing the data? If I have 256 GB of RDF files in a directory my_data, how can I compress this data into rdf.gz format, e.g. my_data.rdf.gz?
That would solve my problem, because the compressed data can be live-loaded easily.

Just merge all the files into a single RDF file and then gzip it normally.

Can you please suggest how to merge all the files into a single RDF file?

In general we write custom code for this, but you can use tools like grep, sed, or cat * >

That's great! Thank you :slight_smile:
It would be really helpful if you could provide any link for reference, or an example of merging 2-3 RDF files.
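
A minimal version, assuming the files sit directly under my_data and the expanded glob fits within the shell's argument-length limit:

  # concatenate every RDF file and gzip the result in one pass
  cat my_data/*.rdf | gzip > my_data.rdf.gz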

cat * >
This command works for a smaller number of files, but I have 1 million files that need to be merged into a single RDF file.

I need to know one more thing: if I load files directly from a directory, it shows "processing", but when I load from a compressed file it shows "elapsed". What does that mean?
Please refer to the screenshot below.
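
With a million files, the glob expansion in cat * will overflow the kernel's argument-length limit ("argument list too long"), so stream the file names through find and xargs instead. A sketch, assuming the files sit under my_data and the merged output is written outside that directory:

  # find emits each file name NUL-terminated; xargs batches them into as
  # many cat invocations as needed, all appending to one output file
  find my_data -name '*.rdf' -print0 | xargs -0 cat > my_data.rdf
  gzip my_data.rdf   # produces my_data.rdf.gz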

Not sure about your question, but the word "elapsed" means "elapsed time"; it is just a time counter.

I know that we have several pieces of code spread across repositories. This one is an example: benchmarks/convert/main.go at master · dgraph-io/benchmarks · GitHub. But I think this converts from CSV to RDF, or from a custom dataset (from Google) to RDF.

You can write your own code for this case based on the principles in that code.