Live load kills an alpha on a 6-node cluster on Docker for Mac


Report a Dgraph Bug

When running a live load (the 1 million RDF dataset) on a 6-node cluster using docker-compose (3 alphas / 3 zeros), one of the alphas dies with exit code 255, but the live load completes successfully. If the leader dies, the live load fails. This can only be reproduced on macOS.

What version of Dgraph are you using?

This is reproducible in all versions of Dgraph.

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

16 GB RAM, 2-core processor, macOS Catalina version 10.15.6 (19G2021)

Steps to reproduce the issue (command/config used to run Dgraph).

  • Go to the dgraph/compose directory and generate the docker-compose file using ./compose -a 3 -w 3 -d ./data
  • For any version of Dgraph, run make install in the root directory to install the binary
  • Run docker-compose up -d
  • Apply the 1 million dataset schema found in the benchmarks repository using the command:
curl --location --request POST 'localhost:8080/alter' --header 'Content-Type: application/rdf' --data '@schema'
  • Run live load using the following command:
dgraph live -f 1million.rdf.gz --alpha localhost:9180 --zero localhost:5180
  • Let the live load run through; you will notice that one of the alpha Docker containers exits
  • The issue is also reproducible when the schema is passed to the dgraph live command with the -s flag. There were no memory fluctuations during the live load.
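For convenience, the steps above can be sketched as a single script. The ports, filenames, and compose flags are taken from this thread; by default the commands are only echoed (set RUN=1 to actually execute them), so the sketch can be reviewed without Docker running:

```shell
#!/bin/sh
# Repro sketch for this thread. With RUN unset, commands are only
# printed for review; set RUN=1 to actually execute them.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

run ./compose -a 3 -w 3 -d ./data   # generate docker-compose.yml (3 alphas / 3 zeros)
run docker-compose up -d            # bring up the 6-node cluster
run curl --location --request POST 'localhost:8080/alter' --header 'Content-Type: application/rdf' --data '@schema'
run dgraph live -f 1million.rdf.gz --alpha localhost:9180 --zero localhost:5180
```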

Expected behaviour and actual result.

None of the alphas should exit during the live load.
The alpha container exits without any error logs at verbosity 3. The live load completes if a follower exits; if the leader exits, the live load errors out.


If this is running in Docker, it isn’t related to macOS, and you should be able to reproduce it on other OSes. Maybe the issue is related to the default resources available to the VM running Docker. Please share this info so I can reproduce it.

I also use Catalina, and I have tested the 1 million RDF dataset several times, as well as the 21 million one, and this has never happened so far. I did hit OOMs when simulating low resources, though.

PS: This comment doesn’t invalidate the bug report. Just trying to find the culprit.


Thanks Michel!
Yes, I did have a feeling it might be related to resource usage. I’m not sure if 2 GB of memory and 1 GB of swap is enough for 6 nodes. I was able to reproduce it on a different Mac as well. Here is my Docker resource allocation:


I would say that 2 GB is enough for a single node. Have you tried with jemalloc enabled?
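If the VM has enough memory but individual containers are being constrained, one way to rule that out is an explicit per-service memory cap via a Compose override file. This is only a sketch: the service name alpha1 and the v2-format mem_limit key are assumptions, and you would adjust them to match the generated docker-compose.yml:

```yaml
# docker-compose.override.yml (hypothetical; the service names must
# match those in the generated docker-compose.yml)
version: "2.4"
services:
  alpha1:
    mem_limit: 3g   # give each alpha explicit headroom
```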

Also, are you using the multiple-address config? For example:

dgraph live -s 21.schema -f 21million.rdf.gz -a "localhost:9080,localhost:9081,localhost:9082" -z localhost:5080

You can also use multiple Zero addresses on the Alphas, so they can keep looking for a new leader if the current one goes down.

dgraph alpha -z "127.0.0.1:5080,127.0.0.1:5081,127.0.0.1:5082"