DGraph Times Out Processing Graph

jm4games commented :

So I changed the graph.gz to be a single mutate and no change occurred (so far). If this is the expected behavior then I think I might just explore some other options. Still a cool product, just might not yet be up for my use cases.

MichelDiz commented :

Hey, by the way: don’t use @upsert in your schema. It has another purpose. It’s an old way of doing upserts, and it doesn’t do anything in your context.

jm4games commented :

@MichelDiz Not sure if you want to do anything with this issue. I feel like there is a bug here, but I can close it if you think otherwise. For comparison, I finally loaded my graph into another graph DB (ArangoDB) and it had a much easier time with this data set. Load time was ~3 min, but probably ignore that, since the way the data is loaded is fundamentally different. The final resident memory was ~3 GB after a full ingest. So maybe golang is just really relaxed about releasing memory back to the OS?

MichelDiz commented :

This week we are starting to investigate this and other related things. We have some engineers trying to replicate similar experiences and comparing with other DBs.

Tell me, is there an equivalent of upsert in ArangoDB? Do they have docs about it? Did the upload take 3 minutes via some kind of live upload? Given the size of your dataset, the Bulk Loader would load it within seconds.

We are investigating this. I’ve been investigating since last week, and now the cavalry has arrived. We should have some direction on this soon.

Let’s keep it open. I think someone is tracking your issue too.

Cheers.

jm4games commented :

The upload into ArangoDB was live. They support a bulk upload format for large data sets, and it lets you do the bulk upload at any time into existing graphs. I could try the Dgraph bulk loader as a one-off to see if it makes any difference, but that doesn’t serve my end purpose, since I will be uploading many similar-sized data sets continuously. The data set I attached to this issue does have 3m+ edges, so it’s non-trivial, fwiw. I think my bigger concern for this issue would be the memory outstanding after upload. Even if it takes 40 mins, I would expect a much smaller active memory set afterwards.

MichelDiz commented :

Hum, I’ll check this tomorrow.

By the way, it is not recommended to send such a large transaction through a client (in your case, HTTP). If you take the LiveLoad code as an example, you will see that it splits the dataset into smaller parts and multiple transactions, which makes the load smoother. You want to use upsert, though; that can be done, but it works differently if you go with LiveLoad.
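To make the "smaller parts, multiple transactions" idea concrete, here is a minimal sketch in Python. It is not LiveLoad's actual code, only the same pattern. The names `batches` and `load_in_batches`, the default batch size of 1000, and the `commit` callable are all assumptions made up for illustration; `commit` stands in for whatever sends one batch as its own committed transaction in your client.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(nquads: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` N-Quad lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(nquads)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(
    nquads: Iterable[str],
    commit: Callable[[str], None],  # hypothetical: sends one batch as one transaction
    size: int = 1000,               # assumed batch size, tune for your data
) -> int:
    """Send each batch as its own small transaction; return the batch count."""
    count = 0
    for chunk in batches(nquads, size):
        commit("\n".join(chunk))
        count += 1
    return count
```

For example, passing 2500 lines with `size=1000` calls `commit` three times (1000, 1000, and 500 lines). One caveat: blank-node labels such as `_:n1` are scoped to a single mutation, so batches that share nodes need real uids or an upsert lookup to connect them.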

I would recommend that you build a pipeline with LiveLoad and your code/env. If you have multiple instances of Dgraph, I recommend adding the addresses of all of them (you can pass them separated by commas in the -a flag), so that LiveLoad balances the load for you.

Alternatively, if you have multiple Alpha instances and are using HTTP only, you could put a load balancer in front of your instances. I have some code for this with Traefik and Dgraph; I just need to find where it is so I can share it.
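As a rough sketch of such a pipeline step, the snippet below only builds the command line for the live loader with a comma-separated -a value; it does not run anything. The helper name `live_command`, the file name `data.rdf.gz`, and the `alpha1..3:9080` addresses are placeholders I made up, and `-f` is assumed here as the flag for the data file.

```python
import shlex
from typing import List, Sequence


def live_command(data_file: str, alphas: Sequence[str]) -> List[str]:
    """Build an argv list for the live loader, targeting several Alphas."""
    if not alphas:
        raise ValueError("at least one Alpha address is required")
    # All Alpha addresses go into a single -a value, separated by commas.
    return ["dgraph", "live", "-f", data_file, "-a", ",".join(alphas)]


# Placeholder file name and addresses; replace with your own.
cmd = live_command("data.rdf.gz", ["alpha1:9080", "alpha2:9080", "alpha3:9080"])
print(shlex.join(cmd))
```

From your own code you could then hand `cmd` to `subprocess.run(cmd, check=True)` once per incoming data set.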

MichelDiz commented :

Hey @jm4games, just to update you: here is the load balancer setup with Dgraph, if you need it. It is in the GitHub repository OpenDgraph/ItisTimetoReproduce (a personal repository with tests to be reproduced).

It shouldn’t happen simply because of the upsert.

There’s a set of other issues that cover this one.
It doesn’t solve the problem, but it will be useful for debugging in the future.

https://github.com/dgraph-io/dgraph/issues/4048