Hi. I’m running into a problem with my DGraph server, where it keeps freezing up for no apparent reason. I’m running it in a Docker container on a local host machine, and I have a client that interacts with it over HTTP.
Earlier, I was committing all transactions immediately by setting X-Dgraph-CommitNow in the header. Now I am adding batch operations to my client: I keep the transaction ID and the list of transaction keys so I can commit later. While developing this, I sometimes don’t commit transactions properly. For example, I might not send all the keys, or I might use the wrong start_ts in the URL. After a transaction that isn’t properly committed, my DGraph server stops responding to HTTP requests from my client, and I have to destroy my Docker container and rebuild it to get it working again.
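A client that batches mutations has to carry one start_ts and every returned key through to the commit. Below is a minimal sketch of that bookkeeping, assuming each /mutate response reports the transaction under extensions.txn with start_ts and keys fields (my reading of the Dgraph 1.0.x HTTP docs; check it against your version). BatchTxn is a hypothetical helper, not part of any Dgraph library.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


class BatchTxn:
    """Hypothetical client-side record of one open Dgraph transaction.

    Assumes each /mutate response holds the transaction at
    response["extensions"]["txn"], with "start_ts" and "keys" fields.
    """

    def __init__(self) -> None:
        self.start_ts: Optional[int] = None
        self._keys: List[str] = []

    def record(self, response: Dict[str, Any]) -> None:
        """Fold one /mutate response into the pending transaction."""
        txn = response["extensions"]["txn"]
        ts = int(txn["start_ts"])
        if self.start_ts is None:
            self.start_ts = ts  # the first response fixes the start_ts
        elif ts != self.start_ts:
            # A different start_ts means the mutation went to another transaction.
            raise ValueError(f"start_ts {ts} does not match {self.start_ts}")
        for key in txn.get("keys", []):
            if key not in self._keys:  # keep each key once, in arrival order
                self._keys.append(key)

    @property
    def keys(self) -> List[str]:
        return list(self._keys)

    def commit_request(self) -> Tuple[str, str]:
        """Return (path, JSON body) for committing every recorded key."""
        if self.start_ts is None:
            raise RuntimeError("no mutation recorded yet")
        return f"/commit/{self.start_ts}", json.dumps(self._keys)
```

Call record() on every mutation response in the batch, then POST the body from commit_request() to that path. Skipping a response is exactly the "not all the keys" failure described above. The linear duplicate check is fine for small batches; a set alongside the list would scale better.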
It might solve the problem if I told DGraph to clear the buffer of all uncommitted transactions, but I can’t figure out how to do that. I’m also not sure it’ll fix things. Any ideas on what I should do?
This might be related to snapshots not happening, so the data is kept in memory, leading to OOM. However, I'd need to see the server logs to confirm. Can you share the logs as well as details about your setup? Dgraph server has an automatic mechanism to abort long-pending transactions, so that should kick in at some point and free up the memory. Also, what language are you programming in? Maybe you could use one of the official clients and abort/commit your transactions.
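The "abort/commit your transactions" advice can be made mechanical on the client side, so no code path leaves a transaction pending. A minimal sketch, assuming you supply commit and abort callables yourself (for raw HTTP these would presumably POST to /commit/{start_ts} and /abort/{start_ts}; those endpoints are my assumption about Dgraph 1.0.x, so verify them). commit_or_abort is a hypothetical helper.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def commit_or_abort(commit: Callable[[], None], abort: Callable[[], None]) -> Iterator[None]:
    """Commit if the block succeeds; abort (and re-raise) if anything fails.

    `commit` and `abort` are caller-supplied, e.g. functions that hit the
    (assumed) /commit/{start_ts} and /abort/{start_ts} HTTP endpoints.
    """
    try:
        yield
    except BaseException:
        abort()  # the block failed: do not leave the transaction pending
        raise
    try:
        commit()
    except BaseException:
        abort()  # the commit itself failed: still release the transaction
        raise
```

Wrapping each batch in `with commit_or_abort(...):` means an exception in the middle of a batch aborts instead of stranding the transaction on the server.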
I am running DGraph from the Docker image dgraph/dgraph:v1.0.4. My client is written in Python.
I’ve checked the DGraph server logs and the DGraph zero logs and neither is showing any errors. Maybe I need to start them in debug mode to capture more logging?
I’ve also narrowed down the problem. It is only mutation transactions that delete a node that end up hanging. Mutation transactions that add a node or change a node attribute/edge do not hang.
Here is exactly what causes the issue:
1. My client creates a new transaction_id from the current timestamp in milliseconds.
2. I submit a delete mutation to the endpoint /mutate/{transaction_id} and extract the keys from the response.
3. I submit the keys returned in step 2 to the endpoint /commit/{transaction_id}.
4. The server does not respond.
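For a bug report, it helps to pin down the exact delete mutation body sent to /mutate/{transaction_id}. A minimal sketch, assuming the Dgraph 1.0.x RDF syntax in which `<uid> * * .` inside a delete block removes every outgoing edge of a node; delete_node_mutation is a hypothetical helper, and the uid check only accepts the usual 0x-prefixed hex form.

```python
import re

# Dgraph uids are written as 0x-prefixed hex, e.g. 0x2a.
_UID = re.compile(r"0x[0-9a-fA-F]+")


def delete_node_mutation(uid: str) -> str:
    """Build an RDF delete mutation that drops all outgoing edges of `uid`.

    Hypothetical helper; assumes the `<uid> * * .` pattern is accepted.
    """
    if not _UID.fullmatch(uid):
        raise ValueError(f"not a Dgraph uid: {uid!r}")
    return "{\n  delete {\n    <%s> * * .\n  }\n}" % uid
```

Pasting the exact body, the endpoint, and the headers into the report makes the hang much easier for others to reproduce.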
The same steps with a mutation that adds a node or sets a node attribute work fine. Note that I am not submitting any lin_read parameters. According to the docs, since I have a single-server setup with no replication, I should not have to do that.
For now, I’ve gotten around the problem by submitting all delete mutations with the “X-Dgraph-CommitNow” header but it would be nice to get to the bottom of this issue.
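The CommitNow workaround needs no start_ts or key tracking at all. A minimal sketch using only the standard library, assuming the Dgraph 1.0.x endpoint POST {base_url}/mutate and the X-Dgraph-CommitNow header mentioned above; commit_now_request is a hypothetical helper, and the request is only built here, not sent.

```python
import urllib.request


def commit_now_request(base_url: str, mutation: str) -> urllib.request.Request:
    """Build a one-shot mutation request that Dgraph should commit immediately.

    Assumes POST {base_url}/mutate plus the X-Dgraph-CommitNow header.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/mutate",
        data=mutation.encode("utf-8"),
        headers={"X-Dgraph-CommitNow": "true"},
        method="POST",
    )
```

Send it with urllib.request.urlopen(req). Note that urllib stores header names via str.capitalize(), so the header reads back as "X-dgraph-commitnow"; HTTP header names are case-insensitive, so this does not matter on the wire.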
EDIT: To be clear, in my original post I said this was due to transactions that I did not commit properly. The reason I thought that was because I was getting errors from the server when trying to resubmit failed transactions. I think that was actually because these were duplicate transactions submitted with a different start_ts parameter.
Based on the steps to reproduce above, this now looks like an actual bug.
The transaction ID (StartTs) should be allocated by the Dgraph server. Why does the client instantiate it?
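In other words, the client should never make up a start_ts from its own clock. A minimal sketch of the server-allocated flow, assuming (per my reading of the Dgraph 1.0.x raw-HTTP docs) that the first mutation goes to /mutate with no timestamp, the server's choice comes back at extensions.txn.start_ts, and later mutations in the same transaction go to /mutate/{start_ts}. All three function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def mutate_url(base_url: str, start_ts: Optional[int] = None) -> str:
    """No start_ts on the first mutation, so the server allocates one;
    later mutations in the same transaction reuse the server's value."""
    base = base_url.rstrip("/")
    return f"{base}/mutate" if start_ts is None else f"{base}/mutate/{start_ts}"


def server_start_ts(response: Dict[str, Any]) -> int:
    """Read the start_ts the server assigned (assumed at extensions.txn.start_ts)."""
    return int(response["extensions"]["txn"]["start_ts"])


def post(url: str, body: str) -> Dict[str, Any]:
    """POST `body` and decode the JSON reply (a real network call)."""
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be: resp = post(mutate_url(base), body), then ts = server_start_ts(resp), then every further mutation and the final commit use ts instead of a millisecond timestamp.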
Can you try with the official Python client and dgraph/dgraph:v1.0.5 and see if the issue is reproducible? If it is, then please file a bug on GitHub with steps to reproduce.
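With the official Python client, the server allocates the start timestamp and the client tracks the keys. A minimal sketch of the same delete, assuming pydgraph's DgraphClient.txn() returns a transaction with mutate(del_obj=...), commit() and discard(), and assuming that a JSON deletion carrying only "uid" removes the node's outgoing edges. delete_node is a hypothetical helper that takes the client as a parameter so it can be exercised without a server.

```python
from typing import Any


def delete_node(client: Any, uid: str) -> None:
    """Delete all outgoing edges of `uid` in one transaction.

    `client` is expected to behave like pydgraph.DgraphClient: client.txn()
    returns a transaction with mutate(del_obj=...), commit() and discard().
    """
    txn = client.txn()
    try:
        # JSON deletion with only "uid" set (assumed to drop every edge of the node).
        txn.mutate(del_obj={"uid": uid})
        txn.commit()
    finally:
        # Aborts the transaction if commit() was never reached; the pydgraph
        # README uses this same try/finally shape.
        txn.discard()
```

With pydgraph installed, the client would be built as pydgraph.DgraphClient(pydgraph.DgraphClientStub("localhost:9080")), which talks gRPC on port 9080 rather than HTTP on 8080.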