“Transaction has been aborted” occurred because of dfrat ts


Report a Dgraph Client Bug

Transactions are aborted after Dgraph has been running for a period of time.

Some other info:
I added some logging to the source code and found that the reason is:

What are the possible reasons for this problem? Is there any way to solve this problem?

What Dgraph client (and version) are you using?

(put “x” in the box to select)

  • Dgo
  • PyDgraph
  • Dgraph4J
  • Dgraph-js
  • Dgraph-js-http
  • Dgraph.NET

Version:
V230

What version of Dgraph are you using?

v23.1.0

Have you tried reproducing the issue with the latest release?

What is the hardware spec (RAM, OS)?

Steps to reproduce the issue (command/config used to run Dgraph).

Expected behaviour and actual result.

Typically this message from a client means that concurrent modifications were made, so one of them was aborted. There are a couple of ways to address this:

First, be aware that an abort error is retryable: the intent is for the client to decide whether to resubmit based on business requirements (e.g. retry if it is OK to write the data even though the data involved changed on the server during the update).
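A minimal sketch of that client-side retry in Python. It assumes the client raises a distinct exception when a commit is aborted; `AbortedError` below is a local stand-in (I believe pydgraph exposes `pydgraph.AbortedError` and dgo returns `dgo.ErrAborted`, but check your client's docs). `run_with_retry` and its parameters are hypothetical helpers, not part of any client library.

```python
import random
import time


class AbortedError(Exception):
    """Local stand-in for the client's abort error (e.g. pydgraph.AbortedError)."""


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.05, should_retry=lambda attempt: True):
    """Call txn_fn, retrying on AbortedError with jittered exponential backoff.

    txn_fn should open a fresh transaction on every call, so each attempt
    re-reads the latest committed data. should_retry is where business rules
    decide whether re-submitting is acceptable.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except AbortedError:
            if attempt == max_attempts or not should_retry(attempt):
                raise
            # Back off (with jitter) so concurrent writers are less likely to collide again.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.5))


# Usage: a fake transaction that is aborted twice before it commits.
calls = {"n": 0}


def flaky_increment():
    calls["n"] += 1
    if calls["n"] < 3:
        raise AbortedError("Transaction has been aborted")
    return "committed"


print(run_with_retry(flaky_increment, base_delay=0.001))
```

The important choice is that `txn_fn` builds a new transaction on each attempt; resubmitting through the same aborted transaction will not help, since its start timestamp is already stale.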

Second, if this is happening a lot, review your code to see whether the same values on the same entities (values or edges) are being updated by multiple concurrent threads. Aborts will be more frequent if the server is overloaded or slow, or if transactions are very large, because those situations extend the window during which transactions overlap and one of them may be rejected. So also monitor the server for CPU or I/O overload, and track transaction durations.
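To illustrate why overlap matters, here is a toy model of timestamp-based conflict detection, assuming a simplified rule: a commit is rejected if any key it wrote was committed by another transaction after its start timestamp. `ToyOracle` is hypothetical and much simpler than Dgraph's Zero oracle; the keys stand in for the values or edges being written.

```python
class ToyOracle:
    """Toy model of timestamp-based conflict detection (not Dgraph's actual code)."""

    def __init__(self):
        self.next_ts = 1
        self.last_commit = {}  # key -> commit timestamp of the latest writer

    def begin(self):
        """Hand out a start timestamp."""
        ts = self.next_ts
        self.next_ts += 1
        return ts

    def commit(self, start_ts, keys):
        """Return a commit timestamp, or None if the transaction is aborted."""
        # Abort if another transaction committed any of these keys after we started.
        if any(self.last_commit.get(k, 0) > start_ts for k in keys):
            return None
        commit_ts = self.next_ts
        self.next_ts += 1
        for k in keys:
            self.last_commit[k] = commit_ts
        return commit_ts


# Usage: two overlapping transactions increment the same counter.
oracle = ToyOracle()
a = oracle.begin()
b = oracle.begin()  # starts before a commits, so the two overlap
print(oracle.commit(a, {"counter"}))
print(oracle.commit(b, {"counter"}))  # aborted: "counter" changed after b started
c = oracle.begin()  # starts after a committed, so no overlap
print(oracle.commit(c, {"counter"}))
```

The longer a transaction stays open between `begin` and `commit` (slow server, large transactions), the more likely another writer commits one of its keys in between.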

There should be no conflict with the increment command. This situation seems to occur after running a lot of “Alter” schema operations.

I am not deeply familiar with what happens after an alter command, but particularly if you changed indexes, there will be a period of index updates. If your incoming transactional requests conflict with those updates, that might (I am not sure) cause a transaction conflict: the same index structure is updated both by the reindex process and by your transaction (e.g. the increment), resulting in a conflict and an abort.

I would start by correlating the abort timestamps with Dgraph logs which may show relevant updates that take place in the background after an alter command. Is there substantial overlap with some process? Do the logs tell you which activity and predicate or index is involved?
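One way to do that correlation is a small script that keeps only the log lines within a few seconds of each abort. This sketch assumes glog-style line headers (e.g. `I0612 10:15:30.123456   12345 file.go:123] message`); glog omits the year, so it is passed in. The format, `lines_near_aborts`, and the window size are assumptions to adjust to your actual logs.

```python
import re
from datetime import datetime, timedelta

# Assumed glog-style header: severity, MMDD, time, thread id, file:line, "]".
GLOG_HEADER = re.compile(
    r"^[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+\d+\s+(\S+)\]\s?(.*)$"
)


def parse_glog_line(line, year):
    """Return (timestamp, source, message), or None if the line has no glog header."""
    m = GLOG_HEADER.match(line)
    if m is None:
        return None
    month, day, hour, minute, second, micro = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))
    ts = datetime(year, month, day, hour, minute, second, micro)
    return ts, m.group(7), m.group(8)


def lines_near_aborts(log_lines, abort_times, window_s=5.0, year=2024):
    """Keep log lines whose timestamp is within window_s seconds of any abort."""
    window = timedelta(seconds=window_s)
    hits = []
    for line in log_lines:
        parsed = parse_glog_line(line.rstrip("\n"), year)
        if parsed is None:
            continue  # continuation lines, stack traces, other formats
        if any(abs(parsed[0] - t) <= window for t in abort_times):
            hits.append(parsed)
    return hits


# Usage: abort times as recorded by the client (same timezone as the logs).
logs = [
    "I0612 10:15:28.000000   42 example.go:10] message near the abort",
    "I0612 11:00:00.000000   42 example.go:20] unrelated message",
]
aborts = [datetime(2024, 6, 12, 10, 15, 30)]
for ts, source, msg in lines_near_aborts(logs, aborts):
    print(ts, source, msg)
```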

When the Oracle performs conflict detection, it never reaches the predicate check: the conflict is reported directly by the timestamp check.

It’s strange: in a standalone deployment, the CheckpointTs of the snapshot is greater than zero.server.next[Num_TXN_TS].

I have not seen this before, but the code comment suggests that the zero that is checking for conflicting transactions is unable to ensure consistency because it became the leader during the transaction.
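A sketch of my reading of that code comment: a newly elected leader has no record of what was committed under the previous leader, so it rejects any transaction that started before it took over, without looking at keys at all. That would match the observation that the timestamp check fails before predicate detection begins. `LeaderAwareOracle` and `leader_start_ts` are hypothetical names, not Dgraph's actual code.

```python
class LeaderAwareOracle:
    """Toy model: a newly elected leader aborts transactions it cannot verify.

    Not Dgraph's code; leader_start_ts and has_conflict are hypothetical names.
    """

    def __init__(self, leader_start_ts):
        # First timestamp this node handed out after it became leader.
        self.leader_start_ts = leader_start_ts
        # Commit history known to this leader (only from its own term).
        self.last_commit = {}

    def has_conflict(self, start_ts, keys):
        # Timestamp check first: a transaction that began under a previous leader
        # may conflict with commits this node never saw, so it is rejected
        # outright, before any per-key (predicate) check runs.
        if start_ts < self.leader_start_ts:
            return True
        return any(self.last_commit.get(k, 0) > start_ts for k in keys)


# Usage: this node became leader when the timestamp counter was at 1000.
oracle = LeaderAwareOracle(leader_start_ts=1000)
print(oracle.has_conflict(start_ts=990, keys={"counter"}))   # started before the election
print(oracle.has_conflict(start_ts=1005, keys={"counter"}))  # started under this leader
```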

If the issue is due to the Zero leader changing, you should see log messages mentioning “election”, “candidate”, and “became leader” on all Zeros. If leadership changes more than very rarely, that indicates a serious problem: the nodes cannot talk to each other, or they are too overloaded to process the heartbeats they use to monitor one another.
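A quick way to check this is to count leadership-related lines in each Zero's log. This sketch just searches, case-insensitively, for the phrases mentioned above; the exact wording in Dgraph's raft logs may differ, so treat the phrase list (and the script name in the usage comment) as assumptions.

```python
import sys
from collections import Counter

# Phrases from the advice above; the actual log wording may differ.
LEADERSHIP_PHRASES = ("election", "candidate", "became leader")


def count_leadership_events(log_lines):
    """Count lines mentioning each leadership-related phrase (case-insensitive)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for phrase in LEADERSHIP_PHRASES:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_elections.py zero1.log zero2.log zero3.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            print(path, dict(count_leadership_events(f)))
```

More than a handful of matches over a long uptime would point at the instability described above.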