Getting "Transaction has been aborted. Please retry." Far Too Often

We have a highly parallel system, with two clients receiving many messages from a pub/sub stream. When a client authenticates with our system, we get a lot of data very quickly. That client has a Profile ID that is frequently read and then written/updated, all in parallel. I set Ignore Index Conflict in the client on every call we make, and there is no @upsert directive on any predicate. We have a retry system in place that handles this, but why am I getting so many “Transaction has been aborted. Please retry.” messages? We are running three servers and one Dgraph Zero, on version 1.0.4. I have tried sending our mutations both in bulk and in smaller batches. Bulk actually seems to produce fewer of these warnings, but I still see several every few seconds. How can I reduce these errors, and is this retry logic planned to be implemented on the server side at some point?

Also, I am now getting: rpc error: code = Unknown desc = Conflicts with pending transaction. Please abort.

After seeing the above error for some time, it looks like Dgraph died: Mar 22 20:43:26 dev-dgraph-master-03.c.charlotte-161616.internal docker[1431]: 2018/03/22 20:43:26 pool.go:168: Echo error from dev-dgraph-master-01:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure

Could you create an issue for this on GitHub? If others want it too, we can prioritize this.

Do you have any logs from the crash? Also, I would suggest using the nightly binary as it contains bug fixes on top of the last release.

I don’t have the logs from before it died, but I was seeing a lot of “Couldn't take snapshot” messages, pretty much constantly.

OK, please move to the nightly and let me know if you still see the “Couldn't take snapshot” message. It should happen rarely on the nightly.
