Running Jepsen tests in TeamCity

Current state

In master, all the Jepsen tests are passing. However, tests sometimes cannot start due to issues related to cluster setup (e.g., running apt-get update fails). Retrying these tests usually fixes the issue, but it’s tedious to do by hand and it means that not all tests are guaranteed to run. Fortunately, none of these failures seem related to Dgraph itself, and they mostly happen at the beginning of the test, so retrying is not very time-consuming.

Kyle said some of these issues are fixed in the latest Jepsen master, but when I tried to merge the newest changes I ran into other issues (for example, Incomplete tests in Dgraph test suite. · Issue #451 · jepsen-io/jepsen · GitHub). There has also been some refactoring that broke the Dgraph test suite. These issues should be addressed eventually, but given that the tests run fine aside from the incomplete ones, I think it’s better to run the tests as they are right now.

Proposed solution

While running the tests manually, I found that running each test in a fresh cluster reduces the number of flaky tests, so I have made two changes to the Jepsen script in contrib/jepsen:

  1. Added a new command line option to destroy and create the cluster before each test is run.
  2. Retry incomplete tests. There’s already a way to tell incomplete and failing tests apart, and my changes take advantage of that. Failing tests are not retried.

When I ran the full test suite with those changes, only three tests (out of 36) were incomplete and all of them succeeded after only one retry.
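For anyone who wants the gist without reading the PR, here is a rough sketch of the retry behaviour described above. It is written in Python for brevity and is not the code from contrib/jepsen: the `Outcome` enum, `run_with_retries`, the two callbacks, and the `max_retries` default are all hypothetical names and values chosen for illustration.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCOMPLETE = "incomplete"  # test could not run to completion (e.g. setup broke)


def run_with_retries(run_test, recreate_cluster, max_retries=3):
    """Run one test on a fresh cluster, retrying only incomplete runs.

    run_test and recreate_cluster are hypothetical callbacks: the first
    returns an Outcome, the second destroys and recreates the cluster.
    Returns (final outcome, number of attempts made).
    """
    attempts = 0
    while True:
        recreate_cluster()  # start every attempt from a fresh cluster
        outcome = run_test()
        attempts += 1
        # PASS and FAIL are final; only INCOMPLETE is worth another attempt,
        # and only until the retry budget is used up.
        if outcome is not Outcome.INCOMPLETE or attempts > max_retries:
            return outcome, attempts
```

The point of the design is that a genuine failure is reported immediately and never masked by a retry, while a run that could not complete gets another chance on a clean cluster.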

I think retrying the incomplete tests is a more robust solution than trying to fix every possible cause of flakiness. We don’t have a lot of experience with the internals of Jepsen, our fixes won’t be exhaustive, and the fixes need to be merged into Jepsen, which can be a slow process.

PR with the changes to our Jepsen tool: test: Deal with incomplete tests in Jepsen tool by martinmr · Pull Request #5804 · dgraph-io/dgraph · GitHub

Running the tests

The next step would be running the tests in TeamCity. There are 36 total tests so I don’t think running every one for each PR is feasible. I propose the following.

  1. Running the full suite of tests nightly.
  2. Running a small set of tests (no more than four or five) for each PR. This should serve as a basic sanity check.

The tests are independent of each other so they could be sharded and run by multiple agents.
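As a minimal sketch of what sharding could look like, assuming each agent knows its own index and the total number of agents (both hypothetical parameters, as are the placeholder test names), a round-robin split is enough because the tests do not depend on each other:

```python
def shard(tests, num_agents, agent_index):
    """Return the tests assigned to one agent using a round-robin split."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    if not 0 <= agent_index < num_agents:
        raise ValueError("agent_index must be in [0, num_agents)")
    # Agent i takes tests i, i + num_agents, i + 2 * num_agents, ...
    return tests[agent_index::num_agents]


# Placeholder names standing in for the 36 real tests.
all_tests = [f"test-{i:02d}" for i in range(36)]
shards = [shard(all_tests, 4, i) for i in range(4)]
```

With four agents, each shard holds nine tests, and every test lands in exactly one shard.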

@dmai @joaquin @vvbalaji: thoughts on this?

Thanks Martin.
Do we know how long running those 36 tests takes? If it is a few minutes, we can do it for each PR. Otherwise, if we have to pick a subset, can you propose which ones to run for every PR?

Also, same question for release branches such as v20.03.

@martinmr: thanks for the writeup.

@Paras: a full Jepsen run takes about 7 hours, so it would make sense to run it once a day.

@joaquin: do we have any provision to get alerted if a nightly test fails? Would it be simple to enable an option for weekly tests? That could be an option for the load/stress tests that we are running on an ad hoc basis.

A Jepsen test can run for an arbitrary amount of time. Each test should probably run for 10 minutes at the very least.

The current TeamCity task runs all the tests sequentially but they can be parallelized.
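For a rough sense of the numbers: 36 tests at 10 minutes each is 360 minutes, or 6 hours, when run sequentially, which is in line with the roughly 7 hours reported for a full run. The back-of-the-envelope estimate below assumes every test takes the same time and ignores cluster setup and retries, so treat the results as lower bounds. The function name and the agent counts are made up for illustration.

```python
import math


def wall_clock_minutes(num_tests, minutes_per_test, num_agents):
    """Estimate total run time when tests are split evenly across agents."""
    # The slowest agent runs ceil(num_tests / num_agents) tests back to back.
    return math.ceil(num_tests / num_agents) * minutes_per_test


sequential = wall_clock_minutes(36, 10, 1)   # 360 minutes (6 hours)
four_agents = wall_clock_minutes(36, 10, 4)  # 90 minutes
five_agents = wall_clock_minutes(36, 10, 5)  # 80 minutes (one agent runs 8 tests)
```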



Notifications are set up per user account under My Settings & Tools for the email notifier. Would we want to send these to a generic email address? Also, newer versions of TeamCity have a Slack notifier, so we could send notifications to a channel, if that would be of interest.

In terms of simplicity: email can be set up now with a few clicks, although for a generic account we may need to work with the G Suite admin (@dmai). Slack notifications are more complex because we would need to upgrade our TeamCity.

Ad-Hoc Basis

Ad-hoc runs are supported, of course. A manual trigger is available by default through the Run button. You can even expose parameters (with defaults) in the build configurations, should you want to run a test with different options, e.g., an n-node cluster with n shards. Typically, we start by running these manually on an ad-hoc basis and then, once satisfied, create a trigger (schedule or GitHub event).