Moved from GitHub dgraph/4546
Posted by aphyr:
What version of Dgraph are you using?
1.1.1-65-g2851e2d9a.
Have you tried reproducing the issue with the latest release?
Yes, 1.1.1-65-g2851e2d9a is the latest development release.
What is the hardware spec (RAM, OS)?
A 5-node EC2 m4.large cluster.
Steps to reproduce the issue (command/config used to run Dgraph).
With Jepsen 0b6b5f858ae04053a014c4fea1c21b7b762db2d2, run:
lein run test --username admin --nodes-file ~/nodes --concurrency 2n --local-binary dgraph-1.1.1-65-g2851e2d9a --time-limit 600 -w bank --test-count 5
Expected behaviour and actual result.
In bank tests, even without any nemesis activity, Dgraph appears to return null for some (but not all) values in read queries. The nulls appear to be randomly distributed through time. In one test run of roughly 600 seconds, for instance, we observed 52 nulls (45 in reads, 7 in transfers). Transfers crash when given a null balance for an account, so this bug manifested only as transient read errors, not as logical state corruption.
Here, for instance, account 1’s key came back as null rather than 1:
{:type :ok, :f :read, :process 14, :time 31349508708, :value {0 21, 1 5, 2 4, 3 33, 4 9, 5 7, 6 7, 7 14}, :index 9220}
{:type :invoke, :f :read, :process 14, :time 31349651962, :index 9221}
{:type :ok, :f :read, :process 10, :time 31350933004, :value {nil 5, 0 21, 2 4, 3 33, 4 9, 5 7, 6 7, 7 14}, :message {:type :unexpected-key, :unexpected (nil), :span-id "SpanId{spanId=bb30948dedadace5}", :trace-id "TraceId{traceId=8511be855f530c9ed1a36e3f707fe7ee}"}, :error :checker-violation, :index 9222}
Here, account 1 had a null balance:
{:type :ok, :f :read, :process 21, :time 72493654542, :value {0 12, 1 nil, 2 42, 3 6, 4 3, 5 1, 6 20, 7 5}, :message {:type :nil-balance, :nils {1 nil}, :span-id "SpanId{spanId=450a9e72f6296204}", :trace-id "TraceId{traceId=c180b44ac232ce5629e9472c5cb21ca3}"}, :error :checker-violation, :index 24694}
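To make the two anomaly shapes concrete, here is a minimal sketch of a read check that tags a bank read the same way the `:message` fields above do: `unexpected-key` when a key outside the known accounts (such as null) appears, and `nil-balance` when a known account maps to null. It is a hypothetical illustration, not Jepsen's actual bank checker. It assumes eight accounts numbered 0 through 7, as in the reads shown, and it deliberately omits any total-balance check.

```python
from typing import Any, Dict, List, Optional, Set

# Assumption: accounts 0..7, inferred from the reads above, not from Jepsen's config.
ACCOUNTS: Set[int] = set(range(8))


def classify_read(value: Dict[Optional[int], Optional[int]],
                  accounts: Set[int] = ACCOUNTS) -> List[Dict[str, Any]]:
    """Hypothetical checker: report unexpected keys and nil balances in one read."""
    errors: List[Dict[str, Any]] = []
    # Any key that is not a known account, e.g. None where account 1 should be.
    unexpected = [k for k in value if k not in accounts]
    if unexpected:
        errors.append({"type": "unexpected-key", "unexpected": unexpected})
    # Known accounts whose balance came back as None.
    nils = {k: v for k, v in value.items() if k in accounts and v is None}
    if nils:
        errors.append({"type": "nil-balance", "nils": nils})
    return errors


if __name__ == "__main__":
    # Read at :index 9222: account 1's key came back as None.
    print(classify_read({None: 5, 0: 21, 2: 4, 3: 33, 4: 9, 5: 7, 6: 7, 7: 14}))
    # Read at :index 24694: account 1's balance came back as None.
    print(classify_read({0: 12, 1: None, 2: 42, 3: 6, 4: 3, 5: 1, 6: 20, 7: 5}))
```

Run against the read at index 9220, which has no nulls, the sketch returns an empty list. Checking keys and balances separately keeps the two failure modes distinct, which matches how the log reports them.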