Transactions can read stale values

Report a Dgraph Bug

What version of Dgraph are you using?

Dgraph Version
$ dgraph version
 
Dgraph version   : v21.12.0
Dgraph codename  : zion
Dgraph SHA-256   : 078c75df9fa1057447c8c8afc10ea57cb0a29dfb22f9e61d8c334882b4b4eb37
Commit SHA-1     : d62ed5f15
Commit timestamp : 2021-12-02 21:20:09 +0530
Branch           : HEAD
Go version       : go1.17.3
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.

Have you tried reproducing the issue with the latest release?

Yes.

What is the hardware spec (RAM, OS)?

  • Spec: Aliyun ecs.c6e.large cloud VM
  • OS: Ubuntu 20.04 LTS
  • Environment: Docker dgraph/dgraph:latest
  • RAM: 4G

Steps to reproduce the issue (command/config used to run Dgraph).

Note: because new users can only post two links, all links are collected at the bottom and referenced by the numbers in square brackets.

We use docker-compose to run Dgraph [1].

We set up the database to simulate a key-value store, using the uid as the key and the val predicate as the value [2]:

val: int .

type KV {
  val
}
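To make the key-value mapping concrete, here is a minimal Python sketch of how reading and writing one key could be expressed as a DQL query and a JSON set-mutation against the schema above. The helper names read_query and write_mutation are hypothetical. The sketch assumes uids are rendered in hex (457 becomes 0x1c9). It is not the code at [2], and it leaves out sending the strings to Dgraph through a client transaction.

```python
import json


def read_query(uid: int) -> str:
    """Hypothetical helper: DQL query reading the `val` predicate of one node."""
    # Dgraph uids are conventionally written in hex, e.g. 457 -> 0x1c9.
    return f"{{ q(func: uid({hex(uid)})) {{ val }} }}"


def write_mutation(uid: int, value: int) -> str:
    """Hypothetical helper: JSON set-mutation overwriting `val` on one KV node."""
    return json.dumps({"uid": hex(uid), "dgraph.type": "KV", "val": value})


print(read_query(457))         # { q(func: uid(0x1c9)) { val } }
print(write_mutation(457, 2))  # {"uid": "0x1c9", "dgraph.type": "KV", "val": 2}
```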

The initial values are first inserted into the database [3]. We then spawn a
number of threads (sessions) that perform random reads and writes on the
database and record the results [4]. Every value written to a given uid is
unique, so each read can be traced back to the single write that produced it.
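A workload of this shape can be sketched as follows. This is a hypothetical Python generator, not the dbcop code at [4]. It assumes the initial inserts use value 0 and that a per-uid counter supplies fresh values, which is one simple way to guarantee that the values written to a single uid are unique.

```python
import random
from collections import defaultdict


def generate_workload(n_sessions, txns_per_session, ops_per_txn, keys, seed=0):
    """Hypothetical random workload: sessions -> transactions -> operations.

    A read is ("R", key); a write is ("W", key, value). Values come from a
    per-key counter starting at 1 (0 is assumed to be the initial value),
    so no two writes to the same key ever carry the same value.
    """
    rng = random.Random(seed)
    next_value = defaultdict(lambda: 1)
    sessions = []
    for _ in range(n_sessions):
        session = []
        for _ in range(txns_per_session):
            txn = []
            for _ in range(ops_per_txn):
                key = rng.choice(keys)
                if rng.random() < 0.5:
                    txn.append(("R", key))
                else:
                    txn.append(("W", key, next_value[key]))
                    next_value[key] += 1
            session.append(txn)
        sessions.append(session)
    return sessions
```

Because values are allocated when the workload is generated rather than when a session runs, uniqueness holds no matter how the sessions interleave at execution time.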

We then use a verifier [5] to check whether the recorded results contain
violations of snapshot isolation (SI). The complete script to run the tests can
be found at [6].
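Because values are unique per uid, a verifier can attribute every read to exactly one write and derive write-read (WR) edges directly from the recorded history. The sketch below shows only that step. It is an assumption about how such a checker can work, not CobraVerifier's implementation, and the history layout (a dict from a transaction id to a list of ("R" | "W", uid, value) tuples) is hypothetical.

```python
def infer_write_read(history):
    """Return the set of (writer, reader) WR edges implied by unique values."""
    writer_of = {}
    for txn, ops in history.items():
        for kind, key, value in ops:
            if kind == "W":
                # Unique values mean (key, value) identifies a single write.
                if (key, value) in writer_of:
                    raise ValueError(f"value {value} written twice to {key}")
                writer_of[(key, value)] = txn
    edges = set()
    for txn, ops in history.items():
        for kind, key, value in ops:
            # Reads of values with no recorded writer (e.g. initial values
            # inserted outside the history) are ignored in this sketch.
            if kind == "R" and (key, value) in writer_of:
                writer = writer_of[(key, value)]
                if writer != txn:
                    edges.add((writer, txn))
    return edges


h = {
    "T1": [("W", 1, 5)],
    "T2": [("R", 1, 5), ("W", 2, 7)],
    "T3": [("R", 2, 7), ("R", 1, 5)],
}
print(sorted(infer_write_read(h)))  # [('T1', 'T2'), ('T1', 'T3'), ('T2', 'T3')]
```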

Please note that the chance of reproducing this varies greatly across machines.
On the cloud VM listed in the hardware spec section, the anomaly occurs in
almost every run, while on a laptop with 16 GB of memory and a 6-core CPU it
took about 30 minutes to find an anomaly in our tests.

Expected behaviour and actual result.

According to the docs [7], Dgraph provides snapshot isolation, so every
transaction that committed before a transaction started should be visible to
it. However, we have found violations of SI in our tests. One instance is shown
below:
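The visibility rule can be stated as a small function. This is a simplified model, assuming each committed version of a uid carries a commit timestamp and that a transaction sees exactly the versions with commit_ts < start_ts. It does not describe Dgraph's internal storage, and the name snapshot_read is hypothetical.

```python
def snapshot_read(versions, start_ts):
    """Value a transaction starting at start_ts should read under SI.

    versions: list of (commit_ts, value) for one uid, in any order.
    Returns the value of the latest version committed before start_ts,
    or None if nothing was committed before the transaction started.
    """
    visible = [(ts, value) for ts, value in versions if ts < start_ts]
    if not visible:
        return None
    return max(visible, key=lambda pair: pair[0])[1]


print(snapshot_read([(10, "a"), (20, "b"), (30, "c")], 25))  # b
```

Under this rule, a read that returns an older version than snapshot_read would is a stale read.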

[Figure: dgraph-violation-2.drawio, dependency graph of the violating transactions]

Each transaction in this graph is identified by a pair (session id, transaction id). We use R(uid, value) and W(uid, value) to denote reads and writes. The
start_ts of each transaction is also included in the graph.

The edges in this graph represent orderings between transactions. There are
session order (SO, because transactions in a session are executed one after
another), write-read order (WR, a value written by one transaction is read by
another), and write-write order (WW, two transactions write to the same uid, so
under SI they cannot commit concurrently).
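Given SO, WR and WW edges, a violation shows up as a cycle: in the model used here, which treats session order as a dependency, no cycle made only of these three edge kinds is allowed under SI. The following hypothetical sketch finds such a cycle with a depth-first search. It is a simplified check, not the algorithm of the verifier at [5], which likely also has to handle anti-dependencies and write orders that are not directly observed.

```python
from collections import defaultdict


def find_cycle(edges):
    """edges: iterable of (src, dst, label), label in {"SO", "WR", "WW"}.

    Returns a list of nodes [n0, n1, ..., nk] such that n0 -> n1 -> ... -> nk -> n0
    are all edges, or None if the graph is acyclic.
    """
    graph = defaultdict(list)
    for src, dst, _label in edges:
        graph[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    color = defaultdict(int)
    parent = {}

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:
                # Back edge u -> v: walk parents from u up to v to recover the cycle.
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                return cycle[::-1]
            if color[v] == WHITE:
                parent[v] = u
                found = dfs(v)
                if found:
                    return found
        color[u] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


print(find_cycle([("A", "B", "SO"), ("B", "C", "WR"), ("C", "A", "WW")]))  # ['A', 'B', 'C']
```

Recursion depth grows with the longest path in the graph, so a very large history would need an iterative version of the search.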

In this graph, transaction (9, 249) reads value 2 from uid=457, and uid=457 is
written by both (4, 167) and (10, 471). Regardless of which of these two
transactions commits first, the result is a violation of SI (shown in the graph
as the two cycles).

Judging by their timestamps, (10, 471) should already have overwritten the
value of uid=457, yet (9, 249) still reads the stale value.

The database logs and dump are available at [8].

Is this expected? Am I missing something in the docs?

[1]: https://github.com/amnore/dbcop/blob/master/docker/dgraph/docker-compose.yml
[2]: https://github.com/amnore/dbcop/blob/d7f5e745ec0d24d259abaec7fbd1465ba588573b/examples/dgraph.rs#L105
[3]: https://github.com/amnore/dbcop/blob/a3bf2ea810de088d6057eec8d9f4b083d4085f57/examples/dgraph.rs#L122
[4]: https://github.com/amnore/dbcop/blob/d7f5e745ec0d24d259abaec7fbd1465ba588573b/examples/dgraph.rs#L56
[5]: https://github.com/amnore/CobraVerifier
[6]: https://github.com/amnore/dbcop/blob/master/script/test-dgraph.sh
[7]: https://dgraph.io/docs/design-concepts/consistency-model/#sidebar
[8]: https://1drv.ms/u/s!Ao9rNU5eah0xqlL8BM5yq_LILUs6