We are reporting snapshot isolation anomalies found in Dgraph. In particular, Dgraph transactions have read values that had already been overwritten.
Report a Dgraph Bug
What version of Dgraph are you using?
Dgraph version : v21.12.0
Dgraph codename : zion
Dgraph SHA-256 : 078c75df9fa1057447c8c8afc10ea57cb0a29dfb22f9e61d8c334882b4b4eb37
Commit SHA-1 : d62ed5f15
Commit timestamp : 2021-12-02 21:20:09 +0530
Branch : HEAD
Go version : go1.17.3
jemalloc enabled : true
Have you tried reproducing the issue with the latest release?
Yes.
What is the hardware spec (RAM, OS)?
- Spec: Aliyun ecs.c6e.large cloud VM
- OS: Ubuntu 20.04 LTS
- Environment: Docker dgraph/dgraph:latest
- RAM: 4G
Steps to reproduce the issue (command/config used to run Dgraph).
We use docker-compose to run Dgraph [1] (the numbered links are listed at the end for reference).
We set up the database to simulate a key-value store, using uid as the key [2]:
val: int .
type KV {
val
}
The initial values are first inserted into the database [3]. We then spawn a number of threads (sessions) that perform random reads and writes against the database, and record the results [4]. The values written to a single uid are unique. We then use a verifier [5] to check whether there are violations of snapshot isolation (SI) in the results.
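As an illustration of the workload described above, here is a minimal sketch of how a plan of random reads and writes with unique per-uid values could be generated. It is not the code in [4]; the names Op and make_plan, the 50/50 read/write split, and the convention that written values start at 1 are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str              # "R" for a read, "W" for a write
    uid: int               # the key
    value: Optional[int]   # None for reads (filled in when the read is executed)


def make_plan(n_sessions: int, txns_per_session: int, ops_per_txn: int,
              n_uids: int, seed: int = 0) -> List[List[List[Op]]]:
    """Return plan[session][transaction] = list of operations (hypothetical layout)."""
    rng = random.Random(seed)
    # One counter per uid, so a value is never written twice to the same uid.
    # Assumption: values start at 1, leaving other values free for the initial insert.
    next_value = {uid: 1 for uid in range(1, n_uids + 1)}
    plan = []
    for _ in range(n_sessions):
        session = []
        for _ in range(txns_per_session):
            txn = []
            for _ in range(ops_per_txn):
                uid = rng.randint(1, n_uids)
                if rng.random() < 0.5:
                    txn.append(Op("R", uid, None))
                else:
                    txn.append(Op("W", uid, next_value[uid]))
                    next_value[uid] += 1
            session.append(txn)
        plan.append(session)
    return plan
```

Because every (uid, value) pair is written at most once, a later read of that pair identifies the writing transaction unambiguously, which is what makes the recorded results checkable.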
The complete script to run the tests can be found at [6].
Please note that the chance of reproducing this varies by deployment: on the cloud VM listed in the hardware spec section, the anomaly occurs in almost every run, while it takes about 30 minutes to find an anomaly on a laptop with 16 GB of memory and a 6-core CPU.
Expected behaviour and actual result.
As per the docs [7], Dgraph should support snapshot isolation, and all commits before a transaction should be visible to it. However, we have found violations of SI in our tests. An instance is shown below:
Each transaction in this graph is identified by a pair (session id, transaction id). Transactions with smaller ids are executed before those with bigger ones in a session. We use R(uid, value) and W(uid, value) to denote reads and writes. The start_ts of each transaction is also included in the graph. The edges in this graph represent the ordering of transactions. There are session orders (SO, because transactions in a session are executed one after another), write-read orders (WR, which means a value is written by one transaction and read by another), and write-write orders (WW, two transactions have written to the same uid, so they cannot execute concurrently under SI due to the write conflict).
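To make the edge definitions above concrete, here is a minimal sketch that derives SO and WR edges from a recorded history. It is not the verifier in [5]; the function name so_wr_edges and the history layout are hypothetical. WW edges are left out because they additionally require the order in which the writes to a uid were committed.

```python
from typing import Dict, List, Set, Tuple

TxnId = Tuple[int, int]        # (session id, transaction id)
Op = Tuple[str, int, int]      # ("R" or "W", uid, value)
Edge = Tuple[TxnId, TxnId, str]


def so_wr_edges(history: Dict[TxnId, List[Op]]) -> Set[Edge]:
    """Derive session-order (SO) and write-read (WR) edges from a history."""
    edges: Set[Edge] = set()

    # SO: consecutive transactions of the same session, ordered by transaction id.
    by_session: Dict[int, List[int]] = {}
    for sid, tid in history:
        by_session.setdefault(sid, []).append(tid)
    for sid, tids in by_session.items():
        tids.sort()
        for a, b in zip(tids, tids[1:]):
            edges.add(((sid, a), (sid, b), "SO"))

    # WR: values written to a uid are unique, so (uid, value) names one writer.
    writer: Dict[Tuple[int, int], TxnId] = {}
    for txn, ops in history.items():
        for kind, uid, value in ops:
            if kind == "W":
                writer[(uid, value)] = txn
    for txn, ops in history.items():
        for kind, uid, value in ops:
            if kind == "R":
                w = writer.get((uid, value))
                if w is not None and w != txn:
                    edges.add((w, txn, "WR"))
    return edges


# Made-up history for illustration only (not taken from the test run).
example = {
    (1, 1): [("W", 5, 1)],
    (1, 2): [("R", 5, 1)],
    (2, 1): [("R", 5, 1)],
}
print(sorted(so_wr_edges(example)))
```

A cycle in the graph formed by these edges together with the WW edges indicates that no execution order consistent with SI can explain the observed reads.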
In this graph, transaction (9, 249) reads uid=457, value=2, which is written by (4, 167) and (10, 471). Regardless of which one commits first, this constitutes a violation of SI (shown in the graph as the two cycles). Note that it’s not possible for (9, 429) to start before (10, 471) commits because there is a path (10, 471) -> (10, 471) -> (1, 43) -> (9, 429). Judging by their timestamps, it appears that (10, 471) should have overwritten the value of uid=457, but the stale value is read by (9, 249).
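The timestamp argument above can be sketched as a small check. This is only an illustration under stated assumptions: the helper names visible_value and is_stale_read are hypothetical, each write is assumed to carry a commit timestamp, and a reader is assumed to see exactly the writes whose commit timestamp is strictly smaller than its start_ts. The timestamps and values in the example are made up and are not the ones from the test run.

```python
from typing import List, Optional, Tuple

Write = Tuple[str, int, int]   # (writer label, commit_ts, value written to one uid)


def visible_value(writes: List[Write], start_ts: int) -> Optional[int]:
    """Value a snapshot taken at start_ts should return for this uid.

    Assumption: a write is visible iff commit_ts < start_ts, and the visible
    write with the largest commit_ts wins. Returns None if nothing is visible.
    """
    committed_before = [w for w in writes if w[1] < start_ts]
    if not committed_before:
        return None
    return max(committed_before, key=lambda w: w[1])[2]


def is_stale_read(writes: List[Write], start_ts: int, observed: int) -> bool:
    """True if the observed value differs from what the snapshot should show."""
    return visible_value(writes, start_ts) != observed


# Made-up example: writer_a commits value 2 at ts 100, writer_b commits value 3 at ts 200.
writes = [("writer_a", 100, 2), ("writer_b", 200, 3)]
# A reader that starts at ts 300 but still observes value 2 has read a stale value.
print(is_stale_read(writes, start_ts=300, observed=2))
```

The same reader starting at ts 150 and observing value 2 would be consistent with SI, since writer_b would not have committed yet under these assumptions.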
The database logs and dump are attached in [8].
[1]: https://github.com/amnore/dbcop/blob/master/docker/dgraph/docker-compose.yml
[2]: https://github.com/amnore/dbcop/blob/d7f5e745ec0d24d259abaec7fbd1465ba588573b/examples/dgraph.rs#L105
[3]: https://github.com/amnore/dbcop/blob/a3bf2ea810de088d6057eec8d9f4b083d4085f57/examples/dgraph.rs#L122
[4]: https://github.com/amnore/dbcop/blob/d7f5e745ec0d24d259abaec7fbd1465ba588573b/examples/dgraph.rs#L56
[5]: https://github.com/amnore/CobraVerifier
[6]: https://github.com/amnore/dbcop/blob/master/script/test-dgraph.sh
[7]: https://dgraph.io/docs/design-concepts/consistency-model/#sidebar
[8]: https://1drv.ms/u/s!Ao9rNU5eah0xqlL8BM5yq_LILUs6