Read skew leading to logical data corruption with tablet moves

Moved from GitHub dgraph/4543

Posted by aphyr:

What version of Dgraph are you using?

1.1.1-48-g157896305

Have you tried reproducing the issue with the latest release?

1.1.1-48-g157896305 is a few days old; I’m trying 1.1.1-65 next.

What is the hardware spec (RAM, OS)?

A 5-node EC2 cluster, running m4.larges.

Steps to reproduce the issue (command/config used to run Dgraph).

With Jepsen 9f780a144248776e0143bc9699647802a29e1d28, run

lein run test --username admin --nodes-file ~/nodes --concurrency 2n --local-binary dgraph-1.1.1-48-g15789630 --time-limit 600 -w bank --nemesis move-tablet --test-count 5

I hit this case after roughly 1800 seconds of testing.

Expected behaviour and actual result.

In this bank test run, at index 39931, the total of all accounts changes from 100 to 98, and remains at 98 for the remainder of the test. Here’s an excerpt from the history, restricted to just operations involving account 6:

{:type :invoke, :f :read, :process 52, :time 164054695072, :index 39868}
{:type :ok, :f :read, :process 68, :time 164056940521, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39869}
{:type :invoke, :f :read, :process 68, :time 164057067767, :index 39870}
{:type :fail, :f :transfer, :value {:from 6, :to 0, :amount 4}, :process 94, :time 164064944870, :error :insufficient-funds, :index 39871}
{:type :invoke, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 94, :time 164065096650, :index 39872}
{:type :fail, :f :transfer, :value {:from 1, :to 6, :amount 4}, :process 69, :time 164080718008, :error :conflict, :index 39873}
{:type :invoke, :f :read, :process 69, :time 164080860932, :index 39874}
{:type :ok, :f :read, :process 56, :time 164102955082, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39875}
{:type :fail, :f :transfer, :value {:from 6, :to 7, :amount 5}, :process 63, :time 164110308064, :error :insufficient-funds, :index 39877}
{:type :invoke, :f :read, :process 63, :time 164110476403, :index 39878}
{:type :invoke, :f :transfer, :value {:from 6, :to 5, :amount 2}, :process 50, :time 164110856399, :index 39880}
{:type :invoke, :f :read, :process 55, :time 164134503400, :index 39882}
{:type :ok, :f :read, :process 68, :time 164177129784, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39883}
{:type :invoke, :f :read, :process 68, :time 164177263552, :index 39884}
{:type :ok, :f :read, :process 69, :time 164177838798, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39885}
{:type :invoke, :f :read, :process 69, :time 164177999313, :index 39886}
{:type :fail, :f :transfer, :value {:from 6, :to 5, :amount 2}, :process 50, :time 164180105093, :error :insufficient-funds, :index 39887}
{:type :invoke, :f :read, :process 50, :time 164180226548, :index 39888}
{:type :ok, :f :read, :process 52, :time 164223467325, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39889}
{:type :invoke, :f :read, :process 52, :time 164223606112, :index 39890}
; This transfer results in account 6 going from $1 to $3
{:type :ok, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 94, :time 164231051906, :index 39891}
{:type :ok, :f :read, :process 69, :time 164243265925, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39893}
{:type :invoke, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 69, :time 164243399496, :index 39894}
{:type :ok, :f :read, :process 63, :time 164248385887, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39897}
{:type :invoke, :f :read, :process 63, :time 164248539257, :index 39898}
{:type :ok, :f :read, :process 68, :time 164250792913, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39899}
{:type :invoke, :f :read, :process 68, :time 164250949055, :index 39900}
{:type :ok, :f :read, :process 50, :time 164251226618, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39901}
{:type :invoke, :f :read, :process 50, :time 164251325868, :index 39902}
{:type :ok, :f :read, :process 55, :time 164255927121, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39903}
{:type :fail, :f :transfer, :value {:from 4, :to 6, :amount 5}, :process 67, :time 164261472791, :error :insufficient-funds, :index 39905}
{:type :ok, :f :read, :process 68, :time 164315481013, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39907}
{:type :invoke, :f :read, :process 68, :time 164315611834, :index 39908}
{:type :ok, :f :read, :process 63, :time 164317140132, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39909}
{:type :ok, :f :read, :process 52, :time 164387501966, :value {0 58, 1 11, 3 1, 5 20, 6 3, 7 7}, :index 39911}
{:type :invoke, :f :read, :process 52, :time 164387666283, :index 39912}
{:type :invoke, :f :read, :process 56, :time 164396547850, :index 39916}
{:type :ok, :f :read, :process 50, :time 164398498075, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39917}
{:type :invoke, :f :read, :process 50, :time 164398645310, :index 39918}
{:type :ok, :f :read, :process 68, :time 164405111456, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39919}
{:type :invoke, :f :read, :process 94, :time 164406008839, :index 39922}
{:type :invoke, :f :read, :process 63, :time 164475061769, :index 39928}
{:type :fail, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 69, :time 164487140702, :error :insufficient-funds, :index 39929}
{:type :invoke, :f :read, :process 69, :time 164487270518, :index 39930}
; But then account 6 abruptly changes back to 1--without an obvious transfer
{:type :ok, :f :read, :process 50, :time 164722480000, :value {0 55, 1 11, 3 4, 5 20, 6 1, 7 7}, :message {:type :wrong-total, :total 98, :span-id "SpanId{spanId=c235656ed2e9a541}", :trace-id "TraceId{traceId=2b430081259ce2267ea6a90aab280adb}"}, :error :checker-violation, :index 39931

Because there’s no obvious proximate transfer operation here, this makes me wonder if maybe this single account got rolled back somehow? Whatever the cause, the effects persisted for the rest of the test!

danielmai commented :

I ran these tests with the release/v1.1 branch (e18986f1c3feb0acfba15423827639f813eadbaf) that will be released as v1.1.2. This contains the fix from PR #4496.

Dgraph version   : v1.1.1-56-ge18986f1c
Dgraph SHA-256   : 36eecf05e86803b38a4a77da4a2410b11a7ec8d1c558d661a8024e4d0914d919
Commit SHA-1     : e18986f1c
Commit timestamp : 2020-01-11 17:49:16 -0800
Branch           : release/v1.1
Go version       : go1.13.5

This build passes 10 individual 600-second test runs of the bank test with the move-tablet nemesis.

20200112T144150.000Z.zip

aphyr commented :

It looks like this bug is either rare or wasn’t present in 1.1.1; I haven’t managed to reproduce it there in a few hours of testing.

aphyr commented :

I’ve managed to confirm this issue was present in 1.1.1 as well: here’s a failing test case. You can reproduce this with Jepsen b7c7bdc5f8476e009d591927060e7e8e786f015a by running:

lein run test-all --username admin --nodes-file ~/nodes -w bank --nemesis move-tablet --version 1.1.1 --time-limit 600 --test-count 30

It’s not as common as the null record problem we saw in #4534, but it does happen. This can’t be caused by the null record issue, because in this case, the account total rose from 100 to 102: if we’d propagated null state back into the DB via a transfer transaction, we could only lower, rather than raise, the account balance.

manishrjain commented :

Reopening based on last comment by @aphyr .

aphyr commented :

Digging into this 1.1.1 failure a little more, I think this might help shed some light on things. Process 5 completes a transfer of $2 from 6 to 3 while a tablet move is taking place. The increment of account 3 takes place, but the decrement doesn’t. Then process 5 fails with an old-timestamp error, which means something like “readTs: \d+ less than minTs: \d+ for key:”

{:type :ok, :f :transfer, :value {:from 6, :to 3, :amount 2}, :process 5, :time 263508090019, :index 58018}
{:type :invoke, :f :transfer, :value {:from 6, :to 1, :amount 1}, :process 5, :time 263508241599, :index 58019}
{:type :ok, :f :read, :process 16, :time 263508906869, :value {0 51, 2 1, 3 5, 4 11, 5 3, 6 8, 7 21}, :index 58020}
{:type :invoke, :f :transfer, :value {:from 0, :to 5, :amount 2}, :process 16, :time 263509029440, :index 58021}
{:type :ok, :f :read, :process 14, :time 263516597337, :value {0 51, 2 1, 3 5, 4 11, 5 3, 6 8, 7 21}, :index 58022}
{:type :invoke, :f :transfer, :value {:from 3, :to 5, :amount 2}, :process 14, :time 263516725225, :index 58023}
{:type :fail, :f :transfer, :value {:from 0, :to 5, :amount 2}, :process 16, :time 263592684274, :error :conflict, :index 58024}
{:type :invoke, :f :read, :process 16, :time 263592858323, :index 58025}
{:type :fail, :f :transfer, :value {:from 6, :to 1, :amount 1}, :process 5, :time 263596806299, :error :conflict, :index 58026}
{:type :invoke, :f :read, :process 5, :time 263596947851, :index 58027}
{:type :fail, :f :transfer, :value {:from 3, :to 5, :amount 2}, :process 14, :time 263790314458, :error :conflict, :index 58028}
{:type :invoke, :f :read, :process 14, :time 263790469958, :index 58029}
{:type :ok, :f :read, :process 16, :time 263833251597, :value {0 51, 2 1, 3 7, 4 11, 5 3, 6 8, 7 21}, :message {:type :wrong-total, :total 102, :span-id "SpanId{spanId=dec5e56fcb430e47}", :trace-id "TraceId{traceId=8811112f1f038549f03f4ecf10c33fae}"}, :error :checker-violation, :index 58030}
{:type :invoke, :f :read, :process 16, :time 263833381441, :index 58031}
{:type :fail, :f :transfer, :value {:from 2, :to 4, :amount 4}, :process 2, :time 263836386394, :error :insufficient-funds, :index 58032}
{:type :invoke, :f :read, :process 2, :time 263836519182, :index 58033}
{:type :info, :f :move-tablet, :value {"amount_0" [2 1], "amount_2" [2 1], "dgraph.type" [1 2], "key_2" [2 1], "type_2" [1 2]}, :process :nemesis, :time 264319082817, :index 58034}
{:type :fail, :f :read, :process 5, :time 264612363612, :error :old-timestamp, :index 58035}

aphyr commented :

After extensive testing, I haven’t managed to reproduce this on 1.1.1-57-g4abba793a.

aphyr commented :

Nope, it’s still present on 1.1.1-57-g4abba793a. I need to design a more aggressive variant of the bank test, I think; it’s taking too long for us to catch these bugs.

aphyr commented :

Haven’t seen this in 1.1.1-59-g191232226 yet, but I also haven’t given it as much time as it took to detect in 1.1.1-57-g4abba793a.