Moved from GitHub dgraph/4543
Posted by aphyr:
What version of Dgraph are you using?
1.1.1-48-g157896305
Have you tried reproducing the issue with the latest release?
1.1.1-48-g157896305 is a few days old; I’m trying 1.1.1-65 next.
What is the hardware spec (RAM, OS)?
A 5-node EC2 cluster, running m4.larges.
Steps to reproduce the issue (command/config used to run Dgraph).
With Jepsen 9f780a144248776e0143bc9699647802a29e1d28, run
lein run test --username admin --nodes-file ~/nodes --concurrency 2n --local-binary dgraph-1.1.1-48-g15789630 --time-limit 600 -w bank --nemesis move-tablet --test-count 5
I hit this case after roughly 1800 seconds of testing.
Expected behaviour and actual result.
In this bank test run, at index 39931, the total of all accounts changes from 100 to 98, and remains at 98 for the remainder of the test. Here’s an excerpt from the history, restricted to just operations involving account 6:
{:type :invoke, :f :read, :process 52, :time 164054695072, :index 39868}
{:type :ok, :f :read, :process 68, :time 164056940521, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39869}
{:type :invoke, :f :read, :process 68, :time 164057067767, :index 39870}
{:type :fail, :f :transfer, :value {:from 6, :to 0, :amount 4}, :process 94, :time 164064944870, :error :insufficient-funds, :index 39871}
{:type :invoke, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 94, :time 164065096650, :index 39872}
{:type :fail, :f :transfer, :value {:from 1, :to 6, :amount 4}, :process 69, :time 164080718008, :error :conflict, :index 39873}
{:type :invoke, :f :read, :process 69, :time 164080860932, :index 39874}
{:type :ok, :f :read, :process 56, :time 164102955082, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39875}
{:type :fail, :f :transfer, :value {:from 6, :to 7, :amount 5}, :process 63, :time 164110308064, :error :insufficient-funds, :index 39877}
{:type :invoke, :f :read, :process 63, :time 164110476403, :index 39878}
{:type :invoke, :f :transfer, :value {:from 6, :to 5, :amount 2}, :process 50, :time 164110856399, :index 39880}
{:type :invoke, :f :read, :process 55, :time 164134503400, :index 39882}
{:type :ok, :f :read, :process 68, :time 164177129784, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39883}
{:type :invoke, :f :read, :process 68, :time 164177263552, :index 39884}
{:type :ok, :f :read, :process 69, :time 164177838798, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39885}
{:type :invoke, :f :read, :process 69, :time 164177999313, :index 39886}
{:type :fail, :f :transfer, :value {:from 6, :to 5, :amount 2}, :process 50, :time 164180105093, :error :insufficient-funds, :index 39887}
{:type :invoke, :f :read, :process 50, :time 164180226548, :index 39888}
{:type :ok, :f :read, :process 52, :time 164223467325, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39889}
{:type :invoke, :f :read, :process 52, :time 164223606112, :index 39890}
; This transfer results in account 6 going from $1 to $3
{:type :ok, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 94, :time 164231051906, :index 39891}
{:type :ok, :f :read, :process 69, :time 164243265925, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39893}
{:type :invoke, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 69, :time 164243399496, :index 39894}
{:type :ok, :f :read, :process 63, :time 164248385887, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39897}
{:type :invoke, :f :read, :process 63, :time 164248539257, :index 39898}
{:type :ok, :f :read, :process 68, :time 164250792913, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39899}
{:type :invoke, :f :read, :process 68, :time 164250949055, :index 39900}
{:type :ok, :f :read, :process 50, :time 164251226618, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39901}
{:type :invoke, :f :read, :process 50, :time 164251325868, :index 39902}
{:type :ok, :f :read, :process 55, :time 164255927121, :value {0 58, 1 11, 3 1, 4 2, 5 20, 6 1, 7 7}, :index 39903}
{:type :fail, :f :transfer, :value {:from 4, :to 6, :amount 5}, :process 67, :time 164261472791, :error :insufficient-funds, :index 39905}
{:type :ok, :f :read, :process 68, :time 164315481013, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39907}
{:type :invoke, :f :read, :process 68, :time 164315611834, :index 39908}
{:type :ok, :f :read, :process 63, :time 164317140132, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39909}
{:type :ok, :f :read, :process 52, :time 164387501966, :value {0 58, 1 11, 3 1, 5 20, 6 3, 7 7}, :index 39911}
{:type :invoke, :f :read, :process 52, :time 164387666283, :index 39912}
{:type :invoke, :f :read, :process 56, :time 164396547850, :index 39916}
{:type :ok, :f :read, :process 50, :time 164398498075, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39917}
{:type :invoke, :f :read, :process 50, :time 164398645310, :index 39918}
{:type :ok, :f :read, :process 68, :time 164405111456, :value {0 55, 1 11, 3 4, 5 20, 6 3, 7 7}, :index 39919}
{:type :invoke, :f :read, :process 94, :time 164406008839, :index 39922}
{:type :invoke, :f :read, :process 63, :time 164475061769, :index 39928}
{:type :fail, :f :transfer, :value {:from 4, :to 6, :amount 2}, :process 69, :time 164487140702, :error :insufficient-funds, :index 39929}
{:type :invoke, :f :read, :process 69, :time 164487270518, :index 39930}
; But then account 6 abruptly changes back to 1--without an obvious transfer
{:type :ok, :f :read, :process 50, :time 164722480000, :value {0 55, 1 11, 3 4, 5 20, 6 1, 7 7}, :message {:type :wrong-total, :total 98, :span-id "SpanId{spanId=c235656ed2e9a541}", :trace-id "TraceId{traceId=2b430081259ce2267ea6a90aab280adb}"}, :error :checker-violation, :index 39931
Because there’s no obvious proximate transfer operation here, this makes me wonder if maybe this single account got rolled back somehow? Whatever the cause, the effects persisted for the rest of the test!