Releasing distributed transactions in v0.9 - Dgraph Blog

It all started with a Github issue.

At Dgraph, we really care about user feedback. Most of what we’ve built starting January 2017, has been based what our community (that’s you!) told us. The biggest contribution that we get from our community, is in the form of feedback. We’ll forgo any code contribution for quality feedback based on real-world usage.

Since the beginning of Dgraph, transactions were road mapped as a post v1.0 feature. Dgraph is a distributed and synchronously replicated system. Adding transactions in such a system is a hard challenge; something that we felt wasn’t worth the complexity to tackle early on.

That changed when Gustavo Niemeyer filed this issue. In this issue, he made a very convincing case for supporting transactions sooner rather than later.

So, coming back to Dgraph, if the idea is indeed to position it as a general alternative for existing databases as I’ve watched in one of your videos, please don’t make the same mistake of postponing transactions for too long, or voicing them as relevant mainly for financial transactions. The sort of consistency at stake is relevant for pretty much any application at all that uses data, even more when even basic details about a record are recorded as multiple individual terms. This would make the situation even worse than with MongoDB. –Gustavo

The arguments made by Gustavo were intriguing enough for us to look seriously in that direction. Once we started looking, evidence was everywhere. People have been complaining about lack of transactions in MongoDB. Bigtable author and Google Fellow, Jeff Dean, considered not implementing transactions in Bigtable his “biggest mistake as an engineer”.

It was clear that transactions were something that we should implement right away.

So that’s what we did. We used what we call the Blitzkrieg approach. Explaining it can be a blog post of its own, but the general idea is that a single or a very small set of engineers work initially to make changes deep into the core, which would break most things minus the core (possibly one package). And then the rest of the team helps fix up the outer shells level by level. We use this technique regularly to implement major design changes at a lightning speed.

The entire work from the reporting of the issue (Sep 13) to implementing transactions in Badger (Oct 5), to releasing v0.9 with transactions (on Nov 14), was done within a time span of two months.

Wow.. that's an amazing turnaround time for that level of complexity. Thank you! https://t.co/niILz03jw0

— Gustavo Niemeyer (@gniemeyer) November 14, 2017

In this blog post, we won’t go into the details of how Dgraph’s transactions work. There’s a lot of interesting bits there, due to the uniqueness of this challenge; so the team decided that a blog post won’t do justice to what has gone into building this amazingly distributed graph database over the past two years. Instead, we plan to write a technical paper about Dgraph’s unique design. Watch for that in the coming months!

So, instead of how it works, this blog post focuses on how to use transactions to build your application.

The transaction model

Transactions come along with a new model for how to interact with Dgraph. Previously, it has just been single queries and data mutations on their own. Now all queries and mutations are performed as part of a transaction.

Dgraph can perform read-modify-write transactions, the typical lifecycle being:

  1. Create a transaction. Go client: client.NewTxn().

  2. Execute a series of queries and mutations. Go client: txn.Mutate(...) and txn.Query(...).

  3. Finally, commit or abort the transaction. Go client: txn.Commit() and txn.Abort().

If two concurrently running transactions write to the same data, then one of the commits will fail. It’s up to the user to retry.

Why are transactions important?

Database transactions are important for any app that needs to update its state based upon its previous state in a consistent manner or has operations that need to apply in atomic units.

That covers a lot of different things. Just to name a few:

  • Marking inventory in an online shop as sold. You wouldn’t want to sell the last remaining item to two customers.

  • Paying out bets on an online poker site. It’s important to ensure the same win isn’t paid twice.

  • Inventory management for a warehouse. Restocking an item twice without seeing the new quantity could result in twice as much held stock as intended.

  • Financial transactions. When transferring money, it’s important that credits and debts on two accounts are either both applied or not applied at all.

Dgraph v0.9 introduces distributed ACID transactions with synchronous replication. What this means is that transactions work across multiple servers each holding a part of the graph, providing ACID guarantees.

Increasing throughput is still just a matter of bringing up additional dgraph instances. There is no need to worry about seeing a previous database state when querying a replica. From the point of view of a single client, once a transaction is committed its changes are guaranteed to be visible in all future transactions. These guarantees help simplify application code significantly while providing a high level of scalability and crash resilience.

Client Libraries

Dgraph exposes its API via gRPC and HTTP. However…

Transactions require some bookkeeping and state management on the client side. Because of this, it’s strongly recommended to use a client library to interact with dgraph.

At the time of writing, official Go, Java and a community-driven Javascript clients are available.

Client libraries for other languages can be implemented on top of the gRPC or HTTP APIs. The best way to approach this is to read the documentation about how to use the raw HTTP API and look at the implementations for other existing clients.

The examples in this blog post will use the Go client.

A simple login system

Prior to v0.9.0 dgraph had an upsert feature which is now removed. Upsert atomically searches and retrieves or creates and retrieves depending on whether an entity exists or not.

With transactions, an explicit upsert feature is no longer required. This is because upsert style operations can be performed atomically within a transaction.

So how is this done?

In this example, we model a simple login system, where a user has to provide an email address and password in order to gain access to the system.

If the user already exists, then the password must match. If the user doesn’t yet exist, then their password should be stored for later logins.

It’s important to do all of this in a transaction. If it’s not, then the same account might inadvertently be created twice.

Error checking and JSON marshalling/unmarshalling have been omitted for brevity:

// Create a new transaction. The deferred call to Discard
// ensures that server-side resources are cleaned up.
txn := client.NewTxn()
defer txn.Discard(ctx)
// Create and execute a query to looks up an email and checks if the password
matches.
q := fmt.Sprintf(`
    {
        login_attempt(func: eq(email, %q)) {
            checkpwd(pass, %q)
        }
    }
`, email, pass)
resp, err := txn.Query(ctx, q)
// Unmarshal the response into a struct. It will be empty if the email couldn't
// be found. Otherwise it will contain a bool to indicate if the password matched.
var login struct {
    Account []struct {
        Pass []struct {
            CheckPwd bool `json:"checkpwd"`
        } `json:"pass"`
    } `json:"login_attempt"`
}
err = json.Unmarshal(resp.GetJson(), &login); err != nil {
// Now perform the upsert logic.
if len(login.Account) == 0 {
    fmt.Println("Account doesn't exist! Creating new account.")
    mu := &protos.Mutation{
        SetJson: []byte(fmt.Sprintf(`{ "email": %q, "pass": %q }`, email, pass)),
    }
    _, err = txn.Mutate(ctx, mu)
    // Commit the mutation, making it visible outside of the transaction.
    err = txn.Commit(ctx)
} else if login.Account[0].Pass[0].CheckPwd {
    fmt.Println("Login successful!")
} else {
    fmt.Println("Wrong email or password.")
}
Bank Account Transfers

The classical example for database transactions is to transfer money between two bank accounts. In this example, we have a set of bank accounts, each represented by a node in the graph. Each node is known by a uid and has its balance represented by a bal predicate.

This example was extracted from a tool we used when testing the correctness of our transaction implementation. The full source is here.

Given the uid of two accounts, we want to transfer money from one account to the other, i.e. reduce one balance and increase the other by the same amount.

It’s important that this is done in a transaction; if it isn’t, then two transfers happening concurrently could result in the net balance of all accounts changing. It could also result in double spending.

txn := s.dg.NewTxn()
defer txn.Discard(ctx)
// Get current balances for the two accounts.
q := fmt.Sprintf(`{both(func: uid(%s, %s)) { uid, bal }}`, from, to)
resp, err := txn.Query(ctx, q)
type Accounts struct {
    Both []Account `json:"both"`
}
var a Accounts
err := json.Unmarshal(resp.Json, &a); err != nil {
// Perform the transfer.
a.Both[0].Bal += 5
a.Both[1].Bal -= 5
if a.Both[0].Bal < 0 || a.Both[1].Bal < {
    // Abandon the transaction if there are insufficient funds.
    return
}
// Write back to dgraph.
var mu protos.Mutation
data, err := json.Marshal(a.Both)
mu.SetJson = data
_, err = txn.Mutate(ctx, &mu)
err = txn.Commit(ctx)

Conclusion

It has historically been difficult to implement transactions in NoSQL technologies. Notably, MongoDB has been working on a solution for a while.

So implementing transactions with synchronous replication is a massive milestone for Dgraph. With this complex but valuable feature, our community will be able to build apps on top of dgraph without having to worry about tricky data integrity issues!

We are building an open source, real time, horizontally scalable and distributed graph database.

We're starting to support enterprises in deploying Dgraph in production. Talk to us, if you want us to help you try out Dgraph at your organization.

Top image: When (Neutron) Stars Colide


This is a companion discussion topic for the original entry at https://blog.dgraph.io/post/v0.9/
4 Likes

Great that you have added Tx support !

I have not looked at the code as to how you implemented them exactly, but I guess dedicated upsert functions would still provide lower overhead and locking times for a number of use cases since they run with at least two network hops less than reading/updating. Can you shed some light on the thinking that went into this decision ?

cheers,
matthias

1 Like

A big simplication that we did was to remove mutation variables. Previously, they were being used to give a sense of read, update, write. And then was extended to provide atomicity over upserts. But, after transactions, we separated out mutations from queries, which simplified the code and the interaction. In fact, simplifying interaction has been one of the main targets.

We have implemented lockless transactions. So, the only real cost here is the client to server hops, i.e. read, then mutate, then commit. Ideally, you could achieve all 3 of them in one query. GraphQL doesn’t provide a good way to do updates like these in the language spec. So, after v1.0, we plan to support Cypher, which is built around SQL and provides these benefits.

The way Dgraph implements distributed transactions is actually unique. We’re writing a paper about the design of the entire system, including our implementation of transactions.

2 Likes

Very interesting, I think going with a more expressive language that can be processed server-side would indeed solve the remaining issue.

I am looking forward to reading the transaction code/blog post and don’t forget to include or follow up when ‘verified’ by tools/people such as Jepsen/aphyr or comparable for correctness.

We mentioned the correctness test, inspired by Jepsen but written in Go in this blog post briefly. You can see the full directory of such tests here:

Do transactions also apply to indexes? For example, if I have the following index:

name: string @index(term) .

Would one of the following abort if I ran them at the same time (but as part of different transactions)?

<node1> <name> "John" .
<node2> <name> "Peter" .

@calummoore, transactions do apply to indexes. However, in your example both mutations should succeed, because they will be under different keys (the name:John key and the name:Peter key).

If instead you were to have the following different mutations, one would abort:

<node1> <name> "John Paul Young" .
<node2> <name> "John McCartney" .

(since both would modify the name:John key).

1 Like

did you publish a paper about dgraph transaction?and where can i find it?thx

We did not. It’s on my long term todo list to write up how transactions work in Dgraph. I’d like to find some time to write a blog post about it.

1 Like

Thanks for you fast reply! I am a master student, and recently i want to do some researching and developing on transaction and MCVV about dgraph and badger. Could you give me some suggestions about how to be familiar with its theory more quickly?thx

Ok, the login code snippet is doubly broken:

In ratel, latest Dgraph, this blog post query:

{
       login_attempt(func: eq(email, %q)) {
           checkpwd(pass, %q)
       }
}

returns:

"login_attempt": [
     {
        "checkpwd(pass)": true,
        "uid": "0x2719"
     }
]

Therefore in the code snippet of this blog post, the unmarshal will fail; pass will be null instead of true or false. (pass: null)
The server will, in turn, gives you a panic serving xyz, runtime error: index out of range (nice meaningful error, I know)

For newbie like me, who tries to learn both Golang+Dgraph at the same time, you have to change your query:

{
   login_attempt(func: eq(email, %q)) {
      password_match: checkpwd(pass, %q)
      }
 }

And the data structure for the unmarshal:

Account []struct {
            PasswordMatch bool `json:"password_match"`
    } `json:"login_attempt"`

...
else if login.Account[0].PasswordMatch {
	fmt.Fprintf(w, "Login successful!")
}

If someone at Dgraph can fix this blog post, it would be appreciated.

Thx.