Reading uncommitted writes in a single transaction

Hi there. We’re building an application in Dgraph and want to make sure we understand the transactional model fully.

Within the context of a single transaction, is a client able to read uncommitted writes that were part of a previously executed mutation?

Example:

Note: all of the below is within the context of a single transaction.

  1. mutation (create entity)
  2. query (get entity)
  3. commit

Would 2 return the entities created (but not yet committed) in 1?

Nope. No client can do this, but the context of that transaction persists in the cluster itself. I mean, if you use the same blank node twice in the same transaction, it should resolve to the same UID both times.

This is not possible; you can’t query an uncommitted object, as far as I know.

Ok, that’s good to know. Coming from SQL, I had the assumption that transactions would work similarly, but better to learn now!

That’s probably a thing for the future. A DB is a decade of work in general. Dgraph is about 6 years old. Stick with us for 4 more and you’re going to see a monster of a DB.


@MichelDiz

This may be obvious, but how can a single transaction write data and then read that data back to the client? If the transaction is immediately committed, it can’t be rolled back later if something else errors. Am I missing something obvious here?

Using Dgo v210.

I thought I had read somewhere that you could read-write-read-commit in a single transaction (I can’t find it now though, so obviously I may be wrong here). Thanks for the clarification. I haven’t needed to handle transactions by hand yet, but it’s good to know what is under the hood.

Well, Dgraph can do mutations and queries (like an upsert) in the transaction context. But I don’t think you can query your own writes “on the fly”, you know? You can only query data that already exists in the cluster (that’s what an Upsert Block does: it uses the transaction context in one batch). You can’t open a transaction, mutate, and then query for that data without committing. This could be made possible, but I don’t see why. You can control this well before deciding to send anything to the DB, unless you need to hold the transaction open for days or weeks, which also doesn’t make sense in DB transaction logic. Right?

I saw this question this year. I found it: “Query a node after insert without commit using the same transaction”. See my comment (the one marked as the solution).

I think things like rollback, handling the transaction on the fly, and so on could come in the future. But this sounds more like a nice-to-have than an actual necessity. Please correct me if I’m wrong.

Honestly, I think this is pretty fundamental behavior that I hope Dgraph solves for in the near future.

There are cases where you’d want to roll back the transaction after inserting data. For example, you would often want to roll back transactions in tests so that each test is isolated and doesn’t bleed into other test contexts. Write → read and assert schema → rollback.

But here is a common API scenario:

  1. Mutate: create user with organization, 2 nodes.
  2. Commit
  3. Query: user and its associated org
  4. Random application layer bug. The mutation has already committed, but the application layer will send an error code to the client.

Am I misunderstanding something here?

In the linked thread the answer provided does state:

The way Dgraph works needs an extra transaction. Cuz the data needs to be distributed(some cases replicated) across the cluster. Without that confirmation, you can’t query the data.

This is actually rather interesting and I’d love to understand this better myself, @MichelDiz. Your answer seems to imply that Dgraph only replicates on commit. So the transaction is performed against the replica you are accessing, and then once the commit is initiated, Dgraph ensures all replicas catch up?

Does this leave room for inconsistent transactions or breaking consensus? Or is it because all writes are forwarded to a leader, applied, and then replicated?

I agree with @samfinan: performing this in two transactions has implications. Take a case where we’re using transactions as a testing paradigm. We can’t safely roll back to reset the test state, which is a common testing pattern. My work-around is to label the nodes/parts of the schema and purge them after the test runs. In other cases, not being able to read back your own writes affects design and, on the odd occasion, performance.

Cockroach and even Mongo are capable of this. This isn’t to poo-poo Dgraph; I’m just curious, because Cockroach in particular uses a Raft model. So as users we kind of expect/hope this isn’t a stretch for Dgraph. If it is, understanding the limitation is critical for our future planning.

You can read the paper to understand it better: Paper – Dgraph | GraphQL Cloud Platform

Not exactly. I’m not 100% familiar with its guts. But the data will be available only after the commit. You can’t run queries against uncommitted mutations, not even in the “cluster RAM”. Also, the process of writing is a “distribution”; replication is available only if you set a replication configuration. And this happens right away; the commit is just an ending/confirmation.

No, in the paper:

Dgraph is a distributed database with a native graph backend. It is the only
native graph database to be horizontally scalable and support full
ACID-compliant cluster-wide distributed transactions. In fact, Dgraph is the
first graph database to have been Jepsen tested for transactional
consistency.

Out of curiosity, can you share a doc about this? I want to understand how other DBs do it and why.

If I understand @samfinan correctly, this actually does work – at least, I’ve done this, mostly successfully[1] – mutate and then query in the same transaction. The documentation here suggests this is supported: https://dgraph.io/docs/clients/overview/

A transaction always sees the database state at the moment it began, plus any changes it makes — changes from concurrent transactions aren’t visible.

[1] I think there’s a bug that causes the wrong data to be returned sometimes – the query will sometimes return an array containing the old and new value, rather than just the new value, but there’s a workaround. I’m not sure though, so I haven’t filed it as an actual bug report.
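
To make the rule quoted from the client docs concrete (a transaction sees the state at the moment it began, plus its own changes), here is a minimal toy sketch in Go. It is an in-memory model only and says nothing about how Dgraph implements this internally; `Store`, `Txn`, `Begin`, `Set`, `Get`, `Commit`, and `Discard` are hypothetical names invented for the illustration, not the Dgo API, and conflict detection is deliberately left out.

```go
package main

import "fmt"

// Store is a toy in-memory key-value store (hypothetical, not Dgraph).
type Store struct {
	committed map[string]string
}

// Txn is a hypothetical transaction handle over Store.
type Txn struct {
	store    *Store
	snapshot map[string]string // committed state copied at Begin
	writes   map[string]string // this transaction's uncommitted writes
	done     bool
}

func NewStore() *Store {
	return &Store{committed: map[string]string{}}
}

// Begin copies the committed state, so commits made later by other
// transactions stay invisible to this one.
func (s *Store) Begin() *Txn {
	snap := make(map[string]string, len(s.committed))
	for k, v := range s.committed {
		snap[k] = v
	}
	return &Txn{store: s, snapshot: snap, writes: map[string]string{}}
}

// Set buffers a write; nothing reaches the store until Commit.
func (t *Txn) Set(key, value string) {
	t.writes[key] = value
}

// Get checks this transaction's own writes first, then its snapshot.
func (t *Txn) Get(key string) (string, bool) {
	if v, ok := t.writes[key]; ok {
		return v, true
	}
	v, ok := t.snapshot[key]
	return v, ok
}

// Commit publishes the buffered writes to the store.
func (t *Txn) Commit() {
	if t.done {
		return
	}
	for k, v := range t.writes {
		t.store.committed[k] = v
	}
	t.done = true
}

// Discard drops the buffered writes, which acts as a rollback.
func (t *Txn) Discard() {
	if t.done {
		return
	}
	t.writes = map[string]string{}
	t.done = true
}

func main() {
	s := NewStore()

	a := s.Begin()
	a.Set("user:1", "alice")
	v, ok := a.Get("user:1")
	fmt.Println(v, ok) // alice true: own uncommitted write is visible

	b := s.Begin()
	_, ok = b.Get("user:1")
	fmt.Println(ok) // false: not visible to a concurrent transaction

	a.Discard()
	_, ok = s.Begin().Get("user:1")
	fmt.Println(ok) // false: rolled back, so it never became visible
}
```

In this model, the write → read and assert → rollback test pattern mentioned earlier in the thread falls out naturally: the read inside `a` sees the write, and `Discard` leaves the store untouched for the next test.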
