Timestamps in Dgraph
Motivation
We are working on a lightweight white-box algorithm for checking snapshot isolation (SI): given an execution of a database, check whether it satisfies SI.
The SI checking problem is NP-hard for general executions. So it is desirable to make use of the knowledge of how SI is actually implemented in databases.
The insight is that most databases, especially distributed databases, implement SI following the generic protocol using start-timestamps and commit-timestamps. With these timestamps of transactions in an execution, the SI checking problem becomes solvable in polynomial time. Therefore, we want to obtain these timestamps when generating executions.
It is crucial for us to understand the exact meaning and role of the start-timestamps and commit-timestamps in the database under test. We must be sure that we have obtained the right timestamps in the right way.
That is why we ask for help here.
Background
We are digging into how Dgraph implements snapshot isolation, especially how it uses timestamps in transactions.
Consider the classic description of start-timestamp and commit-timestamp in implementing Snapshot Isolation:
For start-timestamp: A transaction executing with Snapshot Isolation always reads data from a snapshot of the (committed) data as of the time the transaction started, called its Start-Timestamp. This time may be any time before the transaction’s first Read.
For commit-timestamp: When the transaction T1 is ready to commit, it gets a Commit-Timestamp, which is larger than any existing Start-Timestamp or Commit-Timestamp. When T1 commits, its changes become visible to all transactions whose Start-Timestamps are larger than T1’s Commit-Timestamp.
For conflict detection: The transaction T1 successfully commits only if no other transaction T2 with a Commit-Timestamp in T1’s execution interval [Start-Timestamp, Commit-Timestamp] wrote data that T1 also wrote. Otherwise, T1 will abort. This feature, called First-committer-wins prevents lost updates.
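To make the rules above concrete, here is a minimal sketch of how they could be checked once both timestamps are known for every committed transaction. The Txn record, its field names, and the check_si function are hypothetical and only illustrate the idea; this is not our actual checker, and it ignores details such as a transaction reading its own writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Txn:
    """One committed transaction (hypothetical record format)."""
    name: str
    start_ts: int
    commit_ts: int
    writes: Dict[str, int] = field(default_factory=dict)  # key -> value written
    reads: Dict[str, Optional[int]] = field(default_factory=dict)  # key -> value observed


def snapshot_value(key: str, reader: Txn, history: List[Txn]) -> Optional[int]:
    """Value of `key` in the reader's snapshot: the write with the largest
    commit_ts that is still smaller than reader.start_ts (None if there is none)."""
    visible = [t for t in history
               if t is not reader and key in t.writes and t.commit_ts < reader.start_ts]
    if not visible:
        return None
    return max(visible, key=lambda t: t.commit_ts).writes[key]


def check_si(history: List[Txn]) -> List[str]:
    """Return human-readable violations of the two rules (empty list if none)."""
    violations = []
    for t in history:
        # Snapshot-read rule: every read must match the snapshot at start_ts.
        for key, seen in t.reads.items():
            expected = snapshot_value(key, t, history)
            if seen != expected:
                violations.append(f"{t.name} read {key}={seen}, snapshot has {expected}")
        # First-committer-wins: no other writer of the same key may have a
        # commit_ts inside [t.start_ts, t.commit_ts].
        for other in history:
            if other is t:
                continue
            overlap = sorted(t.writes.keys() & other.writes.keys())
            if overlap and t.start_ts <= other.commit_ts <= t.commit_ts:
                violations.append(f"{t.name} conflicts with {other.name} on {overlap}")
    return violations


# Toy histories: T2 overlaps T1 and writes the same key, so it should have aborted.
t1 = Txn("T1", start_ts=1, commit_ts=3, writes={"x": 1})
t2 = Txn("T2", start_ts=2, commit_ts=4, writes={"x": 2})
t3 = Txn("T3", start_ts=5, commit_ts=6, reads={"x": 1})
print(check_si([t1, t3]))
print(check_si([t1, t2, t3]))
```

Both loops are polynomial in the number of transactions, which is why having the real timestamps matters so much to us.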
Our Problem
How can we obtain the start-timestamp and commit-timestamp of a transaction in Dgraph, e.g., from operation messages or database logs, using pydgraph?
Our Solution
Environment
- Dgraph version: we use this Docker image: docker pull dgraph/dgraph:latest # version: v23.1.0
- Driver: pydgraph-23.0.1
- Python v3.8.18
Example
We use simple.py@pydgraph with two minor modifications:
- We changed the localhost in line 10 to 175.27.241.31. Note: 175.27.241.31 is publicly available. You can use it directly without pulling the docker image.
- We added the following code immediately after line 78 to print the response:
  print(response)
The printed result is as follows. Note that there is only the start_ts for the transaction. We do not find commit_ts.
txn {
  start_ts: 260380056
  keys: ...
  ...
  preds: ...
  ...
}
latency {
  ...
}
metrics {
  ...
}
uids {
  ...
}
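For reference, this is roughly how we would like to collect the timestamps from the txn part of a response. It is only a sketch under assumptions: the extract_timestamps helper is hypothetical, a SimpleNamespace stands in for the real response.txn object, and we assume that an unset commit_ts shows up as the protobuf default 0 (which would also explain why it is not printed). Whether and where Dgraph actually fills in commit_ts is exactly what we are unsure about.

```python
from types import SimpleNamespace


def extract_timestamps(txn_ctx):
    """Return (start_ts, commit_ts) from a txn-context-like object.

    A missing or zero field is reported as None, assuming that 0 means
    "not set" (the protobuf default for integer fields).
    """
    start_ts = getattr(txn_ctx, "start_ts", 0) or None
    commit_ts = getattr(txn_ctx, "commit_ts", 0) or None
    return start_ts, commit_ts


# Stand-in for response.txn, using the start_ts printed above and no commit_ts.
fake_ctx = SimpleNamespace(start_ts=260380056, commit_ts=0)
print(extract_timestamps(fake_ctx))
```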
Our Questions
- How can we obtain the commit_ts for transactions using pydgraph?
  - Are there any official references (e.g., official documentation or source code) for this?
- Does the following code in simple.py mean that the client is issuing multiple transactions one by one?
def main():
    client_stub = create_client_stub()
    client = create_client(client_stub)
    drop_all(client)
    set_schema(client)
    create_data(client)
    query_alice(client)  # query for Alice
    query_bob(client)  # query for Bob
    delete_data(client)  # delete Bob
    query_alice(client)  # query for Alice
    query_bob(client)  # query for Bob
    # Close the client stub.
    client_stub.close()
Thanks.