How to obtain the start and commit timestamps of Dgraph transactions using pydgraph?

hengxin · October 5, 2023, 2:54pm

Timestamps in Dgraph

Motivation

We are working on a lightweight white-box algorithm for checking snapshot isolation (SI): given an execution of a database, it checks whether the execution satisfies SI.

The SI checking problem is NP-hard for general executions. So it is desirable to make use of the knowledge of how SI is actually implemented in databases.

The insight is that most databases, especially distributed databases, implement SI following the generic protocol using start-timestamps and commit-timestamps. With these timestamps of transactions in an execution, the SI checking problem becomes solvable in polynomial time. Therefore, we want to obtain these timestamps when generating executions.

It is crucial for us to really understand the meaning and roles of the start-timestamps and commit-timestamps in the database under testing. We must be very sure that we have obtained the right timestamps in the right way.

That is why we ask for help here.

Background

We are digging into the implementation of snapshot isolation of Dgraph, especially into the use of timestamps in transactions.

Consider the classic description of start-timestamp and commit-timestamp in implementing Snapshot Isolation:

For start-timestamp: A transaction executing with Snapshot Isolation always reads data from a snapshot of the (committed) data as of the time the transaction started, called its Start-Timestamp. This time may be any time before the transaction’s first Read.

For commit-timestamp: When the transaction T1 is ready to commit, it gets a Commit-Timestamp, which is larger than any existing Start-Timestamp or Commit-Timestamp. When T1 commits, its changes become visible to all transactions whose Start-Timestamps are larger than T1’s Commit-Timestamp.

For conflict detection: The transaction T1 successfully commits only if no other transaction T2 with a Commit-Timestamp in T1’s execution interval [Start-Timestamp, Commit-Timestamp] wrote data that T1 also wrote. Otherwise, T1 will abort. This feature, called First-committer-wins, prevents lost updates.
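The two rules above (snapshot visibility and First-committer-wins) can be illustrated with a minimal Python sketch. It assumes a history of committed transactions whose start and commit timestamps are already known. The `Txn` class and the helpers `snapshot_read` and `write_conflicts` are hypothetical names introduced only for illustration. They are not part of Dgraph or pydgraph, and this is not our actual checking algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Txn:
    """A committed transaction in a hypothetical recorded history."""
    name: str
    start_ts: int
    commit_ts: int
    writes: Dict[str, int] = field(default_factory=dict)  # key -> value written


def snapshot_read(history: List[Txn], reader: Txn, key: str) -> Optional[int]:
    """Value of `key` in the reader's snapshot.

    A write is visible only if its transaction's commit_ts is smaller than
    the reader's start_ts; among visible writes, the latest commit wins.
    """
    latest: Optional[Txn] = None
    for t in history:
        if t is reader or key not in t.writes:
            continue
        if t.commit_ts < reader.start_ts and (
            latest is None or t.commit_ts > latest.commit_ts
        ):
            latest = t
    return None if latest is None else latest.writes[key]


def write_conflicts(history: List[Txn], t1: Txn) -> List[str]:
    """Names of transactions T2 that committed inside T1's interval
    [start_ts, commit_ts] and wrote at least one key that T1 also wrote."""
    return [
        t2.name
        for t2 in history
        if t2 is not t1
        and t1.start_ts <= t2.commit_ts <= t1.commit_ts
        and t1.writes.keys() & t2.writes.keys()
    ]


# A made-up history: T2 commits inside T1's interval and both write "x".
t1 = Txn("T1", start_ts=1, commit_ts=4, writes={"x": 1})
t2 = Txn("T2", start_ts=2, commit_ts=3, writes={"x": 2})
t3 = Txn("T3", start_ts=5, commit_ts=6, writes={"y": 9})
history = [t1, t2, t3]

print(write_conflicts(history, t1))    # ['T2']: T1 should have aborted
print(snapshot_read(history, t3, "x"))  # 1: T1 has the latest commit before T3 starts
```

In this made-up history, a non-empty result from `write_conflicts` for a committed transaction indicates that First-committer-wins was not respected. Both helpers scan the history linearly, which is only possible because the timestamps are given.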

Our Problem

How can we obtain such start-timestamp and commit-timestamp of a transaction in Dgraph from, e.g., operation messages or database logs using pydgraph?

Our Solution

Environment

Dgraph version
We use this docker image:

docker pull dgraph/dgraph:latest # version:v23.1.0

Driver: pydgraph-23.0.1
Python: v3.8.18

Example

We use simple.py@pydgraph with two minor modifications:

- We changed localhost in line 10 to 175.27.241.31.
  Note: 175.27.241.31 is publicly available. You can use it directly without pulling the docker image.
- We added the following code immediately after line 78 to print the response:

print(response)

The printed result is as follows.
Note that it contains only the start_ts of the transaction. We do not find a commit_ts.

txn {
  start_ts: 260380056
  keys: ...
  ...
  preds: ...
  ...
}
latency {
  ...
}
metrics {
  ...
}
uids {
  ...
}

Our Question

How can we obtain the commit_ts of a transaction using pydgraph?
- Are there any official references (e.g., official documentation or source code) for this?

Does the following code in simple.py mean that the client is issuing multiple transactions one by one?

def main():
    client_stub = create_client_stub()
    client = create_client(client_stub)
    drop_all(client)
    set_schema(client)
    create_data(client)
    query_alice(client)  # query for Alice
    query_bob(client)  # query for Bob
    delete_data(client)  # delete Bob
    query_alice(client)  # query for Alice
    query_bob(client)  # query for Bob

    # Close the client stub.
    client_stub.close()

Thanks.

seedoilz · October 6, 2023, 9:13am

I am on the same team as the questioner, and I ran into a problem with pydgraph.
I raised an issue on GitHub with the same content as below.
I want to get the commit_ts using pydgraph. However, the function ‘commit()’ does not return the commit_ts. By modifying line 238 in txn.py, I return the result of the function ‘self._dc.commit_or_abort’:

return self._dc.commit_or_abort(self._ctx, timeout=timeout,
                                metadata=new_metadata,
                                credentials=credentials)

With this change, I can get the commit_ts from the function ‘commit()’.

The result is as follows:
start_ts: 260380200
commit_ts: 260380201
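
For collecting these values when generating executions, here is a minimal sketch that pulls start_ts and commit_ts out of printed output of the form shown above. It assumes the text looks exactly like the printed messages in this thread (one `name: value` pair per line). The helper name `extract_timestamps` is hypothetical and not part of pydgraph.

```python
import re
from typing import Dict

# Matches lines such as "start_ts: 260380200" or "  commit_ts: 260380201".
_TS_LINE = re.compile(r"^\s*(start_ts|commit_ts):\s*(\d+)\s*$", re.MULTILINE)


def extract_timestamps(printed: str) -> Dict[str, int]:
    """Return the start_ts / commit_ts values found in the printed text.

    A timestamp that does not appear in the text is simply absent from the
    result; no default value is invented for it.
    """
    return {name: int(value) for name, value in _TS_LINE.findall(printed)}


# The values below are copied from the output shown in this thread.
sample = "start_ts: 260380200\ncommit_ts: 260380201\n"
timestamps = extract_timestamps(sample)
print(timestamps)
```

Applied to the response printed in the original question, which has no commit_ts line, the result would contain only the start_ts entry.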
