Client.query() causes dgraph alpha to write to disk constantly

Moved from GitHub pydgraph/27

Posted by liu1000:

When I use client.query(...) to do pure reads from dgraph, I notice the dgraph alpha process constantly writing to disk. Snippet:

for q in pure_read_queries_to_run:
    client.query(q)

I have to do something like the following as a workaround:

txn = client.txn()

for q in pure_read_queries_to_run:
    txn.query(q)

txn.discard()

Doing this improved performance roughly fourfold.

I think it may have something to do with the transaction handling in pydgraph?

danielmai commented:

Using client.query will create a new txn for every single call. Each new txn gets a new timestamp, which needs to be propagated to the entire cluster. Creating a single txn, as you did in the latter case, is the better way to perform many read-only queries.
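The single-txn pattern can be wrapped in a small helper. This is only a sketch: run_read_queries is a hypothetical name, not part of pydgraph, and it assumes nothing beyond the client.txn(), txn.query() and txn.discard() calls already shown in this thread. The try/finally is an addition so that the txn is discarded even if one of the queries raises.

```python
def run_read_queries(client, queries):
    """Run all queries in one shared txn instead of one txn per query.

    `client` is assumed to be a pydgraph client exposing txn();
    `queries` is any iterable of query strings.
    """
    txn = client.txn()  # one txn, so one timestamp for the whole batch
    try:
        # Every query reuses the same txn rather than creating its own.
        return [txn.query(q) for q in queries]
    finally:
        # Always release the txn, even if a query raised.
        txn.discard()
```

Usage would look like results = run_read_queries(client, pure_read_queries_to_run), with one response per query in the same order.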

We support read-only txns, which will help here (see the commit message of dgraph-io/dgraph@fe7b749, "Improve efficiency of readonly transactions by reusing the same read …"). This needs to be implemented in pydgraph. @paulftw can you do that?

This is from the pydgraph README:

res = client.txn().query(query, variables=variables)
# If not doing a mutation in the same transaction, simply use:
# res = client.query(query, variables=variables)

I can see how the README could be read as saying that client.query is better than client.txn().query, but the two do the same thing: each creates a new txn per call. This should be clarified.

manishrjain commented:

The change tracked in new issue #30 should avoid getting a new timestamp from Zero, which is what causes the disk writes.