Cost of each mutation grows as more mutations are in a transaction

Moved from GitHub dgraph/3046

Posted by mooncake4132:

I originally asked this on Slack, but it might be more useful to track it as an issue.

Every few days our application will need to insert up to 3 million (this number may grow) predicates into the database. To assess Dgraph’s performance, I wrote the small Python script below to benchmark the time it takes to insert 1000, 10000, 30000, 50000, and 100000 predicates (one N-Quad each) in a single transaction. Results are as follows:

Updated schema in 1.824007272720337 seconds.
Mutating 1000 N-Quads took 0.0899970531463623 seconds.
Mutating 10000 N-Quads took 1.6726512908935547 seconds.
Mutating 30000 N-Quads took 11.846931219100952 seconds.
Mutating 50000 N-Quads took 27.030992031097412 seconds.
Mutating 100000 N-Quads took 111.02126455307007 seconds.

The growth in time is a bit worrying. Why does inserting 100 thousand predicates take about 66x as long as inserting 10 thousand?
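
To put a number on that growth, here is a minimal sketch that uses only the timings listed above (rounded to three decimals). The helper names per_quad_ms and growth_exponent are made up for illustration. It prints the cost per N-Quad and the slope between two measurements on a log-log scale, where 1.0 would mean linear growth and 2.0 quadratic growth.

```python
import math

# Timings reported above with the fulltext index: N-Quads -> seconds (rounded).
timings = {
    1_000: 0.090,
    10_000: 1.673,
    30_000: 11.847,
    50_000: 27.031,
    100_000: 111.021,
}


def per_quad_ms(timings):
    """Average cost of a single N-Quad in milliseconds, per transaction size."""
    return {n: 1000 * seconds / n for n, seconds in timings.items()}


def growth_exponent(timings, small, large):
    """Log-log slope between two sizes: 1.0 is linear, 2.0 is quadratic."""
    return math.log(timings[large] / timings[small]) / math.log(large / small)


for n, ms in per_quad_ms(timings).items():
    print('{:>7} N-Quads: {:.3f} ms per N-Quad'.format(n, ms))

print('Exponent from 10k to 100k: {:.2f}'.format(
    growth_exponent(timings, 10_000, 100_000)))
```

Between 10 thousand and 100 thousand N-Quads the exponent comes out at roughly 1.8, so over this range the total time behaves closer to quadratic than linear in the transaction size.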

Here’s the script:

#!/usr/bin/env python3
import time

import pydgraph


client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)
client.alter(pydgraph.Operation(drop_all=True))

schema = """
test: string @index(fulltext) @lang .
"""
start_time = time.time()
client.alter(pydgraph.Operation(schema=schema))
print('Updated schema in {} seconds.'.format(time.time() - start_time))

for n in (1_000, 10_000, 30_000, 50_000, 100_000):
    # Build n N-Quads, each setting <test> on a distinct blank node.
    rdf = '\n'.join('<_:node_{}> <test> "test" .'.format(i) for i in range(n))
    transaction = client.txn()
    start_time = time.time()
    # Send all n N-Quads as one mutation and commit in the same request.
    transaction.mutate(set_nquads=rdf, commit_now=True)
    print('Mutating {} N-Quads took {} seconds.'.format(n, time.time() - start_time))

Initially, I thought it was because of the fulltext index, so I also tried without @index(fulltext). Here are the results:

Updated schema in 0.004003763198852539 seconds.
Mutating 1000 N-Quads took 0.07899928092956543 seconds.
Mutating 10000 N-Quads took 1.236546277999878 seconds.
Mutating 30000 N-Quads took 7.040283203125 seconds.
Mutating 50000 N-Quads took 16.69643545150757 seconds.
Mutating 100000 N-Quads took 59.379029989242554 seconds.

It’s slightly better, but the time growth is still worrying.

Any guidance is appreciated.

Configurations:

  • Running in docker on Windows.
  • One zero and one alpha.
    Dgraph version : v1.0.11
    Commit SHA-1 : b2a09c5b
    Commit timestamp : 2018-12-17 09:50:56 -0800
    Branch : HEAD
    Go version : go1.11.1

codexnull commented :

Thanks for the report and for providing the test script. We confirmed that the transaction time does grow more than linearly with the transaction size and will dig deeper for improvements.

In the meantime, we suggest clients use transaction sizes of 1000 or so and use concurrency instead to increase throughput.

mooncake4132 commented :

Thanks for confirming. We can definitely split the mutations into different transactions.

I’ll let you decide if you want to close this issue or leave it open for tracking.