Moved from GitHub dgraph/3046
Posted by mooncake4132:
I originally asked this on Slack, but it might be more useful to track it as an issue.
Every few days our application will need to insert up to 3 million (this number may grow) predicates into the database. To assess Dgraph's performance, I wrote the small Python script below to benchmark the time it takes to insert 1000, 10000, 30000, 50000, and 100000 predicates in a single mutation each. Results are as follows:
Updated schema in 1.824007272720337 seconds.
Mutating 1000 N-Quads took 0.0899970531463623 seconds.
Mutating 10000 N-Quads took 1.6726512908935547 seconds.
Mutating 30000 N-Quads took 11.846931219100952 seconds.
Mutating 50000 N-Quads took 27.030992031097412 seconds.
Mutating 100000 N-Quads took 111.02126455307007 seconds.
The growth in time is a bit worrying. Why does inserting 100 thousand predicates take about 66x as long as inserting 10 thousand?
Here’s the script:
#!/usr/bin/env python3
import time
import pydgraph
client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)
client.alter(pydgraph.Operation(drop_all=True))
schema = """
test: string @index(fulltext) @lang .
"""
start_time = time.time()
client.alter(pydgraph.Operation(schema=schema))
print('Updated schema in {} seconds.'.format(time.time() - start_time))
for n in (1_000, 10_000, 30_000, 50_000, 100_000):
    rdf = '\n'.join('<_:node_{}> <test> "test" .'.format(i) for i in range(n))
    transaction = client.txn()
    start_time = time.time()
    transaction.mutate(set_nquads=rdf, commit_now=True)
    print('Mutating {} N-Quads took {} seconds.'.format(n, time.time() - start_time))
Initially, I thought it was because of the fulltext index, so I also tried without @index(fulltext). Here are the results:
Updated schema in 0.004003763198852539 seconds.
Mutating 1000 N-Quads took 0.07899928092956543 seconds.
Mutating 10000 N-Quads took 1.236546277999878 seconds.
Mutating 30000 N-Quads took 7.040283203125 seconds.
Mutating 50000 N-Quads took 16.69643545150757 seconds.
Mutating 100000 N-Quads took 59.379029989242554 seconds.
It’s slightly better, but the time growth is still worrying.
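To put a rough number on that growth, the sketch below takes the timings from the two runs above and estimates, for each pair of consecutive sizes, the exponent k in a simple "time grows like n**k" model. The model and the helper name `local_exponents` are assumptions made for illustration, not something Dgraph reports. Linear scaling would give k close to 1.

```python
import math

# Timings (seconds) copied from the two benchmark runs in this post.
SIZES = [1_000, 10_000, 30_000, 50_000, 100_000]
WITH_INDEX = [0.0899970531463623, 1.6726512908935547, 11.846931219100952,
              27.030992031097412, 111.02126455307007]
WITHOUT_INDEX = [0.07899928092956543, 1.236546277999878, 7.040283203125,
                 16.69643545150757, 59.379029989242554]


def local_exponents(sizes, times):
    """Return k for each consecutive pair of measurements, assuming time ~ n**k."""
    pairs = zip(zip(sizes, times), zip(sizes[1:], times[1:]))
    return [math.log(t2 / t1) / math.log(n2 / n1)
            for (n1, t1), (n2, t2) in pairs]


for label, times in (("with fulltext index", WITH_INDEX),
                     ("without index", WITHOUT_INDEX)):
    ks = local_exponents(SIZES, times)
    print(label, [round(k, 2) for k in ks])
```

On these numbers every exponent comes out above 1, and the last step (50000 to 100000) is close to 2 in both runs, so the single-mutation time looks closer to quadratic than linear at this size. With only five data points per run this is an indication, not a measurement.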
Any guidance is appreciated.
Configurations:
- Running in docker on Windows.
- One zero and one alpha.
Dgraph version : v1.0.11
Commit SHA-1 : b2a09c5b
Commit timestamp : 2018-12-17 09:50:56 -0800
Branch : HEAD
Go version : go1.11.1