Hello everyone!
I would like to know the best way to achieve the highest possible throughput when updating Dgraph nodes from Python. In my current setup I have to update nodes 50-100 times per second, and whenever a new value arrives it should be written individually. I would like to avoid batching updates, because batching could result in data loss.
To do this, I've tried two different strategies: one uses the GraphqlClient from python_graphql_client, and the other uses the official pydgraph client. Basically, what I am doing is updating the same node 100 times and measuring the elapsed time.
GraphqlClient:
from python_graphql_client import GraphqlClient
import time

client = GraphqlClient(endpoint="http://localhost:8080/graphql")

msg_count = 1
sensorOne = "temperatureSensor"
t0 = time.time()
while True:
    if msg_count <= 100:
        # Build the mutation for the current value and send it.
        updateSensor = ('mutation MyMutation { updateSensor(input: '
                        '{filter: {name: {eq: "' + sensorOne + '"}}, '
                        'set: {value: ' + str(msg_count) + '}}) '
                        '{ sensor { name value } } }')
        data = client.execute(query=updateSensor)
    else:
        t1 = time.time()
        break
    msg_count += 1
print("Total time: " + str(t1 - t0))
On average, it takes around 13-15 seconds to update 100 values, which works out to roughly 6-8 updates/second.
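As a side note, the mutation text could also be built once and parameterized with GraphQL variables; python_graphql_client's execute() accepts a variables argument. I would not expect this to change the timings much, since the cost appears to be the per-request round trip, but it avoids rebuilding and escaping the query string on every update. A minimal sketch of the same benchmark with variables:

from python_graphql_client import GraphqlClient
import time

client = GraphqlClient(endpoint="http://localhost:8080/graphql")

# The mutation text is constant; only the variables change per update.
UPDATE_SENSOR = """
mutation UpdateSensor($name: String!, $value: Float) {
  updateSensor(input: {filter: {name: {eq: $name}}, set: {value: $value}}) {
    sensor { name value }
  }
}
"""

t0 = time.time()
for msg_count in range(1, 101):
    client.execute(
        query=UPDATE_SENSOR,
        variables={"name": "temperatureSensor", "value": msg_count},
    )
print("Total time: " + str(time.time() - t0))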
pydgraph client:
import pydgraph, time

client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

nodeType = "temperatureSensor"
msg_count = 1
t0 = time.time()
while True:
    if msg_count <= 100:
        txn = client.txn()
        try:
            # Upsert: look up the sensor's uid and set its value in one request.
            queryV2 = '{var(func: eq(Sensor.name, "' + nodeType + '")) {Type as uid}}'
            nquad = f'uid(Type) <Sensor.value> "{msg_count}" .'
            mutation = txn.create_mutation(set_nquads=nquad)
            request = txn.create_request(query=queryV2, mutations=[mutation],
                                         commit_now=True)
            txn.do_request(request)
        except pydgraph.AbortedError:
            pass
        finally:
            txn.discard()
    else:
        t1 = time.time()
        break
    msg_count += 1
print("Total time: " + str(t1 - t0))
On average, this strategy is about twice as fast as the GraphqlClient one: it takes around 6-7 seconds to update 100 values, i.e. roughly 15 updates/second.
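One optimization that should help here, sketched under the assumption that the Sensor node already exists: resolve its uid once with a read-only query, then send each update as a plain mutation against that uid, so the per-update upsert query is skipped entirely:

import json
import pydgraph, time

client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

# Resolve the uid once up front (assumes the node already exists).
resp = client.txn(read_only=True).query(
    '{ q(func: eq(Sensor.name, "temperatureSensor")) { uid } }')
uid = json.loads(resp.json)["q"][0]["uid"]

t0 = time.time()
for msg_count in range(1, 101):
    txn = client.txn()
    try:
        # Direct mutation by uid; no upsert query needed per update.
        txn.mutate(set_nquads=f'<{uid}> <Sensor.value> "{msg_count}" .',
                   commit_now=True)
    finally:
        txn.discard()
print("Total time: " + str(time.time() - t0))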
This is the schema for Sensor:
type Sensor @withSubscription {
  id: ID!
  name: String! @id @search(by: [exact, regexp])
  value: Float @search
  timestamp: DateTime
  unit: String
}
So I would like to know: is there a way to achieve around 50-100 updates/second? It is important to mention that I'm using the standalone Dgraph image deployed in a Docker container. Would I get better results if I increased the container's resources, or if I switched to a cluster setup?
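One direction I have been wondering about is issuing updates from several worker threads, since each loop iteration above waits for a full round trip before starting the next one. A rough sketch, assuming the pydgraph client can be shared across threads (the underlying gRPC channel is thread-safe), and keeping in mind that concurrent writes to the same node can abort with transaction conflicts, so this mainly helps when updates target different sensors (the sensor names below are hypothetical):

import pydgraph
from concurrent.futures import ThreadPoolExecutor

client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

def update_sensor(name, value):
    # Each call runs its own upsert transaction and retries on conflict.
    while True:
        txn = client.txn()
        try:
            query = f'{{var(func: eq(Sensor.name, "{name}")) {{ S as uid }}}}'
            mutation = txn.create_mutation(
                set_nquads=f'uid(S) <Sensor.value> "{value}" .')
            request = txn.create_request(query=query, mutations=[mutation],
                                         commit_now=True)
            txn.do_request(request)
            return
        except pydgraph.AbortedError:
            continue  # a concurrent txn touched the same node; retry
        finally:
            txn.discard()

# Hypothetical workload: 100 updates spread across 8 sensors.
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(1, 101):
        pool.submit(update_sensor, f"sensor-{i % 8}", i)

Would something along these lines be the recommended approach, or is scaling the deployment the better lever?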
Thanks in advance!