Moved from GitHub dgraph/5021
Posted by grantsavage:
What version of Dgraph are you using?
v1.2.1
Have you tried reproducing the issue with the latest release?
No
System Background
We are running Dgraph on Kubernetes using the supplied Helm charts. We have 3 Dgraph Zeros and 6 Dgraph Alphas running. Each pod runs on its own node. For storage, we are using direct attached storage on each node (could this be the issue?). Access mode is ReadWriteMany.
Steps to reproduce the issue (command/config used to run Dgraph).
We are running a system that executes at most 60 mutations per second. Our mutations are sent over the HTTPS API (TLS enabled) to the route /mutate?commitNow=true, and each one is an upsert operation like so:
upsert {
  query {
    var (...) { a as uid }
    var (...) { b as uid }
    ...
  }
  mutation {
    set {
      uid(a) <something> "something" .
      uid(b) <something> "something" .
    }
  }
}
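For anyone trying to reproduce this, a request of that shape could be built roughly as in the Python sketch below. The host `https://localhost:8080`, the helper name `build_mutate_request`, and the `application/rdf` content type are assumptions for illustration only, not details taken from our deployment. The upsert body is passed through unchanged.

```python
import urllib.request


def build_mutate_request(base_url: str, upsert_body: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /mutate?commitNow=true.

    base_url and the content type are illustrative assumptions.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mutate?commitNow=true",
        data=upsert_body.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},  # assumed content type
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder body; the real upsert block goes here unchanged.
    req = build_mutate_request("https://localhost:8080", "upsert { ... }")
    print(req.get_method(), req.full_url)
```

The request is only constructed here. Sending it would be a separate `urllib.request.urlopen(req)` call against a reachable Alpha.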
The query we use to aggregate and report on our data uses @groupby and a count aggregate, like so:
{
  var(func: type(typeA)) @filter(...) {
    relationshipA @filter(...) {
      relationshipB @filter(...) {
        relationshipC @filter(...) {
          a as uid
        }
      }
    }
  }
  query(func: uid(a)) @groupby(b,c,d) {
    count(uid)
  }
}
When we query our data over the HTTPS API, using the endpoint /query?ro=true&be=true with the above query, we see skewed results when running the same query multiple times within a short time frame (10 seconds). For example, one element of the grouping comes back with a value of 5000 on the first query, but on the next query the value comes back as 100. Examples below:
Query 1
[
  {
    "count": 5671
  },
  {
    "count": 23535
  },
  ...
]
Query 2
[
  {
    "count": 113
  },
  {
    "count": 3000
  },
  ...
]
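To put a number on the fluctuation, the per-group difference between runs can be computed as in the sketch below. It uses only the two counts shown for each query above, since the rest of each result is elided. The helper names `build_query_url` and `count_spread` and the host are hypothetical, and the sketch assumes the groups come back in the same order on every run.

```python
from typing import List
from urllib.parse import urlencode


def build_query_url(base_url: str, read_only: bool = True, best_effort: bool = True) -> str:
    """Build the /query URL, optionally with the ro and be flags used above."""
    params = {}
    if read_only:
        params["ro"] = "true"
    if best_effort:
        params["be"] = "true"
    url = base_url.rstrip("/") + "/query"
    return url + "?" + urlencode(params) if params else url


def count_spread(runs: List[List[int]]) -> List[int]:
    """For each group position, return max - min of its count across runs.

    Assumes every run lists the groups in the same order.
    """
    if not runs:
        return []
    width = len(runs[0])
    if any(len(run) != width for run in runs):
        raise ValueError("all runs must contain the same number of groups")
    return [max(column) - min(column) for column in zip(*runs)]


if __name__ == "__main__":
    query_1 = [5671, 23535]  # counts shown for Query 1
    query_2 = [113, 3000]    # counts shown for Query 2
    print(build_query_url("https://localhost:8080"))
    print(count_spread([query_1, query_2]))  # [5558, 20535]
```

Building the URL with `read_only` or `best_effort` switched off gives a simple way to repeat the same experiment with and without each flag.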
The drop in the counts and the large variance between runs do not seem correct to us.
However, if we stop all mutations to Dgraph and run the same query experiment, we see reasonable and consistent results. What could be causing this behavior? Is it related to user error?
Expected behaviour and actual result.
We expect to see consistent results when querying our data and not see fluctuations of +/- 5,000.