Skewed Query Results with Ongoing Mutations

diggy · March 24, 2020, 8:26pm

Moved from GitHub dgraph/5021

What version of Dgraph are you using?

v1.2.1

Have you tried reproducing the issue with the latest release?

No

System Background

We are running Dgraph on Kubernetes using the supplied Helm charts, with 3 Dgraph Zeros and 6 Dgraph Alphas. Each pod is on its own node. For storage, we use storage directly attached to each node (could this be the issue?). The access mode is ReadWriteMany.

Steps to reproduce the issue (command/config used to run Dgraph).

We are running a system that executes at most 60 mutations per second. The mutations go through the HTTPS API (TLS enabled) on the route /mutate?commitNow=true, and each one is an upsert operation like so:

upsert {
    query {
        var (...) { a as uid }
        var (...) { b as uid }
        ...
    }

    mutation {
        set {
            uid(a) <something> "something" .
            uid(b) <something> "something" .
        }
    }
}
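
For illustration, a request like the one described above could be assembled roughly as in this Python sketch. It only builds the HTTP request and does not send it. The base URL https://dgraph.example.com:8080, the helper name build_upsert_request, and the application/rdf content type are assumptions on top of what is described in this post and should be checked against the documentation for the Dgraph version in use. The upsert text is passed through unchanged, including its elided parts.

```python
import urllib.request


def build_upsert_request(base_url: str, upsert_block: str) -> urllib.request.Request:
    """Build, but do not send, a POST to /mutate?commitNow=true."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mutate?commitNow=true",
        data=upsert_block.encode("utf-8"),
        # Assumption: upsert blocks in RDF form are posted as application/rdf.
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )


# Hypothetical endpoint; the body is a placeholder for the elided upsert block.
request = build_upsert_request(
    "https://dgraph.example.com:8080",
    "upsert { query { ... } mutation { set { ... } } }",
)
# Sending it would be: urllib.request.urlopen(request)
```

With commitNow=true each request is meant to commit on its own, so no separate commit call appears in the sketch.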

The query we use to aggregate and report on our data uses a @groupby and count aggregate like so:

{
  var(func: type(typeA)) @filter(...) {
    relationshipA @filter(...) {
      relationshipB @filter(...) {
        relationshipC @filter(...) {
          a as uid
        }
      }
    }
  }

  query(func: uid(a)) @groupby(b,c,d)  {
    count(uid)
  }
}

When we run the above query through the HTTPS API at the endpoint /query?ro=true&be=true several times within a short time frame (10 seconds), we see skewed results. For example, one element of the grouping comes back with a value of roughly 5000 on the first query but roughly 100 on the next. Examples below:

Query 1

[
            {
                "count": 5671
            },
            {
                "count": 23535
            },
            ...
]

Query 2

[
            {
                "count": 113
            },
            {
                "count": 3000
            },
            ...
]

The drop in the counts and the large variance between runs do not seem correct to us.
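
To put a number on the variance, the two responses can be compared group by group. The Python sketch below does this for the two groups shown above. It assumes that groups come back in the same order in both responses (the group keys are elided in this post), and the helper name relative_change is made up for illustration.

```python
from typing import Dict, List


def relative_change(first: List[Dict[str, int]],
                    second: List[Dict[str, int]]) -> List[float]:
    """Per-group relative change in "count" from the first run to the second.

    Assumes both lists hold the same groups in the same order and that
    every count in the first run is positive.
    """
    changes = []
    for before, after in zip(first, second):
        changes.append((after["count"] - before["count"]) / before["count"])
    return changes


# Only the two groups visible in this post; the elided ones are left out.
query_1 = [{"count": 5671}, {"count": 23535}]
query_2 = [{"count": 113}, {"count": 3000}]

for change in relative_change(query_1, query_2):
    print(f"{change:+.1%}")
```

For the two groups shown, both counts drop by more than 85% between the first and second query.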

However, if we stop all mutations to Dgraph and repeat the same query experiment, we see reasonable and consistent results. What could be causing this behavior? Is this a user error?

Expected behaviour and actual result.

We expect to see consistent results when querying our data and not see fluctuations of +/- 5,000.

diggy · April 20, 2020, 7:58am

harshil-goel commented:

Hi, we are looking into the issue. If possible, could you provide us with your schema and mutations so that we can replicate it?
