Sorting and counting issue

Hello,

I am running the following two queries to count the number of nodes. I expected the results to be the same, but I am getting different results. Is that the correct behavior?

{
  infoLevel(func: has(typedonation), orderdesc: amount) {
    count(uid)
  }
}

Result:
{
  "data": {
    "infoLevel": [
      {
        "count": 1000
      }
    ]
  }
}

When I just remove the sorting:

{
  infoLevel(func: has(typedonation)) {
    count(uid)
  }
}

Result:
{
  "data": {
    "infoLevel": [
      {
        "count": 191662
      }
    ]
  }
}

Can somebody help explain this behavior?

Thanks,
Marcelo


Let me ask you: why do you need to sort by amount? There is no need to sort if you only want to count the total number of donations. In that query you are counting all the nodes that have typedonation, and you are not querying any other predicates.

Also, maybe some of your nodes don’t have amount values. In that case the first query would return only the nodes that have an amount value, while the second would return everything, with or without amount.

But of course it’s strange. It should return everything in both queries.

I am not trying to do a count. I was just trying to debug why I am getting different results on some aggregations when I use sorting.

I checked the query without sorting and all the nodes returned have amount.

I am starting to think there is a bug.

Thanks,
Marcelo

Sorry but I see a count here :stuck_out_tongue:

Well, when you run the following two queries, do they return different results?

{
  infoLevel(func: has(typedonation)) {
    uid
    name
    some
    someother
  }
}

{
  infoLevel(func: has(typedonation), orderdesc: amount) {
    uid
    name
    some
    someother
  }
}

If so, it could be a bug, but it’s hard to say without looking at your schema and mutations. I would need to see the full context, or at least enough detail to reproduce it here and confirm a bug.

As I said my intention is not to have count on my final query. I am now using count to debug why the results of an aggregation query are returning different values when using sorting.

When I run the query as you indicated, without count, the lists returned are definitely different. The one with sorting is much smaller.
Even the query latency is much larger when sorting is not used as the result set is much larger.
That is the reason I did the count. Counting manually will be tedious and time consuming.

Thanks,
Marcelo

I think it shouldn’t matter whether there is a sort or not; the result should not differ for this query.

If it did happen, I believe it is not a bug. It can happen while your Dgraph is still inserting data and is not yet stable. Please check your Dgraph servers’ logs to see exactly what happened.

@shanghai-Jerry That is not the case. The data was loaded several days ago using the bulk method, and the query returns the same results consistently.

I really think there is a bug.

Thanks,
Marcelo

Wow, it might be. I have no more ideas on this.

orderdesc:amount

Hi, I have a hypothesis.
Does all of your data have the amount predicate?

{
typedonation
amount
}
and 
{
typedonation
}

Both match has(typedonation).
If some of the data doesn’t have the amount predicate, could that be the reason for the difference?

I didn’t run any tests. It’s just a hypothesis. :joy:

Good, that makes sense. It just needs more testing to show that data without the amount predicate affects the result.

Try:

{
  infoLevel(func: has(typedonation), orderdesc: amount) @filter(has(amount)) {
    count(uid)
  }
}

@margallardo

I have the same question.

{
  A(func: has(elementId), orderdesc: <panorama#Taxi/行驶距离>) @filter(has(<panorama#Taxi/行驶距离>)) {
    count(uid)
  }
  B(func: has(elementId)) @filter(has(<panorama#Taxi/行驶距离>)) {
    count(uid)
  }
}

Result:


{
  "A": [
    {
      "count": 1000
    }
  ],
  "B": [
    {
      "count": 22157
    }
  ]
}

dgraph version

Dgraph version : v1.0.5
Commit SHA-1 : 82787414
Commit timestamp : 2018-04-20 15:50:53 +1000
Branch : HEAD

I believe this problem has already been clarified. To work around the limit, simply pass a pagination value greater than 1000.

e.g:

A(func: has(price), orderdesc: <price>, first: 10000) {

Quoting below:

Well folks, it’s not a bug. This is a limitation by default.

// Sort and paginate directly as it’d be expensive to iterate over the index which
// might have millions of keys just for retrieving some values.

// Only retrieve up to 1000 results by default.

"If no 'first' or 'last' etc. argument is specified, it would default to 1000." (mrjn)
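
To make the quoted behavior concrete, here is a toy Python model of it. This is only a sketch of the effect described above, not Dgraph's actual code: `run_query` and `DEFAULT_SORT_LIMIT` are hypothetical names, and the data is made up to match the size reported at the top of the thread.

```python
# Toy model of "sorted queries default to 1000 results".
# Hypothetical names; this is NOT how Dgraph is implemented internally.
DEFAULT_SORT_LIMIT = 1000  # default quoted above when no first/last is given


def run_query(uids, amounts, order_desc=False, first=None):
    """Return the uids a query would yield under the assumed rules."""
    if not order_desc:
        # No sorting: every matching uid is returned unless `first` is set.
        result = list(uids)
        return result if first is None else result[:first]
    # Sorting: paginate right after the sort, with a default cap.
    limit = DEFAULT_SORT_LIMIT if first is None else first
    ordered = sorted(uids, key=lambda u: amounts[u], reverse=True)
    return ordered[:limit]


if __name__ == "__main__":
    uids = range(191662)                   # same size as the original report
    amounts = {u: u % 500 for u in uids}   # made-up amounts
    print(len(run_query(uids, amounts)))                                   # 191662
    print(len(run_query(uids, amounts, order_desc=True)))                  # 1000
    print(len(run_query(uids, amounts, order_desc=True, first=200000)))    # 191662
```

In this model the count over a sorted query is taken after the cap is applied, which is why it reports 1000, and passing an explicit `first` larger than the data set restores the full count.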

Thank you for reporting this.

Cheers.


Thanks …

However, is it reasonable?