Let me ask you:
Why do you need to sort by amount? There is no need to sort if you just want to count the total number of donations. In that query you are counting the nodes that have Type donation, and you’re not querying any other predicates.
Maybe some of your nodes don’t have amount values. In that case the first query returns only the nodes that have an amount value, and the last one returns everything, with or without an amount value.
But of course it’s strange. It should return everything in both queries.
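To test the missing-values theory, the two counts could be compared directly. This is only a sketch: the type name `Donation` and the predicate name `amount` are assumptions, since the original queries are not shown here.

```
{
  # All nodes of the (assumed) Donation type
  all(func: type(Donation)) {
    count(uid)
  }
  # Only those donations that have a value for the (assumed) amount predicate
  withAmount(func: type(Donation)) @filter(has(amount)) {
    count(uid)
  }
}
```

If the two counts match, missing amount values are not the cause of the difference.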
As I said, my intention is not to have count in my final query. I am using count now to debug why an aggregation query returns different values when sorting is used.
When I run the query as you indicated, without count, the lists returned are definitely different. The one with sorting is much smaller.
Even the query latency is much higher when sorting is not used, as the result set is much larger.
That is the reason I did the count. Counting manually would be tedious and time-consuming.
At the risk of necroing a thread, this has come up a bunch of times across the forum over the years and I’d like to surface it to the new Dgraph team. I’m replying to this one as I feel it has the most complete response on the matter, from @MichelDiz.
I just came across this issue again recently while my data science team was performing some risk analysis on our data sets. I don’t think this default behaviour really makes sense.
I completely understand that you don’t want users on shared clusters doing this, as it’d be bad for other users’ experience.
Given that one of the main reasons people choose Dgraph is to analyse large and disparate data sets in a high-performance manner, you’d imagine that it’d be very common to want to query a whole data set with sorting. 1000 results is not many when doing something like fraud detection based on user interactions in the last month (for example).
I’d like to propose removing these limitations from dedicated and self-hosted clusters. If it’s for quality-of-service protection on shared clusters, I can understand the limitation, but for dedicated or self-hosted clusters I’m not sure I agree with it. In my opinion it’s also a very reasonable ‘upsell’ for getting users to upgrade to a dedicated cluster.
If you don’t want to do that, it’d be great if you could include an option to disable this ‘safety’ feature in scenarios where it won’t affect other users.