RFC: Is filtering/computation really distributed?

Request for Comments

Similar to RFC: Is Mutation request really distributed?


One behavior I want to confirm concerns computation in Dgraph's query execution. Over time, I have noticed some inconsistencies in how queries are executed: fetching the data is indeed distributed, but filtering does not appear to be. This makes complex queries difficult to execute.

From what I have observed, Dgraph requests the predicates and their respective attributes from the other nodes over the network, and then performs the filtering locally on the instance that received the request (not on the client, but on the DB node itself). This can create a bottleneck, because it concentrates CPU and memory consumption on a single instance.

PS: Math, aggregation, and so on are also part of computation.


Ideally, all filtering would be performed on the remote nodes beforehand, and only the computed data would be returned. This should improve the response time of the cluster as a whole and also help avoid OOMs and CPU spikes.
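To make the difference concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Dgraph's actual code: `Record`, `Shard`, `fetch_all`, `fetch_filtered`, and the age filter are all hypothetical. It compares pulling everything to the requesting node and filtering there with asking each remote node to filter first, and counts how many records cross the "network" in each case.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Record:
    """Hypothetical stored value: a node UID and one numeric attribute."""
    uid: int
    age: int


@dataclass
class Shard:
    """Hypothetical stand-in for a remote node holding part of the data."""
    records: List[Record] = field(default_factory=list)

    def fetch_all(self) -> List[Record]:
        # Ship every record back to the caller, unfiltered.
        return list(self.records)

    def fetch_filtered(self, keep: Callable[[Record], bool]) -> List[Record]:
        # Apply the filter locally and ship only the matches.
        return [r for r in self.records if keep(r)]


def coordinator_filter(shards, keep) -> Tuple[List[Record], int]:
    """Pull everything to the requesting node, then filter there.

    Returns the matches and how many records crossed the network."""
    transferred = 0
    matches: List[Record] = []
    for shard in shards:
        batch = shard.fetch_all()
        transferred += len(batch)
        matches.extend(r for r in batch if keep(r))
    return matches, transferred


def pushdown_filter(shards, keep) -> Tuple[List[Record], int]:
    """Ask each shard to filter first, then merge the smaller results."""
    transferred = 0
    matches: List[Record] = []
    for shard in shards:
        batch = shard.fetch_filtered(keep)
        transferred += len(batch)
        matches.extend(batch)
    return matches, transferred


def is_adult(r: Record) -> bool:
    return r.age >= 18


shards = [
    Shard([Record(1, 17), Record(2, 34), Record(3, 52)]),
    Shard([Record(4, 12), Record(5, 41), Record(6, 9)]),
]

pulled, pulled_cost = coordinator_filter(shards, is_adult)
pushed, pushed_cost = pushdown_filter(shards, is_adult)
print(len(pulled), pulled_cost)  # 3 6
print(len(pushed), pushed_cost)  # 3 3
```

Both strategies return the same matches; pushdown only moves the records that survive the filter. The saving depends on selectivity: with a filter that keeps nearly everything, the two approaches transfer about the same amount.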


Break the filters into multiple blocks, as if you had a pipeline, and finally return all the data in one simple block. This will help a lot.
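One way to picture the pipeline idea is the minimal Python sketch below. It is an illustration under assumptions, not Dgraph code: `DATA`, `attr_filter`, and `run_pipeline` are hypothetical. Each stage narrows a set of UIDs while looking at only the one attribute it needs, and only the final step reads the full records for the UIDs that survived (in DQL this loosely resembles filtering inside `var` blocks and fetching the result with `uid()` in a last block).

```python
from typing import Callable, Dict, List, Set

# Hypothetical data: uid -> full record. Filter stages read one attribute each;
# full records are only read once, at the end.
DATA: Dict[int, dict] = {
    1: {"name": "Ana", "age": 17, "country": "BR"},
    2: {"name": "Bob", "age": 34, "country": "US"},
    3: {"name": "Cid", "age": 52, "country": "BR"},
    4: {"name": "Dee", "age": 41, "country": "BR"},
}

Stage = Callable[[Set[int]], Set[int]]


def attr_filter(attr: str, keep: Callable[[object], bool]) -> Stage:
    """Build a stage that narrows a UID set using a single attribute."""
    def stage(uids: Set[int]) -> Set[int]:
        return {u for u in uids if keep(DATA[u][attr])}
    return stage


def run_pipeline(uids: Set[int], stages: List[Stage]) -> List[dict]:
    """Run the filter stages in order, then return full data for the survivors."""
    for stage in stages:
        uids = stage(uids)
        if not uids:  # nothing left, so later stages would be no-ops
            break
    # Final block: materialise full records only for the surviving UIDs.
    return [DATA[u] for u in sorted(uids)]


result = run_pipeline(
    set(DATA),
    [
        attr_filter("country", lambda c: c == "BR"),
        attr_filter("age", lambda a: a >= 18),
    ],
)
print([r["name"] for r in result])  # ['Cid', 'Dee']
```

Putting the most selective stage first keeps the intermediate sets small, which is where a pipeline like this saves memory compared with loading every record up front.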

cc: @gajanan, @sudhish, @akon


You’re asking the right questions and finding the bottlenecks. +1

I agree with your findings and would love to see how these might be resolved.


I have observed this too. Interestingly, when the number of filtered results is relatively low, our application gets a speed boost by performing the filtering on the results itself (client side). I suspect that filters do indeed limit how much of a query gets distributed.


The question then is: what do we do in production? And how do we handle auth if the client has to filter everything? lol

In my case, we have an engine in front of Dgraph that ingests unstructured data and updates Dgraph with the results, so we are in full control of our queries. I was mainly making the point that I believe the filtering issue does exist and would need to be fixed in Dgraph itself to optimise client/customer queries in the same way.