RFC: is filtering/computation really distributed?

MichelDiz · September 21, 2022, 4:10pm

Request for Comments

Similar to RFC: Is Mutation request really distributed?

Summary

I want to confirm one behavior of Dgraph’s query execution. Over my time working with Dgraph, I have noticed some inconsistencies: query execution is indeed distributed, but filtering doesn’t appear to be, which makes complex queries difficult to execute.

In my observations, Dgraph asks the network for the predicates and their respective attributes, and then performs the filtering locally on the instance that received the request (on the DB node itself, not on the client). This can create a bottleneck, because the CPU and memory cost of filtering is concentrated on a single instance.

PS: math, aggregation, and so on also count as computation here.

Motivation

Ideally, all filtering should be performed on the remote nodes first, which then return only the already-filtered data. This should give a better response time for the cluster as a whole and help avoid OOM and CPU spikes.
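To make the difference concrete, here is a toy Python model of the two strategies. It is only a sketch under stated assumptions and not Dgraph’s actual executor: the `Shard` class, both functions, and the row-based partitioning are hypothetical (Dgraph shards by predicate, not by row). It only counts how many values would have to cross the network in each case.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical stand-in for a remote node holding part of the data."""
    rows: list

    def scan(self):
        # Return everything; the caller is expected to filter.
        return list(self.rows)

    def scan_filtered(self, predicate):
        # Apply the filter where the data lives and return only matches.
        return [row for row in self.rows if predicate(row)]


def filter_on_coordinator(shards, predicate):
    """Fetch all rows, then filter on the node that received the query."""
    shipped = 0
    fetched = []
    for shard in shards:
        rows = shard.scan()
        shipped += len(rows)  # every row crosses the network
        fetched.extend(rows)
    return [row for row in fetched if predicate(row)], shipped


def filter_on_shards(shards, predicate):
    """Push the filter down so each shard returns only matching rows."""
    shipped = 0
    result = []
    for shard in shards:
        rows = shard.scan_filtered(predicate)
        shipped += len(rows)  # only matches cross the network
        result.extend(rows)
    return result, shipped


# Assumed sample data: 3000 integers spread over three shards.
shards = [Shard(list(range(i, 3000, 3))) for i in range(3)]
is_match = lambda value: value % 100 == 0

central, central_shipped = filter_on_coordinator(shards, is_match)
pushed, pushed_shipped = filter_on_shards(shards, is_match)
print(central_shipped, pushed_shipped)
```

Both strategies return the same matches; the only difference in this model is how much data the coordinating node has to receive and hold in memory before filtering.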

Workaround

Break the filters into multiple blocks, as if you had a pipeline, and return all the data in a final, simple block. This helps a lot.
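One possible reading of this workaround, sketched in Python: build a DQL query in which each filter gets its own `var` block, each block starts from the UIDs of the previous one, and a last block returns the fields. The helper `build_pipeline_query`, the `stageN` variable names, the `Person` type, and the `age`, `city`, and `name` predicates are all assumptions for illustration; the exact shape of the original workaround may differ.

```python
def build_pipeline_query(root_func, filters, fields):
    """Build a DQL query with one var block per filter and one final data block.

    root_func: root function of the first block, e.g. 'type(Person)'
    filters:   filter expressions, applied one per block, in order
    fields:    predicates to return in the final block
    """
    if not filters:
        raise ValueError("at least one filter is required")

    lines = ["{"]
    source = root_func
    for index, flt in enumerate(filters):
        var_name = f"stage{index}"
        # Each stage narrows the UID set produced by the previous stage.
        lines.append(f"  {var_name} as var(func: {source}) @filter({flt})")
        source = f"uid({var_name})"

    # Final simple block: no filters, just return the data for the last UID set.
    lines.append(f"  result(func: {source}) {{")
    lines.extend(f"    {field}" for field in fields)
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical schema: type Person with predicates age, city, and name.
query = build_pipeline_query(
    "type(Person)",
    ["ge(age, 30)", 'eq(city, "Berlin")'],
    ["uid", "name"],
)
print(query)
```

The resulting string can be sent with any Dgraph client. Whether the staged form actually reduces load on the coordinating node depends on the data and the cluster, so it is worth measuring both forms.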

cc: @gajanan, @sudhish, @akon

amaster507 · September 21, 2022, 5:16pm

You’re asking the right questions and finding the bottlenecks. +1

I agree with your findings and would love to see how these might be resolved.

RJKeevil · September 23, 2022, 7:38am

I also observe this. Interestingly, when the number of filtered results is relatively low, our application gets a speed boost by fetching the unfiltered results and performing the filtering itself (client side). I suspect that filters do indeed limit how well a query is distributed.

amaster507 · September 23, 2022, 10:14pm

The question is then, what to do in production? And how to do auth if the client has to filter everything lol.

RJKeevil · September 25, 2022, 10:14am

In my example, we have an engine standing in front of Dgraph that ingests unstructured data and updates Dgraph with the results, so we are in full control of our queries. My point was more to show that I believe the filtering issue does indeed exist, and that it would need to be fixed in Dgraph itself to optimise client/customer queries in the same way.
