We have an interesting performance issue that we hope someone can help shed some light on.
In our use case we have nodes that are unique by the combination of their taxonomy entries, which are arbitrary key/value pairs.
Our schema is such that:

```
key_nodes
|__ value_nodes
    |__ item_nodes
```
The item nodes can have any number of key/values associated with them. Our queries filter on the key/values, but the returned result set includes all of the key/values on each item, which is what led us to Dgraph: we can easily traverse these relationships backwards in our result set.
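As a rough sketch of that shape (the predicate names here are hypothetical, not our real schema), the Dgraph predicates look something like:

```
# Hypothetical predicate names, for illustration only.
key:    string @index(exact) .   # name of a taxonomy key
value:  string @index(exact) .   # a value under a key
values: [uid] .                  # key_node  -> value_nodes
items:  [uid] @count @reverse .  # value_node -> item_nodes
# @count indexes the edge counts; @reverse lets us walk back
# from an item to all of its key/values.
```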
We quickly noticed, however, that when using multiple filters to intersect result sets, the order of the filters mattered for performance.
To try to solve this, we started indexing the counts of edges. We now run two queries: the first fetches the indexed count of the number of results each filter would return, and the second is the actual query with the filters ordered by those counts (most selective filter first).
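A minimal sketch of the two-phase approach (hypothetical predicate and variable names, and it assumes the `@count` and `@reverse` directives are set on `items`; our real queries are further down):

```
# Phase 1: cheap counts of how many item_nodes each filter would touch.
{
  countA(func: eq(value, "red"))   { count(items) }
  countB(func: eq(value, "large")) { count(items) }
}
```

```
# Phase 2: run the real query starting from the smaller edge set,
# e.g. here assuming countB < countA:
{
  var(func: eq(value, "large")) { smaller as items }

  matches(func: eq(value, "red")) @cascade {
    items @filter(uid(smaller)) {
      uid
      ~items { value }   # walk back to all key/values via the reverse edge
    }
  }
}
```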
For context, our current dataset can consist of hundreds of thousands of value_nodes, each related to potentially thousands of item_nodes. Although performing the indexed count query beforehand improves overall performance, the count query itself is still pretty slow (in some cases taking seconds to execute) and appears to be overwhelming dgraph-server. Is there anything you can suggest we do to optimise this?
Here’s an example of these queries: