Hi everyone, long time no see.
I’m here to share a small tip. I’ve been using Dgraph in two real-world projects with minimal hardware resources, and I’ve noticed that tweaking the data model significantly boosts query performance. The trick is to move from approaches like “myEdge @filter(eq(something, 'great'))” to something along the lines of “myEdge.something.great”.
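To make that concrete, here is a sketch in DQL. The type name `Item` and the shape of the data are hypothetical; `myEdge` and `something` are the names from my example above. The slow form filters on a value stored on the target nodes:

```
{
  slow(func: type(Item)) {
    # every target of myEdge is fetched and its value checked
    myEdge @filter(eq(something, "great")) {
      uid
    }
  }
}
```

With the state encoded in the predicate itself, the same query becomes a plain traversal with no per-node value check:

```
{
  fast(func: type(Item)) {
    # only the nodes already in the "great" state are linked here
    myEdge.something.great {
      uid
    }
  }
}
```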
It’s quite straightforward. This avoids processing the value on the edge, which was causing a ridiculous bottleneck on an underpowered EC2 machine. Even in a project where each chain of data in my model had around 50,000 nodes/objects per parent, times 19 parents. Filtering 50k nodes with the filter made everything painfully slow; now imagine multiplying that by 19.
The approach is simple, but you need to create an edge for each state. In my case, I had “myEdge.something.great” and “myEdge.something.notgreat”, among other variations. And in the application, I dealt with each case individually, carefully modifying each of my mutations.
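For example, the write side has to pick the matching state-specific edge instead of setting a value on the target node. A sketch in DQL RDF mutation syntax, with hypothetical blank-node names:

```
{
  set {
    # before: one generic edge plus a value to filter on
    # _:parent <myEdge> _:child .
    # _:child <something> "great" .

    # after: the state is encoded in the edge itself
    _:parent <myEdge.something.great> _:child .
  }
}
```

When the state changes, the application also has to delete the old state edge and set the new one, which is why each mutation needs individual attention.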
It was a lot of work to migrate the model, so I think it’s worth starting with this approach from the beginning. Think of this edge/predicate modeling as similar to what is done in Redis, like “example:test:test2”, which creates prefixes. Dgraph inherits this from Badger, which works similarly to Redis. The benefits are substantial; even on low-performance machines I achieved good results.
I’m not saying to abandon filters completely. I’m saying that using plain edges as if they were parameters is faster. The trade-off is that you end up accumulating a significant number of predicates in your schema, which some may find confusing or messy.
Update: another advantage of this approach becomes evident with sharding. Since Dgraph shards data by predicate, each of these state-specific predicates can live in its own shard, which can significantly accelerate performance in systems with millions upon millions of objects. The improvement can be simply astronomical.
Cheers!