How does Dgraph deal with the supernode problem?
For example:
On Twitter, the most followed person has ~112 million followers
On Instagram, the Instagram account itself has ~331 million followers
That means a single posting list for such a vertex:follower edge is
331 million * 64 bits (UID size) = 21,184,000,000 bits ~= 2.65 GB!
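To make the estimate easy to re-check, here is a minimal sketch of the same arithmetic. It assumes every follower is stored as a raw, uncompressed 64-bit UID; the function name `raw_posting_list_bytes` is made up for illustration and is not part of any Dgraph API. Whatever encoding Dgraph actually uses on disk or in memory may give a different number.

```python
def raw_posting_list_bytes(num_uids: int, uid_bits: int = 64) -> int:
    """Size in bytes of a list of UIDs stored with no compression (assumption)."""
    return num_uids * uid_bits // 8


followers = 331_000_000  # follower count quoted in the question
size_bytes = raw_posting_list_bytes(followers)

# Report the size in decimal gigabytes and in binary gibibytes.
print(f"{size_bytes:,} bytes")
print(f"{size_bytes / 1e9:.2f} GB")
print(f"{size_bytes / 2**30:.2f} GiB")
```

In decimal units this comes to about 2.65 GB for the Instagram case, and the same function gives about 0.9 GB for the 112 million follower Twitter case.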
How does Dgraph deal with:
- Queries/upserts that try to get all followers? (Even with filters, does Dgraph load all followers and then apply the filter, possibly leading to OOM?)
- Writes (adding/removing a follower)? Will they be impacted, since rebuilding such a posting list would be expensive?
- Impact on the cache? (For example, changes to a posting list are cached in memory and flushed eventually.) Could this lead to OOM as well?
I understand that dealing with such supernodes is challenging.
So the intention behind this question is to:
- Understand the limitations of, and support for, the supernode problem in Dgraph
- Make me and the community aware of the problems, so that we can design our queries/schema/sharding accordingly.