Yes its quite correct in the understanding. We currently don’t split shards. So if you have one big predicate, it would have to reside on one disk. But querying it would still be fast. Badger supports vlog, so big values are part of logs and we directly query the pointer in the vlog. We would face problems if you query the entire data at once because we won’t able to read it in memory. So stuff like traversal, indexes could face issues.
We are thinking that we can split a predicate in multiple groups after a certain size to avoid this issue. We are thinking we can shard on the basis of nodes, but researching different avenues as well.
We did it this way so that it’s faster to traverse within a predicate. Hence we want to try to maintain the performance of queries even with splits.