Splitting predicates into multiple groups

@hardik - In the not-too-distant future, my team and I are going to be building a search engine that will cache all the publicly available web pages on the net (as Google and Bing do). In addition, the plan is to try to add the URLs from the Wayback Machine.

From here, we could tentatively say that we’d need to consider 2 billion websites. Assuming each website has an average of 100 URLs (which may be high or low), we’re talking potentially 200 billion URLs.

Since almost all the URLs will be longer than 5 bytes, we’re talking at least 1 TB of storage for the URLs alone (200 billion x 5 bytes). Moreover, with the above estimates, we’re talking about 200 billion nodes of a single type, and every node will have many different predicates.
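To put rough numbers on that, here is a quick back-of-envelope sketch in Python. The website count, URLs per site, and 5-byte lower bound are the estimates from this post; the 80-byte average URL length is purely my own placeholder to show how fast the figure grows, not a measured value.

```python
# Back-of-envelope sizing for raw URL storage.
WEBSITES = 2_000_000_000     # estimate from this post: ~2 billion websites
URLS_PER_SITE = 100          # estimate from this post: average URLs per website
MIN_URL_BYTES = 5            # lower bound per URL used in this post
ASSUMED_AVG_URL_BYTES = 80   # hypothetical average, an assumption for illustration


def total_urls(websites: int, urls_per_site: int) -> int:
    """Total number of URL nodes we would need to store."""
    return websites * urls_per_site


def storage_tb(url_count: int, bytes_per_url: int) -> float:
    """Raw URL string storage in decimal terabytes (1 TB = 10**12 bytes)."""
    return url_count * bytes_per_url / 10**12


urls = total_urls(WEBSITES, URLS_PER_SITE)
print(f"URL nodes: {urls:,}")
print(f"Lower bound: {storage_tb(urls, MIN_URL_BYTES):.1f} TB")
print(f"At the assumed average: {storage_tb(urls, ASSUMED_AVG_URL_BYTES):.1f} TB")
```

This only counts the URL strings themselves. Indexes, the other predicates on each node, and replication would all come on top of it.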

We’ll obviously be storing the web cache in other software and doing most of the page analysis using other tools, but the plan is to feed all that data analysis into Dgraph and build our search engine on top of that.

While this is obviously towards the extreme end, it is a real use case that we’ll be developing fairly soon.