@hardik - In the not-too-distant future, my team and I are going to be building a search engine that will cache all the publicly available web pages on the net (i.e. like Google / Bing do). In addition, the plan is to try to add in the URLs from the Wayback Machine.
From here, we could tentatively say that we’d need to consider 2 billion websites. Assuming each website has an average of 100 URLs (which may be high or low), we’re talking potentially 200 billion URLs.
Since almost every URL will be more than 5 bytes long, 200 billion of them works out to at least 1 TB of storage (200 billion × 5 bytes) for the URL strings alone. Moreover, using the above estimates, we're talking about 200 billion nodes of a single type, with every node carrying many different predicates.
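Just to make the arithmetic explicit, here's a quick back-of-envelope sketch in Python; every figure is a rough assumption taken from the estimates above:

```python
# Back-of-envelope estimate; all inputs are rough assumptions, not measurements.
websites = 2_000_000_000        # ~2 billion websites
urls_per_site = 100             # assumed average URLs per website
min_url_bytes = 5               # lower bound on the length of a URL

total_urls = websites * urls_per_site          # 200,000,000,000 URLs / nodes
min_url_storage = total_urls * min_url_bytes   # bytes for the URL strings alone

print(f"total URLs (nodes):  {total_urls:,}")
print(f"minimum URL storage: {min_url_storage / 1e12:.1f} TB")  # ~1.0 TB
```

Real URLs are of course much longer than 5 bytes, so the true figure will be several times that.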
We’ll obviously be storing the web cache in other software and doing most of the page analysis with other tools, but the plan is to feed the results of all that analysis into Dgraph and build our search engine on top of it.
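To give a rough idea of what feeding that analysis into Dgraph might look like, here's a minimal sketch using the pydgraph client. The schema, the predicate names (url, title, links_to, etc.), the Page type and the localhost address are all illustrative assumptions, not our actual design:

```python
import pydgraph

# Hypothetical schema for crawled pages; predicate names are placeholders.
SCHEMA = """
url: string @index(exact) .
title: string @index(term) .
fetched_at: datetime .
links_to: [uid] @reverse .
type Page {
    url
    title
    fetched_at
    links_to
}
"""

def main():
    stub = pydgraph.DgraphClientStub("localhost:9080")  # assumed local Alpha
    client = pydgraph.DgraphClient(stub)

    # Apply the (hypothetical) schema.
    client.alter(pydgraph.Operation(schema=SCHEMA))

    # Feed one analysed page in as a set mutation.
    txn = client.txn()
    try:
        txn.mutate(set_obj={
            "dgraph.type": "Page",
            "url": "https://example.com/",
            "title": "Example Domain",
            "fetched_at": "2024-01-01T00:00:00Z",
            "links_to": [{"dgraph.type": "Page", "url": "https://example.com/about"}],
        })
        txn.commit()
    finally:
        txn.discard()  # no-op if the commit succeeded

    stub.close()

if __name__ == "__main__":
    main()
```

In practice the bulk of those 200 billion nodes would go in via the live / bulk loaders rather than one mutation at a time, but the shape of the data would be the same.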
While this is obviously at the extreme end, it is a use-case that we'll actually be developing in the near future.