Feature Request: Make Learner Nodes smarter/more useful and stop wasting resources/money. We also need smart, learning-based sharding like Cloudflare Durable Objects

Discussion: Which region should I choose for world wide app? No multi-region availability? Which Load Balancer should I choose? BTW What's better for dgraph AWS or GCP? - #6 by Juri

## Experience Report for Feature Request
Learner Nodes are the big, important feature for scaling Dgraph horizontally, and therefore for making Dgraph much better than other graph/SQL databases or Spanner.
The problem: Learner Nodes are all 1:1 replicas. That’s a big problem, because it’s a huge waste of resources/money. It’s also not really sophisticated: I could just as well copy my PostgreSQL database, make the copy read-only, and sync all writes to it (I know Dgraph is better and smarter than that, it’s just an example because I am a noob and don’t know how to explain it better).

Request 1: We need to be able to replicate things selectively, basically sharding the data on our own. (This request is the most important one, because we need the feature right now, with the release of Learner Nodes. Please read the example for more details.)
Request 2: In the future, Learner Nodes should do Request 1 on their own. They should shard data by themselves (by LEARNING where the data is used most often) and move it from node to node by themselves. This is the future of cloud computing: the cloud should move, replicate, and shard data on its own. Cloudflare is already doing that NOW with its Workers Durable Objects product. We should implement that in Dgraph too. This is of course a complex feature for the future. (The technical implementation is explained after the example.)

Example:
E.g. I have a Reddit clone with the types: User, Post (properties: author, text, title, subreddit, continent-of-subreddit), Subreddits (properties: name, members, continent (if the subreddit mods have defined one), description, founding-date), Private-Messages, Comments (properties: author, text, subreddit, continent-of-subreddit).
I have e.g. 3 AWS regions: USA (master node), Europe (Learner Node), Singapore (Learner Node).

  1. Use Case
    E.g. people in Singapore don’t use the private-messages capability of my Reddit clone much (because they use e.g. LINE, WeChat etc.). So I want the Singapore Learner Node to replicate everything except the whole Private-Messages type (to save resources/money).

  2. Use Case
    E.g. I have 3 subreddits with high volume: r/Taiwan, r/Japan, r/Tokyo. The mods of these subreddits have set the continent property to ‘Asia’.
    I want to replicate these subreddits, with their posts and comments, only on the Singapore Learner Node (because it would be a waste of resources/money to replicate them in Europe).

Technical Implementation (just my noob thoughts):
First feature: well, the first use case is obvious. If we say a Learner Node doesn’t need e.g. the type Subreddits, just don’t replicate it there. That’s it.
For the second use case it is, I think, the same. I really don’t know how Dgraph works under the hood, but we really need this first feature.
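To make the first feature a bit more concrete, here is a minimal sketch of what such a per-node rule could look like. Everything in it is an assumption for illustration: `ReplicationPolicy`, `should_replicate`, and the record layout are made-up names, not an existing Dgraph API or config option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-Learner-Node filter; not an existing Dgraph option."""
    excluded_types: frozenset = frozenset()
    excluded_continents: frozenset = frozenset()

    def should_replicate(self, record: dict) -> bool:
        # Use case 1: skip whole types this region does not need.
        if record["type"] in self.excluded_types:
            return False
        # Use case 2: skip records tagged with a continent this region does not serve.
        # Records without a continent are treated as global and always replicated.
        if record.get("continent") in self.excluded_continents:
            return False
        return True


singapore = ReplicationPolicy(excluded_types=frozenset({"Private-Messages"}))
europe = ReplicationPolicy(excluded_continents=frozenset({"Asia"}))

message = {"type": "Private-Messages"}
tokyo_post = {"type": "Post", "subreddit": "r/Tokyo", "continent": "Asia"}

print(singapore.should_replicate(message))     # False
print(singapore.should_replicate(tokyo_post))  # True
print(europe.should_replicate(tokyo_post))     # False
print(europe.should_replicate(message))        # True
```

The sketch only decides yes/no per record; how a Learner Node would actually apply such a filter to the replication stream is exactly the part I can’t judge.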

Second feature: maybe we can get some inspiration from Cloudflare and what they are doing with their Durable Objects. They check where an object is frequently used and move it to that specific location.
Here are some links:

Here is an interesting comment about serverless:

So how can we apply the serverless philosophy to state? Just like serverless compute is about splitting compute into fine-grained pieces, serverless state is about splitting state into fine-grained pieces. Again, we seek to find a unit of state that corresponds to logical units in our application. The logical unit of state in an application is not a “table” or a “collection” or a “graph”. Instead, it depends on the application. The logical unit of state in a chat app is a chat room. The logical unit of state in an online spreadsheet editor is a spreadsheet. The logical unit of state in an online storefront is a shopping cart. By making the physical unit of storage provided by the storage layer match the logical unit of state inherent in the application, we can allow the underlying storage provider (Cloudflare) to take responsibility for a wide array of logistical concerns that previously fell on the developer, including scalability and regionality.
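Going back to the second feature, the "check where it is frequently used and move it there" idea could be sketched roughly like this. It is not how Cloudflare or Dgraph actually do it; `PlacementTracker`, the thresholds, and the idea of using a subreddit as the unit of state are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


class PlacementTracker:
    """Hypothetical tracker: counts accesses per unit and region, then suggests moves."""

    def __init__(self, default_region, min_requests=100, move_ratio=2.0):
        self.default_region = default_region
        self.min_requests = min_requests  # ignore units with too little traffic
        self.move_ratio = move_ratio      # how dominant a region must be to win
        self.home = {}                    # unit -> region it currently lives in
        self.hits = defaultdict(Counter)  # unit -> Counter(region -> accesses)

    def record_access(self, unit, region):
        self.hits[unit][region] += 1

    def current_home(self, unit):
        return self.home.get(unit, self.default_region)

    def rebalance(self):
        """Return {unit: (old_region, new_region)} and reset the counters."""
        moves = {}
        for unit, counts in self.hits.items():
            if sum(counts.values()) < self.min_requests:
                continue
            old = self.current_home(unit)
            best, best_hits = counts.most_common(1)[0]
            # Move only if another region clearly dominates, to avoid flapping.
            if best != old and best_hits >= self.move_ratio * counts[old]:
                moves[unit] = (old, best)
                self.home[unit] = best
        self.hits.clear()
        return moves


tracker = PlacementTracker("USA", min_requests=10)
for _ in range(9):
    tracker.record_access("r/Tokyo", "Singapore")
tracker.record_access("r/Tokyo", "USA")
for _ in range(3):
    tracker.record_access("r/Berlin", "Europe")

moves = tracker.rebalance()
print(moves)  # {'r/Tokyo': ('USA', 'Singapore')}
```

r/Berlin stays where it is because it has too little traffic to justify a move. The two thresholds are there so that a unit does not bounce between regions on every small change.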

Dgraph stores predicates per group, and storage does not cross groups. So could you not put your Europe-only predicates on a group that has no Singapore learner nodes attached, and keep the globally used predicates on groups that do have learner nodes? Obviously this would be manual instead of automagic based on usage, but it seems like that meets your use case.
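As a toy model of that manual layout (plain Python, not Dgraph configuration; the predicate names, group numbers, and region sets are invented for illustration):

```python
# Hypothetical layout: which group stores each predicate.
predicate_group = {
    "User.name": 1,
    "Post.text": 1,
    "PrivateMessage.text": 2,
}

# Hypothetical layout: regions with a node (regular or learner) in each group.
group_regions = {
    1: {"USA", "Europe", "Singapore"},
    2: {"USA", "Europe"},  # no Singapore learner attached to this group
}


def regions_for(predicate):
    """Regions that hold a copy of the given predicate."""
    return group_regions[predicate_group[predicate]]


def predicates_in(region):
    """Predicates that are stored in the given region, sorted by name."""
    return sorted(p for p, g in predicate_group.items() if region in group_regions[g])


print(predicates_in("Singapore"))  # ['Post.text', 'User.name']
print(predicates_in("Europe"))     # ['Post.text', 'PrivateMessage.text', 'User.name']
```

Since placement here is per predicate, this covers the "no private messages in Singapore" case. Filtering by the value of a continent property would need something beyond this.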

The automatic rebalancing system in Dgraph is based on disk usage per predicate. I feel like a better algorithm could distribute predicates based on how heavily they are used instead of on disk size, but that may be a different issue altogether.
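To illustrate the difference, here is a small sketch of a balancing step with a pluggable weight function. This is not Dgraph's actual rebalancer; `Tablet`, `pick_move`, and all the numbers are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    predicate: str
    group: int
    disk_bytes: int
    requests_per_min: int


def pick_move(tablets, weight):
    """Suggest one tablet move from the heaviest to the lightest group.

    Returns (predicate, source_group, target_group), or None if no single
    move would narrow the gap. Groups without any tablet are not considered.
    """
    load = {}
    for t in tablets:
        load[t.group] = load.get(t.group, 0) + weight(t)
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    gap = load[src] - load[dst]
    # Moving weight w changes the gap to |gap - 2w|, so only 0 < w < gap helps.
    candidates = [t for t in tablets if t.group == src and 0 < weight(t) < gap]
    if not candidates:
        return None
    # The ideal tablet carries half the gap; pick the closest one.
    best = min(candidates, key=lambda t: abs(weight(t) - gap / 2))
    return (best.predicate, src, dst)


tablets = [
    Tablet("Post.text", 1, disk_bytes=900, requests_per_min=50),
    Tablet("Comment.text", 1, disk_bytes=100, requests_per_min=40),
    Tablet("User.name", 2, disk_bytes=800, requests_per_min=400),
    Tablet("Subreddit.members", 2, disk_bytes=150, requests_per_min=300),
]

by_disk = pick_move(tablets, lambda t: t.disk_bytes)
by_usage = pick_move(tablets, lambda t: t.requests_per_min)
print(by_disk)   # None
print(by_usage)  # ('Subreddit.members', 2, 1)
```

By disk size the two groups look balanced, so nothing moves. By request rate, group 2 is clearly hotter and a move is suggested.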

A different, tangential feature that may be interesting here is virtual tablets, like those implemented by Cassandra/Scylla (and, in a slightly different style, CockroachDB): split up the storage of a tablet into virtual segments and replicate those, as opposed to keeping all values for tablet X in one group, no exceptions.
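A very rough sketch of the idea, assuming a predicate is split by subject UID into a fixed number of virtual segments that are then assigned to groups. This does not describe how Cassandra, Scylla, CockroachDB, or Dgraph implement it; the segment count, the modulo split, and the round-robin assignment are placeholders.

```python
NUM_SEGMENTS = 8    # virtual segments per predicate (assumed)
GROUPS = [1, 2, 3]  # available groups (assumed)


def segment_of(uid: int) -> int:
    """Map a subject UID to a virtual segment of its predicate."""
    return uid % NUM_SEGMENTS


def build_assignment(predicate):
    """Spread the segments of one predicate round-robin over the groups."""
    return {(predicate, s): GROUPS[s % len(GROUPS)] for s in range(NUM_SEGMENTS)}


def group_of(predicate, uid, assignment):
    """Group that stores the value of `predicate` for the given UID."""
    return assignment[(predicate, segment_of(uid))]


assignment = build_assignment("Post.text")
print(group_of("Post.text", 16, assignment))  # 1 (segment 0)
print(group_of("Post.text", 17, assignment))  # 2 (segment 1)

# A single hot segment can be moved without touching the rest of the predicate.
assignment[("Post.text", 1)] = 3
print(group_of("Post.text", 17, assignment))  # 3
```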
