## Experience Report for Feature Request
Learner Nodes are the big feature for scaling Dgraph horizontally, and they could make Dgraph much better than other graph/SQL databases or Spanner.
The problem: Learner Nodes are all 1:1 replicas. That is a big problem, because it wastes a lot of resources and money. It is also not very sophisticated: I could just copy my PostgreSQL database, make the copy read-only, and sync all writes to it (I know Dgraph is better and smarter than that; it's just an example, because I am a noob and don't know how to explain it better).
Request 1: We need to be able to replicate data selectively, basically sharding the data ourselves. (This request is the most important one, because we need this feature right away with the release of Learner Nodes. Please read the example for more details.)
Request 2: In the future, Learner Nodes should do Request 1 on their own: shard the data automatically (by LEARNING where each piece of data is used most) and move it from node to node. This is the future of cloud computing: the cloud should move, replicate, and shard data on its own. Cloudflare is already doing this today with its Workers Durable Objects product, and we should implement it in Dgraph too. This is of course a complex feature for the future (the technical implementation is explained after the example).
Example:
E.g. I have a Reddit clone with the types: User, Post (properties: author, text, title, subreddit, continent-of-subreddit), Subreddits (properties: name, members, continent (if one is defined by the subreddit mods), description, founding-date), Private-Messages, Comments (properties: author, text, subreddit, continent-of-subreddit).
For example, I have 3 AWS regions: USA (master node), Europe (Learner Node), Singapore (Learner Node).
Use Case 1
E.g. people in Singapore don't use the private-messages feature of my Reddit clone much (because they use e.g. LINE, WeChat, etc.). So I want the Singapore Learner Node to replicate everything except the whole Private-Messages type (to save resources/money).
Use Case 2
E.g. I have 3 high-volume subreddits: r/Taiwan, r/Japan, r/Tokyo. The mods of these subreddits have set the continent property to ‘Asia’.
I want to replicate these subreddits, together with their posts and comments, only on the Singapore Learner Node (because replicating them in Europe would be a waste of resources/money).
Technical Implementation (just my noob thoughts):
First feature: the first use case is obvious. If we say we don't need a type (e.g. Subreddits), just don't replicate it. That's it.
For the second use case I think it is the same, but I really don't know how Dgraph works under the hood. Either way, we really need this first feature.
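To make the first feature more concrete, here is a minimal Go sketch of a per-Learner-Node replication filter that covers both use cases. Everything in it (`Object`, `ReplicationRule`, `ShouldReplicate`) is hypothetical and not part of Dgraph; it assumes each stored object can be checked by its type name and by plain string predicate values such as `continent-of-subreddit`.

```go
package main

import "fmt"

// Object is a hypothetical, simplified view of a stored node:
// its type name plus string-valued predicates. Not a Dgraph type.
type Object struct {
	Type       string
	Predicates map[string]string
}

// ReplicationRule is a hypothetical filter attached to one Learner Node.
type ReplicationRule struct {
	ExcludeTypes  map[string]bool   // types this node never stores
	ExcludeValues map[string]string // predicate -> value; objects with that value are skipped
}

// ShouldReplicate reports whether the node owning this rule stores obj.
func (r ReplicationRule) ShouldReplicate(obj Object) bool {
	if r.ExcludeTypes[obj.Type] {
		return false
	}
	for pred, skip := range r.ExcludeValues {
		// Only objects that actually carry the predicate can be excluded by it.
		if v, ok := obj.Predicates[pred]; ok && v == skip {
			return false
		}
	}
	return true
}

func main() {
	// Use case 1: Singapore stores everything except private messages.
	singapore := ReplicationRule{ExcludeTypes: map[string]bool{"Private-Messages": true}}
	// Use case 2: Europe skips data belonging to Asian subreddits.
	europe := ReplicationRule{ExcludeValues: map[string]string{
		"continent":              "Asia", // on Subreddits
		"continent-of-subreddit": "Asia", // on Posts and Comments
	}}

	dm := Object{Type: "Private-Messages"}
	tokyoPost := Object{Type: "Post", Predicates: map[string]string{
		"subreddit":              "r/Tokyo",
		"continent-of-subreddit": "Asia",
	}}

	fmt.Println(singapore.ShouldReplicate(dm), singapore.ShouldReplicate(tokyoPost)) // false true
	fmt.Println(europe.ShouldReplicate(dm), europe.ShouldReplicate(tokyoPost))       // true false
}
```

The point of the sketch is that both use cases reduce to the same check: the first excludes by type, the second by a predicate value, so one rule per Learner Node could express either.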
Second feature: maybe take some inspiration from Cloudflare and what they are doing with Durable Objects: they check where an object is frequently used and move it to that specific location.
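A rough Go sketch of the second feature: count reads per region for each unit of state (e.g. one subreddit with its posts and comments) and move it to the region that reads it most. This is a simple frequency heuristic made up for illustration, not Cloudflare's actual placement algorithm; `AccessLog`, `BestRegion`, and the `minRatio` threshold are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// AccessLog is a hypothetical per-region read counter for one unit of state,
// e.g. one subreddit together with its posts and comments.
type AccessLog map[string]int

// BestRegion returns the region that should host the unit of state.
// It picks the region with the most reads; the current home wins ties, and the
// unit only moves if the busiest region has at least minRatio times the
// current home's reads, so data does not bounce between regions.
func BestRegion(reads AccessLog, current string, minRatio float64) string {
	regions := make([]string, 0, len(reads))
	for r := range reads {
		regions = append(regions, r)
	}
	sort.Strings(regions) // map order is random; sort for deterministic ties

	best := current
	for _, r := range regions {
		if reads[r] > reads[best] {
			best = r
		}
	}
	if best != current && float64(reads[best]) < minRatio*float64(reads[current]) {
		return current
	}
	return best
}

func main() {
	tokyo := AccessLog{"USA": 120, "Europe": 40, "Singapore": 900}
	fmt.Println(BestRegion(tokyo, "USA", 1.5)) // Singapore

	balanced := AccessLog{"USA": 100, "Singapore": 120}
	fmt.Println(BestRegion(balanced, "USA", 1.5)) // USA
}
```

The `minRatio` threshold is a design choice: without it, a unit of state whose traffic is split roughly evenly between two regions could be moved back and forth on small fluctuations.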
Here are some links:
Here is an interesting comment about serverless:
> So how can we apply the serverless philosophy to state? Just like serverless compute is about splitting compute into fine-grained pieces, serverless state is about splitting state into fine-grained pieces. Again, we seek to find a unit of state that corresponds to logical units in our application. The logical unit of state in an application is not a “table” or a “collection” or a “graph”. Instead, it depends on the application. The logical unit of state in a chat app is a chat room. The logical unit of state in an online spreadsheet editor is a spreadsheet. The logical unit of state in an online storefront is a shopping cart. By making the physical unit of storage provided by the storage layer match the logical unit of state inherent in the application, we can allow the underlying storage provider (Cloudflare) to take responsibility for a wide array of logistical concerns that previously fell on the developer, including scalability and regionality.