Hi this comment gives me shivers. I thought I could use dgraph worry free and scale without any problem horizontally, even with 500 million new nodes or predicates everyday.
Maybe I understood your comment wrong @seanlaff ? Or is it really true that dgraph can scale with high volume, only vertically? That would be really a pain in the heart. do you mean with incoming data, mutations? What I assume with reading your comment is: You have basically too many tweets, and therefore dgraph can’t scale that horizontally anymore; only vertically? is that right? that would be horrible. Can you maybe elaborate further what happened?
I think I misunderstand something, please enlighten me guys/gals I thought dgraph scales performant horizontally, always, no matter if I have 500 million new tweets every day or 500 billion, and i don’t have to worry
If you get that big, you can afford to hire engineers to build a complex database system. In reality, that is crazy. Twitter uses and intricate network of sharding, caching (front, back, middleware— redis, memcache… nosql, sql)…
So I should use Firestore Dgraph PostgreSQL Redis together?
but the problem is not about splitting about different functionalities/nodes
the problem is if single nodes / predicates, grow too big, how to handle them. as we saw now dgraph unfortunately fails with 100 Million total entries… What shall I do when that happens?
Leave dgraph, use Cassandra (which can scale that high) and built my own query logic, indexes, badger/redis caching layer, etc on top of it? I would end up building a second dgraph (._.) The goal of dgraph was to scale endlessly even if you have 5 billion nodes, or not? I thought dgraph is THE database that grows with you even if you end up becoming a big player like discord instagram twitter facebook reddit
For me, this issue with 100 million entries is a “no issue”. You can force the balancing to stop. We could give more control to the user on that. But you can simply give an infinity number to the balancer and it will never balance it again. And “problem solved”. So, that’s not a problem of scaling. It is unwanted automated behavior.
Dgraph is capable of infinitely growing horizontally. (actually up to the limit of uint64). I think Dgraph doesn’t scale that well vertically, maybe that’s just my impression. But horizontally it is very capable.
The sharding of predicates is another story. And that is complicated.
We have/had(?) customers with close to Terabyte data using Dgraph(and +Terabyte in BadgerDB) and in the beginning there was some issues. But all solved as far as I know.
Not sure where we affirm that. But we say that Dgraph is potentially capable. But very hard to prove without someone creating such a big platform and push Dgraph to the limits. All we can do is theorize.
Dgraph is ready to be a Terabyte Graph DB in my opinion. But need someone to do it. Generic tests will not do it.
I’ll reply here instead of discord since you posted it here.
As @jdgamble555 stated, if you were really that big, you would own your own very customized Dgraph database or something different by then, and you would have your own very good engineers work on this problem. Don’t worry about handling the load that Twitter is handling, but be ready for handle the load that your app will handle. Consider some apps that really took off recently and then crashed because they could not scale. The problem was that they did not think about these issues and were not prepared. With Dgraph, you are prepared for the issue and can scale very very large horizontally and can even scale up vertically with each server having terrabytes of disk space to handle these large shards (Don’t think most people realize how much text a single shard would be if it was actually close to 1Tb.)
IMO most startups could very easily be handled in Dgraph Cloud shared instance with a limit of 25Gb. I thought I had a few Gb, but just looked today and have around .5Gb! And that still is a lot of data over many many many predicates.
I think you have many different areas where you can push Dgraph to its limits before you ever worry about needing to scale a single predicate down further. Not saying it is not something that should eventually figured out by the team, but there are other features to get right first IMO.
You referenced the other topic and I think the main discussion there was around moving large predicates constantly between cluster nodes. And moving something based on “hotness” in context, I understood to mean, if a predicate is read and mutated often, then don’t move it very often if at all, move other predicate shards first that are just sitting around collecting dust.
The other aspect of this conversation is how to move shards to where they are closest to where they are normally queried the most. Sometimes, it may actually be wise to break up predicates of the same type across different server nodes even. This would allow 2 servers to simultaneously filter on two separate types and indexes more efficiently at the same time and then combine their results in the response back.
In other words, get to building and Dgraph will be there behind you as you grow. And then when you are making $$$ hire some good devs to work with Dgraph in figuring out how to split shards into smaller peices that can be sharded
internal rebalancer is not good enough and will constantly flap inordinately large predicates back and forth - just a fact and a design flaw with only using size as a variable. Whatever, we can disable it.
a single predicate must be stored on a single alpha group, no exception. So if it gets too big, you cannot horizontally scale to help. Cassandra/Scylla fixed a similar issue my making ‘virtual tablets’ that are the sharing element that represent ranges of their original storage unit ‘tablets’.
Edit to expand:
No, dgraph cannot back Twitter, but no off the shelf DB could. @seanlaff and I work together and our production dgraph has 16 billion edges in it. To support ingestion to it we run 5 groups (15 alpha servers) with a total of 250 cpus and 480GiB ram.
This is not a typical setup. I would say most apps will never reach this size, I would not worry about it so much. However it is something we have to worry about, and would love some work towards.
To upgrade dgraph between major versions, you have to dump 100% of the data and rebuild it from text files. That right there would disqualify it from backing all of Twitter!
At this point to clean up the database you have to manually delete anything you want to. This sounds fine but if you want to delete 1 billion entries at once, you are going to take a long time to do that. Some other nosql databases support lifecycle rules for self cleanup, that would be a start there.
Thank you very much Jonathan Michel Anthony Kenan!!!
but by disabling it, he won’t shard anymore, or? do you then shard by yourself or how? Or do you enable it once a day for an hour?
And how did you solve that issue (i am really really interested about that)? Did you create virtual shards like BigTable/Cassandra create virtual tables? Or did you move away from dgraph?
Are there some schema architecture measures I can take already now to countermeasure against that problem? Maybe e.g If I have a Twitter app, I should split up tweets between english russian indian users? <0x45> <tweettext> "I like pizza" . and for russian tweets <0x45> <tweettext_ru> "мне нравится пицца" . But honestly I don’t feel that that’s the right way to go…
That’s really a pain in the heart that discourages me to use dgraph, but I think other databases like PostgreSQL have the same problems, and using BigTable/Cassandra with custom built logic is also just a reinventment of the wheel and a hard difficult task… But Twitter is Twitter, why would that be a disqualification? Wouldn’t they be able to migrate that tons of data over? Before migrating, starting to record mutations, then migrate the terrabytes of data over, and then quickly do the mutations, and start using the new database with the new dgraph version?
BTW one more question: Did the dgraph team support you with your problems? or did you had to solve everything yourself? If I ever get to the trouble and won’t have these specific engineers yet (because there are not that much dgraph engineers out there, and the ones that are out there already have a project they are working on, no matter if as subcontractors/self-employed or employees), can I pay the dgraph team and they will solve everything?
did they solve everything themselves, or with you together? if I will ever have scaling troubles (like Kenan had), will you guys help me if I pay you? Did you solve Kenans problems also with him together? What are your thoughts on the migration problems? what about the too-many-predicate problem
Would that also solve that ‘too-much-predicates’ problem?
but dgraph needs problems like that solved to gain big $$$ customers or not?
Depends on who you ask and who is their target audience. I would put all my money right now on the fact that Twitter is not who Dgraph is targeting. Maybe the next Twitter-like startup, but not twitter as it sits today.
The shard happens by design. The shard and the balancing are two different things. A predicate is Dgraph’s atomic unit, and a shard is basically a predicate distributed to groups. Your objects(aka nodes, types, entities) are sharded across the groups - No group would have an entire “Node”, a query gather the pieces across the cluster. See more at https://dgraph.io/docs/deploy/cluster-setup/#sharding If you disable the balancing it will “freeze” the predicate in a group forever. So we need a way to control it manually, which isn’t possible right now.
Dgraph supports languages, but IMHO I believe having a predicate for each lang gives you more performance. Due the sharding nature of Dgraph.
You don’t need a Dgraph engineer. Any Go lang engineer with distributed system experience is what you need.
All issues were related to customers, which means a contract, which means enterprise support. So yes, Dgraph’s engineers were there to support them.
If you have an EE contract Dgraph will support you. The big thing of having am EE contract is that you gonna have Dgraph’s engineers covering you in the Cluster context. And some fancy features.
I’m just the open-source voice here. I’m here to help the open-source community within the limits of my knowledge. I just route some big issues above me. I’m not totally aware of Kenan’s end. I only handed over his issue to the big guys.
Which one specifically?
Please, give me more details about those too.
You have a lot of problems to talk about. Are you sure about them?
Dgraph is a pretty solid DB by now. It was a kind of problematic 2 years ago. Now I think most of the issues are related to bad usage from the user. Thing that can be fixed with more experience.
I see that people has a lot of issues, misconcepts and fears with GraphQL implementation. Which will be addressed soon. But in general Dgraph is solid. Especially for small startups. And Manish is the right guy with the right vision about this product. He just need time and more resources. Also, believers. The type of believers who used Dgraph and felt the same as me about this DB. When I used Dgraph for the first time I didn’t saw anything special. I was about to pick Redis Graph solution(2017 it was a kind of an addon in the Redis DB). And a week more of study I had blow my mind. And a year after I have decided that this Graph DB was The ultimate database for me.
what’s in your opinion the best way to how to solve that too-many-predicates problem? As the others said a custom solution? or do you think dgraph will tackle that problem and be able to solve that with an own virtual table/dgraph feature?
I totally agree with you!! I also decided to use it now into my apps. - I am just interested/keen about that problems that happen when growing big, sorry about the annoying questions bro ._.
If I feel I need a rebalancing (because too many predicates within a group??) I basically just turn on balancing for an hour (once a month)? is that correct?
The rebalancer just runs /moveTablet every 8m moving the biggest predicate by size to the smallest group by size. You can run /moveTablet whenever you want yourself.
Did you create virtual shards like BigTable/Cassandra create virtual tables? Or did you move away from dgraph?
We did not solve it really, we have some inordinately large predicates like name and xid that every tenant needs, and we run into some issues processing them sometimes. Like crazy long index rebuilds or moves that take hours. If a tablet over 50GB were split over two groups and processed in parallel that may solve that - as a future feature maybe. I saw this on a roadmap at some point but I do not see it now.