Trying to find a good database for a starting project, I was wondering if Dgraph could be used efficiently to store user data, users’ threads, questions and answers, notifications, some interest relationships, etc.
I’m sure it would be fine for 1,000 users, for example, but what if I have 50,000 or 2 million users with at least 1 thread per day per user and multiple answers?
What are the read/write expectations?
Also, is there any way to subscribe to real-time change notifications?
Can’t comment on exact numbers; that’s something you’d have to experiment with. But I can say that Dgraph is designed specifically for large-scale use cases, with lots of data flowing in and lots of queries running at the same time. That’s why we built a distributed graph DB instead of a single-server DB.
We’ll be working on real-time query streaming later; it’s on the roadmap. But no ETA yet.
But, correct me if I’m wrong:
It uses a single int64 ID sequence for all operations. Every query receives its own unique number and every “record” gets its own unique ID, but they all come from one sequence.
Example: insert a node and it will receive ID 1.
Query the node 2 times and the counter is now at 3; insert another node and it will receive ID 4.
At least that’s what I observed in 0.9, and you stated that you won’t change from int64 because it’s large enough, but it isn’t.
It should’ve been int128 from the start.
Sure, int64’s range tops out at 9,223,372,036,854,775,807, and uint64 doubles that.
But if not in 5 years, then in 10 or 15 it will be exhausted.
This, plus the fact that you have to define field types globally and can’t distinguish between node types, is the main reason I’m not using it, even though it would be more convenient to use.
Query the node 2 times and the counter is now at 3; insert another node and it will receive ID 4.
I think you’re mixing up different concepts. It sounds like you’re talking about the uint64s used for transactions, because if you’re just querying data and not allocating UIDs, your “ID” won’t jump by 2.
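To make the distinction concrete, here’s a rough sketch with the dgo Go client (the import paths, the localhost:9080 address, and the exact response types are assumptions and vary by client version): a mutation allocates entity UIDs and hands them back in its Uids map, while a read-only query allocates nothing and just runs at a read timestamp.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	ctx := context.Background()

	// Connect to a locally running Dgraph Alpha (the address is an assumption).
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// A mutation allocates UIDs for new nodes; they come back in the Uids map.
	resp, err := dg.NewTxn().Mutate(ctx, &api.Mutation{
		SetJson:   []byte(`{"uid": "_:alice", "name": "Alice"}`),
		CommitNow: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("allocated UID:", resp.Uids["alice"])

	// A read-only query allocates no entity UIDs; it only gets a read timestamp.
	out, err := dg.NewReadOnlyTxn().Query(ctx, `{ q(func: has(name)) { uid name } }`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out.Json))
}
```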
So, assuming you’re talking about transaction IDs, let’s see. Say you’re running 100 transactions per second on average, around the clock. That’d be 100 txns/sec, 3,600 seconds/hr, 24 hrs/day, 365 days/year:
genius> 2^64/(3600*100*24*365.0)
= 5849424173.55
It would take 5.8 billion years for you to exhaust uint64, not 10. Are you still concerned about uint64?
Similarly, if you were allocating 100 entities per second on average, you still would have a lot of leeway.
Yes I am, and by the way, I’ve done the math. Take something like Facebook: how many transactions do they run each second, and how long until that value overflows? Or Google. What if you have 15 billion transactions per second?
That would be 5 billion people running 3 transactions every second, which isn’t many per person. And how many people will be on the planet, and how many on Mars, in 30 or 50 years?
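Here’s your back-of-the-envelope calculation again, just with bigger numbers. A rough sketch in Go; the rates are made up for illustration, only the 2^64 arithmetic matters:

```go
package main

import (
	"fmt"
	"math"
)

// yearsToExhaust returns roughly how long a 64-bit counter lasts, in years,
// at a given allocation rate. float64 is used because 2^64 itself does not
// fit in a uint64.
func yearsToExhaust(perSecond float64) float64 {
	const secondsPerYear = 3600.0 * 24 * 365
	return math.Pow(2, 64) / (perSecond * secondsPerYear)
}

func main() {
	fmt.Printf("100 txns/sec:        %.3g years\n", yearsToExhaust(100))  // ~5.85e9, your number
	fmt.Printf("1 million txns/sec:  %.3g years\n", yearsToExhaust(1e6))  // ~585,000
	fmt.Printf("15 billion txns/sec: %.3g years\n", yearsToExhaust(15e9)) // ~39
}
```

At 15 billion transactions per second the counter is gone in roughly 39 years, which is exactly the horizon I’m worried about.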
In traditional databases (Postgres, MySQL) you have separate sequences per table and no ID allocated per query (in Dgraph, by my observation, every query and every action allocates a new ID; you didn’t rebut this, only named an exception to the rule). In those databases a uint64 ID sequence is not such a big problem.
But Dgraph is essentially one big table. I’m not risking it. What if I really do write something so popular that I have Facebook or Google amounts of transactions? I wouldn’t want to change the underlying tech because the design didn’t consider that uint64 might overflow, if not in 10 years then maybe in 30 or 100.
And back to your quote, or rather my quote of your reply: if I insert one dataset, or whatever you call it here, a mutation, does it get a new ID assigned? And does that counter increase if I query for this dataset 10 times and then insert another dataset? Are those using the same counter?
If they are, that is the root of the problem. There need to be namespaces (a.k.a. tables). Since we’re in the 64-bit era of computing I understand you going with uint64, but wouldn’t a time-based ID generator be better? Maybe not for showing off performance numbers, but for actual reliability. Namespaces would solve the “unknown type” problem, and for a time-based ID you could take inspiration from bson.ObjectId, although they use only 4 bytes for their Unix timestamp, which is short-sighted: sure, it will take another 68 years before it overflows again, but it’s still not future-proof, they’ll still have a Y2038-style problem, and many people use that value to extract the creation date. I digress.
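Just to sketch what I mean by a time-based ID (a toy example, not what I’m claiming Dgraph or BSON actually does): 8 bytes of Unix-nanosecond timestamp plus 8 random bytes gives you 128 bits that stay roughly sortable by creation time, let you read the creation date back out, and don’t run out of timestamp range until the year 2262.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"time"
)

// ID128 is a toy 128-bit, time-based identifier: 8 bytes of Unix nanoseconds
// followed by 8 random bytes. The timestamp prefix keeps IDs roughly ordered
// by creation time and recoverable from the ID itself.
type ID128 [16]byte

func NewID128() (ID128, error) {
	var id ID128
	binary.BigEndian.PutUint64(id[:8], uint64(time.Now().UnixNano()))
	if _, err := rand.Read(id[8:]); err != nil {
		return id, err
	}
	return id, nil
}

// CreatedAt recovers the embedded creation time from the ID.
func (id ID128) CreatedAt() time.Time {
	return time.Unix(0, int64(binary.BigEndian.Uint64(id[:8])))
}

func (id ID128) String() string { return hex.EncodeToString(id[:]) }

func main() {
	id, err := NewID128()
	if err != nil {
		panic(err)
	}
	fmt.Println(id, "created at", id.CreatedAt())
}
```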
Yes, uint64 the way you use it works for small, medium, and big projects, but not for huge, gigantic projects that are successful worldwide, and maybe beyond, in the future.
P.S. I don’t want to offend you or rain on your parade; it’s just what I think, and for me it’s just not safe enough. I’m always thinking of the worst-case scenario (or the best case in terms of success).
I do not understand why so many numbers. Do you work for Google? xD
Facebook, for example, works with several silos and several DBs. They diversify a lot as far as I know (see their lectures; that’s one motive behind GraphQL’s existence).
If you want to get bigger “numbers,” why not use a microservices approach with Dgraph? That would add several more years of headroom for each service xD
This will also depend on how you handle the orchestration of the microservices. But it is an option.
Spread the full potential of Dgraph across numerous services and surely you will have what you want.