Realtime streaming graph data

I need to import one billion graph records per day into Dgraph. It seems impossible to do this with mutations or the live loader. Does anybody have any ideas?

Not sure. That works out to roughly 11.6k inserts per second (about 695k per minute). It is feasible, but I have never experienced it myself. With a huge, well-planned cluster it could go even further, I think. But it would be hard to accomplish in a running cluster. You can do it with the Bulk loader, but it is not easy with the Live loader or the clients (which are the same thing under the hood).
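
For reference, a Bulk loader run looks roughly like this (a sketch; exact flag names vary between Dgraph versions, and the file paths are placeholders). Note that the Bulk loader is offline only: it runs before the Alphas serve traffic, and its output `p` directories are copied to the Alpha groups:

```
# Offline import: runs before the Alphas start serving.
dgraph bulk -f data.rdf.gz -s data.schema \
  --map_shards=4 --reduce_shards=3 \
  --zero=localhost:5080
# Copy each resulting out/<shard>/p directory to its Alpha group,
# then start the Alphas.
```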

You will need a lot of resources (I think; as I have no experience with giant clusters I can't say much, but maybe someone else can). I would also recommend dedicating a portion of the Alpha groups just for ingestion ("input") and another portion for queries ("output"), and balancing the load across the input instances.
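
On the client side, dgo can already spread transactions across several Alpha connections, so each ingest worker can be pointed at the whole "input" pool. A minimal sketch (dgo v2 assumed; the addresses are placeholders):

```go
package main

import (
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

// newBalancedClient opens one gRPC connection per Alpha; dgo then
// picks one of the underlying connections for each transaction,
// spreading ingest load across the group.
func newBalancedClient(addrs []string) (*dgo.Dgraph, error) {
	var clients []api.DgraphClient
	for _, addr := range addrs {
		conn, err := grpc.Dial(addr, grpc.WithInsecure())
		if err != nil {
			return nil, err
		}
		clients = append(clients, api.NewDgraphClient(conn))
	}
	return dgo.NewDgraphClient(clients...), nil
}

func main() {
	// Hypothetical Alphas dedicated to ingestion.
	dg, err := newBalancedClient([]string{
		"alpha-in-0:9080", "alpha-in-1:9080", "alpha-in-2:9080",
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = dg // hand this client to the ingest workers
}
```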

Do you know of any other DB that can do 695k+ inserts per minute?

Sorry, I have no idea. My project consumes data from Kafka and upserts it in batches of 2,000. In the logs I saw that a batch sometimes takes 500+ ms and sometimes 30 s. I set a 30 s timeout in my client, so those slow batches fail to upsert. Maybe Dgraph cannot keep up with realtime mutations or upserts?
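
The upsert loop looks roughly like this (a simplified sketch of the pattern described, not the actual project code; the `xid` predicate and the query are placeholders):

```go
package ingest

import (
	"context"
	"log"
	"time"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
)

// upsertBatch sends one batch of N-Quads as a single upsert request,
// bounded by the same 30s budget described above.
func upsertBatch(dg *dgo.Dgraph, nquads []byte) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	req := &api.Request{
		// Placeholder query; the real one matches nodes by external id.
		Query: `query { u as var(func: eq(xid, "some-key")) }`,
		Mutations: []*api.Mutation{{
			SetNquads: nquads, // e.g. `uid(u) <name> "value" .`
		}},
		CommitNow: true,
	}
	start := time.Now()
	_, err := dg.NewTxn().Do(ctx, req)
	log.Printf("batch took %s", time.Since(start))
	return err // context.DeadlineExceeded when the 30s budget is blown
}
```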

Why sorry? No need. I'm here to help with what I can.

That batch size seems reasonable, but you need to test it out.

Kafka or Dgraph?

Hmm, it seems like Dgraph. But what is the context? There is no way to say for sure without understanding what configuration you are using, how you are allocating resources, which approach you are taking, and so on.

Take a "divide and conquer" approach: balance the load, understand how the sharding system works, and build a cluster focused on what you want to do.
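
One concrete way to "divide and conquer" on the ingest side is to hash-partition records by key, so that two workers never upsert the same node concurrently and batches don't abort on transaction conflicts. A minimal sketch (the worker count and record shape are assumptions):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard routes every record with the same key to the same worker, so
// concurrent upserts from different workers never touch the same node.
func shard(key string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(workers))
}

func main() {
	const workers = 8
	queues := make([]chan string, workers)
	for i := range queues {
		queues[i] = make(chan string, 1024)
	}
	// Every record goes to queues[shard(key, workers)]; one upsert
	// loop drains each queue independently.
	fmt.Println(shard("user-42", workers)) // stable worker index
}
```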

Share your context and let's talk.

My cluster has 3 Zeros and 9 Alphas with 3 replicas.
The Zero config is the default.
The Alpha config sets lru_mb=20480 and pending_proposals=4096.

That log is from my project, where I print the elapsed time per batch. I found that while Dgraph is rolling up and flushing writes, the upserts get blocked.
This is the alpha log:


The upserts are blocked between the "Writes flushed" and "Resuming writes" log lines.
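
If those flush stalls can't be avoided, one client-side mitigation is to retry a timed-out batch with backoff instead of dropping it. A minimal sketch (the attempt count and delays are arbitrary):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryBatch retries a failed upsert with exponential backoff, so a
// batch that hits a flush stall is not dropped after a single timeout.
func retryBatch(send func() error) error {
	var err error
	wait := time.Second
	for attempt := 0; attempt < 5; attempt++ {
		if err = send(); err == nil {
			return nil
		}
		time.Sleep(wait) // give the store time to resume writes
		wait *= 2
	}
	return err
}

func main() {
	// Stand-in for a real upsert call that keeps timing out.
	err := retryBatch(func() error { return errors.New("context deadline exceeded") })
	fmt.Println(err)
}
```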