Realtime streaming graph data

I need to import one billion graph records per day into Dgraph. It seems impossible to do this with mutations or the live loader. Does anybody have any ideas?

Not sure. That works out to roughly 11.6k inserts per second (about 695k per minute). It is feasible, but I have never experienced it myself. With a huge, well-planned cluster it could go even further, I think. But it would be hard to accomplish in a running cluster. You can do it with the Bulk loader, but it is not easy with the Live loader or the clients (which are the same thing under the hood).
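
For reference, a Bulk loader run looks roughly like this (a sketch; exact flag names vary between Dgraph versions, and the file paths are placeholders). Note that the Bulk loader is offline only: it runs before the Alphas serve traffic, and its output `p` directories are copied to the Alpha groups:

```
# Offline import: runs before the Alphas start serving.
dgraph bulk -f data.rdf.gz -s data.schema \
  --map_shards=4 --reduce_shards=3 \
  --zero=localhost:5080
# Copy each resulting out/<shard>/p directory to its Alpha group,
# then start the Alphas.
```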

You will need a lot of resources (I think; as I have no experience with giant clusters I can't say much, but maybe someone else can). I would also recommend dedicating a portion of the Alpha groups just for ingestion ("input") and another portion for queries ("output"), and balancing the load across the input instances.
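
On the client side, dgo can already spread transactions across several Alpha connections, so each ingest worker can be pointed at the whole "input" pool. A minimal sketch (dgo v2 assumed; the addresses are placeholders):

```go
package main

import (
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

// newBalancedClient opens one gRPC connection per Alpha; dgo then
// picks one of the underlying connections for each transaction,
// spreading ingest load across the group.
func newBalancedClient(addrs []string) (*dgo.Dgraph, error) {
	var clients []api.DgraphClient
	for _, addr := range addrs {
		conn, err := grpc.Dial(addr, grpc.WithInsecure())
		if err != nil {
			return nil, err
		}
		clients = append(clients, api.NewDgraphClient(conn))
	}
	return dgo.NewDgraphClient(clients...), nil
}

func main() {
	// Hypothetical Alphas dedicated to ingestion.
	dg, err := newBalancedClient([]string{
		"alpha-in-0:9080", "alpha-in-1:9080", "alpha-in-2:9080",
	})
	if err != nil {
		log.Fatal(err)
	}
	_ = dg // hand this client to the ingest workers
}
```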

Do you know of any other DB that can do 695k+ inserts per minute?

Sorry, I have no idea. My project consumes data from Kafka and upserts it in batches of 2,000. In the logs I saw that a batch sometimes takes 500+ ms and sometimes 30 s. I set a 30 s timeout in my client, so those slow batches fail to upsert. Maybe Dgraph cannot keep up with realtime mutations or upserts?
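
The upsert loop looks roughly like this (a simplified sketch of the pattern described, not the actual project code; the `xid` predicate and the query are placeholders):

```go
package ingest

import (
	"context"
	"log"
	"time"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
)

// upsertBatch sends one batch of N-Quads as a single upsert request,
// bounded by the same 30s budget described above.
func upsertBatch(dg *dgo.Dgraph, nquads []byte) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	req := &api.Request{
		// Placeholder query; the real one matches nodes by external id.
		Query: `query { u as var(func: eq(xid, "some-key")) }`,
		Mutations: []*api.Mutation{{
			SetNquads: nquads, // e.g. `uid(u) <name> "value" .`
		}},
		CommitNow: true,
	}
	start := time.Now()
	_, err := dg.NewTxn().Do(ctx, req)
	log.Printf("batch took %s", time.Since(start))
	return err // context.DeadlineExceeded when the 30s budget is blown
}
```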

Why sorry? No need. I'm here to help with what I can.

That batch size seems reasonable, but you need to test it out.

Kafka or Dgraph?

Hmm, it seems like Dgraph. But what is the context? There is no way to say for sure without understanding what configuration you are using, how you are allocating resources, which approach you are taking, and so on.

Take a "divide and conquer" approach: balance the load, understand how the sharding system works, and build a cluster focused on what you want to do.
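
One concrete way to "divide and conquer" on the ingest side is to hash-partition records by key, so that two workers never upsert the same node concurrently and batches don't abort on transaction conflicts. A minimal sketch (the worker count and record shape are assumptions):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard routes every record with the same key to the same worker, so
// concurrent upserts from different workers never touch the same node.
func shard(key string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(workers))
}

func main() {
	const workers = 8
	queues := make([]chan string, workers)
	for i := range queues {
		queues[i] = make(chan string, 1024)
	}
	// Every record goes to queues[shard(key, workers)]; one upsert
	// loop drains each queue independently.
	fmt.Println(shard("user-42", workers)) // stable worker index
}
```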

Share your context and let's talk.

My cluster has 3 Zeros and 9 Alphas with 3 replicas.
The Zero config is the default.
The Alpha config sets lru_mb=20480 and pending_proposals=4096.

That log is from my project, where I print the elapsed time per batch. I found that while Dgraph is rolling up and flushing writes, the upserts get blocked.
This is the alpha log:


The upserts are blocked between the "Writes flushed" and "Resuming writes" log lines.
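
If those flush stalls can't be avoided, one client-side mitigation is to retry a timed-out batch with backoff instead of dropping it. A minimal sketch (the attempt count and delays are arbitrary):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryBatch retries a failed upsert with exponential backoff, so a
// batch that hits a flush stall is not dropped after a single timeout.
func retryBatch(send func() error) error {
	var err error
	wait := time.Second
	for attempt := 0; attempt < 5; attempt++ {
		if err = send(); err == nil {
			return nil
		}
		time.Sleep(wait) // give the store time to resume writes
		wait *= 2
	}
	return err
}

func main() {
	// Stand-in for a real upsert call that keeps timing out.
	err := retryBatch(func() error { return errors.New("context deadline exceeded") })
	fmt.Println(err)
}
```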