Best Practices for Streaming into Dgraph

Here are some suggestions:

  • Conflict Exceptions and Retry Loops

Given enough retries and spacing, a transaction that hits a conflict exception (more than one transaction modifying the same node in Dgraph) will eventually succeed. This means a transaction could be retrying in a streaming listener for a while before succeeding. Configure the retry count and backoff appropriately for your transaction profile to reduce unnecessary rejections and drops due to conflict exceptions.
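
As a minimal sketch of such a loop, using the dgraph4j client with exponential backoff (MAX_RETRIES and BASE_BACKOFF_MS are illustrative values to tune for your workload):

```java
import io.dgraph.DgraphClient;
import io.dgraph.DgraphProto.Mutation;
import io.dgraph.Transaction;
import io.dgraph.TxnConflictException;

public class RetryingMutator {
    private static final int MAX_RETRIES = 5;        // illustrative; tune per workload
    private static final long BASE_BACKOFF_MS = 100; // illustrative; tune per workload

    private final DgraphClient client;

    public RetryingMutator(DgraphClient client) {
        this.client = client;
    }

    /** Applies a mutation, retrying with exponential backoff on conflicts. */
    public void mutateWithRetry(Mutation mutation) throws InterruptedException {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            Transaction txn = client.newTransaction();
            try {
                txn.mutate(mutation);
                txn.commit();
                return; // success
            } catch (TxnConflictException e) {
                // Another transaction touched the same node; back off and retry.
                Thread.sleep(BASE_BACKOFF_MS << attempt);
            } finally {
                txn.discard(); // no-op if the commit succeeded
            }
        }
        throw new IllegalStateException("Mutation dropped after " + MAX_RETRIES + " conflicts");
    }
}
```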

  • Kafka Partitioning

Ensure that the workload profile across partitions is similar. If the majority of the data ends up in a few partitions, JVM-based listeners can lose valuable time garbage collecting, leading to rebalances or crashes.
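
One common way to even things out when a few keys dominate (a sketch, not from the original post) is to salt hot keys on the producer side so their records spread over several partitions; HOT_KEYS and SALT_BUCKETS here are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SaltedSender {
    // Hypothetical: keys known to carry a disproportionate share of traffic.
    private static final Set<String> HOT_KEYS = Set.of("tenant-42");
    private static final int SALT_BUCKETS = 8;

    private final KafkaProducer<String, String> producer;

    public SaltedSender(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    public void send(String topic, String key, String value) {
        // Spread hot keys over SALT_BUCKETS partitions; note this trades
        // away per-key ordering for the salted keys.
        String effectiveKey = HOT_KEYS.contains(key)
                ? key + "#" + ThreadLocalRandom.current().nextInt(SALT_BUCKETS)
                : key;
        producer.send(new ProducerRecord<>(topic, effectiveKey, value));
    }
}
```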

  • Support for Adaptive Behavior

Dgraph publishes metrics on the alpha. In unpredictable environments, you can use this information to drive adaptive behavior, such as a graceful shutdown or other signaling in the stream topology.
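
For example, a small watcher could poll an alpha's /health endpoint (this sketch assumes the default HTTP port and treats an HTTP 200 as healthy) and expose a flag the topology consults before processing:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Polls a Dgraph alpha's /health endpoint and exposes a flag that a
 * stream topology can consult to pause, shed load, or shut down gracefully.
 */
public class AlphaHealthWatcher {
    private final AtomicBoolean healthy = new AtomicBoolean(true);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final String healthUrl;

    public AlphaHealthWatcher(String healthUrl) { // e.g. "http://alpha:8080/health"
        this.healthUrl = healthUrl;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(this::check, 0, 5, TimeUnit.SECONDS);
    }

    public boolean isHealthy() {
        return healthy.get();
    }

    private void check() {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(healthUrl).openConnection();
            conn.setConnectTimeout(2_000);
            conn.setReadTimeout(2_000);
            healthy.set(conn.getResponseCode() == 200);
            conn.disconnect();
        } catch (IOException e) {
            healthy.set(false); // unreachable alpha counts as unhealthy
        }
    }
}
```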


Here are some things my team has noticed while streaming data into Dgraph:

  • Batching N-Quads to about 1,000 per transaction gives a good balance between performance and not hammering Dgraph too much.
  • A timeout interceptor set at 2 seconds does a decent job of keeping messages moving. It is also an early indicator of when Dgraph is starting to have trouble.
  • Moving retry logic off the Kafka message-processing thread is a good idea if you want to keep message processing up (the sketch after this list combines this with the two points above).
  • Using UID leasing is a good way to handle the mutations if you want to keep track of the UIDs.
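
Not from the original post, but here is a rough sketch of how the first three points might fit together with dgraph4j and gRPC: batches of ~1,000 N-Quads, a 2-second per-call deadline installed as a gRPC client interceptor, and failed batches handed to a separate executor so the Kafka processing thread keeps moving. Class and parameter names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.google.protobuf.ByteString;

import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.dgraph.DgraphProto.Mutation;
import io.dgraph.Transaction;
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.MethodDescriptor;

public class BatchingWriter {
    private static final int BATCH_SIZE = 1000; // ~1000 N-Quads per commit
    private static final int MAX_ATTEMPTS = 5;  // illustrative retry cap

    private final List<String> pending = new ArrayList<>(BATCH_SIZE);
    private final DgraphClient client;
    // Retries run here so the Kafka processing thread never blocks.
    private final ExecutorService retryPool = Executors.newSingleThreadExecutor();

    public BatchingWriter(String alphaHost, int grpcPort) {
        ManagedChannel channel =
                ManagedChannelBuilder.forAddress(alphaHost, grpcPort).usePlaintext().build();
        // Per-call 2 s deadline, applied through a gRPC client interceptor.
        ClientInterceptor timeout = new ClientInterceptor() {
            @Override
            public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
                    MethodDescriptor<ReqT, RespT> method, CallOptions options, Channel next) {
                return next.newCall(method, options.withDeadlineAfter(2, TimeUnit.SECONDS));
            }
        };
        this.client = new DgraphClient(DgraphGrpc.newStub(channel).withInterceptors(timeout));
    }

    /** Called from the stream processor for each incoming N-Quad line. */
    public synchronized void add(String nquad) {
        pending.add(nquad);
        if (pending.size() >= BATCH_SIZE) {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            commit(batch, 0);
        }
    }

    private void commit(List<String> batch, int attempt) {
        Mutation mu = Mutation.newBuilder()
                .setSetNquads(ByteString.copyFromUtf8(String.join("\n", batch)))
                .build();
        Transaction txn = client.newTransaction();
        try {
            txn.mutate(mu);
            txn.commit();
        } catch (RuntimeException e) {
            // Conflict or deadline exceeded: retry off-thread, up to a cap.
            if (attempt < MAX_ATTEMPTS) {
                retryPool.submit(() -> commit(batch, attempt + 1));
            } // else: dead-letter the batch rather than retrying forever
        } finally {
            txn.discard(); // no-op when the commit succeeded
        }
    }
}
```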

Doing this with a single KStream processor inserting into a 3-node replica cluster with no partitioning, we see good throughput up to about 650k nodes and roughly 6 million N-Quads. At that point, Dgraph would become unresponsive and all mutations would fail for about 15 minutes. After that, the mutations would start succeeding again.

The Dgraph cluster is three nodes, each with 4 cores and 26 GB of memory.

We have tried this with a dataset of about 30M nodes. In total it took about 50 hours to load.

Questions that I have:

  • Would partitioning improve performance?
  • Is there an optimal transaction timeout, or is it just trial and error?
  • Would more, smaller servers be expected to give better throughput?
  • Would adding cores improve Dgraph's performance?

We have used the bulk loader to load this same dataset, and it takes 20-30 minutes.
