Build Kafka Connector for Dgraph in Live and Bulk Loader

Moved from GitHub dgraph/3967

Posted by mangalaman93:

This will allow loading data directly from Kafka.

Willem520 commented :

Great idea. I hope Dgraph can have close integration with stream-processing engines (e.g. Flink, Spark) in the near future.

campoy commented :

Hey @Willem520,

Could you tell us more about what you would expect from these integrations with Flink or Spark?

Willem520 commented :

> Hey @Willem520,
>
> Could you tell us more about what you would expect from these integrations with Flink or Spark?

Hello. In my project, I want to use Flink or Spark Streaming to process RDF or JSON data in real time, and I also need to migrate historical data from another graph database (e.g. JanusGraph) to Dgraph. But I found that when I used Spark and dgraph4j to process a large dataset (e.g. 5 million nodes), it always failed, and sometimes an Alpha node crashed.

campoy commented :

I’m sorry but I’m going to need more information on what you were actually building and how it failed.

If I understand correctly, you’re processing a stream of events in RDF or JSON format?
Or is it a batch analysis with 5 million nodes?

What exact API would you like us to provide to integrate with Spark or Flink?

Willem520 commented :

Hi. I used Spark to load 5 million nodes into memory across 100 partitions. In each partition, I built a mutation of 2,000 nodes in JSON format and used the dgraph4j client to execute txn.mutate. When I ran the program, it failed with the error message below:
(error screenshot attached)
If I used a smaller dataset (e.g. 500,000 nodes) with the same program, it succeeded.

mangalaman93 commented :

How many cores are you providing to each executor? How many executors are you running concurrently? You could try reducing the size of each transaction so that each one finishes quickly and the total number of pending transactions stays small.
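The suggestion above can be sketched as follows: instead of one large mutation per partition, split the records into small per-transaction batches so each commit finishes quickly and fewer transactions stay pending. This is a minimal illustration, not dgraph4j API; the `DgraphSink` interface and the batch size of 500 are assumptions standing in for the real `txn.mutate()`/commit call.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedLoader {
    // Stand-in for the code that would open a dgraph4j transaction,
    // build a Mutation from the batch's JSON, and commit it.
    interface DgraphSink {
        void writeBatch(List<String> jsonRecords);
    }

    // Splits records into batches of at most batchSize and writes each batch
    // as its own (small, short-lived) transaction. Returns the batch count.
    static int load(List<String> records, int batchSize, DgraphSink sink) {
        int batches = 0;
        for (int i = 0; i < records.size(); i += batchSize) {
            List<String> batch =
                records.subList(i, Math.min(i + batchSize, records.size()));
            sink.writeBatch(new ArrayList<>(batch)); // one transaction per batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 5000; i++) records.add("{\"name\":\"n" + i + "\"}");
        int batches = load(records, 500, b -> { /* txn.mutate + commit here */ });
        System.out.println(batches); // prints 10
    }
}
```

Smaller batches trade a little throughput for far less transaction contention on the Alphas, which is usually the right trade when large single mutations cause crashes.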

Willem520 commented :

I used 4 executor cores and 5 executors. I need to import at least 100 million records into Dgraph.

AshNL commented :

Not directly related to Dgraph, but Neo4j just announced a new product which will tightly integrate Neo4j with Kafka. I feel like this is a feature which might greatly impact DB choice for (new) projects. https://www.datanami.com/2019/10/01/neo4j-gets-hooks-into-kafka/

marvin-hansen commented :

@AshNL Have you ever used neo4j in your entire life?

We did for ~3 months and are actually migrating everything away from it to save our sanity and our company. I cannot remember any other database that caused more operational problems, more concurrency issues, and such consistently terrible performance. The most mind-boggling thing is that the company does listen to all reported problems, but they never fixed anything…

Meanwhile, we run the most mission-critical stuff on Postgres. We de-normalized those few tables to operate entirely join-free to sustain very high performance.

With Dgraph, there are a few rough edges because it's relatively new, but for the most part, when it runs, it just runs.

For the aforementioned Kafka connector, there are tutorials on how to write one. I think implementing the connector with a queue and proper batch-writing should do the trick.
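The "queue and proper batch-writing" idea can be sketched with plain JDK concurrency primitives: a consumer loop (e.g. a Kafka poll loop) pushes records into a bounded queue, and a writer thread drains them in batches, committing one small Dgraph transaction per batch. The bounded queue applies back-pressure when Dgraph falls behind. The sizes and the `flush()` stub are assumptions; this is not the dgraph4j or Kafka Connect API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueuedBatchWriter implements Runnable {
    private final BlockingQueue<String> queue;
    private final int maxBatch;
    private volatile boolean running = true;
    final List<List<String>> flushed = new ArrayList<>(); // visible for the demo

    QueuedBatchWriter(int capacity, int maxBatch) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.maxBatch = maxBatch;
    }

    // Producer side: a Kafka consumer loop would call this per record.
    // put() blocks when the queue is full, pushing back-pressure upstream.
    void submit(String record) throws InterruptedException { queue.put(record); }

    void stop() { running = false; }

    @Override public void run() {
        try {
            while (running || !queue.isEmpty()) {
                List<String> batch = new ArrayList<>();
                String first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) continue;   // nothing arrived; re-check state
                batch.add(first);
                queue.drainTo(batch, maxBatch - 1); // grab up to maxBatch total
                flush(batch);                  // one small Dgraph txn per batch
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stub: real code would build a dgraph4j Mutation from the batch and commit.
    void flush(List<String> batch) { flushed.add(batch); }

    public static void main(String[] args) throws Exception {
        QueuedBatchWriter w = new QueuedBatchWriter(100, 10);
        Thread t = new Thread(w);
        t.start();
        for (int i = 0; i < 25; i++) w.submit("{\"id\":" + i + "}");
        Thread.sleep(200);   // let the writer drain the queue
        w.stop();
        t.join();
        System.out.println(w.flushed.stream().mapToInt(List::size).sum()); // prints 25
    }
}
```

Decoupling consumption from writing this way keeps each transaction small (as suggested earlier in the thread) while the queue absorbs bursts from the source topic.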

AshNL commented :

No need to start biting. I’m sorry I’m not as experienced as you are. In the meantime I have indeed written my own connector.