We have two types of data. One type comes from SQL tables that can be linked by foreign keys; the other is key-value data streamed from Kafka. The first task is to load test data from both the relational database and Kafka. Data from the relational database can be loaded in batch, whereas data from Kafka should be consumed as a stream.
I am looking for the best practice for loading data from the relational database and Kafka into Dgraph.
- What steps should I take to load data from the relational database?
1.1. What conversion should I perform? (A rough sketch is included after this list.)
a) Export each table from SQL as CSV
b) Convert the CSV to RDF
1.2 Initially, all of the data in the relational database can be loaded into Dgraph. After that, new entries are loaded every day, so I always need to check whether a node with a given primary key already exists (an upsert sketch follows below).
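For step 1.1, this is roughly the kind of conversion I have in mind. It is only a minimal sketch: the file name, column layout (customer_id, name), and predicate names are made-up examples, not my real schema.

```go
// csv2rdf.go - minimal sketch: convert rows of an exported CSV file into
// RDF N-Quads that could be fed to the Dgraph live/bulk loader.
// File name and column layout (customer_id,name) are made-up examples.
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("customers.csv") // exported from SQL, header: customer_id,name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		panic(err)
	}

	out, err := os.Create("customers.rdf")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Skip the header row; emit one blank node per SQL row and one triple per column.
	for _, row := range rows[1:] {
		id, name := row[0], row[1]
		// Keep the SQL primary key as a predicate so rows can be matched later.
		fmt.Fprintf(out, "_:c%s <customer_id> %q .\n", id, id)
		fmt.Fprintf(out, "_:c%s <name> %q .\n", id, name)
	}
}
```

Foreign keys would presumably become edges between blank nodes in the same way, but I have not worked that part out yet.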
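And for step 1.2, this is how I currently imagine the daily "insert only if the primary key does not exist yet" check, using an upsert block via the dgo client. Again just a sketch under assumptions: the address localhost:9080, the predicate customer_id (which would need an index for eq() to work), and the value "C123" are placeholders, and I am assuming a dgo version that supports Txn.Do with upsert blocks.

```go
// upsert.go - minimal sketch of the daily "create the node only if its primary
// key is not in Dgraph yet" step, using a conditional upsert block.
// Address, predicate, and values are placeholders, not my real setup.
package main

import (
	"context"
	"fmt"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	req := &api.Request{
		// Look up the node whose customer_id (the SQL primary key) matches.
		// customer_id needs an index (e.g. @index(exact)) for eq() to work.
		Query: `query { node as var(func: eq(customer_id, "C123")) }`,
		Mutations: []*api.Mutation{{
			// Only create a new node when the query above found nothing.
			Cond: `@if(eq(len(node), 0))`,
			SetNquads: []byte(`_:new <customer_id> "C123" .
_:new <name> "Alice" .`),
		}},
		CommitNow: true,
	}

	resp, err := dg.NewTxn().Do(context.Background(), req)
	if err != nil {
		panic(err)
	}
	fmt.Printf("assigned uids: %v\n", resp.Uids)
}
```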
- What steps should I take to stream data from Kafka?
2.1 Use a Go Kafka library to read the key-value pairs (a rough consumer sketch follows this list)
2.2 Insert the data into Dgraph (what is the best format to use?)
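For 2.1 and 2.2 together, this is the rough shape I have in mind. It is a minimal sketch only: I am assuming segmentio/kafka-go as the "Go-kafka" library, and the broker address, topic name, consumer group, and JSON field names are placeholders.

```go
// consume.go - minimal sketch for 2.1/2.2: read key-value messages from Kafka
// and push each one into Dgraph as a JSON mutation. Assumes segmentio/kafka-go
// as the consumer library; broker, topic, and field names are placeholders.
package main

import (
	"context"
	"fmt"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"github.com/segmentio/kafka-go"
	"google.golang.org/grpc"
)

func main() {
	ctx := context.Background()

	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "dgraph-loader",
		Topic:   "events",
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(ctx) // blocks until the next key-value pair arrives
		if err != nil {
			panic(err)
		}

		// Wrap the Kafka key and value in a small JSON object and set it as-is.
		// Whether JSON or RDF N-Quads is the better mutation format here is
		// exactly the open question in 2.2.
		payload := fmt.Sprintf(`{"event_key": %q, "event_value": %q}`, msg.Key, msg.Value)
		_, err = dg.NewTxn().Mutate(ctx, &api.Mutation{
			SetJson:   []byte(payload),
			CommitNow: true,
		})
		if err != nil {
			panic(err)
		}
	}
}
```

If RDF turns out to be the better format for 2.2, the SetJson mutation would presumably be replaced by SetNquads built from the same key-value pair.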
A simple illustration is shown in the figure below.