Need suggestions for prototyping our data migration from current environment


We have two types of data. One type is data from SQL tables which can be linked by foreign keys. The other type is key-value data from Kafka streaming. The first thing is to load test data from relational database and Kafka. Data from relational database can be loaded in batch. However, data from kafka should be streaming.

I am seeking for the best practice to load data from the relational database and kafka.

  1. What steps should I take to load data from the relational database?
    1.1. What conversion should I take?
    For example,
    a) Export table as csv format from SQL
    b) Convert csv to RDF
    c) graph-bulk-loader
    d) …
    1.2 In the begining, all data in relational database can be loaded to Dgraph. Then everyday new data (entries) are loaded to Dgraph. I always need to check whether the node with a primary key already exists.
  2. What steps should I take to stream data from the kafka?
    2.1 Use Go-kafka library to read key-value pairs
    2.2 Insert (what is best format to use ???)

Simple illustration is shown in the Figure.

The fastest way to load initial data would be using the bulk loader, you’ll have to convert your data to RDF for that.

For inserting data later, just use the Dgraph go client and use a format convenient to you (JSON or raw RDF’s). There shouldn’t be much difference in speed between the two. Try to load data concurrently for maximum performance. Another thing I would suggest is to have your predicate names partitioned by your table name. So have them as,, and so on.


How to refer foreign keys ? The tables can be linked using foreign keys? For example emails are used as the foreign key and they are also in other tables. they might have different names, i.e., id, foreign_key, etc.

Maybe you meant to have a single RDF after combining multiple tables. I do not know currently how to generate single RDF. I generated multiple RDF files, one for each table.

Linking foreign keys should be easy. You might have already figured it out. Say if you had a customer and order table where each order belongs to a customer and has customer_id as foreign key then you just link it like

_:customer_id <name> "Name" . 
_:order_id <amount> "10000" .

# The linking step
_:customer_id <order> _:order_id . 

In Graph databases the ordering is more natural, a customer has orders not the other way around (order belonging to a customer using a foreign id).

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.