We are looking for the best architecture/solution to load 100k nodes and edges every hour, while handling around 50k TPS of read calls from the Golang API serving as middleware.
Currently we have two Golang APIs using the Dgraph client. One API runs CRUD operations on Dgraph by reading from our data warehouse system, roughly 10-100 nodes/edges per hour (the "extractor" API). The second API serves and performs read operations at 10k TPS (the "serving" API). We are now trying to introduce more types of datasets, almost 100k nodes and edges per hour, but the Dgraph client is taking more than an hour to load them, which impacts our serving API by increasing its response time and sometimes crashes the lead Alpha node.
We found the Dgraph Live Loader option, but we weren't able to fit it into our extractor API written in Golang, hence we are looking for a best-practice solution for hourly execution of the Dgraph Live Loader via an API that can read and transform data from another data source (in our case a SQL warehouse). Right now we are also exploring the option of a Python API, since the majority of our team has a data science / Python background.
One potential solution to this issue could be to add more Alpha nodes (and therefore more shards) to your Dgraph cluster. This will allow you to distribute the workload of loading and querying the data more evenly across the cluster, and should help to improve performance. You may also want to consider using a load balancer to distribute incoming requests more evenly across the Alphas.
Live Loader is a Go sub-application designed to help with data loading and balancing. If you are unable to use the Live Loader as-is, you may want to consider replicating its design in another language, such as Python, since your team is more familiar with that language.
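The core of the Live Loader's design is simply batching mutations and committing the batches concurrently instead of one node at a time. A rough Python sketch of that idea using the official pydgraph client (the batch size, worker count, and endpoint below are placeholder assumptions to adapt to your cluster):

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(records, size):
    """Split records into fixed-size batches, as the Live Loader does."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def load_batch(client, batch):
    """Commit one batch of JSON-shaped records in a single transaction."""
    txn = client.txn()
    try:
        txn.mutate(set_obj=batch, commit_now=True)
    finally:
        txn.discard()  # no-op if the commit already succeeded


def load_all(records, alpha="localhost:9080", batch_size=1000, workers=4):
    """Batch + parallelize mutations — the gist of the Live Loader design."""
    import pydgraph  # official Dgraph Python client (pip install pydgraph)

    stub = pydgraph.DgraphClientStub(alpha)
    client = pydgraph.DgraphClient(stub)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each worker commits independent transactions concurrently.
            list(pool.map(lambda b: load_batch(client, b),
                          chunks(records, batch_size)))
    finally:
        stub.close()
```

Tuning `batch_size` and `workers` against your Alphas' capacity is the main knob: too few large batches serializes the load, too many concurrent transactions can pressure the lead Alpha the way your current client does.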
It is also worth considering the resources available to each Alpha node. If the Alphas are running on isolated bare metal machines with dedicated resources such as RAM and SSD storage, this may help to improve performance.
Adding more Alphas, using a load balancer, and ensuring that each Alpha has sufficient resources may all help to improve the performance of your system.
This will allow you to distribute the workload of loading and querying the data more evenly across the cluster, and should help to improve performance. You may also want to consider using a load balancer to distribute incoming requests more evenly across the Alphas
Yes, it's helping, but data loading is still slow with the current Golang extractor API.
you may want to consider replicating its design in another language, such as Python, as your team is more familiar with that language
Yes, but we found understanding and replicating that logic quite time-consuming; in the meantime…
We are exploring options for how a Python API can call the Dgraph Live Loader remotely in an automated fashion. Is it possible that, from the same Python API, we can clone the Dgraph code and run `dgraph live` as a command pointing to an external Dgraph cluster to load the data?
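For what it's worth, `dgraph live` ships as a subcommand of the standard `dgraph` binary, so there should be no need to clone the source — a Python service only needs the binary on its PATH and network access to the cluster, and can shell out to it. A minimal sketch (the paths and endpoints are placeholders, and the exact flag spellings vary slightly between Dgraph versions, so check `dgraph live --help` for yours):

```python
import subprocess


def build_live_load_cmd(rdf_path, alpha="localhost:9080", zero="localhost:5080"):
    """Assemble the `dgraph live` command against an external cluster.

    Assumes the `dgraph` binary is on PATH; the --files/--alpha/--zero
    flag names may differ slightly between Dgraph versions.
    """
    return [
        "dgraph", "live",
        "--files", rdf_path,   # RDF N-Quads (.rdf / .rdf.gz) or JSON file
        "--alpha", alpha,      # gRPC endpoint of an external Alpha
        "--zero", zero,        # gRPC endpoint of the external Zero
    ]


def run_hourly_load(rdf_path):
    """Run the loader as a child process and fail loudly on error."""
    result = subprocess.run(build_live_load_cmd(rdf_path),
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"live loader failed: {result.stderr}")
    return result.stdout
```

Your Python API would then only be responsible for exporting the hourly SQL-warehouse delta to an RDF or JSON file and invoking `run_hourly_load` on a schedule (cron, Airflow, or an endpoint hit by your orchestrator).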