We are looking for the best architecture/solution to load 100k nodes and edges every hour, while handling around 50k TPS of read calls from the Golang API serving as middleware.
Currently we have two Golang APIs using the Dgraph client. One API runs CRUD operations on Dgraph by reading from our data warehouse system, roughly 10-100 nodes/edges per hour (the "extractor" API). The second API serves and performs read operations at 10k TPS (the "serving" API). We are now trying to introduce more types of datasets, almost 100k nodes and edges per hour, but the Dgraph client is taking more than an hour to load them, which impacts our serving API by increasing its response time and sometimes crashes the lead Alpha node.
We found the Dgraph Live Loader option, but we weren't able to fit it into our extractor API written in Golang, hence we are looking for a best-practice solution for hourly execution of the Dgraph Live Loader via an API that can read and transform data from another data source (in our case a SQL warehouse). Right now we are also exploring the option of a Python API, since the majority of our team has a data science / Python background.
One potential solution to this issue could be to add more Alpha nodes (and therefore more shards) to your Dgraph cluster. This will allow you to distribute the workload of loading and querying the data more evenly across the cluster, and should help to improve performance. You may also want to consider using a load balancer to distribute incoming requests more evenly across the Alphas.
Live Loader is a Go sub-application designed to help with data loading and balancing. If you are unable to use the Live Loader as-is, you may want to consider replicating its design in another language, such as Python, since your team is more familiar with that language.
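The core of the Live Loader's design is simply batching mutations and committing the batches concurrently instead of one node at a time. A rough Python sketch of that idea using the official pydgraph client (the batch size, worker count, and endpoint below are placeholder assumptions to adapt to your cluster):

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(records, size):
    """Split records into fixed-size batches, as the Live Loader does."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def load_batch(client, batch):
    """Commit one batch of JSON-shaped records in a single transaction."""
    txn = client.txn()
    try:
        txn.mutate(set_obj=batch, commit_now=True)
    finally:
        txn.discard()  # no-op if the commit already succeeded


def load_all(records, alpha="localhost:9080", batch_size=1000, workers=4):
    """Batch + parallelize mutations — the gist of the Live Loader design."""
    import pydgraph  # official Dgraph Python client (pip install pydgraph)

    stub = pydgraph.DgraphClientStub(alpha)
    client = pydgraph.DgraphClient(stub)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each worker commits independent transactions concurrently.
            list(pool.map(lambda b: load_batch(client, b),
                          chunks(records, batch_size)))
    finally:
        stub.close()
```

Tuning `batch_size` and `workers` against your Alphas' capacity is the main knob: too few large batches serializes the load, too many concurrent transactions can pressure the lead Alpha the way your current client does.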
It is also worth considering the resources available to each Alpha node. If the Alphas are running on isolated bare metal machines with dedicated resources such as RAM and SSD storage, this may help to improve performance.
Adding more Alphas, using a load balancer, and ensuring that each Alpha has sufficient resources may all help to improve the performance of your system.
This will allow you to distribute the workload of loading and querying the data more evenly across the cluster, and should help to improve performance. You may also want to consider using a load balancer to distribute incoming requests more evenly across the Alphas
Yes, it's helping, but data loading is still slow with the current Golang extractor API.
you may want to consider replicating its design in another language, such as Python, as your team is more familiar with that language
Yes, but we found understanding and replicating that logic quite time-consuming; in the meantime…
We are exploring options for how a Python API can call the Dgraph Live Loader remotely in an automated fashion. Is it possible that, from the same Python API, we can clone the Dgraph code and run `dgraph live` as a command pointing to an external Dgraph cluster to load the data?
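For what it's worth, `dgraph live` ships as a subcommand of the standard `dgraph` binary, so there should be no need to clone the source — a Python service only needs the binary on its PATH and network access to the cluster, and can shell out to it. A minimal sketch (the paths and endpoints are placeholders, and the exact flag spellings vary slightly between Dgraph versions, so check `dgraph live --help` for yours):

```python
import subprocess


def build_live_load_cmd(rdf_path, alpha="localhost:9080", zero="localhost:5080"):
    """Assemble the `dgraph live` command against an external cluster.

    Assumes the `dgraph` binary is on PATH; the --files/--alpha/--zero
    flag names may differ slightly between Dgraph versions.
    """
    return [
        "dgraph", "live",
        "--files", rdf_path,   # RDF N-Quads (.rdf / .rdf.gz) or JSON file
        "--alpha", alpha,      # gRPC endpoint of an external Alpha
        "--zero", zero,        # gRPC endpoint of the external Zero
    ]


def run_hourly_load(rdf_path):
    """Run the loader as a child process and fail loudly on error."""
    result = subprocess.run(build_live_load_cmd(rdf_path),
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"live loader failed: {result.stderr}")
    return result.stdout
```

Your Python API would then only be responsible for exporting the hourly SQL-warehouse delta to an RDF or JSON file and invoking `run_hourly_load` on a schedule (cron, Airflow, or an endpoint hit by your orchestrator).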