From spreadsheet to online table to Dgraph storage

What I want to do

I’m building a frontend where hundreds of concurrent users are working with spreadsheet data at the same time.

One feature is that users can paste or upload CSV data (on average 1,000–10,000 rows across 5 columns, so a large number of triples) into a table for further sorting, filtering, etc.

Imagine a bunch of users trying to upload such data at the same time from a JavaScript frontend.

  1. What are our options for integrating this quickly into Dgraph?

Could we shoot it directly via DQL mutations (it’s maybe 500 KB–1 MB of data!), or should we, for example, create some kind of worker that hands this CSV import off to a Python live loader?
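For reference, by “shoot it directly” I mean something roughly like the sketch below: posting the parsed rows as one JSON mutation to the Alpha’s HTTP `/mutate` endpoint. `DGRAPH_URL`, the `CsvRow` type name, and treating every column header as a predicate are placeholders, not a working setup.

```typescript
// Rough sketch: send pasted CSV rows straight to Dgraph over HTTP.
// DGRAPH_URL, "CsvRow", and the column-to-predicate mapping are placeholders.
const DGRAPH_URL = "http://localhost:8080"; // assumption: Alpha's HTTP endpoint

type CsvRow = Record<string, string>;

async function uploadCsvRows(rows: CsvRow[]): Promise<void> {
  // Each CSV row becomes one node; each cell becomes one triple.
  const setObjects = rows.map((row) => ({
    "dgraph.type": "CsvRow", // hypothetical type name
    ...row,                  // column name -> predicate, cell value -> object
  }));

  const res = await fetch(`${DGRAPH_URL}/mutate?commitNow=true`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ set: setObjects }),
  });

  if (!res.ok) {
    throw new Error(`Dgraph mutation failed: ${res.status} ${await res.text()}`);
  }
}
```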

  2. How do we ensure consistent uploads without overloading Dgraph with concurrent inserts?

If multiple users do this at the same time, I assume we need some kind of queuing so we don’t overload the importer. How could we devise a bullet-proof system that holds up under many concurrent CSV uploads while still feeling snappy and near real-time to end users?
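To make the queuing idea concrete, here is a rough sketch of what I imagine: an in-process limiter that caps how many imports run against Dgraph at once and splits each upload into fixed-size batches. `MAX_CONCURRENT_IMPORTS`, `BATCH_SIZE`, and `uploadCsvRows` (the function from the sketch above) are assumptions, not benchmarked values.

```typescript
// Sketch: cap concurrent CSV imports and batch each upload.
const MAX_CONCURRENT_IMPORTS = 4; // assumption: tune against your Alphas
const BATCH_SIZE = 1000;          // assumption: rows per mutation

let active = 0;
const waiting: Array<() => void> = [];

function acquireSlot(): Promise<void> {
  if (active < MAX_CONCURRENT_IMPORTS) {
    active++;
    return Promise.resolve();
  }
  // Park the caller until a running import finishes and hands its slot over.
  return new Promise<void>((resolve) => waiting.push(resolve));
}

function releaseSlot(): void {
  const next = waiting.shift();
  if (next) {
    next(); // pass the slot directly to the next queued import
  } else {
    active--;
  }
}

export async function enqueueCsvImport(rows: Record<string, string>[]): Promise<void> {
  await acquireSlot();
  try {
    // Break a single 10,000-row paste into smaller mutations.
    for (let i = 0; i < rows.length; i += BATCH_SIZE) {
      await uploadCsvRows(rows.slice(i, i + BATCH_SIZE));
    }
  } finally {
    releaseSlot();
  }
}
```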

Thanks for any thoughts and ideas.

Hello @Daniel
In my personal experience, I implemented a custom converter, and for one of the entities that had more than 10 columns I was inserting 1,000 rows at a time (with no pause between batches). There was no problem, since we had enough resources and 3 Alphas. If you need to write more, you can add more Alpha nodes, as far as I know. I also think Dgraph Cloud is more tolerant of these kinds of massive writes.
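Roughly, “inserting 1,000 rows each time” looks like the sketch below, using the dgraph-js gRPC client. The endpoint, the `CsvRow` type name, and the batch size are placeholders, not our actual converter.

```typescript
import * as dgraph from "dgraph-js";
import * as grpc from "@grpc/grpc-js";

const BATCH_SIZE = 1000; // assumption: same batch size mentioned above

async function insertCsvRows(rows: Record<string, string>[]): Promise<void> {
  // Assumption: a single Alpha reachable on the default gRPC port.
  const stub = new dgraph.DgraphClientStub(
    "localhost:9080",
    grpc.credentials.createInsecure()
  );
  const client = new dgraph.DgraphClient(stub);

  try {
    for (let i = 0; i < rows.length; i += BATCH_SIZE) {
      const batch = rows.slice(i, i + BATCH_SIZE).map((row) => ({
        "dgraph.type": "CsvRow", // hypothetical type name
        ...row,
      }));

      const mu = new dgraph.Mutation();
      mu.setSetJson(batch);
      mu.setCommitNow(true); // commit each batch immediately

      await client.newTxn().mutate(mu);
    }
  } finally {
    stub.close();
  }
}
```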

Thank you @pshaddel. Did your approach also take concurrent uploads into account?

No, we did not upload anything to Dgraph. If you want to write the data from the CSV files into the Dgraph database, you need a Lambda function to handle that.
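Something along these lines, as a rough sketch of a Dgraph Lambda resolver. The mutation name `Mutation.importCsvRows`, the `CsvRow.*` predicates, and the shape of `args.rows` are just assumptions; check the Dgraph Lambda docs for the exact resolver signature and the `dql.mutate` helper.

```typescript
// Sketch of a resolver running inside Dgraph Cloud's Lambda server.
// "Mutation.importCsvRows" and the predicate names are hypothetical.
async function importCsvRows({ args, dql }) {
  // args.rows is assumed to be an array of objects parsed from the CSV.
  const nquads = args.rows
    .map((row, i) =>
      Object.entries(row)
        .map(([col, val]) => `_:row${i} <CsvRow.${col}> ${JSON.stringify(val)} .`)
        .concat(`_:row${i} <dgraph.type> "CsvRow" .`)
        .join("\n")
    )
    .join("\n");

  // dql.mutate takes a DQL mutation string in the Lambda runtime.
  const res = await dql.mutate(`{ set { ${nquads} } }`);
  return Object.keys(res.data.uids).length; // number of nodes created
}

self.addGraphQLResolvers({
  "Mutation.importCsvRows": importCsvRows,
});
```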
