I imported 50 billion RDF triples into Dgraph in 15 hours

If anyone wants to know how, I'll write an article about it.
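(For context, imports at this scale normally go through Dgraph's offline bulk loader rather than the live loader. The sketch below is only a generic invocation with placeholder paths, shard counts, and Zero address; it is not the actual command or tuning used for this 50-billion-triple import.)

```sh
# Generic "dgraph bulk" invocation (placeholders, not the settings used in this thread).
# -f: directory or file of RDF data; -s: Dgraph schema file.
# A Dgraph Zero must already be running at the --zero address.
dgraph bulk -f /data/rdf/ -s /data/schema.txt \
  --map_shards=4 --reduce_shards=1 --zero=localhost:5080
```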

Yeah, that’d be really great.

That would be great. Thanks!

Please write the article and we can help with copy editing and proofreading if needed! We’d love to see an article like this!

That would be great.
@ibrahim @ashishgoswami @dmai You should really have a look at this PR (Optimize memory usage of bulkloader by xiangzhao632 · Pull Request #5525 · dgraph-io/dgraph · GitHub) by @xiangzhao632; it's a good solution to the bulk loader OOM problem.
The new bulk loader with partition key can still OOM and is too complex.

@JimWen, we are still discussing the above PR internally. The partition-key-based bulk loader changes were made to improve the bulk loader's performance, after one of our users saw it take too much time to insert their data.
We are trying to find a middle ground here between memory usage and throughput.

Actually, in my test, the partition-key-based bulk loader is even slower than v1.11, while xiangzhao632's PR is faster.

Are you able to query anything with this amount of data? Like any query that uses @filter or something?

A little slower with 30 billion records, but still OK, and it depends on your schema/machine/indexes, etc.
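(For readers wondering what such a query looks like, here is a minimal sketch using the dgo Go client. The predicate names `name`/`age` and the Alpha address are illustrative placeholders, not this user's actual schema or cluster.)

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Connect to a Dgraph Alpha (placeholder address).
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// A simple DQL query using @filter (illustrative predicates).
	q := `{
	  people(func: has(name), first: 10) @filter(ge(age, 30)) {
	    name
	    age
	  }
	}`

	resp, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Json))
}
```

A read-only transaction is used here since the example only queries; whether this stays fast at tens of billions of triples depends on the schema and indexes, as noted above.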