I imported 50 billion RDF triples into Dgraph in 15 hours

If anyone wants to know how, I'll write an article about it.
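(For context, imports at this scale normally go through Dgraph's offline bulk loader rather than the live loader. The sketch below is only a generic invocation with placeholder paths, shard counts, and Zero address; it is not the actual command or tuning used for this 50-billion-triple import.)

```sh
# Generic "dgraph bulk" invocation (placeholders, not the settings used in this thread).
# -f: directory or file of RDF data; -s: Dgraph schema file.
# A Dgraph Zero must already be running at the --zero address.
dgraph bulk -f /data/rdf/ -s /data/schema.txt \
  --map_shards=4 --reduce_shards=1 --zero=localhost:5080
```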

Yeah, that’d be really great.

That would be great. Thanks!

Please write the article and we can help with copy editing and proofreading if needed! We’d love to see an article like this!

That would be great.
@ibrahim @ashishgoswami @dmai You should really have a look at this PR (Optimize memory usage of bulkloader by xiangzhao632 · Pull Request #5525 · dgraph-io/dgraph · GitHub) by @xiangzhao632; it's a good solution to the bulk loader OOM problem.
The new bulk loader with partition key can still OOM and is too complex.

@JimWen, we are still discussing the above PR internally. The partition-key-based bulk loader changes were made to improve the bulk loader's performance, after one of our users saw it take too much time to insert their data.
We are trying to find a middle ground here between memory usage and throughput.

Actually, in my test, the partition-key-based bulk loader is even slower than v1.11, while xiangzhao632's PR is faster.

Are you able to query anything with this amount of data? Like any query that uses @filter or something?

A little slower with 30 billion records, but still OK, and it depends on your schema/machine/indexes, etc.
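(For readers wondering what such a query looks like, here is a minimal sketch using the dgo Go client. The predicate names `name`/`age` and the Alpha address are illustrative placeholders, not this user's actual schema or cluster.)

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Connect to a Dgraph Alpha (placeholder address).
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// A simple DQL query using @filter (illustrative predicates).
	q := `{
	  people(func: has(name), first: 10) @filter(ge(age, 30)) {
	    name
	    age
	  }
	}`

	resp, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Json))
}
```

A read-only transaction is used here since the example only queries; whether this stays fast at tens of billions of triples depends on the schema and indexes, as noted above.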