Bulk loader crashes during reduce phase

Hi,
With the PR (batch write), we no longer see the error "Request size offset 18905107590 is bigger than maximum offset 4294967295" from github.com/dgraph-io/badger/v2.(*valueLog).validateWrites.

However, we now get a new error, the same one described in this topic: About bulk load failed with 10000-thread limit - #5 by mrjn

runtime: program exceeds 10000-thread limit

@omar
Hi, could batching the list before writing cause the Go max thread limit to be hit because there are too many writes?

@jgoodall The following link contains dgraph master built with jemalloc (to improve memory usage). The binary needs jemalloc installed; the README.md file at the same link contains the instructions.
https://drive.google.com/drive/folders/1mZkteCjB7S-yBpjfj-TRbSTaL71UWRb0?usp=sharing

If you prefer to use a Docker image, you can use

docker pull jarifibrahim/dgraph:reduce-jemalloc

This Docker image has jemalloc with the latest dgraph binary.

@BlankRain No, that’s not expected. Does the error "runtime: program exceeds 10000-thread limit" show up on previous releases too, or only on the latest one?
The number of threads would also depend on how many shards you have configured. I suggest you try running the binary I’ve shared and see if that helps.

Thank you. I added a parameter for setting the Go max thread count, then reloaded the data with a bigger value, and it works.

@ibrahim – using the build with jemalloc appears to solve the out of memory issue!

If this patch included the PR that @omar described previously (Bulk loader crashes during reduce phase - #18 by omar), then I think the original panic caused by badger and the out of memory issue are both fixed. That assumes the PR was part of the patch @ibrahim posted (Bulk loader crashes during reduce phase - #25 by ibrahim).

@jgoodall The PR
https://github.com/dgraph-io/dgraph/pull/6312

will be part of the next dgraph patch release. The patch release will fix the crash but it might not have significant memory improvements.

As for the jemalloc changes, they will be part of the dgraph v20.11 release in November. v20.11 will fix the crash and also improve the memory usage.

That is great - looking forward to the next minor release!

@ibrahim I tested with the jemalloc feature. It works well. Memory usage is stable.

Awesome! Thanks for testing it @BlankRain :tada:

Is this still planned for the upcoming v20.11 release?