Bulk load failed with 10000-thread limit

I encountered the following exception:

REDUCE 16h10m38s [49.63%] edge_count:10.86G edge_speed:372.7k/sec plist_count:2.717G plist_speed:93.27k/sec
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

runtime stack:
runtime.throw(0x1312620, 0x11)
        /usr/local/go/src/runtime/panic.go:605 +0x95
        /usr/local/go/src/runtime/proc.go:525 +0xa4
        /usr/local/go/src/runtime/proc.go:545 +0x9f
runtime.allocm(0xc420407900, 0x0, 0xca00000000)
        /usr/local/go/src/runtime/proc.go:1344 +0x99
runtime.newm(0x0, 0xc420407900)
        /usr/local/go/src/runtime/proc.go:1637 +0x39
runtime.startm(0xc420407900, 0x1a84300)
        /usr/local/go/src/runtime/proc.go:1728 +0x13f
        /usr/local/go/src/runtime/proc.go:1755 +0x55
runtime.retake(0x1ba4f743bcbce6, 0xd0000002e)
        /usr/local/go/src/runtime/proc.go:3985 +0x135
        /usr/local/go/src/runtime/proc.go:3913 +0x1fe
        /usr/local/go/src/runtime/proc.go:1182 +0x11e
        /usr/local/go/src/runtime/proc.go:1152 +0x64

goroutine 1 [runnable]:
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:38 +0x1e9
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:294 +0x12c
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:163 +0xa7f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc4200c2fc0, 0xc42012cb00, 0x0, 0x10)
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:44 +0x52
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute(0xc4200c2fc0, 0xc42012c900, 0x10, 0x10, 0xc4200c2fc0, 0xc42012c900)

Why would there be more than 10000 threads?
I have no SSD for the bulk load, and because the RDF file is as big as 461G, it ran for more than 16 hours.

Thank you very much.

@chen Can you please share a goroutine dump? Unless goroutines are getting blocked, this many threads shouldn't be created.
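If it helps, here is a minimal sketch of one way to capture such a dump from inside a Go program, using the standard runtime/pprof package. The helper name goroutineDump is made up for illustration and is not part of the bulk loader. Sending SIGQUIT to a running Go process also prints all goroutine stacks before it exits.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// goroutineDump is a hypothetical helper (not part of Dgraph).
// It returns the stack traces of all live goroutines as text.
// debug level 2 uses the same format as an unrecovered panic.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		os.Exit(1)
	}
	// Print the dump so it can be attached to a report.
	fmt.Fprint(os.Stderr, dump)
}
```

Goroutines that show up as blocked in a syscall are the ones that each hold an OS thread.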

It seems I missed the goroutine dump.

I have now tried with 1/4 of the original large RDF data.
I only have 3 servers, so during the bulk load the parameters are --reduce_shards 3 --shufflers 3 --map_shards 3

Actually, I do not know whether reduce_shards can be as large as 40, or how to start the servers if there are 40 shards.
My plan is to scp the 3 shards to different machines and start the servers one by one to form a cluster of 3 servers.

Please help me out.

Your HDD is slow in terms of disk seeks. That’s why Go is hitting this limit. You can do a couple of things:

  1. Increase the maximum number of threads beyond Go's default of 10000, say to 20K. In the bulk loader's main function, you can add a call to debug.SetMaxThreads (from the runtime/debug package).


  1. Modify the bulk loader codebase to set a higher Badger ValueThreshold, say 2048 bytes, so that most of your values are stored in the LSM tree and not in the value log.
  1. You could
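
For the first suggestion, a minimal sketch of raising the thread limit could look like the following. It uses debug.SetMaxThreads from Go's standard runtime/debug package. The helper name raiseThreadLimit and the value 20000 are only illustrative assumptions, and where exactly the call belongs in the bulk loader's main function is not shown here.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// raiseThreadLimit is a hypothetical helper (not part of Dgraph).
// It sets the maximum number of OS threads the Go runtime may create
// and returns the previous limit.
func raiseThreadLimit(limit int) int {
	return debug.SetMaxThreads(limit)
}

func main() {
	// 20000 is an example value, chosen only to be above the default.
	const newLimit = 20000
	prev := raiseThreadLimit(newLimit)
	fmt.Printf("thread limit changed from %d to %d\n", prev, newLimit)
}
```

Note that a higher limit only postpones the crash if threads keep piling up behind slow disk seeks, so it is probably best combined with the other suggestions.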

