I figured out why this is a problem and temporarily resolved my issue by reducing the number of goroutines in loader.go#L316 to 1000. Ignore the previous message that said it didn't work (maybe I didn't build it correctly).
Now I can run it with -numCpus 4 and it still does not crash.
The reason this happens is that all the goroutines are blocked in a cgo
call to rocksdb. Here is a stack trace of a blocked goroutine:
goroutine 38 [runnable, locked to thread]:
github.com/dgraph-io/dgraph/vendor/github.com/tecbot/gorocksdb._Cfunc_rocksdb_get(0x5a42d30, 0x4f1fbe0, 0xc482119420, 0xf, 0xc482119470, 0xc48209ecc8, 0x0)
github.com/dgraph-io/dgraph/vendor/github.com/tecbot/gorocksdb/_obj/_cgo_gotypes.go:1059 +0x4e
github.com/dgraph-io/dgraph/vendor/github.com/tecbot/gorocksdb.(*DB).Get(0xc42010d420, 0xc42002c1a0, 0xc482119420, 0xf, 0x10, 0x0, 0x0, 0x0)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/vendor/github.com/tecbot/gorocksdb/db.go:224 +0x28e
github.com/dgraph-io/dgraph/store.(*Store).Get(0xc4201504b0, 0xc482119420, 0xf, 0x10, 0x0, 0x0, 0xc45bae6000, 0x40ef1d0, 0xc482128dd0)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/store/store.go:63 +0x73
github.com/dgraph-io/dgraph/posting.(*List).getPostingList(0xc482128dd0, 0x43cc530)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/posting/list.go:256 +0xd3
github.com/dgraph-io/dgraph/posting.(*List).init(0xc482128dd0, 0xc482119420, 0xf, 0x10, 0xc4201504b0, 0x0)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/posting/list.go:222 +0x142
github.com/dgraph-io/dgraph/posting.GetOrCreate(0xc482119420, 0xf, 0x10, 0xc4201504b0, 0x0)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/posting/lists.go:274 +0x1f3
github.com/dgraph-io/dgraph/uid.GetOrAssign(0xc47282a425, 0x9, 0x0, 0x1, 0x8573ec69b78b3600, 0x0, 0x0)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/uid/assigner.go:206 +0xbb
github.com/dgraph-io/dgraph/loader.(*state).assignUid(0xc420013540, 0xc47282a425, 0x9, 0x4c92563671bd5246, 0x9)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/loader/loader.go:197 +0x86
github.com/dgraph-io/dgraph/loader.(*state).assignUidsOnly(0xc420013540, 0xc420118ad0)
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/loader/loader.go:233 +0x2d9
created by github.com/dgraph-io/dgraph/loader.AssignUids
/Users/kostub/Work/go/src/github.com/dgraph-io/dgraph/loader/loader.go:317 +0x27e
When a goroutine is blocked in a cgo call, the Go scheduler creates new threads to run the remaining goroutines. See the discussion at Google Groups.
A good explanation can be found at: The Cost and Complexity of Cgo
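To watch the thread count grow without involving rocksdb, here is a minimal sketch. It is only an approximation: it uses runtime.LockOSThread as a stand-in for a goroutine stuck in a cgo call (my assumption, not what dgraph does), and the helper names threadsCreated and pinGoroutines are made up for illustration. The count comes from the runtime's "threadcreate" profile.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// threadsCreated reports how many OS threads the Go runtime has created so far,
// as recorded in the built-in "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

// pinGoroutines starts n goroutines that each pin themselves to an OS thread
// and then block. Pinning is only a stand-in for being stuck in a cgo call:
// in both cases the thread cannot be reused for other goroutines.
// The returned function unblocks the goroutines and waits for them to exit.
func pinGoroutines(n int) (release func()) {
	var ready, done sync.WaitGroup
	stop := make(chan struct{})
	ready.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			ready.Done() // this goroutine now owns a thread
			<-stop       // block while holding it
		}()
	}
	ready.Wait()
	return func() {
		close(stop)
		done.Wait()
	}
}

func main() {
	before := threadsCreated()
	release := pinGoroutines(100)
	after := threadsCreated()
	release()
	fmt.Printf("threads created before: %d, after: %d\n", before, after)
}
```

The second number should be roughly the first plus the number of pinned goroutines, which is the same growth pattern as in the loader when every goroutine is sitting in rocksdb_get.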
Most likely the disk on my Mac (it's not an SSD) isn't fast enough for the rocksdb calls to return in time, so Go ends up creating as many threads as there are blocked goroutines and goes over the thread limit.
A simple solution is to lower the maximum number of goroutines, as I did. However, this is not a good solution, since we still end up creating thousands of threads. A better approach might be to avoid cgo altogether, either by writing a native Go storage layer or by using a separate C process to write to rocksdb.