Bulk load failed with the 10000-thread limit

I encountered the following exception:

REDUCE 16h10m38s [49.63%] edge_count:10.86G edge_speed:372.7k/sec plist_count:2.717G plist_speed:93.27k/sec
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

runtime stack:
runtime.throw(0x1312620, 0x11)
        /usr/local/go/src/runtime/panic.go:605 +0x95
        /usr/local/go/src/runtime/proc.go:525 +0xa4
        /usr/local/go/src/runtime/proc.go:545 +0x9f
runtime.allocm(0xc420407900, 0x0, 0xca00000000)
        /usr/local/go/src/runtime/proc.go:1344 +0x99
runtime.newm(0x0, 0xc420407900)
        /usr/local/go/src/runtime/proc.go:1637 +0x39
runtime.startm(0xc420407900, 0x1a84300)
        /usr/local/go/src/runtime/proc.go:1728 +0x13f
        /usr/local/go/src/runtime/proc.go:1755 +0x55
runtime.retake(0x1ba4f743bcbce6, 0xd0000002e)
        /usr/local/go/src/runtime/proc.go:3985 +0x135
        /usr/local/go/src/runtime/proc.go:3913 +0x1fe
        /usr/local/go/src/runtime/proc.go:1182 +0x11e
        /usr/local/go/src/runtime/proc.go:1152 +0x64

goroutine 1 [runnable]:
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:38 +0x1e9
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:294 +0x12c
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:163 +0xa7f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc4200c2fc0, 0xc42012cb00, 0x0, 0x10)
        /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:44 +0x52
github.com/dgraph-io/dgraph/vendor/github.com/spf13/cobra.(*Command).execute(0xc4200c2fc0, 0xc42012c900, 0x10, 0x10, 0xc4200c2fc0, 0xc42012c900)

Why could there be more than 10000 threads?
There is no SSD for the bulk load, and because the RDF file is as big as 461 GB, the load took more than 16 hours.

Thank you very much.

@chen Can you please share the goroutine dump? Unless a goroutine gets blocked, this many threads shouldn’t be created.

It seems I missed the goroutine dump.

I have now retried with 1/4 of the original RDF data.
I only have 3 servers, so for the bulk load the parameters are --reduce_shards 3 --shufflers 3 --map_shards 3

Actually, I do not know whether reduce_shards can be as large as 40, or how I would start servers if there are 40 shards.
My idea is to scp the 3 shards to different machines and start the servers one by one to form a cluster of 3 servers.

Please help me out.

Your HDD is slow in terms of disk seeks; that’s why Go is hitting this limit. You can do a few things:

  1. Increase the thread limit beyond the default of 10K, to say 50K. In the bulk loader’s main function, you can raise it with debug.SetMaxThreads.


  2. Modify the bulk loader codebase to set a higher value-log threshold (Badger’s ValueThreshold option), say 2048 bytes, so that most of your values are stored in the LSM tree and not in the value log.
  3. You could
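A sketch of what that option change might look like, assuming the Badger v1-style Options struct Dgraph used at the time; newBadgerOptions is a hypothetical helper, not a function in the bulk loader:

```go
package main

import badger "github.com/dgraph-io/badger"

// newBadgerOptions shows where the threshold would be raised
// when opening the posting-list store.
func newBadgerOptions(dir string) badger.Options {
	opt := badger.DefaultOptions
	opt.Dir = dir
	opt.ValueDir = dir
	// Values up to 2048 bytes stay in the LSM tree instead of the
	// value log, which cuts down on disk seeks for spinning disks.
	opt.ValueThreshold = 2048
	return opt
}
```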

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.