Around 50% seems to be coming from gotomic.
I will try to run it now on v0.4.2 and on the commit before we updated to the latest gotomic, to see if something changed since then.
As for the Travis build not passing, it’s because the machine has only 2 processors, and when the CPU is overloaded the dgraphloader gets killed. Setting numCpu=1 should take care of that.
I don’t think CPU usage kills the loader. One thing we could try is to grab the memprofile before calling the GC, when total usage would be around 4G (currently the memprofile shows just 1.5G total); that might give us some more information.
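A minimal sketch of what that could look like, assuming the loader makes an explicit runtime.GC() call somewhere (dumpHeapProfile and the file names are made up for illustration; note that Go’s heap profile reports statistics as of the most recently completed GC cycle):

package main

import (
    "log"
    "os"
    "runtime"
    "runtime/pprof"
)

// dumpHeapProfile writes the current heap profile to path.
func dumpHeapProfile(path string) {
    f, err := os.Create(path)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := pprof.WriteHeapProfile(f); err != nil {
        log.Fatal(err)
    }
}

func main() {
    // ...loader does its work, memory climbs to ~4G...
    dumpHeapProfile("heap_pre_gc.prof") // snapshot before the explicit GC
    runtime.GC()                        // the GC call the loader already makes
    dumpHeapProfile("heap_post_gc.prof") // snapshot after, for comparison
}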
The bug is not that easy to reproduce. Shall I run it ten times with memory profiling to confirm that it’s the gotomic hash that is taking up all the memory?
Actually, if they’re responsive, we can show them our CPU and memory profiles and get them to optimize their code. Otherwise, we’ll have to write our own.
posting.NewList, for which we should be using a sync.Pool. It’s a tricky problem that I haven’t found a good solution to in the past; I’ll think about it a bit more and work on it (see the sketch below).
zond/gotomic, which IMO is a badly optimized library. We should either get them to optimize it or switch away from it. Its memory consumption is horrible, as @Wei’s tests also show. It’s remarkable how everything bottlenecks at zond/gotomic; it’s a memory hog.
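On the posting.NewList point above, here’s a minimal sketch of the sync.Pool idea; the List fields and Reset are placeholders, and the hard part it glosses over is proving a *List is unreachable before returning it to the pool:

package posting

import "sync"

// List is a stand-in for the real posting list; fields are illustrative.
type List struct {
    key  []byte
    data []byte
}

// Reset clears any state left over from a previous use.
func (l *List) Reset() { *l = List{} }

var listPool = sync.Pool{
    New: func() interface{} { return new(List) },
}

// NewList hands out a recycled list when one is available.
func NewList() *List {
    l := listPool.Get().(*List)
    l.Reset()
    return l
}

// free returns a list to the pool; callers must guarantee exclusive ownership.
func free(l *List) {
    listPool.Put(l)
}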
If gotomic doesn’t fix this, I’d be eager to have the next person joining Dgraph write our own lockless map implementation.
Looking at the CPU profile, I am curious about Store.Get spending quite some time in x.Errorf. This is a relatively easy one to study.
// Get returns the value given a key for RocksDB.
func (s *Store) Get(key []byte) ([]byte, error) {
    valSlice, err := s.db.Get(s.ropt, key)
    if err != nil {
        return []byte(""), x.Wrap(err)
    }
    if valSlice == nil {
        return []byte(""), x.Errorf("E_KEY_NOT_FOUND")
    }
    val := valSlice.Data()
    if val == nil {
        return []byte(""), x.Errorf("E_KEY_NOT_FOUND")
    }
    return val, nil
}
x.Errorf adds a stack trace and other information. In this case, “key not found” is not really an “error” error. How about not returning any error at all: if the returned []byte is nil, just treat it as a miss?
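As a sketch, reusing the types from the snippet above, the suggestion would look something like this: only genuine RocksDB failures produce an error, and a miss returns nil with no allocation.

// Get returns the value for key, or nil if the key is not present.
func (s *Store) Get(key []byte) ([]byte, error) {
    valSlice, err := s.db.Get(s.ropt, key)
    if err != nil {
        return nil, x.Wrap(err)
    }
    if valSlice == nil {
        return nil, nil // no hit; not an error
    }
    return valSlice.Data(), nil // Data() is also nil when the key is absent
}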
Honestly, I think we can do away with stack traces instead and find cheaper ways, maybe a wrap, or just plain old fmt.Errorf(). My 2 cents.
Or, how about a win-win solution: an opt flag that can turn stack traces on or off? That way, our optimized binary won’t have to pay the CPU/memory cost of generating long strings.
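As a sketch of the flag idea (the flag name and the use of github.com/pkg/errors are assumptions, not Dgraph’s actual x package):

package x

import (
    "flag"
    "fmt"

    "github.com/pkg/errors"
)

var debugErrors = flag.Bool("debug_errors", false,
    "Attach stack traces to errors (costly; for debug builds only).")

// Errorf captures a stack trace only when the flag is set; release binaries
// fall back to plain fmt.Errorf and skip the CPU/memory cost entirely.
func Errorf(format string, args ...interface{}) error {
    if *debugErrors {
        return errors.Errorf(format, args...) // records a stack trace
    }
    return fmt.Errorf(format, args...)
}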
I am OK with adding a flag for this. That aside, I also think there is no need to allocate memory with []byte("") and create an error when there is no hit; the caller can determine whether a miss is really an error.
Yeah, agreed. Returning an error when RocksDB doesn’t treat it as one seems unnecessary.
Though the other reasoning behind my advice above is that if we can see the cost in one particular code path, the same effect is likely present in other code paths too, hurting us there as well. So we should go one level deeper and fix the root problem; hence the flag makes sense.
Also, can you coordinate with the gotomic guys, show them the profile and get them to optimize the library? (Unless, @pawan, you want to do it. But I want one of you guys to be the decision maker about whether we should rewrite this lib.)
Another option would be to make improvements to gotomic, if the author is responsive and willing to accept PRs. Though I remember Manish mentioning that there is another paper which proposes a better implementation, so that might be worth looking into.
The memory overhead for zond/gotomic hash does seem concerning. (In terms of CPU, it looks good to me.) I would support maintaining our own version and customizing it as much as possible for our own use (for improved speed and reduced memory). But let’s wait to hear back from the author.
The author made a recent change, and I am going to run a memory profile to see if it helps.
Update: I ran another mem profile after the gotomic update, and its memory consumption seems worse: getBucketByIndex cumulative usage goes up to 868M (vs 500+M in the past).