Around 50% seems to be coming from gotomic.
I will try to run it now on v0.4.2 and on the commit before we updated to the latest gotomic, to see if something changed since then.
As for the Travis build not passing, it’s because the machine has only 2 processors, and when the CPU is overloaded the dgraphloader gets killed. Setting numCpu=1 should take care of that.
I don’t think CPU usage kills the loader. One thing we could try is to grab the memprofile before calling the GC, when total usage would be around 4G (currently the memprofile shows just 1.5G total); that might give us some more information.
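A minimal sketch of what that could look like, assuming the loader makes an explicit runtime.GC() call somewhere (dumpHeapProfile and the file names are made up for illustration; note that Go’s heap profile reports statistics as of the most recently completed GC cycle):

package main

import (
    "log"
    "os"
    "runtime"
    "runtime/pprof"
)

// dumpHeapProfile writes the current heap profile to path.
func dumpHeapProfile(path string) {
    f, err := os.Create(path)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := pprof.WriteHeapProfile(f); err != nil {
        log.Fatal(err)
    }
}

func main() {
    // ...loader does its work, memory climbs to ~4G...
    dumpHeapProfile("heap_pre_gc.prof") // snapshot before the explicit GC
    runtime.GC()                        // the GC call the loader already makes
    dumpHeapProfile("heap_post_gc.prof") // snapshot after, for comparison
}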
The bug is not that easy to reproduce. Shall I run it ten times with memory profiling to confirm that it’s the gotomic hash that is taking up all the memory?
Actually, if they’re responsive, we can show them our CPU and memory profiles and get them to optimize their code. Otherwise, we’ll have to write our own.
posting.NewList, for which we should be using a sync.Pool. It’s a tricky problem that I haven’t found a good solution to in the past; I’ll think about it a bit more and work on it (see the sketch below).
zond/gotomic, which IMO is a badly optimized library. We should either get them to optimize it or switch away from it. Its memory consumption is horrible, as @Wei’s tests also show. It’s remarkable how everything bottlenecks at zond/gotomic; it’s a memory hog.
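On the posting.NewList point above, here’s a minimal sketch of the sync.Pool idea; the List fields and Reset are placeholders, and the hard part it glosses over is proving a *List is unreachable before returning it to the pool:

package posting

import "sync"

// List is a stand-in for the real posting list; fields are illustrative.
type List struct {
    key  []byte
    data []byte
}

// Reset clears any state left over from a previous use.
func (l *List) Reset() { *l = List{} }

var listPool = sync.Pool{
    New: func() interface{} { return new(List) },
}

// NewList hands out a recycled list when one is available.
func NewList() *List {
    l := listPool.Get().(*List)
    l.Reset()
    return l
}

// free returns a list to the pool; callers must guarantee exclusive ownership.
func free(l *List) {
    listPool.Put(l)
}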
If gotomic doesn’t fix this, I’d be eager to have the next person joining Dgraph write our own lockless map implementation.
Looking at the CPU profile, I am curious about Store.Get spending quite some time in x.Errorf. This is a relatively easy one to study.
// Get returns the value given a key for RocksDB.
func (s *Store) Get(key []byte) ([]byte, error) {
    valSlice, err := s.db.Get(s.ropt, key)
    if err != nil {
        return []byte(""), x.Wrap(err)
    }
    if valSlice == nil {
        return []byte(""), x.Errorf("E_KEY_NOT_FOUND")
    }
    val := valSlice.Data()
    if val == nil {
        return []byte(""), x.Errorf("E_KEY_NOT_FOUND")
    }
    return val, nil
}
x.Errorf adds a stack trace and other information. In this case, “key not found” is not really an “error” error. How about not returning any error at all: if the returned []byte is nil, just treat it as a miss?
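As a sketch, reusing the types from the snippet above, the suggestion would look something like this: only genuine RocksDB failures produce an error, and a miss returns nil with no allocation.

// Get returns the value for key, or nil if the key is not present.
func (s *Store) Get(key []byte) ([]byte, error) {
    valSlice, err := s.db.Get(s.ropt, key)
    if err != nil {
        return nil, x.Wrap(err)
    }
    if valSlice == nil {
        return nil, nil // no hit; not an error
    }
    return valSlice.Data(), nil // Data() is also nil when the key is absent
}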
Honestly, I think we can do away with stack traces instead and find cheaper ways, maybe a wrap, or just plain old fmt.Errorf(). My 2 cents.
Or, how about a win-win solution: an opt flag that can turn stack traces on or off? That way, our optimized binary won’t have to pay the CPU/memory cost of generating long strings.
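As a sketch of the flag idea (the flag name and the use of github.com/pkg/errors are assumptions, not Dgraph’s actual x package):

package x

import (
    "flag"
    "fmt"

    "github.com/pkg/errors"
)

var debugErrors = flag.Bool("debug_errors", false,
    "Attach stack traces to errors (costly; for debug builds only).")

// Errorf captures a stack trace only when the flag is set; release binaries
// fall back to plain fmt.Errorf and skip the CPU/memory cost entirely.
func Errorf(format string, args ...interface{}) error {
    if *debugErrors {
        return errors.Errorf(format, args...) // records a stack trace
    }
    return fmt.Errorf(format, args...)
}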
I am OK with adding a flag for this. That aside, I also think there is no need to allocate memory with []byte("") and create an error when there is no hit; the caller can determine whether a miss is really an error.
Yeah, agreed. Returning an error when RocksDB doesn’t treat it as one seems unnecessary.
Though the other reasoning behind my advice above is that if we can see the cost in one particular code path, the same effect is likely present in other code paths too, hurting us there as well. So we should go one level deeper and fix the root problem; hence the flag makes sense.
Also, can you coordinate with the gotomic guys, show them the profile and get them to optimize the library? (Unless, @pawan, you want to do it. But I want one of you guys to be the decision maker about whether we should rewrite this lib.)
Another option would be to make improvements to gotomic, if the author is responsive and willing to accept PRs. Though I remember Manish mentioning that there is another paper which proposes a better implementation, so that might be worth looking into.
The memory overhead for zond/gotomic hash does seem concerning. (In terms of CPU, it looks good to me.) I would support maintaining our own version and customizing it as much as possible for our own use (for improved speed and reduced memory). But let’s wait to hear back from the author.
The author made a recent change, and I am going to run a memory profile to see if it helps.
Update: I ran another mem profile after the gotomic update, and its memory consumption seems worse: getBucketByIndex cumulative usage goes up to 868M (vs 500+M in the past).