Dgraph can't idle without being oomkilled after large data ingestion

Ahh @JimWen, interesting. I’ll give it a look.

@mrjn @ibrahim I was able to get a good heap snapshot. What it captured looks consistent with JimWen’s.

pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.061.pb.gz (43.4 KB)

File: dgraph
Build ID: 8ab9db95e603f2648c2702a2bb6c557d41f8348d
Type: inuse_space
Time: May 21, 2020 at 2:44pm (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 15275.58MB, 98.43% of 15519.04MB total
Dropped 204 nodes (cum <= 77.60MB)
Showing top 10 nodes out of 89
      flat  flat%   sum%        cum   cum%
 8438.07MB 54.37% 54.37%  8438.07MB 54.37%  go.etcd.io/etcd/raft/raftpb.(*Entry).Unmarshal
 3619.64MB 23.32% 77.70%  4685.36MB 30.19%  github.com/dgraph-io/badger/v2/table.OpenTable
  639.50MB  4.12% 81.82%  1056.52MB  6.81%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
  569.12MB  3.67% 85.48%   569.12MB  3.67%  github.com/DataDog/zstd.Decompress
  519.74MB  3.35% 88.83%   519.74MB  3.35%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
  440.71MB  2.84% 91.67%   545.22MB  3.51%  github.com/dgraph-io/dgraph/protos/pb.(*Mutations).Unmarshal
  417.02MB  2.69% 94.36%   417.02MB  2.69%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
  258.37MB  1.66% 96.03%   258.37MB  1.66%  github.com/dgraph-io/ristretto.newCmRow
     207MB  1.33% 97.36%      207MB  1.33%  github.com/dgraph-io/badger/v2/table.NewTableBuilder
  166.41MB  1.07% 98.43%   166.41MB  1.07%  github.com/dgraph-io/badger/v2/skl.newArena
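
For anyone who wants to produce a comparable inuse_space profile from a Go process of their own, here is a minimal sketch using only the standard library. It is not how the profile above was collected from Dgraph; `captureHeapProfile` is a hypothetical helper name introduced purely for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureHeapProfile is a hypothetical helper (not part of Dgraph).
// It returns the current heap profile in the gzip-compressed protobuf
// format that `go tool pprof` reads.
func captureHeapProfile() ([]byte, error) {
	// Run a GC first so the profile reflects up-to-date live objects.
	runtime.GC()

	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := captureHeapProfile()
	if err != nil {
		fmt.Println("failed to capture heap profile:", err)
		return
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(data))
}
```

Writing the returned bytes to a `.pb.gz` file and opening it with `go tool pprof` should give an interactive session like the one shown above, where `top` lists the largest allocators.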

Looks like raftpb.Entry is consuming most of the memory: (*Entry).Unmarshal alone accounts for about 54% of the in-use heap in the profile above. @martinmr would have more information about this.

The solution @JimWen proposed makes sense. The fix isn’t in our code, so we would need to wait for it to be merged upstream. Do we have contacts for any of the raft maintainers so we can try to get this merged ASAP? @mrjn


I think this issue is happening because ludicrous mode wasn’t taking snapshots correctly. This PR fixes that: fix(dgraph): Fix snapshot calculation in ludicrous mode by ashish-goswami (Pull Request #5585, dgraph-io/dgraph)
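
To illustrate why missed snapshots could show up as retained raftpb.Entry memory, here is a toy model. It assumes the usual Raft pattern in which log entries can only be discarded once a snapshot covers them. `toyLog`, `Append`, `CompactTo`, and `RetainedBytes` are hypothetical names for this sketch and are not the etcd raft or Dgraph API.

```go
package main

import "fmt"

// entry stands in for a Raft log entry; Data is the serialized proposal.
type entry struct {
	Index uint64
	Data  []byte
}

// toyLog is a hypothetical in-memory log used only for illustration.
type toyLog struct {
	entries []entry
}

// Append adds an entry to the end of the log.
func (l *toyLog) Append(index uint64, data []byte) {
	l.entries = append(l.entries, entry{Index: index, Data: data})
}

// CompactTo drops every entry with Index <= snapshotIndex. The tail is
// copied into a fresh slice so the old backing array can be collected.
func (l *toyLog) CompactTo(snapshotIndex uint64) {
	i := 0
	for i < len(l.entries) && l.entries[i].Index <= snapshotIndex {
		i++
	}
	kept := make([]entry, len(l.entries)-i)
	copy(kept, l.entries[i:])
	l.entries = kept
}

// RetainedBytes sums the payload sizes still referenced by the log.
func (l *toyLog) RetainedBytes() int {
	total := 0
	for _, e := range l.entries {
		total += len(e.Data)
	}
	return total
}

func main() {
	l := &toyLog{}
	for i := uint64(1); i <= 1000; i++ {
		l.Append(i, make([]byte, 1024))
	}
	fmt.Println("retained before snapshot:", l.RetainedBytes())

	// If the snapshot index is never advanced, this step never happens
	// and every ingested entry stays reachable.
	l.CompactTo(990)
	fmt.Println("retained after snapshot: ", l.RetainedBytes())
}
```

In this model, retained memory grows with every ingested entry until a snapshot lets the log be truncated, which matches the behavior described in this thread after a large ingestion.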
