Dgraph version: v21.03.1
I am trying to load Wikidata. The dataset is about 700 GB with roughly 6_000_000_000 records.
I tried to load it using the bulk loader but ran into an OOM issue; I've created a separate topic for that: 'OOM during bulk loading'.
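For reference, the bulk load invocation looked roughly like this (the file names, shard counts, and zero address below are illustrative, not my exact values):

dgraph bulk -f wikidata.rdf.gz -s wikidata.schema --map_shards=4 --reduce_shards=1 --zero=zero1:5080 --out=/data/bulk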
Example records:
<https://www.wikidata.org/wiki/Special:EntityData/L25> <http://schema.org/about> <http://www.wikidata.org/entity/L25> .
<https://www.wikidata.org/wiki/Special:EntityData/L25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .
<http://www.wikidata.org/entity/L314-F1> <http://wikiba.se/ontology#grammaticalFeature> <http://www.wikidata.org/entity/Q146786> .
<http://www.wikidata.org/entity/L314-F1> <http://www.w3.org/ns/lemon/ontolex#representation> "pis"@ca .
<http://www.wikidata.org/entity/L314-F1> <http://www.w3.org/2000/01/rdf-schema#label> "pis"@ca .
Example schema:
<http://www.w3.org/ns/lemon/ontolex#representation>: string @lang .
<http://www.w3.org/2000/01/rdf-schema#label>: string @lang .
<http://www.wikidata.org/prop/direct/P5831>: string @lang .
<http://wikiba.se/ontology#lemma>: string @lang .
<http://www.wikidata.org/prop/statement/P5831>: string @lang .
<http://www.w3.org/2004/02/skos/core#definition>: string @lang .
<http://www.wikidata.org/prop/direct/P5187>: string @lang .
<http://www.wikidata.org/prop/statement/P5187>: string @lang .
Cluster setup:
The machine is configured with 750 GB of RAM.
About 1_200_000_000 records were loaded successfully, but then an OOM occurred:
I1003 17:09:18.329158 22 log.go:34] 2 [term: 904] received a MsgVote message with higher term from 1 [term: 905]
I1003 17:09:18.329199 22 log.go:34] 2 became follower at term 905
I1003 17:09:18.329215 22 log.go:34] 2 [logterm: 904, index: 2512553, vote: 0] cast MsgVote for 1 [logterm: 904, index: 2512553] at term 905
I1003 17:09:18.329875 22 log.go:34] raft.node: 2 elected leader 1 at term 905
I1003 17:09:18.517119 22 draft.go:211] Operation started with id: opSnapshot
I1003 17:09:18.517165 22 draft.go:136] Operation completed with id: opRollup
I1003 17:09:18.517203 22 snapshot.go:295] Got StreamSnapshot request: context:<id:3 group:1 addr:"alpha2:7081" > index:2501003 read_ts:2909272 since_ts:2784958
I1003 17:09:18.517295 22 snapshot.go:204] Waiting to reach timestamp: 2909272
I1003 17:09:18.519573 22 log.go:34] Sending Snapshot Streaming about 88 GiB of uncompressed data (43 GiB on disk)
I1003 17:09:18.644671 22 log.go:34] Number of ranges found: 17
I1003 17:09:18.644712 22 log.go:34] Sent range 0 for iteration: [000000000000000000002d687474703a2f2f7777772e77332e6f72672f323030342f30322f736b6f732f636f726523707265664c6162656c00000000001d72f138ffffffffffe4f3fc, 0000000000000000000033687474703a2f2f7777772e77696b69646174612e6f72672f70726f702f6469726563742d6e6f726d616c697a65642f5033353600000000001f6037ffffffffffffdb0c86) of size: 18 GiB
I1003 17:09:18.644728 22 log.go:34] Sent range 1 for iteration: [0000000000000000000029687474703a2f2f7777772e77696b69646174612e6f72672f70726f702f6469726563742f503238363000000000001ca34b8bffffffffffe5bf9e, 000000000000000000002a687474703a2f2f7777772e77332e6f72672f323030302f30312f7264662d736368656d61236c6162656c0000000000179ee3aeffffffffffe89a91) of size: 18 GiB
I1003 17:09:18.644738 22 log.go:34] Sent range 2 for iteration: [000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e000000000017260b96ffffffffffea84c5, 000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e00000000001d931c8affffffffffe2ac97) of size: 18 GiB
I1003 17:09:18.644747 22 log.go:34] Sent range 3 for iteration: [0000000000000000000017687474703a2f2f736368656d612e6f72672f61626f757400000000001f9c4469ffffffffffd6ce5d, 000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e0000000000138a3276fffffffffff5c37d) of size: 18 GiB
I1003 17:09:18.644755 22 log.go:34] Sent range 4 for iteration: [000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e000000000015bbc889ffffffffffeeabda, 000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e000000000017260b96ffffffffffea84c5) of size: 18 GiB
I1003 17:09:18.644767 22 log.go:34] Sent range 5 for iteration: [000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e00000000001d931c8affffffffffe2ac97, 000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e00000000001ead66efffffffffffdede8b) of size: 18 GiB
I1003 17:09:18.644774 22 log.go:34] Sent range 6 for iteration: [000000000000000000002a687474703a2f2f7777772e77332e6f72672f323030302f30312f7264662d736368656d61236c6162656c0000000000179ee3aeffffffffffe89a91, 000000000000000000002c687474703a2f2f7777772e77332e6f72672f323030342f30322f736b6f732f636f726523616c744c6162656c00000000001e3d4516ffffffffffe08a6c) of size: 18 GiB
I1003 17:09:18.644780 22 log.go:34] Sent range 7 for iteration: [000000000000000000002c687474703a2f2f7777772e77332e6f72672f323030342f30322f736b6f732f636f726523616c744c6162656c00000000001e3d4516ffffffffffe08a6c, 000000000000000000002d687474703a2f2f7777772e77332e6f72672f323030342f30322f736b6f732f636f726523707265664c6162656c00000000001d72f138ffffffffffe4f3fc) of size: 18 GiB
I1003 17:09:18.644789 22 log.go:34] Sent range 8 for iteration: [000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e00000000001f01b105ffffffffffddc1c5, 000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e00000000001f632fc2ffffffffffdb2e3e) of size: 18 GiB
I1003 17:09:18.644796 22 log.go:34] Sent range 9 for iteration: [0000000000000000000027687474703a2f2f7777772e77696b69646174612e6f72672f70726f702f6469726563742f50323100000000001ef54607ffffffffffd71c3d, 0000000000000000000028687474703a2f2f7777772e77696b69646174612e6f72672f70726f702f6469726563742f5036323500000000001e40f38affffffffffe1bb82) of size: 18 GiB
I1003 17:09:18.644805 22 log.go:34] Sent range 10 for iteration: [, 0000000000000000000017687474703a2f2f736368656d612e6f72672f61626f757400000000001f9c4469ffffffffffd6ce5d) of size: 18 GiB
I1003 17:09:23.054402 22 admin.go:868] namespace: 0. No GraphQL schema in Dgraph; serving empty GraphQL API
I1003 17:09:23.519663 22 log.go:34] Sending Snapshot [05s] Scan (8): ~880.0 MiB/88 GiB at 0 B/sec. Sent: 144.0 MiB at 0 B/sec. jemalloc: 1.1 GiB
I1003 17:09:29.253545 22 log.go:34] Sending Snapshot [10s] Scan (8): ~1.3 GiB/88 GiB at 42 MiB/sec. Sent: 192.0 MiB at 4.2 MiB/sec. jemalloc: 1.0 GiB
I1003 17:09:33.519747 22 log.go:34] Sending Snapshot [15s] Scan (8): ~4.7 GiB/88 GiB at 299 MiB/sec. Sent: 352.0 MiB at 15 MiB/sec. jemalloc: 1.0 GiB
I1003 17:09:34.194883 22 log.go:34] Sent range 11 for iteration: [0000000000000000000028687474703a2f2f7777772e77696b69646174612e6f72672f70726f702f6469726563742f5036323500000000001e40f38affffffffffe1bb82, 0000000000000000000029687474703a2f2f7777772e77696b69646174612e6f72672f70726f702f6469726563742f503238363000000000001ca34b8bffffffffffe5bf9e) of size: 18 GiB
I1003 17:09:35.326957 22 log.go:34] Sent range 12 for iteration: [000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e000000000014461d9cfffffffffff1e03d, 000000000000000000001d687474703a2f2f736368656d612e6f72672f6465736372697074696f6e000000000015bbc889ffffffffffeeabda) of size: 18 GiB
fatal error: out of memory
goroutine 29553 [running]:
runtime.throw(0x20e9612, 0xd)
/usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc002599468 sp=0xc002599438 pc=0xa6fa32
github.com/dgraph-io/ristretto/z.Calloc(0x3739, 0x20ed2eb, 0x10, 0x3739, 0x2, 0x0)
/go/pkg/mod/github.com/dgraph-io/ristretto@v0.0.4-0.20210504190834-0bf2acd73aa3/z/calloc_jemalloc.go:78 +0x1c7 fp=0xc0025994a8 sp=0xc002599468 pc=0xec7ae7
github.com/dgraph-io/badger/v3/table.(*Table).decompress(0xc006545680, 0xc01663f420, 0x698, 0x1)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/table.go:810 +0x545 fp=0xc002599580 sp=0xc0025994a8 pc=0xfbaa45
github.com/dgraph-io/badger/v3/table.(*Table).block(0xc006545680, 0x698, 0x1, 0x0, 0x0, 0x0)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/table.go:580 +0x37a fp=0xc002599768 sp=0xc002599580 pc=0xfb8a5a
github.com/dgraph-io/badger/v3/table.(*Iterator).next(0xc0001749a0)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/iterator.go:331 +0xd0 fp=0xc0025997b8 sp=0xc002599768 pc=0xfb42d0
github.com/dgraph-io/badger/v3/table.(*Iterator).next(0xc0001749a0)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/iterator.go:348 +0x1ed fp=0xc002599808 sp=0xc0025997b8 pc=0xfb43ed
github.com/dgraph-io/badger/v3/table.(*Iterator).Next(0xc0001749a0)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/iterator.go:406 +0x3e fp=0xc002599820 sp=0xc002599808 pc=0xfb485e
github.com/dgraph-io/badger/v3/table.(*ConcatIterator).Next(0xc0032d1db0)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/iterator.go:529 +0x33 fp=0xc002599880 sp=0xc002599820 pc=0xfb5133
github.com/dgraph-io/badger/v3/table.(*node).next(0xc0024e30c0)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/merge_iterator.go:81 +0x3d fp=0xc002599898 sp=0xc002599880 pc=0xfb579d
github.com/dgraph-io/badger/v3/table.(*MergeIterator).Next(0xc0024e3080)
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/merge_iterator.go:157 +0x38 fp=0xc002599900 sp=0xc002599898 pc=0xfb5b98
I decided to increase the Badger memory configuration and added the following to each alpha:
--badger="memtablesize=67108864000;basetablesize=209715200;baselevelsize=10485760;valuethreshold=1048576;blocksize=4960;blockcachesize=6979321850;indexcachesize=3758096380"
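For clarity, here is my own rough translation of those byte values into human-readable sizes (my arithmetic, not Dgraph output):

memtablesize   = 67108864000 (~62.5 GiB)
basetablesize  = 209715200 (200 MiB)
baselevelsize  = 10485760 (10 MiB)
valuethreshold = 1048576 (1 MiB)
blocksize      = 4960 (~4.8 KiB)
blockcachesize = 6979321850 (~6.5 GiB)
indexcachesize = 3758096380 (~3.5 GiB)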
After that, all alphas started to consume a lot of memory one by one (approximately 100 GB of RAM each, according to docker stats) and then failed with another error. Unfortunately I lost those logs; it was something like: slice bounds out of range.
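The memory numbers above and below come from plain docker stats, roughly:

docker stats --no-stream

(one line per container; I was watching the alpha containers.)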
The next day I tried to reproduce it, but the alpha ran out of memory again even though docker stats showed it consuming at most 10 GB of RAM:
I1005 12:24:14.720143 24 log.go:295] Found file: 113 First Index: 2498793
I1005 12:24:14.720194 24 log.go:295] Found file: 114 First Index: 2525100
I1005 12:24:14.720234 24 log.go:295] Found file: 115 First Index: 2544449
I1005 12:24:14.720280 24 log.go:295] Found file: 116 First Index: 2563455
I1005 12:24:14.720379 24 storage.go:125] Init Raft Storage with snap: 2501003, first: 2501004, last: 2582169
I1005 12:24:14.720412 24 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x33e3080 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864000 BaseTableSize:2097152000 BaseLevelSize:1048576000 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4960 BloomFalsePositive:0.01 BlockCacheSize:697932185000 IndexCacheSize:375809638000 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0x20f88bd, 0x16)
/usr/local/go/src/runtime/panic.go:1117 +0x72
runtime.sysMap(0xd6a0000000, 0x1450000000, 0x33ea730)
/usr/local/go/src/runtime/mem_linux.go:169 +0xc6
runtime.(*mheap).sysAlloc(0x31a7280, 0x1450000000, 0xa637d7, 0x31a7288)
/usr/local/go/src/runtime/malloc.go:729 +0x1e5
runtime.(*mheap).grow(0x31a7280, 0xa28000, 0x0)
/usr/local/go/src/runtime/mheap.go:1346 +0x85
runtime.(*mheap).allocSpan(0x31a7280, 0xa28000, 0x100, 0x7f6a92019558)
/usr/local/go/src/runtime/mheap.go:1173 +0x609
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:910 +0x59
runtime.systemstack(0x7f6a2c000020)
/usr/local/go/src/runtime/asm_amd64.s:379 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1246
I also tried to inspect the 'p' directory using dgraph debug, but that failed with an error:
root@8ce6a4aee9af:/dgraph# dgraph debug --postings p
Opening DB: p
Listening for /debug HTTP requests at port: 8080
2021/10/05 12:34:01 Arena too small, toWrite:27 newTotal:87241488 limit:87241465
github.com/dgraph-io/badger/v3/y.AssertTruef
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/y/error.go:62
github.com/dgraph-io/badger/v3/skl.(*Arena).putNode
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/skl/arena.go:67
github.com/dgraph-io/badger/v3/skl.newNode
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/skl/skl.go:109
github.com/dgraph-io/badger/v3/skl.(*Skiplist).Put
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/skl/skl.go:306
github.com/dgraph-io/badger/v3.(*memTable).replayFunction.func1
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/memtable.go:258
github.com/dgraph-io/badger/v3.(*logFile).iterate
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/memtable.go:539
github.com/dgraph-io/badger/v3.(*memTable).UpdateSkipList
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/memtable.go:219
github.com/dgraph-io/badger/v3.(*DB).openMemTable
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/memtable.go:146
github.com/dgraph-io/badger/v3.(*DB).openMemTables
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/memtable.go:88
github.com/dgraph-io/badger/v3.Open
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/db.go:338
github.com/dgraph-io/badger/v3.OpenManaged
/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/managed_db.go:26
github.com/dgraph-io/dgraph/dgraph/cmd/debug.run
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/debug/run.go:917
github.com/dgraph-io/dgraph/dgraph/cmd/debug.init.0.func1
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/debug/run.go:83
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:78
main.main
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:99
runtime.main
/usr/local/go/src/runtime/proc.go:225
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1371
Files in the p directory, sorted by size (largest first):
total 48G
-rw-r--r-- 1 root root 3.7G Oct 4 18:18 00002.mem
-rw-r--r-- 1 root root 2.0G Oct 4 17:56 002140.vlog
-rw-r--r-- 1 root root 298M Oct 4 17:57 101352.sst
-rw-r--r-- 1 root root 296M Oct 4 17:57 101393.sst
-rw-r--r-- 1 root root 249M Oct 4 17:57 101353.sst
-rw-r--r-- 1 root root 190M Oct 4 17:56 101287.sst
-rw-r--r-- 1 root root 190M Oct 4 17:56 101294.sst
-rw-r--r-- 1 root root 157M Oct 4 17:57 101371.sst
-rw-r--r-- 1 root root 143M Oct 4 17:57 101378.sst
-rw-r--r-- 1 root root 129M Oct 4 17:56 101288.sst
-rw-r--r-- 1 root root 118M Oct 4 17:57 101416.sst
-rw-r--r-- 1 root root 111M Oct 4 17:57 101415.sst
-rw-r--r-- 1 root root 104M Oct 4 17:57 101417.sst
-rw-r--r-- 1 root root 104M Oct 4 17:57 101397.sst
-rw-r--r-- 1 root root 103M Oct 4 17:57 101336.sst
-rw-r--r-- 1 root root 102M Oct 4 17:57 101391.sst
-rw-r--r-- 1 root root 102M Oct 4 17:57 101421.sst
-rw-r--r-- 1 root root 99M Oct 4 17:57 101420.sst
-rw-r--r-- 1 root root 94M Oct 4 17:57 101402.sst
-rw-r--r-- 1 root root 94M Oct 4 17:57 101401.sst
-rw-r--r-- 1 root root 93M Oct 4 17:57 101400.sst
-rw-r--r-- 1 root root 93M Oct 4 17:56 101296.sst
-rw-r--r-- 1 root root 92M Oct 4 17:57 101389.sst
-rw-r--r-- 1 root root 92M Oct 4 17:57 101390.sst
-rw-r--r-- 1 root root 85M Oct 4 17:57 101399.sst
-rw-r--r-- 1 root root 84M Oct 4 17:57 101351.sst
-rw-r--r-- 1 root root 83M Oct 4 17:57 101383.sst
-rw-r--r-- 1 root root 82M Oct 4 17:56 101298.sst
-rw-r--r-- 1 root root 82M Oct 4 17:57 101362.sst
-rw-r--r-- 1 root root 78M Oct 4 17:57 101343.sst
-rw-r--r-- 1 root root 76M Oct 4 17:57 101382.sst
-rw-r--r-- 1 root root 76M Oct 4 17:57 101396.sst
-rw-r--r-- 1 root root 72M Oct 4 17:56 101324.sst
-rw-r--r-- 1 root root 71M Oct 4 17:57 101341.sst
-rw-r--r-- 1 root root 71M Oct 4 17:57 101385.sst
-rw-r--r-- 1 root root 70M Oct 4 17:57 101414.sst
-rw-r--r-- 1 root root 67M Oct 4 17:57 101388.sst
-rw-r--r-- 1 root root 67M Oct 4 17:56 101300.sst
-rw-r--r-- 1 root root 64M Oct 4 17:57 101413.sst
-rw-r--r-- 1 root root 63M Oct 4 17:57 101395.sst
-rw-r--r-- 1 root root 59M Oct 4 17:56 101310.sst
-rw-r--r-- 1 root root 58M Oct 4 17:57 101418.sst
....
I also tried to take a memory profile but didn't understand how to retrieve it after execution. Where is it saved?
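What I tried was roughly the following, assuming the alpha serves the standard Go pprof endpoints on its HTTP port (port 8080 here is an assumption on my side):

curl http://localhost:8080/debug/pprof/heap > alpha_heap.pprof
go tool pprof alpha_heap.pprof

I couldn't tell whether a profile also gets written to disk somewhere, or if it is only available via this endpoint.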
Can someone please help me figure out what's going on?
Thanks