[WIP] Performance bottleneck on 1 TB data

I am currently working on a task to see how Dgraph behaves when run with 1 TB of data. I will list my experiments and findings in this post.

Environment:

Linux ashish-test1 5.3.0-1026-gcp #28~18.04.1-Ubuntu SMP Sat Jun 6 00:09:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Dgraph Version:

Dgraph version   : v20.07.0-beta.Jun22-35-gfbae026b3
Dgraph SHA-256   : 7d7c53edebc7a5db8d75983521eb922d3b76526fd8a54de5e627b450f89274e2
Commit SHA-1     : fbae026b3
Commit timestamp : 2020-07-09 16:45:42 +0530
Branch           : release/v20.07
Go version       : go1.14.4

Cluster setup

I am currently running a cluster with 1 Zero and 1 Alpha server.

Workloads

I am currently running workloads from the repository GitHub - linuxerwang/dgraph-bench: A benchmark program for dgraph.

Initial data size of cluster

I bootstrapped the cluster by running the bulk loader on ~800M RDFs. The resulting p directory size is ~50GB. The RDF file can be generated using the data generator present in the same repository.
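
For reference, the generator produces person records connected by friend edges. Below is a minimal sketch in the same spirit; this is not the dgraph-bench generator, and the value formats are purely illustrative (only the predicate names come from the schema below):

```go
package main

import (
	"fmt"
	"strings"
)

// personNQuads builds N-Quad lines for one synthetic person.
// Predicate names (xid, name, created_at) come from the schema used
// in this test; the value formats here are assumptions.
func personNQuads(id int) []string {
	blank := fmt.Sprintf("_:person%d", id)
	return []string{
		fmt.Sprintf("%s <xid> %q .", blank, fmt.Sprintf("person-%d", id)),
		fmt.Sprintf("%s <name> %q .", blank, fmt.Sprintf("name-%d", id)),
		fmt.Sprintf("%s <created_at> \"%d\" .", blank, 1594290000+id),
	}
}

// friendNQuad links two generated persons via friend_of.
func friendNQuad(a, b int) string {
	return fmt.Sprintf("_:person%d <friend_of> _:person%d .", a, b)
}

func main() {
	for i := 1; i <= 2; i++ {
		fmt.Println(strings.Join(personNQuads(i), "\n"))
	}
	fmt.Println(friendNQuad(1, 2))
}
```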

Config file for workloads

bench_cases:
    - name: "get-friends-one-hop"
      concurrency: 8
    - name: "get-friends-two-hop"
      concurrency: 8
    - name: "insert-friend"
      concurrency: 8
    - name: "insert-person"
      concurrency: 8

Schema

<created_at>: int .
<dgraph.acl.rule>: [uid] .
<dgraph.graphql.schema>: string .
<dgraph.graphql.xid>: string @index(exact) @upsert .
<dgraph.rule.permission>: int .
<dgraph.rule.predicate>: string @index(exact) @upsert .
<dt>: default .
<friend_of>: uid @reverse .
<name>: default .
<patronymic>: default .
<rel_1>: [uid] .
<rel_2>: [uid] .
<rel_3>: [uid] .
<surname>: default .
<updated_at>: int .
<xid>: default .

Observations

Throughput numbers:

Note: Time elapsed is the number of seconds since start, so the observations below were taken after running the workloads for around 22 hours.

Time elapsed: 78315, Taskname: get-friends-one-hop, Speed: 101
Time elapsed: 78315, Taskname: insert-friend, Speed: 99
Time elapsed: 78316, Taskname: insert-person, Speed: 111
Time elapsed: 78316, Taskname: insert-friend, Speed: 108
Time elapsed: 78316, Taskname: get-friends-one-hop, Speed: 90
Time elapsed: 78316, Taskname: get-friends-two-hop, Speed: 92
Time elapsed: 78317, Taskname: insert-person, Speed: 90
Time elapsed: 78317, Taskname: insert-friend, Speed: 89
Time elapsed: 78317, Taskname: get-friends-one-hop, Speed: 93
Time elapsed: 78317, Taskname: get-friends-two-hop, Speed: 92
Time elapsed: 78318, Taskname: insert-person, Speed: 93
Time elapsed: 78318, Taskname: insert-friend, Speed: 94
Time elapsed: 78318, Taskname: get-friends-one-hop, Speed: 90
Time elapsed: 78318, Taskname: get-friends-two-hop, Speed: 87
Time elapsed: 78319, Taskname: insert-person, Speed: 91
Time elapsed: 78319, Taskname: insert-friend, Speed: 89
Time elapsed: 78319, Taskname: get-friends-one-hop, Speed: 89
Time elapsed: 78319, Taskname: get-friends-two-hop, Speed: 82
Time elapsed: 78320, Taskname: insert-person, Speed: 89
Time elapsed: 78320, Taskname: insert-friend, Speed: 87
Time elapsed: 78320, Taskname: get-friends-one-hop, Speed: 86
Time elapsed: 78320, Taskname: get-friends-two-hop, Speed: 90
Time elapsed: 78321, Taskname: insert-person, Speed: 85
Time elapsed: 78321, Taskname: get-friends-one-hop, Speed: 92
Time elapsed: 78321, Taskname: insert-friend, Speed: 86
Time elapsed: 78321, Taskname: get-friends-two-hop, Speed: 84

Current size of p directory:

~54GB

Work in progress items

  • We are thinking about ways to reach 1 TB size quickly.
  • We are also thinking of writing our own workloads. The workloads above cover only a small slice of queries and mutations and were used just to get started.
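
For context on the first bullet: the p directory grew from ~50GB (after the bulk load) to ~54GB over the ~22 hours above. A back-of-envelope estimate based only on those numbers shows why the current rate is far too slow:

```go
package main

import "fmt"

func main() {
	// Numbers from the observations above: p grew from ~50GB (after
	// bulk load) to ~54GB over ~22 hours of workload.
	const (
		startGB  = 50.0
		endGB    = 54.0
		hours    = 22.0
		targetGB = 1024.0
	)
	ratePerDay := (endGB - startGB) / hours * 24
	daysToTarget := (targetGB - endGB) / ratePerDay
	fmt.Printf("growth: %.1f GB/day, days to reach 1 TB: %.0f\n", ratePerDay, daysToTarget)
}
```

At roughly 4.4 GB/day, reaching 1 TB would take over 200 days at this rate, hence the need for a faster path such as bulk loading a larger dataset.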

what happened to the like button?

The Vote button overwrites the like button.

Hi, I want to know how many predicates are in your dataset. In my dataset, I have 30+ predicates, which comes to about 17+ GB of RDFs, so 800M RDFs is too small. Also, I think it is necessary to improve the bulk loader's performance with respect to memory use.

Hi @Willem520, I have updated the schema in the original post. We are starting simple and working on coming up with the right benchmarks.

We are also working on optimising the bulk loader's memory usage for large datasets. You can follow the post Bulk loader xidmap memory optimization to track the progress.

Update:
After running for ~3 days, mutations and queries seem to be stuck. I am seeing the following in the client logs.

Time elapsed: 328780, Taskname: insert-friend, Speed: 0
Time elapsed: 328780, Taskname: get-friends-one-hop, Speed: 0
Time elapsed: 328780, Taskname: get-friends-two-hop, Speed: 0
Time elapsed: 328781, Taskname: insert-person, Speed: 0
Time elapsed: 328781, Taskname: get-friends-one-hop, Speed: 0
Time elapsed: 328781, Taskname: insert-friend, Speed: 0
Time elapsed: 328781, Taskname: get-friends-two-hop, Speed: 0
Time elapsed: 328782, Taskname: insert-person, Speed: 0
Time elapsed: 328782, Taskname: insert-friend, Speed: 0
Time elapsed: 328782, Taskname: get-friends-one-hop, Speed: 0
Time elapsed: 328782, Taskname: get-friends-two-hop, Speed: 0
Time elapsed: 328783, Taskname: insert-person, Speed: 0
Time elapsed: 328783, Taskname: insert-friend, Speed: 0
Time elapsed: 328783, Taskname: get-friends-one-hop, Speed: 0
Time elapsed: 328783, Taskname: get-friends-two-hop, Speed: 0
Time elapsed: 328784, Taskname: insert-person, Speed: 0
Time elapsed: 328784, Taskname: insert-friend, Speed: 0
Time elapsed: 328784, Taskname: get-friends-one-hop, Speed: 0
Time elapsed: 328784, Taskname: get-friends-two-hop, Speed: 0
Time elapsed: 328785, Taskname: insert-person, Speed: 0
Time elapsed: 328785, Taskname: insert-friend, Speed: 0
Time elapsed: 328785, Taskname: get-friends-one-hop, Speed: 0
Time elapsed: 328785, Taskname: get-friends-two-hop, Speed: 0

These are the Alpha logs:

E0713 10:58:16.985323   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.985329   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.987652   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.987682   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.987688   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.989567   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.989593   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.989600   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.991562   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.991587   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.991592   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.993539   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.993563   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.993567   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.995574   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.995597   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.995602   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.997786   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.997812   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.997817   14079 node.go:285] While trying to save Raft update: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists. Retrying...
E0713 10:58:16.999862   14079 log.go:32] writeRequests: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists
E0713 10:58:16.999893   14079 log.go:32] WatchBatch.Cancel error while finishing: Create value log file. Path=w/006871.vlog. Error=open w/006871.vlog: file exists

Alpha w directory content:

total 1.8M
-rw-rw-r-- 1 ashish ashish    6 Jul  9 13:29 LOCK
-rw------- 1 ashish ashish   28 Jul  9 13:29 KEYREGISTRY
-rw------- 1 ashish ashish 1.3M Jul 11 00:49 006869.vlog
-rw------- 1 ashish ashish   20 Jul 11 00:49 006871.vlog
-rw------- 1 ashish ashish 385K Jul 11 00:49 004148.sst
-rw------- 1 ashish ashish  90K Jul 11 00:49 MANIFEST
-rw------- 1 ashish ashish  21G Jul 13 10:58 006870.vlog

After restarting, the Alpha logs are as below:

 BadgerCompressionLevel:3 WALDir:w MutationsMode:0 AuthToken: AllottedMemory:15072.5 HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s AclRefreshInterval:0s}
I0713 11:13:35.759000    1642 server_state.go:78] Setting Badger Compression Level: 3
I0713 11:13:35.759014    1642 server_state.go:87] Setting Badger table load option: mmap
I0713 11:13:35.759019    1642 server_state.go:99] Setting Badger value log load option: mmap
I0713 11:13:35.759026    1642 server_state.go:144] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x2ba7138 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:false MaxCacheSize:10485760 MaxBfCacheSize:0 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false KeepBlockIndicesInCache:false KeepBlocksInCache:false managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0713 11:13:35.773452    1642 log.go:34] All 1 tables opened in 1ms
I0713 11:13:35.776038    1642 log.go:34] Replaying file id: 6869 at offset: 1341678
I0713 11:13:35.776107    1642 log.go:34] Replay took: 49.037µs
2020/07/13 11:13:35 file size: 265012884 greater than 4294967295
github.com/dgraph-io/badger/v2/y.AssertTruef
        /home/ashish/projects/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/y/error.go:62
github.com/dgraph-io/badger/v2.(*logFile).open
        /home/ashish/projects/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/value.go:914
github.com/dgraph-io/badger/v2.(*valueLog).open
        /home/ashish/projects/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/value.go:1121
github.com/dgraph-io/badger/v2.Open
        /home/ashish/projects/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/db.go:386
github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/worker/server_state.go:147
github.com/dgraph-io/dgraph/worker.InitServerState
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/worker/server_state.go:54
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:655
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:95
github.com/spf13/cobra.(*Command).execute
        /home/ashish/projects/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
        /home/ashish/projects/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /home/ashish/projects/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
        /home/ashish/projects/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
        /home/ashish/go/src/runtime/proc.go:203
runtime.goexit
        /home/ashish/go/src/runtime/asm_amd64.s:1373

We are looking into this.
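
One note on the assertion above: 4294967295 is math.MaxUint32, and Badger v2 addresses positions inside a value log file with 32-bit offsets, so the 21 GB 006870.vlog shown in the w directory listing lies far beyond the addressable range. A tiny illustration of that bound (the numbers are from the listing above; the rest is just arithmetic):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const gb = uint64(1) << 30
	vlogSize := 21 * gb // approximate size of 006870.vlog above

	// Value-log offsets in Badger v2 are uint32, so a single .vlog
	// file must stay within math.MaxUint32 (= 4294967295) bytes.
	fmt.Println(vlogSize > math.MaxUint32)
}
```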

A post was split to a new topic: Rewrite Dgraph in Rust

In addition to the 1 TB data testing, we could add some other datasets to the test, since small projects for sure won't get there.

This project seems to collect blockchain data. GitHub - citp/BlockSci: A high-performance tool for blockchain science and exploration
It can reach 300GB on disk.

https://citp.github.io/BlockSci/readme.html

Current tools for blockchain analysis depend on general-purpose databases that provide “ACID” guarantees. But that’s unnecessary for blockchain analysis where the data structures are append-only. We take advantage of this observation in the design of our custom in-memory blockchain database as well as an analysis library. BlockSci’s core infrastructure is written in C++ and optimized for speed. (For example, traversing every transaction input and output on the Bitcoin blockchain takes only 1 second on our r4.2xlarge EC2 machine.) To make analysis more convenient, we provide Python bindings and a Jupyter notebook interface.

We recommend using instance with 60 GB of memory or more for optimal performance (r5.2xlarge). As of August 2019 the default disk size of 500GB may not suffice anymore, we therefore recommend choosing a larger disk size (e.g., 600 GB) when you create the instance. On boot, a Jupyter Notebook running BlockSci will launch immediately. To access the notebook, you must set up port forwarding to your computer. Inserting the name of your private key file and the domain of your ec2 instance into the following command will make the Notebook available on your machine.

I am going to add more examples we can append.

Update: 14/07/2020

I reran Dgraph after the previous crash.

Throughput numbers:

Time elapsed: 66894, Taskname: insert-person, Speed: 117
Time elapsed: 66894, Taskname: insert-friend, Speed: 114
Time elapsed: 66894, Taskname: get-friends-two-hop, Speed: 94
Time elapsed: 66895, Taskname: insert-person, Speed: 115
Time elapsed: 66895, Taskname: get-friends-one-hop, Speed: 100
Time elapsed: 66895, Taskname: insert-friend, Speed: 112
Time elapsed: 66895, Taskname: get-friends-two-hop, Speed: 90
Time elapsed: 66896, Taskname: get-friends-one-hop, Speed: 102
Time elapsed: 66896, Taskname: insert-friend, Speed: 111
Time elapsed: 66896, Taskname: insert-person, Speed: 117
Time elapsed: 66896, Taskname: get-friends-two-hop, Speed: 94
Time elapsed: 66897, Taskname: get-friends-one-hop, Speed: 96
Time elapsed: 66897, Taskname: insert-person, Speed: 114
Time elapsed: 66897, Taskname: insert-friend, Speed: 104
Time elapsed: 66897, Taskname: get-friends-two-hop, Speed: 96
Time elapsed: 66898, Taskname: get-friends-one-hop, Speed: 95
Time elapsed: 66898, Taskname: insert-person, Speed: 124
Time elapsed: 66898, Taskname: insert-friend, Speed: 123

Directory sizes

54G	./p
4.1M	./w
2.4G	./zw

RES usage shown in htop

16GB

Profile files for Heap and CPU

mem.pprof (259.9 KB) cpu.pprof (110.7 KB)
heap top:

File: dgraph
Build ID: 371190db1475fb14a7dfd3c44905e712b120ca09
Type: inuse_space
Time: Jul 14, 2020 at 9:39am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 1016.81MB, 97.43% of 1043.60MB total
Dropped 144 nodes (cum <= 5.22MB)
Showing top 20 nodes out of 84
      flat  flat%   sum%        cum   cum%
  384.79MB 36.87% 36.87%   384.79MB 36.87%  github.com/dgraph-io/ristretto.newCmRow (inline)
  292.10MB 27.99% 64.86%   292.10MB 27.99%  github.com/dgraph-io/ristretto/z.(*Bloom).Size (inline)
  166.41MB 15.95% 80.81%   166.41MB 15.95%  github.com/dgraph-io/badger/v2/skl.newArena (inline)
      69MB  6.61% 87.42%       69MB  6.61%  github.com/dgraph-io/badger/v2/table.NewTableBuilder
   44.63MB  4.28% 91.70%    51.37MB  4.92%  github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
   41.22MB  3.95% 95.65%    51.72MB  4.96%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
   10.50MB  1.01% 96.65%    10.50MB  1.01%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
    6.89MB  0.66% 97.31%     6.89MB  0.66%  github.com/dgraph-io/badger/v2/table.OpenTable
    1.27MB  0.12% 97.43%   580.10MB 55.59%  github.com/dgraph-io/ristretto.NewCache
         0     0% 97.43%    49.40MB  4.73%  github.com/dgraph-io/badger/v2.(*DB).RunValueLogGC
         0     0% 97.43%   166.41MB 15.95%  github.com/dgraph-io/badger/v2.(*DB).doWrites.func1
         0     0% 97.43%   166.41MB 15.95%  github.com/dgraph-io/badger/v2.(*DB).ensureRoomForWrite
         0     0% 97.43%    48.90MB  4.69%  github.com/dgraph-io/badger/v2.(*DB).get
         0     0% 97.43%   166.41MB 15.95%  github.com/dgraph-io/badger/v2.(*DB).writeRequests
         0     0% 97.43%    49.24MB  4.72%  github.com/dgraph-io/badger/v2.(*Iterator).Seek
         0     0% 97.43%    46.86MB  4.49%  github.com/dgraph-io/badger/v2.(*IteratorOptions).pickTables
         0     0% 97.43%    51.65MB  4.95%  github.com/dgraph-io/badger/v2.(*Txn).NewIterator
         0     0% 97.43%    51.65MB  4.95%  github.com/dgraph-io/badger/v2.(*Txn).NewKeyIterator
         0     0% 97.43%    51.65MB  4.95%  github.com/dgraph-io/badger/v2.(*levelHandler).appendIterators
         0     0% 97.43%    48.90MB  4.69%  github.com/dgraph-io/badger/v2.(*levelHandler).get

What does Speed: 101 mean? 101 seconds? 101 milliseconds?

@kokizzu, these are throughput numbers. Speed means the number of tasks/sec.

Hmm, what's the spec of the machine?
I have baselines for other types of databases on an i7-4770HQ, 32GB, SSD with a similar load (insert first for a few seconds, then insert/update while selecting until all updates completed) from about a year ago. The average insert performance:
MemSQL 6.7 Rowstore: 4587 rows/sec
ScyllaDB 3.0: 2049 rows/sec
MemSQL 6.8 Colstore: 3484 rows/sec
PostgreSQL 10.7: 172 rows/sec
NuoDB 3.4.1: 154 rows/sec
MySQL 5.7: 97 rows/sec
CrateDB 3.2.7: 73 rows/sec
CockroachDB 19.1: 52 rows/sec
YugaByte 1.2.5: 192 rows/sec
Clickhouse 19.7.3 NotBatched: 201 rows/sec

The insert performance for Dgraph seems equal to MySQL's (ignoring the specs and data size), but the select performance seems slow (about equal to insert)?

We were able to reproduce the above issue after a rerun. I have shared the logs with @ibrahim.

Since the above rate is too slow to reach 1 TB, I am now trying a bulk load with Stack Overflow data. I am loading the following file; it has around ~1.7B RDFs. Currently no schema is being used.

-rwxr-xr-x 1 ashish ashish 57G Jul 15 17:00 posts.rdf.gz

Observation:

The bulk loader went OOM in the reduce phase. The map-to-reduce phase transition logs are as follows:

[16:25:40Z] MAP 02h33m08s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:41Z] MAP 02h33m09s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:42Z] MAP 02h33m10s nquad_count:1.727G err_count:0.000 nquad_speed:188.0k/sec edge_count:1.727G edge_speed:188.0k/sec
[16:25:43Z] MAP 02h33m11s nquad_count:1.727G err_count:0.000 nquad_speed:187.9k/sec edge_count:1.727G edge_speed:187.9k/sec
[16:25:44Z] MAP 02h33m12s nquad_count:1.727G err_count:0.000 nquad_speed:187.9k/sec edge_count:1.727G edge_speed:187.9k/sec
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
[16:25:45Z] REDUCE 02h33m13s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:46Z] REDUCE 02h33m14s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:47Z] REDUCE 02h33m15s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:48Z] REDUCE 02h33m16s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[16:25:49Z] REDUCE 02h33m17s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0

Most of the map phase heap is taken by the xidmap. The heap profile (in-use) top is as follows:

ashish@ashish-test1:~/pprof$ go tool pprof pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz 
File: dgraph
Build ID: 371190db1475fb14a7dfd3c44905e712b120ca09
Type: inuse_space
Time: Jul 16, 2020 at 3:45pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 26623.49MB, 98.43% of 27049.15MB total
Dropped 72 nodes (cum <= 135.25MB)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
21492.55MB 79.46% 79.46% 21510.06MB 79.52%  github.com/dgraph-io/dgraph/xidmap.(*XidMap).AssignUid
 3455.05MB 12.77% 92.23%  3455.05MB 12.77%  strings.(*Builder).WriteString
  489.11MB  1.81% 94.04%   489.11MB  1.81%  github.com/dgraph-io/dgraph/types.Marshal
  483.27MB  1.79% 95.83%   483.27MB  1.79%  bytes.makeSlice
  264.54MB  0.98% 96.80%   264.54MB  0.98%  github.com/dgraph-io/dgraph/posting.NewPosting
  186.01MB  0.69% 97.49%   186.01MB  0.69%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newMapper.func1
  145.31MB  0.54% 98.03%   145.31MB  0.54%  strconv.Unquote
   60.01MB  0.22% 98.25%   313.55MB  1.16%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Parse
   37.63MB  0.14% 98.39%   223.64MB  0.83%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).addMapEntry
      10MB 0.037% 98.43%   179.81MB  0.66%  github.com/dgraph-io/dgraph/chunker.ParseRDF
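
To see why AssignUid dominates: the loader has to remember the uid for every distinct external id for the whole load. Below is a toy stdlib-only version of that mapping; the real xidmap is more elaborate, but the memory characteristic (one long-lived entry per unique xid) is the same:

```go
package main

import "fmt"

// xidMap is a toy version of the loader's xid -> uid assignment: every
// distinct external id costs an entry that lives for the whole load,
// which is why memory grows with the number of unique xids.
type xidMap struct {
	next uint64
	uids map[string]uint64
}

func newXidMap() *xidMap {
	return &xidMap{next: 1, uids: make(map[string]uint64)}
}

// assignUid returns the uid already given to xid, or allocates the next one.
func (m *xidMap) assignUid(xid string) uint64 {
	if uid, ok := m.uids[xid]; ok {
		return uid
	}
	uid := m.next
	m.next++
	m.uids[xid] = uid
	return uid
}

func main() {
	m := newXidMap()
	fmt.Println(m.assignUid("person-1")) // 1
	fmt.Println(m.assignUid("person-2")) // 2
	fmt.Println(m.assignUid("person-1")) // 1: already assigned, no new entry
}
```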

Next, I will retry the reduce phase.

This will be fixed by

Is this issue a blocker for 20.03.4 and 20.07?

No. It is not a release blocker.

I tried running the bulk loader while skipping the map phase. It is still going OOM. In the logs I can see it reaches OOM after inserting ~450M RDFs.

[09:29:41Z] REDUCE 18m55s edge_count:449.4M edge_speed:396.1k/sec plist_count:411.5M plist_speed:362.7k/sec. Num Encoding: 0
writeCh: Time taken to write req: 1.259s
[09:29:42Z] REDUCE 18m56s edge_count:449.5M edge_speed:395.8k/sec plist_count:411.5M plist_speed:362.4k/sec. Num Encoding: 2
[09:29:43Z] REDUCE 18m57s edge_count:449.7M edge_speed:395.6k/sec plist_count:411.7M plist_speed:362.2k/sec. Num Encoding: 2
[09:29:45Z] REDUCE 18m59s edge_count:450.1M edge_speed:395.4k/sec plist_count:412.1M plist_speed:362.0k/sec. Num Encoding: 1
writeCh: Time taken to write req: 1.15s
[09:29:46Z] REDUCE 19m00s edge_count:450.2M edge_speed:395.1k/sec plist_count:412.2M plist_speed:361.7k/sec. Num Encoding: 1
writeCh: Time taken to write req: 1.168s

The heap profile for the above is below. Most of the heap is taken by the mapIterator.

Type: inuse_space
Time: Jul 17, 2020 at 11:02am (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 17.45GB, 98.24% of 17.76GB total
Dropped 68 nodes (cum <= 0.09GB)
Showing top 10 nodes out of 32
      flat  flat%   sum%        cum   cum%
   10.95GB 61.69% 61.69%    10.98GB 61.81%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapIterator).startBatching.func1
    4.15GB 23.39% 85.08%    15.13GB 85.20%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapIterator).startBatching
    0.66GB  3.69% 88.77%     0.66GB  3.69%  github.com/dgraph-io/dgraph/protos/pb.(*PostingList).Marshal
    0.50GB  2.82% 91.60%     0.50GB  2.82%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
    0.37GB  2.07% 93.66%     0.51GB  2.88%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
    0.34GB  1.90% 95.56%     0.34GB  1.90%  github.com/dgraph-io/badger/v2/table.NewTableBuilder
    0.21GB  1.20% 96.75%     0.90GB  5.05%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).toList.func2
    0.14GB  0.81% 97.57%     0.14GB  0.81%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
    0.10GB  0.55% 98.12%     0.10GB  0.55%  compress/flate.(*dictDecoder).init (inline)
    0.02GB  0.12% 98.24%     0.12GB  0.67%  compress/flate.NewReader
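
The profile points at startBatching holding most of the heap, i.e. map output being buffered faster than the reducer drains it. A generic illustration (names hypothetical, not the bulk loader's code) of bounding such a producer with a fixed-capacity channel, which caps the number of in-flight batches:

```go
package main

import (
	"fmt"
	"sync"
)

// produceBatches sends batches into a channel whose capacity bounds
// how many undelivered batches can sit in memory at once; an
// unbounded buffer (what the profile above suggests) has no such cap.
func produceBatches(n, capacity int, consume func([]byte)) {
	ch := make(chan []byte, capacity) // at most `capacity` batches buffered
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for b := range ch {
			consume(b)
		}
	}()
	for i := 0; i < n; i++ {
		ch <- make([]byte, 1024) // blocks when the buffer is full
	}
	close(ch)
	wg.Wait()
}

func main() {
	var consumed int
	produceBatches(100, 4, func(b []byte) { consumed += len(b) })
	fmt.Println(consumed)
}
```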

Since the bulk loader is going OOM (even though we don't have a lot of xids, as we are able to complete the map phase), I switched to the live loader.

I ran the live loader on a 1 Alpha, 1 Zero cluster.

Live loader first run

The live loader was able to insert around ~1B RDFs in a bit more than 4 hours before going OOM.
The size of the generated p directory was around 124GB.

[22:36:29Z] Elapsed: 04h14m01s Txns: 1056125 N-Quads: 1056125000 N-Quads/s [last 5s]: 86400 Aborts: 154
[22:36:34Z] Elapsed: 04h14m05s Txns: 1056184 N-Quads: 1056184000 N-Quads/s [last 5s]: 11800 Aborts: 154
[22:36:39Z] Elapsed: 04h14m10s Txns: 1056562 N-Quads: 1056562000 N-Quads/s [last 5s]: 75600 Aborts: 154
[22:36:44Z] Elapsed: 04h14m15s Txns: 1056943 N-Quads: 1056943000 N-Quads/s [last 5s]: 76200 Aborts: 154
[22:36:49Z] Elapsed: 04h14m20s Txns: 1057313 N-Quads: 1057313000 N-Quads/s [last 5s]: 74000 Aborts: 154
[22:36:54Z] Elapsed: 04h14m25s Txns: 1057655 N-Quads: 1057655000 N-Quads/s [last 5s]: 68400 Aborts: 154
[22:36:59Z] Elapsed: 04h14m30s Txns: 1058024 N-Quads: 1058024000 N-Quads/s [last 5s]: 73800 Aborts: 154
[22:37:04Z] Elapsed: 04h14m35s Txns: 1058334 N-Quads: 1058334000 N-Quads/s [last 5s]: 62000 Aborts: 154
[22:37:09Z] Elapsed: 04h14m40s Txns: 1058653 N-Quads: 1058653000 N-Quads/s [last 5s]: 63800 Aborts: 154
[22:37:14Z] Elapsed: 04h14m45s Txns: 1059000 N-Quads: 1059000000 N-Quads/s [last 5s]: 69400 Aborts: 154
[22:37:19Z] Elapsed: 04h14m50s Txns: 1059319 N-Quads: 1059319000 N-Quads/s [last 5s]: 63800 Aborts: 154

Live loader second run

In the second run of the live loader (on a fresh cluster), it is running very slowly. I am still investigating this. After running for around 4 hours it has only inserted ~25M RDFs. The current p directory size is ~1.6GB.

[12:40:15Z] Elapsed: 04h04m35s Txns: 25576 N-Quads: 25576000 N-Quads/s [last 5s]:   800 Aborts: 2276414
[12:40:20Z] Elapsed: 04h04m40s Txns: 25583 N-Quads: 25583000 N-Quads/s [last 5s]:  1400 Aborts: 2276846
[12:40:25Z] Elapsed: 04h04m45s Txns: 25586 N-Quads: 25586000 N-Quads/s [last 5s]:   600 Aborts: 2277396
[12:40:30Z] Elapsed: 04h04m50s Txns: 25592 N-Quads: 25592000 N-Quads/s [last 5s]:  1200 Aborts: 2278224
[12:40:35Z] Elapsed: 04h04m55s Txns: 25598 N-Quads: 25598000 N-Quads/s [last 5s]:  1200 Aborts: 2279403
[12:40:40Z] Elapsed: 04h05m00s Txns: 25605 N-Quads: 25605000 N-Quads/s [last 5s]:  1400 Aborts: 2280209
[12:40:45Z] Elapsed: 04h05m05s Txns: 25614 N-Quads: 25614000 N-Quads/s [last 5s]:  1800 Aborts: 2280740
[12:40:50Z] Elapsed: 04h05m10s Txns: 25618 N-Quads: 25618000 N-Quads/s [last 5s]:   800 Aborts: 2281330
[12:40:55Z] Elapsed: 04h05m15s Txns: 25624 N-Quads: 25624000 N-Quads/s [last 5s]:  1200 Aborts: 2282145
[12:41:00Z] Elapsed: 04h05m20s Txns: 25630 N-Quads: 25630000 N-Quads/s [last 5s]:  1200 Aborts: 2283247
[12:41:05Z] Elapsed: 04h05m25s Txns: 25633 N-Quads: 25633000 N-Quads/s [last 5s]:   600 Aborts: 2284115
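
The abort count above suggests heavy transaction conflicts in this run. Clients typically retry aborted transactions with backoff; here is a generic stdlib-only sketch (the sentinel error stands in for dgo's aborted-transaction error; none of this is the live loader's actual code):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errAborted = errors.New("txn aborted") // stands in for dgo's aborted-transaction error

// withRetry retries fn on abort with exponential backoff, the usual
// client-side response to conflicting transactions. Other errors are
// returned immediately.
func withRetry(attempts int, fn func() error) error {
	backoff := 10 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !errors.Is(err, errAborted) {
			return err
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	return err
}

func main() {
	fails := 2
	err := withRetry(5, func() error {
		if fails > 0 {
			fails--
			return errAborted // simulate two conflicting attempts
		}
		return nil
	})
	fmt.Println("final error:", err)
}
```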

@ashishgoswami, let's sync. I ran a 1B RDF live load with 20.07 and it completed successfully. It took almost 43 hours.

Just to confirm, you guys are running in ludicrous mode, right?

My tests were part of the release criterion with defaults (i.e. ludicrous mode off).

1 Like