I’m testing badger read performance on roughly 1 TB of data. This post lists the results of the tests.

[Note: This is a work-in-progress document]

# Test 1 (Data without compression)

The badger `benchmark read` tool was used for this test, and the test was carried out on this commit. We ran the test on 836 GB of data written without compression. The level structure of the LSM tree is as follows:

```
Level 0 size: 0 B
Level 1 size: 2.2 GB
Level 2 size: 90 GB
Level 3 size: 361 GB
Total index size: 452 GB
Value log size: 445 GB
Total number of SSTs: 6615
Directory size: 836GB
```

The test samples 1 million keys in random order. We then repeatedly pick a random key from this sample and fetch its value using `txn.Get()`, for a fixed time span. The test runs **without block and index cache** and with **LoadBloomsOnOpen=false**.

We did this for the following time spans:

### 1 Minute Duration

In one minute, the test read 2.3 GB of data at 38 MB/s.

```
badger 2020/08/27 11:53:27 INFO: All 6615 tables opened in 40.167s
...
Time elapsed: 59s, bytes read: 2.3 GB, speed: 38 MB/sec, entries read: 14142937, speed: 239710/sec
```
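As a rough consistency check, the totals in the summary line above imply an average entry size and a throughput. The sketch below is only arithmetic on the logged figures. It assumes the tool's "GB" means 10^9 bytes, and the rounded 2.3 GB figure limits the precision; `avgEntryBytes` and `throughputMBps` are hypothetical helper names.

```go
package main

import "fmt"

// avgEntryBytes returns the mean number of bytes per entry implied by the totals.
func avgEntryBytes(totalBytes float64, entries uint64) float64 {
	return totalBytes / float64(entries)
}

// throughputMBps returns the read speed in MB/s (1 MB = 10^6 bytes, an assumption).
func throughputMBps(totalBytes, seconds float64) float64 {
	return totalBytes / seconds / 1e6
}

func main() {
	const (
		bytesRead = 2.3e9    // "bytes read: 2.3 GB" (rounded in the log)
		entries   = 14142937 // "entries read"
		seconds   = 59.0     // "Time elapsed: 59s"
	)
	fmt.Printf("avg entry size: %.0f bytes\n", avgEntryBytes(bytesRead, entries))
	fmt.Printf("throughput: %.0f MB/s\n", throughputMBps(bytesRead, seconds))
	fmt.Printf("entries/sec: %.0f\n", float64(entries)/seconds)
}
```

Under these assumptions the average entry comes out on the order of 160 bytes. The computed throughput lands slightly above the logged 38 MB/s because the byte total in the log is rounded.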

The heap profile during the test:

```
Showing nodes accounting for 9351.92MB, 100% of 9355.08MB total
Dropped 15 nodes (cum <= 46.78MB)
Showing top 5 nodes out of 19
flat flat% sum% cum cum%
5584.05MB 59.69% 59.69% 9267.22MB 99.06% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
3683.17MB 39.37% 99.06% 3683.17MB 39.37% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 0.89% 100% 83.20MB 0.89% github.com/dgraph-io/badger/v2/skl.newArena
1MB 0.011% 100% 9268.87MB 99.08% github.com/dgraph-io/badger/v2/table.OpenTable
0.50MB 0.0054% 100% 9271.38MB 99.11% github.com/dgraph-io/badger/v2.newLevelsController.func1
```

### 5 Minutes Duration

```
Time elapsed: 04m59s, bytes read: 12 GB, speed: 41 MB/sec, entries read: 77040272, speed: 257659/sec
```

Heap profile after 4 minutes of running:

```
Showing nodes accounting for 12773.51MB, 99.10% of 12889.37MB total
Dropped 68 nodes (cum <= 64.45MB)
Showing top 5 nodes out of 19
flat flat% sum% cum cum%
7647.57MB 59.33% 59.33% 12687.30MB 98.43% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
5039.73MB 39.10% 98.43% 5039.73MB 39.10% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 0.65% 99.08% 83.20MB 0.65% github.com/dgraph-io/badger/v2/skl.newArena
3MB 0.023% 99.10% 12690.80MB 98.46% github.com/dgraph-io/badger/v2/table.OpenTable
0 0% 99.10% 86.62MB 0.67% github.com/dgraph-io/badger/v2.Open
```

### 10 Minutes Duration

```
Time elapsed: 09m59s, bytes read: 25 GB, speed: 42 MB/sec, entries read: 155817374, speed: 260129/sec
```

Heap profile after 9 minutes of running:

```
Showing nodes accounting for 12645.74MB, 99.06% of 12765.66MB total
Dropped 100 nodes (cum <= 63.83MB)
Showing top 5 nodes out of 19
flat flat% sum% cum cum%
7638.31MB 59.83% 59.83% 12559.53MB 98.39% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
4921.23MB 38.55% 98.39% 4921.23MB 38.55% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 0.65% 99.04% 83.20MB 0.65% github.com/dgraph-io/badger/v2/skl.newArena
3MB 0.024% 99.06% 12563.53MB 98.42% github.com/dgraph-io/badger/v2/table.OpenTable
0 0% 99.06% 87.41MB 0.68% github.com/dgraph-io/badger/v2.Open
```

### Full DB Scan

In this test, the full DB is scanned using iterators. It took 57 minutes to read 400 GB of entries, at an average read speed of 117 MB/s.

```
Time elapsed: 56m57s, bytes read: 400 GB, speed: 117 MB/sec, entries read: 2500000000, speed: 731635/sec
```

The heap profile after ~50 minutes of running is as follows:

```
Showing nodes accounting for 12572.05MB, 99.91% of 12583.90MB total
Dropped 50 nodes (cum <= 62.92MB)
Showing top 5 nodes out of 19
flat flat% sum% cum cum%
7568.12MB 60.14% 60.14% 12487.35MB 99.23% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
4919.23MB 39.09% 99.23% 4919.23MB 39.09% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 0.66% 99.89% 83.20MB 0.66% github.com/dgraph-io/badger/v2/skl.newArena
1.50MB 0.012% 99.91% 12488.85MB 99.24% github.com/dgraph-io/badger/v2/table.OpenTable
0 0% 99.91% 86.02MB 0.68% github.com/dgraph-io/badger/v2.Open
```

# Test 2 (Data with compression)

This test was carried out on compressed data. The details of the data are as follows:

```
Level 0 size: 0 B
Level 1 size: 1.1 GB
Level 2 size: 28 GB
Level 3 size: 256 GB
Level 4 size: 36 GB
Total index size: 322 GB
Value log size: 584 GB
Total number of SSTs: 21040
Directory size: 844 GB
```

This test runs with **cache enabled** and **LoadBloomsOnOpen=true**. The test is run using the following parameters:

```
# --index-cache: index cache of 500 MB; --block-cache: block cache of 1 GB.
# -d: duration of x minutes, x ∈ {1, 5, 10}; replace (x) before running.
# --dir: data directory.
go run main.go benchmark read \
  --index-cache=500 \
  --block-cache=1000 \
  -d=(x)m \
  --dir=../../data
```

The results are as follows:

### 1 Minute Duration

```
badger 2020/08/27 13:21:47 INFO: All 21040 tables opened in 1m50.532s
...
Time elapsed: 59s, bytes read: 2.0 GB, speed: 34 MB/sec, entries read: 12606368, speed: 213667/sec
```

Heap profile during the test:

```
Showing nodes accounting for 1174.78MB, 89.28% of 1315.86MB total
Dropped 50 nodes (cum <= 6.58MB)
Showing top 5 nodes out of 79
flat flat% sum% cum cum%
384MB 29.18% 29.18% 384MB 29.18% github.com/dgraph-io/ristretto.newCmRow
354.37MB 26.93% 56.11% 506.88MB 38.52% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
200.70MB 15.25% 71.37% 200.70MB 15.25% github.com/dgraph-io/ristretto/z.(*Bloom).Size
152.51MB 11.59% 82.96% 152.51MB 11.59% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 6.32% 89.28% 83.20MB 6.32% github.com/dgraph-io/badger/v2/skl.newArena
```

### 5 Minutes Duration

```
Time elapsed: 04m59s, bytes read: 11 GB, speed: 36 MB/sec, entries read: 67763608, speed: 226634/sec
```

Heap profile at ~4min during the run:

```
Showing nodes accounting for 1178.46MB, 90.21% of 1306.37MB total
Dropped 45 nodes (cum <= 6.53MB)
Showing top 5 nodes out of 79
flat flat% sum% cum cum%
384MB 29.39% 29.39% 384MB 29.39% github.com/dgraph-io/ristretto.newCmRow
379.39MB 29.04% 58.44% 507.39MB 38.84% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
203.86MB 15.61% 74.04% 203.86MB 15.61% github.com/dgraph-io/ristretto/z.(*Bloom).Size
128MB 9.80% 83.84% 128MB 9.80% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 6.37% 90.21% 83.20MB 6.37% github.com/dgraph-io/badger/v2/skl.newArena
```

### 10 Minutes Duration

```
Time elapsed: 09m59s, bytes read: 22 GB, speed: 36 MB/sec, entries read: 134503361, speed: 224546/sec
```

Heap profile at ~9m:

```
Showing nodes accounting for 1161.13MB, 90.01% of 1290.04MB total
Dropped 63 nodes (cum <= 6.45MB)
Showing top 5 nodes out of 79
flat flat% sum% cum cum%
384MB 29.77% 29.77% 384MB 29.77% github.com/dgraph-io/ristretto.newCmRow
358.72MB 27.81% 57.57% 493.22MB 38.23% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
200.70MB 15.56% 73.13% 200.70MB 15.56% github.com/dgraph-io/ristretto/z.(*Bloom).Size
134.50MB 10.43% 83.56% 134.50MB 10.43% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
83.20MB 6.45% 90.01% 83.20MB 6.45% github.com/dgraph-io/badger/v2/skl.newArena
```

### Full DB Scan

It took 4h24m to read about 8.3 billion entries (1.2 TB) using the iterator, at an average read speed of 76 MB/s.

```
Time elapsed: 04h24m02s, bytes read: 1.2 TB, speed: 76 MB/sec, entries read: 8279106798, speed: 522604/sec
```

The heap profile at ~4h is:

```
Showing nodes accounting for 1185.06MB, 92.34% of 1283.34MB total
Dropped 60 nodes (cum <= 6.42MB)
Showing top 5 nodes out of 39
flat flat% sum% cum cum%
384MB 29.92% 29.92% 384MB 29.92% github.com/dgraph-io/ristretto.newCmRow
328.35MB 25.59% 55.51% 525.86MB 40.98% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
197.51MB 15.39% 70.90% 197.51MB 15.39% github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
192MB 14.96% 85.86% 192MB 14.96% github.com/dgraph-io/ristretto/z.(*Bloom).Size
83.20MB 6.48% 92.34% 83.20MB 6.48% github.com/dgraph-io/badger/v2/skl.newArena
```