I have a self-hosted cluster with 2 shards, 3 replicas each. Dgraph version is 22.02, with local patches (rather important export fixes that, after a few months, still can't make it past the ristretto pull requests). Total data is around 6 TiB, 3 TiB per shard. Here is the badger config section:
badger:
  maxlevels: 8
  numgoroutines: 12
  metricsenabled: true
  numcompactors: 8
  levelsizemultiplier: 10
About four months ago I restored the DB from an export and all the data ended up on the last level. It took around a month of non-stop ingestion to reach adequate speed and fill all the levels. I played with levelsizemultiplier (increasing it to 100) to make Dgraph fill the lower levels faster and avoid huge compactions from level 0 to level 6 and, welp, forgot about it.
But now I have restored one of my replicas (actually two, both on the same shard) and once again it is compacting non-stop and hindering all r/w performance. Are there any ways to fill the levels faster than just changing levelsizemultiplier?
Replica levels after a few hours of painful compactions look like this:
Level 0 [ ]: NumTables: 15. Size: 924 MiB of 0 B. Score: 3.00->18.52 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 29 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 4.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 290 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 8.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 2.8 GiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 16 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 28 GiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 32 MiB
Level 6 [B]: NumTables: 1282. Size: 46 GiB of 283 GiB. Score: 0.00->0.00 StaleData: 180 GiB Target FileSize: 64 MiB
Level 7 [ ]: NumTables: 23536. Size: 2.8 TiB of 2.8 TiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 128 MiB
Level Done
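For what it's worth, the targets above are consistent with Badger's dynamic level sizing as I understand it: each level's target is derived top-down from the last level's actual size, divided by levelsizemultiplier at each step and floored at a base level size (10 MiB here, matching Level 1). A rough sketch of that arithmetic (the function and defaults are mine, not Badger's actual code):

```python
# Rough sketch of how Badger appears to derive per-level target sizes
# (dynamic level sizing, working top-down from the last level).
# This is my reading of the behavior, not a copy of the implementation.

def level_targets(last_level_size, max_levels=8, multiplier=10,
                  base_level_size=10 * 1024**2):
    """Return target sizes (bytes) for levels 0..max_levels-1 (level 0 unused)."""
    targets = [0] * max_levels
    size = last_level_size
    # Walk from the last level upward, dividing by the multiplier each
    # step, but never let a target drop below the base level size.
    for lvl in range(max_levels - 1, 0, -1):
        targets[lvl] = max(size, base_level_size)
        size = size // multiplier
    return targets

TiB = 1024**4
for mult in (10, 100):
    t = level_targets(int(2.8 * TiB), multiplier=mult)
    print(mult, [f"{x / 1024**3:.1f} GiB" for x in t[1:]])
```

With multiplier 100 the intermediate targets collapse toward the 10 MiB floor, which is presumably why raising it made the levels "fill" (score reach 1) so much sooner last time.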