High disk space usage by Dgraph

Dgraph started consuming a lot of disk space.

We had been running Dgraph on a 1 TB disk for almost 5 months.
Over the past month we have had to increase the disk space to 2 TB, and in less than 48 hours it has consumed 0.5 TB of disk space. Please help us identify the root cause of the issue.

FYI-
The amount of data added per day has remained the same since day 1.

Machine specification-

  • AWS instance-type: m5.2xlarge, (8vCPU and 32GB memory)
  • Dgraph version : v1.0.11
    Commit SHA-1 : b2a09c5b
    Commit timestamp : 2018-12-17 09:50:56 -0800
    Branch : HEAD
    Go version : go1.11.1
  • 2TB disk size

How much disk space do the p, w, and zw directories each use? And what do the logs say while disk usage is increasing?
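If it helps, the per-directory sizes can be collected with a short script. This is only a minimal sketch: it assumes the p, w, and zw directories all sit under one base directory (which may not match your deployment), and the names `dir_size` and `report` are made up for illustration.

```python
import os
import sys


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may disappear while the database is running.
                pass
    return total


def report(base, names=("p", "w", "zw")):
    """Map each directory name to its size in bytes; a missing directory counts as 0."""
    return {name: dir_size(os.path.join(base, name)) for name in names}


if __name__ == "__main__":
    # Assumption: the base directory is passed as the first argument,
    # falling back to the current working directory.
    base = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size in report(base).items():
        print(f"{name}: {size / 1024**3:.2f} GiB")
```

Running it a few times while the disk is filling up should show which of the three directories is actually growing.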

There have been many improvements in Dgraph releases since v1.0.11 that reduce disk usage in certain cases, such as snapshot streaming between replicas. You might want to consider upgrading to the latest version of Dgraph.

We are using Dgraph 1.0.16 and have inserted around 16 lakh (1.6 million) sample records.
The p directory uses 18GB of disk space, which is our biggest concern, as these are just sample records.
Our actual data is 20 million+ records, so the disk space comes to around 200GB, which is too much.
Inside the p directory, I found two different types of files: .vlog and .sst.
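To see how the space splits between the two file types, the sizes can be grouped by extension. This is a rough sketch that assumes it is run from the directory containing p; `size_by_extension` is an illustrative name, not part of Dgraph.

```python
import os
from collections import Counter


def size_by_extension(path):
    """Sum file sizes in bytes under path, grouped by file extension."""
    totals = Counter()
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                totals[ext] += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear while the database is running.
                pass
    return totals


if __name__ == "__main__":
    # Assumption: the p directory is in the current working directory.
    for ext, size in size_by_extension("p").most_common():
        print(f"{ext}: {size / 1024**3:.2f} GiB")
```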

What is the .vlog file used for?

If we delete the .vlog files, will that affect the data?

Any idea why it takes up so much disk space, and how we can reduce it?

Thanks,
Nishit

Hey @nshah14285,

You cannot delete the .vlog or .sst files. But a more interesting question in this context is how to optimize Badger/Dgraph for disk space. @dmai, what would you recommend?