Environment and Dgraph Version
dgraph:v23.1.0 with Docker on an AWS EC2 m6a.4xlarge (single-node setup)
Issue
We are seeing very high disk usage on our Dgraph instance.
Used space (docker volume): 996 GB
Size of backup: 3 GB
We do not use a lot of indexing (only three “exact”, one “boolean” and one “@reverse”).
Interestingly, if we set up a new instance from scratch and load the data from the backup, the volume starts at around 6-10 GB. Over a period of two months it grew to 996 GB, and it does not seem to stop increasing.
Why does it use so much disk space, and why does it keep growing over time? Is this expected, and what can we do about it?
Thanks so much for your help.
@magentarepeater In the past, this sort of file bloat has been caused by transaction issues in Badger (Dgraph’s underlying store). In your case, I’m assuming the increase over time occurs when the graph is being mutated. Can you share how you’re accomplishing that? In other words, what language and SDK/API are you using?
In the meantime, if you normally schedule downtime for your cluster (backups, etc.), you might consider trying the `badger flatten` command on your data directory.
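Roughly something like this, run while the Alpha is stopped (the path to the `p` directory below is just an example; use wherever your Alpha’s Badger data actually lives on your volume):

```sh
# Stop the Alpha first, then compact the Badger LSM tree in place.
# /dgraph/p is an assumed path inside your Docker volume.
badger flatten --dir /dgraph/p
```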
Thanks for the answer.
I’m using the JS SDK in NodeJS (v21.3.1). Some mutations are quite complex (upserts), if that helps. Do you need more details?
Can `badger flatten` only be run during downtime? Are there any risks here?
Appreciate your help.
I’d recommend taking a hard look at your transaction management. This section of the SDK README covers it pretty well: GitHub - dgraph-io/dgraph-js: Official Dgraph JavaScript client
I’ve seen cases in clusters where transactions that are not cleaned up correctly (i.e. never committed or discarded) caused this kind of bloat. I’m not saying that’s definitely the problem, but it’s the first place I’d start looking.
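The key pattern from that README is to always discard the transaction in a finally block: `discard()` is a no-op after a successful commit, but it aborts anything that never reached commit. A minimal sketch (the address and predicate names are just placeholders):

```js
const dgraph = require("dgraph-js");
const grpc = require("@grpc/grpc-js");

// Placeholder address; point this at your Alpha's gRPC endpoint.
const stub = new dgraph.DgraphClientStub(
  "localhost:9080",
  grpc.credentials.createInsecure()
);
const client = new dgraph.DgraphClient(stub);

async function setName(uid, name) {
  const txn = client.newTxn();
  try {
    const mu = new dgraph.Mutation();
    mu.setSetJson({ uid, name });
    await txn.mutate(mu);
    await txn.commit();
  } finally {
    // Always discard: it's a no-op after a successful commit, but it
    // aborts (and lets the cluster clean up) any transaction that
    // never reached commit, e.g. because mutate() threw.
    await txn.discard();
  }
}
```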
Thank you.
I just checked all our transactions and we’re using the `setCommitNow(true)` option with all mutations. If I now add `txn.commit()`, it throws `Error: Transaction has already been committed or discarded`. So I assume this is not the problem here (or is it better to use `txn.commit()` instead?).
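For context, a simplified sketch of our pattern (predicate names are made up):

```js
const dgraph = require("dgraph-js");

// Simplified sketch of our mutation pattern; `client` is a DgraphClient.
async function setNameCommitNow(client, uid, name) {
  const txn = client.newTxn();
  try {
    const mu = new dgraph.Mutation();
    mu.setSetJson({ uid, name });
    mu.setCommitNow(true); // mutation is committed as part of mutate()
    await txn.mutate(mu);
    // Adding txn.commit() here is what throws
    // "Transaction has already been committed or discarded".
  } finally {
    await txn.discard(); // no-op once the commitNow mutation succeeded
  }
}
```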
Do you have any other ideas?
Thanks again, appreciate it.
We were finally able to resolve the issue. It had nothing to do with our mutations or upserts.
After analyzing our Badger directories, we found that the culprit was the export directory: over time, a lot of exports had accumulated there. We expected them to be cleaned up after they were uploaded to S3. It would be great if the docs could be updated so people know that their exports are persisted there as well.
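In case it helps anyone else, we now prune that directory after each successful S3 upload. A rough sketch of the cleanup step (the path is specific to our volume layout and purely illustrative):

```js
const fs = require("fs");
const path = require("path");

// Illustrative path: the directory the Alpha writes exports into on our volume.
const EXPORT_DIR = "/dgraph/export";

// Delete local export folders once they have been uploaded to S3.
function pruneLocalExports() {
  for (const entry of fs.readdirSync(EXPORT_DIR)) {
    const full = path.join(EXPORT_DIR, entry);
    if (fs.statSync(full).isDirectory()) {
      fs.rmSync(full, { recursive: true, force: true });
    }
  }
}
```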
Thanks @matthewmcneely for your help!