Why did the p folder size increase by 100G after deleting all nodes and relations?

I use Dgraph v22. When I deleted all nodes and relations, the size of the p folder increased by 100G, mostly in the vlog files, and the sst files did not decrease either.

I used the following N-Quad to delete all nodes and relations:

* * .

Kindly awaiting your response!

Thanks a lot!

Hi @doushubao1984,

Without knowing the specifics of your graph, I ran the following test:

Load the 1million movie database via the live loader.
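For reference, a typical live-loader invocation looks something like this (the file names here are assumed; adjust to your dataset):

dgraph live -f 1million.rdf.gz -s 1million.schema --alpha localhost:9080 --zero localhost:5080

Then, ls -l of the p directory: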

/Users/matthew/test-dgraph-data/p:
total 152224
-rw-r--r--  1 matthew  staff    22424252 Dec 28 13:34 000001.sst
-rw-r--r--  1 matthew  staff  2147483646 Dec 28 13:27 000001.vlog
-rw-r--r--  1 matthew  staff    21006320 Dec 28 13:34 000002.sst
-rw-r--r--  1 matthew  staff   134217728 Dec 28 13:34 00003.mem
-rw-r--r--  1 matthew  staff     1048576 Dec 28 13:27 DISCARD
-rw-------  1 matthew  staff          28 Dec 28 13:26 KEYREGISTRY
-rw-r--r--  1 matthew  staff           2 Dec 28 13:26 LOCK
-rw-------  1 matthew  staff          44 Dec 28 13:34 MANIFEST

Delete the Person nodes via:

upsert {
  query {
    PERSONS as var(func: type(Person)) {
      uid
    }
  }

  mutation {
    delete {
      uid(PERSONS) * * .
    }
  }
}
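For reference, sending that upsert over HTTP should look something like this, assuming the default alpha port and a commit-now mutation (DQL upserts with RDF mutations are posted to /mutate with an application/rdf content type):

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" --data-binary 'upsert {
  query {
    PERSONS as var(func: type(Person)) { uid }
  }
  mutation {
    delete {
      uid(PERSONS) * * .
    }
  }
}'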

After the delete with the wildcard mutation:

-rw-r--r--  1 matthew  staff    22424252 Dec 28 13:34 000001.sst
-rw-r--r--  1 matthew  staff  2147483646 Dec 28 13:59 000001.vlog
-rw-r--r--  1 matthew  staff    21006320 Dec 28 13:34 000002.sst
-rw-r--r--  1 matthew  staff    18763120 Dec 28 13:59 000003.sst
-rw-r--r--  1 matthew  staff   134217728 Dec 28 13:59 00004.mem
-rw-r--r--  1 matthew  staff     1048576 Dec 28 13:27 DISCARD
-rw-------  1 matthew  staff          28 Dec 28 13:26 KEYREGISTRY
-rw-r--r--  1 matthew  staff           2 Dec 28 13:26 LOCK
-rw-------  1 matthew  staff          58 Dec 28 13:59 MANIFEST

Note that the write-ahead/value log (.vlog) didn’t grow beyond its initial 2GB size. Now, this was a fresh graph, so I’m guessing things are different for you if the size increase was >100GB.

One thing I’ve done in the past is to back up the db using badger, which seems to “clear out” (my words) the vlog. Stop the cluster and then run badger backup --dir p. After this, the vlogs are smaller:

total 136264
-rw-r--r--  1 matthew  staff  22424252 Dec 28 13:34 000001.sst
-rw-r--r--  1 matthew  staff   1144245 Dec 28 14:15 000001.vlog
-rw-r--r--  1 matthew  staff  21006320 Dec 28 13:34 000002.sst
-rw-r--r--  1 matthew  staff        20 Dec 28 14:17 000002.vlog
-rw-r--r--  1 matthew  staff  18763120 Dec 28 13:59 000003.sst
-rw-r--r--  1 matthew  staff   5359340 Dec 28 14:15 000004.sst
-rw-r--r--  1 matthew  staff   1048576 Dec 28 13:27 DISCARD
-rw-------  1 matthew  staff        28 Dec 28 13:26 KEYREGISTRY
-rw-------  1 matthew  staff        72 Dec 28 14:17 MANIFEST
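If you want to reclaim the space fully, you could take that a step further and restore the backup into a fresh directory, then swap it in for p. Roughly, with the cluster stopped (flag names are from the badger CLI help; double-check them against the badger version your Dgraph bundles):

badger backup --dir p --backup-file dgraph.bak
badger restore --dir p-restored --backup-file dgraph.bak

The restored copy contains only live data, so it’s typically much smaller than the original p directory.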

One final note: if your goal is to remove ALL nodes (as you mentioned in the title), a more efficient way is available through the admin endpoint: curl -X POST localhost:8080/alter -d '{"drop_op": "DATA"}'
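For completeness, the alter endpoint also supports the other drop operations from the Dgraph docs, in case a finer-grained cleanup fits better:

# Drop all data but keep the schema.
curl -X POST localhost:8080/alter -d '{"drop_op": "DATA"}'

# Drop all data and the schema.
curl -X POST localhost:8080/alter -d '{"drop_op": "ALL"}'

# Drop a single predicate and its data ("name" is a placeholder predicate).
curl -X POST localhost:8080/alter -d '{"drop_op": "ATTR", "drop_value": "name"}'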

Thank you very much for your reply, Matthewmcneely!

In your example, after deleting the Person nodes, why was a new sst file named 000003.sst generated?

Thanks a lot.

I won’t pretend to understand when/why new .sst files (kv-stores) are created, but at some point RunValueLogGC (a badger function) will be called by the alpha to clean up vlogs. Also, running badger info on the p directory actually deleted one of the stale vlogs.
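For the curious, the badger idiom for value log GC (taken from badger’s own docs) looks roughly like the sketch below. This is a minimal standalone sketch, assuming badger v3 (what Dgraph v22 bundles) and an unencrypted p directory with the cluster stopped; it only illustrates the API call, it is not what the alpha literally runs:

package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Open the directory directly; the alpha must not be running,
	// since badger holds an exclusive lock on it.
	db, err := badger.Open(badger.DefaultOptions("p"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Rewrite any vlog file in which at least 50% of the data is stale,
	// looping until badger reports there is nothing left to rewrite.
	for {
		if err := db.RunValueLogGC(0.5); err != nil {
			break // badger.ErrNoRewrite (or a real error) ends the loop
		}
	}
}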

Check out this thread for more details on how to flatten sst files and other insights on how Dgraph manages the files in the p folder: Database becomes much smaller when reimported
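The flattening discussed there boils down to badger’s flatten command; with the cluster stopped, it should be roughly:

badger flatten --dir p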

Thanks a lot for your reply!

Was it that you just had to wait for this Badger function to be called in order to see your storage space reclaimed?

Yes, at some point the alpha seems to get around to calling RunValueLogGC and things get cleaned up.

One other thing I’ve discovered is that if I invoke /admin/shutdown to stop an alpha, the p folder is left in a better state than if I simply stop the container (sending the alpha a SIGTERM).
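For reference, on recent Dgraph versions that clean shutdown is a GraphQL mutation against the /admin endpoint; something like this should trigger it (assuming the default 8080 port):

curl -s -X POST localhost:8080/admin -H 'Content-Type: application/json' -d '{"query": "mutation { shutdown { response { message code } } }"}'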
