Empty Dgraph instance consumes 45GiB Disk

It’s been a while since I last built a Dgraph instance from scratch. The Docker image and older instances started up with minimal disk usage. However, I downloaded and built v20.11.0-ga880912d6 today. On startup, it created the w and zw directories and allocated 20GiB in each, plus 5GiB in the p directory.

du -h
5.0G	./p
 20G	./w
 20G	./zw
 45G	.

In zw and w:

 16G 20 Oct 17:18 00001.wal
4.0G 20 Oct 17:18 wal.meta

So 20 GiB each. Then in p:

  20B 20 Oct 17:16 000002.vlog
 2.0G 20 Oct 17:17 000003.vlog
530B 20 Oct 17:07 000006.sst
 2.0G 20 Oct 17:17 00002.mem
 1.0G 20 Oct 16:37 DISCARD
  28B 20 Oct 16:37 KEYREGISTRY
   5B 20 Oct 17:17 LOCK
 136B 20 Oct 17:17 MANIFEST

An empty DB requires 45GiB of storage. Is there any way to bring this down to a rational figure? I’ve clearly missed something.

Thanks in advance.
Mike

You are referring to just the allocated space, correct? There is probably already a way to change the allocated space. I have not been through all of the startup commands, but it is probably in there somewhere. Or could the allocated space be updated even after the instance starts?

Hey @mikehawkes, Dgraph creates some files upfront and these are memory-mapped. The file sizes you’re seeing are that large because we truncate each file to a large size so that we can memory-map it.

The 45 GB that you’re seeing doesn’t mean that Dgraph is actually consuming 45 GB of disk space. Here’s what I see on my Linux machine:

ls -alh
total 40K
drwx------ 2 ibrahim ibrahim 4.0K Oct 21 15:45 .
drwxrwxr-x 5 ibrahim ibrahim 4.0K Oct 21 15:45 ..
-rw-rw-r-- 1 ibrahim ibrahim 2.0G Oct 21 15:45 000001.vlog
-rw-rw-r-- 1 ibrahim ibrahim  205 Oct 21 15:45 000002.sst
-rw-rw-r-- 1 ibrahim ibrahim  240 Oct 21 15:45 000004.sst
-rw-rw-r-- 1 ibrahim ibrahim 2.0G Oct 21 15:45 00013.mem
-rw-rw-r-- 1 ibrahim ibrahim 1.0G Oct 21 15:45 DISCARD
-rw------- 1 ibrahim ibrahim   28 Oct 21 15:45 KEYREGISTRY
-rw-rw-r-- 1 ibrahim ibrahim    6 Oct 21 15:45 LOCK
-rw------- 1 ibrahim ibrahim   88 Oct 21 15:45 MANIFEST

du -h *
4.0K	000001.vlog
4.0K	000002.sst
4.0K	000004.sst
4.0K	00013.mem
4.0K	DISCARD
4.0K	KEYREGISTRY
4.0K	LOCK
4.0K	MANIFEST

As you can see, ls -al shows the apparent size of the file (which we increase so that mmap can work), but du shows the actual disk space used. On my Linux machine, the disk space used is very small because those files are sparse: most of each file is a hole with no disk blocks allocated.
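
To make the ls versus du difference concrete, here is a minimal Python sketch (not Dgraph’s actual code, which is written in Go) that grows a file with truncate, memory-maps it, and writes a few bytes. The file name demo.wal, the 64 MiB size, and both helper functions are made up for illustration. It assumes a POSIX system where st_blocks is counted in 512-byte units; on a filesystem without sparse-file support, the allocated figure would be expected to come out close to the apparent size instead.

```python
import mmap
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    st_size is the length that `ls -l` reports. st_blocks counts the
    512-byte units actually allocated (an assumption that holds on Linux
    and macOS), which is what `du` reports.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_preallocated_file(path, size):
    """Grow a file to `size` bytes with truncate, then mmap it and touch one page."""
    with open(path, "wb+") as f:
        # Extends the file to `size`; a sparse-capable filesystem
        # allocates no data blocks for the new range.
        f.truncate(size)
        with mmap.mmap(f.fileno(), size) as m:
            m[:5] = b"hello"  # write only into the first page
            m.flush()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.wal")  # hypothetical file name
        make_preallocated_file(path, 64 * 1024 * 1024)
        apparent, allocated = apparent_and_allocated(path)
        print(f"apparent:  {apparent} bytes")
        print(f"allocated: {allocated} bytes")
```

On a filesystem that supports sparse files, the allocated figure should be a small fraction of the apparent one, which is the same pattern as the ls and du output above.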

Running ls -alhs would also show you the disk space actually allocated to each file (first column):

ls -alhs
total 40K
4.0K drwx------ 2 ibrahim ibrahim 4.0K Oct 21 15:45 .
4.0K drwxrwxr-x 5 ibrahim ibrahim 4.0K Oct 21 15:45 ..
4.0K -rw-rw-r-- 1 ibrahim ibrahim 2.0G Oct 21 15:45 000001.vlog
4.0K -rw-rw-r-- 1 ibrahim ibrahim  205 Oct 21 15:45 000002.sst
4.0K -rw-rw-r-- 1 ibrahim ibrahim  240 Oct 21 15:45 000004.sst
4.0K -rw-rw-r-- 1 ibrahim ibrahim 2.0G Oct 21 15:45 00013.mem
4.0K -rw-rw-r-- 1 ibrahim ibrahim 1.0G Oct 21 15:45 DISCARD
4.0K -rw------- 1 ibrahim ibrahim   28 Oct 21 15:45 KEYREGISTRY
4.0K -rw-rw-r-- 1 ibrahim ibrahim    6 Oct 21 15:45 LOCK
4.0K -rw------- 1 ibrahim ibrahim   88 Oct 21 15:45 MANIFEST

That being said, we’re working on improving how we mmap files (and lay them out on disk).

@mikehawkes are you using Windows or Mac? I see that du shows 45 GB of disk usage on your computer, while it is much lower on mine.

"However, I downloaded and built v20.11.0-ga880912d6 today."

I couldn’t find this commit on the Dgraph master branch. Where did you get it from?

Thanks for the reply. Actually, I’m on a Mac and it caused mayhem: the internal drive is only 512G and it’s reasonably full of all manner of assorted junk. I keep source on the local machine and generally keep all other files on an external drive. I had about 50G free internally (the external drive has 12TB) and for some insane reason ran everything on my internal drive. At that point it ran out of disk space, killed off tasks, and prevented things from closing down, writing to disk, etc. Anyway, drive management aside, it would appear that this space had been allocated:

ls -alhs zw
total 41943040
       0 drwx------  4 mikehawkes  staff   136B 20 Oct 16:37 .
       0 drwxr-xr-x  5 mikehawkes  staff   170B 20 Oct 16:37 ..
33554432 -rw-r--r--  1 mikehawkes  staff    16G 20 Oct 17:51 00001.wal
 8388608 -rw-r--r--  1 mikehawkes  staff   4.0G 20 Oct 17:51 wal.meta

Perhaps it’s a Mac ‘thing!’

I just used the git clone https://github.com/dgraph-io/dgraph.git command to get the source. That’s the version number, cut and pasted from the startup output.

I’m concerned because we’re about to kick off an internal training course around Dgraph, and I’d like to have several instances running for different people. If each comes in at 45GiB, then we’re into significant image sizes. Also, for some of our smaller clients we’d use small AWS images (with 16GiB RAM and limited disk capacity), so it could cause problems, unless, as on your Linux instance, the OS just reports a placeholder.
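
One way to check whether a given host really allocates the space or only reports a placeholder is to total both figures over the data directories. The following Python sketch is a rough helper under stated assumptions, not a Dgraph tool: the function names are hypothetical, the directory names p, w and zw are taken from the listing earlier in this thread, and st_blocks is assumed to be in 512-byte units, as on Linux and macOS.

```python
import os


def tree_usage(root):
    """Sum apparent and allocated bytes over all files below `root`."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size            # what `ls -l` reports
            allocated += st.st_blocks * 512   # what `du` reports (assumed 512-byte units)
    return apparent, allocated


def human(n):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024


if __name__ == "__main__":
    # Run from the directory that holds the Dgraph data directories.
    for d in ("p", "w", "zw"):
        if os.path.isdir(d):
            apparent, allocated = tree_usage(d)
            print(f"{d}: apparent {human(apparent)}, allocated {human(allocated)}")
```

If the allocated total stays small while the apparent total is around 45GiB, the host is only reporting file lengths; if the two totals are close, the space is genuinely consumed on that filesystem.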

I’m guessing that the Docker image route hides this ‘grab’ from the underlying OS as I see significantly smaller consumption on my Docker installs?

I wanted to go back to building from source in order to run through everything end-to-end, but I could drop that from the courseware and stick to Docker.