Log compaction dropping some data

Report a Dgraph Bug

When log compaction happens, most of my nodes lose their “dgraph.type” predicate, resulting in wrong data being returned by queries that rely on it (particularly when serving the GraphQL API).

What version of Dgraph are you using?

Dgraph Version
$ dgraph version
 
Dgraph version   : v21.12.0
Dgraph codename  : zion
Dgraph SHA-256   : 078c75df9fa1057447c8c8afc10ea57cb0a29dfb22f9e61d8c334882b4b4eb37
Commit SHA-1     : d62ed5f15
Commit timestamp : 2021-12-02 21:20:09 +0530
Branch           : HEAD
Go version       : go1.17.3
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.

Have you tried reproducing the issue with the latest release?

I am using the latest release.

What is the hardware spec (RAM, OS)?

Ubuntu 21.10 (GNU/Linux 5.13.0-30-generic x86_64), 24-thread CPU, 126 GB RAM.

Steps to reproduce the issue (command/config used to run Dgraph).

I launched Dgraph with the following config:

dgraph zero --wal /var/lib/dgraph/zw --my localhost:5080
dgraph alpha -p /var/lib/dgraph/p -w /var/lib/dgraph/w --my localhost:7080 --zero localhost:5080 --security "token=some_token; whitelist=;" --lambda "url=; num=1; port=8686; restart-after=30s; " --limit "mutations=strict" --cache "size-mb=65536; "

I then set the GraphQL schema through the /admin/schema endpoint.
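For context, pushing the schema is just an HTTP POST to the alpha. It looks roughly like this; the RoninBlock type is a simplified stand-in for my real schema, and the port and token match the alpha flags above:

# simplified placeholder schema; my real one has more types and fields
cat > /tmp/schema.graphql <<'EOF'
type RoninBlock {
  number: Int! @search
}
EOF

# the token matches the --security "token=..." flag passed to the alpha
curl -s -X POST localhost:8080/admin/schema \
  -H "X-Dgraph-AuthToken: some_token" \
  --data-binary '@/tmp/schema.graphql'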

Following that, I start the dgraph live loader with the following command:

dgraph live -f /tmp/pipe.rdf -t some_token -c 1 -x /tmp/uids/

Note that /tmp/pipe.rdf is actually a Unix FIFO, so that I can read my data source, translate it to N-Quads, and feed it to the tool in a streaming fashion, as sketched below.
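For completeness, the streaming setup is roughly this; my_exporter and my_rdf_converter are placeholder names for my own tooling, and the N-Quads in the comments are a simplified example of what it emits:

mkfifo /tmp/pipe.rdf
# placeholder producer pipeline; it writes N-Quads such as:
#   _:b1 <dgraph.type> "RoninBlock" .
#   _:b1 <RoninBlock.number> "12345" .
my_exporter | my_rdf_converter > /tmp/pipe.rdf &
# dgraph live (command above) then consumes the FIFO until the producer closes it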

Everything works as expected, and I can query my data during the load with no issues. However, after about 45 million N-Quads have been inserted (and counting), a log compaction triggers and drops at least the “dgraph.type” predicate.

Here is the log of a compaction:

badger 2022/03/02 10:39:14 INFO: [2] [E] LOG Compact 0->6 (5, 0 -> 67 tables with 1 splits). [00001 00002 00003 00004 00005 . .] -> [00006 00007 00008 00009 00010 00011 00012 00013 00014 00015 00016 00017 00018 00019 00020 00021 00022 00023 00024 00025 00026 00027 00028 00029 00030 00031 00032 00033 00034 00035 00036 00037 00038 00039 00040 00041 00042 00043 00044 00045 00046 00047 00048 00049 00050 00051 00052 00053 00054 00055 00056 00057 00058 00059 00060 00061 00062 00063 00064 00065 00066 00067 00068 00069 00070 00071 00072 .], took 3.515s

After this happens, a query like this in Ratel:

{
  node(func: eq(dgraph.type, "RoninBlock")) {
    count(uid)
  }
  nodee(func: has(RoninBlock.number)) {
    count(uid)
  }
}

returns different results for the two counts, whereas they were identical just before the compaction. The corresponding GraphQL query also returns a wrong count:

query {
    aggregateRoninBlock {
        count
    }
}

Expected behaviour and actual result.

I expect the results of my queries not to change after compaction, but they do. Any help is appreciated :slight_smile:

Any help from the @core-devs would be appreciated.