Bugs in deleting predicates when the posting list type is BitDeltaPosting (0x4)

There are two reproductions, showing the bugs on different Dgraph versions.

First version:

···
[Decoder]: Using assembly version of decoder

Dgraph version : v20.07.2
Dgraph codename : shuri-2
Dgraph SHA-256 : a927845127dab735c24727d5a24af411168771b55236aec50f0b987e8c0ac910
Commit SHA-1 : a7bc16d56
Commit timestamp : 2020-10-22 10:17:53 -0700
Branch : HEAD
Go version : go1.14.4

For Dgraph official documentation, visit Get started with Dgraph.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

···

It panics and exits:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x150723e]

goroutine 23283 [running]:
github.com/dgraph-io/dgraph/posting.(*pIterator).posting(0xc081dab450, 0xc0714e9601)
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/list.go:273 +0x6e
github.com/dgraph-io/dgraph/posting.(*List).iterate(0xc0714e96e0, 0x178b, 0x0, 0xc081dab520, 0x1, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/list.go:684 +0x3fe
github.com/dgraph-io/dgraph/posting.(*List).IsEmpty(0xc0714e96e0, 0x178b, 0x0, 0xc0709ea200, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/list.go:744 +0xc6
github.com/dgraph-io/dgraph/worker.(*queryState).handleHasFunction(0xc060e63110, 0x1ec40c0, 0xc07e6b26f0, 0xc06c4888c0, 0xc0709ea210, 0xc06b839e00, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/task.go:2302 +0xaf9
github.com/dgraph-io/dgraph/worker.(*queryState).helpProcessTask(0xc060e63110, 0x1ec40c0, 0xc07e6b26f0, 0xc06c4888c0, 0x1, 0x0, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/task.go:976 +0x3c6
github.com/dgraph-io/dgraph/worker.processTask(0x1ec40c0, 0xc07e6b26f0, 0xc06c4888c0, 0x1, 0x0, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/task.go:890 +0x3ed
github.com/dgraph-io/dgraph/worker.ProcessTaskOverNetwork(0x1ec40c0, 0xc07f6278f0, 0xc06c4888c0, 0x0, 0x9, 0xc071591dc0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/task.go:150 +0x35a
github.com/dgraph-io/dgraph/query.ProcessGraph(0x1ec40c0, 0xc07f6278f0, 0xc050106c00, 0x0, 0xc06e8e3320)
	/ext-go/1/src/github.com/dgraph-io/dgraph/query/query.go:2049 +0x19e4
created by github.com/dgraph-io/dgraph/query.(*Request).ProcessQuery
	/ext-go/1/src/github.com/dgraph-io/dgraph/query/query.go:2716 +0x899
[Sentry] 2020/11/11 18:38:00 Sending fatal event [9936d1f9fa38469b99f9340955335957] to o318308.ingest.sentry.io project: 1805390
[Sentry] 2020/11/11 18:38:02 Buffer flushing reached the timeout.

Second version:

···
./dgraph_release_v20.07.1 live -f …/data/1__3000000.rdf
Using z.Allocator with starting ref: d72c000000000000
[Decoder]: Using assembly version of decoder
Page Size: 4096
I1111 18:49:10.608026 8198 init.go:107]

Dgraph version : v2.0.0-rc1-888-gbfeb0ebc3
Dgraph codename : unnamed-mod
Dgraph SHA-256 : 989426f9ed941c6f73c50d2a8e6e941533642920111b6ea5cae351b079132ace
Commit SHA-1 : bfeb0ebc3
Commit timestamp : 2020-11-03 16:37:10 +0800
Branch : release/v20.07.1
Go version : go1.14.1
jemalloc enabled : true

For Dgraph official documentation, visit Get started with Dgraph.
For discussions about Dgraph , visit http://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

I1111 18:49:10.608481 8198 util_ee.go:126] KeyReader instantiated of type

Running transaction with dgraph endpoint: 127.0.0.1:9080
Found 1 data file(s) to process
Processing data file "…/data/1__3000000.rdf"
[18:49:15+0800] Elapsed: 05s Txns: 345 N-Quads: 345000 N-Quads/s [last 5s]: 69000 Aborts: 0
[18:49:20+0800] Elapsed: 10s Txns: 700 N-Quads: 700000 N-Quads/s [last 5s]: 71000 Aborts: 0
[18:49:25+0800] Elapsed: 15s Txns: 1020 N-Quads: 1020000 N-Quads/s [last 5s]: 64000 Aborts: 0
[18:49:30+0800] Elapsed: 20s Txns: 1363 N-Quads: 1363000 N-Quads/s [last 5s]: 68600 Aborts: 0
[18:49:35+0800] Elapsed: 25s Txns: 1634 N-Quads: 1634000 N-Quads/s [last 5s]: 54200 Aborts: 0
[18:49:40+0800] Elapsed: 30s Txns: 1990 N-Quads: 1990000 N-Quads/s [last 5s]: 71200 Aborts: 0
[18:49:45+0800] Elapsed: 35s Txns: 2287 N-Quads: 2287000 N-Quads/s [last 5s]: 59400 Aborts: 0
[18:49:50+0800] Elapsed: 40s Txns: 2527 N-Quads: 2527000 N-Quads/s [last 5s]: 48000 Aborts: 0
[18:49:55+0800] Elapsed: 45s Txns: 2781 N-Quads: 2781000 N-Quads/s [last 5s]: 50800 Aborts: 0
Number of TXs run : 3000
Number of N-Quads processed : 3000000
Time spent : 49.564251439s
N-Quads processed per second : 61224
Allocated Bytes at program end: 0 B
➜ v2 git:(master) ✗ ls
dgraph_release_v20.07.1 p w zw
➜ v2 git:(master) ✗ cd …
➜ bugs-for-delete git:(master) ✗ cd code
➜ code git:(master) ✗ ls
q.go
➜ code git:(master) ✗ go run q.go
2020/11/11 18:50:41 Before Delete Query:
2020/11/11 18:50:41 Response: {
"q": [
{
"uid": "0x2",
"c": 3000000
}
]
}
2020/11/11 18:50:41 <0x2> * .
Response:
2020/11/11 18:50:41 After Delete Query:
2020/11/11 18:50:41 Response: {
"q": [
{
"uid": "0x2",
"c": 8
}
]
}
2020/11/11 18:50:41 should be zero ,but got 8

···

Here is the GitHub repo for reproducing this bug.
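Roughly, the repro does the following. This is a minimal sketch using the dgo v200 client; the endpoint, the hard-coded uid 0x2, and the error handling are illustrative, and q.go in the repo is the authoritative version:

···
package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo/v200"
	"github.com/dgraph-io/dgo/v200/protos/api"
	"google.golang.org/grpc"
)

// Query used before and after the delete; the "q" alias and count(follow)
// match the responses shown in the log above.
const countQuery = `{
  q(func: has(follow)) {
    uid
    c: count(follow)
  }
}`

func main() {
	conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	ctx := context.Background()

	// Before the delete: the node holding all the edges reports c == 3000000.
	before, err := dg.NewReadOnlyTxn().Query(ctx, countQuery)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Before Delete Query: %s", before.Json)

	// Delete every <follow> edge of that node (the S P * pattern).
	// In the real repro the uid comes from the first query's result.
	del := &api.Mutation{
		DelNquads: []byte(`<0x2> <follow> * .`),
		CommitNow: true,
	}
	if _, err := dg.NewTxn().Mutate(ctx, del); err != nil {
		log.Fatal(err)
	}

	// After the delete the count should be zero, but the bug leaves it at 8.
	after, err := dg.NewReadOnlyTxn().Query(ctx, countQuery)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("After Delete Query: %s", after.Json)
}
···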

We find that if the node (0x02) has 3,000,000 edges, the posting list for 0x02 is a BitDeltaPosting, and it is split across multiple keys because it is too long.
Why is the count 8?
We find 9 keys:
40 len follow 0x02 0x1
40 len follow 0x02 xxx somekey1
40 len follow 0x02 xxx somekey2
40 len follow 0x02 xxx somekey3
40 len follow 0x02 xxx somekey4

The 8 (9-1) may be related to those 9 keys.
I think there may be a bug with BitDeltaPosting.

?? nobody here ??

Hey @BlankRain, I see two errors. The fix for the first one would be similar to Initialize posting list in moveToNextPart (#6560) · dgraph-io/dgraph@7873f04 · GitHub. This should be easy to fix.

As for the delete operation not cleaning up data, there seems to be a bug in the posting list split code. I see you have a test for this and we’ll look into it.

I don’t have an answer for why there are 9 keys in the DB, but I will investigate this and get back to you.

Thank you.
Sorry for my late feedback; my computer has been broken these past few days.

The remaining 9 keys may be because the list has been split into 9 sublists.
Before the delete, the key's type is BitDeltaPosting.
After the delete, the type should be BitEmptyPosting.
However, it's not.
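For reference while reading the UserMeta values in the logs below, these are the posting list kinds this thread keeps referring to. This is a short sketch listing the values as they appear in the title and the logs, not a copy of the Dgraph source:

···
// UserMeta values as they appear in this thread.
const (
	BitDeltaPosting    byte = 0x04 // 4:  the delta written right after the delete commit
	BitCompletePosting byte = 0x08 // 8:  a full (rolled-up) posting list
	BitEmptyPosting    byte = 0x10 // 16: a posting list known to be empty; has() skips it
)
···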

@ibrahim Hey, is there any update on this topic?

Hey @BlankRain, unfortunately, we haven’t been able to look into this issue. This is an accepted bug and we’ll work on this soon :slight_smile:

Hey, I added some logging to debug this:

2020/12/09 18:25:21 cache ReadPostingList [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1] 
 &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1] 0xc00010f3e0 map[] 124 124}
2020/12/09 18:25:21       plist splits:1 splits:800277 splits:1600899 splits:2400072  
2020/12/09 18:25:21 key: [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1]
2020/12/09 18:25:21 plist: splits:1 splits:800277 splits:1600899 splits:2400072 
2020/12/09 18:25:21 pmutationMap: map[]
2020/12/09 18:25:21 mposts [] rt 156 au 0
2020/12/09 18:25:21 pitr {0xc00010f380 0xc00010f560 0xc00040fb80 0 0 0xc0004a25a0 [] 0 0 0 0} 
2020/12/09 18:25:21 pitrxx 0 0  
2020/12/09 18:25:21 in loop pp uid:800277  mp 
2020/12/09 18:25:21 iterate posting uid:800277 
2020/12/09 18:25:21 in loop pp uid:1600899  mp 
2020/12/09 18:25:21 iterate posting uid:1600899 
2020/12/09 18:25:21 in loop pp uid:2400072  mp 
2020/12/09 18:25:21 iterate posting uid:2400072 
2020/12/09 18:25:21 in loop pp  mp 
2020/12/09 18:25:21 count function &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1] 0xc00010f3e0 map[] 124 124}
2020/12/09 18:25:21 count function JSON [123 125] <nil> .... 3

This line shows that it read a bad posting list from Badger, because the list shouldn't have 4 splits:

2020/12/09 18:25:21 cache ReadPostingList [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1] 
 &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1] 0xc00010f3e0 map[] 124 124}
2020/12/09 18:25:21       plist splits:1 splits:800277 splits:1600899 splits:2400072

[0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1] is the key for predicate follow on uid <0x01>.
124 124 are the readTs and maxTs.
0xc00010f3e0 is the address of the plist.
plist splits:1 splits:800277 splits:1600899 splits:2400072
We can see that the plist itself is empty but still has 4 splits.
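To make the byte dumps easier to read, here is a small, self-contained decoder for the layout these keys visibly have (one type byte, a 2-byte big-endian attribute length, the attribute, one data byte, then the 8-byte big-endian uid). It is a sketch inferred from the logged bytes, not Dgraph's own key parser:

···
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeDataKey splits a key like the one logged above:
// [type byte | len(attr) as uint16 | attr bytes | data byte | uid as uint64].
// This mirrors the bytes in the logs; it is not Dgraph's x.Parse.
func decodeDataKey(key []byte) (attr string, uid uint64, err error) {
	if len(key) < 1+2+1+8 {
		return "", 0, fmt.Errorf("key too short: %d bytes", len(key))
	}
	attrLen := int(binary.BigEndian.Uint16(key[1:3]))
	if len(key) != 1+2+attrLen+1+8 {
		return "", 0, fmt.Errorf("unexpected key length %d", len(key))
	}
	attr = string(key[3 : 3+attrLen])
	uid = binary.BigEndian.Uint64(key[len(key)-8:])
	return attr, uid, nil
}

func main() {
	key := []byte{0, 0, 6, 102, 111, 108, 108, 111, 119, 0, 0, 0, 0, 0, 0, 0, 0, 1}
	attr, uid, err := decodeDataKey(key)
	if err != nil {
		panic(err)
	}
	fmt.Printf("attr=%q uid=%#x\n", attr, uid)
}
···

Running it on the key above prints attr="follow" uid=0x1, which is the startKey used for the has(follow) seek; the 1200002 keys later in the thread decode the same way.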

The splits should be recomputed and updated when doing a mutation.

The code is here:
posting/list.go, line 1161


// Postings calls postFn with the postings that are common with
// UIDs in the opt ListOptions.
func (l *List) Postings(opt ListOptions, postFn func(*pb.Posting) error) error {
	l.RLock()
	defer l.RUnlock()
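	// Debug logging added for this investigation (prints the key, plist, and mutation map).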
	log.Printf("key: %v", l.key)
	log.Printf("plist: %v", l.plist)
	log.Printf("pmutationMap: %v", l.mutationMap)
	err := l.iterate(opt.ReadTs, opt.AfterUid, func(p *pb.Posting) error {
		log.Printf("iterate posting %v", p)
		if p.PostingType != pb.Posting_REF {
			return nil
		}
		return postFn(p)
	})
	return errors.Wrapf(err, "cannot retrieve postings from list with key %s",
		hex.EncodeToString(l.key))
}

posting/mvcc.go, line 435:

	itr.Seek(key)
	l, err := ReadPostingList(key, itr)
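	// Debug logging added for this investigation (prints the key and the plist read from Badger).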
	log.Printf("cache ReadPostingList %v \n %v", key, l)
	log.Printf("	  plist %v ", l.plist)
	if err != nil {
		return l, err
	}
	lCache.Set(key, l, 0)
	return l, nil

Hi, I debugged it and found something interesting.
The UserMeta of the key changes.

My Steps:

Step 1: query with
{
  q(func: has(follow)) {
    uid
    c: count(follow)
  }
}

Step 2: delete <$uidOfStep1> <follow> * .
Step 3: query again immediately:
{
  q(func: has(follow)) {
    uid
    c: count(follow)
  }
}

Step 4: query again after about 3 seconds.

My logs, with comments:

Step 1:
   # has(follow)
2020/12/10 09:45:26 rTs 5826 Attr follow Q attr:"follow" src_func:<name:"has" > read_ts:5826 first:2147483647 

   # 0 0 6 encodes len("follow"); 102 111 108 108 111 119 is "follow" in ASCII; the trailing 0 0 0 0 0 0 0 0 1 is uid 0x01
2020/12/10 09:45:26 startKey: [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1]

   # seek with the prefix follow
   # find one key with uid 0x02 and UserMeta 16; 16 means posting.BitEmptyPosting, so skip the empty posting
2020/12/10 09:45:26 ParsedKey {0 follow 2 false 0  0 0}
2020/12/10 09:45:26 UserMeta 16

   # continue seeking and find uid 1200002 with UserMeta 8; 8 means BitCompletePosting
2020/12/10 09:45:26 ParsedKey {0 follow 1200002 false 0  0 0}
2020/12/10 09:45:26 UserMeta 8
2020/12/10 09:45:26 BitCompletePosting v5824, key="\x00\x00\x06follow\x00\x00\x00\x00\x00\x00\x12O\x82", version=5824, meta=44
   # end of the seek (because only 2 keys have a follow edge)

   # has(follow) now finds one uid, 1200002
2020/12/10 09:45:26 handleHasFunction find uids:1200002 

   # get the posting list of uid 1200002; it shouldn't have splits, because it's a BitCompletePosting, not a multi-part (split) posting
2020/12/10 09:45:26 unmarshalOrCopy l.plist splits:1 splits:1930801 splits:2730945 splits:3600119  l: &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc00129df20 map[] 5824 5824}
2020/12/10 09:45:26 count function &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc00129df20 map[] 5824 5824}

   # count returns 3, because the list has 4 empty splits; the value 3 (splits minus one) is directly tied to the number of splits
2020/12/10 09:45:26 count function JSON [123 125] <nil> .... 3
2020/12/10 09:45:26 unmarshalOrCopy l.plist splits:1 splits:1930801 splits:2730945 splits:3600119  l: &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc0012ecba0 map[] 5824 5824}

Step 2: delete <1200002> <follow> * .

   # 0 0 6 encodes len("follow"); 102 111 108 108 111 119 is "follow"; 0 0 0 0 0 0 18 79 130 is uid 1200002 (0x124F82, big-endian)
2020/12/10 09:45:26 local cache get [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 
   with &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc0012ecba0 map[] 5824 5824}  err: <nil>
   # commitToDisk; op:2 means delete, value:_STAR_ALL means *
2020/12/10 09:45:26 writeToDisk followO�  {<nil> [uid:15940516058459498681 value:"_STAR_ALL" posting_type:VALUE op:2 ] 0 [] {} [] 0}

Step 3: query again
   # has(follow)
2020/12/10 09:45:26 rTs 5829 Attr follow Q attr:"follow" src_func:<name:"has" > read_ts:5829 first:2147483647 

   # seek from key 6-follow-0x01
2020/12/10 09:45:26 startKey: [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1]

   # find 0x02, but with UserMeta 16 (BitEmptyPosting), so skip it
2020/12/10 09:45:26 ParsedKey {0 follow 2 false 0  0 0}
2020/12/10 09:45:26 UserMeta 16

   # find 1200002 with UserMeta 4 (BitDeltaPosting); notice that here the UserMeta is 4
2020/12/10 09:45:26 ParsedKey {0 follow 1200002 false 0  0 0}
2020/12/10 09:45:26 UserMeta 4

   # read 1200002's posting list; it has no part splits, and its maxTs is 5828
2020/12/10 09:45:26 After ReadPostingList PartSplits [] L&{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc0012ed0e0 map[5828:0xc0012ed140] 0 5828}

   # after the delete we find zero uids
2020/12/10 09:45:26 handleHasFunction find 
2020/12/10 09:45:26 unmarshalOrCopy l.plist splits:1 splits:1930801 splits:2730945 splits:3600119  l: &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc0012ed4a0 map[5828:0xc0012ed560] 5824 5828}


It returns as expected: "q": []

However, after about 3 seconds, I query again with:
{
  q(func: has(follow)) {
    uid
    c: count(follow)
  }
}

Query again:
# has(follow)
2020/12/10 10:13:18 rTs 5830 Attr follow Q attr:"follow" src_func:<name:"has" > read_ts:5830 first:2147483647 

# seek from key 6-follow-0x01
2020/12/10 10:13:18 startKey: [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 0 0 1]

# find 0x02, but with UserMeta 16 (BitEmptyPosting), so skip it
2020/12/10 10:13:18 ParsedKey {0 follow 2 false 0  0 0}
2020/12/10 10:13:18 UserMeta 16

# the UserMeta is now 8; it has changed from 4 to 8
2020/12/10 10:13:18 ParsedKey {0 follow 1200002 false 0  0 0}
2020/12/10 10:13:18 UserMeta 8
# version 5828
2020/12/10 10:13:18 BitCompletePosting v5828, key="\x00\x00\x06follow\x00\x00\x00\x00\x00\x00\x12O\x82", version=5828, meta=44

# has finds uid 1200002, but it should find nothing
2020/12/10 10:13:18 handleHasFunction find uids:1200002 
# read the uid's data; it still has 4 splits, so the count result is 3
2020/12/10 10:13:18 unmarshalOrCopy l.plist splits:1 splits:1930801 splits:2730945 splits:3600119  l: &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc0006fee40 map[] 5828 5828}
2020/12/10 10:13:18 count function &{{{{0 0} 0 0 0 0} 0 0} [0 0 6 102 111 108 108 111 119 0 0 0 0 0 0 18 79 130] 0xc0006fee40 map[] 5828 5828}
2020/12/10 10:13:18 count function JSON [123 125] <nil> .... 3

It’s strange. Do you have any idea about the UserMeta?
It’s the same version, 5828, but with a different UserMeta.
When would Badger or Dgraph change it?

Oh, I found the function MarshalPostingList.

It sets kv.UserMeta to BitEmptyPosting or BitCompletePosting, but there is no case for BitDeltaPosting.

Is there a bug here?
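To make the question concrete, the branch structure I am describing looks roughly like this. This is a toy sketch with stand-in types, not the real MarshalPostingList: there is an empty case and a complete case, and no branch that produces BitDeltaPosting, so a list that still carries stale split start-uids never looks empty here:

···
package main

import "fmt"

const (
	bitDeltaPosting    byte = 0x04
	bitCompletePosting byte = 0x08
	bitEmptyPosting    byte = 0x10
)

// Toy stand-ins for the protobuf types, only to show the branch structure.
type postingList struct {
	Postings []uint64 // surviving uids
	Splits   []uint64 // start-uids of the parts of a multi-part list
}

type kv struct {
	Value    []byte
	UserMeta byte
}

// marshalSketch mirrors the shape described above: the output is tagged either
// empty or complete; no branch ever tags it as a delta.
func marshalSketch(pl *postingList) kv {
	if len(pl.Postings) == 0 && len(pl.Splits) == 0 {
		return kv{Value: nil, UserMeta: bitEmptyPosting}
	}
	return kv{Value: []byte("serialized plist"), UserMeta: bitCompletePosting}
}

func main() {
	// A list like the one in the logs: no postings left after the delete,
	// but the old split start-uids are still attached.
	stale := &postingList{Splits: []uint64{1, 1930801, 2730945, 3600119}}
	fmt.Printf("UserMeta = %#x\n", marshalSketch(stale).UserMeta) // 0x8, not 0x10
}
···

With the splits from the logs still attached, this sketch tags the deleted list as BitCompletePosting (0x8) instead of BitEmptyPosting (0x10), which lines up with the UserMeta change from 4 to 8 observed above.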

Hey @BlankRain,

Thanks for the detailed bug report and test case! I’ve fixed this here.