isDisabled is part of the ORGANIZATION_USER type, and there are many vertices that satisfy this condition. The predicate is indexed like so:
<isDisabled>: bool @index(bool) .
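The failing query is just a count over this indexed predicate; roughly this shape (the block name here is only illustrative):

{
  disabledUsers(func: eq(isDisabled, true)) {
    count(uid)
  }
}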
The query used to work and then stopped working. This is just one example; we have noticed the same behaviour across predicates of bool, string and int types, and with the lt() and gt() functions on int predicates as well. It has already caused side effects in at least one use case: because the query did not return the expected data, the application created unnecessary duplicates.

When I dropped the bool index and reapplied it, the query started working again. Reapplying the index works for int types too, but it stops working AGAIN after a while/some hours. I know this because we have reapplied the index multiple times for strings and finally went ahead and used regexp() instead of eq() as a workaround. But what do we do about ints and bools?
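To make the workarounds concrete (the string predicate below is a made-up example, not from our real schema): dropping and re-applying the index is just a schema alter,

<isDisabled>: bool .
<isDisabled>: bool @index(bool) .

and for strings we switched to a trigram index plus regexp() instead of eq():

# userStatus is a made-up example predicate
<userStatus>: string @index(trigram) .

{
  q(func: regexp(userStatus, /^ACTIVE$/)) {
    count(uid)
  }
}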
Hey @joseph, thanks for reporting this.
Would you be able to share your data set so that we can try to reproduce it? We have tests for count(..) queries that seem to work. Maybe this issue shows up with some specific data set.
Also, have you tried running one of the latest releases of Dgraph?
Thanks Ibrahim, I appreciate your help. We have the issue in a lower environment as well, but exports don't work since we have crossed the 30-day trial. Is there any other way?
Hey @joseph, you can zip and share your data directories. Each alpha has a p and a w directory, and each zero has a zw directory. Please share all three directories.
You can share the directories here or you can email them to me at ibrahim[at]dgraph.io.
Hey Joseph,
I’ve replied to your email and copied over the following two questions from our email conversation.
Can I import this data on my local machine running v20.03.0? If so, what steps should I follow? The folders above are from v1.1.1. Since this data was not exported, will the live and bulk loaders work?
The data format of Dgraph v1.1.1 is not compatible with v20.03.0. The only way to run the same data on v20.03.0 would be to do an export from v1.1.1 and then an import into v20.03.0 (using the bulk or live loader). @MichelDiz Please correct me if I'm wrong.
We just have one zero and one alpha (for now) and we store p, w and zw in the same folder.
If we used Docker and added more alphas on the same host, should we map them to different volumes?
Ideally, you should map different alphas to different volumes so that loss of one volume would not result in the loss of all the data.
I was able to reproduce this issue after running the same query for about 4 hours. It is related to rollups in alpha.
These are the alpha logs from when the query first failed at 22:57:31; it has been failing ever since.
One important observation is that rollups were running in alpha at 22:57:31:
I0416 22:19:23.103866 23073 log.go:34] Compaction for level: 0 DONE
I0416 22:54:29.544599 23073 groups.go:604] No healthy Zero leader found. Trying to find a Zero leader...
I0416 22:54:29.551467 23073 groups.go:897] Zero leadership changed. Renewing oracle delta stream.
E0416 22:54:29.552915 23073 groups.go:873] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
I0416 22:54:29.645984 23073 groups.go:641] Found connection to leader: localhost:5080
I0416 22:54:29.696129 23073 groups.go:833] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0416 22:54:29.696150 23073 groups.go:842] Got Zero leader: localhost:5080
I0416 22:55:30.942289 23073 draft.go:678] [0x1] Set Raft progress to index: 562031.
I0416 22:55:32.108007 23073 draft.go:383] Creating snapshot at index: 562083. ReadTs: 6160730.
I0416 22:55:32.108031 23073 storage.go:589] CreateSnapshot i=562083, cs=nodes:1
I0416 22:55:34.090251 23073 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]}
I0416 22:55:34.090315 23073 log.go:34] Running for level: 0
I0416 22:55:41.604493 23073 log.go:34] LOG Compact 0->1, del 6 tables, add 1 tables, took 7.514154308s
I0416 22:55:41.604526 23073 log.go:34] Compaction for level: 0 DONE
I0416 22:56:29.701271 23073 draft.go:678] [0x1] Set Raft progress to index: 564962.
I0416 22:57:29.710277 23073 draft.go:678] [0x1] Set Raft progress to index: 567688.
I0416 22:57:30.696727 23073 log.go:34] Rolling up Time elapsed: 01s, bytes sent: 0 B, speed: 0 B/sec
I0416 22:57:31.450630 23073 log.go:34] Rolling up Created batch of size: 1.5 MB in 4.53621ms.
I0416 22:57:31.450650 23073 log.go:34] Rolling up Sent 668 keys
I0416 22:57:31.454309 23073 draft.go:1076] Rolled up 665 keys. Done
I0416 22:57:31.454331 23073 draft.go:425] List rollup at Ts 6160730: OK.
I0416 22:57:31.456136 23073 draft.go:1109] Sent tablet size update to Zero. Total size: 279 MB
I0416 22:58:29.699431 23073 draft.go:678] [0x1] Set Raft progress to index: 567775.
I don't see any activity in zero at the time of failure:
I0416 19:01:02.070633 22755 log.go:34] 1 is starting a new election at term 15
I0416 19:01:02.070731 22755 log.go:34] 1 became pre-candidate at term 15
I0416 19:01:02.070779 22755 log.go:34] 1 received MsgPreVoteResp from 1 at term 15
I0416 19:01:02.070940 22755 log.go:34] 1 became candidate at term 16
I0416 19:01:02.070969 22755 log.go:34] 1 received MsgVoteResp from 1 at term 16
I0416 19:01:02.071167 22755 log.go:34] 1 became leader at term 16
I0416 19:01:02.071204 22755 log.go:34] raft.node: 1 elected leader 1 at term 16
I0416 19:01:02.071292 22755 raft.go:663] I've become the leader, updating leases.
I0416 19:01:02.071343 22755 assign.go:41] Updated Lease id: 690001. Txn Ts: 6170001
W0416 19:01:03.971531 22755 node.go:669] [0x1] Read index context timed out
I0416 19:01:18.607681 22755 zero.go:415] Got connection request: cluster_info_only:true
I0416 19:01:18.609582 22755 zero.go:433] Connected: cluster_info_only:true
I0416 19:01:18.612966 22755 zero.go:415] Got connection request: addr:"localhost:7080"
I0416 19:01:18.615132 22755 zero.go:452] Found a member with the same address. Returning: id:1 group_id:1 addr:"localhost:7080" leader:true last_update:1582009811
I0416 19:01:18.615213 22755 zero.go:454] Connected: addr:"localhost:7080"
I0416 22:54:29.645034 22755 zero.go:415] Got connection request: cluster_info_only:true
I0416 22:54:29.645551 22755 zero.go:433] Connected: cluster_info_only:true
I0416 22:55:34.697404 22755 oracle.go:107] Purged below ts:6160730, len(o.commits):0, len(o.rowCommit):0
Can you point me to the relevant PR that fixes this? Another user has reported a similar issue here. I feel the best solution would be to upgrade, but I want to ensure that their queries would need minimal changes going from v1.1.1 to v20.03 before I suggest the switch.