zzl221000
(JimZhang)
January 20, 2021, 9:37am
1
Many update operations are not triggering compaction, and my disk is already full of SST files.
Can anyone help me?
I Want to Do
Is it possible to manually compact SST files?
Right now my only option is to delete all the files and restart the service to re-synchronize the data from the cluster. This is too slow!
Dgraph Metadata
dgraph version
Dgraph version : v20.11.0-rc5
Dgraph codename : tchalla
Dgraph SHA-256 : 95d845ecec057813d1a3fc94394ba1c18ada80f584120a024c19d0db668ca24e
Commit SHA-1 : b65a8b10c
Commit timestamp : 2020-12-14 19:09:28 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true
ibrahim
(Ibrahim Jarif)
January 20, 2021, 10:21am
2
Hey @zzl221000, we do not have a way to manually compact the SST files. Badger’s compaction should clean up the deleted data automatically.
Can you show me the contents of your data directory? I’m looking for the total number of SST files and their total size.
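For anyone gathering the numbers asked for here, a minimal shell sketch (the helper name is made up; pass your posting directory as the argument, and use `du -sh` on the same directory for total size, as done later in this thread):

```shell
# sst_count: print how many .sst files sit directly in a Badger data directory.
sst_count() {
  find "$1" -maxdepth 1 -name '*.sst' | wc -l | tr -d ' '
}
```

This counts only `.sst` files, unlike a bare `ll | wc -l`, which also counts value-log and manifest files.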
zzl221000
(JimZhang)
January 20, 2021, 12:04pm
3
Here are the numbers from the most abnormal node right now. The real data size should be about 50G.
total number
[root@zk02 p]# ll |wc -l
3079
total size
[root@zk02 p]# du -h --max-depth=1 ./
373G ./
alpha3_p_file.txt (183.7 KB)
ibrahim
(Ibrahim Jarif)
January 21, 2021, 3:34pm
4
@zzl221000 would you be able to run dgraph debug on your data directory and share the output? The dgraph debug command will read all your data and print some statistics about it.
vnium
January 21, 2021, 3:40pm
5
zzl221000
(JimZhang)
January 22, 2021, 7:38am
6
Hey @ibrahim! Everything has been fine with the cluster since I rebooted it.
I’ll post the debug output after the problem reappears.
zzl221000
(JimZhang)
January 22, 2021, 7:40am
7
@vnium would you like to run the debug tool and post the output? Maybe we’re both having the same problem.
vnium
January 23, 2021, 1:44am
8
I’m going to post the logs in the next few days.
zzl221000
(JimZhang)
January 23, 2021, 1:02pm
9
@ibrahim
The output of the debug tool is too large. I can’t post it.
posting dir
[root@zk04 dgraph]# du -h --max-depth=1 /dgraph/alpha1/p
337G /dgraph/alpha1/p
end of debug
badger 2021/01/23 20:33:45 INFO: Badger.Stream Sent data of size 265 GiB
badger 2021/01/23 20:33:45 INFO: Lifetime L0 stalled for: 0s
badger 2021/01/23 20:33:45 INFO:
Level 0 [ ]: NumTables: 02. Size: 37 MiB of 0 B. Score: 0.00->0.00 Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 2 [B]: NumTables: 05. Size: 8.5 MiB of 10 MiB. Score: 0.00->0.00 Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 21. Size: 56 MiB of 57 MiB. Score: 0.00->0.00 Target FileSize: 4.0 MiB
Level 4 [ ]: NumTables: 105. Size: 562 MiB of 566 MiB. Score: 0.00->0.00 Target FileSize: 8.0 MiB
Level 5 [ ]: NumTables: 483. Size: 5.2 GiB of 5.5 GiB. Score: 0.00->0.00 Target FileSize: 16 MiB
Level 6 [ ]: NumTables: 920. Size: 55 GiB of 55 GiB. Score: 0.00->0.00 Target FileSize: 32 MiB
Level Done
tail -n 7 output file
{d} attr: dgraph.type uid: 736325768 ts: 100794028 item: [49, b0100] sz: 49 dcnt: 1 key: 00000b6467726170682e7479706500000000002be37088
{d} attr: dgraph.type uid: 736325769 ts: 100794028 item: [49, b0100] sz: 49 dcnt: 1 key: 00000b6467726170682e7479706500000000002be37089
{d} attr: dgraph.type uid: 736325770 ts: 100794028 item: [48, b0100] sz: 48 dcnt: 1 key: 00000b6467726170682e7479706500000000002be3708a
{d} attr: dgraph.type uid: 736325771 ts: 100794148 item: [49, b0100] sz: 49 dcnt: 1 key: 00000b6467726170682e7479706500000000002be3708b
{d} attr: dgraph.type uid: 736325772 ts: 100794148 item: [84, b1000] sz: 84 dcnt: 0 isz: 84 icount: 1 key: 00000b6467726170682e7479706500000000002be3708c
Found 1634798107 keys
head -n 10 output file
[Decoder]: Using assembly version of decoder
Page Size: 4096
Listening for /debug HTTP requests at port: 8080
Opening DB: /dgraph/alpha1/p
prefix =
I�<���vt�O��x��"��1\"o�� ts: 3162147 item: [61, b0100] sz: 61 dcnt: 1 key: 00000b526c4e6f64652e726c6964020b4d910d49e23caac4ec76748e4fb0a778d616e69a22bd9c1013315c226fb01ccf
Y܆eC*�$e�TW�C��_�3{�Ro�ؒf ts: 333083 item: [61, b0100] sz: 61 dcnt: 1 key: 00000b526c4e6f64652e726c6964020b4d910d59dc860765432ab62465ed5457ca43c6e95f11c3337bc9526f8fd89266
^"i�e���ߊ#e*{�b�^BF�����l7� ts: 4552271 item: [61, b0100] sz: 61 dcnt: 1 key: 00000b526c4e6f64652e726c6964020b4d910d5e2269a4659a048dd8df8a23652a7bda62965e4246f08bf68f996c37a0
h����L���*����8e.rlid term: [11] M��n�n��� ts: 2711668 item: [61, b0100] sz: 61 dcnt: 1 key: 00000b526c4e6f64652e726c6964020b4d910d689801ced2df1a4cabbceb2a95879c130589380ba0126e816e9a1dafb7
q�x�%���\���h���LI��*�c ts: 29897333 item: [61, b0100] sz: 122 dcnt: 2 key: 00000b526c4e6f64652e726c6964020b4d910d7106bc78a125e9f49c5c9d82c1689a1ccacb4c491a0ea514cc072ae063
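One way to read the level table in the debug output above: no level is over its target size, so every compaction score is 0.00 and Badger has nothing it is allowed to compact. A quick back-of-envelope (a simplification of Badger's actual scoring, with sizes copied from the output):

```python
# Fill ratio per LSM level: current size / target size, in MiB.
# Badger's real score is more involved; this is only the size part,
# which is enough to explain the 0.00 scores printed above.
MIB = 1
GIB = 1024 * MIB

levels = {  # level: (current size, target size)
    2: (8.5 * MIB, 10 * MIB),
    3: (56 * MIB, 57 * MIB),
    4: (562 * MIB, 566 * MIB),
    5: (5.2 * GIB, 5.5 * GIB),
    6: (55 * GIB, 55 * GIB),
}

ratios = {lvl: size / target for lvl, (size, target) in levels.items()}
for lvl, r in ratios.items():
    print(f"Level {lvl}: {r:.2f} of target size")
```

Note the levels only add up to roughly 61 GiB while the directory holds 337G; the remainder is presumably data Badger has not yet been able to discard (e.g. old versions and value-log files).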
zzl221000
(JimZhang)
January 25, 2021, 6:12am
10
Hey @ibrahim! After observing for many days, I suspect the problem is related to the snapshot-skipping policy. How can I avoid snapshots being skipped? The logs are as follows:
I0125 03:46:50.357040 17 draft.go:606] Creating snapshot at Index: 139409320, ReadTs: 189197838
I0125 03:47:41.336820 17 draft.go:1611] Skipping snapshot at index: 139409320. Insufficient discard entries: 0. MinPendingStartTs: 170455756
I0125 03:48:41.336092 17 draft.go:1611] Skipping snapshot at index: 139409320. Insufficient discard entries: 0. MinPendingStartTs: 170455756
I0125 03:49:41.337446 17 draft.go:1611] Skipping snapshot at index: 139409320. Insufficient discard entries: 0. MinPendingStartTs: 170455756
I0125 03:50:41.337501 17 draft.go:1611] Skipping snapshot at index: 139409320. Insufficient discard entries: 0. MinPendingStartTs: 170455756
...
I0125 04:07:50.332705 17 draft.go:606] Creating snapshot at Index: 139419376, ReadTs: 189209100
I0125 04:08:41.334584 17 draft.go:1611] Skipping snapshot at index: 139419378. Insufficient discard entries: 2. MinPendingStartTs: 170455756
I0125 04:09:41.335293 17 draft.go:1611] Skipping snapshot at index: 139419378. Insufficient discard entries: 2. MinPendingStartTs: 170455756
I0125 04:10:41.338297 17 draft.go:1611] Skipping snapshot at index: 139419378. Insufficient discard entries: 2. MinPendingStartTs: 170455756
...
I0125 04:28:50.338235 17 draft.go:606] Creating snapshot at Index: 139429608, ReadTs: 189219985
I0125 04:29:41.336136 17 draft.go:1611] Skipping snapshot at index: 139429610. Insufficient discard entries: 2. MinPendingStartTs: 170455756
I0125 04:30:41.335993 17 draft.go:1611] Skipping snapshot at index: 139429610. Insufficient discard entries: 2. MinPendingStartTs: 170455756
Alpha keeps skipping snapshots until my hard drive is full.
This happens frequently even when I slow down data updates and writes: writing 5 to 10 RDFs per second took up 895G of disk after 6 hours.
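The repetition in those logs is easy to check mechanically. A small stdlib-only sketch (the helper name is made up) that scans Alpha log lines for the MinPendingStartTs value they are stuck on:

```python
import re
from collections import Counter

# Matches the draft.go "Skipping snapshot" lines quoted above.
SKIP_RE = re.compile(
    r"Skipping snapshot at index: (\d+)\. "
    r"Insufficient discard entries: (\d+)\. MinPendingStartTs: (\d+)"
)

def stuck_min_pending(lines):
    """Return (ts, count) for the MinPendingStartTs that repeats most
    often across Skipping-snapshot lines, or None if there are none."""
    counts = Counter(
        int(m.group(3)) for line in lines if (m := SKIP_RE.search(line))
    )
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Run against the excerpt above, it reports 170455756 repeating, i.e. a transaction start timestamp that never advances.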
ibrahim
(Ibrahim Jarif)
January 27, 2021, 10:22am
11
@zzl221000 the snapshot logic calculates the number of entries that can be discarded and, based on that number, decides whether to create a snapshot.
I noticed that the MinPendingStartTs was at 170455756 for almost an hour. Were there no queries or mutations running at this time?
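In sketch form, the decision described here looks roughly like the following (the threshold and names are illustrative, not Dgraph's actual code):

```python
def snapshot_decision(discardable_entries, min_discard=100):
    # Entries at or above MinPendingStartTs (the start timestamp of the
    # oldest still-pending transaction) cannot be discarded. If that
    # timestamp never advances, discardable_entries stays near zero and
    # every attempt is skipped, matching the "Insufficient discard
    # entries" log lines in this thread.
    if discardable_entries < min_discard:
        return "skip"
    return "snapshot"
```

This is why a single long-lived (or leaked) transaction can pin the Raft log and the old key versions indefinitely.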
zzl221000
(JimZhang)
January 27, 2021, 2:23pm
12
@ibrahim Mutations are being executed at the rate of one per second.