Should Pstore and WALstore use the same time ticker for gc?

I’m checking value gc problem.

And from the trace message( i do some modify to print the current gc directory and file) by /debug/requests, i notice that gc is only apply on WALstore after some time as followings even there is nothing to clean, but gc on Pstore is only apply a few times just after the restart of alpha and then never happen.

2020/05/16 18:50:36.953901	0.719244	GC
18:50:36.953909	 .     8	... Could not find candidate via discard stats. Randomly picking one.
18:50:36.953914	 .     5	... Randomly chose fid: 14377 w/014377.vlog 2 0
18:50:36.953919	 .     5	... Size window: 122757335.00. Count window: 100.
18:50:36.953920	 .     1	... Skip first 622.21 MB of file of size: 1170 MB
18:50:37.673130	 .719211	... Stopping sampling after reaching window size.
18:50:37.673142	 .    11	... Fid: 14377. Skipped: 648.81MB Num iterations: 25. Data status={total:130.95389366149902 discard:0 count:3}
18:50:37.673143	 .     1	... Skipping GC on fid: 14377
2020/05/16 18:50:06.954075	0.548130	GC
18:50:06.954083	 .     8	... Could not find candidate via discard stats. Randomly picking one.
18:50:06.954087	 .     4	... Randomly chose fid: 14377 w/014377.vlog 2 0
18:50:06.954091	 .     4	... Size window: 122757335.00. Count window: 100.
18:50:06.954092	 .     1	... Skip first 651.83 MB of file of size: 1170 MB
18:50:07.502191	 .548099	... Stopping sampling after reaching window size.
18:50:07.502203	 .    12	... Fid: 14377. Skipped: 692.05MB Num iterations: 26. Data status={total:130.81290340423584 discard:0 count:3}
18:50:07.502204	 .     1	... Skipping GC on fid: 14377

Then i check the gc code here, and found that the Pstore and WALstore use the same time ticker for gc. But golang never promise that goroutines wait on the same ticker are alternately triggered. Is this what dgraph want or something degined by purpose.

	s.vlogTicker = time.NewTicker(1 * time.Minute)
	s.mandatoryVlogTicker = time.NewTicker(10 * time.Minute)
	go s.runVlogGC(s.Pstore)
	go s.runVlogGC(s.WALstore)

@JimWen The GC is triggered only if Badger (the storage layer) can determine if there is enough data to delete. The walStore (w directory) and pstore (p directory) both are opened with different options

WalStore is opened with

While pstore is opened with

This option affects how GC will reclaim space. For instance, in case of w directory we set the following to allow GC to clean space aggressively

Also, note that both walstore and pstore goroutines have different tickers in new version of dgraph

Thank you for your detial reply. I review the source code, that problem has been fixed for Dgraph v20.x, but it is still there in Dgraph v1.2.4 which is latest released.

  1. ticker define in a struct
https://github.com/dgraph-io/dgraph/blame/v1.2.4/worker/server_state.go#L34
// ServerState holds the state of the Dgraph server.
type ServerState struct {
	FinishCh   chan struct{} // channel to wait for all pending reqs to finish.
	ShutdownCh chan struct{} // channel to signal shutdown.

	Pstore   *badger.DB
	WALstore *badger.DB

	vlogTicker          *time.Ticker // runs every 1m, check size of vlog and run GC conditionally.
	mandatoryVlogTicker *time.Ticker // runs every 10m, we always run vlog GC.

	needTs chan tsReq
}
  1. init once
https://github.com/dgraph-io/dgraph/blame/v1.2.4/worker/server_state.go#L199
	s.vlogTicker = time.NewTicker(1 * time.Minute)
	s.mandatoryVlogTicker = time.NewTicker(10 * time.Minute)
	go s.runVlogGC(s.Pstore)
	go s.runVlogGC(s.WALstore)

3.and two goroutinue wait for the same one

https://github.com/dgraph-io/dgraph/blame/v1.2.4/worker/server_state.go#L69
func (s *ServerState) runVlogGC(store *badger.DB) {
	// Get initial size on start.
	_, lastVlogSize := store.Size()
	const GB = int64(1 << 30)

	runGC := func() {
		var err error
		for err == nil {
			// If a GC is successful, immediately run it again.
			err = store.RunValueLogGC(0.7)
		}
		_, lastVlogSize = store.Size()
	}

	for {
		select {
		case <-s.vlogTicker.C:
			_, currentVlogSize := store.Size()
			if currentVlogSize < lastVlogSize+GB {
				continue
			}
			runGC()
		case <-s.mandatoryVlogTicker.C:
			runGC()
		}
	}
}

@JimWen my bad. I didn’t check the code for v1.2.x . Feel free to send a PR to fix it. The relevant part of the code can be copied from the master branch to the release/v1.2 branch.

2 Likes

Thank you, a PR has been send. https://github.com/dgraph-io/dgraph/pull/5468

@ibrahim Do you know which commits include these changes? It’d be better to cherry-picked them into the 1.2 branch rather than copying code if it’s possible.

That would be better and i have update the PR by cherry-pic commit from master.

1 Like