Use Badger as Pwned-Password lookup-server


(Florian Harwöck) #1

Hi,

I’m building a service with Go and gRPC (https://grpc.io/) that can query the pwned-passwords list (https://haveibeenpwned.com/Passwords).

The source list comes as a single .txt file of about 30 GB, with one entry per line in the format:

SHA1;Count-Of-Breaches

In order to make this data queryable I built a gRPC service with a single method, CheckPassword, that returns a single bool, Leaked. Now comes the tricky part: I searched for an embedded key-value database to store the 30 GB on the filesystem (like SQLite, but optimized for key-value pairs) and found RocksDB. That (as most of you probably know) didn’t work out well because of cgo. Then I found Badger and would like to use it as the persistence layer.

After deciding on Badger I wrote a little import tool to convert the 30 GB text file into a Badger database. Once it started, I saw ultra-low performance: about 800 KB of data every 30 seconds. At that rate the full import would take nearly two weeks (~13 days), so I think I made a terrible mistake somewhere, or I’m just not fully aware of how Badger works.

Here is the somewhat shortened code I used for my import tool:

for {
	buf, _, err := r.ReadLine()
	if err == io.EOF { // compare against io.EOF instead of the error string
		break
	}
	if err != nil {
		log.Fatalln(err)
	}

	// The SHA-1 hex digest is the first 40 characters of each line.
	key := strings.ToLower(string(buf[:40]))

	// One transaction per key.
	err = db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte(key), []byte{})
	})
	if err != nil {
		log.Fatalln(err)
	}
}

Does someone have any idea what I did wrong? Thanks in advance :hugs:


(Pawan Rawal) #2

The fastest way to load data into Badger is to batch your updates and also perform them concurrently. The program we use to benchmark Badger write performance, https://github.com/dgraph-io/badger-bench/blob/master/populate/main.go, uses 32 goroutines writing data in batches of 1000 entries.


(Florian Harwöck) #3

I’ve now finished the rewrite :relaxed: With the things you mentioned and a few other tips and tweaks I got from Slack, I’m getting about 60-80 MB/s.

Thanks for the help! :slight_smile: