C API to allow Badger usage in other languages

Hello,

I’m wondering whether you have considered adding a C API to Badger so it could be used from other languages.

My system is written in Elixir, and I currently use RocksDB via its C API. I’m very interested in Badger as an alternative, since it seems to have features that would fit my use case better.

Hi @sezaru, welcome to the community. It’s great to know that you want to try out Badger. I’m interested in knowing which Badger features would fit your use case better.

Our near-term goals are memory optimization and stabilization. This may be a good enhancement, but it is not on our immediate roadmap. Requesting comment from @ibrahim.

Hey @sezaru, we do not plan on adding support for more languages.

If you wish to use Badger in a C project, you can do it via https://golang.org/cmd/cgo/#hdr-C_references_to_Go.


Sure.

Basically, I created a “small” time-series database using RocksDB. To make this work, I used its prefix_extractor and a custom comparator, which let me add keys like “my_key.iso8601_timestamp” and then iterate over them via a prefix iterator in reverse order.

The “small” part is that I limit the data stored per key with a TTL, and I would also like to limit it by a maximum count per key (I could only implement the TTL in RocksDB with its built-in TTL system; I didn’t find any way to enforce a maximum number of entries per key during compaction).

Reading about Badger online, it seems like it would fit this use case nicely. Since Badger supports key versioning, I could store keys without the timestamp suffix, and it looks like I could limit each key both by TTL and by a maximum version count.


Yeah, you are correct. I guess I can build a “mini” API that suits my needs and use that as the wrapper.


Hey, by the way, if you guys don’t mind me getting a little off-topic, I have some questions about applying Badger to my use case.

I want to make sure Badger has all the features and performance requirements I need before I invest more time into testing it and creating the C API for my needs.

I will explain what I’m trying to do with it in more detail.

I’m trying to store time-series financial data efficiently. The data doesn’t need to be stored forever, but it arrives in high volume during certain periods.

So, for example, every five minutes I receive a very large amount of data to store (around 150~300k values) divided across multiple keys. This data normally arrives in batches of around 80 operations each, so 300k / 80 = 3750 batches.

Normally, each value changes a single key. In RocksDB, I organize this as follows:

I have around 800 RocksDB instances (they all share the same cache, so I can control memory usage easily), each instance has around 20 column families, and each column family has around 160 keys (actually far more keys; I will explain why below).

To store the data as time series, I use RocksDB’s prefix extractor and a custom comparator. For example:

Let’s say I want to add a new value to the key rsi in the column family five_minutes of a RocksDB instance. I do not want to replace the old value in the rsi key; I want to append to it somehow.

To do that, I add an ISO8601 timestamp to the end of the key, like this: rsi.2020-09-20T02:07:51.353549173+00:00.

That way I can add new data to the rsi key without having to (de)serialize a list just to append to it: I simply add a new key with that name, and then I can retrieve the latest data for the rsi key using RocksDB’s prefix iterator (note that I had to create a custom comparator, since RocksDB’s default lexicographic sort is ascending).

Now, I do not want to keep this data lying around forever, so I use TTLs to remove old data per column family: for example, the five_minutes column family removes data older than 4 days, ten_minutes older than 8 days, and so on.

It would be nicer to control data removal by count rather than TTL, but I don’t think that’s possible with RocksDB’s built-in features. For example: if the rsi key in the five_minutes column family exceeds a maximum of 1000 values, remove the oldest values beyond that limit.

So… that’s pretty much it. Now, do you think Badger would be a good fit for my use case?

What caught my attention was key versioning, which I guess would give me the same behaviour as my key-with-timestamp workaround in RocksDB, but probably with better performance since keys are stored in memory. It would also let me limit the amount of data per key by version count instead of TTL (or maybe even combine both).

What I’m not sure about is whether opening multiple database instances is OK with Badger, and whether I can share cache memory between instances to avoid excessive memory usage (normally all my RocksDB instances together peak at around 5 GB of RAM).

Note that I create one RocksDB instance per market (btc/usdt, eth/usdt, etc.) because otherwise I would need many more column families (e.g. btc_usdt_five_minutes instead of a btc_usdt instance with a five_minutes column family), and for some reason each additional column family takes longer to create than the last in RocksDB… I also tried creating just the 20 column families and adding the market to the key (e.g. btc_usdt_rsi), but after a while this resulted in extremely slow column-family compactions for some reason.

Maybe Badger has a better solution for that?

I didn’t find any mention of something similar to column families; if there is nothing like them, how would you recommend I organize my database?
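For reference, the flat-keyspace fallback I have in mind (assuming there is no column-family equivalent) would encode what is currently “instance + column family + key” into the key itself and rely on prefix iteration, something like:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// makeKey encodes market, resolution, and indicator name into a single
// composite key. The "!" separator is an arbitrary choice; any byte
// that cannot appear in the components would work.
func makeKey(market, resolution, name, timestamp string) string {
	return strings.Join([]string{market, resolution, name, timestamp}, "!")
}

func main() {
	keys := []string{
		makeKey("btc_usdt", "five_minutes", "rsi", "2020-09-20T02:07:51Z"),
		makeKey("btc_usdt", "five_minutes", "rsi", "2020-09-20T02:12:51Z"),
		makeKey("eth_usdt", "five_minutes", "rsi", "2020-09-20T02:07:51Z"),
	}
	sort.Strings(keys)

	// Everything for one market/resolution/indicator shares a prefix,
	// so a prefix iterator takes the place of a per-market column family.
	prefix := "btc_usdt!five_minutes!rsi!"
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			fmt.Println(k)
		}
	}
}
```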

Thanks for the help!