Add streaming read capabilities

Moved from GitHub badger/1174

Posted by kernle32dll:

What version of Badger are you using?

2.0

Does this issue reproduce with the latest master?

Yes

What did you do?

I’m using BadgerDB as a versioned document-storage system. It was only after implementing most of my code around it that I realized BadgerDB is designed to load values completely into memory before passing them to the caller.

For my case, that is not quite optimal, as I have no direct control over document size.

What did you expect to see?

Some possibility to pipe values directly to something akin to an io.Writer, instead of loading them entirely into memory.

What did you see instead?

No such method :cry:

Next steps

Well, I did get my hands a bit dirty, and dug around a bit in the BadgerDB code. You can find my current WIP here.
I made nice progress in sketching out something that would allow the functionality mentioned above. I verified that the basic functionality works using some rough (uncommitted) tests :fire: .

Encryption needs some fixing up, and I have no idea how to implement the CRC check (that would by definition mean loading everything into memory again…). See my TODO tags.

Is there any interest in furthering this? I think this feature would be pretty cool, and would open new use cases for BadgerDB.

jarifibrahim commented :

For my case, that is not quite optimal, as I have no direct control over document size.

@kernle32dll what do you mean by not optimal? Do you mean the memory usage is high because your values are large? How big are your values?

Anonymous commented :

Add io.Reader to Item · Issue #710 · dgraph-io/badger · GitHub :man_shrugging:t2:

kernle32dll commented :

Do you mean the memory usage is high because your values are large? How big are your values?

Right now, I don’t have anything concrete, as the project is in active development. It’s a private project though, so “I don’t know” is the best answer I can give.

But it’s not only memory that concerns me. The problem with the way value pulling is implemented right now is this: the complete value has to be read into memory by Badger before I can start streaming it out to the caller (in my case, a RESTful HTTP endpoint).

To elaborate: say I have a 100 MB file. While the memory cost of 100 MB is pretty low, I have to load all of it into memory, and only then can I start putting data into the output stream for the HTTP caller. This latency problem scales linearly with file size.

What you are probably getting at is the question of whether it even matters. That, I cannot say. Right now, this is all theoretical discussion, without any benchmark or requirement to back it up :slight_smile:

#710 :man_shrugging:t2:

Okay… Well, the issue is related. But there is a good reason why I opted for an io.Writer instead of an io.Reader. With a writer, the code filling the writer is responsible for closing it. In Badger’s case, that is pretty important IMO, because it involves releasing resources (e.g. file handles).

With a reader, Badger can only release its resources after the caller consuming the reader has given the signal (by closing it). IMO, this has the potential for some nasty lock-up situations.

However, it’s trivial to implement the use case of #710 with an io.Writer: simply use an io.Pipe. In fact, I use that in my WIP code to allow asynchronous reading from a file into the writer provided. The beauty of io.Pipe is that error propagation is two-way. Problem writing? That is signaled to the reader, which can stop gracefully. Does the reader (e.g. a json.Decoder) cause an error? No problem, the writer receives the error and can stop filling the pipe.

kernle32dll commented :

On a related note, I realized that an io.Reader for write-streaming would also be useful. But I haven’t looked at the code yet. I suppose this would be much more difficult, as a broken write stream might result in corrupted data.

Nevertheless - would be cool.

jarifibrahim commented :

@manishrjain @campoy what do you guys think about this?

kernle32dll commented :

Just wanted to ring in here, so this doesn’t go stale.

stale commented :

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
