Storing blobs in Dgraph

What technical or practical limitations are there related to the storage of opaque binary data in Dgraph? Would it be possible to store a megabyte sized binary object in a node? What about a gigabyte blob? Would storing such large chunks directly in the graph result in any performance issues or otherwise be a terrible idea?

I’ve had the same question in my mind lately. A byte array would be a great predicate type to have if it wouldn’t hurt performance to store big blobs.

There are some (arbitrary) hardcoded limits in Badger for value sizes (2GB). These are in place for technical reasons, to aid recovery in case the data directory becomes corrupted for some reason.

Data would also have to be sent from dgraph to clients and other dgraph instances, so that’s also something that you would want to take into consideration when storing values that large.

Binary blobs aren’t supported right now (although I think it would be a good idea to add them in the future). As a workaround, though, you could always base64-encode the data and store it in a string.
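
As a rough illustration of that workaround, here is a minimal Go sketch that base64-encodes a file and builds an RDF set mutation for a string predicate. The file name and the `file.data` predicate are placeholders, not anything Dgraph defines:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

func main() {
	// Read the binary data and base64-encode it so it fits in a plain string predicate.
	blob, err := os.ReadFile("avatar.png") // placeholder file name
	if err != nil {
		panic(err)
	}
	encoded := base64.StdEncoding.EncodeToString(blob)

	// Build a set mutation in RDF form; <file.data> is a made-up string predicate.
	nquad := fmt.Sprintf("_:file <file.data> %q .", encoded)
	fmt.Println(nquad)

	// When reading the value back, decode the string to recover the original bytes.
	decoded, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(decoded), "bytes recovered")
}
```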

The reason is that it’s redundant to have separate support for them when you can just as easily base64-encode the data and store it as a string. It’s not clear what more we would need to provide on top.

Badger has a limit that the key+value size shouldn’t exceed the size of a single value log file. We don’t support a value spilling across file boundaries, which is why the total size has to stay below that. You could set the value log file size to a higher limit and then send out larger values (say 4GB); it’s all configurable.
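
For reference, here’s a minimal sketch of raising that limit when opening Badger directly, assuming the Badger v3 Go API. The 1 GB figure and the path are only illustrative; check the option’s documented bounds for the version you run:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Bump the value log file size so that larger values can fit inside a single
	// value log file. 1 GB here is only illustrative; Badger enforces an upper
	// bound on this option (around the 2 GB limit mentioned above).
	opts := badger.DefaultOptions("/tmp/badger").WithValueLogFileSize(1 << 30)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```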

There’s nothing inherent to Dgraph that would slow down a binary blob’s retrieval or affect its performance. Under the surface, Badger deals with this, and it can handle large values much more efficiently than other KV DBs.

Are there any read/write performance gains to be made by storing the BLOBS natively rather than having the base-64 encoding step in-between? If not, then I guess there’s no need to allow byte array storage as you said.

On Dgraph’s side, data is data, so there wouldn’t be any read/write difference. There will be somewhat more data to handle, because base64 is less compact than raw bytes (roughly a third larger, since every 3 input bytes become 4 output characters), but this wouldn’t matter for most use cases.
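
To put a number on that overhead, a quick calculation with Go’s encoding/base64 package (the 1 MiB size is arbitrary):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// How much bigger does a 1 MiB blob get once base64-encoded?
	raw := 1 << 20
	encoded := base64.StdEncoding.EncodedLen(raw)
	fmt.Printf("raw: %d bytes, base64: %d bytes (+%.0f%%)\n",
		raw, encoded, 100*float64(encoded-raw)/float64(raw))
}
```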

In fact, if you use the Go client, it will automatically convert any []byte fields in your objects into base64-encoded strings and do the opposite conversion when you retrieve those edges (we get this for free because the behaviour is baked into Go’s encoding/json package).
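
For example, here’s a small sketch of that behaviour using only encoding/json; the File struct and its fields are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// File is a made-up struct of the kind you might hand to the Go client's mutation API.
type File struct {
	UID  string `json:"uid,omitempty"`
	Name string `json:"name,omitempty"`
	Data []byte `json:"data,omitempty"` // encoding/json marshals []byte as a base64 string
}

func main() {
	f := File{Name: "hello.bin", Data: []byte("hello, dgraph")}
	out, _ := json.Marshal(f)
	fmt.Println(string(out)) // {"name":"hello.bin","data":"aGVsbG8sIGRncmFwaA=="}

	// Unmarshalling decodes the base64 string back into []byte automatically.
	var back File
	_ = json.Unmarshal(out, &back)
	fmt.Println(string(back.Data)) // hello, dgraph
}
```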
