Dgraph for smaller projects / low-resource contexts

I am new to the Graph database world but have experience in traditional relational databases. I am evaluating if Dgraph would be a good choice for one of my projects.
One of the key requirements I have is the ability to run multiple small standalone Dgraph instances on single machines with low resources (something like 2 CPUs, 1 GiB memory). The amount of data stored on a single instance will be quite small (let’s say <10k documents; ~20 types; <100 predicates in total). The frequency of queries and mutations will be very low (probably less than 5 simple queries per second). High availability and high performance are not requirements in this context.

While reading the docs, I felt like Dgraph was not built for this “hobby-size” scenario, and these questions came up:

  • The docs state that 16 CPUs and 32 GiB of memory per machine are a common configuration. Is it possible to run Dgraph (both the Alpha and Zero node) on one tiny machine with e.g. 2 CPUs and 1 GiB of memory?
  • I read that the standalone Docker image is not recommended for production. What is the reason for this, and what is the difference from a single-node setup? (A sketch of the resource-capped standalone setup I have in mind follows this list.)
  • Would you recommend Dgraph for low-resource environments as well, or should I stick to a traditional RDBMS?
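
For concreteness, this is roughly the setup I have in mind. It’s just a sketch, assuming the dgraph/standalone image and plain Docker resource limits; the image tag, ports, and volume path are placeholders rather than anything I have tested on this hardware:

```bash
# One container running both Zero and Alpha (the standalone image),
# capped at the resources of the target machine.
docker run -d --name dgraph \
  --cpus=2 --memory=1g \
  -p 8080:8080 -p 9080:9080 \
  -v ~/dgraph:/dgraph \
  dgraph/standalone:latest
```

The open question for me is whether Alpha and Zero together stay under that 1 GiB cap once a real schema and some mutations are involved.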

Thanks!

The thing is, it will work until it doesn’t - and when it doesn’t, the result will be an OOM situation in Dgraph, and the suggestion you will get here is ‘add more RAM’.

The standalone image has 1 alpha and 1 zero - if you hit any corruption, the whole cluster is dead. With 3 alphas and 3 zeros, a corruption can be fixed by removing the affected node and adding a new one.
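
For example, assuming the default Zero HTTP port (6080), replacing an affected Alpha in a 3-replica group looks roughly like this; the node ID, group number, and hostnames below are illustrative only:

```bash
# Ask Zero to remove the corrupted Alpha (here: Raft id 3 in group 1)
curl "localhost:6080/removeNode?group=1&id=3"

# Start a fresh Alpha pointed at the same Zero; it joins the group and
# catches up from a healthy replica's snapshot.
dgraph alpha --my=new-alpha:7080 --zero=zero1:5080
```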

If you don’t care whether your cluster gets OOM-killed and possibly loses all its data to corruption, then you can run whatever you want - but that is unpalatable to most people running a real app.

1 GiB of RAM is probably not enough to process schema updates if you have a reasonably large schema. Dgraph is not recommended for this use case.

Thanks for your answers. In this case I’ll probably use something else, but I’ll keep Dgraph in mind for other purposes.


We (the community) hashed around some ideas on how to get something like Dgraph running with a smaller profile for use cases like yours; @maaft in particular is heavily invested in this idea as well.

Just try it. Then you’ll have your answer.

Hi! We’ll probably adapt ent on top of a Postgres database for our use case. Ent already integrates nicely with GraphQL. And although the boilerplate overhead is significant, the system requirements are suuuuper low.
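
To give an idea of the boilerplate, this is roughly what the scaffolding looks like; a sketch only, with a made-up entity name, and the GraphQL layer coming from ent’s separate entgql extension rather than anything shown here:

```bash
# Scaffold an ent schema for an example entity type
go run -mod=mod entgo.io/ent/cmd/ent new Document

# After filling in fields and edges in ent/schema/document.go,
# generate the typed client code
go run -mod=mod entgo.io/ent/cmd/ent generate ./ent/schema
```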

Is this a problem that is limited to the standalone image? I’m wondering why an OOM situation causes unrecoverable data corruption. Is it because Dgraph currently has no safeguard in place to prevent data corruption, or is it because the actual solution is sharding and replication (which you don’t have with a standalone node)?

EDIT:

@maaft Did you look at Neo4j? Their GraphQL offering is pretty good and auto-generates the API. Managing a Neo4j instance involves more manual work with configuring indexes etc., and the performance is slower, but it seems like the best alternative to Dgraph right now.

The lack of clarification from the team about what causes these corruption issues, slow inserts, etc., and the lack of a roadmap for how and when they will be fixed, is forcing me to look at alternatives. It’s really troubling.

Oh no, that’s not to say it’s a problem with the standalone image specifically - it’s just a different entrypoint on the normal Dgraph image. It’s just a problem with running a single copy of anything, which is what the standalone image is.
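
Conceptually, that entrypoint just starts one Zero and one Alpha inside the same container; something like this sketch (not the exact script shipped in the image):

```bash
# Roughly what the standalone image's entrypoint boils down to
dgraph zero --my=localhost:5080 &
dgraph alpha --my=localhost:7080 --zero=localhost:5080
```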

The OOM was an example of something that can prevent a file from being fully fsynced to disk - say, the manifest state in Badger.

Unfortunately, I keep coming across issue after issue where Dgraph/Badger completely corrupts data on one node or even a whole group of nodes.

At least with a single-node failure I can replace the node.
