What is Dgraph lacking?

Is it lacking performance?

Better performance is always nice, but Dgraph is already very fast.

Do you really need transactions?

Yes

Should Dgraph not be a database? Should it be a serving system instead?

Dgraph should primarily be a database.

Should Dgraph interface with other databases better?

No need for me.

Does Dgraph not work with GraphQL properly? or React?

GraphQL is not a priority for me. I only focus on DQL as it has far more features and allows me to offload computation to Dgraph.

What I would like to see is more focus on the core product: the graph database. Also, more graph algorithms directly integrated in Dgraph (like shortest path) would be nice.

4 Likes

I use DGraph for multiple huge databases - millions of nodes and edges, with gigs of data. To analyze this data, I use the DGraph → Spark connector. For me, the main competition to DGraph is Neo4j, which is VERY expensive. I use DQL exclusively and not GraphQL.

However, I have had multiple show-stopping crash/corrupt bugs which have impeded my progress. These are frustrating. (There are other threads on these)

There really isn’t another option OTHER than DGraph and Apache SPARK when it comes to open source graph analytics, which is a pretty huge market. Focusing DGraph on the analytics might mean integrating some graph algorithms (community detection is the primary one!), which is not a bad idea, and integrating and extending the DGraph->SPARK connector as a primary feature.

3 Likes

Yes. We’ve been impressed with dgraph’s performance at scale- but things get stressful once the cluster grows large enough. Certainly don’t take my tone as disingenuous, dgraph is young and engineering tradeoffs have to be made, but here’s what we run up against:

No sharding of predicates

This is a big concern for us- as it means if a predicate becomes large enough, the only way to deal with it is vertically scaling. (We have predicates like xid and name that have over 100 million entries, and are over 100gb in size each).

The shard-balancer wreaks so much havoc on these large predicates that we had to completely disable it. It moves things too frequently, and its only criterion is size (instead of “hotness”). Any time one of these large predicates is moved (or indexed), the cluster grinds to a halt because incoming data gets stuck behind the operation, so much so that if we have to move a predicate, we have to pause all of our ingestion pipelines and wait for dgraph to catch up.

No query planner

Dgraph makes naive decisions about how a query gets executed. For instance, when filtering by two predicates A and B in a query,

  • A with 1 billion unique UIDs, with an index
  • B with 10 unique UIDs, without an index,

dgraph will choose A over B if it sees that A is indexed, but B would have bounded the query faster. It’s frustrating when end users of our system have to do things like write their query in reverse to get better performance.
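To make the workaround concrete, here is a rough DQL sketch (the predicate names are illustrative, and since eq() at the root needs an index, this assumes both predicates are indexed):

```
{
  slow(func: eq(A, "x")) @filter(eq(B, "y")) { uid }  # roots on the huge A set
  fast(func: eq(B, "y")) @filter(eq(A, "x")) { uid }  # roots on the tiny B set
}
```

Both blocks return the same nodes; only the root choice differs, and that choice determines how large the intermediate result set is.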

Relatedly, cascade is one of dgraph’s most powerful features, but without a smarter query planner, deep/wide cascades can be unusably slow. This is frustrating, since the whole reason you chose a graph db is that you want cheap inner joins :slight_smile:

I’m excited for the roaring bitmaps work, as I could see it improving some of these issues.

Upsert by xid painful for ingest-heavy workloads

All of our inserts happen with deterministic ids- but since dgraph creates its own uid for everything, we are forced to have every insert be an upsert. (Query by xid, if it doesn’t exist, add it, otherwise update it). This puts pressure on the cluster that I wish we had a way to avoid. We want to be able to do blind-writes.
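The query-then-write pattern described above corresponds roughly to a DQL upsert block like this (the predicate names and xid value are illustrative):

```
upsert {
  query {
    q(func: eq(xid, "ext-123")) {
      v as uid
    }
  }
  mutation {
    set {
      # uid(v) resolves to the existing node, or a new node if the query matched nothing
      uid(v) <xid> "ext-123" .
      uid(v) <name> "Example" .
    }
  }
}
```

Every write pays for the lookup query first, which is exactly the pressure a blind write would avoid.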

We’d be happy to hash our ids in a way that they became dgraph uint64 uids (that we could insert directly), but it feels like dgraph is not intended to work this way (the way the zeros lease blocks of uids, them being contiguous and sequential, etc)
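As an illustration of the hashing idea (not an official Dgraph feature; this assumes a client were allowed to supply its own uint64 uids), a deterministic xid-to-uid mapping could look like:

```python
import hashlib

def xid_to_uid(xid: str) -> int:
    """Sketch: map an external id to a deterministic 64-bit uid.

    blake2b with an 8-byte digest gives a uniformly distributed
    64-bit value, so collisions are unlikely at realistic scales.
    """
    h = hashlib.blake2b(xid.encode("utf-8"), digest_size=8).digest()
    uid = int.from_bytes(h, "big")
    return uid or 1  # uid 0 is reserved in Dgraph, so avoid it

# Same xid always yields the same uid, enabling blind writes in principle
print(hex(xid_to_uid("user:alice")))
```

The catch, as noted above, is that Dgraph’s Zeros lease contiguous, sequential uid blocks, so randomly scattered hash-derived uids fight the design rather than work with it.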

Feature Wishlist

All of the data we store in dgraph is temporal, so we’ve had to do some gnarly gymnastics to allow users to filter nodes/edges by spans of time (we have dangling “timerange” nodes that hang off of every node in the graph… which we use as a filtering mechanism via cascade).

I would be over the moon if dgraph had a native timerange type that was just [starttime, endtime]. This would allow us to put a list of timeranges directly on a node, and then a query like intersects(100,200) would return us a node that had a timerange predicate like [0,110], [180,300]. This would reduce the stress we put on the cluster across the board.
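A hypothetical sketch of what such a query could look like (intersects is not an existing DQL function, and no timerange type exists today; this is purely the wished-for shape):

```
{
  q(func: type(Event)) @filter(intersects(validDuring, [100, 200])) {
    uid
    validDuring   # e.g. [0, 110], [180, 300]: both overlap [100, 200]
  }
}
```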


Dgraph has a strong foundation, and I know the team has ideas about the issues I’ve brought up. To echo some of the other commenters, I am more interested in dgraph as a graph database than a gql platform, given dgraph is the only modern, cloud-native graphdb in town.

11 Likes

Right now it seems like we have a messaging problem. I do not seriously know if this is an app-platform you are building or a graph database. If the target of dgraph is just a means to support running GraphQL like hasura did with postgresql then I did not understand that from the get-go.

What is the future of dgraph? A Hasura competitor or a Tigergraph/Neo4j/Neptune competitor? If it is both, should they live under the same namesake? Can they coexist and coevolve?

6 Likes

Thank you for stepping in @mrjn and sorry to hear it’s been a tough year. Fingers crossed the VCs will come to their senses. I can only echo some of what was said above:

  • A more transparent roadmap is important, even if it takes twists and turns and comes to a halt sometimes due to unforeseen events. The product is already great in many aspects, and I think the community would be more patient if we had more insight and better ways of contributing. Personally I think you could take more inspiration from Chakra UI and Next.js, two very vibrant communities that use Github issues, projects and discussions: https://github.com/chakra-ui/chakra-ui, https://github.com/vercel/next.js. I think this would really benefit the community and make for more in-depth and organized conversations. Building open source without a tight integration between discussions and source code prevents a natural flow of ideas and contributions. I understand that ‘policing’ Github issues could be a lot of work, but that’s what this community is for. We can help close issues and move them to discussions as needed. That way we can much better separate discussions and ideas from actual issues that need proper tracking and assignment … and it will look more cutting-edge in the eyes of VCs …

  • In terms of lacking features, I am leaning more towards the DQL crowd (with a rock-solid core and rich DQL layer), but I totally see the value of GraphQL as well. @seanlaff and @illuminae had many good points re. that core – such as anything that facilitates imports/upserts with XID, a query planner, safe sharding of large predicate collections etc. Basically the things that keep the system performant and make I/O pleasant. I think it’s better to improve the DQL documentation first (quite a lot is crammed into “Functions”: https://dgraph.io/docs/query-language/functions/) rather than spreading thin across DQL and GraphQL. Maybe document the GraphQL side (and the desired roadmap and input you have) in a way that the community can take it on. It seems like less rocket science than the other parts, and it also requires more collaborative work to keep up with 3rd party specs and tools.

  • Overall I think more customers (both smaller and larger) will come as a result of more of us getting into production and spreading the word. So the “customer success engineer” role remains important, along with good Cloud features that are self-explanatory. But a success engineer who is not working directly with Github issues in a transparent way quickly leads to frustration. Here I think it’s important to draw the line more clearly, so that customer success engineers can focus on issues that directly enable the work of paying customers.

  • I think your website and docs are actually quite good. But it depends on who you’re targeting, obviously … The branding is good. I think the problem is a lack of understanding of ‘graph’ overall, but as more developers see the benefits, the time will come. I think it does make sense to explain and emphasize graph/RDF upstream of GraphQL. For being a graph database, you seem to interface relatively little with the graph community. That might lead to the following dilemma: frontenders are not yet fully familiar with RDF/graph data and its benefits, while ‘semantic web people’ are not yet fully familiar with GraphQL and its benefits. Also, because you’re neither fully RDF compliant nor fully GraphQL compliant (I might be wrong here), there will always be some naysayers. But then, take a look at what TypeDB is doing … They don’t take that No for an answer. They instead redefine the rulebook, much like you are doing. I think the key is to redefine it only so much that you can offer a natural segue (or enough bonus features) that a leap of faith is worth taking. For me personally, I’m willing to make the leap from “real semantics/real RDF” to Dgraph if, and only if, the performance and ease-of-use (especially of the query language) outweigh the downsides of no longer having a link to the rest of the LOD (linked open data) space. For people coming from the GraphQL side of things, I think that segue is more about delivering the GraphQL that they know and love, coupled with demonstrations of performance, while gradually teaching them concepts from the world of graph logics.

Looking forward to the continuation!

7 Likes

I’m not really interested in GraphQL. What we need is a stable, fast, scalable graph database with good DQL features, and Dgraph is currently pretty good at that.

We currently use Dgraph in production, and really the only complaint I have still is that the upgrade process requires downtime.

6 Likes

I still think that every system should allow for maintenance windows, but I understand not wanting them. Facebook, Google, etc. don’t do maintenance windows, and Dgraph is pitched as a backend solution that should have been used by Google, but they passed on it.

I think the big kicker still is that the data files are part of the upgrade as well and cannot just be dumped between versions. I would think that eventually the data files would not need to be changed between versions just upgrading the algorithms to query/mutate the data in them more efficiently and with more tools/functions.

1 Like

I am here as a hobbyist developer.

No, I guess.

From my point of view there are a lot of possibilities for dgraph. It can be a great database for apps that need complex relations; dgraph would be better in such cases,
as @BenW mentioned with https://roamresearch.com.

Another issue I find is the pricing of dgraph cloud, or hosting it yourself.
It would be nice if we had one-click hosting for the open source version on platforms like DigitalOcean,
or maybe a hobby plan on dgraph cloud for something like $2 or $5 per month.
The current free tier with its 1 MB limit is just not enough, and the enterprise and shared plans are just out of budget for hobbyist developers like me.
I myself am a very big fan of dgraph due to its performance, graphql support, the many possibilities it offers as a general purpose database, and the simplicity it provides over other databases,
but I have kept away from it due to its plans,
and the same goes for so many hobbyist developers like me.

On top of that, graphql with typescript and Apollo becomes complicated and not easy to start with, compared to supabase or firebase.

3 Likes

Are you using codegen? It is an absolute must if using GraphQL, Apollo, and TypeScript.

With Dgraph, Apollo, Codegen, React, and TypeScript you have ONE source of truth for your types, strongly typed from the top to the bottom of your app. Need to change the types anywhere? You just change them in one place, the Dgraph GraphQL schema, then deploy/rerun codegen and you have the same schema updated at the database (DQL schema) and the frontend (TypeScript). Gamechanging!! :exploding_head:
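For anyone curious, a minimal graphql-codegen setup for this stack might look like the following sketch (the endpoint URL and file paths are placeholders):

```yaml
# codegen.yml (sketch): generate TypeScript types and Apollo hooks
# from the GraphQL schema that Dgraph serves
schema: "https://your-backend.region.aws.cloud.dgraph.io/graphql"
documents: "src/**/*.graphql"
generates:
  src/generated/graphql.tsx:
    plugins:
      - typescript
      - typescript-operations
      - typescript-react-apollo
```

Rerunning codegen after a schema change then propagates the new types to the frontend, which is the single-source-of-truth workflow described above.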

3 Likes

The missing feature that surprised me most was lack of native timestamps – gives the impression applications aren’t the main use-case.

7 Likes

  • Is it lacking performance?

Not that I’ve found, but I echo @seanlaff’s comment above re: sharding on predicates. For us, this is a huge concern. The vast majority of our graph will only have a few predicates (xid being ubiquitous).

  • Do you really need transactions?

Yes, if this is meant to be a production application database.

  • Should Dgraph not be a database? Should it be a serving system instead?

We are looking for a graph database, not a graph layer on a relational database, if that’s what you meant.

  • Should Dgraph interface with other databases better?

If Dgraph could ingest RDBMS schemas that would be interesting, though I don’t know how you would solve that.

  • Does Dgraph not work with GraphQL properly? or React?

We are not interested in GraphQL as the main feature of Dgraph. We are looking for a massively horizontally scalable graph database that can do the high-performance graph traversal that an RDBMS isn’t tuned for. And in fact we prefer to use Dgraph Cloud rather than host it ourselves. It’s not an easy product to self-administer, and we’re thankful for the support.

In fact, the focus on GraphQL has been a little disappointing. We maintain a GraphQL schema just in case, but we use DQL exclusively.

The only wishlist item I have is a multi-RAFT approach for regional clusters like CockroachDB is doing, but it’s not at all a deal breaker.

4 Likes

You know, one thing I would pay double for is if Dgraph had its own mobile solution with offline sync. MongoDB has Realm, but unfortunately it only syncs with MongoDB. Even if Dgraph built a syncing gateway that works with Realm (presumably by building an extension to Realm that allows it to sync with Dgraph’s servers), that would be incredible.

As nice as that would be, I totally understand Dgraph’s hard stop on going in this direction, and even limiting the supported OS to only Unix. This helps the team stay super focused and build out what matters in the best way, instead of doing it one way for Unix, another for PC, another for Mobile, etc.

2 Likes

OK my selfish requests.
I want Dgraph to provide:

  • Blazing FAST query results
  • Cheaper than other platforms, because it’s been designed to take advantage of SSDs instead of everything having to live in RAM
  • Clear examples of how to accomplish things
  • ES-like search and query DX and performance, if not faster; a decent if not identical query string query api with great index intersection queries
  • BM25 and or custom search ranking
  • More and custom tokenizers
  • The whole damn thing is postings lists, it should have amazing search!
    UIs are built with search interfaces; the ES query string query is THE interface I have used for over 8 years on multiple projects to build applications. As soon as you add search to an app, the search index drives all the views, because it does the filtering, the sorting, the faceting, etc. All of those queries are constructed using a simple query string query api. I would love to see this. Query string query | Elasticsearch Guide [7.15] | Elastic

Personally I don’t care about lambdas. Maybe some day I will, but they seem like a crutch for missing features, and I have no visibility or intuition about how they impact performance, scaling and ongoing cost of operation versus something like aws lambda, which I already use extensively.

Also, I don’t care about GraphQL. I love that I can write a simple schema and get a full API that rocks; GraphQL was just a hoop I had to jump through. Perhaps subscriptions will change my mind on the value of GraphQL, but it also seems like it holds the platform back, because if something is not in the spec we can’t use it? Having two query languages feels kludgy, not to mention yet another graph query language. Seriously, you couldn’t just use gremlin or sparql or something? I’m certain you have great reasons for YAGQL, but the pre-existing stuff already has so much documentation. /rant

3 Likes

Lack of support for some algorithms, such as community detection. And there are many problems in path lookup: queries often use too much memory or take too long to return results.

3 Likes

More and custom tokenizers

There is already support for custom tokenizers, fwiw. Indexing with Custom Tokenizers - Query language

2 Likes

@iluminae thanks for the link. I have avoided DQL so far and stuck exclusively to the GraphQL side of the cloud service. I will certainly take a look, but since I already do pre-processing in my aws lambdas upon ingest, I will probably implement my custom tokenization there. The downside to adding DQL to my project is that it’s yet another thing to figure out: it has more power but requires more maintenance of relationships, whereas GraphQL has more guard rails and does more hand-holding, which I appreciate.

6 Likes

I think there are some things missing from DQL that could reduce friction when developers consider adopting Dgraph. For example, the ability to include non-aggregate predicates in @groupby query blocks. I’m willing to bet that it’s a common query folks try, especially those that come from SQL backgrounds and are used to queries like SELECT COUNT(1), id, name FROM table GROUP BY id. Improvements to pagination allowing you to query pages using limit/after when applying some non-uid sort on an indexed field would also be welcome. I haven’t tried GraphQL, so I don’t know if these features are available there.
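For context, the current DQL limitation looks like this (the type and predicate names are illustrative):

```
{
  byAge(func: type(Person)) @groupby(age) {
    count(uid)   # aggregations are allowed inside @groupby...
    # name       # ...but a plain predicate like name is not,
                 # unlike SQL's SELECT COUNT(1), id, name ... GROUP BY id
  }
}
```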

I could probably do without transactions for most of my queries, with one notable exception: there’s currently no way to replace a list entirely in one query. The only option is to run two mutations, one delete and another set.
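The two-step workaround described above looks roughly like this (the uid and predicate are illustrative); without a transaction wrapping both mutations, a reader can observe the list empty in between:

```
# mutation 1: delete every value of the list predicate
{ delete { <0x123> <tags> * . } }

# mutation 2: set the replacement values
{ set {
    <0x123> <tags> "red" .
    <0x123> <tags> "blue" .
} }
```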

4 Likes

TLDR; DGraph makes a high quality wife, but a terrible girlfriend.

Everyone has different backgrounds, so here is mine:

I’m a senior dev freelancing to support ~5 small to mid size companies (100k/year to 20 million). Some of those projects are the kind that could explode in growth. I’ve stood up close to a dozen websites in the last 2 years from scratch and shipped 4 different react-native applications. The name of the game for me is fast scalability. I use dgraph cloud, and mostly use the graphql API for one project. I’ve played with Neo4j before, but mostly I’ve done SQL.

For me, I obviously want it all. I want a scalable backend that requires little to no code from me for the core operations. I want the ability to customize and plug into that backend when I need to.

RESPONSE SUMMARY
DGraph makes a high quality wife, but a terrible girlfriend. It has some great virtues but you have to invest a lot into it to get them. It’s still too needy and picky, so unless it’s exactly the solution you need, it’s not worth dealing with its idiosyncrasies yet.

I intend to only use dgraph on some projects for the foreseeable future. The projects must both need high scalability and have extensive relationships that are critical to core operation of the project. If I can get away with Hasura, I will continue to use Hasura for the time being due to its extra polish and significantly better developer experience.

I want to recommend DGraph to people. I have a consulting company that I work with that I’d love to get to use dgraph instead of Hasura, because it would simplify many of their problems. But I can’t, because Dgraph is not developer friendly.

To use DGraph more I would need to see:

  • Filtering across relationships (in graphql)
  • Bulk Update across relationships (in either, but preferably graphql since it’s cleaner than DQL)
  • Something like SQLs cascade on delete/update system so I can enforce data integrity at the data layer

To recommend DGraph more I would also need to see:

  • the graphql for dql tutorial be improved to help more than junior developers (sub-select queries, sub-select in mutations, group by, windows, indexing, rank)
  • Improved DQL docs for upserts OR add sub-select statements in graphql (preferred!)
  • Faster uptime to first external query
  • More in-cloud editor help & messages for when certain changes will orphan data or cause negative side effects. IE: “You dropped the column ‘oldDataColumn’ from your schema, would you like to remove it from your DQL schema and also remove all predicates for that column?”
  • Better lambda support. Currently lambdas are slow and unwieldy. Hasura has a simple UI that would help a ton.

Also, I’d be happy to do some User meetings to discuss things more in depth, walk you through my development experience and it’s frustrations, brainstorm ideas, discuss training materials or help almost in any other way.

I’ve already mostly committed a pretty large project to dgraph and would love to be able to use it for more projects and recommend it more. It could easily become my go to backend for everything.

DETAILED RESPONSE

To help out I’d like to rate DGraph compared to both Postgres, and Hasura

Scoring PostgreSQL vs Dgraph (out of 5)

Basic CRUD:
Dgraph: 4 Postgres: 5
Relationships:
Dgraph: 5 Postgres: 3
Data Maintenance:
Dgraph: 2 Postgres: 4
Data Validation:
Dgraph: 2 Postgres: 4
Data Transforms:
Dgraph: 3 Postgres: 5
Scalability
Dgraph: 5 Postgres: 3

Explanations:
Basic CRUD
Pretty close here, the -1 to dgraph is mostly due to having to maintain a giant schema file instead of being able to break things out into separate files
Relationships:
Dgraph wins here, and this plus scalability is the reason I’m still using DGraph even with its limitations
Data Maintenance:
This is painful in DGraph. I shouldn’t need anything more than an editor to maintain my data, but I have to write scripts way too often: bulk updates, bulk deletes, renaming columns, moving/copying data from one column to another. Even using DQL I still find this obnoxiously painful too often; not always, but way more than it should be.
Data Validation:
Mostly minus points here because of a lack of more fine-grained data types, and the absence of any form of constraints to ensure data integrity. Please don’t improve this too much until after making Data Maintenance better.
Data Transforms:
There are no triggers, except Lambda, and Lambda is a non-performant pile of avoid-me right now. Also, doing any sort of data transform across a table requires a script. It doesn’t work well in DQL or GQL. Say what? We have a graph database that excels at relationships, and none of our core query languages support maintaining or working with your data across those relationships.
Scalability:
Yeah, Graph databases win here big time when dealing with large datasets. Only applicable to larger clients. Since my project is still in-progress I’m just hoping dgraph scalability lives up to what it feels like it should be capable of.

Scoring Hasura Cloud vs Dgraph Cloud (out of 5)

Authentication/Authorization:
Dgraph: 3 Hasura: 5
Side Effects:
Dgraph: 3 Hasura: 5
UI Console vs Dgraph Cloud Console
Dgraph: 3 Hasura: 5
First Time Setup:
Dgraph: 4 Hasura: 4
Graphql API Quality:
Dgraph: 2 Hasura: 5

Explanations:
Authentication/Authorization:
-1 to DGraph because Hasura’s permissions UI is phenomenal. Dgraph’s is good enough, but if they were to copy Hasura’s permission mechanism, that would be way better. -1 because it wasted 4+ hours of my life trying to get my first query working externally: being on DGraph cloud, I was missing an SDK key. I finally found the docs in a completely different section of the website, buried. Write a guide, “Getting your first external query working”, or even better allow me to have my server in “dev mode” and give me a better error message with a link to the docs.
Side Effects:
Mostly because DGraph’s lambda system is painful. I get one tiny little editor in the cloud, or I have to set up a new custom build and upload to an endpoint. If I’m going to do that, I’ll just do it on AWS or something where there is already a lot of tooling support. Especially because lambdas on dgraph so far are slow.
UI Console vs Dgraph Cloud Console:
Hasura’s is more polished. DGraph still has one giant schema file, the inability to delete a type and all its child predicates and their data, and one giant lambda file. Way too much padding, so I have to scroll all the time. No separated permissions or relationship definitions.
First Time Setup:
They’re both pretty equal here. In Hasura you have to setup your DB separately, but you get a UI that can help you start building your schema effectively and with separated tables.
In DGraph the Graphql schema is a lot nicer for modeling data than SQL, and there is no separate database, but you have one place for one giant schema file, which leads to a LOT of scrolling. Plus, putting a comment at the bottom of the schema file for your auth stuff? Seriously?
Graphql API Quality: I actually think that DGraph has more features overall here, but the big reason I ding DGraph so hard is that they are a graph database that doesn’t allow you to walk relationships in a filter. Cascade fits some use cases, but not all (like mutations). Hasura only works on SQL databases, but still allows you to walk across relationships.

7 Likes

To be clear, you’re talking about Nested Filters (probably the most requested GraphQL Feature).
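For readers unfamiliar with the request, a nested filter is the ability to filter a parent by a child’s fields. The syntax below is hypothetical (it is not supported today); @cascade approximates it for queries, but not for mutations:

```graphql
query {
  # Hypothetical: select authors by a field on their related posts
  queryAuthor(filter: { posts: { title: { anyofterms: "dgraph" } } }) {
    name
  }
}
```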

And this is my suggestion, the @reference directive. This could be accomplished now by using a post-hook lambda, but you would have to write it. This is definitely a huge missing feature, but do-able now.

J

1 Like