The Good, The Bad, The Ugly - State of Dgraph

While I was talking with @gajanan (or @gajanansc) earlier today, he requested that I post a discuss topic summarizing a bunch of links for feature requests, current problems, ideas, suggestions, etc. The goal here is to be factual and technical. Oh, and for those who may not have known before, @verneleem is also me :wink:

If I have to do all of the raw data filtering, manipulation, and analytics in a client outside of the database, then why have the database at all?

^ Sometimes it seems right now that we have 2 API languages and 0 DB languages with Dgraph.

If you haven’t yet, please see The State of Dgraph’s GraphQL API Notion Document. I will reference many of the same topics from this document over again here to try to get a complete overview all in one place linking to many different subjects.

  • Missing @auth rule for post update state
  • Field level authorization, not the same as ACL predicate control in DQL Enterprise
  • External node/type based auth rules
  • Scalar Validation/Constraints (possible solution pre-hooks in GraphQL API)
  • Edge/Relationship Validation/Constraints in DQL
  • Separating/Combining Interface and Implementing Type @auth rules

Auth rules on interfaces cascade to the implementing types and get combined with each type's own rules, so an implementing type must match ALL rules of both the type and the interface. Sometimes that is not wanted, and the need is for the rules to be combined with OR logic instead.
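To make the difference concrete, here is a minimal Python sketch of the two combination modes. It does not reflect Dgraph's actual @auth implementation; the rule functions (`is_owner`, `is_admin`) and the helpers `combine_all` / `combine_any` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

# A rule receives the caller's claims and the node, and returns True if access is allowed.
Rule = Callable[[Dict[str, str], Dict[str, str]], bool]


def is_owner(claims: Dict[str, str], node: Dict[str, str]) -> bool:
    # Hypothetical rule declared on the implementing type
    return claims.get("user") == node.get("owner")


def is_admin(claims: Dict[str, str], node: Dict[str, str]) -> bool:
    # Hypothetical rule declared on the interface
    return claims.get("role") == "admin"


def combine_all(rules: List[Rule]) -> Rule:
    # AND logic: every rule must pass (the behavior described above)
    return lambda claims, node: all(rule(claims, node) for rule in rules)


def combine_any(rules: List[Rule]) -> Rule:
    # OR logic: one passing rule is enough (the behavior being requested)
    return lambda claims, node: any(rule(claims, node) for rule in rules)


interface_rules = [is_admin]
type_rules = [is_owner]

claims = {"user": "alice", "role": "member"}
node = {"owner": "alice"}

and_allowed = combine_all(interface_rules + type_rules)(claims, node)
or_allowed = combine_any(interface_rules + type_rules)(claims, node)
```

In this sketch the owner who is not an admin is rejected under AND logic but admitted under OR logic, which is the case the feature request is about.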

  • Separating @auth outside of the GraphQL schema itself
  • Re-usable/Global @auth rules
  • Scoping/Cascading @auth rules
  • Hard limiting results in GraphQL API — Prevent Data Scraping
  • [Completed?] Combining @auth and @custom DQL resolvers
  • Nested Filtering for DQL and GraphQL API (without using @cascade directive)

    AKA: Dealing with more normalized forms of data

  • Paginating child nodes as a whole irrelevant to their multiple parental levels (related to nested filtering)
  • Logically combining filters together from different levels in the graph (related to nested filtering)
  • Ordering/Sorting by nested data
  • Filter by Aggregated results
  • Scalar comparisons
  • String pattern matching (not full regexp)
  • Date/time filtering and manipulation
  • Order by enums in GraphQL API
  • Sorting by Aggregation
  • Calculated Fields/Triggers
  • Full Text Search Best Match Scoring
  • Full text search across multiple fields:
  • String Functions for inter graph comparisons and manipulation
  • Simplifying/Enhancing groupby
  • Verbosity of Multiple Node Updates in GraphQL
  • Deep Mutations

  • Comparative Inputs

  • Auto Incrementing Fields

  • Need to correct generated payload list nullability in GraphQL API

Payloads right now in Dgraph are generated as lists of nullable items, e.g. “queryUser: [User]”. This should be corrected to the tightest possible type, such as “queryUser: [User!]!”, which means that the result will always be an array. It could be an empty array, but no item in the array can ever be null.
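As a rough illustration of why this matters to client code, the Python sketch below contrasts handling the two shapes. The `User` class and both helper functions are hypothetical and are not part of any Dgraph client.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    name: str


def names_from_nullable(payload: Optional[List[Optional[User]]]) -> List[str]:
    # Shape of `[User]`: the list itself and every item may be null,
    # so the client has to guard against both.
    if payload is None:
        return []
    return [user.name for user in payload if user is not None]


def names_from_strict(payload: List[User]) -> List[str]:
    # Shape of `[User!]!`: always a list, never a null item, no guards needed.
    return [user.name for user in payload]
```

With the tighter type, generated client types can drop the null checks entirely.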

  • No Arrays—only Sets/Lists

But what needs to be thought out even more is what this change would bring with it in terms of API capability:

  • Add/Update values at positions in a list, choosing to replace or scoot over existing values that may exist.
  • Replace a list in its entirety. This is rather difficult to do but is such a simple use case
  • Delete values from a specific index in a list. Right now you can only delete all items in a list or items in a list by their value.
  • Allow lists to maintain order (for lists of scalars and lists of type (aka edges))
  • Move item(s) in a list to a specific location without changing any values
  • Unions sometimes produce unexpected results

  • @auth on union types

Completed? Needs to be documented or implemented if not currently—haven’t tested personally

  • Custom Directives

The idea of custom directives is centered around the developer creating directives that are available to clients. A developer may wish to allow some kind of script to be processed on command, such as logging the result or adding some metadata to the response (not in the data, but in the extensions part of the response).

  • Custom DQL Mutations

This really depends on how much is being refactored in the GraphQL codebase and how that refactoring is done. If GraphQL will still be rewritten into DQL then it is equally important to support not only DQL in custom queries, but also DQL in custom mutations.

  • Custom DQL Fields

Custom fields can currently be resolved with lambda, but it would be beneficial (again depending on how the refactor is done) to allow custom fields to also be resolved with DQL.

  • Additional and Custom Scalars

Developers often find themselves needing to add custom scalars for various reasons. These can often be represented as strings but with additional constraints, such as Email, HexColor, or Tuple.

  • Authentication Service

Dgraph built an authentication system for Dgraph Cloud and was discussing open sourcing it. I believe something like that should be made and integrated into the GraphQL API so that users can easily authenticate against their own data and maybe use lambdas to return the claims from the database that the developer wants to use when a client authenticates.

  • Open Source all Enterprise Features
  • Auditing
  • Namespacing
  • [Completed in 21.12?] Backups

Should these really be enterprise or are these just enterprise level to force users into the Cloud? Are Enterprise licenses even available anymore? (They were not [or ridiculously purposefully priced outside of the budget to whom it was being quoted] under the last administration)

  • Schema Migration Tools

When users migrate their schema, they expect the data to follow. Dgraph does not do this today.

  • Facets are not first class citizens

If nested filtering and linking nodes are not able to be constrained, then we can’t get rid of facets. But even then, maybe we can take the concept of linking nodes and make them work without needing to declare the type in the middle, like how Prisma creates pivot tables without you needing to specifically create them, and how the Prisma ORM lets you link directly through the pivot table as if it were a 1:1 relationship. For reference: EdgeDB has what it terms “link properties” and abstracts these onto types.

  • Mapping GraphQL’s @hasInverse vs DQL’s @reverse

Right now the Dgraph GraphQL API uses the @hasInverse directive to “map” inverse relationships, and then the API keeps these pairs of edges balanced on mutation. This creates additional work when adding RDF data with the live/bulk loader, because two edges have to be added for every inverse relationship.

It might be better if Dgraph would just allow the mapping of the ~ reverse edges.

  • Var Blocks in GraphQL

  • Fuzzy Full Text Search

The ability to use TRIGRAMS on phrases and not just words. You cannot search for a fuzzy phrase, only a fuzzy word. This limits full-text search.

  • Count Words

The ability to count the number of times a word appears in a text and sort by that value. This would make it possible to write relevance-based search algorithms.
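For illustration only, this is roughly the computation that currently has to happen in a client outside the database. It is a minimal Python sketch, not a proposed DQL syntax; `count_word` and `rank_by_word` are hypothetical helper names.

```python
import re
from typing import List, Tuple


def count_word(text: str, word: str) -> int:
    # Case-insensitive count of whole-word occurrences
    tokens = re.findall(r"\w+", text.lower())
    return tokens.count(word.lower())


def rank_by_word(docs: List[str], word: str) -> List[Tuple[int, str]]:
    # Score every document by the word count, highest count first
    scored = [(count_word(doc, word), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "graph databases store graph data",
    "a relational database",
    "graph graph graph",
]
ranked = rank_by_word(docs, "graph")
```

Having the count available inside a query would let the sort happen server-side instead of pulling every candidate text to the client.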

  • DQL Loops

Very much needed to simplify algorithms without stepping in and out of queries/mutations with a client.

  • More educational material
  • Some docs that allow us to get into the Dgraph code more easily, so we can contribute
  • Transparent roadmap, open issues and bugs, so we don’t get surprised by missing features or minor bugs

  • Communicate with your users and get us involved! One post from [the Dgraph Labs] a week giving simple product updates and plans can go a very long way.

  • Provide a generous free tier.

  • An out of the box local (offline) development experience (aka without me having to learn / do much)

  • A way to batch mutations so I can roll back a group of changes.

  • Be able to simply replace a list in a mutation.

  • Removing Edges using null does not work

  • Directly integrate other tools into Dgraph like , Auth0, and make it configurable with a few clicks
  • Detailed and accurate GraphQL errors
  • Focus really hard on making it as easy as possible for devs to build side-projects and hobby-projects with free/cheap Dgraph instances

Dgraph has all the pieces in place to build the ultimate low-code tool. The simpler you can make it for users, the more users you’re gonna get.

  • No sharding of predicates
  • No Query Planner (CTO’s current vision)
  • Upsert by xid painful for ingest-heavy workloads
  • Missing a native [time]range type

  • The upgrade process requires downtime.

  • It would be nice if we had one-click hosting for the open source version on platforms like DigitalOcean

  • Lack of native timestamps.

  • A multi-Raft approach for regional clusters, like CockroachDB is doing

  • Its own mobile solution with offline sync

  • BM25 and or custom search ranking
  • More and custom tokenizers

  • Lack of support for some algorithms
  • many problems in path lookup

  • sub-select statements in graphql
  • More in cloud editor help & messages for when certain changes will orphan data or cause negative side effects

  • I don’t have time to write my own middleware, and I don’t want to host anything myself to deal with servers
  • lambdas are time consuming and should only be reserved for complex tasks

  • Cannot Paste with Ctrl+V on Cloud UI

  • Separate GraphQL from Dgraph [as a Plugin?]

  • Typescript binding for dql client as well so responses are typed?

Reference: EdgeQL TS Client Achieved this!

  • Fix Dgraph’s type system so that the type is not a string value/predicate

  • An official dgraph toolset for data and schema versioning that is manageable within our codebases.

  • Math Functions

  • Subscriptions based on CDC

  • Subscriptions should use graphql-ws

  • [Completed with Learner Nodes?] Add Geo Replication

  • Cloud Requested Fixes/Improvements from @jdgamble555

    • Add the storage [for file/media uploads management possibly connected to S3]
    • Make data studio have CRUD functionality
    • Allow renaming types in the UI
    • Build an Auth System
  • The problem with Lambdas [A MUST READ!]

  • Pagination with Cascade Catch-22

  • Continue bulk/live load from where it stopped earlier

  • Offset-based pagination is slow

  • Add distance for geo

  • Cannot Run Upsert in Dgraph Cloud DQL UI (only supports JSON not RDF)

  • Move @default into 22.0[?]

  • Custom [Digging] Function(s)

  • Defer field selection to subquery when using @custom DQL resolver

  • Multiple Reverse Edges

  • Counting within pagination

  • Cursor Based Pagination

  • Support for JSON-LD

  • Dreaded Context Exceeded Bug

  • HIPAA Compliance

  • Incomplete Items (I believe) From

    Lambda Should continue to resolve GraphQL
    Remote authorization hooks
    Auth on Union type
    Pre/post auth hooks for update mutation
    Global auth rules
    Replacing types in GraphQL schema: show left over data
    Support DQL Variables in Mutations
    String transformation functions
    TF-IDF scoring on full-text search [Dgraph 21.07]
    Integration with Kafka
    Integration with KeyLines
    Support for Gremlin
    Integration with BI Tools (e.g. Tableau)
    Import Neo4j json or CSV
    ORM for top-3 languages ( JS/TS, Py, Java )
    Load/stream data directly from SQL to Dgraph Cloud
    Load/stream data directly from MongoDB to Dgraph Cloud
    Load/stream data directly from Elastic to Dgraph Cloud

This is a start at summing up everything yet again. Others like @BenW @jdgamble555 might have more to add here too…

Probably missed someone’s beloved feature request or problem needing a workaround and I’m sorry, that was not on purpose. :heart_eyes:


Hi @amaster507,

Thanks for quickly putting this together.


Since we brought up in our discussion HL7 you might also be interested in

@gajanan just a followup after editing the OP above, I wanted to provide some other personal topics, ideas, and thoughts that I think might help in your decision making processes building a roadmap for the longterm development of Dgraph.

Just like yourself, I came from a heavy SQL background, and so have many other users here and potential new audiences. My personal opinion is that we need more content like the following worked into the official documentation:

And while talking about documentation (as I was involved with it directly before the layoffs), I believe it would be beneficial to put clear communication about what Dgraph can and cannot do into the official documentation. It is shady business, in my opinion, to hide your own known flaws in a discuss forum instead of being the one to first bring them up. I proposed writing some “official” documentation around Dgraph’s limitations and known workarounds, but that was not approved at the time. This would help new developers and teams better evaluate the product, knowing the known limitations (problems) in advance, before they invest so much and then become a source of negative PR.

If you haven’t already, I’d also HIGHLY recommend reading the #1 post in this forum of all time:

We (@acarey and I) were working on migrating some of this into the official documentation before the layoffs as well. I think it is important to put this kind of overview level in the “official” documentation.

You can also see all top discuss activity with this link to see what are hot topics:

And another idea I had and voiced to you over the phone was a way to store data singularities.

For reference, Vaticle (TypeDB) does this and calls it Attribute Instances:

In the data, attribute instances in TypeDB are globally unique by type and value, and immutable. Being globally unique and immutable means your data is maximally normalised at all times.

What does this mean in practice? As an example, in a database of people with ages, there will be at most one instance of age 10, which can never be changed in-place.

So how do multiple people instances have the same age with value 10? We create an ownership of the attribute instance age 10 by each person.
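A minimal Python sketch of that ownership idea follows. It is not TypeDB's implementation; `AttributeStore` and its methods are hypothetical names, and the only point is that each (type, value) pair is stored once while many owners refer to it.

```python
from typing import Dict, Set, Tuple


class AttributeStore:
    def __init__(self) -> None:
        # One entry per unique (attribute type, value); the tuple key is immutable.
        self._owners: Dict[Tuple[str, object], Set[str]] = {}

    def own(self, owner: str, attr_type: str, value: object) -> None:
        # Create the attribute instance on first use, then record the ownership.
        self._owners.setdefault((attr_type, value), set()).add(owner)

    def owners_of(self, attr_type: str, value: object) -> Set[str]:
        # Return a copy so callers cannot mutate the store's internal set.
        return set(self._owners.get((attr_type, value), set()))

    def instance_count(self) -> int:
        return len(self._owners)


store = AttributeStore()
store.own("alice", "age", 10)
store.own("bob", "age", 10)
store.own("carol", "age", 11)
```

Here three people own ages, but only two attribute instances exist: one for age 10 and one for age 11.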

I believe Dgraph should consider this as well. Growing a database to terabytes of data would then be even more impressive when you are not storing the same exact attribute millions of times.

If Dgraph could leverage the Subject-Predicate-Object (aka SPO/triple) model but at the same time also operate in the reversed OPS model, then values could be stored as singularities and give the new query engine the ability to find nodes by value just as if it were traversing the graph by an id. This would take the Dgraph indexing algorithms to a whole new level and playing field.
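As a toy illustration of the SPO/OPS idea, the Python sketch below keeps both directions in memory. It is an assumption-laden sketch and not a description of how Dgraph stores or indexes data; `TripleIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class TripleIndex:
    def __init__(self):
        # subject -> predicate -> set of objects
        self.spo = defaultdict(lambda: defaultdict(set))
        # object -> predicate -> set of subjects (the reversed direction)
        self.ops = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        # Every triple is registered in both directions
        self.spo[subject][predicate].add(obj)
        self.ops[obj][predicate].add(subject)

    def objects(self, subject, predicate):
        # Forward traversal: from a node to its values
        return set(self.spo.get(subject, {}).get(predicate, set()))

    def subjects(self, obj, predicate):
        # Reverse traversal: from a value back to the nodes that hold it
        return set(self.ops.get(obj, {}).get(predicate, set()))


index = TripleIndex()
index.add("0x1", "age", 10)
index.add("0x2", "age", 10)
index.add("0x3", "age", 11)
```

Looking up `index.subjects(10, "age")` walks from the value to its nodes in the same way a forward lookup walks from a node to its values.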

Another thing that I have learned from TypeDB is their rules:

With their rules you can infer deeper relationships at a closer layer and then break into the rules to show the in-between edges and nodes.

This would be useful in helping make more normalized data appear less normalized in an API.

EdgeQL (which has an awesome User-DX btw) is another data tool to keep a close eye on in terms of feature comparison. They are SQL based, but they have built their query language with the idea of GraphQL in mind, like Dgraph did with DQL, while keeping the SQL parts as well (since underneath they are still technically SQL: PostgreSQL).

They have some awesome features (besides the amazing typescript strongly typed client) such as their Computed properties and Links:

This is a similar idea of being able to make a deeper part of the graph appear closer in the API, or of doing some custom business logic in place of Dgraph’s lambda implementation.

You hinted at making your new query engine pluggable as well to make future development easier. I cannot tell you how much I agree with this idea!

Other users support this idea too:

EdgeDB works with extensions in this same way too: Extensions — Schema | EdgeDB Docs