A History Lesson and Challenge for Dgraph Users, Developers, and Management team

TL;DR;

If you are a newer user, or even an engineer, CEO, CTO, etc., I think it is VERY important for you to go back and try to get a grasp on Dgraph from the very beginning. What was the driving factor to even create Dgraph? What was the reason for each of the main pivots? What effect did these pivots have on the company and the project itself? At the very least, read the quote snippets at the bottom that I have compiled, which tell the history I am going to outline.


I’m prompted to write this in response to the conversation from this recent article alongside some personal conversations amongst Dgraph dev peers.

First, a driving principle that I wholeheartedly believe in:

“Those who cannot remember the past are condemned to repeat it.” – George Santayana, The Life of Reason, 1905. From the series Great Ideas of Western Man.

I am not writing this to blame or call names, although some names will be used to help tell the story. What is important is to learn from this history that I believe many, even including the new organization taking over R&D/support, may not be aware of.

There was a decision made in late 2015 to use GraphQL as the main query language to build a new graph database. This was a decision that set in motion where we are today with Dgraph. Not too long after that decision, after running into issues and realizing that GraphQL is not a database query language, another choice was made: to deviate from the GraphQL specification. The result initially became known as GraphQL± and is now known as DQL.

If you have been a software developer for any length of time, you have made these same kinds of decisions. You see a problem and find a solution that at the time looks to be the right choice. Later on, you realize that your decision has flaws and you have to pivot. I know I have been down this same path many times.

This is the history of Dgraph, design decisions with big pivots. Sometimes a pivot leads to a complete rebuild (Java → Rust), while other times it leads to a completely new-looking, and named, project (Grakn → TypeDB). The difference with Dgraph is that almost every time Dgraph has pivoted, backward compatibility is mostly maintained at almost all costs. This seemingly has led to half-baked solutions that are 80% completed. The last 20% of every project is always the hardest and usually becomes the 300%, lol. There is one big instance where I saw this pattern was not followed, replacing RocksDB with Badger. Side thought, the claim that “data is handled directly by Dgraph, and not given off to another database layer” is not true at all.

So we have these pivots so far in Dgraph: (these may be out of order, I tried)

  1. GraphQL → GraphQL± (DQL)
  2. Non Transactional ACID → Transactional ACID
  3. No Schema Support → Schema and Schemaless Support
  4. RocksDB → Badger
  5. Commitment to Gremlin, Cypher, and GraphQL Support → Only GraphQL support.
  6. Non Spec GraphQL → native GraphQL Spec
  7. MultiOS (including Windows) Support → No Windows Support
  8. Authorization Hooks support → Query only JS Lambda hooks
  9. Primary DB → Secondary Blockchain Indexing DB (failed attempt)
  10. Vector Support with Lambdas → Native Vector Support (ongoing?)

With most of these pivots, a common trend has been to leave the former solution at 80% completed. DQL is missing a ton of features that were promised and that we were led to believe were close to release, but then the GraphQL spec came into view and those features have been mostly untouched ever since. Auth within the spec-compliant GraphQL was developing rapidly, and then along came Lambda hooks, and the Auth limitations have been mostly untouched ever since. Do I even need to mention again the commitment to add support for Gremlin and Cypher, only to “punt” on that commitment?

I can agree that vector databases have some interesting use cases and are currently where investors are looking, but we (I can speak for many in the community) believe that making another pivot will leave the existing 80% completed project in the dust, with only small fixes that add value to the main push of vector support. Side thought: maybe we’ll finally get actual list support instead of just sets and maps, as vectors need true list support, I would think.

Pivots are usually done for three different reasons: 1) Financial Steering, 2) Technical Challenges, or 3) User/Developer Adoption. Dgraph has a history of making big pivots to win adoption from a few users/developers, and seemingly ridiculous pivots because of financial steering. This in turn caused much frustration among existing developers, who were promised new features that never saw the light of day and were left in the dust while the project catered to someone else. This has happened now more than once with Dgraph, and it is very hard to keep trust and hope in Dgraph during these transitions. I have seen this both second hand and first hand.

Another problem arises with these pivots due to user adoption, because if not managed very carefully and strategically, you can easily lose both the existing users and the new users you are trying to cater to. You have to decide if this is a risk worth taking before making such a pivot. It can be hard to get the pulse of the current users to see if they are willing to take a step back in priority so that the project can “evolve” into something better. Note, not all evolutions are better.

If I look at the activity of once-new developers and projects using Dgraph in one way or another, I can’t help but see a big drop-off. I’m sure I’m not the only one who has seen this. I want to carefully caution the new team against making this new pivot without being very, very transparent about the decisions being made, as this is still considered an open source project. If it wasn’t open source, then just ignore us, the community, and do whatever money dictates you do.

If you are a newer user, or even an engineer, CEO, CTO, etc., I think it is VERY important for you to go back and try to get a grasp on Dgraph from the very beginning. What was the driving factor to even create Dgraph? What was the reason for each of the main pivots? What effect did these pivots have on the company and the project itself? At the very least, read the snippets below that I have compiled, which tell the history I have outlined above.


[October 22, 2015] I’m working on building Dgraph… It’s still early stage, and I’m debating which graph query language to support. Facebook just launched GraphQL… But, I’ve also heard a lot about Gremlin. What do you think of them? I don’t want to stretch out too thin and support both, at least at this stage. Which one do you think would be worth aiming at (given it’s a new graph database)? - Manish Rai Jain

[October ??, 2015] I like GraphQL, it has most of the nice properties of MQL1. Gremlin has more Hadoop support if that matters - Manish’s ex-manager at Google

[December 1, 2015] Thanks for your advice! Went with GraphQL, quite like the query language so far.

[4/18/2016] Dgraph… is a native graph database in the sense that the data is handled directly by Dgraph , and not given off to another database layer. Apart from use with diverse social and knowledge graphs, Dgraph can also be used to: build real-time recommendation engines, do semantic search, pattern matching, serve relationship data, and serve web apps via GraphQL… - Introducing Dgraph by Manish Rai Jain

[6/21/2016] Hey, I just checked out the demo, and it looks like the query language is similar to GraphQL in syntax, but doesn’t follow many critical parts of the specification, meaning it won’t be able to work with GraphQL clients such as Apollo Client and Relay that expect spec-compliant results. - @stubailo

[6/21/2016] Yeah, I know that our implementation of GraphQL isn’t exactly as mentioned in the spec. This is because GraphQL is meant as a REST API replacement, and not really a graph query language. So, we’re making modifications to it to ensure it can operate as a full-fledged graph query language… At some point, once we’re mature enough and have better understanding and implementation of GraphQL, we can release our mods to the official spec; and see whether they should live separately or be merged into the official GraphQL spec. - @manishrjain

[6/21/2016] I would argue it’s a bit misleading to specify that you are using GraphQL as the query language, even though it is not compatible with any GraphQL tools. - @stubailo

[6/21/2016] Honestly, my concern here is that we’re figuring out about GraphQL as we’re moving forward. And I just don’t know to what extent can we push it to behave like a Graph language and yet remain within its specs. If we’re going to deviate anyway, I don’t want to push too hard to stay within the specs. OTOH, if there is a convergence path, then I’m all for it. GraphQL ecosystem is surely growing, and we’d love to be part of that. - @manishrjain

[6/21/2016] I think if you find that GraphQL doesn’t work as a language for your database (I’ll admit it is a bit misleadingly named because it isn’t really a query language for graphs), it could be good to rephrase the documentation and marketing to say “uses a query language similar to GraphQL” so that people know what to expect. - @stubailo

[6/30/2016] I’ve been thinking about this over the past days. A lot of people get interested in Dgraph because of GraphQL, and so I think it would be worth our effort if we try to bring our implementation as close to the spec as possible. - @manishrjain

[3/28/2017] I think the GraphQL spec is not versatile. Everything has to be keys and values; there are no functions and no function chaining. …GraphQL has a lot of unnecessary stuff that we’ve decided to avoid implementing at all. So, I doubt we’d be able to reach parity with GraphQL. I think what we can do is to build a JS library that helps people interact with Dgraph. - @manishrjain

[7/26/2016] We need to have a thorough review of our QL, and see if we can get it to be close to compatibility with GraphQL. - @manishrjain

[11/15/2017] We’ll make a push to try to reconcile GraphQL± with GraphQL past v1.0. - @manishrjain

[12/31/2017] Dgraph uses GraphQL +/- only as a query concept. Dgraph does not have a schema defined as GraphQL… Basically if the Dgraph is to accept GraphQL natively, it will have to create a context mimicking the original idea. - @MichelDiz

[1/1/2018] Support GraphQL spec - @manishrjain

[1/18/2018] Dgraph needs to natively handle standard GraphQL queries, or GraphQL queries should be “compiled” into GraphQL+/- (or other supported language e.g. gramlin, cypher). Regardless, the result would be an opinionated way to structure the graph data. Is that fair to say? So, rather than “Make Dgraph work with standard GraphQL”, should we instead focus on how such an opinionated data model would look? - @ptpaterson

[1/1/2018] Yeah, we’ll try and support [GraphQL] as close to the spec as possible. I think GraphQL compatibility is needed by a lot of users. - @manishrjain

[6/27/2018] Dgraph already natively supports a modified version of GraphQL. So, supporting the official spec would be native and should perform better than the overlay support that Neo4j and others have implemented. - @manishrjain

[8/10/2018] The authors of GraphQL have stressed on multiple occasions that it isn’t intended as a complete query language for traversing graph dbs, or server-to-server. It’s a server-client API. That’s why Dgraph modified the syntax in the first place. Far better than standard GraphQL would be full support of both Gremlin and Cypher… Having client-compatible GraphQL really doesn’t accomplish much because 99% of the time Dgraph will still have to pass through the API server anyway to implement the things that are far outside the scope of responsibility for a database. - @frankdugan3

[8/10/2018] At the end of the day, it doesn’t matter what the technicalities are and what happens under the hood. Having no first class GraphQL<->Client support has inhibited dgraphs outlook as a deal breaker. - @D1no

[8/11/2018] This issue has the participation of less than 1% of the Dgraph userbase because connecting directly to the client via GraphQL is simply not a normal expectation for a DB. - @frankdugan3

[8/12/2018] Dgraph has many other areas of growth that are FAR more important for adoption before this proposal. When a team is evaluating an up-and-coming DB, the primary question is not going to be, “Did they shoehorn the API query language we like into the client drivers?” The questions are going to be the fit to domain model, the constraints, the type system, the reliability, the scalability, the financial situation of the company developing it, etc… Some people are asking for direct-to-client GraphQL API’s for databases, but far from the majority. I used to think this was important, but I think that was my inexperience showing. The idea of being able to bypass writing the API server is tempting, but it only works in very, very simple scenarios, and it burdens the DB with many problematic concerns that already have great solutions in API frameworks. - @frankdugan3

[8/12/2018] I personally find dgraph after 1,5 years In a “stuck in the middle approach” of not using a industry standard DSL like Cypher but also not implementing / caring / driving “a recent” data layer innovation like GraphQL. - @D1no

[12/18/2018] I’m building this, which will address lots of the GraphQL points for Dgraph. - @michaeljcompton

[12/20/2018] Update: We’ve punted on Cypher and Gremlin support for the roadmap. The focus is on GraphQL and other features mentioned here. Two of them are already being worked upon, i.e. binary backups and access control lists. - @manishrjain

[1/14/2019] Support official GraphQL spec natively - @manishrjain

[1/14/2019] GraphQL is a great language for apps to be built on – and that’s the aim here, is to support it to allow building apps easier on Dgraph. Dgraph is a great graph DB, but also a great, general purpose primary DB for apps; and we see more and more people/companies use Dgraph to build apps. - @manishrjain

[1/17/2019] That is smart and for that reason, I think it would be a wasted effort to offer a pure GraphQL connection to Dgraph, with the intentions of using Dgraph as a “backend” for client-side apps. There must be a layer of business logic in front of Dgraph and behind the GraphQL endpoint for any sized application to be safe and work smartly. - @smolinari

[1/22/2019] Dgraph shouldn’t be placed in the same boat as Prisma, AWS AppSync or PostGraphile. It is ONLY a database, currently and I highly doubt it will get to be much more or rather, I don’t think it should. - @smolinari

[10/29/2019] October 22, 2015… I’m [Manish] working on building Dgraph… It’s still early stage, and I’m debating which graph query language to support. Facebook just launched GraphQL… But, I’ve also heard a lot about Gremlin. What do you think of them? I don’t want to stretch out too thin and support both, at least at this stage. Which one do you think would be worth aiming at (given it’s a new graph database)? The reply was quick, short and sweet. I [ex-manager at Google] like GraphQL, it has most of the nice properties of MQL1. Gremlin has more Hadoop support if that matters… Thanks for your advice! Went with GraphQL, quite like the query language so far. And that was how GraphQL became the native query language for a new database called Dgraph. …we were having doubts about whether GraphQL can really be a query language for a database. We were no longer convinced that the official spec matched the needs of a database… So, we realized we needed to build something custom into the spec to allow for inserting and modifying data in a standardized way. …we deviated just enough from the GraphQL spec that we could no longer call it GraphQL. So, we switched the name of our query language to GraphQL±. Plus, because we added things to the query language…, and Minus, because we removed things from it… We did not want to deviate from the spec, we just had to do so to allow continue using GraphQL as our native query language, while building a graph database… Dgraph’s basis in GraphQL means that it is the closest thing available to a native GraphQL database… Ultimately, we feel interoperability with the growing GraphQL ecosystem is too important for our users to not be addressed… Today, we are changing that… putting together a native spec-compliant GraphQL support into Dgraph tapping into the power of GraphQL±… - Building a Native GraphQL Database: Challenges, Learnings and Future by Manish Rai Jain and Michael Compton

[10/29/2019] Breaking changes in 1.1 caused (and are still causing) us real headaches. As we continue to invest real effort in applications using Dgraph, could we have some clarity… - mikehawkes

[2/23/2020] Data can be retrieved from Dgraph using GraphQL and a modified version of GraphQL, called GraphQL±. GraphQL± has most of the same properties as GraphQL. But, adds various properties which are important for a database, like query variables, functions and blocks. More information about how the query language came to be and the differences between GraphQL and GraphQL± can be found in this blog post. - Dgraph Paper

[4/26/2020] I’m interested in graph databases and have taken a look at Dgraph… I think GraphQL is the wrong choice for a graph query language. GraphQL has nothing to do with graphs, except it’s unfortunate naming choice. It was designed to query specific parts of JSON data from an API endpoint… I guess you made it work with GraphQL±… However, the underlying assumption is still wrong… Put it simply, using GraphQL for a graph database query language is like trying to use a fork as a knife. I wish Dgraph would implement a better QL. It doesn’t have to be an existing QL like Cypher or Gremlin, personally I don’t like either of them. Whatever query language you choose in the end, it better not be a fork when what’s needed is a knife. - bsquaster

[6/3/2020] As of 20.03.0 Dgraph has support for spec compliant GraphQL. - @michaeljcompton



I haven’t been active here for a while. But, felt this was worth a response. This is a very interesting capture of the years of back and forth we’ve had about the query language. It read practically like a documentary.

Few things which are clear now:

  • GraphQL was not a great choice for a database. I realized that early on.
  • But, then GraphQL became too popular to ignore, and we had a massive FOMO – Dgraph was in early conversations as the de-facto GraphQL DB, but never reached there. And adoption by itself was slow. People still didn’t know what they should use a graph DB for.
  • Honestly, we wanted application development to be the answer (use case for Dgraph) – and that needed GraphQL.
  • Though, in retrospect, many backend as a service products haven’t quite taken off. There’s much more to an app than just data fetching. So, Dgraph’s GraphQL implementation would also have had a similar outcome, even if we finished it.
  • GraphQL itself has dropped in popularity – most devs realized REST is simpler to build and maintain. And they don’t need Facebook’s complexity.
  • So, you ultimately need a query language that can rally devs behind. For SQL DBs, that’s easy. For graphs, that’s Cypher… maybe.
  • But, Neo4j was considered a failure by VCs, which means supporting Cypher and going after Neo4j didn’t make much sense. That ultimately proved to be a massively wrong assumption.
  • Moreover, given the small size of the graph DB market, do you really need a distributed, transactional, synchronously replicated DB? Absolutely not. Building a great performing graph layer on top of Postgres would have been sufficient to earn market from Neo4j (see Hasura, TimescaleDB, Supabase for playbook).
  • Position we were in back in 2021, purely in terms of competition (ignoring cloud adoption): Couldn’t replace GraphQL fully. Couldn’t replace Cypher fully. It’s the same today in 2023.

Long story short, without a solid Product-Market fit, you are reduced to chasing whatever hype cycle the market is in. When we were around, it was GraphQL. Now, it’s AI. But, first, a graph database needs to succeed as, well, a graph database.


I personally think this was a marketing issue. One thing Supabase does extremely well is the team is all over Twitter, YouTube, Dev.to, etc showing how to integrate and use the product with NextJS, Migrations, etc…

True, however, you don’t have to build anything with Dgraph at all except the Schema, which is what makes it (and Hasura) amazing. There is a difference between building a GraphQL API (a pain in the butt) and using one.

1000%

I agree with this too. However, the reason a Graph Database, IMHO, has not taken off is because it is trying to be a secondary database and not a primary one. There is no reason it can’t be both. The definition of a graph database needs to change. I read so many articles where people think there are only relational databases and NoSQL databases.

Yes, there have been no new features since you left, just a lot of refactoring. IMO, DQL could add just a few more features, like loops and key constraints, to be as strong as Cypher, and GraphQL needs about as much work as an API (both 80% there, as you said). From my understanding, GraphQL will also eventually need to be rebuilt on top of an AST.

Postgres can support Vectors. In my opinion, the direction has been, and should always have been, to push Dgraph to be a main database, not a secondary. It could be the Graph Database that replaces SQL. If Vectors pay the bills, it can be the first Vector Database that can also be a main database, as that money may dry up too when the fad goes away.

Curious to see the new direction, and hope this database succeeds.

J


That was our argument for pushing Dgraph into app development. But, that’s a deadend. You just can’t beat Postgres (or SQL in general) in terms of developer love. And the graph DB story has already been written by Neo4j.

Supabase became popular not because of their integrations – all that came later. But, because they allowed Firebase developers to switch away from Google, and into their favorite Postgres. And new users could just stay on Postgres, while gaining a nice API layer.

Have a look at EdgeDB success as a “graph” layer on top of postgres.

Also check out SurrealDB’s success as a multi-modal DB that uses a SQL-like query language but functions much like a single-node Dgraph instance under the hood.

I respectfully 100% disagree.

MySQL comes close, but you don’t have to beat Postgres love, you just have to monetize it well. MongoDB has a 30 billion dollar market share, and is frankly terrible compared to SQL. Neo4j has 2 billion, although you could argue in both cases the valuations are crap.

This is simply not true at all. Supabase is a culmination of different products that already existed before Supabase was born, mainly cloud hosted postgres, postgREST, gotrue, and an api-storage to connect to aws to store files.

Supabase in reality has nothing to do with Firebase and has very few similarities. The name itself is just a marketing tool. Again, great marketing. It is not the database, but the tool.

People don’t want to work to configure a database.

IMO this is why MongoDB (noSQL in general) does well. Scaling is important (don’t get me wrong), but developer time is fast without manually configuring tables. Throw some JSON data, done.

What makes Supabase amazing, however, is the same thing that makes Dgraph and Firebase amazing, but on a higher level. It is self-hosted, little to no config, queries just work, and you can get up and running in no time without vendor lock-in (except Firebase’s problem) with all the features you need. In that sense, Dgraph is a competitor to Hasura, Supabase, Firebase (even Prisma at some level), 8base, EdgeDB, Grafbase, Xata, etc. It is not the database, but the platform that matters.


That being said, I think we can both agree the biggest problem Dgraph has ever had is its identity.

I believe there is plenty of room in the Database space for products outside of postgres. You always said neo4j was its competitor, but I never saw that, which is a problem.

  • Is Dgraph a graph database that competes with neo4j (analytics) ? At first this is where it was going, but GraphQL changed all that and left it dead at 80%
  • Is Dgraph a frontend cloud hosted database with an easy to use ORM that works out of the box? This is what I see. The graph database part is a plus. Either way, still 80%.

In my opinion, people don’t care what database they use. They want to get up and running, and they want it to be easy, secure, fast, and just work. Firebase, Supabase, and Dgraph could not be more different than each other on the actual database level, but the ecosystems make building things extremely easy. Imagine if DQL had those missing features and GraphQL was secure with all the queries the spec can support. I could do analytics on the same database in which I store my main data. Less server costs, less headache, and works out of the box.

That is the real story of what Dgraph is. But obviously it could go in any direction at this point and be something entirely different.

J


@mrjn what is your take on building a vector offering on a GraphDB? I definitely have a strong need for a GraphDB (Dgraph is a little debatable, but works great for my use case).

Also, are you considering coming back to Dgraph?

No particular take. I think every DB is starting to offer vector search, it’s inevitable in this environment.

No. I’ve moved onto building Struct.AI – a chat platform to replace Slack, Discord and Discourse.

wow - congrats @mrjn … Looks like dgraph is past its glory days. When startup founders move on to greater things, building the product with similar force as founders is not easy. The constantly changing CEOs, then the acquisition with Hypermode are all signs screaming stay away.

I hope the Dgraph team maintains this project better.