I think Dgraph is on the right track to become a very popular solution. From what I can see (as a front end dev), there are a few things that would make a difference:
Provide a generous free tier. From experience most adoption of new technology comes from developers trying things out in their spare time / side projects, loving the tech, spreading the word, and then finally influencing their work to adopt the tech. If casual users have to pay a premium to explore the product then it makes the barrier to entry higher.
Focus on creating a product for everyone, then improve / adjust it for businesses/enterprise. I understand that businesses using the product is where you will make more money, but it’ll be a harder sell to developers if it hasn’t already got a strong presence in the community. In the context of Dgraph this would be making the GraphQL features and Dgraph Cloud the best products they can be.
Ensure the GraphQL features are equivalent to (ideally better than) those of competitors. This includes making sure they have a similar (or better) way of working and syntax, e.g. nested filtering, nesting an aggregate query inside a regular query, etc.
Invest in building a community, docs and industry related content. I think this is going to be the hardest thing to do, as it needs to happen organically. Building a community takes time, engagement and an emphasis on the end users. I think Dgraph has a bit of work to do to make it more accessible. If I personally don’t have a good time using the product, I’m not going to suggest it to my work (or other people), and I will object if other people suggest it. In fact, I know we have made decisions at my work because the team behind the product was hard to work with, didn’t respond etc. It didn’t make a difference how good the product was.
Have a strong product vision (this is very important). I think it’s ok to even have a couple. And this has to be clearly communicated publicly, ideally including roadmaps (or some other form of transparency). I would also make sure that the product visions have a clearly identified audience.
A few things on my wishlist (all for GraphQL):
An out of the box local (offline) development experience (aka without me having to learn / do much). I have achieved this but it took a while, and was harder than it should have been. I had to spend time trying different parts of the docs to get something working (i.e. with importing data, exporting data, updating my schema etc). A great example of what to strive for is Firebase’s local emulator.
Nested filtering. This is very important to many parts of my projects.
Be able to simply replace a list in a mutation. Currently I have to remove the items in the array and then fill it back up with what I want. Very annoying and not how any other GraphQL framework I have used works.
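To make the list-replacement workaround concrete, here is a minimal sketch in Python of splitting a desired replacement into the two steps described: what has to be removed, and what has to be added. It assumes the list behaves like an unordered set of unique values. The function name and the "remove" / "set" keys are hypothetical, chosen for illustration rather than taken from Dgraph's API.

```python
from typing import Dict, List, Sequence


def list_replace_patch(
    current: Sequence[str], desired: Sequence[str]
) -> Dict[str, List[str]]:
    """Split a list replacement into items to remove and items to add.

    Assumption: items are unique and order does not matter.
    """
    current_set, desired_set = set(current), set(desired)
    return {
        # Items stored today that should no longer be in the list.
        "remove": [item for item in current if item not in desired_set],
        # Items that are wanted but not stored yet.
        "set": [item for item in desired if item not in current_set],
    }


patch = list_replace_patch(["a", "b", "c"], ["b", "c", "d"])
print(patch)  # {'remove': ['a'], 'set': ['d']}
```

A native "replace" operation would make a client-side helper like this unnecessary, which is the point of the wishlist item.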
I personally use Dgraph as my database, and I feel most people wanting to explore it will do the same. However, I know this wouldn’t be the case if I were to use it at my work. We would want to use it as an interface for some of our databases, and if we liked it then maybe look at using it more as a database (if it made sense to).
I have also not really explored the DQL aspect of Dgraph, mostly due to it (from what I can tell) not being a transferable skill. If I decide to move away from Dgraph then I can take my GraphQL schema and all of my queries/mutations along with me (with some exceptions naturally).
All that aside, you have done a wonderful job @mrjn (and the entire team) in creating a very compelling product. Excited to see what’s ahead!
Thanks for giving us the opportunity to give direct feedback like this, it’s really valuable. You really should join the Discord, because there’s a lot of important conversations going on (mostly around how Dgraphers can help Dgraph succeed) and it’d be great to have your input! Click the link below:
In response to your questions:
No, the performance is insane. So good that I have to look at logs to verify that the request actually went through because there was no discernible latency. Currently not working with a lot of data though, although compared with literally anything else I’ve ever used, Dgraph completely knocks it out of the park.
GraphQL transactions are not critical for me right now. I can live with DQL transactions. But what I hope to see over time is gradual improvements in the GraphQL API so I need DQL less and less. GraphQL is preferable because my client can go direct to the server rather than via an endpoint in my app. Plus it has guardrails that prevent me from screwing up my data.
This is the thing I have the biggest opinions about. Essay incoming…
Some of what I’m saying here echoes what others have already said, but it bears repeating.
In short, one of my biggest concerns about Dgraph is that I think it positions itself wrongly. Although the native GraphQL graph database is an absolutely phenomenal achievement, it is what Dgraph’s database makes possible that is most exciting: becoming the ultimate backend solution. This is the direction I hope Dgraph decides to take, because:
It’s what I would invest in if I had the option to
It’s what will make Dgraph most valuable to my business
There’s obvious opportunity for growth here
When I look at Dgraph, what I see is a company that has all the pieces in place to become the no. 1 backend-as-a-service tool. Currently Dgraph markets itself as a GraphQL generator, but I think this limits its market to developers who know they want a GraphQL API. If you instead market Dgraph as a Firebase killer that can provide all the benefits of Firebase (iteration speed, ease of use) minus the downsides (runaway costs and complicated pricing, terrible query performance, vendor lock-in), then becoming the next Firebase in terms of popularity and profits becomes feasible.
Marketing yourself as a GraphQL generator is IMO an example of marketing the features instead of the value. Most developers, myself included, don’t care how something works internally, they just care that a given tool can help them build and ship apps faster and without headaches. If you look at Firebase’s marketing page it doesn’t mention the fact that it uses a NoSQL document store, and it’s because it’s not directly relevant to the problem it’s solving—creating a powerful backend with minimal effort.
As others have mentioned, Firebase is losing customers due to the problems mentioned above. I think a good example of a startup that has suffered as a direct result of being built on top of Firebase is the bidirectional note-taking tool https://roamresearch.com. They chose Firebase presumably so they could iterate fast. In bidirectional note-taking tools, your notes are stored as a giant interlinked graph. Roam has always had big. performance. issues, which has led to a lot of outrage from customers, which in turn led to 15 or so different companies seeing an opportunity to create competing products. Roam have acknowledged that Firebase is at the heart of their performance issues, and it’s easy to see why; querying graph-structured data from a document store is always gonna be slow.
This has created an opportunity for other Firebase-competitive startups like https://supabase.com. However, none of the Firebase competitors I’ve looked at come close to what’s possible with Dgraph. Supabase for example uses Postgres instead of NoSQL. However, as a Supabase user, although you now have a nice ORM for Postgres, you’re still stuck with the same old query-performance issues that seem to plague every app I’ve built no matter how simple. Plus, you still have to construct your queries and endpoints, and similarly to Firebase your client code is closely intertwined with your backend implementation due to the custom API. Because Dgraph generates a spec-compliant GraphQL API I don’t have these issues, because I can technically swap Dgraph out for another GraphQL API (that can return the same data) and not have to rework my client code.
The performance of Dgraph is a game-changing feature because even if you’re building a simple MVP it means that you never have to worry about having to waste precious time optimising queries. Of course, you know this because you built it! But I don’t think it’s clear from how Dgraph describes itself, or its marketing.
Reading through the Github Discussions of Supabase (an open-source Firebase challenger), I found some comments about Dgraph as a possible integration, and there’s confusion about whether Dgraph is just a database, or a Supabase/Firebase competitor. This was my experience learning about Dgraph too: it wasn’t immediately obvious why this product is so powerful. I initially looked into Dgraph because I have a specific use-case where a graph database makes sense. However, once I realised that Dgraph is basically a BaaS that can be used for any use-case, it blew my mind. My thinking has now flipped: Dgraph is the default go-to, and from the conversations I’ve had with other Dgraphers, I see a lot of agreement on all of the above. I also wonder whether a lot of the devs who are using Supabase now would still be using it if they knew what was possible with Dgraph. My guess is that Dgraph, if positioned correctly, could win over a large proportion of this market.
So basically, I’d like to see a simplified vision for Dgraph, and a laser-focussed approach to achieving it. Right now, Dgraph isn’t super clear about what problem it’s solving, so it’s not clear what products it’s competing with. It’s kind of competing with 20 different products at once: kind of competing with other graph databases, kind of competing with Hasura and other GraphQL generators, kind of competing with other BaaS systems like Firebase. It just needs a clear message about exactly what problem it is solving for the customer and what its mission is.
As for what Dgraph needs to do to achieve this, in no particular order:
Make it possible to configure Dgraph entirely via a few clicks in the Cloud admin UI. Down to things like configuring auth rules.
Build a migration tool that makes migrating your data after a schema change as easy as using the Ruby on Rails migration tool
Directly integrate other tools into Dgraph like https://magic.link, Auth0, and make it configurable with a few clicks
Invest heavily in education (the best kind of marketing for a dev product). Primarily polished YouTube content.
Simplify the messaging around the product and what value it creates for the user.
Detailed and accurate GraphQL errors
Focus really hard on making it as easy as possible for devs to build side-projects and hobby-projects on free/cheap Dgraph instances
Involve the Dgraph community more in the success of Dgraph, by ensuring there is two-way communication and feedback. At the moment it often feels kinda one-way, we submit tickets in this forum and wait for replies. Dgraph and the community working together is key to the success of Dgraph.
In summary, no-code/low-code tools are one of the most investable areas of tech right now, because web development is getting insanely complex. Dgraph has all the pieces in place to build the ultimate low-code tool. The simpler you can make it for users, the more users you’re gonna get.
I don’t see this needing to be a priority. From my point of view, it’d be a distraction from achieving the goal of making Dgraph into the easiest to use, most powerful BaaS on the market.
I would second the thought that this is the best pitch Dgraph could have. If it’s good enough for Google, it’s good enough for our puny workloads (:
As for the points:
Is it lacking performance? No. The only tool that handles the data I work with.
Do you really need transactions? Yes. Dgraph is our main database serving each and every request. We have a second one, but that is just logs. Business logic is complex and being able to do complex rollbacks in the database layer vs having to write code for it is great.
Should Dgraph not be a database? Should it be a serving system instead? No. Dgraph is the database for me. Evaluated lots of stuff including TiDB, Arango, cloud offerings (snowflake etc). If you need more details I can elaborate on what my thought process was, will try to keep it short here.
Should Dgraph interface with other databases better?
Now that’s interesting.
I don’t see any drawback to rolling Dgraph as the main DB (stability, but I will talk about that later).
Given Dgraph’s performance I bet some corporate clients could be very tempted to integrate it into their zoo and do some nasty aggregates across Spark / Ancient GreenPlum setups and shiny new Dgraph. Writing that layer yourself is probably not a priority, but you could get cube.dev guys to support Dgraph as part of their tesseract.
Does Dgraph not work with GraphQL properly? or React? No. Just lacking some things. I use DQL exclusively and played with graphql today just to write this answer.
Now on to what Dgraph is for me. There is this post on ycombinator which illustrates the idea
It is the unique combination of being an open source Spanner with stellar performance and development simplicity. Setting up a Dgraph cluster took me less than an hour.
I don’t have to care about migrations. Product iteration speed is easily three times faster than with relational DB. I can’t stress enough how important this is for startups.
Being able to iterate quickly and not being performance limited is what makes Dgraph unique. Nothing else on the market fits the bill. With everything else you sacrifice performance for speed (Hasura / ORMs / whatever). And I stand by that. Anything we tried came to a halt when we pulled in production data and added a few new features.
Dgraph lets you mutate the schema in unimaginable ways without much of a performance hit if any at all.
And DQL is amazing. What previously had to reside in application layer is now done in a single query.
I would like to chime in with some arguments on why Dgraph is a database in the first place. Fast forward a few months and say Dgraph has achieved parity with BaaS services in having an auth service and lambdas, which I believe is all there is to BaaS.
Auth is database agnostic and has nothing to do with Dgraph directly.
Lambdas can be run anywhere and have nothing to do with Dgraph directly.
Now, let’s answer a few questions:
How much will I pay to run MySQL serving those workloads?
How much will I spend combating migrations and schemas?
How fast will my queries be?
How easy will it be to set up?
That’s why I think in the end it all boils down to the core technology serving the workload and why Dgraph is a database.
Given all options on the market have auth and lambdas built in, I see no way any competing technology wins against Dgraph in the long run.
A more performant DB will always result in lower cost of ownership, faster queries and most importantly faster iteration time.
My closing thoughts on the future trajectory: flesh out the barebones BaaS requirements and focus on the database.
Having your own lambdas is great but that’s not the deal breaker. Having insane GraphQL options is nice but not a deal breaker. I can live with DQL and tolerate a lot for what Dgraph offers, even occasional instability and corrupted state.
If the DB is the fastest there is, no competitor could offer cloud options at a lower price, and that’s what can force others to adapt and support Dgraph, rather than your team trying to please everyone else.
I use DGraph for multiple huge databases - millions of nodes and edges, with gigs of data. To analyze this data, I use the DGraph → Spark connector. For me, the main competition to DGraph is Neo4j, which is VERY expensive. I use DQL exclusively and not GraphQL.
However, I have had multiple show-stopping crash/corrupt bugs which have impeded my progress. These are frustrating. (There are other threads on these)
There really isn’t another option OTHER than DGraph and Apache SPARK when it comes to open source graph analytics, which is a pretty huge market. Focusing DGraph on the analytics might mean integrating some graph algorithms (community detection is the primary one!), which is not a bad idea, and integrating and extending the DGraph->SPARK connector as a primary feature.
Yes. We’ve been impressed with dgraph’s performance at scale- but things get stressful once the cluster grows large enough. Certainly don’t take my tone as disingenuous, dgraph is young and engineering tradeoffs have to be made, but here’s what we run up against:
No sharding of predicates
This is a big concern for us- as it means if a predicate becomes large enough, the only way to deal with it is vertically scaling. (We have predicates like xid and name that have over 100 million entries, and are over 100gb in size each).
The shard-balancer wreaks so much havoc on these large predicates that we had to completely disable it. It moves things too frequently, and its only criterion is size (instead of “hotness”). Any time one of these large predicates is moved (or indexed), the cluster grinds to a halt because incoming data gets stuck behind the operation, so much so that if we have to move a predicate, we have to pause all of our ingestion pipelines and wait for dgraph to catch up.
No query planner
Dgraph makes naive decisions on how a query gets executed. For instance, if filtering by two predicates A and B in a query,
A with 1 billion unique UIDs, with an index
B with 10 unique UIDs, without an index,
dgraph will choose A over B if it sees that A is indexed, but B would have bounded the query faster. It’s frustrating when end users of our system have to do things like write their query in reverse to get better performance.
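As an illustration of the planning decision being asked for (not a description of how Dgraph executes queries), here is a minimal Python sketch that drives an intersection from the smallest candidate set and only probes the larger ones. The function name and the sample data are hypothetical, and a `range` stands in for the huge predicate A.

```python
from typing import Collection, Dict, Set


def intersect_smallest_first(matches: Dict[str, Collection[int]]) -> Set[int]:
    """Intersect UID collections, driving from the smallest one.

    Only the smallest collection is iterated; every other collection is
    probed with a membership test, so the large ones are never scanned.
    """
    if not matches:
        return set()
    ordered = sorted(matches.values(), key=len)
    driver, rest = ordered[0], ordered[1:]
    return {uid for uid in driver if all(uid in other for other in rest)}


# Hypothetical stand-ins: A matches a huge UID range, B matches a handful.
a_uids = range(1, 1_000_001)   # large predicate A
b_uids = {5, 17, 2_000_000}    # tiny predicate B
print(sorted(intersect_smallest_first({"A": a_uids, "B": b_uids})))  # [5, 17]
```

The work done here is proportional to the size of B, whichever order the filters were written in. That is the behaviour end users currently have to get by reversing their queries by hand.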
Related, cascade is one of dgraph’s most powerful features, but without a smarter query planner, deep/wide cascades can be unusably slow. This is frustrating since the whole reason you chose a graph db is because you want cheap inner-joins.
I’m excited for the roaring bitmaps work, as I could see that improving some of these issues.
Upsert by xid painful for ingest-heavy workloads
All of our inserts happen with deterministic ids- but since dgraph creates its own uid for everything, we are forced to have every insert be an upsert. (Query by xid, if it doesn’t exist, add it, otherwise update it). This puts pressure on the cluster that I wish we had a way to avoid. We want to be able to do blind-writes.
We’d be happy to hash our ids in a way that they became dgraph uint64 uids (that we could insert directly), but it feels like dgraph is not intended to work this way (the way the zeros lease blocks of uids, them being contiguous and sequential, etc)
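A minimal sketch of the hashing idea, assuming a uid is simply a non-zero unsigned 64-bit integer. The function name is hypothetical, and whether a cluster would accept uids chosen this way is exactly the open question raised above, given how the zeros lease uid blocks.

```python
import hashlib


def xid_to_uid(xid: str) -> int:
    """Map an external id to a deterministic, non-zero 64-bit integer."""
    # BLAKE2b can emit an 8-byte digest directly, which fits a uint64.
    digest = hashlib.blake2b(xid.encode("utf-8"), digest_size=8).digest()
    uid = int.from_bytes(digest, "big")
    # Assumption: 0 is not a usable uid, so remap it to 1.
    return uid or 1


uid = xid_to_uid("customer-42")
print(hex(uid))                           # same value on every run
print(uid == xid_to_uid("customer-42"))   # True
```

With ids derived like this, a writer would not need the query step of an upsert. The trade-off is that two different xids can collide in 64 bits, with a small but non-zero probability that grows with the number of ids stored.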
All of the data we store in dgraph is temporal, so we’ve had to do some gnarly gymnastics to allow users to filter nodes/edges by spans of time (we have dangling “timerange” nodes that hang off of every node in the graph… which we use as a filtering mechanism via cascade).
I would be over the moon if dgraph had a native timerange type that was just [starttime, endtime]. This would allow us to put a list of timeranges directly on a node, and then a query like intersects(100,200) would return a node that had a timerange predicate like [0,110], [180, 300]. This would reduce the stress we put on the cluster across the board.
Dgraph has a strong foundation, and I know the team has ideas about the issues I’ve brought up. To echo some of the other commenters, I am more interested in dgraph as a graph database than a gql platform, given dgraph is the only modern, cloud-native, graphdb in town.
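The semantics being requested can be sketched in a few lines of Python. This is only an illustration of the proposed intersects behaviour, assuming inclusive [start, end] ranges; the type alias and function are hypothetical and not an existing Dgraph feature.

```python
from typing import Iterable, Tuple

TimeRange = Tuple[int, int]  # inclusive (start, end); an assumed representation


def intersects(ranges: Iterable[TimeRange], start: int, end: int) -> bool:
    """Return True if any stored range overlaps the query span [start, end]."""
    return any(r_start <= end and start <= r_end for r_start, r_end in ranges)


node_ranges = [(0, 110), (180, 300)]
print(intersects(node_ranges, 100, 200))  # True: overlaps both stored ranges
print(intersects(node_ranges, 120, 170))  # False: falls in the gap
```

Evaluated natively against a list of ranges stored on the node, a check like this would replace the dangling "timerange" nodes and the cascade used to filter on them.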
Right now it seems like we have a messaging problem. I seriously do not know if this is an app-platform you are building or a graph database. If the target of dgraph is just a means to support running GraphQL like hasura did with postgresql then I did not understand that from the get-go.
What is the future of dgraph? A Hasura competitor or a Tigergraph/Neo4j/Neptune competitor? If it is both, should they live under the same name? Can they coexist and coevolve?
Thank you for stepping in @mrjn and sorry to hear it’s been a tough year. Fingers crossed the VCs will come to their senses. I can only echo some of what was said above:
A more transparent roadmap is important, even if it takes twists and turns and comes to a halt sometimes due to unforeseen events etc. The product is already great in many aspects and I think the community would be more patient if we had more insight and better ways of contributing. Personally I think you could take more inspiration from Chakra UI and Next.js, two very vibrant communities that use Github issues, projects and discussions. https://github.com/chakra-ui/chakra-ui, https://github.com/vercel/next.js. I think this would really benefit the community and make for more in-depth and organized conversations. Building open source without having a tight integration between discussions and source code prevents a natural flow of ideas and contributions. I understand that ‘policing’ Github issues could be a lot of work, but that’s what this community is for. We can help close issues and move them to discussions as needed. That way we can much better separate discussions and ideas from actual issues that need proper tracking and assignment … and it will look more cutting-edge in the eyes of VCs …
In terms of lacking features, I am leaning more towards the DQL crowd (with a rocksolid core and rich DQL layer) but I totally see the value of GraphQL as well. @seanlaff and @illuminae had many good points re. that core – such as anything that facilitates imports/upserts with XID, query planner, safe sharding of large predicate collections etc. Basically the things that keep the system performant and make I/O pleasant. I think it is better to improve the DQL documentation first (quite a lot is crammed into “Functions”: https://dgraph.io/docs/query-language/functions/) rather than spreading thin across DQL and GraphQL. Maybe document the GraphQL (and the desired roadmap and input you have) in a way that the community can take it on. It seems like less rocket science compared to the other parts, and it also requires more collaborative work to keep up with 3rd party specs and tools.
Overall I think more customers (both smaller and larger) will come as a result of more of us getting into production and spreading the word. So the “customer success engineer” role is continuously important, along with good Cloud Features that are self-explanatory. But a success engineer who is not working directly with Github issues in a transparent way quickly leads to frustration. Here I think it’s important to draw the line more clearly so that customer success engineers can focus on issues that directly enable the customers’ work linked to customer payment.
I think your website and docs are actually quite good. But it depends on who you’re targeting obviously … The branding is good. I think the problem is a lack of understanding of ‘graph’ overall but as more developers see the benefits, the time will come. I think it does make sense to explain and emphasize graph/RDF upstream to GraphQL. For being a graph database, you seem to interface relatively little with the graph community. That might lead to the following dilemma: Frontenders are not yet fully familiar with RDF/graph data and its benefits. ‘Semantic web people’ are not yet fully familiar with GraphQL and its benefits. Also, because you’re neither fully RDF compliant nor fully GraphQL compliant (I might be wrong here), there will always be some naysayers. But then, take a look at what TypeDB is doing … They don’t take that No for an answer. They instead redefine the rulebook, much like you are doing. I think the key is to redefine it only so much that you can offer a natural segue (or enough bonus features) that a leap of faith is worth taking. For me personally, I’m willing to make the leap from “real semantics/real RDF” to Dgraph if, and only if, the performance and ease-of-use (especially of the query language) trump the downsides of no longer having a link to the rest of the LOD (linked open data) space. For people coming from the GraphQL side of things, I think that segue is more about delivering the GraphQL that they know and love coupled with demonstrations of performance, while gradually teaching them concepts from the world of graph logics.
I still think that every system should allow for maintenance windows, but I understand not wanting them. Facebook, Google, etc. don’t do maintenance windows, and Dgraph is pitched as a backend solution that should have been used by Google, but they passed on it.
I think the big kicker still is that the data files are part of the upgrade as well and cannot just be dumped between versions. I would think that eventually the data files would not need to change between versions, with upgrades only changing the algorithms that query/mutate the data in them to be more efficient and to offer more tools/functions.
From my point of view there are a lot of possibilities for Dgraph. It can be a great database for apps in general, and for apps that need complex relations Dgraph would be better in such cases, as @BenW mentioned with https://roamresearch.com.
Another issue I find is the pricing of Dgraph Cloud, or of hosting it yourself. It would be nice to have one-click hosting of the open source version on platforms like DigitalOcean, or maybe a hobby plan at something like $2 or $5 per month on Dgraph Cloud. The current plan with its 1 MB limit is just not enough, and the enterprise and shared plans are out of budget for hobbyist developers like me.
I myself am a very big fan of Dgraph due to its performance, GraphQL, its many possibilities as a general purpose database, and the simplicity it provides over other databases, but I have kept away from it due to its plans, and the same goes for so many hobbyist developers like me.
On top of that, GraphQL with TypeScript and Apollo becomes complicated and is not as easy to start with as Supabase or Firebase.
Are you using codegen? It is an absolute must if using GraphQL, Apollo, and Typescript
With Dgraph, Apollo, Codegen, React, and Typescript you have ONE source of truth for your types that are strongly typed from the top to the bottom of your app. Need to change the types anywhere, you just change them in one place, the Dgraph GraphQL schema, and then deploy/rerun codegen and you have the same schema updated at the database (DQL schema) and the frontend (Typescript). Gamechanging!!
Not that I’ve found, but I echo @seanlaff’s comment above re: sharding on predicates. For us, this is a huge concern. The vast majority of our graph will only have a few predicates (xid being ubiquitous).
Do you really need transactions?
Yes, if this is meant to be a production application database.
Should Dgraph not be a database? Should it be a serving system instead?
We are looking for a graph database, not a graph layer on a relational database, if that’s what you meant.
Should Dgraph interface with other databases better?
If Dgraph could ingest RDBMS schemas that would be interesting, though I don’t know how you would solve that.
Does Dgraph not work with GraphQL properly? or React?
We are not interested in GraphQL as the main feature of Dgraph. We are looking for a massively horizontally scalable graph database that can do the high-performance graph traversal that an RDBMS isn’t tuned for. And in fact we prefer to use Dgraph Cloud rather than host it ourselves. It’s not an easy product to self-administer, and we’re thankful for the support.
In fact, the focus on GraphQL has been a little disappointing. We maintain a GraphQL schema just in case, but we use DQL exclusively.
The only wishlist item I have is a multi-RAFT approach for regional clusters like CockroachDB is doing, but it’s not at all a deal breaker.
You know, one thing I would pay double for is if Dgraph had its own mobile solution with offline sync. MongoDB has Realm, but unfortunately it only syncs with MongoDB. Even if Dgraph built a syncing gateway that works with Realm (presumably by building an extension to Realm that allows it to sync with Dgraph’s servers) that would be incredible.
As nice as that would be, I totally understand the hard stop of Dgraph to not go this direction and even limit supported OS to only Unix. This helps the team to super focus and build out what matters in the best way instead of doing it one way for Unix, another for PC, another for Mobile, etc.
Cheaper than other platforms because it’s been designed to take advantage of SSDs instead of everything having to live in RAM
Clear examples of how to accomplish things
ES-like search and query DX, with performance that is comparable if not faster, and a decent (if not identical) query string query API with great index intersection queries
BM25 and/or custom search ranking
More and custom tokenizers
The whole damn thing is postings lists, it should have amazing search!
UIs are built with search interfaces. ES query string query is THE interface I have used for over 8 years on multiple projects to build applications. As soon as you add search to an app, the search index drives all the views because it does the filtering, the sorting, the faceting etc. All of those queries are constructed using a simple query string query API. I would love to see this. (Reference: “Query string query”, Elasticsearch Guide [7.15].)
Personally I don’t care about lambdas, maybe some day I will but they seem like a crutch for missing features and I have no visibility or intuition about how they impact performance, scaling and ongoing cost of operation vs something like aws lambda which I already use extensively
Also I don’t care about GraphQL. I love that I can write a simple schema and get a full API that rocks; GraphQL was just a hoop I had to jump through. Perhaps subscriptions will change my mind on the value of GraphQL, but it also seems like it holds the platform back, because if something is not in the spec we can’t use it? Having two query languages feels kludgy, not to mention yet another graph query language. Seriously, you couldn’t just use Gremlin or SPARQL or something? I’m certain you have great reasons for YAGQL but the pre-existing stuff already has so much documentation /rant
@iluminae thanks for the link. I have avoided DQL so far and stick exclusively to the GraphQL side of the cloud service. I will certainly take a look but since I already do pre-processing in my aws lambdas upon ingest I will probably implement my custom tokenization there. Downsides to adding DQL to my project are that it’s yet another thing to figure out, it has more power but requires more maintenance of relationships whereas GraphQL has more guard rails and does more hand holding which I appreciate.