What is Dgraph lacking?

There actually is a hack to get this to work with the @auth directive thanks to @amaster507! I will post an article on it when one of us writes one. Join the discord if you need more details immediately.

While we can’t control the lack of transparency with the DGraph Team, this is something the community CAN control. On the discord server we are working on several example apps, documentation, articles, and I believe one member is even working on instructional videos. I myself am going to create a site to help with this, so stay tuned!

I believe the horror stories come from non-cloud users. From my understanding, the Dgraph team does work diligently to help any paying cloud users with problems like this. It also seems to be very rare, and happens with extremely large data sets with high access frequency. Perhaps @MichelDiz can speak more about this, but I don’t know that it is a systemic problem with DGraph. That being said, I don’t question whether or not the DGraph team takes bugs and data corruption seriously (spoiler: they take it very seriously). I would not add that to the negative list personally.

J

4 Likes

If it makes you feel any better, it is not an int64, it is a uint64, making the possible distinct nodes 18,446,744,073,709,551,616.

I did just push a fix for the only one we have seen in production. Not to say there cannot be others lurking.

3 Likes

And if every node only had a 12-byte string in it (not counting the space for the predicate name itself) that would be around 220,000,000 TB of space. I think I will be safe for a long long long time lol. Unless I did my math wrong and this might be only 220,000 TB, then I might have to worry about this in the next 1,000 years.
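For anyone who wants to check the storage estimate, here is a quick sketch. It assumes decimal terabytes (10^12 bytes) and counts only the 12-byte payload per node, with no predicate, index, or replication overhead; the constant names are only for illustration.

```python
# Rough storage check: 2**64 nodes, each holding only a 12-byte string.
# Assumptions: 1 TB = 10**12 bytes (decimal); no predicate or index overhead.
UID_SPACE = 2**64
BYTES_PER_NODE = 12
BYTES_PER_TB = 10**12

# Integer arithmetic keeps the result exact.
total_tb = UID_SPACE * BYTES_PER_NODE // BYTES_PER_TB
print(f"{total_tb:,} TB")
```

Under those assumptions the result lands a little over 220 million TB, so the larger of the two figures is the right order of magnitude.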

I don’t think anyone can actually comprehend how many nodes would be in 18,446,744,073,709,551,616

If you added 1,000 new nodes every millisecond, that would be 1M nodes per second… then you could ingest 2^64 nodes in 584,942 YEARS! Now, that ingestion speed of 1M nodes (or rows) per second is pushing Dgraph and almost every other HA system to its limits. Maybe in another few years we could move faster and reduce the time to ingest 2^64 nodes to less than 1 millennium, which means ingesting data at almost 600 million nodes a second.
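The ingest-time numbers can be reproduced with a few lines. This is only a sketch: it assumes a 365-day year and a perfectly constant ingest rate, and the function names are made up for the example.

```python
# How long does it take to use up a 64-bit UID space at a constant ingest rate?
# Assumption: a year is 365 days; the rate never changes.
UID_SPACE = 2**64
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(nodes_per_second):
    """Years needed to create 2**64 nodes at the given rate."""
    return UID_SPACE / nodes_per_second / SECONDS_PER_YEAR


def rate_for_years(years):
    """Nodes per second needed to create 2**64 nodes within `years` years."""
    return UID_SPACE / (years * SECONDS_PER_YEAR)


print(f"{years_to_exhaust(1_000_000):,.0f} years at 1M nodes/s")
print(f"{rate_for_years(1000):,.0f} nodes/s to finish in 1,000 years")
```

Swapping in a different rate or horizon is a one-line change, which makes it easy to sanity-check other estimates in this thread.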

4 Likes

That’s a very curious topic, you know? Every year someone comes with that concern, and it feels like they don’t get the whole picture even after being given those numbers lol

E.g. if you create 100 million nodes per day, it would take 505 million years to exhaust the DB’s UID lease space. LOL

So you can abuse the UID leasing, bro. hehehehe

4 Likes

That’s true. There are cases where the team moves every quarter, but in general, it’s pure silence/quiet - everything just works. The engineers who built the foundations of Dgraph Cloud built it pretty solid and the engineers behind Dgraph Cloud are really good at what they do.

And yes, Manish takes any data corruption very seriously. If you can prove it is not a user mistake, it is a bug and you can get a generous bounty.

4 Likes

Yea I actually tried to manage my own UIDs by leasing all available numbers, and using a 64bit hash as my unique identifiers. But dgraph blows up if you do lease the full uid space (see here). And 64bit was a bit tight for guarantees of no hash collisions anyway.
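To put a rough number on "a bit tight", the usual birthday-bound approximation can be sketched as below. It assumes the hash output is uniformly distributed over 64 bits, which is an idealization; the function name is hypothetical.

```python
import math


def collision_probability(num_ids, bits=64):
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumes hash values are uniformly distributed over 2**bits outcomes.
    """
    space = 2**bits
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


for n in (10**6, 10**9, 5 * 10**9):
    print(f"{n:>13,} ids -> p(collision) ~ {collision_probability(n):.2e}")
```

Under that assumption the risk is negligible at a million identifiers but reaches a few percent around a billion and roughly even odds around five billion, which is why 64 bits gives no strong guarantee at large scale.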

oh hey handing out bounties?!? :dollar: I’m listening…

1 Like

I just wanted to say, amazing thread here by now. To make it short - Everybody is expressing their concerns and the answers are really pushing my confidence in DGraph. Looking forward to more information about coming features and releases.

3 Likes

@MichelDiz

The thread is titled “What is Dgraph lacking?” I don’t think going over issues is getting my point across, so I’m going to speak at a higher level.

What is missing? What will get developers to want to use DGraph?

Money
Everything boils down to money. If DGraph makes financial sense, then developers and companies will use it.

Money can come in the form of Cash or Time. I’m not going to weigh in on Cash topics like pricing strategies etc., except to say that having a free/super cheap tier is essential to get new users to try things. Your UX team should be discovering whether your free tier is adequate or not.

Let’s talk Time, because that’s my biggest concern and what all my thoughts on DGraph stem from. We have multiple ways time factors in:

  • Time to develop a new product - 5/10 here, all the benefits of a graph schema and auto-generated API are lost to the unpolished parts and poor documentation
  • Time to implement new features on an existing product - 5/10 or 9/10 - If you need to move data, this is painful. Otherwise it’s a pretty smooth experience.
  • Time to pipe data in and out - 9/10. I occasionally get a timeout when fetching fewer than 100 records.
  • Time to architect/design schema and data flows - 7/10 Graph schema is amazing. I love it. Without examples of good patterns, I’ve had to come up with my own and re-work them a few times. Adding
  • Time to validate and maintain data integrity - 2/10. No validation, orphaned predicates, poor tooling support, extensive time spent on discuss.dgraph.io needed because docs are inadequate. I estimate 50% of my time in Dgraph has been lost to this.
  • Time to update/maintain the schema and data - 5/10. For the most part this is okay. It’s bad when you want to rename something that has data already, or worse if you want to move it. I have to build my own tooling. If I start with graphql, I have to learn DQL too.
  • Time to learn new syntax and patterns - 6/10. The core stuff is in the docs. The rest is discuss and trial and error. Heaven forbid I had a junior dev learning dgraph without my help.
  • Time to implement scaling - 9/10. Theoretically this is why I pay DGraph big money when I need it. Haven’t got there yet though.
  • Time to coordinate working on a product with my team - 2/10. I started with graphql, and yet 4/5 of my answers have been to go learn DQL, create yet another repo and connect it for basic schema tasks and updates, and then spend extensive time reviewing any change ever because getting anything wrong in DGraph the first time is very very expensive.
  • Time to train less senior team members - 4/10. To use Graphql in Production requires strong understanding of DQL and graphql, and the nuances of how to use both together. The docs are decent for building a TODO or basic movie app. But nothing real and more serious is covered.

There are others, but I think those cover the important parts. The more time the DGraph team can save other developers on those tasks, the more DGraph will attract users and make money. Your competitors require significantly less time invested to determine the quality of the offering. As is, any new user has to invest significant time to get up to speed.

I architect 6-12 new products/systems a year on average. I’ve been developing for 15 years. Because of that I can see the significant value that a product like DGraph has, but when it comes down to it, DGraph costs me too much, not in $$, but in time.

It is hard even for the GraphQL community to try to convince people to a new paradigm and leave the old behind. The old does his job, but it is not the same.

In your whole life, you were introduced to the SQL world(from some course degree or dev courses) and just now you are learning a new thing.

I think the user can expend 3 or 9 months mastering Dgraph. I did the basics in 2 weeks and mastered them in about 2 months.

See? you still are in the SQL paradigm. The way Dgraph works is like a puzzle. Some tasks that you do in SQL are too expensive for us to put natively to users. Deleting something in cascade is dangerous, so ideally, the user should create their own query for it. Thus avoiding unwanted results.

That’s 4 of your quotes from your responses in the last day only in this thread. Do you notice the pattern? You shift responsibility and investment to your users or others. If you want money, you need to add value. DGraph has significant value for certain use cases. It does not for many others.
You are the experts, no one will get DGraph like you do, so do it for me or give me tooling that will guide me to do it right.

Let’s take your 3rd comment, and say that a user spends 3 to 9 months mastering DGraph. With an average developer salary of 120k/year (upper mid-level/low-senior), that means you are costing the company 30-90k in salary per employee who will be working with DGraph.
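The 30-90k range is simply the annual salary prorated over 3 and 9 months. A minimal sketch, counting salary only (no benefits, overhead, or opportunity cost), with a made-up function name:

```python
# Salary spent while one developer ramps up on a new tool.
# Assumption: cost is salary only, prorated by month.
ANNUAL_SALARY = 120_000


def ramp_up_cost(months, annual_salary=ANNUAL_SALARY):
    """Salary paid over `months` months of learning time."""
    return annual_salary * months / 12


for months in (3, 9):
    print(f"{months} months -> ${ramp_up_cost(months):,.0f} per developer")
```

Multiply by the number of developers who need to learn the tool to get the team-level figure.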

So using that math, let’s talk through some common scenarios:

  • Is an investor going to allow a company to use DGraph, and burn through an extra 3-9 months of runway? No
  • Is the CTO of the next big unicorn going to stall releasing his MVP an extra 3 months so he can learn dgraph when he’s making almost nothing? No.
  • Is a 100 million/year company going to tell their investors there will be a 6 month delay so they can train and adopt DGraph? Probably not.

You want users, you need to deliver value, and currently the value is there, but the missing polish is too costly.

Here is a high level, database agnostic checklist:

  • Insert One: Pass
  • Insert Many: Pass
  • Update One: Pass
  • Update Many: Fail in GQL, pass in DQL
  • Update Many from other data: Fail in GQL, pass in DQL
  • Update Many Conditionally: Fail everywhere. (See bugs below)
  • Update Many using data across relationships: Fail everywhere (See bugs below)
  • Delete One: Pass
  • Delete Many: Fail in GQL, pass in DQL
  • Read One: Pass
  • Read Many: Pass
  • Read Many Nested Filters: Fail in GQL, pass in DQL (or so I’ve read)
  • Read Many Nested Loops: Fail everywhere (See bugs below)
  • Update One to one relationships: Pass
  • Update one to Many relationships: Pass
  • Update many to many relationships: Fail (See bugs below)
  • Create new schema: Pass
  • Update Schema: Fail, data integrity at risk (Acceptable short term fix could be fixed with tooling, but DB constraints should be added)
  • Read Schema: Pass
  • Delete Schema: Fail, data integrity at risk (Acceptable short term fix could be fixed with tooling, but DB constraints should be added)

Summary
Hopefully this helps me get my point across. Please note that I have spent hours writing these thoughts. I really really do want to see DGraph succeed because it would be amazing for me. Here’s to Dgraph getting some of these core operations in soon so we can both make more money together!

PS - Here is one instance of the bug I mentioned: Bug with multi-level (nested bocks) in Upsert Block
And another similar one: Fixing Inverse Relationships

7 Likes

Just responding to some of this to be fair to DGraph, while I do agree with everything you said.

There are some issues that are general to the GraphQL spec that simply cannot be done. This has nothing to do with DGraph. I think it is more fair to compare with what the competition can do with GraphQL. What can be done directly in GraphQL, and what can be done directly under-the-hood (in DQL or whatever competitor language).

However, it is very interesting to think this way to see holes in implementations.

There are a lot of half-truths to this as well.

Technically this is not true for GraphQL. You give a search criteria, and it updates all items accordingly (update many). If you want to update individual items, you could use several mutations within one mutation block, instead of a set array (like add) items. So only one set is possible, but many mutation blocks in GraphQL. This is by design, although I agree it is not intuitive.

I think both bugs you quoted were from the old github repo and imported in. They may not even be an issue anymore. For example, you can do conditional upserts in dql, which checks this box. I cannot speak for the specific bug you listed, but I believe it is out-of-date. You should post specific use cases to see if it is possible, as I don’t know how these generalities hold up. It would also be fair to see what other databases claim to do to solve the same issues. DQL should pass this.

Yes, impossible in DGraph GraphQL, and potentially in GraphQL at all.

I am not sure how this fails in GQL. What I believe you’re talking about is deleting nested items (@cascade delete), which is a different thing. This was mentioned as well.

I can confirm there are no issues in DQL with my own use case, here is another example. Yup, a problem with GraphQL as we have covered.

Not a bug in DQL both are very possible in DQL and GraphQL. I am not sure what you’re referring to here.

The @hasInverse link you mentioned is a schema problem I believe (maybe fixed now?), so it is unrelated to something that cannot be done.

So, I think update here is too generic. I assume you’re talking about the lack of renaming nodes, which I agree is a problem and should be built into the schema editor.

Yes, it would be nice to have a delete option built into the schema as well. Also if the data studio worked more like Firebase to do all CRUD options visually.


IN SUM:

  • You basically need Nested Filters and Foreign Key constraints in GraphQL. These were on my list a few posts above (lol… we all want the same features).
  • Things like variables and conditions in GraphQL may not be possible in compliance with the GraphQL spec (although @amaster507 has written many posts on it).
  • I also think DQL can handle the use cases you mentioned and actually pass all of them.
  • Some of the time issues can be fixed with better documentation. I think the user community can concentrate on this while the DGraph team works on missing features.

Just my take,

J

3 Likes

Have you leased it node by node? It is quite impossible to do in so short a time.

These are to prove data corruption. The bounty is for Badger tho, but it is the same thing.

1 Like

That’s kind of my own opinion. I’m talking as a user too, not on behalf of Dgraph. Before getting to work with Dgraph I was building my own things and things for other people. I spent time and money that I didn’t have. No one paid me 90k to learn Dgraph :stuck_out_tongue: and today Dgraph is several times better and easier. I taught myself (in DQL) by reading docs and digging through the code. Nothing was hard or terrible. I was studying GraphQL (with Redis, Mongo, Postgres and others) before finding Dgraph (at that time there was no GraphQL feature in Dgraph). I think it was easier for me cuz of that: the familiarity with GraphQL.

Everything costs money and time. Especially if it is a new thing.

Why not? Maybe because they expect you to come to the job opportunity already knowing those things. Right? They ask for impossible things sometimes…

None of this is Dgraph’s fault. The GraphQL spec is very rigid. We cannot create DB solutions/functions/etc. for an API language that we have no control over (although Dgraph is part of the GraphQL Foundation). Think about that. All the things that the engineers have added to GraphQL are kind of “out of spec” sometimes.

Some points are valid tho.

1 Like

Do you think it would be possible to add out-of-spec GraphQL features that can be enabled with a Dgraph config? Or would that be a bad idea? What I’m thinking is that Dgraph adds an out-of-spec feature, and either Dgraph or the community builds an extension for Apollo (or urql or whatever) to make it work with whatever client you’re using. Then Dgraph submits a formal suggestion to the GraphQL spec team to add the feature to the spec. It doesn’t make sense to me that the power of Dgraph should be held back by the spec, which is really an arbitrary thing that is controlled by Facebook(?)

@BenW In my opinion, this would not be wise. There is still so much that can be done within the specification at the moment. And just the fact alone that there is so much tooling built to the specification that breaking it would most likely break who knows what tooling, would not be wise.

1 Like

Agree that this should not be immediate priority.

But if you’re only making it available to users who specifically enable it, and make it clear that it could break things… then I don’t see the problem. Before Dgraph I was building a GraphQL API by hand with complete disregard for the spec, I just made it do whatever I needed it to do. Never ran into problems. What I would like is this amount of freedom and control over my API.

Following Anthony, GraphQL is like that for consistency/predictability and security. We cannot require an API language to do the work of a DB language. I think the custom query solves part of this problem. It’s basically what other DBs do when they add GraphQL to their code base in some way. However, there is still a lot that can be done under the spec.

Nope, GraphQL foundation has total control of the Spec now.

Those are profound changes to ask for. It would be too pretentious for us to request them. We should work for now with what we have.

The logic behind GraphQL is to build your own GraphQL server by doing the whole business logic in the backend. I can’t imagine Dgraph predicting every possible kind of business logic and supporting them all out of the blue, or doing something general. I think a pretty good part of the logic will stay with DQL, lambdas and so on. Which doesn’t mean we won’t fix or add new things to prevent the user from having so much work with the business logic.

I think you were covered by the spec, cuz no one can do anything out of the spec. In your backend, to resolve the query you can do anything you want. In Dgraph it is different cuz everything is written in stone. The only logical freedom to resolve a query is using Custom and so on.

1 Like

This doesn’t sound very ambitious Michel! There’s no reason that Dgraph couldn’t be the most influential force on the GraphQL landscape. From my point of view Dgraph has a responsibility to put forward suggestions to the GraphQL Foundation, to ensure that the spec doesn’t prevent Dgraph from being able to achieve everything that’s technically possible. By the looks of it, adding feature requests is as simple as creating Github issues on their repo: Issues · graphql/graphql-spec · GitHub.

Seems like this is similar to how things work with the HTML and CSS specs:

  • Browser vendors come up with new features
  • They submit feature requests to the W3C
  • A JS or CSS feature that is only available in xyz browser becomes popular with developers
  • Developers build polyfills so they can start using the feature in their projects
  • W3C recognises that the feature is required, adds it to the spec
  • Other browsers implement the feature because it’s now in the spec

That being said, I don’t know enough about what the spec is actually preventing to be able to make informed comments about this.

EDIT: In fact this could be an incredible marketing opportunity for Dgraph: to implement experimental features that show what is really possible with GraphQL if you put a V12 engine inside it. Experimental features can only be used with a special build, or with ENV flags. Build some demos of the feature, create a feature request on the Spec repo with links to the demos. Get the GraphQL community talking about it. Good way to demonstrate what’s really possible with GraphQL, and that Dgraph is what makes it possible.

4 Likes

This is probably bringing Dgraph full circle for those who have been around for a while. Be careful what you wish for; just saying that if Dgraph is the only one that implements feature X in GraphQL, then it probably won’t get much attention from the foundation, because the spec is specifically language and database agnostic.

https://github.com/dgraph-io/dgraph/issues/933#issue-228222037

This leads to one of those situations where it seems very deja vu to me and probably others.

Created a separate topic for this:

Can you point me to where the bug fix for this one is:

I tried this on Tuesday and it still wasn’t working. I have a table A, with an array relationship to B. I need to get every row of A with their B and perform an update for each one. So if A1 has B1, B2, B3, I need to do an update with [A1, B1], [A1, B2], [A1, B3], and I need to also include A2…A500 and their relationships.

AFAIK This is still not possible.

Update Many: Fail in GQL, pass in DQL

I gave this a fail because of the node limit error, it makes it almost impossible to use effectively because in some cases the node limit happens updating 200 records, and other times 500, also the node limit is enforced across multiple mutations in the same request. There is absolutely 0 reason to use this over DQL upsert and should probably be just removed from the graphql API if they aren’t going to improve it.

Update Many Conditionally: Fail everywhere. (See bugs below)

I can’t remember why I failed this one, so yeah, this is okay.

Update Many using data across relationships: Fail everywhere (See bugs below)
Update many to many relationships: Fail (See bugs below)
Read Many Nested Loops: Fail everywhere (See bugs below)

These bugs still exist, and I’m referencing what I described at the top of this post. In essence these all boil down to the ability to permute across relationships. In SQL, if I do a join, I will get a row for every unique combination of A and B that are joined via foreign keys. How do we do that here and get A1+B1, A1+B2, A1+B3, A2+B5, A3+B88, A3+B89 and so on, and be able to use those in filters and upserts?
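To make the desired result shape concrete, here is a small client-side sketch of what a SQL join would hand back: one (A, B) row per relationship. It is plain Python over hypothetical in-memory data, not DQL or GraphQL, and it does not show how to get this out of Dgraph.

```python
# Hypothetical in-memory data: each A node lists the B nodes it points to.
relationships = {
    "A1": ["B1", "B2", "B3"],
    "A2": ["B5"],
    "A3": ["B88", "B89"],
}


def join_pairs(edges):
    """Yield one (a, b) pair per edge, like the rows of a SQL inner join."""
    for a, bs in edges.items():
        for b in bs:
            yield (a, b)


pairs = list(join_pairs(relationships))
for a, b in pairs:
    print(f"{a}+{b}")
```

Each pair is the unit that a filter or upsert would need to operate on, which is the capability being asked for.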

Otherwise yea, I think this conversation has been good for me, as I have realized that if they fixed this and added nested filtering and foreign keys, it would probably address 95-99% of my current biggest issues.
I understand that some things are not possible in Graphql due to the spec. Some/many of those issues could be addressed with tooling as they are mostly used by developers and not really needed.
I’m allowed to make mistakes, and when I do it can be punishing for a new user to have to figure out how to fix them, especially if they started using just the graphql API.

1 Like

So obviously DGraph (DQL and GQL) does not handle every single scenario. I am not sure it is even possible to do so in GraphQL the way the specs are.

However, I do believe there are inventive ways we can do things without breaking the GraphQL specs, so everything could be possible. You could probably do another table to compare what other GraphQL clients can handle; nested filters and @reference directive would probably bring DGraph GraphQL up to par with them.

Here you can see every scenario I could think of:

| Update Type | GraphQL (Link) | DQL |
|---|---|---|
| - Flat Relationship - | - | - |
| All | Fail | Pass |
| One | Pass | Pass |
| Many (Conditional) | Pass | Pass |
| Many (Nested Conditional) | Fail | Pass |
| Many (Specific) | Fail * | Pass |
| - Nested Relationship - | - | - |
| All to Nested One | Fail | Pass |
| One to Nested One | Fail (Pass) | Pass |
| One to Nested Many | Fail | Pass |
| All to Nested Many | Fail | Fail |
| Many (Conditional) to Nested Many | Fail | Fail |
| Many (Conditional) to Nested One | Fail (Pass) | Pass |
| Many (Nested Conditional) to Nested Many | Fail | Fail |
| Many (Nested Conditional) to Nested One | Fail | Pass |
| Many (Specific) to Nested One | Fail (Fail *) | Pass |
| Many (Specific) to Nested Many | Fail | Pass |

* - Can be done with multiple blocks in one mutation
() - Whether or not the nested link can be updated

Notes:

  • As far as I know, you can’t update all nodes in GraphQL without a filter, let me know if I’m wrong
  • @MichelDiz - Are these many-to-many cases possible in DQL like in this thread?
  • All is referring to no filters
  • Specific is referring to multiple ID or @id inputs

Let me know if I made a mistake somewhere, as it is a lot.

Now we know what to strive to do.

J

5 Likes