GraphQL doesn't preserve order of edges in a collection

diggy · July 4, 2020, 11:26am

Moved from GitHub dgraph/5816

Posted by martaver:

Steps to reproduce the issue (command/config used to run Dgraph).

Using Slash GraphQL, given the schema:

type Player {
  id: ID!
  title: String! @search(by: [fulltext])  
  parent: Player
  children: [Player] @hasInverse(field: parent)
}

Add a root Player with two child Players:

mutation {
  addPlayer(input: [
    {
      title: "root",
      children: [
        {
          title: "A"
        },
        {
          title: "B"
        }
      ]
    }
  ]) {
    player {
      id
      title
      children {
        id
        title
      }
    }
  }
}

Now, I wish to re-order [A,B] to [B,A]. So I use the following mutation:

mutation SetPlayerTree($id: ID!, $children: [PlayerRef]) {
  updatePlayer(input: {
    filter: {
      id: [$id]
    }
    set: {
      children: $children
    }
  }) {
    player {
      id
      children {
        id
      }
    }
  }
}

With vars:

{
  "id": "0x4e22",
  "children": [{"id": "0x4e26"}, {"id": "0x4e25"}]
}

Expected behaviour and actual result.

I was hoping for the order of children to have been updated to [B,A], but instead I get [A,B]:

{
  "data": {
    "updatePlayer": {
      "player": [
        {
          "id": "0x4e22",
          "children": [
            {
              "id": "0x4e25"
            },
            {
              "id": "0x4e26"
            }
          ]
        }
      ]
    }
  },
  "extensions": {
    "touched_uids": 25,
    "queryCost": 1
  }
}

I understand that ordering probably isn’t something that dgraph considers, because it treats its edges as ‘sets’ rather than lists, but the goal here is to provide utility above and beyond a graphql API implemented with a relational or document db.

A relational db can’t implicitly preserve ordering of inputs without the select query sorting by an index column. So in this respect, dgraph provides equivalent functionality.

A document db can preserve ordering of inputs because when it keeps references to a shared object as a list, the order of elements is also saved. So in this respect, dgraph falls short.

Preserving ordering of one-to-many relationships is a tricky problem for developers to solve. In the case of dgraph and relational dbs, the approach of creating an ‘index’ field on the element entity is problematic - what if the element is a part of many different collections? Would each relation need a different index?

This is exactly the kind of problem that drove people away from relational dbs towards document dbs, and in my mind it’s the kind of problem that dgraph should be able to solve for developers ‘out of the box’.

Actually, dgraph is uniquely positioned to be able to solve this problem extremely elegantly, because the index of an element’s membership in a collection can be stored on the edge as a facet. In this way, the index information is stored in the context of the relationship and the element itself doesn’t need any ‘index fields’.
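To make the idea concrete, here is a minimal Python sketch of ordering stored per edge rather than per node. It is only a toy model, not dgraph's facet API: the `edge_positions` dictionary, the `ordered_children` helper and the second collection named "favourites" are all made up for illustration.

```python
# Position is keyed by the (parent, child) edge, so the same node can sit
# at different positions in different collections.
edge_positions = {
    ("root", "A"): 0,
    ("root", "B"): 1,
    ("favourites", "A"): 0,  # hypothetical second collection
    ("favourites", "B"): 1,
}

def ordered_children(parent, edges):
    """Return the children of `parent`, sorted by the position stored on each edge."""
    members = [(pos, child) for (p, child), pos in edges.items() if p == parent]
    return [child for pos, child in sorted(members)]

# Re-order root's children to [B, A] without touching the nodes themselves.
edge_positions[("root", "A")] = 1
edge_positions[("root", "B")] = 0

root_children = ordered_children("root", edge_positions)
favourite_children = ordered_children("favourites", edge_positions)
print(root_children, favourite_children)
```

Because the position lives on the edge, re-ordering one collection leaves the other untouched, and neither A nor B needs an ‘index’ field of its own.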

I firmly believe that this simple fix would be a massive quality of life improvement for developers and really set dgraph above other options for front-end development.

diggy · July 7, 2020, 3:28am

arijitAD commented :

It seems like an easy fix, but we need to investigate this in more detail. In the meantime, you could add an additional field and use (order: { asc: title }) in the GraphQL layer to get the desired ordering. @ashish-goswami, can you comment on the ordering behavior?

diggy · July 8, 2020, 11:31am

arijitAD commented :

While discussing this with @ashish-goswami, we found that we store the UIDs sorted in the posting list, which lets us optimize in multiple places. Returning the UIDs in insertion order would therefore require design changes.
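As a rough illustration of why the result in the original report comes back as [A,B], here is a simplified Python model of a list that keeps UIDs sorted by numeric value. It is not Dgraph's storage code; `store_edges` is a hypothetical helper, and the UIDs are the ones from the report.

```python
def store_edges(uids):
    """Model an edge list that keeps UIDs sorted by numeric value (assumption)."""
    return sorted(set(uids), key=lambda uid: int(uid, 16))

# The update mutation supplied B (0x4e26) before A (0x4e25).
requested = ["0x4e26", "0x4e25"]
stored = store_edges(requested)
print(stored)
```

Whatever order the caller supplies, the stored order depends only on the UID values, so the requested [B,A] cannot be recovered from the list alone.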

diggy · July 8, 2020, 11:41am

martaver commented :

Hi @arijitAD, yeah, this is a tricky one.

I was actually writing some more thoughts about this as you just posted your last comment. I think this is a scenario that weighs up the expected semantics of a graphql api vs the fundamental nature of a graph database. I would argue that for the graphql API, semantics and usability should win over pure optimisation. Let me share my thinking…

The temporary approach you mentioned is a suitable hack for a demo, but I have to emphasise that the nature of a collection is that the ordering is a property of the edges themselves, and not of the elements in the collection. E.g. what if this node is a part of two different collections? Do we create an index field for each collection? This is where facets would be a perfect solution. However, this is such a ubiquitous scenario, and this functionality is supported out-of-the-box by document dbs, that it would be a shame to expect consumers of the graphql api to implement a custom schema and parsing to handle it.

I think the other problem is with the approach that dgraph takes towards mutating edges in its graphql api. Right now it seems that the updateXXX resolvers that are generated have two separate APIs: set and remove. As I understand it, the set API ensures that the edges described in the mutation exist. And the remove API ensures the opposite - that they don’t exist.

This is intuitive when working with ‘partial’ updates in graphs of arbitrary size. If you think about setting properties on an object, then you want to set the values of fields you describe in the mutation, and ignore the ones that you don’t mention. Likewise, for deletions, you want to delete the values you mention and preserve the values that you don’t. So far, so good…

The problem is when you have a representational state transfer scenario. E.g. ‘This is the object state I desire, make my object graph look like this’. In this scenario, you want to set all edges mentioned in the mutation, and you want any edges that are NOT mentioned removed. This is really important, because in front-end development, a lot of state changes are reduced to a sequence of immutable states, and the ‘diff’ itself is not always available in the process - and certainly not described as a sequence of additions and removals.

This is especially significant when working with collections. Much of the time you don’t know which elements were added or removed, but you DO know the desired state. In this scenario, I would need to read the ‘old’ state from dgraph first, run my own diff against the new state, and then encode the resulting changes as add or remove mutations, in order to achieve the collection I desired.
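The client-side diff described above could look roughly like the following Python sketch. It assumes the current child IDs have already been fetched with a separate query; `diff_children`, `build_update_variables`, the shape of the returned variables and the UID 0x4e27 are all hypothetical and only illustrate the procedure.

```python
def diff_children(current_ids, desired_ids):
    """Return (to_set, to_remove) so that applying both yields the desired membership."""
    current, desired = set(current_ids), set(desired_ids)
    to_set = [uid for uid in desired_ids if uid not in current]
    to_remove = [uid for uid in current_ids if uid not in desired]
    return to_set, to_remove

def build_update_variables(parent_id, current_ids, desired_ids):
    """Build variables for a hypothetical update mutation with set and remove parts."""
    to_set, to_remove = diff_children(current_ids, desired_ids)
    return {
        "id": parent_id,
        "set": [{"id": uid} for uid in to_set],
        "remove": [{"id": uid} for uid in to_remove],
    }

# Old state read from the database vs. the state the front-end wants.
current = ["0x4e25", "0x4e26"]
desired = ["0x4e26", "0x4e27"]  # 0x4e27 is a made-up UID for a new child
variables = build_update_variables("0x4e22", current, desired)
print(variables)
```

Note that this only reconciles membership. It does nothing for ordering, and it costs an extra read before every write.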

This is somewhere I would expect dgraph could really shine if it provided a discrete API for it. As I mentioned, it’s something that document dbs allow for inherently, and theoretically a graph db should be able to do whatever they do, but better.

In my ideal world, the updateXXX resolvers would expose three APIs:

add: this has the semantics of the current, set api… ensuring all edges exist, ignoring all others not mentioned in the mutation.
remove: stays as is… ensuring all edges are removed, ignoring all others not mentioned in the mutation.
set: ensures that the structure of the payload matches the structure of nodes and edges in dgraph.

One extra consideration is when we want to set the elements of a specific collection only on a node, and ignore other edges. In the case of the new set semantics, being able to define and preserve index order would be a massive painkiller for front-end developers.
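The three proposed semantics can be modelled on a single collection with plain Python lists. This is a sketch of the proposal only, not of anything dgraph currently generates; the function names `add_edges`, `remove_edges` and `set_edges` are invented for illustration.

```python
def add_edges(existing, payload):
    """Proposed 'add': ensure every edge in the payload exists, keep the rest."""
    return existing + [x for x in payload if x not in existing]

def remove_edges(existing, payload):
    """Proposed 'remove': drop every edge in the payload, keep the rest."""
    return [x for x in existing if x not in payload]

def set_edges(existing, payload):
    """Proposed 'set': the collection becomes exactly the payload, in payload order."""
    return list(dict.fromkeys(payload))  # de-duplicate while preserving order

children = ["A", "B"]
after_add = add_edges(children, ["C"])
after_remove = remove_edges(children, ["A"])
after_set = set_edges(children, ["B", "A"])
print(after_add, after_remove, after_set)
```

Under these assumed semantics, the re-ordering from the original report becomes a single set call with the payload [B,A].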

diggy · July 9, 2020, 1:34pm

martaver commented :

As a comparison, neo4j’s graphql engine also makes similar distinctions in the mutations they generate: https://grandstack.io/docs/graphql-schema-generation-augmentation#generated-mutations

diggy · July 14, 2020, 10:57am

arijitAD commented :

The problem is when you have a representational state transfer scenario. E.g. ‘This is the object state I desire, make my object graph look like this’. In this scenario, you want to set all edges mentioned in the mutation and then any edges that are NOT mentioned, you want them removed.

This behavior can be achieved with the diff procedure you mentioned above, but it cannot currently be done in a single mutation. I will add the update semantics for set to our feature list, and we will add this in an upcoming release. Thanks for such a detailed explanation.

amaster507 · December 17, 2020, 11:03pm

Is this still on the road map?
