Add support for datas UNWIND

Moved from GitHub dgraph/5165

Posted by Sceat:

What you wanted to do

Dynamically use external datas inside queries by unwinding arrays

with unwind:

upsert {
	query($names: [string], $ages: [int]) {
		unwind $names as user_name
		unwind $ages as user_age
		user as var(func: eq(name, user_name))
	}
	mutation {
		set {
			uid(user) <age> val(user_age) .
		}
	}
}

This would map each value in the array names and ages to the current Dgraph map index so that the first found user receive user_name from names[0] and so on.
It would be even better to support $users: [user] to be able to shorthand to

unwind $users as an_user
user as var(func: eq(name, an_user.name))

without unwind: (current behavior)

upsert {
	query($name_1: string!, $name_2: string!) {
		user_1 as var(func: eq(name, $name_1))
		user_2 as var(func: eq(name, $name_2))
	}
	mutation($age_1: int!, $age_2: int!) {
		set {
			uid(user_1) <age> $age_1 .
			uid(user_2) <age> $age_2 .
		}
	}
}

What you actually did

Thinking about manually unwinding my datas by interpolating as many query lines as needed and propagating variables with dynamic aliases like in the without unwind exemple

Why that wasn’t great

That makes the query extra heavy and insanely verbose for big arrays of nodes

Any external references to support your case

UNWIND in Cypher

MichelDiz commented :

This is a kind of duplicated of #4615
But, not sure what would be the best approach.

However, I didn’t understand how it is related to UNWIND Neo4J’s. There seems to be something else entirely.

Also, Internally I had proposed a “custom block”.

{
  var() { 
  # This is a "custom var block" all values here doesn't show up in the query result.
   ST as string("Alice") # We could use it in any query help to avoid repetition.
   T1 as string($a) # This is getting a value from graphql variable.

   Users  as  uidAt(Var1, var2) # converts UID to hex number to be used in a query, also to work with uid_in function.
   Users2 as uidAt("0x1, 0x2") # Just holds UIDs to be used in a query.

   DP as math(4) # hold a number or do a math, as we do in our examples in docs
   T2 as math($b + 33) # This is getting a value from graphql variable and summing.

   DP2 as int(1) #just hold a number
   DP3 as float(1.01111) # just hold a float number

   DT as dateTime(2006-01-02T15:04:05) # just hold dateTime
     }
  }

The custom block could be used instead of directly inserting GraphQL Variable into the mutation block.

Just for internal ref Custom Block - Discussion

btw, I think the order in Dgraph syntax would be user_name as unwind $names

Check https://dgraph.io/docs/query-language/#value-variables

It is important to note that Upsert Block does not yet support GraphQL Variables (Also known as Query Variables). We need to support this before doing anything else related.

Sceat commented :

Hum Custom blocks seems pretty handy, great proposal ! it would also go nicely with unwinds.

I didn’t even noticed that upsert doesn’t support variables as in the javascript client it seems i can set variables on the query and the mutations

Regarding the unwind statement in cypher it allows to take anything iterable and use it in a graph sequence

UNWIND [1, 2, 3] AS num

with this we have the variable num that represent each value of the array but in sequence, so on the first node found the value will be 1 then on the second node it will be 2 etc…

If one wants to upsert an array of users based on their name, the unwind operation allow to write a compact query

upsert {
	// not sure where you put the bloc (your forum link is broken)
	// so let's say we put it there
	var() {
		user_data AS UNWIND [{name: 'Alice', age: 20}, {name: 'Bob', age: 25}]
	}
	query {
		user as var(func: eq(name, user_data.name))
	}
	mutation {
		set {
			uid(user) <age> val(user_data.age) .
		}
	}
}

// the above would set both Alice and Bob with their respective ages

without unwind you actually have to do the following

upsert {
	query($alice: string, $bob: string) {
		alice as var(func: eq(name, $alice))
		bob as var(func: eq(name, $bob))
	}
	mutation($alice_age: int, $bob_age: int) {
		set {
			uid(alice) <age> val($alice_age) .
			uid(bob) <age> val($bob_age) .
		}
	}
}
// it works but in case you have 100 users your generated query is a bit insane

MichelDiz commented :

The link is an internal discuss.

user_data AS UNWIND [{name: 'Alice', age: 20}, {name: 'Bob', age: 25}]

I don’t think that would work.

The parser result would look like this:

...

set {
   uid(user) <age> "2025" .
	}
...

Because the mutation block is not able to iterate over objects. We need to implement another solution (like “foreach user_data create a mutation block”) before that be possible.

There is still the fact of “dot notation”. This is not supported in GraphQL+-. We have to go in parts.

Sceat commented :

hum i don’t understand how it could result in uid(user) <age> "2025" . :thinking: as the following totally works and seems normal for a graph database
according you have 2 nodes as

{ name: "Alice" }
{ name: "Bob" }

executing

upsert {
  query {
    var(func: has(name)) {
      name as name
    }
	// in the first sequence we find Alice's name
	// in the second sequence we find Bob's name
  }
  mutation {
    set {
	  // so here we have 2 parallel sequence
	  // one for Alice and one for Bob
      uid(name) <surname> val(name) .
    }
  }
}

would give

{ name: "Alice", surname: "Alice" }
{ name: "Bob", surname: "Bob" }

I used Objects in my exemple because it would be pretty handy but we can simplify by using simple valid types like

user_name AS UNWIND ['Alice', 'Bob']

if the nodes already exist this unwind would give the exact same representation as

q(func: has(name)) { user_name as name }

exept we couldn’t call uid(user_name) with the unwind version as it’s just a string value

MichelDiz commented :

This is different, this is a bulk upsert. It is a set of multiple mutations aligned. That is, for each node found it will perform that mutation block as a “template”.

See, Dgraph iterates over its own Entities. But not in nested or even multiple values. And there is nothing that can dynamically generate multiple lines of RDF in the mutation block, or even create multiple mutation blocks based on the value/map (for example “user_data AS UNWIND”).

This type of interaction does not exist in Dgraph.

Take for example issue #4712 and #4779. In them I had to use an intermediate block so that I could pass a value from one node to the other. Without this block (me), the operation would not work. And if I do a Bulk upsert, the values will ​​get messed up. This problem with aggregations also occurs in Queries of Facets (see #4160). Where to get accurate values ​​I have to limit the query to 1 result at a time.

This problem happens with aggregations and also nested blocks.

The nested block is the key point to explain why it doesn’t work. In issue #4779 - without the intermediate block, I can’t get the values ​​"title, released, tagline" (see the example in 4779 issue). And if I don’t limit it to 1 to 1 operation, the values ​​get messed up, get mixed up and it will mess up with the DB.

That’s why I find your UNWIND proposal complicated. We need to solve other design problems first. These uses cases were never foreseen and Upsert Block is only a few months old.

As I said before, to start thinking about UNWIND. Some problems will need to be resolved:

1 - Support GraphQL Variables in Upsert Block.
2 - Have something similar to the custom block for hold/put the GraphQL Variables.
3 - We need to solve the issue of issue #4779 - allowing free use of variables and objects in the blocks regardless of level. Without having to use an intermediate block. This would be the case of the UNWIND map of objects.
3.1 - Maybe create a loop function like “for each”. If issue #4779 is not possible to resolve.