FOREACH func in DQL (loops in Bulk Upsert)

Ref of a DB doing something similar to it:
https://neo4j.com/docs/cypher-manual/current/clauses/foreach/
https://docs.mongodb.com/manual/reference/method/cursor.forEach/

Before starting see an example in the next comment block.

Example with Bulk Upsert

upsert {
  query {
    source as var(func: has(linkto)) {
      v0 as uid
      LK as linkto
    }
    LINK as var(func: eq(name, val(LK)))
  }

 mutation @exec(foreach(in: source)) {
    set {
      uid(v0) <friend> uid(LINK) .
      uid(LINK) <friend> uid(v0) .
    }
    delete {
      uid(v0) <linkto> * .
    }
  }
}

What is this query doing? simple. Instead of the user having to execute the same query several times. This query creates a series of parallel query blocks (Sets and deletes). Thus avoiding the user to have to use hacks in the body of the query.

If he does it openly (without control) he will have an unwanted result. Where you may end up adding values that were not anticipated in the query construction.

In the example I will write in the next comment I will demonstrate the problem that happens if you send an open Bulk Upsert.

This is a sample that I’ve created to help a user. But I found it too limited. Due to the need to run 6 times manually.

1 Like

Simple schema sample

<friend>: [uid] .
<linkto>: string @index(hash) .
<name>: string @index(exact) .

Dataset sample

{
   "set": [
      {
         "name": "User 1",
         "linkto": "User 2"
      },
      {
         "name": "User 2",
         "linkto": "User 3"
      },
      {
         "name": "User 3",
         "linkto": "User 2"
      },
      {
         "name": "User 4",
         "linkto": "User 2"
      },
      {
         "name": "User 5",
         "linkto": "User 2"
      },
      {
         "name": "User 6",
         "linkto": "User 2"
      },
      {
         "name": "User 7",
         "linkto": "User 2"
      }
   ]
}

The upsert block to link them

You have to run this upsert one by one until the links are over.

How do I know the links are over? easy, if the upsert response has the field “vars”. There are links. If it has only the “uids” field. It’s over.

upsert {
  query {
    v0 as var(func: has(linkto), first:1) { # Never remove the "first" param.
    LK as linkto
    }
    LINK as var(func: eq(name, val(LK)))
  }

  mutation {
    set {
      uid(v0) <friend> uid(LINK) .
      uid(LINK) <friend> uid(v0) .
    }
    delete {
      uid(v0) <linkto> * .
    }
  }
}

After the link the Query

{
	q(func: has(name)) {
		name
        linkto #just to check if this value exists
		friend {
			name
            linkto
		}
	}
}

Reproducing the issue

Start from scratch and run this schema, and also the mutation sample.

Now run this upsert query

upsert {
  query {
    v0 as var(func: has(linkto)) { 
    LK as linkto
    }
    LINK as var(func: eq(name, val(LK)))
  }

  mutation {
    set {
      uid(v0) <friend> uid(LINK) .
      uid(LINK) <friend> uid(v0) .
    }
    delete {
      uid(v0) <linkto> * .
    }
  }
}

Query for it

{
	q(func: has(name)) {
		name
        linkto #just to check if this value exists
		friend {
			name
            linkto
		}
	}
}

The result (Bad)

This is an undesired result. The relations got repeated and so on. For example., The “user 3” doesn’t have 7 friends. He has only 1 which is “User 2”.

{
  "data": {
    "q": [
      {
        "name": "User 4",
        "friend": [
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      },
      {
        "name": "User 5",
        "friend": [
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      },
      {
        "name": "User 6",
        "friend": [
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      },
      {
        "name": "User 7",
        "friend": [
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      },
      {
        "name": "User 1",
        "friend": [
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      },
      {
        "name": "User 2",
        "friend": [
          {
            "name": "User 4"
          },
          {
            "name": "User 5"
          },
          {
            "name": "User 6"
          },
          {
            "name": "User 7"
          },
          {
            "name": "User 1"
          },
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      },
      {
        "name": "User 3",
        "friend": [
          {
            "name": "User 4"
          },
          {
            "name": "User 5"
          },
          {
            "name": "User 6"
          },
          {
            "name": "User 7"
          },
          {
            "name": "User 1"
          },
          {
            "name": "User 2"
          },
          {
            "name": "User 3"
          }
        ]
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 41964,
      "processing_ns": 8202216,
      "encoding_ns": 90855,
      "assign_timestamp_ns": 560460
    },
    "txn": {
      "start_ts": 70092
    },
    "metrics": {
      "num_uids": {
        "friend": 7,
        "linkto": 14,
        "name": 14
      }
    }
  }
}

Foreach In Query itself

{
	var (func: eq(name, "nodeA")) {
		ED as edge
	}

	products(func: foreach(ED)) @normalize {
		T as num
		to {
			i as num
			product: math(T * i)
		}
	}
}

Result

{
  "data": {
    "products": [
      {
        "product": 3
      },
      {
        "product": 4
      },
      {
        "product": 6
      },
      {
        "product": 8
      }
    ]
  }
}

Instead of

{
	products(func: eq(name, "nodeA")) @normalize {
		name
		edge {
			T as num
			to {
				i as num
				product: math(T * i)
			}
		}
	}
}
{
  "data": {
    "products": [
      {
        "product": 12
      },
      {
        "product": 9
      },
      {
        "product": 12
      },
      {
        "product": 9
      }
    ]
  }
}

Solving Queries (facets, aggregation, maths and so on) with foreach.

Solving this issue

As you can see the partial maths get broken due some internal behavior in Dgraph’s query. To not try to change the design internally. We can add foreach to solve this.

See this query

{
	Anne(func: has(rated), first: 1, offset: 0) {
		name
		rated @facets(r as rating)
		partial_sum: sum(val(r))
	}
	Brian(func: has(rated), first: 1, offset: 1) {
		name
		rated @facets(rb as rating)
		partial_sum: sum(val(rb))
	}
}

With this query, I can overcome the issue related in the report.

I gonna have this result

{
  "data": {
    "Anne": [
      {
        "name": "Anne",
        "rated": [
          {
            "rated|rating": 2
          },
          {
            "rated|rating": 5
          }
        ],
        "partial_sum": 7
      }
    ],
    "Brian": [
      {
        "name": "Brian",
        "rated": [
          {
            "rated|rating": 2
          }
        ],
        "partial_sum": 2
      }
    ]
  }
}

BUT if I use foreach, I can “simulate” this query above by doing

{
	A as var(func: has(rated))

	q(func: foreach(A)) {
		name
		rated @facets(r as rating)
		partial_sum: sum(val(r))
	}
}

This foreach query is basically creating two queries (cuz theres is two nodes “Anne” and “Brian”) internally and merging the results into one single result.

Continuing foreach in Queries

Inspired by Shortest path syntax https://docs.dgraph.io/query-language/#k-shortest-path-queries I thought of a new way with foreach in mind, for query in https://discuss.dgraph.io/t/improve-groupby/4630 example

In this case foreach would be a reserved word.

{
 var(func: has(group_name)) {
    member @groupby(first_name) {
    Group as count(uid)
    }
  }
# It gets the name of the "block" 
# from "first_name" predicate
  foreach(func: uid(Group)) {
    first_name
    last_name
    Total_with_this_name : val(Group)
  }

}

The result

{
  "data": {
    "q": [
      {
        "Michel": [ 
              {
                "Total_with_this_name": 6
              },
              {
                "uid": "0x1",
                "first_name": "Michel"
              },
              {
                "uid": "0x6",
                "first_name": "Michel"
              },
              {
                "uid": "0x8",
                "first_name": "Michel"
              },
              {
                "uid": "0x9",
                "first_name": "Michel"
              },
              {
                "uid": "0xf",
                "first_name": "Michel"
              },
              {
                "uid": "0xa",
                "first_name": "Michel"
              }
      ]
      },
      {
        "Daniel": [
              {
                "Total_with_this_name": 3
              }, 
              {
                "uid": "0x2",
                "first_name": "Daniel"
              },
              {
                "uid": "0xf22",
                "first_name": "Daniel"
              },
              {
                "uid": "0xf4",
                "first_name": "Daniel"
              }
      ]
      },
      {
        "Lucas": [ 
              {
                "Total_with_this_name": 3
              },
              {
                "uid": "0x3",
                "first_name": "Lucas"
              },
              {
                "uid": "0xf44",
                "first_name": "Lucas"
              },
              {
                "uid": "0xf77",
                "first_name": "Lucas"
              }
      ]
      }
    ]
  }

Another example of usage

{
  Block as var(func: type(A), first:1) {
    uid
    bkn as dgraph.type
    link @normalize {
      link2 {
        J as uid
      }
    }
  }
  _foreach(func: uid(Block), blockname: val(bkn) ){
     uid #show my uid from the "Block" var
    count: (func: uid(J)){ # This is just an example
    count(uid)
  })
  }
}

The ideal Result

{
   "data": {
      "A": [
         {
            "uid": "0x1",
            "count": 2
         }
      ],
      "B": [
         {
            "uid": "0x2",
            "count": 4
         }
      ],
      "C": [
         {
            "uid": "0xa",
            "count": 6
         }
      ]
   },
   "extensions": {
      "server_latency": {
         "parsing_ns": 68608,
         "processing_ns": 1716618,
         "encoding_ns": 14078,
         "assign_timestamp_ns": 343783,
         "total_ns": 2205808
      },
      "txn": {
         "start_ts": 61
      },
      "metrics": {
         "num_uids": {
            "": 0,
            "dgraph.type": 0,
            "link": 1,
            "link2": 5,
            "uid": 3
         }
      }
   }
}

@MichelDiz, are there any kind of loops available right now?

Since there is no forEach loop. Is there a plain for or while loop? I want to repeat the same upsert block for every item (15,189 items in my list though) It is not feasable to do it one at a time unless I could do it X times.

Nope. Still need discussion and be a popular request.

:frowning: I will have to script it out in another program then since I need it ASAP. Thanks for the reply.

Here is another use case:

It seems as though the GraphQL pagination applies before the @cascade directive. If I had a while loop in Dgraph we could resolve this: Cascade does not work with pagination by keeping pagination before @cascade directive and if it did not suffice the limit X (meaning there were less than X nodes returned) we could loop through to get the next pagination set or until there was no more data.