Best way to update a predicate on all nodes in dgraph

Hi,
We’ve ingested a fair amount of data (500 GB) and have now realized we should have ensured that all the predicates on our nodes were lower-cased. Is there a “best practice” for performing what would amount to an update-all query?

Thanks,
Ryan

Hi @MichelDiz,
I may not have phrased my question the best… I’m interested in lower-casing the value of a specific predicate type. Is the post you linked for renaming the key name of a predicate?

Thanks,
Ryan

No, there’s no lower-casing function for values, or anything like it.

To lower-case all the values, is there a way to pass a script/function into Dgraph to apply to the nodes? (We run Elasticsearch too… you can pass “painless” scripts in there.)

Alternatively, we’re thinking if there’s a way to query every node, we could then lower-case them in some custom scripting and perform an update to the node.

Is there a way to query each node in the database one by one, i.e. iterate through all of them?

Thanks,
Ryan

You can use an upsert approach with any language to do this. It can be a Python or JS script. The approach would be:

  1. Do a query and grab the UIDs and the values.
  2. Iterate over the UIDs and values and fix each value (here, lower-case it).
  3. Send a new mutation.
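
Steps 2 and 3 above could look roughly like the sketch below. It is only an illustration: the predicate name `name`, the helper `build_lowercase_mutation`, and the sample rows are all made up, and it assumes the predicate is a single-valued string (not a list) so that setting a new value replaces the old one. The input is the kind of list a query returning `uid` plus the predicate gives back as JSON.

```python
def build_lowercase_mutation(rows, predicate):
    """Return JSON-style set objects for rows whose value needs lower-casing.

    `rows` is a list of dicts such as {"uid": "0x1", "name": "Alice"}.
    `predicate` is the (hypothetical) predicate to fix.
    """
    updates = []
    for row in rows:
        value = row.get(predicate)
        # Skip rows that have no string value for this predicate.
        if not isinstance(value, str):
            continue
        lowered = value.lower()
        # Only touch nodes whose value would actually change.
        if lowered != value:
            updates.append({"uid": row["uid"], predicate: lowered})
    return updates


# Made-up sample data standing in for a query result.
rows = [
    {"uid": "0x1", "name": "Alice"},
    {"uid": "0x2", "name": "bob"},
    {"uid": "0x3"},
]
updates = build_lowercase_mutation(rows, "name")
print(updates)
```

The resulting list can then be sent as a JSON set mutation with whatever client you use. Skipping values that are already lower-case keeps the mutation small.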

There’s no way to pass scripts to run in the cluster itself.

Something to evaluate (PRs, RFCs and feature requests are welcome). But Elasticsearch has 13 years of development behind it; Dgraph has 7. We are on the way.

Yes, but I don’t get the question. An upsert mutation does that, but not the way you want. If you need those functions, check whether there is already a feature request. If not, open one.

Cheers.

This is the general process we figured we’d be left with. The challenge we have is getting an initial set of all the nodes we need to update without a query timeout occurring. We need to issue a “select all”… We’re trying to think of other ways to access every node in the graph without taking that approach, like using the Dgraph UID hex value. We thought it was a “one-up” type value… 0x1, 0x2, 0x3, etc.

For 1 to 10 million:
   1. Query 0x1
   2. Fix case issue
   3. Issue mutate
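
A rough sketch of that loop, done in batches rather than one UID per request so each query stays small. The helper names, the batch size, and the predicate `name` are hypothetical; the query string uses the `uid(...)` function with a `has(...)` filter so that only nodes actually carrying the predicate come back.

```python
def uid_batches(first_uid, last_uid, batch_size):
    """Yield lists of hex UID strings covering first_uid..last_uid inclusive."""
    for start in range(first_uid, last_uid + 1, batch_size):
        end = min(start + batch_size - 1, last_uid)
        yield [hex(u) for u in range(start, end + 1)]


def build_uid_query(uids, predicate):
    """Build a query for one batch of UIDs, keeping only nodes with the predicate."""
    uid_list = ", ".join(uids)
    return "{ q(func: uid(%s)) @filter(has(%s)) { uid %s } }" % (
        uid_list,
        predicate,
        predicate,
    )


# Example: UIDs 0x1..0x5 in batches of 2, for a hypothetical predicate "name".
for batch in uid_batches(1, 5, 2):
    print(build_uid_query(batch, "name"))
```

Each query result would then go through the case fix and a mutation, as in steps 2 and 3.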

Is the Dgraph UID constructed in such a way we could iterate through all UIDs?

Thanks,
Ryan

Yeah, you can do that. Instead of hard-coding 10 million right away, check the maximum UID that has been leased.
That matters because an upsert on a non-existing UID may return an error or create new ghost nodes.
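
As a minimal sketch of that guard: the snippet below takes the leased maximum as a plain parameter (how you read it from your cluster depends on your Dgraph version, so that part is left out), caps the iteration range with it, and drops any row that did not come back with the predicate, so that no mutation is sent for a UID that does not really exist. All names and sample values here are hypothetical.

```python
def safe_uid_range(requested_last_uid, max_leased_uid):
    """Return the UID range to scan, never going past the leased maximum."""
    return range(1, min(requested_last_uid, max_leased_uid) + 1)


def existing_rows(rows, predicate):
    """Keep only rows that actually returned a string value for the predicate."""
    return [row for row in rows if isinstance(row.get(predicate), str)]


# Example: we planned to scan 10 million UIDs, but only 3 have been leased.
uids = [hex(u) for u in safe_uid_range(10_000_000, 3)]
print(uids)

# Made-up query result: 0x2 came back without the predicate, so it is skipped.
rows = [{"uid": "0x1", "name": "Alice"}, {"uid": "0x2"}, {"uid": "0x3", "name": "Carol"}]
print(existing_rows(rows, "name"))
```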