Fix queryWithVars to handle UTF-8 values correctly

Posted by srfrog:

Apparently queryWithVars is altering the content of vars with UTF-8 strings when passing them to Dgraph. The main culprit seems to be the toString() method.

Reference Query variables parsed differently to raw values · Issue #2803 · dgraph-io/dgraph · GitHub

paulftw commented :

@srfrog I took a look at this and the referenced dgaph bug.

toString() in txn.ts only forces arbitrary js values to be strings, JS is pretty ok for unicode I wouldn’t suspect foul play here.

toString is really funky in there and can probably be removed altogether - the line above it makes sure type is string-like.
Seems to be an overkill to ignore non-strings - something I wouldn’t enforce if I were designing an API.
If I had to come up with a reason to enforce this requirement and even write a test for it - my guess it’s related to how Dgraph used to parse JSON in headers.

Original bug mentions the HTTP header encoding/parsing and I think that makes a lot of sense, because vars object does get sent as a header here

tl;dr Dgaph parses unicode differently in headers, than it does in protobufs.
We need to either fix the parser or require client libraries to use some encoding + make sure it works for both nodejs and browsers

srfrog commented :

We can send unicode in HTTP headers but seems more trouble than it’s worth - RFC 5978

I think the safest path would be to detect unicode and switch to regular POST form, then all the vars can be encoded and handled correctly. We would need to send the body in there too. Then we can add a new header that instructs Dgraph to get values from form, e.g., X-Dgraph-FormVars: true

For the API string restriction, I agree also. But I think it can be simple to correct. Dgraph expects var values to be string, so you could try to toString() all and send the ones that convert. It’s just a matter refactoring this test a bit.

srfrog commented :

Looking at c.query() I take back my form idea. We could just send the whole query, vars and all, as a JSON object.

{
  "vars": {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3",
  },
  "body": "... the query ..."
}

We can check for Content-Type: application/json at the server and parse accordingly. I think that should have the least amount of changes. Easier to expand too. What do you think?

paulftw commented :

Yes! haven’t seen your comment until now but been thinking the same idea on weekend - we should stop relying on HTTP headers, esp on custom ones.
commitNow, vars (and everything else I’m not aware of) should be parts of the body.
I was going to write a proposal for that.
Not sure how easy it is to control Content-Type in browser and nodejs, if it is it’s a good idea, otherwise a query string param would do /mutate?format=json.

paulftw commented :

there’s also X-Dgraph-MutationType

paulftw commented :

This was an issue in the old API, in v1.1 it shouldn’t be a problem.