Response has invalid UTF-8

Posted by manishrjain:

Posted via: Response has invalid UTF-8 · Issue #1789 · dgraph-io/dgraph · GitHub

@rajeshkmindix

Hi,

I get the following error when trying to insert some nodes into dgraph.

com.google.protobuf.InvalidProtocolBufferException: Protocol message had invalid UTF-8.

I narrowed it down to

ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 9080).usePlaintext(true).build();
DgraphGrpc.DgraphBlockingStub blockingStub = DgraphGrpc.newBlockingStub(channel);
DgraphClient dgraphClient = new DgraphClient(Collections.singletonList(blockingStub));

for(int i = 0; i < 1000; i++) {
  JsonObject json = new JsonObject();
  json.addProperty("id", i);
  json.addProperty("name", "abcdefgh" + i);

  System.out.println("for " + i);
  Mutation mu =
          Mutation.newBuilder()
                  .setCommitImmediately(true)
                  .setSetJson(ByteString.copyFromUtf8(json.toString()))
                  .build();
  dgraphClient.newTransaction().mutate(mu);
}

After inserting around 100 to 150 nodes, I get the above error. If I rerun the code, I get the same error on the very first record, and every subsequent mutation request fails the same way.

I installed using
curl https://get.dgraph.io -sSf | bash

Ran it using
dgraph zero --port_offset -2000
dgraph server --memory_mb 2048 --zero localhost:5080

Need help with this.

Thanks in advance,
Rajesh

deepakjois commented :

This report looks similar to #25. Investigating now.

deepakjois commented :

So this looks like a bug in the server.

The protocol buffers definition for TxnContext is like this:

message TxnContext {
  uint64 start_ts = 1;
  uint64 commit_ts = 2;
  bool aborted = 3;
  repeated string keys = 4;
  LinRead lin_read = 13;
}

Note the repeated string keys = 4 field. According to the proto language spec, a string field must contain valid UTF-8, so every entry in keys must be a valid UTF-8 string. But the server currently sends arbitrary bytes in that field without checking whether they are valid UTF-8. The Java client (autogenerated code) tries to decode those bytes as UTF-8 and throws the error when the decoding fails.
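To illustrate the failure mode, here is a minimal sketch using only the Java standard library. It is not the generated protobuf code; the class name Utf8Check and the sample bytes are made up for illustration. It shows that a strict UTF-8 decoder accepts ordinary text but rejects an arbitrary byte sequence, which is the kind of check that fails on the client side.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8Check {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "abcdefgh1".getBytes(StandardCharsets.UTF_8);
        // Made-up stand-in for a raw binary key; 0xff never appears in valid UTF-8.
        byte[] rawKey = {(byte) 0x00, (byte) 0xff, (byte) 0xfe, (byte) 0x80};

        System.out.println(isValidUtf8(text));   // true
        System.out.println(isValidUtf8(rawKey)); // false
    }
}
```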

This problem does not show up in the Go client, because it just constructs a string using a byte slice returned from the server. It does not bother decoding it.

The fix is to base64 encode the strings before sending them, so that they are guaranteed to be valid UTF-8. This will require a fix to the server. Once it is fixed you will need to run the latest Dgraph server. The client code should not need to change.
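As a rough sketch of why base64 would help (standard library only; the class name KeyEncoding and the sample bytes are assumptions for illustration, not server code): base64 output uses only ASCII characters, so the encoded key is always a valid UTF-8 string no matter what the raw bytes were, and it can be decoded back to the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, for illustration only.
class KeyEncoding {
    // Encodes arbitrary bytes into an ASCII-only (and therefore valid UTF-8) string.
    static String encodeKey(byte[] rawKey) {
        return Base64.getEncoder().encodeToString(rawKey);
    }

    // Recovers the original bytes from the encoded form.
    static byte[] decodeKey(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Made-up stand-in for a raw binary key that is not valid UTF-8.
        byte[] rawKey = {(byte) 0x00, (byte) 0xff, (byte) 0xfe, (byte) 0x80};

        String encoded = encodeKey(rawKey);
        System.out.println(encoded);
        // The encoded form is pure ASCII, so this check passes.
        System.out.println(StandardCharsets.US_ASCII.newEncoder().canEncode(encoded));
        // The round trip gives back the original bytes.
        System.out.println(Arrays.equals(rawKey, decodeKey(encoded)));
    }
}
```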

I will keep this issue updated. (cc: @pawanrawal)

deepakjois commented :

This is now fixed in commit f27fed9 of dgraph-io/dgraph ("Fingerprint and base36 encode Keys in TxnContext").
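For readers wondering what "fingerprint and base36 encode" can look like, here is a hedged sketch in Java. It is not the server's code: the thread does not say which fingerprint function the server uses, so 64-bit FNV-1a is swapped in as a stand-in, and the class name KeyFingerprint is hypothetical. The point is only that hashing a key to a 64-bit number and printing it in base 36 yields a short string made of 0-9 and a-z, which is always valid UTF-8.

```java
// Hypothetical helper, for illustration only.
class KeyFingerprint {
    // 64-bit FNV-1a hash, used here as a stand-in fingerprint function (assumption).
    static long fingerprint(byte[] data) {
        long h = 0xcbf29ce484222325L;      // FNV-1a 64-bit offset basis
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;           // FNV-1a 64-bit prime
        }
        return h;
    }

    // Renders the fingerprint as an unsigned base36 number (digits 0-9, a-z).
    static String encodeKey(byte[] rawKey) {
        return Long.toUnsignedString(fingerprint(rawKey), 36);
    }

    public static void main(String[] args) {
        // Made-up stand-in for a raw binary key that is not valid UTF-8.
        byte[] rawKey = {(byte) 0x00, (byte) 0xff, (byte) 0xfe, (byte) 0x80};
        System.out.println(encodeKey(rawKey));
    }
}
```

Unlike a reversible encoding, a fingerprint is one-way: the original key bytes cannot be recovered from it, and the encoded value has a bounded length (at most 13 characters for a 64-bit value in base 36).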

We will be doing a new v0.9.2 server release today which will contain this fix. You can try to run your code against that. I believe it will work.

Meanwhile I have added a test for this case here: 4562c788f7868070f239883a5191997980fcec42

deepakjois commented :

Dgraph v0.9.2 is released, which should work with the client at v0.9.2.

Closing now, feel free to reopen if this is not fixed or there is a related problem.