Node not suddenly not found after updating it

I have a project that uses dgraph as the read model in a CQRS architecture. I have an integration test suite that updates my domain models to produce events. The events are then published to kafka. A handler then picks up those events and updates dgraph. I then have a function that is retried within the test using a backoff algorithm to query dgraph in a different transaction to make sure that the changes are actually made to dgraph.

I have a problem in a test suite that tests about 15 changes to the model. The problem is that my event handler would see the event and update dgraph. Within the event handler, I also added a debug query to query dgraph in a new transaction immediately after updating it to make sure that it’s updated. The problem is that in the test suite, if I query dgraph (using the same query as the debug one in the event handler), it would return no results (i.e. the node could not be found), despite retrying for multiple minutes and knowing that dgraph was indeed updated already. However, I am sure the node did exist because I just updated it and queried for it within the event handler! This appears to be some sort of race condition, because the problem does not always appear. I can rerun the test suite multiple times and sometimes it passes with no problems, while it fails during some other runs.

I’d like to debug dgraph further to see why this is happening. I’ve started alpha with -v=0, but it did not seem to produce anything useful. The log levels were also undocumented, so it’s unclear what the valid values for -v are.

Is there some way to log all the grpc requests received from and sent to clients in the dgraph alpha server to see exactly what’s going on?

You can look at the query tracing info for different Alphas with the OpenCensus zPages exposed from the Alpha’s HTTP port at /z/tracez: https://docs.dgraph.io/deploy/#tracing. The traces also configurable to be sent to Jaeger.

Transactions started after the write was committed will see the updates that have been committed since that commit timestamp.

Thanks, Daniel! I’ll checkout the traces to see what I can find there. The strange thing is that when I try to query dgraph for data in the retry function, I am creating a new transaction each time. Theoretically, this should always return some data, because I am creating a new transaction for each query over a minute and the mutation was definitely written to dgraph during that time.

That is strange. You can check that the start timestamps of the queries come after the commit timestamps of mutation transaction. You could also try running the queries from a different client such as the Ratel UI.

You can also use the dgraph debug tool on your p directory to verify that the committed data is stored on disk. https://docs.dgraph.io/howto/#using-the-debug-tool

Can you also share the following?

  • dgraph version
  • cluster configuration (dgraph zero and dgraph alpha commands used to run Dgraph)

I am using the dgraph/dgraph:latest (id c1e0391441f9) docker image.

I am running the cluster in docker with 3 dgraph containers using the following commands:

  • dgraph zero --my=dgraph_zero:5080
  • dgraph alpha --my=dgraph_alpha:7080 --lru_mb=2048 --zero=dgraph_zero:5080 -v=0
  • dgraph-ratel

I’ve tried disabling a bunch of other tests to isolate the test that tends to fail, but it seems to be hard to trigger in that case. I think the problem tends to appear when there are a lot of tests have been executed, with the database being dropped and the schema updated multiple times. The tests do not run in parallel and are run one after another. I am not sure how feasible it is to use Ratel or the dgraph debug tool to query the data as the problem is most reproducible when the code for dropAll and schema update is enabled for my tests, so the data is lost after each test suite is executed.

Another interesting thing is that when I query the node after it was updated in the test suite, the node can no longer be found, even though the mutation was to update data or connect a new node to the existing node.

Using some debugging statements, here’s the timeline of one of these failures.

The structure of the data looks like this: parent -> child. For this mutation, I am updating the child and querying the parent by its id and retrieving the child as part of the parent’s fields. For all queries a brand new ReadOnlyTxn() was started.

  • 2019-04-21 22:54:07.3160467 +0000 UTC m=+6.908355901: Event handler received event to update child with id a66e938d-f82c-413e-be93-1de7267d1b27 (this is not the uid, but we query for the uid and update it in a transaction). The child is then updated.

  • 2019-04-21 22:54:07.3461316 +0000 UTC m=+6.938440801 The child was updated in the event handler and we query dgraph for the parent using its id and the child as a field immediately in the event handler. The parent’s id is 1e342620-84ff-4460-b26d-f39f800d0b7e and from the JSON response, I can see that the parent node is found and the child fields contain the child node with the id a66e938d-f82c-413e-be93-1de7267d1b27.

  • 2019-04-21 22:54:07.7504847 +0000 UTC m=+7.342794101 In the test suite we run the same query as the one in the event handler to retrieve the parent node and the child as a field. The query queries for the parent node using its id. The number of parent nodes found is 0.

  • 2019-04-21 22:54:08.1868399 +0000 UTC m=+7.779161301 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:09.2730528 +0000 UTC m=+8.865375001 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:11.3053377 +0000 UTC m=+10.897660001 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:14.8672491 +0000 UTC m=+14.459569801 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:20.5096024 +0000 UTC m=+20.101913101 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:25.4128616 +0000 UTC m=+25.005178101 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:34.1848529 +0000 UTC m=+33.777187401 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:54:42.3191121 +0000 UTC m=+41.911423601 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:55:04.330284 +0000 UTC m=+63.922629401 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

  • 2019-04-21 22:55:19.2863382 +0000 UTC m=+78.878676601 Since nothing was found in the last query in the test suite, we try again. Again, the number of parent nodes found is 0.

The test is then failed because the parent node was not found and a timeout was reached.

Are the traces purged when dropAll is executed? I am finding the traces to be missing a lot of mutations and queries, even though --trace is unset and should default to 1.

zPages only keep the latest trace data and discard older events. If you’re looking at the trace data from the zPages, then that’d explain it. DropAll does not affect traces.

You can also run Jaeger and export the traces to Jaeger with --jaeger.collector http://<jaegerhostname>:14268. You would be able to query all the traces Jaeger collects. Jaeger’s in-memory store is simple to run temporarily. It stores all the traces in memory.

Are the event handler and the test suite are talking to the same Dgraph cluster? And DropAll is being run before both of them are running, meaning everything is run from a clean state?

Thanks, Daniel. I’ve added jaeger to the setup and can see all the traces in there to investigate.

The event handler and test suite use the same cluster as there is only 1 dgraph cluster in my test setup. I have a set of integration tests that tests against dgraph. Before each test is run, DropAll is used to clear the database and then the schema is updated. The problem is that the failures are inconsistent. Sometimes, they fail and sometimes they pass. If they fail, I can retrigger the tests multiple times to try and get them to pass.

After digging through the traces and dgraph alpha logs, I found that the dgraph client is dropping the database and updating the schema multiple times during a test run.

This is the log from alpha for 1 test run. There should only be 1 dropAll and 1 schema update at the beginning of the run, but I am seeing 9 dropAlls and schema updates (log is truncated due to post character limits):

I0422 01:13:21.898490       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:21.898526       1 server.go:602] Got Alter request from "172.25.0.10:49212"
I0422 01:13:21.906242       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:21.916701       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
I0422 01:13:21.916753       1 server.go:602] Got Alter request from "172.25.0.10:49212"
I0422 01:13:21.917789       1 server.go:325] Got schema: [predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"id" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"valuesData" value_type:STRING list:true ]
I0422 01:13:21.926959       1 index.go:686] Deleting index for attr emailAddress and tokenizers []
I0422 01:13:21.927012       1 index.go:704] Rebuilding index for attr emailAddress and tokenizers [hash]
I0422 01:13:21.927173       1 mutation.go:145] Done schema update predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:21.931496       1 index.go:686] Deleting index for attr hash and tokenizers []
I0422 01:13:21.931541       1 index.go:704] Rebuilding index for attr hash and tokenizers [term]
I0422 01:13:21.931722       1 mutation.go:145] Done schema update predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:21.934884       1 index.go:686] Deleting index for attr id and tokenizers []
I0422 01:13:21.934921       1 index.go:704] Rebuilding index for attr id and tokenizers [term]
I0422 01:13:21.935091       1 mutation.go:145] Done schema update predicate:"id" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:21.952165       1 index.go:686] Deleting index for attr definitionID and tokenizers []
I0422 01:13:21.952223       1 index.go:704] Rebuilding index for attr definitionID and tokenizers [hash]
I0422 01:13:21.952401       1 mutation.go:145] Done schema update predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:21.959945       1 index.go:686] Deleting index for attr fieldID and tokenizers []
I0422 01:13:21.959995       1 index.go:704] Rebuilding index for attr fieldID and tokenizers [hash]
I0422 01:13:21.960154       1 mutation.go:145] Done schema update predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:21.963956       1 index.go:686] Deleting index for attr name and tokenizers []
I0422 01:13:21.964011       1 index.go:704] Rebuilding index for attr name and tokenizers [hash]
I0422 01:13:21.964167       1 mutation.go:145] Done schema update predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:21.967621       1 index.go:686] Deleting index for attr organizationID and tokenizers []
I0422 01:13:21.967678       1 index.go:704] Rebuilding index for attr organizationID and tokenizers [hash]
I0422 01:13:21.967852       1 mutation.go:145] Done schema update predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:21.975928       1 index.go:686] Deleting index for attr selector and tokenizers []
I0422 01:13:21.975978       1 index.go:704] Rebuilding index for attr selector and tokenizers [hash]
I0422 01:13:21.976109       1 mutation.go:145] Done schema update predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:21.982775       1 mutation.go:145] Done schema update predicate:"valuesData" value_type:STRING list:true 
I0422 01:13:21.986163       1 server.go:329] ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n"  done
I0422 01:13:21.989202       1 server.go:275] Received ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t" 
I0422 01:13:21.989278       1 server.go:602] Got Alter request from "172.25.0.10:49212"
I0422 01:13:21.990067       1 server.go:325] Got schema: [predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" ]
I0422 01:13:22.001930       1 index.go:686] Deleting index for attr migrationVersion and tokenizers []
I0422 01:13:22.001981       1 index.go:704] Rebuilding index for attr migrationVersion and tokenizers [exact]
I0422 01:13:22.002147       1 mutation.go:145] Done schema update predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" 
I0422 01:13:22.007343       1 server.go:329] ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t"  done
I0422 01:13:22.096994       1 groups.go:394] Serving tablet for: objectDefinition
I0422 01:13:22.221588       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:22.221795       1 server.go:602] Got Alter request from "172.25.0.10:49220"
I0422 01:13:22.234311       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:22.442873       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
I0422 01:13:22.443123       1 server.go:602] Got Alter request from "172.25.0.10:49220"
I0422 01:13:22.443931       1 server.go:325] Got schema: [predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"id" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"valuesData" value_type:STRING list:true ]
I0422 01:13:22.452727       1 index.go:686] Deleting index for attr emailAddress and tokenizers []
I0422 01:13:22.452776       1 index.go:704] Rebuilding index for attr emailAddress and tokenizers [hash]
I0422 01:13:22.453009       1 mutation.go:145] Done schema update predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:22.455647       1 index.go:686] Deleting index for attr hash and tokenizers []
I0422 01:13:22.455691       1 index.go:704] Rebuilding index for attr hash and tokenizers [term]
I0422 01:13:22.455839       1 mutation.go:145] Done schema update predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:22.458794       1 index.go:686] Deleting index for attr id and tokenizers []
I0422 01:13:22.458826       1 index.go:704] Rebuilding index for attr id and tokenizers [term]
I0422 01:13:22.459079       1 mutation.go:145] Done schema update predicate:"id" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:22.463439       1 index.go:686] Deleting index for attr definitionID and tokenizers []
I0422 01:13:22.463487       1 index.go:704] Rebuilding index for attr definitionID and tokenizers [hash]
I0422 01:13:22.463582       1 mutation.go:145] Done schema update predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:22.471689       1 index.go:686] Deleting index for attr fieldID and tokenizers []
I0422 01:13:22.471729       1 index.go:704] Rebuilding index for attr fieldID and tokenizers [hash]
I0422 01:13:22.471894       1 mutation.go:145] Done schema update predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:22.477790       1 index.go:686] Deleting index for attr name and tokenizers []
I0422 01:13:22.477836       1 index.go:704] Rebuilding index for attr name and tokenizers [hash]
I0422 01:13:22.478040       1 mutation.go:145] Done schema update predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:22.480904       1 index.go:686] Deleting index for attr organizationID and tokenizers []
I0422 01:13:22.480939       1 index.go:704] Rebuilding index for attr organizationID and tokenizers [hash]
I0422 01:13:22.481036       1 mutation.go:145] Done schema update predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:22.483931       1 index.go:686] Deleting index for attr selector and tokenizers []
I0422 01:13:22.483948       1 index.go:704] Rebuilding index for attr selector and tokenizers [hash]
I0422 01:13:22.484079       1 mutation.go:145] Done schema update predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:22.487199       1 mutation.go:145] Done schema update predicate:"valuesData" value_type:STRING list:true 
I0422 01:13:22.490042       1 server.go:329] ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n"  done
I0422 01:13:22.492244       1 server.go:275] Received ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t" 
I0422 01:13:22.492289       1 server.go:602] Got Alter request from "172.25.0.10:49220"
I0422 01:13:22.492884       1 server.go:325] Got schema: [predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" ]
I0422 01:13:22.497542       1 index.go:686] Deleting index for attr migrationVersion and tokenizers []
I0422 01:13:22.497590       1 index.go:704] Rebuilding index for attr migrationVersion and tokenizers [exact]
I0422 01:13:22.497685       1 mutation.go:145] Done schema update predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" 
I0422 01:13:22.500821       1 server.go:329] ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t"  done
I0422 01:13:23.261043       1 groups.go:394] Serving tablet for: checkpoint
I0422 01:13:23.264367       1 groups.go:394] Serving tablet for: checkpoint
I0422 01:13:23.269468       1 groups.go:394] Serving tablet for: apiID
I0422 01:13:23.275688       1 groups.go:394] Serving tablet for: singularName
I0422 01:13:23.281900       1 groups.go:394] Serving tablet for: pluralName
I0422 01:13:23.294293       1 groups.go:394] Serving tablet for: system
I0422 01:13:23.299077       1 groups.go:394] Serving tablet for: systemName
I0422 01:13:23.433857       1 groups.go:394] Serving tablet for: eventID
I0422 01:13:23.438570       1 groups.go:394] Serving tablet for: eventID
I0422 01:13:24.409104       1 groups.go:394] Serving tablet for: fields
I0422 01:13:24.415964       1 groups.go:394] Serving tablet for: field
I0422 01:13:24.420839       1 groups.go:394] Serving tablet for: data
I0422 01:13:25.430387       1 groups.go:394] Serving tablet for: fieldDependencies
I0422 01:13:25.430464       1 groups.go:394] Serving tablet for: layouts
I0422 01:13:26.518444       1 groups.go:394] Serving tablet for: layout
I0422 01:13:33.037723       1 groups.go:394] Serving tablet for: parentFieldID
I0422 01:13:33.042676       1 groups.go:394] Serving tablet for: childFieldID
I0422 01:13:33.045873       1 groups.go:394] Serving tablet for: fieldDependency
I0422 01:13:33.425091       1 groups.go:394] Serving tablet for: type
I0422 01:13:33.436744       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:33.436806       1 server.go:602] Got Alter request from "172.25.0.10:49232"
I0422 01:13:33.463435       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:34.372779       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
I0422 01:13:34.372950       1 server.go:602] Got Alter request from "172.25.0.10:49232"
I0422 01:13:34.376156       1 server.go:325] Got schema: [predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"id" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"valuesData" value_type:STRING list:true ]
I0422 01:13:34.393613       1 index.go:686] Deleting index for attr emailAddress and tokenizers []
I0422 01:13:34.393773       1 index.go:704] Rebuilding index for attr emailAddress and tokenizers [hash]
I0422 01:13:34.394921       1 mutation.go:145] Done schema update predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:34.399752       1 index.go:686] Deleting index for attr hash and tokenizers []
I0422 01:13:34.399807       1 index.go:704] Rebuilding index for attr hash and tokenizers [term]
I0422 01:13:34.400643       1 mutation.go:145] Done schema update predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:34.405710       1 index.go:686] Deleting index for attr id and tokenizers []
I0422 01:13:34.405818       1 index.go:704] Rebuilding index for attr id and tokenizers [term]
I0422 01:13:34.408058       1 mutation.go:145] Done schema update predicate:"id" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:34.413110       1 index.go:686] Deleting index for attr definitionID and tokenizers []
I0422 01:13:34.413198       1 index.go:704] Rebuilding index for attr definitionID and tokenizers [hash]
I0422 01:13:34.413614       1 mutation.go:145] Done schema update predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:34.418503       1 index.go:686] Deleting index for attr fieldID and tokenizers []
I0422 01:13:34.418591       1 index.go:704] Rebuilding index for attr fieldID and tokenizers [hash]
I0422 01:13:34.419231       1 mutation.go:145] Done schema update predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:34.424033       1 index.go:686] Deleting index for attr name and tokenizers []
I0422 01:13:34.424122       1 index.go:704] Rebuilding index for attr name and tokenizers [hash]
I0422 01:13:34.424935       1 mutation.go:145] Done schema update predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:34.430034       1 index.go:686] Deleting index for attr organizationID and tokenizers []
I0422 01:13:34.430121       1 index.go:704] Rebuilding index for attr organizationID and tokenizers [hash]
I0422 01:13:34.430828       1 mutation.go:145] Done schema update predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:34.436398       1 index.go:686] Deleting index for attr selector and tokenizers []
I0422 01:13:34.436487       1 index.go:704] Rebuilding index for attr selector and tokenizers [hash]
I0422 01:13:34.437094       1 mutation.go:145] Done schema update predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:34.443273       1 mutation.go:145] Done schema update predicate:"valuesData" value_type:STRING list:true 
I0422 01:13:34.447914       1 server.go:329] ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n"  done
I0422 01:13:34.452341       1 server.go:275] Received ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t" 
I0422 01:13:34.452444       1 server.go:602] Got Alter request from "172.25.0.10:49232"
I0422 01:13:34.454255       1 server.go:325] Got schema: [predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" ]
I0422 01:13:34.459392       1 index.go:686] Deleting index for attr migrationVersion and tokenizers []
I0422 01:13:34.459427       1 index.go:704] Rebuilding index for attr migrationVersion and tokenizers [exact]
I0422 01:13:34.459597       1 mutation.go:145] Done schema update predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" 
I0422 01:13:34.463300       1 server.go:329] ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t"  done
I0422 01:13:41.420693       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:41.420729       1 server.go:602] Got Alter request from "172.25.0.10:49240"
I0422 01:13:41.444203       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:41.450793       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
I0422 01:13:41.450916       1 server.go:602] Got Alter request from "172.25.0.10:49240"
I0422 01:13:41.453139       1 server.go:325] Got schema: [predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"id" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"valuesData" value_type:STRING list:true ]
I0422 01:13:41.466117       1 index.go:686] Deleting index for attr emailAddress and tokenizers []
I0422 01:13:41.466214       1 index.go:704] Rebuilding index for attr emailAddress and tokenizers [hash]
I0422 01:13:41.466708       1 mutation.go:145] Done schema update predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:41.472484       1 index.go:686] Deleting index for attr hash and tokenizers []
I0422 01:13:41.472573       1 index.go:704] Rebuilding index for attr hash and tokenizers [term]
I0422 01:13:41.473413       1 mutation.go:145] Done schema update predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:41.477809       1 index.go:686] Deleting index for attr id and tokenizers []
I0422 01:13:41.477867       1 index.go:704] Rebuilding index for attr id and tokenizers [term]
I0422 01:13:41.479009       1 mutation.go:145] Done schema update predicate:"id" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:13:41.482805       1 index.go:686] Deleting index for attr definitionID and tokenizers []
I0422 01:13:41.482859       1 index.go:704] Rebuilding index for attr definitionID and tokenizers [hash]
I0422 01:13:41.483146       1 mutation.go:145] Done schema update predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:41.486635       1 index.go:686] Deleting index for attr fieldID and tokenizers []
I0422 01:13:41.486720       1 index.go:704] Rebuilding index for attr fieldID and tokenizers [hash]
I0422 01:13:41.487126       1 mutation.go:145] Done schema update predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:41.490487       1 index.go:686] Deleting index for attr name and tokenizers []
I0422 01:13:41.490542       1 index.go:704] Rebuilding index for attr name and tokenizers [hash]
I0422 01:13:41.491212       1 mutation.go:145] Done schema update predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:41.494552       1 index.go:686] Deleting index for attr organizationID and tokenizers []
I0422 01:13:41.494583       1 index.go:704] Rebuilding index for attr organizationID and tokenizers [hash]
I0422 01:13:41.495008       1 mutation.go:145] Done schema update predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:41.498674       1 index.go:686] Deleting index for attr selector and tokenizers []
I0422 01:13:41.498744       1 index.go:704] Rebuilding index for attr selector and tokenizers [hash]
I0422 01:13:41.499142       1 mutation.go:145] Done schema update predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:13:41.502667       1 mutation.go:145] Done schema update predicate:"valuesData" value_type:STRING list:true 
I0422 01:13:41.505958       1 server.go:329] ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n"  done
I0422 01:13:41.508828       1 server.go:275] Received ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t" 
I0422 01:13:41.508889       1 server.go:602] Got Alter request from "172.25.0.10:49240"
I0422 01:13:41.510527       1 server.go:325] Got schema: [predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" ]
I0422 01:13:41.514937       1 index.go:686] Deleting index for attr migrationVersion and tokenizers []
I0422 01:13:41.515036       1 index.go:704] Rebuilding index for attr migrationVersion and tokenizers [exact]
I0422 01:13:41.515429       1 mutation.go:145] Done schema update predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" 
I0422 01:13:41.521668       1 server.go:329] ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t"  done
I0422 01:13:43.706699       1 groups.go:394] Serving tablet for: user
I0422 01:13:44.583759       1 groups.go:394] Serving tablet for: created
I0422 01:13:44.587455       1 groups.go:394] Serving tablet for: active
I0422 01:13:45.567379       1 groups.go:394] Serving tablet for: passwordResetTokens
I0422 01:13:46.406765       1 groups.go:394] Serving tablet for: values
I0422 01:13:46.418452       1 groups.go:394] Serving tablet for: references
I0422 01:13:47.451908       1 groups.go:394] Serving tablet for: value
I0422 01:14:26.868146       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:14:26.868863       1 server.go:602] Got Alter request from "172.25.0.10:49248"
I0422 01:14:26.899732       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:14:26.904887       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
I0422 01:14:26.904938       1 server.go:602] Got Alter request from "172.25.0.10:49248"
I0422 01:14:26.905726       1 server.go:325] Got schema: [predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"id" value_type:STRING directive:INDEX tokenizer:"term"  predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash"  predicate:"valuesData" value_type:STRING list:true ]
I0422 01:14:26.911496       1 index.go:686] Deleting index for attr emailAddress and tokenizers []
I0422 01:14:26.911534       1 index.go:704] Rebuilding index for attr emailAddress and tokenizers [hash]
I0422 01:14:26.911831       1 mutation.go:145] Done schema update predicate:"emailAddress" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:14:26.915230       1 index.go:686] Deleting index for attr hash and tokenizers []
I0422 01:14:26.915264       1 index.go:704] Rebuilding index for attr hash and tokenizers [term]
I0422 01:14:26.915829       1 mutation.go:145] Done schema update predicate:"hash" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:14:26.919030       1 index.go:686] Deleting index for attr id and tokenizers []
I0422 01:14:26.919065       1 index.go:704] Rebuilding index for attr id and tokenizers [term]
I0422 01:14:26.920195       1 mutation.go:145] Done schema update predicate:"id" value_type:STRING directive:INDEX tokenizer:"term" 
I0422 01:14:26.923688       1 index.go:686] Deleting index for attr definitionID and tokenizers []
I0422 01:14:26.923722       1 index.go:704] Rebuilding index for attr definitionID and tokenizers [hash]
I0422 01:14:26.923990       1 mutation.go:145] Done schema update predicate:"definitionID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:14:26.926944       1 index.go:686] Deleting index for attr fieldID and tokenizers []
I0422 01:14:26.926979       1 index.go:704] Rebuilding index for attr fieldID and tokenizers [hash]
I0422 01:14:26.927410       1 mutation.go:145] Done schema update predicate:"fieldID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:14:26.930256       1 index.go:686] Deleting index for attr name and tokenizers []
I0422 01:14:26.930299       1 index.go:704] Rebuilding index for attr name and tokenizers [hash]
I0422 01:14:26.930791       1 mutation.go:145] Done schema update predicate:"name" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:14:26.933866       1 index.go:686] Deleting index for attr organizationID and tokenizers []
I0422 01:14:26.933913       1 index.go:704] Rebuilding index for attr organizationID and tokenizers [hash]
I0422 01:14:26.934395       1 mutation.go:145] Done schema update predicate:"organizationID" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:14:26.937110       1 index.go:686] Deleting index for attr selector and tokenizers []
I0422 01:14:26.937148       1 index.go:704] Rebuilding index for attr selector and tokenizers [hash]
I0422 01:14:26.937517       1 mutation.go:145] Done schema update predicate:"selector" value_type:STRING directive:INDEX tokenizer:"hash" 
I0422 01:14:26.940378       1 mutation.go:145] Done schema update predicate:"valuesData" value_type:STRING list:true 
I0422 01:14:26.943146       1 server.go:329] ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n"  done
I0422 01:14:26.945416       1 server.go:275] Received ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t" 
I0422 01:14:26.945458       1 server.go:602] Got Alter request from "172.25.0.10:49248"
I0422 01:14:26.946113       1 server.go:325] Got schema: [predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" ]
I0422 01:14:26.954327       1 index.go:686] Deleting index for attr migrationVersion and tokenizers []
I0422 01:14:26.954366       1 index.go:704] Rebuilding index for attr migrationVersion and tokenizers [exact]
I0422 01:14:26.954655       1 mutation.go:145] Done schema update predicate:"migrationVersion" value_type:STRING directive:INDEX tokenizer:"exact" 
I0422 01:14:26.960170       1 server.go:329] ALTER op: schema:"\n\t\t\tmigrationVersion: string @index(exact) .\n\t\t"  done
...

The ip address 172.25.0.10 is the address of the Go container that is running the tests. I am also sure the tests are not running in parallel. I also added some debugging statements to the code that drops dgraph and updates the schema and they were only called once, so I am baffled as to why the server logs and tracing show the client dropping and updating the schema multiple times during the duration of the test.

The code for initializing the environment before a running a test looks like this:

func getDgraphClient(t *testing.T) *dgo.Dgraph {

	dgraphAddr := os.Getenv("DGRAPH_ADDR")

	d, err := grpc.Dial(dgraphAddr, grpc.WithInsecure())

	if err != nil {
		t.Fatalf("Error opening connection to Drgraph server: %s", err)
	}

	return dgo.NewDgraphClient(api.NewDgraphClient(d))
}


func initTestEnvironment(t *testing.T) clients {

	dg := getDgraphClient(t)

	fmt.Println("Dropping dgraph", time.Now())

	err := dg.Alter(context.Background(), &api.Operation{
		DropAll: true,
	})

	if err != nil {
		t.Fatalf("Unable to reset dgraph: %s", err)
	}

	fmt.Println("Initializing schema", time.Now())

	err = schema.Init(dg) // This updates the schema

	if err != nil {
		t.Fatalf("Error initializing schema: %s", err)
	}

	projector := NewProjector(dg, bc)

	es := getEventStoreClient(bc, projector, t)

	return clients{
		dgraph:     dg,
		eventStore: es,
		projector:  projector,
	}
}

func TestSomething(t *testing.T) {
	clis := initTestEnvironment(t)
        ...
}

I wrote a grpc client interceptor and attached it to the client to log requests. From the logs, I can see that DropAll and the schema update were only executed by the client once per test, which contradicts with the logs and trace from the server, which states that they were executed multiple times per test.

Do you have any suggestions regarding what to try next?

$ grep -A3 'Received ALTER op: drop_all' alpha.log
I0422 01:13:21.898490       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:21.898526       1 server.go:602] Got Alter request from "172.25.0.10:49212"
I0422 01:13:21.906242       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:21.916701       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
--
I0422 01:13:22.221588       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:22.221795       1 server.go:602] Got Alter request from "172.25.0.10:49220"
I0422 01:13:22.234311       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:22.442873       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
--
I0422 01:13:33.436744       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:33.436806       1 server.go:602] Got Alter request from "172.25.0.10:49232"
I0422 01:13:33.463435       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:34.372779       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
--
I0422 01:13:41.420693       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:13:41.420729       1 server.go:602] Got Alter request from "172.25.0.10:49240"
I0422 01:13:41.444203       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:13:41.450793       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 
--
I0422 01:14:26.868146       1 server.go:275] Received ALTER op: drop_all:true 
I0422 01:14:26.868863       1 server.go:602] Got Alter request from "172.25.0.10:49248"
I0422 01:14:26.899732       1 server.go:303] ALTER op: drop_all:true  done
I0422 01:14:26.904887       1 server.go:275] Received ALTER op: schema:"\n\t\temailAddress: string @index(hash) .\n\t\thash: string @index(term) .\n\t\tid: string @index(term) .\n\t\tdefinitionID: string @index(hash) .\n\t\tfieldID: string @index(hash) .\n\t\tname: string @index(hash) .\n\t\torganizationID: string @index(hash) .\n\t\tselector: string @index(hash) .\n\t\tvaluesData: [string] .\n" 

The logs here show that Drop All was run multiple times for this Alpha, and the source IPs change for each request, which indicates that it’s a different client connection that sent the Alter request. It looks like different test runs or test cases are running drop all.

I have enabled http2 frame debugging to log all the frames being sent.

Here’s a snapshot of the logs from dgraph alpha:

Here’s a snippet of the relevant logged frames from the test run:

From the logged frames, I only see 1 drop_all and schema migration being logged, whereas alpha is reporting multiple drop_alls and schema migrations.

Edit: Just did another test and seeing the same problem. Dgraph reports multiple requests to drop_all and update the schema, but the calling code did not record any http2 frames for the alleged calls.

I’m not sure what else to suggest. It seems like the tests are running Alter multiple times. Is initTestEnvironment running multiple times in the tests?

If you can reproduce this behavior outside of your test suite, that may shed more light on what’s happening.

I’ve added print statements to initTestEnvironment and can confirm they are called only once per test, with tests being run in a serial manner. The log of the http2 frames show this, so I am not sure how to investigate further.

I did try to use tcpdump yesterday to capture the packets being sent, but the pcap files produced were corrupt. After fixing them, wireshark did not show anything interesting.

I tried using tcpdump again, but this time using a different docker image. I can see the multiple ALTER mutations being sent to dgraph and they correlate with the logs being seen in Alpha.

The problem is that the frames being logged on the client do not correlate with what I see. I am wondering if this is a bug in docker’s networking layer that’s duplicating packets.

Finally found the source of the problem. I ended up attaching some metadata to the grpc call to DropAll to identify where each call was coming from. I then used tcpdump to capture the network packets from and to the dgraph alpha server and inspected them in wireshark. I was seeing integration tests from different suites running at the same time.

It turns out that even if we don’t use t.Parallel() in our tests, tests from different packages will execute in parallel according to the GOMAXPROCS environment variable.

The solution is to use go test -p 1 ./.... The -p 1 flag disables all parallelism.

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.