Bulk Loader with cluster - 'Attribute xyz is not indexed' on followers

omerenesbayram · August 11, 2021, 9:39pm

Report a Dgraph Bug

What version of Dgraph are you using?

Dgraph Version v21.03.0

$ dgraph version
 
Dgraph version   : v21.03.0
Dgraph codename  : rocket
Dgraph SHA-256   : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1     : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.

Have you tried reproducing the issue with the latest release?

Yes, the issue is with the latest release.

What is the hardware spec (RAM, OS)?

3 x AWS EC2 Instance m5.2xlarge 8 vCPU, 32GB RAM, Linux

Steps to reproduce the issue (command/config used to run Dgraph).

Start three Dgraph Zero instances on three different EC2 instances, one as the leader and the other two as followers. This gives a high-availability cluster with one shard on three machines.
Copy the 1 million movie dataset to the machine with the Zero leader, under the directory ./bulk-loader-data:

curl -L -o 1million.rdf.gz "https://github.com/dgraph-io/tutorial/blob/master/resources/1million.rdf.gz?raw=true"

Copy the schema from the Dgraph Tour tutorial "A bigger dataset" to the folder ./bulk-loader-data.
Run the bulk loader for the 1 million movie dataset:

dgraph bulk -f ./bulk-loader-data/1million.rdf.gz -s ./bulk-loader-data/movie_schema.schema --reduce_shards=1 --zero=localhost:5080 --http=localhost:10080

Once the bulk loader is finished, copy the ./out/0/p folder to the data folder of Dgraph.
Start the Alpha on the machine with the Zero leader, and wait for it to capture the snapshot.
Start the Alpha on the other two machines, and wait for them to sync with the leader. (I tried waiting for up to 6 hours; the result is always the same.)
Query the leader and the followers using their private IPs and the following curl command (the query is from the Dgraph tutorial):

curl --location --request POST 'http://{PRIVATE_IP_OF_INSTANCE}:8080/query' \
--header 'Content-Type: application/graphql+-' \
--data-raw '{
  caro(func: allofterms(name@en, "Marc Caro")) {
    name@en
    director.film {
      name@en
    }
  }
  jeunet(func: allofterms(name@en, "Jean-Pierre Jeunet")) {
    name@en
    director.film {
      name@en
    }
  }
}'
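
To compare the nodes side by side, the same request can be sent to each Alpha from a small script. This is only a sketch of the curl command above, using the Python standard library; the function names and the host list are placeholders I made up, not part of Dgraph.

```python
import json
import urllib.request

# The DQL query from the curl command above, unchanged.
QUERY = """{
  caro(func: allofterms(name@en, "Marc Caro")) {
    name@en
    director.film {
      name@en
    }
  }
  jeunet(func: allofterms(name@en, "Jean-Pierre Jeunet")) {
    name@en
    director.film {
      name@en
    }
  }
}"""


def classify(body: str) -> str:
    """Return the first error message if the response has errors, else "ok" or "empty"."""
    doc = json.loads(body)
    errors = doc.get("errors") or []
    if errors:
        return errors[0].get("message", "unknown error")
    return "ok" if doc.get("data") else "empty"


def query_alpha(host: str, port: int = 8080, timeout: float = 10.0) -> str:
    """POST the query to one Alpha's /query endpoint and return the raw body."""
    req = urllib.request.Request(
        f"http://{host}:{port}/query",
        data=QUERY.encode("utf-8"),
        headers={"Content-Type": "application/graphql+-"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_cluster(hosts):
    """Query every host and return {host: classification or failure text}."""
    results = {}
    for host in hosts:
        try:
            results[host] = classify(query_alpha(host))
        except OSError as exc:  # connection refused, timeout, HTTP error
            results[host] = f"request failed: {exc}"
    return results
```

Calling `check_cluster([...])` with the three private IPs (hypothetical values such as "10.0.0.1") returns one entry per instance, so a node that answers with an error stands out immediately.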

Expected behaviour and actual result.

All three instances should respond correctly to the query, but only the leader responds correctly. The followers return a not indexed error.
Here is the response from the leader to the curl command above:

{"data":{"caro":[{"name@en":"Marc Caro","director.film":[{"name@en":"Delicatessen"},{"name@en":"The City of Lost Children"}]}],"jeunet":[{"name@en":"Jean-Pierre Jeunet","director.film":[{"name@en":"Delicatessen"},{"name@en":"A Very Long Engagement"},{"name@en":"Micmacs"},{"name@en":"The Young and Prodigious Spivet"},{"name@en":"Amélie"},{"name@en":"The City of Lost Children"},{"name@en":"Things I Like, Things I Don't Like"},{"name@en":"Alien: Resurrection"}]}]},"extensions":{"server_latency":{"parsing_ns":80314,"processing_ns":9201308,"encoding_ns":51712,"assign_timestamp_ns":423486,"total_ns":9812511},"txn":{"start_ts":8},"metrics":{"num_uids":{"_total":14,"director.film":2,"name":12}}}}

Here is the response we get from the two followers:

{"errors":[{"message":": Attribute name is not indexed with type term","extensions":{"code":"ErrorInvalidRequest"}}],"data":null}

We basically followed the procedure described in your bulk loader docs. However, it seems the followers do not sync the indices. We can see that the followers have the p folder as well, but somehow the indices are not synced across the cluster.
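
One way to narrow this down might be to ask each Alpha what it believes the schema of `name` is, using a DQL schema query sent to the same /query endpoint. The sketch below only parses the answer; it assumes the response lists predicates under `data.schema` with `predicate`, `index` and `tokenizer` fields, and the helper name is made up for illustration.

```python
import json

# DQL schema query for one predicate. Send it to each Alpha's /query endpoint
# in place of the data query, with the same Content-Type header as above.
SCHEMA_QUERY = "schema(pred: [name]) { type index tokenizer }"


def has_tokenizer(body: str, predicate: str, tokenizer: str) -> bool:
    """True if the schema response says `predicate` is indexed with `tokenizer`.

    Assumes the response shape {"data": {"schema": [{"predicate": ...,
    "index": ..., "tokenizer": [...]}, ...]}}.
    """
    doc = json.loads(body)
    entries = (doc.get("data") or {}).get("schema") or []
    for entry in entries:
        if entry.get("predicate") == predicate:
            tokenizers = entry.get("tokenizer") or []
            return bool(entry.get("index")) and tokenizer in tokenizers
    return False
```

If `has_tokenizer(body, "name", "term")` is true on the leader and false on a follower, the schema itself differs between the nodes, which would match the error the followers return.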

I can give more details on how we start and stop the Zeros and Alphas, if needed, but I think you can test this as well, since the data is yours. We want to be able to use the bulk loader, but it seems that after the bulk loading procedure the followers do not get the indices and therefore cannot be used to respond to queries. Please let us know what is wrong here.

dmai · August 12, 2021, 1:53am

This is odd. Can you try the latest v21.03.1? The initial report is from v21.03.0.

The bulk loader steps sound right to me.

omerenesbayram · September 16, 2021, 1:58pm

Hi Daniel, I have finally retried this with the latest release, v21.03.2, and got exactly the same result. Here is the output of dgraph version:

Dgraph version   : v21.03.2
Dgraph codename  : rocket-2
Dgraph SHA-256   : 00a53ef6d874e376d5a53740341be9b822ef1721a4980e6e2fcb60986b3abfbf
Commit SHA-1     : b17395d33
Commit timestamp : 2021-08-26 01:11:38 -0700
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.

If you need zero/alpha logs, or any other information, I would be happy to share. We still couldn’t manage to use Bulk Loader in a cluster setting.

ahsan · September 19, 2021, 7:59pm

@omerenesbayram, Thanks for reporting this. We’ll try to repro this issue on our end and will let you know.
