Bulk loader fails to create GraphQL schema properly

Report a Dgraph Bug

What version of Dgraph are you using?

20.07.1

Have you tried reproducing the issue with the latest release?

Already using the latest version.

What is the hardware spec (RAM, OS)?

Running on Docker.

Steps to reproduce the issue (command/config used to run Dgraph).

Trying to bulk load a JSON file with this command:

dgraph bulk -f users.json -s schema.dql -g schema.graphql --map_shards=9 --reduce_shards=3 --http localhost:8000 --zero=dgraph-1.lan:5080

schema.dql is an empty file, and schema.graphql looks like this:

type User {
    id: String! @id
    username: String @search(by: [hash, regexp])
    followees: [User]
}

When I move the output p folders to the Alpha servers and start them, the resulting schema is this:


As you can see, there is no User. prefix, and the predicates are not GraphQL-compatible.

Expected behaviour and actual result.

I expected to see User.id, etc.


@pawan any ideas?

What data does your users.json file contain? The predicates in the file have to carry the type prefix, like User.id, User.followees, User.username and so on, for the data to be written correctly. If they don't have the prefix, Dgraph stores the predicates exactly as they are named in the file.
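A minimal Python sketch of that renaming, assuming every object in users.json is a User (including nested objects such as those under followees) and that the fields use bare GraphQL names (id, username, followees). The names prefix_keys and TYPE_NAME are hypothetical, not part of any Dgraph tool.

```python
# Hypothetical helper: rename bare GraphQL field names (id, username, followees)
# to the Type.field predicate names used by Dgraph's GraphQL layer.
# Assumption: every object in the file, including nested ones, is a User.
TYPE_NAME = "User"


def prefix_keys(node, type_name=TYPE_NAME):
    """Return a copy of node with bare keys rewritten as f"{type_name}.{key}"."""
    if isinstance(node, list):
        return [prefix_keys(item, type_name) for item in node]
    if not isinstance(node, dict):
        return node  # scalars (strings, numbers, ...) are copied unchanged
    renamed = {}
    for key, value in node.items():
        # Keys that already contain a dot (User.id, dgraph.type) and the
        # special "uid" key are kept as they are.
        if "." in key or key == "uid":
            new_key = key
        else:
            new_key = f"{type_name}.{key}"
        renamed[new_key] = prefix_keys(value, type_name)
    return renamed
```

To rewrite a file, load it with json.load, pass the result through prefix_keys, and write it back with json.dump before running dgraph bulk.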

Passing the GraphQL schema with -g schema.graphql during bulk loading only stores the GraphQL schema inside the Dgraph instance, so that when you bring up the instance you have an API that you can hit.

@pawan I fixed the users.json file as you mentioned, and now the fields have the correct names.
But it seems that the schema itself is not registered inside the Dgraph instance, as there are no indexes on the predicates:


Also, querying by the User type returns nothing:

{
  getUser(func: eq(dgraph.type, User)) {
    count(uid)
  }
}

I think this should be considered a bug.

Even after applying the schema via the admin API, the User type filter query returns nothing!

Do you see User in Ratel’s Types tab?

Did your data add the dgraph.type edge for nodes of User type? Can you share a small reproducible example here so that we can help you faster?

Here are simplified versions of the schema and data files: schema_simple.graphql (106 Bytes) and users_simple.json (233 Bytes),
which can be used with this command to reproduce the bug:

dgraph bulk -f users_simple.json -s schema_empty.dql -g schema_simple.graphql --http localhost:8000 --zero=localhost:5080

Thanks @mbande for providing the data files. I was able to reproduce the issue.
The dgraph.type attribute is not set in users_simple.json, and that is what causes this behaviour. You need to set dgraph.type explicitly so that DQL and GraphQL queries know the type of a node.
I added the dgraph.type attribute to the two top-level entities as follows:

[
{"User.id": "1", "dgraph.type": "User", "User.username": "a", "followees": [{"User.username": "b", "User.id": "2"}, {"User.username": "c", "User.id": "3"}]},
{"User.id": "2", "dgraph.type": "User", "User.username": "b", "followees": [{"User.username": "c", "User.id": "3"}]}
]
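In the example above only the two top-level objects get dgraph.type; the nested objects (for instance the one with User.id 3) stay untyped. A minimal Python sketch that tags every object carrying a User.id at any depth, assuming User.id is present on every User node. The function add_type is hypothetical.

```python
# Hypothetical helper: set dgraph.type on every object that has a User.id,
# at any nesting depth, so nested nodes are typed as well.
def add_type(node, type_name="User", id_key="User.id"):
    """Return a copy of node where each object carrying id_key gets dgraph.type."""
    if isinstance(node, list):
        return [add_type(item, type_name, id_key) for item in node]
    if not isinstance(node, dict):
        return node  # scalars are copied unchanged
    typed = {key: add_type(value, type_name, id_key) for key, value in node.items()}
    if id_key in typed:
        typed["dgraph.type"] = type_name
    return typed
```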

After adding dgraph.type, the query

{
  getUser(func: eq(dgraph.type, User)) {
    count(uid)
  }
}

returns

"data": {
    "getUser": [
      {
        "count": 2
      }
    ]
  }

The GraphQL query

query{
  queryUser{
    id
    username
  }
}

returns

"data": {
    "queryUser": [
      {
        "id": "1",
        "username": "a"
      },
      {
        "id": "2",
        "username": "b"
      }
    ]
  }

@rajas I fixed the data file as you mentioned, and this query works fine:

{
 getUser(func: eq(dgraph.type, User)) {
   count(uid)
 }
}

But something strange happened; the schema type is not registered:


and the required indexes are not created:

Also, because there is no schema, and hence no uniqueness constraint at bulk load time, there are many duplicate nodes in the database.
Sample file to reproduce: users_simple.json (348 Bytes)
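The duplicates presumably come from the same User.id appearing in several objects (top-level and nested under followees), each of which becomes a separate node. One way to collapse them at load time, sketched here as an assumption rather than a confirmed fix, is to give each object a blank-node uid derived from its User.id. This relies on the bulk loader mapping identical blank-node labels to a single node across the whole load; verify that on a small sample. It does not add the @id uniqueness check that GraphQL mutations enforce. The function assign_blank_nodes and the label prefix user_ are hypothetical.

```python
# Hypothetical helper: give every object with a User.id a blank-node uid
# derived from that id, e.g. {"User.id": "3"} -> "uid": "_:user_3".
# Assumption: the bulk loader maps identical blank-node labels to one node
# across the whole load, so repeated ids collapse into a single node.
# Assumption: ids contain only characters that are safe in a blank-node label.
def assign_blank_nodes(node, id_key="User.id", label_prefix="user_"):
    """Return a copy of node where each object carrying id_key gets a uid label."""
    if isinstance(node, list):
        return [assign_blank_nodes(item, id_key, label_prefix) for item in node]
    if not isinstance(node, dict):
        return node  # scalars are copied unchanged
    labeled = {k: assign_blank_nodes(v, id_key, label_prefix) for k, v in node.items()}
    # Keep an explicit uid if the object already has one.
    if id_key in labeled and "uid" not in labeled:
        labeled["uid"] = f"_:{label_prefix}{labeled[id_key]}"
    return labeled
```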

Hi @mbande,

Thanks for providing the updated JSON file. I was able to reproduce this issue.

The bulk loader is meant for loading large datasets into a new Dgraph cluster. The data is provided in JSON/RDF format using the -f flag. Providing the GraphQL schema with the -g flag generates the related GraphQL queries and mutations, which can then be used to query or mutate the data.

However, the provided GraphQL schema is not used to create the corresponding DQL types; this avoids conflicts with existing DQL types. To create them, apply the GraphQL schema through the /admin/schema endpoint after running the bulk loader, so that the correct DQL schema shows up in Ratel.
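A minimal Python sketch of that step, equivalent to curl -X POST localhost:8080/admin/schema --data-binary '@schema.graphql'. It assumes an Alpha HTTP endpoint at http://localhost:8080 (Dgraph's default HTTP port; adjust for the Docker setup). The helper names build_schema_request and apply_schema are hypothetical.

```python
import urllib.request

# Assumption: Alpha's HTTP endpoint is reachable here; 8080 is Dgraph's default
# HTTP port, so adjust host and port to match your deployment.
ALPHA_URL = "http://localhost:8080"


def build_schema_request(schema_text, alpha_url=ALPHA_URL):
    """Build a POST of the raw GraphQL schema to /admin/schema."""
    return urllib.request.Request(
        url=f"{alpha_url}/admin/schema",
        data=schema_text.encode("utf-8"),
        method="POST",
    )


def apply_schema(path, alpha_url=ALPHA_URL):
    """Read a .graphql file, send it to Alpha and return the response body."""
    with open(path, encoding="utf-8") as schema_file:
        request = build_schema_request(schema_file.read(), alpha_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For this thread, the call would be apply_schema("schema.graphql") once the Alphas are up with the bulk-loaded p directories.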

There are differences between GraphQL and DQL schemas. One of them is the @id directive in GraphQL, which does not directly translate to a DQL schema; this is what causes the missing uniqueness constraint. A possible workaround is to start an empty Dgraph cluster with the given GraphQL schema and then insert the data through the GraphQL endpoint, although this will be slower than the bulk loader.
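A minimal Python sketch of that workaround, sending users in batches to the /graphql endpoint. Assumptions: Alpha listens on http://localhost:8080, the generated mutation for type User is addUser taking [AddUserInput!]! (check the generated schema), and each input uses GraphQL field names such as {"id": "1", "username": "a"} rather than User.-prefixed predicates. The batch size of 1000 and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Alpha's HTTP endpoint; change host/port for your deployment.
ALPHA_URL = "http://localhost:8080"

# Assumption: addUser / AddUserInput are the names Dgraph generates for `type User`.
ADD_USERS = """
mutation AddUsers($input: [AddUserInput!]!) {
  addUser(input: $input) {
    user { id }
  }
}
"""


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_add_users_request(users, alpha_url=ALPHA_URL):
    """Wrap one batch of GraphQL inputs in a JSON POST to /graphql."""
    body = json.dumps({"query": ADD_USERS, "variables": {"input": users}})
    return urllib.request.Request(
        url=f"{alpha_url}/graphql",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_users(users, batch_size=1000, alpha_url=ALPHA_URL):
    """Send users batch by batch and stop at the first batch that reports errors."""
    for batch in batches(users, batch_size):
        with urllib.request.urlopen(build_add_users_request(batch, alpha_url)) as response:
            result = json.loads(response.read().decode("utf-8"))
        if result.get("errors"):
            raise RuntimeError(f"addUser failed: {result['errors']}")
```

Smaller batches make it easier to see which records were rejected; larger ones reduce round trips.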