Can't Query Type Data on Bulk Loader


#1

I can’t seem to query by type for data inserted using the bulk loader.
I created a small dataset to test:

_:brand1 <dgraph.type> "Brand" .
_:brand1 <name> "brand1" .
_:brand2 <dgraph.type> "Brand" .
_:brand2 <name> "brand2" .

_:product1 <dgraph.type> "Product" .
_:product1 <brand> _:brand1 .
_:product1 <name> "name1" .
_:product1 <pid> "abc" .

_:product2 <dgraph.type> "Product" .
_:product2 <brand> _:brand2 .
_:product2 <name> "name2" .
_:product2 <pid> "123" .

_:product3 <dgraph.type> "Product" .
_:product3 <brand> _:brand2 .
_:product3 <name> "name3" .
_:product3 <pid> "ab1" .

with this schema:

type Product {
  name: string
  brand: uid
  pid: string
}

type Brand {
  name: string
}

name: string @index(term)  .
pid: string @index(hash)  .
brand: uid .

I ran it through the bulk loader with reduce shards = 2.
When querying through Ratel by type, for example:

{
  q(func: type(Product)) {
    name
    uid
  }
}

I expect to see all three products returned, but nothing is returned (similarly, nothing is returned for Brand).
However, when I attach a type to an existing node with a mutation:

<0x4e21> <dgraph.type> "Product" .

Now I can see this product when querying by type. Can you not insert dgraph.type values using the bulk loader?
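
To pin down what the type queries should return, here is a minimal Python sketch that counts the `dgraph.type` triples in the dataset above. It does not talk to Dgraph at all; the regex and the helper name `nodes_by_type` are illustrative only, and the input is the test data trimmed to its type triples plus one unrelated triple.

```python
import re
from collections import defaultdict

# The dgraph.type triples from the test dataset, plus one other triple
# to show that non-type lines are ignored.
NQUADS = """
_:brand1 <dgraph.type> "Brand" .
_:brand2 <dgraph.type> "Brand" .
_:product1 <dgraph.type> "Product" .
_:product2 <dgraph.type> "Product" .
_:product3 <dgraph.type> "Product" .
_:product1 <name> "name1" .
"""

# Matches lines of the form: subject <dgraph.type> "TypeName" .
TYPE_TRIPLE = re.compile(r'^(\S+)\s+<dgraph\.type>\s+"([^"]+)"\s*\.\s*$')


def nodes_by_type(nquads):
    """Group subjects (blank nodes here) by their dgraph.type value."""
    result = defaultdict(set)
    for line in nquads.splitlines():
        match = TYPE_TRIPLE.match(line.strip())
        if match:
            subject, type_name = match.groups()
            result[type_name].add(subject)
    return result


expected = nodes_by_type(NQUADS)
print({type_name: len(nodes) for type_name, nodes in expected.items()})
```

So a correct load should give three results for type(Product) and two for type(Brand).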


(Michel Conrado) #2

The issue may lie here. Can you share more details about your cluster config? Also stats, what OS/Docker version, and so on.


#3

I’ve tried a larger, similar dataset/schema with reduce-shards 3, 1 zero, and 3 alphas with no replication, on Ubuntu 16.04 machines without Docker (used the install scripts).

This dataset I ran locally with Docker: reduce-shards 2, 1 zero, 2 alphas, no replication, on Ubuntu 18.04.

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

I used the docker compose script and just copied and modified the server portion to make another alpha with offset 1, e.g.

dgraph alpha --my=server:7081 --lru_mb=2048 --zero=zero:5080 -p out/1/p/ -o=1

Both scenarios are using dgraph v1.1

In both scenarios I can inspect and query the predicates, schema, and types in Ratel fine, but I can’t filter by type or see any dgraph.type values until I modify the data live.
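
One way to check for `dgraph.type` values outside Ratel is to ask an alpha directly for every node that has the predicate. The sketch below is an assumption-heavy illustration, not a verified recipe: it assumes the alpha's HTTP endpoint is at `localhost:8080` (the default port, shifted by any `-o` offset) and that v1.1 accepts the `application/graphql+-` content type on `/query`. The names `build_request` and `run_query` are made up for this example.

```python
import json
import urllib.request

# Assumption: an alpha's HTTP endpoint is reachable here. Add the -o
# offset to the port for alphas started with one (e.g. 8081 for -o 1).
ALPHA_URL = "http://localhost:8080/query"

# Lists every node that carries a dgraph.type value, with the value.
TYPE_CHECK_QUERY = """
{
  q(func: has(dgraph.type)) {
    uid
    dgraph.type
  }
}
"""


def build_request(url, query):
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),
        # Assumption: content type expected by the v1.1 HTTP API.
        headers={"Content-Type": "application/graphql+-"},
        method="POST",
    )


def run_query(url=ALPHA_URL, query=TYPE_CHECK_QUERY):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, query)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_query()` against each alpha in turn would show whether any of them returns the bulk-loaded type values.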


(Michel Conrado) #4

If you have no replication, there’s no need to have more shards. Set it to 1 shard.


#5

If I want to split data over several nodes, shouldn’t I set reduce shards to a number greater than 1?


(Pawan Rawal) #6

Yes, you are right about this.

I tried reproducing the problem but couldn’t. I am guessing the p directory isn’t being picked up by your alpha nodes properly. I am sharing what I did. If you still encounter the problem, then please share the exact commands that you used to run the bulk loader and the alpha nodes. Also please share logs from all the alpha nodes.

  1. Start Zero
dgraph zero
  2. Load data using bulk loader
dgraph bulk --files data.rdf --schema data.schema --reduce_shards 2 --map_shards 4
  3. Start the alpha nodes with the appropriate p directory.
dgraph alpha -p out/0/p

dgraph alpha -p out/1/p -o 1 -w w1
  4. Run the queries that you mentioned. I got three products and two brands as expected.
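
The pairing of reduce shards to alphas in the commands above can be sketched as a small Python helper. It only prints command lines and runs nothing. The helper name `alpha_commands` is made up, and extending the `out/<i>/p`, `-o <i>`, `-w w<i>` pattern beyond the two alphas shown is an assumption that holds only when all alphas run on one machine.

```python
def alpha_commands(reduce_shards, out_dir="out"):
    """Return one `dgraph alpha` command line per reduce shard."""
    commands = []
    for i in range(reduce_shards):
        # Each reduce shard writes its own posting directory: out/<i>/p
        cmd = ["dgraph", "alpha", "-p", f"{out_dir}/{i}/p"]
        if i > 0:
            # Additional alphas on the same machine need a port offset
            # and their own write-ahead-log directory.
            cmd += ["-o", str(i), "-w", f"w{i}"]
        commands.append(" ".join(cmd))
    return commands


for line in alpha_commands(2):
    print(line)
```

For two shards this prints the same two alpha commands used above.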

(Pawan Rawal) #7

I was able to reproduce the bug with --reduce_shards as 3 and --map_shards as 4. @cactus222, could you please create an issue on GitHub for this, and we’ll get it sorted. From my initial investigation, it seems that although the data for dgraph.type is stored in group 3, the alpha node for group 1 starts serving it when it comes up. So the bug seems to be that alpha1 starts serving the predicate before getting tablet info from Zero.
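
To see which group Zero has assigned a predicate to, one option is to read Zero's state and list the tablets per group. The Python sketch below is a rough illustration under assumptions: that Zero serves its state as JSON at `http://localhost:6080/state`, and that the JSON has a `groups` object whose entries each hold a `tablets` object keyed by predicate. The sample data is hypothetical, not output from a real cluster.

```python
import json
import urllib.request

# Assumption: Zero's HTTP endpoint on its default port.
ZERO_STATE_URL = "http://localhost:6080/state"


def fetch_state(url=ZERO_STATE_URL):
    """Fetch and decode Zero's state (not called in this sketch)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def tablet_groups(state):
    """Map each predicate to the group id Zero lists it under."""
    mapping = {}
    for group_id, group in state.get("groups", {}).items():
        # A group with no tablets may carry None or omit the key.
        for predicate in (group.get("tablets") or {}):
            mapping[predicate] = group_id
    return mapping


# Hypothetical sample shaped like the assumed /state response.
sample_state = {
    "groups": {
        "1": {"tablets": {"name": {}, "pid": {}}},
        "2": {"tablets": {"brand": {}}},
        "3": {"tablets": {"dgraph.type": {}}},
    }
}

print(tablet_groups(sample_state).get("dgraph.type"))
```

If the group listed for dgraph.type differs from the group whose p directory actually holds the type data, that would match the behaviour described above.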


#8

Thank you. I’ve created an issue: https://github.com/dgraph-io/dgraph/issues/3968