"func: type() "error in 20.11.03 or later

Thank you Anthony!

We have tried versions 20.11.3 through 21.03.2. All of them show the type() function problem, but we haven't tried the latest version, 21.12.0, yet.

We tried using version 20.11.0 to rewrite our data and found another problem.

(Hope @MichelDiz can help us out! )

For example, we have 5 million nodes of type person.
Each person node has a person_name predicate, and only person nodes have this predicate.

So we have the following two ways to find the total number of person type nodes.

{
    res(func: type(person)){
        count(uid)
    }
}
{
    res(func: has(person_name)){
        count(uid)
    }
}

But the results of these two queries are not the same.

The count using type() is about 1,600 lower than the count using has().

Next, we used the following query to check whether there is a problem with our data:

{
    res(func: has(person_name)) @filter(not type(<person>)) {
        count(uid)
        dgraph.type
    }
}

This returns the missing 1,600 nodes,

but these nodes actually do have a dgraph.type value of "person".

Based on this, we concluded that there must be something wrong with the type() query.

All of the above experiments were run on version 20.11.0.

Also, is there any difference between using eq(dgraph.type, "person") and type()?

In newer versions, when the type() function error occurs, neither method works anymore.

This is odd, cuz as you can see in this comment,

the type function is just an alias for eq(dgraph.type, "person").

So, if there is a bug, it is in the eq function. Or maybe the type implementation is a bit more complex.

@purist180, test it with eq(dgraph.type, "person")

The other thing would be to make sure that you have upgraded in a clean way. I mean, export the data and reimport it into the new version. It would also be good to have an idea of your cluster config.

I think this is the main PR for type(): "Add type function to allow querying types." by martinmr, Pull Request #2933, dgraph-io/dgraph (github.com)

I tried this again on my local server and found another issue.

Here is my environment and setup:
RAM: 256GB
CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
OS: Ubuntu 18.04
Dgraph version: 20.03.7
Docker image: dgraph/standalone:v20.03.7

We define our type system in Chinese:

type <人> {
    <姓名>
}

The type <人> means person in Chinese.

The predicate <姓名> means person_name in Chinese, and only nodes of type <人> contain this predicate.

Using type(<人>), count(uid) → 5,409,548

Using eq(<dgraph.type>, "人"), count(uid) → the same result, 5,409,548

——————————————————————————————————

But when we use the has() function, the node count seems weird:

Using has(<姓名>), count(uid) → 5,409,837

5,409,837 - 5,409,548 = 289

There are 289 nodes that cannot be queried with the type() function:

{
    res(func: has(<姓名>)) @filter(not type(<人>)) {
        count(uid)
        uid
        <dgraph.type>
    }
}

or

{
    res(func: has(<姓名>)) @filter(not eq(<dgraph.type>,"人")) {
        count(uid)
        uid
        <dgraph.type>
    }
}

These queries return exactly 289 nodes,

but in the result, these nodes do contain the dgraph.type value "人" (person).

This means that 289 nodes have the person dgraph.type, yet they cannot be queried using type(person)?

Does this mean something is wrong with the dgraph.type predicate?

Is there a way to fix this? @MichelDiz

Can you expand the predicates of those 289 nodes, to check whether they have any data?

As you mention now that it is in Chinese, this could also be a bug with predicate i18n.

Dunno, this is new to me. We need a way to reproduce it to make a valid bug report. But try what I have mentioned: do a clean migration and also share the details of your cluster setup.

Thanks,
@MichelDiz
I added the environment information to the previous problem description.

The data in the 289 nodes seems to be correct.

The problem is that they can't be queried by type().

It is only these 289 nodes; the other 5,409,548 are fine for now.

What does "do a clean migration" mean?

And is there a way we can force Dgraph to rebuild the index for the "dgraph.type" predicate?
@amaster507

To me this proves it is not a problem with type(), as they get the same results.

I think you just have some orphan nodes with incorrect typing in your database.

J

The problem is that the 289 nodes do have the "person" dgraph.type value, but they cannot be queried by type.

To be clear, this means you cannot query them by using type nor dgraph.type functions, which we established are the same thing.


To me, this means you have orphan nodes without a type, or they would show up. The other option is that the query does not work on certain characters. Do these 289 nodes have any characters in common?

I only see 2 possibilities:

  • there is something wrong with the DQL query (which I doubt, unless it is a UTF-8 issue)
  • the data was never added correctly (or the types were deleted later), hence they are orphan nodes for one of two reasons:
    1. due to bad user input / deleting parents etc
    2. due to a problem with the DQL set function (again I doubt it, but the Chinese characters could be a problem)

The most likely answer to me is that you have orphan nodes

  • you deleted a parent without deleting the child
  • you forgot to add 'dgraph.type' to the original set commands
  • you never completely deleted old data containing '姓名'… this happens a lot when the nodes are not complete

Just my two cents, but I could be way off,

J

Thank you J!

I ran the experiment above in a clean DB, and I simply wrote RDFs into it without any delete operation.

I have checked all 289 nodes. They are not orphan nodes as you suggest: they all have the "姓名" (person_name) predicate and a "dgraph.type" with the value "人" (person).

You can see this in the JSON result of the query.

There is a filter, not eq(<dgraph.type>, "person"),

but the resulting nodes do contain the "person" dgraph.type.

So I think Dgraph fails to match the type predicate of these nodes?

It looks like dgraph.type is equal to an object { "人" }, not a string "人".

Maybe I am missing something, but I think that is the problem…

J

Hey, is it possible that those 289 nodes have the Person type encoded differently? I mean, they may all look the same after UTF decoding, but maybe the binary representation is different, so the type really is different. I would compare the good and the bad records in a hex editor.
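
As a minimal sketch of that byte-level comparison, assuming the type strings have been copied out of the query results into a script: the helper names `describe` and `same_bytes` and the zero-width-space example are hypothetical, chosen only to illustrate how two strings can look identical and still differ in their encoding.

```python
import unicodedata


def describe(value):
    """Return the UTF-8 bytes (as hex) and the code points of a string."""
    return {
        "utf8_hex": value.encode("utf-8").hex(" "),
        "code_points": ["U+%04X" % ord(ch) for ch in value],
        # False when the string is not in NFC normal form.
        "is_nfc": unicodedata.normalize("NFC", value) == value,
    }


def same_bytes(a, b):
    """True only when both strings have identical UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


good = "person"
# Hypothetical bad value: a trailing zero-width space that is invisible on screen.
suspect = "person\u200b"

print(describe(good)["utf8_hex"])
print(describe(suspect)["utf8_hex"])
print(same_bytes(good, suspect))
```

If the good and the bad records produce the same hex output, an encoding difference in the type value can probably be ruled out.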

Hello Michel, Anthony, Jonathan and miko
@MichelDiz @amaster507 @jdgamble555 @miko

Since you thought it might be a problem with Chinese strings, I repeated this experiment without using Chinese data

and still got the same error.

The type system in our Dgraph schema looks like this:

type <id> {
	id_number
	~person_with_id
}
type <person> {
	name
	person_with_id
	gender
}

<gender>: [string] @index(hash) .
<id_number>: [string] @index(hash) .
<name>: [string] @index(hash) .
<person_with_id>: [uid] @count @reverse .

As in the previous experiment, we checked the data to ensure that each person has a name,

and the count returned by type(person) is still lower than the count returned by has(name).

# query by type(person)
{
  res(func: type(<person>)) {
    count(uid)
  }
}

# result
{
  "data": {
    "res": [
      {
        "count": 20327334
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 60658,
      "processing_ns": 9196903337,
      "encoding_ns": 4210614182,
      "assign_timestamp_ns": 436762,
      "total_ns": 13408087207
    },
    "txn": {
      "start_ts": 40741
    },
    "metrics": {
      "num_uids": {
        "_total": 0,
        "dgraph.type": 0
      }
    }
  }
}

and query by has(name)

{
  res(func: has(<name>)) {
    count(uid)
  }
}

# result
{
  "data": {
    "res": [
      {
        "count": 20327727
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 49874,
      "processing_ns": 37059444391,
      "encoding_ns": 4188967194,
      "assign_timestamp_ns": 346657,
      "total_ns": 41251798784
    },
    "txn": {
      "start_ts": 40736
    },
    "metrics": {
      "num_uids": {
        "_total": 0,
        "name": 0
      }
    }
  }
}

But when we check these missing nodes, they still have the person dgraph.type value.

{
  res(func: has(<name>)) @filter(not type(<person>)) {
    count(uid)
    uid
    dgraph.type
  }
}
{
  "data": {
    "res": [
      {
        "count": 393
      },
      {
        "uid": "0xfda573587ff07",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x109cb6fcbf1403",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x10d811ed0800c7",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x10e4bfe904f30e",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x10f5004dd1d557",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x11804fb41c2203",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x11da6826e6e22b",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x12b1e49b8586b1",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x12d27b2fb47ec4",
        "dgraph.type": [
          "person"
        ]
      },
      "....."
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 81467,
      "processing_ns": 46168463968,
      "encoding_ns": 609478,
      "assign_timestamp_ns": 542048,
      "total_ns": 46169760026
    },
    "txn": {
      "start_ts": 40705
    },
    "metrics": {
      "num_uids": {
        "": 393,
        "_total": 20328906,
        "dgraph.type": 20328120,
        "name": 0,
        "uid": 393
      }
    }
  }
}
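
A result like the one above can be checked mechanically rather than by eye. The following is a rough sketch, assuming the JSON response has been saved as text; the function name `contradicting_uids` is hypothetical, and the sample embeds only two of the UIDs shown above. It collects every returned node that carries the type the filter was supposed to exclude.

```python
import json


def contradicting_uids(response_text, type_name):
    """List UIDs in a `not type(...)` result that still carry that type."""
    data = json.loads(response_text)
    uids = []
    for entry in data["data"]["res"]:
        # Skip the count entry and any non-object placeholder such as ".....".
        if not isinstance(entry, dict) or "uid" not in entry:
            continue
        if type_name in entry.get("dgraph.type", []):
            uids.append(entry["uid"])
    return uids


# Shortened sample in the same shape as the response above.
sample = """
{
  "data": {
    "res": [
      {"count": 393},
      {"uid": "0xfda573587ff07", "dgraph.type": ["person"]},
      {"uid": "0x109cb6fcbf1403", "dgraph.type": ["person"]},
      "....."
    ]
  }
}
"""

print(contradicting_uids(sample, "person"))
```

On a healthy dataset this list should be empty, since no node that passes `not type(<person>)` should also report "person" in dgraph.type.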

Can you do this:

Query using type(person) filter and first:1 and return a uid.

Then with that one do:

{
  n(func: uid("0xfda573587ff07","<uid#2>")) {
    uid
    dgraph.type
  }
}

The first 2 normal person nodes:

{
  res(func: type(<person>),first:2) {
    uid
    dgraph.type
    count(uid)
  }
}

{
    "res": [
      {
        "count": 2
      },
      {
        "uid": "0x69b2ee7f",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x1d9026e53a",
        "dgraph.type": [
          "person"
        ]
      }
    ]
  }

Two of the nodes that have the problem:

# query 
{
  res(func: uid(0xfda573587ff07,0x109cb6fcbf1403)) {
    uid
    dgraph.type
    count(uid)
  }
}
# result

 {
    "res": [
      {
        "count": 2
      },
      {
        "uid": "0xfda573587ff07",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x109cb6fcbf1403",
        "dgraph.type": [
          "person"
        ]
      }
    ]
  }

@core-devs this appears to be a bug. No idea why it would behave this way.

Correct me if I’m wrong, but shouldn’t the dgraph.type NOT be a list type? It is returning an array instead of a string…

"dgraph.type": [
  "person"
]

I would wonder if the correct data returns a string instead of an array?

J

J,
dgraph.type is a list-type predicate in GraphQL+- (DQL), and it cannot be modified in the schema or the type system.

You can try it in the Dgraph docs or the playground.

Do you have any solution to fix this?

In version 20.11.0, we can rewrite some RDFs manually:

<problem_nodes_with_type_error> <dgraph.type> "person" .

After we rewrite these RDFs, the nodes work fine with func: type(person).

But I'm not sure we will always be able to find the affected nodes in our real application.
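
For illustration, the manual rewrite could be scripted roughly as follows. This is only a sketch: the function name `type_nquads` is hypothetical, it assumes the affected UIDs have already been collected (for example from the `not type(<person>)` query), and it assumes the type name contains no quote characters. It only builds the RDF lines; submitting them as a mutation is left out.

```python
import re

# UIDs as Dgraph prints them in query results, e.g. 0xfda573587ff07.
UID_PATTERN = re.compile(r"^0x[0-9a-fA-F]+$")


def type_nquads(uids, type_name):
    """Build one `<uid> <dgraph.type> "type" .` line per affected node."""
    lines = []
    for uid in uids:
        if not UID_PATTERN.match(uid):
            raise ValueError("not a hex uid: %r" % uid)
        lines.append('<%s> <dgraph.type> "%s" .' % (uid, type_name))
    return lines


# Two of the UIDs reported earlier in this thread.
affected = ["0xfda573587ff07", "0x109cb6fcbf1403"]
print("\n".join(type_nquads(affected, "person")))
```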

That is correct: in DQL a node can have multiple types. For example, a type that implements an interface will have DQL types for both the type and the interface.


I don’t have a fix for this, but wondering if 21.12 would make any difference because of how bitmaps were changed for posting lists. I don’t know the actual problem though.

So are you able to replicate this problem with a smaller set of data (not 20 million+ records)? Can you prepare the smallest possible set to test this case?
BTW, do you test it on a single Dgraph alpha node or on a cluster (and if so, how many nodes)?
Also, is id your real type name? I wonder if it could be some kind of "reserved" keyword (like in @id). If so, can you try a different type name?
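
One way a smaller test set could be produced is sketched below. It is an assumption-laden example, not the data used in this thread: the function name `person_nquads`, the blank-node labels and the generated names are all made up, and only the `name` and `dgraph.type` predicates from the schema above are used.

```python
def person_nquads(n):
    """Generate RDF lines for n person nodes, each with a name and a type."""
    lines = []
    for i in range(n):
        node = "_:p%d" % i  # blank node; Dgraph assigns the real uid on load
        lines.append('%s <name> "person %d" .' % (node, i))
        lines.append('%s <dgraph.type> "person" .' % node)
    return lines


# Print a tiny dataset; a larger n would be needed to look for the mismatch.
for line in person_nquads(3):
    print(line)
```

After loading such a set, the counts from type(person) and has(name) can be compared in the same way as in the queries above.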