"func: type()" error in 20.11.3 or later

purist180 · December 6, 2021, 7:34am

In our application we use the type() function (via GraphQL+-) to get lists of nodes.

But in the new version I ran into a problem: I cannot query nodes by node type.

We have no problem with the same data in version 20.11.0, but that version suffers from write amplification.

After we upgraded to 20.11.3 (which fixes the write amplification), some nodes could no longer be queried by node type.

There are more than a dozen node types in our application. Only two of them (the two with the most nodes) have the problem; the other types can be queried normally using type().

The two types in question have approximately 6,000,000 and 50,000,000 nodes respectively.

Later we tried 21.03.0 and 21.03.2; these two types of nodes still cannot be queried.

The data of these nodes is not lost. It can still be queried through a predicate or with has(predict_of_specific_node), but these two types of nodes just cannot be queried through the type function.

I would like to ask whether versions after 20.11.3 have made any changes to how dgraph.type is stored or queried?

MichelDiz · December 6, 2021, 5:25pm

Query like this to check if it has the right dgraph.type.

{
  q(func: has(predict_of_specific_node)) {
    uid
    predict_of_specific_node
    dgraph.type
  }
}

The desired node should have the corresponding dgraph.type used in the type function.

Also, it should have the predicates used in the corresponding type.

amaster507 · December 6, 2021, 5:38pm

Is it possible that there is some kind of OOM problem here? Can I suggest using first to limit the result to a smaller number, to determine whether you really can’t query by the type or whether the query is just silently failing because of an OOM problem with too many nodes.

{
  q(func: type(), first: 1000) {
    uid
    dgraph.type
    predict_of_specific_node
  }
}

And if this is the case, please follow up, because I have a thought on how to possibly improve types for Dgraph.

purist180 · December 7, 2021, 1:56am

Thank you for your reply, Michel!

The “has” query method works. We can use it to check that all the data does exist (including node attributes and predicates such as “dgraph.type”).

But the nodes just can’t be queried through the “type” method.

We tried it in both Ratel and the Dgraph client;
(func: type(two_types_which_has_problem)) doesn’t return any result.

purist180 · December 7, 2021, 2:09am

Anthony, thanks for your reply and suggestions!

We don’t think this is an OOM problem; it looks more like something is wrong with the type() function or the dgraph.type field.

We noticed the problem while writing data (it reproduced many times): once the “type” function stops returning results, any query that uses type(), even with first: 10, returns nothing.

When searching in Ratel, no results are returned and eventually a timeout is reported.

But with version 20.11.0 and earlier we never had this problem. Even with a lot of nodes, results were always returned, and when first and offset were used, they came back almost immediately.

amaster507 · December 7, 2021, 2:57am

Oh 20.11 is going back quite a lot relatively. There are a lot of changes in here. Have you tried with the newest 21.12? There are some things that look promising such as:

feat(sroar): Bring sroar to Dgraph (#7840)
optimize eq filter queries (#7895)
enable split of posting list with single plist (#8062)
Bring latest sroar to master (#7977)
append galaxy namespace to type name (#7880)
update the schema and type from 2103 (#7838)

I was just trying to find the post on how types are implemented/stored in Dgraph, but I can’t find it.

My theory is still that type filters should be a lot better if my understanding of them is correct. @MichelDiz maybe you can confirm your DQL understanding here.

AFAIK, types are just another predicate dgraph.type but are given their own special filter and priority in the graph. But they are still stored as a triple like:

_:a <dgraph.type> "Foo" .
_:b <dgraph.type> "Foo" .
_:c <dgraph.type> "Bar" .

So in theory if I have 50 Million nodes then I have a predicate with 50 Million (+ other types) triples on disk. Is this correct? Wouldn’t it be better if this is the case that instead there was a posting list for every type that contained all of the uids in that type. Maybe this is already in place, but a huge list for every type would be better than one huge predicate shard with mostly the same data in it. Just thinking about normalization, disk usage, and query efficiency.
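As a rough illustration of the idea above, here is a toy Python sketch that groups uids by their dgraph.type value, i.e. one uid list per type instead of one row per triple. It is not Dgraph's storage code and says nothing about how Dgraph actually lays types out on disk; `triples` and `build_type_index` are made-up names for this example.

```python
from collections import defaultdict

# Toy data: (uid, predicate, value) triples, mirroring the RDF example above.
triples = [
    (0x1, "dgraph.type", "Foo"),
    (0x2, "dgraph.type", "Foo"),
    (0x3, "dgraph.type", "Bar"),
]


def build_type_index(triples):
    """Group uids by type value: one sorted uid list per type name."""
    index = defaultdict(list)
    for uid, predicate, value in triples:
        if predicate == "dgraph.type":
            index[value].append(uid)
    return {type_name: sorted(uids) for type_name, uids in index.items()}


type_index = build_type_index(triples)
print(type_index)
```

With a layout like this, answering "all nodes of type Foo" is a single lookup of `type_index["Foo"]`, which is the property the question above is asking about.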

purist180 · December 7, 2021, 3:30am

Thank you Anthony!

We have tried versions 20.11.3 through 21.03.2. They all have the type function problem, but we haven’t tried the latest 21.12.0 version yet.

We tried using version 20.11.0 to rewrite our data and found another problem.

(Hope @MichelDiz can help us out!)

For example, we have 5 million person type nodes.
Each person node has a person_name predicate, and only person type nodes have this person_name predicate.

So we have the following two ways to find the total number of person type nodes.

{
    res(func: type(person)){
        count(uid)
    }
}

{
    res(func: has(person_name)){
        count(uid)
    }
}

But the results of these two queries are not the same.

The count using type is about 1600 less than the count using has.

Next, we used the following query to check whether there is a problem with our data:

{
    res(func: has(person_name)) @filter(not type(<person>)) {
        count(uid)
        dgraph.type
    }
}

This returns the missing 1600 nodes,

but these nodes actually have the dgraph.type attribute "person".

Based on this, we concluded that there must be something wrong with the type() query.

The above experiments were all done in version 20.11.0.

Also, is there any difference between using eq(dgraph.type, "person") and type()?

In newer versions, once the type() function error occurs, neither method works anymore.

MichelDiz · December 7, 2021, 4:32am

This is odd cuz as you can see in this comment

github.com

dgraph-io/dgraph/blob/d6dbacb93a0e78c946a8f677673434e570f554a9/query/query.go#L343

	}

	sg.SrcFunc = &Function{
		Name:       gf.Name,
		Args:       append(gf.Args[:0:0], gf.Args...),
		IsCount:    gf.IsCount,
		IsValueVar: gf.IsValueVar,
		IsLenVar:   gf.IsLenVar,
	}

	// type function is just an alias for eq(type, "dgraph.type").
	if gf.Name == "type" {
		sg.Attr = "dgraph.type"
		sg.SrcFunc.Name = "eq"
		sg.SrcFunc.IsCount = false
		sg.SrcFunc.IsValueVar = false
		sg.SrcFunc.IsLenVar = false
		return
	}

	if gf.Lang != "" {

type function is just an alias for eq(type, "dgraph.type").

So, if there is a bug, it is in the eq function. Or maybe the type implementation is a bit more complex.

@purist180, test it with eq(dgraph.type, "person") instead of type(person).

Another thing would be to make sure that you have upgraded in a clean way. That means exporting the data and importing it again into the new version. It would also be good to have an idea of your cluster config.

I think this is the main PR for type(): "Add type function to allow querying types." by martinmr · Pull Request #2933 · dgraph-io/dgraph (github.com)
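To make the aliasing in the excerpt concrete, here is a minimal Python sketch of the same rewrite. It only mirrors the control flow of the Go snippet above; the `Function` dataclass and `rewrite_type_function` are hypothetical stand-ins for illustration, not part of Dgraph or any client library.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Simplified stand-in for a parsed query function (hypothetical)."""
    name: str
    args: list = field(default_factory=list)
    attr: str = ""
    is_count: bool = False


def rewrite_type_function(fn: Function) -> Function:
    """Turn type(X) into eq on the dgraph.type predicate; leave others alone."""
    if fn.name == "type":
        # Same idea as the excerpt: switch to eq, target dgraph.type,
        # and reset the count flag.
        return Function(name="eq", args=list(fn.args), attr="dgraph.type", is_count=False)
    return fn


rewritten = rewrite_type_function(Function(name="type", args=["person"]))
untouched = rewrite_type_function(Function(name="has", attr="person_name"))
print(rewritten)
print(untouched)
```

If the rewrite really is this direct, type(person) and eq(dgraph.type, "person") should always return the same nodes, so a discrepancy between them and has() would point at the eq lookup on dgraph.type rather than at type() itself.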

purist180 · December 7, 2021, 6:47am

I tried this on my local server again
and found another issue.

Here is my environment and setup:
RAM: 256GB
CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
OS: Ubuntu 18.04
Dgraph version: 20.03.7
using Docker image dgraph/standalone:v20.03.7

We define our type system in Chinese:

type <人> {
    <姓名>
}

Type <人> means person in Chinese.

Predicate <姓名> means person_name in Chinese, and only <人> type nodes contain this predicate.

using type(<人>) count(uid) -> 5,409,548

using eq(<dgraph.type>, "人") count(uid) -> same result, 5,409,548

——————————————————————————————————

But when we use the has() function, the node count seems weird:

using has(<姓名>) count(uid) -> 5,409,837

5409837 - 5409548 = 289

There are 289 nodes that can't be queried by the type() function.

{
    res(func: has(<姓名>)) @filter(not type(<人>)) {
        count(uid)
        uid
        <dgraph.type>
    }
}

or

{
    res(func: has(<姓名>)) @filter(not eq(<dgraph.type>,"人")) {
        count(uid)
        uid
        <dgraph.type>
    }
}

The number of nodes returned by these queries is exactly 289,

but these nodes do contain the dgraph.type value “人” (person) in the result.

Does this mean that 289 nodes do have the person dgraph.type and still cannot be queried using type(person)?

Does this mean something is wrong with the dgraph.type predicate?

Is there a way to fix this? @MichelDiz

MichelDiz · December 7, 2021, 7:00am

Can you expand the predicates of those 289, to check if they have some data?

As you mention now that it is in Chinese, this could also be a bug with predicate i18n.

Dunno, this is new to me. We need a way to reproduce it to make a valid bug report. But try what I have mentioned. Do a clean migration and also share the details about your cluster setup.

purist180 · December 7, 2021, 7:12am

Thanks,
@MichelDiz
I added the environment information to the previous problem description.

The data in the 289 nodes seems to be correct.

The problem is that they can't be queried by type().

It is only these 289 nodes; the other 5,409,548 are fine for now.

What does “Do a clean migration” mean?

And is there a way we can force Dgraph to rebuild the index for the “dgraph.type” predicate?
@amaster507

jdgamble555 · December 8, 2021, 2:02am

purist180:

{
    res(func: has(<姓名>)) @filter(not type(<人>)) {
        count(uid)
        uid
        <dgraph.type>
    }
}

or

{
    res(func: has(<姓名>)) @filter(not eq(<dgraph.type>,"人")) {
        count(uid)
        uid
        <dgraph.type>
    }
}

To me this proves it is not a problem with type(), as they get the same results.

I think you just have some orphan nodes with incorrect typing in your database.

J

purist180 · December 8, 2021, 2:06am

The problem is that the 289 nodes do have the person “dgraph.type” value, but they cannot be queried by type.

jdgamble555 · December 8, 2021, 2:37am

To be clear, this means you cannot query them by using type nor dgraph.type functions, which we established are the same thing.

To me, this means you have orphan nodes without a type, or they would show up. The other option, is the query does not work on certain characters? Do these 289 nodes have any characters in common?

I only see 2 possibilities:

1. there is something wrong with the DQL query (which I doubt, unless it is a UTF-8 issue)
2. the data was never added correctly (or the types were deleted later), hence they are orphan nodes for one of two reasons:
   - due to bad user input / deleting parents etc.
   - due to a problem with the DQL set function (again I doubt it, but the Chinese characters could be a problem)

The most likely answer to me is that you have orphan nodes:

- you deleted a parent without deleting the child
- you forgot to add 'dgraph.type' to the original set commands
- you never completely deleted old data containing '姓名'… this happens a lot when the nodes are not complete

Just my two cents, but I could be way off,

J

purist180 · December 8, 2021, 3:04am

Thank you J!

I ran the experiment above in a clean DB, and I simply wrote RDF triples into it without any delete operation.

I have checked all 289 nodes. They are not orphan nodes as you suggest; they all have the “姓名” (person_name) predicate and “dgraph.type” with the value “人” (person).

You can see this in the JSON result of the query.

There is a filter not eq(<dgraph.type>, "person"),

but the resulting nodes do contain the “person” dgraph.type.

So I think Dgraph is mismatching the type predicate of these nodes?

jdgamble555 · December 8, 2021, 3:32am

It looks like dgraph.type is equal to an object { "人" } not a string "人".

Maybe I am missing something, but I think that is the problem…

J

miko · December 8, 2021, 2:40pm

Hey, is it possible that those 289 nodes have the Person type encoded differently? I mean they may all look the same after UTF decoding, but maybe the binary representation is different, so the type is really different. I would compare the good and the bad records in a hex editor.
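A quick way to try this without a hex editor is to dump the code points and UTF-8 bytes of the type strings. The sketch below is only an illustration of the comparison: the lookalike character U+2F08 is a hypothetical example of a visually similar string, not something observed in this thread, and `describe` and `same_bytes` are made-up helper names.

```python
def describe(value: str) -> dict:
    """Show the code points and UTF-8 bytes behind a string."""
    return {
        "code_points": [f"U+{ord(ch):04X}" for ch in value],
        "utf8_hex": value.encode("utf-8").hex(),
    }


def same_bytes(a: str, b: str) -> bool:
    """True only if both strings have identical UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


expected = "\u4eba"    # 人, the type name used in the schema
lookalike = "\u2f08"   # hypothetical: a different character that renders similarly

print(describe(expected))
print(describe(lookalike))
print(same_bytes(expected, lookalike))
```

Running `describe` on the dgraph.type value of one good node and one of the 289 bad nodes would show whether the two strings really are byte-for-byte identical.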

purist180 · December 10, 2021, 1:56am

Hello Michel, Anthony, Jonathan and miko
@MichelDiz @amaster507 @jdgamble555 @miko!

Because you suggested it might be a problem with Chinese strings, I repeated the experiment without using Chinese data

and still got the same error.

The type system in our Dgraph schema looks like this:

type <id> {
	id_number
	~person_with_id
}
type <person> {
	name
	person_with_id
	gender
}

<gender>: [string] @index(hash) .
<id_number>: [string] @index(hash) .
<name>: [string] @index(hash) .
<person_with_id>: [uid] @count @reverse .

As in the previous experiment, we checked the data to ensure that each person has a name,

and the count returned by type(person) is still lower than the count returned by has(name).

# query by type(person)
{
  res(func: type(<person>)) {
    count(uid)
  }
}

# result
{
  "data": {
    "res": [
      {
        "count": 20327334
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 60658,
      "processing_ns": 9196903337,
      "encoding_ns": 4210614182,
      "assign_timestamp_ns": 436762,
      "total_ns": 13408087207
    },
    "txn": {
      "start_ts": 40741
    },
    "metrics": {
      "num_uids": {
        "_total": 0,
        "dgraph.type": 0
      }
    }
  }
}

and query by has(name)

{
  res(func: has(<name>)) {
    count(uid)
  }
}

# result
{
  "data": {
    "res": [
      {
        "count": 20327727
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 49874,
      "processing_ns": 37059444391,
      "encoding_ns": 4188967194,
      "assign_timestamp_ns": 346657,
      "total_ns": 41251798784
    },
    "txn": {
      "start_ts": 40736
    },
    "metrics": {
      "num_uids": {
        "_total": 0,
        "name": 0
      }
    }
  }
}

But when we check these missing nodes, they still have the person dgraph.type value.

{
  res(func: has(<name>)) @filter(not type(<person>)) {
    count(uid)
    uid
    dgraph.type
  }
}

{
  "data": {
    "res": [
      {
        "count": 393
      },
      {
        "uid": "0xfda573587ff07",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x109cb6fcbf1403",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x10d811ed0800c7",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x10e4bfe904f30e",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x10f5004dd1d557",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x11804fb41c2203",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x11da6826e6e22b",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x12b1e49b8586b1",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x12d27b2fb47ec4",
        "dgraph.type": [
          "person"
        ]
      },
      "....."
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 81467,
      "processing_ns": 46168463968,
      "encoding_ns": 609478,
      "assign_timestamp_ns": 542048,
      "total_ns": 46169760026
    },
    "txn": {
      "start_ts": 40705
    },
    "metrics": {
      "num_uids": {
        "": 393,
        "_total": 20328906,
        "dgraph.type": 20328120,
        "name": 0,
        "uid": 393
      }
    }
  }
}
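The contradiction in this result can also be checked mechanically. The following Python sketch works on an abridged copy of the response above (the count entry, the first three nodes, and the "....." placeholder) and lists every returned node that passed the not type(<person>) filter while still carrying "person" in dgraph.type. The helper name `split_result` is made up for this example; it does not come from any Dgraph client.

```python
import json

# Abridged copy of the response above: the count entry, the first three
# nodes, and the "....." placeholder that stands for the elided rest.
raw = """
{
  "data": {
    "res": [
      {"count": 393},
      {"uid": "0xfda573587ff07", "dgraph.type": ["person"]},
      {"uid": "0x109cb6fcbf1403", "dgraph.type": ["person"]},
      {"uid": "0x10d811ed0800c7", "dgraph.type": ["person"]},
      "....."
    ]
  }
}
"""


def split_result(block):
    """Separate the count(uid) entry from the node entries of one query block."""
    count, nodes = None, []
    for entry in block:
        if not isinstance(entry, dict):
            continue  # skip the "....." placeholder
        if "count" in entry:
            count = entry["count"]
        elif "uid" in entry:
            nodes.append(entry)
    return count, nodes


count, nodes = split_result(json.loads(raw)["data"]["res"])

# Nodes that passed `not type(<person>)` yet still list "person" as a type.
contradicting = [n["uid"] for n in nodes if "person" in n.get("dgraph.type", [])]

# Counts reported earlier in this post for has(name) and type(person).
gap = 20327727 - 20327334

print(count, gap, contradicting)
```

The gap between the two counts equals the number of nodes returned by the filtered query, which is consistent with exactly these nodes being the ones type() misses.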

amaster507 · December 10, 2021, 2:52am

Can you do this:

Query using type(person) filter and first:1 and return a uid.

Then with that one do:

{
  n(func: uid("0xfda573587ff07","<uid#2>")) {
    uid
    dgraph.type
  }
}

purist180 · December 10, 2021, 3:05am

The first 2 normal person nodes:

{
  res(func: type(<person>),first:2) {
    uid
    dgraph.type
    count(uid)
  }
}

{
    "res": [
      {
        "count": 2
      },
      {
        "uid": "0x69b2ee7f",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x1d9026e53a",
        "dgraph.type": [
          "person"
        ]
      }
    ]
  }

The two problem nodes:

# query 
{
  res(func: uid(0xfda573587ff07,0x109cb6fcbf1403)) {
    uid
    dgraph.type
    count(uid)
  }
}
# result

 {
    "res": [
      {
        "count": 2
      },
      {
        "uid": "0xfda573587ff07",
        "dgraph.type": [
          "person"
        ]
      },
      {
        "uid": "0x109cb6fcbf1403",
        "dgraph.type": [
          "person"
        ]
      }
    ]
  }
