"func: type() "error in 20.11.03 or later

Do you have any solution to fix this?

In version 20.11.0, we can rewrite some RDFs manually:

<problem_nodes_with_type_error> <dgraph.type> "person" .

After we rewrite these RDFs, the affected nodes work fine with func: type(person) again.

But I'm not sure that we will always have a way to find those nodes in our real application.
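For what it's worth, here is a minimal sketch of how one might batch-rewrite such nodes with an upsert via pydgraph, assuming the affected nodes are still reachable through some other predicate (the has(person_name)/NOT type(person) filter is an illustrative assumption; adapt it to however your application can identify the nodes):

import pydgraph

client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

# Find nodes that still carry person data but no longer match type(person),
# then re-assert dgraph.type so the type index picks them up again.
query = """
{
  broken as var(func: has(person_name)) @filter(NOT type(person))
}
"""
txn = client.txn()
try:
    mutation = txn.create_mutation(set_nquads='uid(broken) <dgraph.type> "person" .')
    request = txn.create_request(query=query, mutations=[mutation], commit_now=True)
    txn.do_request(request)
finally:
    txn.discard()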

That is correct: in DQL a node can have multiple types. For example, a type that implements an interface will have DQL types for both the type and the interface.
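For example (illustrative blank node and type names), such a node simply carries two dgraph.type triples:

<_:e1> <dgraph.type> "Person" .
<_:e1> <dgraph.type> "Employee" .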


I don’t have a fix for this, but I am wondering if 21.12 would make any difference because of how bitmaps were changed for posting lists. I don’t know the actual problem though.


So are you able to replicate this problem with a smaller set of data (not 20M+ records)? Can you prepare a minimal possible set to test this case?
BTW, do you test it on a single Dgraph Alpha node or on a cluster (and if so, how many nodes)?
Also, is id your real type name? I wonder if it could be a kind of “reserved” keyword (like in @id). If so, can you try a different type name?

OK, Anthony.
I will try 21.12.0 this week to see if there is the same issue.

The problem is not about an id type; it is a problem with the person type,

and as far as I know, id is not a reserved word in the GraphQL+- schema.

Correct, uid is reserved but not id.

Curious how you will be testing 21.12; I figure an export/import. And since the differing nodes are identical, I then wonder if it is not an overflow problem somewhere. I really wish someone from the @core-devs would chime in here. This is above my pay grade.

What I would like to know is:

  1. Is there any difference in how the dgraph.type predicate is stored on disk vs. any other predicate?
  2. Since this type(Foo) function is equivalent to, and maybe even implemented with the same piece of code as, the eq(dgraph.type, "Foo") function, can this problem be reproduced with a different predicate where 5 million plus nodes share the same value (see the sketch after this list)? Which leads me to…
  3. Maybe this is related to an overflow on indexes. Is there any limit on how many predicates with the same data can be indexed? A reverse index would mean that the key would be for one value and the value would contain a list of 5 million plus (uids?). Not sure if an index points to the uid of the node or something else.
  4. Or maybe there is an actual limit to how many items can be in a posting list, and this might have changed fundamentally in 21.12 with sroar. Were there any bugs that the sroar implementation fixed, or was it truly all performance related?
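As a rough sketch of how number 2 could be probed (assuming a local Alpha and the person type from this thread; the address is a placeholder), one could compare the two root functions directly; if they really share an implementation, the counts should agree:

import json

import pydgraph

client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

# dgraph.type has an exact index, so eq() on it is a legal root function.
for func in ('type(person)', 'eq(dgraph.type, "person")'):
    query = '{ res(func: %s) { count(uid) } }' % func
    resp = client.txn(read_only=True).query(query)
    print(func, '->', json.loads(resp.json))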

I don’t know who to tag here to actually get eyes on this issue. It really :broken_heart: me how much Dgraph support has been cut back by now. @MichelDiz and I can only do so much. I don’t know if Discuss is actually part of the job description for Michel, but I am here of my own free will/time and am not the golang expert that is needed to dig into problems like this. I really am :pray: for @mrjn to address this concern here soon, as promised over on this thread: What is Dgraph lacking? - #78 by mrjn

I guess let me end with this question for @purist180: can you reproduce this problem any way without having 5 million+ nodes of the same type? How could I replicate this problem in the simplest form; would just creating 5 million+ nodes of a single type suffice? I have a little bare-metal machine that I packed with RAM, and I can install Ubuntu Server and Dgraph and try to replicate this with you, but I need to know a sure way to replicate it without having all of your dataset.

If you were a Dgraph Cloud user, I would suggest you open a support ticket for this, but I am assuming you are not, since you were using such an old version.


^ This may indicate that this problem is fixed, as I suspected above. Or maybe it breaks things even further and would return zero nodes for the type??

In v21.12, we have added a flag to forbid any key which has greater than 1000 splits


Anthony, thank you very much!

I need to discuss with my colleague how to better express this problem.

At the same time, I will use the same data to test the new version you suggested.

Anthony,

@amaster507

I tried to upgrade our dgraph to version 21.12.0

The problem with the node type query still exists, and it is even more serious.

query person

{
  res(func: type(person)) {
    count(uid)
  }
}

The number of person nodes returned by the query is 0:

{
    "res": [
      {
        "count": 0
      }
    ]
}

But when we change the query to filter on a predicate instead of using type, we can see that there actually are nodes in the data with a dgraph.type of person.

{
  res(func: has(person_name),first:3) {
    count(uid)
    uid
    dgraph.type
    gender
  }
}
{
    "res": [
      {
        "count": 3
      },
      {
        "uid": "0x1d9026e53a",
        "dgraph.type": [
          "person"
        ],
        "gender": [
          "f"
        ]
      },
      {
        "uid": "0x25c65d6603",
        "dgraph.type": [
          "person"
        ],
        "gender": [
          "f"
        ]
      },
      {
        "uid": "0x2b515fd59e",
        "dgraph.type": [
          "person"
        ],
        "gender": [
          "m"
        ]
      }
    ]
  }

It looks like this is the case.

Sadly, it looks like this was an engineering workaround to contain the problem instead of fixing it.


@purist180 Hi,

Well, look, it is impossible to help when we only know the story. We need ways to reproduce the case. If this happens only in your env, it is certainly a problem in your env, and we would also need to know how that env is set up in order to try to reproduce it.

We can’t help in such cases based only on the story and screenshots.

Share with us everything you can, so we can reproduce the issue and make a core dev aware of it. The more context, the better.

If this happens only with large datasets, it would be necessary to create, for example, a test that generates a large enough dataset with the same characteristics.

Cheers.

OK Michel and Anthony!
@MichelDiz @amaster507
I wrote a simple Python script to reproduce this problem.

First, set up Dgraph with Docker:

docker run -it --name dgraph -p 5080:5080 -p 6080:6080 -p 8080:8080 -p 9080:9080 dgraph/standalone:v21.12.0

Then, you may need a Python env to run the script; use pip to install pydgraph and faker:

pip install faker pydgraph

Then run the following script:

import hashlib

import pydgraph
import requests
from faker import Faker

client_stub = pydgraph.DgraphClientStub('192.168.171.77:9080')
client = pydgraph.DgraphClient(client_stub)

# Lease UIDs from Zero.
# /assign?what=uids&num=100 allocates a range of UIDs specified by the num argument,
# and returns a JSON map containing the startId and endId that define the range of
# UIDs (inclusive). This UID range can be safely assigned externally to new nodes
# during data ingestion.
# See the docs at https://dgraph.io/docs/deploy/dgraph-zero/
url = "http://192.168.171.77:6080/assign?what=uids&num=1000000000000000000"
response = requests.get(url)
print(response.text)

# Convert an xid to a Dgraph uid, keeping the uid under the number of UIDs
# we just leased (assign?what=uids&num=1000000000000000000).
def xid2uid(xid):
    uid = "0x" + hashlib.md5(xid.encode(encoding='utf-8')).hexdigest()[8:23]
    return uid

faker = Faker('en_US')

batch_size = 50
for i in range(10000):
    rdf = ""
    for j in range(batch_size):
        # Create one person node with a synthetic xid and a fake name.
        name = faker.name()
        person_xid = f"person_{i*batch_size + j}"
        person_uid = xid2uid(person_xid)
        rdf += f'''<{person_uid}> <dgraph.type> "person" .\n'''
        rdf += f'''<{person_uid}> <xid> "{person_xid}" .\n'''
        rdf += f'''<{person_uid}> <person_name> "{name}" .\n'''

    # Run the mutation for this batch.
    txn = client.txn()
    txn.mutate(set_nquads=rdf)
    txn.commit()

    print(f"Finished mutating {(i + 1) * batch_size} nodes")

The script above makes sure that each node has a dgraph.type of “person” and a random person_name value.

After running this, I checked these nodes in Ratel:

{
  res(func: has(person_name)){
    count(uid)
  }
}
{
  "data": {
    "res": [
      {
        "count": 500000
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 36031,
      "processing_ns": 2213543572,
      "encoding_ns": 120902912,
      "assign_timestamp_ns": 780213,
      "total_ns": 2335353583
    },
    "txn": {
      "start_ts": 21066
    },
    "metrics": {
      "num_uids": {
        "_total": 0,
        "person_name": 0
      }
    }
  }
}
{
  res(func: type(person)){
    count(uid)
  }
}

This result may be 0, or some other value lower than the result of the query by has(person_name).

{
  res(func: has(person_name),first:3) @filter(not type(person)){
    count(uid)
    dgraph.type
    person_name
    xid
  }
}
{
  "data": {
    "res": [
      {
        "count": 3
      },
      {
        "dgraph.type": [
          "person"
        ],
        "person_name": "Jessica Johnson",
        "xid": "person_14228"
      },
      {
        "dgraph.type": [
          "person"
        ],
        "person_name": "Joe Stewart",
        "xid": "person_61"
      },
      {
        "dgraph.type": [
          "person"
        ],
        "person_name": "Linda Cox",
        "xid": "person_30812"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 68446,
      "processing_ns": 89664232518,
      "encoding_ns": 59379,
      "assign_timestamp_ns": 295900,
      "total_ns": 89671218655
    },
    "txn": {
      "start_ts": 34260
    },
    "metrics": {
      "num_uids": {
        "": 105700,
        "_total": 211409,
        "dgraph.type": 105703,
        "person_name": 3,
        "xid": 3
      }
    }
  }
}

Thanks,

I’ll run this tomorrow (during daylight hours in my timezone).


Hi Zhao,

I did test your code in a Photon OS env (Docker), a totally isolated env.
I made a little change to your code: the fact that you were not using the blank-node prefix was bothering me.

<_:{person_uid}>

This above is important.

The last query you shared works as expected here.

Cheers!

PS. I think there is a problem with the way you handle the UIDs.

PS. I have tested again without my changes. And no sign of the bug.

PS. Yeah, confirmed. It is something in the UID handling, because it works only when it is a blank node instead of a custom UID generated by your code.


Michel, please use the original code!

About leasing UIDs:
/assign?what=uids&num=100 allocates a range of UIDs specified by the num argument,
and returns a JSON map containing the startId and endId that define the range of UIDs (inclusive).
This UID range can be safely assigned externally to new nodes during data ingestion.

You can see the docs in More about Dgraph Zero - Deploy

In our case we use the /assign Zero endpoint (6080) to lease UIDs and mutate nodes
with the UIDs we just leased, rather than using blank nodes.

I did; the problem is your code.

I think you are creating invalid UIDs. Another theory is that the UIDs are spread over a pretty long range, so Dgraph may still be indexing them, or becomes buggy because of that. Need to validate this hypothesis.

It certainly has to do with the way you hash the XID and convert it to hex. What is this parameter “:23” at the end of hexdigest? Are you sure it gives 64 bits?

BTW, you don’t need to rely on UIDs only. You can give numbers instead; Dgraph accepts numbers in place of a UID and converts them to UIDs on the fly.

PS. Note that MD5 is 128 bits.
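To make the “:23” point concrete, a quick check (reusing the helper from the script above) shows the slice keeps 15 hex digits, i.e. 60 bits, and the maximum 60-bit value is larger than the 10^18 UIDs that were leased, so a hashed UID is not guaranteed to land inside the leased range:

import hashlib

def xid2uid(xid):
    # The original helper: hexdigest()[8:23] keeps 15 hex chars = 60 bits.
    return "0x" + hashlib.md5(xid.encode('utf-8')).hexdigest()[8:23]

leased = 10**18            # num passed to /assign?what=uids
max_uid = 16**15 - 1       # 1152921504606846975, about 1.15e18

print(len(xid2uid("person_0")) - 2)  # 15 hex digits, not 16 (i.e. not 64 bits)
print(max_uid > leased)              # True: a hash can fall outside the lease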


You lease UIDs, but the API doesn’t dump the UIDs for you (in a list). You are obviously creating the UIDs by yourself, hoping they match the leased ones. Right?
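If so, a safer pattern is to read startId/endId from the /assign response and hand out UIDs sequentially. A sketch (field names per the Zero docs quoted above; the address is a placeholder):

import requests

# /assign returns only the boundaries of the leased range, e.g.
# {"startId": "101", "endId": "1000100"} - it does not list every UID.
resp = requests.get("http://localhost:6080/assign",
                    params={"what": "uids", "num": 1000000})
ids = resp.json()
next_uid, end_uid = int(ids["startId"]), int(ids["endId"])

def take_uid():
    # Hand out leased UIDs one by one instead of hashing XIDs and
    # hoping the result lands inside the leased window.
    global next_uid
    assert next_uid <= end_uid, "leased range exhausted"
    uid = f"0x{next_uid:x}"
    next_uid += 1
    return uid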

OK Michel!

Please forget about leasing UIDs; I will post another issue about that.

Let’s just talk about the type function in this post.

I just tried the blank-node method to create nodes, like you did:

        rdf += f'''<_:{person_uid}> <dgraph.type> "person" .\n'''
        rdf += f'''<_:{person_uid}> <xid> "{person_xid}" .\n'''
        rdf += f'''<_:{person_uid}> <person_name> "{name}" .\n'''

The result is still the same on Dgraph version 21.12.0.



Please make sure you are using Dgraph v21.12.0

As @amaster507 said

This seems to describe what you are seeing:

Forbid Massive Fan-outs

Certain keys in the graph suffer from a massive fan-out problem. These keys are typically index keys. For example, a certain string value might be a default value set to all the nodes in the graph. A reverse index on this value could point to millions of nodes in the graph, hence creating huge posting lists. Dgraph would split such a posting list across multiple keys, so as not to exceed Badger’s value limits and also allow partial reads of this index key.
A typical key like this would have dozens of splits. We noticed, however, that some keys have thousands of splits – that’s possible when the fan-out is in billions of nodes. A query using this key would be slow at best and would crash the system at worst by causing a massive memory consumption or a massive CPU spike.
In v21.12, we have added a flag to forbid any key which has greater than 1000 splits by marking it forbidden in Badger. Once a key is forbidden, that key would continue to drop data and would always return an empty result.
Almost all backends we have seen (in Dgraph Cloud) are not affected by this change. But, in the rare case that a user is affected, rewriting the query to use another key for the root function would fix the issue. We think this small downside is worth the upside of keeping the system stable and performant.
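In practice, that suggested rewrite maps to the pattern already used earlier in this thread: move the root function onto another key and keep the type check as a filter, e.g.:

{
  res(func: has(person_name)) @filter(type(person)) {
    count(uid)
  }
}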

Yes, I copied your command from your comment. I did not change anything related to this.

Please check that you are starting from scratch.

I have done this same test with your code about 6 times, 3 of them with blank nodes instead of the generated UIDs. Only with blank nodes did I see no issue.

Check the Docker volumes, the bound paths, and so on. Also check whether there is a dangling Zero instance somewhere.
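For example (assuming the standalone container from the docker run command earlier), one way to guarantee a clean start:

docker rm -f dgraph     # drop the old container together with its embedded data
docker volume ls        # check that no leftover dgraph volumes remain
docker run -it --name dgraph -p 5080:5080 -p 6080:6080 -p 8080:8080 -p 9080:9080 dgraph/standalone:v21.12.0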

So you also get the type function error when you use fixed UIDs?