Finding those lonely nodes

Is it possible to query for nodes that have no edges, i.e., “lonely” nodes?

I’ve noticed that after using Dgraph for a while, some nodes can get created accidentally, or a buggy deletion mutation can leave a lonely node behind. So for housekeeping purposes, I’d like to be able to query those lonely nodes to understand what happened, and whether they should be deleted or connected elsewhere.

UIDs with no incoming or outgoing edges are considered orphans and are effectively not stored in Dgraph. There’s no easy way to query for lonely nodes since they’re not stored.

If a uid has no associated predicates, then there’s nothing that logically says whether it should be connected to something else.

Thanks for the answer Daniel. It makes sense. However, I believe I did not word the question correctly. What I meant to ask was how should I query for nodes that may have edges, but not to other nodes. In other words, how do I find nodes that do not have edges of type uid? That’s what I meant by lonely. Is this possible?

That’s a more difficult query to form. You’d need to get the set of all uids that are not lonely and subtract it from the set of all uids.
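As a rough sketch of that subtraction (not an official Dgraph recipe), here is how it could look client-side in Python once you have the UID sets in hand. The function name and the sample UIDs are made up for illustration. It assumes you collected the sources of uid edges with something like has(knight) and the targets by expanding that predicate one level.

```python
def lonely_uids(all_uids, edge_sources, edge_targets):
    """Return the UIDs that neither start nor end a uid-typed edge."""
    # A node is "connected" if it is on either end of a uid edge.
    connected = set(edge_sources) | set(edge_targets)
    return set(all_uids) - connected


# Hypothetical sample data, not taken from a real cluster.
all_uids = {"0x50", "0x53", "0x57", "0x6e", "0x97"}
edge_sources = {"0x57", "0x6e"}  # nodes that have an outgoing uid edge
edge_targets = {"0x53"}          # nodes that some uid edge points to

print(sorted(lonely_uids(all_uids, edge_sources, edge_targets)))
```

The hard part is obtaining all_uids in the first place, which is what the rest of this thread is about.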


To answer the question - I don’t think it’s necessary to try to find “lonely” nodes, because I don’t think a node without any predicate is even stored on disk at all. Here’s why:

According to everything I’ve been able to find, getting a list of all UIDs with a single query is impossible. Likewise, finding the set of nodes that lack some property doesn’t seem possible directly. For example, a not has() condition can’t be used as the root function of a query; it only works inside a filter. There is a workaround, however.

If we query the Dgraph cluster for UIDs that we do not think exist, we’ll notice that it always returns a node with at least a UID, whether or not any other data is attached to the node. From this we can deduce that Dgraph likely only stores nodes with predicates on disk. Thus, a list of all UIDs would literally be a list of every possible UID.

The question then becomes, which UIDs are actually linked with predicates?

To find this out, we can ask Dgraph for our schema, which will show us all of the possible predicates. Then, for each predicate, we can query the UIDs of every node that has() it. This might take a while depending on how many different predicates you have stored. After you have the UID lists from each query, you’ll need to combine and deduplicate them. Congratulations - this is your master UID list.

Note that the above may or may not (probably won’t) work well on a production database which is in live use. (Unless nobody writes to it while you’re doing this.)
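If you have many predicates, writing one has() block per predicate by hand gets tedious. Below is a minimal sketch that generates the query text from a list of predicate names. build_has_query and the has_0, has_1, ... block names are my own invention, and it assumes the predicate names are plain identifiers that need no escaping.

```python
def build_has_query(predicates):
    """Build one query containing a has() block per predicate."""
    blocks = []
    for i, pred in enumerate(predicates):
        # Index-based block names avoid problems with unusual predicate names.
        blocks.append(f"  has_{i}(func: has({pred})) {{\n    uid\n  }}")
    return "{\n" + "\n".join(blocks) + "\n}"


# Predicate names as they would appear in your schema.
print(build_has_query(["templar", "knight", "horse"]))
```

You would still have to send the resulting text to the cluster with whatever client you already use.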

Using your master UID list, you can do further data profiling by querying for all nodes with a certain type, and comparing it to your master to find all nodes without a type. You can also construct a set of queries which will return collectively all of your validly structured nodes - comparing that with the master list will yield a set of nodes which might have been added by buggy code, which can help you find the bugs. Your imagination is your friend here!

Have fun!

Example Master UID list:

# For the following schema:
<templar>: string .
<knight>: uid .
<horse>: float .
# etc.

# Query the following:
{
  hasTemplar(func: has(templar)) {
    uid 
  }
  hasKnight(func: has(knight)) {
    uid 
  }
  hasHorse(func: has(horse)) {
    uid 
  }
}

# Yielding: 
{
  "data": {
    "hasTemplar": [
      {
        "uid": "0x50"
      },
      {
        "uid": "0x53"
      },
      " ... "
    ],
    "hasKnight": [
      {
        "uid": "0x57"
      },
      {
        "uid": "0x6e"
      },
      " ... "
    ],
    "hasHorse": [
      {
        "uid": "0x6e"
      },
      {
        "uid": "0x97"
      },
      " ... "
    ]
  }
}

# Then combine and deduplicate :)
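The combine-and-deduplicate step could look something like this in Python. It is just a sketch: master_uid_list is a made-up name, the sample response is a shortened copy of the one above, and it assumes the response has already been parsed from JSON into a dict.

```python
def master_uid_list(response):
    """Merge the UIDs from every query block and drop duplicates."""
    uids = set()
    for nodes in response.get("data", {}).values():
        for node in nodes:
            # Skip anything that is not a {"uid": ...} object.
            if isinstance(node, dict) and "uid" in node:
                uids.add(node["uid"])
    # Sort numerically by the hex value for stable output.
    return sorted(uids, key=lambda u: int(u, 16))


# Shortened copy of the response shown above.
response = {
    "data": {
        "hasTemplar": [{"uid": "0x50"}, {"uid": "0x53"}],
        "hasKnight": [{"uid": "0x57"}, {"uid": "0x6e"}],
        "hasHorse": [{"uid": "0x6e"}, {"uid": "0x97"}],
    }
}

print(master_uid_list(response))
```

Note that 0x6e shows up under both hasKnight and hasHorse but only once in the result.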

For a small dataset this will work fine. If you have a huge dataset, you probably have more engineers to help you figure out what needs to be done and how :)