Finding those lonely nodes

aryzing · October 12, 2018, 2:52pm

Is it possible to query for nodes that have no edges, i.e., “lonely” nodes?

I’ve noticed that after using Dgraph for a while, it’s possible that some nodes get created accidentally, or perhaps a buggy deletion mutation leaves a lonely node behind. So in order to perform some housekeeping, I’d like to be able to query those lonely nodes to understand what happened, and whether they should be deleted or connected elsewhere.

dmai · October 12, 2018, 5:48pm

UIDs with no incoming or outgoing edges are considered orphans and are effectively not stored in Dgraph. There’s no easy way to query for lonely nodes since they’re not stored.

If a uid has no associated predicates, then there’s nothing that logically says whether it should be connected to something else.

aryzing · October 12, 2018, 6:44pm

Thanks for the answer, Daniel. It makes sense. However, I believe I did not word the question correctly. What I meant to ask was how I should query for nodes that may have edges, but not to other nodes. In other words, how do I find nodes that do not have edges of type uid? That’s what I meant by lonely. Is this possible?

dmai · October 12, 2018, 8:46pm

That’s a more difficult query to form. You’d need to get the set of all uids that are not lonely and subtract it from the set of all uids.
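
The subtraction itself would have to happen on the client side. A minimal Python sketch of that step, assuming you have already collected a list of all known uids and a list of (source, target) pairs for every uid-type edge; the function name and the sample edge are made up for illustration, and counting incoming edges as "not lonely" is an assumption you may want to change:

```python
def lonely_uids(all_uids, uid_edges):
    """Return the uids that appear in no uid-type edge, as source or target."""
    connected = set()
    for source, target in uid_edges:
        connected.add(source)  # node with an outgoing uid edge
        connected.add(target)  # node with an incoming uid edge
    # Everything that is known but never connected is "lonely".
    return set(all_uids) - connected


# Hypothetical sample data, not taken from a real cluster.
all_uids = ["0x50", "0x53", "0x57", "0x6e", "0x97"]
uid_edges = [("0x57", "0x6e")]

print(sorted(lonely_uids(all_uids, uid_edges)))  # ['0x50', '0x53', '0x97']
```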

alt-jero · June 29, 2020, 1:42pm

To answer the question - I don’t think it’s necessary to try to find “lonely” nodes, because I don’t think a node without any predicate is even stored on disk at all. Here’s why:

According to everything I’ve been able to find, getting a list of all UIDs by query is impossible. Likewise, finding the set of nodes that lack some property doesn’t seem possible directly either. For example, not has() only works inside a filter, not as the root function of a query. There is a workaround, however.

If we query the Dgraph cluster for UIDs which we do not think exist, we’ll notice that it always returns a node with at least a UID, whether or not any other data is attached to the node. From this we can deduce that Dgraph likely only stores nodes with predicates on disk. Thus, a list of all UIDs would literally be a list of every possible UID.

The question then becomes, which UIDs are actually linked with predicates?

To find this out, we can ask Dgraph for our schema, which will show us all of the possible predicates. From that, we can use has() to query the UIDs of every node that has each of those predicates. This might take a while depending on how many different predicates you have stored. After you have the UID lists from each query, you’ll need to combine and deduplicate them. Congratulations - this is your master UID list.
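
If the schema has many predicates, writing one has() block per predicate by hand gets tedious. Here is a rough Python sketch that generates the query text from a list of predicate names. The function name and the q0, q1, ... block names are my own invention, and it assumes plain predicate names; it does not escape anything, so check the output before sending it to your cluster:

```python
def build_has_query(predicates):
    """Build one query containing a has() block per predicate."""
    blocks = []
    for index, predicate in enumerate(predicates):
        # Numbered block names (q0, q1, ...) avoid deriving names from predicates.
        blocks.append(
            f"  q{index}(func: has({predicate})) {{\n"
            f"    uid\n"
            f"  }}"
        )
    return "{\n" + "\n".join(blocks) + "\n}"


# Predicate names taken from the example schema in this post.
print(build_has_query(["templar", "knight", "horse"]))
```

The printed text can be pasted into whatever client you normally use to run queries.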

Note that the above may or may not (probably won’t) work well on a production database which is in live use. (Unless nobody writes to it while you’re doing this.)

Using your master UID list, you can do further data profiling by querying for all nodes with a certain type, and comparing it to your master to find all nodes without a type. You can also construct a set of queries which will return collectively all of your validly structured nodes - comparing that with the master list will yield a set of nodes which might have been added by buggy code, which can help you find the bugs. Your imagination is your friend here!

Have fun!

Example Master UID list:

# For the following schema:
<templar>: string .
<knight>: uid .
<horse>: float .
# etc.

# Query the following:
{
  hasTemplar(func: has(templar)) {
    uid 
  }
  hasKnight(func: has(knight)) {
    uid 
  }
  hasHorse(func: has(horse)) {
    uid 
  }
}

# Yielding: 
{
  "data": {
    "hasTemplar": [
      {
        "uid": "0x50"
      },
      {
        "uid": "0x53"
      },
      " ... "
    ],
    "hasKnight": [
      {
        "uid": "0x57"
      },
      {
        "uid": "0x6e"
      },
      " ... "
    ],
    "hasHorse": [
      {
        "uid": "0x6e"
      },
      {
        "uid": "0x97"
      },
      " ... "
    ]
  }
}

# Then combine and deduplicate :)
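
The combine and deduplicate step could look something like this in Python. It is only a sketch: it assumes the JSON response has the shape shown above (a "data" object whose values are lists of {"uid": ...} objects), and the function name is made up. The sample response reuses the uids from the example, without the elided entries:

```python
import json


def master_uid_list(response_text):
    """Merge the uid lists of every query block and drop duplicates."""
    data = json.loads(response_text)["data"]
    uids = set()
    for nodes in data.values():
        for node in nodes:
            uids.add(node["uid"])
    # Sort numerically; the uids are hex strings such as "0x6e".
    return sorted(uids, key=lambda uid: int(uid, 16))


sample = """
{
  "data": {
    "hasTemplar": [{"uid": "0x50"}, {"uid": "0x53"}],
    "hasKnight": [{"uid": "0x57"}, {"uid": "0x6e"}],
    "hasHorse": [{"uid": "0x6e"}, {"uid": "0x97"}]
  }
}
"""

print(master_uid_list(sample))  # ['0x50', '0x53', '0x57', '0x6e', '0x97']
```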

For a small dataset this will work fine. If you have a huge dataset, you probably have more engineers to help you figure out what needs to be done and how.
