Filter performance

I have a very simple query that is using an index:

{
  q(func: eq(name, "Aqualung")) {
    uid
  }
}

According to the response, this runs in about 2,102,000ns. However, if I add a filter on the type:

{
  q(func: eq(name, "Aqualung")) @filter(type(Album)) {
    uid
  }
}

It takes 1,066,956,371ns. That’s about 500x slower. It’s as if the name index is not being used. Is that correct? Why would it not use the index and then filter the results?

I am using the latest docker images and querying through the web client. 2 million vertexes are loaded.

I’m not qualified to answer this, but out of curiosity, what is the index type: term, fulltext, hash, or exact? And does “Aqualung” have its dgraph.type set to “Album”?

Yes, index is exact:

<name>: string @index(exact) .

and the dgraph.type is “Album”. There are 2 items with the name “Aqualung”, each with a different type (Album and Track), and this returns just the Album. The query works, just incredibly slowly…

Hi, I have a theory but I don’t have the dataset to test it.

In all the latest releases we have a new feature, “Support filtering on nonindexed predicate” (Pull Request #4531 by animesh2049 in dgraph-io/dgraph on GitHub), which allows you to filter on a non-indexed predicate. It is enabled by default, as you can see in a line of that same pull request.

At least I think it is set by default. I’m not sure.

You could export your dataset, import it into a new instance of v1.1.1 and a new instance of the latest release, and compare the performance of both. If v1.1.1 is noticeably faster, my theory is right.

It is important to export the data and create both instances from scratch, though, so the comparison is not biased.

If this turns out to be true, please open an issue requesting a way to make this optional.

If I had a dataset that I could compare and get the same numbers as yours, I would do the tests myself.

Cheers.

If name is indexed and we anchor on that, it seems the additional filter (especially in this case, where only a few items are returned by the name lookup) should be pretty fast. Again, it looks like adding the filter disables the initial index lookup against name. I am speculating here, but the numbers seem to back it up. If so, something seems off with that approach.

I switched to 1.1.1 and performance seems to be the same.

{
  q(func: eq(name, "Aqualung"))  {
    uid
    dgraph.type
  }
}

Results

{
  "data": {
    "q": [
      {
        "uid": "0xc95b1",
        "dgraph.type": [
          "Album"
        ]
      },
      {
        "uid": "0x13a823",
        "dgraph.type": [
          "Track"
        ]
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 100900,
      "processing_ns": 1052200,
      "encoding_ns": 13200,
      "assign_timestamp_ns": 570200,
      "total_ns": 1794500
    },
    "txn": {
      "start_ts": 8019
    },
    "metrics": {
      "num_uids": {
        "dgraph.type": 2,
        "name": 0,
        "uid": 2
      }
    }
  }
}

{
  q(func: eq(name, "Aqualung")) @filter(type(Track)) {
    uid
    dgraph.type
  }
}

Results

{
  "data": {
    "q": [
      {
        "uid": "0x13a823",
        "dgraph.type": [
          "Track"
        ]
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 62300,
      "processing_ns": 549921231,
      "encoding_ns": 8800,
      "assign_timestamp_ns": 654200,
      "total_ns": 550743131
    },
    "txn": {
      "start_ts": 8023
    },
    "metrics": {
      "num_uids": {
        "dgraph.type": 2,
        "name": 0,
        "uid": 1
      }
    }
  }
}
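
To make the comparison concrete, here is a small Python sketch that compares the two server_latency blocks above field by field. The numbers are copied from the two responses; the helper name slowdown is only for illustration.

```python
# Latency figures copied from the two "server_latency" blocks above.
unfiltered = {
    "parsing_ns": 100900,
    "processing_ns": 1052200,
    "encoding_ns": 13200,
    "assign_timestamp_ns": 570200,
    "total_ns": 1794500,
}
filtered = {
    "parsing_ns": 62300,
    "processing_ns": 549921231,
    "encoding_ns": 8800,
    "assign_timestamp_ns": 654200,
    "total_ns": 550743131,
}


def slowdown(before, after, key):
    """Return how many times larger after[key] is than before[key]."""
    return after[key] / before[key]


# Print the ratio for every latency field so the outlier stands out.
for key in unfiltered:
    print(f"{key:>20}: {slowdown(unfiltered, filtered, key):8.1f}x")
```

Parsing, encoding, and timestamp assignment stay in the same range for both queries. Essentially all of the extra time is in processing_ns, which grows by a factor of roughly 520.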

As far as I remember, this is a very old topic; look at this post.

It seems that every filter you add will look at the entire index list, so even if your main filter returns one node, consecutive filters will still hit the index table.

The linked post is about filtering on edge facets and indicates this was fixed in 1.2, so this is not likely the same scenario unless I am missing something.

@mrjn or @dmai it would be nice to get an explanation of why this operation is so slow. I am willing to share my data and schema if it helps.

Note I am currently evaluating Dgraph for a project at my company. This seems like a simple query that should perform well. If I cannot figure out what is wrong soon, we would pass on Dgraph.

I suspect that type(Track) has a lot of nodes. So, perhaps try this:

{
  var(func: eq(name, "Aqualung")) {
    t as dgraph.type
  }
  q(func: eq(val(t), "Track")) {
    uid
    val(t)
  }
}

Thanks. I will try this and report results back (loading data now).

Still confused as to why the initial index lookup would not take precedence. Yes there are a lot of tracks (1 million+) but the index lookup against name returns 2 nodes 1 of which is a Track.

I will admit I am new to Dgraph, RDF, and GraphQL. I however have extensive experience using Gremlin and Cypher.

Thanks. This query does perform much better. I am still curious why my original would not hit the name index and then filter by type.

The query is run exactly how it was written. The original query would use the name index to get the nodes matched by "Aqualung" (eq(name, "Aqualung")), then use another index to get the nodes of type Album (type(Album)), and then intersect these two lists of UIDs to perform the filter.
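
A rough way to picture the cost difference is the Python sketch below. It is a toy model under stated assumptions, not Dgraph's actual implementation: the function names, the UID layout, and the one-million-entry range standing in for the type(Track) UID list are all hypothetical. It contrasts intersecting the two UID lists with checking the type of each name match directly.

```python
def intersect_sorted(a, b):
    """Linear merge of two sorted UID sequences; returns (matches, loop steps)."""
    out, i, j, steps = [], 0, 0, 0
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out, steps


def filter_per_candidate(candidates, type_of, wanted):
    """Check each candidate's type directly; work grows with len(candidates)."""
    return [uid for uid in candidates if type_of(uid) == wanted]


# Toy data (assumption): the two UIDs returned by the name lookup, and
# 1,000,000 consecutive UIDs standing in for the type(Track) list.
name_hits = [0xC95B1, 0x13A823]
track_uids = range(1_000_000, 2_000_000)


def type_of(uid):
    """Hypothetical per-node type lookup for the toy data."""
    return "Track" if uid in track_uids else "Album"


matches, steps = intersect_sorted(name_hits, track_uids)
print([hex(u) for u in matches], "merge steps:", steps)

direct = filter_per_candidate(name_hits, type_of, "Track")
print([hex(u) for u in direct], "lookups:", len(name_hits))
```

Both strategies return the same single UID, but in this model the merge has to walk a large part of the type list before it reaches the match, while the per-candidate check does only two lookups. Choosing between such strategies is the kind of decision a query planner would make.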

Once there’s a query planner in place (TBD on the 2020 Dgraph roadmap) Dgraph can be smarter about executing your original query.

The filtering on edge facets is a workaround for the main issue, which is similar to yours…

You’re right, Orlando. I’ll echo a comment from that post that seems applicable more than a year later:

anyway, it looks like a timebomb, because a lot of people who use filters don’t understand how they actually works, and it works well only while they have small amount of data.

I do want to say that I do appreciate the effort of the Dgraph team and am continuing my evaluation. The above issue is relatively small in the grand scheme of things and you all provided a working solution. So thanks guys.