Filter performance

luke.daugherty · March 24, 2020, 5:57pm

I have a very simple query that is using an index:

{
  q(func: eq(name, "Aqualung")) {
    uid
  }
}

According to the response, this runs in about 2,102,000ns. However, if I add a filter on the type:

{
  q(func: eq(name, "Aqualung")) @filter(type(Album)) {
    uid
  }
}

It takes 1,066,956,371ns. That’s about 500x slower. It’s as if the name index is not being used. Is that correct? Why would it not use the index and then filter the results?

I am using the latest docker images and querying through the web client. 2 million vertexes are loaded.

acastle · March 24, 2020, 9:12pm

I’m not qualified to answer this, but out of curiosity, what is the index type: term, fulltext, hash, or exact? And does “Aqualung” have dgraph.type set to “Album”?

luke.daugherty · March 24, 2020, 9:37pm

Yes, index is exact:

<name>: string @index(exact) .

and the dgraph.type is “Album”. There are 2 items with the name “Aqualung” but different types (Album and Track), and this returns just the Album. The query works, just incredibly slowly.

MichelDiz · March 25, 2020, 2:12am

Hi, I have a theory but I don’t have the dataset to test it.

In all the latest releases we have a new feature, "Support filtering on nonindexed predicate" (Pull Request #4531 by animesh2049, dgraph-io/dgraph on GitHub), which allows you to filter on a non-indexed predicate. It is set by default, as you can see in a line of that same pull request.

At least I think it is set by default. I’m not sure.

To test this, export your dataset and import it into a new instance of v1.1.1 and a new instance of the latest release, then compare the performance of both. If the latest release is the slower one, my theory is right.

It is important to export the data and create both instances fresh, so the comparison is not biased.

If this turns out to be true, please open an issue requesting a way to make this optional.

If I had a dataset that I could compare and get the same numbers as yours, I would do the tests myself.

Cheers.

luke.daugherty · March 25, 2020, 2:50pm

If name is indexed and we anchor on that, the additional filter should be pretty fast, especially in this case where the name lookup returns only a few items. Again, it looks like adding the filter disables the initial index lookup against name. I am speculating here, but the numbers seem to back it up. If so, something seems off with that approach.

luke.daugherty · March 25, 2020, 3:12pm

I switched to 1.1.1 and performance seems to be the same.

{
  q(func: eq(name, "Aqualung"))  {
    uid
    dgraph.type
  }
}

Results

{
  "data": {
    "q": [
      {
        "uid": "0xc95b1",
        "dgraph.type": [
          "Album"
        ]
      },
      {
        "uid": "0x13a823",
        "dgraph.type": [
          "Track"
        ]
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 100900,
      "processing_ns": 1052200,
      "encoding_ns": 13200,
      "assign_timestamp_ns": 570200,
      "total_ns": 1794500
    },
    "txn": {
      "start_ts": 8019
    },
    "metrics": {
      "num_uids": {
        "dgraph.type": 2,
        "name": 0,
        "uid": 2
      }
    }
  }
}

{
  q(func: eq(name, "Aqualung")) @filter(type(Track)) {
    uid
    dgraph.type
  }
}

Results

{
  "data": {
    "q": [
      {
        "uid": "0x13a823",
        "dgraph.type": [
          "Track"
        ]
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 62300,
      "processing_ns": 549921231,
      "encoding_ns": 8800,
      "assign_timestamp_ns": 654200,
      "total_ns": 550743131
    },
    "txn": {
      "start_ts": 8023
    },
    "metrics": {
      "num_uids": {
        "dgraph.type": 2,
        "name": 0,
        "uid": 1
      }
    }
  }
}
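
For reference, the slowdown can be read straight off the two server_latency blocks above. A small Python sketch, using only the processing_ns values reported in those two responses (the variable names are made up for illustration):

```python
# processing_ns values copied from the two responses above
unfiltered_ns = 1_052_200      # eq(name, "Aqualung") only
filtered_ns = 549_921_231      # same root function plus @filter(type(Track))

# Ratio of time spent in query processing, ignoring parsing and encoding
slowdown = filtered_ns / unfiltered_ns

print(f"filtered query spends about {slowdown:.0f}x longer in processing")
print(f"filtered processing time: {filtered_ns / 1e6:.1f} ms")
```

Parsing, encoding, and timestamp assignment are in the same range for both queries, so nearly all of the difference is in processing.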

orlandoco · March 26, 2020, 1:57pm

As far as I remember, this is a very old topic; look at this post.

It seems that every filter you add will look at the entire index list, so even if your main filter returns one node, consecutive filters will hit the index table.

luke.daugherty · March 26, 2020, 2:39pm

The linked post is about filtering on edge facets and indicates this was fixed in 1.2, so this is not likely the same scenario unless I am missing something.

luke.daugherty · March 26, 2020, 2:43pm

@mrjn or @dmai it would be nice to get an explanation of why this operation is so slow. I am willing to share my data and schema if it helps.

Note that I am currently evaluating Dgraph for a project at my company. This seems like a simple query that should perform well. If I cannot figure out what is wrong soon, we will pass on Dgraph.

mrjn · March 26, 2020, 4:13pm

I suspect that type(Track) has a lot of nodes. So, perhaps try this:

{
  var(func: eq(name, "Aqualung")) {
    t as dgraph.type
  }
  q(func: eq(val(t), "Track")) {
    uid
    val(t)
  }
}

luke.daugherty · March 26, 2020, 4:27pm

Thanks. I will try this and report results back (loading data now).

I am still confused as to why the initial index lookup would not take precedence. Yes, there are a lot of tracks (1 million+), but the index lookup against name returns 2 nodes, 1 of which is a Track.

I will admit I am new to Dgraph, RDF, and GraphQL. I however have extensive experience using Gremlin and Cypher.

luke.daugherty · March 26, 2020, 5:32pm

Thanks. This query does perform much better. I am still curious why my original would not hit the name index and then filter by type.

dmai · March 26, 2020, 7:50pm

The query is run exactly how it was written. The original query would use the name index to get the nodes matched by "Aqualung" (eq(name, "Aqualung")), then use another index to get the nodes of type Album (type(Album)), and then intersect these two lists of UIDs to perform the filter.
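
To make the cost concrete, here is a toy Python sketch of the two strategies. It is not Dgraph's actual code: the function names, the in-memory lists, and the `node_types` dictionary are all hypothetical stand-ins, and the sizes are only chosen to resemble the numbers mentioned in this thread.

```python
# Hypothetical data: 1,000,000 Track UIDs and two nodes named "Aqualung".
track_uids = list(range(1, 1_000_001))   # stand-in for the UID list behind type(Track)
name_uids = [500_000, 2_000_000]         # stand-in for the result of eq(name, "Aqualung")
node_types = {500_000: ["Track"], 2_000_000: ["Album"]}  # per-node dgraph.type values


def filter_by_intersection(root, type_list):
    # Strategy described above: fetch the whole type list, then intersect.
    # Building the set touches every entry of type_list, so the work
    # scales with the number of Tracks, not with the size of root.
    type_set = set(type_list)
    return [uid for uid in root if uid in type_set]


def filter_per_candidate(root, types_by_uid, wanted):
    # Alternative: read the type of each candidate node only.
    # The work scales with len(root), which is 2 in this example.
    return [uid for uid in root if wanted in types_by_uid.get(uid, ())]


print(filter_by_intersection(name_uids, track_uids))
print(filter_per_candidate(name_uids, node_types, "Track"))
```

Both functions return the same single UID, but the first one processes a million entries to do it. The second is roughly the shape of the var-based query suggested earlier in this thread, which reads dgraph.type only for the nodes matched by name.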

Once there’s a query planner in place (TBD on the 2020 Dgraph roadmap) Dgraph can be smarter about executing your original query.

orlandoco · March 26, 2020, 8:47pm

The filtering on edge facets is a workaround for the main issue, which is similar to yours.

luke.daugherty · March 26, 2020, 10:13pm

You’re right, Orlando. I’ll echo a comment from that post that still seems applicable more than a year later:

anyway, it looks like a timebomb, because a lot of people who use filters don’t understand how they actually works, and it works well only while they have small amount of data.

luke.daugherty · March 27, 2020, 3:24pm

I do want to say that I do appreciate the effort of the Dgraph team and am continuing my evaluation. The above issue is relatively small in the grand scheme of things and you all provided a working solution. So thanks guys.
