Slow query times for has() function (on play.dgraph.io)

I’ve been playing around with play.dgraph.io and have noticed something odd with the has() function. Some requests return quickly, while others take almost 20 seconds. I’ve only tested this on play.dgraph.io so far.

Running the following query:

{
  q(func:has(name)) {
    name
  }
}

typically returns in 17-19 seconds, and occasionally doesn’t return within the 20-second timeout, e.g.:

{
  ...
  "extensions": {
    "server_latency": {
      "parsing_ns": 51811,
      "processing_ns": 17574771692,
      "encoding_ns": 583096,
      "total_ns": 17575497225 # 17.58 secs
    },
    "txn": {
      "start_ts": 343271
    },
    "metrics": {
      "num_uids": {
        "_total": 603,
        "name": 603
      }
    }
  }
}
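As a quick sanity check on the figures above, here is a minimal Python sketch that converts the nanosecond values to seconds and shows how much of the total is spent in processing. The numbers are copied from the response above; the helper name ns_to_seconds is just something I made up for illustration.

```python
def ns_to_seconds(ns: int) -> float:
    """Convert a nanosecond latency figure to seconds."""
    return ns / 1_000_000_000


# Figures copied from the "server_latency" block of the first response.
latency = {
    "parsing_ns": 51811,
    "processing_ns": 17574771692,
    "encoding_ns": 583096,
    "total_ns": 17575497225,
}

for key, ns in latency.items():
    print(f"{key:>14}: {ns_to_seconds(ns):.6f} s")

# Share of the total spent in processing rather than parsing or encoding.
processing_share = latency["processing_ns"] / latency["total_ns"]
print(f"processing share: {processing_share:.4%}")
```

Parsing and encoding are negligible here, so whatever is slow is happening in the processing step.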

This seemed slow to me, so I ran some more tests. Since there are 603 results, I tried two more queries.

{
  q(func:has(name), first: 600) {
    name
  }
}

typically returns in about 1.6 seconds:

{
  ...
  "extensions": {
    "server_latency": {
      "parsing_ns": 65559,
      "processing_ns": 1656242021,
      "encoding_ns": 901675,
      "total_ns": 1657286896 # 1.66 secs
    },
    "txn": {
      "start_ts": 343271
    },
    "metrics": {
      "num_uids": {
        "_total": 600,
        "name": 600
      }
    }
  }
}

while setting the ‘first’ value slightly above the number of results shows the same slowdown as the first request:

{
  q(func:has(name), first: 610) {
    name
  }
}

gave

{
  ...
  "extensions": {
    "server_latency": {
      "parsing_ns": 69261,
      "processing_ns": 18813905219,
      "encoding_ns": 629217,
      "total_ns": 18814682766 # 18.81 secs
    },
    "txn": {
      "start_ts": 343271
    },
    "metrics": {
      "num_uids": {
        "_total": 603,
        "name": 603
      }
    }
  }
}

Likewise, even with the limit set below 603, adding any sort of ordering brings back the really long processing time:

{
  q(func:has(name), first: 600, orderasc: name) {
    name
  }
}

gave

{
  ...
  "extensions": {
    "server_latency": {
      "parsing_ns": 57129,
      "processing_ns": 17719464401,
      "encoding_ns": 809650,
      "total_ns": 17720402882 # 17.72 secs
    },
    "txn": {
      "start_ts": 343271
    },
    "metrics": {
      "num_uids": {
        "_total": 600,
        "name": 600
      }
    }
  }
}

I’m not sure how the has() function works internally, or if perhaps the version of Dgraph running on play.dgraph.io is an older version, but it doesn’t seem to have been optimized here.

Also, it seems that if the query doesn’t reach the limit for any reason, the request takes much longer, e.g.:

{
  q(func: has(name), first: 100, offset: 500) {
  	name
  }
}

gives

{
  ...
  "extensions": {
    "server_latency": {
      "parsing_ns": 68804,
      "processing_ns": 1662391018,
      "encoding_ns": 140824,
      "total_ns": 1662655543 # 1.66 secs
    },
    "txn": {
      "start_ts": 343271
    },
    "metrics": {
      "num_uids": {
        "_total": 100,
        "name": 100
      }
    }
  }
}

but

{
  q(func: has(name), first: 100, offset: 510) {
  	name
  }
}

gives

{
  ...
  "extensions": {
    "server_latency": {
      "parsing_ns": 55267,
      "processing_ns": 18231784824,
      "encoding_ns": 130284,
      "total_ns": 18232029560 # 18.23 secs
    },
    "txn": {
      "start_ts": 343271
    },
    "metrics": {
      "num_uids": {
        "_total": 93,
        "name": 93
      }
    }
  }
}

Given the number of nodes on play.dgraph.io (21 million, apparently), I’m wondering whether some loop over all the nodes is being done when only the objects of the ‘name’ predicate should need to be accessed.

Can anyone @graphql confirm this is a bug, or explain why queries that keep searching after they have already found every matching node take so much longer? Thanks.
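To make the hypothesis concrete, here is a toy Python model of a linear scan that stops as soon as the ‘first’ limit is reached. This is only a sketch of my guess, not how Dgraph is actually implemented: the function has_scan, the key-space size, and the placement of the 603 matching keys near the start are all assumptions invented for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple


def has_scan(all_keys: Iterable[int], matches: Set[int],
             first: Optional[int] = None) -> Tuple[List[int], int]:
    """Linear scan returning (matching keys, number of keys visited).

    Stops early once `first` matches have been collected.
    """
    found: List[int] = []
    visited = 0
    for key in all_keys:
        visited += 1
        if key in matches:
            found.append(key)
            if first is not None and len(found) == first:
                break  # limit reached: the rest of the key space is never touched
    return found, visited


# Hypothetical layout: 603 matching keys near the start of a much larger key space.
ALL_KEYS = range(1_000_000)
MATCHES = set(range(0, 6030, 10))  # 603 keys

for limit in (600, 610, None):
    uids, visited = has_scan(ALL_KEYS, MATCHES, first=limit)
    print(f"first={limit}: {len(uids)} results, {visited} keys visited")
```

In this model, first: 600 stops after a small fraction of the keys, while first: 610 can never be satisfied and so walks the whole key space, just like the query with no limit. That would match the pattern in the timings, but whether anything like this happens inside Dgraph is exactly what I’m asking.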

I’ll let the devs comment on the difference between 1.6s and 17s there, but:

Do not use has() for anything other than debugging; it’s basically a full table scan of a predicate. Use almost any other function at the root of your query and it will use an index, rather than starting with every uid that has that predicate.

Storage in Dgraph is by predicate (in other words, by key) only. Nodes, as a storage container, do not exist; there is only the list of UIDs that each predicate has. Once understood, this explains a ton about how Dgraph’s complexity varies from query to query.

@iluminae - Thanks for the response.

I’m aware that storage is by predicate, which is why it surprised me that it took so much longer. It feels to me like has(predicate) should access one of the indexes in some way, but perhaps it doesn’t. That would likely explain the difference in times. It’ll be interesting to see what the devs say on that.

Can you suggest a function that would list all the nodes with predicate ‘name’ other than has() that would work on the Freebase data set which has multiple types using the ‘name’ predicate? I’ve tried

{
  q(func: regexp(name, /^.*$/)) {
  	name
  }
}

but get the error

and

{
  q(func: gt(name, "")) {
  	name
  }
}

gives

I’m still learning the range of functions that are available, and any pointers would be appreciated.

Oh no, if you want every name, then you do want a full table scan. I just mean that in a real application using the database, I highly suggest not building against has(). Say you want all the names of Persons: you would do names(func: type(Person)){name}, which would use the index on dgraph.type to get only the uids that have dgraph.type=“Person”, then go to the name tablet and pull the values for each of those uids. It was just meant as a suggestion as you build out an application, not something to apply to the test IMDB dataset per se.
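Here is a rough Python sketch of the difference described above, using plain dicts as stand-ins for the dgraph.type index and the name tablet. The data, the dict names, and both functions are made up for illustration; it is a mental model, not Dgraph’s storage code.

```python
from typing import Dict, List

# Toy data, invented for illustration only.
TYPE_INDEX: Dict[str, List[int]] = {"Person": [1, 3], "Film": [2]}   # dgraph.type value -> uids
NAME_TABLET: Dict[int, str] = {1: "Alice", 2: "Some Film", 3: "Bob"}  # uid -> name value


def names_by_type(type_name: str) -> List[str]:
    """Index lookup first, then fetch name only for the uids of that type."""
    uids = TYPE_INDEX.get(type_name, [])
    return [NAME_TABLET[uid] for uid in uids if uid in NAME_TABLET]


def names_by_has() -> List[str]:
    """has(name)-style access: walk every entry stored under the predicate."""
    return list(NAME_TABLET.values())


print(names_by_type("Person"))
print(names_by_has())
```

The type-based version only touches the uids the index hands back, while the has()-style version touches every entry under name no matter what you wanted.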

I don’t understand the reason behind your observation that asking for 603 is fastish but 604 is slow, so I won’t comment on that. (By the way, I have reproduced it myself on there.)

@iluminae - Gotcha. Thanks for your input.

As a side note, it looks like dgraph.type isn’t built for the Freebase data on play.dgraph.io, so films(func: type(Film)){name} won’t work on there. Not to worry, though. The key thing is I’m getting my head round some more details.

Thanks again.

Nope, the dataset was updated. This query will work.

{
  films(func: type(Film), first:100){
    name@en
    dgraph.type
  }
}

The problem with your query is that you are asking for “name”, but the dataset has language support. When a predicate has language support, you generally have to query it with the language you want, e.g. name@en.

BTW, avoid doing “benchmark” tests over the play website. It has a simple cluster. It is not made for this. If you force it with a heavy query, the service may crash and take time to come back. It was meant for simple exploration.

As the predicate “name” exists, the DB assumes you know that there is some data somewhere, so it will scan over the DB because you asked for it. But the problem is that there’s no data under plain “name”; as I mentioned before, it has language support.

@MichelDiz - Thanks for all that, it was helpful.

Ah, gotcha. Thanks.

Sorry. I didn’t mean for them to be benchmarks. I was just trying things out on that dataset, since I haven’t loaded a local copy yet. It was puzzling me why the requests were taking so long.

Thanks again.

The reason I said it looked like the dgraph.type isn’t built for the Freebase data was because when looking at the schema for the Freebase dataset, there were various other dgraph.[name] predicates mentioned, but not dgraph.type - so I made the erroneous assumption that it wasn’t built. Thanks for the correction.

Are there any other internal predicates that won’t be listed either in the schema in general, or specifically in the Ratel schema interface?

Thanks.

All predicates that start with “dgraph.” are hidden from the user. There may be a flag to expose them, or they may be available at /state, but they are of no use to general users. All you need, though, is dgraph.type, and it doesn’t need to be in the schema panel.

Cheers!
