Query Performance

I’m using Dgraph as a searchable log store, and I’ve noticed some performance lag even at very limited scale. Will this lag get significantly worse at scale, and can I organize my data/schema in a way that helps Dgraph stay performant?

Dev Environment

Docker - 4GB RAM, 2xCPU (actual machine has 16 GB 2133 MHz LPDDR3, 3.3 GHz Intel Core i7) - dgraph/dgraph:v1.0.10. 1 x zero and 1 x alpha.

Requirements for my use case:

  • Extremely high write volume, although this is performed asynchronously so overall write speed is less of an issue
  • Query log data within a specified time period, for a given space.name (count, sum and text search queries).
  • Transactions are not required
  • Data will be deleted after ~1-3 months, so that should prevent the overall size of the dataset from growing indefinitely

Is Dgraph the right tool for this use case?

Schema / Data

Data does not currently use any graph features, so all records are added without any connections (an example record is shown below the schema). This avoids having to perform multiple lookups, or risk a transaction failure, when adding log entries.

space.name: string @index(hash) .
log.message: string @index(hash) .
log.searchable: string @index(term) .
log.severity: string @index(hash) .
log.size: float @count .
log.timestamp: dateTime @index(hour) .
log.type: string @index(hash) @count .
log.containerId: string @index(hash) .
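
For reference, each log entry is currently inserted as a standalone node, roughly like this (the values below are just illustrative):

<_:log1> <space.name> "dev" .
<_:log1> <log.type> "LOG" .
<_:log1> <log.severity> "info" .
<_:log1> <log.timestamp> "2018-12-01T10:15:00+00:00" .
<_:log1> <log.size> "10" .
<_:log1> <log.message> "Hello from Docker!" .
<_:log1> <log.searchable> "Hello from Docker!" .
<_:log1> <log.containerId> "abc123" .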

Example Queries

These are some of the queries I run. The total number of records currently in space.name "dev" is 2999, and I’m getting query times of between 1 and 2 seconds.

Am I doing something wrong? Will these times increase dramatically if I add more log records / spaces?

Total number of LOG lines (since x timestamp)

query {
  requests (func: eq(space.name, "dev")) @filter (ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG")) {
    count(uid)
  }
}

Took between 1.4 and 2 seconds:

{
  "data": {
    "requests": [
      {
        "count": 2634
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 219200,
      "processing_ns": 1395451400,
      "encoding_ns": 4498600
    },
    "txn": {
      "start_ts": 280019
    }
  }
}

Total Log Volume (since x timestamp)

query {
  var (func: eq(space.name, "dev")) @filter (ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG")) {
    A as log.size
  }

  volume () {
    size: sum(val(A))
  }
}

Took ~1.6 seconds:

{
  "data": {
    "volume": [
      {
        "size": 83944
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 44600,
      "processing_ns": 1606808100,
      "encoding_ns": 1918400
    },
    "txn": {
      "start_ts": 280026
    }
  }
}

Keyword Search (since x timestamp)

query {
  requests (func: eq(space.name, "dev"), first: 100) @filter (ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG") AND allofterms(log.searchable, "Hello World Docker")) {
    log.searchable
  }
}

Took ~2.2 seconds:

{
  "data": {
    "requests": [
      {
        "log.searchable": " 2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n",
        "uid": "0x11255"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 49400,
      "processing_ns": 2219122200,
      "encoding_ns": 2248800
    },
    "txn": {
      "start_ts": 280043
    }
  }
}

Any help improving the performance would be much appreciated! :grinning:

Hey Calummoore,

Can you test your application with v1.0.11-nolru? It’s available on Docker Hub.

This is a kind of RC made to improve performance. It’s not totally done, but I want to see whether you get any perf gains from it.

Here are my results using dgraph/dgraph:v1.0.11-nolru - looks like it’s actually a bit slower.

Total number of LOG lines (since x timestamp)
Took ~2.9 seconds (first time), ~2.7 seconds (thereafter)

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 56800,
      "processing_ns": 2917386000,
      "encoding_ns": 10451500
    },
    "txn": {
      "start_ts": 290007
    }
  },
  "data": {
    "requests": [
      {
        "count": 2634
      }
    ]
  }
}

Total Log Volume (since x timestamp)
Took ~3.2 seconds (first time), ~2.6 seconds (thereafter)

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 63900,
      "processing_ns": 3284000400,
      "encoding_ns": 2329000
    },
    "txn": {
      "start_ts": 290008
    }
  },
  "data": {
    "volume": [
      {
        "size": 83944
      }
    ]
  }
}

Keyword Search (since x timestamp)
Took ~3.3 seconds (first time), ~2.2-3.6 seconds thereafter (quite variable)

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 63000,
      "processing_ns": 3251891200,
      "encoding_ns": 3782700
    },
    "txn": {
      "start_ts": 290019
    }
  },
  "data": {
    "requests": [
      {
        "log.searchable": " 2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n",
        "uid": "0x11255"
      }
    ]
  }
}

Every eq operation performs index lookups, both for the root func and for filters.
To get good performance in Dgraph, you should use predicates for types, areas, and so on.
So, instead of a space.name predicate with the value “dev”, create an empty space.name.dev predicate with the value “” for each log in that space. Do the same for log.type: replace it with a log.type.LOG predicate with the value “”. You don’t need indexes on these predicates.
If you want to aggregate logs over day intervals, create a node for each day and link every log node to its day node with a log.day predicate.
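
As a sketch, the schema and data for this layout would look something like this (the day node has to exist, or be created in the same mutation, so the log can link to it):

day.date: dateTime @index(day) .
log.day: uid @reverse .

<_:day1> <day.date> "2018-12-01" .
<_:log1> <space.name.dev> "" .
<_:log1> <log.type.LOG> "" .
<_:log1> <log.day> <_:day1> .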

If you want to count all logs for a day interval, first get all the day nodes, then all the logs linked to those day nodes (via the reverse predicate).
So your query will be something like:

{
  var(func: ge(day.date, "2018-12-01")) {
    logs as ~log.day
  }

  request(func: uid(logs)) @filter(has(space.name.dev) AND has(log.type.LOG)) {
    count(uid)
  }
}

Or even just:

{
  request(func: ge(day.date, "2018-12-01")) {
    ~log.day @filter(has(space.name.dev) AND has(log.type.LOG)) {
      count(uid)
    }
  }
}

It should be fast.
But why are you using a graph DB for this? Why not Elasticsearch?

I believe that what matters most for your use case is the insertion of data, that is, writes. Judging your tool’s performance by query response time isn’t the best measure, unless querying thousands of records is more important to you than saving them.

However, http://play.dgraph.io has a configuration much like yours, and running the query below I only saw about 6 seconds of delay the first time. After that it went down to milliseconds.

{
  jcount(func: has(actor.film)){
    count(uid)
  }
}
#Server latency: 336ms

Also, I would recommend that you review your index usage. You have 8 indexes and two @count directives. If you write to all 8 indexed predicates, that can add some delay on writes. Indexing everything doesn’t help writes, but it does help reads, so I’d recommend thinking about a strategy for this.

I think (just guessing here) the low performance may be due to recent heavy usage, with multiple tasks running at the same time. In the play.dgraph.io example above, there are no mutations going on; everything is consolidated.

For that reason, I would also recommend adding more Alphas, and maybe more CPUs and a little more memory. Play Dgraph uses 16 GB of memory and 2 CPUs, but with no background work.

Dgraph Core team is working on it :wink: xD

Hey @calummoore,

Dgraph is the right DB for this kind of work, obviously. Can you share your p directory with us? Then we can run these queries and see how to optimize them and if there’s something we could optimize in Dgraph.

In general, all folks here, if you have slow queries, please post them in Discuss, and tag me or @dmai. We’re looking for ways to optimize Dgraph. Having said that, do remember that a lot of times, the queries themselves can be optimized.

Cheers,
Manish

Update: One more thing. Try to run with Jaeger. That could help you understand what’s going on with the query latency. That’s how we debug latency issues. Dgraph v1.0.11 has support for exporting to Jaeger.


Thanks for your responses! :grinning:

@makitka - if I understand you correctly, having a predicate space.name.dev="" and querying with func: has(space.name.dev) would be faster than having space.name="dev" and querying with func: eq(space.name, "dev")? If that’s the case I will try it, but it seems like something that could be optimised on the Dgraph side, as it’s not totally intuitive?
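
In query terms, I think that means changing the root function from eq to has - just a sketch to check my understanding (it assumes both predicates exist in the dataset):

{
  # current approach
  current(func: eq(space.name, "dev")) {
    count(uid)
  }

  # suggested approach
  proposed(func: has(space.name.dev)) {
    count(uid)
  }
}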

I see why creating a posting list could offer better performance, because it essentially creates a secondary index. I don’t really want to attach to an existing node though, as it would require 2 round trips (the first to get the ID of the desired node, and the second to attach the node) - as Dgraph does not allow mutations using external IDs.

@MichelDiz, @mrjn - is there a way to add a “secondary” index to the root, rather than attaching to existing nodes? Or even better would be some way to insert data to a posting list using an external key, without 2 roundtrips (so I can use the graph like features).

@MichelDiz - Memory doesn’t seem to be a limiting factor in this case - Dgraph is only using ~300MB of the 4GB available. CPU jumps to ~140% when running the query, so I guess that’s the problem. Still, space.name="dev" has only 3,000 records, so in the worst case I would have thought you could just loop over those 3,000 records, which should still be faster than 1-3 seconds. And to be clear, no other queries or mutations are happening at the same time.

I know I have a lot of indexes, but Dgraph doesn’t let you filter items without an index, and I want the user to be able to filter on those items. In any case, my writes are still actually quite fast (~46ms), so at the moment, I’m more focused on the query delay, which is quite noticeable.

For my use case, a read of 1-3 seconds is actually okay, but anything more would become a problem. Given I’m already at that level with a limited dataset, I’m concerned it will get worse.

@mrjn - Sure, here’s a link - https://drive.google.com/open?id=1fpl_r_yKlDewbS35TBrZGe3h2vdanO3T.

Is this what you mean by Jaeger? I’ve not seen it before, how exactly do I use it with Dgraph? Can it trace the internals of Dgraph?

Key Questions

  • Could we add a way to insert data to an existing node predicate posting list using an external key? There would be no risk of conflict and that would enable me to use all the graph features!

  • How are secondary indexes and filters applied in Dgraph at the root level, and on a node posting list level?

    • Does each posting list get its own set of indexes? Or is it manually traversed?
    • Are any secondary indexes created at the root level?
    • How does Dgraph use its indexes when there are multiple filters? Does it just use the one with the smallest result set, and then apply the rest of the filters line by line?

Thanks all! :star_struck:

That’s Jaeger. See the new tracing docs about how to use it with Dgraph. Dgraph exposes traces for internal events including queries and mutations.

Indices are only created when you explicitly set them in your schema.

Queries always start with a result set starting from the root func. Further filters narrow down the result set from the initial set. There’s nothing done to “choose the smaller result” from the filters as the root func. That would go against the semantics of the written query.
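
To put that in terms of the first query in this thread:

query {
  # The root func builds the initial result set (here, via the space.name index).
  # The @filter then narrows that set down; Dgraph does not promote a "smaller" filter to the root.
  requests(func: eq(space.name, "dev")) @filter(ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG")) {
    count(uid)
  }
}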

It’s common that a transaction queries for a set of nodes and then applies mutations based on the nodes from the query. The query latency should not be an impact here.

Thanks for sharing the p directory. I’ll take a look some time tomorrow. In the meantime, you can run Jaeger and take a look yourself. That’s what I’ll be doing.

I had some time tonight to take another look here.

@makitka provided sound advice earlier about using predicates for types. That should help your query performance as running eq(log.type, "LOG") is most of the work done in these queries. See the docs about Giving Nodes a Type for the same info.
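
As a rough sketch, using @makitka’s suggested naming, the count query would then become something like:

query {
  requests(func: eq(space.name, "dev")) @filter(ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND has(log.type.LOG)) {
    count(uid)
  }
}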

Thanks for the update!

Indices are only created when you explicitly set them in your schema.

Yes, I understand that - my question is: if you add an index to your schema, does each posting list also get a version of that index, or is the index only created at the root level?

For example, say I create an index on the log.timestamp and log.size predicates, and then attach logs to an existing node <_:space>:

<_:space> <space.name> "dev" .
<_:space> <space.logs> <_:log1> .
<_:space> <space.logs> <_:log2> .
<_:log1> <log.timestamp> "2018-12-01T00:00:00+00:00" .
<_:log1> <log.size> "10" .
<_:log2> <log.timestamp> "2018-12-01T00:00:00+00:00" .
<_:log2> <log.size> "10" .

Do you use the fact that both log.timestamp and log.size are indexed in the schema, to also maintain a separate mini-index for each posting list <space.logs>?

Queries always start with a result set starting from the root func. Further filters narrow down the result set from the initial set. There’s nothing done to “choose the smaller result” from the filters as the root func. That would go against the semantics of the written query.

Fair enough, but does that mean an index is only ever used if it is in the “root” section of the query? So in the following query, is only the space.name index actually used?

(func: eq(space.name, "dev")) @filter (ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG")) {
  	...
}

I’m keen to understand how the “Further filters narrow down the result set from the initial set” part works in particular - is that done by going through the results one by one, or are indexes used somehow?

It’s common that a transaction queries for a set of nodes and then applies mutations based on the nodes from the query. The query latency should not be an impact here.

That makes a lot of sense when you are mutating the values you are querying, but it makes a lot less sense if all you want to do is add a record to an existing posting list - I don’t actually need transactions in that case. For example - I want to simply add a log to a posting list space.logs - but for that I need 2 round trips.

<????> <space.logs> <_:log1> .
<_:log1> <log.timestamp> "2018-12-01T00:00:00+00:00" .
<_:log1> <log.size> "10" .

Given my requirement is to have an extremely large write volume, I think performing an additional write for every mutation would cause performance issues.
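
To spell out the two round trips I mean:

# Round trip 1: look up the uid of the space node.
{
  space(func: eq(space.name, "dev")) {
    uid
  }
}

# Round trip 2: attach the new log using that uid (0x2 below is just a
# placeholder for whatever the first query returned).
<0x2> <space.logs> <_:log1> .
<_:log1> <log.timestamp> "2018-12-01T00:00:00+00:00" .
<_:log1> <log.size> "10" .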

Thanks for sharing the p directory. I’ll take a look some time tomorrow. In the meantime, you can run Jaeger and take a look yourself. That’s what I’ll be doing.

Will give it a go and let you know when I’ve done it :+1:t4:
