I’m using Dgraph as a searchable log store, and I’ve noticed some performance lag even at very limited scale. I’d like to know whether this lag will get significantly worse as the dataset grows, and whether I can organize my data/schema in a way that helps Dgraph stay performant at scale.
Dev Environment
Docker with 4 GB RAM and 2 CPUs (host machine: 16 GB 2133 MHz LPDDR3, 3.3 GHz Intel Core i7), running dgraph/dgraph:v1.0.10 with 1 Zero and 1 Alpha.
Requirements for my use case:
- Extremely high write volume, although writes are performed asynchronously, so overall write speed is less of an issue
- Query log data within a specified time period and for a given space.name (count, sum, and text search queries)
- Transactions are not required
- Data will be deleted after ~1-3 months, which should keep the overall dataset size from growing indefinitely
Is Dgraph the right tool for this use case?
Schema / Data
Data does not currently use any graph features, so all records are added without any connections. This avoids performing multiple lookups or risking a transaction failure when adding log entries.
space.name: string @index(hash) .
log.message: string @index(hash) .
log.searchable: string @index(term) .
log.severity: string @index(hash) .
log.size: float @count .
log.timestamp: dateTime @index(hour) .
log.type: string @index(hash) @count .
log.containerId: string @index(hash) .
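For context, since each entry is a standalone node, inserting a log line is a single set mutation using the predicates above. A minimal sketch with illustrative values (the blank node name and the message/size values here are made up, not real data):

```
{
  set {
    _:entry <space.name> "dev" .
    _:entry <log.message> "Hello from Docker!" .
    _:entry <log.searchable> "hello from docker" .
    _:entry <log.severity> "INFO" .
    _:entry <log.size> "31.0" .
    _:entry <log.timestamp> "2018-12-01T10:15:00+00:00" .
    _:entry <log.type> "LOG" .
    _:entry <log.containerId> "3f4e8a2b" .
  }
}
```

No edges are created between nodes, so each mutation touches only the new entry's own predicates.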
Example Queries
These are some of the queries I run. The total number of records currently in space.name
"dev"
is 2999, and I’m getting query times of between 1 and 2 seconds.
Am I doing something wrong? Will these times increase dramatically if I add more log records / spaces?
Total number of LOG lines (since x timestamp)
query {
  requests(func: eq(space.name, "dev")) @filter(ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG")) {
    count(uid)
  }
}
Took between 1.4 - 2 seconds:
{
  "data": {
    "requests": [
      {
        "count": 2634
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 219200,
      "processing_ns": 1395451400,
      "encoding_ns": 4498600
    },
    "txn": {
      "start_ts": 280019
    }
  }
}
Total Log Volume (since x timestamp)
query {
  var(func: eq(space.name, "dev")) @filter(ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG")) {
    A as log.size
  }

  volume() {
    size: sum(val(A))
  }
}
Took ~ 1.6 seconds
{
  "data": {
    "volume": [
      {
        "size": 83944
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 44600,
      "processing_ns": 1606808100,
      "encoding_ns": 1918400
    },
    "txn": {
      "start_ts": 280026
    }
  }
}
Keyword Search (since x timestamp)
query {
  requests(func: eq(space.name, "dev"), first: 100) @filter(ge(log.timestamp, "2018-12-01T00:00:00+00:00") AND eq(log.type, "LOG") AND allofterms(log.searchable, "Hello World Docker")) {
    log.searchable
  }
}
Took ~2.2 seconds:
{
  "data": {
    "requests": [
      {
        "log.searchable": " 2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n",
        "uid": "0x11255"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 49400,
      "processing_ns": 2219122200,
      "encoding_ns": 2248800
    },
    "txn": {
      "start_ts": 280043
    }
  }
}
Any help improving the performance would be much appreciated!