V1.0.12 slower for some queries


(Kaveh) #1

Hi,

I have a setup that lets me switch between Dgraph versions easily. After the v1.0.12 release, I compared my queries to check whether I can move to v1.0.12 in production, and I found one complex query that runs consistently slower in v1.0.12 than in v1.0.11.

The query is my longest one and may need adjusting for the new version. Here it is:

{
    var(func: eq(serial, "MY-PRODUCT")) {
        PRODUCTSELF as uid
        FEATURES as has_feature @filter(eq(code,["COLOR"])  )
        garment_of {
            GENDER as gender
            type_of {
                TYPE as uid
            }
        }
        DESC as described_as
    }

    prod(func: eq(locale,"CA::en")) @cascade {
        locale
        products @filter(
                eq(is_published, true)
                and eq(has_thumbnail,true)
                and eq(has_productURL,true)
                and not uid(PRODUCTSELF)
                and (eq(described_as, "") or not eq(described_as, val(DESC)))
            )
                @facets(is_in_stock,updated_at)
                @facets(eq(is_in_stock,true) and gt(updated_at,'$(expr $(date +%s) - 172800)'))
        {
            is_published
            has_thumbnail
            has_productURL
            serial
            garment_of @filter( eq(gender,"u") or eq(gender,val(GENDER)) ) {
                gender
                type_of @filter(uid(TYPE)) {
                    type_name
                }
            }

            has_feature @filter(uid(FEATURES)) {
                locale
                code
                value
            }
        }
    }
}

The schema I use is:

product_of : uid @reverse @count .
type_of : uid @reverse @count .
garment_of : uid @reverse @count .
serial : string @count @index(exact) .
described_as : string @count @index(term) .
is_published : bool @count @index(bool) .
has_thumbnail : bool @count @index(bool) .
has_productURL : bool @count @index(bool) .
updated_at : dateTime @index(day) .

shop_id : string @count @index(exact) .
shop_name : string @count @index(exact) .
prefix : string @count @index(exact) .

type_id : string @count @index(exact) .
is_generic : bool @count @index(bool) .

garment_id : string @count @index(exact) .
gender : string @count @index(exact) .

article_size_id : string @count @index(exact) .
products : uid @reverse @count .
shop_country : string @count @index(exact) .
language : string @count @index(exact) .
locale: string @count @index(exact) .
is_in_stock: bool @count @index(bool) .

code: string @count @index(exact) .
value: string @count @index(exact) .

All predicates are indexed, and I assume the indexes are correct, since the older version of Dgraph runs the query nearly 40% faster. I am testing over both HTTP and gRPC, by the way.

Has anyone else noticed this, or is it an edge case that only affects me?

Average query time (over 100 runs):
v1.0.11: 189156741.4 ns
v1.0.12: 289063134.8 ns
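
For anyone comparing their own numbers, the relative difference between the two averages can be worked out as below. This is only a small Python sketch; the function name `relative_slowdown` is made up for illustration and is not part of any Dgraph tooling.

```python
def relative_slowdown(old_ns: float, new_ns: float) -> float:
    """Return how much longer new_ns is than old_ns, as a fraction of old_ns."""
    if old_ns <= 0:
        raise ValueError("old_ns must be positive")
    return (new_ns - old_ns) / old_ns


# Averages reported above, in nanoseconds.
v1_0_11_ns = 189156741.4
v1_0_12_ns = 289063134.8

slowdown = relative_slowdown(v1_0_11_ns, v1_0_12_ns)
time_saved = 1 - v1_0_11_ns / v1_0_12_ns

print(f"v1.0.12 takes {slowdown:.1%} longer than v1.0.11")
print(f"v1.0.11 takes {time_saved:.1%} less time than v1.0.12")
```

The two percentages differ because they use different baselines: the first is relative to the v1.0.11 time, the second relative to the v1.0.12 time.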


(Kaveh) #2

I tried to reproduce the same situation on a public dataset. If I am not mistaken (I only tried a couple of times, with 1000 executions each time), I see the same behaviour with the 1million.rdf.gz dataset.

while true; do curl -s localhost:8080/query -XPOST -d '
{
  var(func: eq(name@en,"Minority Report")) {
    d as initial_release_date
  }

  me(func: eq(name@en, "Steven Spielberg")) {
    name@en
    director.film @filter(ge(initial_release_date, val(d))) {
      initial_release_date
      name@en
    }
  }
}
' | python -m json.tool|grep processing_ns|cut -d':' -f 2|cut -d' ' -f 2;done

If anyone else can reproduce this, it should be easier to trace, since the dataset is the same one we all use in the tutorial.

Note that for this dataset I only tested on my local machine a couple of times, not on my servers, as I was just looking for a public dataset and query that show the same issue.

Average processing_ns (1000 runs):

v1.0.11: 5462598.224 ns
v1.0.12: 7390753.525 ns
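
The averaging step is not shown in the shell loop above, so here is one way it could be done in Python. This is a sketch under assumptions: the helper names are hypothetical, the endpoint is the same `localhost:8080/query` used in the curl command, and the `processing_ns` field is located by searching the whole JSON response (mirroring the `grep`) rather than by assuming a fixed path.

```python
import json
import statistics
import urllib.request
from typing import Any, Callable, Optional

# Same query as in the curl command above.
QUERY = """
{
  var(func: eq(name@en,"Minority Report")) {
    d as initial_release_date
  }

  me(func: eq(name@en, "Steven Spielberg")) {
    name@en
    director.film @filter(ge(initial_release_date, val(d))) {
      initial_release_date
      name@en
    }
  }
}
"""


def find_processing_ns(node: Any) -> Optional[int]:
    """Depth-first search for the first "processing_ns" value in parsed JSON."""
    if isinstance(node, dict):
        if "processing_ns" in node:
            return node["processing_ns"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_processing_ns(child)
        if found is not None:
            return found
    return None


def post_query(url: str, query: str) -> str:
    """POST the raw query text, as `curl -XPOST -d` does, and return the body."""
    request = urllib.request.Request(url, data=query.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def mean_processing_ns(run_query: Callable[[], str], runs: int) -> float:
    """Call run_query `runs` times and average the reported processing_ns."""
    samples = []
    for _ in range(runs):
        value = find_processing_ns(json.loads(run_query()))
        if value is None:
            raise ValueError("response has no processing_ns field")
        samples.append(value)
    return float(statistics.mean(samples))


def main() -> None:
    # Requires a running Dgraph instance; the URL is taken from the curl command.
    url = "http://localhost:8080/query"
    print(mean_processing_ns(lambda: post_query(url, QUERY), runs=1000))
```

Calling `main()` once against each version gives one average per version. Passing the query runner in as a function also makes it possible to swap in a gRPC client without touching the averaging code.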


(Nikita Zaletov) #3

1.0.12 doesn’t have the LRU cache that 1.0.11 had; maybe that’s the cause.


(Manish R Jain) #4

We’ll have to verify these numbers, but at a general level, simpler queries might slow down a bit, while more complex queries speed up with these changes. The issue was a lot of contention. For simpler queries that’s not an issue, so they’d be limited by how fast data can be accessed off disk.

@dmai: Can you post what numbers you get on your desktop?

P.S. We have plans to write a much faster LFU cache based on research, and we’re going to post a blog about it today.

