Dgraph Design to improve query performance

tintinthong · October 23, 2020, 8:25am

Problem

Recently, my colleague and I have run into performance issues when querying Dgraph. We are curious whether we have taken the right approach to our Dgraph schema design.

enum Fruit {
   apple
   orange 
   banana
}

enum Color {
    red
    blue 
    green
    yellow 
}


type Person {
  id: String! @id
  favoriteColor: Color @search(by: [hash])
  favoriteFruit: Fruit @search(by: [hash])
  favoriteWord: String @search(by: [hash])
  age: Int! @search
  ...
  ...
  ...
}

{
  q1(func: eq(Person.favoriteFruit, "apple")) @filter(
    eq(Person.favoriteColor, "red") AND ge(Person.age, 10)
  ) {
    id
  }
}

As you can see, favoriteFruit and favoriteColor are predicates (not edges) on a node of type Person.
We have found this query to be really slow, and we think it is because Dgraph has to scan all Person nodes.
We hypothesize that we would benefit from a new schema design in which these predicates are replaced with edges. favoriteColor and favoriteFruit become edges (not predicates) that connect Person to new nodes, FruitNode and ColorNode. In this case, we expect the number of node scans to be reduced to only those Person nodes connected to the matching FruitNode and ColorNode.

type FruitNode {
    name: Fruit @search(by: [hash])
}

type ColorNode {
    name: Color @search(by: [hash])
}

enum Fruit {
   apple
   orange 
   banana
}

enum Color {
    red
    blue 
    green
    yellow 
}


type Person {
  id: String! @id
  favoriteColor: ColorNode
  favoriteFruit: FruitNode
  favoriteWord: String @search(by: [hash])
  age: Int! @search
  ...
  ...
  ...
}

{
  q1(func: ge(Person.age, 10)) @normalize {
    id
    Person.favoriteFruit @filter(eq(FruitNode.name, "apple")) {
      name
    }
    Person.favoriteColor @filter(eq(ColorNode.name, "red")) {
      name
    }
  }
}

Question

In the context of this problem, does replacing a predicate with an edge make sense as a way to improve query performance?
When should we define a field as a predicate and when as an edge (in other words, when should we use a node attribute and when a new node)?
As an aside, how do favoriteWord (a String) and favoriteColor (the enum Color) compare? Are there performance improvements from using enums rather than Strings?

Any input will be much appreciated. Thank you.

chewxy · October 27, 2020, 3:42am

I observe that your schema is in GraphQL but your queries are in DQL. Is that intended?

anand · October 27, 2020, 4:40am

Hi @tintinthong, I don't believe that Dgraph should be scanning all Person nodes, as the predicate has a hash index. This can be confirmed in the backend through Ratel.
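
To illustrate the point about the index, here is a toy Python sketch of why an indexed equality lookup does not need to visit every node. It is only a mental model under stated assumptions, not Dgraph's actual implementation: the sample data, `build_index`, `query_with_index` and `query_with_scan` are all hypothetical names made up for this example, and the age comparison is modeled as a plain check on the surviving candidates rather than as an index lookup.

```python
from collections import defaultdict

# Hypothetical sample data: uid -> attribute values (not the poster's dataset).
people = {
    1: {"favoriteFruit": "apple", "favoriteColor": "red", "age": 12},
    2: {"favoriteFruit": "apple", "favoriteColor": "blue", "age": 30},
    3: {"favoriteFruit": "banana", "favoriteColor": "red", "age": 8},
    4: {"favoriteFruit": "apple", "favoriteColor": "red", "age": 9},
    5: {"favoriteFruit": "apple", "favoriteColor": "red", "age": 41},
}


def build_index(nodes, predicate):
    """Map each value of `predicate` to the set of uids that hold it."""
    index = defaultdict(set)
    for uid, attrs in nodes.items():
        index[attrs[predicate]].add(uid)
    return index


# Built once, ahead of query time (a stand-in for an equality index).
fruit_index = build_index(people, "favoriteFruit")
color_index = build_index(people, "favoriteColor")


def query_with_index(fruit, color, min_age):
    """Look up candidates by value, intersect, then check age on survivors only."""
    candidates = fruit_index.get(fruit, set()) & color_index.get(color, set())
    return sorted(uid for uid in candidates if people[uid]["age"] >= min_age)


def query_with_scan(fruit, color, min_age):
    """Baseline for comparison: visit every node and test all three conditions."""
    return sorted(
        uid
        for uid, attrs in people.items()
        if attrs["favoriteFruit"] == fruit
        and attrs["favoriteColor"] == color
        and attrs["age"] >= min_age
    )


result = query_with_index("apple", "red", 10)
```

Both functions return the same uids, but the indexed version only touches the nodes in the intersection of the two lookups. If the real query behaves more like the scan version, that would suggest the index is not being used as expected, which is worth checking.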

Could you please share the dataset you are testing against? Also please let us know your cluster setup and we will try and reproduce this behavior.
