Dgraph Design to improve query performance

Problem

Recently, my colleague and I have run into some performance issues when querying Dgraph. We are wondering whether we have taken the right approach to our Dgraph design.

enum Fruit {
  apple
  orange
  banana
}

enum Color {
  red
  blue
  green
  yellow
}

type Person {
  id: String! @id
  favoriteColor: Color @search(by: [hash])
  favoriteFruit: Fruit @search(by: [hash])
  favoriteWord: String @search(by: [hash])
  age: Int! @search
  ...
  ...
  ...
}
{
  q1(func: eq(Person.favoriteFruit, "apple")) @filter(
    eq(Person.favoriteColor, "red") AND ge(Person.age, 10)
  ) {
    Person.id
  }
}
  • As you can see, favoriteFruit and favoriteColor are scalar predicates (not edges) of the Person node.
  • We have found this query to be very slow, and we suspect that is because Dgraph has to scan all Person nodes.
  • We hypothesize that a different schema design would help, so we plan to replace these predicates with edges. favoriteColor and favoriteFruit become edges (rather than scalar predicates) that point to new FruitNode and ColorNode nodes. We expect this to reduce the scan to only the Person nodes connected to the matching FruitNode and ColorNode.
type FruitNode {
  name: Fruit @search(by: [hash])
}

type ColorNode {
  name: Color @search(by: [hash])
}

enum Fruit {
  apple
  orange
  banana
}

enum Color {
  red
  blue
  green
  yellow
}

type Person {
  id: String! @id
  favoriteColor: ColorNode
  favoriteFruit: FruitNode
  favoriteWord: String @search(by: [hash])
  age: Int! @search
  ...
  ...
  ...
}
{
  q1(func: ge(Person.age, 10)) @cascade @normalize {
    id: Person.id
    Person.favoriteFruit @filter(eq(FruitNode.name, "apple")) {
      fruit: FruitNode.name
    }
    Person.favoriteColor @filter(eq(ColorNode.name, "red")) {
      color: ColorNode.name
    }
  }
}
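Before restructuring the schema, it may be worth checking how selective each indexed root function is on the actual data. The query below is a sketch that assumes the GraphQL-generated predicate names used above (`Person.favoriteFruit`, `Person.favoriteColor`, `Person.age`); the block names `apples`, `reds` and `ageTenPlus` are arbitrary labels. Each block only counts the UIDs matched by a single root function, so it shows how many Person nodes each condition would start from.

```dql
{
  # Each block counts the Person nodes matched by one indexed root function.
  apples(func: eq(Person.favoriteFruit, "apple")) {
    count(uid)
  }
  reds(func: eq(Person.favoriteColor, "red")) {
    count(uid)
  }
  ageTenPlus(func: ge(Person.age, 10)) {
    count(uid)
  }
}
```

As a rule of thumb, the condition with the smallest count is a natural candidate for the root `func:`, with the remaining conditions kept in `@filter`, as in the first query.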

Questions

  1. In the context of this problem, does replacing a predicate with an edge make sense as a way to improve query performance?
  2. When should we define a field as a predicate versus an edge (in other words, when should we use a node attribute versus a new node)?
  3. As an aside, how does a String field such as favoriteWord compare with an enum field such as favoriteColor (Color)? Is there a performance benefit to using enums rather than Strings?

Any input will be much appreciated. Thank you.


I observe that your schema is in GraphQL but your queries are in DQL. Is that intended?

Hi @tintinthong, I don't believe Dgraph should be scanning all Person nodes, as the predicate has a hash index. You can confirm this on the backend through Ratel.
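One way to check this directly is a DQL schema query. The sketch below assumes the GraphQL-generated predicate names from the schema above; it asks Dgraph for the type, index flag and tokenizers of each listed predicate, so a `hash` tokenizer on `Person.favoriteFruit` and `Person.favoriteColor` would confirm the index is in place. Listing `Person.favoriteWord` alongside them also shows how the String field and the enum fields are stored at the DQL level.

```dql
# Returns the DQL type, whether an index exists, and its tokenizers.
schema(pred: [Person.favoriteFruit, Person.favoriteColor, Person.favoriteWord, Person.age]) {
  type
  index
  tokenizer
}
```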

Could you please share the dataset you are testing against? Please also let us know your cluster setup, and we will try to reproduce this behavior.