Node capacity sizing for 100+M vertices

I’m experimenting with a cluster of two 8-core, 32GB RAM machines, with 1 Zero on machine A and 2 Alphas (one on each machine, each with a 9GB LRU cache). This is on version 1.2.1.

I prepared 1B edges with the bulk loader (shard=2), using a schema fragment like this:

type A {
  a
  link
}

type B {
  b
}

# predicate definitions (type blocks only list predicate names)
a: string @index(hash) .
b: string @index(hash) .
link: [uid] @reverse .

Type A has 55M nodes, and type B has 60M.
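
For context, the bulk load invocation was along these lines (a sketch: file names are illustrative, and I’m taking shard=2 to mean --reduce_shards=2, i.e. one output shard per Alpha group):

dgraph bulk -f data.rdf.gz -s schema.txt --reduce_shards=2 --zero=localhost:5080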

When I issue a simple query to count nodes per type, everything goes smoothly.
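
For instance, a count query along these lines (block names are illustrative) comes back quickly:

{
  countA(func: type(A)) {
    count(uid)
  }
  countB(func: type(B)) {
    count(uid)
  }
}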

The query below takes 10min when issued through an HTTP client (such as Postman) with a timeout=600s query-string argument:

{
  var(func: type(A)) {
    cnt as count(link)
  }
  
  a(func: type(A), orderdesc: val(cnt), first: 5) {
    uid
  }
}

If I add val(cnt) to the result block, it fails with an empty response, even after increasing the timeout argument.
As far as I can see in the logs, the Alpha holding the link predicate loses its connection to Zero and immediately reconnects.
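
For reference, the failing variant is the same query with the value exposed in the result block:

{
  var(func: type(A)) {
    cnt as count(link)
  }

  a(func: type(A), orderdesc: val(cnt), first: 5) {
    uid
    val(cnt)
  }
}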

What can I do to get this query working? Is this a lack of hardware, or just a configuration I’m missing?

Try “binding” the query blocks, e.g.:

{
  Ca as var(func: type(A)) {
    cnt as count(link)
  }
  
  a(func: uid(Ca), orderdesc: val(cnt), first: 5) {
    uid
  }
}

Example using the movie dataset we have on play.dgraph.io:

{
  # A as var(func: has(<director.film>)) {
  #   C as count(director.film)
  # }

  A as var(func: gt(count(director.film), 1)) {
    C as count(director.film)
  }

  q(func: uid(A), orderdesc: val(C), first: 10) {
    uid
    name@.
    count(director.film)
  }
}

Great tip @MichelDiz, thank you very much. My error was a missing count index. Now the query runs in under 2min.
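
For anyone finding this later, the fix was a one-line schema change adding the @count directive to the predicate (applied via the /alter endpoint):

link: [uid] @reverse @count .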