Performance issue in cluster


#1

I have set up a cluster with 3 Zeros (with the replicas option set to 3) and 3 Alpha servers in 1 group.

I just imported all of the data (around 150GB) using the bulk loader. Now I am trying to run a query based on some reverse edges to show recommendations. However, out of the three Alphas, one or two servers (seemingly at random) take a long time to respond (around 10s - 12s), while one server responds in less than 2s.

Please suggest how I can fix this issue in my setup.

client: pydgraph (latest)
protocol: gRPC
dgraph version: 1.0.16
systems info (all nodes):-
32 cores
128GB RAM
5TB hard disk


(Daniel Mai) #2

Do all the machines in the cluster have the same machine specs? The one thing that stands out to me is the hard disk. Queries can be slow due to longer disk seek latencies from HDDs instead of SSDs.

Can you share the query where you’re seeing this discrepancy in response time? It’s possible that the query could be slow due to not using an index or some other reason.
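
One way to narrow this down is to run the same query several times against each Alpha separately and compare the timings. Below is a minimal sketch of that, not a definitive benchmark. The helper names `time_query` and `summarize` are made up for illustration, and the commented pydgraph wiring uses placeholder addresses and a placeholder `QUERY` string; the pydgraph calls are as I understand them from its README, so check them against the client version you have installed.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `run_query` `repeats` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(per_alpha: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Reduce the raw timings to min / median / max per Alpha address."""
    return {
        addr: {
            "min": min(times),
            "median": statistics.median(times),
            "max": max(times),
        }
        for addr, times in per_alpha.items()
    }


# Hypothetical wiring with pydgraph (addresses and QUERY are placeholders):
#
#   import pydgraph
#   timings = {}
#   for addr in ["alpha1:9080", "alpha2:9080", "alpha3:9080"]:
#       stub = pydgraph.DgraphClientStub(addr)
#       client = pydgraph.DgraphClient(stub)
#       timings[addr] = time_query(lambda: client.txn(read_only=True).query(QUERY))
#       stub.close()
#   print(summarize(timings))
```

If the slow responses always come from the same Alpha, that points at that machine (disk, memory pressure); if the slow Alpha changes between runs, it is more likely something in the cluster state.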


#3
Do all the machines in the cluster have the same machine specs? 

Yes.

Queries can be slow due to longer disk seek latencies from HDDs instead of SSDs.

Yes, I know HDDs are slower than SSDs, but in that case all three servers should respond in about the same time, since they all have the same disks.

Query:-

{
  var(func: eq(jd_id, xxxxxxxxxxx)) {
    vlang as jd_lang
    duid as uid
    
    gflr as genre
    dflr as director
    cflr as cast_member
    sflr as screenwriter
    pflr as production_company
    aflr as award_received
  }
  details(func: uid(duid))  @normalize {
    mid: jd_id
    lbl: label
    hash: jd_hash
    desc: jd_desc
    rel_yr: jd_rel_yr
    main_img: jd_main_image
    thumb_img: jd_thumb_image
    rate: IMDb_average_rating
    
    gcnt as count(genre)
    dcnt as count(director)
    ccnt as count(cast_member)
    scnt as count(screenwriter)
    pcnt as count(production_company)
    acnt as count(award_received)
    trval as IMDb_average_rating
    relyr as jd_rel_yr
    
    norm as math(1)
    inorm as math(0.001)
    rval as math(trval+inorm)
      
    normIn as math(sqrt(gcnt+dcnt+ccnt+scnt+acnt+pcnt+rval))

    cast_member{
      ~cast_member @filter(eq(jd_online_flag, 1) AND eq(jd_lang, val(vlang))) {
        label
        jd_id
        cgcnt as count(genre @filter(uid(gflr)))
        cdcnt as count(director @filter(uid(dflr)))
        cccnt as count(cast_member @filter(uid(cflr)))
        cscnt as count(screenwriter @filter(uid(sflr)))
        cacnt as count(award_received @filter(uid(aflr)))
        cpcnt as count(production_company @filter(uid(pflr)))
        ctrval as IMDb_average_rating
        cnorm as math(0.001)
        crval as math(ctrval+cnorm)
        tcrel as jd_rel_yr
        czero as math(0)
        crel as math(czero+tcrel)
        tcrelyr as math(relyr/norm)

        crelyr as math(cond(tcrelyr >= crel, tcrelyr-crel, crel-tcrelyr))
        cnormIn as math(sqrt(cgcnt+cdcnt+cccnt+cscnt+cacnt+cpcnt+crval+crelyr))
          
        cscore as math( ((gcnt/normIn)*(cgcnt/cnormIn)) + ((dcnt/normIn)*(cdcnt/cnormIn)) + ((ccnt/normIn)*(cccnt/cnormIn)) + ((scnt/normIn)*(cscnt/cnormIn)) + ((acnt/normIn)*(cacnt/cnormIn)) + ((pcnt/normIn)*(cpcnt/cnormIn)) + ((rval/normIn)*(crval/cnormIn)) )
      }
    }

    director{
      ~director @filter(eq(jd_online_flag, 1) AND eq(jd_lang, val(vlang))){
        label
        jd_id
        dgcnt as count(genre @filter(uid(gflr)))
        ddcnt as count(director @filter(uid(dflr)))
        dccnt as count(cast_member @filter(uid(cflr)))
        dscnt as count(screenwriter @filter(uid(sflr)))
        dacnt as count(award_received @filter(uid(aflr)))
        dpcnt as count(production_company @filter(uid(pflr)))
        dtrval as IMDb_average_rating
        dnorm as math(0.001)
        drval as math(dtrval+dnorm)
        tdrel as jd_rel_yr
        dzero as math(0)
        drel as math(dzero+tdrel)
        tdrelyr as math(relyr/norm)

        drelyr as math(cond(tdrelyr >= drel, tdrelyr-drel, drel-tdrelyr))
        dnormIn as math(sqrt(dgcnt+ddcnt+dccnt+dscnt+dacnt+dpcnt+drval+drelyr))

        dscore as math( ((gcnt/normIn)*(dgcnt/dnormIn)) + ((dcnt/normIn)*(ddcnt/dnormIn)) + ((ccnt/normIn)*(dccnt/dnormIn)) + ((scnt/normIn)*(dscnt/dnormIn)) + ((acnt/normIn)*(dacnt/dnormIn)) + ((pcnt/normIn)*(dpcnt/dnormIn)) + ((rval/normIn)*(drval/dnormIn)) )
      }
    }

    genre{
      label
      ~genre @filter(eq(jd_online_flag, 1) AND eq(jd_lang, val(vlang))) (first: 50){
        jd_id
        label
        ggcnt as count(genre @filter(uid(gflr)))
        gdcnt as count(director @filter(uid(dflr)))
        gccnt as count(cast_member @filter(uid(cflr)))
        gscnt as count(screenwriter @filter(uid(sflr)))
        gacnt as count(award_received @filter(uid(aflr)))
        gpcnt as count(production_company @filter(uid(pflr)))
        gtrval as IMDb_average_rating
        gnorm as math(0.001)
        grval as math(gtrval+gnorm)
        tgrel as jd_rel_yr
        gzero as math(0)
        grel as math(gzero+tgrel)
        tgrelyr as math(relyr/norm)

        grelyr as math(cond(tgrelyr >= grel, tgrelyr-grel, grel-tgrelyr))
        gnormIn as math(sqrt(ggcnt+gdcnt+gccnt+gscnt+gacnt+gpcnt+grval+grelyr))

        gscore as math( ((gcnt/normIn)*(ggcnt/gnormIn)) + ((dcnt/normIn)*(gdcnt/gnormIn)) + ((ccnt/normIn)*(gccnt/gnormIn)) + ((scnt/normIn)*(gscnt/gnormIn)) + ((acnt/normIn)*(gacnt/gnormIn)) + ((pcnt/normIn)*(gpcnt/gnormIn)) + ((rval/normIn)*(grval/gnormIn)) )
      }
    }

    score as math(max(max(cscore, dscore), gscore))
  }
       

  similar(func: uid(score), orderdesc: val(score), first: 20)  @filter(NOT uid(duid)) {
    uid
    mid: jd_id
    lbl: label
    hash: jd_hash
    # desc: jd_desc
    rel_yr: jd_rel_yr
    # limg: logo_image
    # img: image
    main_img: jd_main_image
    thumb_img: jd_thumb_image
    rate: IMDb_average_rating
    score: val(score)
    dscore:val(dscore)
    gscore:val(gscore)
    # sscore:val(sscore)
    cscore:val(cscore)

  }

}

Indexes:-

genre: uid @reverse .
cast_member: uid @reverse .
director: uid @reverse .
award_received: uid @reverse .
screenwriter: uid @reverse .
production_company: uid @reverse .

jd_online_flag: int @index(int) .
jd_id: string @index(exact) .
jd_lang: string @index(exact) .
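
For reference, here is a small Python sketch of the per-candidate score that each of the three blocks in the query (cast_member, director, genre) computes, which may make the math easier to follow or to check offline. The function and argument names are made up for illustration; only the arithmetic is taken from the query above. The query then takes the maximum of the three block scores as the final score.

```python
import math
from typing import Dict

# The six edge counts used in the query.
FEATURES = (
    "genre",
    "director",
    "cast_member",
    "screenwriter",
    "award_received",
    "production_company",
)


def similarity_score(
    source_counts: Dict[str, int],
    overlap_counts: Dict[str, int],
    source_rating: float,
    candidate_rating: float,
    source_year: float,
    candidate_year: float,
    epsilon: float = 0.001,
) -> float:
    """Mirror of cscore / dscore / gscore for one candidate title.

    source_counts:  count(<edge>) on the source title (gcnt, dcnt, ...).
    overlap_counts: count(<edge> @filter(uid(...))) on the candidate,
                    i.e. how many targets it shares with the source.
    """
    rval = source_rating + epsilon                # rval in the query
    crval = candidate_rating + epsilon            # crval / drval / grval
    year_gap = abs(source_year - candidate_year)  # crelyr / drelyr / grelyr

    # normIn and cnormIn: square root of the summed terms, as in the query.
    norm_in = math.sqrt(sum(source_counts[f] for f in FEATURES) + rval)
    cand_norm = math.sqrt(
        sum(overlap_counts[f] for f in FEATURES) + crval + year_gap
    )

    # Six count products plus the rating product.
    score = sum(
        (source_counts[f] / norm_in) * (overlap_counts[f] / cand_norm)
        for f in FEATURES
    )
    score += (rval / norm_in) * (crval / cand_norm)
    return score
```

Note that the release-year gap only enters the candidate's norm, so a larger gap lowers the score without adding a term of its own.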

#4

Any suggestions to improve performance?

I tried with the Docker image, and initially it was working fine and returning responses in less than 1s. But after 2 weeks, the Alpha servers randomly take a long time to respond.

@dmai
I updated my last message with the query. Please check it and give me some suggestions to fix this issue.