Delete operation too slow

What I want to do

Delete all predicates/nodes of a type (Equipment). The operation is too slow with ~100,000 nodes, and I see "Context Cancelled" after 30 seconds.

What I did

upsert {
  query {
    equipmentType as var(func: type(Equipment)) @filter(eq(scopes, DMO)) {
      equipments as equipment.id
    }
  }

  mutation {
    delete {
      uid(equipmentType) * * .
      uid(equipments) * * .
    }
  }
}

Dgraph metadata

Dgraph version : v20.03.3
Dgraph SHA-256 : 08424035910be6b6720570427948bab8352a0b5a6d59a0d20c3ec5ed29533121
Commit SHA-1 : fa3c19120
Commit timestamp : 2020-06-02 16:47:25 -0700
Branch : HEAD
Go version : go1.14.1

You don’t need to do this twice. Stick to equipmentType; what you are doing is redundant.

Or change to

{
  var(func: type(Equipment)) @filter(eq(scopes, DMO)) {
    equipmentType as uid
  }
}

@MichelDiz, thanks, I will try to update the query and will share the results. But do you think the above query should be this slow (more than 30 seconds)? Do you know of any optimizations for this kind of bulk delete, and is it possible to see the query plan in Dgraph?

Nope, it shouldn’t, as far as I can tell.

It depends. What are your stats (resources)?

No, but you can try to understand how it works by reading this paper: Dgraph Whitepapers: In-Depth Insights and Analysis

Cheers.

My Setup - Single Node (with Alpha and Zero)

RAM - 8GB
CPU - 4 vCPU

Do you require any additional details?

Thanks,
DharmjitSingh

For that setup, I would recommend deleting in bulk batches of ~10k, or creating a multi-query upsert so that the query blocks run concurrently.

e.g.

upsert {
  query {
    var(func: type(Equipment)) @filter(eq(scopes, DMO) AND something1) {
      equipmentType1 as uid
    }
    var(func: type(Equipment)) @filter(eq(scopes, DMO) AND something2) {
      equipmentType2 as uid
    }
    var(func: type(Equipment)) @filter(eq(scopes, DMO) AND something3) {
      equipmentType3 as uid
    }
  }

  mutation {
    delete {
      uid(equipmentType1) * * .
      uid(equipmentType2) * * .
      uid(equipmentType3) * * .
    }
  }
}

Also, I would upgrade to the latest version and double the RAM if you are close to production. And run a trio cluster; 3 Alphas should be good enough. That’s my recommendation, but you can try to fit these resources to your usage. 8GB of RAM isn’t recommended for prod.

BTW

To limit your batches to 10k, you can do

var(func: type(Equipment), first: 10000)

Hello @MichelDiz, both queries (with and without batching) are still getting timed out (after 30 seconds), even after upgrading to the configuration below.

RAM - 16GB
Disk - SSD (IOPS limit: 1,000, IOPS burst limit: 10,000).

Batch Query

upsert {
  query {
    var(func: type(Equipment), first: 10000, offset: 0) @filter(eq(scopes, OFR)) {
      eq1 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 10000) @filter(eq(scopes, OFR)) {
      eq2 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 20000) @filter(eq(scopes, OFR)) {
      eq3 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 30000) @filter(eq(scopes, OFR)) {
      eq4 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 40000) @filter(eq(scopes, OFR)) {
      eq5 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 50000) @filter(eq(scopes, OFR)) {
      eq6 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 60000) @filter(eq(scopes, OFR)) {
      eq7 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 70000) @filter(eq(scopes, OFR)) {
      eq8 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 80000) @filter(eq(scopes, OFR)) {
      eq9 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 90000) @filter(eq(scopes, OFR)) {
      eq10 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 100000) @filter(eq(scopes, OFR)) {
      eq11 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 110000) @filter(eq(scopes, OFR)) {
      eq12 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 120000) @filter(eq(scopes, OFR)) {
      eq13 as uid
    }
    var(func: type(Equipment), first: 10000, offset: 130000) @filter(eq(scopes, OFR)) {
      eq14 as uid
    }
  }

  mutation {
    delete {
      uid(eq1) * * .
      uid(eq2) * * .
      uid(eq3) * * .
      uid(eq4) * * .
      uid(eq5) * * .
      uid(eq6) * * .
      uid(eq7) * * .
      uid(eq8) * * .
      uid(eq9) * * .
      uid(eq10) * * .
      uid(eq11) * * .
      uid(eq12) * * .
      uid(eq13) * * .
      uid(eq14) * * .
    }
  }
}

The only difference is that we do not observe any OOM errors, but the delete is still not successful. We have also set up monitoring; below is a screenshot for your reference.

Note that using offset is just as computationally heavy as asking for first = first + offset. Using after: as a paging mechanism is much faster. I believe what @MichelDiz was referring to when he suggested using a limit is repeated delete operations, each with a limit, not a bunch of them with limit and offset.
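
For illustration, here is a rough sketch of that paging idiom using the dgo v2 Go client (the localhost:9080 endpoint, the 30k page size, and the OFR scope value are placeholders, not necessarily your setup): the after: cursor for each page is simply the last uid returned by the previous page.

package main

import (
    "context"
    "encoding/json"
    "fmt"
    "log"

    "github.com/dgraph-io/dgo/v2"
    "github.com/dgraph-io/dgo/v2/protos/api"
    "google.golang.org/grpc"
)

func main() {
    conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

    after := "" // empty means: first page, no cursor yet
    for page := 1; ; page++ {
        cursor := ""
        if after != "" {
            cursor = ", after: " + after
        }
        // Seek past the previous page's last uid instead of re-walking it with an offset.
        q := fmt.Sprintf(`{
  page(func: type(Equipment), first: 30000%s) @filter(eq(scopes, OFR)) {
    uid
  }
}`, cursor)
        resp, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
        if err != nil {
            log.Fatal(err)
        }
        var out struct {
            Page []struct {
                Uid string `json:"uid"`
            } `json:"page"`
        }
        if err := json.Unmarshal(resp.GetJson(), &out); err != nil {
            log.Fatal(err)
        }
        if len(out.Page) == 0 {
            break // no matching nodes left
        }
        after = out.Page[len(out.Page)-1].Uid
        fmt.Printf("page %d: %d uids, next cursor: after: %s\n", page, len(out.Page), after)
    }
}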

Thanks @iluminae, I will try to create a batch query using first/after pagination, but what doesn’t make sense to me is that we only have about 131,000 nodes (10 predicates at most) and deletion is still taking this long. I am not sure whether there are some DB tuning parameters we should look at.

Thanks,

Yep, @dharmjit, send sequential delete operations. Choose an interval, something like a second between them. Using pagination is too complex for this, and you don’t reduce any weight in the operation. Don’t send one big operation, because Dgraph will push as far as it can get.
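
Something like this rough sketch with the dgo v2 Go client is what I mean (the localhost:9080 endpoint, the 10k batch size, the DMO scope value, and the one-second pause are just assumptions to adapt): each iteration runs one bounded delete upsert in its own transaction, then sleeps before the next one.

package main

import (
    "context"
    "fmt"
    "log"
    "strings"
    "time"

    "github.com/dgraph-io/dgo/v2"
    "github.com/dgraph-io/dgo/v2/protos/api"
    "google.golang.org/grpc"
)

// One bounded batch: collect up to 10k Equipment nodes in the DMO scope.
const batchQuery = `
query {
  var(func: type(Equipment), first: 10000) @filter(eq(scopes, DMO)) {
    batch as uid
  }
}`

// Probe: is anything left to delete?
const probeQuery = `
{
  remaining(func: type(Equipment), first: 1) @filter(eq(scopes, DMO)) {
    uid
  }
}`

func main() {
    conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
    ctx := context.Background()

    for n := 1; ; n++ {
        // Each batch is its own upsert transaction, committed immediately.
        _, err := dg.NewTxn().Do(ctx, &api.Request{
            Query: batchQuery,
            Mutations: []*api.Mutation{
                {DelNquads: []byte(`uid(batch) * * .`)},
            },
            CommitNow: true,
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("deleted batch", n)

        // Stop once no matching nodes remain.
        resp, err := dg.NewReadOnlyTxn().Query(ctx, probeQuery)
        if err != nil {
            log.Fatal(err)
        }
        if !strings.Contains(string(resp.GetJson()), `"uid"`) {
            break
        }
        time.Sleep(time.Second) // cooldown between batches
    }
}

Each batch commits on its own, so if one run fails you can simply rerun the loop; the deletes are idempotent.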

Hi @MichelDiz, then there might be cases where some deletes pass and some fail; we will not have the transaction guarantees of a single delete. I tried the query below with first/after and still did not get any performance improvement. Do you need some other data, like logs, to better debug this?

upsert {
  query {
    var(func: type(Equipment), first: 30000) @filter(eq(scopes, OFR)) {
      eq1 as uid
    }
    var(func: type(Equipment), first: 30000, after: 0x7c7a) @filter(eq(scopes, OFR)) {
      eq2 as uid
    }
    var(func: type(Equipment), first: 30000, after: 0xf1aa) @filter(eq(scopes, OFR)) {
      eq3 as uid
    }
    var(func: type(Equipment), first: 30000, after: 0x166da) @filter(eq(scopes, OFR)) {
      eq4 as uid
    }
    var(func: type(Equipment), first: 30000, after: 0x1dc0a) @filter(eq(scopes, OFR)) {
      eq5 as uid
    }
  }

  mutation {
    delete {
      uid(eq1) * * .
      uid(eq2) * * .
      uid(eq3) * * .
      uid(eq4) * * .
      uid(eq5) * * .
    }
  }
}

Thanks

Can you share the reason why you wanna do such a big transaction?
Are you gonna do this on a regular basis?

Didn’t the multiple blocks help even a bit?
When you say it is slow, slow compared to what other DB? Can you do a real-world comparison? That would help bring the core devs into action. We can’t do miracles with an unbalanced demand.

Increase your resources, add more Alphas, use faster disks, avoid pagination, and avoid big transactions by sending batches with a cooldown. This might help for your context of low resources.

Cheers.

Hi @MichelDiz,

It will not be a regular transaction; it will be done on a monthly basis. The data is not that big (1 lakh, i.e. 100,000, nodes)… in my mind at least, but maybe I am wrong.
We already have a single-Alpha VM with 8 vCPU, 16GB RAM, and an SSD disk. I am already following your recommendations (single query → batched (pagination with first/after) → sequential) and trying to fix this issue. Next, I will try 3 Alphas on a single node, as we are not seeing resource saturation in the Grafana dashboards.

Thanks

Hi @MichelDiz,

Thanks for the suggestions. Sequential batching did help bring down both resource consumption and run time by at least 50%. The total run time is now under 50 seconds.

Next, we are going to try multiple Alpha groups on the same node.

Thanks
