Maintaining Dgraph via batched deletes / TTL workaround

Hi team,

I'm interested to know whether anyone has good patterns for maintaining a large (terabyte-scale) graph, given that Dgraph has no built-in TTL support.

One idea I am POCing is to add a general-purpose created_date predicate to every node I want to expire, and then create a ttl type that lists every predicate I want deleted.
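Listing every predicate matters because a Dgraph `S * *` delete only removes the predicates that belong to the node's type(s). A minimal sketch of keeping that type in sync, using a hypothetical helper `build_ttl_schema`, follows. It assumes `created_date` is switched from the POC's string values to an indexed `datetime`, and that the caller supplies the full list of deletable predicates.

```python
# Hypothetical helper: generate DQL schema text for the "ttl" type.
# Assumption: created_date is stored as a datetime (the POC uses strings).

VALID_TOKENIZERS = ("year", "month", "day", "hour")


def build_ttl_schema(predicates, date_index="day"):
    """Return a DQL schema string declaring created_date and a ttl type.

    `uid(V) * * .` only deletes predicates that are part of the node's
    types, so every predicate that should expire must be listed here.
    """
    if date_index not in VALID_TOKENIZERS:
        raise ValueError(f"unsupported datetime tokenizer: {date_index}")
    # Index created_date so it can later be used at the query root.
    lines = [f"created_date: datetime @index({date_index}) ."]
    # Always include created_date once, followed by the caller's predicates.
    fields = ["created_date"] + [p for p in predicates if p != "created_date"]
    lines.append("type ttl {")
    lines.extend(f"  {field}" for field in fields)
    lines.append("}")
    return "\n".join(lines)


print(build_ttl_schema(["product", "customer"]))
```

Generating the type from one list means a newly added predicate cannot silently survive a `* *` delete because someone forgot to add it to the ttl type.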

This is my mutation (inspired by the thread "How to bulk delete nodes or cascade delete nodes?"):

upsert {
  query {
    # Select every node of type ttl whose created_date equals "2"
    V as var(func: type(ttl)) @filter(eq(created_date, "2"))
  }

  mutation {
    delete {
      uid(V) * * .
    }
  }
}

Running this deletes the "shag rug" product and the Danielle customer node (see the dataset below), which is what I want. I haven't thought much about edges yet, but I wanted to check whether this approach is reasonable and whether others have found other TTL workarounds.

Here is my dataset for reference. I'm just using integers for created_date for convenience. Thanks!

{
  set {
    _:p1 <product> "red toaster" .
    _:p1 <created_date> "1" .
    _:p1 <dgraph.type> "prodtype" .
    _:p1 <dgraph.type> "ttl" .
    _:p2 <product> "shag rug" .
    _:p2 <created_date> "2" .
    _:p2 <dgraph.type> "prodtype" .
    _:p2 <dgraph.type> "ttl" .
    _:p3 <product> "$10 gift card" .
    _:p3 <created_date> "3" .
    _:p3 <dgraph.type> "prodtype" .
    _:p3 <dgraph.type> "ttl" .

    _:c1 <customer> "Danielle" .
    _:c1 <dgraph.type> "ttl" .
    _:c1 <created_date> "2" .
    _:c2 <customer> "Joe" .
    _:c2 <dgraph.type> "ttl" .
    _:c2 <created_date> "3" .
  }
}

Hey, just checking whether anyone from the core team has thoughts on this. Graph databases 101 seems to be "don't let the graph grow too big", so I want to know what solutions people have seen before. Thanks!

Hi @damienburke, one consideration here is how many entries end up in each time bucket when the date is indexed. Please see this blog. The bottom line is that overall performance depends on the choice of index on the date and on the number of entries in the corresponding time buckets.
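Dgraph's `datetime` index offers `year`, `month`, `day` and `hour` tokenizers. The sketch below is a rough model of the bucket trade-off, not Dgraph's actual key format: it truncates each timestamp to the tokenizer's granularity and counts values per bucket. The sample data (one node every 30 minutes over two days) is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Simplified model: each tokenizer keeps a prefix of the timestamp, and all
# values sharing that prefix land in the same index bucket.
GRANULARITY = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
}


def bucket_sizes(timestamps, tokenizer):
    """Count how many values fall into each bucket for a tokenizer."""
    fmt = GRANULARITY[tokenizer]
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Hypothetical data: one node created every 30 minutes for two days.
stamps = [datetime(2020, 6, 1) + i * timedelta(minutes=30) for i in range(96)]

for tok in GRANULARITY:
    sizes = bucket_sizes(stamps, tok)
    print(tok, "buckets:", len(sizes), "largest bucket:", max(sizes.values()))
```

Coarse tokenizers put many values behind a single key, while fine ones spread the same values over many keys. Which trade-off is cheaper for a TTL sweep depends on data volume per period and on how far back each delete reaches.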


Hi @damienburke, here is another piece of feedback: you might want to start the root query with an appropriately indexed date rather than the type. Using the type at the root may match too many nodes, which hurts the performance of the delete query.
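Here is one way to combine that suggestion with batching. The sketch below, with the hypothetical helper `build_ttl_upsert`, generates an upsert that starts at `lt(created_date, ...)` and caps each run with `first`. It keeps `type(ttl)` as a filter. It assumes `created_date` is a `datetime` predicate with an index, which root functions such as `lt` require, and that the cutoff is a timezone-aware datetime.

```python
from datetime import datetime, timezone


def build_ttl_upsert(cutoff, batch_size=1000):
    """Return a DQL upsert deleting at most batch_size nodes older than cutoff.

    Hypothetical helper: the root uses the indexed created_date predicate
    instead of type(ttl), and `first` bounds the size of each transaction.
    """
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # RFC 3339 timestamp in UTC, e.g. 2020-06-01T00:00:00Z
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "upsert {\n"
        "  query {\n"
        f'    V as var(func: lt(created_date, "{stamp}"), first: {batch_size})\n'
        "      @filter(type(ttl))\n"
        "  }\n"
        "  mutation {\n"
        "    delete {\n"
        "      uid(V) * * .\n"
        "    }\n"
        "  }\n"
        "}"
    )


print(build_ttl_upsert(datetime(2020, 6, 1, tzinfo=timezone.utc), batch_size=500))
```

Each run removes at most `batch_size` expired nodes. A scheduled job could resubmit the same upsert through whichever Dgraph client it uses until a run matches nothing, which keeps individual transactions small on a terabyte-scale graph.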


@damienburke Yeah, this will work fine as expected.
