Maintaining Dgraph via batched deletes / ttl workaround

damienburke · July 16, 2020, 5:20pm

Hi team,

I'm interested to know whether anyone has good patterns for maintaining a large (terabytes) graph in lieu of built-in Dgraph TTL support.

One idea I am POCing is to have a general-purpose created_date predicate and add it to every node I want to delete. Then create a ttl type that has every predicate I would want to delete.
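A minimal schema sketch of that idea, for readers following along. It assumes created_date is stored as an int with an int index (an assumption; the thread only says ints are used for convenience) and that product and customer are plain string predicates, as in the data set in this post. The exact predicate types and index choice are not given in the thread.

```
# Assumed predicate definitions; types and the index are illustrative.
created_date: int @index(int) .
product: string .
customer: string .

# The ttl type lists every predicate that should go when a node expires.
type ttl {
  created_date
  product
  customer
}
```

Listing the predicates on the type matters because a wildcard delete of the form `uid(V) * * .` relies on the node's types to decide which predicates to remove.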

This is my mutation (inspired by "How to bulk delete nodes or cascade delete nodes?"):

upsert {
  query {
    V as var(func: type(ttl)) @filter(eq(created_date, "2"))
  }

  mutation {
    delete {
      uid(V) * * .
    }
  }
}

Running this deletes the “shag rug” product and the Danielle customer nodes (see the data set below), which is what I want. I have not thought about edges much yet, but I wanted to check whether this approach is reasonable and whether others have found other TTL workarounds.

Adding my data set here for reference. By the way, I'm just using ints for created_date for convenience. Thanks!

{
  set {
    _:p1 <product> "red toaster" .
    _:p1 <created_date> "1" .
    _:p1 <dgraph.type> "prodtype" .
    _:p1 <dgraph.type> "ttl" .
    _:p2 <product> "shag rug" .
    _:p2 <created_date> "2" .
    _:p2 <dgraph.type> "prodtype" .
    _:p2 <dgraph.type> "ttl" .
    _:p3 <product> "$10 gift card" .
    _:p3 <created_date> "3" .
    _:p3 <dgraph.type> "prodtype" .
    _:p3 <dgraph.type> "ttl" .

    _:c1 <customer> "Danielle" .
    _:c1 <dgraph.type> "ttl" .
    _:c1 <created_date> "2" .
    _:c2 <customer> "Joe" .
    _:c2 <dgraph.type> "ttl" .
    _:c2 <created_date> "3" .
  }
}

damienburke · July 28, 2020, 9:30am

Hey, just checking whether anyone from the core team has any thoughts on this. Graph databases 101 seems to be not letting the graph grow too big, so I want to check what solutions people have seen before. Thanks

anand · July 28, 2020, 10:30am

Hi @damienburke, one thing to consider here is how many entries end up in each time bucket when the date is indexed. Please see this blog. The bottom line is that overall performance depends on the choice of index on the date and on the number of entries in the corresponding time buckets.

anand · July 29, 2020, 5:05am

Hi @damienburke, here is another piece of feedback: you might want to start the root query with an appropriately indexed date rather than the type. Using the type at the root may match too many nodes, which hurts the performance of the delete query.
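A sketch of what that reordering could look like, reusing the upsert from the original post. It assumes created_date has an index that supports eq (for example an int index, which is an assumption about the schema), and it adds a `first: 1000` limit as one hypothetical way to keep each delete batch bounded. The batch size is illustrative, not a recommendation from the thread.

```
upsert {
  query {
    # Root on the indexed date, then narrow by type in the filter.
    # first: 1000 caps how many nodes one run deletes (assumed batch size).
    V as var(func: eq(created_date, "2"), first: 1000) @filter(type(ttl))
  }

  mutation {
    delete {
      uid(V) * * .
    }
  }
}
```

With a limit in place, the upsert would need to be re-run until the query matches no more nodes.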

balaji · July 29, 2020, 7:43am

@damienburke Yeah, this will work fine as expected.
