Using Upsert Query to Delete Old Nodes

Hi! I want to delete nodes older than one month on a daily basis.

They are not used to train my machine learning algorithm, and they take up too much space. I indexed the created_at predicate by day (I don't need hourly indexing, and I thought it would produce too many index entries). My data includes millions of nodes even for a single day, and I am not able to delete them.

The query and mutation I use in pydgraph:

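    from datetime import date, timedelta

    # created_at is indexed by day, so compare it against a date, not a number of days
    cutoff = (date.today() - timedelta(days=30)).isoformat()

    query = """{{
        nodes as var(func: lt(created_at, "{cutoff}")) @filter(eq(company, "XYZ"))
    }}""".format(cutoff=cutoff)
    # Delete N-Quad: remove all predicates of every node in the `nodes` variable
    mutation = """
        uid(nodes) * * .
    """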

Two examples of the errors I get:

  1. “NQuad count in the request: 1148066, is more that threshold: 1000000”
  2. “Deadline Exceeded” (timeout was 300)

I get the second error when there are fewer than 1M nodes, but the number is still high. What is the best solution for this? Pagination? If so, what is the optimal number of nodes to delete per batch?

Thank you for your responses.

You can set a higher N-Quad limit using the --mutations_nquad_limit flag:

    dgraph alpha -h | grep limit
     --mutations_nquad_limit uint       Limit for the maximum number of nquads that can be inserted in a mutation request (default 1000000)
     --normalize_node_limit uint        Limit for the maximum number of nodes that can be returned in a query that uses the normalize directive. (default 10000)
     --pending_proposals int            Number of pending mutation proposals. Useful for rate limiting. (default 256)
     --query_edge_limit uint            Limit for the maximum number of edges that can be returned in a query. This applies to shortest path and recursive queries. (default 1000000)

That said, pagination (deleting in smaller batches) is the best approach, and it's what I personally recommend.
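Deleting in pages means repeating a size-limited version of the upsert from the question until nothing older than the cutoff is left. Below is a minimal sketch with pydgraph. It assumes a connected pydgraph.DgraphClient is passed in and that cutoff is a date string like "2021-01-01". The helper name delete_old_nodes, the extra matched count block, and the defaults (10,000 nodes and a 60-second timeout per request) are illustrative choices to tune against your own data, not recommended values. Adding first: to the var block caps how many nodes each request touches, which keeps every request well under the N-Quad limit and shortens the work done before the deadline.

```python
import json


def delete_old_nodes(client, cutoff, company, batch_size=10_000, timeout=60):
    """Delete matching nodes in batches (hypothetical helper, not a pydgraph API).

    Assumes `client` is a connected pydgraph.DgraphClient, `cutoff` is a date
    string such as "2021-01-01", and `cutoff`/`company` are trusted values
    (they are interpolated directly into the DQL query).
    Returns the total number of nodes matched across all batches.
    """
    # `first:` caps the size of the `nodes` variable; the `matched` block
    # reports how many nodes this batch selected (the query runs before the mutation).
    query = f"""{{
    nodes as var(func: lt(created_at, "{cutoff}"), first: {batch_size}) @filter(eq(company, "{company}"))
    matched(func: uid(nodes)) {{
        count(uid)
    }}
}}"""
    total = 0
    while True:
        txn = client.txn()
        try:
            # Same delete N-Quad as in the question, applied to this batch only.
            mutation = txn.create_mutation(del_nquads="uid(nodes) * * .")
            request = txn.create_request(query=query, mutations=[mutation], commit_now=True)
            response = txn.do_request(request, timeout=timeout)
        finally:
            # Does nothing after a successful commit; otherwise discards the transaction.
            txn.discard()
        data = json.loads(response.json)
        matched = (data.get("matched") or [{}])[0].get("count", 0)
        total += matched
        # A partial (or empty) batch means no matching nodes remain,
        # assuming `first:` is applied after the filter.
        if matched < batch_size:
            return total
```

Each batch is committed as its own transaction, so if a run fails partway (for example with Deadline Exceeded), it can simply be rerun, possibly with a smaller batch_size.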

BTW, why is this question in the GraphQL category?

Cheers.


Thank you for the quick reply. I will try pagination and come back with the results later!

I thought this discussion would be about optimizing the query, which is why I chose the GraphQL category. Feel free to move it to the appropriate one.