Dgraph Alpha crashing when deleting a large amount of data

We are using Dgraph v22.0.2, with 3 Alphas and 3 Zeros deployed.

In this DB I have multiple projects, where each project is a node of the Project type.
Each project has around 15 other types of data connected to it through edges.
I am using the following upsert to delete a project:

upsert {
  query {
    var(func: uid(0x12b603d)) {
      p_uid as uid
      p1: ~project @filter(type(Party)) {
        party_uid as uid
        ~party { pm_uid as uid }
        location {
          loc_uid as uid
          state { state_uid as uid }
          country1 { count_uid as uid }
        }
      }
      p2: ~project @filter(NOT (type(Party))) { s_uid as uid }
    }
    var(func: uid(p_uid)) { Item as ~project @filter(type(Item)) }
    var(func: uid(Item)) @cascade { ~saleitem { sales_uid as uid } }
  }

  mutation {
    delete {
      uid(pm_uid) * * .
      uid(state_uid) * * .
      uid(count_uid) * * .
      uid(loc_uid) * * .
      uid(sales_uid) * * .
      uid(party_uid) * * .
      uid(s_uid) * * .
      uid(p_uid) * * .
    }
  }
}

When I run this query for multiple projects one after the other:

  • memory consumption increases significantly

  • the DB becomes unresponsive

  • one of the Alpha nodes goes into a crash loop and does not recover until we delete all the data from the infrastructure side.

Please let me know what we can do to solve this.

First, analyze the performance of the query. Because DQL queries are declarative and composable, you can run each sub-block of the query separately, and Dgraph will report the time taken (processing_ns, in nanoseconds) and a "metrics" section in the response extensions.
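For example, one inner block can be isolated and run as a standalone query. This is only a sketch reusing the uid and predicates from the query above; substitute your own:

    {
      parties(func: uid(0x12b603d)) {
        ~project @filter(type(Party)) {
          uid
        }
      }
    }

Compare the processing_ns and the uid counts reported for each sub-block to find the expensive part.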

If one element of this query is very slow, or touches a large number of uids or predicates, it can consume a lot of memory. In particular, if the query is slow and you launch many queries rapidly, the server will try to execute them in parallel, and each one uses memory. For example, if a query takes 1 second and you launch 100 per second, about 100 will be executing at any given instant.

Second, I would check the number of UIDs returned by the query part, to be sure the number of deletions performed in a single transaction is manageable.
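You can count the UIDs a variable would capture before running the delete. Again a sketch, reusing the uid and predicates from the query above:

    {
      var(func: uid(0x12b603d)) {
        Item as ~project @filter(type(Item))
      }
      itemCount(func: uid(Item)) {
        count(uid)
      }
    }

If the counts are very large, consider splitting the delete across several smaller transactions rather than one upsert.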