A pitfall of using sync pools in golang

As per the documentation of golang, a Pool is a set of temporary objects that may be individually saved and retrieved and it is concurrency safe. Its main purpose is to cache unused items for later reuse and thus taking away some burden from garbage collector. It has been proven that using pools rightly can improve performance by many folds.

But, a small mistake while using it can lead to a microscopic bug which can get super hard to find and debug. One such bug was recently found in the K-Shortest Paths query in dgraph.

The bug

The weird response of k-shortest paths was reported by an user in this post. A shortest path query was returning a path which was completely garbage. It didn’t contain the destination node as its last node and its length was random. But how did this bug stayed under our nose for so long? It is because this bug show up once in every 20k-30k queries and that too for some specific kind of graph and query.

What was causing this?

It was caused by mishandling of objects taken from the pool. Consider the following sample code.

type route struct {
  path   []int
  weight int 
}

type shortestRoute struct {
  routeVal route
  idx      int
}

var pathPool = sync.Pool{
  New: func() interface{} {
    return route{}
  },  
}

func main() {
  var kroutes []shortestRoute
  // create a shortestRoute object
  curRoute := route{
    path:   []int{1, 2, 3, 4},
    weight: 1,
  }
  curRoute.path = append(curRoute.path, 5)
  sroute := shortestRoute{
    idx:      1,
    routeVal: curRoute,
  }
  // append sroute to kroutes and put curRoute in pool
  kroutes = append(kroutes, sroute)
  pathPool.Put(curRoute)

  for i := 0; i < 2; i++ {
    // get newRoute from pool and if it's path can hold one more element
    // then use it otherwise create a new one
    newRoute := pathPool.Get().(route)
    if cap(newRoute.path) >= len(newRoute.path)+1 {
      newRoute.path = newRoute.path[:len(newRoute.path)+1]
    } else {
      newRoute = route{}
      for j := 1; j <= i+len(kroutes[len(kroutes)-1].routeVal.path); j++ {
        newRoute.path[j] = j
      }
    }

    // perform some operations on the path
    newRoute.path[len(newRoute.path)-1] = len(newRoute.path)
    newRoute.weight = i + 2
    
   // create a shortestRoute struct
    sroute := shortestRoute{}
    sroute.idx = i + 2
    sroute.routeVal = newRoute

    // append sroute in kroutes and put back newRoute in the pool
    kroutes = append(kroutes, sroute)
    pathPool.Put(newRoute)
  }

  fmt.Println(kroutes)
}

In the above code, we create curRoute, a route structure with the path as 1 - 2 - 3 - 4 - 5 and weight = 1 and then create a shortestRoute struct sroute using it. We then append sroute to the kroutes slice and put curRoute in the pool. Later in the for loop we reuse the item from pool if its capacity is sufficient to contain one more element otherwise we create a new route struct. And then append the next element in the newRoute.path and finally the new shortestRoute is appended in the kroutes.

We get the shortestRoute in kroutes finally as

{
  path: [1 2 3 4 5] 
  weight: 1
} 
{
  path: [1 2 3 4 5 6] 
  weight: 2
} 
{
  path: [1 2 3 4 5 6 7] 
  weight: 3
}
]

The above result is exactly what we wanted. Now, lets change the code a little by adding one more change in the newRoute.path, we make newRoute.path[0] = i + 100

newRoute.path[len(newRoute.path)-1] = len(newRoute.path)
newRoute.path[0] = i + 100

We want the result to be:

{
  path: [1 2 3 4 5] 
  weight: 1
} 
{
  path: [100 2 3 4 5 6] 
  weight: 2
} 
{
  path: [101 2 3 4 5 6 7] 
  weight: 3
}
]

but we actually get:

{
  path: [101 2 3 4 5] 
  weight: 1
} 
{
  path: [101 2 3 4 5 6] 
  weight: 2
} 
{
  path: [101 2 3 4 5 6 7] 
  weight: 3
}
]

We get the weight as expected but something get messed up with the path. This happened because the assignment sroute.routeVal = newRoute didn’t deep copy the whole []path but just the pointer (the underlying array for the slice remains the same). Although, the case is not the same with the weight. Any operations on newRoute.path is reflected on all the previous items appended in kroutes. The issue shows up rarely because we used the pools rarely (depending upon if the item we get from pool has sufficient capacity as required).

Take away

The take away for @core-devs is that we should be careful while assigning an item from pools to a variable and processing the variable later, specially with items like slices. This would help us keep the bugs away which can live under our noses for a long time. These bugs brings non-determinism in the response and hence causes it difficult to replicate and debug.

9 Likes

I’m assuming the issue is that an object sent back to Pool was accessed by the sender. We have lately been moving things away from sync.Pool. Instead, using z.Calloc and helpers, to do manual memory management. That identifies such issues really quickly. Because for an object which is deallocated, accessing it would quickly trigger a SEGFAULT, instead of just being a race condition.

1 Like

Yep. Using manual memory management would help to keep away these issues.

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.