Optimizing Your Allocations With Resource Pool Pattern

Data allocation and deallocation can represent a significant share of the time spent in a processing workflow. In this article, we will see how to address this issue with the Resource Pool pattern.

What is your problem?

Imagine we are building an image processing pipeline acting on a movie. All images have the same size and the same data type, say 4K RGBA 8-bit. We have a process function, taking an image as input and returning an image:

Image* process(Image* image)
{
  Image* denoisedImage = new Image(image->size);
  denoize(image, denoisedImage);
 
  Image* normalizedImage = new Image(image->size);
  normalize(denoisedImage, normalizedImage);

  Image* finalImage = new Image(image->size);
  filter(normalizedImage, finalImage);

  delete denoisedImage;
  delete normalizedImage;

  return finalImage;
}

I’m using old-school new/delete to clearly show allocations and deallocations; in a real application, you should use smart pointers. If we apply our processing to every image of a given movie, each frame’s timeline looks the same: allocate, process, deallocate, over and over.

Allocating CPU memory is really fast on today’s computers. But in some cases, for example if your images are CUDA objects, or if your constructor has to initialize some non-trivial resources, allocation/deallocation can become significant.

Ok, what do you suggest?

Let’s implement an ImagePool class that keeps deallocated objects in a resource pool, ready to be reused by the next allocation:

class ImagePool
{
public:
  Image* get(Vec2 size)
  {
    Image* image = findImage(size);
    if ( image != nullptr )
    {
      // Recycle a free image: remove it from the pool and hand it out.
      m_freeImage.remove(image);
      return image;
    }

    return new Image(size);
  }

  void release(Image* image)
  {
    m_freeImage.push_back(image);
  }

  ~ImagePool()
  {
    // Images are actually deleted only when the pool is destroyed
    std::for_each(m_freeImage.begin(),
                  m_freeImage.end(),
                  std::default_delete<Image>());
  }

private:
  Image* findImage(Vec2 size)
  {
    for ( auto image : m_freeImage )
      if ( image->size == size )
        return image;

    return nullptr;
  }

  std::list<Image*> m_freeImage;
};

And use this ImagePool in our process pipeline:

ImagePool imagePool;

Image* process(Image* image)
{
  Image* denoisedImage = imagePool.get(image->size);
  denoize(image, denoisedImage);

  Image* normalizedImage = imagePool.get(image->size);
  normalize(denoisedImage, normalizedImage);

  Image* finalImage = imagePool.get(image->size);
  filter(normalizedImage, finalImage);

  imagePool.release(denoisedImage);
  imagePool.release(normalizedImage);

  return finalImage;
}

Let’s look at the processing timeline for the first image.

Nothing really changes: we still allocate images. We no longer call delete at the end, so we may gain a little here depending on deallocation cost. But let’s see what happens for the following frames.

get no longer needs to allocate resources: it can directly return a pre-allocated image from its m_freeImage list, assuming searching for a free resource is faster than allocating a new one.

A better implementation using RAII

The previous implementation has a big drawback: it requires manually calling the release method, which is error-prone and can lead to leaks. This can be solved with the RAII pattern.

class ImagePool
{
public:
  std::shared_ptr<Image> get(Vec2 size)
  {
    Image* image = findImage(size);
    if ( image != nullptr )
      m_freeImage.remove(image);
    else
      image = new Image(size);

    // Just an obscure syntax meaning ImagePool::release() should be called
    // instead of delete when the Image object is destroyed.
    return std::shared_ptr<Image>(image,
                                  std::bind(&ImagePool::release,
                                            this,
                                            std::placeholders::_1));
  }

  void release(Image* image)
  {
    m_freeImage.push_back(image);
  }

  ~ImagePool()
  {
    // Images are actually deleted only when the pool is destroyed
    std::for_each(m_freeImage.begin(),
                  m_freeImage.end(),
                  std::default_delete<Image>());
  }

private:
  Image* findImage(Vec2 size)
  {
    for ( auto image : m_freeImage )
      if ( image->size == size )
        return image;

    return nullptr;
  }

  std::list<Image*> m_freeImage;
};

Instead of returning a naked Image* pointer, we return a std::shared_ptr which automatically calls ImagePool::release() when the Image object is no longer referenced.

Tracking resources

Another big advantage of the Resource Pool pattern is the ability to track the amount of allocated resources and limit their usage. This is very useful to implement a “MaxCacheMemory” mechanism and avoid saturating memory.

class ImagePool
{
public:
  // […]
  std::shared_ptr<Image> get(Vec2 size)
  {
    Image* image = findImage(size);
    if ( image != nullptr )
    {
      m_freeImage.remove(image);
    }
    else
    {
      // Increase the currently allocated memory.
      // * 4 because we use RGBA8 images in this example, but you should use
      // your own data type.
      size_t currentMemory = m_currentMemory + size.width() * size.height() * 4;
      if ( currentMemory > m_maxMemory )
        throw std::bad_alloc();
      m_currentMemory = currentMemory;

      image = new Image(size);
    }

    // ImagePool::release() is called instead of delete when the Image
    // object is destroyed.
    return std::shared_ptr<Image>(image,
                                  std::bind(&ImagePool::release,
                                            this,
                                            std::placeholders::_1));
  }
  // […]

private:
  // Max allowed memory. Could be specified in the ctor. Initialized to 4 GB.
  const size_t m_maxMemory = size_t(4) << 30;
  size_t m_currentMemory = 0;
  std::list<Image*> m_freeImage;
};

Conclusion

The Resource Pool pattern should not be used a priori: first verify that the time spent in allocation is significant compared to processing time. To do so, you can use profiling tools like NVIDIA Nsight or Optick. Keep in mind that a resource pool adds an extra layer of complexity and can generate tricky low-level bugs in your application, so it has to be worth the cost.

I used this pattern once in an image processing workflow doing FFT, denoising, binning, etc. with CUDA on 4K images, and the gain was about 20%. Not magical, but still significant.
