Optimizing Your Allocations With the Resource Pool Pattern
Data allocation and deallocation may represent a significant amount of time in a processing workflow. In this article, we will see how to address this issue with the Resource Pool pattern.
What is your problem?
Imagine we build an image processing pipeline acting on a movie. All images have the same size and the same data type, e.g. 4K RGBA 8-bit. So you have a process function, taking an image as input and returning an image:
Image* process(Image* image)
{
    Image* denoisedImage = new Image(image->size);
    denoize(image, denoisedImage);

    Image* normalizedImage = new Image(image->size);
    normalize(denoisedImage, normalizedImage);

    Image* finalImage = new Image(image->size);
    filter(normalizedImage, finalImage);

    delete denoisedImage;
    delete normalizedImage;
    return finalImage;
}
I’m using old-school new/delete to clearly show allocations and deallocations. In a real application, you should use smart pointers. If we apply our processing to all the images of a given movie, we end up with this kind of timeline:
Allocating CPU memory is really fast on today’s computers. But in some cases, like if your images are CUDA objects, or if your constructor has to initialize some non-trivial resources, allocation/deallocation can become significant.
OK, what do you suggest?
Let’s implement an ImagePool class that keeps deallocated objects in a resource pool, ready to be reused by the next allocation:
class ImagePool
{
public:
    Image* get(Vec2 size)
    {
        Image* image = findImage(size);
        if ( image != nullptr )
            return image;
        return new Image(size);
    }

    void release(Image* image)
    {
        m_freeImage.push_back(image);
    }

    ~ImagePool()
    {
        // Images are actually deleted only when the pool is destroyed
        for ( Image* image : m_freeImage )
            delete image;
    }

private:
    // Removes and returns a free image of the requested size, or nullptr
    Image* findImage(Vec2 size);

    std::list<Image*> m_freeImage;
};
Let’s have a look at the processing timeline for the first image:
Nothing really changes. We still allocate images. We no longer call delete at the end, so we may gain a little bit here, depending on the deallocation cost. But let’s see what happens for the following frames:
The get method no longer needs to allocate resources: it can directly return a pre-allocated resource from its m_freeImage list, assuming that finding a free resource is faster than allocating a new one.
A better implementation using RAII
The previous implementation has a big drawback: it requires manually calling the release method, which is prone to leaks. This can be solved by using the RAII pattern.
class ImagePool
{
public:
    std::shared_ptr<Image> get(Vec2 size)
    {
        Image* image = findImage(size);
        if ( image == nullptr )
            image = new Image(size);
        // Just an obscure syntax meaning ImagePool::release() should be called
        // instead of delete when the Image object is destroyed.
        return std::shared_ptr<Image>(image,
                                      std::bind(&ImagePool::release,
                                                this,
                                                std::placeholders::_1));
    }

    void release(Image* image)
    {
        m_freeImage.push_back(image);
    }

    ~ImagePool()
    {
        // Images are actually deleted only when the pool is destroyed
        std::for_each(m_freeImage.begin(),
                      m_freeImage.end(),
                      std::default_delete<Image>());
    }

private:
    Image* findImage(Vec2 size)
    {
        for ( auto it = m_freeImage.begin(); it != m_freeImage.end(); ++it )
        {
            if ( (*it)->size == size )
            {
                Image* image = *it;
                m_freeImage.erase(it); // no longer free: remove it from the list
                return image;
            }
        }
        return nullptr;
    }

    std::list<Image*> m_freeImage;
};
Instead of returning a naked Image* pointer, we return a std::shared_ptr which automatically calls ImagePool::release() when the Image object is no longer referenced.
Tracking resources
Another big advantage of the Resource Pool pattern is the ability to track the amount of allocated resources and limit their usage. This is very useful for implementing a “MaxCacheMemory” mechanism and avoiding memory saturation.
class ImagePool
{
public:
    // [...]
    std::shared_ptr<Image> get(Vec2 size)
    {
        Image* image = findImage(size);
        if ( image == nullptr )
        {
            // Increase current allocated memory.
            // * 4 because we use RGBA8 images in this example, but you should
            // use your actual pixel size. Comparing m_currentMemory against
            // m_maxMemory is where a "MaxCacheMemory" policy would kick in.
            m_currentMemory += size_t(size.x) * size.y * 4;
            image = new Image(size);
        }
        // Just an obscure syntax meaning ImagePool::release() should be called
        // instead of delete when the Image object is destroyed.
        return std::shared_ptr<Image>(image,
                                      std::bind(&ImagePool::release,
                                                this,
                                                std::placeholders::_1));
    }
    // [...]

private:
    // Max allowed memory. Can be specified in the ctor. Initialized to 4 GiB.
    const size_t m_maxMemory = size_t(4) << 30;
    size_t m_currentMemory = 0;
    std::list<Image*> m_freeImage;
};
Conclusion
The Resource Pool pattern should not be used “a priori”. You should first verify that the time spent in allocation is significant relative to the processing time. To do so, you can use profiling tools like NVIDIA Nsight or Optick. Keep in mind that a resource pool adds an extra layer of complexity and can introduce tricky low-level bugs in your application, so it has to be worth the cost.
I used this pattern once in an image processing workflow doing FFT, denoising, binning, etc. with CUDA on 4K images, and the gain was about 20%. Not magical, but still significant.