Allocate from Memory Pool



Memory allocation and deallocation through the conventional malloc / free (or their C++ equivalents, operator new and delete) seem innocuous at a cursory glance. The overhead they incur is often overlooked until their sluggishness surfaces as the system grows and performance begins to degrade sharply. Guess that's down to the prevalent tenet amongst us - don't bother about something until there is a problem with it. And guess that ain't limited to software ;)! Enough ado.

There is a view that certain implementations of memory allocation and deallocation involve switching execution context to kernel mode, which, when done very frequently, can cause performance problems. Even otherwise, the amount of book-keeping performed by heap memory-managers is considerable. This characteristic becomes very evident when a huge number of small objects are allocated and deallocated frequently. Besides contributing to fragmentation, the per-object book-keeping overhead accrues and leaves a much lower percentage of the heap space available for allocation by the program.

Systems involving frequent allocation and deallocation of a huge number of objects often resort to some form of memory pooling / chunking in an attempt to improve performance. It is much easier to build an efficient memory-manager that is aware of the context of the objects it allocates than a generic one, such as those behind the standard C/C++ allocation / deallocation functions. Further, since C/C++ memory managers do not facilitate compacting and the like, most 24x7 systems resort to custom memory-managers tuned to operate well with a priori knowledge of their usage characteristics.
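To make the pooling idea concrete before we build the real thing, here is a minimal free-list sketch (names are illustrative; this is not the alloc4mpool class developed later). Released objects are threaded onto a singly linked list and handed back on the next allocation, so the system heap is touched only when the free list is empty:

```cpp
#include <cstddef>
#include <new>

// Minimal fixed-size pool sketch: freed slots are chained into a
// free list and reused before asking the system heap again.
// Note: this sketch never returns memory to the system.
template <typename T>
class SimplePool
{
    // A slot is either a live T or a link in the free list.
    union Node { Node *next; alignas(T) unsigned char storage[sizeof(T)]; };
    Node *freeList;

public:
    SimplePool() : freeList(0) {}

    T* allocate()
    {
        if (freeList)                               // reuse a released slot
        {
            Node *n = freeList;
            freeList = n->next;
            return reinterpret_cast<T*>(n);
        }
        return reinterpret_cast<T*>(new Node);      // fall back to the heap
    }

    void deallocate(T *p)                           // push slot onto free list
    {
        Node *n = reinterpret_cast<Node*>(p);
        n->next = freeList;
        freeList = n;
    }
};
```

A deallocate immediately followed by an allocate costs two pointer moves and no heap traffic, which is exactly the common case the allocator below is optimized for.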

Allocator characteristics

Now that we have laid out that the custom memory manager is driven by its usage context, let's get a little more specific. In a good many situations, most data-structures representing some piece of information have a fixed size - unless, say, the data-structure employs a zero-sized array at its tail. Since fixed-size data-structures are both prevalent and much more tenable to manage, we shall concentrate on them.

Here is a summary of the objectives of our custom allocator:

  • Congruent & compatible with STL's std::allocator - This enables existing C++ applications that use the STL to shift to our custom memory-manager without changing much of their code, while making it easy for the developer to switch back to the default allocator (or to another one) just as easily.
  • Minimize calls to new & delete - This shall be achieved by retaining recently released objects and reusing them for subsequent allocations, without involving the system's memory manager.
  • Should let the programmer specify the block size - This is the size in increments of which our allocator reserves and releases memory from the system. It enables the developer to tweak the block size for optimal performance, depending on how many objects fit into a single block and other allocation patterns.
  • Design should facilitate compacting - It should be possible for the allocator design to accommodate compacting, i.e. freeing blocks which are entirely unused. Moving objects during compaction is not a goal, as C++ applications often make generous use of pointers in lieu of handles for performance reasons.
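The first objective - drop-in compatibility with std::allocator - is worth a quick illustration. The sketch below is a hypothetical, deliberately minimal allocator (it only counts heap hits; a real pool would recycle memory) just to show how any std::allocator-compatible class plugs into an STL container:

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical std::allocator-compatible allocator, for illustration
// only: it counts how often the system heap is touched.
template <typename T>
struct CountingAlloc
{
    typedef T value_type;
    static std::size_t calls;           // number of system-heap allocations

    CountingAlloc() {}
    template <typename U> CountingAlloc(const CountingAlloc<U>&) {}

    T* allocate(std::size_t n)
    {
        ++calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T *p, std::size_t) { ::operator delete(p); }
};
template <typename T> std::size_t CountingAlloc<T>::calls = 0;

template <typename T, typename U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }
```

Usage is just std::vector<int, CountingAlloc<int> > - the container's code is untouched, which is exactly the migration path the first objective is after.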


Before we get into the implementation details of our allocator, I propose this handy extension to STL's std::allocator. One piece of functionality I realized would have been handy in the allocator class is the ability to verify whether a given pointer is a valid allocated object. Since we are dealing with a custom allocator for user-defined types, this is quite feasible and comes in handy while debugging stray pointers and the like. The following code, with the isvalid function, is self-explanatory.

#include <memory>

namespace xstd
{
    template <typename type>
    class allocator
        : public std::allocator<type>
    {
    public:
        virtual bool isvalid(const type*) const { return true; }
    };
}

The idea is to replace the use of std::allocator with xstd::allocator in user code. This does not in any way prevent the STL containers from defaulting to std::allocator. Since our allocator will be derived from xstd::allocator (maintaining compatibility with std::allocator), the user only has to switch back to xstd::allocator in case his code makes use of the isvalid function.

The isvalid function is hard-coded to return true, since std::allocator doesn't provide any such functionality and the code assumes that the pointers are valid. This comes in handy when the user would like to switch back from the custom memory-allocator. Note: this class is only provided for convenience and can very well be bypassed.
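For instance, debug code can guard pointer use with isvalid; against the base xstd::allocator the check degenerates to "always valid" and passes harmlessly, while against the pool allocator developed below it would catch stray pointers. A self-contained sketch (the xstd::allocator from above is repeated here, and Order is a hypothetical user type):

```cpp
#include <memory>

namespace xstd
{
    // The extension class from above, repeated so this sketch compiles alone.
    template <typename type>
    class allocator : public std::allocator<type>
    {
    public:
        virtual bool isvalid(const type*) const { return true; }
    };
}

struct Order { int id; };   // hypothetical user-defined type

// Debug-time guard: callers can assert on this before dereferencing.
bool checkedIsValid(const xstd::allocator<Order>& a, const Order *p)
{
    return a.isvalid(p);
}
```

Because isvalid is virtual, the same call site works unchanged whether the object behind the reference is the trivial base class or a pool allocator that really tracks its blocks.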

Design of our memory-pool based allocator

A quick dip into how our allocator is designed. You might want to skim through my article from the IPOP Series - Low Cost Re-Sizable Array - as the allocator design is an adaptation of the same; the explanation that follows assumes familiarity with it to minimize redundancy. The following text shall refer to our new allocator as alloc4mpool. The crux of the class is presented below; the full definition is available here - alloc4mpool.h.html.

#include <cassert>
#include <memory>
#include <iostream>
#include "xallocator.h"

namespace xstd
{

#define K(x) ((x) << 10)    // x KB expressed in bytes

template<typename type, size_t blockSize = 64, size_t tableSize = 32>
class alloc4mpool
    : protected xstd::allocator<type>
{
    // types inherited from std::allocator<type>
    typedef typename std::allocator<type>::pointer         pointer;
    typedef typename std::allocator<type>::const_pointer   const_pointer;
    typedef typename std::allocator<type>::reference       reference;
    typedef typename std::allocator<type>::const_reference const_reference;
    typedef typename std::allocator<type>::size_type       size_type;

    struct Block
    {
        enum
        {
            // a group of 8 elements costs 8*sizeof(type) data bytes + 1 mask byte
            elemsPerBlock8 = K(blockSize) / ( (sizeof(type)<<3) + 1 ),
            delta          = K(blockSize) - ( (sizeof(type)<<3) + 1 ) * elemsPerBlock8,
            elemsExtra     = (delta >= (sizeof(type) + 1)) ? (delta - 1) / sizeof(type) : 0,

            elemsPerBlock  = (elemsPerBlock8<<3) + elemsExtra,
            bitMaskLength  = K(blockSize) - (elemsPerBlock * sizeof(type))
        };

        static const float maskLength;

        friend class alloc4mpool;

        Block ();
        bool  isFull ();
        type* PData  ();

        long  ElemsPerBlock () const;
        ulong BitMaskLength () const;

        // byte / ulong come from "xallocator.h"
        byte pData  [ elemsPerBlock * sizeof(type) ];
        byte bitMask[ bitMaskLength ];
    };

    size_t numBlocks;
    Block **ppBlockTable;

public:
    alloc4mpool ();

    pointer       address (reference _val)       const { return &_val; }
    const_pointer address (const_reference _val) const { return &_val; }

    pointer allocate   (size_type _cnt = 1, const void *pHint = 0);
    void    construct  (pointer _ptr, const type& _val);
    void    deallocate (pointer _ptr, size_type _cnt = 1);
    void    destroy    (pointer _ptr);

    bool isvalid (const type*);
    void flush   ();
};

}
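It is worth checking that the sizing arithmetic in the Block enum actually works out. The sketch below re-derives the constants outside the template for a hypothetical 16-byte element type and the default 64 KB block (K(64) == 65536 bytes); the numbers in the asserts follow from that assumption:

```cpp
#include <cstddef>

// Re-derivation of the Block<> sizing arithmetic, parameterized so it
// can be checked for any (sizeof(type), block size) pair.
struct BlockStats { std::size_t elemsPerBlock, bitMaskLength; };

BlockStats blockStats(std::size_t typeSize, std::size_t blockBytes)
{
    // a group of 8 elements costs 8*typeSize data bytes + 1 bitmask byte
    std::size_t groupCost      = (typeSize << 3) + 1;
    std::size_t elemsPerBlock8 = blockBytes / groupCost;
    std::size_t delta          = blockBytes - groupCost * elemsPerBlock8;
    std::size_t elemsExtra     =
        (delta >= typeSize + 1) ? (delta - 1) / typeSize : 0;

    BlockStats s;
    s.elemsPerBlock = (elemsPerBlock8 << 3) + elemsExtra;
    s.bitMaskLength = blockBytes - s.elemsPerBlock * typeSize;
    return s;
}
```

For typeSize = 16 this gives 508 groups of 8, no extra elements (delta is only 4 bytes), hence 4064 elements per block with 512 bytes left over for the bitmask - comfortably more than the 508 bytes of mask bits actually needed, so every element is covered.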

As in the case with rsArray, the root table (alloc4mpool::ppBlockTable) is an array of pointers. But unlike in rsArray, here the table entries point to a 'block' rather than just an array of N elements. A 'block' is a segment of memory which can hold Block::elemsPerBlock elements of that type (Block::pData), plus information as to which of them have been allocated and which are unused (Block::bitMask). The alloc4mpool::isvalid function thus verifies whether the specified pointer belongs to one of the blocks and is currently allocated (i.e. has not been freed).
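The membership test behind isvalid can be sketched for a single block (names here are illustrative, not the actual alloc4mpool internals): the pointer must lie inside the block's data area, sit exactly on an element boundary, and have its bit set in the mask:

```cpp
#include <cstddef>

// One-block sketch of the isvalid() membership test.
// 'base' is the block's data area, 'bitMask' marks allocated slots.
bool isValidInBlock(const unsigned char *base, std::size_t elemSize,
                    std::size_t elemsPerBlock, const unsigned char *bitMask,
                    const void *p)
{
    const unsigned char *q = static_cast<const unsigned char*>(p);
    if (q < base || q >= base + elemSize * elemsPerBlock)
        return false;                           // not inside this block
    std::size_t off = static_cast<std::size_t>(q - base);
    if (off % elemSize != 0)
        return false;                           // not on an element boundary
    std::size_t idx = off / elemSize;
    return ((bitMask[idx >> 3] >> (idx & 7)) & 1) != 0;   // slot in use?
}
```

The full allocator simply runs this test against each entry of ppBlockTable until a block claims the pointer or the table is exhausted.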

Another piece of functionality associated with custom memory managers is preventing memory leaks. This is not a big issue in batch programs but has serious repercussions in a 24x7 system. The current implementation of flush merely releases all the objects and blocks, as it is used in a batch program. But this is the foundation around which functionality such as compacting can be built. In fact, this infrastructure can easily be extended so that the application can even reclaim lost objects, or perform garbage collection at certain points, without the overhead of a generic garbage-collection mechanism ;). With this allocator it is also possible to profile the memory-usage characteristics of the system with respect to objects.
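The compacting extension mentioned above amounts to walking the block table and releasing only those blocks whose bitmask shows no live elements. A sketch under assumed (hypothetical) names - the real Block type carries more state than this:

```cpp
#include <cstddef>

// Stand-in for a block: only the allocation bitmask matters here.
struct FakeBlock { unsigned char bitMask[4]; };

// A block is reclaimable when no bit in its mask is set.
bool blockIsEmpty(const FakeBlock *b)
{
    for (std::size_t i = 0; i < sizeof(b->bitMask); ++i)
        if (b->bitMask[i] != 0)
            return false;
    return true;
}

// Frees empty blocks, nulls their table slots, returns blocks freed.
// Live objects are never moved, matching the stated design goal.
std::size_t compact(FakeBlock **table, std::size_t numBlocks)
{
    std::size_t freed = 0;
    for (std::size_t i = 0; i < numBlocks; ++i)
        if (table[i] && blockIsEmpty(table[i]))
        {
            delete table[i];
            table[i] = 0;
            ++freed;
        }
    return freed;
}
```

Because no object ever moves, existing pointers into the surviving blocks stay valid - which is why compaction of this limited form fits the allocator's no-handles design.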


The alloc4mpool allocator shown here is a handy replacement for the standard allocation / deallocation routines, esp. when objects of a specific type are frequently allocated and deallocated. This implementation is limited since it is being used in a batch-processing system, but it lays a good foundation for scaling to the needs of a 24x7 system whilst incurring minimal overheads.