Fixed Memory Pool

Contains the following:

The static_fixed_memory_pool manages an efficient heap of fixed-size objects. Since all instances of a class are the same size, this is ideal for implementing a class-specific operator new and operator delete.

General

These classes are specially designed for use as class-specific memory managers, especially for primitive types used deep inside the system (such as the smart pointerís internals). To this end, it must be usable at any time, even before Classics' static variables have constructed, even before Classicsí DllMain has been called.

Because it will appear as a static member of such a class, it canít be set up using a constructor because that would mean the heap could not be used before its own static constructor is called. So there is no constructor at all, and the implementation relies on all-zero initialization of static objects at image load time. That is why itís named static_fixed_memory_pool, to emphasize that itís designed for use as a static member.

This affects the destructor too. Make a section explaining this in detail.

In addition, the class provides some minimal support for auditing and debugging memory leaks, without adding significantly to the overhead or requiring a special debug version.

[One Way!] As mentioned above, there is no constructor. An instance must be declared as a static variable or static member, or explicitly filled with zeros, in order to work properly.

To allocate memory, call void* alloc(int size). The size must use the same size on every call to this object.

To free memory, call void free(void*).

In addition, there is a void clean() function that tries to release unneeded memory back to the system. The ability to support this kind of incremental cleaning is fundamental to the design. To understand the need, consider an example: you have a peak load of a million nodes, but then traffic slows down and you only need a thousand. But there are still 900,000 unused nodes on its internal free list. Most fixed-size heaps (including that of STL) have no provision to release this memory until all nodes are freed. But in this case, a large number of nodes (though not all the unused nodes) can be released back to the system. This feature is not implemented yet.

Thread Safety

The static_fixed_memory_pool class is not thread-safe (it has I-OTT semantics), because a class that is itself OTT or higher may draw upon it, and thread safety is unnecessary in this situation. The purpose of this class is to be fast, and the presence of a critical section slows it down enormously because itís so simple to begin with.

The ts_static_fixed_memory_pool is a thread-safe version. Use this to allow safe free-threaded access to the heap.

These are both derived from the general_static_fixed_memory_pool, which can have thread-safety turned on and off on the fly during run-time by assigning to the single_thread_only member.

Using the general_static_fixed_memory_pool is thread-safe by default. The static_fixed_memory_pool, in addition to eliminating the need to specify the template argument or empty <>, turns thead-safe functioning off at the first opportunity.

The ts_static_fixed_memory_pool derives from general_static_fixed_memory_pool<> also, but leaves the thread-safety turned on. Currently, it doesnít do anything else. However, a future version may supply more features to optimize threading speed.

No Order Dependancies

A design feature of these classes is that they support static constructors and destructors. That is, initialization code before main is called might need to call static_fixed_memory_pool::alloc. Not only is the order of initialization not defined, so that an objectís constructor may try to call pool.alloc before pool itself is constructed; but the pool may be in a different DLL than the object that uses it, and the order of initialization of DLLs is unspecified!

That is why this class is designed to be zero-filled and not have a run-time constructor called. It will be ready to go as soon as the program image loads.

Likewise, the order of destruction is undefined, both within one library and between DLLs. Yet, we donít want to simply abandon the memory: This would certainly work—why bother freeing it back to the operating system if the process is about to terminate anyway? But doing so bothers people who run tools like Bounds Checker or otherwise look for memory leaks in their code.

So, it is a design feature that the callback is called for every item freed, even if during static destruction; and the underlying large chunks are explicitly returned to the OS at program termination.

A related feature is what happens if alloc is called during the program shutdown, after the pool has already been destroyed? This is also supported. It will fill the request gracefully.

Using as a class-specific memory manager

The static_fixed_memory_pool (and ts_static_fixed_memory_pool) class is specifically meant to be used to implement a class-specific allocator.

Implement operator new and operator delete

Using static_fixed_memory_pool (and ts_static_fixed_memory_pool) to implement a classís operator new and operator delete is rather simple:

Listing 1

class C {
public:
   static ts_static_fixed_memory_pool pool;
   static void* operator new (size_t size) { return pool.alloc (size); }
   static void operator delete (void* p)  { pool.free(p); }
   //…etc.
   };

 // in one CPP file:
 ts_static_fixed_memory_pool C::pool;

The basic principle is to implement operator new and operator delete to simply call the alloc and free functions of a static_fixed_memory_pool object.

Typically, the static_fixed_memory_pool object will be intended for use with this one class alone, so that can be declared as a static member of the class, too (remember to define it in a .CPP file somewhere). However, you could have more than one class share the same static_fixed_memory_pool object, as long as they are all the same size.

A derived class will inherit the operator new and operator delete from the base class. [caution] If you derive from a class that uses the fixed memory pool, and the derived class is larger, you will get a runtime error when alloc is called with the wrong size. Define operator new and operator delete in the derived class as well, to use a different pool or to go back to the regular heap.

A Canned Solution (deprecated)

It would seem that adding three members to a class, especially when exactly the same text is used every time, is something that can be provided as a unit. There are actually some technical difficultites in doing this, but a mix-in class pool_mixin (and ts_pool_mixin for the thread-safe version) is provided for this purpose.

Ideally, the class simply contains the same three members discussed above. To use it, make it a base class and all three members get added to your class.

Thatís the theory anyway. The technical problem is that if it were done this way, all classes using the mix-in would share the same static data member. Instead, each class needs its own fixed_memory_pool, which means it must declare its own static data member. To get around this, each class can mix-in a different base class. This is easily done with a template. This example usage can be found in memory_pool_benchmark.cxx (Listing 2):

Listing 2

class C2 : public classics::pool_mixin<C2> {
   const void* addr;
   int value;
public:
   C2 (int x) : addr(this), value(x) {}
   };

The template argument given to pool_mixin doesnít do anything. Itís just there to allow the use of a template to generate as many unique base classes as may be needed. Each different type for the template parameter produces a different specialization of pool_mixin, and therefore a different static data member.

Use the class itself as the template parameter, and you know it will be different in every class.

As seen in the memory_pool_benchmark.cxx example program, this is easy to use and works properly in an application program. However, there is a serious restriction on its use due to the way templates work in the compiler.

The templateís members are instantiated in each translation unit (.CPP file) in which they are needed. The linker then eliminates duplicates. When defining a normal static member, you declare it in the header file inside the class, but still have to define it in one .CPP file file. For the classes generated by the templates, there is no “one place” where the static member is defined.

In a single program, this is not a problem, since all you care is that one exists somewhere. But a program with DLLs is actually linked as several independent execution image files. Each DLL will contain its own copy of the static member. Doing it manually, you can define it in one DLL and import it into all the others. For the classes generated by the template, you get a copy in each DLL and in the main EXE. So, if operator new is invoked in one DLL (or the main EXE) and operator delete is invoked in a different one, things wonít work right.

[caution] The bottom line: Use this mix-in in a code that will go into the EXE only; that is, when writing a program. Think carefully before using it when writing a DLL. Since the use of the mix-in only saves four simple lines of code, itís not worth coming up with a better and more complex solution.

Because of this problem, these classes are removed as of PB10. To access this class anyway, define the preprocessor symbol POOL_MIXIN before any #include directives.

Testing and Debugging

We all know that sometimes using the “debug version” makes the problem go away, and Classics follows the philosophy of “The first rule of testing is build something testable.”

Callback Hook

In order to test this class, a debugging hook is provided. The member callback points to a function which will be called at various strategic points in the classís execution. This provides a way to monitor the classís behavior, without causing a significant amount of additional overhead when the hook is not needed.

An example can be found in handle_UT.cxx. The line
    classics::lifetime::pool.callback= callback;
if un-commented, will enable callbacks to this function:

Listing 3

void callback (int mode, void* p)
 {
 static bool internal= false;
 static int counter= 0;
 counter++;
 switch (mode) {
    case 1:
       if (!internal)  cout << "mem " << p << " alloc [" << counter << "]\n";
       break;
    case 2:
       if (!internal)  cout << "mem " << p << " free  [" << counter << "]\n";
       break;
    case 3:
       internal= true;
       break;
    case 4:
       internal= false;
       break;
    }
 }

Calls with code values of 1 and 2 indicate allocations and frees, respectively, as you can see from the implementation of fixed_memory_pool::alloc and fixed_memory_pool::free. Values of 3 and 4 are no longer used.

[caution] If compiled using the /CLR switch to run under the .NET virtual machine, the callback must be cleared (set the pointer to 0) before returning from main.

ďInternalĒ Data

The following member data and functions are available for testing, debugging, trouble-shooting, and performance monitoring. They are not part of the ďnormalĒ interface, and should only be used for these auxilary purposes.

get_use_count()

This shows how many objects have been returned by alloc but not yet free'd.

callback

Set this to your callback function, explained above.

wait_count

Indicates how many times a function has needed to wait for another thread before it could proceed. For unit testing, seeing a non-zero value proves that you did in fact have threads get in each otherís way during testing.

get_nodelist()

This exposes (read-only) the internal nodelist member. Note that the type it returns is not defined in the header.

get_Recsize()

This returns the size of the objects allocated by this pool. It is set when the first alloc is made.

get_chunklist()

This exposes (read-only) the internal chunklist member. Note that the type it returns is not defined in the header.

check_address()

This returns true if the argument is a pointer to a record that is part of this pool. It reports that the pointer could be something that was returned by alloc (though it might currently be on the free list). On a valid pointer, this will reliably distinguish whether it was allocated on this pool as opposed to allocated by some other means.

Heap Checking

There are functions for validating the integrety of the pool, for debugging. A call to check_heap will tell you that something was freed twice, a freed object was written to (the first four bytes anyway), or other stray pointer problems. It is intended as a debugging aid, not a perminant part of a program.

There are two implementations: The normal way, and a fast way that can be useful if a program contains a very large number of checks. The normal/FAST_HEAP_CHECK behavior can be enabled on a per-instance basis, using the template argument to general_static_fixed_memory_pool.

These functions are also thread-safe (when single_thread_only is not set), but unlike simultainious calls to alloc and free, these may cause other threads to be locked out of using this pool for a more substantial period of time.

check_heap()

This examines the freelist and reports false if any problems were found. The normal form will traverse the freelist and call check_address on each node, and also make sure the linked list doesnít point back to itself forming a closed loop.

The FAST_HEAP_CHECK form does not have to call check_address() on every free node, but simply makes a checksum of all the nodes present. If enabled, each alloc or free incrementally adjusts the expected checksum, so those operations are slightly more expensive.

check_heap_p()

This is similar to check_heap(), but returns the address of the bad node rather than just false. It returns 0 if nothing wrong was found. In addition, it returns the special values 1 if the freelist is empty, or 2 if an infinite loop is detected.

This form is unaffected by the FAST_HEAP_CHECK option. By its nature, it has to do a full check. When using the fast check_heap(), you can follow it up with check_heap_p() if an error is detected. Or, you can do fast checks more often and a full check every once in a while.

Performance

Original

Why bother using a custom allocator in a class? Run memory_pool_benchmark.cxx and youíll find out that the fixed memory pool is much faster than the standard new/delete code. On my machine when originally developed (in 1998 I think), I get:

Original Benchmark Results
version time adj. time relative speed
stub (no allocation) 40 0
regular new/delete 2733 2693 defined as 1
fixed pool 260 220 12 ľ ◊
thread-safe fixed pool 1090 1050 2 Ĺ ◊

This benchmark is rather generous toward the performance of the regular memory manager. Because nearly a million calls were made to allocate and free the same sized block, the heap became filled with blocks of that size, so one was easily found. Under a mixed load, the general-purpose allocator is slower.

So, if you use a fixed block pool to implement a custom allocator for a class, you can expect it to allocate/free 2 to 3 times faster. Furthermore, future improvements may significantly increase its speed. If you donít need thread safety, skipping the very expensive mutual-exclusion code gives you a wopping 12 times speedup.

October 2004

The performance gain is not as dramatic as it once was. Thanks to improvements in Microsoftís implementation of heap management in VC++ 7.1 compared to their older compilers, and/or improvements to the implementation of the Win32 functions HeapAlloc/HeapFree in Windows XP compared to older versions of Windows, the compiler-supplied allocation is more efficient than it used to be.

October 2004 Benchmark Results
version time adj. time relative speed
stub (no allocation) ?? 0
regular new/delete 266 ?? defined as 1
fixed pool 47 ?? 5.6 ◊
thread-safe fixed pool 203 ?? 30% faster

My understanding of the improvements to the Win32 primitives (though Iíve not benchmarked the more complex cases) is that it wonít slow down as much under a real load of mixed sizes, compared to older implementations.

Iíve also discovered that the smart pointerís load on the pool in some multi-threaded applications cause more blocking then originally thought. So a future direction is to improve the thread-safe version by using a different approach. Stay tuned...

June 2006

The thread-safe code does not use a Win32 CRITICAL_SECTION at all. A critical section, by its nature, takes two atomic operations: one to enter and one to leave. Furthermore, the enter code has some complexity to it. As of PB8, this class manipulates the linked list using only 1 atomic operation per alloc or free.

If it is set for single-threaded usage, it does not use any atomic operation. You can switch the need for thread safety on and off during the program, individually for each pool.

June 2006 Benchmark Results
version time adj. time relative speed
stub (no allocation) ?? 0
regular new/delete 278 ?? defined as 1
fixed pool 34.4 ?? 8.1 ◊
thread-safe fixed pool 106 ?? 2.6 ◊