closure (a.k.a. thunks)

What Is A Closure?

In C++, you can have a pointer to a member function. This requires an instance to be called upon, so the syntax is:

(instance.*ptr)(args)

When you call a member function normally, the syntax is:

instance.member(args)

Normally, every expression in C++ has a well-defined type. But what is instance.*ptr and what is instance.member? They are clearly subexpressions in that they are made up of more primitive things and an operator, and the resulting thing is something that is used as part of a function call, which is itself classified as a postfix expression.
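As a minimal sketch of the two call forms in standard C++ (the class Counter and its member add are made-up names used only for illustration):

```cpp
#include <cassert>

// Hypothetical class used only for illustration.
struct Counter {
    int total;
    int add(int n) { total += n; return total; }
};

// Calls the same member twice: once by name, once through a pointer to member.
int call_both_ways(Counter& instance) {
    int (Counter::*ptr)(int) = &Counter::add;   // pointer to member function

    instance.add(1);                 // normal call:          instance.member(args)
    return (instance.*ptr)(2);       // call through pointer: (instance.*ptr)(args)

    // auto p1 = instance.*ptr;      // ill-formed: the result of .* can only be called
}
```

In both forms the subexpression to the left of the argument list exists only long enough to be called.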

Clearly it is something, but what is it? It is called a class closure, and it has very little support in C++: the only thing you can do with it is call the function right away. You can't save the result in a variable and use it later, e.g.

// hypothetical code, not legal C++!
p1= instance.*ptr;
p1(args);

p2= &instance.member;
p2 (args);

This looks like it should work, if only there were some way in C++ to declare p1 or p2. This incomplete feature is referred to as a class closure by Stroustrup and others who speculate on such things.

When I need to give a callback pointer to Windows, such as for a WinProc, what I really want is p2, so I can use a member of a specific object. This module in Classics provides this capability.

Using a Closure

The file Classics/closure_UT.cxx illustrates the use of a class closure.

typedef int (__stdcall callback_t) (int);
callback_t* callback;

…

//Normal set-up of a pointer to a function.
callback= &callback1;
//And use it.
y= callback(x);

//Here is my magic:
// I really want to say  callback= &object.member
//But I get the same result with:
member_callback_thunk<C,int,int> thunk (&object, &C::member);
callback= thunk.callptr();
//And use it.
y= callback(x);

In this example, code that only knows about normal pointers to functions can be given the function created by thunk.callptr() and it will work, calling object.member when invoked. Note that this is not the same as using STL to make functors that are triggered with the same function calling syntax as normal functions. This is really a normal function! It can be passed to code that is already compiled and expecting a pointer to a regular function. This specifically includes the WinProc and any of the Enum... API functions in Windows.

Basically, the template member_callback_thunk holds a pointer to an object and a pointer-to-member of a member function. If this class had an operator() defined on it, it would be a functor as used with STL. But instead, it has callptr().
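For comparison, here is a sketch of what the functor variant might look like. The name member_functor is hypothetical and is not part of Classics; it stores the same two pointers but exposes them through operator() instead of callptr():

```cpp
#include <cassert>

// Hypothetical functor, NOT part of Classics: same stored data as described
// above, but exposed through operator() instead of callptr().
template <typename Class, typename Return, typename Param>
class member_functor {
    Class* object;
    Return (Class::*member)(Param);
public:
    member_functor(Class* obj, Return (Class::*mem)(Param))
        : object(obj), member(mem) {}
    Return operator()(Param arg) const { return (object->*member)(arg); }
};

// Made-up class for the demonstration.
struct C {
    int base;
    int member(int x) { return base + x; }
};

int use_functor(C& object, int x) {
    member_functor<C, int, int> f(&object, &C::member);
    return f(x);   // looks like a function call, but f is an object
    // int (*callback)(int) = f;   // would not compile: f is not a pointer to a function
}
```

The commented-out line is the point: a functor works for templates that accept any callable type, but it cannot be handed to already-compiled code that wants a plain function pointer.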

The instance of the member_callback_thunk, called thunk in the example above, causes a function to exist. This normal (non-member) function contains code to call the member on the object, with the same argument. callptr gives you a pointer to this function.

The function is dynamically generated to match the specific object and member needed, and it exists within the member_callback_thunk object. When the destructor is called, the function you got from callptr isn't any good anymore.

Using the member_callback_thunk template

Limitations

The member must be __thiscall

Member functions in Microsoft's compiler, other than those with ellipses, normally use the __thiscall calling convention; the compiler does something different only when you specify otherwise. The thunk code is necessarily specific to how the function expects to pass arguments, return, and manage the stack.

It would be possible to write additional templates for other calling conventions, but that's not a priority. A work-around is to create a member function that is a one-line wrapper. For example, see the members in the atomic_counter, which are available both as __fastcall and as the default __thiscall.
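A sketch of the one-line wrapper work-around follows. The class counter and the macro SKETCH_FASTCALL are hypothetical; this is not the real atomic_counter from Classics. The macro is there only so the sketch also compiles on compilers without the __fastcall keyword.

```cpp
#include <cassert>

// __fastcall is a Microsoft keyword; expand to nothing elsewhere so the
// sketch still compiles. (Hypothetical macro name.)
#if defined(_MSC_VER) && defined(_M_IX86)
#  define SKETCH_FASTCALL __fastcall
#else
#  define SKETCH_FASTCALL
#endif

// Hypothetical class; not the real atomic_counter from Classics.
class counter {
    int value;
public:
    explicit counter(int start) : value(start) {}

    // The "real" implementation, with a non-default calling convention.
    int SKETCH_FASTCALL add_fast(int n) { value += n; return value; }

    // One-line wrapper using the default member calling convention,
    // which is the one a thunk as described above expects.
    int add(int n) { return add_fast(n); }
};
```

A thunk would then be given &counter::add rather than &counter::add_fast.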

The resulting closure is __stdcall

The code is necessarily specific to how the function expects to pass arguments, return, and manage the stack. I chose to implement only the __stdcall case, but additional templates could certainly be created for other cases. Providing more calling conventions for both member and thunk results in an explosion of combinations, since each pair needs unique code. Furthermore, C++ doesn't have overloading based on calling convention, so you would need to specify these things (correctly!) when creating a thunk. In short, it's easier all around to not have them unless they really are needed.

The Windows operating system uses __stdcall for its API entry points. COM uses it. DLL exports are assumed to use it. Windows callbacks use it. Since providing a thunk for outside code (e.g. Windows callbacks) is the main point (for stuff within C++ we can just use functors without problems), __stdcall is the way to go.

Nonstandard and Compiler Specific

Since this is not something that is possible in C++, the implementation must be non-standard and therefore compiler-specific. If you want to supply code for other compilers, I'll be happy to merge it in. Note that the closure.h file just includes the proper implementation, and each implementation goes in its own file. That is, each file is written for the platform rather than having a nest of conditional compilation.

How Many Arguments?

You need to use a different class name for each count of arguments to the function. For example,

 int out1= object.call3(12,3.14,'a'); // member takes three arguments.
 member_callback_thunk_3<C,int, int,double,char> thunk (&object, &C::call3);
 int (__stdcall *callback3)(int, double, char)= thunk.callptr();
 int output= callback3 (12, 3.14, 'a'); // thunk takes three arguments also.

The template class member_callback_thunk_3 is for functions that take three arguments, the template class member_callback_thunk_12 is for functions that take 12 arguments, etc.

You need to specify the types of all the arguments to the template. Since template classes can't be overloaded based on different number of arguments, the simplest thing is to give them different names.

Where are they declared?

The template class member_callback_thunk_3 is in the header file "classics\closure-3.h", the template class member_callback_thunk_99 is in the header file "classics\closure-99.h", etc. Each count is in a header by itself, named with the count also.

But where is the file "classics\closure-867.h"? Files like this are not to be seen in the directory when Classics is installed. Instead, they are generated using the program ThunkN.perl. Each variation is a simple mechanical change from the basic class, amounting to how many ParamTypen's are listed in each place that lists them. Since C++ doesn't have meta-templates that let you change the number of parameters, and preprocessor macros have neither iteration nor recursion, while Perl is born to do text manipulations, I used Perl. The program copies the basic definition from the header and adds more parameters. So, any change to the basic definition is automatically reflected in the copies.

If you desire to include "closure-867.h", go to the Classics source directory and run

Command
[blah\blah\blah\classics] ThunkN 867
        or perhaps
[blah\blah\blah\classics] perl.exe ThunkN.perl 867

If you don't have Perl, it's probably easier to just replace all occurrences of ParamType using your text editor, since that's all there is to it. Look at the existing "classics\closure-3.h" for an example.

But you only need one

If you have more than one argument, you can declare the function to take a single struct instead. Make the memory layout of the structure on the stack the same as what passing multiple arguments would be. For the WinProc, I ended up liking it better this way!
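A sketch of the single-struct approach for a window-procedure-style callback. All names here are hypothetical, and portable stand-in types are used instead of the Windows ones. The assumption is a 32-bit x86 build, where each of the four fields is 4 bytes with no padding, so the struct occupies the same bytes as four separately pushed arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four window-procedure arguments, declared in
// the order they would be passed as separate arguments.
struct message_args {
    void*          window;
    unsigned       message;
    std::uintptr_t wparam;
    std::intptr_t  lparam;
};

// Hypothetical handler class: the member takes ONE argument, by value.
class window_handler {
    unsigned last_message;
public:
    window_handler() : last_message(0) {}

    long handle(message_args args) {
        last_message = args.message;        // all fields arrive bundled together
        return static_cast<long>(args.lparam);
    }

    unsigned last() const { return last_message; }
};
```

With this shape only the one-argument thunk template is needed, and the member can forward the whole bundle to helpers without relisting four parameters.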

Zero arguments

If you have no arguments, use member_callback_thunk_0 declared in "classics\closure-0.h".

Const and Volatile Members

Other kinds of thunks

The provided code centers on getting a normal __stdcall pointer to call a member function of a particular instance. Other things are possible, such as adding a parameter to a function (called currying in some languages). However, before getting very far in experimenting with other kinds, I realized that I can always design the code to use a member function. Once the thunking capability exists in any form, the rest can be done within the C++ language.

Internals: How It Works (Microsoft VC++)

Basic Thunk

The real magic is done with the closure_stdcall_to_thiscall class. The idea is that the code to call exists inside the instance.

To create the code, I use a structure. The layout of the structure is such that the object's address and the member function's address (or values computed simply from them) slide right in where they are needed in the code. Around them, I defined fields whose name reflects the assembly language opcodes to be placed in them.

Once the structure is populated with the values and the opcodes, the address of that structure can be cast to a function pointer!
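The following is a sketch of such a structure, not the actual closure_stdcall_to_thiscall layout. It assumes the two-instruction x86 sequence given under "How Fast Is It?" (mov ECX with an immediate, then a relative jmp), and the names thunk_image and make_thunk_image are hypothetical. It only builds and inspects the bytes; it never executes them, and it takes 32-bit addresses as plain integers so it compiles on any host.

```cpp
#include <cassert>
#include <cstdint>

#pragma pack(push, 1)              // no padding: fields must be contiguous bytes
struct thunk_image {
    std::uint8_t  mov_ecx;         // 0xB9: opcode for mov ecx, imm32
    std::uint32_t object;          // imm32: the object's this pointer
    std::uint8_t  jmp_rel;         // 0xE9: opcode for jmp rel32
    std::int32_t  displacement;    // rel32: target minus address after the jmp
};
#pragma pack(pop)

static_assert(sizeof(thunk_image) == 10, "thunk must be exactly 10 bytes");

// Fills in the opcodes and operands for a thunk that will live at thunk_address.
thunk_image make_thunk_image(std::uint32_t thunk_address,
                             std::uint32_t object_address,
                             std::uint32_t target_address) {
    thunk_image t;
    t.mov_ecx = 0xB9;
    t.object  = object_address;
    t.jmp_rel = 0xE9;
    // A relative jump is measured from the end of the jump instruction,
    // which is also the end of the whole structure.
    std::uint32_t next = thunk_address + static_cast<std::uint32_t>(sizeof(thunk_image));
    t.displacement = static_cast<std::int32_t>(target_address - next);
    return t;
}
```

The displacement is the "value computed simply" from the member function's address: it depends on where the structure itself lives, which is one reason the code has to be built per instance.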

Different Grades of Member Pointers

The compiler generates pointers to member functions in three different ways. In the simplest case, it is simply a pointer to the function. If the member is virtual, it points to a VCALL Thunk generated by the compiler that does the virtual lookup. In either case, all that needs to be done is to load the ECX register with the object's this pointer and jump to the function pointed to.

In the case of multiple inheritance, things are a little more complicated. Not every base class can be at the same address as the derived class, so all base classes other than the first need a this adjustment when calling a function implemented in that base. That is, when calling a member that's inherited, the this pointer needs to be adjusted to point to the proper base object. In single inheritance, the adjustment is always zero. With multiple inheritance, the member pointer dereference code needs to take care of this.

So, a pointer to a member in this case contains two fields. The first is a pointer to the entry point, and the second is a constant to add to this before calling it. Since a pointer to a member can hold different members at different times, this needs to be done at run time. However, since my thunk is bound to a specific object, rather than being called on different objects at different times, I can do that calculation at initialization time. That is, the thunk is exactly as in the simpler case, and the object pointer is pointing to the correct base object already.
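The this adjustment can be observed from standard C++. In this sketch (all class names are made up) a member inherited from the second base records the this it receives, and the result is compared against the address of the whole object. The exact offsets are implementation details; the sketch only assumes that the second base does not share the derived object's address, which holds on typical compilers when the first base has data.

```cpp
#include <cassert>

struct First { int a; };

struct Second {
    int b;
    const void* seen_this;
    int note(int x) { seen_this = this; return x; }   // records the this it was given
};

struct Derived : First, Second { int c; };

// Returns true when a call through a Derived member pointer delivered a this
// that points at the Second subobject rather than at the start of Derived.
bool this_was_adjusted(Derived& d) {
    // &Derived::note names Second::note; converting it to a pointer to member
    // of Derived is where the adjustment gets attached.
    int (Derived::*ptr)(int) = &Derived::note;
    (d.*ptr)(0);

    const void* whole  = &d;
    const void* second = static_cast<Second*>(&d);
    return d.seen_this == second && second != whole;
}
```

A thunk bound to one object can do this same conversion once, up front, and store the already-adjusted pointer.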

With virtual base classes, things are even more complex. The location of a particular base class can vary from instance to instance (since the instance may really be of some further-derived class). If the member to call is inherited from one of those virtual bases that like to move around, the code to locate the base uses lookups from the vtable in the object. Since the pointer can point to different members at different times, it must also note which base class contains the member being pointed to.

So, the pointer to member in this case contains three fields. The first is again the actual function entry point. The third is involved with looking up the virtual base's location. The second is again a this adjustment, applied after locating the correct virtual base, in case the member is inherited from some non-virtual base class of the virtual base. Sound complicated? It gets worse! The code to do the call also contains a constant known at compile time, which is dependent on the declared class type and how it's laid out. I cannot write code to call a member pointer, since there is no way to determine this other constant from the class name alone and information provided through the C++ language (which is basically the sizeof the final class and the location of the very beginning of the most-complete object).

To get around this, I have the compiler figure it out for me. The code in create_probe sets up a pointer to a member with the same adjustment values, but points to a “fake” member function. Then the template, which expands into the proper code for the types given, performs the ->* operator that only it knows how to do. But it calls my function! My function simply notes the address of this. Now, I know the correct subobject for the given object and the real member, and once again set up the most simple thunk with this information.

Efficient, No Matter What

You'll notice that calling a thunk is more efficient than calling a normal pointer to a member function! Because the object is fixed, the calculations are done once, when the thunk is created, rather than every time it is called. The thunks are just as efficient for calling back things in fancy virtual inheritance trees as they are with the simplest class.

There is a slight difference at initialization time, though. The single and multiple inheritance cases call a non-template function. Only one function exists, no matter how many types are instantiated.

The virtual inheritance case requires that the ->* operation be generated by the template, knowing the correct object type. So, a different helper function is generated for every object class, though different parameter and return types can share one. However, it is inlined anyway. Basically, the initialization requires more code than for the other cases: it performs the ->* call, which is one more call than in the non-virtual-base case.

How Fast Is It?

The code present in the generated thunk contains two instructions:

mov ECX, object  ; mov immediate
jmp target ; jmp relative immediate

That's the same amount of code as the compiler's VCALL thunk. With only two instructions it's hard to get simpler: calling a function in a DLL goes through an extra jump too, but does not need to load a register.

By all rights, this ought to be extremely fast, and it is certainly as fast as it is possible to accomplish this. But measuring the actual speed is problematic. Because of the nature of modern superscalar CPUs, trying to measure a bunch of copies of this code gives funny results. For example, making the called function do one more layer of recursion made it run almost twice as fast! It only makes sense to time it when the CPU is doing other things, not just millions of consecutive calls. If the called function is realistic in its complexity, the overhead of the thunk is not even measurable with a simple timing program.

Caveats

Microsoft VC++

Porting