member_callback_thunk Template
In C++, you can have a pointer to a member function. This requires an instance to be called upon, so the syntax is:
(instance.*ptr)(args)
When you call a member function normally, the syntax is:
instance.member(args)
Normally, every expression in C++ has a well-defined type. But what is instance.*ptr
and what is instance.member? They are clearly subexpressions in that they are made up
of more primitive things and an operator, and the resulting thing is something that is used as part of
a function call, which is itself classified as a postfix expression.
Clearly it is something, but what is it? It is called a class closure and is something that has very little support in C++. That is, the only thing you can do with it is call the function right away. You can’t save the result in a variable and use it later, e.g.
// hypothetical code, not legal C++!
p1= instance.*ptr;
p1(args);
p2= &instance.member;
p2 (args);
This looks like it should work, if only there were some way in C++ to define p1 or p2.
This incomplete feature is referred to as a class closure by Stroustrup and others who speculate on such
things.
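For comparison, the closest that standard C++ comes is to package the instance and the member pointer into a function object. The following is a minimal sketch only; Counter, call_now, and BoundCall are hypothetical names invented for illustration and are not part of Classics. It shows the call-right-away syntax next to an object that can be saved and called later. Note that BoundCall is an object with an operator(), not a pointer to a normal function.

```cpp
#include <cassert>

// Hypothetical class used only for illustration.
struct Counter {
    int total;
    int add(int n) { total += n; return total; }
};

// The closure formed by instance.*ptr must be called immediately.
inline int call_now(Counter& instance, int (Counter::*ptr)(int), int arg) {
    return (instance.*ptr)(arg);
}

// Hypothetical function object: stores what p1 would have stored,
// so the call can be made later.
struct BoundCall {
    Counter* instance;
    int (Counter::*ptr)(int);
    int operator()(int arg) const { return (instance->*ptr)(arg); }
};
```

A BoundCall initialized with { &counter, &Counter::add } can be kept in a variable and invoked as p1(args), but only by code that is compiled against the BoundCall type.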
When I need to give a callback pointer to Windows, such as for a WinProc, what I really want is p2,
so I can use a member of a specific object. This module in Classics provides this capability.
The file Classics/closure_UT.cxx illustrates the use of a class closure.
typedef int (__stdcall callback_t) (int);
callback_t* callback;
…
//Normal set-up of a pointer to a function.
callback= &callback1;
//And use it.
y= callback(x);
//Here is my magic:
// I really want to say callback= &object.member
//But I get the same result with:
member_callback_thunk<C,int,int> thunk (&object, &C::member);
callback= thunk.callptr();
//And use it.
y= callback(x);
In this example, code that only knows about normal pointers to functions can be given
the function created by thunk.callptr() and it will work, calling
object.member when invoked. Note that this is not the same as using
STL to make functors that are triggered with the same function calling syntax as normal
functions. This is really a normal function! It can be passed to code that is
already compiled and expecting a pointer to a regular function. This specifically includes
the WinProc and any of the Enum... API functions in Windows.
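To make the distinction concrete, here is a rough sketch of what portable C++ can do when the receiving code accepts only a plain function pointer. All names (call_twice, Scaler, g_bound_object, scale_trampoline) are hypothetical, and __stdcall is left out so the sketch builds on any compiler. The trampoline is a normal function, but it has to find its object through a global, so only one object can be bound at a time. That limitation is what a per-object generated function removes.

```cpp
#include <cassert>

// Stands in for already-compiled code that only accepts a plain function pointer.
typedef int (*callback_t)(int);
inline int call_twice(callback_t cb, int x) { return cb(cb(x)); }

// Hypothetical class whose member we want to be called back.
struct Scaler {
    int factor;
    int scale(int v) { return v * factor; }
};

// Portable stand-in for a thunk: a normal function that locates its
// object through a global. Only one object can be bound at any moment.
static Scaler* g_bound_object = 0;
static int scale_trampoline(int v) { return g_bound_object->scale(v); }
```

After setting g_bound_object to the address of a Scaler, &scale_trampoline can be handed to call_twice just like any other function pointer.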
Basically, the template member_callback_thunk holds a pointer to
an object and a pointer-to-member of a member function. If this class had an operator()
defined on it, it would be a functor as used with STL. But instead, it has callptr().
The instance of the member_callback_thunk, called thunk in the
example above, causes a function to exist. This normal (non-member) function contains code to
call the member on the object, with the same argument. callptr gives you a
pointer to this function.
The function is dynamically generated to match the specific object and member needed, and it
exists within the member_callback_thunk object. When the destructor is called,
the function you got from callptr isn’t any good anymore.
member_callback_thunk template
__thiscall
Member functions in Microsoft’s compiler, other than those with ellipses, normally
use the __thiscall calling convention. The compiler does something different
only when told otherwise. The code is necessarily specific to how the function expects
to pass arguments, return, and manage the stack.
It would be possible to write
additional templates for other calling conventions, but that’s not a priority. A work-around
is to create a member function that is a one-line wrapper. For example, see the members
in the atomic_counter, which
are available both as __fastcall and as the default __thiscall.
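The one-line wrapper might look something like the sketch below. This is not the actual atomic_counter code; sketch_counter and the SKETCH_FASTCALL macro are hypothetical, and the macro expands to nothing on compilers that lack __fastcall so the example still builds there.

```cpp
#include <cassert>

// __fastcall is a Microsoft-style keyword; elsewhere the macro is empty.
#if defined(_MSC_VER)
#define SKETCH_FASTCALL __fastcall
#else
#define SKETCH_FASTCALL
#endif

// Hypothetical class; not the real atomic_counter from Classics.
class sketch_counter {
    int value_;
public:
    sketch_counter() : value_(0) {}
    // The underlying implementation uses a non-default calling convention.
    int SKETCH_FASTCALL increment_fast() { return ++value_; }
    // One-line wrapper using the default member convention,
    // which is the form a thunk can call.
    int increment() { return increment_fast(); }
};
```

The thunk would then be created on &sketch_counter::increment rather than on the __fastcall member.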
__stdcall
The code is necessarily specific to how the function expects to pass arguments, return, and
manage the stack. I chose to implement only the __stdcall case, but
additional templates could certainly be created for other cases. Providing for more calling
conventions for both member and thunk results in an explosion of combinations, since each
pair needs unique code. Furthermore, C++ doesn’t have overloading based on calling convention
so you would need to specify these things (correctly!) when creating a thunk. In short, it’s easier
all around to not have them unless they really are needed.
The Windows operating system uses __stdcall for its API entry points.
COM uses it. DLL exports are assumed to use it. Windows callbacks use it. Since providing
a thunk for outside code (e.g. Windows callbacks) is the main point (for stuff within C++ we
can just use functors without problems), __stdcall is the way to go.
Since this is not something that is possible in C++, the implementation must be
non-standard and therefore compiler-specific. If you want to supply code for other compilers, I’ll
be happy to merge it in. Note that the closure.h file just includes the proper
implementation, and each implementation goes in its own file. That is, each file is written
for the platform rather than having a nest of conditional compilation.
You need to use a different class name for each count of arguments to the function. For example,
int out1= object.call3(12,3.14,'a');  // member takes three arguments.
member_callback_thunk_3<C,int, int,double,char> thunk (&object, &C::call3);
int (__stdcall *callback3)(int, double, char)= thunk.callptr();
int output= callback3 (12, 3.14, 'a');  // thunk takes three arguments also.
The template class member_callback_thunk_3 is for functions that
take three arguments; the template class member_callback_thunk_12 is for functions that
take 12 arguments, etc.
You need to specify the types of all the arguments to the template. Since template classes can’t be overloaded based on different number of arguments, the simplest thing is to give them different names.
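Compilers that support C++11 variadic templates, which are newer than the ones this code was written for, can express the "any number of parameter types" part in a single template. The sketch below is only an assumption about how that could look. member_call and Sample are hypothetical names, and the class is an ordinary functor: it does not generate a thunk and has no callptr().

```cpp
#include <cassert>

// Hypothetical holder: one template covers every argument count.
template <class C, class R, class... ParamTypes>
class member_call {
    C* object_;
    R (C::*member_)(ParamTypes...);
public:
    member_call(C* object, R (C::*member)(ParamTypes...))
        : object_(object), member_(member) {}
    // Forwards the arguments to the bound member of the bound object.
    R operator()(ParamTypes... args) const { return (object_->*member_)(args...); }
};

// Hypothetical class with members of different arity.
struct Sample {
    int call0() { return 7; }
    int call3(int a, double b, char c) {
        return a + static_cast<int>(b) + (c == 'a' ? 1 : 0);
    }
};
```

With this, member_call<Sample,int> and member_call<Sample,int, int,double,char> are instantiations of the same template rather than two differently named classes.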
The template class member_callback_thunk_3 is in the header file
"classics\closure-3.h", the template class member_callback_thunk_99 is in the header file
"classics\closure-99.h", etc. Each count is in a header by itself, named with the count also.
But where is the file "classics\closure-867.h"? Files like that are not to be seen in the directory when Classics
is installed. Instead, they are generated using the program ThunkN.perl. Each variation is a simple
mechanical change from the basic class, amounting to how many ParamTypen’s are listed in each
place that lists them. Since C++ doesn’t have meta-templates that let you change the number of parameters, and
preprocessor macros have neither iteration nor recursion, while Perl is born to do text manipulations, I used Perl. The program
copies the basic definition from the header and adds more parameters. So, any change to the basic definition is
automatically reflected in the copies.
If you desire to include "closure-867.h", go to the Classics source directory and run
[blah\blah\blah\classics] ThunkN 867
or perhaps
[blah\blah\blah\classics] perl.exe ThunkN.perl 867
If you don’t have Perl, it’s probably
easier to just replace all occurrences of ParamType using your text editor, since that’s all there
is to it. Look at the existing "classics\closure-3.h" for an example.
If you have more than one argument, you can
declare the function to take a single struct instead. Make the memory layout of the
structure on the stack the same as what passing multiple arguments would be. For the WinProc, I
ended up liking it better this way!
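As a rough illustration of the single-struct approach, the sketch below lists the fields in parameter order. On 32-bit x86, __stdcall pushes arguments right to left, so the first parameter ends up at the lowest address, which matches the first field of a struct. WinProcArgs and Window are hypothetical names, and the field types are stand-ins for HWND, UINT, WPARAM, and LPARAM so the example builds without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the four window-procedure parameters,
// listed in the same order as the parameters themselves.
struct WinProcArgs {
    void*          window;   // first parameter, lowest address on the stack
    unsigned       message;
    std::uintptr_t wparam;
    std::intptr_t  lparam;   // last parameter, highest address
};

// Hypothetical class: the handler takes one argument, so a
// one-argument thunk is all that is needed.
struct Window {
    std::intptr_t handle(WinProcArgs args) {
        return args.message == 1 ? args.lparam : 0;
    }
};
```

The member can then refer to args.message and friends by name, and adding a convenience member to the struct does not change the thunk.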
If you have no arguments, use member_callback_thunk_0 declared in
"classics\closure-0.h".
The provided code centers on getting a normal __stdcall pointer to
call a member function of a particular instance. Other things are possible, such as adding
a parameter to a function (called currying in some languages). However, before
getting very far in experimenting with other kinds, I realized that I can always design the code
to use a member function. Once the thunking capability exists in any form, the rest can be
done within the C++ language.
The real magic is done with the closure_stdcall_to_thiscall
class. The idea is that the code to call exists inside the instance.
To create the code, I use a structure. The layout of the structure is such that the object’s address and the member function’s address (or values computed simply from them) slide right in where they are needed in the code. Around them, I defined fields whose name reflects the assembly language opcodes to be placed in them.
Once the structure is populated with the values and the opcodes, the address of that structure can be cast to a function pointer!
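A structure of that kind might be sketched as follows for 32-bit x86. This is an illustration under stated assumptions, not the actual closure_stdcall_to_thiscall definition: thunk_image, its field names, and make_thunk_image are hypothetical. Addresses are passed as plain 32-bit numbers so the sketch can be built and inspected on any host, and it only fills in the bytes; it does not cast them to a function pointer or execute them.

```cpp
#include <cassert>
#include <cstdint>

#pragma pack(push, 1)
// Hypothetical layout; the field names mirror the opcodes they hold.
struct thunk_image {
    std::uint8_t  mov_ecx;     // 0xB9: mov ecx, imm32
    std::uint32_t object;      // the object's this pointer
    std::uint8_t  jmp_rel;     // 0xE9: jmp rel32
    std::int32_t  target_rel;  // target minus the address just past the jmp
};
#pragma pack(pop)

// Fills in the image for a thunk that will live at thunk_addr.
inline thunk_image make_thunk_image(std::uint32_t thunk_addr,
                                    std::uint32_t object_addr,
                                    std::uint32_t target_addr) {
    thunk_image t;
    t.mov_ecx = 0xB9;
    t.object = object_addr;
    t.jmp_rel = 0xE9;
    // A relative jump is measured from the end of the instruction,
    // which is also the end of the 10-byte structure.
    std::uint32_t next = thunk_addr + static_cast<std::uint32_t>(sizeof(thunk_image));
    t.target_rel = static_cast<std::int32_t>(target_addr - next);
    return t;
}
```

The object address slides in unchanged, while the member function's address is the "value computed simply from it": the jump displacement.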
The compiler generates pointers to member functions in three different ways.
In the simplest case, it is simply a pointer to the function. If the member is
virtual, it points to a VCALL Thunk generated by the compiler that does the
virtual lookup. In any case, all that needs to be done is to load
the ECX register with the object’s this pointer and jump to the function
pointed to.
In the case of multiple inheritance, things are a little more complicated.
Not every base class can be at the same address as the derived class, so
all base classes other than the first need a this adjustment when
calling a function implemented in that base. That is, when calling a member
that’s inherited, the this pointer needs to be adjusted to point to the
proper base object. In single inheritance, the adjustment is always zero.
With multiple inheritance, the member pointer dereference code needs to
take care of this.
So, a pointer to a member in this case contains two fields. The first
is a pointer to the entry point, and the second is a constant to add to
this before calling it. Since a pointer to a member can hold different
members at different times, this needs to be done at run time.
However, since my thunk is bound to a specific object, rather than
being called on different objects at different times, I can do that
calculation at initialization time. That is, the thunk is exactly as in
the simpler case, and the object pointer is pointing to the correct
base object already.
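The effect of doing the adjustment once can be sketched in portable C++. The names First, Second, Both, bound_second, and bind_second are hypothetical, and the sketch relies on the ordinary derived-to-base conversion rather than on the compiler's member pointer layout, which standard C++ does not expose.

```cpp
#include <cassert>

// Two non-empty bases, so Second cannot sit at the start of Both.
struct First  { int a; int first_value()  { return a; } };
struct Second { int b; int second_value() { return b; } };
struct Both : First, Second {};

// Hypothetical bound pair: the stored pointer already points at the
// Second subobject, so nothing is adjusted at call time.
struct bound_second {
    Second* subobject;
    int (Second::*member)();
    int call() const { return (subobject->*member)(); }
};

// The this adjustment happens once, here, through the implicit
// conversion from Both* to Second*.
inline bound_second bind_second(Both* object, int (Second::*member)()) {
    bound_second b = { object, member };
    return b;
}
```

Every later call through the bound pair uses the stored subobject pointer as is.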
With virtual base classes, things are even more complex. The location of a particular base class can vary from instance to instance (since the instance may really be of some further-derived class). If the member to call is inherited from one of those virtual bases that like to move around, the code to locate the base uses lookups from the vtable in the object. Since the pointer can point to different members at different times, it must also note which base class contains the member being pointed to.
So, the pointer to member in this case contains three
fields. The first is again the actual function entry point.
The third is involved with looking up the virtual base’s
location. The second is again a this adjustment, applied
after locating the correct virtual base, in case the member
is inherited from some non-virtual base class of the virtual base.
Sound complicated? It gets worse! The code to do the
call also contains a constant known at compile time, which
is dependent on the declared class type and how it’s laid out.
I cannot write code to call a member pointer, since there is no way
to determine this other constant from the class name alone and
information provided through the C++ language (which is basically
the sizeof the final class and the location of the very beginning of
the most-complete object).
To get around this, I have the compiler figure it out for me.
The code in create_probe sets up a pointer
to a member with the same adjustment values, but points
to a “fake” member function. Then the template, which
expands into the proper code for the types given, performs
the ->* operator that only it knows how to
do. But it calls my function! My function simply notes the
address of this. Now, I know the correct subobject for the
given object and the real member, and once again set up
the most simple thunk with this information.
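The idea of letting the compiler find the subobject and merely noting this can be sketched in portable C++, with one important difference: the real create_probe reaches its fake member through ->* on a prepared member pointer, which standard C++ gives no way to construct. The sketch below uses an ordinary member call instead. VBase, Left, Right, Most, probe, and locate_base are hypothetical names.

```cpp
#include <cassert>

struct VBase {
    int value;
    int real_member(int n) { return value + n; }
    // Hypothetical probe: does nothing except report where this points.
    void* probe() { return this; }
};
struct Left  : virtual VBase { int l; };
struct Right : virtual VBase { int r; };
struct Most  : Left, Right { int m; };

// The compiler generates whatever lookup is needed to find the virtual
// base when calling probe(); the result is remembered by the caller.
inline VBase* locate_base(Most& object) {
    return static_cast<VBase*>(object.probe());
}
```

Once the subobject address is known, the real member can be bound to it exactly as in the simple case.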
You’ll notice that calling a thunk is more efficient than calling a normal pointer to a member function! Because the object is fixed, the calculations are done once, when the thunk is created, rather than every time it is called. The thunks are just as efficient for calling back things in fancy virtual inheritance trees as they are with the simplest class.
There is a slight difference at initialization time, though. The single and multiple inheritance cases call a non-template function. Only one function exists, no matter how many types are instantiated.
The virtual inheritance case requires that the ->*
operation be generated by the template, knowing the correct object
type. So, a different helper function is generated for every
object class, though different parameter and return types can
share one. However, it is inlined anyway. Basically, the
initialization requires more code than for the other cases: it performs
the ->* call, and one more additional call compared
with the non-virtual-base case.
The code present in the generated thunk contains two instructions:
mov ECX, object  ; mov immediate
jmp target       ; jmp relative immediate
That’s the same amount of code as the compiler’s VCALL thunk. With only two instructions it’s hard to get simpler: calling a function in a DLL goes through an extra jump too, but does not need to load a register.
By all rights, this ought to be extremely fast, and is certainly as fast as it is possible to make it. But measuring the actual speed is problematic. Because of the nature of modern superscalar CPUs, trying to measure a bunch of copies of this code gives funny results. For example, making the called function do one more layer of recursion made it run almost twice as fast! It only makes sense to time it when the CPU is doing other things, not just millions of consecutive calls. If the called function is realistic in its complexity, the overhead of the thunk is not even measurable with a simple timing program.