ustring

The ustring class is a universal string. It serves to stand in for either a wide or narrow string, and to translate between string types.


Concept

A ustring object acts as a flexible cubbyhole. You can easily stick some kind of string in, and later, get some kind of string out. If the type asked for is different from what's in there, it is automatically converted.
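To make the cubbyhole idea concrete, here is a minimal sketch of such a wrapper. It is not the Classics ustring. The class name toy_ustring is hypothetical, it stores only std::string or std::wstring in a std::variant, and its conversions handle 7-bit ASCII only, whereas ustring relies on the Win32 conversion routines.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical toy, not the Classics ustring: holds either a narrow or a
// wide string and converts on the way out if the requested type differs.
class toy_ustring {
public:
    toy_ustring(const std::string& s) : value_(s) {}
    toy_ustring(const std::wstring& s) : value_(s) {}

    // Mirrors the is_wide query: true if a wide string was put in.
    bool is_wide() const { return std::holds_alternative<std::wstring>(value_); }

    // Get a wide string out, widening a narrow one if needed (ASCII only).
    operator std::wstring() const {
        if (is_wide()) return std::get<std::wstring>(value_);
        const std::string& n = std::get<std::string>(value_);
        return std::wstring(n.begin(), n.end());
    }

    // Get a narrow string out, narrowing a wide one if needed (ASCII only).
    operator std::string() const {
        if (!is_wide()) return std::get<std::string>(value_);
        const std::wstring& w = std::get<std::wstring>(value_);
        std::string out;
        out.reserve(w.size());
        for (wchar_t c : w) out.push_back(static_cast<char>(c));
        return out;
    }

private:
    std::variant<std::string, std::wstring> value_;
};
```

A caller can put either kind in and ask for either kind out, for example `std::wstring w = toy_ustring{std::string("abc")};`.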

Wide or Narrow

The ustring's primary role is to be either wide or narrow. This cuts down the number of functions needed when both 8-bit character and Unicode support are desired.

When a function takes a parameter of type ustring, the caller may pass either a string or a wstring, and either is accepted without complaint. The function, meanwhile, may do different things depending on whether the incoming string is wide or narrow; or it can simply ask for a wide string and have any narrow string converted.

example — before ustring

void foo (string);  //narrow form of function
void foo (wstring);  //wide form of function

with ustring

void foo (ustring);  //only one form
Within the function, you can have separate code for each case, or have the input automatically converted. Use the is_wide function to detect whether a wide or narrow string is present.

void foo (ustring x)
 {
 if (x.is_wide()) {
    wstring s= x;  //get the string out
    //...
    }
 else {  //must be narrow
    string s= x;  //get the string out
    //... simpler processing for 8-bit characters
    }
 }
-- or --
void foo (ustring x)
 {
 wstring s= x;  //get wide out, convert if needed.
 //... wide case subsumes narrow; works for all.
 }

Wide/Narrow Conversions

Using this class as a go-between is the preferred way to convert "blindly" (that is, without special processing instructions) between 8-bit and 16-bit character sets. It uses the Win32 functions MultiByteToWideChar and WideCharToMultiByte to do its work, so it should handle the conversion correctly, including multi-byte characters, on any nationalized version of Win32.
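For readers outside Win32, the same narrow/wide round trip can be sketched portably. This is a swapped-in technique, not what ustring does: it uses the C library's std::mbstowcs and std::wcstombs, which convert according to the current C locale (LC_CTYPE) rather than a Windows code page. The helper names to_wide and to_narrow are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical portable analogue of the MultiByteToWideChar step.
// A multibyte string never yields more wide characters than it has bytes,
// so s.size() + 1 slots are always enough (the +1 is for the terminator).
std::wstring to_wide(const std::string& s) {
    std::vector<wchar_t> buf(s.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();  // invalid sequence
    return std::wstring(buf.data(), n);
}

// Hypothetical portable analogue of the WideCharToMultiByte step.
// Each wide character needs at most MB_CUR_MAX bytes in the current locale.
std::string to_narrow(const std::wstring& w) {
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();  // unrepresentable character
    return std::string(buf.data(), n);
}
```

Sizing the buffers up front avoids a separate measuring pass. Which non-ASCII characters survive depends entirely on the locale the program has selected with std::setlocale.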

Multiple String Class Conversions

In addition, this class may be used to transparently convert between different brands of string class, making it the Babel fish of string representations. This lets you use your string class of choice and still easily use the Classics library. And if different libraries in your program use different string classes, this feature can transparently communicate between them for you.

For example, you could put an MFC CString in and get a classics::wstring out, so you could trivially pass CStrings to foo in the above example. The same applies to return values: you could get a Rogue Wave string out, for example, regardless of what the function actually returned to you.

In order to provide transparent conversion, letting you put your type into a ustring and get your type out, you need to create a header and include it in the files that need this capability.

Meanwhile, other code that was compiled with no knowledge of your type will accept such ustrings and want to get its own string types out. This is handled by an interface module you provide, which explains how to read and write data from and to your type, and which will be used polymorphically by any ustring instance that finds itself holding one of your strings.
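The interface-module idea amounts to type erasure through a table of virtual functions. The following is a minimal sketch under assumed names (awareness, holder, and a toy SillyString); the actual Classics base class, ustring::awareness_t, has its own set of pure virtual functions, which are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical adaptor interface: one object per string type that knows
// how to read the wrapped value and write a new value into it.
struct awareness {
    virtual ~awareness() = default;
    virtual std::wstring read(const void* st) const = 0;
    virtual void write(void* st, const std::wstring& value) const = 0;
};

// A hypothetical third-party string class the wrapper was never compiled against.
struct SillyString {
    std::wstring chars;
};

// The user-supplied module: casts the untyped pointer back to the real type.
struct SillyString_awareness : awareness {
    std::wstring read(const void* st) const override {
        return static_cast<const SillyString*>(st)->chars;
    }
    void write(void* st, const std::wstring& value) const override {
        static_cast<SillyString*>(st)->chars = value;
    }
};

// The wrapper holds an untyped pointer plus the adaptor that understands it,
// and dispatches every operation through the adaptor polymorphically.
class holder {
public:
    holder(void* st, const awareness* aw) : st_(st), aw_(aw) {}
    std::wstring get() const { return aw_->read(st_); }
    void set(const std::wstring& v) { aw_->write(st_, v); }

private:
    void* st_;
    const awareness* aw_;
};
```

Code that only sees holder never needs SillyString's definition; all type-specific knowledge lives in the adaptor.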

Using the ustring class

The ustring class is designed to facilitate the exchange of parameters only. It doesn't "do" anything you would expect of a string object. The desired functionality belongs to a real string class, and the problem is that there are many such string classes, each of which does different things. The ustring object allows you to pass or return any kind of string object you choose, but is not itself a usable string.

Passing Parameters

The recommended way to write a function to take string arguments is:

void foo (const ustring& us)
 {
 preferred_string_type s= us;  //extract the information
 process (s);  //use s throughout, never touching us again.
 }

Returning Values

To return a string value, declare the function to return a ustring.

ustring bar (int index)
 {
 CString s;
 // ... do stuff with s involving dialog boxes or whatever ...
 return s;
 }
Within the function, use whatever concrete string type you choose. Sometimes a particular string class is especially convenient for a particular function, as in this example, which uses a CString because it makes other MFC calls.

In any case, write the function normally, as if your desired string class were the only one in the world. Then, when you return the string (of whatever actual string type you worked with), it will be wrapped in the ustring for exporting back to the caller. The caller can write std::wstring result=bar(5); without caring that the function uses Microsoft's MFC strings.

Originally, the ustring class did not allow instances to be copied. This meant that the simple case of

ustring bardone()  { return bar(-1); }
could not be written, because the return value from bar needs to be copied again. Now such code is legal, because ustring has a copy constructor that clones the wrapped string. This is not terribly inefficient, as it only adds a layer of indirection around the copy constructor of the underlying concrete string (which ought to be using some kind of reference counting). But it is more efficient still if you have no need to cascade ustring calls like this.

That is, only use ustrings to form your public interface, and don't pass them around within your own implementation. Within your implementation, use your chosen string type throughout.

CString bar_internal (int);
inline ustring bar (int x)  { return bar_internal(x); }
inline ustring bardone()  { return bar_internal(-1); }
Although it is generally a good idea to use your preferred string class consistently throughout your implementation (rendering cascading ustring calls unnecessary), this is not always the natural solution. For example, you may be using std::string throughout, but some functions want to use CString instead, for interfacing with MFC classes. Moreover, for non-trivial operations the extra overhead of copying a ustring is not an issue.

So, you can copy a ustring value if you need to, but if used as designed there is rarely the need.
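The cloning copy constructor described above can be sketched with the usual clone idiom. The names wrapper and model are hypothetical, and text() exists only so the copy can be inspected; the point is that copying the wrapper costs one virtual call plus the wrapped type's own copy constructor.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type-erased wrapper whose copy constructor clones the
// wrapped string through a virtual call, in the spirit of ustring's.
class wrapper {
    struct base {
        virtual ~base() = default;
        virtual std::unique_ptr<base> clone() const = 0;
        virtual std::string text() const = 0;
    };

    template <typename S>
    struct model : base {
        S value;
        explicit model(S v) : value(std::move(v)) {}
        // The extra layer of indirection: one virtual call around S's copy constructor.
        std::unique_ptr<base> clone() const override {
            return std::make_unique<model>(value);
        }
        std::string text() const override { return std::string(value); }
    };

    std::unique_ptr<base> p_;

public:
    template <typename S>
    wrapper(S s) : p_(std::make_unique<model<S>>(std::move(s))) {}

    // Copying clones whatever concrete string is inside.
    wrapper(const wrapper& other) : p_(other.p_->clone()) {}

    std::string text() const { return p_->text(); }
};
```

If the concrete string type is reference counted, the clone is cheap; otherwise it is a full copy, which is why keeping ustrings at the public interface only is preferable.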

Converting Strings Explicitly — string_cast

So far in this section, we have seen strings converted as a way of marshalling the data between functions that may have been written using different string classes.

As we saw in the introduction, the ustring class does more than this. It can also convert between 8-bit and 16-bit character representations, or between ANSI and OEM 8-bit character sets. Sometimes you want to trigger this behavior explicitly.

Obviously, you can do it like this:

	ustring temp= s;
	wstring result= temp;
or even
	wstring result= ustring(s);
but a cleaner method is to use the string_cast.
	wstring result= string_cast<wstring> (s);
The string_cast<T>(s) template will convert the argument s (which must be of a "string type") to T, using the same mechanism that ustring uses to convert between types. This avoids the need to create an intermediate ustring wrapper around s first.
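A string_cast-style template can be sketched as a thin function that delegates to overloaded conversion helpers. This is an assumption about its shape, not the Classics implementation, which uses the same mechanism as ustring; the helper convert is hypothetical and only handles 7-bit ASCII when changing character width.

```cpp
#include <cassert>
#include <string>

// Hypothetical conversion helpers; the output parameter's type selects the overload.
inline void convert(const std::string& in, std::string& out)   { out = in; }
inline void convert(const std::wstring& in, std::wstring& out) { out = in; }
inline void convert(const std::string& in, std::wstring& out) {
    out.assign(in.begin(), in.end());  // widen each char (ASCII only)
}
inline void convert(const std::wstring& in, std::string& out) {
    out.clear();
    for (wchar_t c : in) out.push_back(static_cast<char>(c));  // narrow (ASCII only)
}

// string_cast<T>(s): build a T directly from s, with no intermediate wrapper object.
template <typename T, typename S>
T string_cast(const S& s) {
    T result;
    convert(s, result);
    return result;
}
```

Usage matches the form shown above: `std::wstring result = string_cast<std::wstring>(s);`.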

MFC's CString Support

Among other things, the ustring component converts to and from the MFC CString class, as found in AFX.H and used extensively with the Microsoft Foundation Classes.

Notes on Microsoft's Class

There is a race condition associated with the "lock" concept. Given two CStrings s1 and s2, one thread can call s1=s2; while another thread calls s2.LockBuffer();. The implementation does not guarantee that s2 will be locked before it's bound into s1 (so the assignment sees s2 as locked), or that s2 will be locked after it's bound into s1 (so LockBuffer sees that the reference count is greater than 1). Instead, you could wind up with an illegal state where s2 is locked but still sharing a reference with s1. This will turn into a stray pointer bug when either s1 or s2 replaces its reference or is destructed.

There are race conditions in general when writing to a CString. Because the implementation refers to the referenced data well after releasing its ownership (by decrementing the reference count), all other references could be released by other threads in the meantime, leaving the first thread with a stray pointer. For example, after s1=s2; the two share the same reference-counted character data; then one thread executes s1.SetAt(index,ch); while another thread executes s2=s3;, arguably a totally unrelated operation. But the SetAt call can lose the contents of s1 before it finishes making its unique copy for the “copy on write” operation. This bug is general throughout the code.

Although InterlockedIncrement and InterlockedDecrement are used for the reference count itself, it is by no means safe to manipulate the same reference-counted object in two threads at the same time. Since different CString objects may refer to the same underlying data, it is generally unsafe to use the CString class from more than one thread. Microsoft's use of atomic counters for the reference count gives a false impression that the class is thread-safe.

CStrings don't work with standard output. For some reason, there is no operator<< defined for them. If you write:

CString message= _T("hello world!");
std::cout << message;
You will get no compiler error, but a very strange result. It displays a large numeric value (an address?) in hex rather than the contents of the string!

In order to display the CString, you can write:

std::cout <<  LPCTSTR(message);
instead, which uses the operator LPCTSTR conversion to display the underlying char* data. However, the use of the "T" generic types is misleading here, because this doesn't work when compiled for Unicode: if TCHAR is wchar_t, the line above prints an address, not the string. This is not simply because the standard library correctly lacks an output operator for a nul-terminated wchar_t string on a narrow stream, but rather because it provides a non-standard output operator for this case that doesn't work. Presumably this has something to do with the fact that wchar_t isn't a distinct type, as it is supposed to be.

In the test program ustring_test_CString.cxx this is worked around by defining a proper output operator for CString as part of the program.
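The workaround of supplying a real output operator can be sketched without MFC. FakeCString below is a hypothetical stand-in for a Unicode-build CString that, like CString, offers only an implicit conversion to a character pointer. Defining operator<< for the class itself means the stream receives the characters instead of printing a pointer value.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a Unicode CString: its only way out is an
// implicit conversion to const wchar_t*, much like operator LPCTSTR.
class FakeCString {
public:
    explicit FakeCString(const wchar_t* s) : text_(s) {}
    operator const wchar_t*() const { return text_.c_str(); }

private:
    std::wstring text_;
};

// Exact-match overload, so it wins over any operator<< reached through the
// pointer conversion. Each character is narrowed naively (ASCII only).
std::ostream& operator<<(std::ostream& os, const FakeCString& s) {
    for (const wchar_t* p = s; *p != L'\0'; ++p)
        os.put(static_cast<char>(*p));
    return os;
}
```

With this in scope, `std::cout << message;` prints the text. A production version would narrow with a real conversion routine rather than a cast.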

Using CString with ustring

In order to use CString with the ustring conversion system, simply include AFX.H (or something else that pulls it in), which contains the class definition for class ::CString, before including classics\ustring.h (or anything else that pulls it in).

Note that CString deals with either wide or narrow characters depending on conditional compilation within the AFX.H header file. You can't have both wide and narrow CStrings in the same program as you can with other string classes. The ustring.h header file notes the current option and treats CString as wide or narrow, as indicated.

For examples, see the unit test program classics\ustring_CString_test.cxx.

How it Works

The classics.dll is not dependent on the MFC DLLs. Such a dependency would have been undesirable, not only because it is unwelcome in the general case, but also because the conditional-compilation nature of CString means that there are different DLLs depending on whether you compile MFC for wide or narrow strings, and you can't link to both of them. Classics would then have to come in two flavors as well, and each would reference a specific version of MFC rather than being compatible with any version.

So Classics contains a small file that essentially clones Microsoft's CString class. It knows the structure layout, how the reference counting works, and how data is maintained within the string. The ustring::awareness object for CString (awareness objects are the only code that knows about specific string types) then uses this code instead of the code found in MFCS42u.dll (or wherever).

As far as the ustring.h header is concerned, it uses the MFC_CString<T> parameterized type (which, unlike Microsoft's class, allows wide and narrow objects to coexist) and doesn't care that such objects are identical to ::CString objects on a binary level.

Meanwhile, if AFX.H has been included (so that CString is defined), a couple of inline functions are conditionally compiled into ustring.h. These (one for setting, one for getting) simply cast the CString object into the internal type, or vice versa, and generate no actual code.

The result is transparent CString support on demand, with all support code located inside the classics.dll rather than expanded into the calling program (via inline functions or templates).

Adding Support for Additional String Classes

The ustring class is extensible, in that you can add support for additional string types. To do this, you supply an overloaded get_string_awareness function defined for your type.

Creating the Awareness Object

In order to inform the ustring component about your type (say, SillyString), you declare an overloaded form of the get_string_awareness function. This overloaded form will be found when the get_as or set_as template is invoked, so the library will receive the correct awareness object.

const ustring::awareness_t* get_string_awareness (const SillyString*);
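One way a library template can find a user-supplied overload like this is to pass a null pointer of the argument's type purely as a type tag, so that ordinary overload resolution picks the matching function. The sketch below assumes that mechanism; awareness_for stands in for the get_as/set_as templates, and all class names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal base; the real ustring::awareness_t has more members.
struct awareness_t {
    virtual ~awareness_t() = default;
    virtual const char* name() const = 0;
};

struct SillyString { std::string chars; };  // hypothetical user type

struct std_string_awareness : awareness_t {
    const char* name() const override { return "std::string"; }
};
struct SillyString_awareness : awareness_t {
    const char* name() const override { return "SillyString"; }
};

// One overload per supported type; the pointer argument is only a type tag,
// and each returns a single shared adaptor object.
const awareness_t* get_string_awareness(const std::string*) {
    static const std_string_awareness instance{};
    return &instance;
}
const awareness_t* get_string_awareness(const SillyString*) {
    static const SillyString_awareness instance{};
    return &instance;
}

// What a get_as/set_as-style template might do: select the adaptor at compile time.
template <typename T>
const awareness_t* awareness_for(const T&) {
    return get_string_awareness(static_cast<const T*>(nullptr));
}
```

Because the selection happens through overloading, supporting a new type only requires declaring one more overload where the template can see it (or where argument-dependent lookup finds it).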

So just what is an awareness object? It is an adaptor, or interface of sorts, that tells the ustring how to manipulate the thing it's wrapping. You must derive from the abstract base class ustring::awareness_t and provide definitions for several pure virtual functions that describe how to manipulate the string.

Implementing the concrete awareness class is simple. In general, you cast the void* st argument (the "black box" being held by the ustring wrapper) into the actual string class, in this example a SillyString*, and then do the corresponding operation on that object. For example,

class SillyString_awareness : public ustring::awareness_t {
   int length (const void* st) const
    {
    const SillyString* s= static_cast<const SillyString*>(st);
    	//first cast st to my actual type
    return s->elcount();
    	//then perform the operation.
    }
   // ... definitions of the remaining pure virtual functions ...
   };

This section is not yet finished.

Create the Custom Header

This section has not yet been written.