The ustring class is a universal string. It stands in for either a
wide or a narrow string, and translates between string types.
When a function takes a parameter of type ustring, the caller may pass either
a string or a wstring. Either is accepted without complaint.
Meanwhile, the function may decide to do different things based on whether the incoming
string is wide or narrow; or, it could just get a wide string and have any narrow string
converted.
void foo (string); //narrow form of function
void foo (wstring); //wide form of function
void foo (ustring); //only one form
Within the function, you can have separate code for each case, or have the input automatically converted. Use the
is_wide function to detect whether a wide or narrow string is present.
void foo (ustring x)
{
    if (x.is_wide()) {
        wstring s= x; //get the string out
        //...
    }
    else { //must be narrow
        string s= x; //get the string out
        //... simpler processing for 8-bit characters
    }
}
-- or --
void foo (ustring x)
{
    wstring s= x; //get wide out, convert if needed.
    //... wide case subsumes narrow; works for all.
}
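To make the two styles above concrete, here is a minimal, self-contained sketch. It does not use the Classics library: toy_ustring, count_chars, as_wide and toy_usage are hypothetical names invented for this illustration, and the ASCII-only conversion rule is an assumption standing in for whatever conversion the real ustring performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration; this is NOT the Classics ustring.
class toy_ustring {
public:
    toy_ustring(const std::string& s) : is_wide_(false), narrow_(s) {}
    toy_ustring(const std::wstring& s) : is_wide_(true), wide_(s) {}

    bool is_wide() const { return is_wide_; }

    // Get a narrow string out: unwrap if narrow, otherwise convert.
    operator std::string() const {
        if (!is_wide_) return narrow_;
        std::string out;
        for (wchar_t c : wide_)  // assumed toy rule: non-ASCII becomes '?'
            out += (static_cast<unsigned long>(c) < 128) ? static_cast<char>(c) : '?';
        return out;
    }

    // Get a wide string out: unwrap if wide, otherwise widen each byte.
    operator std::wstring() const {
        if (is_wide_) return wide_;
        std::wstring out;
        for (unsigned char c : narrow_) out += static_cast<wchar_t>(c);
        return out;
    }

private:
    bool is_wide_;
    std::string narrow_;
    std::wstring wide_;
};

// Style 1: separate code for the wide and narrow cases.
inline std::size_t count_chars(const toy_ustring& x) {
    if (x.is_wide()) {
        std::wstring s = x;  // get the wide string out
        return s.size();
    }
    std::string s = x;       // must be narrow
    return s.size();
}

// Style 2: always ask for the wide form and let the wrapper convert.
inline std::wstring as_wide(const toy_ustring& x) {
    std::wstring s = x;
    return s;
}

// Usage: the caller passes either kind of string to the same function.
inline void toy_usage() {
    assert(count_chars(std::string("abc")) == 3);
    assert(count_chars(std::wstring(L"wide!")) == 5);
    assert(as_wide(std::string("hi")) == L"hi");
}
```

The second style is usually less code to maintain; the first is worth it only when the narrow case really can be handled more cheaply.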
In addition, this class may be used to transparently convert between different brands of string class, making it the Babel-fish of string representations. This lets you use your string class of choice and still easily use the Classics library. If you have different string classes in different libraries in your program, this feature can also communicate between them transparently for you.
For example, you could put an MFC CString in, and get a classics::wstring out,
so you could trivially pass CStrings to foo in the above example. Likewise on return values: you could
get a Rogue Wave string out, for example, regardless of what the function actually returned to you.
In order to provide transparent conversion, letting you put your type into a ustring and get your type out, you need to create a header and include it in the files that need this capability.
Meanwhile, other code that was compiled with no knowledge of your type will accept such ustrings
and want to get their own types out. This is handled with an interface module you provide,
which explains how to read and write data from and to your type, and will be used polymorphically by any ustring
instance that finds itself holding one of your strings.
The ustring class is designed to facilitate the exchange of parameters only. It doesn't "do" anything
that you expect of a string object. The desired functionality belongs to a real string class, and the problem is that
there are many such string classes, each of which does different things. The ustring object allows
you to pass or return any kind of string object you choose, but is not itself a usable string.
The recommended way to write a function to take string arguments is:
void foo (const ustring& us)
{
    preferred_string_type s= us; //extract the information
    process (s); //use s throughout, never touching us again.
}
The preferred method is to use a const ustring&. This will cause a ustring to be
created if you call foo with a concrete string type, but will not duplicate the ustring (and the underlying
concrete string) if you pass an existing ustring instead.
Declare a string variable locally and initialize it with the parameter. The ustring will spit out the correct type. If you extract the same type that was passed, no extra work is done, as the ustring just unwraps the original value. If you extract a different type, an object of that type is created.
To return a string value, declare the function to return a ustring.
ustring bar (int index)
{
    CString s;
    // ... do stuff with s involving dialog boxes or whatever ...
    return s;
}
Within the function, use whatever concrete string type you choose. Sometimes a particular string class is especially
convenient for a particular function, as with this example that wants to use a CString because it makes other
MFC calls.
In any case, write the function normally, as if your desired string class were the only one in the world. Then, when
you return the string (of whatever actual string type you worked with), it will be wrapped in the ustring for
exporting back to the caller. The caller can write std::wstring result=bar(5); without caring that
the function uses Microsoft's MFC strings.
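The return-value pattern can be sketched without MFC or Classics. In the example below, VendorString is a hypothetical class standing in for something like CString, and toy_ustring is a toy wrapper written only for this illustration (it is not the Classics ustring, and its byte-by-byte widening is an assumption).

```cpp
#include <cassert>
#include <string>

// Hypothetical vendor string class, standing in for something like CString.
class VendorString {
public:
    explicit VendorString(const char* s) : text_(s) {}
    void append(const char* s) { text_ += s; }
    const char* c_str() const { return text_.c_str(); }
private:
    std::string text_;
};

// Toy return wrapper (NOT the Classics ustring): accepts the vendor type,
// hands out whichever standard string type the caller asks for.
class toy_ustring {
public:
    toy_ustring(const VendorString& s) : text_(s.c_str()) {}
    operator std::string() const { return text_; }
    operator std::wstring() const {
        std::wstring out;
        for (unsigned char c : text_) out += static_cast<wchar_t>(c);
        return out;
    }
private:
    std::string text_;
};

// The implementation works with its preferred class throughout;
// the wrapper is created implicitly by the return statement.
inline toy_ustring bar(int index) {
    VendorString s("item ");
    s.append(index == 5 ? "five" : "other");
    return s;
}

// Usage: the caller picks its own string type and never sees VendorString.
inline void toy_return_usage() {
    std::wstring wide = bar(5);
    std::string narrow = bar(1);
    assert(wide == L"item five");
    assert(narrow == "item other");
}
```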
Originally, the ustring class did not allow instances to be copied. This meant that the simple
case of
ustring bardone() { return bar(-1); }
could not be written, because the return value from bar needs to be copied again. Now, such code is legal because
ustring has a copy constructor that clones the wrapped string. This is not terribly inefficient, as it
only adds a layer of indirection around the copy constructor for the underlying concrete string (which ought to be using
some kind of reference counting). But it is even more efficient if you have no need to cascade ustring calls like this.
That is, only use ustrings to form your public interface, and don't pass them around within your
own implementation. Within your implementation, use your chosen string type throughout.
CString bar_internal (int);
inline ustring bar (int x) { return bar_internal(x); }
inline ustring bardone() { return bar_internal(-1); }
Although it is generally a good idea to use your preferred string class consistently throughout your implementation (rendering
cascading ustring calls unnecessary), this is not always the natural solution. For example, you may
be using std::string throughout, but some functions want to use CString instead, for
interfacing with MFC classes. Also, for non-trivial operations the extra overhead is not an issue.
So, you can copy a ustring value if you need to, but if used as designed there is rarely the need.
So far in this section, we have seen strings converted as a way of marshalling the data between functions that may have been written using different string classes.
As we saw in the introduction, the ustring class does more than this. It can also convert
between 8-bit and 16-bit character representations, or between ANSI and OEM 8-bit character sets. Sometimes
you want to trigger this behavior explicitly.
Obviously, you can do it like this:
ustring temp= s; wstring result= temp;
or even
wstring result= ustring(s);
but a cleaner method is to use the string_cast.
wstring result= string_cast<wstring> (s);
The string_cast<T>(s) template will convert the argument s (which must be
of a "string type") to T, using the same mechanism that ustring uses to convert between types.
This avoids the need to create an intermediate ustring wrapper around s first.
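The shape of such a named cast can be sketched as follows. This is not the Classics implementation: toy_string_cast and the toy_convert overloads are hypothetical names, and the ASCII-only conversion is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical conversion hooks, one overload per (source, target) pair.
// Assumed toy rule: bytes are widened as-is; non-ASCII wide chars become '?'.
inline void toy_convert(const std::string& in, std::wstring& out) {
    out.clear();
    for (unsigned char c : in) out += static_cast<wchar_t>(c);
}
inline void toy_convert(const std::wstring& in, std::string& out) {
    out.clear();
    for (wchar_t c : in)
        out += (static_cast<unsigned long>(c) < 128) ? static_cast<char>(c) : '?';
}
inline void toy_convert(const std::string& in, std::string& out) { out = in; }
inline void toy_convert(const std::wstring& in, std::wstring& out) { out = in; }

// Named cast: the target type is given explicitly, the source type is
// deduced, and no intermediate wrapper object is created.
template <typename To, typename From>
To toy_string_cast(const From& s) {
    To result;
    toy_convert(s, result);
    return result;
}

// Usage, mirroring the one-liner in the text.
inline void toy_cast_usage() {
    std::string s = "abc";
    std::wstring result = toy_string_cast<std::wstring>(s);
    assert(result == L"abc");
}
```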
Among other things, the ustring component converts to and from the MFC CString
class, as found in AFX.H and used extensively with the Microsoft Foundation Classes.
There is a race condition associated with the "lock" concept. Given two CStrings s1 and s2,
one thread can call s1=s2; while another thread calls s2.LockBuffer();. The implementation
does not guarantee that s2 will be locked before it's bound into s1 (so the assignment sees
s2 as locked), or that s2 will be locked after it's bound into s1 (so
LockBuffer sees that the reference count is greater than 1). Instead, you
could wind up with an illegal state where s2 is locked but still sharing a reference with s1.
This will turn into a
stray pointer bug when either s1 or s2 replaces its reference or is destructed.
There are race conditions in general when writing to a CString.
Because the implementation refers to the referenced data well after releasing its ownership (by decrementing
the reference count), it's possible that all other references could be released, in other threads, in the meantime. Then
the first thread will have a stray pointer. For example, s1=s2; so they share the same reference-counted
character data; then one thread executes s1.SetAt(index,ch); while another thread executes
s2=s3;, arguably a totally unrelated operation. But it's possible for the SetAt call to lose the contents
of s1 before it finishes making its unique copy for the “copy on write” operation. This bug is
general throughout the code.
Although InterlockedIncrement and InterlockedDecrement are used for the
reference count itself, it is by no means safe to manipulate the same reference-counted object in two threads at
the same time. Since different CString objects may refer to the same underlying data, it is generally
unsafe to use the CString class from more than one thread. Microsoft's use of atomic counters for the
reference count gives a false impression that the class is thread-safe.
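The ordering problem described above (touching shared data after giving up the reference to it) can be illustrated with a toy copy-on-write string. This is a sketch written for this discussion, not MFC's code: toy_rep and toy_cow_string are hypothetical, and std::atomic stands in for the Interlocked functions. The point is only the order of operations in set_at: make the private copy while the reference is still held, and release afterwards. Each individual object must still be used by one thread at a time.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared buffer with an atomic reference count.
struct toy_rep {
    std::atomic<int> refs{1};
    std::string chars;
    explicit toy_rep(const std::string& s) : chars(s) {}
};

// Toy copy-on-write string; illustration only, not MFC's CString.
class toy_cow_string {
public:
    explicit toy_cow_string(const std::string& s) : rep_(new toy_rep(s)) {}
    toy_cow_string(const toy_cow_string& other) : rep_(other.rep_) { ++rep_->refs; }
    toy_cow_string& operator=(const toy_cow_string& other) {
        ++other.rep_->refs;  // acquire first; also makes self-assignment safe
        release(rep_);
        rep_ = other.rep_;
        return *this;
    }
    ~toy_cow_string() { release(rep_); }

    void set_at(std::size_t index, char ch) {
        if (rep_->refs.load() > 1) {
            // Copy while this object still owns a reference, so no other
            // owner can free the shared buffer during the copy...
            toy_rep* mine = new toy_rep(rep_->chars);
            // ...and only then give the shared reference up.
            release(rep_);
            rep_ = mine;
        }
        rep_->chars.at(index) = ch;
    }

    const std::string& str() const { return rep_->chars; }
    int use_count() const { return rep_->refs.load(); }

private:
    static void release(toy_rep* r) {
        if (--r->refs == 0) delete r;
    }
    toy_rep* rep_;
};

// Usage: writing to one string leaves the other sharer untouched.
inline void toy_cow_usage() {
    toy_cow_string a("hello");
    toy_cow_string b(a);
    assert(a.use_count() == 2);
    a.set_at(0, 'j');
    assert(a.str() == "jello");
    assert(b.str() == "hello");
}
```

Reversing the two steps in set_at (release first, copy afterwards) reproduces the stray-pointer window the text describes.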
CStrings don't work with standard output. For some reason, there is no operator<<
defined for them. If you write:
CString message= _T("hello world!");
std::cout << message;
You will get no compiler error, but a very strange result. It displays a large numeric value (an address?) in hex
rather than the contents of the string!
In order to display the CString, you can write:
std::cout << LPCTSTR(message);
instead, which uses the conversion operator to LPCTSTR (a const TCHAR*) to display the underlying char* data.
However, the use of the "T" generic types is misleading here, because it doesn't work if compiled under Unicode. The line above
prints an address, not the string, if TCHAR is a wchar_t. The cause is not that the standard
library (correctly) lacks an output operator for a nul-terminated wchar_t string on a narrow stream;
rather, it does have a non-standard output operator for this case, and that operator doesn't work. Presumably it
has something to do with the fact that wchar_t isn't a distinct type like it's supposed to be.
In the testing program ustring_test_CString.cxx this is worked around by defining
a proper output operator for CString as part of the program.
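The operator actually used in the test program is not reproduced here. As a rough idea of what such a workaround can look like, the sketch below defines an output operator for a wide-character string type on a narrow stream. ToyWideString is a hypothetical stand-in for a Unicode-build CString, and the rule of printing non-ASCII characters as '?' is an assumption made for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a wide-character CString.
struct ToyWideString {
    std::wstring text;
};

// Output operator for a narrow stream. It writes the characters itself
// rather than handing the stream a pointer, so no address is printed.
// Assumed toy rule: ASCII passes through, anything else prints as '?'.
inline std::ostream& operator<<(std::ostream& os, const ToyWideString& s) {
    for (wchar_t c : s.text)
        os << ((static_cast<unsigned long>(c) < 128) ? static_cast<char>(c) : '?');
    return os;
}

// Helper for the usage example: render through a narrow string stream.
inline std::string toy_render(const ToyWideString& s) {
    std::ostringstream out;
    out << s;
    return out.str();
}

// Usage: the text comes out, not a hex address.
inline void toy_output_usage() {
    ToyWideString message{L"hello world!"};
    assert(toy_render(message) == "hello world!");
}
```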
In order to use CString with the ustring conversion system, simply include AFX.H (or something
else that pulls it in), which contains the class definition for class ::CString, before
including classics\ustring.h
(or anything else that pulls it in).
Note that CString deals either with wide or narrow characters depending on conditional compilation
within the AFX.H header file. You can't have both wide and narrow CStrings in the same program
as you can with other string classes. The ustring.h header file notes the current option and treats CString
as wide or narrow, as indicated.
For examples, see the unit test program classics\ustring_CString_test.cxx.
The classics.dll is not dependent on the MFC dlls. That would have been undesirable, not only because such
a dependency is unwelcome in the general case, but also because the conditional compilation nature of CString means
that there are different dlls depending on whether you compile MFC for wide or narrow strings, and
you can't link to both of them. Not only would Classics have to come in two flavors as well, but it would then
reference a specific version of MFC rather than being compatible with any version.
So, Classics contains a small file that essentially clones Microsoft's CString class. It knows the structure
layout and the manner in which the reference counting works and the manner in which data is maintained within
the string. The ustring::awareness object (which is the only code that knows about specific string types) for
CString then uses this code instead of code found in MFCS42u.dll (or wherever).
As far as ustring.h is concerned, it uses the MFC_CString<T> parameterized type (which, unlike
Microsoft's, allows wide and narrow objects to co-exist) and doesn't care that objects of this type are identical
to ::CStrings on a binary level.
Meanwhile, if AFX.H has been included (therefore, CString is defined), a couple of inline functions are
conditionally compiled-in to ustring.h. These (one for setting, one for getting) simply cast the CString object
into the internal type, or vice versa, and generate no actual code.
The result is transparent CString support on demand, with all support code located inside the classics.dll
rather than expanded into the calling program (via inline functions or templates).
The ustring class is extensible, in that you can add support for additional string types.
To do this, you supply an overloaded get_string_awareness function defined for your
type.
In order to inform the ustring component about your type (say, SillyString),
you declare an overloaded form of the get_string_awareness function. This
overloaded form will be found when the get_as or set_as template is invoked, so the library
will receive the correct awareness object.
const ustring::awareness_t* get_string_awareness (const SillyString*);
So just what is an awareness object? It is an adaptor or interface of sorts that
tells the ustring how to manipulate the thing it's wrapping. You must derive from the abstract base class
ustring::awareness_t, and provide definitions for several pure virtual functions that
describe how to manipulate the string.
Implementing the concrete awareness class is simple. In general, you cast the void* st argument (the
"black box" being held by the ustring wrapper) into the actual string class, in this example a SillyString*, and
then do the corresponding operation on that object. For example,
class SillyString_awareness : public ustring::awareness_t {
public:
    int length (const void* st) const
    {
        //first cast st to my actual type
        const SillyString* s= static_cast<const SillyString*>(st);
        //then perform the operation.
        return s->elcount();
    }
    //... the remaining pure virtual functions follow the same pattern
};
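How the pieces fit together (the overloaded lookup function, the awareness object, and a wrapper holding a void*) can be sketched in miniature. Everything prefixed toy_ below is hypothetical and much smaller than the real ustring::awareness_t interface, which has several pure virtual functions rather than the single one assumed here. The body of SillyString is likewise invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the awareness interface (one operation only).
struct toy_awareness_t {
    virtual ~toy_awareness_t() {}
    virtual int length(const void* st) const = 0;
};

// The user's own string class (contents invented for this example).
class SillyString {
public:
    explicit SillyString(const char* s) : text_(s) {}
    int elcount() const { return static_cast<int>(text_.size()); }
private:
    std::string text_;
};

// Concrete awareness: cast the black box back to the real type, then act.
class SillyString_awareness : public toy_awareness_t {
public:
    int length(const void* st) const override {
        const SillyString* s = static_cast<const SillyString*>(st);
        return s->elcount();
    }
};

// Overload chosen by pointer type; the argument only drives overload
// resolution and is never dereferenced.
inline const toy_awareness_t* get_string_awareness(const SillyString*) {
    static SillyString_awareness instance;
    return &instance;
}

// Toy holder: remembers the object as a void* plus its awareness object,
// then works through the awareness object polymorphically.
class toy_holder {
public:
    template <typename T>
    explicit toy_holder(const T& s)
        : data_(&s),
          aware_(get_string_awareness(static_cast<const T*>(nullptr))) {}
    int length() const { return aware_->length(data_); }
private:
    const void* data_;
    const toy_awareness_t* aware_;
};

// Usage: the holder knows nothing about SillyString except via awareness.
inline void toy_awareness_usage() {
    SillyString s("silly");
    toy_holder h(s);
    assert(h.length() == 5);
}
```

Code compiled without any knowledge of SillyString can still call length() on the holder, because the only type-specific code lives behind the virtual function.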
This section is not yet finished.
This section has not yet been written.