
CString Management

Posted June 24, 2013

http://www.flounder.com/cstring.htm

CStrings are a useful data type. They greatly simplify a lot of operations in MFC, making it much more convenient to do string manipulation. However, there are some special techniques to using CStrings, particularly hard for people coming from a pure-C background to learn. This essay discusses some of these techniques.

Much of what you need to do is pretty straightforward. This is not a complete tutorial on CStrings, but captures the most common basic questions. 

String Basics

"String" is a slippery concept.  There are many possible interpretations of the idea of a "string", that is, a sequence of characters.  While many of these are interchangeable and convertible, they each have their own characteristics, advantages and disadvantages.

The simple idea is that in programming MFC you should always use the CString type.  The only time you will convert to other data types is when you need to interface to components that require one of the alternative interfaces.

Base language types

char An 8-bit signed character value, range -128..127.  Should be used only in very rare circumstances, and otherwise avoided entirely.
unsigned char An 8-bit unsigned character value, range 0..255.  Should be used only in very rare circumstances, and otherwise avoided entirely.
char * A pointer to a sequence of 8-bit signed character values.  By convention, the sequence of characters is terminated by a NUL character, a 0 value.  Should be used only in very rare circumstances, and otherwise avoided entirely. 
const char * A pointer to a sequence of 8-bit signed character values.  The contents of the sequence may not be modified.  Otherwise, see char * cautions.
wchar_t A 16-bit signed character value (wide character type), range -32768..32767.  In Microsoft compilers, a wide character is interpreted as a Unicode character.  Should be used only in very rare circumstances, and otherwise avoided entirely.
unsigned wchar_t A 16-bit unsigned character value (wide character type), range 0..65535.  Should be used only in very rare circumstances, and otherwise avoided entirely.
wchar_t * A pointer to a sequence of 16-bit signed character values.  By convention, the sequence of characters is terminated by NUL character, a 0 value.  Should be used only in very rare circumstances, and otherwise avoided entirely.
const wchar_t * A pointer to a sequence of 16-bit signed character values.  The contents of the sequence may not be modified.  Otherwise see the wchar_t * cautions.
'c' An 8-bit signed character constant.  Strictly speaking, this represents an int value and therefore can contain more than one character, e.g., 'knuj' will appear in a memory dump as the 32-bit value 'junk' because an x86 is a "little-endian" machine.  This should be used only in the extremely rare cases when an 8-bit character constant is required, which is a vanishingly small number of times in real programming.  In the very rare cases you actually need an 8-bit character constant, the need will be obvious.  You should never assume that characters are 8-bit characters unless there is an external requirement demanding this.
"xxx" A string of 8-bit characters.  The compiler actually allocates 4 bytes for the string shown, so that the string is terminated by a NUL 8-bit character.  The value itself is allocated in the write-protected area of the program, so an attempt to assign to it will generate an access fault (starting with VS.NET 2005, the type of a string literal is now const char *).  The use of this form of constant should be vanishingly small in real programs, with the exception of the GetProcAddress API, which is the only API that actually requires an 8-bit string argument.  Only if there is an external specification demanding an 8-bit character string should this be used.
L'c' A 16-bit signed character constant.  This should be used only in the extremely rare cases when a 16-bit character constant is required, which is a vanishingly small number of times in real programming.
L"xxx" A string of 16-bit characters.  The compiler actually allocates 8 bytes for the string shown, so that the string is terminated by a NUL 16-bit character.  The value itself is allocated in the write-protected area of the program, so an attempt to assign to it will generate an access fault (starting with VS.NET 2005, the type of a string literal is now const wchar_t *).  The use of this form of constant should be vanishingly small in real programs.
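The sizes quoted in this table can be checked directly. The sketch below is plain standard C++ with no MFC, and the function names are made up for this illustration. Note that the 8-byte figure for L"xxx" assumes the 16-bit wchar_t of Microsoft compilers, so the code expresses the size in terms of sizeof(wchar_t) rather than hard-coding it.

```cpp
#include <cstddef>

// Bytes the compiler reserves for the 8-bit literal "xxx":
// 3 characters plus the terminating NUL.
std::size_t narrow_literal_bytes() { return sizeof("xxx"); }

// Bytes reserved for the wide literal L"xxx":
// (3 characters + NUL) * sizeof(wchar_t).
// With a 16-bit wchar_t (Microsoft compilers) this is 8.
std::size_t wide_literal_bytes() { return sizeof(L"xxx"); }

// Characters, including the terminating NUL, in the wide literal.
std::size_t wide_literal_chars() { return sizeof(L"xxx") / sizeof(wchar_t); }
```

The character count is the same for both literals; only the byte count depends on the character width.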

Windows Types

CHAR 8-bit signed character type.  Should be used rarely, if ever; same cautions as base char type.
LPSTR, PSTR Pointer to 8-bit signed character sequence.  This is the preferred way to declare a pointer to 8-bit characters (char *).  Should be used very rarely, if ever.
LPCSTR, PCSTR Pointer to constant 8-bit signed character sequence.  This is the preferred way to declare a pointer to constant 8-bit characters (const char *).  Should be used very rarely, if ever.
WCHAR 16-bit signed character type.  Should be used rarely, if ever, and only under conditions where the character is known to be a 16-bit character.  See cautions for wchar_t.
LPWSTR, PWSTR Pointer to sequence of 16-bit characters.  This is the preferred way to declare a pointer (instead of wchar_t *).  Should be used only in the rare cases where the sequence is known to be a sequence of Unicode characters.  See cautions for wchar_t *.
LPCWSTR, PCWSTR Pointer to a constant 16-bit signed character sequence.  This is the preferred way to declare a const wchar_t * value.
TCHAR An 8-bit or 16-bit character.  If the UNICODE preprocessor symbol is defined, this compiles to a wchar_t type; if the UNICODE preprocessor symbol is undefined, this compiles to a char type.  This is the preferred way to declare a character variable.
LPTSTR, PTSTR A pointer to an 8-bit or 16-bit character string.  If the UNICODE preprocessor symbol is defined, this compiles to wchar_t *; if the UNICODE preprocessor symbol is undefined, this compiles to char *.  This is the preferred way to declare a pointer to a string.
LPCTSTR, PCTSTR A pointer to a constant 8-bit or 16-bit character string.  If the UNICODE preprocessor symbol is defined, this compiles to const wchar_t *; if the UNICODE preprocessor symbol is undefined, this compiles to const char *.  This is the preferred way to declare a const pointer to a string.  Note that most APIs want LPCTSTR arguments, and a CString can always be used in such a context.
_T('x') A character literal.  If the UNICODE preprocessor symbol is defined, this compiles as L'x', a 16-bit character value; if the UNICODE preprocessor symbol is undefined, this compiles as 'x', an 8-bit character value. This is the preferred way to declare a character constant.
_T("abc") A string literal.  If the UNICODE preprocessor symbol is defined, this compiles as L"abc", a wide-character string terminated with a wide-character NUL; if the UNICODE preprocessor symbol is undefined, this compiles as "abc", an 8-bit string literal terminated with an 8-bit NUL character.  This is the preferred way to declare a string constant.

CString types

CString A string data type.  If the UNICODE preprocessor symbol is defined, this compiles as a type that holds 16-bit wide characters, terminated with a 16-bit NUL (CStringW); if the UNICODE preprocessor symbol is undefined, this compiles as a type that holds 8-bit characters, terminated with an 8-bit NUL (CStringA).
CStringA A string data type.  Independent of the setting of the UNICODE preprocessor symbol, this always represents a sequence of 8-bit characters terminated with an 8-bit NUL.
CStringW A string data type.  Independent of the setting of the UNICODE preprocessor symbol, this always represents a sequence of 16-bit characters terminated with a 16-bit NUL.
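CStringT The class template (VS.NET and later) from which CString, CStringA, and CStringW are instantiated.  In everyday code you use those three names rather than CStringT itself.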

Other types

BSTR A counted Unicode string.  This is an interface type used for communicating with Visual Basic, COM, ActiveX, and other specialized interfaces.
std::string A C++ Standard Library string.  This is used only when portability to other platforms is desirable.  Generally, it has no significant advantage in MFC programming. 
UNICODE_STRING A kernel data type.  This is used in cases where there is an interface to underlying low-level kernel APIs, most commonly the "undocumented" APIs.
PUNICODE_STRING A pointer to a UNICODE_STRING structure.
PCUNICODE_STRING A pointer to a const UNICODE_STRING structure.

String Concatenation

One of the very convenient features of CString is the ability to concatenate two strings. For example, writing

CString gray("Gray");
CString cat("Cat");
CString graycat = gray + cat;

is a lot nicer than having to do something like:

char gray[] = "Gray";
char cat[] = "Cat";
char * graycat = malloc(strlen(gray) + strlen(cat) + 1);
strcpy(graycat, gray);
strcat(graycat, cat);

Note that the above code is not "Unicode-aware", that is, it only works in compilations of ANSI applications.  The correct Unicode-aware representation would be

CString gray(_T("Gray"));
CString cat(_T("Cat"));
CString graycat = gray + cat;

Formatting (including integer-to-CString)

Rather than using sprintf or wsprintf, you can do formatting for a CString by using the Format method:

CString s;
s.Format(_T("The total is %d"), total);

The advantage here is that you don't have to worry about whether or not the buffer is large enough to hold the formatted data; this is handled for you by the formatting routines.

Use of formatting is the most common way of converting from non-string data types to a CString, for example, converting an integer to a CString:

CString s;
s.Format(_T("%d"), total);

I always use the _T( ) macro because I design my programs to be at least Unicode-aware, but that's a topic for some other essay. The purpose of _T( ) is to compile a string for an 8-bit-character application as:

#define _T(x) x // non-Unicode version

whereas for a Unicode application it is defined as

#define _T(x) L##x // Unicode version

so in Unicode the effect is as if I had written

s.Format(L"%d", total);
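The mechanism can be reproduced in a few lines of portable C++. In the sketch below, MY_UNICODE, my_tchar and MY_T are hypothetical names invented for this illustration; the real ones are UNICODE/_UNICODE, TCHAR and _T, and the real header is somewhat more elaborate.

```cpp
#include <cstddef>

// MY_UNICODE, my_tchar and MY_T are hypothetical stand-ins for
// UNICODE, TCHAR and _T.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x      // paste an L prefix onto the literal
#else
typedef char my_tchar;
#define MY_T(x) x         // leave the literal as written
#endif

// Either way the literal holds 3 characters plus a terminating NUL.
std::size_t literal_chars() { return sizeof(MY_T("abc")) / sizeof(my_tchar); }

// The first character has whatever character type the build selected.
my_tchar first_char() { return MY_T("abc")[0]; }
```

The same source compiles in both modes; only the definition of the symbol on the compiler command line changes.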

If you ever think you might ever possibly use Unicode, start coding in a Unicode-aware fashion. For example, never, ever use sizeof( ) to get the size of a character buffer, because it will be off by a factor of 2 in a Unicode application. We cover Unicode in some detail in Win32 Programming. When I need a size, I have a macro called DIM, which is defined in a file dim.h that I include everywhere:

#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

This is not only useful for dealing with Unicode buffers whose size is fixed at compile time, but any compile-time defined table.

class Whatever { ... };
Whatever data[] = {
   { ... },
    ...
   { ... },
};

for(int i = 0; i < DIM(data); i++) // scan the table looking for a match 
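A complete version of the table scan might look like the sketch below. It is portable C++ rather than MFC; Entry, table, lookup and the two buffer functions are hypothetical names used only to show DIM giving an element count where sizeof gives a byte count.

```cpp
#include <cstddef>

// Same definition as in the text: element count, not byte count.
#define DIM(x) ( sizeof((x)) / sizeof((x)[0]) )

// Hypothetical table entry for this illustration.
struct Entry { int code; const wchar_t * name; };

static const Entry table[] = {
    { 1, L"one" },
    { 2, L"two" },
    { 3, L"three" },
};

// Scan the table looking for a match; returns a null pointer if none is found.
const wchar_t * lookup(int code)
{
    for (std::size_t i = 0; i < DIM(table); i++)
        if (table[i].code == code)
            return table[i].name;
    return nullptr;
}

// Element count of a fixed-size wide buffer: always 20.
std::size_t buffer_chars() { wchar_t buf[20]; return DIM(buf); }

// Byte count of the same buffer: 20 * sizeof(wchar_t).
std::size_t buffer_bytes() { wchar_t buf[20]; return sizeof(buf); }
```

Adding or removing a row in the table needs no change to the loop, because the bound is computed by the compiler.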

Bytes vs. characters

Beware of those API calls that want genuine byte counts; using a character count will not work.

TCHAR data[20];
lstrcpyn(data, longstring, sizeof(data) - 1); // WRONG!
lstrcpyn(data, longstring, DIM(data) - 1); // RIGHT but questionable
WriteFile(f, data, DIM(data), &bytesWritten, NULL); // WRONG!
WriteFile(f, data, sizeof(data), &bytesWritten, NULL); // RIGHT but questionable
WriteFile(f, data, lstrlen(data)*sizeof(TCHAR), &bytesWritten, NULL);  // More RIGHT but still questionable
WriteFile(f, longstring, lstrlen(longstring)*sizeof(TCHAR), &bytesWritten, NULL); // RIGHT 

This is because lstrcpyn wants a character count, but WriteFile wants a byte count. Also note that this always writes out the entire contents of data. If you only want to write out the actual length of the data, you would think you might do

WriteFile(f, data, lstrlen(data), &bytesWritten, NULL); // WRONG

but that will not work in a Unicode application. Instead, you must do

WriteFile(f, data, lstrlen(data) * sizeof(TCHAR), &bytesWritten, NULL); // RIGHT

because WriteFile wants a byte count. (For those of you who might be tempted to say "but that means I'll always be multiplying by 1 for ordinary applications, and that is inefficient", you need to understand what compilers actually do. No real C or C++ compiler would actually compile a multiply-by-one instruction inline; the multiply-by-one is simply discarded by the compiler as being a silly thing to do. And if you think when you use Unicode that you'll have to pay the cost of multiplying by 2, remember that this is just a bit-shift left by 1 bit, which the compiler is also happy to do instead of the multiplication).
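The distinction can be captured in two small helpers. This is a portable sketch using wchar_t and std::wcslen in place of TCHAR and lstrlen; char_count and byte_count are names invented for the example.

```cpp
#include <cstddef>
#include <cwchar>

// Number of characters in a NUL-terminated wide string
// (what a character-oriented call wants).
std::size_t char_count(const wchar_t * s) { return std::wcslen(s); }

// Number of bytes those characters occupy (what a byte-oriented call
// such as WriteFile wants). The terminating NUL is not included.
std::size_t byte_count(const wchar_t * s) { return std::wcslen(s) * sizeof(wchar_t); }
```

Naming the two quantities separately makes it harder to hand a character count to an API that expects bytes.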

Why are some forms questionable?  Because it basically makes no sense to copy the data from one place to another just to write it out.  The most common form of this error is to copy a CString to a buffer, e.g.,

TCHAR data[SOME_FIXED_SIZE];
CString s = ...some computation...;
lstrcpyn(data, s, SOME_FIXED_SIZE - 1);
WriteFile(f, data, lstrlen(data)*sizeof(TCHAR), &bytesWritten, NULL);

when the sensible solution is

WriteFile(f, (LPCTSTR)s, s.GetLength() * sizeof(TCHAR), &bytesWritten, NULL);

There is a common myth, especially among beginners, that the argument must be a variable of the same type as the parameter.  The truth is that the expression used for that parameter must have the same type as the parameter; a variable of the type is not required to exist, as long as the expression produces the right type.

Using _T does not create a Unicode application. It creates a Unicode-aware application. When you compile in the default 8-bit mode, you get a "normal" 8-bit program; when you compile in Unicode mode, you get a Unicode (16-bit-character) application. Note that a CString in a Unicode application is a string that holds 16-bit characters.

Converting a CString to an integer

The simplest way to convert a CString to an integer value is to use one of the standard string-to-integer conversion routines.

While generally you will suspect that atoi is a good choice, it is rarely the right choice. If you plan to be Unicode-ready, you should call the function _ttoi, which compiles into atoi in ANSI code and _wtoi in Unicode code. You can also consider using _tcstoul (for unsigned conversion to any radix, such as 2, 8, 10 or 16) or _tcstol (for signed conversion to any radix). Here are some examples:

CString hex = _T("FAB");
CString decimal = _T("4011");
ASSERT(_tcstoul(hex, 0, 16) == _ttoi(decimal));

Converting a CString to a double (VS6)

This is a real pain for versions before VS.2005.  In all earlier versions of VS, there is no wide-character version (_wtof) of the ANSI conversion function (atof).

double atof(const char * string);

has been defined since the prehistory of C, but the required Unicode version

double _wtof(const wchar_t * string);

did not appear until the VS2005 library.  This means that _ttof does not exist below VS.2005.

To deal with this, I use the T2A macro.

USES_CONVERSION;
CString s = _T("123.45");
double d = atof(T2A(s));

This could be handled with conditional compilation

CString s = _T("123.45");
#if _MSC_VER < 1400   // before VS2005: no _ttof
USES_CONVERSION;
double d = atof(T2A(s));
#else
double d = _ttof(s);
#endif
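Standard C also provides strtod and its wide-character counterpart wcstod, which report where parsing stopped and so allow error checking that atof does not. The sketch below is portable C++ rather than MFC; parse_double and parse_double_or are names made up for this illustration, and whether a particular older runtime library ships wcstod is something to verify for your compiler.

```cpp
#include <cwchar>

// Convert a wide string to a double. Returns false if no number was
// found or if anything other than the number follows it.
bool parse_double(const wchar_t * text, double & value)
{
    wchar_t * end = nullptr;
    double d = std::wcstod(text, &end);
    if (end == text || *end != L'\0')
        return false;
    value = d;
    return true;
}

// Convenience wrapper: returns fallback when the text is not a number.
double parse_double_or(const wchar_t * text, double fallback)
{
    double d = 0.0;
    return parse_double(text, d) ? d : fallback;
}
```

Unlike atof, this form distinguishes the value 0.0 from input that is not a number at all.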

Converting a CString of hex digits to an integer

This is a frequent question, because everyone who asks it seems to miss that atoi (and therefore _ttoi) only works on decimal digits 0..9.

The answer is strtoul, wcstoul, or better still, _tcstoul.

ULONG strtoul(LPCSTR ptr, LPSTR * endptr, int base)
ULONG wcstoul(LPCWSTR ptr, LPWSTR * endptr, int base)
ULONG _tcstoul(LPCTSTR ptr, LPTSTR * endptr, int base)

These functions expect an input string of the form

[whitespace] [{+ | -}] [0 [{ x | X }]] [digits]

where whitespace is space or tab characters, and is ignored.  The base value can be any value from 2 through 36, or 0.  If the base is between 2 and 36, then the string is interpreted according to base.  But if base is 0, then special rules come into play. If the first digit is 0 and the character which follows it is not 'x' or 'X', then the number is interpreted as if base were specified as 8.  If the first digit is 0 and the character which follows is 'x' or 'X', then the '0x' or '0X' is ignored as input and the remainder of the number is interpreted as if base had been 16.  Otherwise, it is interpreted as if base were 10.
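The base rules are easy to try out with the 8-bit strtoul, which behaves the same way as its wide and generic-text siblings. The sketch below is portable C++; parse_ulong and parse_ulong_or are hypothetical helper names, and the end-pointer check is an addition of this example rather than something the text requires.

```cpp
#include <cerrno>
#include <cstdlib>

// Parse an unsigned long in the given base (2..36, or 0 for auto-detect).
// Returns false if there are no digits, trailing junk, or overflow.
bool parse_ulong(const char * text, int base, unsigned long & value)
{
    char * end = nullptr;
    errno = 0;
    unsigned long v = std::strtoul(text, &end, base);
    if (end == text || *end != '\0' || errno == ERANGE)
        return false;
    value = v;
    return true;
}

// Convenience wrapper for quick experiments: returns fallback on failure.
unsigned long parse_ulong_or(const char * text, int base, unsigned long fallback)
{
    unsigned long v = 0;
    return parse_ulong(text, base, v) ? v : fallback;
}
```

With base 0, "0x1F" is read as hexadecimal, "017" as octal, and "42" as decimal; with base 16, "FAB" needs no prefix.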

Converting between char * (TCHAR *) and CString

This is the most common set of questions beginners have on the CString data type. Due largely to serious C++ magic, you can largely ignore many of the problems. Things just "work right". The problems come about when you don't understand the basic mechanisms and then don't understand why something that seems obvious doesn't work.

For example, having noticed the above example you might wonder why you can't write

CString graycat = "Gray" + "Cat";

or

CString graycat("Gray" + "Cat");

In fact the compiler will complain bitterly about these attempts. Why? Because the + operator is defined as an overloaded operator on various combinations of the CString and LPCTSTR data types, but not between two LPCTSTR data types, which are underlying data types. You can't overload C++ operators on base types like int and char, or char *. What will work is 

CString graycat = CString("Gray") + CString("Cat");

or even

CString graycat = CString("Gray") + "Cat";

If you study these, you will see that the + always has at least one CString operand; the other operand may be a CString or an LPCTSTR.

Note that it is always better to write Unicode-aware code, e.g.,

CString graycat = CString(_T("Gray")) + _T("Cat");

and so on. This makes your code immediately portable.
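To see why at least one operand must be a class type, it helps to write the overloads out. MiniString below is a hypothetical stand-in for CString, built on std::string purely for illustration; it is not how CString is implemented.

```cpp
#include <string>

// MiniString is a hypothetical stand-in for CString, just enough to show
// which operator+ overloads a string class has to provide.
class MiniString {
public:
    MiniString(const char * s) : text_(s) {}
    const std::string & str() const { return text_; }

    // class + class
    friend MiniString operator+(const MiniString & a, const MiniString & b)
    { return MiniString((a.text_ + b.text_).c_str()); }
    // class + pointer
    friend MiniString operator+(const MiniString & a, const char * b)
    { return MiniString((a.text_ + b).c_str()); }
    // pointer + class
    friend MiniString operator+(const char * a, const MiniString & b)
    { return MiniString((a + b.text_).c_str()); }

private:
    std::string text_;
};

// MiniString bad = "Gray" + "Cat";  // does not compile: both operands are pointers
```

There is no fourth overload taking two pointers, because the language does not allow an operator to be overloaded when neither operand is a class or enumeration type.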

Generally, you should forget that char exists as a data type except in very rare and exotic situations where the fact that it is 8-bit characters is dictated by some external constraint, such as a hardware device or a network connection.  In that case, with VS.NET 2003 and later, you can use CStringA to represent a CString that is always 8-bit characters.

char * (TCHAR *) to CString

So you have a char *, WCHAR *, or TCHAR *, or a string literal. How do you create a CString? Given a declaration such as

char * p = "This is a test";

or, in Unicode-aware applications

TCHAR * p = _T("This is a test");

or

LPTSTR p = _T("This is a test");

you can write any of the following:

CString s = "This is a test";     // 8-bit only
CStringA s = "This is a test";    // 8-bit characters will work in Unicode app*
CString s = L"This is a test";    // Unicode only
CStringW s = L"This is a test";   // Unicode characters will work in 8-bit app*
CString s = _T("This is a test"); // Unicode-aware
CString s("This is a test");      // 8-bit only
CStringA s("This is a test");     // 8-bit characters will work in Unicode app*
CStringW s(L"This is a test");    // Unicode characters will work in an 8-bit app*
CString s(_T("This is a test"));  // Unicode-aware
CString s = p;
CString s(p);

*Note that CStringA and CStringW are not available in VS6, only in VS.NET versions!

Any of these readily convert the constant string or the pointer to a CString value. Note that the characters assigned are always copied into the CString so that you can do something like

TCHAR * p = _T("Gray");
CString s(p);
p = _T("Cat");
s += p;

and be sure that the resulting string is "GrayCat".

There are several other methods for CString constructors, but we will not consider most of these here; you can read about them on your own.

Actually, it is a bit subtler than I show. For example

CString s = "This is a test"; 

is sloppy programming, but actually will compile correctly for Unicode. What it does is invoke the MultiByteToWideChar operation of the CString constructor to convert, at run-time, the 8-bit character string to a 16-bit Unicode character string. However, this can still be useful if the char * pointer refers, for example, to 8-bit data that just came in over the network.  Always try to avoid this and program Unicode-aware.

CString to char */TCHAR * I: Casting to LPCTSTR

This is a slightly harder transition to find out about, and there is lots of confusion about the "right" way to do it. There are quite a few right ways, and probably an equal number of wrong ways.

The first thing you have to understand about a CString is that it is a special C++ object which contains three values: a pointer to a buffer, a count of the valid characters in the buffer, and a buffer length. The count of the number of characters can be any size from 0 up to the maximum length of the buffer minus one (for the NUL byte). The character count and buffer length are cleverly hidden.

Unless you do some special things, you know nothing about the size of the buffer that is associated with the CString. Therefore, even if you can get the address of the buffer, you cannot change its contents. You cannot shorten the contents, and you absolutely must not lengthen the contents. This leads to some at-first-glance odd workarounds.

The operator LPCTSTR (or more specifically, the operator const TCHAR *) is overloaded for CString. The definition of the operator is to return the address of the buffer. Thus, if you need a string pointer to the CString you can do something like

CString s("GrayCat");
LPCTSTR p =  s;

and it works correctly. This is because of the rules about how conversions are done in C++; when a conversion is required, C++ rules allow a user-defined conversion operator to be selected. For example, you could define (float) as a cast on a complex number (a pair of floats) and define it to return only the first float (called the "real part") of the complex number so you could say

Complex c(1.2f, 4.8f);
float realpart = c;

and expect to see, if the (float) operator is defined properly, that the value of realpart is now 1.2.
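A definition of such a class might look like the following. This Complex is a hypothetical class written for this example (it is not std::complex), and real_part is an invented helper that shows the conversion being applied implicitly.

```cpp
// A hypothetical Complex class with a user-defined conversion to float.
class Complex {
public:
    Complex(float re, float im) : re_(re), im_(im) {}

    // Conversion operator: lets a Complex be used where a float is required.
    operator float() const { return re_; }

    float imag() const { return im_; }

private:
    float re_;
    float im_;
};

// The compiler applies operator float() implicitly for the return value.
float real_part(const Complex & c) { return c; }
```

CString's operator LPCTSTR follows the same pattern: wherever the compiler knows an LPCTSTR is wanted, it calls the operator for you.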

This works for you in all kinds of places. For example, any function that takes an LPCTSTR parameter will force this coercion, so that you can have a function (perhaps in a DLL you bought):

BOOL DoSomethingCool(LPCTSTR s);

and call it as follows

CString file(_T("c:\\myfiles\\coolstuff"));
BOOL result = DoSomethingCool(file);

This works correctly because the DoSomethingCool function has specified that it wants an LPCTSTR and therefore the LPCTSTR operator is applied to the argument, which in MFC means that the address of the string is returned.

But what if you want to format it?

CString graycat(_T("GrayCat"));
CString s;
s.Format(_T("Mew! I love %s"), graycat);

Note that because the value appears in the variable-argument list (the list designated by "..." in the specification of the function), no implicit coercion operator is applied. What are you going to get?

Well, surprise, you actually get the string

"Mew! I love GrayCat"

because the MFC implementers carefully designed the CString data type so that an expression of type CString evaluates to the pointer to the string, so in the absence of any casting, such as in a Format or sprintf, you will still get the correct behavior. The additional data that describes a CString actually lives in the addresses below the nominal CString address.
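This behavior depends on how CString happens to be laid out. A more defensive habit is to request the conversion explicitly, for example s.Format(_T("Mew! I love %s"), (LPCTSTR)graycat). The portable sketch below shows the same idea with snprintf; Name is a hypothetical wrapper class standing in for CString, and greet is an invented function.

```cpp
#include <cstdio>
#include <string>

// Name is a hypothetical string wrapper with a conversion operator,
// standing in for CString and its operator LPCTSTR.
class Name {
public:
    explicit Name(const char * s) : text_(s) {}
    operator const char *() const { return text_.c_str(); }
private:
    std::string text_;
};

std::string greet(const Name & who)
{
    char buffer[64];
    // A "..." parameter gives the compiler no target type, so the
    // conversion is requested explicitly. The size passed is an element
    // count, and snprintf never writes past it.
    std::snprintf(buffer, sizeof(buffer) / sizeof(buffer[0]),
                  "Mew! I love %s", static_cast<const char *>(who));
    return buffer;
}
```

The explicit cast costs nothing at run time and does not rely on the internal layout of the string class.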

What you can't do is modify the string. For example, you might try to do something like replace the "." by a "," (don't do it this way, you should use the National Language Support features for decimal conversions if you care about internationalization, but this makes a simple example):

CString v("1.00");  // currency amount, 2 decimal places
LPCTSTR p = v;
p[lstrlen(p) - 3] = ',';

If you try to do this, the compiler will complain that you are assigning to a constant string. This is the correct message. It would also complain if you tried

strcat(p, "each");

because strcat wants an LPTSTR as its first argument and you gave it an LPCTSTR.

Don't try to defeat these error messages. You will get yourself into trouble!

The reason is that the buffer has a count, which is inaccessible to you (it's in that hidden area that sits below the CString address), and if you change the string, you won't see the change reflected in the character count for the buffer. Furthermore, if the string happens to be just about as long as the buffer physical limit (more on this later), an attempt to extend the string will overwrite whatever is beyond the buffer, which is memory you have no right to write (right?) and you'll damage memory you don't own. Sure recipe for a dead application.

Most kernel APIs want LPCTSTR parameters.  Because the (LPCTSTR) operator is defined for CString, the compiler will automatically invoke the conversion.  Given a definition of the form

BOOL WINAPI SomeAPI(LPCTSTR);

This can be called by doing

CString s = _T("Some string value");
if(SomeAPI(s))
   ...

CString to char */TCHAR * II: Using GetBuffer

A special method is available for a CString if you need to modify it. This is the operation GetBuffer. What this does is return to you a pointer to the buffer which is considered writeable. If you are only going to change characters or shorten the string, you are now free to do so:

CString s(_T("File.ext"));
LPTSTR p = s.GetBuffer();
LPTSTR dot = _tcschr(p, _T('.')); // OK, should have used s.Find...
if(dot != NULL)
    *dot = _T('\0');              // truncate the string at the dot
s.ReleaseBuffer();

This is the first and simplest use of GetBuffer. You don't supply an argument, so the default of 0 is used, which means "give me a pointer to the string; I promise to not extend the string". When you call ReleaseBuffer, the actual length of the string is recomputed and stored in the CString. Within the scope of a GetBuffer/ReleaseBuffer sequence, and I emphasize this: You Must Not, Ever, Use Any Method Of CString on the CString whose buffer you have! The reason for this is that the integrity of the CString object is not guaranteed until the ReleaseBuffer is called. Study the code below:

CString s(...);
LPTSTR p = s.GetBuffer();
//... lots of things happen via the pointer p
int n = s.GetLength(); // BAD!!!!! PROBABLY WILL GIVE WRONG ANSWER!!!
s.TrimRight();         // BAD!!!!! NO GUARANTEE IT WILL WORK!!!!
s.ReleaseBuffer();     // Things are now OK
int m = s.GetLength(); // This is guaranteed to be correct
s.TrimRight();         // Will work correctly
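The stale-length problem can be reproduced without MFC. The sketch below uses std::string as an analogue: &s[0] plays the role of GetBuffer and the resize call plays the role of ReleaseBuffer. The function names are invented for the illustration, and std::string is not claimed to work the way CString does internally.

```cpp
#include <cstring>
#include <string>

// Length the string object reports right after its characters were
// modified through a raw pointer, before the length is recomputed.
std::size_t stale_length_after_edit()
{
    std::string s("File.ext");
    char * p = &s[0];                  // writable pointer, like GetBuffer()
    char * dot = std::strchr(p, '.');
    if (dot != nullptr)
        *dot = '\0';                   // shorten the text in place
    return s.size();                   // still 8: the object has not been told
}

// The same edit followed by the "release" step.
std::string strip_extension(std::string s)
{
    char * p = &s[0];
    char * dot = std::strchr(p, '.');
    if (dot != nullptr)
        *dot = '\0';
    s.resize(std::strlen(s.c_str()));  // like ReleaseBuffer(): recompute length
    return s;
}
```

Between the raw edit and the resize, the object's recorded length disagrees with its contents, which is exactly the window in which no other method should be called.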

Suppose you want to actually extend the string. In this case you must know how large the string will get. This is just like declaring

char buffer[1024];

knowing that 1024 is more than enough space for anything you are going to do. The equivalent in the CString world is

LPTSTR p = s.GetBuffer(1024);

This call gives you not only a pointer to the buffer, but guarantees that the buffer will be (at least) 1024 characters in length. (Note I said "characters", not "bytes", because CString is Unicode-aware implicitly).

Also, note that if you have a pointer to a const string, the string value itself may be stored in read-only memory. You then have a pointer to read-only memory, so an attempt to store into the string will fail with an access error, even if you obtained the pointer with GetBuffer. I haven't verified this for CString, but I've seen ordinary C programmers make this error frequently.

A common "bad idiom" left over from C programmers is to allocate a buffer of fixed size, do a sprintf into it, and assign it to a CString:

char buffer[256];
sprintf(buffer, "%......", args, ...); // ... means "lots of stuff here"
CString s = buffer;

while the better form is to do

CString s;
s.Format(_T("%...."), args, ...);

Note that this always works; if your string happens to end up longer than 256 bytes you don't clobber the stack!

Another common error is to be clever and realize that a fixed size won't work, so the programmer allocates bytes dynamically. This is even sillier:

int len = lstrlen(parm1) + 13 + lstrlen(parm2) + 10 + 100;
char * buffer = new char[len];
sprintf(buffer, "%s is equal to %s, valid data", parm1, parm2);
CString s = buffer;
....
delete [] buffer;

Where it can be easily written as

CString s;
s.Format(_T("%s is equal to %s, valid data"), parm1, parm2);

Note that the sprintf examples are not Unicode-ready (although you could use _stprintf and put _T() around the formatting string). The basic idea is still that you are doing far more work than is necessary, and it is error-prone.
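For code that cannot use CString, the same "no fixed buffer" property can be had by measuring before writing. The sketch below is one common way to do that in portable C++, assuming a C99-conforming vsnprintf that returns the required length; format_string is an invented name, and this is not a description of how CString::Format is implemented.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Format into a std::string of exactly the required size.
std::string format_string(const char * fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list copy;
    va_copy(copy, args);
    // First pass: write nothing, just learn how many characters are needed.
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);

    std::string result;
    if (needed > 0) {
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        // Second pass: write into a buffer that is known to be big enough.
        std::vsnprintf(buffer.data(), buffer.size(), fmt, args);
        result.assign(buffer.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;
}
```

No length arithmetic appears at the call site, so there is no hand-computed constant to get wrong.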

CString to char */TCHAR * III: Interfacing to a control

A very common operation is to pass a CString value in to a control, for example, a CTreeCtrl. MFC provides a number of convenient overloads for the operation, but in the most general situation you use the "raw" form of the update, and therefore you need to store a pointer to a string in the TVITEM which is included within the TVINSERTITEMSTRUCT:

TVINSERTITEMSTRUCT tvi;
CString s;
// ... assign something to s
tvi.item.pszText = s; // Compiler yells at you here
// ... other stuff
HTREEITEM ti = c_MyTree.InsertItem(&tvi);

Now why did the compiler complain? It looks like a perfectly good assignment! But in fact if you look at the structure, you will see that the member is declared in the TVITEM structure as shown below:

LPTSTR pszText;
int cchTextMax;

Therefore, the assignment is not assigning to an LPCTSTR and the compiler has no idea how to cast the right hand side of the assignment to an LPTSTR.

OK, you say, I can deal with that, and you write

tvi.item.pszText = (LPCTSTR)s; // compiler still complains!

What the compiler is now complaining about is that you are attempting to assign an LPCTSTR to an LPTSTR, an operation which is forbidden by the rules of C and C++. You may not use this technique to accidentally alias a constant pointer to a non-constant alias so you can violate the assumptions of constancy. If you could, you could potentially confuse the optimizer, which trusts what you tell it when deciding how to optimize your program. For example, if you do

const int i = ...;
//... do lots of stuff
     ... = a[i];  // usage 1
// ... lots more stuff
     ... = a[i];  // usage 2

Then the compiler can trust, because you said const, that the value of i at "usage1" and "usage2" is the same value, and it can even precompute the address of a[i] at usage1 and keep the value around for later use at usage2, rather than computing it each time. If you were able to write

const int i = ...;
int * p = &i;
//... do lots of stuff
     ... = a[i];  // usage 1
// ... lots more stuff
     (*p)++;      // mess over compiler's assumption
// ... and other stuff
     ... = a[i];  // usage 2

Then the compiler would believe in the constancy of i, and consequently the constancy of the location of a[i], and the place where the indirection is done destroys that assumption. Thus, the program would exhibit one behavior when compiled in debug mode (no optimizations) and another behavior when compiled in release mode (full optimization). This Is Not Good. Therefore, the attempt to assign the pointer to i to a modifiable reference is diagnosed by the compiler as being bogus. This is why the (LPCTSTR) cast won't really help.

Why not just declare the member as an LPCTSTR? Because the structure is used both for reading and writing to the control. When you are writing to the control, the text pointer is actually treated as an LPCTSTR but when you are reading from the control you need a writeable string. The structure cannot distinguish its use for input from its use for output.

Therefore, you will often find in my code something that looks like

tvi.item.pszText = (LPTSTR)(LPCTSTR)s;

This casts the CString to an LPCTSTR, thus giving me the address of the string, which I then force to be an LPTSTR so I can assign it. Note that this is valid only if you are using the value as data to a Set or Insert style method! You cannot do this when you are trying to retrieve data!

You need a slightly different method when you are trying to retrieve data, such as the value stored in a control, for example with the GetItem method of a CTreeCtrl. Here, I want to get the text of the item. I know that the text is no more than MY_LIMIT in size. Therefore, I can write something like

TVITEM tvi;
// ... assorted initialization of other fields of tvi
tvi.pszText = s.GetBuffer(MY_LIMIT);
tvi.cchTextMax = MY_LIMIT;
c_MyTree.GetItem(&tvi);
s.ReleaseBuffer();

Note that the code above also works for any Set-type method, but it is not needed there, because for a Set-type method (including Insert) nothing writes to the string. When something does write into the CString, you need to make sure the buffer is writeable; that is what GetBuffer does. Again, note that once you have done the GetBuffer call, you must not do anything else to the CString until the ReleaseBuffer call.
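For readers without MFC at hand, the pattern can be sketched in standard C++. This is only a rough analogue using std::string, not CString itself; fill_text is a hypothetical stand-in for an API such as GetItem that writes a NUL-terminated string into a caller-supplied buffer, and read_text is likewise an illustrative name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an API that writes a NUL-terminated string
// into a caller-supplied buffer of 'capacity' characters.
inline void fill_text(char* buffer, std::size_t capacity)
{
    if (capacity == 0)
        return;                               // nothing can be stored
    std::strncpy(buffer, "Item text", capacity - 1);
    buffer[capacity - 1] = '\0';              // always terminate
}

inline std::string read_text(std::size_t limit)
{
    std::string s(limit, '\0');               // roughly GetBuffer(limit): writable storage
    fill_text(&s[0], s.size());               // the callee writes into the string's storage
    s.resize(std::strlen(s.c_str()));         // roughly ReleaseBuffer(): recompute the length
    return s;
}
```

Calling read_text(64) yields the full text, while read_text(5) yields it truncated to four characters plus the terminator, which mirrors how the size limit passed to the control bounds what comes back.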

CString to BSTR

When programming with ActiveX, you will sometimes need a value represented as a type BSTR. A BSTR is a counted string, a wide-character (Unicode) string on Intel platforms and can contain embedded NUL characters. 

You can convert a CString to a BSTR by calling the CString method AllocSysString:

CString s;
s = ... ; // whatever
BSTR b = s.AllocSysString();

 The pointer b points to a newly-allocated BSTR object which is a copy of the CString, including the terminal NUL character. This may now be passed to whatever interface you are calling that requires a BSTR. Normally, a BSTR is disposed of by the component receiving it. If you should need to dispose of a BSTR, you must use the call

::SysFreeString(b);

to free the string.

The story is that the decision of how to represent strings sent to ActiveX controls resulted in some serious turf wars within Microsoft. The Visual Basic people won, and the string type BSTR (acronym for "Basic String") was the result.

BSTR to CString

 Since a BSTR is a counted Unicode string, you can use standard conversions to make an 8-bit CString. Actually, this is built-in; there are special constructors for converting ANSI strings to Unicode and vice-versa. You can also get BSTRs as results in a VARIANT type, which is a type returned by various COM and Automation calls.

For example, in an ANSI application, writing

BSTR b;
b = ...; // whatever
CString s(b == NULL ? L"" : b);

works just fine for a single-string BSTR, because there is a special constructor that takes an LPCWSTR (which is what a BSTR is) and converts it to an ANSI string. The special test is required because a BSTR could be NULL, and the constructors Don't Play Well with NULL inputs (thanks to Brian Ross for pointing this out!). This also only works for a BSTR that contains only a single string terminated with a NUL; you have to do more work to convert strings that contain multiple NUL characters. Note that embedded NUL characters generally don't work well in CStrings and generally should be avoided.

Remember, according to the rules of C/C++, if you have an LPWSTR it will match a parameter type of LPCWSTR (it doesn't work the other way!).

In UNICODE mode, this is just the constructor

CString::CString(LPCTSTR);

As indicated above, in ANSI mode there is a special constructor for

CString::CString(LPCWSTR); 

this calls an internal function to convert the Unicode string to an ANSI string. (In Unicode mode there is a special constructor that takes an LPCSTR, a pointer to an 8-bit ANSI string, and widens it to a Unicode string!). Again, note the limitation imposed by the need to test for a BSTR value which is NULL.

There is an additional problem as pointed out above: BSTRs can contain embedded NUL characters; CString constructors can only handle single NUL characters in a string. This means that CStrings will compute the wrong length for a string which contains embedded NUL bytes. You need to handle this yourself. If you look at the constructors in strcore.cpp, you will see that they all do an lstrlen or equivalent to compute the length. 
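The length mismatch is easy to demonstrate without COM. The sketch below uses std::wstring as a stand-in for a counted string (it is not a real BSTR, and the function names are hypothetical) and compares the counted length with what a NUL-scanning routine in the style of lstrlen would report.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Build a counted wide string holding "abc", an embedded NUL, then "def":
// the kind of payload a counted string is allowed to carry.
inline std::wstring make_counted()
{
    return std::wstring(L"abc\0def", 7);
}

// Length as the counted representation reports it.
inline std::size_t counted_length(const std::wstring& w)
{
    return w.size();
}

// Length as a NUL-scanning routine reports it: it stops at the first NUL.
inline std::size_t scanned_length(const std::wstring& w)
{
    return std::wcslen(w.c_str());
}
```

Here the counted length is 7 but the scanned length is 3, so any constructor that measures its input by scanning for NUL silently drops everything after the first embedded NUL.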

Note that the conversion from Unicode to ANSI uses the ::WideCharToMultiByte conversion with specific arguments that you may not like. If you want a different conversion than the default, you have to write your own.

If you are compiling as UNICODE, then it is a simple assignment:

CString convert(BSTR b)
   {
    if(b == NULL)
        return CString(_T(""));
    CString s(b); // in UNICODE mode
    return s;
   }

If you are in ANSI mode, you need to convert the string in a more complex fashion. This will accomplish it. Note that this code uses the same argument values to ::WideCharToMultiByte that the implicit constructor for CString uses, so you would use this technique only if you wanted to change these parameters to do the conversion in some other fashion, for example, specifying a different default character, a different set of flags, etc.

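CString convert(BSTR b)
   {
    CString s;
    if(b == NULL)
       return s; // empty for NULL BSTR
#ifdef UNICODE
    s = b;
#else
    // Ask for the required size in bytes, including the NUL; on a DBCS
    // code page one wide character can need two bytes, so
    // SysStringLen(b) + 1 is not always enough
    int n = ::WideCharToMultiByte(CP_ACP, 0, b, -1, NULL, 0, NULL, NULL);
    if(n <= 0)
       return s; // conversion failed; return an empty string
    LPSTR p = s.GetBuffer(n);
    ::WideCharToMultiByte(CP_ACP,            // ANSI Code Page
                          0,                 // no flags
                          b,                 // source widechar string
                          -1,                // assume NUL-terminated
                          p,                 // target buffer
                          n,                 // target buffer length
                          NULL,              // use system default char
                          NULL);             // don't care if default used
    s.ReleaseBuffer();
#endif
    return s;
   }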

Note that I do not worry about what happens if the BSTR contains Unicode characters that do not map to the 8-bit character set, because I specify NULL as the last two parameters. This is the sort of thing you might want to change.

VARIANT to CString  

Actually, I've never done this; I don't work in COM/OLE/ActiveX where this is an issue. But I saw a posting by Robert Quirk on the microsoft.public.vc.mfc newsgroup on how to do this, and it seemed silly not to include it in this essay, so here it is, with a bit more explanation and elaboration. Any errors relative to what he wrote are my fault.

A VARIANT is a generic parameter/return type in COM programming. You can write methods that return a type VARIANT, and which type the function returns may (and often does) depend on the input parameters to your method. For example, in Automation, depending on which method you call, IDispatch::Invoke may return (via one of its parameters) a VARIANT which holds a BYTE, a WORD, a float, a double, a date, a BSTR, or about three dozen other types (see the specifications of the VARIANT structure in the MSDN). In the example below, it is assumed that the type is known to be a variant of type BSTR, which means that the value is found in the string referenced by bstrVal. This takes advantage of the fact that there is a constructor which, in an ANSI application, will convert a value referenced by an LPCWSTR to a CString (see BSTR-to-CString). In Unicode mode, this turns out to be the normal CString constructor. See the caveats about the default ::WideCharToMultiByte conversion and whether or not you find these acceptable (mostly, you will).

VARIANT vaData;

vaData = m_com.YourMethodHere();
ASSERT(vaData.vt == VT_BSTR);

CString strData(vaData.bstrVal);

Note that you could also make a more generic conversion routine that looked at the vt field. In this case, you might consider something like:

CString VariantToString(VARIANT * va)
   {
    CString s;
    switch(va->vt)
      { /* vt */
       case VT_BSTR:
          return CString(va->bstrVal);
       case VT_BSTR | VT_BYREF:
          return CString(*va->pbstrVal);
       case VT_I4:
          s.Format(_T("%ld"), va->lVal);
          return s;
       case VT_I4 | VT_BYREF:
          s.Format(_T("%ld"), *va->plVal);
          return s; // without this return, control falls through into VT_R8
       case VT_R8:
          s.Format(_T("%f"), va->dblVal);
          return s;
       // ... remaining cases left as an Exercise For The Reader
       default:
          ASSERT(FALSE); // unknown VARIANT type (this ASSERT is optional)
          return CString(_T(""));
      } /* vt */
   }

Loading STRINGTABLE values

If you want to create a program that is easily ported to other languages, you must not include native-language strings in your source code. (For these examples, I'll use English, since that is my native language (aber ich kann ein bisschen Deutsch sprechen).) So it is very bad practice to write

CString s = "There is an error";

Instead, you should put all your language-specific strings (except, perhaps, debug strings, which are never in a product deliverable) in a STRINGTABLE resource. This means that it is fine to write

s.Format(_T("%d - %s"), code, text);

in your program; that literal string is not language-sensitive. However, you must be very careful to not use strings like

// fmt is "Error in %s file %s"
// readorwrite is "reading" or "writing"
s.Format(fmt, readorwrite, filename); 

I speak of this from experience. In my first internationalized application I made this error, and in spite of the fact that I know German, and that German word order places the verb at the end of a sentence, I had done this. Our German distributor complained bitterly that he had to come up with truly weird error messages in German to get the format codes to do the right thing. It is much better (and what I do now) to have two strings, one for reading and one for writing, and load the appropriate one, making them string parameter-insensitive, that is, instead of loading the strings "reading" or "writing", load the whole format:

// fmt is "Error in reading file %s"
//          "Error in writing file %s"
s.Format(fmt, filename);
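The "one complete format string per message" idea can be sketched in portable C++. This is an illustration only: the std::map stands in for a per-language STRINGTABLE, and the identifiers IDS_ERR_READING, IDS_ERR_WRITING, english_table and file_error are all hypothetical names, not resource IDs or functions from any real project.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical string IDs; in a real project these would be resource IDs.
enum StringId { IDS_ERR_READING, IDS_ERR_WRITING };

// Stand-in for a per-language string table. A translator replaces each
// whole sentence and may place %s wherever the target language needs it.
inline const std::map<StringId, std::string>& english_table()
{
    static const std::map<StringId, std::string> table = {
        { IDS_ERR_READING, "Error in reading file %s" },
        { IDS_ERR_WRITING, "Error in writing file %s" },
    };
    return table;
}

// Select the complete message by operation; only the file name is substituted.
inline std::string file_error(bool writing, const std::string& filename)
{
    const std::string& fmt =
        english_table().at(writing ? IDS_ERR_WRITING : IDS_ERR_READING);
    char buffer[512];
    std::snprintf(buffer, sizeof buffer, fmt.c_str(), filename.c_str());
    return buffer;
}
```

Because the verb is part of the stored sentence rather than a substituted fragment, a translated table can reorder the words freely without the code changing.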

Note that if you have more than one substitution, you should make sure that if the word order of the substitutions does not matter,
