The GetStringTypeA function returns character-type information for the characters in the specified source string. For each character in the string, the function sets one or more bits in the corresponding 16-bit element of the output array. Each bit identifies a given character type, such as whether the character is a letter, a digit, or neither.
BOOL GetStringTypeA(
  LCID Locale,        // locale identifier
  DWORD dwInfoType,   // information-type options
  LPCSTR lpSrcStr,    // pointer to the source string
  int cchSrc,         // size, in bytes, of the source string
  LPWORD lpCharType   // pointer to the buffer for output
);
The Locale parameter can be a locale identifier created by the MAKELCID macro, or one of the following predefined values:
LOCALE_SYSTEM_DEFAULT | Default system locale
LOCALE_USER_DEFAULT | Default user locale
Note that the Locale parameter does not exist in the GetStringTypeW
function. Because of that parameter difference, an application cannot
automatically invoke the proper A or W version of GetStringType*
through the use of the #define UNICODE switch. An application can
circumvent this limitation by using GetStringTypeEx,
which is the recommended Win32 function.
The dwInfoType parameter can be one of the following values:
CT_CTYPE1 | Retrieve character type information.
CT_CTYPE2 | Retrieve bidirectional layout information.
CT_CTYPE3 | Retrieve text processing information.
If the function succeeds, the return value is nonzero.
If the function fails, the return value is zero. To get extended error information, call GetLastError. GetLastError may return one of the following error codes:
ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER
The lpSrcStr and lpCharType pointers must not be the same. If they are the same, the function fails and GetLastError returns ERROR_INVALID_PARAMETER.
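The call pattern described so far can be sketched as follows. The helper name get_ctype1 is hypothetical and not part of the API. On Windows it calls GetStringTypeA with LOCALE_USER_DEFAULT and CT_CTYPE1; on any other platform it falls back to an ASCII-only stand-in built on <ctype.h>, which is an assumption made purely so the sketch compiles everywhere and does not reproduce the real function's results for non-ASCII input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Assumption: no GetStringTypeA off Windows, so this branch supplies an
   ASCII-only stand-in. It is NOT the real API. */
#include <ctype.h>
typedef unsigned short WORD;
#define C1_UPPER 0x0001
#define C1_LOWER 0x0002
#define C1_DIGIT 0x0004
#define C1_SPACE 0x0008
#define C1_PUNCT 0x0010
#define C1_CNTRL 0x0020
#define C1_ALPHA 0x0100
#endif

/* Hypothetical helper: stores the CT_CTYPE1 bits of src[i] in types[i].
   Returns 1 on success and 0 on failure. */
static int get_ctype1(const char *src, int count, WORD *types)
{
    assert(src != NULL && types != NULL);

    /* The source string and the output buffer must be distinct. */
    if ((const void *)src == (const void *)types)
        return 0;

#ifdef _WIN32
    if (!GetStringTypeA(LOCALE_USER_DEFAULT, CT_CTYPE1, src, count, types)) {
        fprintf(stderr, "GetStringTypeA failed: %lu\n",
                (unsigned long)GetLastError());
        return 0;
    }
    return 1;
#else
    for (int i = 0; i < count; ++i) {
        unsigned char c = (unsigned char)src[i];
        WORD t = 0;
        if (isupper(c)) t |= C1_UPPER;
        if (islower(c)) t |= C1_LOWER;
        if (isdigit(c)) t |= C1_DIGIT;
        if (isspace(c)) t |= C1_SPACE;
        if (ispunct(c)) t |= C1_PUNCT;
        if (iscntrl(c)) t |= C1_CNTRL;
        if (isalpha(c)) t |= C1_ALPHA;
        types[i] = t;
    }
    return 1;
#endif
}
```

The output array needs one 16-bit element per source character, so a caller would typically size it from the same count it passes as cchSrc.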
The Locale parameter is only used to perform string conversion to Unicode. It has nothing to do with the CTYPEs the function returns. The CTYPEs are solely determined by Unicode code points, and do not vary on a locale basis. For example, Greek letters are C1_ALPHA for any Locale value.
The character-type bits are divided into several levels. The information for one level can be retrieved by a single call to this function. Each level is limited to 16 bits of information so that the other mapping routines, which are limited to 16 bits of representation per character, can also return character-type information.
The character types supported by this function include the following.
Name | Value | Meaning
C1_UPPER | 0x0001 | Uppercase
C1_LOWER | 0x0002 | Lowercase
C1_DIGIT | 0x0004 | Decimal digits
C1_SPACE | 0x0008 | Space characters
C1_PUNCT | 0x0010 | Punctuation
C1_CNTRL | 0x0020 | Control characters
C1_BLANK | 0x0040 | Blank characters
C1_XDIGIT | 0x0080 | Hexadecimal digits
C1_ALPHA | 0x0100 | Any linguistic character: alphabetic, syllabary, or ideographic
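Because each C1_* value is a single bit, one output element can carry several types at once. The sketch below decodes such an element into the names from the table. The helper ctype1_names is hypothetical, and the constants are redefined locally only when <windows.h> has not already supplied them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef C1_UPPER /* <windows.h> already defines these on Windows */
#define C1_UPPER  0x0001
#define C1_LOWER  0x0002
#define C1_DIGIT  0x0004
#define C1_SPACE  0x0008
#define C1_PUNCT  0x0010
#define C1_CNTRL  0x0020
#define C1_BLANK  0x0040
#define C1_XDIGIT 0x0080
#define C1_ALPHA  0x0100
#endif

/* Hypothetical helper: writes the names of the bits set in a CT_CTYPE1
   element into buf, separated by '|', and returns buf. */
static char *ctype1_names(unsigned short type, char *buf, size_t size)
{
    static const struct { unsigned short bit; const char *name; } table[] = {
        { C1_UPPER,  "C1_UPPER"  }, { C1_LOWER,  "C1_LOWER"  },
        { C1_DIGIT,  "C1_DIGIT"  }, { C1_SPACE,  "C1_SPACE"  },
        { C1_PUNCT,  "C1_PUNCT"  }, { C1_CNTRL,  "C1_CNTRL"  },
        { C1_BLANK,  "C1_BLANK"  }, { C1_XDIGIT, "C1_XDIGIT" },
        { C1_ALPHA,  "C1_ALPHA"  },
    };

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if ((type & table[i].bit) == 0)
            continue;
        size_t used = strlen(buf);
        /* snprintf truncates safely if buf is too small. */
        snprintf(buf + used, size - used, "%s%s",
                 used ? "|" : "", table[i].name);
    }
    return buf;
}
```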
The following character types are either constant or computable from basic types and do not need to be supported by this function.
Type | Description
Alphanumeric | Alphabetic characters and digits (C1_ALPHA or C1_DIGIT)
Printable | Graphic characters and blanks (all C1_* types except C1_CNTRL)
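The two derived types can be computed from a CT_CTYPE1 element with simple mask tests, roughly as sketched here. The function names and the NONCONTROL_BITS mask are hypothetical. Treating "printable" as "at least one listed non-control bit is set and C1_CNTRL is clear" is one reading of the table, not a definition taken from the API.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef C1_ALPHA /* <windows.h> already defines these on Windows */
#define C1_DIGIT 0x0004
#define C1_CNTRL 0x0020
#define C1_ALPHA 0x0100
#endif

/* Assumption: every bit listed in the table except C1_CNTRL,
   that is 0x01FF with 0x0020 cleared. */
#define NONCONTROL_BITS 0x01DF

static_assert((NONCONTROL_BITS & C1_CNTRL) == 0,
              "mask must exclude C1_CNTRL");

/* Alphanumeric: the character is alphabetic or a decimal digit. */
static bool is_alphanumeric(unsigned short type)
{
    return (type & (C1_ALPHA | C1_DIGIT)) != 0;
}

/* Printable (assumed reading): some non-control type, and not a control. */
static bool is_printable(unsigned short type)
{
    return (type & NONCONTROL_BITS) != 0 && (type & C1_CNTRL) == 0;
}
```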
When dwInfoType is CT_CTYPE2, each output element holds one of the following values:
Name | Value | Meaning
Strong | |
C2_LEFTTORIGHT | 0x0001 | Left to right
C2_RIGHTTOLEFT | 0x0002 | Right to left
Weak | |
C2_EUROPENUMBER | 0x0003 | European number, European digit
C2_EUROPESEPARATOR | 0x0004 | European numeric separator
C2_EUROPETERMINATOR | 0x0005 | European numeric terminator
C2_ARABICNUMBER | 0x0006 | Arabic number
C2_COMMONSEPARATOR | 0x0007 | Common numeric separator
Neutral | |
C2_BLOCKSEPARATOR | 0x0008 | Block separator
C2_SEGMENTSEPARATOR | 0x0009 | Segment separator
C2_WHITESPACE | 0x000A | White space
C2_OTHERNEUTRAL | 0x000B | Other neutrals
Not applicable | |
C2_NOTAPPLICABLE | 0x0000 | No implicit directionality (for example, control codes)
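Unlike the C1_* flags, the C2_* values in this table are consecutive numbers rather than single bits, so a bitwise test such as (type & C2_LEFTTORIGHT) would also match C2_EUROPENUMBER. The sketch below therefore compares by equality and maps each value to its group from the table. The DirClass enum and the c2_class function are hypothetical names introduced for illustration.

```c
#include <assert.h>

#ifndef C2_LEFTTORIGHT /* <windows.h> already defines these on Windows */
#define C2_LEFTTORIGHT      0x0001
#define C2_RIGHTTOLEFT      0x0002
#define C2_EUROPENUMBER     0x0003
#define C2_EUROPESEPARATOR  0x0004
#define C2_EUROPETERMINATOR 0x0005
#define C2_ARABICNUMBER     0x0006
#define C2_COMMONSEPARATOR  0x0007
#define C2_BLOCKSEPARATOR   0x0008
#define C2_SEGMENTSEPARATOR 0x0009
#define C2_WHITESPACE       0x000A
#define C2_OTHERNEUTRAL     0x000B
#define C2_NOTAPPLICABLE    0x0000
#endif

/* The values overlap as bit patterns, which is why equality is used below. */
static_assert(C2_EUROPENUMBER == (C2_LEFTTORIGHT | C2_RIGHTTOLEFT),
              "C2 values are not independent bit flags");

/* Hypothetical grouping that mirrors the table's row headings. */
typedef enum {
    DIR_NOT_APPLICABLE,
    DIR_STRONG,
    DIR_WEAK,
    DIR_NEUTRAL
} DirClass;

static DirClass c2_class(unsigned short type)
{
    switch (type) {
    case C2_LEFTTORIGHT:
    case C2_RIGHTTOLEFT:
        return DIR_STRONG;
    case C2_EUROPENUMBER:
    case C2_EUROPESEPARATOR:
    case C2_EUROPETERMINATOR:
    case C2_ARABICNUMBER:
    case C2_COMMONSEPARATOR:
        return DIR_WEAK;
    case C2_BLOCKSEPARATOR:
    case C2_SEGMENTSEPARATOR:
    case C2_WHITESPACE:
    case C2_OTHERNEUTRAL:
        return DIR_NEUTRAL;
    case C2_NOTAPPLICABLE:
    default:
        /* Values not listed in the table are treated as not applicable. */
        return DIR_NOT_APPLICABLE;
    }
}
```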
When dwInfoType is CT_CTYPE3, each output element is a combination of the following bits:
Name | Value | Meaning
C3_NONSPACING | 0x0001 | Nonspacing mark
C3_DIACRITIC | 0x0002 | Diacritic nonspacing mark
C3_VOWELMARK | 0x0004 | Vowel nonspacing mark
C3_SYMBOL | 0x0008 | Symbol
C3_KATAKANA | 0x0010 | Katakana character
C3_HIRAGANA | 0x0020 | Hiragana character
C3_HALFWIDTH | 0x0040 | Half-width character
C3_FULLWIDTH | 0x0080 | Full-width character
C3_IDEOGRAPH | 0x0100 | Ideographic character
C3_KASHIDA | 0x0200 | Arabic Kashida character
C3_LEXICAL | 0x0400 | Punctuation that is counted as part of the word (Kashida, hyphen, feminine/masculine ordinal indicators, equal sign, and so forth)
C3_ALPHA | 0x8000 | All linguistic characters (alphabetic, syllabary, and ideographic)
Not applicable | |
C3_NOTAPPLICABLE | 0x0000 | Not applicable
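The C3_* values are single bits again, so they can be combined and tested with masks. The following sketch shows three such tests on a CT_CTYPE3 element: any kana, any nonspacing mark, and the conjunction "katakana and half-width". All three function names are hypothetical, and only the constants the sketch needs are redefined when <windows.h> is not available.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef C3_NONSPACING /* <windows.h> already defines these on Windows */
#define C3_NONSPACING 0x0001
#define C3_DIACRITIC  0x0002
#define C3_VOWELMARK  0x0004
#define C3_KATAKANA   0x0010
#define C3_HIRAGANA   0x0020
#define C3_HALFWIDTH  0x0040
#define C3_ALPHA      0x8000
#endif

/* Every level must fit in one 16-bit output element. */
static_assert(C3_ALPHA <= 0xFFFF, "C3 bits must fit in 16 bits");

/* True if either kana bit is set. */
static bool c3_is_kana(unsigned short type)
{
    return (type & (C3_KATAKANA | C3_HIRAGANA)) != 0;
}

/* True if any of the three nonspacing-mark bits is set. */
static bool c3_is_mark(unsigned short type)
{
    return (type & (C3_NONSPACING | C3_DIACRITIC | C3_VOWELMARK)) != 0;
}

/* True only if BOTH bits are set, hence the comparison against the mask. */
static bool c3_is_halfwidth_katakana(unsigned short type)
{
    const unsigned short mask = C3_KATAKANA | C3_HALFWIDTH;
    return (type & mask) == mask;
}
```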