GetStringTypeEx

The GetStringTypeEx function returns character-type information for the characters in the specified source string. For each character in the string, the function sets one or more bits in the corresponding 16-bit element of the output array. Each bit identifies a given character type, such as whether the character is a letter, a digit, or neither.

Unlike its close relatives GetStringTypeA and GetStringTypeW, GetStringTypeEx resolves to the appropriate A or W version according to whether UNICODE is defined (the #define UNICODE switch). It is the recommended Win32 function.

BOOL GetStringTypeEx(
LCID Locale,	// locale identifier
DWORD dwInfoType,	// information-type options
LPCTSTR lpSrcStr,	// address of source string
int cchSrc,	// size, in bytes or characters, of source string
LPWORD lpCharType	// address of buffer for output
);

Parameters

Locale

Specifies the locale identifier. This value determines the ANSI code page used to translate the string pointed to by lpSrcStr from ANSI to Unicode. The function then analyzes each Unicode character for character-type information. Note that the W version of this function ignores this parameter.

This parameter can be a locale identifier created by the MAKELCID macro, or one of the following predefined values:

LOCALE_SYSTEM_DEFAULT	Default system locale
LOCALE_USER_DEFAULT	Default user locale

dwInfoType

Specifies the type of character information to retrieve. The types are divided into levels (see the Remarks section for the information included in each level). This parameter can be one of the following character-type flags:

CT_CTYPE1	Retrieve character type information.
CT_CTYPE2	Retrieve bidirectional layout information.
CT_CTYPE3	Retrieve text processing information.

lpSrcStr

Points to the string for which character types are requested. If cchSrc is -1, the string is assumed to be null terminated. This must be a Unicode string for the W version of this function, and an ANSI string for the A version. Note that for the A version, this can be a double-byte character set (DBCS) string if the locale is appropriate for DBCS.

cchSrc

Specifies the size, in bytes (ANSI version) or characters (Unicode version), of the string pointed to by the lpSrcStr parameter. If this count includes a null terminator, the function returns character type information for the null terminator. If this value is -1, the string is assumed to be null terminated and the length is calculated automatically.

lpCharType

Points to an array of 16-bit values. The length of this array must be large enough to receive one 16-bit value for each character in the source string. When the function returns, this array contains one word corresponding to each character in the source string.

Return Values

If the function succeeds, the return value is nonzero.

If the function fails, the return value is zero. To get extended error information, call GetLastError. GetLastError may return one of the following error codes:

ERROR_INVALID_FLAGS
ERROR_INVALID_PARAMETER
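
As a minimal sketch of a typical call, the following C function counts the decimal digits in a short ANSI string by requesting CT_CTYPE1 information and testing the C1_DIGIT bit of each output element. The helper name count_decimal_digits and the 64-element buffer are assumptions made for illustration. The sketch calls the ANSI entry point GetStringTypeExA directly so that it does not depend on the UNICODE switch, and the non-Windows branch exists only so that the code compiles elsewhere.

```c
#include <assert.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: counts the decimal digits in a short ANSI string.
   Returns -1 if the call fails, if the string does not fit the local
   buffer, or if the Win32 API is not available on this platform. */
int count_decimal_digits(const char *text)
{
#ifdef _WIN32
    WORD types[64] = {0};   /* one 16-bit element per character */
    size_t length;
    size_t i;
    int digits = 0;

    assert(text != NULL);
    length = strlen(text);  /* size in bytes, terminator excluded */
    if (length == 0)
        return 0;           /* nothing to classify */
    if (length > sizeof types / sizeof types[0])
        return -1;          /* illustrative buffer is too small */

    /* lpSrcStr and lpCharType are distinct buffers, as required. */
    if (!GetStringTypeExA(LOCALE_USER_DEFAULT, CT_CTYPE1,
                          text, (int)length, types))
        return -1;          /* GetLastError() reports the reason */

    for (i = 0; i < length; ++i) {
        if (types[i] & C1_DIGIT)
            ++digits;
    }
    return digits;
#else
    assert(text != NULL);
    (void)text;
    return -1;              /* no Win32 API on this platform */
#endif
}
```

Passing an explicit length rather than -1 keeps the null terminator out of the output array. For a DBCS string the byte count can exceed the character count, which is why the buffer is zero-initialized before the call.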

Remarks

The GetStringTypeEx function exists to circumvent a limitation caused by the difference in parameters of GetStringTypeA and GetStringTypeW: GetStringTypeW takes no Locale parameter. That difference prevents an application from automatically invoking the proper A or W version of GetStringType* through the #define UNICODE switch. GetStringTypeEx has the same parameter list in both versions, so it works correctly with that switch. Thus, it is the recommended Win32 function.

The Locale parameter is only used to perform string conversion to Unicode. It has nothing to do with the CTYPEs the function returns. The CTYPEs are solely determined by Unicode code points, and do not vary on a locale basis. For example, Greek letters are C1_ALPHA for any Locale value.

The lpSrcStr and lpCharType pointers must not be the same. If they are the same, the function fails and GetLastError returns ERROR_INVALID_PARAMETER.

The character-type bits are divided into several levels. The information for one level can be retrieved by a single call to this function. Each level is limited to 16 bits of information so that the other mapping routines, which are limited to 16 bits of representation per character, can also return character-type information.

The character types supported by this function include the following.

Ctype 1

These types support ANSI C and POSIX (LC_CTYPE) character-typing functions. A combination of these values is returned in the array pointed to by the lpCharType parameter when the dwInfoType parameter is set to CT_CTYPE1.

Name	Value	Meaning
C1_UPPER	0x0001	Uppercase
C1_LOWER	0x0002	Lowercase
C1_DIGIT	0x0004	Decimal digits
C1_SPACE	0x0008	Space characters
C1_PUNCT	0x0010	Punctuation
C1_CNTRL	0x0020	Control characters
C1_BLANK	0x0040	Blank characters
C1_XDIGIT	0x0080	Hexadecimal digits
C1_ALPHA	0x0100	Any linguistic character: alphabetic, syllabary, or ideographic

The following character types are either constant or computable from basic types and do not need to be supported by this function.

Type	Description
Alphanumeric	Alphabetic characters and digits (C1_ALPHA and C1_DIGIT)
Printable	Graphic characters and blank (all C1_* types except C1_CNTRL)
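
The derived types can be computed from a Ctype 1 element with plain bit tests. The sketch below is one reading of the table, not a definitive rule: it treats a character as alphanumeric when either C1_ALPHA or C1_DIGIT is set, and as printable when at least one C1_* bit other than C1_CNTRL is set and C1_CNTRL is clear. The helper names is_alphanumeric and is_printable are hypothetical; the bit values are copied from the Ctype 1 table.

```c
#include <stdint.h>

/* Ctype 1 bit values from the table above. On Windows the system
   headers already define these names, hence the guard. */
#ifndef C1_UPPER
#define C1_UPPER  0x0001
#define C1_LOWER  0x0002
#define C1_DIGIT  0x0004
#define C1_SPACE  0x0008
#define C1_PUNCT  0x0010
#define C1_CNTRL  0x0020
#define C1_BLANK  0x0040
#define C1_XDIGIT 0x0080
#define C1_ALPHA  0x0100
#endif

/* Hypothetical helper: nonzero if the element marks a letter or a digit. */
int is_alphanumeric(uint16_t ctype1)
{
    return (ctype1 & (C1_ALPHA | C1_DIGIT)) != 0;
}

/* Hypothetical helper: nonzero if some C1_* bit other than C1_CNTRL is
   set and C1_CNTRL itself is clear (an assumed interpretation). */
int is_printable(uint16_t ctype1)
{
    const uint16_t graphic_or_blank =
        C1_UPPER | C1_LOWER | C1_DIGIT | C1_SPACE |
        C1_PUNCT | C1_BLANK | C1_XDIGIT | C1_ALPHA;

    if (ctype1 & C1_CNTRL)
        return 0;
    return (ctype1 & graphic_or_blank) != 0;
}
```

Under this reading a character that is flagged both C1_CNTRL and C1_SPACE is not printable, which matches the behavior of the ANSI C isprint function for such characters.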

Ctype 2

These types support proper layout of Unicode text. The direction attributes are assigned so that the bidirectional layout algorithm standardized by Unicode produces accurate results. These types are mutually exclusive. For more information about the use of these attributes, see The Unicode Standard: Worldwide Character Encoding, Volumes 1 and 2, Addison Wesley Publishing Company: 1991, 1992, ISBN 0201567881.

Name	Value	Meaning
Strong
C2_LEFTTORIGHT	0x0001	Left to right
C2_RIGHTTOLEFT	0x0002	Right to left
Weak
C2_EUROPENUMBER	0x0003	European number, European digit
C2_EUROPESEPARATOR	0x0004	European numeric separator
C2_EUROPETERMINATOR	0x0005	European numeric terminator
C2_ARABICNUMBER	0x0006	Arabic number
C2_COMMONSEPARATOR	0x0007	Common numeric separator
Neutral
C2_BLOCKSEPARATOR	0x0008	Block separator
C2_SEGMENTSEPARATOR	0x0009	Segment separator
C2_WHITESPACE	0x000A	White space
C2_OTHERNEUTRAL	0x000B	Other neutrals
Not applicable
C2_NOTAPPLICABLE	0x0000	No implicit directionality (for example, control codes)
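
Because the Ctype 2 values are mutually exclusive, they behave as an enumeration rather than as bit flags: C2_EUROPENUMBER (0x0003), for instance, is not a combination of C2_LEFTTORIGHT and C2_RIGHTTOLEFT. The following sketch therefore compares by value instead of masking. The enum bidi_strength and the function bidi_strength_of are hypothetical names introduced for illustration; the numeric values are copied from the table.

```c
#include <stdint.h>

/* Ctype 2 values from the table above. On Windows the system headers
   already define these names, hence the guard. */
#ifndef C2_LEFTTORIGHT
#define C2_NOTAPPLICABLE      0x0000
#define C2_LEFTTORIGHT        0x0001
#define C2_RIGHTTOLEFT        0x0002
#define C2_EUROPENUMBER       0x0003
#define C2_EUROPESEPARATOR    0x0004
#define C2_EUROPETERMINATOR   0x0005
#define C2_ARABICNUMBER       0x0006
#define C2_COMMONSEPARATOR    0x0007
#define C2_BLOCKSEPARATOR     0x0008
#define C2_SEGMENTSEPARATOR   0x0009
#define C2_WHITESPACE         0x000A
#define C2_OTHERNEUTRAL       0x000B
#endif

/* Hypothetical grouping that mirrors the Strong / Weak / Neutral rows. */
enum bidi_strength {
    BIDI_NOT_APPLICABLE,
    BIDI_STRONG,
    BIDI_WEAK,
    BIDI_NEUTRAL
};

/* Maps one Ctype 2 element to its group. Values outside the table are
   reported as not applicable. */
enum bidi_strength bidi_strength_of(uint16_t ctype2)
{
    switch (ctype2) {
    case C2_LEFTTORIGHT:
    case C2_RIGHTTOLEFT:
        return BIDI_STRONG;
    case C2_EUROPENUMBER:
    case C2_EUROPESEPARATOR:
    case C2_EUROPETERMINATOR:
    case C2_ARABICNUMBER:
    case C2_COMMONSEPARATOR:
        return BIDI_WEAK;
    case C2_BLOCKSEPARATOR:
    case C2_SEGMENTSEPARATOR:
    case C2_WHITESPACE:
    case C2_OTHERNEUTRAL:
        return BIDI_NEUTRAL;
    default:
        return BIDI_NOT_APPLICABLE;
    }
}
```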

Ctype 3

These types are intended to be placeholders for extensions to the POSIX types required for general text processing or for the standard C library functions. These types are supported in the current version of Windows NT. A combination of these values is returned when dwInfoType is set to CT_CTYPE3.

Name	Value	Meaning
C3_NONSPACING	0x0001	Nonspacing mark
C3_DIACRITIC	0x0002	Diacritic nonspacing mark
C3_VOWELMARK	0x0004	Vowel nonspacing mark
C3_SYMBOL	0x0008	Symbol
C3_KATAKANA	0x0010	Katakana character
C3_HIRAGANA	0x0020	Hiragana character
C3_HALFWIDTH	0x0040	Half-width character
C3_FULLWIDTH	0x0080	Full-width character
C3_IDEOGRAPH	0x0100	Ideographic character
C3_KASHIDA	0x0200	Arabic Kashida character
C3_LEXICAL	0x0400	Punctuation which is counted as part of the word (Kashida, hyphen, feminine/masculine ordinal indicators, equal sign, and so forth)
C3_ALPHA	0x8000	All linguistic characters (alphabetic, syllabary, and ideographic)
Not applicable
C3_NOTAPPLICABLE	0x0000	Not applicable
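
Unlike Ctype 2, the Ctype 3 values are bit flags that can be combined, so a single element may carry several of them. As an illustrative sketch, the following table-driven function writes the names of all set bits into a caller-supplied buffer. The function name describe_ctype3 and the '|' separator are assumptions; the names and values in the lookup table are copied from the Ctype 3 table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct c3_flag {
    uint16_t bit;
    const char *name;
};

/* Names and values from the Ctype 3 table above, in table order. */
static const struct c3_flag c3_flags[] = {
    { 0x0001, "C3_NONSPACING" },
    { 0x0002, "C3_DIACRITIC" },
    { 0x0004, "C3_VOWELMARK" },
    { 0x0008, "C3_SYMBOL" },
    { 0x0010, "C3_KATAKANA" },
    { 0x0020, "C3_HIRAGANA" },
    { 0x0040, "C3_HALFWIDTH" },
    { 0x0080, "C3_FULLWIDTH" },
    { 0x0100, "C3_IDEOGRAPH" },
    { 0x0200, "C3_KASHIDA" },
    { 0x0400, "C3_LEXICAL" },
    { 0x8000, "C3_ALPHA" }
};

/* Hypothetical helper: writes the names of all set Ctype 3 bits into
   out, separated by '|'. Returns 0 on success, -1 if out is too small. */
int describe_ctype3(uint16_t ctype3, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    if (ctype3 == 0) {
        const char *none = "C3_NOTAPPLICABLE";
        if (strlen(none) + 1 > out_size)
            return -1;
        memcpy(out, none, strlen(none) + 1);
        return 0;
    }

    for (i = 0; i < sizeof c3_flags / sizeof c3_flags[0]; ++i) {
        size_t name_len;
        size_t sep_len;

        if (!(ctype3 & c3_flags[i].bit))
            continue;
        name_len = strlen(c3_flags[i].name);
        sep_len = used ? 1 : 0;
        if (used + sep_len + name_len + 1 > out_size)
            return -1;                 /* would overflow the buffer */
        if (sep_len)
            out[used++] = '|';
        memcpy(out + used, c3_flags[i].name, name_len + 1);
        used += name_len;
    }
    return 0;
}
```

Bits that do not appear in the table are ignored by this sketch.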
