This topic describes how document text and properties returned by filters are broken up into words and how common words are excluded.
A word-breaker DLL parses the text and textual properties returned by the filter DLL into words. The word-breaker DLL is language dependent. For a list of languages supported by Index Server, see the Index Server Web page.
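As a rough illustration of what word breaking produces, the following minimal Python sketch splits text into words with a regular expression. It is not the Index Server word-breaker DLL or its interface; the function name `break_words` and the pattern are hypothetical, and a real word breaker applies language-specific rules that this sketch ignores.

```python
import re

# Hypothetical stand-in for a language-dependent word breaker.
# Matches runs of letters/digits, optionally followed by an
# apostrophe and more letters/digits (for example "fox's").
_WORD_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)?")


def break_words(text):
    """Return the words found in text, lowercased, in document order."""
    return [match.group(0).lower() for match in _WORD_RE.finditer(text)]


if __name__ == "__main__":
    print(break_words("The quick, brown fox's den."))
```

Punctuation and white space are discarded, so only the words themselves are passed on to the next stage.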
Words that are not significant for searching are called noise words or stop words. Noise words are stored in the %systemroot%\system32 directory in language-dependent noise word files (Noise.dat, by default). The noise word file for a particular language is specified in the registry under the key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Language\<language>\NoiseFile
For example, the noise word file for English_US is specified by the registry entry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Language\English_US\NoiseFile\noise.dat
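The registry lookup described above might be scripted as in the following Python sketch. It makes two assumptions that this topic does not confirm: that the language subkey sits directly under SYSTEM\CurrentControlSet\Control\ContentIndex\Language, and that NoiseFile is a named value of that subkey whose data is the file name. The helper names are hypothetical, and `read_noise_file_name` runs only on Windows because it uses the standard `winreg` module.

```python
# Assumed location of the per-language settings, relative to
# HKEY_LOCAL_MACHINE (an assumption based on the path shown above).
LANGUAGE_ROOT = r"SYSTEM\CurrentControlSet\Control\ContentIndex\Language"


def language_subkey(language):
    """Build the registry subkey path for a language such as 'English_US'."""
    return LANGUAGE_ROOT + "\\" + language


def read_noise_file_name(language):
    """Read the NoiseFile setting for a language (Windows only).

    Assumes NoiseFile is a value of the language subkey.
    """
    import winreg  # imported here so the module still loads on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, language_subkey(language)) as key:
        value, _value_type = winreg.QueryValueEx(key, "NoiseFile")
    return value


if __name__ == "__main__":
    print(language_subkey("English_US"))
```

If the key or value is missing, `winreg` raises `OSError`, which a caller would need to handle.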
The noise word files can be edited with a text editor to either add new words or remove words that are not considered noise at a particular installation. Note that querying for noise words will not yield any hits.
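To make the effect of the noise word list concrete, here is a minimal Python sketch that parses the text of a noise word file and drops noise words from a list of words. The file layout (words separated by white space or line breaks) is an assumption, and `parse_noise_words` and `remove_noise` are hypothetical names rather than anything provided by Index Server.

```python
def parse_noise_words(file_text):
    """Return the set of noise words in file_text.

    Assumes words are separated by white space or line breaks.
    """
    return {word.lower() for word in file_text.split()}


def remove_noise(words, noise_words):
    """Return words with every noise word removed, preserving order."""
    return [word for word in words if word.lower() not in noise_words]


if __name__ == "__main__":
    noise = parse_noise_words("a\nthe\nof\n")
    print(remove_noise(["the", "history", "of", "rome"], noise))
    # A query made up only of noise words has nothing left to match.
    print(remove_noise(["the"], noise))
```

Because the same filtering applies to queries, a query consisting only of noise words ends up with no terms to look up, which is why it yields no hits.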
Caution: Removing all noise words from the noise word files can significantly increase the size of indexes.
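The size effect mentioned in the caution can be seen in a toy inverted index. The following Python sketch is only an illustration under simplifying assumptions (white-space tokenization, an in-memory dictionary); it does not reflect the actual Index Server index format, and `build_index` and `posting_count` are hypothetical names.

```python
from collections import defaultdict


def build_index(docs, noise_words=frozenset()):
    """Map each non-noise word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in noise_words:
                index[word].add(doc_id)
    return index


def posting_count(index):
    """Count (word, document) pairs, a rough measure of index size."""
    return sum(len(doc_ids) for doc_ids in index.values())


if __name__ == "__main__":
    docs = {1: "the cat sat on the mat", 2: "the dog sat on the log"}
    print(posting_count(build_index(docs, {"the", "on"})))
    print(posting_count(build_index(docs)))
```

Noise words tend to occur in nearly every document, so each one that is indexed adds an entry for almost the whole corpus.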