Filtering

Microsoft Index Server filters documents to extract their data and insert it into content indexes. Content filters break documents into words (keys) and create word lists, which supply the raw data for the index. Filtering is a three-step process:

1. A filter DLL (dynamic-link library) extracts the text and properties from a document.
2. A word-breaker DLL parses the text and textual properties into words.
3. Noise words (also known as stop words) are removed from the extracted data, and the remaining words are stored in the index.
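
The three steps can be pictured with a short Python sketch. This is an illustration only, not Index Server code: the real filter and word-breaker components are DLLs, and the function names, the dictionary-based document, and the NOISE_WORDS set below are all assumptions made for the example.

```python
import re

# Hypothetical noise-word list for the example; Index Server uses its own
# noise-word data, which is not reproduced here.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def extract_text(document):
    """Step 1 (stand-in for a filter DLL): return the text and properties."""
    return document["body"], document.get("properties", {})


def break_words(text):
    """Step 2 (stand-in for a word-breaker DLL): split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_noise(words):
    """Step 3: drop noise words and keep the rest for the index."""
    return [word for word in words if word not in NOISE_WORDS]


def filter_document(document):
    """Run all three steps and return the word list for one document."""
    text, properties = extract_text(document)
    words = break_words(text)
    # Textual properties are broken into words the same way as the body text.
    for value in properties.values():
        words.extend(break_words(str(value)))
    return remove_noise(words)


doc = {
    "body": "The filter extracts the text of a document.",
    "properties": {"title": "Filtering in Index Server"},
}
word_list = filter_document(doc)
print(word_list)
```

In this sketch the word list keeps the order in which words appear, body first and properties second. A real content index would also record where each word occurs, which the example leaves out.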

This section contains:

CiDaemon Process: Describes how Index Server identifies the correct filters to apply to a list of documents.
Disk-Full Condition: Describes how Index Server works when disk space is running low.
Filter DLLs: Explains how text and properties are extracted from different document types.
Word-Breaker DLLs and Noise Words: Describes how document text and properties returned by filters are broken up into words and how common words are excluded.
Modifying Filter Registration: Explains how to modify filter registration and identify file types with binary files, using Regsvr32.exe.
CiDaemon Priority Settings: Lists possible settings for ThreadClassFilter and ThreadPriorityFilter.
Related Performance Counters: Lists and explains the counters present under the Windows NT Performance Monitor object Content Index.