Glossary




A

Abstract
A summary of a document or HTML page. Microsoft Index Server can automatically generate a document abstract using information contained within the document, such as Heading information in HTML pages and property information on documents. Also called a characterization.

Access Control List (ACL)
A list of Windows NT permissions that you can set on a file or folder to allow some users to access it while denying access to others. For details, see the Windows NT documentation.


B

Boolean
A type of variable that can have only two values, typically 1 or 0. Boolean variables are often used to express conditions that are either TRUE or FALSE. Queries with Boolean operators (AND, OR, NOT, and NEAR) are referred to as Boolean queries.

Breaker, word
An Index Server language utility that is responsible for identifying words in a document. As the document contents are emitted by the content filter, the word breaker identifies where the words are located in each sentence. There is one word-breaking module for each of the languages supported by Index Server.
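
As a rough illustration of what a word breaker does, the sketch below splits text into words and records where each word starts. The function name break_words and the simple rule (a word is a run of letters, digits, or apostrophes) are assumptions made for this example; the actual Index Server word breakers are language-specific and are not described here.

```python
import re

def break_words(text):
    """Return (word, start_offset) pairs for each word found in text.

    Toy rule, assumed for illustration: a word is a run of ASCII letters,
    digits, or apostrophes. Real word breakers apply per-language rules.
    """
    return [(m.group(0), m.start()) for m in re.finditer(r"[A-Za-z0-9']+", text)]

if __name__ == "__main__":
    for word, offset in break_words("Index Server breaks text into words."):
        print(offset, word)
```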


C

Catalog
The directory in which Index Server data is stored. The data is stored in the directory Catalog.wci under the path chosen at the time of installation.

Characterization
See Abstract.

Child process
An executing computer program that is started by another executing program. For example, if Process-A is running and it executes another program, Process-B, Process-B is a child process of Process-A.

Corpus
The collection of documents and HTML pages indexed by Index Server.

Cursor
A pointer into the content index. Functionally the same as a database cursor, the cursor points to the next record to retrieve from the information store.

D

DLL, dynamic-link library
A collection of programs that can be accessed and executed by other programs running on the computer. These files typically use the extension .dll. For example, the Microsoft Word filter DLL may contain several programs (the content filters) that read different versions of Microsoft Word files. These different programs are packaged together in a single dynamic-link library for convenience and efficiency.

Dirty shutdown
Any unusual or abnormal shutdown of Index Server or IIS. Index Server has a very specific shutdown sequence that must be followed to guarantee that updates to the index happen correctly. If this shutdown sequence is not followed, the index may become corrupted. For example, a power failure is considered a dirty shutdown because Index Server is not given the chance to execute its shutdown sequence.

E

Embedding, embedded object
Typically data from one program that is stored within the data of another program. For example, a user may create a Microsoft Word document. Later the user creates a spreadsheet using Microsoft Excel and inserts this spreadsheet in the Microsoft Word document. The spreadsheet is embedded in the Word document and is referred to as an embedding, or embedded object.

F

Filter, content
An Index Server component that is responsible for reading a document from the disk and extracting the textual content from that document. Typically filters are associated with particular document formats. For example, Microsoft Word documents have their contents extracted by a different filter than Microsoft Excel documents.

Filter DLL
A dynamic-link library (DLL) that collects together a number of content filters.

Fixup
A prefix on a path that will be substituted for the scope when a remote client sends a query.

Free-text query
With a free-text query, the user can enter any set of words or phrases, or even a complete sentence, as the query restriction. Index Server examines this text, identifies all the nouns and noun phrases, and posts a query with those terms. For example, assume the user typed the following free-text query:

The Fulton County Grand Jury said Friday an investigation of Atlanta’s recent primary election produced no evidence that any irregularities took place.

The system would identify the following words and noun phrases:

Words: Fulton, county, grand, jury, Friday, investigation, Atlanta, recent, primary, election, produce, evidence, irregularities.

Phrases: Fulton county grand jury, primary election, grand jury, Atlanta’s recent primary election

The words and phrases are combined into a restriction, weighted for proper ranking, and posted as a query against the corpus.

Note   You must preface all free-text queries with $contents.

Fuzzy Query
Fuzzy queries search for words that are similar to the words or text entered in the query restriction. Rather than looking for only exact matches, the system will modify the words in the query and look for these modified forms.

The system supports simple wildcards (such as those in MS-DOS®) and regular expression matching (as used in UNIX) against textual properties. Content queries support simple prefix matching (for example, dog* will return dogmatic and doghouse). The system also provides linguistic stemming support that matches inflected and base forms of query words. (For example, swim is expanded to swimming, swam, swum, and so on.)
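
The three kinds of matching described above can be sketched with a small in-memory word list. This is only an illustration: the word list, the function names, and the tiny stem table are assumptions for the example and do not reflect how Index Server implements these features.

```python
import fnmatch
import re

# Hypothetical set of indexed words, for illustration only.
WORDS = ["dog", "doghouse", "dogmatic", "cat", "swim", "swam", "swum", "swimming"]

# Hypothetical stem table; a real stemmer is language-specific.
STEMS = {"swim": {"swim", "swam", "swum", "swimming"}}

def wildcard_match(pattern, words=WORDS):
    """MS-DOS-style wildcard match, such as 'dog*'."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]

def regex_match(pattern, words=WORDS):
    """Regular-expression match against the whole word."""
    return [w for w in words if re.fullmatch(pattern, w)]

def stem_match(word, words=WORDS):
    """Match the base form and its inflected forms."""
    forms = STEMS.get(word, {word})
    return [w for w in words if w in forms]

if __name__ == "__main__":
    print(wildcard_match("dog*"))
    print(regex_match(r"sw[aiu]m"))
    print(stem_match("swim"))
```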


G

GUID
A globally unique identifier (GUID), expressed as a number in the following format, where each n is a hexadecimal digit:

nnnnnnnn-nnnn-nnnn-nnnn-nnnnnnnnnnnn

For example:

F29F85E0-4FF9-1068-AB91-08002B27B3D9
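
A value can be checked against the 8-4-4-4-12 format with a short sketch like the one below. The helper name parse_guid is hypothetical, and the sample value is used only for illustration; the check relies on Python's standard uuid module, not on any Index Server API.

```python
import re
import uuid

# Five groups of hexadecimal digits: 8-4-4-4-12.
GUID_PATTERN = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")

def parse_guid(text):
    """Return a uuid.UUID for text in 8-4-4-4-12 form, or raise ValueError."""
    if not GUID_PATTERN.fullmatch(text):
        raise ValueError(f"not in 8-4-4-4-12 form: {text!r}")
    return uuid.UUID(text)

if __name__ == "__main__":
    guid = parse_guid("F29F85E0-4FF9-1068-AB91-08002B27B3D9")
    print(str(guid).upper())
```

A string with a missing group fails the pattern check and raises ValueError, which makes transcription errors easy to catch.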

I

Indexed Directory
A directory pointed to by a virtual root that is configured by the administrator to be indexed by Index Server.

L

Locale
Used to indicate language information. For example, a Web server may have a locale variable that indicates the default language used on that server. A server in Seattle will probably have a locale of EN-US (U.S. English), whereas a server in Berlin will probably have a locale of DE (German or Deutsch). Web browsers can also specify a locale to indicate the language that the user of that browser understands. Documents and Web pages can also specify a locale to indicate what language the text is in.

Locale ID
A number that uniquely identifies a locale to the Windows NT API set. This number consists of a language code and a sublanguage code.
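
The following sketch shows how a language code and a sublanguage code can be packed into one number. The bit layout used here (primary language in the low 10 bits, sublanguage in the next 6 bits) follows the Win32 language-identifier convention and is an assumption not stated in this glossary; the function names are hypothetical.

```python
def make_locale_id(primary_language, sublanguage):
    """Combine a primary language code and a sublanguage code."""
    return (sublanguage << 10) | primary_language

def split_locale_id(locale_id):
    """Return (primary_language, sublanguage) from the low 16 bits."""
    language_id = locale_id & 0xFFFF
    return language_id & 0x3FF, language_id >> 10

if __name__ == "__main__":
    # 0x09 is English and sublanguage 0x01 is United States in Win32 headers.
    print(hex(make_locale_id(0x09, 0x01)))
    print(split_locale_id(0x0407))
```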

M

Master index
A persistent index that contains the indexed data for a large number of documents.

Metadata
Data used to describe other data. For example, Index Server must maintain data that describes the data in the content index. This data that Index Server maintains is called metadata because it describes how data in the index is stored.

N

National Language Support (NLS)
Helps applications developed for the Win32 application programming interface (API) adapt to the differing language and locale-specific needs of users around the world.

Noise words
Words that are not significant in searches, such as a, an, and the in English. Noise words are also called stop words.
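
A minimal sketch of noise-word removal, assuming a tiny hypothetical noise-word list built from the examples above; real noise-word lists are language-specific and much longer.

```python
# Hypothetical noise-word list for illustration only.
NOISE_WORDS = {"a", "an", "the"}

def remove_noise_words(words):
    """Drop noise words, comparing case-insensitively."""
    return [w for w in words if w.lower() not in NOISE_WORDS]

if __name__ == "__main__":
    print(remove_noise_words(["The", "grand", "jury", "found", "an", "error"]))
```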

Normalizer, word
An Index Server component that accepts words and converts them into a standard representation before placing them in the index.

O

Overhead, disk
The amount of space required to store the index information.

P

Persistent index
An index with data stored on a disk.

Phrase
A sequence of words that can be searched for. See Free-text query for an example.

Property
Data associated with a file, but not actually stored within the contents of a file. For example, a Microsoft Word document may possess an AUTHOR property, which gives information about the person who wrote the document. Properties are often accessible by the operating system directly and do not require the original application to read them. For example, in Windows 95 (or later) you can read the AUTHOR property on a Microsoft Word document without having to start Microsoft Word or even have it installed on your computer.

Property Value
The data contained in a property. If a document is authored by John Smith, the AUTHOR property contains (its value is) "John Smith".

PROPID
An integer that uniquely identifies a property. This integer can be expressed as a decimal (base-10) or hexadecimal (base-16) number.
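
The same identifier can be written in either base. The sketch below assumes that hexadecimal values carry a 0x prefix, which is a notation chosen for this example and not something this glossary specifies; parse_propid is a hypothetical helper.

```python
def parse_propid(text):
    """Parse a property identifier written in decimal or 0x-prefixed hexadecimal."""
    # Base 0 tells int() to infer the base from the prefix.
    return int(text, 0)

if __name__ == "__main__":
    print(parse_propid("4"), parse_propid("0x4"), parse_propid("0x1A"))
```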

Q

Query
In Index Server, the process of searching for specific data in a set of files and returning links to the files containing that data.

R

Regex
An abbreviation for regular expression.

Registered binary file
A binary file is typically an executable file with the extension .exe. The term binary file can also refer to a file whose disk format is unknown. A registered binary file is one whose format is known and is entered in the system registry, but which is not assigned a content filter.

Regular expression
An expression syntax used by many operating systems, especially UNIX, to specify similarity between words and phrases. A powerful way to express wildcards in textual expressions.

Restriction
A description of what to look for in a query. A restriction narrows the focus of a query.

Result set
The information returned by Index Server in response to a query. Also used to define the set of properties or columns to return from the files that matched the query restriction.

S

Scan
The action of checking all files and directories for modifications among the virtual roots selected for indexing. When Index Server is first activated, it must scan all directories and files to find the documents that may have changed since Index Server was shut down. Scanning runs in the background, so queries can still be executed while a scan is in progress. Once scanning is complete, Index Server can usually use change notifications to keep its indexes up to date.

Scope
A query scope specifies the set of documents that must be searched. Typically scopes are specified by a directory path on a storage volume, such as D:\Docs. Index Server can also use virtual roots to indicate scope.

Shadow index
A persistent index created by merging word lists and sometimes other shadow indexes into a single index.

Sleep time
A waiting period during which a particular operation does not take place. For example, if index merging takes place every 24 hours, the sleep time between merges is 24 hours.

Stemmer, word
An Index Server component that takes a word and generates grammatically correct variations of that word. Each language requires its own stemmer. For example, the English stemmer, if given the word swam, would generate swim, swam, swum, swimming, swims, and so on.

Stop words
See Noise words.

T

Time slice
A specific amount of time dedicated to a computational task.

W

Word breaker DLL
See Breaker, word.

Word list
During indexing, Index Server first extracts words and properties from documents and collects them into word lists. Word lists are then combined into shadow indexes, which are in turn merged into the master index.
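
The flow from word lists to shadow indexes to the master index can be sketched with plain dictionaries. This is a simplified model under stated assumptions: each index maps a word to the set of document numbers that contain it, and the function names are hypothetical. It does not represent Index Server's on-disk formats or merge policy.

```python
from collections import defaultdict

def build_word_list(doc_id, text):
    """Collect the words of one document into a word list."""
    word_list = defaultdict(set)
    for word in text.lower().split():
        word_list[word].add(doc_id)
    return word_list

def merge(*indexes):
    """Merge any number of word lists or indexes into one index."""
    merged = defaultdict(set)
    for index in indexes:
        for word, doc_ids in index.items():
            merged[word] |= doc_ids
    return merged

if __name__ == "__main__":
    list_1 = build_word_list(1, "grand jury")
    list_2 = build_word_list(2, "primary election jury")
    shadow = merge(list_1, list_2)   # word lists combined into a shadow index
    master = merge(shadow)           # shadow index merged into a master index
    print(sorted(master["jury"]))
```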

© 1997 by Microsoft Corporation. All rights reserved.