Basic Querying Features

These are the basic features of a query:

The scope tells the query engine where to look when searching. It describes the set of documents within the corpus that will be searched.

The restriction determines whether a document should be returned. A restriction is a set of terms that can be combined by various operators.

The result set defines the information to return from a query.

In addition to the basic features, other features let you control how results are returned and displayed, for example, how results are sorted.

Scope

A query scope specifies the set of documents that must be searched. Typically, scopes are specified by a directory path on a storage volume, such as D:\Docs.

Index Server indexes documents based on sites. An administrator can index all the sites on a server, or select a subset of sites to index. Queries can be run against multiple sites, against a single site, or even against a single physical directory within a site.
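The idea of a scope as a directory path can be illustrated with a minimal sketch. This is not Index Server code: the function name in_scope and the sample file paths are hypothetical, and the sketch only assumes that a document is in scope when its path lies at or below the scope directory.

```python
from pathlib import PureWindowsPath


def in_scope(document_path: str, scope_path: str) -> bool:
    """Return True if document_path lies at or below scope_path.

    PureWindowsPath compares paths case-insensitively, which matches
    how Windows treats directory names.
    """
    doc = PureWindowsPath(document_path)
    scope = PureWindowsPath(scope_path)
    return doc == scope or scope in doc.parents


# Hypothetical corpus; only the first two documents fall under D:\Docs.
corpus = [
    r"D:\Docs\report.doc",
    r"D:\Docs\specs\design.htm",
    r"D:\Other\notes.txt",
]
scoped = [path for path in corpus if in_scope(path, r"D:\Docs")]
```

Here scoped keeps the two documents under D:\Docs, including the one in a subdirectory, and drops the document under D:\Other.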

Restriction

You can query against the contents of Web pages and other documents served by IIS and Index Server. The types of documents you can query include HTML, Microsoft® Word, Microsoft® Excel, Microsoft® PowerPoint®, and plain text documents. Other document types are not supported by Index Server directly, but a content filter can be written to extend the list of supported document types. A content filter reads a proprietary document format and emits textual words, which are indexed by Index Server. For more information on content filters, go to http://www.microsoft.com/ and search for "IFilter interface".

With Index Server you can search for multiple words and phrases within documents as well as words and phrases near other words and phrases. Index Server also provides free-text queries. With free-text queries, you can enter any set of words or phrases, or even a complete sentence, as the query restriction. Index Server will examine this text, identify all the nouns and noun phrases, and post a query using those terms. For example, assume you typed the following free-text query:

The Fulton County Grand Jury said Friday an investigation of Atlanta’s recent primary election produced no evidence that any irregularities took place.

Index Server identifies the following words and noun phrases:

Words: Fulton, county, grand, jury, Friday, investigation, Atlanta, recent, primary, election, produce, evidence, irregularity

Phrases: Fulton county grand jury, primary election, grand jury, Atlanta’s recent primary election

These words and phrases are combined into a restriction, weighted for proper ranking, and posted as a query against the corpus.

Property Restrictions

In addition to querying contents, users can query properties stored on objects. These properties include file size, creation and modification dates, file names, authors, and so on. Clients can query both textual properties (file name and author, for example) and numerical properties (size and modification date, for example). Clients can also query all ActiveX™ properties, including custom properties on Microsoft Office documents.

You can use the standard comparison operators in queries. These include =, >, <, >=, <=, and != (not equal) for numeric and textual properties. In addition, for textual properties all the content query functionality is available. With Boolean operators (AND, OR, and NOT) and parentheses, you can freely mix restriction terms.
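The way comparison terms combine under Boolean operators can be sketched as follows. This is an illustrative model only, not the Index Server query syntax: the helper names prop, and_, or_, and not_ and the sample documents are hypothetical, and each restriction is modeled as a function that tests one document's properties.

```python
import operator

# The comparison operators named in the text, mapped to Python functions.
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def prop(name, op, value):
    """Build a term that compares one document property to a value."""
    compare = COMPARATORS[op]
    return lambda doc: name in doc and compare(doc[name], value)


def and_(*terms):
    return lambda doc: all(term(doc) for term in terms)


def or_(*terms):
    return lambda doc: any(term(doc) for term in terms)


def not_(term):
    return lambda doc: not term(doc)


# Hypothetical documents described by a few properties.
docs = [
    {"filename": "budget.xls", "author": "Kim", "size": 48_000},
    {"filename": "plan.doc", "author": "Lee", "size": 120_000},
    {"filename": "notes.txt", "author": "Kim", "size": 2_000},
]

# (author = Kim AND size > 10000) OR NOT (size >= 5000)
restriction = or_(
    and_(prop("author", "=", "Kim"), prop("size", ">", 10_000)),
    not_(prop("size", ">=", 5_000)),
)
matches = [doc["filename"] for doc in docs if restriction(doc)]
```

Nesting the helper calls plays the role of parentheses in a query: the inner and_ group is evaluated as a unit before it is combined with the not_ term.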

Fuzzy Queries

Index Server supports fuzzy queries, which contain simple wildcards (such as those in MS-DOS®), and can match regular expressions against textual properties. Content queries support simple-prefix matching (for example, dog* will return dogmatic and doghouse). Index Server also supports linguistic stemming, which matches inflected and base forms of query words. (For example, swim** is expanded to swimming, swam, swum, and so on.)
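A minimal sketch of the two simpler matching modes follows. It is not how Index Server implements them: the function names and word lists are hypothetical, case-insensitive comparison is an assumption, and linguistic stemming (the ** form) is deliberately left out because it needs a language-aware word breaker and stemmer.

```python
from fnmatch import fnmatchcase


def prefix_matches(pattern, words):
    """Return the words matched by a simple-prefix pattern such as 'dog*'."""
    if pattern.endswith("**"):
        # The ** form requests linguistic stemming, which this sketch omits.
        raise ValueError("stemming patterns are not modeled here")
    if not pattern.endswith("*"):
        # No wildcard: fall back to an exact, case-insensitive match.
        return [w for w in words if w.lower() == pattern.lower()]
    prefix = pattern[:-1].lower()
    return [w for w in words if w.lower().startswith(prefix)]


def wildcard_matches(pattern, names):
    """Match MS-DOS style wildcards (* and ?) against textual properties."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]


content_hits = prefix_matches("dog*", ["dogmatic", "doghouse", "cat", "hotdog"])
file_hits = wildcard_matches("*.DOC", ["plan.doc", "budget.xls", "Memo.Doc"])
```

Note that hotdog is not a hit for dog*, because prefix matching anchors the pattern at the start of the word.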

Although Index Server does not support true natural-language processing, it supports free-text mode.

Result Sets

Index Server assembles query hits into result sets, which are returned to the client. The administrator can limit the maximum number of hits returned to the client. For example, a result set of 200 hits can be returned to the client in 10 pages of 20 hits each. The query form determines the number of hits returned per page, but you can configure a form to let the client specify the number of hits to be returned.
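The paging arithmetic in this example can be sketched in a few lines. The function paginate and its parameters are hypothetical names chosen for illustration; max_hits stands in for the administrator's limit and page_size for the value set by the query form.

```python
def paginate(hits, page_size, max_hits=None):
    """Split a list of hits into pages, applying an optional overall cap."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    limited = hits if max_hits is None else hits[:max_hits]
    return [
        limited[start:start + page_size]
        for start in range(0, len(limited), page_size)
    ]


# 200 hypothetical hits, 20 per page, as in the example in the text.
hits = [f"doc{i}" for i in range(1, 201)]
pages = paginate(hits, page_size=20)

# With an administrator cap of 50 hits, the last page is only partly filled.
capped_pages = paginate(hits, page_size=20, max_hits=50)
```

With no cap, the 200 hits fill exactly 10 pages of 20. With a cap of 50, the client receives two full pages and a third page of 10 hits.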

In addition to sorting by rank, Index Server can sort query results according to any document property.
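Sorting by rank or by another property can be sketched as below. The hit records, the property names, and the rank values are made up for illustration, and sort_hits is a hypothetical helper rather than an Index Server function.

```python
def sort_hits(hits, key="rank", descending=True):
    """Return a new list of hits ordered by the given property."""
    return sorted(hits, key=lambda hit: hit[key], reverse=descending)


# Hypothetical hits, each carrying a rank and a few document properties.
hits = [
    {"filename": "plan.doc", "rank": 640, "size": 120_000},
    {"filename": "budget.xls", "rank": 910, "size": 48_000},
    {"filename": "notes.txt", "rank": 300, "size": 2_000},
]

by_rank = sort_hits(hits)  # best match first
by_size = sort_hits(hits, key="size", descending=False)  # smallest first
```

The default orders hits from highest rank to lowest, and passing a different key orders the same hits by that property instead.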

If the corpus is stored on a local Windows NT File System (NTFS) volume, Index Server respects all security restrictions and checks access control lists (ACLs). In a result set, a user can never see a document reference if the ACL on that object prohibits Read access for that client. ACL checking on IIS remote virtual directories is determined by the IIS security on those remote virtual directories.
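The effect of ACL checking on a result set can be illustrated with a deliberately simplified model. Real NTFS ACLs carry allow and deny entries for users and groups; this sketch assumes each document simply maps to the set of users with Read access, and it assumes that a document with no entry is hidden. The names visible_hits and acls and the sample users are hypothetical.

```python
def visible_hits(hits, user, acls):
    """Keep only the hits that the given user is allowed to read.

    acls maps a document path to the set of users with Read access.
    A document without an entry is treated as unreadable (an assumption
    of this sketch, chosen so that nothing leaks by default).
    """
    return [hit for hit in hits if user in acls.get(hit, set())]


# Hypothetical Read permissions for two of the three documents.
acls = {
    r"D:\Docs\plan.doc": {"kim", "lee"},
    r"D:\Docs\budget.xls": {"kim"},
}
hits = [r"D:\Docs\plan.doc", r"D:\Docs\budget.xls", r"D:\Docs\notes.txt"]

for_lee = visible_hits(hits, "lee", acls)
for_kim = visible_hits(hits, "kim", acls)
```

The same query therefore yields different result sets for different clients: lee sees only plan.doc, while kim also sees budget.xls.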

If allowed, the client can specify which properties to return in a result set (that is, the columns in the result set). Index Server can display only properties stored in the property cache or properties that can be retrieved from the file being searched. The administrator can restrict the properties returned by a query.

In addition to returning properties stored with the document, Index Server can generate document abstracts (or summaries), which can also be returned in a result set.

Logging

IIS logs all traffic moving between a client and the server. Standard IIS logging records query information such as the querying IP address and the queries posted to the server.


© 1997 by Microsoft Corporation. All rights reserved.