The CONTAINS predicate provides several text matching options for building queries. Typically, you use this predicate to query the Contents property - the text contents of a file. You can also query other textual properties with content, such as DocTitle, DocSubject, or a user-defined property.
CONTAINS ( [Column_Reference, ] ' Content_Search_Conditions ') > 0
Note: The "> 0" part of the expression is required. It causes the expression to evaluate to a Boolean result.
The CONTAINS predicate uses the following conventions when dealing with single and double quotes in basic string literals:
If a basic string literal contains spaces, always use double-quotes as delimiters. For example:
'"Phrase With Spaces"'
If a basic string literal with no spaces contains a single-quote, use two single-quotes to specify the single-quote character. For example:
'Phrase''Without''Spaces'
If a basic string literal with spaces contains a single-quote, use double-quotes as delimiters. For example:
'"Phrase 'With' Spaces"'
If a basic string literal with no spaces contains a double-quote, the two double-quotes are not necessary. For example:
'Phrase"Without"Spaces'
If a basic string literal with spaces contains a double-quote, use two double-quotes to specify the double-quote character. For example:
'"Phrase ""With"" Spaces"'
The types of matches, along with descriptions of how they work, are listed in the following table. Click a match type for a more detailed explanation and examples.
Match Type | Description |
---|---|
Simple | Matches the specified word or phrase. |
Word Prefix | Uses the wildcard character "*" to match words containing the same prefix. |
Proximity | Searches for words required to be close to one another. |
Linguistic Generation | Matches various tenses of verbs, plural or singular of nouns. |
List | Returns rows that match a list of words. |
In a simple match, the word or phrase must be an exact match; however, the matching is not case sensitive. Multiple consecutive words are treated as a single phrase, and must appear in exactly the same order in the matching document. If a phrase contains blank spaces between words or punctuation marks, it must be enclosed in double quotes.
CONTAINS ([column_reference, ] '"Word_or_Phrase" [ Boolean_Operator '"Word_or_Phrase" ] ... ') > 0
Column_Reference | Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified. |
Word_or_Phrase |
Specifies the literal of type Basic String to match. |
Boolean_Operator | Specifies the boolean operator to use, following the precedence rules previously stated, that combines the comparison predicates. |
The following example searches the DocAuthor property of all files from the default scope, and retrieves rows consisting of the DocTitle property for any file in which the DocAuthor is Smith or Jones.
SELECT DocTitle FROM SCOPE() WHERE CONTAINS(DocAuthor,'"Smith" OR "Jones"') > 0
The following example searches the Contents of all files from the default scope, and retrieves rows consisting of the DocTitle and DocAuthor properties for any file containing the phrases "telecommunications industry" and "United Kingdom".
SELECT DocTitle, DocAuthor FROM SCOPE() WHERE CONTAINS('"telecommunications industry" AND "United Kingdom"') > 0
A word prefix match is similar to a simple match, except that the wildcard asterisk (*) character is used to represent the variable component of a word or phrase. The matching is not case sensitive. Multiple consecutive words are treated as a single phrase, and must appear in exactly the same order in the matching document. If a phrase contains blank spaces between words or punctuation marks, it must be enclosed in double quotes.
CONTAINS ( [ column_reference, ] ' "Word_or_Phrase*" [ Boolean_Operator "Word_or_Phrase" ] ... ' ) > 0
Column_Reference | Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified. |
Word_or_Phrase | Specifies the literal of type Basic String to use as the prefix. |
Boolean_Operator | Specifies the boolean operator to use, following the precedence rules previously stated, that combines the comparison predicates. |
The following example searches the Contents of all files from the default scope, and retrieves rows consisting of the DocTitle property for any file containing words such as wine, winery, wineries, and so forth.
SELECT DocTitle FROM SCOPE() WHERE CONTAINS('wine*') > 0
In a proximity match, the query returns a row only if the location of two or more words in a property is close to each other.
CONTAINS ([column_reference,] '"Word_or_Phrase " { NEAR() | ~ } "Word_or_Phrase " [ { NEAR() | ~ } "Word_or_Phrase " ] ...') > 0
Column_Reference | Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified. |
Word_or_Phrase | Specifies the literal of type Basic String to match. |
NEAR() | ~ | Specifies the proximity relationship between adjacent words or phrases. (You can also use the tilde (~) instead of NEAR().) Currently, NEAR() uses WORD as the unit of measurement and a value of approximately 50 words as the proximity distance; in other words, if the two phrases are within approximately 50 words, a row is returned. |
Note: You can also use a proximity match to query for more than two words that are near each other.
The following example returns files where wordA is near wordB, which is near wordC.
...WHERE CONTAINS ('"wordA" NEAR() "wordB" NEAR() "wordC"') > 0
The following example returns files where Index is within approximately 50 words of the word Server:
...WHERE CONTAINS(DocSubject,'"Index" ~ "Server"') > 0
The following example returns files containing the word Microsoft, and where the word Index is close to the word Server:
...WHERE CONTAINS(DocSubject,'"Microsoft" AND "Index" NEAR() "Server"') > 0
A linguistic generation match is a type of fuzzy search in which the target of the search is expanded to include variations of the word. Currently, the SQL Extensions support inflectional morphology generation, in which the search expands to include verb tenses, and the singular or plural of nouns.
CONTAINS( [column_reference, ] 'FORMSOF(INFLECTIONAL, "Generation_Term ")') > 0
Column_Reference | Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified. |
Generation_Term | Specifies the literal of type Basic String to expand. |
The following example returns files that contain words such as "driving", "driven", and "drives".
...WHERE CONTAINS('FORMSOF(INFLECTIONAL, "drive")') > 0
The SQL Extensions support queries based on matching a group of words or phrases. Also called vector space queries, these queries work by weighting each of the match terms, or vector components. The rank of each returned row indicates how well it matches the query. Rows returned do not have to match every listed word or phrase. The matching algorithm uses all the weighted components together when determining a match.
CONTAINS( [column_reference, ] 'ISABOUT(Vector_Term [ WEIGHT( Value ) ] [ , Vector_Term [ WEIGHT( Value ) ] ] ...') > 0
Column_Reference | Specifies the column name (alias). Its data type must be compatible with the format of the Vector_Term specified. |
Vector_Term | Specifies the terms to match. The term is one of the following types: Simple, Word Prefix, Proximity, or Linguistic Generation. |
WEIGHT( Value ) | Specifies the weight of the Vector_Term. If you specify a weight for a term, you must enclose the term in double quotes. You can specify Value in the range of 0.0 to 1.0. |
The following example queries the Contents property for the words "railroad", "transportation", and "locomotive". Files whose Contents include the word "heavy" near the word "locomotive" will be weighted higher in the results ranking. Since the term "transportation" is ranked lower than the other two terms, files containing only that term are ranked lower than files containing either "railroad" or "locomotive".
...WHERE CONTAINS ('ISABOUT (railroad, "transportation" WEIGHT(0.5), "heavy near locomotive" WEIGHT(0.9))') > 0