Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. --Jamie Zawinski, in comp.lang.emacs
Regular expressions (RE) are used to extremes in Perl and are available in many languages. The following is designed as a quick reference / memory jog for experienced RE users. Any new users should:
A) find another solution
B) copy existing working code
C) join a newsgroup or mailing list and ask for help
D) take a class.
RE is like shaking hands with an octopus.
Matches
  ^        beginning
  $        end
  .        any character
  [.-.]    any character from the first "." to the second, where . is any character
           e.g. [A-Z] matches any uppercase letter

Literals
  \.       Quote. Treats "." as a literal value, where . is any character
           e.g. \$ matches the dollar sign, not the end of line.
  \###     Byte, where ### are three octal digits.
  \x##     Byte, where ## are two hexadecimal digits.

Flow control
  (.*)     Group. Matches everything in the parens or nothing. Saves the match in $#, where # counts up the groups.
           e.g. Time: (..):(..):(..) will put the hours in $1, minutes in $2 and seconds in $3.
  .*|.*    Or. If the pattern before the "|" fails to match, it will try the pattern after.
           e.g. A|B will match A or B

Repeat
  *        0 or more times. Same as {0,}. Will "eat" to the end unless followed by ? or something else
  +        1 or more times. Same as {1,}. Will "eat" to the end unless followed by ? or something else
  ?        0 or 1 times. Same as {0,1}
  {n}      Match exactly n times
  {n,}     Match at least n times. Will "eat" to the end unless followed by ? or something else
  {n,m}    Match at least n but not more than m times.
  .*?      Match the minimum number of times possible, where .* is one of the repeat patterns above.
           e.g. foo(.*)bar used against "the food is barbecued in the barn" will set $1 to "d is barbecued in the " but foo(.*?)bar will set it to "d is ". Notice that foo(.*)barb will also produce "d is "
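The group and repeat entries above can be tried out directly. The following is a minimal sketch in Python rather than Perl, so it is an illustration under that assumption: Perl's $1, $2, $3 correspond to group(1), group(2), group(3) on a Python match object, and the sample string "Time: 12:34:56" is made up for the demonstration.

```python
import re

# Groups: each pair of parentheses saves what it matched, counting up from 1.
m = re.search(r"Time: (..):(..):(..)", "Time: 12:34:56")
hours, minutes, seconds = m.groups()
print(hours, minutes, seconds)

text = "the food is barbecued in the barn"

# Greedy: .* runs to the end, then gives characters back until "bar" fits.
greedy = re.search(r"foo(.*)bar", text).group(1)

# Minimal: .*? stops at the first place where "bar" fits.
lazy = re.search(r"foo(.*?)bar", text).group(1)

# Greedy again, but "barb" only occurs once, so the result is the short one.
greedy_barb = re.search(r"foo(.*)barb", text).group(1)

print(repr(greedy))
print(repr(lazy))
print(repr(greedy_barb))
```

Printing with repr() makes the trailing space in each captured value visible.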
For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part--that's why it's called backtracking.
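A small sketch of backtracking, again in Python as an assumption (the pattern and inputs are invented for illustration). The greedy (\d+) first takes every digit, which leaves nothing for the final (\d), so the engine backs up one character and tries again.

```python
import re

# (\d+) initially grabs "12345"; (\d) then fails, so the engine
# backs up one digit and retries, leaving "5" for the second group.
m = re.match(r"(\d+)(\d)", "12345")
print(m.groups())

# With a single digit there is nothing to give back: \d+ needs at
# least one digit, so the whole expression fails to match.
no_match = re.match(r"(\d+)(\d)", "7")
print(no_match)
```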
Also:
City state zip | \s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s* |
HTML eMail with only an image in it | The following expression will match a message that contains one or more images and no text at all: <BODY[^>]*>(<[^>]+>|\n|\r)*<IMG[^>]+>(<[^>]+>|\n|\r)*</BODY> |
HTML eMail with an image | <BODY[^>]*>(<[^>]+>|\n|\r|\s)*<IMG[^>]*src=['"]?cid: |
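The three expressions in the table can be exercised with a short Python sketch. Several things here are assumptions rather than part of the original: the quantifiers are written with single braces ({2}, {5}, {4}) and without a trailing quote, the names CITY_STATE_ZIP, IMAGE_ONLY and INLINE_IMAGE are hypothetical, the sample inputs are invented, and re.IGNORECASE is added on the HTML patterns because tag names are often lowercase.

```python
import re

# City, state, zip: groups are city, state, full zip, optional "-1234" part.
CITY_STATE_ZIP = re.compile(r"\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s*")

# Body made only of tags and line breaks, with at least one <IMG> tag.
IMAGE_ONLY = re.compile(
    r"<BODY[^>]*>(<[^>]+>|\n|\r)*<IMG[^>]+>(<[^>]+>|\n|\r)*</BODY>",
    re.IGNORECASE,  # assumption: not part of the original expression
)

# Body that reaches an <IMG> with a cid: source before any text appears.
INLINE_IMAGE = re.compile(
    r"""<BODY[^>]*>(<[^>]+>|\n|\r|\s)*<IMG[^>]*src=['"]?cid:""",
    re.IGNORECASE,  # assumption: not part of the original expression
)

with_plus4 = CITY_STATE_ZIP.match("Springfield, IL 62704-1234").groups()
plain_zip = CITY_STATE_ZIP.match("San Diego, CA 92101").groups()
print(with_plus4)
print(plain_zip)

image_only = IMAGE_ONLY.search('<BODY>\n<P><IMG src="logo.png"></P>\n</BODY>')
image_and_text = IMAGE_ONLY.search('<BODY><P>Hello</P><IMG src="logo.png"></BODY>')
print(image_only is not None, image_and_text is not None)

inline = INLINE_IMAGE.search('<BODY bgcolor="white">\n  <IMG alt="x" src="cid:image001">')
linked = INLINE_IMAGE.search('<BODY>\n<IMG src="http://example.com/a.png">')
print(inline is not None, linked is not None)
```

The fourth city/state/zip group is None when the optional four-digit extension is absent. Note that the image-only expression accepts only \n and \r between tags, so spaces or tabs between tags will prevent a match.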