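Regular Expression Syntax Summary
Regular expressions are a powerful text pattern-matching language and are very widely used. Besides the standard regular expressions used on Unix-like systems, you will also meet them in UltraEdit, the MS VC++ 6.0 editor, the VS.NET editor, and other tools. Their syntaxes differ, however, so the syntax of each of these flavors is listed below for reference when needed.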
1. Standard Regular Expressions
The standard regular expressions referred to here are those used on Unix-like systems. Their syntax is as follows:
Regular Expressions (Unix Syntax):
Symbol | Function
\ | Indicates the next character has a special meaning. "n" on its own matches the character "n". "\n" matches a linefeed or newline character. See examples below (\d, \f, \n etc).
^ | Matches/anchors the beginning of line.
$ | Matches/anchors the end of line.
* | Matches the preceding character zero or more times.
+ | Matches the preceding character one or more times. Does not match repeated newlines.
. | Matches any single character except a newline character. Does not match repeated newlines.
(expression) | Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is \x, for x in the range 1-9. Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".
[xyz] | A character set. Matches any of the characters between the brackets.
[^xyz] | A negative character set. Matches any character NOT between the brackets.
\d | Matches a digit character. Equivalent to [0-9].
\D | Matches a nondigit character. Equivalent to [^0-9].
\f | Matches a form-feed character.
\n | Matches a linefeed character.
\r | Matches a carriage return character.
\s | Matches any whitespace including space, tab, form-feed, etc. but not newline.
\S | Matches any non-whitespace character but not newline.
\t | Matches a tab character.
\v | Matches a vertical tab character.
\w | Matches any word character including underscore.
\W | Matches any nonword character.
\p | Matches CR/LF (same as \r\n), the DOS line terminator.
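The tagged-expression example and a few of the character classes above can be tried out quickly. The sketch below uses Python's `re` module, which is an assumption of this example and not the engine the table documents; the two agree on these particular patterns, but Python's `\s`, for instance, also matches newlines.

```python
import re

# Tagged expressions: swap the two captured words, as in the table's example.
swapped = re.sub(r"(h.*o) (f.*s)", r"\2 \1", "hello folks")
print(swapped)  # folks hello

# \d behaves like [0-9]; re.ASCII keeps \d from matching other Unicode digits.
text = "room 42, floor 7"
digits_class = re.findall(r"\d", text, flags=re.ASCII)
digits_set = re.findall(r"[0-9]", text)
print(digits_class == digits_set)  # True

# ^ and $ anchor at line boundaries once re.MULTILINE is set.
lines = "alpha\nbeta\ngamma"
starts_with_b = re.findall(r"^b\w*$", lines, flags=re.MULTILINE)
print(starts_with_b)  # ['beta']
```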
2. UltraEdit-Style Regular Expressions
Regular Expressions (UltraEdit Syntax):
Symbol | Function
% | Matches the start of line. Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$ | Matches the end of line. Indicates the search string must be at the end of a line but does not include any line terminator characters in the resulting string selected.
? | Matches any single character except newline.
* | Matches any number of occurrences of any character except newline.
+ | Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
++ | Matches the preceding character/expression zero or more times. Does not match repeated newlines.
^b | Matches a page break.
^p | Matches a newline (CR/LF) (paragraph) (DOS files).
^r | Matches a newline (CR only) (paragraph) (Mac files).
^n | Matches a newline (LF only) (paragraph) (Unix files).
^t | Matches a tab character.
[ ] | Matches any single character or range in the brackets.
^{A^}^{B^} | Matches expression A OR B.
^ | Overrides the following regular expression character.
^(sub-regex^) | Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is ^x, for x in the range 1-9. Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".
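To make the differences from the Unix syntax concrete, here is a minimal sketch that translates a subset of the UltraEdit symbols above into Python `re` syntax. The function name `ultraedit_to_python` and the `TOKENS` table are hypothetical helpers invented for this illustration, character sets (`[ ]`) and alternation (`^{A^}^{B^}`) are deliberately left out, and the form-feed reading of `^b` is an assumption.

```python
import re

# Hypothetical mapping from a subset of UltraEdit tokens to Python `re` syntax.
TOKENS = {
    "^(": "(",       # open tagged expression
    "^)": ")",       # close tagged expression
    "^b": r"\f",     # page break (assumed to be a form feed)
    "^t": r"\t",     # tab
    "^p": r"\r\n",   # CR/LF
    "^r": r"\r",     # CR only
    "^n": r"\n",     # LF only
    "++": "*",       # zero or more of the preceding expression
    "%": "^",        # start of line (use with re.MULTILINE)
    "$": "$",        # end of line (use with re.MULTILINE)
    "?": ".",        # any single character except newline
    "*": r"[^\n]*",  # any run of characters except newline
    "+": "+",        # one or more of the preceding expression
}

def ultraedit_to_python(pattern):
    """Translate a subset of UltraEdit syntax; sets and alternation are not handled."""
    out = []
    i = 0
    while i < len(pattern):
        pair = pattern[i:i + 2]
        if len(pair) == 2 and pair in TOKENS:
            out.append(TOKENS[pair])
            i += 2
        elif len(pair) == 2 and pair.startswith("^"):
            # "^" overrides the special meaning of the next character.
            out.append(re.escape(pair[1]))
            i += 2
        elif pattern[i] in TOKENS:
            out.append(TOKENS[pattern[i]])
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)

py_pattern = ultraedit_to_python("^(h*o^) ^(f*s^)")
print(re.sub(py_pattern, r"\2 \1", "hello folks"))  # folks hello
```

Two-character tokens are checked before single characters so that `++` is not read as two separate `+` symbols, and the replacement string `^2 ^1` is written directly in Python's `\2 \1` form.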
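3. MS VC++ 6.0 Editor-Style Regular Expressions
When editing code in MS VC++ 6.0, we often use Find/Replace within the code. Checking the "Regular expression" option there lets you use regular expressions in both the search and the replacement. The syntax rules that apply in this dialog are listed below: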
Regular Expression | Description
. | (Period.) Any single character.
[ ] | Any one of the characters contained in the brackets, or any of an ASCII range of characters separated by a hyphen (-). For example, b[aeiou]d matches bad, bed, bid, bod, and bud, and r[eo]+d matches red, rod, reed, and rood, but not reod or roed. x[0-9] matches x0, x1, x2, and so on. If the first character in the brackets is a caret (^), then the regular expression matches any characters except those in the brackets.
^ | The beginning of a line.
$ | The end of a line.
\( \) | Indicates a tagged expression to retain for replacement purposes. If the expression in the Find What text box is \(lpsz\)BigPointer, and the expression in the Replace With box is \1NewPointer, all selected occurrences of lpszBigPointer are replaced with lpszNewPointer. Each occurrence of a tagged expression is numbered according to its order in the Find What text box, and its replacement expression is \n, where 1 corresponds to the first tagged expression, 2 to the second, and so on. You can have up to nine tagged expressions.
\~ | No match if the following character or characters occur. For example, b\~a+d matches bbd, bcd, bdd, and so on, but not bad. You can use this expression to prefix a group of characters you want to exclude, which is useful for excluding matches of particular words. For example, foo\~\(lish\) matches "foo" in "food" and "afoot" but not in "foolish."
\{c\!c\} | Any one of the characters separated by the alternation symbol (\!). For example, \{j\!u\}+fruit finds jfruit, jjfruit, ufruit, ujfruit, uufruit, and so on.
* | None or more of the preceding characters or expressions. For example, ba*c matches bc, bac, baac, baaac, and so on.
+ | One or more of the preceding characters or expressions. For example, ba+c matches bac, baac, baaac, but not bc.
\{\} | Any sequence of characters between the escaped braces. For example, \{ju\}+fruit finds jufruit, jujufruit, jujujufruit, and so on. Note that it will not find jfruit, ufruit, or ujfruit, because the sequence ju is not in any of those strings.
[^] | Any character except those following the caret (^) character in the brackets, or any of an ASCII range of characters separated by a hyphen (-). For example, x[^0-9] matches xa, xb, xc, and so on, but not x0, x1, x2, and so on.
\:a | Any single alphanumeric character [a-zA-Z0-9].
\:b | Any white-space character. The \:b finds tabs and spaces. There is no alternate syntax to express \:b.
\:c | Any single alphabetic character [a-zA-Z].
\:d | Any decimal digit [0-9].
\:n | Any unsigned number \{[0-9]+\.[0-9]*\![0-9]*\.[0-9]+\![0-9]+\}. For example, \:n should match 123, .45, and 123.45.
\:z | Any unsigned decimal integer [0-9]+.
\:h | Any hexadecimal number [0-9a-fA-F]+.
\:i | Any C/C++ identifier [a-zA-Z_$][a-zA-Z0-9_$]+.
\:w | Any alphabetic string [a-zA-Z]+. The string need not be bounded by white space or appear at the beginning or the end of a line.
\:q | Any quoted string \{"[^"]*"\!'[^']*'\}.
\ | Removes the pattern match characteristic in the Find What text box from the special characters listed above. For example, 100$ matches 100 at the end of a line, but 100\$ matches the character string 100$ anywhere on a line.
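Several of the VC++ 6.0 constructs above have close counterparts in other engines. The sketch below re-expresses four of the table's examples with Python's `re` module; the Python spellings (a negative lookahead for the exclusion operator, `|` for the alternation symbol, `(?:...)` for the escaped braces) are approximations chosen for this illustration, not something the VC++ documentation states.

```python
import re

# Tagged expression: find "(lpsz)BigPointer", replace with "\1NewPointer".
renamed = re.sub(r"(lpsz)BigPointer", r"\1NewPointer", "lpszBigPointer")
print(renamed)  # lpszNewPointer

# Exclusion ("no match if the following characters occur"),
# approximated here by a negative lookahead.
not_foolish = re.compile(r"foo(?!lish)")
kept = [w for w in ["food", "afoot", "foolish"] if not_foolish.search(w)]
print(kept)  # ['food', 'afoot']

# The "unsigned number" shorthand, written out from the expansion in the table
# with the alternation symbol replaced by |.
unsigned_number = re.compile(r"[0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0-9]+")
numbers = [s for s in ["123", ".45", "123.45", "abc"] if unsigned_number.fullmatch(s)]
print(numbers)  # ['123', '.45', '123.45']

# Grouping a sequence so that + repeats the whole sequence "ju".
ju_fruit = re.compile(r"(?:ju)+fruit")
found = [s for s in ["jufruit", "jujufruit", "jfruit", "ujfruit"] if ju_fruit.search(s)]
print(found)  # ['jufruit', 'jujufruit']
```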
4. VS.NET 2005 Editor-Style Regular Expressions
The regular expressions used by the VS.NET 2005 editor are a superset of those used by the MS VC++ 6.0 editor:
Expression | Syntax | Description
Any character | . | Matches any one character except a line break.
Maximal: zero or more | * | Matches zero or more occurrences of the preceding expression.
Maximal: one or more | + | Matches at least one occurrence of the preceding expression.
Minimal: zero or more | @ | Matches zero or more occurrences of the preceding expression, matching as few characters as possible.
Minimal: one or more | # | Matches one or more occurrences of the preceding expression, matching as few characters as possible.
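The difference between maximal and minimal matching is easiest to see side by side. The sketch below uses Python's `re` module, where the lazy quantifiers `*?` and `+?` play roughly the role of `@` and `#` in the table above; that correspondence is an assumption of this example, since the VS.NET editor itself does not use the Python spellings.

```python
import re

text = "<a><b>"

# Maximal zero-or-more: * takes as many characters as possible.
greedy = re.search(r"<.*>", text).group()
print(greedy)  # <a><b>

# Minimal zero-or-more: *? stops at the first possible match.
lazy = re.search(r"<.*?>", text).group()
print(lazy)  # <a>

# Maximal versus minimal one-or-more: + versus +?.
greedy_plus = re.search(r"a+", "aaa").group()
lazy_plus = re.search(r"a+?", "aaa").group()
print(greedy_plus, lazy_plus)  # aaa a
```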