Regular Expression Syntax Summary: Unix-like, UltraEdit, MS VC++ 6.0, and VS.NET


August 16, 2013

Regular Expression Syntax Summary

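Regular expressions are a powerful text pattern-matching language and are very widely used. Besides the standard regular expressions used on Unix-like systems, you will also meet them in tools such as UltraEdit, the MS VC++ 6.0 editor, and the VS.NET editor. Their syntaxes differ, however, so the syntax of each flavor is listed below for reference when needed.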

1. Standard Regular Expressions

The standard regular expressions referred to here are those used on Unix-like systems. Their syntax is as follows:

Regular Expressions (Unix Syntax):

Symbol	Function
\	Indicates the next character has a special meaning. "n" on its own matches the character "n". "\n" matches a linefeed or newline character. See examples below (\d, \f, \n etc.).
^	Matches/anchors the beginning of line.
$	Matches/anchors the end of line.
*	Matches the preceding character zero or more times.
+	Matches the preceding character one or more times. Does not match repeated newlines.
.	Matches any single character except a newline character. Does not match repeated newlines.
(expression)	Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is \x, for x in the range 1-9. Example: If (h.*o) (f.*s) matches "hello folks", \2 \1 would replace it with "folks hello".
[xyz]	A character set. Matches any characters between brackets.
[^xyz]	A negative character set. Matches any characters NOT between brackets.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a nondigit character. Equivalent to [^0-9].
\f	Matches a form-feed character.
\n	Matches a linefeed character.
\r	Matches a carriage return character.
\s	Matches any whitespace including space, tab, form-feed, etc., but not newline.
\S	Matches any non-whitespace character but not newline.
\t	Matches a tab character.
\v	Matches a vertical tab character.
\w	Matches any word character including underscore.
\W	Matches any nonword character.
\p	Matches CR/LF (same as \r\n) to match a DOS line terminator.
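
The tagged-expression swap and the digit class from the table can be tried out in Python's re module. This is only a sketch in a related flavor, not the editor's own engine: Python also writes group references in the replacement as a backslash followed by the group number, but it has no shorthand for a CR/LF pair, and its \s also matches newlines, unlike the entry above.

```python
import re

# Swap two tagged (captured) expressions, as in the "hello folks" example.
swapped = re.sub(r"(h.*o) (f.*s)", r"\2 \1", "hello folks")
print(swapped)  # folks hello

# The digit class behaves like [0-9] for ordinary ASCII input.
digits = re.findall(r"\d+", "build 42, rev 7")
print(digits)  # ['42', '7']

# Difference from the table: in Python, \s does match a newline.
newline_is_space = bool(re.fullmatch(r"\s", "\n"))
print(newline_is_space)  # True
```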

2. UltraEdit-Style Regular Expressions

Regular Expressions (UltraEdit Syntax):

Symbol	Function
%	Matches the start of line - Indicates the search string must be at the beginning of a line but does not include any line terminator characters in the resulting string selected.
$	Matches the end of line - Indicates the search string must be at the end of line but does not include any line terminator characters in the resulting string selected.
?	Matches any single character except newline.
*	Matches any number of occurrences of any character except newline.
+	Matches one or more of the preceding character/expression. At least one occurrence of the character must be found. Does not match repeated newlines.
++	Matches the preceding character/expression zero or more times. Does not match repeated newlines.
^b	Matches a page break.
^p	Matches a newline (CR/LF) (paragraph) (DOS Files)
^r	Matches a newline (CR Only) (paragraph) (MAC Files)
^n	Matches a newline (LF Only) (paragraph) (UNIX Files)
^t	Matches a tab character
[ ]	Matches any single character or range in the brackets
^{A^}^{B^}	Matches expression A OR B
^	Overrides the following regular expression character
^(sub-regex^)	Brackets or tags an expression to use in the replace command. A regular expression may have up to 9 tagged expressions, numbered according to their order in the regular expression. The corresponding replacement expression is ^x, for x in the range 1-9. Example: If ^(h*o^) ^(f*s^) matches "hello folks", ^2 ^1 would replace it with "folks hello".
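
To make the correspondence with the more common syntax concrete, the following is a minimal sketch that converts a small subset of UltraEdit-style patterns into Python re patterns. The function names ue_to_python and ue_replacement and the TOKENS table are hypothetical helpers written for this illustration; character sets in brackets and the ^{A^}^{B^} alternation are deliberately not handled.

```python
import re

# Hypothetical mapping from a few UltraEdit tokens to Python regex fragments.
TOKENS = {
    "^(": "(", "^)": ")",                       # tagged expression
    "^p": r"\r\n", "^r": r"\r", "^n": r"\n",    # line terminators
    "^t": r"\t",                                # tab
    "++": "*", "+": "+",                        # zero-or-more / one-or-more
    "%": "^", "$": "$",                         # line anchors (use re.MULTILINE)
    "?": r"[^\n]", "*": r"[^\n]*",              # any character except newline
}

def ue_to_python(pattern: str) -> str:
    """Translate a subset of UltraEdit syntax into a Python re pattern."""
    out = []
    i = 0
    while i < len(pattern):
        two = pattern[i:i + 2]
        if two in TOKENS:
            out.append(TOKENS[two])
            i += 2
        elif pattern[i] == "^" and i + 1 < len(pattern):
            # A caret overrides (escapes) the following character.
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif pattern[i] in TOKENS:
            out.append(TOKENS[pattern[i]])
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)

def ue_replacement(repl: str) -> str:
    """Turn ^1..^9 into the backslash group references Python expects."""
    return re.sub(r"\^([1-9])", r"\\\1", repl)

result = re.sub(ue_to_python("^(h*o^) ^(f*s^)"),
                ue_replacement("^2 ^1"),
                "hello folks")
print(result)  # folks hello
```

The two-character lookup runs before the single-character one so that ++ and the caret sequences are recognized as whole tokens.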

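3. MS VC++ 6.0 Editor-Style Regular Expressions

When editing code in MS VC++ 6.0, we often use Find/Replace within the code. Checking the "Regular expression" option there makes regular expressions available in both find and replace. The syntax rules for regular expressions in this context are as follows: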

Regular Expression	Description
.	(Period.) Any single character.
[ ]	Any one of the characters contained in the brackets, or any of an ASCII range of characters separated by a hyphen (-). For example, b[aeiou]d matches bad, bed, bid, bod, and bud, and r[eo]+d matches red, rod, reed, and rood, but not reod or roed. x[0-9] matches x0, x1, x2, and so on. If the first character in the brackets is a caret (^), then the regular expression matches any characters except those in the brackets.
^	The beginning of a line.
$	The end of a line.
\( \)	Indicates a tagged expression to retain for replacement purposes. If the expression in the Find What text box is \(lpsz\)BigPointer, and the expression in the Replace With box is \1NewPointer, all selected occurrences of lpszBigPointer are replaced with lpszNewPointer. Each occurrence of a tagged expression is numbered according to its order in the Find What text box, and its replacement expression is \n, where 1 corresponds to the first tagged expression, 2 to the second, and so on. You can have up to nine tagged expressions.
\~	No match if the following character or characters occur. For example, b\~a+d matches bbd, bcd, bdd, and so on, but not bad. You can use this expression to prefix a group of characters you want to exclude, which is useful for excluding matches of particular words. For example, foo\~\(lish\) matches "foo" in "food" and "afoot" but not in "foolish."
\{c\!c\}	Any one of the characters separated by the alternation symbol (\!). For example, \{j\!u\}+fruit finds jfruit, jjfruit, ufruit, ujfruit, uufruit, and so on.
*	None or more of the preceding characters or expressions. For example, ba*c matches bc, bac, baac, baaac, and so on.
+	At least one or more of the preceding characters or expressions. For example, ba+c matches bac, baac, baaac, but not bc.
\{\}	Any sequence of characters between the escaped braces. For example, \{ju\}+fruit finds jufruit, jujufruit, jujujufruit, and so on. Note that it will not find jfruit, ufruit, or ujfruit, because the sequence ju is not in any of those strings.
[^]	Any character except those following the caret (^) character in the brackets, or any of an ASCII range of characters separated by a hyphen (-). For example, x[^0-9] matches xa, xb, xc, and so on, but not x0, x1, x2, and so on.
\:a	Any single alphanumeric character [a-zA-Z0-9].
\:b	Any white-space character. The \:b finds tabs and spaces. There is no alternate syntax to express \:b.
\:c	Any single alphabetic character [a-zA-Z].
\:d	Any decimal digit [0-9].
\:n	Any unsigned number \{[0-9]+\.[0-9]*\![0-9]*\.[0-9]+\![0-9]+\}. For example, \:n should match 123, .45, and 123.45.
\:z	Any unsigned decimal integer [0-9]+.
\:h	Any hexadecimal number [0-9a-fA-F]+.
\:i	Any C/C++ identifier [a-zA-Z_$][a-zA-Z0-9_$]+.
\:w	Any alphabetic string [a-zA-Z]+. The string need not be bounded by white space or appear at the beginning or the end of a line.
\:q	Any quoted string \{"[^"]*"\!'[^']*'\}.
\	Removes the pattern match characteristic in the Find What text box from the special characters listed above. For example, 100$ matches 100 at the end of a line, but 100\$ matches the character string 100$ anywhere on a line.
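
Several of the shorthand classes above can be checked by writing their expansions in Python syntax. The sketch below is an approximation under stated assumptions: the dictionary name VC6_CLASSES and its keys are made up for this illustration, the escaped braces and alternation symbol are rewritten as a non-capturing group with |, the unsigned-number and quoted-string expansions use * quantifiers so that the inputs named in the table are accepted, and the "no match if the following characters occur" operator is approximated with a negative lookahead.

```python
import re

# Hypothetical names for a few VC++ 6.0 shorthand classes, expanded into
# Python syntax (alternation written with |, grouping with (?: )).
VC6_CLASSES = {
    "alphanumeric": r"[a-zA-Z0-9]",
    "blank": r"[ \t]",
    "alphabetic": r"[a-zA-Z]",
    "digit": r"[0-9]",
    "unsigned_number": r"(?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0-9]+)",
    "unsigned_integer": r"[0-9]+",
    "hex_number": r"[0-9a-fA-F]+",
    "alphabetic_string": r"[a-zA-Z]+",
    "quoted_string": r"""(?:"[^"]*"|'[^']*')""",
}

# The unsigned-number class should accept the three inputs named in the table.
number = re.compile(VC6_CLASSES["unsigned_number"])
matches = [bool(number.fullmatch(s)) for s in ("123", ".45", "123.45")]
print(matches)  # [True, True, True]

# "foo not followed by lish", approximated with a negative lookahead.
not_foolish = re.compile(r"foo(?!lish)")
lookahead_hits = [bool(not_foolish.search(w)) for w in ("food", "afoot", "foolish")]
print(lookahead_hits)  # [True, True, False]
```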