Boost xpressive usage, part 2

June 15, 2014 / General

Picking up where the previous post left off, we continue with how to pull sub-strings out of the match results.

Example 5: extracting marked sub-expressions from each match

#define _SCL_SECURE_NO_WARNINGS // silence MSVC (Visual Studio) compiler warnings

#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
	std::string str( "Eric: 4:40, Karl: 3:35, Francesca: 2:32" );

	// find a race time
	sregex time = sregex::compile( "(\\d):(\\d\\d)" );

	// for each match, the token iterator should first take the value of
	// the first marked sub-expression followed by the value of the second
	// marked sub-expression
	int const subs[] = { 1, 2 };

	sregex_token_iterator cur( str.begin(), str.end(), time, subs );
	sregex_token_iterator end;

	for( ; cur != end; ++cur )
	{
		std::cout << *cur << '\n';
	}
	/*
	result:
	4
	40
	3
	35
	2
	32
	*/

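	// Alternative, as in Example 4: iterate over whole matches with
	// sregex_iterator and print sub-matches 1 and 2 of each match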
	sregex_iterator curI( str.begin(), str.end(), time );
	sregex_iterator endI;

	for( ; curI != endI; ++curI )
	{
		std::cout << (*curI)[1] << ":" <<  (*curI)[2] << '\n';
	}

	return 0;
}
/*
result (second loop):
4:40
3:35
2:32
*/

Example 6: special uses of sregex_token_iterator

#define _SCL_SECURE_NO_WARNINGS // silence MSVC (Visual Studio) compiler warnings

#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
	std::string str( "Now <bold>is the time <i>for all good men</i> to come to the aid of their</bold> country." );

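	// find an HTML tag; the group (\w*) captures the tag name as sub-match 1
	// (static-syntax alternative, which has no capture group:
	//  sregex html = '<' >> optional('/') >> +_w >> '>';)
	sregex html = sregex::compile("</?(\\w*)>");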

	// -1 is a special index: it yields the text between matches,
	// i.e. every part of the string the regex does not match
	sregex_token_iterator cur( str.begin(), str.end(), html, -1 );
	sregex_token_iterator end;

	for( ; cur != end; ++cur )
	{
		std::cout << '{' << *cur << '}';
	}
	std::cout << '\n';
	// result:{Now }{is the time }{for all good men}{ to come to the aid of their}{ country.}

	// 0 yields the whole match (sub-match 0), i.e. every tag found
	sregex_token_iterator curI( str.begin(), str.end(), html, 0);

	for( ; curI != end; ++curI )
	{
		std::cout << '{' << *curI << '}';
	}
	std::cout << '\n';
	// result: {<bold>}{<i>}{</i>}{</bold>}

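	// An array holding {1}: 1 is not a special flag, just the index of the first
	// marked sub-expression, so each match yields the tag name captured by (\w*)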
	const int sub[] = {1};
	sregex_token_iterator cur2( str.begin(), str.end(), html, sub);

	for( ; cur2 != end; ++cur2 )
	{
		std::cout << '{' << *cur2 << '}';
	}
	std::cout << '\n';
	// result:{bold}{i}{i}{bold}

	return 0;
}

Basic expression reference: Perl syntax and the corresponding static xpressive syntax

Perl             Static xpressive                            Meaning
----             ----------------                            -------
.                _                                           any character (assuming Perl's /s modifier).
ab               a >> b                                      sequencing of a and b sub-expressions.
a|b              a | b                                       alternation of a and b sub-expressions.
(a)              (s1= a)                                     group and capture a back-reference.
(?:a)            (a)                                         group without capturing a back-reference.
\1               s1                                          a previously captured back-reference.
a*               *a                                          zero or more times, greedy.
a+               +a                                          one or more times, greedy.
a?               !a                                          zero or one time, greedy.
a{n,m}           repeat<n,m>(a)                              between n and m times, greedy.
a*?              -*a                                         zero or more times, non-greedy.
a+?              -+a                                         one or more times, non-greedy.
a??              -!a                                         zero or one time, non-greedy.
a{n,m}?          -repeat<n,m>(a)                             between n and m times, non-greedy.
^                bos                                         beginning-of-sequence assertion.
$                eos                                         end-of-sequence assertion.
\b               _b                                          word boundary assertion.
\B               ~_b                                         not-a-word-boundary assertion.
\n               _n                                          literal newline.
.                ~_n                                         any character except a literal newline (without Perl's /s modifier).
\r?\n|\r         _ln                                         logical newline.
[^\r\n]          ~_ln                                        any single character that is not a logical newline.
\w               _w                                          a word character, equivalent to set[alnum | '_'].
\W               ~_w                                         not a word character, equivalent to ~set[alnum | '_'].
\d               _d                                          a digit character.
\D               ~_d                                         not a digit character.
\s               _s                                          a space character.
\S               ~_s                                         not a space character.
[:alnum:]        alnum                                       an alphanumeric character.
[:alpha:]        alpha                                       an alphabetic character.
[:blank:]        blank                                       a horizontal white-space character.
[:cntrl:]        cntrl                                       a control character.
[:digit:]        digit                                       a digit character.
[:graph:]        graph                                       a graphical (visible, non-space) character.
[:lower:]        lower                                       a lower-case character.
[:print:]        print                                       a printable character.
[:punct:]        punct                                       a punctuation character.
[:space:]        space                                       a white-space character.
[:upper:]        upper                                       an upper-case character.
[:xdigit:]       xdigit                                      a hexadecimal digit character.
[0-9]            range('0','9')                              characters in the range '0' through '9'.
[abc]            as_xpr('a') | 'b' | 'c'                     characters 'a', 'b', or 'c'.
[abc]            (set= 'a','b','c')                          same as above.
[0-9abc]         set[ range('0','9') | 'a' | 'b' | 'c' ]     characters 'a', 'b', 'c', or in the range '0' through '9'.
[0-9abc]         set[ range('0','9') | (set= 'a','b','c') ]  same as above.
[^abc]           ~(set= 'a','b','c')                         not characters 'a', 'b', or 'c'.
(?i:stuff)       icase(stuff)                                match stuff disregarding case.
(?>stuff)        keep(stuff)                                 independent sub-expression: match stuff and turn off backtracking.
(?=stuff)        before(stuff)                               positive look-ahead assertion: match if before stuff, but don't include stuff in the match.
(?!stuff)        ~before(stuff)                              negative look-ahead assertion: match if not before stuff.
(?<=stuff)       after(stuff)                                positive look-behind assertion: match if after stuff, but don't include stuff in the match (stuff must be constant-width).
(?<!stuff)       ~after(stuff)                               negative look-behind assertion: match if not after stuff (stuff must be constant-width).
(?P<name>stuff)  mark_tag name(n); ... (name= stuff)         create a named capture.
(?P=name)        mark_tag name(n); ... name                  refer back to a previously created named capture.
