I have a string in a file, to be read by Perl, that can be either:
previous content ending with a linebreak
keyword: content
next content
or
previous content, also ending with a line end
keyword: { content that contains {
nested parenthesis } and may span
multiple lines, closed by matching parenthesis}
next content
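In either case, I have successfully loaded everything from the start of the previous content to the end of the next content into a string, $str.
Now I want to extract what lies between the newline that ends the previous content and the newline that precedes the next content.
So I applied a regex to $str: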
if($str =~
/.*\nkeyword: # keyword: is always constant, immediately after a newline
(?!\{+) # NO { follows
\s+(?!\{+) # NO { with a heading whitespace
\s* # white space between keyword: and content
(?!\{+) # no { immediately before content
# question : should the last one be a negative lookbehind AFTER the check for content itself?
([^\s]+) # the content, should be in $1;
(?!\{+) # no trailing { immediately after content
\s+ # delimited by a whitespace, ignore what comes afterwards
| # or
/.*\nkeyword: # keyword: is always constant, immediately after a newline
(?=\s*{*\s*)*) # any mix of whitespace and {
(?=\{+) # at least one {
(?=\s*{*\s*)*) # again any mix of whitespace and {
([^\{\}]+) # no { or }
(?=\s*}*\s*)*) # any mix of whitespace and }
(?=\}+) # at least one }
(?=\s*}*\s*)*) # again any mix of whitespace and }
) { #do something with $1}
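I realize this does not really handle multiline content with nested braces. However, it should at least capture objects of the form keyword: {{ content} }
But while I can capture the content in $1 for the
keyword: content
form, I cannot capture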
keyword: {multiline with nested
{parenthesis} }
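In the end I implemented this with a simple counter-based parser instead of a regex. I would still like to know how to do it with a regex so that objects of the second form are captured, and I would appreciate an explanation of the regex.
Also, where did my formulation go wrong, such that it does not even capture single-line content with multiple (but matching) leading and trailing braces?
You can use this: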
#!/usr/bin/perl
use strict;
use warnings;
my $str = "previous content ending with a linebreak
keyword: content
next content
previous content, also ending with a line end
keyword: { content that contains {
nested parenthesis } and may span
multiple lines, closed by matching parenthesis}
next content";
while ($str =~ /\nkeyword:
(?| # branch reset: i.e. the two capture groups have the same number
\s*
(\{ (?> [^{}]++ | (?1) )*+ \}) # recursive pattern (literal braces escaped)
| # OR
\h*
(.*+) # capture all until the end of line
) # close the branch reset group
/xg ) {
print "$1\n";
}
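This pattern first tries to match content enclosed in (possibly nested) curly brackets. If there are none, or if they are not balanced, it falls back to the second alternative, which matches only the rest of the line (because the dot cannot match a newline).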
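To see both branches at work, here is a minimal sketch that wraps the same pattern in a helper. The sub name `keyword_values` and the sample strings are made up for illustration, and the literal braces are escaped because newer Perl versions deprecate some unescaped uses of `{`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): collects every
# value that follows "keyword:" at the start of a line in the text.
sub keyword_values {
    my ($text) = @_;
    my @values;
    while ($text =~ /\nkeyword:
            (?|
                \s* ( \{ (?> [^{}]++ | (?1) )*+ \} )   # balanced, possibly nested braces
              |
                \h* (.*+)                              # otherwise: rest of the line
            )
        /xg) {
        push @values, $1;
    }
    return @values;
}

# Balanced braces: the first alternative captures the whole block.
print "$_\n" for keyword_values("a\nkeyword: { x { y } z }\nb");   # { x { y } z }

# Unbalanced braces: the first alternative fails, the second takes the line.
print "$_\n" for keyword_values("a\nkeyword: { never closed\nb");  # { never closed
```

Because the inner group is possessive, a missing closing bracket makes the first alternative fail quickly instead of backtracking through the text.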
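The branch reset feature (?|..|..) gives the capture groups in each alternative the same number, so the captured text is in $1 whichever alternative matched.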
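As a small illustration of the numbering, the sketch below compares a branch reset group with an ordinary non-capturing group. The sub names and the `id=` / `name:` inputs are hypothetical, chosen only to keep the example short.

```perl
use strict;
use warnings;

# With branch reset, both alternatives store their capture in group 1.
sub field_value {
    my ($input) = @_;
    return $input =~ /(?| id=(\d+) | name:(\w+) )/x ? $1 : undef;
}

# With a plain (?:...) group, the second alternative is group 2,
# so group 1 stays undefined when the "name:" branch matches.
sub plain_groups {
    my ($input) = @_;
    my @groups = $input =~ /(?: id=(\d+) | name:(\w+) )/x;
    return @groups;
}

print field_value("id=42"), "\n";        # 42
print field_value("name:alice"), "\n";   # alice
```

With `plain_groups("name:alice")` the first element of the returned list is undefined and the second is `alice`, which is the bookkeeping the branch reset avoids.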
Recursive pattern details:
( # open the capturing group 1
\{ # literal opening curly bracket (escaped)
(?> # atomic group: possible content between brackets
[^{}]++ # one or more characters that are not curly brackets
| # OR
(?1) # recurse to the capturing group 1 (!here is the recursion!)
)*+ # repeat the atomic group zero or more times, possessively
\} # literal closing curly bracket (escaped)
) # close the capturing group 1
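As for why the pattern in the question does not even match the single-line `keyword: {{ content} }` form: as far as I can tell, the main issue is that lookaheads such as `(?=\{+)` are zero-width. They check what follows but consume nothing, so the `[^\{\}]+` right after them has to start matching at the `{` itself and fails. The second `/` in the middle of the pattern would also end the regex early. The sketch below isolates the lookahead problem; the sub names are made up, and the second sub only handles this simple single-line form, not real nesting.

```perl
use strict;
use warnings;

# Lookahead only: after (?=\{+) succeeds, the position is still in front
# of the "{", so [^{}]+ has nothing it is allowed to match.
sub with_lookahead {
    my ($input) = @_;
    return $input =~ /keyword:\s*(?=\{+)([^{}]+)/ ? $1 : undef;
}

# Consuming the braces instead lets the capture start after them.
sub with_consumption {
    my ($input) = @_;
    return $input =~ /keyword:\s*\{+\s*([^{}]+?)\s*\}+/ ? $1 : undef;
}

my $line = "keyword: {{ content} }";
print defined with_lookahead($line) ? "matched\n" : "no match\n";   # no match
print with_consumption($line), "\n";                                # content
```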