Non-greedy DOTALL regex in Python

user3853423

I need to parse the annotations of methods written in PHP. I wrote a regex (simplified example below) to find them, but it doesn't work as expected. Instead of matching the shortest span of text between /** and */, it matches as much source code as possible, including previous methods and their annotations. I'm sure I'm using the non-greedy .*? form of *, and I have found no evidence that DOTALL turns it off. Where could the problem be? Thank you.

import re

# text holds the PHP source code to search
p = re.compile(r'(?:/\*\*.*?\*/)\n\s*public', re.DOTALL)
methods = p.findall(text)
user2357112 supports Monica

Regex engines scan from left to right. A lazy quantifier matches as little as it can from the current match position, but it can't push the start of the match forward, even if that would reduce the amount of text matched. That means that rather than starting at the last /** before the public, the match runs from the first /** to the first */ that is followed by a public.
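To see this behaviour in isolation, here is a minimal sketch. The PHP snippet in `text` and the names `lazy` and `match` are made up for illustration; only the pattern comes from the question.

```python
import re

# Hypothetical PHP source: the first docblock belongs to a private method,
# the second to a public one.
text = (
    "/** first */\n"
    "private function a() {}\n"
    "/** second */\n"
    "public function b() {}\n"
)

# Same pattern as in the question, without the redundant outer group.
lazy = re.compile(r'/\*\*.*?\*/\n\s*public', re.DOTALL)
match = lazy.search(text)

# The engine commits to the first /** it finds; the lazy .*? then grows
# until the rest of the pattern can match, swallowing the first docblock.
print(match.start())               # offset where the match begins
print(match.group().count("/**"))  # number of docblock openers in the match
```

The match starts at offset 0 and contains both docblock openers, even though only the second docblock precedes a public method.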

If you want to exclude */ from inside the comment, you'll need to group the . with a negative lookahead assertion and apply the quantifier to that group instead of to the bare dot:

(?:(?!\*/).)

The (?!\*/) asserts that the character we're matching is not the start of a */ sequence.
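Put together, the full pattern might look like the sketch below. It assumes the group simply replaces the . in the original expression; the sample `text` and the names `tempered` and `methods` are hypothetical.

```python
import re

# Hypothetical PHP source: only the second docblock precedes a public method.
text = (
    "/** first */\n"
    "private function a() {}\n"
    "/** second */\n"
    "public function b() {}\n"
)

# (?:(?!\*/).)* consumes any character, newlines included under DOTALL,
# as long as that character does not begin a */ sequence. The comment body
# therefore cannot run past its own closing */.
tempered = re.compile(r'/\*\*(?:(?!\*/).)*\*/\n\s*public', re.DOTALL)
methods = tempered.findall(text)

print(methods)
```

With this input, an attempt starting at the first /** fails because its closing */ is followed by private, so the engine moves on and matches only the second docblock. Once the lookahead guards each character, it no longer matters whether the quantifier on the group is greedy or lazy.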
