I am trying to write a Python function that counts the occurrences of a specific word in a string.
My regex pattern doesn't work when the word I want to count is repeated several times in a row. Otherwise the pattern seems to work well.
Here is my function:
import re

def word_count(word, text):
    return len(re.findall('(^|\s|\b)'+re.escape(word)+'(\,|\s|\b|\.|$)', text, re.IGNORECASE))
When I test it with an arbitrary string, it works:
>>> word_count('Linux', "Linux, Word, Linux")
2
When the word I want to count is adjacent to itself, it does not:
>>> word_count('Linux', "Linux Linux")
1
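The problem is in your regex. It uses two capture groups, and re.findall returns the contents of the capture groups (as tuples) instead of the matched text whenever groups are present. Use non-capturing groups, (?:...), if you want the matches themselves. That alone does not change the count, though. The count is off for two reasons: the pattern is not a raw string, so '\b' is a backspace character rather than a word boundary, and the trailing group consumes the space after the first "Linux", which leaves nothing for the leading group to match before the second one.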
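To make the point about capture groups concrete, here is a small sketch. The patterns are simplified versions of the one in the question (the word is hard-coded and the alternations are shortened), so treat them as illustrations only:

```python
import re

text = "Linux, Word, Linux"

# With capture groups, findall returns one tuple of group contents per match.
captured = re.findall(r'(^|\s)Linux(,|\s|$)', text)
print(captured)  # [('', ','), (' ', '')]

# With non-capturing groups, findall returns the full matched text.
plain = re.findall(r'(?:^|\s)Linux(?:,|\s|$)', text)
print(plain)     # ['Linux,', ' Linux']

# Either way there is one list element per match, so len() is the same.
print(len(captured), len(plain))  # 2 2
```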
Besides, there is no reason to use (^|\s|\b): a word boundary, \b, is sufficient because it covers all of those cases, and \b is zero-width, so it does not consume any characters.
In the same way, (\,|\s|\b|\.|$) can be changed to \b.
So you can just use:
def word_count(word, text):
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text, re.I))
This will give:
>>> word_count('Linux', "Linux, Word, Linux")
2
>>> word_count('Linux', "Linux Linux")
2
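If it helps, the following sketch shows why the r prefix and the zero-width boundary matter. The patterns hard-code the word and are simplified from the ones above, so they are illustrative assumptions rather than the exact original expression:

```python
import re

# In a normal string literal '\b' is the backspace character (\x08);
# in a raw string r'\b' is a backslash followed by 'b', which the regex
# engine interprets as a word boundary.
print('\b' == '\x08')  # True
print(len(r'\b'))      # 2

text = "Linux Linux"

# Consuming separators: the first match takes the space, so the second
# "Linux" has no leading whitespace left to match.
consuming = re.findall(r'(?:^|\s)Linux(?:\s|$)', text)
print(consuming)  # ['Linux ']

# Zero-width boundaries consume nothing, so both occurrences match.
boundary = re.findall(r'\bLinux\b', text)
print(boundary)   # ['Linux', 'Linux']
```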