Regex pattern counting with repetitive words

Posted by debugcn in Dev

asl

I'm trying to write a Python function that counts the occurrences of a specific word in a string.

My regex pattern doesn't work when the word I want to count is repeated multiple times in a row. The pattern seems to work well otherwise.

Here is my function:

import re

def word_count(word, text):
    return len(re.findall('(^|\s|\b)'+re.escape(word)+'(\,|\s|\b|\.|$)', text, re.IGNORECASE))

When I test it with a sample string, the count is correct:

>>> word_count('Linux', "Linux, Word, Linux")
2

But when the word I want to count is adjacent to itself, the count is too low:

>>> word_count('Linux', "Linux Linux")
1

anubhava

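The problem is in your regex. First, the pattern is built from ordinary (non-raw) string literals, so Python turns '\b' into a backspace character (\x08) before the regex engine ever sees it, and it never acts as a word boundary. That leaves \s as the only alternative that can match between two adjacent words: the first match consumes the space after "Linux", so the second "Linux" has no leading whitespace left to match and is skipped. Second, your regex uses 2 capture groups, and re.findall returns the capture groups rather than the whole match when any are present. That does not change the count, but groups used only for alternation should be non-capturing groups, written (?:...).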

Besides, there is no reason to use (^|\s|\b): \b, the word boundary, suffices on its own. It covers all of those cases, and because it is zero-width it consumes no characters.

In the same way, (\,|\s|\b|\.|$) can be replaced with \b.
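A minimal sketch of what goes wrong in the question's pattern, assuming it was written exactly as shown (ordinary string literals, not raw strings). The sample text and the variable names are illustrative only. It checks what '\b' means in each kind of literal, then compares what re.findall returns with capture groups and with non-capturing groups:

```python
import re

text = "Linux Linux"

# In an ordinary string literal, '\b' is the backspace character (ASCII 8),
# so the regex engine never receives a word-boundary token.
plain_is_backspace = ('\b' == '\x08')
# In a raw string, the backslash and the 'b' both survive.
raw_is_two_chars = (len(r'\b') == 2)

# Capture groups: findall returns a tuple of group contents per match.
with_groups = re.findall(r'(^|\s)Linux(\s|$)', text)
# Non-capturing groups: findall returns the full matched text instead.
without_groups = re.findall(r'(?:^|\s)Linux(?:\s|$)', text)

print(plain_is_backspace, raw_is_two_chars)  # True True
print(with_groups)                           # [('', ' ')]
print(without_groups)                        # ['Linux ']
```

Both findall calls report a single match: the trailing \s consumes the space, so the second "Linux" has nothing left to satisfy the leading (^|\s). Switching to non-capturing groups changes what is returned, not how many matches are found.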

So you can just use:

import re

def word_count(word, text):
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text, re.I))

This will give:

>>> word_count('Linux', "Linux, Word, Linux")
2
>>> word_count('Linux', "Linux Linux")
2
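One caveat worth keeping in mind: \b only marks a transition between a word character and a non-word character, so it may not behave as expected when the search term itself begins or ends with punctuation. The sketch below is not part of the answer above. It shows a hypothetical variant, word_count_lookaround, that uses the lookarounds (?<!\w) and (?!\w) instead, with "C++" as an assumed example term:

```python
import re

def word_count(word, text):
    # The \b-based version from the answer.
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text, re.I))

def word_count_lookaround(word, text):
    # Hypothetical variant: require that no word character touches the match
    # on either side, without consuming any surrounding characters.
    pattern = r'(?<!\w)' + re.escape(word) + r'(?!\w)'
    return len(re.findall(pattern, text, re.I))

print(word_count('C++', "C++ and C++"))               # 0
print(word_count_lookaround('C++', "C++ and C++"))    # 2
print(word_count_lookaround('Linux', "Linux Linux"))  # 2
```

For plain alphanumeric words such as "Linux" the two versions agree, so the \b form remains the simpler choice for the case in the question.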

Edited 2021-06-12
