Regex: How to ignore dots in connected words

New Ubuntu User

For analyzing a log file, I need to extract exception types with Python and regex.
The exception types always contain the substring "Exception".
The problem is that the substring "Exception" is not always at the end of their names.
Moreover, the exception types contain an unknown number of dots.

Expected behaviour:

Input
"08-01-2021: There is a System.InvalidCalculationException - System reboots"
"09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next"
"10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!"

Output
"System.InvalidCalculationException"
"System.IO.WritingException"
"InternalException.NullReference.NonCritical.User"

What should the regex look like?
I have tried "\w+[.]\w+[.]*Exception" for the exception types that end with "Exception".
But what if an exception type contains even more dots and "Exception" is not at the end?
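To see where the attempted pattern falls short, here is a small sketch that runs it with `re.findall` over the three sample lines from the question. The variable names are illustrative only.

```python
import re

# The three sample log lines from the question.
samples = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

# The attempted pattern: a word, a dot, a word, optional dots, then "Exception".
attempt = re.compile(r"\w+[.]\w+[.]*Exception")

for line in samples:
    print(attempt.findall(line))
# ['System.InvalidCalculationException']
# ['IO.WritingException']
# []
```

The second result drops `System.` because the pattern fixes the number of dot-separated segments instead of repeating them, and the third line yields nothing because nothing is allowed after `Exception`.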

Wiktor Stribiżew

You can use

\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b
\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b

See the regex demo / regex demo #2. Details of the first pattern:

  • \b - a word boundary
  • (?:[A-Za-z]+\.)* - zero or more occurrences of one or more letters followed by a dot
  • [A-Za-z]* - zero or more letters
  • Exception - the literal string Exception
  • (?:\.[A-Za-z]+)* - zero or more repetitions of a dot followed by one or more letters.
  • \b - a word boundary.
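The bullet list above maps one-to-one onto a `re.VERBOSE` version of the first pattern. The sketch below writes it that way so each piece carries its own comment; the name `EXCEPTION_TYPE` is just an illustrative choice.

```python
import re

# First pattern from the answer, written in verbose mode so each part can be commented.
# In re.VERBOSE mode, unescaped whitespace in the pattern is ignored and "#" starts a comment.
EXCEPTION_TYPE = re.compile(
    r"""
    \b                  # a word boundary
    (?:[A-Za-z]+\.)*    # zero or more runs of letters, each followed by a dot
    [A-Za-z]*           # zero or more letters
    Exception           # the literal string Exception
    (?:\.[A-Za-z]+)*    # zero or more repetitions of a dot followed by letters
    \b                  # a word boundary
    """,
    re.VERBOSE,
)

print(EXCEPTION_TYPE.findall("There is a System.InvalidCalculationException - System reboots"))
# ['System.InvalidCalculationException']
```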

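In the second pattern, \w matches any letter, digit or underscore (for Python 3 str patterns, any Unicode word character), so it also accepts type names that contain digits or underscores.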
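To make the difference between the two patterns concrete, the sketch below runs both on a made-up type name that contains a digit (`Http2Exception` is an invented example, not taken from the question).

```python
import re

letters_only = re.compile(r"\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b")
word_chars = re.compile(r"\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b")

text = "Client raised Http2Exception twice"

# [A-Za-z]* cannot consume the "2", and there is no word boundary between "2" and "E",
# so the letters-only pattern finds nothing here.
print(letters_only.findall(text))  # []
# \w* also matches digits, so the whole name is returned.
print(word_chars.findall(text))  # ['Http2Exception']
```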

Python usage:

import re
re.findall(r'\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b', text)
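A slightly fuller sketch of the usage above, assuming the log is plain text processed line by line. The helper name `extract_exception_types` is hypothetical; it collects every match of the second pattern with `re.findall` and the results are tallied with `collections.Counter`.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Second pattern from the answer: \w also accepts digits and underscores.
EXCEPTION_TYPE = re.compile(r"\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b")


def extract_exception_types(lines: Iterable[str]) -> Iterator[str]:
    """Yield every exception type found in the given log lines (hypothetical helper)."""
    for line in lines:
        yield from EXCEPTION_TYPE.findall(line)


samples = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

found = list(extract_exception_types(samples))
print(found)
# ['System.InvalidCalculationException', 'System.IO.WritingException',
#  'InternalException.NullReference.NonCritical.User']

# Count how often each exception type occurs.
print(Counter(found).most_common())
```

For a real file, the open file object can be passed straight in, e.g. `with open("app.log", encoding="utf-8") as f: counts = Counter(extract_exception_types(f))`, since iterating a text file yields its lines (the file name `app.log` is a placeholder).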
