To analyze a log file, I need to extract exception types with Python and regex.
The exception types always contain the substring "Exception".
The problem is that the substring "Exception" is not always at the end of the name.
Moreover, the exception types can contain any number of dots.
Expected behaviour:
Input
"08-01-2021: There is a System.InvalidCalculationException - System reboots"
"09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next"
"10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!"
Output
"System.InvalidCalculationException"
"System.IO.WritingException"
"InternalException.NullReference.NonCritical.User"
What does the regex need to look like?
I have tried "\w+[.]\w+[.]*Exception" for the exception types that end with "Exception".
But what if the exception types contain even more dots and "Exception" is not at the end?
You can use
\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b
\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b
See the regex demo / regex demo #2. Details:
\b - a word boundary
(?:[A-Za-z]+\.)* - zero or more occurrences of one or more letters followed by a dot
[A-Za-z]* - zero or more letters
Exception - the literal string Exception
(?:\.[A-Za-z]+)* - zero or more repetitions of a dot followed by one or more letters
\b - a word boundary.
In the second pattern, \w matches any letter, digit, or underscore.
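The practical difference between the two patterns shows up when a name segment contains a digit or an underscore. Here is a minimal sketch, assuming a made-up log line; the name System.IO2.WritingException is hypothetical and not taken from the question:

```python
import re

# The letters-only pattern and the \w pattern from the answer above.
LETTERS_ONLY = re.compile(r'\b(?:[A-Za-z]+\.)*[A-Za-z]*Exception(?:\.[A-Za-z]+)*\b')
WORD_CHARS = re.compile(r'\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b')

# Hypothetical log line: the segment "IO2" contains a digit.
sample = "11-01-2021: Got a System.IO2.WritingException while saving"

letters_result = LETTERS_ONLY.findall(sample)
word_result = WORD_CHARS.findall(sample)

print(letters_result)  # ['WritingException']
print(word_result)     # ['System.IO2.WritingException']
```

The letters-only pattern cannot cross the digit in "IO2", so it only picks up the last segment. If your type names may contain digits or underscores, the \w version is probably the safer choice.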
Python usage:
re.findall(r'\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b', text)
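To try this end to end, here is a small sketch that runs the pattern over the three sample lines from the question. The helper name extract_exception_types is an assumption made for illustration:

```python
import re

# Precompile the \w variant of the pattern from the answer.
EXCEPTION_RE = re.compile(r'\b(?:\w+\.)*\w*Exception(?:\.\w+)*\b')

def extract_exception_types(text):
    """Return every dotted name in text that contains 'Exception'."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return EXCEPTION_RE.findall(text)

# The three sample lines from the question.
log_lines = [
    "08-01-2021: There is a System.InvalidCalculationException - System reboots",
    "09-01-2021: SuperSystem recognised a System.IO.WritingException ask user what to do next",
    "10-01-2021: Oh no, not again an InternalException.NullReference.NonCritical.User we should fix it!",
]

found = [name for line in log_lines for name in extract_exception_types(line)]
for name in found:
    print(name)
```

This prints the three names listed under "Output" in the question. Note that all groups in the pattern are non-capturing; with capturing groups, re.findall would return the group contents instead of the full match.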