Python Regex Excluding Multiple Newlines

andoni

I have an issue with parsing text. I'm trying to parse music files, which are semi-formatted, and I want, for example, to exclude the choruses from the lyrics. Most of the time, the formatting looks like this:

[Chorus: x2]
Some Lyrics
Some More Lyrics

[Verse]
Lyrics
Lyrics

In that case, either of these two substitutions removes the chorus correctly:

import re
subChorus = re.sub(r'\[Chorus.*?\].*?\[', '[', lyrics, flags=re.DOTALL)
subChorus2 = re.sub(r'\[Chorus.*?\].*?(\n{2,})', '', lyrics, flags=re.DOTALL)

However, occasionally the Chorus is the last section of the file:

Lyrics

[Chorus]
Some Lyrics
Other Lyrics

In such a case, I cannot figure out the correct expression to remove the chorus. If I just do

subChorusEnd = re.sub(r'\[Chorus.*?\].*?$', '', lyrics, flags=re.DOTALL)

it works for that file; however, in other files where the final chorus section is not at the end, it also removes verses that need to be preserved. Every Chorus block that is followed by a verse is separated from it by at least two newlines. So I came up with this:

subChorusEnd = re.sub(r'\[Chorus.*?\][^(\n{2,})]*?$', '', subChorus4, flags=re.DOTALL)

But it does not work. Can someone explain the proper regular expression to make the above statement work, or suggest a better approach that removes ONLY a chorus block at the end of the text and PRESERVES the verses in files where the final chorus is not at the end?

Avinash Raj

You could try the regex below, which matches every Chorus block, including one at the end of the text:

\[Chorus.*?\].*?(\n{2,}|$)


OR

(?!.*\n\n)\[Chorus.*?\].*?$

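This one matches only a chorus block at the end of the text: the negative lookahead (?!.*\n\n) rejects any [Chorus that still has two consecutive newlines somewhere after it. Don't forget to enable the DOTALL flag (re.DOTALL in Python) for both regexes.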
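As a rough illustration, here is how the two patterns might be applied in Python. The function names and the sample strings are made up for this sketch; the samples follow the layouts shown in the question and assume the text does not end with trailing blank lines.

```python
import re

# Matches every chorus block: up to the next run of 2+ newlines, or to the end.
CHORUS_ANYWHERE = re.compile(r'\[Chorus.*?\].*?(\n{2,}|$)', re.DOTALL)

# Matches a chorus block only when no "\n\n" occurs anywhere after its start.
CHORUS_AT_END = re.compile(r'(?!.*\n\n)\[Chorus.*?\].*?$', re.DOTALL)


def remove_all_choruses(lyrics):
    """Hypothetical helper: strip every chorus block from the text."""
    return CHORUS_ANYWHERE.sub('', lyrics)


def remove_trailing_chorus(lyrics):
    """Hypothetical helper: strip a chorus only if it is the last block."""
    return CHORUS_AT_END.sub('', lyrics)


# Sample inputs modelled on the question (no trailing newline at the end).
chorus_first = "[Chorus: x2]\nSome Lyrics\nSome More Lyrics\n\n[Verse]\nLyrics\nLyrics"
chorus_last = "Lyrics\n\n[Chorus]\nSome Lyrics\nOther Lyrics"

print(repr(remove_all_choruses(chorus_first)))
print(repr(remove_all_choruses(chorus_last)))
print(repr(remove_trailing_chorus(chorus_first)))
print(repr(remove_trailing_chorus(chorus_last)))
```

With these samples, the second pattern leaves chorus_first untouched, because its only chorus is followed by a blank line, and strips the chorus from chorus_last. As for the attempt in the question: [^(\n{2,})] is a character class, so it excludes the individual characters (, newline, {, 2, comma, } and ) rather than the sequence "two or more newlines", which means it can never extend across even a single line break.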

