Python Regex Excluding Multiple Newlines

andoni Published at Dev


I have an issue parsing text. I'm trying to parse music files, which are semi-formatted. For example, I'm trying to exclude the choruses from the lyrics. Most of the time, the formatting looks like this:

[Chorus: x2]
Some Lyrics
Some More Lyrics

[Verse]
Lyrics
Lyrics

In that case, either of these two substitutions removes the chorus correctly:

subChorus = re.sub(r'\[Chorus.*?\].*?\[', '[', lyrics, flags=re.DOTALL)
subChorus2 = re.sub(r'\[Chorus.*?\].*?(\n{2,})', '', lyrics, flags=re.DOTALL)
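A minimal runnable sketch of the two substitutions above, applied to made-up sample lyrics in the usual format (the sample string and the snake_case variable names are illustrative assumptions). The first pattern consumes the chorus up to the next `[` and puts the `[` back; the second consumes it up to and including the blank-line separator.

```python
import re

# Made-up sample in the usual format: chorus first, verse after a blank line.
lyrics = "[Chorus: x2]\nSome Lyrics\nSome More Lyrics\n\n[Verse]\nLyrics\nLyrics\n"

# Variant 1: eat everything up to the next "[" and restore that "[".
sub_chorus = re.sub(r'\[Chorus.*?\].*?\[', '[', lyrics, flags=re.DOTALL)

# Variant 2: eat everything up to and including the run of 2+ newlines.
sub_chorus2 = re.sub(r'\[Chorus.*?\].*?(\n{2,})', '', lyrics, flags=re.DOTALL)

print(repr(sub_chorus))   # '[Verse]\nLyrics\nLyrics\n'
print(repr(sub_chorus2))  # '[Verse]\nLyrics\nLyrics\n'
```

Both patterns need something after the chorus (a `[` or a blank line), which is exactly why neither removes a chorus that ends the file.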

However, occasionally the Chorus is the last section of the file:

Lyrics

[Chorus]
Some Lyrics
Other Lyrics

In such a case, I cannot figure out the correct expression to remove the chorus. If I just do

subChorusEnd = re.sub(r'\[Chorus.*?\].*?$', '', lyrics, flags=re.DOTALL)
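To see why this over-matches, here is a small sketch on made-up lyrics in which a chorus is followed by a verse. Even though `.*?` is lazy, the match cannot end until `$` succeeds, and without `re.MULTILINE` the `$` anchor only matches at the end of the string (or just before a final newline), so the verse after the chorus is swallowed too.

```python
import re

# Made-up sample: the chorus is NOT the last section.
lyrics = "[Verse]\nA\n\n[Chorus]\nB\n\n[Verse]\nC"

# Lazy .*? still has to stretch all the way to the end-of-string anchor.
result = re.sub(r'\[Chorus.*?\].*?$', '', lyrics, flags=re.DOTALL)

print(repr(result))  # '[Verse]\nA\n\n'  (the final verse "C" is gone)
```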

That works; however, in other files where the final chorus is not the last section, it also removes verses that need to be preserved. Every chorus block that has a verse after it is separated from that verse by at least two newlines, so I came up with this solution:

subChorusEnd = re.sub(r'\[Chorus.*?\][^(\n{2,})]*?$', '', subChorus4, flags=re.DOTALL)
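One likely reason the attempt fails: square brackets always define a character class, and inside `[^...]` the parentheses, braces and `{2,}` quantifier lose their special meaning. So `[^(\n{2,})]` just means "any single character except `(`, newline, `{`, `2`, `,`, `}` or `)`". A quick check, using a made-up sample where the chorus really is last:

```python
import re

# The class rejects individual characters, not "two or more newlines".
bad_class = re.compile(r'[^(\n{2,})]')
print(bad_class.fullmatch('a') is not None)   # True: a letter is allowed
print(bad_class.fullmatch('\n') is not None)  # False: ANY newline is rejected
print(bad_class.fullmatch('2') is not None)   # False: so is the digit 2

# Because no newline may follow "[Chorus]", a multi-line final chorus
# can never reach $, and nothing is removed.
lyrics = "Lyrics\n\n[Chorus]\nSome Lyrics\nOther Lyrics"
attempt = re.sub(r'\[Chorus.*?\][^(\n{2,})]*?$', '', lyrics, flags=re.DOTALL)
print(attempt == lyrics)  # True
```

"Not followed by two newlines" has to be expressed with a negative lookahead such as `(?!...)`, not with a negated character class.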

But it does not work. Can someone explain the proper regular expression to make the above statement work, or suggest a better approach that removes ONLY a chorus block at the end of the text while PRESERVING the verses in files where the final chorus is not at the end?

Avinash Raj

You could try the regex below to match all the chorus blocks, including one that ends the file:

\[Chorus.*?\].*?(\n{2,}|$)
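A minimal sketch of this first pattern with DOTALL enabled, run on a made-up sample that has one chorus in the middle and one at the end (the sample text is an assumption for illustration). The alternation lets each match stop either at a blank-line separator or at the end of the string.

```python
import re

# Made-up sample: a mid-file chorus and a final chorus.
lyrics = "[Chorus: x2]\nHook\n\n[Verse]\nV1\n\n[Chorus]\nHook"

# Stop at the first run of 2+ newlines, or at the end of the string.
all_choruses = re.compile(r'\[Chorus.*?\].*?(\n{2,}|$)', flags=re.DOTALL)

cleaned = all_choruses.sub('', lyrics)
print(repr(cleaned))  # '[Verse]\nV1\n\n'
```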


(?!.*\n\n)\[Chorus.*?\].*?$

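This second regex matches only the chorus block at the end: the negative lookahead (?!.*\n\n) rejects any starting position that still has a blank line somewhere after it. Don't forget to enable the DOTALL modifier in both regexes.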
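A small sketch checking the end-only pattern against both cases from the question, using made-up sample strings. With DOTALL, the `.*` inside the lookahead can scan across newlines, so a chorus that is followed by a blank line anywhere later in the text is left alone.

```python
import re

# Lookahead first: only start a match if no blank line follows anywhere.
final_chorus = re.compile(r'(?!.*\n\n)\[Chorus.*?\].*?$', flags=re.DOTALL)

ends_with_chorus = "Lyrics\n\n[Chorus]\nSome Lyrics\nOther Lyrics"
chorus_then_verse = "[Chorus]\nHook\n\n[Verse]\nLyrics\nLyrics"

print(repr(final_chorus.sub('', ends_with_chorus)))  # 'Lyrics\n\n'
print(final_chorus.sub('', chorus_then_verse) == chorus_then_verse)  # True
```

If a file ends with a blank line, the lookahead fails for the final chorus as well, so stripping trailing whitespace first (for example with `lyrics.rstrip()`) may be needed. If a file ends with a single newline, `$` matches just before it, so that newline survives the substitution.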



Edited at 2021-02-14
