I have a problem parsing text. I'm trying to parse music files that are only semi-formatted. For example, I'm trying to exclude the choruses from the lyrics. Most of the time, the formatting looks like this:
[Chorus: x2] Some Lyrics Some More Lyrics [Verse] Lyrics Lyrics
In that case, these two substitutions parse it correctly:
subChorus = re.sub(r'\[Chorus.*?\].*?\[', '[', lyrics, flags=re.DOTALL)
subChorus2 = re.sub(r'\[Chorus.*?\].*?(\n{2,})', '', lyrics, flags=re.DOTALL)
However, occasionally the Chorus is the last section of the file:
Lyrics [Chorus] Some Lyrics Other Lyrics
In such a case, I cannot figure out the correct expression to remove the chorus. If I just do
subChorusEnd = re.sub(r'\[Chorus.*?\].*?$', '', lyrics, flags=re.DOTALL)
It works; however, in other files where the final chorus section is not at the end of the file, it also removes verses that need to be preserved. Every chorus block that is followed by a verse is separated from it by at least two newlines. So I came up with this:
subChorusEnd = re.sub(r'\[Chorus.*?\][^(\n{2,})]*?$', '', subChorus4, flags=re.DOTALL)
But it does not work. Can someone explain the proper regular expression to make the statement above work, or suggest a better approach that removes ONLY chorus blocks at the end of the text while PRESERVING files in which the final chorus is not at the end?
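One likely reason the last attempt fails: inside square brackets, `(\n{2,})` is not treated as a group with a quantifier. `[^(\n{2,})]` is a character class that excludes the individual characters `(`, newline, `{`, `2`, `,`, `}` and `)`, so the match stops at the first single newline inside the chorus. A possible alternative, offered only as a sketch, is to use a negative lookahead so the chorus body may contain anything except the start of a blank line, and then require the end of the text. The names `TRAILING_CHORUS` and `strip_trailing_chorus` are hypothetical, and the sketch assumes sections are separated by `\n\n` (not `\r\n`) and that a chorus header always starts with `[Chorus`.

```python
import re

# Hypothetical pattern name; assumes "\n" line endings and "[Chorus...]" headers.
TRAILING_CHORUS = re.compile(
    r'\[Chorus[^\]]*\]'     # the chorus header, e.g. [Chorus] or [Chorus: x2]
    r'(?:(?!\n{2}).)*'      # any characters, as long as a blank line does not start here
    r'\s*\Z',               # optional trailing whitespace, then the very end of the text
    flags=re.DOTALL,
)

def strip_trailing_chorus(lyrics):
    """Remove a chorus block only when nothing but whitespace follows it."""
    return TRAILING_CHORUS.sub('', lyrics)

# Chorus is the last section: it is removed.
ends_with_chorus = "Lyrics\n\n[Chorus]\nSome Lyrics\nOther Lyrics\n"
print(repr(strip_trailing_chorus(ends_with_chorus)))

# Chorus is followed by a verse after a blank line: the text is left unchanged.
chorus_then_verse = "[Chorus: x2]\nSome Lyrics\nSome More Lyrics\n\n[Verse]\nLyrics\nLyrics"
print(strip_trailing_chorus(chorus_then_verse) == chorus_then_verse)
```

Because the lookahead refuses to cross a blank line, a chorus followed by another section can never reach `\Z`, so only a chorus that really is the last block gets removed. The existing substitutions for mid-file choruses could still be applied separately.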