I have raw text from a chain of emails.
For all inquiries please reach out
From: [email protected] At: 01/27/21 23:29:28To: CompanyA
Cc: [email protected], [email protected] Subject: this is the subject line
From: CompanyB(company) <[email protected]>
Sent: Wednesday, January 27, 2021 12:51 PM
From: [email protected] At: 01/27/21 23:29:28To: CompanyA
Cc: [email protected], [email protected] Subject: tect
Using a regex, I need to capture the email addresses between the first From and the first Subject. For the text above, the matches should be:
[email protected]
[email protected]
[email protected]
I already have (\n){0,1}([\w.]@[\w+-.]) to match email addresses. I will be matching with Python's re module.
One option is to use 2 patterns with re.
First, find all the matches from From: till the first occurrence of Subject:
(?s)\bFrom:.*?\bSubject:
Then, for all those matches, get the email address-like patterns without matching < and >
[^<>\s@]+@[^@\s<>]+
Example
import re
s = ("For all inquiries please reach out\n"
"From: [email protected] At: 01/27/21 23:29:28To: CompanyA\n"
"Cc: [email protected], [email protected] Subject: this is the subject line\n"
"From: CompanyB(company) <[email protected]>\n"
"Sent: Wednesday, January 27, 2021 12:51 PM\n"
"From: [email protected] At: 01/27/21 23:29:28To: CompanyA\n"
"Cc: [email protected], [email protected] Subject: tect")
for match in re.findall(r"(?s)\bFrom:.*?\bSubject:", s):
    print(re.findall(r"[^<>\s@]+@[^@\s<>]+", match))
Output
['[email protected]', '[email protected],', '[email protected]']
['[email protected]', '[email protected]', '[email protected],', '[email protected]']
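Note that some addresses in the output above keep a trailing comma, because [^@\s<>]+ also matches commas. If you only need the first From: ... Subject: block, as the question asks, a minimal sketch could look like this. The sample addresses and the name first_block_emails are made up for illustration, and excluding , and ; from the character classes is an assumption about which separators occur in your data.

```python
import re

# Hypothetical sample text; the addresses are placeholders, not from the thread.
sample = (
    "For all inquiries please reach out\n"
    "From: alice@example.com At: 01/27/21 23:29:28To: CompanyA\n"
    "Cc: bob@example.com, carol@example.com Subject: this is the subject line\n"
    "From: CompanyB(company) <dave@example.com>\n"
)

# Lazy match from a From: to the nearest following Subject:
BLOCK = re.compile(r"(?s)\bFrom:.*?\bSubject:")
# Same idea as the pattern above, with ',' and ';' also excluded (assumption).
EMAIL = re.compile(r"[^<>\s@,;]+@[^@\s<>,;]+")


def first_block_emails(text):
    """Return addresses between the first 'From:' and the next 'Subject:'."""
    block = BLOCK.search(text)  # search() stops at the first match
    if block is None:
        return []
    return EMAIL.findall(block.group())


print(first_block_emails(sample))
```

Using re.search instead of re.findall stops after the first block, so later From: headers are never examined.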
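If you don't want to cross another occurrence of From: or Subject:, you can use a negative lookahead to check that an intermediate line does not start with From and does not contain Subject: (the pattern needs the multiline flag so that ^ matches at the start of each line).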
^From:.*(?:\r?\n(?!From|.*\bSubject:).*)*\r?\n.*\bSubject:
Example
for match in re.findall(r"(?m)^From:.*(?:\r?\n(?!From|.*\bSubject:).*)*\r?\n.*\bSubject:", s):
    print(re.findall(r"[^<>\s@]+@[^@\s<>]+", match))
Output
['[email protected]', '[email protected],', '[email protected]']
['[email protected]', '[email protected],', '[email protected]']
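To make the lookahead pattern easier to read, here is the same regex written with re.VERBOSE and compared against the lazy version. This is only a sketch: the sample addresses are placeholders, and the names BLOCK_STRICT, BLOCK_LAZY and EMAIL are made up for illustration.

```python
import re

# Hypothetical sample; addresses are placeholders.
sample = (
    "From: alice@example.com At: 01/27/21 23:29:28To: CompanyA\n"
    "Cc: bob@example.com Subject: first\n"
    "From: CompanyB(company) <dave@example.com>\n"
    "Sent: Wednesday, January 27, 2021 12:51 PM\n"
    "From: erin@example.com At: 01/27/21 23:29:28To: CompanyA\n"
    "Cc: frank@example.com Subject: second"
)

BLOCK_STRICT = re.compile(
    r"""
    ^From:.*                    # a line that starts with From:
    (?:                         # then zero or more following lines
        \r?\n
        (?!From|.*\bSubject:)   # that neither start with From nor contain Subject:
        .*
    )*
    \r?\n.*\bSubject:           # and finally a line that contains Subject:
    """,
    re.MULTILINE | re.VERBOSE,
)
BLOCK_LAZY = re.compile(r"(?s)\bFrom:.*?\bSubject:")
EMAIL = re.compile(r"[^<>\s@]+@[^@\s<>]+")

strict = [EMAIL.findall(block) for block in BLOCK_STRICT.findall(sample)]
lazy = [EMAIL.findall(block) for block in BLOCK_LAZY.findall(sample)]

print(strict)  # the CompanyB block is skipped: a new From: appears before any Subject:
print(lazy)    # the CompanyB address is merged into the second block
```

The strict pattern only accepts a block when no other From line appears between its From: and its Subject:, which is why the CompanyB header is dropped instead of being merged into the next message.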