I have the following text. I would like to collect all subsentences (from comma or period to comma or period) that have a number in them. I have managed to write the regex below, which collects the number and the part after it, but since my number can have commas or periods inside it, I don't know how to grab the words before it. The sentence I am working with:
In connection with the consummation of this offering, we will enter into a forward purchase agreement with OrION Capital Structure Solutions UK Limited, or OrION, an affiliate of our sponsor, pursuant to which OrION will commit that it will purchase from us 10,000,000 forward purchase units, or at its option up to an aggregate maximum of 30,000,000 forward purchase units, each consisting of one Class A ordinary share,or a forward purchase share, and one-third of one warrant to purchase one Class A ordinary share, or a forward purchase warrant, for $10.00 per unit, or an aggregate amount of $100,000,000, or at OrION’s option up to an aggregate amount of $300,000,000, in a private placement that will close concurrently with the closing of our initial business combination.
What I want to collect:
["pursuant to which OrION will commit that it will purchase from us 10,000,000 forward purchase units",
"or at its option up to an aggregate maximum of 30,000,000 forward purchase units", "for $10.00 per unit", "or an aggregate amount of $100,000,000", "or at OrION’s option up to an aggregate amount of $300,000,000"]
The regex I wrote currently gets the number and the part after until the next comma or period.
[0-9]{1,2}([,.][0-9]{1,2})?.*?[\.,]
How can I collect the part of the sentence before the number (starting after a period or comma), the number itself, which can have a decimal or thousands separator in it, and then the part of the sentence up to the next comma or period?
EDIT: anubhava and bb1 both gave a correct solution. anubhava solved the question exactly as I asked it. bb1, however, prepared for something that is bound to happen (and that I did not think of), so in the end I used bb1's answer, but I marked anubhava's answer as accepted because it is the exact solution to what I asked.
EDIT 2: anubhava has since updated his answer so that it solves the same problem as bb1's.
You may use this regex with look-around assertions:
(?<=[.,] )(?:[^,.]*?\d+(?:[.,]\d+)*)+[^.,]*(?=[,.])
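As a quick check, the pattern can be tried against the sample sentence from the question. The sketch below assumes Python and its re module, which the question does not specify; the variable names are only illustrative.

```python
import re

# The sample sentence from the question, unchanged.
text = (
    "In connection with the consummation of this offering, we will enter into "
    "a forward purchase agreement with OrION Capital Structure Solutions UK "
    "Limited, or OrION, an affiliate of our sponsor, pursuant to which OrION "
    "will commit that it will purchase from us 10,000,000 forward purchase "
    "units, or at its option up to an aggregate maximum of 30,000,000 forward "
    "purchase units, each consisting of one Class A ordinary share,or a "
    "forward purchase share, and one-third of one warrant to purchase one "
    "Class A ordinary share, or a forward purchase warrant, for $10.00 per "
    "unit, or an aggregate amount of $100,000,000, or at OrION’s option up to "
    "an aggregate amount of $300,000,000, in a private placement that will "
    "close concurrently with the closing of our initial business combination."
)

pattern = re.compile(r"(?<=[.,] )(?:[^,.]*?\d+(?:[.,]\d+)*)+[^.,]*(?=[,.])")

# The pattern has no capturing groups, so findall returns each full match.
subsentences = pattern.findall(text)

for s in subsentences:
    print(s)
```

Note that the lookbehind requires a comma or period plus a space in front of every match, so a number in the very first clause of a text would not be picked up by this pattern.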
RegEx Details:
(?<=[.,] ) : Lookbehind assertion to assert that we have a comma or dot followed by a space before the current position
(?: : Start a non-capture group
[^,.]*? : Match 0 or more of any character that is not , or . (lazy)
\d+(?:[.,]\d+)* : Match a number that may contain . or ,
)+ : End the non-capture group. + repeats this group 1 or more times
[^.,]* : Match 0 or more of any character that is not , or .
(?=[,.]) : Lookahead assertion to assert that we have a comma or dot after the current position
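The same breakdown can be written as a commented pattern. This is a minimal sketch assuming Python's re.VERBOSE mode; the name SUBSENTENCE and the short sample string are made up for illustration. Because verbose mode ignores bare whitespace, the space in the lookbehind is written as [ ] here.

```python
import re

SUBSENTENCE = re.compile(r"""
    (?<=[.,][ ])         # preceded by a comma or dot plus one space
    (?:                  # start of the non-capture group
        [^,.]*?          #   as few non-separator characters as possible
        \d+(?:[.,]\d+)*  #   a number that may contain . or ,
    )+                   # the group may repeat: several numbers per subsentence
    [^.,]*               # the rest of the subsentence
    (?=[,.])             # followed by a comma or dot
""", re.VERBOSE)

# Hypothetical sample: the second subsentence contains two numbers.
sample = "Fees apply, for $10.00 per unit, or 2 lots of 1,500 shares, as agreed."
matches = SUBSENTENCE.findall(sample)
print(matches)
# → ['for $10.00 per unit', 'or 2 lots of 1,500 shares']
```

The + on the group is what keeps "or 2 lots of 1,500 shares" in one piece. With a single pass through the group, the match would stop at "or 2 lots of 1", because the comma inside 1,500 would satisfy the lookahead.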