How to make my regex match stop after a lookahead?

Stanisloth

I have some text from a PDF in a single string. I want to break it up into a list where every string starts with a number and a period, and stops just before the next numbered item.

For example, I want to turn this:

'3.1 First liens  15,209,670,396  0  15,209,670,396  14,216,703,858 
3.2 Other than first liens     0  0 
4. Real estate:
4.1 Properties occupied by  the company (less $  43,332,898 
encumbrances)  68,122,291  0  68,122,291  64,237,046 
4.2 Properties held for  the production of income (less 
$    encumbrances)       0  0 
4.3 Properties held for sale (less $  
encumbrances)      0  0 
5. Cash ($  (101,130,138)), cash equivalents 
($ 850,185,973 ) and short-term
 investments ($ 0 )  749,055,835  0  749,055,835  1,867,997,055 
6. Contract loans (including $   premium notes)  253,533,676  0  253,533,676  233,680,271 
7. Derivatives  3,194,189,871  0  3,194,189,871  2,390,781,023 
8. Other invested assets  749,074,191  11,899,360  737,174,831  692,916,503' 

Into this:

['3.1 First liens  15,209,670,396  0  15,209,670,396  14,216,703,858 ',
'3.2 Other than first liens     0  0 ',
'4. Real estate:',
'4.1 Properties occupied by  the company (less $  43,332,898 encumbrances)  68,122,291  0  68,122,291  64,237,046',
'4.2 Properties held for  the production of income (less $    encumbrances)       0  0',
'4.3 Properties held for sale (less $  encumbrances)      0  0',
'5. Cash ($  (101,130,138)), cash equivalents ($ 850,185,973 ) and short-term investments ($ 0 )  749,055,835  0  749,055,835  1,867,997,055',
'6. Contract loans (including $   premium notes)  253,533,676  0  253,533,676  233,680,271',
'7. Derivatives  3,194,189,871  0  3,194,189,871  2,390,781,023',
'8. Other invested assets  749,074,191  11,899,360  737,174,831  692,916,503']

The issue is that the original string has '\n' scattered in the middle of the titles (for example, in 4.1 there's a \n before the word encumbrances).

(\d+\.[\s\S]*(?!\d+\.))

This is the regex I've been trying to use, but it matches the whole string instead of each numbered line. Is there any way to make my regex stop the match right before the next numbered line?

MikeM

Something like:

import re
items = re.findall(r"^\d+\..*?(?=^\d+\.|\Z)", text, re.MULTILINE | re.DOTALL)

Further explanation on request.
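
For what it's worth, here is a rough sketch of how the pattern could be used end to end. The original attempt matches everything because `[\s\S]*` is greedy: it runs to the end of the string, where the negative lookahead succeeds trivially. In the pattern here, `re.MULTILINE` lets `^` match at the start of each line, `re.DOTALL` lets `.` cross newlines, `.*?` is lazy, and the lookahead `(?=^\d+\.|\Z)` stops the match just before the next line that begins with digits and a period, or at the end of the string. The sample text is a shortened copy of the input from the question, and `ITEM_PATTERN` and `split_items` are names made up for this illustration. The whitespace cleanup step is an assumption about the desired output, not part of the one-liner.

```python
import re

# Shortened copy of the question's input (assumption: real input looks the same).
text = (
    "3.2 Other than first liens     0  0 \n"
    "4. Real estate:\n"
    "4.1 Properties occupied by  the company (less $  43,332,898 \n"
    "encumbrances)  68,122,291  0  68,122,291  64,237,046 \n"
    "4.2 Properties held for  the production of income (less \n"
    "$    encumbrances)       0  0 \n"
    "7. Derivatives  3,194,189,871  0  3,194,189,871  2,390,781,023"
)

# Start at a line beginning with digits and a period, then lazily consume
# characters (including newlines) until the next such line or the end of text.
ITEM_PATTERN = re.compile(r"^\d+\..*?(?=^\d+\.|\Z)", re.MULTILINE | re.DOTALL)


def split_items(raw):
    """Split raw text into numbered items and join wrapped lines (hypothetical helper)."""
    items = []
    for chunk in ITEM_PATTERN.findall(raw):
        # Replace each line break, plus any whitespace touching it, with one space.
        joined = re.sub(r"\s*\n\s*", " ", chunk)
        items.append(joined.strip())
    return items


for item in split_items(text):
    print(repr(item))
```

One caveat: a wrapped continuation line that itself happens to start with digits followed by a period (say `12.5 ...`) would be treated as a new item, so the input may need a stricter pattern if that can occur.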
