Regex: parsing text to get relevant words / characters

Scareactor

I want to parse a file that contains source code in some programming language, and get a list of all its tokens (words, symbols, etc.).

I tried a few patterns, and this is the most successful one so far:

pattern = r"\b(\w+|\W+)\b"

Using this on my text, which looks something like:

import re

string = "the quick brown(fox).jumps(over + the) = lazy[dog];"
re.findall(pattern, string)

gives me roughly the output I need, but with some characters I don't want and some unwanted grouping:

['the', ' ', 'quick', ' ', 'brown', '(', 'fox', ').', 'jumps', '(', 'over',
' + ', 'the', ') = ', 'lazy', '[', 'dog']

My list contains whitespace that I would like to get rid of, and some double symbols, like )., that I would like to have as single characters. I know I have to modify the \W+ part to get this done, but I need a little help.

The other problem is that my regex doesn't match the trailing ];, which I also need.

bobble bubble

Why use \W+ (one or more) if you want single non-word characters in the output? In addition, exclude whitespace by using a negated character class. It also looks like you can drop the word boundaries.

re.findall(r"\w+|[^\w\s]", string)

This matches:

  • \w+ one or more word characters
  • |[^\w\s] or a single character that is neither a word character nor whitespace

See Ideone demo
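As a rough check of the two patterns above, here is a minimal sketch that runs both on the sample string from the question. The names `text`, `TOKEN`, `tokens` and `original` are only illustrative and do not come from the original post.

```python
import re

# Sample input from the question.
text = "the quick brown(fox).jumps(over + the) = lazy[dog];"

# Pattern from the answer: a run of word characters, or exactly one
# character that is neither a word character nor whitespace.
TOKEN = re.compile(r"\w+|[^\w\s]")

tokens = TOKEN.findall(text)
print(tokens)
# ['the', 'quick', 'brown', '(', 'fox', ')', '.', 'jumps', '(', 'over',
#  '+', 'the', ')', '=', 'lazy', '[', 'dog', ']', ';']

# Pattern from the question, for comparison. A \b only matches between a
# word character and a non-word character (or at a string edge next to a
# word character), so the closing \b can never match after "];".
original = re.findall(r"\b(\w+|\W+)\b", text)
print(original[-1])  # 'dog' -- the trailing "];" is dropped
```

Dropping the word boundaries is what lets the trailing `];` through, and matching one symbol at a time is what splits `).` into `)` and `.`.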
