In my text file, each sentence is labelled with a type, such as contrast.
A contrasting sentence can be tagged with "CONTRAST", "CONTR" or "WEAKCONTR". For instance:
IMPSENT_CONTRAST_VIS(Studying networks in this way can help to
identify the people from whom an individual learns , where
conflicts_MD:+ in understanding_MD:+ may originate , and which
contextual factors influence learning .)
I count these with the following expression: /(\_(WEAK))|(\_CONTRAST)|(\_CONTR(\_|\())/g
which works fine.
The problem is that some sentences carry more than one contrast tag, such as CONTR and WEAKCONTR together. For instance:
IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(Studying_MD:+ networks in this way can help to identify_MD:+ the people from whom an individual learns , where conflicts_MD:+ in understanding_MD:+ may originate , and which contextual factors influence learning .)
Such a sentence has to be counted as 1, not 2. Is this possible with a RegExp?
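To make the double count concrete, here is a minimal sketch in JavaScript (assuming the tagged text is available as a string; the helper name countKeywords is hypothetical). The question's regex matches each keyword separately, so the combined tag yields two matches:

```javascript
// The question's counting regex: it matches each contrast keyword on its own.
const perKeyword = /(\_(WEAK))|(\_CONTRAST)|(\_CONTR(\_|\())/g;

// Hypothetical helper: counts keyword matches, not sentences.
function countKeywords(text) {
  // With the g flag, String.prototype.match returns every match, or null.
  const m = text.match(perKeyword);
  return m ? m.length : 0;
}

console.log(countKeywords("IMPSENT_CONTRAST_VIS(...)")); // 1
// _CONTRAST and _WEAK both match, so one sentence is counted twice.
console.log(countKeywords("IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(...)")); // 2
```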
You can use a lookahead to assert that a tag contains one of the keywords, and then count the matches:
(?=\w*_(?:WEAK|CONTRAST|CONTR[_(]))\b\w+\b
Demo here: http://regex101.com/r/xP2yI7/3
Notice the match count.
This matches the whole IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS tag, but only if the tag satisfies the lookahead, which filters for the keywords you're looking for. Because each tag is consumed as a single match, a tag containing several keywords is counted once. This also works if several tagged sentences appear on the same line.
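A minimal sketch of counting with the lookahead pattern in JavaScript, assuming the whole file has been read into one string. The function name countContrastSentences and the last sample line are hypothetical. The character class is written [_(] so that, like the question's (\_|\(), it accepts an opening parenthesis directly after CONTR:

```javascript
// Matches a whole tag word (e.g. IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS) only if
// the lookahead finds _WEAK, _CONTRAST, _CONTR_ or _CONTR( inside it, so a tag
// with several contrast keywords still produces a single match.
const CONTRAST_TAG = /(?=\w*_(?:WEAK|CONTRAST|CONTR[_(]))\b\w+\b/g;

// Hypothetical helper: counts contrast-tagged sentences in a block of text.
function countContrastSentences(text) {
  // With the g flag, String.prototype.match returns every match, or null.
  const matches = text.match(CONTRAST_TAG);
  return matches ? matches.length : 0;
}

// Sample input modelled on the question; the third line is made up.
const sample = [
  "IMPSENT_CONTRAST_VIS(Studying networks in this way can help to identify the people .)",
  "IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(Studying_MD:+ networks in this way can help .)",
  "IMPSENT_CONTR(A sentence with only the short tag .) IMPSENT_VIS(No contrast here .)",
].join("\n");

console.log(countContrastSentences(sample)); // 3
```

Words such as Studying_MD contain an underscore but none of the keywords, so the lookahead rejects them and they are not counted.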
I've also simplified your keyword pattern a bit without changing its meaning. Note that you don't have to escape the _ character.
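One way to check that the simplification keeps the meaning is to run both keyword patterns over a few tags and compare the matches. This is a sketch; the tag list and the helper allMatches are hypothetical, and the simplified pattern uses [_(] to mirror the question's (\_|\():

```javascript
// The question's keyword pattern (with the needless "\_" escapes) and a
// simplified equivalent using a non-capturing group and a character class.
const original = /(\_(WEAK))|(\_CONTRAST)|(\_CONTR(\_|\())/g;
const simplified = /_(?:WEAK|CONTRAST|CONTR[_(])/g;

// Hypothetical helper: returns every matched substring for a global regex.
function allMatches(re, s) {
  return Array.from(s.matchAll(re), (m) => m[0]);
}

const tags = [
  "IMPSENT_CONTRAST_VIS(",
  "IMPSENT_CONTR(",
  "IMPSENT_CONTR_VIS(",
  "IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(",
  "IMPSENT_VIS(",
];

for (const t of tags) {
  // Both patterns should report the same keyword matches for every tag.
  console.log(t, JSON.stringify(allMatches(original, t)), JSON.stringify(allMatches(simplified, t)));
}
```

Both patterns still report two keyword matches for the combined tag, which is why the lookahead wrapper, not the keyword pattern, is what makes it count once.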