I have a string like this:
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
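I am trying to use re.sub to replace every special character with a space, except an apostrophe that sits between two letters, so that 'gluten-free' becomes gluten free while i'm stays as it is.
I have tried: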
import re
s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)
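What I meant to express is: replace with a space anything that does not follow the pattern of one or more letters, then zero or one apostrophe, then one or more letters.
This returns the same string: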
i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.
I would like to have:
i'm sorry sir but this is a gluten free restaurant we don't serve bread
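A note on why the attempt leaves the string untouched, as far as I can tell: character classes do not nest in Python's re, so the pattern is read quite differently from what was intended. The short sketch below illustrates this; the names attempt, unchanged and hit are only illustrative.

```python
import re

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

# Character classes do not nest. "[^[a-z]" is a single negated class
# (anything except "[" and a-z), and the final "]" is a literal bracket.
attempt = re.compile("[^[a-z]+'?[a-z]+]")

# The sample string contains no "]", so the pattern can never match
# and sub() returns the input unchanged.
unchanged = attempt.sub(' ', s)
print(unchanged == s)

# With a literal "]" in the input, the same pattern does match.
hit = attempt.search(", ab]")
print(hit.group())
```

So the pattern is not a negation of "letters, optional apostrophe, letters"; a negated class can only exclude single characters, which is why a lookaround is needed for the apostrophe rule.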
You can use this regex with a nested lookahead and lookbehind:
>>> import re
>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> print(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I))
i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread
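RegEx details:
(?! : start of the negative lookahead
(?<=[a-z]) : positive lookbehind asserting that the character before the current position is a letter
' : match an apostrophe
[a-z] : match a letter
) : end of the negative lookahead
[^\w\s] : match a character that is neither whitespace nor a word character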
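The breakdown above can also be written into the pattern itself with re.VERBOSE. This is only a sketch of the same regex, not a different technique. Because each punctuation mark is replaced by a space, a mark that sat next to an existing space leaves two spaces behind, so the sketch also collapses runs of whitespace with split() and join(). That extra step is my assumption about what the single-spaced target string calls for; the names pattern, spaced and cleaned are illustrative.

```python
import re

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

# Same regex as in the answer, laid out with re.VERBOSE so that
# each part can carry its own comment.
pattern = re.compile(r"""
    (?!                 # do not match at this position if ...
        (?<=[a-z])      #   a letter comes right before it, and
        '[a-z]          #   an apostrophe followed by a letter comes next
    )
    [^\w\s]             # one character that is neither a word character nor whitespace
""", re.VERBOSE | re.IGNORECASE)

# Every matched punctuation character becomes a single space.
spaced = pattern.sub(" ", s)

# split() without arguments splits on runs of whitespace and drops
# leading/trailing whitespace, so joining with " " normalizes the spacing.
cleaned = " ".join(spaced.split())
print(cleaned)
```

Note that \w also covers digits and the underscore, so those characters are left in place by this pattern.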