Reverse regex: grep that tolerates errors

Posted by debugcn in Dev

Anthony Martin

I am trying to build a data table from some PDF files. The extraction sometimes leaves unintended spaces in the data, for example:

MWE <- c("Gross Domestic Product 2.3",
"blabla 1.5",
"blabla2 6.5", 
"G ross Domestic Product 4.5",
"Another L ine 9.6",
"Gross Domestic Product 6.9",
"G r oss D omes tic Pr o du ct 7.6")

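I want to find every occurrence of Gross Domestic Product, whether or not it contains stray spaces. A plain grep("Gross Domestic Product", MWE) matches the spaces literally, so it only finds the clean entries: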

grep("Gross Domestic Product",MWE)
[1] 1 6

I can handle this upstream, for example by removing every space first:

MWE_2 <- gsub("\\s","",MWE)
grep("GrossDomesticProduct",MWE_2)
[1] 1 4 6 7

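I would like to know whether the same result can be obtained directly through grep, which could be useful in some cases (for example, to avoid creating a new table).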

Amandeep Jiddewar

You can modify the search string and then use grep, as shown below. The idea is to build a regex that ignores spaces, if any are present.

MWE <- c("Gross Domestic Product 2.3",
         "blabla 1.5",
         "blabla2 6.5", 
         "G ross Domestic Product 4.5",
         "Another L ine 9.6",
         "Gross Domestic Product 6.9",
         "G r oss D omes tic Pr o du ct 7.6")

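gdp_str <- "Gross Domestic Product"
# Append \s* after every character so optional whitespace is allowed between letters;
# the outer sub() matches the empty string at the start and just prepends another \s*.
gdp_str <- sub("\\s*", "\\\\s*", gsub('(.{1})', '\\1\\\\s*', gdp_str))
grep(gdp_str, MWE)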
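
A variant of the same idea, offered only as a sketch: the pattern above keeps the literal spaces of the search string, so as far as I can tell it still needs at least one whitespace character between the words. If the extraction can also drop spaces (e.g. "GrossDomestic Product"), one option is to strip the whitespace from the search string first and join its characters with \s*. The helper name space_tolerant_pattern is hypothetical, and the sketch assumes the search string contains no regex metacharacters.

```r
MWE <- c("Gross Domestic Product 2.3",
         "blabla 1.5",
         "blabla2 6.5",
         "G ross Domestic Product 4.5",
         "Another L ine 9.6",
         "Gross Domestic Product 6.9",
         "G r oss D omes tic Pr o du ct 7.6")

# Hypothetical helper (not from the original answer).
# Assumes `target` contains no regex metacharacters such as . * + ( ) [ ].
space_tolerant_pattern <- function(target) {
  # Drop whitespace from the target, then split it into single characters.
  chars <- strsplit(gsub("\\s", "", target), "")[[1]]
  # Allow any amount of whitespace (including none) between consecutive characters.
  paste(chars, collapse = "\\s*")
}

pattern <- space_tolerant_pattern("Gross Domestic Product")
hits <- grep(pattern, MWE)
print(hits)
```

Since grep(pattern, MWE) works on the original vector, there is no need to keep a space-stripped copy of the data; grepl(pattern, MWE) gives the same information as a logical vector if you want to subset rows.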
