I have a set of messy strings, shown below.
string <- c("GRP-14994/", "GRP-7056 GRP-7036/", "grp-24263(24263)/IRGC 28588", "GRP-15916 /IRGC-42176",
"GRP-614-250B/", "( GRP 11432)/IRGC-14570", "Tourn", "GRPP256", "Purse", "GRP-14956 Origin:", "GRP 10537", "GRP-10096 Origin: ",
"SGRP123", "GRP1234", "AC-30009 (GRPHANA)/", "AC-3060 GRP 536-143/Old AC", "RGRPfaa/23", "/-",
"MGR:7251/", "1216-GR-567/", "X:1 Well KGRPh", "WabGRPvea(II)", "HR33(BGRP)", "Tensor",
"Wald", "grp12312")
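I am trying to extract every instance of GRP followed by digits, where the two may be separated by a space or a "-".
My current attempt gives the following result.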
gsub("(.*)(\\b)(GRP)(-|\\s|)(\\d+)(\\/|\\b)(.*)","\\3\\5", string, ignore.case = T)
[1] "GRP14994" "GRP7056" "grp24263" "GRP15916"
[5] "GRP614" "GRP11432" "Tourn" "GRPP256"
[9] "Purse" "GRP14956" "GRP10537" "GRP10096"
[13] "SGRP123" "GRP1234" "AC-30009 (GRPHANA)/" "GRP536"
[17] "RGRPfaa/23" "/-" "MGR:7251/" "1216-GR-567/"
[21] "X:1 Well KGRPh" "WabGRPvea(II)" "HR33(BGRP)" "Tensor"
[25] "Wald" "grp12312"
But the desired output is:
out <- c("GRP14994", "GRP7056 GRP7036", "grp24263", "GRP15916", "GRP614250",
"GRP11432", "", "", "", "GRP14956", "GRP10537", "GRP10096", "",
"GRP1234", "", "GRP536143", "", "", "", "", "", "", "", "", "",
"grp12312")
out
[1] "GRP14994" "GRP7056 GRP7036" "grp24263" "GRP15916" "GRP614250" "GRP11432"
[7] "" "" "" "GRP14956" "GRP10537" "GRP10096"
[13] "" "GRP1234" "" "GRP536143" "" ""
[19] "" "" "" "" "" ""
[25] "" "grp12312"
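How can I modify the regular expression to get the desired result?
One option is str_extract_all() from the stringr package, followed by a cleanup step:
library(stringr)
unlist(lapply(str_extract_all(string, "[Gg][rR][pP][-\\s]?\\d+"), function(x) {
  # join the matches found in one element, then drop the "-" or whitespace before the digits
  gsub("[-\\s]+(\\d)", "\\1", paste(x, collapse = " "), perl = TRUE)
}))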
[1] "GRP14994" "GRP7056 GRP7036" "grp24263"
[4] "GRP15916" "GRP614" "GRP11432"
[7] "" "" ""
[10] "GRP14956" "GRP10537" "GRP10096"
[13] "GRP123" "GRP1234" ""
[16] "GRP536" "" ""
[19] "" "" ""
[22] "" "" ""
[25] "" "grp12312"
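The output above still differs from the desired result in three places: elements 5 and 16 lose the digit group after the second hyphen ("GRP614" instead of "GRP614250", "GRP536" instead of "GRP536143"), and element 13 ("SGRP123") matches although it should not. A possible variant in base R is sketched below. It assumes that GRP must start at a word boundary and that further digit groups are joined only by "-"; both assumptions are inferred from the sample data, and the helper name extract_grp is hypothetical.

```r
# Sketch: extract "GRP<digits>" tokens and strip the separators.
# Assumptions (inferred from the sample data, not stated by the asker):
#   * "GRP" must start at a word boundary, so "SGRP123" is rejected
#   * extra digit groups joined by "-" belong to the same token ("GRP-614-250")
extract_grp <- function(x) {
  pattern <- "\\bGRP[-\\s]?\\d+(?:-\\d+)*"
  # all matches per element, case-insensitive, using PCRE syntax
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE, ignore.case = TRUE))
  vapply(
    hits,
    function(m) {
      # remove "-" and whitespace inside each match, then join the matches
      paste(gsub("[-\\s]", "", m, perl = TRUE), collapse = " ")
    },
    character(1),
    USE.NAMES = FALSE
  )
}

samples <- c("GRP-614-250B/", "GRP-7056 GRP-7036/", "SGRP123",
             "( GRP 11432)/IRGC-14570")
extract_grp(samples)
# expected: "GRP614250", "GRP7056 GRP7036", "", "GRP11432"
```

Cleaning each match before pasting keeps the space between two separate tokens such as "GRP7056 GRP7036", and elements without a match come back as an empty string because paste() on a zero-length vector with collapse returns "".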