Setup:
I am using regular expressions to organize baseball lineups into a data frame.
LINEUPS <- c('OF Andrew Johnson P Victor Bailey OF Walter Hill 2B Carl Smith 3B Brian Rivera P Joseph Cox 1B Steven Parker SS William Gonzales OF Christopher Taylor C David Washington
',
'SS J.C. Roberts P Dennis Flores OF Jason Torres 2B Jack Rodriguez OF Randy Baker P Edward Anderson C David Washington 3B Thomas Wilson OF Ryan Walker 1B Robert Harris Jr
',
'1B J.P. Allen P Philip Hernandez OF Ryan Walker OF Christopher Taylor 2B Jack Rodriguez C Russell James 3B Brian Rivera P Joseph Cox OF Andrew Johnson SS Ralph Martinez
')
mm <- gregexpr("\\b(P|C|OF|SS|1B|2B|3B)\\b", LINEUPS)
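players <- do.call("rbind", unname(Map(function(x, m, i) {
  # start offset of each position token, and the character just after it
  pstart <- m
  pend <- pstart + attr(m, "match.length")
  # each name runs from after the token up to just before the next token
  hstart <- pend + 1
  hend <- c(tail(pstart, -1) - 1, nchar(x))
  data.frame(game = i, pos = substring(x, pstart, pend), name = substring(x, hstart, hend))
}, LINEUPS, mm, seq_along(LINEUPS))))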
# strip surrounding spaces (and the trailing newline on each game's last name)
players$pos <- trimws(players$pos)
players$name <- trimws(players$name)
library(dplyr)
library(tidyr)
players <- players %>%
  group_by(game, pos) %>%
  mutate(pos = if_else(rep(n(), n()) > 1, paste0(pos, row_number()), pos)) %>%
  ungroup() %>%
  pivot_wider(id_cols = game, names_from = pos, values_from = name)
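To show what the gregexpr/substring step above produces, here is a small sketch on a hypothetical two-player string (not one of the original lineups). It uses the same pattern and the same start/end idea, with trimws() standing in for the cleanup step:

```r
# Hypothetical two-player lineup, used only to illustrate the tokenizing step
x <- "OF Andrew Johnson P Victor Bailey"
m <- gregexpr("\\b(P|C|OF|SS|1B|2B|3B)\\b", x)[[1]]

starts <- as.integer(m)            # start offset of each position token
lens <- attr(m, "match.length")    # length of each position token

# The tokens themselves
tokens <- substring(x, starts, starts + lens - 1)

# Each name is the text between one token's end and the next token's start
names_between <- trimws(substring(x, starts + lens, c(tail(starts, -1) - 1, nchar(x))))

print(tokens)
print(names_between)
```

Here tokens is c("OF", "P") and names_between is c("Andrew Johnson", "Victor Bailey"), so every match of the pattern becomes a split point. That is why a stray match inside a name breaks the row.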
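Problem:
I run into trouble when a player's name contains initials that also happen to match one of the positions. In the example above, "SS J.C. Roberts" matches position "C" and "1B J.P. Allen" matches position "P", which causes the strings to be split incorrectly.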
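The stray matches come from how \b is defined: a word boundary sits between a word character and a non-word character, and "." is a non-word character, so the "C" in "J.C." counts as a whole word. The sketch below checks this on two excerpts from the lineups above. The second pattern uses PCRE lookarounds (?<!\S) and (?!\S) to require whitespace or a string edge on both sides of a token. It is an assumption about one way to tighten the match, not the poster's code, and has not been tested on lineups beyond these excerpts. find_tokens() is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: return all matches of `pattern` in a single string
find_tokens <- function(s, pattern, perl = FALSE) {
  regmatches(s, gregexpr(pattern, s, perl = perl))[[1]]
}

lineup1 <- "SS J.C. Roberts P Dennis Flores"
lineup2 <- "1B J.P. Allen P Philip Hernandez"

# Original pattern: \b also fires next to ".", so initials are picked up
word_boundary_1 <- find_tokens(lineup1, "\\b(P|C|OF|SS|1B|2B|3B)\\b")
word_boundary_2 <- find_tokens(lineup2, "\\b(P|C|OF|SS|1B|2B|3B)\\b")

# Assumed alternative: no non-whitespace character directly before or after
ws_pattern <- "(?<!\\S)(P|C|OF|SS|1B|2B|3B)(?!\\S)"
ws_delimited_1 <- find_tokens(lineup1, ws_pattern, perl = TRUE)
ws_delimited_2 <- find_tokens(lineup2, ws_pattern, perl = TRUE)

print(word_boundary_1)   # value: c("SS", "C", "P")
print(word_boundary_2)   # value: c("1B", "P", "P")
print(ws_delimited_1)    # value: c("SS", "P")
print(ws_delimited_2)    # value: c("1B", "P")
```

Lookarounds require perl = TRUE in gregexpr(). The "P" at the start of "Philip" is rejected by both patterns because a letter follows it.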
Question:
How can I modify the current search to exclude these matches, so that I get the following result:
P1 <- c('Victor Bailey','Dennis Flores','Philip Hernandez')
P2 <- c('Joseph Cox','Edward Anderson','Joseph Cox')
C <- c('David Washington','David Washington','Russell James')
"1B" <- c('Steven Parker','Robert Harris Jr', 'J.P. Allen')
"2B" <- c('Carl Smith','Jack Rodriguez','Jack Rodriguez')
"3B" <- c('Brian Rivera','Thomas Wilson','Brian Rivera')
SS <- c('William Gonzales','J.C. Roberts','Ralph Martinez')
OF1 <- c('Andrew Johnson','Jason Torres','Ryan Walker')
OF2 <- c('Walter Hill','Randy Baker','Christopher Taylor')
OF3 <- c('Christopher Taylor','Ryan Walker','Andrew Johnson')
RESULT <- data.frame(P1, P2, C, `1B`, `2B`, `3B`, SS, OF1, OF2, OF3, check.names = FALSE)
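The suffixed column names in the expected result (P1, P2, OF1 to OF3) come from the group_by()/mutate() step in the setup code, which numbers a position only when it occurs more than once in a game. As a dependency-free illustration, here is a base-R sketch of the same rule using ave() on a hypothetical vector of positions from one game. It is an equivalent of the idea, not the poster's dplyr code.

```r
# Hypothetical positions from a single game, in lineup order
pos <- c("OF", "P", "OF", "2B", "P", "OF")

# Running count within each position, and total count per position
idx <- ave(seq_along(pos), pos, FUN = seq_along)
cnt <- ave(seq_along(pos), pos, FUN = length)

# Suffix only positions that repeat; single positions keep their plain name
labels <- ifelse(cnt > 1, paste0(pos, idx), pos)
print(labels)
```

Here labels is c("OF1", "P1", "OF2", "2B", "P2", "OF3"). The numbering follows lineup order, which is why OF1 is the first outfielder listed in each game.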