Suppose I have the following datasets, whose columns are structured as shown below.
df1 = data.frame(Date=c(rnorm(5)),
"United States) New York (NY" = c(rnorm(5)),
"United States) Chicago (Illinois" = c(rnorm(5)),
"United States) Denver (Colorado" = c(rnorm(5)),
"United States) Seattle (Washington" = c(rnorm(5)),
"United States) Minneapolis (Minnesota" = c(rnorm(5)), check.names=FALSE)
df1
df2 = data.frame(Date=c(rnorm(5)),
"New York (New York, United States)" = c(rnorm(5)),
"Phoenix (Arizona, United States)" = c(rnorm(5)),
"Chicago (Illinois, United States)" = c(rnorm(5)),
"Los Angeles (California, United States)" = c(rnorm(5)), check.names=FALSE)
df2
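As you can see, each column represents a city, but the column names are structured in a way that is hard to work with. I was wondering if anyone could help me figure out how to extract the city name from each column name string.
I could keep a dictionary of every city and do string matching, but I haven't had much luck with that. I also assume str_split could be used for this, but I haven't figured it out.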
sapply(str_split(names(df1),")"), 2)
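Of course, I'm sure there is a gsub solution too, but I'm not very good with regular expressions.
Ultimately, I just want the actual city names as the column names.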
New York, Chicago, Denver, Seattle, Minneapolis
You can use gsub. For the first data frame, try:
gsub(".*[)] (.*) [(].*", "\\1", names(df1)[-1])
# [1] "New York" "Chicago" "Denver" "Seattle" "Minneapolis"
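Since the question mentions splitting, here is a minimal sketch of the same extraction for the first data frame using base R's `strsplit` instead of `gsub`. The vector `nms1` is just the column names of df1 typed out by hand (an assumption, so the snippet runs on its own), and `cities1` is a name introduced here for illustration.

```r
# Column names copied from df1 in the question (Date column left out)
nms1 <- c("United States) New York (NY",
          "United States) Chicago (Illinois",
          "United States) Denver (Colorado",
          "United States) Seattle (Washington",
          "United States) Minneapolis (Minnesota")

# Split each name on ") " or " (" so it breaks into country, city, state
parts <- strsplit(nms1, "[)] | [(]")

# The city is the second piece of every split
cities1 <- sapply(parts, `[`, 2)
cities1
# [1] "New York"    "Chicago"     "Denver"      "Seattle"     "Minneapolis"
```

This only works when every name follows the "country) city (state" layout, so the gsub approach is probably the more flexible of the two.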
For the second data frame, a small adjustment to the first regex is enough:
gsub("(.*) [(].*", "\\1", names(df2)[-1])
# [1] "New York" "Phoenix" "Chicago" "Los Angeles"
To handle both naming patterns with a single regex:
nms <- c(names(df1)[-1], names(df2)[-1])
gsub("(.*[)] |)(.*) [(].*", "\\2", nms)
# [1] "New York" "Chicago" "Denver" "Seattle" "Minneapolis"
# [6] "New York" "Phoenix" "Chicago" "Los Angeles"
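To actually get the city names in as column names, which is what the question asks for, the result can be assigned back to `names()`. The following is a sketch under some assumptions: `extract_city` is a hypothetical helper name wrapping the combined regex, and `df` is a small made-up data frame that mixes both naming patterns.

```r
# Hypothetical helper wrapping the combined regex from above
extract_city <- function(x) gsub("(.*[)] |)(.*) [(].*", "\\2", x)

# Small made-up data frame with one column of each naming pattern
df <- data.frame(Date = 1:3,
                 "United States) New York (NY" = rnorm(3),
                 "Chicago (Illinois, United States)" = rnorm(3),
                 check.names = FALSE)

# Replace every column name except Date with the extracted city
names(df)[-1] <- extract_city(names(df)[-1])
names(df)
# [1] "Date"     "New York" "Chicago"
```

Keep in mind that if columns from both data frames end up in one table, cities that appear in both (New York and Chicago here) will produce duplicate column names.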