I have the following df:
df_1=data.frame(col_1=c("a;b;c","c;d","e","f","g","h;j"),col_2=c("1;2;3","4","5;6","7","8;9","10;11;12"))
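I want to split col_1 into separate rows, each carrying the corresponding value from col_2 (if one exists).
For example, if the number of elements in col_1 equals the number of elements in col_2, the row should be split so that each element of col_1 is paired with the element of col_2 at the same position (row 1).
If the counts differ but one of the columns holds only a single element, the row should still be split into separate rows, with the single value repeated (row 2).
If the counts are mismatched (both columns have more than one element and the counts are not equal), the row should be left as is.
This is the final dataset: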
df_2=data.frame(col_1=c("a","b","c","c","d","e","e","f","g","g","h;j"),col_2=c("1","2","3","4","4","5","6","7","8","9","10;11;12"))
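To make the three cases concrete, each row can be labelled by counting the ";"-separated elements in both columns. This is only an illustrative sketch: the names n1, n2 and rule are assumptions introduced here, not part of the original post, and cells are assumed to be non-empty.

```r
# Example data from the question, kept as character columns
df_1 <- data.frame(
  col_1 = c("a;b;c", "c;d", "e", "f", "g", "h;j"),
  col_2 = c("1;2;3", "4", "5;6", "7", "8;9", "10;11;12"),
  stringsAsFactors = FALSE
)

# Number of ";"-separated elements in each cell
n1 <- lengths(strsplit(df_1$col_1, ";", fixed = TRUE))
n2 <- lengths(strsplit(df_1$col_2, ";", fixed = TRUE))

# Label each row with the rule that applies to it
rule <- ifelse(n1 == n2, "pair up",
        ifelse(n1 == 1 | n2 == 1, "recycle single value", "keep as is"))

data.frame(df_1, n1, n2, rule)
```

Only the last row ("h;j" against "10;11;12") has two different counts that are both greater than one, so it is the only row that should stay unsplit.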
We can use cSplit from the splitstackshape package:
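library(splitstackshape)
library(zoo)  # for na.locf0
# Count the ";"-separated elements in each cell
cnt1 <- lengths(strsplit(as.character(df_1$col_1), ";", fixed = TRUE))
cnt2 <- lengths(strsplit(as.character(df_1$col_2), ";", fixed = TRUE))
# Rows where both cells hold several elements but the counts differ stay unsplit
i1 <- cnt1 != cnt2 & cnt1 > 1 & cnt2 > 1
# Split the remaining rows to long format, drop all-NA rows, fill single values down
rbind(cSplit(df_1[!i1,], c('col_1', 'col_2'), sep=";", direction="long")[
    !is.na(col_1)|!is.na(col_2), lapply(.SD, na.locf0)], df_1[i1,])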
#     col_1    col_2
#  1:     a        1
#  2:     b        2
#  3:     c        3
#  4:     c        4
#  5:     d        4
#  6:     e        5
#  7:     e        6
#  8:     f        7
#  9:     g        8
# 10:     g        9
# 11:   h;j 10;11;12
Or, using base R with all the constraints:
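# Count the ";"-separated elements in each cell
cnt1 <- lengths(strsplit(as.character(df_1$col_1), ";", fixed = TRUE))
cnt2 <- lengths(strsplit(as.character(df_1$col_2), ";", fixed = TRUE))
# Rows where both cells hold several elements but the counts differ stay unsplit
i1 <- cnt1 != cnt2 & cnt1 > 1 & cnt2 > 1
lst1 <- lapply(df_1[!i1, ], function(x) strsplit(as.character(x), ";", fixed = TRUE))
out <- rbind(do.call(rbind, Map(function(x, y) {
    l1 <- length(x)
    l2 <- length(y)
    mx <- max(l1, l2)
    # Repeat a single value so it pairs with every element of the other column
    x <- if (l1 != l2 && l1 == 1) rep(x, mx) else x
    y <- if (l1 != l2 && l2 == 1) rep(y, mx) else y
    data.frame(col_1 = x, col_2 = y)
  }, lst1[[1]], lst1[[2]])), df_1[i1, ])
row.names(out) <- NULL
out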
#    col_1    col_2
# 1      a        1
# 2      b        2
# 3      c        3
# 4      c        4
# 5      d        4
# 6      e        5
# 7      e        6
# 8      f        7
# 9      g        8
# 10     g        9
# 11   h;j 10;11;12
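The same rules can also be wrapped in a small reusable function. The sketch below is one possible variant, not the answer's own method: split_rows is a hypothetical helper name, it assumes the columns are called col_1 and col_2 and contain no empty cells, and unlike the code above it leaves unsplit rows in their original position instead of appending them at the end.

```r
# Example data and expected result from the question
df_1 <- data.frame(
  col_1 = c("a;b;c", "c;d", "e", "f", "g", "h;j"),
  col_2 = c("1;2;3", "4", "5;6", "7", "8;9", "10;11;12"),
  stringsAsFactors = FALSE
)

# split_rows() is a hypothetical helper, not part of any package
split_rows <- function(df, sep = ";") {
  raw_1 <- as.character(df$col_1)
  raw_2 <- as.character(df$col_2)
  parts_1 <- strsplit(raw_1, sep, fixed = TRUE)
  parts_2 <- strsplit(raw_2, sep, fixed = TRUE)
  pieces <- Map(function(x, y, r1, r2) {
    if (length(x) == length(y) || length(x) == 1 || length(y) == 1) {
      # Equal counts pair up; a single value is recycled by data.frame()
      data.frame(col_1 = x, col_2 = y, stringsAsFactors = FALSE)
    } else {
      # Mismatched counts: keep the original, unsplit cell contents
      data.frame(col_1 = r1, col_2 = r2, stringsAsFactors = FALSE)
    }
  }, parts_1, parts_2, raw_1, raw_2)
  out <- do.call(rbind, pieces)
  row.names(out) <- NULL
  out
}

result <- split_rows(df_1)
result
```

For the example data the unsplit row is already the last one, so result has the same 11 rows in the same order as df_2, with both columns stored as character.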