This is my first time here, so I hope I don't break anything... I have a list of character vectors:
Browse[2]> head(str(mylist))
List of 33
$ : chr [1:33] "0001" "space" "28" "night_club" ...
$ : chr [1:33] "0002" "concert" "28" "night_club" ...
$ : chr [1:31] "0003" "night_club" "24" "martial_arts" ...
$ : chr [1:31] "0004" "stage" "24" "basketball" ...
$ : chr [1:43] "0005" "night_club" "16" "concert" ...
$ : chr [1:43] "0006" "night_club" "16" "concert" ...
$ : chr [1:39] "0007" "night_club" "22" "concert" ...
$ : chr [1:39] "0008" "night_club" "22" "concert" ...
$ : chr [1:31] "0009" "night_club" "46" "martial_arts" ...
$ : chr [1:31] "0010" "night_club" "46" "martial_arts" ...
$ : chr [1:41] "0011" "night_club" "17" "martial_arts" ...
$ : chr [1:41] "0012" "night_club" "17" "martial_arts" ...
$ : chr [1:29] "0013" "concert" "23" "night_club" ...
$ : chr [1:29] "0014" "concert" "23" "night_club" ...
$ : chr [1:25] "0015" "night_club" "26" "concert" ...
$ : chr [1:31] "0016" "night_club" "42" "concert" ...
$ : chr [1:31] "0017" "night_club" "42" "concert" ...
$ : chr [1:31] "0018" "night_club" "25" "wrestling" ...
$ : chr [1:31] "0019" "night_club" "25" "wrestling" ...
$ : chr [1:33] "0020" "night_club" "46" "wrestling" ...
$ : chr [1:33] "0021" "night_club" "46" "wrestling" ...
$ : chr [1:41] "0022" "concert" "21" "stage" ...
$ : chr [1:41] "0023" "concert" "21" "stage" ...
$ : chr [1:55] "0024" "basketball" "8" "concert" ...
$ : chr [1:55] "0025" "basketball" "8" "concert" ...
$ : chr [1:37] "0026" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0027" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0028" "night_club" "32" "business_meeting" ...
$ : chr [1:37] "0029" "night_club" "32" "business_meeting" ...
$ : chr [1:15] "0030" "night_club" "59" "stage" ...
$ : chr [1:37] "0031" "stage" "12" "night_club" ...
$ : chr [1:37] "0032" "stage" "12" "night_club" ...
$ : chr [1:33] "0033" "night_club" "23" "portrait" ...
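I want to convert this list into a wide-format data frame. The first column should be the first element of each inner vector (i.e. "0001", "0002", etc.), and there should be one column for every category that can occur in the file: "space", "night_club", "concert", "martial_arts", "wrestling", and so on. That means I will end up with a very wide data frame in which each row starts with an ID (0001, 0002, 0003, ...) and the column names are all the categories in the file. For each row, wherever a category exists for that ID, the cell is filled with the value that follows the category in the list (e.g. "space" -> 28 in the first row).
I tried to build a normalized (long) data frame with a loop and then convert it to wide format, but that would be a bad idea as the data grows: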
for (file in files){# iterate over files in folder
  mylist <- strsplit(readLines(file), ":")
  #close(mylist)
  for (elem in mylist){
    dataframe <- data.frame(frameid = numeric(), category = character(), nrow = length(unlist(elem)))
    frameid <- rep.int(elem[[1]], length(elem)-1)
    categories <- elem[-1:-1]
    dataframe$frameid <- frameid
    dataframe$category <- categories
  }
}
Reproducible example of input and output. Input:
list(c("0001", "space", "28", "night_club", "25"), c("0002",
"concert", "28", "night_club", "26"), c("0003", "night_club",
"24", "martial_arts", "27"), c("0004", "stage", "24", "basketball",
"30"))
Output:
Dataframe
frameid, cat_space, cat_night_club, cat_concert, cat_martial_arts, cat_stage, cat_basketball
0001, 28, 25, 0, 0, 0, 0
0002, 0, 26, 28, 0, 0, 0
0003, 0, 24, 0, 27, 0, 0
0004, 0, 0, 0, 0, 24, 30
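Here's one possibility. I've written the answer as a function and commented what happens at each stage. The basic idea is:
1. Extract the first value from each list element (the ID).
2. Convert the remaining elements into a two-column matrix of variable/value pairs.
3. Create a data.frame that puts these two pieces together.
4. Use xtabs to convert the output to a wide form.
Note that if there are duplicated combinations of "ID" and "var", the values will be added together because xtabs is used. Here is the function: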
myFun <- function(inList) {
  ## Extract the first value in each list element
  ID <- vapply(inList, `[`, character(1L), 1)
  ## Convert the remaining elements into a two column matrix, first
  ## column as variable, second column as value. Bind all list
  ## elements together to a single 2-column matrix.
  varval <- do.call(rbind, lapply(inList, function(x) {
    matrix(x[-1], ncol = 2, byrow = TRUE, dimnames = list(NULL, c("var", "val")))
  }))
  ## Create a data.frame where ID is repeated to the same number of rows
  ## as the matrices found in varval.
  temp <- data.frame(ID = rep(ID, (lengths(inList)-1)/2), varval)
  ## Convert the val columns to numeric
  temp$val <- as.numeric(as.character(temp$val))
  ## Use xtabs to go from a "long" form to a "wide" form
  xtabs(val ~ ID + var, temp)
}
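To make the caveat about duplicated "ID"/"var" combinations concrete, here is a small sketch using made-up toy data (the `dup` data frame is only for illustration and is not taken from the question). It shows xtabs adding the repeated values together, plus one possible way to check for duplicates beforehand:

```r
## Toy long-form data: ID "0001" has the category "space" twice
dup <- data.frame(ID  = c("0001", "0001", "0002"),
                  var = c("space", "space", "concert"),
                  val = c(28, 2, 5))

## xtabs sums val within each ID/var cell
tab <- xtabs(val ~ ID + var, dup)
tab["0001", "space"]   # 28 + 2 = 30

## One way to detect repeated ID/var pairs before tabulating
hasDup <- any(duplicated(dup[c("ID", "var")]))
hasDup                 # TRUE for this toy data
```

If summing is not what you want for your data, you would need to decide how to resolve such rows before calling xtabs.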
Here it is applied to your sample data (assuming your data is called "L"):
myFun(L)
#       var
# ID     basketball concert martial_arts night_club space stage
#   0001          0       0            0         25    28     0
#   0002          0      28            0         26     0     0
#   0003          0       0           27         24     0     0
#   0004         30       0            0          0     0    24
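The result of xtabs is a table rather than a data.frame. If you need a plain data.frame with a "frameid" column and "cat_"-prefixed names as in your expected output, something along these lines might work. This is only a sketch: `tableToWide` and its `prefix` argument are names made up here, and the small `long` data frame stands in for the long-form data that myFun builds internally.

```r
## Hypothetical helper (not part of the function above): turn a
## two-way table, such as the one xtabs returns, into a wide data.frame.
tableToWide <- function(tab, prefix = "cat_") {
  ## as.data.frame.matrix() keeps the wide layout; calling plain
  ## as.data.frame() on a table would return the long form instead.
  wide <- as.data.frame.matrix(tab)
  names(wide) <- paste0(prefix, names(wide))
  out <- data.frame(frameid = rownames(wide), wide, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

## Small stand-in for the long-form data (assumed values for illustration)
long <- data.frame(ID  = c("0001", "0001", "0002", "0002"),
                   var = c("space", "night_club", "concert", "night_club"),
                   val = c(28, 25, 28, 26))
wide <- tableToWide(xtabs(val ~ ID + var, long))
str(wide)
```

With your data this would be called as tableToWide(myFun(L)). The category columns come out in alphabetical order, so reorder them afterwards if the order matters.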