I have the following data frame that I want to transform. Currently, it looks like this:
   ID Items items.split
1 2729 Bicycle Bicycle
2 3979 TV, Mobile Phone, Bicycle, Water Tank c("TV", "Mobile Phone", "Bicycle", "Water Tank")
3 3860 Mobile Phone, Bicycle, Fan c("Mobile Phone", "Bicycle", "Fan")
4 2357 Mobile Phone, Motorbike c("Mobile Phone", "Motorbike")
5 2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow c("TV", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
6 3277 TV, Mobile Phone, Bicycle, Motorbike, Fan c("TV", "Mobile Phone", "Bicycle", "Motorbike", "Fan")
7 3501 Mobile Phone, Bicycle, Water Tank c("Mobile Phone", "Bicycle", "Water Tank")
8 3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow c("Tractor", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
9 3207 DVD Player, Bicycle, Plow c("DVD Player", "Bicycle", "Plow")
10 3928 TV, Mobile Phone, Bicycle, Fan c("TV", "Mobile Phone", "Bicycle", "Fan")
I want to convert the data frame above into the following format:
     Bicycle TV Mobile Phone Water Tank [etc...]
2729 1       0  0            0
3979 1       1  1            1
3860 1       0  1            0
[etc...]
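I don't often work with strings or characters, so I am stuck on how to manipulate the items.split variable in particular. I have seen questions like this one, but I do not want a frequency count of the words overall; I want a frequency count attached to each ID. So I think what I am struggling with is combining something like the FreqMat command with a simple dplyr command that links the frequencies to each ID.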
Any help is greatly appreciated. The data is below.
df <- structure(list(ID = c(2729L, 3979L, 3860L, 2357L, 2278L, 3277L,
3501L, 3880L, 3207L, 3928L), Items = c("Bicycle", "TV, Mobile Phone, Bicycle, Water Tank",
"Mobile Phone, Bicycle, Fan", "Mobile Phone, Motorbike", "TV, Mobile Phone, Wagon/Cart, Motorbike, Plow",
"TV, Mobile Phone, Bicycle, Motorbike, Fan", "Mobile Phone, Bicycle, Water Tank",
"Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow", "DVD Player, Bicycle, Plow",
"TV, Mobile Phone, Bicycle, Fan"), items.split = list("Bicycle",
c("TV", "Mobile Phone", "Bicycle", "Water Tank"), c("Mobile Phone",
"Bicycle", "Fan"), c("Mobile Phone", "Motorbike"), c("TV",
"Mobile Phone", "Wagon/Cart", "Motorbike", "Plow"), c("TV",
"Mobile Phone", "Bicycle", "Motorbike", "Fan"), c("Mobile Phone",
"Bicycle", "Water Tank"), c("Tractor", "Mobile Phone", "Wagon/Cart",
"Motorbike", "Plow"), c("DVD Player", "Bicycle", "Plow"),
c("TV", "Mobile Phone", "Bicycle", "Fan"))), row.names = c(NA,
10L), class = "data.frame")
You can use cSplit_e from the splitstackshape package:
splitstackshape::cSplit_e(df, "Items", type = "character", fill = 0, drop = TRUE)
# ID items.split Items_Bicycle Items_DVD Player Items_Fan
#1 2729 Bicycle 1 0 0
#2 3979 TV, Mobile Phone, Bicycle, Water Tank 1 0 0
#3 3860 Mobile Phone, Bicycle, Fan 1 0 1
#4 2357 Mobile Phone, Motorbike 0 0 0
#5 2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow 0 0 0
#6 3277 TV, Mobile Phone, Bicycle, Motorbike, Fan 1 0 1
#7 3501 Mobile Phone, Bicycle, Water Tank 1 0 0
#8 3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow 0 0 0
#9 3207 DVD Player, Bicycle, Plow 1 1 0
#10 3928 TV, Mobile Phone, Bicycle, Fan 1 0 1
# Items_Mobile Phone Items_Motorbike Items_Plow Items_Tractor Items_TV Items_Wagon/Cart Items_Water Tank
#1 0 0 0 0 0 0 0
#2 1 0 0 0 1 0 1
#3 1 0 0 0 0 0 0
#4 1 1 0 0 0 0 0
#5 1 1 1 0 1 1 0
#6 1 1 0 0 1 0 0
#7 1 0 0 0 0 0 1
#8 1 1 1 1 0 1 0
#9 0 0 1 0 0 0 0
#10 1 0 0 0 1 0 0
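If adding a package is not an option, the same kind of 0/1 indicator table can be sketched in base R. This is only a minimal illustration, not the cSplit_e implementation: it uses just the first three rows of the question's data, and the names item_list, all_items, indicator and result are made up for this example. It re-splits the Items strings, but the existing items.split list column could probably be used in place of item_list.

```r
# Small self-contained subset of the question's data (first three rows only).
df <- data.frame(
  ID = c(2729L, 3979L, 3860L),
  Items = c("Bicycle",
            "TV, Mobile Phone, Bicycle, Water Tank",
            "Mobile Phone, Bicycle, Fan"),
  stringsAsFactors = FALSE
)

# Split each Items string on a comma followed by optional whitespace.
item_list <- strsplit(df$Items, ",\\s*")

# All distinct items, in order of first appearance.
all_items <- unique(unlist(item_list))

# One row per ID, one column per item: 1 if the item is listed for that ID, else 0.
indicator <- do.call(
  rbind,
  lapply(item_list, function(x) as.integer(all_items %in% x))
)
colnames(indicator) <- all_items

# check.names = FALSE keeps column names with spaces, e.g. "Mobile Phone".
result <- data.frame(ID = df$ID, indicator, check.names = FALSE)
print(result)
```

Columns come out in order of first appearance instead of alphabetically, and they carry no Items_ prefix, which is closer to the layout requested in the question.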