Reshaping data in R when some values are missing


Username

I have a dataset (df) that looks like this.

ID  Variable    Value
A   Height  4
A   Height  4.5
A   Height  5
B   Height  5
B   Height  5.2
B   Height  5.3
C   Height  5.1
C   Height  5.1
C   Height  5.25
A   Weight  110
A   Weight  112
A   Weight  120
B   Weight  111
B   Weight  110
C   Weight  120
C   Weight  114
C   Weight  115

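Each variable is listed 3 times for every ID, except "Weight" for B, which is listed only twice. I need to cast it into the following form.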

ID  Height1  Height2  Height3  Weight1  Weight2  Weight3
A   4        4.5      5        110      112      120
B   5        5.2      5.3      111      110      .
C   5.1      5.1      5.25     120      114      115

Any ideas on how I can do this? Any help is appreciated.

aosmith

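If the heights and weights are already in the desired order within each ID, you can do the following. I used dplyr to add a variable that records the order of each height and weight measurement within its ID.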

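library(dplyr)
# dat is the data frame from the question; number the repeated
# measurements within each ID/Variable pair (1, 2, 3, ...)
dat = dat %>% group_by(ID, Variable) %>% mutate(seq = row_number()) %>% ungroup()

library(reshape2)
# One column per Variable/seq pair; cells with no value are filled with "."
datwide = dcast(dat, ID ~ Variable + seq, value.var = "Value", fill = ".")
# dcast joins the name parts with "_" (Height_1); drop it to get Height1
names(datwide) = sub("_", "", names(datwide))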

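The fill argument defines what to put in for missing values. I don't know of a way to drop the underscore from the names inside dcast itself, so I removed it afterwards with sub, as I have done in the past.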
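One side effect of the fill value is worth checking. The sketch below rebuilds the question's data (the construction with rep() is my own, not from the original post) and compares fill = "." with leaving fill unset. As far as I understand reshape2, a character fill turns the value columns into character, while the default leaves the gap as NA and keeps the columns numeric. The names wide_dot and wide_na are only illustrative.

```r
library(dplyr)
library(reshape2)

# Rebuild the example data from the question (assumed layout: 17 rows)
dat <- data.frame(
  ID       = rep(c("A", "B", "C", "A", "B", "C"), times = c(3, 3, 3, 3, 2, 3)),
  Variable = rep(c("Height", "Weight"), times = c(9, 8)),
  Value    = c(4, 4.5, 5, 5, 5.2, 5.3, 5.1, 5.1, 5.25,
               110, 112, 120, 111, 110, 120, 114, 115)
)

# Number the measurements within each ID/Variable pair
dat_seq <- dat %>%
  group_by(ID, Variable) %>%
  mutate(seq = row_number()) %>%
  ungroup() %>%
  as.data.frame()

# Character fill: the missing cell becomes "."
wide_dot <- dcast(dat_seq, ID ~ Variable + seq, value.var = "Value", fill = ".")

# No fill: the missing cell stays NA
wide_na <- dcast(dat_seq, ID ~ Variable + seq, value.var = "Value")

# Compare the type of the column that contains the gap
str(wide_dot$Weight_3)
str(wide_na$Weight_3)
```

If the wide table is going to be used for further calculation rather than display, keeping NA is probably the safer choice, and the "." can be added only when printing.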

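As @Beasterfield pointed out, it would be cleaner to simply append the number representing the sequence of heights and weights to the variable name. Because I am using Variable as a grouping variable, I couldn't modify it directly (this may be user error). Instead, I created Variable2 and used that in dcast.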

# Build the final column name (Height1, Weight2, ...) before reshaping
dat = dat %>% group_by(ID, Variable) %>% mutate(Variable2 = paste0(Variable, row_number())) %>% ungroup()
datwide = dcast(dat, ID ~ Variable2, value.var = "Value", fill = ".")
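For readers on a newer tidyverse, the same reshape can probably be done without reshape2. This is a swapped-in technique, not the method from the answer above: a minimal sketch using tidyr's pivot_wider (assuming tidyr 1.0.0 or later), with the data rebuilt by hand as an illustration. Setting names_sep = "" joins the variable name and the sequence number directly, so no sub() step is needed, and the missing cell is left as NA rather than ".".

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (assumed layout: 17 rows)
dat <- data.frame(
  ID       = rep(c("A", "B", "C", "A", "B", "C"), times = c(3, 3, 3, 3, 2, 3)),
  Variable = rep(c("Height", "Weight"), times = c(9, 8)),
  Value    = c(4, 4.5, 5, 5, 5.2, 5.3, 5.1, 5.1, 5.25,
               110, 112, 120, 111, 110, 120, 114, 115)
)

datwide <- dat %>%
  group_by(ID, Variable) %>%
  mutate(seq = row_number()) %>%   # 1, 2, 3 within each ID/Variable pair
  ungroup() %>%
  pivot_wider(
    names_from  = c(Variable, seq), # column names built from both parts
    values_from = Value,
    names_sep   = ""                # "Height" + "1" becomes "Height1"
  )

print(datwide)
```

As in the original answer, this relies on the rows already being in the desired order within each ID, because the sequence number is assigned by row position.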
