Reshaping a large data frame with multiple data types

Ztarrk

I'm struggling to get my data (an xlsx file) into the right shape. My original data set looks like this:

   patient when    age weight height watchID dateFrom           
   <chr>   <chr> <dbl> <dbl>   <dbl>   <dbl> <dttm>             
 1 T01     pre      82 83        174    2788 2017-07-24
 2 T02     pre      81 80        166    7309 2017-07-22 
 3 T02     post     67 91        163    7309 2017-10-26 
 4 T03     pre      68 91        172    5066 2017-07-26 
 5 T03     post     68 91        172    7220 2017-10-24

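I want a wide data set with a single row per patient ID, spread according to the "when" column. However, when I try to reshape it with the "dcast" function, this is what I end up with: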

   patient age_post age_pre weight_post weight_pre height_post height_pre
   <chr>      <int>   <int>       <int>      <int>       <int>      <int>
 1 T01            0       1           0          1           0          1
 2 T02            1       1           1          1           1          1
 3 T03            1       1           1          1           1          1
 4 T04            0       1           0          1           0          1
 5 T05            1       0           1          0           1          0

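It somehow changes all the variables to 1s and 0s. How can I get a similar data set that keeps the original variable types and has "pre" and "post" appended to the original column names?

Here is my code ("HW" is the original data set shown above):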

mdata <- melt(HW, id=c("patient","when"))
mdata$value <- as.numeric(as.character(mdata$value)) #I added this line to convert the column to numeric but it doesn't help
mdata2 <- dcast(mdata, patient~variable+when)

I also tried:

mdata <- melt(HW, id=c("patient","when"))
mdata3 <- reshape(mdata, idvar='patient', timevar='when', direction='wide')

But then I get this:

   patient variable.pre value.pre variable.post value.post
   <chr>   <fct>        <chr>     <fct>         <chr>     
 1 T01     age          82        NA            NA        
 2 T02     age          81        age           67        
 3 T03     age          68        age           68        
 4 T04     age          81        NA            NA        
 5 T05     NA           NA        age           87

None of the other variables appear.

Thanks in advance.

威慑11

Is this what you want?

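library(tidyr)  # provides pivot_wider() and re-exports tibble() and %>%

# Small reproduction of the data from the question
# (datefrom is stored as character here for simplicity)
df <- tibble(patient = c("T01","T02","T02","T03","T03"),
             when = c("pre","pre","post","pre","post"),
             age = c(82,81,67,68,68),
             weight = c(83,80,91,91,91),
             height = c(174,166,163,172,172),
             watchid = c(2788,7309,7309,5066,7220),
             datefrom = c("2017-07-24","2017-07-22","2017-10-26",
                          "2017-07-26","2017-10-24"))

# One row per patient; each value column gets a _pre / _post suffix
df %>%
  pivot_wider(names_from = when,
              values_from = c(age,weight,height,watchid,datefrom))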

# A tibble: 3 x 11
  patient age_pre age_post weight_pre weight_post height_pre height_post watchid_pre watchid_post
  <chr>     <dbl>    <dbl>      <dbl>       <dbl>      <dbl>       <dbl>       <dbl>        <dbl>
1 T01          82       NA         83          NA        174          NA        2788           NA
2 T02          81       67         80          91        166         163        7309         7309
3 T03          68       68         91          91        172         172        5066         7220
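As for why the 1s and 0s showed up in the first place, this is a guess rather than a diagnosis. melt() stacks every measured column into a single "value" column, so the mixed types get coerced to one common type. dcast() then falls back to length() as its aggregation function whenever some patient/when/variable combination occurs more than once, which turns every cell into a count. Going straight from the original table to pivot_wider() skips the melt step, so each column keeps its own type. Below is a minimal sketch under a few assumptions: tidyr 1.0.0 or later, "hw" as a hypothetical stand-in for your HW data, and dateFrom stored as a Date. It first checks that each patient/when pair is unique, then reshapes.

```r
library(tidyr)

# Hypothetical stand-in for the HW data from the question,
# with dateFrom stored as a real Date instead of character
hw <- tibble(
  patient  = c("T01", "T02", "T02", "T03", "T03"),
  when     = c("pre", "pre", "post", "pre", "post"),
  age      = c(82, 81, 67, 68, 68),
  weight   = c(83, 80, 91, 91, 91),
  height   = c(174, 166, 163, 172, 172),
  watchID  = c(2788, 7309, 7309, 5066, 7220),
  dateFrom = as.Date(c("2017-07-24", "2017-07-22", "2017-10-26",
                       "2017-07-26", "2017-10-24"))
)

# Each patient/when pair should occur only once.
# anyDuplicated() returns 0 when there are no repeated key pairs.
n_dup <- anyDuplicated(hw[c("patient", "when")])

# Reshape directly from the original table: no melt step,
# so numeric columns stay numeric and dates stay Date
wide <- pivot_wider(
  hw,
  id_cols     = patient,
  names_from  = when,
  values_from = c(age, weight, height, watchID, dateFrom)
)

# Inspect the class of each resulting column
sapply(wide, class)
```

If n_dup is not 0 on the real data, the repeated patient/when rows are worth looking at before reshaping, since pivot_wider() would then return list-columns (with a warning) instead of plain values.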
