Unnest a set of strings across columns but keep them in the original row in R


Ning

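I'm trying to find a way to unnest a set of strings across columns while keeping all the strings in the original row. I'm using the starwars dataset from dplyr as an example because its structure is similar to my own dataset's.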

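The starwars dataset has 3 nested (list) columns: films, vehicles, and starships. The usual approach is unnest_longer, which unnests a set of strings into multiple rows, one string per row. However, I want to keep all the unnested strings in the original row.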

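Another approach is to use rowwise() and mutate() with paste(). This works, but my dataset has 15 nested columns, so I would have to write 15 mutate lines with paste(). That's a bit tedious.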

library(dplyr)

# Collapse each list-column into one comma-separated string per row
df <- starwars %>%
  rowwise() %>%
  mutate(films = paste(films, collapse = ", "),
         vehicles = paste(vehicles, collapse = ", "),
         starships = paste(starships, collapse = ", "))

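My current idea is to write a wrapper function that I could then apply at scale with purrr. But my function is poorly written and doesn't work; perhaps I'm not familiar enough with how dplyr works under the hood.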

ungroup_string <- function(data, x){
  a <- rowwise(data)
  a %>% mutate(x = paste(x, collapse=','))
}

Is there a way to unnest these strings in bulk?

Ronak Shah

You can use across:

library(dplyr)

starwars %>%
  select(name, films, vehicles, starships) %>%
  rowwise() %>%
  mutate(across(c(films,vehicles, starships), toString))

#    name       films                                vehicles         starships                             
#   <chr>      <chr>                                <chr>            <chr>                                 
# 1 Luke Skyw… The Empire Strikes Back, Revenge of… "Snowspeeder, I… "X-wing, Imperial shuttle"            
# 2 C-3PO      The Empire Strikes Back, Attack of … ""               ""                                    
# 3 R2-D2      The Empire Strikes Back, Attack of … ""               ""                                    
# 4 Darth Vad… The Empire Strikes Back, Revenge of… ""               "TIE Advanced x1"                     
# 5 Leia Orga… The Empire Strikes Back, Revenge of… "Imperial Speed… ""                                    
# 6 Owen Lars  Attack of the Clones, Revenge of th… ""               ""                                    
# 7 Beru Whit… Attack of the Clones, Revenge of th… ""               ""                                    
# 8 R5-D4      A New Hope                           ""               ""                                    
# 9 Biggs Dar… A New Hope                           ""               "X-wing"                              
#10 Obi-Wan K… The Empire Strikes Back, Attack of … "Tribubble bong… "Jedi starfighter, Trade Federation c…
# … with 77 more rows

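across accepts tidy-select syntax, so you don't have to name each of the 15 columns one by one. You can select columns by position (1:15), by range (col1:col15), or by a pattern in their names (starts_with('col')).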
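
As a rough sketch of the wrapper function the question was aiming for (not part of the original answer), the selection can be passed through to across with the embrace operator. The helper name collapse_list_cols and its cols argument are hypothetical. This version also swaps rowwise() for vapply() over each list column, which is an assumption about what suits your data rather than the method shown above.

```r
library(dplyr)

# Hypothetical helper: collapse each selected list-column into one
# comma-separated string per row. `cols` takes any tidy-select expression.
collapse_list_cols <- function(data, cols) {
  data %>%
    mutate(across({{ cols }}, ~ vapply(.x, toString, character(1))))
}

# Select by name ...
by_name <- collapse_list_cols(starwars, c(films, vehicles, starships))

# ... or select every list-column at once, without naming any of them
by_type <- collapse_list_cols(starwars, where(is.list))
```

Empty elements become empty strings, as in the output above, because toString(character(0)) returns "".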
