如何使用map *和mutate将列表转换为一组其他列？

debugcn 发表于 Dev

弗朗西斯·巴顿

实际上，我已经尝试了数百种这种代码的排列，以期获得能够满足我想要的功能的函数，最后我放弃了。感觉绝对应该可行，而且我非常亲密！

我尝试通过下面的reprex回到这里的小问题。

基本上，我有一个单行数据帧，其中的 ~~一列包含~~ 一个字符串列表（“概念”）。我想为每个字符串创建一个附加列mutate，最好使用，使该列从字符串中获取名称，然后用函数调用的结果填充该列（对于哪个函数，这并不重要现在吗？-我只需要功能的基础结构即可。）

像往常一样，我觉得我必须丢失一些明显的东西……也许只是语法错误。我也想知道是否需要使用purrr::map，也许更简单的矢量化映射会正常工作。

我觉得新列是命名的，..1而不是概念名称，这是关于什么地方出错的线索。

我可以通过手动调用每个概念来创建所需的数据框架（请参阅reprex的结尾），但是由于不同数据框架的概念列表不同，因此我想使用管道和tidyverse技术对其进行功能化，而不是手动进行。

我已阅读以下问题以寻求帮助：

但是这些都没有帮助我解决我遇到的问题。[编辑：在最后一个q中添加了该列表，这可能是我需要的技术]。

<!-- language-all: lang-r -->


    # load packages -----------------------------------------------------------

    library(rlang)
    library(dplyr)
    library(tidyr)
    library(magrittr)
    library(purrr)
    library(nomisr)



    # set up initial list of tibbles ------------------------------------------

    df <- list(
      district_population = tibble(
        dataset_title = "Population estimates - local authority based by single year",
        dataset_id = "NM_2002_1"
      ),
      jsa_claimants = tibble(
        dataset_title = "Jobseeker\'s Allowance with rates and proportions",
        dataset_id = "NM_1_1"
      )
    )


    # just use the first tibble for now, for testing --------------------------
    # ideally I want to map across dfs through a list -------------------------

    df <- df[[1]]

    # nitty gritty functions --------------------------------------------------

    get_concept_list <- function(df) {
      dataset_id <- pluck(df, "dataset_id")
      nomis_overview(id = dataset_id,
                     select = c("dimensions", "codes")) %>%
        pluck("value", 1, "dimension") %>%
        filter(!concept == "geography") %>%
        pull("concept")
    }

    # get_concept_list() returns the strings I need:
    get_concept_list(df)
    #> [1] "time"     "gender"   "c_age"    "measures"

    # Here is a list of examples of types of map* that do various things,
    # none of which is what I need it to do
    # I'm using toupper() here for simplicity - ultimately I will use
    # get_concept_info() to populate the new columns

    # this creates four new tibbles
    get_concept_list(df) %>% 
      map(~ mutate(df, {{.x}} := toupper(.x)))
    #> [[1]]
    #> # A tibble: 1 x 3
    #>   dataset_title                                               dataset_id ..1  
    #>   <chr>                                                       <chr>      <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1  TIME 
    #> 
    #> [[2]]
    #> # A tibble: 1 x 3
    #>   dataset_title                                               dataset_id ..1   
    #>   <chr>                                                       <chr>      <chr> 
    #> 1 Population estimates - local authority based by single year NM_2002_1  GENDER
    #> 
    #> [[3]]
    #> # A tibble: 1 x 3
    #>   dataset_title                                               dataset_id ..1  
    #>   <chr>                                                       <chr>      <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1  C_AGE
    #> 
    #> [[4]]
    #> # A tibble: 1 x 3
    #>   dataset_title                                               dataset_id ..1    
    #>   <chr>                                                       <chr>      <chr>  
    #> 1 Population estimates - local authority based by single year NM_2002_1  MEASUR~

    # this throws an error
    get_concept_list(df) %>% 
      map_chr(~ mutate(df, {{.x}} := toupper(.x)))
    #> Error: Result 1 must be a single string, not a vector of class `tbl_df/tbl/data.frame` and of length 3

    # this creates three extra rows in the tibble
    get_concept_list(df) %>% 
      map_df(~ mutate(df, {{.x}} := toupper(.x)))
    #> # A tibble: 4 x 3
    #>   dataset_title                                               dataset_id ..1    
    #>   <chr>                                                       <chr>      <chr>  
    #> 1 Population estimates - local authority based by single year NM_2002_1  TIME   
    #> 2 Population estimates - local authority based by single year NM_2002_1  GENDER 
    #> 3 Population estimates - local authority based by single year NM_2002_1  C_AGE  
    #> 4 Population estimates - local authority based by single year NM_2002_1  MEASUR~

    # this does the same as map_df
    get_concept_list(df) %>% 
      map_dfr(~ mutate(df, {{.x}} := toupper(.x)))
    #> # A tibble: 4 x 3
    #>   dataset_title                                               dataset_id ..1    
    #>   <chr>                                                       <chr>      <chr>  
    #> 1 Population estimates - local authority based by single year NM_2002_1  TIME   
    #> 2 Population estimates - local authority based by single year NM_2002_1  GENDER 
    #> 3 Population estimates - local authority based by single year NM_2002_1  C_AGE  
    #> 4 Population estimates - local authority based by single year NM_2002_1  MEASUR~

    # this creates a single tibble 12 columns wide
    get_concept_list(df) %>% 
      map_dfc(~ mutate(df, {{.x}} := toupper(.x)))
    #> # A tibble: 1 x 12
    #>   dataset_title dataset_id ..1   dataset_title1 dataset_id1 ..11  dataset_title2
    #>   <chr>         <chr>      <chr> <chr>          <chr>       <chr> <chr>         
    #> 1 Population e~ NM_2002_1  TIME  Population es~ NM_2002_1   GEND~ Population es~
    #> # ... with 5 more variables: dataset_id2 <chr>, ..12 <chr>,
    #> #   dataset_title3 <chr>, dataset_id3 <chr>, ..13 <chr>

    # function to get info on each concept (except geography) -----------------
    # this is the function I want to use eventually to populate my new columns

    get_concept_info <- function(df, concept_name) {
      dataset_id <- pluck(df, "dataset_id")
      nomis_overview(id = dataset_id) %>%
        filter(name == "dimensions") %>%
        pluck("value", 1, "dimension") %>%
        filter(concept == concept_name) %>%
        pluck("codes.code", 1) %>%
        select(name, value) %>%
        nest(data = everything()) %>%
        as.list() %>%
        pluck("data")
    }


    # individual mutate works, for comparison ---------------------------------
    # I can create the kind of table I want manually using a line like the one below

    # df %>% map(~ mutate(., measures = get_concept_info(., concept_name = "measures")))
    df %>% mutate(., measures = get_concept_info(df, "measures"))
    #> # A tibble: 1 x 3
    #>   dataset_title                                        dataset_id measures      
    #>   <chr>                                                <chr>      <list>        
    #> 1 Population estimates - local authority based by sin~ NM_2002_1  <tibble [2 x ~

<sup>Created on 2020-02-10 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

尤金C

使用!!和:=使您可以动态命名列。然后，我们可以减少map()with的列表输出reduce()，即使用数据集标题和id列left_joins（）列表中的所有数据框。

df_2 <- 
  map(get_concept_list(df),
      ~ mutate(df,
               !!.x := get_concept_info(df, .x))) %>% 
  reduce(left_join, by = c("dataset_title", "dataset_id"))

df_2

# A tibble: 1 x 6
  dataset_title                                               dataset_id           time         gender          c_age       measures
  <chr>                                                       <chr>      <list<df[,2]>> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>>
1 Population estimates - local authority based by single year NM_2002_1        [28 x 2]        [3 x 2]      [121 x 2]        [2 x 2]

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。