I have the following data frame:
lineups <- tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic",
NA, "Justise Winslow", "Derrick Jones Jr.",
NA, "Meyers Leonard", "Kelly Olynyk",
NA, "Bam Adebayo", "Justise Winslow",
NA, "Tyler Herro", "Duncan Robinson",
NA, "Derrick Jones Jr.", "Bam Adebayo",
NA, "Goran Dragic", "Kendrick Nunn",
NA, "Justise Winslow", "Tyler Herro",
NA, "Kelly Olynyk", "Meyers Leonard",
NA, "Bam Adebayo", "Justise Winslow"
)
Then I create a column:
lineups %>%
mutate(lineupAfter = str_replace(lineupBefore, playerOut, playerIn))
The result is:
tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn, ~lineupAfter,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic", "Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic",
NA, "Justise Winslow", "Derrick Jones Jr.", NA,
NA, "Meyers Leonard", "Kelly Olynyk", NA,
NA, "Bam Adebayo", "Justise Winslow", NA,
NA, "Tyler Herro", "Duncan Robinson", NA,
NA, "Derrick Jones Jr.", "Bam Adebayo", NA,
NA, "Goran Dragic", "Kendrick Nunn", NA,
NA, "Justise Winslow", "Tyler Herro", NA,
NA, "Kelly Olynyk", "Meyers Leonard", NA,
NA, "Bam Adebayo", "Justise Winslow", NA
)
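Now I want to set each NA in lineupBefore to the previous row's value of lineupAfter, and then apply the same function that created lineupAfter to that new lineupBefore value. If I try to do this with mutate, it only fills in the first NA row. So I need the function to work row by row: compute one row, turn its NA into a real value, and only then move on to the next row. I think I need purrr for this, but I don't know how. Any help would be appreciated!
Edit:
This is what the first 5 rows should look like: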
tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn, ~lineupAfter,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic", "Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic",
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic", "Justise Winslow", "Derrick Jones Jr.", "Derrick Jones Jr., Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic",
"Derrick Jones Jr., Bam Adebayo, Meyers Leonard, Tyler Herro, Goran Dragic", "Meyers Leonard", "Kelly Olynyk", "Derrick Jones Jr., Bam Adebayo, Kelly Olynyk, Tyler Herro, Goran Dragic",
"Derrick Jones Jr., Bam Adebayo, Kelly Olynyk, Tyler Herro, Goran Dragic", "Bam Adebayo", "Justise Winslow", "Derrick Jones Jr., Justise Winslow, Kelly Olynyk, Tyler Herro, Goran Dragic",
"Derrick Jones Jr., Justise Winslow, Kelly Olynyk, Tyler Herro, Goran Dragic", "Tyler Herro", "Duncan Robinson", "Derrick Jones Jr., Justise Winslow, Kelly Olynyk, Duncan Robinson, Goran Dragic"
)
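As you can see, row 2 of lineupBefore equals row 1 of lineupAfter, row 3 of lineupBefore equals row 2 of lineupAfter, and so on. Likewise, row 2 of lineupAfter is the result of applying str_replace(lineupBefore, playerOut, playerIn) to the filled-in row 2 of lineupBefore, and so on.
You asked for a piped, {purrr}-style approach. What you are doing here is accumulating changes from one lineup to the next, so you want purrr::accumulate and setdiff.
I think it is much easier to make your lineup* columns list columns rather than strings like these. That means storing a vector of names in each row of the column, instead of a single string with commas in it.
Starting from your first lineups table: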
library(stringr)
library(dplyr)
library(purrr)
lineups <-
tibble::tribble(
~lineupBefore, ~playerOut, ~playerIn,
"Justise Winslow, Bam Adebayo, Meyers Leonard, Tyler Herro, Kendrick Nunn", "Kendrick Nunn", "Goran Dragic",
NA, "Justise Winslow", "Derrick Jones Jr.",
NA, "Meyers Leonard", "Kelly Olynyk",
NA, "Bam Adebayo", "Justise Winslow",
NA, "Tyler Herro", "Duncan Robinson",
NA, "Derrick Jones Jr.", "Bam Adebayo",
NA, "Goran Dragic", "Kendrick Nunn",
NA, "Justise Winslow", "Tyler Herro",
NA, "Kelly Olynyk", "Meyers Leonard",
NA, "Bam Adebayo", "Justise Winslow"
)
lineups_list <-
lineups %>%
mutate(lineupBefore = str_split(lineupBefore, ", "))
lineups_list
# A tibble: 10 x 3
   lineupBefore playerOut         playerIn
   <list>       <chr>             <chr>
 1 <chr [5]>    Kendrick Nunn     Goran Dragic
 2 <chr [1]>    Justise Winslow   Derrick Jones Jr.
 3 <chr [1]>    Meyers Leonard    Kelly Olynyk
 4 <chr [1]>    Bam Adebayo       Justise Winslow
 5 <chr [1]>    Tyler Herro       Duncan Robinson
 6 <chr [1]>    Derrick Jones Jr. Bam Adebayo
 7 <chr [1]>    Goran Dragic      Kendrick Nunn
 8 <chr [1]>    Justise Winslow   Tyler Herro
 9 <chr [1]>    Kelly Olynyk      Meyers Leonard
10 <chr [1]>    Bam Adebayo       Justise Winslow
So now you have a lineupBefore column whose first element is a vector of length 5, and every length-1 row holds a single NA value.
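The operation we want takes the initial length-5 vector and adds the playerIn names to it one after another (c(initial_players, new_player), over and over). If we had an endless basketball game that only ever added players, that is what we would get. purrr::accumulate does exactly this and returns the result at every step.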
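To make the add-only step concrete, here is a minimal sketch with made-up data (the names initial_players and new_players and the single-letter players are placeholders, not from your table):

```r
library(purrr)

# Hypothetical toy roster, not taken from the question
initial_players <- c("A", "B", "C")
new_players <- c("D", "E")

# Each step appends the next name to the running roster;
# .init supplies the starting roster and becomes the first element
rosters <- accumulate(new_players, ~c(.x, .y), .init = initial_players)
rosters
```

Here rosters is a list of three vectors: the starting roster, the roster after adding "D", and the roster after adding "E".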
But we also want to remove the playerOut player at each step. That is the same as doing setdiff(current_players, removed_player) over and over. To do both operations at once, we use purrr::accumulate2.
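A single substitution, written out by hand, might look like the sketch below (toy names again, purely illustrative). This is the one step that accumulate2 then repeats for every row:

```r
# Hypothetical toy roster, not taken from the question
current_players <- c("A", "B", "C", "D")

# Removing a player on its own
without_b <- setdiff(current_players, "B")

# One full substitution: add the incoming player, then drop the outgoing one
after_sub <- setdiff(c(current_players, "E"), "B")
after_sub
```

Note that the incoming player lands at the end of the vector rather than in the outgoing player's slot.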
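The function we pass to it is applied sequentially, with the arguments ..1, ..2, and ..3; the result of the previous step becomes ..1 of the next step. The first vector we pass in is playerIn, so that is ..2, which gets added to the result each time. The second vector is playerOut, so that is ..3, which setdiff removes each time. And we have to initialize it with the starting roster (lineupBefore[[1]]), otherwise the accumulation would not start from the actual roster.
You can see what the output looks like with something like this: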
x <- lineups_list$playerIn
y <- lineups_list$playerOut
accumulate2(
x, y, ~setdiff(c(..1, ..2), ..3),
.init = lineups_list$lineupBefore[[1]]
)
[[1]]
[1] "Justise Winslow" "Bam Adebayo" "Meyers Leonard" "Tyler Herro" "Kendrick Nunn"

[[2]]
[1] "Justise Winslow" "Bam Adebayo" "Meyers Leonard" "Tyler Herro" "Goran Dragic"

[[3]]
[1] "Bam Adebayo" "Meyers Leonard" "Tyler Herro" "Goran Dragic" "Derrick Jones Jr."

[[4]]
[1] "Bam Adebayo" "Tyler Herro" "Goran Dragic" "Derrick Jones Jr." "Kelly Olynyk"

[[5]]
[1] "Tyler Herro" "Goran Dragic" "Derrick Jones Jr." "Kelly Olynyk" "Justise Winslow"

[[6]]
[1] "Goran Dragic" "Derrick Jones Jr." "Kelly Olynyk" "Justise Winslow" "Duncan Robinson"

[[7]]
[1] "Goran Dragic" "Kelly Olynyk" "Justise Winslow" "Duncan Robinson" "Bam Adebayo"

[[8]]
[1] "Kelly Olynyk" "Justise Winslow" "Duncan Robinson" "Bam Adebayo" "Kendrick Nunn"

[[9]]
[1] "Kelly Olynyk" "Duncan Robinson" "Bam Adebayo" "Kendrick Nunn" "Tyler Herro"

[[10]]
[1] "Duncan Robinson" "Bam Adebayo" "Kendrick Nunn" "Tyler Herro" "Meyers Leonard"

[[11]]
[1] "Duncan Robinson" "Kendrick Nunn" "Tyler Herro" "Meyers Leonard" "Justise Winslow"
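However, this is a list of length 11. That is because we started with an .init argument, which counts as one of the steps. You may then notice that elements 2-11 are the ones you want for lineupAfter, and elements 1-10 are the ones you want for lineupBefore. So you can use the same function to compute both, and just cut off either the first or the last element. (As a side note, you could instead offset one column from the other with some version of lead/lag, which would save you from computing the function twice. But I left it this way to show the parallel structure.)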
lineups_list_filled <- lineups_list %>%
  mutate(
    lineupAfter = accumulate2(
      playerIn, playerOut, ~setdiff(c(..1, ..2), ..3),
      .init = lineupBefore[[1]]
    )[-1], # [-1] drops the first element (the starting roster)
    lineupBefore = accumulate2(
      playerIn, playerOut, ~setdiff(c(..1, ..2), ..3),
      .init = lineupBefore[[1]]
    )[-(length(playerIn) + 1)] # drops the last of the 11 elements
  )
lineups_list_filled
# A tibble: 10 x 4
   lineupBefore playerOut         playerIn          lineupAfter
   <list>       <chr>             <chr>             <list>
 1 <chr [5]>    Kendrick Nunn     Goran Dragic      <chr [5]>
 2 <chr [5]>    Justise Winslow   Derrick Jones Jr. <chr [5]>
 3 <chr [5]>    Meyers Leonard    Kelly Olynyk      <chr [5]>
 4 <chr [5]>    Bam Adebayo       Justise Winslow   <chr [5]>
 5 <chr [5]>    Tyler Herro       Duncan Robinson   <chr [5]>
 6 <chr [5]>    Derrick Jones Jr. Bam Adebayo       <chr [5]>
 7 <chr [5]>    Goran Dragic      Kendrick Nunn     <chr [5]>
 8 <chr [5]>    Justise Winslow   Tyler Herro       <chr [5]>
 9 <chr [5]>    Kelly Olynyk      Meyers Leonard    <chr [5]>
10 <chr [5]>    Bam Adebayo       Justise Winslow   <chr [5]>
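The side note about not computing the accumulation twice could be sketched as follows. This is only one way to do it, shown on a small made-up table (subs, steps, and subs_filled are hypothetical names): run accumulate2 once, then slice the result with head() and tail().

```r
library(dplyr)
library(purrr)
library(stringr)

# Small hypothetical table in the same shape as `lineups`
subs <- tibble::tribble(
  ~lineupBefore, ~playerOut, ~playerIn,
  "A, B, C",     "C",        "D",
  NA,            "A",        "E"
)

# Run the accumulation once; `steps` has nrow(subs) + 1 elements
steps <- accumulate2(
  subs$playerIn, subs$playerOut,
  ~setdiff(c(..1, ..2), ..3),
  .init = str_split(subs$lineupBefore[1], ", ")[[1]]
)

subs_filled <- subs %>%
  mutate(
    lineupBefore = head(steps, -1), # everything except the last element
    lineupAfter  = tail(steps, -1)  # everything except the first element
  )
```

Using head(x, -1) and tail(x, -1) avoids having to work out the index of the last element by hand.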
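If you look at lineups_list_filled$lineupBefore and lineups_list_filled$lineupAfter, you will see that they match the corresponding elements of the length-11 list above. If you want to collapse them back into strings, for example for printing, you can always do this: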
lineups_list_filled %>%
mutate_all(
~map_chr(., ~paste(.x, collapse = ", "))
)
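If you would rather keep the comma-separated strings throughout, so that each incoming player takes the outgoing player's slot as in your expected output, the same accumulation idea could be run on strings instead of vectors. This is a sketch with made-up data, and a different technique from the list-column approach above: it uses str_replace with fixed() so that the "." in a name like "Derrick Jones Jr." is not treated as a regex wildcard. It assumes no player's name is a substring of another player's name.

```r
library(purrr)
library(stringr)

# Hypothetical toy data, not taken from the question
start <- "A, B, C"
player_out <- c("C", "A")
player_in <- c("D", "E")

# ..1 is the running lineup string, ..2 the outgoing name, ..3 the incoming name
steps <- accumulate2(
  player_out, player_in,
  ~str_replace(..1, fixed(..2), ..3),
  .init = start
)
steps <- unlist(steps) # plain character vector, whether or not purrr simplified it
steps
```

As before, the result has one more element than there are substitutions, so the "before" and "after" columns are the result without its last and without its first element.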
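PS: This method only works when the elements cannot repeat (such as individual players on a roster). If you did this with arbitrary integers, for example, you could not have 3 twice, because setdiff calls unique first. In that case you could build your own version of setdiff using match and which, with some error checking for edge cases.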
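As a rough idea of what such a replacement might look like, here is a minimal sketch (remove_one is a hypothetical helper name, not part of any package). It drops only the first occurrence of a value and keeps any duplicates that remain:

```r
# Hypothetical helper: remove only the first occurrence of `value` from `x`
remove_one <- function(x, value) {
  i <- match(value, x) # position of the first match, or NA if absent
  if (is.na(i)) {
    stop("value not found: ", value)
  }
  x[-i]
}

remaining <- remove_one(c(3, 1, 3, 2), 3)
remaining
```

Inside the accumulate2 call you would then use something like ~remove_one(c(..1, ..2), ..3) in place of the setdiff call. Whether a missing value should raise an error or be silently ignored is a design choice that depends on your data.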