Can I replace NAs when joining two data frames with dplyr?

qdread

I would like to join two data frames. Some of the column names overlap, and one of the data frames has NA entries in an overlapping column. Here is a simplified example:

df1 <- data.frame(fruit = c('apples','oranges','bananas','grapes'), var1 = c(1,2,3,4), var2 = c(3,NA,6,NA), stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c('oranges','grapes'), var2=c(5,6), var3=c(7,8), stringsAsFactors = FALSE)

Can I use dplyr join functions to join these data frames and automatically prioritize the non-NA entry so that I get the "var2" column to have no NA entries in the joined data frame? As it is now, if I call left_join, it keeps the NA entries, and if I call full_join it duplicates the rows.

Psidom

coalesce might be what you need. It returns the first non-NA value at each position, so NAs in the first vector are filled with the values at the corresponding positions in the second vector:

library(dplyr)
df1 %>% 
        left_join(df2, by = "fruit") %>% 
        mutate(var2 = coalesce(var2.x, var2.y)) %>% 
        select(-var2.x, -var2.y)

#     fruit var1 var3 var2
# 1  apples    1   NA    3
# 2 oranges    2    7    5
# 3 bananas    3   NA    6
# 4  grapes    4    8    6
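To see what coalesce is doing in the mutate step, here is a minimal sketch on two plain vectors. The names x and y are only illustrative: they stand in for var2.x (from df1) and var2.y (from df2, aligned to df1's rows by the left join).

```r
library(dplyr)

x <- c(3, NA, 6, NA)   # var2 as it appears in df1
y <- c(NA, 5, NA, 6)   # var2 from df2 after the left join (NA where no match)

# Element-wise: take x where it is not NA, otherwise fall back to y
filled <- coalesce(x, y)
filled
# [1] 3 5 6 6
```

Because x comes first, a value already present in df1 wins over the one from df2; swapping the arguments would give df2 priority instead.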

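Or use data.table, which updates df1 in place (by reference). Wrapping the assignment in fcoalesce keeps any non-NA value already in df1 instead of overwriting it for every matched row:

library(data.table)
# Update join: for rows of df1 that match df2 on fruit, fill var2 only where it is NA, and add var3
setDT(df1)[setDT(df2), on = "fruit", `:=` (var2 = fcoalesce(var2, i.var2), var3 = i.var3)]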
df1
#      fruit var1 var2 var3
# 1:  apples    1    3   NA
# 2: oranges    2    5    7
# 3: bananas    3    6   NA
# 4:  grapes    4    6    8
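If several columns overlap, writing one coalesce call per column gets repetitive. The following is a rough sketch of how the dplyr approach might be generalized. The helper name coalesce_join is hypothetical (it is not part of dplyr), and it assumes by is a plain character vector of key columns present in both data frames.

```r
library(dplyr)

df1 <- data.frame(fruit = c('apples','oranges','bananas','grapes'),
                  var1 = c(1,2,3,4), var2 = c(3,NA,6,NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c('oranges','grapes'), var2 = c(5,6), var3 = c(7,8),
                  stringsAsFactors = FALSE)

# Hypothetical helper: left-join y onto x, then coalesce every non-key
# column that exists in both inputs, preferring the value from x.
coalesce_join <- function(x, y, by) {
  joined <- left_join(x, y, by = by, suffix = c(".x", ".y"))
  shared <- setdiff(intersect(names(x), names(y)), by)
  for (col in shared) {
    from_x <- paste0(col, ".x")
    from_y <- paste0(col, ".y")
    joined[[col]] <- coalesce(joined[[from_x]], joined[[from_y]])
    joined[[from_x]] <- NULL   # drop the suffixed helper columns
    joined[[from_y]] <- NULL
  }
  joined
}

result <- coalesce_join(df1, df2, by = "fruit")
result
```

On the example data this should give the same table as the explicit mutate/select version, with var2 free of NAs. The overlapping columns need compatible types in both data frames, since coalesce will not combine, say, a character column with a numeric one.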
