Conditional binary join and update by reference using the data.table package

David Arenburg

So here is my real life problem which I feel like can be easily solved and I'm missing something obvious here. I have two big data sets called TK and DFT

library(data.table)
set.seed(123)
(TK <- data.table(venue_id = rep(1:3, each = 2), 
                  DFT_id = rep(1:3, 2), 
                  New_id = sample(1e4, 6),
                  key = "DFT_id"))

#    venue_id DFT_id New_id
# 1:        1      1   2876
# 2:        1      2   7883
# 3:        2      3   4089
# 4:        2      1   8828
# 5:        3      2   9401
# 6:        3      3    456

(DFT <- data.table(venue_id = rep(1:2, each = 2), 
                   DFT_id = 1:4, 
                   New_id = sample(4),
                   key = "DFT_id"))

#    venue_id DFT_id New_id
# 1:        1      1      3
# 2:        1      2      4
# 3:        2      3      2
# 4:        2      4      1

I want to perform a binary left join to TK on the DFT_id column when venue_id %in% 1:2, while updating New_id by reference. In other words, the desired result would be

TK
#    venue_id DFT_id New_id
# 1:        1      1      3
# 2:        2      1      3
# 3:        1      2      4
# 4:        3      2   9401
# 5:        2      3      2
# 6:        3      3    456

I was thinking to combine both conditions, but it didn't work (still not sure why)

TK[venue_id %in% 1:2 & DFT, New_id := i.New_id][]
# Error in `[.data.table`(TK, DFT & venue_id %in% 1:2, `:=`(New_id, i.New_id)) : 
#   i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). 
# Please let datatable-help know if you'd like this, or add your comments to FR #1611.

My next idea was to use chaining which partially achieves the goal by joining correctly but on some temporary table without actually affecting TK

TK[venue_id %in% 1:2][DFT, New_id := i.New_id][]
TK
#    venue_id DFT_id New_id
# 1:        1      1   2876
# 2:        2      1   8828
# 3:        1      2   7883
# 4:        3      2   9401
# 5:        2      3   4089
# 6:        3      3    456

So to make clear, I'm well aware that I can split TK into two tables, perform the join and then rbind again, but I'm doing many different conditional joins like this and I'm also looking for both speed and memory efficient solutions.

This also means that I am not looking for a dplyr solution as I'm trying to use both binary join and the update by reference features which only exist in the data.table package IIRC.


For additional information see these vignettes:

DaveTurek

Copying from Arun's updated answer here

TK[venue_id %in% 1:2, New_id := DFT[.SD, New_id]][]
#    venue_id DFT_id New_id
# 1:        1      1      3
# 2:        2      1      3
# 3:        1      2      4
# 4:        3      2   9401
# 5:        2      3      2
# 6:        3      3    456

His answer gives the details of what is going on.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Java

Update a table using JOIN in SQL Server?

From Dev

data.table with rolling join calculating average conditional on date

From Dev

Use ListView control to display data but join a reference table

From Dev

Conditional (inequality) join in data.table

From Dev

Removing rows by reference using data.table?

From Dev

Update a table with join?

From Dev

Update column in related table with data from a join

From Dev

Using conditional statements in r data.table

From Dev

Conditional Table Join In SQL Server

From Dev

update column on join table

From Dev

Join tables and fetch the data to a table using codeigniter

From Dev

Left join using data.table

From Dev

Cannot update a table using a simple inner join

From Dev

Ordering factors in data table using DT package

From Dev

Select from a table using a reference from a linked table (Join within a Join?)

From Dev

Using reference table fields to update main table with SQL

From Dev

Redshift table update with join

From Dev

Conditional UPDATE statement with JOIN

From Dev

Update by reference with rolling join

From Dev

Using Postgres: Create a Column of Binary Data Conditional on Data in Another Table

From Dev

MYSQL conditional update table

From Dev

Conditional update in an database table using 2 separate buttons

From Dev

Conditional data manipulation using data.table in R

From Dev

Data.Table: conditionally update values in .SD by reference

From Dev

Conditional data binding on table

From Dev

Using parameters within data.table column update by reference

From Dev

Update table using join and conditions(where) from another table

From Dev

One table conditional join

From Dev

MySQL multiple table join Update and Delete where joined data unknown

Related Related

  1. 1

    Update a table using JOIN in SQL Server?

  2. 2

    data.table with rolling join calculating average conditional on date

  3. 3

    Use ListView control to display data but join a reference table

  4. 4

    Conditional (inequality) join in data.table

  5. 5

    Removing rows by reference using data.table?

  6. 6

    Update a table with join?

  7. 7

    Update column in related table with data from a join

  8. 8

    Using conditional statements in r data.table

  9. 9

    Conditional Table Join In SQL Server

  10. 10

    update column on join table

  11. 11

    Join tables and fetch the data to a table using codeigniter

  12. 12

    Left join using data.table

  13. 13

    Cannot update a table using a simple inner join

  14. 14

    Ordering factors in data table using DT package

  15. 15

    Select from a table using a reference from a linked table (Join within a Join?)

  16. 16

    Using reference table fields to update main table with SQL

  17. 17

    Redshift table update with join

  18. 18

    Conditional UPDATE statement with JOIN

  19. 19

    Update by reference with rolling join

  20. 20

    Using Postgres: Create a Column of Binary Data Conditional on Data in Another Table

  21. 21

    MYSQL conditional update table

  22. 22

    Conditional update in an database table using 2 separate buttons

  23. 23

    Conditional data manipulation using data.table in R

  24. 24

    Data.Table: conditionally update values in .SD by reference

  25. 25

    Conditional data binding on table

  26. 26

    Using parameters within data.table column update by reference

  27. 27

    Update table using join and conditions(where) from another table

  28. 28

    One table conditional join

  29. 29

    MySQL multiple table join Update and Delete where joined data unknown

HotTag

Archive