Weighted means for several columns, by groups (in a data.table)

Peutch Published at Dev

Peutch

This question follows another one on group weighted means: I would like to compute weighted within-group averages using data.table. The difference from the initial question is that the names of the variables to be averaged are specified in a string vector.

The data:

df <- read.table(text= "
          region    state  county  weights y1980  y1990  y2000
             1        1       1       10     100    200     50
             1        1       2        5      50    100    200
             1        1       3      120    1000    500    250
             1        1       4        2      25    100    400
             1        1       4       15     125    150    200
             2        2       1        1      10     50    150
             2        2       2       10      10     10    200
             2        2       2       40      40    100     30
             2        2       3       20     100    100     10
", header=TRUE, na.strings=NA)

Using Roland's suggested answer from the aforementioned question:

library(data.table)
dt <- as.data.table(df)
dt2 <- dt[,lapply(.SD,weighted.mean,w=weights),by=list(region,state,county)]
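Within each group, weighted.mean(x, w) computes sum(x * w) / sum(w). As a quick base-R check, here is that calculation for the two rows of region 1, county 4 (values copied from the data above):

```r
# The two rows of region 1, county 4 from the example data
x1980 <- c(25, 125)   # y1980 values
w     <- c(2, 15)     # weights

# weighted.mean() and the explicit formula sum(x * w) / sum(w) agree
by_function <- weighted.mean(x1980, w)
by_formula  <- sum(x1980 * w) / sum(w)   # (50 + 1875) / 17 = 1925 / 17

print(c(by_function, by_formula))
```

Both values equal 1925/17, about 113.2353, which is the y1980 entry for that group in the answer's output further down.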

I have a character vector that dynamically determines the columns for which I want the within-group weighted average:

colsToKeep = c("y1980","y1990")

But I do not know how to pass it as an argument for the data.table magic.

I tried

 dt[, lapply(
       as.list(colsToKeep), weighted.mean, w = weights),
     by = list(region, state, county)]

but I then get:

Error in x * w : non-numeric argument to binary operator

Not sure how to achieve what I want.
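The error comes from what lapply() iterates over: as.list(colsToKeep) is a list of character strings, so weighted.mean() receives the string "y1980" itself rather than the values of that column, and multiplying a string by the weights fails. A minimal base-R reproduction (the weight 10 and the value 100 are the region 1, county 1 row of the data):

```r
cols <- c("y1980", "y1990")

# lapply(as.list(cols), weighted.mean, w = ...) hands each *string* to
# weighted.mean(), so x is "y1980" rather than the y1980 values.
msg <- tryCatch(
  weighted.mean(cols[[1]], w = 10),
  error = function(e) conditionMessage(e)
)
print(msg)  # the x * w multiplication fails on a character x

# Given the numeric values instead of the name, the same call works.
y1980_values <- 100   # y1980 for region 1, county 1 (weight 10)
ok <- weighted.mean(y1980_values, w = 10)
print(ok)
```

So the fix is to hand weighted.mean() the columns' values, looked up by name, rather than the names themselves.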

Bonus question: I'd like the original column names to be kept, instead of getting V1 and V2.

NB: I use version 1.9.3 of the data.table package.

Arun

Normally, you should be able to do:

dt2 <- dt[,lapply(.SD,weighted.mean,w=weights), 
          by = list(region,state,county), .SDcols = colsToKeep]

i.e., just by providing those columns to .SDcols. At the moment, however, this won't work due to a bug: the weights column isn't available in j because it's not specified in .SDcols.

Until it's fixed, we can accomplish this as follows:

dt2 <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights), 
            by = list(region, state, county)]
#    region state county     y1980    y1990
# 1:      1     1      1  100.0000 200.0000
# 2:      1     1      2   50.0000 100.0000
# 3:      1     1      3 1000.0000 500.0000
# 4:      1     1      4  113.2353 144.1176
# 5:      2     2      1   10.0000  50.0000
# 6:      2     2      2   34.0000  82.0000
# 7:      2     2      3  100.0000 100.0000
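This workaround also answers the bonus question: mget(colsToKeep) returns a list named after the columns, and lapply() keeps those names, so the result has y1980 and y1990 instead of V1 and V2. Another way around the .SDcols bug, sketched here as an assumption (it is not part of the original answer, assumes data.table is installed, and has not been checked against 1.9.3 specifically), is to add weights to .SDcols yourself and pick the value columns out of .SD by name:

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  region  = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  state   = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  county  = c(1, 2, 3, 4, 4, 1, 2, 2, 3),
  weights = c(10, 5, 120, 2, 15, 1, 10, 40, 20),
  y1980   = c(100, 50, 1000, 25, 125, 10, 10, 40, 100),
  y1990   = c(200, 100, 500, 100, 150, 50, 10, 100, 100),
  y2000   = c(50, 200, 250, 400, 200, 150, 200, 30, 10)
)
colsToKeep <- c("y1980", "y1990")

# weights is listed in .SDcols, so .SD always contains it.
# Iterating over a *named* character vector makes lapply() return a named
# list, so the result columns keep their original names.
dt2 <- dt[, lapply(setNames(colsToKeep, colsToKeep),
                   function(col) weighted.mean(.SD[[col]], w = .SD[["weights"]])),
          by = list(region, state, county),
          .SDcols = c(colsToKeep, "weights")]
print(dt2)
```

The design choice here is to look columns up by name inside .SD rather than relying on weights being visible as a separate variable in j, which is exactly what the bug breaks.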

edited at 2020-11-18