Weighted means for several columns, by groups (in a data.table)

Peutch

This question follows another one on group weighted means: I would like to create weighted within-group averages using data.table. The difference with the initial question is that the names of the variables to be average are specified in a string vector.

The data:

df <- read.table(text= "
          region    state  county  weights y1980  y1990  y2000
             1        1       1       10     100    200     50
             1        1       2        5      50    100    200
             1        1       3      120    1000    500    250
             1        1       4        2      25    100    400
             1        1       4       15     125    150    200
             2        2       1        1      10     50    150
             2        2       2       10      10     10    200
             2        2       2       40      40    100     30
             2        2       3       20     100    100     10
", header=TRUE, na.strings=NA)

Using Roland's suggested answer from aforementioned question:

library(data.table)
dt <- as.data.table(df)
dt2 <- dt[,lapply(.SD,weighted.mean,w=weights),by=list(region,state,county)]

I have a vector with strings to determine dynamically columns for which I want the within-group weighted average.

colsToKeep = c("y1980","y1990")

But I do not know how to pass it as an argument for the data.table magic.

I tried

 dt[,lapply(
      as.list(colsToKeep),weighted.mean,w=weights),
      by=list(region,state,county)]` 

but I then get:

Error in x * w : non-numeric argument to binary operator

Not sure how to achieve what I want.

Bonus question: I'd like original columns names to be kept, instead of getting V1 and V2.

NB I use version 1.9.3 of the data.table package.

Arun

Normally, you should be able to do:

dt2 <- dt[,lapply(.SD,weighted.mean,w=weights), 
          by = list(region,state,county), .SDcols = colsToKeep]

i.e., just by providing just those columns to .SDcols. But at the moment, this won't work due to a bug, in that weights column won't be available because it's not specified in .SDcols.

Until it's fixed, we can accomplish this as follows:

dt2 <- dt[, lapply(mget(colsToKeep), weighted.mean, w = weights), 
            by = list(region, state, county)]
#    region state county     y1980    y1990
# 1:      1     1      1  100.0000 200.0000
# 2:      1     1      2   50.0000 100.0000
# 3:      1     1      3 1000.0000 500.0000
# 4:      1     1      4  113.2353 144.1176
# 5:      2     2      1   10.0000  50.0000
# 6:      2     2      2   34.0000  82.0000
# 7:      2     2      3  100.0000 100.0000

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Column means for several columns

From Dev

data.table execute function on groups of columns

From Dev

Subsetting and assignment on several columns of a data table

From Dev

Subsetting and assignment on several columns of a data table

From Dev

Split Data into groups of equal means

From Dev

Conditional calculation of means of different columns in data.table with R

From Dev

Conditional calculation of means of different columns in data.table with R

From Java

Weighted means for groups in r - using aggregate and weighted.mean functions together

From Dev

Highlight groups of table columns

From Dev

Keep existing columns when expanding data.table object by groups

From Dev

Apply a function across groups and columns in data.table and/or dplyr

From Dev

How to split a data.table by groups and use subset by occourences in a columns?

From Dev

weighted table data frame with plyr

From Dev

How to set several columns as the key in data.table package (r)?

From Dev

Need sql query to pull back data that meets several groups of criteria from same table in one query

From Dev

Weighted mean for multiple columns in a data frame in Pandas

From Dev

Group several columns then aggregate a set of columns in Pandas (It crashes badly compared to R's data.table)

From Dev

Pandas Melt several groups of columns into multiple target columns by name

From Dev

Pandas Melt several groups of columns into multiple target columns by name

From Dev

Melting data with several groups of column names in R

From Dev

Melting data with several groups of column names in R

From Dev

R: Cumulative weighted mean in data.table

From Dev

How to calculate weighted means for each column (or row) of a matrix using the columns (or rows) from another matrix?

From Dev

Multiplying column means for groups by column mean for the entire data

From Dev

How to plot weighted means by group?

From Dev

How to efficiently aggregate multiple data.table columns by groups, N-at-a-time, where N is variable

From Dev

R: generate means and SD table from some columns of a table by group

From Dev

Remove rows from data.table in R based on values of several columns

From Dev

What does ".N" means in data table in r?

Related Related

  1. 1

    Column means for several columns

  2. 2

    data.table execute function on groups of columns

  3. 3

    Subsetting and assignment on several columns of a data table

  4. 4

    Subsetting and assignment on several columns of a data table

  5. 5

    Split Data into groups of equal means

  6. 6

    Conditional calculation of means of different columns in data.table with R

  7. 7

    Conditional calculation of means of different columns in data.table with R

  8. 8

    Weighted means for groups in r - using aggregate and weighted.mean functions together

  9. 9

    Highlight groups of table columns

  10. 10

    Keep existing columns when expanding data.table object by groups

  11. 11

    Apply a function across groups and columns in data.table and/or dplyr

  12. 12

    How to split a data.table by groups and use subset by occourences in a columns?

  13. 13

    weighted table data frame with plyr

  14. 14

    How to set several columns as the key in data.table package (r)?

  15. 15

    Need sql query to pull back data that meets several groups of criteria from same table in one query

  16. 16

    Weighted mean for multiple columns in a data frame in Pandas

  17. 17

    Group several columns then aggregate a set of columns in Pandas (It crashes badly compared to R's data.table)

  18. 18

    Pandas Melt several groups of columns into multiple target columns by name

  19. 19

    Pandas Melt several groups of columns into multiple target columns by name

  20. 20

    Melting data with several groups of column names in R

  21. 21

    Melting data with several groups of column names in R

  22. 22

    R: Cumulative weighted mean in data.table

  23. 23

    How to calculate weighted means for each column (or row) of a matrix using the columns (or rows) from another matrix?

  24. 24

    Multiplying column means for groups by column mean for the entire data

  25. 25

    How to plot weighted means by group?

  26. 26

    How to efficiently aggregate multiple data.table columns by groups, N-at-a-time, where N is variable

  27. 27

    R: generate means and SD table from some columns of a table by group

  28. 28

    Remove rows from data.table in R based on values of several columns

  29. 29

    What does ".N" means in data table in r?

HotTag

Archive