Calculate column sums for each combination of two grouping variables

heo

I have a dataset that looks something like this:

 Type Age   count1  count2  Year   Pop1   Pop2  TypeDescrip
  A   35    1        1      1990   30000  50000  alpha                                 
  A   35    3        1      1990   30000  50000  alpha 
  A   45    2        3      1990   20000  70000  alpha 
  B   45    2        1      1990   20000  70000  beta
  B   45    4        5      1990   20000  70000  beta 

I want to sum the counts across rows that match in both the Type and Age columns. Ideally I would end up with a dataset that looks like this:

 Type  Age  count1  count2  Year   Pop1   Pop2  TypeDescrip 
  A   35    4        2      1990   30000  50000  alpha 
  A   45    2        3      1990   20000  70000  alpha 
  B   45    6        6      1990   20000  70000  beta 

I've tried using nested duplicated() statements such as below:

typedup = duplicated(df$Type)
bothdup = duplicated(df[(typedup == TRUE),]$Age)

but this flags rows where Type or Age is duplicated separately, not rows where the combination of both is duplicated.
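For reference, a minimal sketch of how the combination could be checked directly: `duplicated()` applied to a data frame compares whole rows, so passing only the two key columns flags rows that repeat an earlier Type/Age pair. The small `df` below is rebuilt from the sample rows above and is only an illustration.

```r
# Sample key columns rebuilt from the table in the question (illustrative only)
df <- data.frame(
  Type = c("A", "A", "A", "B", "B"),
  Age  = c(35L, 35L, 45L, 45L, 45L)
)

# duplicated() on a data frame compares entire rows, so a row is flagged
# only when Type AND Age together repeat an earlier row
bothdup <- duplicated(df[, c("Type", "Age")])
bothdup
# [1] FALSE  TRUE FALSE FALSE  TRUE
```

This only identifies the repeated rows; it does not sum anything, which is why a grouping function is the better fit for the actual goal.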

I've also tried tapply:

tapply(c(df$count1, df$count2), c(df$Age, df$Type), sum)

but this output is difficult to work with. I want to have a data.frame when I'm done.

I don't want to use a for-loop because my dataset is quite large.

akrun

Try

library(dplyr)
df1 %>%
     group_by(Type, Age) %>% 
     summarise_each(funs(sum))
#    Type Age count1 count2
#1    A  35      4      2
#2    A  45      2      3
#3    B  45      6      6

In newer versions of dplyr, use summarise_all():

df1 %>%
     group_by(Type, Age) %>%
     summarise_all(sum)
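As a hedged aside, not part of the original answer: assuming dplyr 1.0.0 or later, the same grouped sum can be sketched with `across()`, which the dplyr documentation recommends over the `summarise_each()`/`summarise_all()` family. The data frame is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same values as df1 in the data section
df1 <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L),
  count2 = c(1L, 1L, 3L, 1L, 5L)
)

# across(everything(), sum) sums every non-grouping column within each group;
# .groups = "drop" returns an ungrouped tibble
res <- df1 %>%
  group_by(Type, Age) %>%
  summarise(across(everything(), sum), .groups = "drop")

res
```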

Or using base R

 aggregate(.~Type+Age, df1, FUN=sum)
 #    Type Age count1 count2
 #1    A  35      4      2
 #2    A  45      2      3
 #3    B  45      6      6
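One caveat with the `.` shorthand is that it sums every column not named on the right-hand side, which fails if any of them is character. A minimal sketch, assuming a data frame that also carries a text column like TypeDescrip, is to name the count columns explicitly with `cbind()` on the left-hand side:

```r
# Illustrative data with an extra character column that must not be summed
df2 <- data.frame(
  Type        = c("A", "A", "A", "B", "B"),
  Age         = c(35L, 35L, 45L, 45L, 45L),
  count1      = c(1L, 3L, 2L, 2L, 4L),
  count2      = c(1L, 1L, 3L, 1L, 5L),
  TypeDescrip = c("alpha", "alpha", "alpha", "beta", "beta"),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side restricts the sum to the listed columns
res <- aggregate(cbind(count1, count2) ~ Type + Age, data = df2, FUN = sum)
res
```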

Or with data.table

library(data.table)
setDT(df1)[, lapply(.SD, sum), .(Type, Age)] 
#   Type Age count1 count2
#1:    A  35      4      2
#2:    A  45      2      3
#3:    B  45      6      6
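Two things worth knowing about the data.table version: `setDT()` converts the data frame in place, and `.SD` contains every non-grouping column by default. A sketch of a variant that works on a copy and limits the sum to the count columns via `.SDcols` (the extra TypeDescrip column is included here only to show that it is left out):

```r
library(data.table)

# Illustrative data with a character column that should not be summed
df2 <- data.frame(
  Type        = c("A", "A", "A", "B", "B"),
  Age         = c(35L, 35L, 45L, 45L, 45L),
  count1      = c(1L, 3L, 2L, 2L, 4L),
  count2      = c(1L, 1L, 3L, 1L, 5L),
  TypeDescrip = c("alpha", "alpha", "alpha", "beta", "beta"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so df2 itself stays a plain data.frame
dt <- as.data.table(df2)

# .SDcols restricts .SD to the columns that should be summed
res <- dt[, lapply(.SD, sum), by = .(Type, Age), .SDcols = c("count1", "count2")]
res
```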

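Update

Based on the new dataset, group by the columns that stay constant within each Type/Age combination and sum only the count columns. Year is neither grouped nor summed here, so it does not appear in the result: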

 df2 %>%
     group_by(Type, Age,Pop1, Pop2, TypeDescrip) %>% 
     summarise_each(funs(sum), matches('^count'))
 #    Type Age  Pop1  Pop2 TypeDescrip count1 count2
 #1    A  35 30000 50000       alpha      4      2
 #2    A  45 20000 70000       alpha      2      3
 #3    B  45 20000 70000        beta      6      6

data

 df1 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L, 
 35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L, 
 1L, 3L, 1L, 5L)), .Names = c("Type", "Age", "count1", "count2"
 ), class = "data.frame", row.names = c(NA, -5L))

 df2 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L, 
 35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L, 
 1L, 3L, 1L, 5L), Year = c(1990L, 1990L, 1990L, 1990L, 1990L), 
   Pop1 = c(30000L, 30000L, 20000L, 20000L, 20000L), Pop2 = c(50000L, 
   50000L, 70000L, 70000L, 70000L), TypeDescrip = c("alpha", 
   "alpha", "alpha", "beta", "beta")), .Names = c("Type", "Age", 
  "count1", "count2", "Year", "Pop1", "Pop2", "TypeDescrip"),
   class =   "data.frame", row.names = c(NA, -5L))
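The desired output in the question also keeps Year, which the update drops. A minimal sketch, assuming dplyr 1.0.0 or later and assuming Year, Pop1, Pop2 and TypeDescrip really are constant within each Type/Age group, is to sum the count columns and carry the first value of the others along. The data frame below is rebuilt from the table in the question.

```r
library(dplyr)

# Rebuilt from the sample table in the question
df2 <- data.frame(
  Type        = c("A", "A", "A", "B", "B"),
  Age         = c(35L, 35L, 45L, 45L, 45L),
  count1      = c(1L, 3L, 2L, 2L, 4L),
  count2      = c(1L, 1L, 3L, 1L, 5L),
  Year        = rep(1990L, 5),
  Pop1        = c(30000L, 30000L, 20000L, 20000L, 20000L),
  Pop2        = c(50000L, 50000L, 70000L, 70000L, 70000L),
  TypeDescrip = c("alpha", "alpha", "alpha", "beta", "beta"),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  group_by(Type, Age) %>%
  summarise(
    across(starts_with("count"), sum),                # sum the count columns
    across(c(Year, Pop1, Pop2, TypeDescrip), first),  # assumed constant per group
    .groups = "drop"
  )

res
```

If those columns can vary within a group, `first()` would silently pick one value, so in that case it is safer to add them to `group_by()` instead.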
