Aggregate a column based on NAs in a different column

Soheil Published at Dev

I want to aggregate group2 based on NAs in group1:

Datetime            group1  group2
2011-08-08 21:00:00   1       1
2011-08-08 21:10:00   NA      2
2011-08-08 21:20:00   NA      3
2011-08-08 21:30:00   2       4
2011-08-08 21:40:00   NA      5
2011-08-08 21:50:00   NA      6
2011-08-08 22:00:00   3       7

This is my desired output:

Datetime            group1  group2
2011-08-08 21:00:00   1       1
2011-08-08 21:30:00   2       9 
2011-08-08 22:00:00   3       18

Edit: 9=2+3+4 and 18=5+6+7.

This is what I have tried so far:

aggregate(group2~group1, data=Data, subset(Data,group1==NA),sum)

Any suggestions are appreciated. Can I do this with aggregate, or should I use a different package?
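A note on the attempt above: one likely reason it does not select the NA rows is that `==` compared against NA yields NA rather than TRUE or FALSE, so `group1==NA` never matches anything. In addition, the formula method of aggregate drops rows containing NA by default (`na.action = na.omit`), so the NA rows need a group value before they can be summed. A minimal sketch of the comparison behaviour, using only the group1 values from the question (the variable names `eq_na` and `is_missing` are illustrative):

```r
# group1 values copied from the question's example data
group1 <- c(1, NA, NA, 2, NA, NA, 3)

# Comparing with == against NA gives NA for every element,
# so it cannot be used to pick out the missing rows.
eq_na <- group1 == NA

# is.na() is the test that actually flags missing values.
is_missing <- is.na(group1)

print(eq_na)
print(is_missing)
```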

Rich Scriven

It looks like na.locf from package zoo would be quite useful here.

Assuming dat is your original data, we can take the dates for the non-NA group1 levels and use cbind to bring them together with the aggregated group2 data.

library(zoo)
Datetime <- dat$Datetime[!is.na(dat$group1)]
cbind(Datetime, aggregate(group2~group1, na.locf(dat, fromLast = TRUE), sum))
#              Datetime group1 group2
# 1 2011-08-08 21:00:00      1      1
# 2 2011-08-08 21:30:00      2      9
# 3 2011-08-08 22:00:00      3     18
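If zoo is not available, the same idea (give each NA row the group of the next non-NA row below it, then sum) can be sketched in base R. This is not the zoo implementation, just an illustration of the backward fill. It rebuilds the question's data under the name `dat` and assumes the last row has a non-NA group1; `known`, `idx`, `sums`, and `result` are names introduced here for illustration.

```r
# Rebuild the example data from the question (time zone is an assumption).
dat <- data.frame(
  Datetime = as.POSIXct(c("2011-08-08 21:00:00", "2011-08-08 21:10:00",
                          "2011-08-08 21:20:00", "2011-08-08 21:30:00",
                          "2011-08-08 21:40:00", "2011-08-08 21:50:00",
                          "2011-08-08 22:00:00"), tz = "UTC"),
  group1 = c(1, NA, NA, 2, NA, NA, 3),
  group2 = 1:7
)

# TRUE on rows where group1 is present; each such row closes a block.
known <- !is.na(dat$group1)

# Block index per row: number of known rows strictly above, plus one.
# An NA row therefore shares the index of the next known row below it.
idx <- cumsum(known) - known + 1

# Sum group2 within each block; blocks come out in row order.
sums <- as.vector(tapply(dat$group2, idx, sum))

# Attach the sums to the rows that carry a group1 value.
result <- data.frame(
  Datetime = dat$Datetime[known],
  group1   = dat$group1[known],
  group2   = sums
)
print(result)
```

Grouping by the block index rather than by the filled group1 value keeps the sums aligned with the Datetime rows even if group1 labels are not in increasing order.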

PS: Thanks for updating/editing your question (+1).

Edited at 2021-02-11