R: ddply function

Posted by Mannat M in Dev

Mannat M

Hi all, I have about 8 datasets, and I want to compute the maximum volume for each combination of city and year.

The dataset looks like this:

  city     sales  volume  year  avg price
  abilene  239    12313   2000  7879
  kansas   2324   18765   2000  2424
  nyc      2342   987651  2000  3127
  abilene  3432   34342   2001  1234
  nyc      2342   10000   2001  3127
  kansas   176    3130    2001  879
  kansas   123    999650  2002  2424
  abilene  3432   34342   2002  1234
  nyc      2342   98000   2002  3127

I would like my result to look like this:

city    year    volume
nyc     2000    987651    
abilene 2001    34342
kansas  2002    999650

I used ddply (from the plyr package) to find the maximum volume for each city.

newdf = ddply(df, c('city', 'year'), summarise, max(volume))

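However, this gives me a dataset containing the maximum for each city in each year. What I actually want is, for each year, the single maximum volume across all cities. Thank you.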

metrics

  library(dplyr)
  df %>%   # df is your data frame
    group_by(year) %>%
    filter(volume == max(volume))

Source: local data frame [3 x 5]
Groups: year

     city sales volume year avg_price
1     nyc  2342 987651 2000      3127
2 abilene  3432  34342 2001      1234
3  kansas   123 999650 2002      2424
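
If only the three columns from the question (city, year, volume) are wanted, a `select()` step can be appended. The following is a minimal sketch, assuming the sample data is rebuilt by hand as `df` and that the price column is named `avg_price` as in the output above; `result` is just an illustrative name.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  city      = c("abilene", "kansas", "nyc", "abilene", "nyc",
                "kansas", "kansas", "abilene", "nyc"),
  sales     = c(239, 2324, 2342, 3432, 2342, 176, 123, 3432, 2342),
  volume    = c(12313, 18765, 987651, 34342, 10000, 3130, 999650, 34342, 98000),
  year      = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
  avg_price = c(7879, 2424, 3127, 1234, 3127, 879, 2424, 1234, 3127)
)

result <- df %>%
  group_by(year) %>%
  filter(volume == max(volume)) %>%  # keeps every row tied for the yearly maximum
  ungroup() %>%
  select(city, year, volume)         # keep only the columns asked for

print(result)
```

Note that `filter(volume == max(volume))` returns more than one row for a year if several cities tie for the maximum.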


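# Updated: if you are grouping by both city and year.
# Each city/year pair occurs only once in the sample data, so every row is
# its own group maximum and all 9 rows are returned.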

df %>%   # df is your data frame
  group_by(year, city) %>%
  filter(volume == max(volume))

Source: local data frame [9 x 5]
Groups: year, city

     city sales volume year avg_price
1 abilene   239  12313 2000      7879
2  kansas  2324  18765 2000      2424
3     nyc  2342 987651 2000      3127
4 abilene  3432  34342 2001      1234
5     nyc  2342  10000 2001      3127
6  kansas   176   3130 2001       879
7  kansas   123 999650 2002      2424
8 abilene  3432  34342 2002      1234
9     nyc  2342  98000 2002      3127
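
Since the question was asked in terms of ddply, the same per-year result can probably also be had with plyr alone by splitting on `year` only and returning the row(s) holding that year's maximum. This is a sketch under the assumption that `df` holds the sample data from the question (only the relevant columns are rebuilt here); `newdf` reuses the name from the question.

```r
library(plyr)

# Relevant columns of the sample data from the question
df <- data.frame(
  city   = c("abilene", "kansas", "nyc", "abilene", "nyc",
             "kansas", "kansas", "abilene", "nyc"),
  volume = c(12313, 18765, 987651, 34342, 10000, 3130, 999650, 34342, 98000),
  year   = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002)
)

# Split by year only; within each piece keep the row(s) with the largest volume
newdf <- ddply(df, "year", function(d) {
  d[d$volume == max(d$volume), c("city", "year", "volume")]
})

print(newdf)
```

If dplyr is loaded in the same session, calling `plyr::ddply` explicitly instead of attaching plyr avoids the two packages masking each other's functions such as `summarise`.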