我想使用SQL或R对数据进行分组,Subarea_codes
以便每个Company
和都能获得前10名或后10名Area_code
。从本质上说:在Subarea_codes
内部Area_codes
,每个Company
都有其最大或最小的结果。
data.csv
Area_code Subarea_code Company Result
10 101 A 15
10 101 P 10
10 101 C 4
10 102 A 10
10 102 P 8
10 102 C 5
11 111 A 15
11 111 P 20
11 111 C 5
11 112 A 10
11 112 P 5
11 112 C 10
result.csv should be like this
Company Area_code Largest_subarea_code Result Smallest_subarea_code Result
A 10 101 15 102 10
P 10 101 10 102 8
C 10 102 5 101 4
A 11 111 15 112 10
P 11 111 20 112 5
C 11 112 10 111 5
每个公司Area_code
中可以有数百个,Subarea_codes
但我只希望每个公司的前十名和后十名。
同样,这不必在一个查询中解决,而是可以分为两个查询,这意味着最小的结果显示在results_10_smallest中,最大的结果显示在result_10_largest中。但我希望我可以对每个结果执行一个查询来完成此操作。
我尝试过的
SELECT Company, Area_code, Subarea_code MAX(Result)
AS Max_result
FROM data
GROUP BY Subarea_code
ORDER BY Company
;
这使我Companies
在每个Subarea_code中获得最高的结果。这意味着:以上数据为A,A,P,AC。
使用sqldf
包:
df <- read.table(text="Area_code Subarea_code Company Result
10 101 A 15
10 101 P 10
10 101 C 4
10 102 A 10
10 102 P 8
10 102 C 5
11 111 A 15
11 111 P 20
11 111 C 5
11 112 A 10
11 112 P 5
11 112 C 10", header=TRUE)
library(sqldf)
mymax <- sqldf("select Company,
Area_code,
max(Subarea_code) Largest_subarea_code
from df
group by Company,Area_code")
mymaxres <- sqldf("select d.Company,
d.Area_code,
m.Largest_subarea_code,
d.Result
from df d, mymax m
where d.Company=m.Company and
d.Subarea_code=m.Largest_subarea_code")
mymin <- sqldf("select Company,
Area_code,
min(Subarea_code) Smallest_subarea_code
from df
group by Company,Area_code")
myminres <- sqldf("select d.Company,
d.Area_code,
m.Smallest_subarea_code,
d.Result
from df d, mymin m
where d.Company=m.Company and
d.Subarea_code=m.Smallest_subarea_code")
result <- sqldf("select a.*, b.Smallest_subarea_code,b.Result
from mymaxres a, myminres b
where a.Company=b.Company and
a.Area_code=b.Area_code")
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句