I want to get the latest date for each CID, and the latest amount on that same date. For the latest date, I implemented the following:
A = LOAD '$input' AS (cid:chararray, date:chararray, amt:chararray,tid:chararray, time:chararray);
B = FOREACH (GROUP A BY (cid,tid)) {
sort = ORDER A BY date DESC;
latest = LIMIT sort 1;
GENERATE FLATTEN(latest);
};
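But I also need the latest amount, because there are multiple records on the same date, so I tried to get the amount by ordering on time, as shown below: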
AMT = FOREACH (GROUP B BY (cid,tid)){
sort1 = ORDER B BY time DESC;
lastamt = LIMIT sort1 1;
GENERATE FLATTEN(lastamt.amt);
};
Input:
9822736906^A2015-08-02^A146.08^A^A21:57:05.000000
9822736906^A2015-08-02^A250.12^A58926968^A22:45:30.000000
9822736906^A2015-08-02^A132.1^A00000000^A22:55:29.000000
9822736906^A2015-08-02^A60.97^A00000000^A23:02:48.000000
9826964132^A2015-08-05^A98.2^A^A23:05:46.000000
9822736906^A2015-08-05^A85.71^A4F7581^A23:12:22.000000
9822736906^A2015-08-05^A655.73^A00000000^A23:17:24.000000
The output should be:
9822736906^A2015-08-05^A655.73^A00000000^A23:17:24.000000
9826964132^A2015-08-05^A98.2^A^A23:05:46.000000
9822736906^A2015-08-02^A60.97^A00000000^A23:02:48.000000
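The `^A` in these listings is the usual on-screen rendering of the Ctrl-A byte (`\x01`), which is Hive's default field separator. Assuming the real file contains that byte (the post only shows `^A`), each line splits into five fields, with an empty `tid` where two separators are adjacent. A bare `LOAD ... AS (...)` uses PigStorage's default tab delimiter, so such a file would likely need an explicit delimiter (`PigStorage('\u0001')` is the form commonly used for Ctrl-A data). A minimal Python sketch of the split; `parse_line` is a hypothetical helper, not part of Pig:

```python
# Minimal sketch: split Ctrl-A (\x01) delimited records into the five
# fields declared in the Pig schema. parse_line is a hypothetical helper.
FIELDS = ("cid", "date", "amt", "tid", "time")

def parse_line(line: str) -> dict:
    """Split one record on Ctrl-A; empty fields (e.g. a missing tid) stay ''."""
    parts = line.rstrip("\n").split("\x01")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

# The displayed "^A" is replaced by the real \x01 byte here.
sample = "9822736906^A2015-08-02^A146.08^A^A21:57:05.000000".replace("^A", "\x01")
print(parse_line(sample))
```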
If the goal is to select the latest record for each cid, the snippet below works: sort on both date and time in a single ORDER BY.
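Ordering on both keys in one ORDER BY works here because `date` and `time` are chararrays in zero-padded formats (`YYYY-MM-DD`, `HH:MM:SS.ffffff`), so plain string comparison also gives chronological order. A minimal Python sketch of the same idea, assuming the records are already parsed into tuples (the names `rows` and `latest_per_cid` are illustrative only):

```python
# Minimal sketch mirroring the Pig answer: per cid, order by
# (date DESC, time DESC) and keep the first row (LIMIT 1).
from itertools import groupby

# (cid, date, amt, tid, time) tuples copied from the sample input.
rows = [
    ("9822736906", "2015-08-02", "146.08", "", "21:57:05.000000"),
    ("9822736906", "2015-08-02", "250.12", "58926968", "22:45:30.000000"),
    ("9822736906", "2015-08-02", "132.1", "00000000", "22:55:29.000000"),
    ("9822736906", "2015-08-02", "60.97", "00000000", "23:02:48.000000"),
    ("9826964132", "2015-08-05", "98.2", "", "23:05:46.000000"),
    ("9822736906", "2015-08-05", "85.71", "4F7581", "23:12:22.000000"),
    ("9822736906", "2015-08-05", "655.73", "00000000", "23:17:24.000000"),
]

def latest_per_cid(records):
    """Return the row with the greatest (date, time) string pair for each cid."""
    # groupby needs its input sorted by the grouping key (like GROUP A BY cid).
    by_cid = sorted(records, key=lambda r: r[0])
    result = []
    for _cid, group in groupby(by_cid, key=lambda r: r[0]):
        # Same result as ORDER BY date DESC, time DESC followed by LIMIT 1.
        result.append(max(group, key=lambda r: (r[1], r[4])))
    return result

for row in latest_per_cid(rows):
    print(row)
```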
Input:
9822736906 2015-08-02 146.08 21:57:05.000000
9822736906 2015-08-02 250.12 58926968 22:45:30.000000
9822736906 2015-08-02 132.1 00000000 22:55:29.000000
9822736906 2015-08-02 60.97 00000000 23:02:48.000000
9826964132 2015-08-05 98.2 23:05:46.000000
9822736906 2015-08-05 85.71 4F7581 23:12:22.000000
9822736906 2015-08-05 655.73 00000000 23:17:24.000000
Pig script:
A = LOAD 'a.csv' USING PigStorage('\t') AS (cid:chararray, date:chararray, amt:chararray,tid:chararray, time:chararray);
B = GROUP A BY cid;
C = FOREACH B {
sort = ORDER A BY date DESC, time DESC;
latest = LIMIT sort 1;
GENERATE FLATTEN(latest);
};
Output (DUMP C):
(9822736906,2015-08-05,655.73,00000000,23:17:24.000000)
(9826964132,2015-08-05,98.2,,23:05:46.000000)
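This output has one row per cid, whereas the expected output in the question has one row per (cid, date) pair (it also lists the 60.97 record from 2015-08-02). If that is the goal, grouping on both columns, i.e. `GROUP A BY (cid, date)`, and then ordering each group by `time DESC` with `LIMIT 1` should produce those three rows. A minimal Python sketch of that grouping, again assuming parsed tuples (`latest_per_cid_date` is an illustrative name):

```python
# Minimal sketch: one row per (cid, date), keeping the latest time,
# which matches the three rows listed as expected output in the question.
rows = [
    ("9822736906", "2015-08-02", "146.08", "", "21:57:05.000000"),
    ("9822736906", "2015-08-02", "250.12", "58926968", "22:45:30.000000"),
    ("9822736906", "2015-08-02", "132.1", "00000000", "22:55:29.000000"),
    ("9822736906", "2015-08-02", "60.97", "00000000", "23:02:48.000000"),
    ("9826964132", "2015-08-05", "98.2", "", "23:05:46.000000"),
    ("9822736906", "2015-08-05", "85.71", "4F7581", "23:12:22.000000"),
    ("9822736906", "2015-08-05", "655.73", "00000000", "23:17:24.000000"),
]

def latest_per_cid_date(records):
    """Keep, for each (cid, date) pair, the row with the greatest time string."""
    best = {}
    for rec in records:
        key = (rec[0], rec[1])  # grouping key, like GROUP A BY (cid, date)
        if key not in best or rec[4] > best[key][4]:
            best[key] = rec
    # Newest date first, only to make the printed order deterministic.
    return sorted(best.values(), key=lambda r: (r[1], r[0]), reverse=True)

for row in latest_per_cid_date(rows):
    print(row)
```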