I have large pandas DataFrame with following format:
prod_id timestamp text
150523 0006641040 9.393408e+08 text_1
150500 0006641040 9.408096e+08 text_2
150499 0006641041 1.009325e+09 text_3
150508 0006641041 1.018397e+09 text_4
150524 0006641042 1.025482e+09 text_5
DataFrame is sorted by prod_id
and timestamp
. What I am trying to do, is to enumerate a counter for every prod_id
based on the timestamp
from earliest to latest. For example, I am trying to achieve something like this:
prod_id timestamp text enum
150523 0006641040 9.393408e+08 text_1 1
150500 0006641040 9.408096e+08 text_2 2
150499 0006641041 1.009325e+09 text_3 1
150508 0006641041 1.018397e+09 text_4 2
150524 0006641042 1.025482e+09 text_5 1
I can do this iteratively quite easily by going through each row and increasing counter, but is there a way to do this in a more functional programming fashion?
UPDATE:
In [324]: df
Out[324]:
prod_id timestamp text
150523 6641040 9.393408e+08 text_1
150500 6641040 9.408096e+08 text_2
150501 6641040 9.408096e+08 text_3
150499 6641041 1.009325e+09 text_3
150508 6641041 1.018397e+09 text_4
150524 6641042 1.025482e+09 text_5
In [325]: df['enum'] = df.groupby(['prod_id'])['timestamp'].cumcount() + 1
In [326]: df
Out[326]:
prod_id timestamp text enum
150523 6641040 9.393408e+08 text_1 1
150500 6641040 9.408096e+08 text_2 2
150501 6641040 9.408096e+08 text_3 3
150499 6641041 1.009325e+09 text_3 1
150508 6641041 1.018397e+09 text_4 2
150524 6641042 1.025482e+09 text_5 1
OLD answer:
In [314]: df['enum'] = df.groupby(['prod_id'])['timestamp'].rank().astype(int)
In [315]: df
Out[315]:
prod_id timestamp text enum
150523 6641040 9.393408e+08 text_1 1
150500 6641040 9.408096e+08 text_2 2
150499 6641041 1.009325e+09 text_3 1
150508 6641041 1.018397e+09 text_4 2
150524 6641042 1.025482e+09 text_5 1
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments