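I have a DataFrame with 9 rows. I want to multiply the first three rows by one value, the next three rows by a second value, and the last three rows by a third value.
I'm using these variables: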
import pandas as pd
df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
a = pd.Series(range(3))
print(df)
   A  B  C  D  E
0  0  0  0  0  0
1  1  1  1  1  1
2  2  2  2  2  2
3  3  3  3  3  3
4  4  4  4  4  4
5  5  5  5  5  5
6  6  6  6  6  6
7  7  7  7  7  7
8  8  8  8  8  8
I was able to get it working like this:
for i, e in a.items():
    # the block size is len(a) here only because 9 rows / 3 values = 3
    start, end = i * len(a), (i + 1) * len(a)
    df.iloc[start:end] *= e
print(df)
    A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
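The loop above uses len(a) as the block size, which only works because 9 rows divided by 3 multipliers happens to be 3. A minimal sketch that derives the block size from the frame instead, assuming len(df) is an exact multiple of len(a) (the helper name multiply_blocks_loop is hypothetical, not a pandas API):

```python
import pandas as pd


def multiply_blocks_loop(df: pd.DataFrame, factors: pd.Series) -> pd.DataFrame:
    """Multiply consecutive, equally sized row blocks of df by successive factors.

    Hypothetical helper; assumes len(df) is an exact multiple of len(factors).
    """
    if len(df) % len(factors) != 0:
        raise ValueError("len(df) must be a multiple of len(factors)")
    block = len(df) // len(factors)   # rows per block
    out = df.copy()                   # leave the input frame untouched
    for i, factor in enumerate(factors):
        start, end = i * block, (i + 1) * block
        # positional slice of one block, scaled and written back
        out.iloc[start:end] = out.iloc[start:end] * factor
    return out


df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
print(multiply_blocks_loop(df, pd.Series([0, 1, 2])))
```

Returning a copy instead of modifying df in place makes it easier to compare this result with the vectorized versions.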
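Another solution is to multiply df with DataFrame.mul, using a NumPy array expanded by numpy.repeat so that there is one multiplier per row:
import numpy as np
print(df.mul(np.repeat(a.values, 3), axis=0))  # one multiplier per row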
    A   B   C   D   E
0   0   0   0   0   0
1   0   0   0   0   0
2   0   0   0   0   0
3   3   3   3   3   3
4   4   4   4   4   4
5   5   5   5   5   5
6  12  12  12  12  12
7  14  14  14  14  14
8  16  16  16  16  16
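numpy.repeat turns the three multipliers into one multiplier per row, and mul(..., axis=0) then scales each row by its own entry. A small sketch of that intermediate array, assuming the block size is derived as len(df) // len(a) rather than hard-coded to 3:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
a = pd.Series(range(3))

# Rows per block, assuming len(df) is an exact multiple of len(a).
block = len(df) // len(a)

# One multiplier per row: [0 0 0 1 1 1 2 2 2]
per_row = np.repeat(a.to_numpy(), block)
print(per_row)

# axis=0 matches the 1-D array against the rows (the index), not the columns.
result = df.mul(per_row, axis=0)
print(result)
```

Using a.to_numpy() (the values) rather than a.index.values keeps the result correct when a holds arbitrary multipliers instead of 0, 1, 2.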
Timings (len(df)=9):
In [20]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
The slowest run took 6.12 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 197 µs per loop
In [21]: %%timeit
...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
The slowest run took 6.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 199 µs per loop
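The reshape version works because reshape reads the values in row-major (C) order, so each block of 3 rows becomes one row of a (3, df.size // 3) array; multiplying by a column vector then broadcasts one factor per block. The DeprecationWarning in the log comes from df.size / 3 being a float under true division, and recent NumPy releases reject a float shape outright. A sketch with integer division, still assuming 3 equally sized blocks (the names factors and n_blocks are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
factors = np.arange(3)          # one factor per block of rows
n_blocks = len(factors)

values = df.to_numpy()
# Each row of `blocks` is one block of rows, flattened in row-major order.
blocks = values.reshape(n_blocks, values.size // n_blocks)
# factors[:, None] has shape (3, 1) and broadcasts across each flattened block.
scaled = (blocks * factors[:, None]).reshape(df.shape)

result = pd.DataFrame(scaled, index=df.index, columns=df.columns)
print(result)
```

This builds a new DataFrame instead of assigning through df.loc[:, :], which leaves the original frame unchanged.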
Timing code (len(df)=90k):
df = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([df]*10000).reset_index(drop=True)
a = pd.Series(range(3000))
print (df)
Timings (len(df)=90k):
In [24]: %timeit (df.mul(np.repeat(a.index.values, [3] * len(a)), axis=0))
100 loops, best of 3: 3.58 ms per loop
In [33]: %%timeit
...: df.loc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
...:
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop
In [34]: %%timeit
...: df.iloc[:, :] = (df.values.reshape(3, df.size / 3) * np.arange(3)[:, None]).reshape(df.shape)
...:
__main__:257: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
100 loops, best of 3: 10.9 ms per loop
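Outside IPython, a similar comparison can be scripted with the standard-library timeit module. This sketch uses its own hypothetical setup: both functions get the same three equally sized blocks and are checked for equal output before they are timed. Actual numbers depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark setup: 90,000 rows split into 3 equal blocks.
base = pd.DataFrame([[i] * 5 for i in range(9)], columns=list('ABCDE'))
df = pd.concat([base] * 10000, ignore_index=True)
factors = np.arange(3)
block = len(df) // len(factors)


def with_mul(frame: pd.DataFrame) -> pd.DataFrame:
    # One factor per row, matched against the index via axis=0.
    return frame.mul(np.repeat(factors, block), axis=0)


def with_reshape(frame: pd.DataFrame) -> pd.DataFrame:
    # Reshape into (n_blocks, -1), broadcast the factors, reshape back.
    values = frame.to_numpy()
    scaled = (values.reshape(len(factors), -1) * factors[:, None]).reshape(frame.shape)
    return pd.DataFrame(scaled, index=frame.index, columns=frame.columns)


# Both approaches should agree before their speed is compared.
pd.testing.assert_frame_equal(with_mul(df), with_reshape(df))

for fn in (with_mul, with_reshape):
    seconds = min(timeit.repeat(lambda: fn(df), number=100, repeat=3)) / 100
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```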