If you only need the unique values, I suggest using itertools.chain.from_iterable to concatenate all of the lists:
>>> import itertools
>>> import numpy as np
>>> np.unique([*itertools.chain.from_iterable(df.Genre)])
array(['action', 'crime', 'drama'], dtype='<U6')
Or, even faster:
>>> set(itertools.chain.from_iterable(df.Genre))
{'action', 'crime', 'drama'}
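The same idea can be wrapped in a small helper. This is only a minimal sketch: the name `unique_values` and the sample `rows` list are hypothetical and not part of the original answer. The helper accepts any iterable of lists, so a column such as `df.Genre` should work as well as the plain list used here.

```python
import itertools

def unique_values(lists):
    """Return the distinct items found across an iterable of lists.

    chain.from_iterable walks the inner lists lazily, so no intermediate
    flattened list is built before the set is created.
    """
    return set(itertools.chain.from_iterable(lists))

# Hypothetical sample data with the same shape as the Genre column
rows = [['crime', 'drama'], ['action', 'crime', 'drama']]

genres = unique_values(rows)

# A set has no defined order; sort it when a stable order is needed
sorted_genres = sorted(genres)
```

Note that np.unique returns a sorted array, while a set is unordered, so `sorted(...)` is one way to get comparable output from the faster version.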
Timings
import pandas as pd
df = pd.DataFrame({'Genre':[['crime','drama'],['action','crime','drama']]})
df = pd.concat([df]*10000)
%timeit set(itertools.chain.from_iterable(df.Genre))
100 loops, best of 3: 2.55 ms per loop
%timeit set([x for y in df['Genre'] for x in y])
100 loops, best of 3: 4.09 ms per loop
%timeit np.unique([*itertools.chain.from_iterable(df.Genre)])
100 loops, best of 3: 12.8 ms per loop
%timeit np.unique(df['Genre'].sum())
1 loop, best of 3: 1.65 s per loop
%timeit set(df['Genre'].sum())
1 loop, best of 3: 1.66 s per loop
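A likely reason the `.sum()` variants are so much slower is that summing lists concatenates them pairwise, and each concatenation copies everything accumulated so far, whereas chain.from_iterable visits each item once. The sketch below illustrates this with plain Python lists and the standard `timeit` module. The function names and the `rows` data are hypothetical, and the built-in `sum(lists, [])` is used as a stand-in on the assumption that `df['Genre'].sum()` behaves similarly; absolute timings will differ by machine.

```python
import itertools
import timeit

# Hypothetical stand-in for the Genre column: 2,000 short lists
rows = [['crime', 'drama'], ['action', 'crime', 'drama']] * 1000

def flatten_with_sum(lists):
    # Each + builds a new list and copies everything accumulated so far
    return sum(lists, [])

def flatten_with_chain(lists):
    # chain.from_iterable yields items one by one; nothing is recopied
    return list(itertools.chain.from_iterable(lists))

# Both approaches produce the same flattened list
same_result = flatten_with_sum(rows) == flatten_with_chain(rows)

if __name__ == "__main__":
    for fn in (flatten_with_sum, flatten_with_chain):
        seconds = timeit.timeit(lambda: set(fn(rows)), number=5)
        print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```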