Is it possible to iterate over a dask GroupBy object to get access to the underlying dataframes? I tried:
import dask.dataframe as dd
import pandas as pd
pdf = pd.DataFrame({'A':[1,2,3,4,5], 'B':['1','1','a','a','a']})
ddf = dd.from_pandas(pdf, npartitions = 3)
groups = ddf.groupby('B')
for name, df in groups:
print(name)
However, this results in an error: KeyError: 'Column not found: 0'
More generally speaking, what kind of interactions does the dask GroupBy object allow, except from the apply method?
you could iterate through groups doing this with dask, maybe there is a better way but this works for me.
import dask.dataframe as dd
import pandas as pd
pdf = pd.DataFrame({'A':[1, 2, 3, 4, 5], 'B':['1','1','a','a','a']})
ddf = dd.from_pandas(pdf, npartitions = 3)
groups = ddf.groupby('B')
for group in pdf['B'].unique():
print groups.get_group(group)
this would return
dd.DataFrame<dataframe-groupby-get_group-e3ebb5d5a6a8001da9bb7653fface4c1, divisions=(0, 2, 4, 4)>
dd.DataFrame<dataframe-groupby-get_group-022502413b236592cf7d54b2dccf10a9, divisions=(0, 2, 4, 4)>
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加