I'm having a problem with the groupby method similar to the one described in this StackOverflow question:
"pandas group StopIteration error"
What I am trying to do with groupby is simpler, but I am getting the same kind of StopIteration error:
Traceback (most recent call last):
  File "prepare_data_TJ2012_v1p0.py", line 107, in <module>
    grouped = df.groupby('hh').apply(f)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 637, in apply
    return self._python_apply_general(f)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 644, in _python_apply_general
    not_indexed_same=mutated)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 2657, in _wrap_applied_output
    v = next(v for v in values if v is not None)
StopIteration
Here is the code that produces it:
import pandas as pd

df = pd.DataFrame(
    {'educ': {0: 'pri', 1: 'bach', 2: 'pri', 3: 'hi', 4: 'bach', 5: 'sec',
              6: 'hi', 7: 'hi', 8: 'pri', 9: 'pri'},
     'hh': {0: 1, 1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4, 8: 4, 9: 4},
     'id': {0: 1, 1: 2, 2: 3, 3: 1, 4: 1, 5: 2, 6: 1, 7: 2, 8: 3, 9: 4},
     'has_car': {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 1, 7: 1, 8: 1, 9: 1},
     'weighthh': {0: 2, 1: 2, 2: 2, 3: 3, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 3},
     'house_rooms': {0: 3, 1: 3, 2: 3, 3: 2, 4: 1, 5: 1, 6: 3, 7: 3, 8: 3, 9: 3},
     'prov': {0: 'BC', 1: 'BC', 2: 'BC', 3: 'Alberta', 4: 'BC', 5: 'BC', 6: 'Alberta',
              7: 'Alberta', 8: 'Alberta', 9: 'Alberta'},
     'age': {0: 44, 1: 43, 2: 13, 3: 70, 4: 23, 5: 20, 6: 37, 7: 35, 8: 8, 9: 15},
     'fridge': {0: 'yes', 1: 'yes', 2: 'yes', 3: 'no', 4: 'yes', 5: 'yes', 6: 'no',
                7: 'no', 8: 'no', 9: 'no'},
     'male': {0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0}})
print(df)
print('-- groupby dataframes ---')
def f(df):
    print('-------------------------')
    print('DataFrame')
    print(df)
    s = df['age']
    print(s)
    print('----> Not nulls:')
    s_notnulls = ~s.isnull()
    print(s_notnulls)
    print('----> Number of non-nulls: %d' % len(s_notnulls[s_notnulls==True]))
df.groupby('hh').apply(f)
I want to perform an operation on a column, by group, if there is at least one non-null value in another column.
I'm using pandas==0.14.1. It seems that the loop over the groups runs past the last group. Is this a bug, or am I using the groupby method incorrectly?
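You are getting this error because the function you are passing to apply doesn't return anything, so it implicitly returns None for every group. When apply combines the per-group results, it looks for the first result that is not None (the next(v for v in values if v is not None) line at the bottom of your traceback). With nothing but None values, that search comes up empty and raises StopIteration, so the loop over the groups is not running too long. If all you care about is the printed output, you could just return the df back, like this: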
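As an aside before the corrected function: the failing expression from the traceback can be reproduced without pandas. This is only a sketch; first_non_none and results_without_return are hypothetical names, and the list of None values stands in for what apply collects from a function with no return statement.

```python
def first_non_none(values):
    # Same expression as the last frame of the traceback:
    # take the first value that is not None.
    return next(v for v in values if v is not None)


# Stand-in (an assumption) for the per-group results of a function
# that never returns anything: one None per group.
results_without_return = [None, None, None, None]

try:
    first_non_none(results_without_return)
    outcome = 'found a value'
except StopIteration:
    # The generator is exhausted without yielding, so next() raises.
    outcome = 'StopIteration'

print(outcome)
```

As soon as at least one result is not None, the same expression succeeds, which is why returning something from the function avoids the error. The corrected f follows.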
def f(df):
    print('-------------------------')
    print('DataFrame')
    print(df)
    s = df['age']
    print(s)
    print('----> Not nulls:')
    s_notnulls = ~s.isnull()
    print(s_notnulls)
    print('----> Number of non-nulls: %d' % len(s_notnulls[s_notnulls==True]))
    # Return something so that apply has a non-None result for each group.
    return df
Then the apply will run through without error.
In [295]: df.groupby('hh').apply(f)
-------------------------
DataFrame
   age  educ fridge  has_car  hh  house_rooms  id  male prov  weighthh
0   44   pri    yes        1   1            3   1     1   BC         2
1   43  bach    yes        1   1            3   2     0   BC         2
2   13   pri    yes        1   1            3   3     1   BC         2
.....
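For the goal stated in the question (operate on one column per group only when another column has at least one non-null value), apply with a printing function may not be needed at all. The following is a minimal sketch, not a definitive solution: the column names hh, age and weighthh are borrowed from the question, while the small sample frame, the has_age mask and the choice of a per-household sum are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumed data): household 2 has no known ages.
df = pd.DataFrame({
    'hh': [1, 1, 2, 2, 3],
    'age': [44.0, np.nan, np.nan, np.nan, 23.0],
    'weighthh': [2, 2, 3, 3, 2],
})

# count() ignores nulls; transform broadcasts the per-group count back
# to every row, so the mask is True for rows in households that have
# at least one non-null age.
has_age = df.groupby('hh')['age'].transform('count') > 0

# Hypothetical operation: sum weighthh per household, qualifying groups only.
result = df[has_age].groupby('hh')['weighthh'].sum()

print(result)
```

With this sample data, household 2 is dropped because all of its ages are null, and the sum is computed only for households 1 and 3.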