I have a pandas DataFrame with multiple columns, and I am interested in one specific column that contains a series of 1s and 0s. The logic I want to apply is:
If (the current row is 1 and the next row is 0):
    count = count + 1
else:
    pass
df['NewCol'] = count
So, this is what I tried:
secCnt = 0
def sectionCount(data):
    global secCnt
    if( (data[['secFlg']] == 0) and (data[['secFlg'].shift(-1)] == 1) ):
        secCnt = secCnt + 1
    else:
        pass
    return secCnt
if __name__ == "__main__":
    df['SectionIndex'] = df.apply(sectionCount(df), axis=1)
I get the error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I am new to pandas. I am extracting text from a PDF file and want to identify the sections in it.
I think you need to create a boolean mask by comparing the column with 0, chain it with & (bitwise AND) to a comparison of the shifted values with 1, and then use cumsum for the count:
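import numpy as np
import pandas as pd

np.random.seed(1213)
df = pd.DataFrame({'secFlg':np.random.randint(2, size=20)})
# True where a 0 directly follows a 1; cumsum turns those flags into a running count
df['SectionIndex'] = ((df['secFlg'] == 0) & (df['secFlg'].shift() == 1)).cumsum()
print (df)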
    secFlg  SectionIndex
0        0             0
1        1             0
2        1             0
3        1             0
4        0             1
5        0             1
6        0             1
7        0             1
8        0             1
9        1             1
10       0             2
11       0             2
12       0             2
13       0             2
14       1             2
15       1             2
16       1             2
17       0             3
18       1             3
19       0             4
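As for the error itself: Python's `and` needs a single True/False value, but comparing a DataFrame or Series with a number gives one boolean per row, which is why pandas raises the "truth value is ambiguous" ValueError. The `&` operator combines the two comparisons row by row instead. Below is a small sketch on a hand-written column so the result can be checked by eye. The column names `AfterDrop` and `BeforeDrop` are made up for this illustration. The first is the variant used above (the count goes up on the 0 that follows a 1). The second follows the wording of the question (the count goes up on the 1 that is followed by a 0), assuming that is the row where the increment should appear.

```python
import pandas as pd

# Small hand-written flag column (illustrative data, not from the question).
df = pd.DataFrame({"secFlg": [0, 1, 1, 0, 0, 1, 0, 1]})
flag = df["secFlg"]

# Variant from the answer: flag.shift() holds the previous row's value,
# so the mask is True on a 0 that directly follows a 1.
df["AfterDrop"] = ((flag == 0) & (flag.shift() == 1)).cumsum()

# Variant matching the question's wording: flag.shift(-1) holds the next
# row's value, so the mask is True on a 1 that is directly followed by a 0.
df["BeforeDrop"] = ((flag == 1) & (flag.shift(-1) == 0)).cumsum()

print(df)
```

The two columns end on the same total and differ only in which row the counter increases on. The NaN that shift introduces at the edge compares as False, so the first and last rows never trigger a spurious increment.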