Iterate over rows in a pandas dataframe and apply a lambda function

Venkatesh

I have a pandas dataframe with multiple columns. I am interested in one specific column that contains a series of 1s and 0s. The logic that I want to perform is:

If (the current row is 1 and the next row is 0):
    count = count + 1
else :
    pass
df['NewCol'] = count

So, this is what I tried:

secCnt = 0 
def sectionCount(data):
    global secCnt
    if( (data[['secFlg']] == 0) and (data[['secFlg'].shift(-1)] == 1) ):
        secCnt = secCnt + 1 
    else:
        pass
    return secCnt


if __name__ == "__main__":
    df['SectionIndex'] = df.apply(sectionCount(df), axis=1)

I get the error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am new to pandas. I am performing text extraction from a PDF file and am interested in finding the sections in it.

jezrael

I think you need to create a boolean mask by comparing the column with 0, chain it with & (bitwise AND) to a comparison of the shifted values with 1, and use cumsum for the count:

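import numpy as np
import pandas as pd

np.random.seed(1213)

df = pd.DataFrame({'secFlg': np.random.randint(2, size=20)})

# True where the current row is 0 and the previous row is 1; cumsum counts these rows
df['SectionIndex'] = ((df['secFlg'] == 0) & (df['secFlg'].shift() == 1)).cumsum()
print(df)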
    secFlg  SectionIndex
0        0             0
1        1             0
2        1             0
3        1             0
4        0             1
5        0             1
6        0             1
7        0             1
8        0             1
9        1             1
10       0             2
11       0             2
12       0             2
13       0             2
14       1             2
15       1             2
16       1             2
17       0             3
18       1             3
19       0             4
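
The ValueError in the question comes from using `and`, which asks Python for a single truth value of a whole DataFrame, whereas `&` compares element by element. Below is a minimal sketch on a small hand-written column so the result can be checked by eye. The sample values and the names `flags` and `NextIsZero` are made up for illustration. It also shows `shift(-1)` for the "current row is 1 and the next row is 0" reading of the question, which may or may not be the variant you actually need:

```python
import pandas as pd

# Hypothetical sample data, chosen by hand so the counts are easy to verify.
df = pd.DataFrame({'secFlg': [0, 1, 1, 0, 0, 1, 0, 1]})
flags = df['secFlg']

# Variant used in the answer: current row is 0 and the previous row is 1.
# shift() moves values down by one row, so each row sees its predecessor.
df['SectionIndex'] = ((flags == 0) & (flags.shift() == 1)).cumsum()

# Variant matching the question's wording: current row is 1 and the next row is 0.
# shift(-1) moves values up by one row, so each row sees its successor.
df['NextIsZero'] = ((flags == 1) & (flags.shift(-1) == 0)).cumsum()

print(df['SectionIndex'].tolist())  # [0, 0, 0, 1, 1, 1, 2, 2]
print(df['NextIsZero'].tolist())    # [0, 0, 1, 1, 1, 2, 2, 2]
```

Both variants count the same 1-to-0 transitions; they differ only in whether the counter increases on the last 1 or on the first 0. The first or last row compares against NaN after the shift, which evaluates to False and so never increments the count.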
