I have a csv file with a format like this:
Header 1, Header 2, Header 3
'' '' ''
value 1, value2, value 3
value 1, value2, value 3
value 1, value2, value 3
'' '' ''
value 1, value 2, value 3
value 1, value 2, value 3
value 1, value 2, value 3
'' '' ''
I can read it into a pandas dataframe, but the segments separated by empty rows (denoted by '') each need to be processed individually. What would be the simplest way to split them into smaller dataframes based on the empty rows between them? I have quite a few of these segments to go through.
Would it be easier to divide them into smaller dataframes, or to remove each segment from the original dataframe after processing it?
EDIT:
IanS's answer was correct, but some of my files simply had no quotes in the empty rows, so those values were read in as NaN rather than strings. I modified his answer slightly and this worked for them:
df['counter'] = df['Header 1'].isnull().cumsum()  # increment at each NaN separator row
df = df[df['Header 1'].notnull()]  # remove empty rows
df.groupby('counter').apply(lambda df: df.iloc[0])  # replace with your processing function
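If you want the segments as separate dataframes rather than processing them inside the groupby, the same counter trick can collect them into a list. A minimal sketch with made-up data, assuming the separator rows come in as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the file layout: separator rows are all-NaN
df = pd.DataFrame({
    'Header 1': [np.nan, 'a', 'b', np.nan, 'c', 'd'],
    'Header 2': [np.nan, 1, 2, np.nan, 3, 4],
})

# Counter increments at each empty (NaN) separator row
df['counter'] = df['Header 1'].isnull().cumsum()
df = df[df['Header 1'].notnull()]  # drop the separator rows themselves

# One dataframe per segment, in file order
segments = [g.drop(columns='counter') for _, g in df.groupby('counter')]
```

Here `segments[0]` holds the rows 'a' and 'b', and `segments[1]` holds 'c' and 'd', so each can be handled in an ordinary loop.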
The simplest would be to add a counter that increments each time it encounters an empty row. You can then get your individual dataframes via groupby.
df['counter'] = (df['Header1'] == "''").cumsum()
df = df[df['Header1'] != "''"] # remove empty rows
df.groupby('counter').apply(lambda df: df.iloc[0])
The last line applies your processing function to each dataframe separately (I just put a dummy example).
Note that the exact condition for detecting empty rows (here df['Header1'] == "''") should be adapted to your exact situation.
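Putting the answer together end to end, here is a runnable sketch with made-up data where the empty rows were read in as the literal string "''", matching the condition above:

```python
import pandas as pd

# Hypothetical frame mimicking the file after read_csv
df = pd.DataFrame({
    'Header1': ["''", 'a', 'b', "''", 'c'],
    'Header2': ["''", 'x', 'y', "''", 'z'],
})

# Counter increments at each empty row, then the empty rows are dropped
df['counter'] = (df['Header1'] == "''").cumsum()
df = df[df['Header1'] != "''"]  # remove empty rows

# Apply a processing function per segment (dummy: take the first row)
result = df.groupby('counter').apply(lambda seg: seg.iloc[0])
```

With this data the counter tags 'a' and 'b' as segment 1 and 'c' as segment 2, so `result` contains one first-row per segment.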