I have a csv file with a format like this:
Header 1, Header 2, Header 3
'' '' ''
value 1, value2, value 3
value 1, value2, value 3
value 1, value2, value 3
'' '' ''
value 1, value 2, value 3
value 1, value 2, value 3
value 1, value 2, value 3
'' '' ''
I can read it into a pandas dataframe, but the segments separated by empty rows (denoted by '') each need to be processed individually. What would be the simplest way to split them into smaller dataframes based on the empty rows between them? I have quite a few of these segments to go through.
Would it be easier to divide them into smaller dataframes, or to remove each segment from the original dataframe after processing it?
EDIT:
IanS's answer was correct, but some of my files simply had no quotes in the empty rows, so those values were read in as NaN rather than strings. I modified his answer slightly and this worked for them:
df['counter'] = df['Header 1'].isnull().cumsum()  # increment at each NaN separator row
df = df[df['Header 1'].notnull()]  # remove empty rows
df.groupby('counter').apply(lambda df: df.iloc[0])  # replace with your processing function
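If you want the segments as separate dataframes rather than processing them inside the groupby, the same counter trick can collect them into a list. A minimal sketch with made-up data, assuming the separator rows come in as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the file layout: separator rows are all-NaN
df = pd.DataFrame({
    'Header 1': [np.nan, 'a', 'b', np.nan, 'c', 'd'],
    'Header 2': [np.nan, 1, 2, np.nan, 3, 4],
})

# Counter increments at each empty (NaN) separator row
df['counter'] = df['Header 1'].isnull().cumsum()
df = df[df['Header 1'].notnull()]  # drop the separator rows themselves

# One dataframe per segment, in file order
segments = [g.drop(columns='counter') for _, g in df.groupby('counter')]
```

Here `segments[0]` holds the rows 'a' and 'b', and `segments[1]` holds 'c' and 'd', so each can be handled in an ordinary loop.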
The simplest would be to add a counter that increments each time it encounters an empty row. You can then get your individual dataframes via groupby.
df['counter'] = (df['Header1'] == "''").cumsum()
df = df[df['Header1'] != "''"] # remove empty rows
df.groupby('counter').apply(lambda df: df.iloc[0])
The last line applies your processing function to each dataframe separately (I just put a dummy example).
Note that the exact condition for detecting empty rows (here df['Header1'] == "''") should be adapted to your exact situation.
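Putting the answer together end to end, here is a runnable sketch with made-up data where the empty rows were read in as the literal string "''", matching the condition above:

```python
import pandas as pd

# Hypothetical frame mimicking the file after read_csv
df = pd.DataFrame({
    'Header1': ["''", 'a', 'b', "''", 'c'],
    'Header2': ["''", 'x', 'y', "''", 'z'],
})

# Counter increments at each empty row, then the empty rows are dropped
df['counter'] = (df['Header1'] == "''").cumsum()
df = df[df['Header1'] != "''"]  # remove empty rows

# Apply a processing function per segment (dummy: take the first row)
result = df.groupby('counter').apply(lambda seg: seg.iloc[0])
```

With this data the counter tags 'a' and 'b' as segment 1 and 'c' as segment 2, so `result` contains one first-row per segment.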