Pandas: Split a dataframe rows and re-arrange column values

Abhishek

I have a DataFrame :

import pandas as pd
df = pd.DataFrame({'Board': ['A', 'B'], 'Off': ['C', 'D'], 'Stops': ['Q/W/E', 'Z'], 'Pax': [10, 100]})

which looks like:

    Board   Off  Pax    Stops
0   A       C    10     Q/W/E
1   B       D    100    Z

I want to have a DataFrame split by Stops column and re-arranged as Board and Off in rows with Pax value being duplicated as follows;

    Board   Off  Pax
0   A       Q    10
1   Q       W    10
2   W       E    10
3   E       C    10
4   B       Z    100
5   Z       D    100

Any help regarding this would be much appreciated.

jezrael
from itertools import islice
#https://stackoverflow.com/a/6822773/2901002
#added a for return Pax
def window(a, seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield (*result, a)
    for elem in it:
        result = result[1:] + (elem,)
        yield (*result, a)

#join all columns 
a = df['Board'] + '/' + df['Stops'] + '/' + df['Off']
#split to lists
a = a.str.split('/')

#apply window function to each value and flatten
L = [(j,k,l) for x, y in zip(a, df['Pax']) for j,k,l in list(window(y, x, 2)) ]
print (L)
[('A', 'Q', 10), ('Q', 'W', 10), ('W', 'E', 10), 
 ('E', 'C', 10), ('B', 'Z', 100), ('Z', 'D', 100)]

#DataFrame constructor
df = pd.DataFrame(L, columns=['Board','Off','Pax'])
print (df)
  Board Off  Pax
0     A   Q   10
1     Q   W   10
2     W   E   10
3     E   C   10
4     B   Z  100
5     Z   D  100

Timings:

import pandas as pd
N = 1000
L1 = list('ABCDEFGHIJKLMNOP')
L = ['Q/W/E','Q1/W1/E1','Z','A/B/C/D']
df = pd.DataFrame({'Board': np.random.choice(L1, N), 
                    'Off': np.random.choice(L1, N), 
                    'Stops': np.random.choice(L, N), 
                    'Pax': np.random.randint(100, size=N)})
print (df)

def bharath(df):
    df['New'] = (df['Board']+'/'+df['Stops']+'/'+df['Off']).str.split('/')
    temp = df['New'].apply(lambda x : list(zip(x,x[1:])))
    di = {0 : 'Board',1:'Off'}
    return pd.concat([pd.DataFrame(i,index=np.repeat(j,len(i))) for (i,j) in zip(temp,df['Pax'].values)]).reset_index().rename(columns=di)


def jez(df):
    from itertools import islice
    #https://stackoverflow.com/a/6822773/2901002
    #added a for return Pax
    def window(a, seq, n=2):
        "Returns a sliding window (of width n) over data from the iterable"
        "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield (*result, a)
        for elem in it:
            result = result[1:] + (elem,)
            yield (*result, a)

    a = df['Board'] + '/' + df['Stops'] + '/' + df['Off']
    a = a.str.split('/')
    L = [(j,k,l) for x, y in zip(a, df['Pax']) for j,k,l in list(window(y, x, 2)) ]
    return pd.DataFrame(L, columns=['Board','Off','Pax'])

def wen(df):
    df['New']=df[['Board','Stops','Off']].apply(lambda x : '/'.join(x),1)
    df['New2']= df['New'].str.split('/').apply(lambda x : list(zip(x[:-1],x[1:])))
    namedict = {0 : 'Board',1:'Off'}
    return df[['Pax','New2']].set_index('Pax').New2.apply(pd.Series).stack().apply(pd.Series).reset_index().drop('level_1',1).rename(columns=namedict)



print (jez(df))
#print (bharath(df))
print (wen(df))

In [433]: %timeit (jez(df))
100 loops, best of 3: 6.6 ms per loop

In [434]: %timeit (wen(df))
1 loop, best of 3: 747 ms per loop

In [450]: %timeit (bharath(df))
1 loop, best of 3: 406 ms per loop

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Python: Pandas dataframe re-arrange rows based on last three digits of Integer in Column

From Dev

Re-arrange rows on dataframe

From Dev

pandas DataFrame split the column and extend the rows

From Dev

PANDAS split dataframe to multiple by unique values rows

From Dev

Filter pandas dataframe rows by multiple column values

From Dev

DataFrame Split On Rows and apply on header one column using Python Pandas

From Dev

Group rows in dataframe by assigning values as a column in pandas dataframe

From Dev

Split Pandas Dataframe into separate pieces based on column values

From Dev

How to split pandas dataframe based on difference of values in a column

From Dev

Split a text(with names and values) column into multiple columns in Pandas DataFrame

From Dev

Split Pandas Dataframe into separate pieces based on column values

From Dev

How to split pandas dataframe based on difference of values in a column

From Dev

How do I take rows in Pandas Dataframe and transform into values for a Column?

From Dev

How to print rows if values appear in any column of pandas dataframe

From Dev

How to find rows with column values having a particular datatype in a Pandas DATAFRAME

From Java

Select rows of pandas dataframe based on column values with duplicates

From Dev

selecting rows based on multiple column values in pandas dataframe

From Dev

Deleting DataFrame rows in Pandas based on column value - multiple values to remove

From Dev

Select CONSECUTIVE rows from a DataFrame based on values in a column in Pandas with Groupby

From Dev

Select rows from a DataFrame based on multiple values in a column in pandas

From Dev

Delete rows if there are null values in a specific column in Pandas dataframe

From Dev

Python/Pandas: Drop duplicate rows in dataframe, concatenate values in one column

From Dev

Select rows from a DataFrame based on last characters of values in a column in pandas

From Dev

R re-arrange dataframe: some rows to columns

From Dev

R re-arrange dataframe: some rows to columns

From Dev

Pandas: split dataframe on values of ID column and write to csv, generate filenames from unique values in column

From Dev

Split pandas dataframe by column variable

From Java

Pandas split DataFrame by column value

From Dev

How to split dataframe or reorder dataframe by rows in pandas

Related Related

  1. 1

    Python: Pandas dataframe re-arrange rows based on last three digits of Integer in Column

  2. 2

    Re-arrange rows on dataframe

  3. 3

    pandas DataFrame split the column and extend the rows

  4. 4

    PANDAS split dataframe to multiple by unique values rows

  5. 5

    Filter pandas dataframe rows by multiple column values

  6. 6

    DataFrame Split On Rows and apply on header one column using Python Pandas

  7. 7

    Group rows in dataframe by assigning values as a column in pandas dataframe

  8. 8

    Split Pandas Dataframe into separate pieces based on column values

  9. 9

    How to split pandas dataframe based on difference of values in a column

  10. 10

    Split a text(with names and values) column into multiple columns in Pandas DataFrame

  11. 11

    Split Pandas Dataframe into separate pieces based on column values

  12. 12

    How to split pandas dataframe based on difference of values in a column

  13. 13

    How do I take rows in Pandas Dataframe and transform into values for a Column?

  14. 14

    How to print rows if values appear in any column of pandas dataframe

  15. 15

    How to find rows with column values having a particular datatype in a Pandas DATAFRAME

  16. 16

    Select rows of pandas dataframe based on column values with duplicates

  17. 17

    selecting rows based on multiple column values in pandas dataframe

  18. 18

    Deleting DataFrame rows in Pandas based on column value - multiple values to remove

  19. 19

    Select CONSECUTIVE rows from a DataFrame based on values in a column in Pandas with Groupby

  20. 20

    Select rows from a DataFrame based on multiple values in a column in pandas

  21. 21

    Delete rows if there are null values in a specific column in Pandas dataframe

  22. 22

    Python/Pandas: Drop duplicate rows in dataframe, concatenate values in one column

  23. 23

    Select rows from a DataFrame based on last characters of values in a column in pandas

  24. 24

    R re-arrange dataframe: some rows to columns

  25. 25

    R re-arrange dataframe: some rows to columns

  26. 26

    Pandas: split dataframe on values of ID column and write to csv, generate filenames from unique values in column

  27. 27

    Split pandas dataframe by column variable

  28. 28

    Pandas split DataFrame by column value

  29. 29

    How to split dataframe or reorder dataframe by rows in pandas

HotTag

Archive