Getting the three smallest values per row and returning the corresponding column names

Hong

I have two dataframes, df and df2, with the same rows and columns. Based on the first dataframe, df, I want to get the 3 smallest values in each row and return the corresponding column names (in this case "X", "Y", "Z" or "T"). The result should be a new dataframe, df3.

import numpy as np
import pandas as pd

df = pd.DataFrame({
        'X': [21, 2, 43, 44, 56, 67, 7, 38, 29, 130],
        'Y': [101, 220, 330, 140, 250, 10, 207, 320, 420, 50],
        'Z': [20, 128, 136, 144, 312, 10, 82, 63, 42, 12],
        'T': [2, 32, 4, 424, 256, 167, 27, 38, 229, 30]
    }, index=list('ABCDEFGHIJ'))

df2 = pd.DataFrame({
        'X': [0.5, 0.12,0.43, 0.424, 0.65,0.867,0.17,0.938,0.229,0.113],
        'Y': [0.1,2.201,0.33,0.140,0.525,0.31,0.20,0.32,0.420,0.650],
        'Z': [0.20,0.128,0.136,0.2144,0.5312,0.61,0.82,0.363,0.542,0.512],
        'T':[0.52, 0.232,0.34, 0.6424, 0.6256,0.3167,0.527,0.38,0.4229,0.73]
    },index=list('ABCDEFGHIJ'))

Besides that, I want another dataframe, df4, holding the values from df2 at the positions that df3 refers to. For example, in row 'A' of df the 3 smallest values are (2, 20, 21), so in row 'A' of df4 I want (0.52, 0.2, 0.5) from df2.

jezrael

If both DataFrames have the same column names in the same order, you can use argsort to get the column positions:

arr = df.values.argsort(1)[:,:3]
print (arr)
[[0 3 1]
 [1 0 3]
 [0 1 3]
 [1 2 3]
 [1 2 0]
 [2 3 1]
 [1 0 3]
 [0 1 3]
 [1 3 0]
 [3 0 2]]
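
To make the indexing step concrete, here is a minimal sketch on row A only. The column order is pinned to T, X, Y, Z, which is an assumption on my part: it is the order the printed positions above correspond to, and the names `row`, `order`, `smallest3` and `names` are only for illustration.

```python
import numpy as np
import pandas as pd

# Row A of df, with the column order pinned explicitly (assumption: T, X, Y, Z,
# so the result does not depend on how pandas orders dict keys).
row = pd.DataFrame({'T': [2], 'X': [21], 'Y': [101], 'Z': [20]},
                   columns=['T', 'X', 'Y', 'Z'], index=['A'])

order = row.values.argsort(1)          # column positions sorted by value, per row
smallest3 = order[:, :3]               # keep the three smallest per row
names = row.columns.values[smallest3]  # map positions back to column names

print(smallest3)  # [[0 3 1]]
print(names)      # [['T' 'Z' 'X']]
```

If the columns are in a different order, the positions change but the names they map to stay the same.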

# get the values of df2 at the positions stored in arr
b = df2.values[np.arange(len(arr))[:,None], arr]
print (b)
[[ 0.52    0.2     0.5   ]
 [ 0.12    0.232   0.128 ]
 [ 0.34    0.43    0.136 ]
 [ 0.424   0.14    0.2144]
 [ 0.65    0.525   0.6256]
 [ 0.31    0.61    0.867 ]
 [ 0.17    0.527   0.82  ]
 [ 0.38    0.938   0.363 ]
 [ 0.229   0.542   0.4229]
 [ 0.512   0.73    0.65  ]]
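
The row selector `np.arange(len(arr))[:, None]` is there so that each row of positions is applied to its own row of values. A small sketch, assuming NumPy 1.15 or newer, shows that `np.take_along_axis` should give the same result. The two-row arrays below are rows A and B of the data written in X, Y, Z, T order, and the variable names are only illustrative.

```python
import numpy as np

# Rows A and B of df and df2, columns in X, Y, Z, T order (illustrative subset).
vals = np.array([[21, 101, 20, 2],
                 [2, 220, 128, 32]])
other = np.array([[0.5, 0.1, 0.20, 0.52],
                  [0.12, 2.201, 0.128, 0.232]])

idx = vals.argsort(1)[:, :3]  # positions of the three smallest values per row

# Fancy indexing: a column vector of row numbers broadcast against idx.
picked_fancy = other[np.arange(len(idx))[:, None], idx]

# Equivalent selection along axis 1 without building the row selector.
picked_take = np.take_along_axis(other, idx, axis=1)

print(picked_take)
```

Both forms select one value per (row, position) pair, so which one to use is mostly a matter of readability.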

Finally, use the DataFrame constructor:

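# index the underlying array of names; indexing an Index with a 2-D array fails in pandas 2.x
df3 = pd.DataFrame(df.columns.values[arr])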
df3.columns = ['Col{}'.format(x+1) for x in df3.columns]
print (df3)
  Col1 Col2 Col3
0    T    Z    X
1    X    T    Z
2    T    X    Z
3    X    Y    Z
4    X    Y    T
5    Y    Z    X
6    X    T    Z
7    T    X    Z
8    X    Z    T
9    Z    T    Y

df4 = pd.DataFrame(b)
df4.columns = ['Col{}'.format(x+1) for x in df4.columns]
print (df4)
    Col1   Col2    Col3
0  0.520  0.200  0.5000
1  0.120  0.232  0.1280
2  0.340  0.430  0.1360
3  0.424  0.140  0.2144
4  0.650  0.525  0.6256
5  0.310  0.610  0.8670
6  0.170  0.527  0.8200
7  0.380  0.938  0.3630
8  0.229  0.542  0.4229
9  0.512  0.730  0.6500
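
The frames above get a default 0..9 index. If the original row labels should be kept, one option is to pass the index and the column names straight to the constructor. This is a sketch on two rows only, and `df_small`, `df2_small` and `new_cols` are names I made up for the example.

```python
import numpy as np
import pandas as pd

cols = ['X', 'Y', 'Z', 'T']
# Rows A and B of df and df2 (illustrative subset of the question's data).
df_small = pd.DataFrame([[21, 101, 20, 2], [2, 220, 128, 32]],
                        index=['A', 'B'], columns=cols)
df2_small = pd.DataFrame([[0.5, 0.1, 0.20, 0.52], [0.12, 2.201, 0.128, 0.232]],
                         index=['A', 'B'], columns=cols)

arr = df_small.values.argsort(1)[:, :3]
new_cols = ['Col{}'.format(i + 1) for i in range(arr.shape[1])]

# Column names of the three smallest values, keeping the row labels.
df3 = pd.DataFrame(df_small.columns.values[arr],
                   index=df_small.index, columns=new_cols)

# Matching values from the second frame, same labels.
df4 = pd.DataFrame(df2_small.values[np.arange(len(arr))[:, None], arr],
                   index=df2_small.index, columns=new_cols)

print(df3)
print(df4)
```

With the labels kept, df3 and df4 can be joined back to df or compared with it row by row.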

The answers are similar, so I ran some timings:

np.random.seed(14)
N = 1000000
df1 = pd.DataFrame(np.random.randint(100, size=(N, 4)), columns=['X','Y','Z','T'])
#print (df1)

df1 = pd.DataFrame(np.random.rand(N, 4), columns=['X','Y','Z','T'])
#print (df1)
# note: the functions below read df and df2 (10 rows each), not df1


def jez():
    arr = df.values.argsort(1)[:,:3]
    b = df2.values[np.arange(len(arr))[:,None], arr]
    df3 = pd.DataFrame(df.columns.values[arr])
    df3.columns = ['Col{}'.format(x+1) for x in df3.columns]
    df4 = pd.DataFrame(b)
    df4.columns = ['Col{}'.format(x+1) for x in df4.columns]


def pir():
    v = df.values
    a = v.argpartition(3, 1)[:, :3]
    c = df.columns.values[a]
    pd.DataFrame(c, df.index)
    d = df2.values[np.arange(len(df))[:, None], a]
    pd.DataFrame(d, df.index, [1, 2, 3]).add_prefix('Col')

def cᴏʟᴅsᴘᴇᴇᴅ():
    # another solution is wrong
    # note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
    df3 = df.apply(lambda x: df.columns[np.argsort(x)], 1).iloc[:, :3]
    pd.DataFrame({'Col{}'.format(i + 1) : df2.lookup(df3.index, df3.iloc[:, i]) for i in range(df3.shape[1])}, index=df.index)


print (jez())
print (pir())
print (cᴏʟᴅsᴘᴇᴇᴅ())

In [176]: %timeit (jez())
1000 loops, best of 3: 412 µs per loop

In [177]: %timeit (pir())
1000 loops, best of 3: 425 µs per loop

In [178]: %timeit (cᴏʟᴅsᴘᴇᴇᴅ())
100 loops, best of 3: 3.99 ms per loop
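
One caveat when comparing the argsort and argpartition variants: argpartition only guarantees which positions end up among the k smallest, not their order within that group. If the order matters, the partition can be sorted afterwards. A minimal sketch with made-up variable names and tie-free data:

```python
import numpy as np

# Two rows of the question's data in X, Y, Z, T order (rows A and J).
v = np.array([[21, 101, 20, 2],
              [130, 50, 12, 30]])
k = 3

# Positions of the k smallest values per row, in unspecified order.
part = np.argpartition(v, k - 1, axis=1)[:, :k]

# Sort only those k positions by their values.
part_vals = np.take_along_axis(v, part, axis=1)
inner = part_vals.argsort(1)
ordered = np.take_along_axis(part, inner, axis=1)

print(ordered)  # [[3 2 0]
                #  [2 3 1]]
```

For frames with only a few columns the extra step probably costs about as much as it saves. Partitioning tends to pay off when there are many columns and k is small.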
