Pandas dataframe - identify rows with value over threshold in any column

Selah

I have a word similarity matrix stored as a pandas DataFrame where the columns are a "seed set" of ~400 words and the row index is a large dictionary of ~50,000 words. The value at any row/column is the similarity from 0 to 1 between the two words.

>>> df_sim_mf.info()
<class 'pandas.core.frame.DataFrame'>
Index: 46265 entries, #angry to wonga
Columns: 451 entries, abandon to wrongs
dtypes: float64(451)
memory usage: 159.5+ MB
>>> df_sim_mf.sample(10).sample(5, axis = 1)
              nationality    purest     unite   lawless      riot
assaulted        0.114270 -0.140504  0.182024  0.434651  0.510618
peekaboo        -0.008734 -0.027742  0.051084  0.260245  0.201117
antibiotic       0.145310  0.270748 -0.126459 -0.083965  0.043086
killin          -0.102474  0.123550  0.055935 -0.115381  0.285997
warrior          0.005229  0.281967  0.261230  0.344130  0.359228
actionscript    -0.029405  0.077793  0.114047 -0.052599 -0.123401
controversy      0.336688  0.271007  0.373474  0.362565  0.305548
nic              0.164550 -0.159097  0.080056  0.271184  0.231357
healy            0.072831  0.102996  0.286538  0.335697  0.183730
uncovered        0.061310  0.274003  0.328383  0.300315  0.277491

I'm trying to find all words from my large dictionary that are within a certain similarity range from ANY of my "seed set". That is, I'd like to select every row that contains at least one value over 0.75.

Can I do this with a few simple pandas commands?

ldirer

You could do:

df.loc[(df > 0.75).sum(axis=1) > 0, :]

Then take the index attribute (.index) of the result if you just want the words.
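
As a small illustration of the same idea, here is a sketch that uses any(axis=1) rather than summing the boolean mask; both select rows with at least one value over the threshold. The toy frame, its values, and the names threshold, mask, selected and words are made up for the example and are not from the question's data.

```python
import pandas as pd

# Toy stand-in for the similarity matrix (hypothetical values).
df = pd.DataFrame(
    {
        "riot": [0.51, 0.20, 0.81],
        "lawless": [0.43, 0.26, 0.10],
        "unite": [0.18, 0.76, 0.05],
    },
    index=["assaulted", "peekaboo", "warrior"],
)

threshold = 0.75

# True for each row that has at least one value above the threshold.
mask = (df > threshold).any(axis=1)

# Keep only the matching rows, then pull out the words from the index.
selected = df.loc[mask]
words = selected.index.tolist()
print(words)
```

On a frame the size of the one in the question, df.max(axis=1) > threshold should give the same mask without building the intermediate boolean frame, which may matter for memory.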
