Add new column based on boolean values in a different column

Marius Butuc

I'm trying to add a new column to a DataFrame based on the boolean values in another column.

Given a DataFrame like this:

from pandas import DataFrame
snr = DataFrame({'name': ['A', 'B', 'C', 'D', 'E'], 'seniority': [False, False, False, True, False]})

The furthest I've come so far is this:

def refine_seniority(contact):
    contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'

snr.apply(refine_seniority)

yet I'm getting this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-208-0694ebf79a50> in <module>()
      2     contact['refined_seniority'] = 'Senior' if contact['seniority'] else 'Non-Senior'
      3 
----> 4 snr.apply(refine_seniority)
      5 
      6 snr

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds )
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                     # no k defined yet
   4490                     pass
-> 4491                 raise e
   4492 
   4493 

KeyError: ('seniority', u'occurred at index name')

It feels like I'm missing some fundamental understanding of DataFrames, but I'm stuck.

What's the proper way to add a new column based on boolean values in a different column?

EdChum

You can create a dict and call map:

In [176]:

temp = {True:'senior', False:'Non-senior'}
snr['refined_seniority'] = snr['seniority'].map(temp)
snr
Out[176]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior

As user @Jeff has pointed out, map or apply should be a last resort when a vectorised solution is available.

Alternatively, use NumPy's where:

In [178]:

snr['refined_seniority'] = np.where(snr['seniority'] == True, 'senior', 'Non-senior')
snr
Out[178]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior
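
Because the seniority column already holds booleans, it can serve as the condition directly, without the comparison against True. A minimal sketch, assuming pandas and NumPy are imported as pd and np and that the column contains no missing values:

```python
import numpy as np
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

# The boolean column is the mask: first value where True, second where False.
snr['refined_seniority'] = np.where(snr['seniority'], 'senior', 'Non-senior')
print(snr)
```

This should give the same column as the version with == True, and it still works on the whole column at once.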

If you modified your function as follows and applied it to the column rather than the whole DataFrame, it would work:

In [187]:

def refine_seniority(contact):
    if contact == True:
        return 'senior'
    else:
        return 'Non-senior'

snr['refined_seniority'] = snr['seniority'].apply(refine_seniority)
snr
Out[187]:
  name seniority refined_seniority
0    A     False        Non-senior
1    B     False        Non-senior
2    C     False        Non-senior
3    D      True            senior
4    E     False        Non-senior

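Your original attempt fails because calling apply on the DataFrame passes each column (not each row) to the function by default. Inside the function, contact is therefore a whole column, and 'seniority' is not a label in its index. See below: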

In [193]:

def refine_seniority(contact):
    print(contact)


snr['refined_seniority'] = snr.apply(refine_seniority)

0    A
1    B
2    C
3    D
4    E
Name: name, dtype: object
0    False
1    False
2    False
3     True
4    False
Name: seniority, dtype: object

Here you can see that the function receives two pandas Series, one per column. Neither has an index label 'seniority', hence the KeyError.
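
If a row-wise function is really wanted, apply can be asked to pass rows instead of columns with axis=1. The following is only a sketch of that variant, with the function changed to return a value rather than assign into the row; the vectorised options above are still preferable on large frames:

```python
import pandas as pd

snr = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'seniority': [False, False, False, True, False],
})

def refine_seniority(contact):
    # With axis=1, contact is a single row: a Series indexed by column names,
    # so looking up 'seniority' succeeds.
    return 'Senior' if contact['seniority'] else 'Non-Senior'

# axis=1 calls the function once per row; the returned values form the new column.
snr['refined_seniority'] = snr.apply(refine_seniority, axis=1)
print(snr)
```

Returning the label and assigning the result of apply to a new column avoids relying on changes made to the row object inside the function.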
