I have the following DataFrame:
{'population': {0: '38,928,346', 1: '2,877,797', 2: '43,851,044', 3: '77,265', 4: '32,866,272', 5: '97,929', 6: '45,195,774', 7: '2,963,243', 8: '25,499,884', 9: '9,006,398', 10: '10,139,177', 11: '393,244', 12: '1,701,575', 13: '164,689,383', 14: '287,375', 15: '9,449,323', 16: '11,589,623'}, 'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15, 15: 16, 16: 17}, 'country': {0: 'Afghanistan', 1: 'Albania', 2: 'Algeria', 3: 'Andorra', 4: 'Angola', 5: 'Antigua and Barbuda', 6: 'Argentina', 7: 'Armenia', 8: 'Australia', 9: 'Austria', 10: 'Azerbaijan', 11: 'Bahamas', 12: 'Bahrain', 13: 'Bangladesh', 14: 'Barbados', 15: 'Belarus', 16: 'Belgium'}}
I need to create a new column, SQL "case statement" style:
countryList=['Albania', 'Angola', 'Australia']
df['country1'] = (df['country'] if [df['country'].isin(countryList)] else 'Other')
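The new column should keep the country name only for the three countries in countryList and say "Other" for every other row. However, when I run the code above, it just copies the original column. I need this all the time when working with data, but whenever I search I can't find a solution that avoids loops.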
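Why the attempt above copies the column: a Python conditional expression (x if cond else y) evaluates cond once, as a single boolean, not row by row. [df['country'].isin(countryList)] is a one-element list, and any non-empty list is truthy, so the expression always returns the whole df['country'] column. A minimal check, assuming a smaller three-row frame for brevity:

```python
import pandas as pd

df = pd.DataFrame({"country": ["Afghanistan", "Albania", "Angola"]})
countryList = ["Albania", "Angola", "Australia"]

mask = df["country"].isin(countryList)  # boolean Series, one value per row
wrapped = [mask]                         # a list that holds that Series

# A non-empty list is truthy regardless of what the Series contains.
print(bool(wrapped))  # True

# So the conditional expression always takes its first branch.
result = df["country"] if wrapped else "Other"
print(result.equals(df["country"]))  # True

# Dropping the brackets does not help: a Series has no single truth value.
try:
    bool(mask)
    ambiguous = False
except ValueError:
    ambiguous = True  # pandas raises ValueError (truth value is ambiguous)
print(ambiguous)  # True
```

This is why a vectorized, element-wise function such as where is needed instead of Python's if/else.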
I'd like a simple, easy-to-read one-liner using isin that does essentially what I would normally do with a SQL CASE statement.
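Edit: The link marks this as a duplicate of a page where none of the answers uses isin. My original question specifically asked how to do this with isin, and I will only accept other solutions if it can't be done with isin.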
Use Series.where:
df['country1'] = df['country'].where(df['country'].isin(countryList), 'Other')
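Series.where(cond, other) keeps each value where cond is True and substitutes other where it is False, so the isin mask plays the role of the WHEN clause and 'Other' the ELSE. A small self-contained sketch (five rows assumed for brevity); it also shows Series.mask, the mirror image that replaces values where the condition is True, so it takes the negated mask. The country2 column is a hypothetical name used only for the comparison:

```python
import pandas as pd

df = pd.DataFrame({"country": ["Afghanistan", "Albania", "Angola", "Australia", "Belgium"]})
countryList = ["Albania", "Angola", "Australia"]

in_list = df["country"].isin(countryList)

# where: keep the value where in_list is True, otherwise use "Other".
df["country1"] = df["country"].where(in_list, "Other")

# mask: replace the value where the condition is True, hence ~in_list.
df["country2"] = df["country"].mask(~in_list, "Other")

print(df)
```

Both columns come out identical; which one reads better depends on whether the condition describes the rows to keep (where) or the rows to overwrite (mask).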
Or np.where:
import numpy as np
df['country1'] = np.where(df['country'].isin(countryList), df['country'], 'Other')
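np.where covers a single WHEN/ELSE pair. When the CASE needs several WHEN branches, numpy.select takes a list of boolean conditions and a matching list of choices, uses the first condition that is True for each row, and falls back to default otherwise. A sketch under stated assumptions: the region column and the country-to-region grouping are hypothetical, chosen only to illustrate multiple branches, each still built with isin:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"country": ["Afghanistan", "Albania", "Angola", "Australia", "Belgium"]})

# One boolean condition per WHEN branch, evaluated in order.
conditions = [
    df["country"].isin(["Albania", "Belgium"]),
    df["country"].isin(["Angola"]),
    df["country"].isin(["Australia"]),
]
# The THEN value for each branch, in the same order as conditions.
choices = ["Europe", "Africa", "Oceania"]

# default plays the role of ELSE for rows that match no condition.
df["region"] = np.select(conditions, choices, default="Other")

print(df)
```

Because select picks the first matching condition, put the most specific conditions first when they can overlap, as you would order WHEN clauses in SQL.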
Output:
     population  index              country   country1
 0   38,928,346      1          Afghanistan      Other
 1    2,877,797      2              Albania    Albania
 2   43,851,044      3              Algeria      Other
 3       77,265      4              Andorra      Other
 4   32,866,272      5               Angola     Angola
 5       97,929      6  Antigua and Barbuda      Other
 6   45,195,774      7            Argentina      Other
 7    2,963,243      8              Armenia      Other
 8   25,499,884      9            Australia  Australia
 9    9,006,398     10              Austria      Other
10   10,139,177     11           Azerbaijan      Other
11      393,244     12              Bahamas      Other
12    1,701,575     13              Bahrain      Other
13  164,689,383     14           Bangladesh      Other
14      287,375     15             Barbados      Other
15    9,449,323     16              Belarus      Other
16   11,589,623     17              Belgium      Other