I have the following DataFrame, called pop_df:
pop_df
Out[33]:
            2014       2015       2016       2017       2018       2019
Geo
AL     2892394.0  2885796.0  2875592.0  2876591.0  2870324.0  2862427.0
AL0    2892394.0  2885796.0  2875592.0  2876591.0  2870324.0  2862427.0
AL01    844921.0   836448.0   830981.0   826904.0   819793.0   813758.0
AL011   134332.0   131054.0   129056.0   125579.0   120978.0   118948.0
AL012   276058.0   277989.0   280205.0   284823.0   289626.0   290126.0
             ...        ...        ...        ...        ...        ...
UKN12   142028.0   142756.0   143363.0   143746.0   144105.0   144367.0
UKN13   139774.0   140222.0   140752.0   141368.0   141994.0   142565.0
UKN14   137722.0   139426.0   140691.0   141917.0   143286.0   144771.0
UKN15   136332.0   136904.0   137492.0   138000.0   138441.0   138948.0
UKN16   114696.0   115171.0   115581.0   116057.0   116612.0   117051.0

[2034 rows x 6 columns]
I also have a list of European Union country codes:
EuropeanUnion=["BE","BG","CZ","DK","DE","EE","IE","EL","ES","FR","HR","IT","CY","LV","LT","LU","HU","MT","NL","AT","PL","PT","RO","SI","SK","FI","SE"]
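I want to filter the DataFrame so that it keeps only the rows whose index label starts with one of the country codes in the list. I tried pop_df.index.str.startswith, but could not get it to work. Any help would be appreciated. Thanks in advance!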
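I suggest slicing the index labels (as strings) down to their first two characters, then using the pandas .isin method against the country-code list to build a boolean mask: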
eu_countries_filtered = pop_df[pop_df.index.str[:2].isin(EuropeanUnion)]
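To make the one-liner easier to check, here is a minimal self-contained sketch. The frame toy_df, its index labels, and its values are illustrative placeholders and not real population data; only EuropeanUnion is taken from the question. It also shows a possible alternative based on Python's built-in str.startswith, which accepts a tuple of prefixes. If the list was passed to startswith directly in the original attempt, that may explain the failure, since a plain list is not an accepted prefix argument.

```python
import pandas as pd

# Country-code list from the question.
EuropeanUnion = ["BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR",
                 "HR", "IT", "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT",
                 "PL", "PT", "RO", "SI", "SK", "FI", "SE"]

# Toy stand-in for pop_df: labels and values are placeholders, not real data.
toy_df = pd.DataFrame(
    {"2018": [1.0, 2.0, 3.0, 4.0, 5.0], "2019": [1.5, 2.5, 3.5, 4.5, 5.5]},
    index=pd.Index(["AL011", "BE10", "DE1", "FR101", "UKN16"], name="Geo"),
)

# Approach from the answer: take the first two characters of every index
# label and test membership in the list. This relies on every code in the
# list being exactly two characters long.
mask_isin = toy_df.index.str[:2].isin(EuropeanUnion)
eu_isin = toy_df[mask_isin]

# Alternative: the built-in str.startswith takes a tuple of prefixes,
# so prefixes of different lengths would also work.
prefixes = tuple(EuropeanUnion)
mask_startswith = [label.startswith(prefixes) for label in toy_df.index]
eu_startswith = toy_df.loc[mask_startswith]

print(eu_isin)
print(eu_startswith)
```

Both masks select the same rows here. The slicing version is vectorized and is probably the better fit for a frame with a couple of thousand rows, while the tuple version does not depend on all prefixes having the same length.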