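I'm working on some code and need to map a pandas DataFrame to a dictionary made up of composite keys and values. Below is an initial example: each key consists of (PostalCode, Sex) and (Name, Age), and each value is the sum of all Salary entries that match that key. I'm looking for an elegant way to do this mapping.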
import pandas as pd
data = [
["tom", 22, "ab 11", "M", 5555],
["Rob", 22, "ab 13", "M", 9999],
["nick", 33, "ab 14", "M", 3333],
["nick", 33, "ab 14", "M", 8888],
["juli", 18, "ab 15", "F", 2222],
]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode", "Sex", "Salary"])
df = people.groupby(["PostalCode", "Sex", "Age"])["Salary"].sum().unstack(0)
d = {col: df[col].dropna().to_dict() for col in df}
print(d)
# Expected output
print(
{
(("ab 11", "M"), ("tom", 22)): 5555,
(("ab 13", "M"), ("Rob", 22)): 9999,
(("ab 14", "M"), ("nick", 33)): 12221,
(("ab 15", "F"), ("juli", 18)): 2222,
}
)
First aggregate with sum, then use a dictionary comprehension that unpacks each MultiIndex key into the variables a, b, c, d to change the key format:
s = people.groupby(["PostalCode", "Sex","Name", "Age"])["Salary"].sum()
print (s)
PostalCode  Sex  Name  Age
ab 11       M    tom   22      5555
ab 13       M    Rob   22      9999
ab 14       M    nick  33     12221
ab 15       F    juli  18      2222
Name: Salary, dtype: int64
d= {((a,b), (c,d)): v for (a,b,c,d), v in s.items()}
print(d)
{(('ab 11', 'M'), ('tom', 22)): 5555,
(('ab 13', 'M'), ('Rob', 22)): 9999,
(('ab 14', 'M'), ('nick', 33)): 12221,
(('ab 15', 'F'), ('juli', 18)): 2222}
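
If the grouping columns might change, the same idea can be wrapped in a small function. This is only a sketch: the helper name salary_by_composite_key and its parameters outer, inner and value are made up for illustration and are not part of pandas. It slices each flat MultiIndex key into two sub-tuples instead of unpacking four fixed names.

```python
import pandas as pd


def salary_by_composite_key(frame, outer=("PostalCode", "Sex"),
                            inner=("Name", "Age"), value="Salary"):
    """Hypothetical helper: sum `value` for each (outer, inner) key pair."""
    n = len(outer)
    # Group by all key columns at once; the result is a Series with a MultiIndex.
    totals = frame.groupby([*outer, *inner])[value].sum()
    # Each index entry is one flat tuple; split it into the two sub-tuples.
    return {(key[:n], key[n:]): total for key, total in totals.items()}


data = [
    ["tom", 22, "ab 11", "M", 5555],
    ["Rob", 22, "ab 13", "M", 9999],
    ["nick", 33, "ab 14", "M", 3333],
    ["nick", 33, "ab 14", "M", 8888],
    ["juli", 18, "ab 15", "F", 2222],
]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode", "Sex", "Salary"])

result = salary_by_composite_key(people)
print(result)
```

Slicing with key[:n] and key[n:] keeps the comprehension independent of how many columns are in each half of the key, so only the outer and inner arguments need to change.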