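I have the following code for a stacked histogram, and it works fine when FIELD is numeric. However, when I use FIELD_str, which holds abc1, abc2, abc3, etc. instead of 1, 2, 3, ..., it fails with the error TypeError: cannot concatenate 'str' and 'float' objects. How can I replace (directly or indirectly) the numbers on the X axis with their string values? This is needed to make the chart easier to read: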
filter = df["CLUSTER"] == 1
plt.ylabel("Absolute frequency")
plt.hist([df["FIELD"][filter],df["FIELD"][~filter]],stacked=True,
color=['#8A2BE2', '#EE3B3B'], label=['1','0'])
plt.legend()
plt.show()
Dataset:
s_field1 = pd.Series(["5","5","5","8","8","9","10"])
s_field1_str = pd.Series(["abc1","abc1","abc1","abc2","abc2","abc3","abc4"])
s_cluster = pd.Series(["1","1","0","1","0","1","0"])
df = pd.concat([s_field1, s_field1_str, s_cluster], axis=1)
df.columns = ['FIELD', 'FIELD_str', 'CLUSTER']
df
Edit:
I tried to create dictionaries, but I can't figure out how to put them into the histogram:
# since python 2.7
import collections
yes = collections.Counter(df["FIELD_str"][filter])
no = collections.Counter(df["FIELD_str"][~filter])
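You will probably have to use a bar plot instead of a histogram, because a histogram is defined for data on a numeric (interval) scale, not a nominal (categorical) scale. You could try the following: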
import pandas as pd
import matplotlib.pyplot as plt
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline
s_field1 = pd.Series(["5","5","5","8","8","9","10"])
s_field1_str = pd.Series(["abc1","abc1","abc1","abc2","abc2","abc3","abc4"])
s_cluster = pd.Series(["1","1","0","1","0","1","0"])
df = pd.concat([s_field1, s_field1_str, s_cluster], axis=1)
df.columns = ['FIELD', 'FIELD_str', 'CLUSTER']
# calculate counts by CLUSTER and FIELD_str
counts = df.groupby(['FIELD_str', 'CLUSTER']).count().unstack()
counts.columns = counts.columns.get_level_values(1)
counts.index.name = 'xaxis label here'
ax = counts.plot.bar(stacked=True, title='Some title here')
ax.set_ylabel("yaxis label here")
plt.tight_layout()
plt.savefig("stacked_barplot.png")
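The Counter objects from the question's edit can also be drawn directly as stacked bars, without going through groupby. The following is only a minimal sketch under a few assumptions: the CLUSTER values are compared as strings (as in the sample dataset), the names in_cluster, labels, positions and the output file name are made up for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import collections

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question, reduced to the two columns used here
df = pd.DataFrame({
    "FIELD_str": ["abc1", "abc1", "abc1", "abc2", "abc2", "abc3", "abc4"],
    "CLUSTER": ["1", "1", "0", "1", "0", "1", "0"],
})

in_cluster = df["CLUSTER"] == "1"  # CLUSTER holds strings in the sample data
yes = collections.Counter(df["FIELD_str"][in_cluster])
no = collections.Counter(df["FIELD_str"][~in_cluster])

# One bar position per category; a Counter returns 0 for missing keys
labels = sorted(set(yes) | set(no))
yes_counts = [yes[label] for label in labels]
no_counts = [no[label] for label in labels]
positions = list(range(len(labels)))

fig, ax = plt.subplots()
ax.bar(positions, yes_counts, color="#8A2BE2", label="1")
# bottom= stacks the second series on top of the first
ax.bar(positions, no_counts, bottom=yes_counts, color="#EE3B3B", label="0")
ax.set_xticks(positions)
ax.set_xticklabels(labels)  # string values replace the numeric tick labels
ax.set_ylabel("Absolute frequency")
ax.legend()
fig.savefig("stacked_counter_barplot.png")
```

The bars are placed at integer positions and the strings are applied only as tick labels, which is the "indirect" replacement the question asks about.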