我需要在matplotlib中创建一个有点不寻常的条形图,并且标准功能似乎无法提供我所需要的。
我已经聚类了一些文档,并希望显示每个聚类中5个最重要的关键字。第一个问题是我每个群集有一组,由5个独立的条组成。第二个问题是,这些单个条形的标签很重要,各组之间的标签不一样,而且也不唯一。
我有一个临时的原型,看起来像这样:
我只是按正确的顺序绘制了所有单独的条形图,并用空条目将它们分开。最大的问题(除了难看)是,识别群集的唯一方法是对组进行计数。如果可以通过颜色或其他方式识别群集,将会很有帮助,但是我不知道该怎么做。
编辑:这是一些要求的玩具数据以及用于生成我已经拥有的情节的代码。
玩具数据:
数组中包含以下两个熊猫数据框。这两个代码块包含的结果df_list[i].to_csv()
。希望对您有所帮助,但对于此问题,实际数据并不重要,因此您也可以只创建自己的数据框。
,features,score
0,knowledg,0.09862235117497174
1,manag,0.07812351138840486
2,innov,0.06502084705448799
3,organ,0.0561819290497529
4,km,0.05580332888282127
和
,features,score
0,knowledg,0.04217018718591911
1,develop,0.03423580137595049
2,manag,0.032239226503136
3,system,0.031064303713788467
4,sustain,0.029628875636649198
码:
当前解决方案的方法是将所有单个数据帧组合为一个数据帧,在必要时添加空条目,然后绘制结果。
def plot_all_clusters_words(dfs):
# target structure: word as non unique column, value as other non unique column
df_dict_list = []
for df in dfs:
for index, row in df.iterrows():
df_dict_list.append({"word": row.features, "value": row.score})
df_dict_list.append({"word": "", "value": 0})
df_dict_list = df_dict_list[:-1]
new_df = pd.DataFrame(df_dict_list)
new_df.plot.bar(x="word")
plt.show()
return new_df
注意:
我只需要一种容易识别组的方法,如果您知道与我上面建议的方法不同的方法,请随时这样做。
调用plt.bar
每个具有自己的标签和颜色的数据框,将创建以下图:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from io import StringIO
df1_str = '''features,score
0,knowledg,0.09862235117497174
1,manag,0.07812351138840486
2,innov,0.06502084705448799
3,organ,0.0561819290497529
4,km,0.05580332888282127'''
df2_str = '''features,score
0,knowledg,0.04217018718591911
1,develop,0.03423580137595049
2,manag,0.032239226503136
3,system,0.031064303713788467
4,sustain,0.029628875636649198'''
df1 = pd.read_csv(StringIO(df1_str))
df2 = pd.read_csv(StringIO(df2_str))
dfs = [df1, df2]
cluster_names = [f'cluster {i}' for i in range(1, len(dfs) + 1)]
colors = plt.cm.rainbow(np.linspace(0, 1, len(dfs)))
bar_width = 0.8 # width of individual bars
cluster_gap = 0.2 # extra distance between clusters
starts = np.append(0, np.array([len(df) + cluster_gap for df in dfs]).cumsum())
all_tickpos = [s + np.arange(len(df)) for df, s in zip(dfs, starts)]
for df, name, color, tickpos in zip(dfs, cluster_names, colors, all_tickpos):
plt.bar(tickpos, df['score'], width=bar_width, color=color, label=name)
plt.xticks(np.concatenate(all_tickpos), [f for df in dfs for f in df['features']], rotation=90)
plt.legend()
plt.tight_layout()
plt.show()
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句