通常当我做树状图和热图时,我使用距离矩阵并做很多SciPy
事情。我想尝试Seaborn
但是Seaborn
想要我的数据为矩形(行=样本,cols =属性,而不是距离矩阵)?
我本质上想seaborn
用作后端来计算我的树状图并将其附加到我的热图上。这可能吗?如果不是这样,将来是否可以将其作为功能。
也许我可以调整一些参数,以便可以使用距离矩阵而不是矩形矩阵?
用法如下:
seaborn.clustermap¶
seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean',
z_score=None, standard_scale=None, figsize=None, cbar_kws=None, row_cluster=True,
col_cluster=True, row_linkage=None, col_linkage=None, row_colors=None,
col_colors=None, mask=None, **kwargs)
我的代码如下:
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
DF = pd.DataFrame(X, index = ["iris_%d" % (i) for i in range(X.shape[0])], columns = iris.feature_names)
我认为下面的方法不正确,因为我给了它一个预先计算的距离矩阵,而不是它要求的矩形数据矩阵。没有关于如何使用相关性/距离矩阵的示例,clustermap
但是有https://stanford.edu/~mwaskom/software/seaborn/examples/network_correlations.html的示例,但是排序不是与普通sns.heatmap
函数一起进行的。
DF_corr = DF.T.corr()
DF_dism = 1 - DF_corr
sns.clustermap(DF_dism)
您可以将预先计算的距离矩阵作为链接传递给clustermap()
:
import pandas as pd, seaborn as sns
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
from sklearn.datasets import load_iris
sns.set(font="monospace")
iris = load_iris()
X, y = iris.data, iris.target
DF = pd.DataFrame(X, index = ["iris_%d" % (i) for i in range(X.shape[0])], columns = iris.feature_names)
DF_corr = DF.T.corr()
DF_dism = 1 - DF_corr # distance matrix
linkage = hc.linkage(sp.distance.squareform(DF_dism), method='average')
sns.clustermap(DF_dism, row_linkage=linkage, col_linkage=linkage)
For clustermap(distance_matrix)
(i.e., without linkage passed), the linkage is calculated internally based on pairwise distances of the rows and columns in the distance matrix (see note below for full details) instead of using the elements of the distance matrix directly (the correct solution). As a result, the output is somewhat different from the one in the question:
注意:如果norow_linkage
传递给clustermap()
,则通过将每行视为一个“点”(观察)并计算这些点之间的成对距离来内部确定行链接。因此行树状图反映了行相似性。与相似col_linkage
,其中每列均视为一个点。此说明可能应该添加到docs中。在此修改文档的第一个示例,以使内部链接计算明确:
import seaborn as sns; sns.set()
import scipy.spatial as sp, scipy.cluster.hierarchy as hc
flights = sns.load_dataset("flights")
flights = flights.pivot("month", "year", "passengers")
row_linkage, col_linkage = (hc.linkage(sp.distance.pdist(x), method='average')
for x in (flights.values, flights.values.T))
g = sns.clustermap(flights, row_linkage=row_linkage, col_linkage=col_linkage)
# note: this produces the same plot as "sns.clustermap(flights)", where
# clustermap() calculates the row and column linkages internally
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句