Julia: Broadcasting pairwise distance calculation across a tensor of observations


Connor

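I am trying to use the Distances package in Julia to perform a broadcast computation of distance matrices.

I understand how to compute a single N x N distance matrix for some matrix X (with dimensions D x N), where each column X[:,i] stores a D-dimensional feature vector for observation i. The code would be: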

using Distances

dist_matrix = pairwise(Euclidean(), X, dims = 2)

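dist_matrix contains the Euclidean distance between each pair of D-dimensional columns; for example, dist_matrix[m,n] stores the Euclidean distance between X[:,m] and X[:,n].

Now imagine that my array X is actually a whole tensor or 'volume' of D-dimensional observations, so that X[:,i,j] stores observation i of the j-th 'slice' of my D x N observations. The whole array X therefore has dimensions D x N x T, where T is the number of slices.

Accordingly, I want to compute a tensor or 'volume' of distance matrices, so that dist_matrix has dimensions N x N x T.

Is there a way to do this in a single line by broadcasting the pairwise() function in Julia? What is the fastest way to do it? The idea is shown below with a basic for loop: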

using Distances

dist_matrix_tensor = zeros(N,N,T);

for t = 1:T
        dist_matrix_tensor[:,:,t] = pairwise(Euclidean(), X[:,:,t], dims = 2)
end

Edit: I figured out how to do this with mapslices, but I am still not sure whether it is the best approach.

using Distances

dist_function(x)  = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'

dist_matrix_tensor = mapslices(dist_function, X, dims = [1,2]) # map your matrix-operating function across the slices of the main tensor X

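Of course, this could also be parallelized, since each 'slice' of X is independent in this computation, so I am basically just looking for the fastest way to do this. More generally, I am also interested in how you would do this specifically with broadcasting.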
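As a rough illustration of what a broadcasting-only version could look like, here is a minimal sketch that does not call pairwise() at all. It reshapes X so that the two observation indices sit on separate axes and lets broadcasting form every difference at once. The helper name pairwise_euclidean_slices is hypothetical (it is not part of Distances), and the approach assumes the intermediate D x N x N x T array fits in memory, so it is probably only sensible for small inputs.

```julia
# Hypothetical helper (not part of Distances): Euclidean distance matrices
# for every slice of a D x N x T array, using only Base broadcasting.
function pairwise_euclidean_slices(X::AbstractArray{<:Real,3})
    D, N, T = size(X)
    A = reshape(X, D, N, 1, T)   # observation m runs along dimension 2
    B = reshape(X, D, 1, N, T)   # observation n runs along dimension 3
    # A .- B has size D x N x N x T; sum the squared differences over features
    sq = sum(abs2, A .- B; dims = 1)
    return dropdims(sqrt.(sq); dims = 1)   # N x N x T
end

X = randn(3, 5, 4)
dist_matrix_tensor = pairwise_euclidean_slices(X)
size(dist_matrix_tensor)   # (5, 5, 4)
```

Here dist_matrix_tensor[m,n,t] is the Euclidean distance between X[:,m,t] and X[:,n,t]. The trade-off is memory: the loop and mapslices versions only ever hold one N x N matrix of temporaries at a time, whereas this sketch materializes all differences at once.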

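Fredrik Bagge

Your mapslices solution performs reasonably well if the dimensions of X are large. Below is an example using JuliennedArrays, which is slightly faster for small X but has the same performance as mapslices when the first two dimensions are of size 100.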

using Distances, JuliennedArrays, BenchmarkTools

dist_function(x)  = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'

X = randn(10,10,20);
dist_matrix_tensor = @btime mapslices(dist_function, X, dims = [1,2]); # 61.172 μs (198 allocations: 42.28 KiB)
dist_matrix_tensor2 = @btime map(dist_function, Slices(X, 1, 2)); # 41.529 μs (62 allocations: 21.67 KiB)

Note, however, that JuliennedArrays returns a Vector of Matrix values rather than a three-dimensional array.
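If a three-dimensional array is needed after all, a Vector of equally sized matrices can be combined with Base functions alone. The sketch below uses random stand-in matrices in place of the real distance matrices, and the names mats, tensor and tensor2 are purely illustrative.

```julia
# Stand-in for the result of map(dist_function, Slices(X, 1, 2)):
# a Vector of T matrices, each N x N (random values, for illustration only).
N, T = 4, 3
mats = [rand(N, N) for _ in 1:T]

# Concatenate the matrices along a new third dimension.
tensor = cat(mats...; dims = 3)

# Alternative that avoids splatting a long vector: glue the matrices side by
# side into an N x (N*T) matrix, then reinterpret the columns as slices.
tensor2 = reshape(reduce(hcat, mats), N, N, :)

size(tensor)   # (4, 4, 3)
```

Both forms copy the data, so that cost should be included when comparing timings against mapslices, which returns the three-dimensional array directly.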
