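I have two arrays, A and B, with the same dimensions 1000 x 3 x 20 x 20. I want to generate a third array C of size 3 x 3 x 20 x 20 that holds the matrix products of the corresponding slices of A and B, i.e. C(:,:,i,j) = A(:,:,i,j)'*B(:,:,i,j). Then I need to convert C into a new array D by inverting the corresponding 3 x 3 matrices, i.e. D(:,:,i,j) = inv(C(:,:,i,j)). Again, it is clear how to do this with loops. Is there a way to do it without looping over the 400 items?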
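For reference, the loop version mentioned in the question might look like the following sketch. The sizes (n1 = 10, n2 = 4) are illustrative assumptions chosen so it runs quickly, not the question's 1000 x 3 x 20 x 20. The loop counters are named ii/jj to avoid shadowing i and j, which MATLAB also uses as the imaginary unit.

```matlab
% Loop-based reference for both steps (illustrative sizes only).
n1 = 10; n2 = 4;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);

C = zeros(3,3,n2,n2);
D = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        % (3 x n1) * (n1 x 3) -> 3 x 3 product of one slice pair
        C(:,:,ii,jj) = A(:,:,ii,jj)' * B(:,:,ii,jj);
        % invert that 3 x 3 result
        D(:,:,ii,jj) = inv(C(:,:,ii,jj));
    end
end
```

This is the baseline that any vectorized version should reproduce up to round-off.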
EDIT: Benchmark code comparing the performance of the different solutions:
%// Inputs
n1 = 50;
n2 = 200;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);
%// A. CPU loopy code
tic
C = zeros(3,3,n2,n2);
for ii = 1:n2
for jj = 1:n2
C(:,:,ii,jj) = A(:,:,ii,jj)'*B(:,:,ii,jj); %//'
end
end
toc
%// B. Vectorized code (using squeeze)
tic
C1 = squeeze(sum(bsxfun(@times,permute(A,[2 1 5 3 4]),permute(B,[5 1 2 3 4])),2));
toc
%// C. Vectorized code (avoiding squeeze)
tic
C2 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
toc
%// D. GPU vectorized code
tic
A = gpuArray(A);
B = gpuArray(B);
C3 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
C3 = gather(C3);
toc
Runtime results:
Elapsed time is 0.287511 seconds.
Elapsed time is 0.250663 seconds.
Elapsed time is 0.337628 seconds.
Elapsed time is 1.259207 seconds.
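The timings above do not check that the variants agree. A small sanity check along these lines, assuming reduced sizes (n2 = 20) so it runs quickly, compares both bsxfun results against the loop. The differences should be at round-off level, since only the order of summation differs.

```matlab
% Sanity check: do the two bsxfun variants reproduce the loop result?
n1 = 50; n2 = 20;   % smaller than the benchmark so it runs quickly
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);

C = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        C(:,:,ii,jj) = A(:,:,ii,jj)'*B(:,:,ii,jj);
    end
end

C1 = squeeze(sum(bsxfun(@times,permute(A,[2 1 5 3 4]),permute(B,[5 1 2 3 4])),2));
C2 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);

% Maximum absolute deviation from the loop result for each variant.
err1 = max(abs(C(:) - C1(:)));
err2 = max(abs(C(:) - C2(:)));
fprintf('max |C - C1| = %g, max |C - C2| = %g\n', err1, err2);
```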
Code
%// Part - 1
C = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
%// Part - 2: Use MATLAB file-exchange tool multinv
D = multinv(C);
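If multinv is not available, one alternative for the specific 3 x 3 case is the closed-form adjugate formula inv(M) = adj(M)/det(M), applied element-wise across all slices at once. This is a swapped-in technique, not how multinv works internally. The sketch assumes no slice of C is singular or badly conditioned, because unlike inv() it does no pivoting and gives no warning; the sizes are again illustrative.

```matlab
% Vectorized inverse of every 3 x 3 slice via the adjugate formula.
% Assumes no slice of C is singular or badly conditioned.
n1 = 50; n2 = 20;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);
C = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);

% The nine entries of each slice; each is a 1 x 1 x n2 x n2 array.
m11 = C(1,1,:,:); m12 = C(1,2,:,:); m13 = C(1,3,:,:);
m21 = C(2,1,:,:); m22 = C(2,2,:,:); m23 = C(2,3,:,:);
m31 = C(3,1,:,:); m32 = C(3,2,:,:); m33 = C(3,3,:,:);

% Cofactors c_rs = (-1)^(r+s) * minor_rs, computed for all slices at once.
c11 =   m22.*m33 - m23.*m32;   c12 = -(m21.*m33 - m23.*m31);   c13 =   m21.*m32 - m22.*m31;
c21 = -(m12.*m33 - m13.*m32);  c22 =   m11.*m33 - m13.*m31;    c23 = -(m11.*m32 - m12.*m31);
c31 =   m12.*m23 - m13.*m22;   c32 = -(m11.*m23 - m13.*m21);   c33 =   m11.*m22 - m12.*m21;

% Determinant by cofactor expansion along the first row.
detC = m11.*c11 + m12.*c12 + m13.*c13;

% adj(M) is the transpose of the cofactor matrix.
D = zeros(size(C));
D(1,1,:,:) = c11; D(1,2,:,:) = c21; D(1,3,:,:) = c31;
D(2,1,:,:) = c12; D(2,2,:,:) = c22; D(2,3,:,:) = c32;
D(3,1,:,:) = c13; D(3,2,:,:) = c23; D(3,3,:,:) = c33;

% Divide every slice by its own determinant (broadcast over dims 1-2).
D = bsxfun(@rdivide, D, detC);
```

When conditioning is a concern, calling inv() per slice in a loop, or using multinv, is the safer choice.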
For Part 1, you can also try this:
C = squeeze(sum(bsxfun(@times,permute(A,[2 1 5 3 4]),permute(B,[5 1 2 3 4])),2));
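This seems less "disruptive" in how it rearranges the elements than the version above, but the drawback is that it needs squeeze, which might make it a bit slower. I'll leave the choice to you and encourage you to benchmark both and pick the better one.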
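To see what each variant rearranges, the sketch below traces the intermediate sizes. Both compute C(p,q,i,j) = sum over k of A(k,p,i,j)*B(k,q,i,j), which is entry (p,q) of A(:,:,i,j)'*B(:,:,i,j); they differ only in which dimension carries the summation index k. The intermediate names (PA1, S2, and so on) are hypothetical, introduced only for this illustration.

```matlab
% Trace intermediate sizes of both Part 1 variants (illustrative sizes).
n1 = 50; n2 = 20;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);

% Variant 1: k is moved to dimension 5, then summed away.
PA1 = permute(A,[2 5 3 4 1]);   % 3 x 1 x n2 x n2 x n1
PB1 = permute(B,[5 2 3 4 1]);   % 1 x 3 x n2 x n2 x n1
P1  = bsxfun(@times,PA1,PB1);   % 3 x 3 x n2 x n2 x n1
C1  = sum(P1,5);                % 3 x 3 x n2 x n2 (trailing singleton dropped)

% Variant 2: k stays in dimension 2.
PA2 = permute(A,[2 1 5 3 4]);   % 3 x n1 x 1 x n2 x n2
PB2 = permute(B,[5 1 2 3 4]);   % 1 x n1 x 3 x n2 x n2
P2  = bsxfun(@times,PA2,PB2);   % 3 x n1 x 3 x n2 x n2
S2  = sum(P2,2);                % 3 x 1 x 3 x n2 x n2, singleton in the middle
C2  = squeeze(S2);              % 3 x 3 x n2 x n2

disp(size(P1)); disp(size(C1));
disp(size(P2)); disp(size(S2)); disp(size(C2));
```

The singleton left in dimension 2 by sum(P2,2) is why the second variant needs squeeze, while in the first variant MATLAB drops the trailing singleton automatically.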
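bsxfun + GPU?
I bumped up the loop limits (the loops now run over n2 = 200 rather than 20), since that makes for a more realistic test between loopy and vectorized code. Here is the modified code for Part 1: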
%// Inputs
n1 = 50;
n2 = 200;
A = rand(n1,3,n2,n2);
B = rand(n1,3,n2,n2);
%// A. CPU loopy code
tic
C = zeros(3,3,n2,n2);
for ii = 1:n2
for jj = 1:n2
C(:,:,ii,jj) = A(:,:,ii,jj)'*B(:,:,ii,jj); %//'
end
end
toc
%// B. GPU vectorized code
tic
A = gpuArray(A);
B = gpuArray(B);
C1 = sum(bsxfun(@times,permute(A,[2 5 3 4 1]),permute(B,[5 2 3 4 1])),5);
C1 = gather(C1);
toc
Runtime results on my system:
Elapsed time is 0.310056 seconds.
Elapsed time is 0.172499 seconds.
So there you have it: on this system the GPU-vectorized version beats the loop (0.17 s vs. 0.31 s).