Fastest way to find unique values in an array

Posted by m_power in Dev

m_power

I'm trying to find the fastest way to get the unique values of an array while excluding 0 as a possible unique value.

Right now, I have two solutions:

result1 = setxor(0, dataArray(1:end,1)); % This gives the correct solution
result2 = unique(dataArray(1:end,1)); % This solution is faster but doesn't give the same result as result1

dataArray is equivalent to:

dataArray = [0 0; 0 2; 0 4; 0 6; 1 0; 1 2; 1 4; 1 6; 2 0; 2 2; 2 4; 2 6]; % This is a small array, but in my case there are usually over 10 000 lines.

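So in this case, result1 is [1; 2] and result2 is [0; 1; 2]. The unique function is faster, but I don't want 0 to be considered. Is there a way to do this with unique without treating 0 as a unique value? Is there another alternative?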
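One caveat worth checking with setxor: it returns the values that appear in exactly one of its two inputs, so setxor(0, x) only removes 0 when x actually contains a zero. If the column has no zeros, 0 is added to the result instead. The small sketch below (the vectors x1, x2 and the names r1 to r3 are illustrative, not from the original post) contrasts this with setdiff, which drops 0 either way:

```matlab
% Column that contains zeros: 0 is in both inputs, so setxor removes it.
x1 = [0; 0; 1; 1; 2; 2];
r1 = setxor(0, x1);      % values in exactly one input: 1 and 2

% Column without zeros: 0 is now only in the first input, so setxor ADDS it.
x2 = [1; 1; 2; 2];
r2 = setxor(0, x2);      % values in exactly one input: 0, 1 and 2

% setdiff removes 0 from the result whether or not it was present.
r3 = setdiff(x2, 0);     % 1 and 2

disp(r1(:)'); disp(r2(:)'); disp(r3(:)');
```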

EDIT

I wanted to time the various solutions.

clc
dataArray = floor(10*rand(10e3,10));
dataArray(mod(dataArray(:,1),3)==0)=0;
% Initial
tic
for ii = 1:10000
   FCT1 = setxor(0, dataArray(:,1));
end
toc
% My solution
tic
for ii = 1:10000
   FCT2 = unique(dataArray(dataArray(:,1)>0,1));
end
toc
% Pursuit solution
tic
for ii = 1:10000
   FCT3 = unique(dataArray(:, 1));
   FCT3(FCT3==0) = [];
end
toc
% Pursuit solution with chappjc comment
tic
for ii = 1:10000
   FCT32 = unique(dataArray(:, 1));
   FCT32 = FCT32(FCT32~=0);
end
toc
% chappjc solution
tic
for ii = 1:10000
   FCT4 = setdiff(unique(dataArray(:,1)),0);
end
toc
% chappjc 2nd solution
tic
for ii = 1:10000
   FCT5 = find(accumarray(dataArray(:,1)+1,1))-1;
   FCT5 = FCT5(FCT5>0);
end
toc
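Timings are only meaningful if the variants return the same values. A minimal sketch of a correctness check, assuming a deterministic stand-in for the random test matrix (repmat instead of rand, so the result is reproducible) and comparing with (:) so that row/column orientation differences between functions don't matter:

```matlab
% Hypothetical deterministic stand-in for the random test data:
% column 1 cycles through 0..9, then the multiples of 3 in column 1 are zeroed.
dataArray = repmat((0:9)', 1000, 10);
dataArray(mod(dataArray(:,1),3)==0, 1) = 0;
col = dataArray(:,1);

% One call of each timed variant.
FCT1  = setxor(0, col);
FCT2  = unique(col(col > 0));
FCT3  = unique(col);  FCT3(FCT3 == 0) = [];
FCT32 = unique(col);  FCT32 = FCT32(FCT32 ~= 0);
FCT4  = setdiff(unique(col), 0);
FCT5  = find(accumarray(col + 1, 1)) - 1;  FCT5 = FCT5(FCT5 > 0);

% (:) guards against orientation differences between the functions.
allSame = isequal(FCT1(:), FCT2(:), FCT3(:), FCT32(:), FCT4(:), FCT5(:));
disp(allSame)
```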

Results:

Elapsed time is 5.153571 seconds. % FCT1 Initial
Elapsed time is 3.837637 seconds. % FCT2 My solution
Elapsed time is 3.464652 seconds. % FCT3 Pursuit solution
Elapsed time is 3.414338 seconds. % FCT32 Pursuit solution with chappjc comment
Elapsed time is 4.097164 seconds. % FCT4 chappjc solution
Elapsed time is 0.936623 seconds. % FCT5 chappjc 2nd solution

However, the sparse and accumarray solutions only work with integer values. They don't work with non-integer doubles.
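The sparse variant mentioned above does not appear in this excerpt. As an assumption about how such an approach can work, here is a sketch that relies on sparse summing duplicate subscripts (the names x, v, counts are illustrative). It shows why these methods need positive integer-valued data: each value is used directly as a row index, so zeros are removed first with nonzeros, and a fractional or negative value cannot serve as an index.

```matlab
x = [0; 2; 0; 5; 2; 7; 5; 0];   % illustrative integer-valued data
v = nonzeros(x);                % drop zeros: [2; 5; 2; 7; 5]

% sparse(i, j, s) sums entries with duplicate subscripts,
% so counts(k) is the number of times value k occurs.
counts = sparse(v, 1, 1);

% Row indices of the nonzero counts are the unique nonzero values.
result = find(counts);
disp(result')
```

Unlike a dense accumarray result, the sparse column stores only the nonzero counts, so a large maximum value costs less memory.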

chappjc

Here's an unusual suggestion using accumarray, demonstrated with Floris's test data:

a = floor(10*rand(100000, 1)); a(mod(a,3)==0)=0;
result = find(accumarray(nonzeros(a(:,1))+1,1))-1;

Thanks to Luis Mendo for pointing out that with nonzeros there's no need to do result = result(result>0)!
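To make the accumarray trick concrete, here is a small hand-checkable example (the vector a is made up for illustration). accumarray(v + 1, 1) builds one count per value, where index k holds the number of times value k-1 occurs; find returns the indices of the non-empty bins, and subtracting 1 maps them back to values:

```matlab
a = [0; 3; 1; 0; 3; 4; 1];        % illustrative data
v = nonzeros(a);                  % [3; 1; 3; 4; 1], zeros dropped

% counts(k) = number of times value k-1 occurs in v
counts = accumarray(v + 1, 1);    % [0; 2; 0; 2; 1]

% Non-empty bins, shifted back from index to value
result = find(counts) - 1;        % [1; 3; 4]
disp(result')
```

Because nonzeros has already removed the zeros, for non-negative data the +1/-1 offset is not strictly needed: find(accumarray(nonzeros(a), 1)) gives the same result.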

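Note that this solution requires integer-valued data (not necessarily an integer data type, just values with no fractional part), and, because of the +1 offset, non-negative data. Comparing floating-point values for equality, as unique does, is perilous. See here and here.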
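If the data may or may not be integer-valued, one option is to check before choosing the method. This is a sketch under the assumption that exact equality is acceptable in the fallback branch; the variable names are illustrative, and for tolerance-based comparison MATLAB also provides uniquetol.

```matlab
% Two illustrative inputs: one integer-valued, one with a fractional value.
inputs  = {[0; 4; 1; 0; 4; 2], [0; 2.5; 1; 0; 2.5; 3]};
results = cell(size(inputs));

for k = 1:numel(inputs)
    v = nonzeros(inputs{k});
    if all(v == fix(v)) && all(v > 0)
        % Every value is a positive integer, so it is a valid subscript.
        % No +1/-1 offset is needed because zeros were already removed.
        results{k} = find(accumarray(v, 1));
    else
        % Fallback for arbitrary real values (exact equality comparison).
        results{k} = unique(v);
    end
end

disp(results{1}'); disp(results{2}');
```

Keep in mind that accumarray allocates an array of length max(v), so very large integer values make the first branch expensive in memory.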

Original suggestion: combine unique with setdiff:

result = setdiff(unique(a(:,1)),0)
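Since setdiff already returns sorted values without repetitions, the inner unique call may be redundant. A quick sketch (the vector x is illustrative) showing that both forms return the same values; whether dropping unique is actually faster on large data would need to be timed:

```matlab
x = [0; 2; 0; 1; 2; 1];           % illustrative data

r1 = setdiff(unique(x), 0);       % unique first, then remove 0
r2 = setdiff(x, 0);               % setdiff alone: sorted, no repeats

disp(isequal(r1, r2))
```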

Or remove the zeros with logical indexing after unique:

result = unique(a(:,1));
result = result(result>0);
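Note that result>0 also discards negative values, while the question only asks to exclude 0. If the data can be negative, result~=0 (as in the FCT32 timing variant) matches the intent more closely. A small illustration with made-up data:

```matlab
x = [-1; 0; 2; 0; -1; 3];         % illustrative data with negatives
u = unique(x);                    % [-1; 0; 2; 3]

positiveOnly = u(u > 0);          % [2; 3], also drops -1
nonzeroOnly  = u(u ~= 0);         % [-1; 2; 3], drops only 0

disp(positiveOnly'); disp(nonzeroOnly');
```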

I generally avoid assigning [] (as in result(result==0)=[];) because it is very inefficient for large data sets.

Removing the zeros after unique should be faster because the filter then operates on less data (unless every element is unique, or a/dataArray is very short).


Edited 2021-02-05
