MATLAB性能基准测试

user2271770 发表于 Dev

用户名

设置：

这篇文章是关于测试以下问题的解决方案的性能的：

S给出了以下划线分隔为数字格式的字符串的单元格数组，例如：

list_of_words = repmat({'02_04_04_52_23_14_54_672_0'},10,1);

保证所有符合以下条件的字符串：

它们仅包含十进制数字和下划线；

下划线的数量相同；

没有两个连续的下划线。

编写MATLAB代码，将字符串的单元格数组转换为双精度的数字矩阵S× （值小1000倍），其中是字符串数，并且是字符串中的数字数。MSM

一个非常类似的问题发布在StackOverflow上，并提出了几种解决方案。最初的目标是提高速度性能。

代码：

从速度的角度来看，其中两种解决方案在从一种MATLAB安装到另一种MATLAB安装，甚至从一种运行到另一种运行之间，似乎都有很大的不同。而且，它们的实现方式确实不同。

在黑暗的角落，您会发现一个违反MATLAB最神圣宗旨的解决方案：eval是邪恶的，应不惜一切代价避免循环。但是，该代码尝试通过避免重复分配内存，使用将字符串转换为数字的快速方法以及执行就地操作来优化速度：

%'eval_and_loops_solution.m'
function array_of_numbers = eval_and_loops_solution(list_of_words)

        n_numbers = 1 + sum(list_of_words{1}=='_');
        n_words   = numel(list_of_words);

        array_of_numbers = zeros(n_numbers, n_words);

        for k = 1:n_words
                temp = ['[', list_of_words{k}, ']'];
                temp(temp=='_') = ';';
                array_of_numbers(:,k) = eval(temp);
        end;

         %'this is a waste of memory, but is kind of required'
        array_of_numbers = transpose(array_of_numbers / 1000);

end

注意：原始解决方案将结果返回为M×S double数组。该代码适用于S× M; 但是，这种修改占用了两倍的内存。是的，我写了这个解决方案。

在明确的角落，您将找到一个符合MATLAB精神的解决方案，避免使用循环和eval，而只需一次sscanf读取所有字符串（这是避免调用函数S时间的开销的聪明方法）：

%'single_sscanf_solution.m'
function array_of_numbers = single_sscanf_solution(list_of_words)

        N = 1 + sum(list_of_words{1}=='_'); %'hard-coded in the original'
        lens = cellfun(@numel,list_of_words);
        tlens = sum(lens);
        idx(1,tlens)=0;
        idx(cumsum(lens(1:end-1))+1)=1;
        idx2 = (1:tlens) + cumsum(idx);

        one_str(1:max(idx2)+1)='_';
        one_str(idx2) = [list_of_words{:}];
        delim = repmat('%d_',1,N*numel(lens));
        array_of_numbers = reshape(sscanf(one_str, delim),N,[])'./1000;

end

注意：此解决方案属于@Divakar。

裁判由两个功能组成：一个功能生成测试用例，另一个功能执行timig。

测试用例生成器将单元格数组中的下划线分隔的字符串从1到9999之间的10个随机整数中分组（为简单起见）；但是，实现应仅假定数字为正数或零，并且该数字应位于double：

%'referee_test_case.m'
function list_of_words = referee_test_case(n_words)

        NUM_PER_WORD  = 10;
        NUM_MAX       = 9999;

        word_format   = [repmat('%d_', 1, NUM_PER_WORD-1), '%d'];
        list_of_words = cell(n_words,1);

        fprintf('Generating %d-words test case...\n', n_words);
        for k = 1:n_words
                list_of_words{k} = sprintf(word_format, randi(NUM_MAX, [1, NUM_PER_WORD]));
        end;

end

计时功能将测试用例和任意数量的处理功能句柄作为参数；它被实现为例如一个功能中的错误不应该影响其他功能的执行。它timeit按照@Divakar和@LuisMendo的建议使用；对于那些在默认的MATLAB安装中没有此功能的用户，可以从MATLAB Central / File Exchange下载：

%'referee_timing.m'
function referee_timing(test_case, varargin)
        %' Specify the test case as a cell array of strings, then the function '
        %' handles that will process the test case.                            '
        %'                                                                     '
        %' The function uses timeit() to evaluate the performance of different '
        %' processing handles; if your MATLAB installation does not have it    '
        %' natively, download the available version from File Exchange:        '
        %'                                                                     '
        %'     http://www.mathworks.com/matlabcentral/fileexchange/18798-timeit-benchmarking-function '

        fprintf('Timing %d-words test case...\n', numel(test_case));
        for k = 1:numel(varargin)
                try
                        t = timeit(@() varargin{k}(test_case));
                        fprintf('  %s: %f[s]\n', func2str(varargin{k}), t);
                catch ME
                        fprintf('  %s: Error - %s\n', func2str(varargin{k}), ME.message);
                end;
        end;
end

问题：

如前所述，从一次安装到另一次安装，甚至从一次运行到另一次，结果似乎都不同。我提出的问题有三个方面：

在特定的MATLAB安装上（版本+ OS +硬件），这两种解决方案的性能相较之下如何？
是否有可能以相当大的方式改进一个解决方案？
是否有基于完全不同的思想/功能的甚至更快的MATLAB本机解决方案（例如，没有MEX或专用工具箱）？

对于点1（基准），请在MATLAB路径创建四个功能文件eval_and_loops_solution.m，single_sscanf_solution.m，referee_test_case.m，referee_timig.m等功能，你可能想测试（例如，通过回答提出的实现）。将它们用于几个单词，例如以下脚本：

%'test.m'
clc;
feature accel on;  %'tune for speed'

%'low memory stress'
referee_timing( ...
   referee_test_case(10^4), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'medium memory stress'
referee_timing( ...
   referee_test_case(10^5), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'high memory stress'
referee_timing( ...
   referee_test_case(10^6), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

发布结果时，请指定您的MATLAB版本，操作系统，RAM的大小和CPU模型。请注意，这些测试可能需要花费一些时间来处理大量单词！但是，运行此脚本不会更改当前工作空间的内容。

对于分2或3，你可以发布使用同一个输入/输出约定代码eval_and_loops_solution.m，并single_sscanf_solution.m与标杆支持的声明。

赏金：

我会投票给所有基准测试答案，并鼓励所有人都这样做。对于那些以自己的技能，时间和处理能力做出贡献的人来说，这是最起码的事情。

+500赏金将授予最快的代码编写者，无论是建议的两个代码之一，还是性能优于它们的第三个新代码。我真的希望这将与一般协议相符。奖金将以月从原始发布日期起一周。

迪卡卡

方法1

从广义上讲，第一种方法包括两个部分。第一个为我们提供了一个很大的长字符串，该字符串具有单元格数组中的所有字符，并用下划线将两个单元格分隔开，这实际上是通过实现的cumsum。第二个基于的代码为bsxfun我们提供了所需格式的所需数字输出。

代码看起来像这样-

function out = solution_cumsum_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
tlens = sum(lens); %// Total number of characters [total of lengths]
cumlens = cumsum(lens); %// Cumulative lengths

%// Create an array of ones and zeros. 
%// The length of this array would be equal to sum of characters in all cells. 
%// The ones would be at the positions where new cells start, zeros otherwise
startpos_newcell(1,tlens)=0;
startpos_newcell(cumlens(1:end-1)+1)=1;

%// Calculate new positions for all characters in the cell array with 
%// places/positions left between two cells for putting underscores
newpos = (1:tlens) + cumsum(startpos_newcell);

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str(newpos) = [list_of_words{:}];
one_str(cumlens + [1:numel(cumlens)]') = '_'; %//'#

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

方法＃2

方法＃1略有变化，因为它使用进行了第一部分的工作sprintf。现在，在@chappjc的解决方案中看到了这个想法后，我想到了这个想法。我感谢他让我在此修改版本中使用它。

这是代码-

function out = solution_sprintf_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str = sprintf('%s_',list_of_words{:});

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

方法＃3

这是一种基于以下方面的全新方法getByteStreamFromArray：

function out = solution_bytstrm_bsxfun(list_of_words)

allchars = [list_of_words{:}];
digits_array = getByteStreamFromArray(allchars);

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
digits_array = digits_array(65:65+numel(allchars)-1) - 48;
% --------------------------------------------------------------------
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
pos_us = find(digits_array==47);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array(maxg,numel(lens)*N)=0;
num_array(mask1) = digits_array(digits_array~=47);
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).';

return;

接下来列出上述方法的GPU版本。

方法＃4

function out = soln_sprintf_bsxfun_gpu(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells. Now this one uses sprintf as
%// firstly proposed in @chappjc's solution and he has agreed to let me use
%// it, so appreciating his help on this.
digits_array = gpuArray(single(sprintf('%s_',list_of_words{:})));
digits_array = digits_array - 48;

mask_us = digits_array==47; %// mask of underscores
pos_us = find(mask_us);

wordend_idx = pos_us - gpuArray.colon(1,numel(pos_us)); %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1 =  single(zeros(max_wordlens,numel(pos_us),'gpuArray')); %//'
num1(bsxfun(@ge,gpuArray.colon(1,max_wordlens)',...
    max_wordlens+1-single(wordlens))) = digits_array(~mask_us); %//'

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
outg = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
out = gather(outg);

返回;

方法5

function out = soln_bytstrm_bsxfun_gpu(list_of_words)

us_ascii_num = 95;
allchars = [list_of_words{:}];

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
alldigits = getByteStreamFromArray(allchars);
digits_array1 = gpuArray(alldigits(65:65+numel(allchars)-1));
% --------------------------------------------------------------------
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
N = sum(digits_array1(1:lens(1))==us_ascii_num)+1; %// Number of "words" per cell
lens = gpuArray(lens);
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
mask_us = digits_array1==us_ascii_num; %// mask of underscores
pos_us = find(mask_us);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array = zeros(maxg,numel(lens)*N,'gpuArray'); %//'
num_array(mask1) = digits_array1(~mask_us)-48;
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).'; %//'
out = gather(out);

return;

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。

编辑于2021-02-14

我来说两句

0条评论

登录后参与评论

上一篇：Apache精简了“ Authorization”标头

来自分类Dev

Related 相关文章

文章

MATLAB性能基准测试

MATLAB性能基准测试

设置：

代码：

问题：

赏金：

您将如何基准测试功能的性能？

Hyperledger Fabric是否有性能基准测试工具？

基准测试RAM性能-UWP和C＃

如何对Python和Clojure的性能进行基准测试（比较）？

在Oracle JDK 8中进行基准测试以进行性能测试

尝试基准lambda性能

JavaScript引擎性能基准

基准XSLT性能

基准XSLT性能

插入性能基准

对Qt5和C ++ 17容器进行性能和内存使用的基准测试

如何对两台Windows PC的性能进行基准测试

Hive（在Tez上）和Spark之间针对我的特定用例进行的性能基准测试

我们如何在硬件平台上对 Qt Wayland 的性能进行基准测试？

基准测试结果奇怪

访问基准测试结果

Maven工件基准测试

Memcached的UDP基准测试

Redis集群基准测试

MySQL基准测试

Ubuntu资源基准测试

Netlogo 基准测试

imageresizer.net性能基准

Ruby on Rails的方法基准性能

基准测试结果ForkJoin与Disruptor？

Ruby中的基准测试方法

Docker和OpenStack基准测试

FreeBSD中的MySQLSlap基准测试

Wrk与Gatling基准测试比较