MATLAB性能基准测试

用户名

设置:

这篇文章是关于测试以下问题的解决方案的性能的:

S给出了以下划线分隔为数字格式字符串的单元格数组,例如:

        list_of_words = repmat({'02_04_04_52_23_14_54_672_0'},10,1);

保证所有符合以下条件的字符串:

  1. 它们仅包含十进制数字和下划线;
  2. 下划线的数量相同;
  3. 没有两个连续的下划线。

编写MATLAB代码,将字符串的单元格数组转换为双精度的数字矩阵S× (值小1000倍),其中是字符串数,并且字符串中的数字数。MSM

一个非常类似的问题发布在StackOverflow上,并提出了几种解决方案。最初的目标是提高速度性能。

代码:

从速度的角度来看,其中两种解决方案在从一种MATLAB安装到另一种MATLAB安装,甚至从一种运行到另一种运行之间,似乎都有很大的不同。而且,它们的实现方式确实不同。

在黑暗的角落,您会发现一个违反MATLAB最神圣宗旨的解决方案eval是邪恶的,应不惜一切代价避免循环。但是,该代码尝试通过避免重复分配内存,使用将字符串转换为数字的快速方法以及执行就地操作来优化速度:

%'eval_and_loops_solution.m'
function array_of_numbers = eval_and_loops_solution(list_of_words)

        n_numbers = 1 + sum(list_of_words{1}=='_');
        n_words   = numel(list_of_words);

        array_of_numbers = zeros(n_numbers, n_words);

        for k = 1:n_words
                temp = ['[', list_of_words{k}, ']'];
                temp(temp=='_') = ';';
                array_of_numbers(:,k) = eval(temp);
        end;

         %'this is a waste of memory, but is kind of required'
        array_of_numbers = transpose(array_of_numbers / 1000);

end

注意:原始解决方案将结果返回为M×S double数组。该代码适用于S× M; 但是,这种修改占用了两倍的内存。是的,我写了这个解决方案。

在明确的角落,您将找到一个符合MATLAB精神的解决方案,避免使用循环和eval,而只需一次sscanf读取所有字符串(这是避免调用函数S时间的开销的聪明方法):

%'single_sscanf_solution.m'
function array_of_numbers = single_sscanf_solution(list_of_words)

        N = 1 + sum(list_of_words{1}=='_'); %'hard-coded in the original'
        lens = cellfun(@numel,list_of_words);
        tlens = sum(lens);
        idx(1,tlens)=0;
        idx(cumsum(lens(1:end-1))+1)=1;
        idx2 = (1:tlens) + cumsum(idx);

        one_str(1:max(idx2)+1)='_';
        one_str(idx2) = [list_of_words{:}];
        delim = repmat('%d_',1,N*numel(lens));
        array_of_numbers = reshape(sscanf(one_str, delim),N,[])'./1000;

end

注意:此解决方案属于@Divakar

裁判由两个功能组成:一个功能生成测试用例,另一个功能执行timig。

测试用例生成器将单元格数组中的下划线分隔的字符串从1到9999之间的10个随机整数中分组(为简单起见);但是,实现应仅假定数字为正数或零,并且该数字应位于double

%'referee_test_case.m'
function list_of_words = referee_test_case(n_words)

        NUM_PER_WORD  = 10;
        NUM_MAX       = 9999;

        word_format   = [repmat('%d_', 1, NUM_PER_WORD-1), '%d'];
        list_of_words = cell(n_words,1);

        fprintf('Generating %d-words test case...\n', n_words);
        for k = 1:n_words
                list_of_words{k} = sprintf(word_format, randi(NUM_MAX, [1, NUM_PER_WORD]));
        end;

end

计时功能将测试用例和任意数量的处理功能句柄作为参数;它被实现为例如一个功能中的错误不应该影响其他功能的执行。timeit按照@Divakar和@LuisMendo的建议使用对于那些在默认的MATLAB安装中没有此功能的用户,可以从MATLAB Central / File Exchange下载:

%'referee_timing.m'
function referee_timing(test_case, varargin)
        %' Specify the test case as a cell array of strings, then the function '
        %' handles that will process the test case.                            '
        %'                                                                     '
        %' The function uses timeit() to evaluate the performance of different '
        %' processing handles; if your MATLAB installation does not have it    '
        %' natively, download the available version from File Exchange:        '
        %'                                                                     '
        %'     http://www.mathworks.com/matlabcentral/fileexchange/18798-timeit-benchmarking-function '

        fprintf('Timing %d-words test case...\n', numel(test_case));
        for k = 1:numel(varargin)
                try
                        t = timeit(@() varargin{k}(test_case));
                        fprintf('  %s: %f[s]\n', func2str(varargin{k}), t);
                catch ME
                        fprintf('  %s: Error - %s\n', func2str(varargin{k}), ME.message);
                end;
        end;
end

问题:

如前所述,从一次安装到另一次安装,甚至从一次运行到另一次,结果似乎都不同。我提出的问题有三个方面:

  1. 在特定的MATLAB安装上(版本+ OS +硬件),这两种解决方案的性能相较之下如何?
  2. 是否有可能以相当大的方式改进一个解决方案?
  3. 是否有基于完全不同的思想/功能的甚至更快的MATLAB本机解决方案(例如,没有MEX或专用工具箱)?

对于点1(基准),请在MATLAB路径创建四个功能文件eval_and_loops_solution.msingle_sscanf_solution.mreferee_test_case.mreferee_timig.m等功能,你可能想测试(例如,通过回答提出的实现)。将它们用于几个单词,例如以下脚本:

%'test.m'
clc;
feature accel on;  %'tune for speed'

%'low memory stress'
referee_timing( ...
   referee_test_case(10^4), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'medium memory stress'
referee_timing( ...
   referee_test_case(10^5), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'high memory stress'
referee_timing( ...
   referee_test_case(10^6), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

发布结果时,请指定您的MATLAB版本,操作系统,RAM的大小和CPU模型。请注意,这些测试可能需要花费一些时间来处理大量单词!但是,运行此脚本不会更改当前工作空间的内容。

对于分2或3,你可以发布使用同一个输入/输出约定代码eval_and_loops_solution.m,并single_sscanf_solution.m与标杆支持的声明。

赏金:

我会投票给所有基准测试答案,并鼓励所有人都这样做。对于那些以自己的技能,时间和处理能力做出贡献的人来说,这是最起码的事情。

+500赏金将授予最快的代码编写者,无论是建议的两个代码之一,还是性能优于它们的第三个新代码。我真的希望这将与一般协议相符。奖金将以 从原始发布日期起一周。

迪卡卡

方法1

从广义上讲,第一种方法包括两个部分。第一个为我们提供了一个很大的长字符串,该字符串具有单元格数组中的所有字符,并用下划线将两个单元格分隔开,这实际上是通过实现的cumsum第二个基于的代码为bsxfun我们提供了所需格式的所需数字输出。

代码看起来像这样-

function out = solution_cumsum_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
tlens = sum(lens); %// Total number of characters [total of lengths]
cumlens = cumsum(lens); %// Cumulative lengths

%// Create an array of ones and zeros. 
%// The length of this array would be equal to sum of characters in all cells. 
%// The ones would be at the positions where new cells start, zeros otherwise
startpos_newcell(1,tlens)=0;
startpos_newcell(cumlens(1:end-1)+1)=1;

%// Calculate new positions for all characters in the cell array with 
%// places/positions left between two cells for putting underscores
newpos = (1:tlens) + cumsum(startpos_newcell);

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str(newpos) = [list_of_words{:}];
one_str(cumlens + [1:numel(cumlens)]') = '_'; %//'#

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

方法#2

方法#1略有变化,因为它使用进行了第一部分的工作sprintf现在,在@chappjc的解决方案中看到了这个想法后,我想到了这个想法我感谢他让我在此修改版本中使用它。

这是代码-

function out = solution_sprintf_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str = sprintf('%s_',list_of_words{:});

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

方法#3

这是一种基于以下方面的全新方法getByteStreamFromArray

function out = solution_bytstrm_bsxfun(list_of_words)

allchars = [list_of_words{:}];
digits_array = getByteStreamFromArray(allchars);

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
digits_array = digits_array(65:65+numel(allchars)-1) - 48;
% --------------------------------------------------------------------
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
pos_us = find(digits_array==47);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array(maxg,numel(lens)*N)=0;
num_array(mask1) = digits_array(digits_array~=47);
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).';

return;

接下来列出上述方法的GPU版本。

方法#4

function out = soln_sprintf_bsxfun_gpu(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells. Now this one uses sprintf as
%// firstly proposed in @chappjc's solution and he has agreed to let me use
%// it, so appreciating his help on this.
digits_array = gpuArray(single(sprintf('%s_',list_of_words{:})));
digits_array = digits_array - 48;

mask_us = digits_array==47; %// mask of underscores
pos_us = find(mask_us);

wordend_idx = pos_us - gpuArray.colon(1,numel(pos_us)); %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1 =  single(zeros(max_wordlens,numel(pos_us),'gpuArray')); %//'
num1(bsxfun(@ge,gpuArray.colon(1,max_wordlens)',...
    max_wordlens+1-single(wordlens))) = digits_array(~mask_us); %//'

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
outg = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
out = gather(outg);

返回;

方法5

function out = soln_bytstrm_bsxfun_gpu(list_of_words)

us_ascii_num = 95;
allchars = [list_of_words{:}];

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
alldigits = getByteStreamFromArray(allchars);
digits_array1 = gpuArray(alldigits(65:65+numel(allchars)-1));
% --------------------------------------------------------------------
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
N = sum(digits_array1(1:lens(1))==us_ascii_num)+1; %// Number of "words" per cell
lens = gpuArray(lens);
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
mask_us = digits_array1==us_ascii_num; %// mask of underscores
pos_us = find(mask_us);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array = zeros(maxg,numel(lens)*N,'gpuArray'); %//'
num_array(mask1) = digits_array1(~mask_us)-48;
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).'; %//'
out = gather(out);

return;

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章

来自分类Dev

您将如何基准测试功能的性能?

来自分类Dev

Hyperledger Fabric是否有性能基准测试工具?

来自分类Dev

基准测试RAM性能-UWP和C#

来自分类Dev

如何对Python和Clojure的性能进行基准测试(比较)?

来自分类Dev

在Oracle JDK 8中进行基准测试以进行性能测试

来自分类Dev

尝试基准lambda性能

来自分类Dev

JavaScript引擎性能基准

来自分类Dev

基准XSLT性能

来自分类Dev

基准XSLT性能

来自分类Dev

插入性能基准

来自分类Dev

对Qt5和C ++ 17容器进行性能和内存使用的基准测试

来自分类Dev

如何对两台Windows PC的性能进行基准测试

来自分类Dev

Hive(在Tez上)和Spark之间针对我的特定用例进行的性能基准测试

来自分类Dev

我们如何在硬件平台上对 Qt Wayland 的性能进行基准测试?

来自分类Dev

基准测试结果奇怪

来自分类Dev

访问基准测试结果

来自分类Dev

Maven工件基准测试

来自分类Dev

Memcached的UDP基准测试

来自分类Dev

Redis集群基准测试

来自分类Dev

MySQL基准测试

来自分类Dev

Ubuntu资源基准测试

来自分类Dev

Netlogo 基准测试

来自分类Dev

imageresizer.net性能基准

来自分类Dev

Ruby on Rails的方法基准性能

来自分类Dev

基准测试结果ForkJoin与Disruptor?

来自分类Dev

Ruby中的基准测试方法

来自分类Dev

Docker和OpenStack基准测试

来自分类Dev

FreeBSD中的MySQLSlap基准测试

来自分类Dev

Wrk与Gatling基准测试比较