Why is the first cudaMalloc the only bottleneck?

user3910910

I defined this function:

void cuda_entering_function(...)
{
    StructA *host_input, *dev_input;
    StructB *host_output, *dev_output;

    host_input = (StructA*)malloc(sizeof(StructA));
    host_output = (StructB*)malloc(sizeof(StructB));
    cudaMalloc(&dev_input, sizeof(StructA));
    cudaMalloc(&dev_output, sizeof(StructB));

    ... some more other cudaMalloc()s and cudaMemcpy()s ...

    cudaKernel<< ... >>(dev_input, dev_output);

    ...
}

This function is called several times (about 5 to 15 times) throughout my program, and I measured the program's performance using gettimeofday().

I then found that the bottleneck of cuda_entering_function() is the first cudaMalloc(), that is, the very first cudaMalloc() in my whole program. Over 95% of the total execution time of cuda_entering_function() was spent in that first cudaMalloc(). This still happens when I change the size of the first allocation or reorder the cudaMalloc() calls.

What is the reason, and is there any way to reduce the time taken by the first CUDA allocation?

Etienne Pellegrini

The first cudaMalloc is also responsible for initializing the device, because it is the first call to any function that involves the device. That is why you take such a hit: it is a one-time overhead of using CUDA and your GPU, not a cost of the allocation itself, which is consistent with the time not changing when you vary the size or the order of the allocations. You should make sure that your application gains enough speedup to compensate for this overhead.
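One way to see this for yourself is to time two identical allocations back to back. The sketch below is only an illustration: the helper name time_malloc_ms and the 1 MiB size are assumptions of mine, not part of the original program. If the explanation above holds, the first call should report a much larger time than the second.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the wall-clock time (ms) of a single
// cudaMalloc call, or -1.0 if the allocation fails.
static double time_malloc_ms(size_t bytes)
{
    void *ptr = nullptr;
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaMalloc(&ptr, bytes);
    auto stop = std::chrono::steady_clock::now();

    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    cudaFree(ptr);  // released outside the timed region
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main()
{
    const size_t bytes = 1 << 20;  // 1 MiB, an arbitrary size for illustration

    // The first runtime call also pays for device/context initialization.
    double first_ms = time_malloc_ms(bytes);
    // The second call only pays for the allocation itself.
    double second_ms = time_malloc_ms(bytes);

    std::printf("first  cudaMalloc: %.3f ms\n", first_ms);
    std::printf("second cudaMalloc: %.3f ms\n", second_ms);
    return 0;
}
```

The exact numbers depend on the GPU, driver, and system, so no particular output is shown here.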

In general, people call an initialization function up front in order to set up their device. In this answer, you can see that a call to cudaFree(0) is apparently the canonical way to do so. This sample shows the use of cudaSetDevice, which could be a good habit if you ever work on machines with several CUDA-capable devices.
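A minimal sketch of such an initialization step, assuming device 0 is the one you want and using a helper name (init_cuda_device) that I made up for illustration, could look like this. Note that it does not make the initialization cheaper; it only moves the cost to program start-up, outside the code you are timing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: selects a device and forces the CUDA runtime to
// initialize it, so later calls such as cudaMalloc do not pay that cost.
static void init_cuda_device(int device_id)
{
    cudaError_t err = cudaSetDevice(device_id);
    if (err == cudaSuccess) {
        // Freeing a null pointer is a no-op for memory, but it is still a
        // runtime call and therefore triggers initialization.
        err = cudaFree(nullptr);
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main()
{
    init_cuda_device(0);  // assumption: device 0 is the target GPU

    // From here on, calls to a function like cuda_entering_function()
    // should no longer include the one-time initialization overhead.
    return 0;
}
```

Calling the helper once at the top of main() keeps the measurements of cuda_entering_function() focused on the allocations, copies, and kernel launch themselves.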
