nvcc compiler not optimizing

Michael

Why is the compiler not doing some trivial optimizations that can be done in the kernel? I have the following code for matrix multiplication:

__global__ void matrixMultiply(float * A, float * B, float * C,
        int numARows, int numAColumns,
        int numBRows, int numBColumns,
        int numCRows, int numCColumns) {

    int n=numAColumns;
    int Row=blockIdx.x*blockDim.x+threadIdx.x;
    int Col=blockIdx.y*blockDim.y+threadIdx.y;
    if((Row<numCRows) && (Col<numCColumns)){
        for(int k=0;k<n;++k){
            C[Row*numCColumns+Col]+=
                A[Row*numAColumns+k]*B[k*numBColumns+Col];
        }
    }
}

The kernel runs much faster if I accumulate the sum in a temporary variable Cvalue, which the compiler can keep in a register:

__global__ void matrixMultiply(float * A, float * B, float * C,
        int numARows, int numAColumns,
        int numBRows, int numBColumns,
        int numCRows, int numCColumns) {

    int n=numAColumns;
    int Row=blockIdx.x*blockDim.x+threadIdx.x;
    int Col=blockIdx.y*blockDim.y+threadIdx.y;
    if((Row<numCRows) && (Col<numCColumns)){
        float Cvalue=0;
        for(int k=0;k<n;++k){
            Cvalue+=A[Row*numAColumns+k]*B[k*numBColumns+Col];
        }
        C[Row*numCColumns+Col]=Cvalue;
    }   
}

In the second version, global memory for C is written only once, whereas in the first version it is read and written on every iteration of the loop. Isn't this kind of optimization normally done by compilers? The two versions differ by about 30% in performance in my tests. I'm compiling with nvcc -O3 ...

Michael

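Since C is not declared as __restrict__, the compiler cannot know whether C points to the same memory as A or B, so it cannot perform the optimization mentioned: every store to C might change a value that a later iteration loads from A or B, so the store has to happen inside the loop. When I instead used float* __restrict__ C, the times for both versions became practically the same. Thanks, Chris Dodd.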
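
To see why the compiler has to be this careful, here is a small host-side C++ sketch (not the kernel itself, and not taken from the original post). The helper names dot_in_place, dot_local, run_separate, run_aliased and check_demo are made up for illustration. It assumes the register version starts from the value already stored in the output, which is what the compiler would have to do to preserve the meaning of +=. When the output overlaps an input, the two loop forms give different results, so the compiler may not turn one into the other unless it is told there is no overlap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: accumulates straight into *out on every iteration,
// like the first kernel does with C[Row*numCColumns+Col].
void dot_in_place(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        *out += a[k] * b[k];
    }
}

// Hypothetical helper: the "register" form. One load of *out, a local
// accumulation, and a single store at the end.
void dot_local(const float* a, const float* b, float* out, std::size_t n) {
    float sum = *out;
    for (std::size_t k = 0; k < n; ++k) {
        sum += a[k] * b[k];
    }
    *out = sum;
}

// Output is a separate variable: no aliasing, both forms must agree.
float run_separate(bool in_place) {
    float a[3] = {1.0f, 2.0f, 3.0f};
    float b[3] = {1.0f, 1.0f, 1.0f};
    float out = 0.0f;
    if (in_place) dot_in_place(a, b, &out, 3);
    else          dot_local(a, b, &out, 3);
    return out;
}

// Output points at a[1]: every store to *out changes an input that a
// later iteration reads, so the two forms diverge.
float run_aliased(bool in_place) {
    float a[3] = {1.0f, 2.0f, 3.0f};
    float b[3] = {1.0f, 1.0f, 1.0f};
    float* out = &a[1];
    if (in_place) dot_in_place(a, b, out, 3);
    else          dot_local(a, b, out, 3);
    return *out;
}

// Runs all four combinations; the values were worked out by hand.
void check_demo() {
    assert(run_separate(true)  == 6.0f);  // 0 + 1 + 2 + 3
    assert(run_separate(false) == 6.0f);  // same without aliasing
    assert(run_aliased(true)   == 9.0f);  // 2 -> 3 -> 3+3=6 -> 9
    assert(run_aliased(false)  == 8.0f);  // 2 + 1 + 2 + 3
}
```

Calling check_demo() from main exercises all four cases. Marking the output pointer __restrict__ is a promise from the programmer that the aliased case never happens, which is what frees the compiler to use the single-store form.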
