I want to `printf()` something in my CUDA kernel. [The Programming Guide suggests][1] I do this like so:

    #include <stdio.h>

    __global__ void helloCUDA(float f)
    {
        printf("Hello thread %d, f=%f\n", threadIdx.x, f);
    }

But this simply includes the standard C library's `stdio.h`. Why would that be necessary? CUDA's `printf()` doesn't behave the same way as stdio's `printf()`, and I certainly don't need most of what else is in there.
It's an implementation detail you don't need to know about, which stems from limitations in the CUDA syntax (basically, it is illegal to define different `__device__` and `__host__` versions of the same function).
The standard library prototype is used as a proxy in device code during compilation, and when compiling for a supported architecture, some sneaky template overloading is used to insert the device implementation into the device code.
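
To make this concrete, here is a minimal sketch of a complete program in which the single `stdio.h` include serves both the host call and the kernel call. The launch configuration (`<<<1, 5>>>`), the argument value, and the error handling in `main` are my own illustrative choices, not something the question specified; it assumes a GPU and toolkit that support device-side `printf()`.

```cuda
#include <stdio.h>          // one prototype, used by both host and device code
#include <cuda_runtime.h>

__global__ void helloCUDA(float f)
{
    // In device code this call is resolved to the device-side printf.
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main(void)
{
    // In host code the same name is the ordinary C library printf.
    printf("Hello from the host\n");

    // Illustrative launch: one block of five threads.
    helloCUDA<<<1, 5>>>(1.2345f);

    // Device printf output is buffered; synchronizing with the device
    // gives the runtime a point at which to flush it to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Built with something like `nvcc hello.cu -o hello` (the file name is hypothetical), the same `printf` identifier compiles in both contexts without any CUDA-specific header for it, which is the point of using the standard prototype as a proxy.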