Sorting a statically allocated array with Thrust

Posted by J0hn D0e in Dev

J0hn D0e

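In my code I have a statically allocated array in global memory (i.e., declared with __device__), and I want to sort it with thrust::sort, but this does not work. All the examples on this topic use arrays allocated by the CUDA runtime (with cudaMalloc). Is there any way to sort a statically allocated array?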

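I suspect this is because statically allocated memory cannot be accessed from the host. With a cudaMalloc-allocated array it works fine. However, I would like to avoid that kind of allocation, because static allocation makes the data easier to access from device code (doesn't it?).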
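For comparison, here is a minimal sketch of the cudaMalloc route the question says already works. It assumes plain int keys instead of the element struct, and the names fill_descending and sort_with_cudamalloc are hypothetical. Device code still reaches the data directly, only through a kernel parameter instead of a global __device__ symbol.

```cuda
#include <cstdio>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

#define N 4

// Hypothetical kernel: receives the buffer as an ordinary pointer parameter,
// so device code can still index it directly.
__global__ void fill_descending(int* keys, int n) {
  for (int i = 0; i < n; ++i) {
    keys[n - i - 1] = i;  // keys become 3 2 1 0
  }
}

// Hypothetical helper: allocates with cudaMalloc, fills on the device,
// sorts with Thrust and copies the result into host_out.
// Returns false if a CUDA runtime call fails.
bool sort_with_cudamalloc(int* host_out, int n) {
  int* keys = 0;
  if (cudaMalloc(&keys, n * sizeof(int)) != cudaSuccess) return false;

  fill_descending<<<1, 1>>>(keys, n);

  thrust::device_ptr<int> first(keys);  // wrap the raw device pointer for Thrust
  thrust::sort(first, first + n);       // ascending, using int's operator<

  cudaError_t err = cudaMemcpy(host_out, keys, n * sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(keys);
  return err == cudaSuccess;
}

int main(void) {
  int out[N];
  if (!sort_with_cudamalloc(out, N)) {
    printf("CUDA error\n");
    return 1;
  }
  for (int i = 0; i < N; ++i) printf("%d ", out[i]);
  printf("\n");  // prints: 0 1 2 3
  return 0;
}
```

Here the host owns a real device address from the start, which is why thrust::device_ptr can wrap it without any symbol lookup.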

Minimal (non-)working example:

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

#define N 4

typedef struct element {
  int key;
  int value;
  __host__ __device__ bool operator<(element e) const
  { return key > e.key; }
} element;

__device__ element array[N];

__global__ void init() {
  for (int i = 0; i < N; ++i) {
    array[N - i - 1].key = i;
  }
}

__global__ void print_array() {
  for (int i = 0; i < N; ++i) {
    printf("%d ", array[i].key);
  }
  printf("\n");
}

int main(void) {
  // Does not work: in host code, `array` is not a usable device address
  thrust::device_ptr<element> array_first(array);

  init<<<1,1>>>();

  printf("unsorted: ");
  print_array<<<1, 1>>>();
  cudaDeviceSynchronize();

  thrust::sort(array_first, array_first + N);

  printf("sorted: ");
  print_array<<<1, 1>>>();
  cudaDeviceSynchronize();
}

Jared Hoberock

Use cudaGetSymbolAddress to obtain the device address of the array variable from a __host__ function:

void* array_ptr = 0;
cudaGetSymbolAddress(&array_ptr, array);
thrust::device_ptr<element> array_first(reinterpret_cast<element*>(array_ptr));

Here is the complete program:

#include <stdio.h>
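#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <thrust/system_error.h>       // thrust::system_error
#include <thrust/system/cuda/error.h>  // thrust::cuda_category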

#define N 4

typedef struct element {
  int key;
  int value;
  __host__ __device__ bool operator<(element e) const
  { return key > e.key; }
} element;

__device__ element array[N];

__global__ void init() {
  for (int i = 0; i < N; ++i) {
    array[N - i - 1].key = i;
  }
}

__global__ void print_array() {
  for (int i = 0; i < N; ++i) {
    printf("%d ", array[i].key);
  }
  printf("\n");
}

int main(void) {
  cudaError_t error;

  void* array_ptr = 0;
  if(error = cudaGetSymbolAddress(&array_ptr, array))
  {
    throw thrust::system_error(error, thrust::cuda_category());
  }

  thrust::device_ptr<element> array_first(reinterpret_cast<element*>(array_ptr));

  init<<<1,1>>>();

  printf("unsorted: ");
  print_array<<<1, 1>>>();

  if(error = cudaDeviceSynchronize())
  {
    throw thrust::system_error(error, thrust::cuda_category());
  }

  thrust::sort(array_first, array_first + N);

  if(error = cudaDeviceSynchronize())
  {
    throw thrust::system_error(error, thrust::cuda_category());
  }

  printf("sorted: ");
  print_array<<<1, 1>>>();

  if(error = cudaDeviceSynchronize())
  {
    throw thrust::system_error(error, thrust::cuda_category());
  }

  return 0;
}

Here is the output on my system:

$ nvcc test.cu -run
unsorted: 3 2 1 0 
sorted: 3 2 1 0

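The sorted output is identical to the unsorted output, but I assume this is intended given how the data is generated and how element::operator< is defined: operator< returns key > e.key, so thrust::sort orders the elements by descending key, and init already stores the keys in descending order (3 2 1 0).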
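To check that reading, one can sort the same __device__ array once with thrust::less<element> (which calls element::operator<) and once with an ascending comparator. This is a hedged sketch: by_key_ascending and sort_and_read_keys are hypothetical names, and it assumes the symbol-taking template overloads of cudaGetSymbolAddress and cudaMemcpyFromSymbol, as used in the answer.

```cuda
#include <cstdio>
#include <thrust/device_ptr.h>
#include <thrust/functional.h>
#include <thrust/sort.h>

#define N 4

// Same element type as in the question, including its inverted operator<.
struct element {
  int key;
  int value;
  __host__ __device__ bool operator<(element e) const { return key > e.key; }
};

// Hypothetical comparator (not part of the original code): ascending by key.
struct by_key_ascending {
  __host__ __device__ bool operator()(const element& a, const element& b) const {
    return a.key < b.key;
  }
};

__device__ element array[N];

__global__ void init() {
  for (int i = 0; i < N; ++i) {
    array[N - i - 1].key = i;  // keys become 3 2 1 0
    array[N - i - 1].value = 0;
  }
}

// Hypothetical helper: re-initializes the __device__ array, sorts it through
// the address returned by cudaGetSymbolAddress, and copies the keys back.
template <typename Compare>
bool sort_and_read_keys(Compare comp, int* keys_out) {
  void* array_ptr = 0;
  if (cudaGetSymbolAddress(&array_ptr, array) != cudaSuccess) return false;
  thrust::device_ptr<element> first(static_cast<element*>(array_ptr));

  init<<<1, 1>>>();
  thrust::sort(first, first + N, comp);

  element host_copy[N];
  if (cudaMemcpyFromSymbol(host_copy, array, sizeof(host_copy)) != cudaSuccess)
    return false;
  for (int i = 0; i < N; ++i) keys_out[i] = host_copy[i].key;
  return true;
}

int main(void) {
  int keys[N];

  // thrust::less<element> calls element::operator<, i.e. descending by key.
  if (!sort_and_read_keys(thrust::less<element>(), keys)) return 1;
  printf("operator<: %d %d %d %d\n", keys[0], keys[1], keys[2], keys[3]);  // 3 2 1 0

  if (!sort_and_read_keys(by_key_ascending(), keys)) return 1;
  printf("ascending: %d %d %d %d\n", keys[0], keys[1], keys[2], keys[3]);  // 0 1 2 3
  return 0;
}
```

With the original operator< the keys come back unchanged, consistent with the output shown above; with the ascending comparator they come back reversed.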

Edited on 2021-02-25