0. Introduction to CUDA
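CUDA (Compute Unified Device Architecture) is a parallel computing platform introduced by NVIDIA, and is the company's official name for its GPGPU technology. With it, users can run computations other than graphics processing on NVIDIA GPUs; it was also the first development environment to offer a C compiler for the GPU. The CUDA Toolkit compiles only the CUDA C code that runs on the GPU, turning it into PTX intermediate code or into machine code for a specific NVIDIA GPU architecture (NVIDIA calls this "device code"). The C/C++ code that runs on the CPU (NVIDIA calls this "host code") still relies on an external compiler: Microsoft Visual Studio on Microsoft Windows, and mainly GCC on Linux.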
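Compared with traditional general-purpose computation on GPUs (GPGPU) through graphics APIs, CUDA has the following advantages:
- Scattered reads: code can read from arbitrary addresses in memory.
- Unified Memory (since CUDA 6.0): places CPU and GPU memory in a single managed virtual address space.
- Shared memory: a fast on-chip region that can be shared among the threads of a block, with higher effective bandwidth than texture memory lookups.
- Faster downloads to and readbacks from the GPU.
- Full support for integer and bitwise operations, including integer texture lookups.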
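Two of the items above, Unified Memory and shared memory, can be illustrated together. The following is only a minimal sketch: the kernel name `reverse_tiles`, the helper `run_reverse_demo`, and the tile size `kBlockSize` are made up for this example, and it assumes a GPU and driver that support managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // hypothetical tile size chosen for this sketch

// Reverses each kBlockSize-element tile of `data` in place.
// Assumes it is launched with exactly kBlockSize threads per block and
// that the array length is a multiple of kBlockSize.
__global__ void reverse_tiles(int *data)
{
    __shared__ int tile[kBlockSize];   // on-chip, shared by all threads in the block
    int base = blockIdx.x * blockDim.x;
    int t = threadIdx.x;
    tile[t] = data[base + t];          // each thread loads one element
    __syncthreads();                   // wait until the whole tile is loaded
    data[base + t] = tile[blockDim.x - 1 - t];
}

// Returns the number of mismatched elements (0 means success),
// or -1 if a CUDA call failed.
int run_reverse_demo(int num_tiles)
{
    int n = num_tiles * kBlockSize;
    int *data = nullptr;
    // Managed allocation: the same pointer is valid on the CPU and the GPU.
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return -1;
    for (int i = 0; i < n; i++) data[i] = i;   // written by the CPU

    reverse_tiles<<<num_tiles, kBlockSize>>>(data);
    // The CPU must not touch managed data until the kernel has finished.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(data); return -1; }

    int mismatches = 0;
    for (int i = 0; i < n; i++) {
        int tile_start = (i / kBlockSize) * kBlockSize;
        int expected = tile_start + (kBlockSize - 1 - (i - tile_start));
        if (data[i] != expected) mismatches++;
    }
    cudaFree(data);
    return mismatches;
}
```

Called from a `main` function, for example as `run_reverse_demo(4)`, and compiled with `nvcc`, this should return 0 when every tile was reversed correctly. No explicit `cudaMemcpy` is needed here because the buffer is managed.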
1. Installation
- Add the NVIDIA GPG key on Ubuntu 20.04
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
- Add CUDA Toolkit Repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
#Add the repository
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
- Run APT Update
sudo apt-get update
- Install NVIDIA CUDA on Ubuntu 20.04
sudo apt install cuda
#Add the CUDA bin folder to your PATH
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
source ~/.bashrc
#Check the installed compiler version
nvcc --version
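`nvcc --version` only confirms that the compiler is on the PATH. As a further check that the driver and runtime can see a GPU, a small device query can help. This is a minimal sketch: the function name `report_cuda_devices` is made up for this example, and it uses only the runtime calls `cudaRuntimeGetVersion`, `cudaGetDeviceCount`, and `cudaGetDeviceProperties`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints the runtime version and every visible GPU.
// Returns the device count, or -1 if the CUDA runtime reports an error
// (for example when no driver is loaded).
int report_cuda_devices()
{
    int version = 0;
    cudaRuntimeGetVersion(&version);   // encoded as 1000*major + 10*minor
    printf("CUDA runtime version: %d.%d\n",
           version / 1000, (version % 1000) / 10);

    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        printf("Device %d: %s, compute capability %d.%d, %zu MiB global memory\n",
               dev, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem >> 20);
    }
    return count;
}
```

Calling `report_cuda_devices()` from `main` and compiling the file with `nvcc` should list at least one device on a working installation. A return value of -1 or 0 usually points at the driver rather than the toolkit.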
2. Run your first CUDA program: HelloWorld
- Create a source file for your first CUDA C program:
gedit helloworld.cu
- Paste the following code in the file:
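#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// SAXPY kernel: each thread computes one element of y = a*x + y.
__global__
void saxpy(int n, float a, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) y[i] = a*x[i] + y[i];   // guard threads past the end of the array
}

int main(void)
{
    int N = 1<<20;
    float *x, *y, *d_x, *d_y;
    // Allocate host buffers (x, y) and device buffers (d_x, d_y).
    x = (float*)malloc(N*sizeof(float));
    y = (float*)malloc(N*sizeof(float));
    cudaMalloc(&d_x, N*sizeof(float));
    cudaMalloc(&d_y, N*sizeof(float));
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }
    // Copy the inputs to the GPU.
    cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
    // Perform SAXPY on 1M elements: 256 threads per block,
    // with the block count rounded up so that every element is covered.
    saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
    // Copy the result back; this blocking copy also waits for the kernel.
    cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
    // Every element should equal 2.0*1.0 + 2.0 = 4.0.
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmaxf(maxError, fabsf(y[i]-4.0f));
    printf("Max error: %f\n", maxError);
    cudaFree(d_x);
    cudaFree(d_y);
    free(x);
    free(y);
    return 0;
}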
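The program above ignores the return values of the CUDA calls, so a failed allocation or kernel launch would go unnoticed. One possible way to add checks is sketched below. The helper `cuda_ok` and the wrapper `run_checked_saxpy` are made-up names for this example, not part of the CUDA API; only the `cuda*` calls themselves come from the runtime.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA call and returns false.
static bool cuda_ok(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Runs the same SAXPY as the program above with every call checked.
// Returns the maximum absolute error against 4.0f, or -1.0f on failure.
float run_checked_saxpy(int n)
{
    size_t bytes = n * sizeof(float);
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    float *d_x = nullptr, *d_y = nullptr;

    bool ok = cuda_ok(cudaMalloc(&d_x, bytes), "cudaMalloc d_x")
           && cuda_ok(cudaMalloc(&d_y, bytes), "cudaMalloc d_y")
           && cuda_ok(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice), "copy x")
           && cuda_ok(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice), "copy y");
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;   // ceiling division
        saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
        // A bad launch configuration is reported by cudaGetLastError;
        // the blocking copy waits for the kernel, so errors raised
        // during execution surface there.
        ok = cuda_ok(cudaGetLastError(), "saxpy launch")
          && cuda_ok(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost), "copy back");
    }
    cudaFree(d_x);   // cudaFree on a null pointer performs no operation
    cudaFree(d_y);
    if (!ok) return -1.0f;

    float max_error = 0.0f;
    for (float v : y) max_error = fmaxf(max_error, fabsf(v - 4.0f));
    return max_error;
}
```

Calling `run_checked_saxpy(1 << 20)` from `main` should return 0 on a working setup, since 2.0*1.0 + 2.0 is exactly representable as a float. Returning a sentinel instead of exiting is a design choice for this sketch; many codebases wrap the check in a macro that also records the file and line.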
- Compile your program
nvcc -o mycuda helloworld.cu
- Run your CUDA program
./mycuda
3. Further tutorials
- NVIDIA CUDA C++ Programming Guide:
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
- CUDA C++ Best Practices Guide:
- https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html
- A minimal introduction to CUDA programming (in Chinese):
- https://zhuanlan.zhihu.com/p/34587739
- CUDA learning resources recommended by a blogger (in Chinese):
- https://www.cnblogs.com/5long/p/cuda-learning.html
- …