Example code: cudaHello.zip

Note: CUDA is available by logging into the cluster host bambleweeny:

ssh bambleweeny

GPU - Graphics Processing Unit

Designed for fast rendering of 3D graphics, but also useful for general-purpose computation. (“GPGPU” stands for general-purpose computation on graphics processing units.)

Graphics rendering is highly parallelizable, so GPUs have many cores.

CUDA

Language and runtime environment designed and implemented by NVIDIA for GPGPU computation.

The language is a dialect of C.

Important properties:

Keywords: for example, __global__, which declares a kernel function (code that runs on the GPU but is invoked from host code), as used below.

Calling a kernel function:

__global__ void kernel( ...params... )
{
        ...code...
}

...

int main( void )
{
        kernel<<<num_blocks, threads_per_block>>>( ...args... );
}

num_blocks describes how many “blocks” (chunks of data) the kernel function will be invoked on. threads_per_block specifies how many threads are executed (in parallel) per block.
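As a concrete instance of this pattern (a minimal sketch, not necessarily the contents of cudaHello.zip; the kernel name and launch sizes are illustrative), the following launches a trivial kernel on 4 blocks with 8 threads per block. Device-side printf requires a reasonably modern GPU.

#include <stdio.h>

/* Trivial kernel: each of the 4 * 8 = 32 threads prints one line. */
__global__ void hello( void )
{
        printf( "hello from the GPU\n" );
}

int main( void )
{
        hello<<<4, 8>>>();              /* 4 blocks, 8 threads per block */
        cudaDeviceSynchronize();        /* wait for the kernel to finish */
        return 0;
}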

num_blocks can be a single integer N, in which case the kernel function is executed on N blocks. It can also be a two-dimensional grid:

dim3 grid( xdim, ydim );

The above declaration defines a grid where the x dimension is in the range 0..xdim - 1 and the y dimension is in the range 0..ydim - 1. The kernel function is then invoked as

kernel<<<grid, 1>>>( ...args... );

Within the kernel function, the special blockIdx variable contains information about which block the kernel is currently executing on:

blockIdx.x

is the x index of the block, and

blockIdx.y

is the y index of the block (if the blocks are arranged in a two-dimensional grid).
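Putting the pieces together, here is a small self-contained sketch (the buffer name, DIM, and the value stored are all illustrative) that launches one block per grid point and uses blockIdx.x and blockIdx.y to decide where each block writes:

#include <stdio.h>

#define DIM 16

/* One block per grid point; blockIdx.x and blockIdx.y identify the point. */
__global__ void fill( int *buf )
{
        int x = blockIdx.x;
        int y = blockIdx.y;
        int offset = x + y * DIM;       /* linearize the 2-D block index */

        buf[offset] = x * y;
}

int main( void )
{
        int host_buf[DIM * DIM];
        int *dev_buf;

        cudaMalloc( (void **)&dev_buf, DIM * DIM * sizeof(int) );

        dim3 grid( DIM, DIM );          /* DIM x DIM blocks, 1 thread each */
        fill<<<grid, 1>>>( dev_buf );

        cudaMemcpy( host_buf, dev_buf, DIM * DIM * sizeof(int),
                    cudaMemcpyDeviceToHost );
        cudaFree( dev_buf );

        printf( "value at (x=3, y=5) = %d\n", host_buf[3 + 5 * DIM] );  /* prints 15 */
        return 0;
}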

[Example: Mandelbrot set.]
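The Mandelbrot kernel presumably follows the same one-block-per-pixel pattern; a rough sketch of the escape-time computation (all names and constants here are illustrative, not the code used in class) might look like this:

#define DIM      512    /* image is DIM x DIM pixels */
#define MAX_ITER 200

/* One block per pixel: test whether the corresponding point c of the
   complex plane stays bounded under repeated z = z*z + c. */
__global__ void mandelbrot( unsigned char *image )
{
        int x = blockIdx.x;
        int y = blockIdx.y;
        int offset = x + y * DIM;

        /* map the pixel to a point c in roughly [-2, 2] x [-2, 2] */
        float cx = 2.0f * (x - DIM / 2) / (DIM / 2);
        float cy = 2.0f * (y - DIM / 2) / (DIM / 2);

        float zx = 0.0f, zy = 0.0f;
        int i;
        for ( i = 0; i < MAX_ITER; i++ ) {
                float new_zx = zx * zx - zy * zy + cx;
                zy = 2.0f * zx * zy + cy;
                zx = new_zx;
                if ( zx * zx + zy * zy > 4.0f )
                        break;          /* escaped: not in the set */
        }

        /* white if the point (probably) belongs to the set, black otherwise */
        image[offset] = (i == MAX_ITER) ? 255 : 0;
}

/* launched from the host as:
        dim3 grid( DIM, DIM );
        mandelbrot<<<grid, 1>>>( dev_image );
*/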