DCompute basics

Nicholas Wilson edited this page Feb 27, 2017 · 7 revisions

At the heart of DCompute are the special attributes @compute and @kernel from the module dcompute.attributes.

@compute tells the D compiler that this module should be built to target GPUs. @compute takes a single parameter that indicates whether to target only GPUs (@compute(CompileFor.deviceOnly)) or to target the host as well (@compute(CompileFor.hostAndDevice)).

@kernel specifies that the attached function should be an entry point for the GPU, i.e. you can tell the driver to execute this function on the GPU, whereas you can't for functions that aren't marked @kernel.

Also critical to using DCompute is the notion of address-spaced pointers. These are available from the module ldc.dcomputetypes in the form of the magic template Pointer!(uint addrspace,T), which is a pointer to a T that resides in the address space addrspace. There are 5 address spaces: Global, Shared, Constant, Private and Generic.

Global is just what you think it is. It is available to all tasks on the device, and is the only address space that the host (i.e. the CPU) can both read and write.

Shared is memory that is local to a group of threads/work items. Threads (or work items in OpenCL speak) are the unit of execution.

Constant memory is writeable by the host but read-only for the device. It is kind of like read-only pages, but it has some special caching properties.

Private memory is local to a thread and contains its registers and stack.

Generic is not really an address space, but a Generic pointer can point anywhere in the other address spaces. It is useful if you are writing library routines that don't know ahead of time where the pointer will point. You could of course just template the address space instead.
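As a rough sketch of the templating alternative, the helper below takes the address space as a template parameter, so it can be instantiated for a pointer in any address space. It only uses the Pointer!(uint addrspace,T) form and the imports shown on this page; the names mylib, load and addFirst are made up for illustration, and it assumes the compiler can deduce addrspace from the argument type.

```d
@compute(CompileFor.deviceOnly) module mylib;
import ldc.attributes;
import ldc.dcomputetypes;

// Hypothetical helper: the address space is a template parameter, so one
// definition serves pointers into Global, Shared, Constant, etc.
// Assumes addrspace can be deduced from the argument's type.
float load(uint addrspace)(Pointer!(addrspace, float) p)
{
    return *p;
}

// Hypothetical kernel that uses the helper with Global pointers.
@kernel void addFirst(GlobalPointer!float a, GlobalPointer!float b, float c)
{
    // load is instantiated here for the Global address space.
    *a = load(b) + c;
}
```

The trade-off is the usual one for templates: each address space used gets its own instantiation, whereas a Generic pointer version would be a single function.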

For more information on this concept just search for documentation on OpenCL and/or CUDA.

About the simplest kernel you can have is shown below (note that a @kernel function MUST return void or you'll get errors):

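@compute(CompileFor.deviceOnly) module mykernels;
import ldc.attributes;
import ldc.dcomputetypes;
// Entry point for the GPU: a and b point into Global memory, c is passed by value.
@kernel void mykernel(GlobalPointer!float a, GlobalPointer!float b, float c)
{
    // Reads the first element of b, adds c, and stores to the first element of a.
    *a = *b + c;
}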

It's not a very useful kernel because it only assigns to the first element of a.
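A more useful variant would have each thread work on its own element. The sketch below assumes that the DCompute library module dcompute.std.index is available and provides GlobalIndex, and that address-spaced pointers can be indexed with []; neither is shown on this page, so check them against the version you are using. The kernel name addScalar and the length parameter n are made up for illustration.

```d
@compute(CompileFor.deviceOnly) module mykernels;
import ldc.attributes;
import ldc.dcomputetypes;
import dcompute.std.index; // assumption: provides GlobalIndex

// Hypothetical kernel: a[i] = b[i] + c for every i below n.
@kernel void addScalar(GlobalPointer!float a, GlobalPointer!float b,
                       float c, size_t n)
{
    // Each thread picks its element from its global position.
    auto i = GlobalIndex.x;
    // Guard against threads launched past the end of the buffers.
    if (i >= n) return;
    a[i] = b[i] + c; // assumption: Pointer supports indexing
}
```

The bounds check matters because the driver typically launches threads in whole groups, so the total thread count can exceed n.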

Compile with ldc -mdcompute-targets=ocl-210,cuda-350 -oq to target OpenCL 2.1 and CUDA SM 3.5.
