
Using Oclgrind

James Price edited this page Oct 20, 2016 · 7 revisions

Basic Usage

The simplest way of using Oclgrind is via the oclgrind command. This command runs an application and makes Oclgrind the only OpenCL platform and device visible to that application. To use it, simply prefix the command you would normally run with oclgrind:

oclgrind ./application arg1 arg2 ... argN

If an OpenCL kernel performs an invalid memory access when running with Oclgrind, the details of the error will be printed to stderr. For example, a kernel that writes past the end of a buffer may produce this error message:

Invalid write of size 4 at global memory address 0x1000000000040
        Kernel:  vecadd
        Entity:  Global(16,0,0) Local(0,0,0) Group(16,0,0)
        store i32 %tmp9, i32 addrspace(1)* %tmp15, align 4
        At line 4 of input.cl
          c[i] = a[i] + b[i]
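
The fields of this message can be cross-checked by hand. The following Python sketch assumes (this is not stated above) that the low 32 bits of the reported address are a byte offset into the buffer, and that the buffer holds 16 four-byte elements. Under those assumptions, work-item 16 writes one element past the end of the buffer.

```python
# Sketch only: relate the work-item ID in the error message to the faulting offset.
# Assumptions (not taken from Oclgrind itself): the low 32 bits of the address are
# a byte offset into the buffer, and the buffer holds 16 elements of 4 bytes each.

ELEM_SIZE = 4               # from "Invalid write of size 4"
NUM_ELEMS = 16              # hypothetical buffer length
ADDRESS = 0x1000000000040   # address reported in the message


def byte_offset(global_id: int, elem_size: int = ELEM_SIZE) -> int:
    """Byte offset touched by c[i] when i == global_id."""
    return global_id * elem_size


def in_bounds(offset: int, size: int, buffer_bytes: int) -> bool:
    """True if the access [offset, offset + size) lies inside the buffer."""
    return 0 <= offset and offset + size <= buffer_bytes


offset = byte_offset(16)            # Entity: Global(16,0,0)
low_bits = ADDRESS & 0xFFFFFFFF     # assumed byte offset within the buffer
print(hex(offset), hex(low_bits))
print(in_bounds(offset, ELEM_SIZE, NUM_ELEMS * ELEM_SIZE))
```

If the assumptions hold, the usual fix is either to launch fewer work-items or to guard the store with a bounds check in the kernel.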

Because it interprets an abstract intermediate representation and bounds-checks every memory access, Oclgrind runs slowly, typically a couple of orders of magnitude slower than a regular CPU implementation. It is therefore recommended to run your application with a small problem size if possible.

Using Oclgrind via the OpenCL ICD

Oclgrind can also be used via the OpenCL ICD, as long as an ICD loading point has been created (see here). If installed correctly, Oclgrind will appear as a platform when an application calls clGetPlatformIDs(), alongside any other OpenCL platforms installed on your system. To enable interactive debugging when using the ICD, export the environment variable OCLGRIND_INTERACTIVE=1.

Options

Oclgrind provides a number of command-line options to modify its behaviour. If you are using Oclgrind via the ICD interface, these options can also be specified via environment variables.
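
When the application is launched from a script, the environment-variable form of these options can be assembled programmatically. The sketch below is a hypothetical helper, not part of Oclgrind: `oclgrind_env` and its keyword names are assumptions chosen for illustration, and it performs no validation of option names. It relies only on the OCLGRIND_ prefix and upper-case naming pattern visible in the option list on this page.

```python
import os


def oclgrind_env(base=None, **options):
    """Return a copy of `base` (default: os.environ) with OCLGRIND_* variables added.

    Keyword names are upper-cased and prefixed, so check_api=True becomes
    OCLGRIND_CHECK_API=1 and max_errors=50 becomes OCLGRIND_MAX_ERRORS=50.
    Options set to False or None are left out.
    """
    env = dict(os.environ if base is None else base)
    for name, value in options.items():
        if value is None or value is False:
            continue
        env["OCLGRIND_" + name.upper()] = "1" if value is True else str(value)
    return env


# Start from an empty base so that only the Oclgrind variables are shown.
env = oclgrind_env(base={}, check_api=True, max_errors=50, quick=False)
print(sorted(env.items()))

# To launch a program with these settings (./application is a placeholder):
#   import subprocess
#   subprocess.run(["./application"], env=oclgrind_env(check_api=True))
```

Passing a complete environment to the child process, rather than mutating the current one, keeps the settings scoped to a single run.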


--build-options OPTIONS (or OCLGRIND_BUILD_OPTIONS=OPTIONS)
Append additional compiler options to the OpenCL C compiler.


--check-api (or OCLGRIND_CHECK_API=1)
Report errors on API calls.

If any OpenCL API call returns an error code other than CL_SUCCESS, this flag causes Oclgrind to generate an error message reporting the error code and the API call involved, along with some additional information that can help identify the cause of the problem. For example:

Oclgrind - OpenCL runtime error detected
    Function: clEnqueueNDRangeKernel
    Error:    CL_INVALID_WORK_GROUP_SIZE
    Dimension 0: local_work_size (64) does not divide global_work_size (1020)
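
The error in this example arises because 1020 is not a multiple of 64. A common remedy on the host side, sketched below in Python, is to round the global size up to the next multiple of the local size. The helper name `round_up` is an assumption for illustration, and the kernel would then need its own bounds check so that the extra work-items do nothing.

```python
def round_up(global_size: int, local_size: int) -> int:
    """Smallest multiple of local_size that is >= global_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    return ((global_size + local_size - 1) // local_size) * local_size


global_size, local_size = 1020, 64       # values from the message above
print(global_size % local_size)          # non-zero remainder triggers the error
padded = round_up(global_size, local_size)
print(padded, padded % local_size)
```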

--data-races (or OCLGRIND_DATA_RACES=1)
Enable data-race detection


--disable-pch (or OCLGRIND_DISABLE_PCH=1)
Disable the use of precompiled headers for compiling OpenCL C kernels.


--dump-spir (or OCLGRIND_DUMP_SPIR=1)
Dump SPIR to $TEMP/oclgrind_*.{ll,bc}

This causes Oclgrind to dump out compiled OpenCL program objects in SPIR form to a temporary file, for debugging purposes.


-h --help
Display usage information


--inst-counts (or OCLGRIND_INST_COUNTS=1)
Output histograms of instructions executed


-i --interactive (or OCLGRIND_INTERACTIVE=1)
Enable interactive mode


--log LOGFILE (or OCLGRIND_LOG=LOGFILE)
Redirect log/error messages to a file

Any messages that Oclgrind generates during simulation are output to stderr by default. This option instructs Oclgrind to write these messages to LOGFILE instead.


--max-errors NUM (or OCLGRIND_MAX_ERRORS=NUM)
Limit the number of error/warning messages

By default, Oclgrind will output the first 1000 error messages, and then suppress further output. This flag allows this limit to be changed.


--num-threads NUM (or OCLGRIND_NUM_THREADS=NUM)
Set the number of worker threads to be used by the simulation (defaults to the number of hardware threads available).


--pch-dir DIR (or OCLGRIND_PCH_DIR=DIR)
Set the directory containing the precompiled headers used when compiling OpenCL C programs.


--plugins PLUGINS (or OCLGRIND_PLUGINS=PLUGINS)
Load a colon-separated list of plugin libraries

This option provides Oclgrind with a colon-separated list of dynamic libraries that Oclgrind should load as additional plugins. More information about creating plugins can be found here.


-q --quick (or OCLGRIND_QUICK=1)
Only run first and last work-group

This causes Oclgrind to only simulate the first and last work-groups in each kernel invocation. This is useful if running the full kernel invocation would take too long, and only a quick sanity check is required.
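
To get a feel for how much work this skips, the following Python sketch lists which work-group indices would be simulated for a one-dimensional launch. It is an illustration only: `quick_groups` is a hypothetical helper, and it assumes a 1D NDRange whose global size is a multiple of the local size. How multi-dimensional ranges are handled is not covered here.

```python
def quick_groups(global_size: int, local_size: int) -> list:
    """Indices of the first and last work-group of a 1D launch (assumed model)."""
    if local_size <= 0 or global_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    num_groups = global_size // local_size
    # A set removes the duplicate when the launch has a single work-group.
    return sorted({0, num_groups - 1})


groups = quick_groups(1024, 64)
print(groups, "of", 1024 // 64, "work-groups")
print(quick_groups(64, 64))
```

Because only the boundary groups run, errors that occur solely in interior work-groups would not be reported in this mode.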


--uniform-writes
Don't suppress uniform write-write data-races

By default, Oclgrind will ignore data-races where the same data is being written by multiple work-items, since this situation occurs in real codes. This option prevents Oclgrind from filtering these types of races.
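
The distinction can be illustrated with a simplified model. The Python sketch below is not Oclgrind's detection algorithm: it ignores barriers, atomics and work-group boundaries, and the function name and record layout are assumptions. It only shows what separates a uniform write-write race (several work-items storing the same value to one address) from a conflicting one.

```python
from collections import defaultdict


def classify_writes(writes):
    """Classify each address given (work_item, address, value) write records.

    Returns a dict mapping address -> "no race", "uniform" or "conflicting".
    """
    by_address = defaultdict(list)
    for work_item, address, value in writes:
        by_address[address].append((work_item, value))

    result = {}
    for address, entries in by_address.items():
        work_items = {w for w, _ in entries}
        values = {v for _, v in entries}
        if len(work_items) < 2:
            result[address] = "no race"        # only one writer
        elif len(values) == 1:
            result[address] = "uniform"        # same value from several writers
        else:
            result[address] = "conflicting"    # different values from several writers
    return result


writes = [
    (0, 0x10, 1), (1, 0x10, 1),   # two work-items store the same value
    (0, 0x20, 5), (1, 0x20, 7),   # two work-items store different values
    (2, 0x30, 9),                 # a single writer
]
report = classify_writes(writes)
for address in sorted(report):
    print(hex(address), report[address])
```

In this model, the "uniform" case corresponds to the races that are hidden by default and shown when --uniform-writes is given.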


--uninitialized (or OCLGRIND_UNINITIALIZED=1)
Check for uses of uninitialized memory.


-v --version
Display version information