Sunday, April 29, 2012

OpenCL (Part 1)

So suppose you have a fancy multicore processor and a fancy GPU...what can you do with them? How do you take advantage of all that parallelism?

It seems OpenCL and friends (e.g., CUDA) deal with this.

You've got kernels, which intuitively are like C functions, but serve as the basic unit of executable code. A kernel can be either data-parallel or task-parallel. In any event, kernels are parallel.
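
To make that concrete, here is a minimal sketch of a data-parallel kernel in OpenCL C; the kernel name and argument names are made up for illustration. Each work-item handles exactly one array element.

    // Minimal data-parallel kernel sketch: one work-item per array element.
    // Kernel and argument names here are hypothetical.
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c)
    {
        int i = get_global_id(0);  // this work-item's index in the 1-D domain
        c[i] = a[i] + b[i];
    }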

The Program Object then consists of kernels and other functions (analogous to a dynamic library).
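
On the host side, building a program object from source text and pulling a kernel out of it looks roughly like this (a sketch assuming you already have a context, a device, and the source string; error handling omitted):

    // Compile the source into a program object, then extract one kernel.
    cl_int err;
    cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, &err);
    err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, "vec_add", &err);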

More precisely, the application queues kernel execution instances: they are queued in order, but may be executed either in-order or out-of-order.
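
In OpenCL 1.x terms, in-order vs. out-of-order is a property chosen when the command queue is created; a hedged sketch:

    // The default is an in-order queue; this flag (if the device supports it)
    // lets the runtime execute queued kernels out of order.
    cl_command_queue queue =
        clCreateCommandQueue(context, device,
                             CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);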

Since we are working with a graphics chip, we can process vectors, images, or volumes. These are 1-dimensional, 2-dimensional, and 3-dimensional domains, respectively.

Each independent element of execution in an N-dimensional domain is called a work-item; the N-dimensional domain defines the total number of work-items that execute in parallel.
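
For example, launching a kernel over a 2-dimensional domain, say an image assumed to be 1024×768, gives every pixel its own work-item:

    // One work-item per pixel of a (hypothetical) 1024x768 image: a 2-D domain.
    size_t global_size[2] = {1024, 768};
    err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                                 global_size, NULL, 0, NULL, NULL);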

Parallelization demands concern for synchronization, viz. synchronizing either data [i.e., memory] or execution.

Although OpenCL does not permit global synchronization, we can have "local" synchronization. What does this mean? Well, consider some image processing problem. We can make the image into a "quilt" of "patches" where each patch is, e.g., 128×128 pixels...this "patch" is called a workgroup, and we may synchronize within each workgroup.

Note we must be clear about whether we are synchronizing memory or execution.

We cannot synchronize between different workgroups.

We use "barriers" to synchronize execution, and "memory fences" to synchronize memory accesses.
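
Inside a kernel, that looks roughly like the sketch below (a workgroup-local sum; the kernel and argument names are made up): the barrier guarantees every work-item in the group has written its value to local memory before any work-item reads the others' values.

    // Each work-item writes into local memory, then all of them wait at the
    // barrier (which also acts as a local memory fence) before reading.
    __kernel void partial_sums(__global const float *in,
                               __global float *out,
                               __local float *local_sums)
    {
        int lid  = get_local_id(0);
        int gid  = get_global_id(0);
        int size = get_local_size(0);

        local_sums[lid] = in[gid];
        barrier(CLK_LOCAL_MEM_FENCE);   // synchronize within this workgroup only

        // Work-item 0 of each group sums that group's values.
        if (lid == 0) {
            float sum = 0.0f;
            for (int i = 0; i < size; ++i)
                sum += local_sums[i];
            out[get_group_id(0)] = sum;
        }
    }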

Sadly, alas, this may require using multipass algorithms for global synchronization (e.g., between kernels). Alas, alas, multipass!
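
Concretely, "multipass" just means splitting the work across kernel launches, because the boundary between launches is the only global synchronization point available. A host-side sketch (pass1 and pass2 are hypothetical kernels):

    // Pass 1: compute per-workgroup partial results.
    size_t n = 1 << 20, wg = 128;
    clEnqueueNDRangeKernel(queue, pass1, 1, NULL, &n, &wg, 0, NULL, NULL);
    // On an in-order queue, pass 2 does not start until pass 1 has finished;
    // that boundary is our "global" synchronization between kernels.
    size_t groups = n / wg;
    clEnqueueNDRangeKernel(queue, pass2, 1, NULL, &groups, NULL, 0, NULL, NULL);
    clFinish(queue);   // block the host until everything is done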

So How to Program in OpenCL?

Well, there are five things we work with: cl_device_id, cl_kernel, cl_program, cl_command_queue, and cl_context.

We already discussed kernels and programs, which are like functions (kernel) and a collection of functions (program).

So what are the other guys...just bonus parts?

No! The Host is your computer, and it's connected to one or more Devices (e.g., a CPU, GPU, or DSP). A device is anything providing processing power.

The Device receives kernels from the host. A cl_device_id represents a device.
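
Getting a cl_device_id is a matter of asking a platform for its devices; a minimal sketch (taking the first GPU found, error checking omitted):

    // Grab a platform, then ask it for a GPU device.
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);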

We have two things left: the command queue, and the context.

The device receives its kernels through a Command Queue.

An OpenCL Context enables devices to receive kernels and transfer data.
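
Putting the last two pieces together: create a context around the device(s), then a command queue on that context for each device you want to feed. A sketch, with error handling omitted:

    // The context ties devices, memory objects, programs, and queues together;
    // the command queue is how this particular device receives work.
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);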

Further Reading

  1. A Gentle Introduction to OpenCL, Dr. Dobb's Journal.
  2. OpenCL Presentation [pdf]
  3. OpenCL by Example [ucdavis.edu] discusses...OpenCL...by...example...
  4. Getting started with OpenCL and GPU computing
