Name: OCLAlgo
Owner: Samsung
Description: C++11 concept for OpenCL computations
Created: 2014-03-25 11:57:03.0
Updated: 2017-08-02 18:27:05.0
Pushed: 2014-03-14 20:03:49.0
Homepage: null
Size: 644
Language: C++
OCLAlgo is an OpenCL framework based on the C++ OpenCL API wrapper and C++11. It provides simple access to OpenCL platforms and devices for synchronous and asynchronous computations.
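OCLAlgo provides simple OpenCL platform and device initialization, either by (partial) platform and device names or by platform and device indices:
// 1st case: select by platform and device name
oclalgo::Queue queue("NVIDIA", "GeForce");
// 2nd case: select by platform and device index
oclalgo::Queue queue(0, 0);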
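Selecting by name presumably means matching part of the platform or device name that OpenCL reports; the functional test at the end of this README selects its device the same way. The sketch below illustrates that idea on plain strings only. FindByPartialName is a hypothetical helper, not OCLAlgo's lookup code, and no OpenCL calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of OCLAlgo): returns the index of the first
// entry in `names` that contains `part` as a substring, mimicking selection
// by a partial platform or device name such as "NVIDIA" or "GeForce".
std::size_t FindByPartialName(const std::vector<std::string>& names,
                              const std::string& part) {
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i].find(part) != std::string::npos) return i;
  }
  // No match: report an error instead of silently picking a default.
  throw std::runtime_error("no entry contains \"" + part + "\"");
}
```

With the made-up platform list {"Intel(R) OpenCL", "NVIDIA CUDA"}, FindByPartialName(names, "NVIDIA") returns 1. In this model the index form, Queue(0, 0), needs no search at all.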
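To enqueue an OpenCL kernel, take four simple steps:
/* 1: create the kernel arguments */
oclalgo::KernelArg arg(host_array, oclalgo::ArgType::IN_OUT);
int add_value = 101;
/* 2: create a task from the OpenCL program file, kernel name, compilation options and arguments */
oclalgo::Task task("vector.cl", "modify", "", arg, add_value);
/* 3: define the global (and optionally local) dimensions of the task */
oclalgo::Grid grid(cl::NDRange(size));
/* 4: enqueue the task; an oclalgo::future is returned */
auto ocl_res = queue.EnqueueTask(task, grid);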
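To get the output results, call get() on the returned future. It waits until the task finishes and returns the output buffers, i.e. the arguments marked as ArgType::OUT or ArgType::IN_OUT:
std::vector<cl::Buffer> v_res = ocl_res.get();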
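The EnqueueTask()/get() pair follows the C++11 future model. As a rough, host-only analogue with no OpenCL involved, the sketch below launches work with std::async and returns a std::future. ModifyOnHost and EnqueueOnHost are hypothetical names, and the behaviour of the "modify" kernel (adding add_value to every element) is an assumption, since vector.cl is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for the "modify" kernel. ASSUMPTION: the
// kernel adds `add_value` to each element. Each loop iteration plays the
// role of one work-item in a 1-D NDRange covering the whole array.
std::vector<int> ModifyOnHost(std::vector<int> data, int add_value) {
  for (std::size_t gid = 0; gid < data.size(); ++gid) {
    data[gid] += add_value;
  }
  return data;
}

// Runs the work on another thread and hands back a std::future, in the same
// spirit as Queue::EnqueueTask() returning an oclalgo::future.
std::future<std::vector<int>> EnqueueOnHost(std::vector<int> data,
                                            int add_value) {
  return std::async(std::launch::async, ModifyOnHost, std::move(data),
                    add_value);
}
```

For example, EnqueueOnHost({1, 2, 3}, 101).get() blocks until the work is done and yields {102, 103, 104}.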
To copy an OpenCL buffer to a host array or vice versa, call Queue::memcpy. It supports both synchronous and asynchronous copies of memory objects between the host and OpenCL devices; in the asynchronous case an oclalgo::future object is returned.
queue.memcpy(host_array, v_res[0]);
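The difference between the synchronous and asynchronous copy styles can be sketched on plain host memory. CopySync and CopyAsync are hypothetical helpers that only mirror the two calling conventions; they are not the signatures of Queue::memcpy and do not touch OpenCL buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Hypothetical blocking copy: returns only after dst holds a copy of src.
void CopySync(std::vector<int>& dst, const std::vector<int>& src) {
  dst.resize(src.size());
  std::copy(src.begin(), src.end(), dst.begin());
}

// Hypothetical non-blocking copy: starts the copy on another thread and
// returns a future. dst and src are captured by reference, so both must
// stay alive and unmodified until the caller waits on the future.
std::future<void> CopyAsync(std::vector<int>& dst,
                            const std::vector<int>& src) {
  return std::async(std::launch::async, [&dst, &src] { CopySync(dst, src); });
}
```

As with an asynchronous device transfer, neither buffer may be destroyed or modified until wait() or get() has been called on the returned future.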
The OCLAlgo source is licensed under the BSD licence. Copyright (c) 2014, Samsung Electronics Co.,Ltd.
As an example of using the framework, the oclalgo::Matrix and oclalgo::DMatrix classes were added. These classes implement basic matrix operations (+, -, *) using host (oclalgo::Matrix) and device (oclalgo::DMatrix) resources.
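To make the host-side half of that description concrete, here is a minimal row-major matrix with the three operations. HostMatrix and its members are hypothetical: the actual oclalgo::Matrix interface is not shown in this README, and oclalgo::DMatrix would presumably perform the same arithmetic in OpenCL kernels on the device rather than in host loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host matrix (not OCLAlgo's API), stored row-major.
class HostMatrix {
 public:
  HostMatrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols, 0.0f) {}

  float& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
  float operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }
  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

  HostMatrix operator+(const HostMatrix& o) const { return Elementwise(o, 1.0f); }
  HostMatrix operator-(const HostMatrix& o) const { return Elementwise(o, -1.0f); }

  // Naive O(n^3) product; requires cols() == o.rows().
  HostMatrix operator*(const HostMatrix& o) const {
    assert(cols_ == o.rows_);
    HostMatrix res(rows_, o.cols_);
    for (std::size_t i = 0; i < rows_; ++i)
      for (std::size_t k = 0; k < cols_; ++k)
        for (std::size_t j = 0; j < o.cols_; ++j)
          res(i, j) += (*this)(i, k) * o(k, j);
    return res;
  }

 private:
  // Computes this + sign * o element by element; shapes must match.
  HostMatrix Elementwise(const HostMatrix& o, float sign) const {
    assert(rows_ == o.rows_ && cols_ == o.cols_);
    HostMatrix res(rows_, cols_);
    for (std::size_t i = 0; i < data_.size(); ++i)
      res.data_[i] = data_[i] + sign * o.data_[i];
    return res;
  }

  std::size_t rows_, cols_;
  std::vector<float> data_;
};
```

For a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]], a * b gives [[19, 22], [43, 50]].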
Below is a functional test that checks that the oclalgo::Queue class correctly adds two vectors.
#include <algorithm>
#include <iostream>
#include <string>

#include <gtest/gtest.h>
#include <src/gtest_main.cc>

#include <oclalgo/queue.h>

TEST(Queue, VectorAdd) {
  try {
    // create an OpenCL queue for sync/async task running using
    // partial platform and device names
    oclalgo::Queue queue("NVIDIA", "GeForce");

    // create and initialize input shared arrays
    int size = 1024;
    oclalgo::shared_array<int> a(size), b(size);
    for (int i = 0; i < size; ++i) {
      a[i] = i;
      b[i] = size - i;
    }

    // initialize OpenCL kernel arguments
    using oclalgo::ArgType;
    using oclalgo::BufferArg;
    BufferArg a_arg = queue.CreateKernelArg(a, ArgType::IN);
    BufferArg b_arg = queue.CreateKernelArg(b, ArgType::IN);
    BufferArg c_arg = queue.CreateKernelArg<int>(size, ArgType::OUT);

    // create a task using the OpenCL program and kernel names, compilation
    // options, and arguments in the same order as in the OpenCL kernel
    oclalgo::Task task = queue.CreateTask("vector.cl", "vector_add", "",
                                          a_arg, b_arg, c_arg);

    // create a grid that defines the dimensions of the OpenCL task
    // in global and local (work-group size) space
    oclalgo::Grid grid = oclalgo::Grid(cl::NDRange(size));

    // enqueue the OpenCL task (EnqueueTask() returns an oclalgo::future
    // object for async task running)
    auto ocl_res = queue.EnqueueTask(task, grid);

    // copy the device memory holding the result to the host
    // (ocl_res.get() waits until OpenCL finishes the task and returns a
    // std::vector with the output OpenCL buffers, i.e. those marked as
    // ArgType::OUT or ArgType::IN_OUT when they were created)
    queue.memcpy(a, ocl_res.get()[0]);

    // check the result: every element must equal size
    auto it = std::find_if(a.get_raw(), a.get_raw() + a.size(),
                           [size](int x) { return x != size; });
    ASSERT_EQ(a.get_raw() + a.size(), it);
  } catch (const cl::Error& e) {
    std::cerr << e.what() << " (err_code = "
              << oclalgo::Queue::StatusStr(e.err()) << ")" << std::endl;
    throw;  // rethrow the original exception unchanged
  }
}
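The value the final check expects follows directly from how the inputs are initialized. Writing $n$ for size, the test sets $a_i = i$ and $b_i = n - i$, so every element of the output buffer is

$$c_i = a_i + b_i = i + (n - i) = n, \qquad 0 \le i < n,$$

which is why the test searches for any element that differs from size.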