OpenCL

From NaplesPU Documentation
Revision as of 13:34, 11 October 2017 by Catello (talk | contribs)

OpenCL support for the nu+ architecture is provided through pocl.

How to install vanilla pocl

  1. Download the pocl sources by following the link.
  2. In order to build pocl, you need the following support libraries and tools:
    • Latest released version of LLVM & Clang
    • GNU make
    • libtool dlopen wrapper files (e.g. libltdl3-dev in Debian)
    • pthread (should be installed by default)
    • hwloc v1.0 or newer (e.g. libhwloc-dev)
    • pkg-config
    • cmake
    • libclang-3.8-dev if you are using Ubuntu 16.04 LTS
    On Ubuntu 16.04 LTS you can install them by running the following command in a terminal:
    sudo apt-get install llvm clang libltdl3-dev libhwloc-dev pkg-config libclang-3.8-dev make cmake
  3. Build and install
    cd <directory-with-pocl-sources>
    mkdir build
    cd build
    cmake [-D<option>=<value> ...] ..
    make && make install

Using pocl

To compile a program against pocl, run:

gcc example1.c -o example `pkg-config --libs --cflags pocl`

See [1] and [2] for further information.

Modify pocl

Adding a new device class in pocl

  • Create a directory for the new device class in "lib/CL/devices". In this case the "nuplus" folder is created.
  • Create at least the files newdevice.c and newdevice.h. In this case they are "nuplus.c" and "nuplus.h".
  • Create the CMakeLists.txt file in the device folder, listing the files created above and the device name.
     if(MSVC)
     set_source_files_properties( nuplus.h nuplus.c PROPERTIES LANGUAGE CXX )
     endif(MSVC)
     add_library("pocl-devices-nuplus" OBJECT nuplus.h nuplus.c)
  • Modify the "lib/CL/devices/devices.c" file: include the header file of the new device and add its init function to the pocl_devices_init_ops array.
     #include "nuplus/nuplus.h"

     ...

     static init_device_ops pocl_devices_init_ops[] = {
       pocl_pthread_init_device_ops,
       pocl_basic_init_device_ops,
       pocl_nuplus_init_device_ops,
     #if defined(TCE_AVAILABLE)
       pocl_ttasim_init_device_ops,
     #endif
     #if defined(BUILD_HSA)
       pocl_hsa_init_device_ops,
     #endif
     };

     ...


  • Modify the "lib/CL/devices/CMakeLists.txt" adding the new device subdirectory name.
     ...
    
     add_subdirectory("nuplus")
    
     ...
  • Modify the "CMakeLists.txt" in the pocl root directory, adding the new device name to the "OCL_DRIVERS" variable.
     ...
    
     set(OCL_DRIVERS "basic pthreads nuplus")
    
     ...
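
The registration step above relies on a table of init functions, each of which fills in a structure of function pointers for one device class. The following is a minimal, self-contained sketch of that pattern. The types and names (struct device_ops, init_all_drivers, the probe functions) are simplified, hypothetical stand-ins chosen for illustration; they are not pocl's real definitions, and a real nuplus.c would need to implement many more operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device operations structure. */
struct device_ops {
    const char *device_name;
    /* Returns how many devices of this class are present. */
    unsigned int (*probe)(struct device_ops *ops);
};

/* Each device class exposes one function that fills in the structure. */
typedef void (*init_device_ops)(struct device_ops *ops);

static unsigned int basic_probe(struct device_ops *ops) {
    (void)ops;
    return 1; /* the host CPU is always available */
}

static unsigned int nuplus_probe(struct device_ops *ops) {
    (void)ops;
    return 0; /* placeholder: this sketch pretends no board is attached */
}

static void basic_init_device_ops(struct device_ops *ops) {
    ops->device_name = "basic";
    ops->probe = basic_probe;
}

static void nuplus_init_device_ops(struct device_ops *ops) {
    ops->device_name = "nuplus";
    ops->probe = nuplus_probe;
}

/* Adding a device class means adding one entry to this table. */
static init_device_ops devices_init_ops[] = {
    basic_init_device_ops,
    nuplus_init_device_ops,
};

#define NUM_DRIVERS (sizeof(devices_init_ops) / sizeof(devices_init_ops[0]))

/* Initializes one ops structure per driver and returns the total
   number of devices reported by all probe functions. */
static unsigned int init_all_drivers(struct device_ops out[], size_t capacity) {
    unsigned int total = 0;
    assert(capacity >= NUM_DRIVERS);
    for (size_t i = 0; i < NUM_DRIVERS; ++i) {
        devices_init_ops[i](&out[i]);
        total += out[i].probe(&out[i]);
    }
    return total;
}
```

Calling init_all_drivers with an array of two struct device_ops fills both entries and reports one device in total, since only the "basic" probe in this sketch finds a device.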

Build pocl with a custom LLVM

Prerequisites

Make sure that the LLVM compiler is built with all targets enabled and that the default target is set to the host. For the nu+ toolchain, the CMakeLists.txt in the LLVM root folder must be modified as shown below:

  1. Delete the code
    set(LLVM_TARGETS_TO_BUILD "NuPlus"
        CACHE STRING "Semicolon-separated list of targets to build, or \"all\".")
  2. Replace
    set(LLVM_DEFAULT_TARGET_TRIPLE "nuplus-none-none" CACHE STRING
    with
    set(LLVM_DEFAULT_TARGET_TRIPLE "${LLVM_HOST_TRIPLE}" CACHE STRING

Building pocl with nu+ compiler

  1. Build and install the nu+ compiler
  2. Build and install pocl
    cd <directory-with-pocl-sources>
    mkdir build
    cd build
    cmake -DWITH_LLVM_CONFIG=/usr/local/llvm-nuplus/bin/llvm-config ..
    make && make install

OpenCL support implementation

To enable OpenCL support, a new pocl device has been added. The device implements the device operations interface provided by pocl. The operations are: query devices, manage memory, transfer data, generate machine code, and manage execution.

Query devices

To query the number of available devices, the nu+ device-layer scans for FPGA devices connected to the host through USB. This allows adding multiple accelerators to a system, each being used by a different program.

Manage Memory

The accelerator's memory is managed by the device-layer on the host. Because a kernel instance cannot allocate memory dynamically, the amount of memory that needs to be reserved is known before an instance is executed. The nu+ device-layer uses the Bufalloc memory allocator that is included in pocl. This allocator is designed for typical OpenCL workloads and uses a simple first-fit algorithm to find free space. Because only one OpenCL program can use the device at a time, the entire memory is available to the allocator.
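
To illustrate what a first-fit search means, here is a minimal sketch of a first-fit allocator over a single memory region. It does not reproduce Bufalloc's actual data structures or API: struct region, region_alloc, region_free, and the fixed MAX_CHUNKS limit are all hypothetical names and simplifications chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 16                 /* illustrative bookkeeping limit */
#define ALLOC_FAILED ((size_t)-1)     /* returned when no chunk fits */

struct chunk {
    size_t start;  /* offset inside the device memory region */
    size_t size;   /* length of the chunk in bytes */
    int used;      /* non-zero while the chunk is allocated */
};

struct region {
    struct chunk chunks[MAX_CHUNKS];  /* kept ordered by start offset */
    size_t count;
};

/* The whole region starts out as one free chunk. */
static void region_init(struct region *r, size_t total_size) {
    r->chunks[0].start = 0;
    r->chunks[0].size = total_size;
    r->chunks[0].used = 0;
    r->count = 1;
}

/* First fit: walk the chunks in address order and take the first free
   one that is large enough, splitting off any remainder. */
static size_t region_alloc(struct region *r, size_t size) {
    assert(size > 0);
    for (size_t i = 0; i < r->count; ++i) {
        struct chunk *c = &r->chunks[i];
        if (c->used || c->size < size)
            continue;
        /* Split only if there is a remainder and bookkeeping room;
           otherwise the whole chunk is handed out. */
        if (c->size > size && r->count < MAX_CHUNKS) {
            for (size_t j = r->count; j > i + 1; --j)
                r->chunks[j] = r->chunks[j - 1];
            r->chunks[i + 1].start = c->start + size;
            r->chunks[i + 1].size = c->size - size;
            r->chunks[i + 1].used = 0;
            c->size = size;
            r->count++;
        }
        c->used = 1;
        return c->start;
    }
    return ALLOC_FAILED;
}

/* Marks the chunk starting at `start` as free again. For brevity this
   sketch does not merge neighbouring free chunks. */
static void region_free(struct region *r, size_t start) {
    for (size_t i = 0; i < r->count; ++i)
        if (r->chunks[i].start == start)
            r->chunks[i].used = 0;
}
```

Because the sizes of all buffers are known before a kernel instance runs, the host can perform every region_alloc call up front and fail early if the device memory is too small.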

Transfer Data

Data transfer is done using the nu+ driver. Read and write requests to the device memory are performed using nuplus_read and nuplus_write. To set the location in memory where the data should be written, the nuplus_lseek function is used.
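
The seek-then-transfer sequence described above can be sketched as follows. The signatures of the real nuplus_lseek, nuplus_write and nuplus_read are not given here, so this example uses hypothetical mock_ versions modeled on POSIX lseek/read/write and backed by a plain array instead of an FPGA. It also assumes that reads use the same position set by the seek call; the real driver may differ on all of these points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MEM_SIZE 1024

/* In-memory stand-in for the accelerator memory and the driver's
   current position. */
static unsigned char mock_mem[MOCK_MEM_SIZE];
static size_t mock_pos = 0;

/* ASSUMED signature: sets the device memory position. */
static int mock_nuplus_lseek(size_t offset) {
    if (offset > MOCK_MEM_SIZE)
        return -1;
    mock_pos = offset;
    return 0;
}

/* ASSUMED signature: copies n bytes from the host to the current position. */
static long mock_nuplus_write(const void *src, size_t n) {
    if (n > MOCK_MEM_SIZE - mock_pos)
        return -1;
    memcpy(mock_mem + mock_pos, src, n);
    mock_pos += n;
    return (long)n;
}

/* ASSUMED signature: copies n bytes from the current position to the host. */
static long mock_nuplus_read(void *dst, size_t n) {
    if (n > MOCK_MEM_SIZE - mock_pos)
        return -1;
    memcpy(dst, mock_mem + mock_pos, n);
    mock_pos += n;
    return (long)n;
}

/* Host to device: seek to the target offset, then write. */
static int copy_to_device(size_t dev_offset, const void *host_src, size_t n) {
    assert(host_src != NULL);
    if (mock_nuplus_lseek(dev_offset) != 0)
        return -1;
    return mock_nuplus_write(host_src, n) == (long)n ? 0 : -1;
}

/* Device to host: seek to the source offset, then read. */
static int copy_from_device(void *host_dst, size_t dev_offset, size_t n) {
    assert(host_dst != NULL);
    if (mock_nuplus_lseek(dev_offset) != 0)
        return -1;
    return mock_nuplus_read(host_dst, n) == (long)n ? 0 : -1;
}
```

In a device layer, the offset passed to such helpers would typically be the address handed out by the memory allocator for the corresponding OpenCL buffer.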