This guide contains information on using DPC++ to run SYCL™ applications on NVIDIA® GPUs via the DPC++ CUDA® plugin.
For general information about DPC++, refer to the DPC++ Resources section.
This release has been tested on the following platforms:
GPU Hardware | Architecture | Operating System | CUDA | GPU Driver
---|---|---|---|---
NVIDIA A100-PCIE-40GB | Ampere - sm_80 | Ubuntu 22.04.2 LTS | 12.2 | 535.54.03
This release should work across a wide array of NVIDIA GPUs and CUDA versions, but Codeplay cannot guarantee correct operation on untested platforms.
The package has been tested on Ubuntu 22.04 only, but can be installed on any Linux system.
This release of oneAPI for NVIDIA GPUs is not available for Windows, but a Windows package will be available in a future release.
The plugin relies on CUDA being installed on your system. As CUDA no longer supports macOS®, a oneAPI for NVIDIA GPUs package is not available for macOS.
Install C++ development tools.
You will need the following C++ development tools installed in order to build and run oneAPI applications: cmake, gcc, g++, make and pkg-config.

The following console commands will install the above tools on the most popular Linux distributions:
Ubuntu
sudo apt update
sudo apt -y install cmake pkg-config build-essential
Red Hat and Fedora
sudo yum update
sudo yum -y install cmake pkgconfig
sudo yum groupinstall "Development Tools"
SUSE
sudo zypper update
sudo zypper --non-interactive install cmake pkg-config
sudo zypper --non-interactive install -t pattern devel_C_C++
Verify that the tools are installed by running:
which cmake pkg-config make gcc g++
You should see output similar to:
/usr/bin/cmake
/usr/bin/pkg-config
/usr/bin/make
/usr/bin/gcc
/usr/bin/g++
Install an Intel® oneAPI Toolkit version 2024.2.0 that contains the DPC++/C++ Compiler.
For example, the “Intel oneAPI Base Toolkit” should suit most use cases.
The Toolkit must be version 2024.2.0 - otherwise oneAPI for NVIDIA GPUs cannot be installed.
Install the GPU driver and CUDA software stack for the NVIDIA GPU by following the steps described in the NVIDIA CUDA Installation Guide for Linux.
Download the latest oneAPI for NVIDIA GPUs installer:
Directly via Website
Using the Download API with cURL or WGET (requires an account)
Run the downloaded self-extracting installer:
sh oneapi-for-nvidia-gpus-2024.2.0-cuda-12.0-linux.sh
The installer will search for an existing Intel oneAPI Toolkit version 2024.2.0 installation in common locations. If you have installed an Intel oneAPI Toolkit in a custom location, use --install-dir /path/to/intel/oneapi. If your Intel oneAPI Toolkit installation is outside your home directory, you may be required to run this command with elevated privileges, e.g. with sudo.
To set up your oneAPI environment in your current session, source the Intel-provided setvars.sh script.

For system-wide installations:
. /opt/intel/oneapi/setvars.sh --include-intel-llvm
For private installations (in the default location):
. ~/intel/oneapi/setvars.sh --include-intel-llvm
The --include-intel-llvm option is required in order to add LLVM tools such as clang++ to the path.

Note that you will have to run this script in every new terminal session. For options to handle the setup automatically in each session, see the relevant Intel oneAPI Toolkit documentation, such as Set Environment Variables for CLI Development.
Ensure that the CUDA libraries and tools can be found in your environment.
Run nvidia-smi - if it runs without any obvious errors in the output, then your environment should be set up correctly. Otherwise, set your environment variables manually:
export PATH=/PATH_TO_CUDA_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/PATH_TO_CUDA_ROOT/lib:$LD_LIBRARY_PATH
To verify the DPC++ CUDA plugin installation, the DPC++ sycl-ls tool can be used to make sure that SYCL now exposes the available NVIDIA GPUs. You should see something similar to the following in the sycl-ls output if NVIDIA GPUs are found:
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
If the available NVIDIA GPUs are correctly listed, then the DPC++ CUDA plugin was correctly installed and set up.
Otherwise, see the “Missing devices in sycl-ls output” section of the Troubleshooting documentation.
Note that this command may also list other devices such as OpenCL™ devices, Intel GPUs, or AMD GPUs, depending on the available hardware and the DPC++ plugins installed.
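As a complement to sycl-ls, the same device information can be queried programmatically through the standard SYCL 2020 platform and device APIs. The following sketch simply enumerates every platform and device the runtime can see (the file name list-devices.cpp is just an example):

```cpp
// list-devices.cpp - enumerate all SYCL platforms and devices,
// similar in spirit to the sycl-ls tool.
// Compile with: icpx -fsycl list-devices.cpp -o list-devices
#include <sycl/sycl.hpp>

#include <iostream>

int main() {
  for (const auto &Platform : sycl::platform::get_platforms()) {
    // Print the platform name, e.g. "NVIDIA CUDA BACKEND".
    std::cout << "Platform: "
              << Platform.get_info<sycl::info::platform::name>() << "\n";
    for (const auto &Device : Platform.get_devices()) {
      // Print each device belonging to this platform.
      std::cout << "  Device: "
                << Device.get_info<sycl::info::device::name>() << "\n";
    }
  }
  return 0;
}
```

On a correctly configured system, the NVIDIA GPU should appear under a CUDA platform in this listing, matching the sycl-ls output.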
Create a file simple-sycl-app.cpp with the following C++/SYCL code:

#include <sycl/sycl.hpp>

#include <iostream>

int main() {
  // Creating buffer of 4 ints to be used inside the kernel code
  sycl::buffer<int, 1> Buffer{4};

  // Creating SYCL queue
  sycl::queue Queue{};

  // Size of index space for kernel
  sycl::range<1> NumOfWorkItems{Buffer.size()};

  // Submitting command group (work) to queue
  Queue.submit([&](sycl::handler &cgh) {
    // Getting write only access to the buffer on a device
    auto Accessor = Buffer.get_access<sycl::access::mode::write>(cgh);
    // Executing kernel
    cgh.parallel_for<class FillBuffer>(NumOfWorkItems, [=](sycl::id<1> WIid) {
      // Fill buffer with indexes
      Accessor[WIid] = static_cast<int>(WIid.get(0));
    });
  });

  // Getting read only access to the buffer on the host.
  // Implicit barrier waiting for queue to complete the work.
  auto HostAccessor = Buffer.get_host_access();

  // Check the results
  bool MismatchFound{false};
  for (size_t I{0}; I < Buffer.size(); ++I) {
    if (HostAccessor[I] != I) {
      std::cout << "The result is incorrect for element: " << I
                << " , expected: " << I << " , got: " << HostAccessor[I]
                << std::endl;
      MismatchFound = true;
    }
  }

  if (!MismatchFound) {
    std::cout << "The results are correct!" << std::endl;
  }

  return MismatchFound;
}
Compile the application with:
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda simple-sycl-app.cpp -o simple-sycl-app
Depending on your CUDA version, you may see this warning, which can be safely ignored:
icpx: warning: CUDA version is newer than the latest supported version 12.1 [-Wunknown-cuda-version]
Run the application with:
ONEAPI_DEVICE_SELECTOR="cuda:*" SYCL_PI_TRACE=1 ./simple-sycl-app
You should see output like:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 14.37.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 14.37.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100-PCIE-40GB
The results are correct!
If so, you have successfully set up and verified your oneAPI for NVIDIA GPUs development environment, and you can begin developing oneAPI applications.
The rest of this document provides general information on compiling and running oneAPI applications on NVIDIA GPUs.
To compile a SYCL application for NVIDIA GPUs, use the icpx compiler provided with DPC++. For example:
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda sycl-app.cpp -o sycl-app
The following flags are required:
-fsycl: Instructs the compiler to build the C++ source file in SYCL mode. This flag will also implicitly enable C++17 and automatically link against the SYCL runtime library.

-fsycl-targets=nvptx64-nvidia-cuda: Instructs the compiler to build SYCL kernels for the NVIDIA GPU target.
It is also possible to build the SYCL kernels for a specific NVIDIA architecture using the following flags:
-Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80
Note that kernels are built for sm_50 by default, allowing them to work on a wider range of architectures, but limiting the use of more recent CUDA features.
For more information on available SYCL compilation flags, see the DPC++ Compiler User’s Manual, or for information on all DPC++ compiler options see the Compiler Options section of the Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference.
The icpx compiler is by default a lot more aggressive with optimizations than the regular clang++ driver, as it uses both -O2 and -ffast-math. In many cases this can lead to better performance, but it can also lead to issues for certain applications. In such cases it is possible to disable -ffast-math by using -fno-fast-math, and to change the optimization level by passing a different -O flag. It is also possible to directly use the clang++ driver, which can be found in $releasedir/compiler/latest/linux/bin-llvm/clang++, to get regular clang++ behavior.
In addition to targeting NVIDIA GPUs, you can build SYCL applications that can be compiled once and then run on a range of hardware. The following example shows how to output a single binary including device code that can run on NVIDIA GPUs, AMD GPUs, or any device that supports SPIR, e.g. Intel GPUs.
icpx -fsycl -fsycl-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,spir64 \
  -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
  -Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
  -o sycl-app sycl-app.cpp
After compiling your SYCL application for an NVIDIA target, you should also ensure that the correct SYCL device representing the NVIDIA GPU is selected at runtime.
In general, simply using the default device selector should select one of the available NVIDIA GPUs. However, in some scenarios users may want to change their SYCL application to use a more precise SYCL device selector, such as the GPU selector, or even a custom selector.
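As an illustration of a custom selector, SYCL 2020 lets you pass any callable that scores devices to the queue constructor. The sketch below prefers devices exposed by the CUDA backend and rejects everything else; the backend enum value ext_oneapi_cuda is a DPC++ extension, and the scoring policy here is just one possible choice:

```cpp
// select-cuda.cpp - construct a queue on a CUDA-backend device
// using a callable device selector (SYCL 2020 style).
// Compile with: icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda select-cuda.cpp
#include <sycl/sycl.hpp>

#include <iostream>

int main() {
  // A device selector is any callable taking a device and returning a score.
  // Higher scores are preferred; a negative score rejects the device.
  auto CudaSelector = [](const sycl::device &Dev) {
    return Dev.get_backend() == sycl::backend::ext_oneapi_cuda ? 1 : -1;
  };

  // Queue construction throws if no device matches the selector.
  sycl::queue Queue{CudaSelector};
  std::cout << "Running on: "
            << Queue.get_device().get_info<sycl::info::device::name>()
            << std::endl;
  return 0;
}
```

Because every non-CUDA device is given a negative score, queue construction fails with an exception rather than silently falling back to another backend; returning 0 instead of -1 would allow such a fallback.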
The environment variable ONEAPI_DEVICE_SELECTOR may be used to help the SYCL device selectors by restricting the set of devices that can be used. For example, to only allow devices exposed by the DPC++ CUDA plugin:
export ONEAPI_DEVICE_SELECTOR="cuda:*"
For more details on this environment variable, see the Environment Variables section of the oneAPI DPC++ Compiler documentation. Note: this environment variable will be deprecated in a subsequent release.