Making the CUDA driver API as simple to use as the runtime API. Almost.
NVIDIA provides two ways of controlling its GPUs: the runtime API and the driver API. The runtime API is easier to use, but the driver API gives you more control over low-level details of the device. The biggest drawback of the driver API is that you need to know more about how to create the files that the device needs to execute the code you write, such as PTX or cubin modules.
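As one concrete example of the low-level detail the driver API exposes: its launch call, cuLaunchKernel, receives the kernel arguments as an array of void* pointers, one per argument, a step that nvcc normally handles for you. The following is a minimal sketch of a helper that builds that array. PackKernelArgs is a hypothetical name used only for illustration; it is not part of CUDA and is not necessarily how the wrappers described here work. The sketch deliberately avoids including cuda.h so that it compiles without the CUDA toolkit.

```cpp
#include <array>

// Hypothetical helper (not part of CUDA): builds the array of per-argument
// pointers that cuLaunchKernel expects as its kernelParams parameter.
// Each entry points at the caller's variable, so the arguments must stay
// alive until the launch call has returned.
template <typename... Args>
std::array<void*, sizeof...(Args)> PackKernelArgs(Args&... args) {
    return {{ static_cast<void*>(&args)... }};
}
```

With the toolkit installed, one would write something like auto params = PackKernelArgs(d_A, d_B, d_C, N); and pass params.data() as the kernelParams argument of cuLaunchKernel.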
The syntactic sugar that nvcc provides for the runtime API can be replaced with simple C++ wrappers around the driver API. Instead of
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
all you need to write is
CUfunction VecAdd = mod.GetFunction("VecAdd");