NVIDIA would be announcing the latest versions of their development CUDA toolkits. Parallel Nsight, NVIDIA’s Visual Studio development toolkit, has just had its second release candidate for version 2.1 released. CUDA 4.1 is also being released as a release candidate.
Till date the CUDA compiler toolchain was developed entirely within NVIDIA as a proprietary product; developers could write tools that could generate PTX code (NVIDIA’s intermediate virtual ISA). The compiling of PTX to binary code was handled by NVIDIA’s tools. Next week at GTC Asia, CUDA 4.1 brings a different way: the CUDA compiler is now being built against LLVM, the modular compiler.
LLVM ain’t a true compiler i.e. it doesn’t generate binary code on its own, but as a modular compiler it’s earned quite a reputation for generating efficient intermediate code and for being easy to add new support for new languages and architectures to. If you can generate code that goes into LLVM, then you can get out code for any architecture LLVM supports and it will probably be pretty efficient too. LLVM has been around for quite some time – and is most famously used as the compiler for Mac OS X and iOS starting with Mac OS X 10.6 – but this is the first time it’s been used for a GPU in this fashion.
Benefits of LLVM for CUDA
- Immediate benefits include shorter compile times (upto 50% times faster) and slightly faster performing code for CUDA developers. Meanwhile the nature of GPU computing means that application/kernel performance won’t improve nearly as much – LLVM can’t parallelize your code for you – but it should be able to generate slightly smarter code, particularly code from non-NVIDIA languages where developers haven’t been able to invest as much time in optimizing their PTX code generation.
- Moving to LLVM is not completely about immediate performance benefits, it also marks the start of a longer transition by NVIDIA.
- In NVIDIA’s case moving to LLVM not only allows them to open up GPU computing to additional developers by making it possible to support more languages, but it allows CUDA developers to build CUDA applications for more architectures than just NVIDIA’s GPUs. Currently it’s possible to compile CUDA down to x86 through The Portland Group’s proprietary x86 CUDA compiler, and the move to LLVM would allow NVIDIA to target not just x86, but ARM too. ARM in fact is more than likely the key to all of this – just as how developers want to be able to use CUDA on their x86 + NVGPU clusters, they will want to be able to use CUDA on their Denver (ARM) + NVGPU clusters.
Accoriding to the source, NVIDIA will not be releasing CUDA LLVM in a truly open source manner, but they will be releasing the source in a manner akin to Microsoft’s “shared source” initiative – eligible researchers and developers will be able to apply to NVIDIA for access to the source code. This allows NVIDIA to share CUDA LLVM with the necessary parties to expand its functionality without sharing it with everyone and having the inner workings of the Fermi code generator exposed.