On February 28, 2011, NVIDIA® announced CUDA® 4.0, a major toolkit release for developing parallel applications on NVIDIA GPUs.
The CUDA Toolkit 4.0 is designed to make parallel programming easier and to enable more developers to port their applications to GPUs.
CUDA 4.0 introduces three distinctive features:
- NVIDIA GPUDirect™ 2.0 Technology: Offers support for peer-to-peer communication among GPUs within a single server or workstation. This makes multi-GPU programming easier and multi-GPU applications faster.
- Unified Virtual Addressing (UVA): Provides a single merged memory address space for the main system memory and the GPU memories, enabling quicker and easier parallel programming.
- Thrust C++ Template Performance Primitives Libraries: Provide a collection of powerful open source C++ parallel algorithms and data structures that ease programming for C++ developers. With Thrust, routines such as parallel sorting are 5X to 100X faster than with the Standard Template Library (STL) and Threading Building Blocks (TBB).
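As an illustration of the Thrust item above, here is a minimal sketch of a GPU sort. It assumes a CUDA-capable GPU and a toolkit that ships Thrust; the helper name `sort_on_device` is hypothetical and not part of Thrust. It does not attempt to reproduce the speedup figures quoted above.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical helper (not a Thrust API): sort integers on the GPU
// and return the sorted values to the host.
std::vector<int> sort_on_device(const std::vector<int>& input) {
    // Constructing a device_vector from host iterators copies the
    // data into GPU memory.
    thrust::device_vector<int> d(input.begin(), input.end());

    // Parallel sort, executed on the device.
    thrust::sort(d.begin(), d.end());

    // Copy the sorted data back into a host-side std::vector.
    std::vector<int> out(d.size());
    thrust::copy(d.begin(), d.end(), out.begin());
    return out;
}

int main() {
    std::vector<int> values = {5, 2, 9, 1, 7};
    std::vector<int> sorted = sort_on_device(values);
    for (int v : sorted) std::printf("%d ", v);  // ascending order
    std::printf("\n");
    return 0;
}
```

The interface deliberately mirrors the STL (containers plus iterator-based algorithms), which is what lets C++ developers move a `std::sort` call to the GPU with few changes.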
The CUDA 4.0 release also includes a number of other key features and capabilities:
- MPI Integration with CUDA Applications
Modified MPI implementations automatically move data from and to the GPU memory over InfiniBand when an application does an MPI send or receive.
- Multi-thread Sharing of GPUs
Multiple CPU host threads can share contexts on a single GPU, making it easier for multi-threaded applications to share a single GPU.
- Multi-GPU Sharing by Single CPU Thread
A single CPU host thread can access all GPUs in a system. Developers can easily coordinate work across multiple GPUs for tasks such as “halo” exchange in applications.
- New NPP Image and Computer Vision Library
A rich set of image transformation operations that enable rapid development of imaging and computer vision applications.
- New and Improved Capabilities
  - Auto performance analysis in the Visual Profiler
  - New features in cuda-gdb and added support for MacOS
  - Added support for C++ features like new/delete and virtual functions
  - New GPU binary disassembler
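The "Multi-GPU Sharing by Single CPU Thread" item above, together with peer-to-peer copies and UVA, can be sketched roughly as follows. This is an illustration under stated assumptions, not NVIDIA's reference code: the kernel `fill`, the helper `run_on_all_gpus`, and the buffer and halo sizes are all hypothetical. On a machine with one GPU the exchange loop simply does nothing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "CUDA error: %s\n",                 \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Hypothetical kernel: fill a buffer with one value per device.
__global__ void fill(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: returns the number of GPUs used,
// or -1 if an exchanged halo value is wrong.
int run_on_all_gpus() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) return 0;

    const int n = 1 << 20;   // elements per GPU (assumed size)
    const int halo = 256;    // elements exchanged with the neighbor
    std::vector<int*> buf(count, nullptr);

    // One host thread selects each GPU in turn and queues work on it.
    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaMalloc(&buf[d], n * sizeof(int)));
        fill<<<(n + 255) / 256, 256>>>(buf[d], n, d);
    }
    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaDeviceSynchronize());
    }

    // Halo-style exchange: the tail of GPU d goes to the head of GPU d+1.
    for (int d = 0; d + 1 < count; ++d) {
        int can_access = 0;
        CHECK(cudaDeviceCanAccessPeer(&can_access, d + 1, d));
        if (can_access) {
            // Enable the direct GPU-to-GPU path where hardware allows it.
            CHECK(cudaSetDevice(d + 1));
            CHECK(cudaDeviceEnablePeerAccess(d, 0));
        }
        // cudaMemcpyPeer also works without peer access; the runtime
        // then stages the copy through host memory.
        CHECK(cudaMemcpyPeer(buf[d + 1], d + 1, buf[d] + n - halo, d,
                             halo * sizeof(int)));
    }

    // Verify: with UVA, cudaMemcpyDefault infers the copy direction.
    int result = count;
    for (int d = 1; d < count; ++d) {
        int got[2] = {-1, -1};
        CHECK(cudaSetDevice(d));
        CHECK(cudaMemcpy(&got[0], buf[d], sizeof(int), cudaMemcpyDefault));
        CHECK(cudaMemcpy(&got[1], buf[d] + halo, sizeof(int),
                         cudaMemcpyDefault));
        if (got[0] != d - 1 || got[1] != d) result = -1;
    }

    for (int d = 0; d < count; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaFree(buf[d]));
    }
    return result;
}

int main() {
    std::printf("GPUs used: %d\n", run_on_all_gpus());
    return 0;
}
```

Before CUDA 4.0, driving several GPUs typically meant one host thread per device; here a single thread switches devices with `cudaSetDevice`, which is the pattern the feature list describes.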
A release candidate of CUDA Toolkit 4.0 will be available free of charge beginning March 4, 2011, by enrolling in the CUDA Registered Developer Program at: https://nvdeveloper.nvidia.com.