With the introduction of the Fermi architecture, the new Quadro® solutions feature NVIDIA Dual Copy Engines, which enable asynchronous data transfers with concurrent 3-way overlap: the current set of data can be processed while the previous set is read back from the GPU and the next set is uploaded. In the past, data transfers would stall because of architectural limitations in synchronizing the data with GPU processing. For example, during texture uploads or frame buffer readbacks, the GPU was blocked from processing and incurred a heavy context switch. This synchronization requirement of traditional GPUs limits overall processing throughput and creates bottlenecks in high-performance applications.
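
To make the 3-way overlap concrete, the following is a toy scheduling model, not GPU code. It assumes, purely for illustration, that upload, processing, and readback each take one equal time step; the function names `overlap_schedule` and `serialized_steps` are hypothetical and do not come from any NVIDIA API. It lists which data set each stage handles per step when all three stages run concurrently, and compares the step count with a fully serialized pipeline.

```python
def overlap_schedule(num_sets):
    """Return one (upload, process, readback) tuple per time step.

    Each entry is the index of the data set handled by that stage,
    or None when the stage is idle. Assumes every stage takes exactly
    one time step (an illustrative simplification).
    """
    steps = []
    # Set i is uploaded at step i, processed at step i + 1,
    # and read back at step i + 2.
    for t in range(num_sets + 2):
        upload = t if t < num_sets else None
        process = t - 1 if 0 <= t - 1 < num_sets else None
        readback = t - 2 if 0 <= t - 2 < num_sets else None
        steps.append((upload, process, readback))
    return steps


def serialized_steps(num_sets):
    """Steps needed when upload, process, and readback never overlap."""
    return 3 * num_sets


if __name__ == "__main__":
    n = 4
    for t, (up, proc, rb) in enumerate(overlap_schedule(n)):
        print(f"step {t}: upload={up} process={proc} readback={rb}")
    print("overlapped steps:", len(overlap_schedule(n)))
    print("serialized steps:", serialized_steps(n))
```

Under these assumptions, once the pipeline is full, every step uploads one set, processes the one before it, and reads back the one before that, so N sets finish in N + 2 steps instead of 3N. Real transfer and processing times are rarely equal, so the actual gain depends on which stage dominates.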