NVidia GK110 to target HPC applications
NVidia last week published details of the ultra-high-end Kepler GK110 GPU and its applications. The full GK110 core will have 15 SMX units, each with 192 CUDA cores, for a maximum of 2880 CUDA cores. A 384-bit memory interface is provided by six 64-bit memory controllers. NVidia say there will be different configurations for different cards, with some using only 13 or 14 of the 15 SMX units, a move designed to maximize the number of usable dies.
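The core-count and bus-width figures above are simple products, and the short CUDA sketch below works them out for the 13, 14 and 15 SMX configurations. The helper names (`total_cuda_cores`, `memory_bus_bits`, `report_device`) are made up for this illustration, and the 192 cores per SMX constant applies only to Kepler parts. The runtime query uses `cudaDeviceGetAttribute`, which reports the multiprocessor count and memory bus width of whatever card is installed, not necessarily a GK110.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Figures quoted in the article for Kepler GK110 (assumed constant here).
constexpr int kCoresPerSmx = 192;          // CUDA cores in one SMX unit
constexpr int kControllerWidthBits = 64;   // width of one memory controller

// Total CUDA cores for a part with `smx_count` enabled SMX units.
constexpr int total_cuda_cores(int smx_count) {
    return smx_count * kCoresPerSmx;
}

// Memory interface width for `controllers` 64-bit memory controllers.
constexpr int memory_bus_bits(int controllers) {
    return controllers * kControllerWidthBits;
}

// Hypothetical helper: prints the article's figures, then what the CUDA
// runtime reports for `device`. Returns false if the device cannot be queried.
bool report_device(int device = 0) {
    for (int smx = 13; smx <= 15; ++smx) {
        printf("%d SMX -> %d CUDA cores\n", smx, total_cuda_cores(smx));
    }
    printf("6 controllers -> %d-bit bus\n", memory_bus_bits(6));

    int sm_count = 0, bus_bits = 0;
    if (cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount,
                               device) != cudaSuccess ||
        cudaDeviceGetAttribute(&bus_bits, cudaDevAttrGlobalMemoryBusWidth,
                               device) != cudaSuccess) {
        return false;
    }
    printf("Installed device %d: %d multiprocessors, %d-bit memory bus\n",
           device, sm_count, bus_bits);
    return true;
}
```

A card with 13 SMX units enabled would therefore expose 2496 CUDA cores, and one with 14 would expose 2688.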
Most features of GK110 are the same as those already seen in the recently released GK104, but GK110 adds several of its own. Each thread can use up to 255 registers, compared with 63 in Fermi and GK104. Other new features include Hyper-Q and Dynamic Parallelism. All this is built from 7.1 billion transistors on the 28nm process introduced with GK104.
Hyper-Q increases GPU utilization and reduces CPU idle time by allowing multiple CPU cores to launch work on the same GPU simultaneously. Where earlier GPUs could manage only a single connection, GK110 allows 32 simultaneous, hardware-managed connections. This lets a new thread begin executing immediately when the current thread has to pause while waiting for data to become available.
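From the programmer's side, independent work is usually expressed with CUDA streams, which on GK110 can be spread across the separate hardware queues. The sketch below is an illustration under that assumption, not NVidia's own example: the kernel `scale` and the helper `scale_on_streams` are hypothetical names, and for brevity a single host thread creates all the streams where a real application might use several CPU threads or processes.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Multiply every element of `data` by `factor`.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: runs one independent copy + kernel per stream and
// checks the results. Returns false on bad arguments, if no CUDA device is
// present, or if any result is wrong.
bool scale_on_streams(int num_streams, int n) {
    if (num_streams <= 0 || n <= 0) return false;
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0)
        return false;

    std::vector<cudaStream_t> streams(num_streams);
    std::vector<float*> buffers(num_streams, nullptr);
    const std::vector<float> ones(n, 1.0f);
    const size_t bytes = n * sizeof(float);

    for (int s = 0; s < num_streams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], bytes);
        // Work in one stream never depends on another stream, so the
        // hardware is free to overlap it where queues are available.
        cudaMemcpyAsync(buffers[s], ones.data(), bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], n,
                                                      float(s + 1));
    }

    // Wait for every stream to finish before reading the results back.
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    std::vector<float> result(n);
    for (int s = 0; s < num_streams; ++s) {
        cudaMemcpy(result.data(), buffers[s], bytes, cudaMemcpyDeviceToHost);
        ok = ok && result[0] == float(s + 1) && result[n - 1] == float(s + 1);
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return ok;
}
```

The same code runs on earlier GPUs, but there the streams feed a single hardware queue, so unrelated kernels can end up waiting behind one another.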
Dynamic Parallelism enables the GPU to create its own work, synchronize results between threads, and control the scheduling of work, without the CPU needing to be involved. This frees up the CPU for other tasks, as well as allowing less highly optimized code to run more efficiently using only the GPU.
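In CUDA terms this means a kernel can launch further kernels from device code. The following is a minimal sketch under stated assumptions rather than a reference implementation: the names `child_fill`, `parent` and `fill_segments` are invented for the example, it needs a GPU of compute capability 3.5 or later, and it has to be built with relocatable device code (typically `nvcc -rdc=true`, plus `-lcudadevrt` on older toolkits).

```cuda
#include <vector>
#include <cuda_runtime.h>

// Child grid: writes `value` into out[offset .. offset + count).
__global__ void child_fill(int* out, int offset, int count, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[offset + i] = value;
}

// Parent grid: each thread owns one segment and launches a child grid for
// it directly from the GPU, with no round trip to the CPU.
__global__ void parent(int* out, int segment_len) {
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    int blocks = (segment_len + 127) / 128;
    child_fill<<<blocks, 128>>>(out, seg * segment_len, segment_len, seg);
}

// Hypothetical helper: fills `num_segments` segments and verifies them.
// Returns false on bad arguments, if no device is present, or on mismatch.
bool fill_segments(int num_segments, int segment_len) {
    if (num_segments <= 0 || num_segments > 1024 || segment_len <= 0)
        return false;
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0)
        return false;

    const int total = num_segments * segment_len;
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, total * sizeof(int)) != cudaSuccess) return false;

    // One block of parent threads; each thread launches its own child grid.
    parent<<<1, num_segments>>>(d_out, segment_len);

    // The parent grid is not complete until all its children are, so one
    // host-side synchronize covers the child grids too.
    bool ok = cudaDeviceSynchronize() == cudaSuccess;

    std::vector<int> host(total);
    cudaMemcpy(host.data(), d_out, total * sizeof(int),
               cudaMemcpyDeviceToHost);
    for (int s = 0; ok && s < num_segments; ++s)
        for (int j = 0; j < segment_len; ++j)
            if (host[s * segment_len + j] != s) { ok = false; break; }

    cudaFree(d_out);
    return ok;
}
```

Without this feature, the host would have to launch each `child_fill` grid itself after reading back whatever information determines its size.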
A Grid Management Unit manages and prioritizes the execution of grids and queues on the GPU. This provides the means for suspended and queued threads to be restarted when they are ready, in much the same way that tasks are queued and prioritized in a multi-tasking operating system. This enables the previously mentioned Dynamic Parallelism to operate, as well as offering other runtime benefits.
NVidia GPUDirect, another useful feature for HPC applications supported by GK110, makes it possible for separate GPUs in a system or across a network to communicate and share data without having to use the CPU or system memory. It also allows some other hardware to communicate directly with the GPU, significantly reducing the latency involved in sending data and messages to and from GPU memory.
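For the case of two GPUs in the same system, the CUDA runtime exposes this as peer-to-peer access. The sketch below covers only that case, since the network path depends on third-party drivers not shown here. The helper name `copy_between_gpus` is hypothetical. Note that `cudaMemcpyPeer` still works when direct peer access is unavailable, but the data is then staged through system memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies `bytes` from a buffer on GPU `src_dev` to a
// buffer on GPU `dst_dev` and checks the first byte arrived. Returns false
// if either device index is invalid or any step fails.
bool copy_between_gpus(int src_dev, int dst_dev, size_t bytes) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return false;
    if (src_dev < 0 || dst_dev < 0 || src_dev >= count || dst_dev >= count ||
        src_dev == dst_dev || bytes == 0)
        return false;

    // Ask whether dst_dev can address src_dev's memory directly.
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev);

    unsigned char *src = nullptr, *dst = nullptr;
    cudaSetDevice(src_dev);
    if (cudaMalloc(&src, bytes) != cudaSuccess) return false;
    cudaMemset(src, 0x5A, bytes);  // recognizable test pattern

    cudaSetDevice(dst_dev);
    if (cudaMalloc(&dst, bytes) != cudaSuccess) {
        cudaSetDevice(src_dev);
        cudaFree(src);
        return false;
    }
    // Enable the direct path when the hardware supports it; dst_dev is the
    // current device, so this lets it reach src_dev's memory.
    if (can_access) cudaDeviceEnablePeerAccess(src_dev, 0);

    bool ok = cudaMemcpyPeer(dst, dst_dev, src, src_dev, bytes) == cudaSuccess;

    unsigned char first = 0;
    ok = ok && cudaMemcpy(&first, dst, 1, cudaMemcpyDeviceToHost) == cudaSuccess
            && first == 0x5A;

    cudaFree(dst);
    cudaSetDevice(src_dev);
    cudaFree(src);
    return ok;
}
```

On a machine with two suitable cards, `copy_between_gpus(0, 1, 1 << 20)` would move one megabyte from the first GPU to the second.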
The Kepler GK110 core will be used in the Tesla K20 card. A Tesla K10 card using two GK104 cores will also be produced. Clock speeds and other technical data are not yet known for either card, although NVidia say the K10 should offer a theoretical maximum of 2.29 TFLOPS of single-precision compute power, but only 95 GFLOPS double-precision.
Tesla cards built around the GK110 core are expected in October or November 2012.
NVIDIA GK110 architecture whitepaper (PDF file)