Understanding the CUDA memory model and utilizing it effectively is often key in achieving high performance from your NVIDIA GPU. The shuffle instruction, available on Kepler devices (compute 3.0 and newer), is a new tool that programmers can add to their bags
of tricks to further optimize memory performance.
Figure 1 – CUDA Memory Model
The CUDA memory model, illustrated in Figure 1, consists of several different memory regions that are on and off chip, with varying scopes, latencie......
阅读全文