
CUDA Study Notes

November 25, 2013

1. About page-locked host memory / pinned memory:

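(1) Restrict its use to memory that will serve as the source or destination in calls to cudaMemcpy(), and free it
as soon as it is no longer needed. Page-locked memory cannot be swapped out, so over-allocating it leaves less
physical memory for the rest of the system.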

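(2) cudaMemcpyAsync() needs page-locked host memory. With a pageable host buffer the call still succeeds, but the
copy may not be truly asynchronous with respect to the host.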
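
The two points above can be sketched as follows. This is a minimal illustration, not code from the original notes: the helper name pinned_round_trip and the buffer names are hypothetical. It allocates pinned buffers with cudaMallocHost(), uses them only as the source and destination of cudaMemcpyAsync(), and frees them as soon as the copies are done.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the original notes): sends n floats
// host -> device -> host through pinned buffers on one stream.
// Returns true if every CUDA call succeeded and the data came back intact.
bool pinned_round_trip(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *h_in = nullptr;    // page-locked source buffer
    float *h_out = nullptr;   // page-locked destination buffer
    float *d_buf = nullptr;   // device buffer
    cudaStream_t stream = nullptr;
    bool ok = true;

    ok = ok && cudaMallocHost((void **)&h_in, bytes) == cudaSuccess;
    ok = ok && cudaMallocHost((void **)&h_out, bytes) == cudaSuccess;
    ok = ok && cudaMalloc((void **)&d_buf, bytes) == cudaSuccess;
    ok = ok && cudaStreamCreate(&stream) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) {
            h_in[i] = static_cast<float>(i);
            h_out[i] = -1.0f;
        }
        // Both copies are queued on the same stream, so they run in order.
        ok = ok && cudaMemcpyAsync(d_buf, h_in, bytes,
                                   cudaMemcpyHostToDevice, stream) == cudaSuccess;
        ok = ok && cudaMemcpyAsync(h_out, d_buf, bytes,
                                   cudaMemcpyDeviceToHost, stream) == cudaSuccess;
        // The host must wait before reading h_out.
        ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) {
            ok = (h_out[i] == h_in[i]);
        }
    }

    // Release pinned memory as soon as it is no longer needed.
    if (stream) cudaStreamDestroy(stream);
    if (d_buf) cudaFree(d_buf);
    if (h_out) cudaFreeHost(h_out);
    if (h_in) cudaFreeHost(h_in);
    return ok;
}
```

Calling pinned_round_trip(1 << 20) from main() should return true on a machine with a working CUDA device.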

2. About streams:

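(1) NVIDIA GPUs have separate engines for memory copies and kernel execution: the copy engine and the kernel
engine. Operations queued on different streams can overlap only when they are served by different engines.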
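
Whether a given device can overlap copies with kernels can be checked at run time. The sketch below reads the asyncEngineCount field of cudaDeviceProp; the helper name async_engine_count is hypothetical and only wraps the query.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the original notes): returns the number of
// asynchronous copy engines reported for the given device, or -1 if the
// query fails. A value of 0 means copies cannot overlap kernel execution.
int async_engine_count(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.asyncEngineCount;
}
```

A result of 1 indicates that a copy in one direction can overlap a running kernel; 2 or more indicates that host-to-device and device-to-host copies can also overlap each other.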

     

Figure 1: not efficient

Figure 2: efficient


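Trick: queue operations across all streams in breadth-first order rather than depth-first order. That is, issue
the same stage (copy in, kernel, copy out) for every stream before moving on to the next stage, so that one
stream's copy does not sit in the copy engine's queue behind another stream's unfinished work.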
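
A minimal sketch of breadth-first queueing with two streams is given below. The kernel scale, the helper run_breadth_first, and the factor 2.0f are illustrative assumptions, not taken from the original notes. How much the issue order matters depends on the GPU generation, so treat this as a pattern to measure rather than a guaranteed speedup.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: multiplies every element by a constant factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: processes n floats (n > 0) on each of two streams,
// issuing every stage for all streams before moving to the next stage.
// Returns true if all calls succeeded and every element was doubled.
bool run_breadth_first(int n) {
    const int kStreams = 2;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaStream_t stream[kStreams] = {};
    float *h[kStreams] = {};   // pinned host buffers
    float *d[kStreams] = {};   // device buffers
    bool ok = true;

    for (int s = 0; s < kStreams; ++s) {
        ok = ok && cudaStreamCreate(&stream[s]) == cudaSuccess;
        ok = ok && cudaMallocHost((void **)&h[s], bytes) == cudaSuccess;
        ok = ok && cudaMalloc((void **)&d[s], bytes) == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) h[s][i] = static_cast<float>(i);
    }

    // Stage 1: host-to-device copies for every stream.
    for (int s = 0; ok && s < kStreams; ++s)
        ok = cudaMemcpyAsync(d[s], h[s], bytes,
                             cudaMemcpyHostToDevice, stream[s]) == cudaSuccess;
    // Stage 2: kernel launches for every stream.
    for (int s = 0; ok && s < kStreams; ++s) {
        scale<<<blocks, threads, 0, stream[s]>>>(d[s], n, 2.0f);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // Stage 3: device-to-host copies for every stream.
    for (int s = 0; ok && s < kStreams; ++s)
        ok = cudaMemcpyAsync(h[s], d[s], bytes,
                             cudaMemcpyDeviceToHost, stream[s]) == cudaSuccess;
    // Wait for both streams, then check the results on the host.
    for (int s = 0; ok && s < kStreams; ++s)
        ok = cudaStreamSynchronize(stream[s]) == cudaSuccess;
    for (int s = 0; ok && s < kStreams; ++s)
        for (int i = 0; ok && i < n; ++i)
            ok = (h[s][i] == 2.0f * static_cast<float>(i));

    for (int s = 0; s < kStreams; ++s) {
        if (d[s]) cudaFree(d[s]);
        if (h[s]) cudaFreeHost(h[s]);
        if (stream[s]) cudaStreamDestroy(stream[s]);
    }
    return ok;
}
```

The depth-first alternative would put the copy, the kernel launch, and the copy back for stream 0 inside one loop iteration before touching stream 1. The work is the same; only the order in which it is queued changes.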

To be continued...
