CUDA Study Notes


November 25, 2013

1. About page-locked host memory / pinned memory:

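(1) Restrict pinned allocations to buffers that are used as the source or destination of cudaMemcpy() calls, and free them as soon as they are no longer needed. Pinned pages cannot be swapped out, so over-allocating them reduces the physical memory left for the rest of the system.

(2) When we use cudaMemcpyAsync(), the host buffer must be page-locked (allocated with cudaHostAlloc() or cudaMallocHost()). With ordinary pageable memory the copy is not truly asynchronous with respect to the host.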
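The two points above can be sketched as follows. This is a minimal illustration, not code from the original notes: the CHECK macro and the pinned_roundtrip() function are hypothetical names introduced here, and the buffer size is arbitrary. It allocates pinned host buffers, uses them only as the source and destination of cudaMemcpyAsync(), and frees them right after use.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical example function: copies n floats host -> device -> host
// through pinned buffers and returns true if the data is unchanged.
bool pinned_roundtrip(int n) {
    const size_t bytes = (size_t)n * sizeof(float);
    float *h_in = nullptr, *h_out = nullptr, *d_buf = nullptr;
    cudaStream_t stream;

    // Page-locked host memory, needed for truly asynchronous copies.
    CHECK(cudaHostAlloc((void**)&h_in, bytes, cudaHostAllocDefault));
    CHECK(cudaHostAlloc((void**)&h_out, bytes, cudaHostAllocDefault));
    CHECK(cudaMalloc((void**)&d_buf, bytes));
    CHECK(cudaStreamCreate(&stream));

    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    // Both calls return immediately; the stream keeps them in order.
    CHECK(cudaMemcpyAsync(d_buf, h_in, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(h_out, d_buf, bytes, cudaMemcpyDeviceToHost, stream));

    // Wait for the stream before touching h_out on the host.
    CHECK(cudaStreamSynchronize(stream));

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i]) ok = false;
    }

    // Release pinned memory as soon as it is no longer needed.
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return ok;
}

int main() {
    std::printf("round trip %s\n", pinned_roundtrip(1 << 20) ? "ok" : "FAILED");
    return 0;
}
```

Note that pinned memory is released with cudaFreeHost(), not free() or cudaFree().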

2. About streams:

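(1) NVIDIA GPUs have separate hardware engines for memory copies and for kernel execution: the copy engine and the kernel engine. Because the engines are independent, a copy in one stream can overlap with a kernel in another stream.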
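Whether a given device can overlap copies with kernels can be checked at run time. The sketch below is an assumption about how one might do that, not part of the original notes: async_engine_count() is a hypothetical wrapper around cudaDeviceGetAttribute() with the cudaDevAttrAsyncEngineCount attribute, and device 0 is assumed to be the GPU of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: returns the number of asynchronous copy engines
// reported for `device`, or -1 if the query fails.
int async_engine_count(int device) {
    int count = 0;
    cudaError_t err =
        cudaDeviceGetAttribute(&count, cudaDevAttrAsyncEngineCount, device);
    return (err == cudaSuccess) ? count : -1;
}

int main() {
    int n = async_engine_count(0);  // device 0 is an assumption
    if (n < 0) {
        std::printf("query failed\n");
    } else if (n == 0) {
        std::printf("device cannot overlap copies with kernels\n");
    } else {
        std::printf("device reports %d async copy engine(s)\n", n);
    }
    return 0;
}
```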

Figure 1: not efficient

Figure 2: efficient

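Trick: enqueue operations across all streams in breadth-first order instead of depth-first order. That is, issue the same kind of operation (host-to-device copy, then kernel, then device-to-host copy) to every stream before moving on to the next kind, rather than issuing all of one stream's operations before starting the next stream.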
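A minimal sketch of breadth-first queueing with two streams is shown below. The kernel add_one, the function run_breadth_first(), the CHECK macro, and the sizes are all hypothetical choices made for illustration. How much this ordering helps depends on the GPU, so it is worth timing both orders on the target hardware.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical kernel: out[i] = in[i] + 1.
__global__ void add_one(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Processes one chunk per stream and returns true if all results match.
bool run_breadth_first(int chunk) {
    const int kStreams = 2;
    const size_t bytes = (size_t)chunk * sizeof(float);
    const int block = 256;
    const int grid = (chunk + block - 1) / block;
    cudaStream_t s[kStreams];
    float *h_in[kStreams], *h_out[kStreams], *d_in[kStreams], *d_out[kStreams];

    for (int k = 0; k < kStreams; ++k) {
        CHECK(cudaStreamCreate(&s[k]));
        CHECK(cudaHostAlloc((void**)&h_in[k], bytes, cudaHostAllocDefault));
        CHECK(cudaHostAlloc((void**)&h_out[k], bytes, cudaHostAllocDefault));
        CHECK(cudaMalloc((void**)&d_in[k], bytes));
        CHECK(cudaMalloc((void**)&d_out[k], bytes));
        for (int i = 0; i < chunk; ++i) h_in[k][i] = (float)(i + k);
    }

    // Breadth-first: each loop issues ONE kind of operation to EVERY stream.
    for (int k = 0; k < kStreams; ++k)
        CHECK(cudaMemcpyAsync(d_in[k], h_in[k], bytes,
                              cudaMemcpyHostToDevice, s[k]));
    for (int k = 0; k < kStreams; ++k) {
        add_one<<<grid, block, 0, s[k]>>>(d_in[k], d_out[k], chunk);
        CHECK(cudaGetLastError());
    }
    for (int k = 0; k < kStreams; ++k)
        CHECK(cudaMemcpyAsync(h_out[k], d_out[k], bytes,
                              cudaMemcpyDeviceToHost, s[k]));
    for (int k = 0; k < kStreams; ++k) CHECK(cudaStreamSynchronize(s[k]));

    bool ok = true;
    for (int k = 0; k < kStreams; ++k) {
        for (int i = 0; i < chunk; ++i)
            if (h_out[k][i] != h_in[k][i] + 1.0f) ok = false;
        CHECK(cudaFreeHost(h_in[k]));
        CHECK(cudaFreeHost(h_out[k]));
        CHECK(cudaFree(d_in[k]));
        CHECK(cudaFree(d_out[k]));
        CHECK(cudaStreamDestroy(s[k]));
    }
    return ok;
}

int main() {
    std::printf("breadth-first %s\n", run_breadth_first(1 << 20) ? "ok" : "FAILED");
    return 0;
}
```

A depth-first version would instead put the copy, kernel launch, and copy-back for stream 0 inside one loop iteration before issuing anything to stream 1.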

To be continued...
