CUDA Keywords

September 9, 2013 ⁄ General

This article is just a memo of the keywords I noted while getting a general idea of CUDA.
(Reference: http://www.pcinlife.com/article/graphics/2008-06-04/1212575164d532.html)

CUDA is NVIDIA's GPGPU programming model.
Shader unit:
multiprocessor -- 8 stream processors -- 8192 registers, 16 KB shared memory, texture cache, constant cache (all per multiprocessor)
Device info: cudaGetDeviceProperties() (Runtime API), cuDeviceGetProperties() (Driver API)
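
The per-multiprocessor figures above are from the 2008 reference, so it may be worth checking what the installed GPU actually reports. A minimal sketch using the Runtime API follows; the helper name printDeviceInfo is made up for this memo, and the fields printed are only a small subset of cudaDeviceProp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a few properties of every CUDA device.
// Returns the number of devices found, or 0 if the runtime reports an error.
int printDeviceInfo() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) {
            continue;  // skip devices we cannot query
        }
        printf("Device %d: %s\n", i, prop.name);
        printf("  multiprocessors:         %d\n", prop.multiProcessorCount);
        printf("  registers per block:     %d\n", prop.regsPerBlock);
        printf("  shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  warp size:               %d\n", prop.warpSize);
    }
    return count;
}
```

Calling printDeviceInfo() from main() and compiling the file with nvcc should be enough to see the values for the local device.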

Each stream processor: one FMA (fused multiply-add) unit, which can execute an add, a multiply, or a combined multiply-add

Warp (32 threads) = 2 * half-warp (16 threads)

Pros: higher memory bandwidth, more execution units, cheaper...
Cons: no benefit for tasks that can only be executed sequentially, only 32-bit floats were supported when the reference was written (2008), poor performance on branch-heavy programs, no common standard between NVIDIA and AMD/ATI

CPU:Host, GPU:Device

grid -- block -- thread
grid:   (per grid)         global memory, constant memory, texture memory
thread: (per thread)       registers, local memory
        (within its block) shared memory
        (across blocks)    global memory, constant memory, texture memory
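
To make the grid -- block -- thread hierarchy concrete, here is a small sketch in which every thread computes its own global index from blockIdx, blockDim and threadIdx and writes it to global memory. The kernel and helper names (fillGlobalIndex, runFillGlobalIndex) are hypothetical, and the one-dimensional layout is an assumption chosen for brevity.

```cuda
#include <cuda_runtime.h>

// Each thread keeps its index `i` in a register and writes it to `out`,
// which lives in global memory and is visible to all blocks of the grid.
__global__ void fillGlobalIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Hypothetical host helper: launches enough blocks to cover n elements and
// copies the result back. Returns false if any CUDA call fails.
bool runFillGlobalIndex(int *hostOut, int n, int threadsPerBlock) {
    int *devOut = NULL;
    size_t bytes = (size_t)n * sizeof(int);
    if (cudaMalloc((void **)&devOut, bytes) != cudaSuccess) {
        return false;
    }
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    fillGlobalIndex<<<blocks, threadsPerBlock>>>(devOut, n);
    bool ok = cudaGetLastError() == cudaSuccess;
    ok = ok && cudaMemcpy(hostOut, devOut, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(devOut);
    return ok;
}
```

With n = 100 and threadsPerBlock = 32 the launch uses 4 blocks, and the bounds check keeps the 28 surplus threads of the last block from writing past the array.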

Shared memory (16 KB in each multiprocessor): divided into 16 banks, each bank is 4 bytes wide
Global memory: accesses should be coalesced
Texture memory: supports texture filtering
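
The bank layout above can be sketched as simple arithmetic: assuming 16 banks of 4-byte words, consecutive words fall into consecutive banks, so the bank of word w is w mod 16. The host-only helpers below (bankOf and conflictDegree are names invented for this memo) count how many threads of a half-warp hit the same bank for a given stride. This is a simplified model that ignores the broadcast case where all threads read the same word.

```cuda
// Assumed values, matching the memo: 16 banks, 16 threads per half-warp.
const int kNumBanks = 16;
const int kHalfWarp = 16;

// Bank that holds the given 4-byte word of shared memory.
int bankOf(int wordIndex) {
    return wordIndex % kNumBanks;
}

// Worst-case number of half-warp threads hitting one bank when thread t
// accesses word t * strideInWords (strideInWords must be positive).
// 1 means conflict-free; k means a k-way bank conflict.
int conflictDegree(int strideInWords) {
    int hits[kNumBanks] = {0};
    int worst = 0;
    for (int t = 0; t < kHalfWarp; ++t) {
        int b = bankOf(t * strideInWords);
        hits[b] += 1;
        if (hits[b] > worst) {
            worst = hits[b];
        }
    }
    return worst;
}
```

Under this model a stride of 1 is conflict-free, a stride of 2 gives a 2-way conflict, a stride of 16 puts all 16 threads on bank 0, and an odd stride such as 17 is conflict-free again, which is why padding an array row by one word is a common trick.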

Differences from the CPU: memory latency, handling of branching code

CUDA Toolkit: http://www.nvidia.com/object/cuda_get.html

Compile: nvcc

Two different APIs: Runtime API (easier), Driver API

cudaMalloc, cudaMemcpy -- device-side counterparts of malloc, memcpy
cudaMemcpyHostToDevice
cudaMemcpyDeviceToHost
__global__ , __shared__ , __syncthreads()
bank conflict
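
The keywords above fit together in a block-wise sum, sketched here under a few assumptions: a fixed block size of 256 threads (THREADS), a hypothetical kernel blockSum and a hypothetical host helper sumOnDevice. Each block loads its slice into __shared__ memory, reduces it with __syncthreads() between steps, and the host adds up the per-block results.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define THREADS 256  // assumed block size; must be a power of two

// Each block sums THREADS consecutive inputs into out[blockIdx.x].
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float cache[THREADS];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all loads must finish before anyone reads cache
    // Thread tid adds element tid + s, so active threads touch
    // consecutive words and avoid bank conflicts.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            cache[tid] += cache[tid + s];
        }
        __syncthreads();
    }
    if (tid == 0) {
        out[blockIdx.x] = cache[0];
    }
}

// Hypothetical host helper (n must be positive). Returns false on CUDA errors.
bool sumOnDevice(const float *hostIn, int n, float *result) {
    int blocks = (n + THREADS - 1) / THREADS;
    float *devIn = NULL;
    float *devOut = NULL;
    if (cudaMalloc((void **)&devIn, n * sizeof(float)) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc((void **)&devOut, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }
    std::vector<float> partial(blocks);
    bool ok = cudaMemcpy(devIn, hostIn, n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSum<<<blocks, THREADS>>>(devIn, devOut, n);
        ok = cudaMemcpy(partial.data(), devOut, blocks * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devIn);
    cudaFree(devOut);
    if (!ok) {
        return false;
    }
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) {
        total += partial[b];  // final pass on the host
    }
    *result = total;
    return true;
}
```

Finishing the sum on the host keeps the sketch short; a second kernel launch over the partial sums would be the usual alternative for large inputs.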

clock(): reads a per-multiprocessor cycle counter, usable as a timestamp inside a kernel
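
As a rough illustration of using clock() as a timestamp, the sketch below reads the counter before and after some work inside a kernel and reports the difference in cycles. The names timedScale and timeScaleOnDevice are hypothetical, and a single block is launched on purpose, since counters on different multiprocessors are not guaranteed to be comparable.

```cuda
#include <cuda_runtime.h>

// Doubles every element and records how many cycles thread 0 spent doing it.
__global__ void timedScale(float *data, int n, long long *cycles) {
    clock_t start = clock();
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        data[i] = data[i] * 2.0f;
    }
    clock_t stop = clock();
    if (threadIdx.x == 0) {
        *cycles = (long long)(stop - start);
    }
}

// Hypothetical host helper: runs one block of 64 threads over hostData
// in place. Returns false if any CUDA call fails.
bool timeScaleOnDevice(float *hostData, int n, long long *cycles) {
    float *devData = NULL;
    long long *devCycles = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    if (cudaMalloc((void **)&devData, bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc((void **)&devCycles, sizeof(long long)) != cudaSuccess) {
        cudaFree(devData);
        return false;
    }
    bool ok = cudaMemcpy(devData, hostData, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        timedScale<<<1, 64>>>(devData, n, devCycles);
        ok = cudaMemcpy(hostData, devData, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        ok = ok && cudaMemcpy(cycles, devCycles, sizeof(long long),
                              cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devData);
    cudaFree(devCycles);
    return ok;
}
```

The cycle count is only a rough figure for thread 0; dividing it by the device clock rate gives an approximate time.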

Arithmetic unit in a stream processor: a single-precision floating-point fused multiply-add unit

cudaMallocHost: allocates page-locked (pinned) host memory
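
Page-locked memory cannot be paged out by the operating system, which lets the GPU transfer it directly and usually speeds up host-device copies. A minimal sketch of the allocation pattern follows; the helper name roundTripPinned and its return codes are assumptions made for this memo.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies n floats from pinned host memory to the device
// and back. Returns 1 if the data survives the round trip, 0 if it does not,
// and -1 if a CUDA call fails (for example, when no device is present).
int roundTripPinned(int n) {
    float *pinned = NULL;
    float *dev = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    if (cudaMallocHost((void **)&pinned, bytes) != cudaSuccess) {
        return -1;
    }
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) {
        cudaFreeHost(pinned);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        pinned[i] = (float)i;
    }
    bool ok = cudaMemcpy(dev, pinned, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    for (int i = 0; i < n; ++i) {
        pinned[i] = 0.0f;  // clear so the copy back is actually checked
    }
    ok = ok && cudaMemcpy(pinned, dev, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    int status = ok ? 1 : -1;
    for (int i = 0; ok && i < n; ++i) {
        if (pinned[i] != (float)i) {
            status = 0;
        }
    }
    cudaFree(dev);
    cudaFreeHost(pinned);  // pinned memory is released with cudaFreeHost, not free
    return status;
}
```

Pinned memory is a limited resource, so it is probably best reserved for buffers that are transferred often.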

Author: profess

Posted by profess in the General category; last updated September 9, 2013.