GPU Introduction
P. Bakowski
Global Memory
8 * 64-bit channels
SIMT
GPUs operate in SIMT mode (single instruction, multiple threads): a single instruction is executed by multiple threads. SIMT processing does not require the data to be packed into vectors, and it allows arbitrary branching within the threads.
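To make the SIMT idea concrete, the following is a minimal sketch in which every thread runs the same kernel on its own element, and even and odd threads take different branches. The names branchKernel and runBranch, the block size of 64, and the arithmetic in each branch are illustrative assumptions, not taken from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread executes the same instruction stream on its own element.
// The data is a plain array; no packing into vectors is required.
__global__ void branchKernel(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;           // threads past the end do nothing
    if (i % 2 == 0)
        out[i] = in[i] * 2;       // branch taken by even-indexed threads
    else
        out[i] = in[i] + 100;     // branch taken by odd-indexed threads
}

// Hypothetical host helper: copy in, launch, copy out.
// Returns 0 on success and -1 on a CUDA error.
int runBranch(const int *hostIn, int *hostOut, int n)
{
    int *dIn = NULL, *dOut = NULL;
    size_t bytes = n * sizeof(int);
    int status = -1;
    cudaError_t e1 = cudaMalloc((void **)&dIn, bytes);
    cudaError_t e2 = cudaMalloc((void **)&dOut, bytes);
    if (e1 == cudaSuccess && e2 == cudaSuccess) {
        cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
        int threads = 64;                          // assumed block size
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        branchKernel<<<blocks, threads>>>(dIn, dOut, n);
        if (cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            status = 0;
    }
    cudaFree(dIn);   // cudaFree on a null pointer performs no operation
    cudaFree(dOut);
    return status;
}

int main(void)
{
    int in[8], out[8];
    for (int i = 0; i < 8; i++) in[i] = i + 1;
    if (runBranch(in, out, 8) != 0) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < 8; i++) printf("%d ", out[i]);
    printf("\n");
    return 0;
}
```

Threads of the same warp that take different branches are executed one branch after the other, so divergent branching is allowed but may cost performance.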
[Figure labels: calculations, memory access; levels: high, low]
GPUs : performance
CUDA C code is compiled with nvcc, a driver script that invokes other programs: cudacc, g++, cl, etc.
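As an illustration of what nvcc handles, here is a minimal sketch of a single source file that mixes device code (the kernel) and host code. The file name square.cu and the names squareKernel and squareOnGpu are assumptions made for this example; nvcc separates the two parts and passes the host part to the host compiler.

```cuda
// Assumed file name: square.cu
// Build with: nvcc square.cu -o square
#include <cstdio>
#include <cuda_runtime.h>

// Device code: compiled for the GPU.
__global__ void squareKernel(int *value)
{
    *value = (*value) * (*value);
}

// Host code: handed to the host compiler (for example g++ or cl).
// Stores x*x in *result; returns 0 on success and -1 on a CUDA error.
int squareOnGpu(int x, int *result)
{
    int *dValue = NULL;
    int status = -1;
    if (cudaMalloc((void **)&dValue, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(dValue, &x, sizeof(int), cudaMemcpyHostToDevice);
    squareKernel<<<1, 1>>>(dValue);   // one block containing one thread
    if (cudaMemcpy(result, dValue, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess)
        status = 0;
    cudaFree(dValue);
    return status;
}

int main(void)
{
    int r = 0;
    if (squareOnGpu(7, &r) != 0) { printf("CUDA error\n"); return 1; }
    printf("%d\n", r);
    return 0;
}
```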
CUDA : advantages
The main advantage of CUDA for GPGPU computing comes from the new GPU architecture, designed for the efficient implementation of non-graphics calculations, together with the use of the C programming language. There is no need to convert algorithms into the pipelined format required for graphics calculations. GPGPU code does not use the graphics API and the corresponding drivers.
CUDA : advantages
CUDA provides:
- access to 16 KB of memory per SM, shared by the threads of that SM
- efficient transfer of data between system memory and video memory (global GPU memory)
- memory with a linear addressing scheme and random access to any memory location
- hardware-implemented operations on floating-point numbers, integers and bits
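Two of the points above, the per-SM shared memory and the transfer between system and global GPU memory, can be sketched together. The example below is an assumption-laden illustration: the names reverseInBlock and reverseOnGpu and the array size of 64 are chosen for this sketch only. The threads of one block stage the data in shared memory and then write it back in reverse order.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // assumed array size: one block of 64 threads

__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[N];      // 64 * 4 = 256 bytes of per-SM shared memory
    int t = threadIdx.x;
    tile[t] = data[t];           // global memory -> shared memory
    __syncthreads();             // wait until every thread has stored its element
    data[t] = tile[N - 1 - t];   // read an element written by another thread
}

// Hypothetical host helper: reverses N ints in place via the GPU.
// Returns 0 on success and -1 on a CUDA error.
int reverseOnGpu(int *host)
{
    int *dData = NULL;
    size_t bytes = N * sizeof(int);
    int status = -1;
    if (cudaMalloc((void **)&dData, bytes) != cudaSuccess) return -1;
    cudaMemcpy(dData, host, bytes, cudaMemcpyHostToDevice);   // system -> video memory
    reverseInBlock<<<1, N>>>(dData);
    if (cudaMemcpy(host, dData, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
        status = 0;                                           // video -> system memory
    cudaFree(dData);
    return status;
}

int main(void)
{
    int a[N];
    for (int i = 0; i < N; i++) a[i] = i;
    if (reverseOnGpu(a) != 0) { printf("CUDA error\n"); return 1; }
    printf("first = %d, last = %d\n", a[0], a[N - 1]);
    return 0;
}
```

The __syncthreads() barrier matters here: without it, a thread could read a shared-memory slot before the thread responsible for it has written it.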
CUDA : limitations
Limitations:
- no recursive functions (no stack)
- processing in blocks of at least 32 threads (a warp)
- CUDA is a proprietary architecture of NVIDIA
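The warp granularity can be checked at run time. The sketch below is illustrative only: queryWarpSize and roundUpToWarp are hypothetical helper names, and device 0 is assumed to be the GPU of interest. It reads the warp size from the device properties and shows how many thread slots a small request actually occupies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the warp size reported by device 0, or -1 if the query fails.
int queryWarpSize(void)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.warpSize;
}

// Rounds a requested thread count up to the next multiple of the warp size,
// which is the number of thread slots the hardware schedules for it.
int roundUpToWarp(int threads, int warp)
{
    return ((threads + warp - 1) / warp) * warp;
}

int main(void)
{
    int warp = queryWarpSize();
    if (warp <= 0) { printf("no usable CUDA device\n"); return 1; }
    printf("warp size: %d\n", warp);
    printf("10 threads occupy %d thread slots\n", roundUpToWarp(10, warp));
    return 0;
}
```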
10 threads: no loop, but several threads, each identified by its index threadIdx.x
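A minimal sketch of this idea, assuming the example adds two arrays of 10 integers (the operation and the names addKernel and addOnGpu are assumptions, not taken from the slides): the CPU loop over i disappears, and each of the 10 threads handles the element selected by its own threadIdx.x.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // one thread per element

// CPU version for comparison:
//   for (int i = 0; i < N; i++) c[i] = a[i] + b[i];
// GPU version: no loop; threadIdx.x plays the role of the loop index.
__global__ void addKernel(const int *a, const int *b, int *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Hypothetical host helper; returns 0 on success and -1 on a CUDA error.
int addOnGpu(const int *a, const int *b, int *c)
{
    int *dA = NULL, *dB = NULL, *dC = NULL;
    size_t bytes = N * sizeof(int);
    int status = -1;
    cudaError_t e1 = cudaMalloc((void **)&dA, bytes);
    cudaError_t e2 = cudaMalloc((void **)&dB, bytes);
    cudaError_t e3 = cudaMalloc((void **)&dC, bytes);
    if (e1 == cudaSuccess && e2 == cudaSuccess && e3 == cudaSuccess) {
        cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);
        addKernel<<<1, N>>>(dA, dB, dC);   // 1 block of 10 threads
        if (cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            status = 0;
    }
    cudaFree(dA);   // cudaFree on a null pointer performs no operation
    cudaFree(dB);
    cudaFree(dC);
    return status;
}

int main(void)
{
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 10 * i; }
    if (addOnGpu(a, b, c) != 0) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < N; i++) printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```

With a single block, threadIdx.x alone is enough to index the data; with several blocks the index would also have to include blockIdx.x and blockDim.x.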
Summary
- Evolution of multiprocessing CPUs and GPUs
- SIMD and SIMT processing modes
- Performance of GPUs
- NVIDIA and CUDA
- CUDA processing model
- CUDA memory model
- A simple example