4 - W4-Intro To Parallel and Distributed Computing
CS443
Introduction to Parallel and Distributed Computing
(WEEK 4)
LECTURE - 7 & 8
GPU architecture and programming
Lecturer
Muhammad Sheraz Tariq
Sheraz.t@scocs.edu.pk
Course Grading Policy
Quizzes = 20%
Mid-Term = 30%
Course Outline
1. Identify the key characteristics that distinguish a GPU from a CPU.
2. Understand how a GPU differs from a CPU.
3. Discuss how the graphics pipeline is structured, relating its stages to parallel
programming concepts.
4. Discuss some of the issues that arise in managing tasks distributed across the
shader processors.
Remaining part of lecture 6
A system that fails is not adequately providing the services it was designed for.
If we consider a distributed system as a collection of servers that communicate with one another
and with their clients, not adequately providing services means that servers, communication
channels, or possibly both, are not doing what they are supposed to do.
However, a malfunctioning server itself may not always be the fault we are looking for.
If such a server depends on other servers to provide its services adequately, the cause of an
error may lie elsewhere.
Fault Tolerance – Failure Models
To get a better grasp on how serious a failure actually is, several classification schemes have
been developed.
One such scheme is shown in Table-1, and is based on schemes described in Cristian (1991) and
Hadzilacos and Toueg (1993).
Fault Tolerance – Failure Models
Table-1: Different types of failures (after Cristian, 1991; Hadzilacos and Toueg, 1993)

  Crash failure               A server halts, but was working correctly until it halted
  Omission failure            A server fails to respond to incoming requests
    Receive omission          A server fails to receive incoming messages
    Send omission             A server fails to send messages
  Timing failure              A server's response lies outside the specified time interval
  Response failure            A server's response is incorrect
    Value failure             The value of the response is wrong
    State transition failure  The server deviates from the correct flow of control
  Arbitrary failure           A server may produce arbitrary responses at arbitrary times
Fault Tolerance – Crash Failure
A crash failure occurs when a server prematurely halts, but was working correctly until it stopped.
An important aspect of crash failures is that once the server has halted, nothing is heard from it
anymore.
A typical example of a crash failure is an operating system that comes to a grinding halt, and for
which there is only one solution: reboot it.
Many personal computer systems suffer from crash failures so often that people have come to expect
them to be normal.
So, moving the reset button from the back of a cabinet to the front was done for good reason.
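Since a crashed server is simply never heard from again, the standard way for a client to detect a suspected crash is a timeout on the reply. A minimal sketch in Python (the names `flaky_server` and `ping` are illustrative, not from the lecture):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def flaky_server(healthy: bool) -> str:
    """Simulated server: replies normally, or goes silent after a crash."""
    if not healthy:
        time.sleep(2.0)  # a crashed server is never heard from again
    return "pong"

def ping(healthy: bool, timeout: float = 0.5) -> str:
    """Treat silence past the timeout as a suspected crash failure."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(flaky_server, healthy)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return "suspected crash"
    finally:
        pool.shutdown(wait=False)

print(ping(healthy=True))    # pong
print(ping(healthy=False))   # suspected crash
```

Note that a timeout can only ever yield a *suspicion* of a crash: in an asynchronous system, a very slow server is indistinguishable from a halted one.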
Fault Tolerance – Omission Failure
An omission failure occurs when a server fails to respond to a request.
Other types of omission failures, not related to communication, may be caused by software errors
such as infinite loops or improper memory management, in which case the server is said to "hang."
Fault Tolerance – Timing Failure
Timing failures occur when the response lies outside a specified real-time interval.
As we discussed earlier, providing data too soon may easily cause trouble for a
recipient if there is not enough buffer space to hold all the incoming data.
More common, however, is that a server responds too late, in which case a performance failure is said
to occur.
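A timing failure can be checked by measuring the response time against the specified interval. A small sketch (the deadline value and function names are illustrative assumptions):

```python
import time

DEADLINE = 0.2  # seconds; the specified real-time interval (assumed value)

def classify_response(server_delay: float) -> str:
    """Call a simulated server and classify the outcome against DEADLINE."""
    start = time.monotonic()
    time.sleep(server_delay)  # stands in for the remote call
    elapsed = time.monotonic() - start
    if elapsed <= DEADLINE:
        return "in time"
    return "performance failure"  # reply is correct, but arrives too late

print(classify_response(0.05))  # in time
print(classify_response(0.35))  # performance failure
```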
Fault Tolerance – Response Failure
A serious type of failure is a response failure, in which the server's response is simply incorrect.
In the case of a value failure, a server simply provides the wrong reply to a request.
For example, a search engine that systematically returns Web pages not related to any of the search
terms used has failed.
Fault Tolerance – Response Failure
A state transition failure happens when the server reacts unexpectedly to an incoming
request.
For example, if a server receives a message it cannot recognize, a state transition failure occurs if no
measures have been taken to handle such messages.
In particular, a faulty server may incorrectly take default actions it should never have initiated.
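One way to avoid this class of failure is to make the server reject unrecognized messages explicitly instead of silently taking a default action. A hedged sketch with a dictionary dispatcher (all message types and handler names are hypothetical):

```python
# Hypothetical message handlers for a toy account server.
def handle_deposit(amount):
    return f"deposited {amount}"

def handle_withdraw(amount):
    return f"withdrew {amount}"

HANDLERS = {"deposit": handle_deposit, "withdraw": handle_withdraw}

def serve(message: dict) -> str:
    handler = HANDLERS.get(message.get("type"))
    if handler is None:
        # Reject rather than guess: an unplanned default action here is
        # exactly the state transition failure described above.
        return "error: unrecognized message type"
    return handler(message["amount"])

print(serve({"type": "deposit", "amount": 10}))   # deposited 10
print(serve({"type": "transfer", "amount": 10}))  # error: unrecognized message type
```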
Fault Tolerance – Arbitrary Failure
The most serious are arbitrary failures, also known as Byzantine failures.
In effect, when arbitrary failures occur, clients should be prepared for the worst.
In particular, it may happen that a server is producing output it should never have produced, but
which cannot be detected as being incorrect. Worse yet, a faulty server may even be maliciously
working together with other servers to produce intentionally wrong answers.
This situation illustrates why security is also considered an important requirement when talking about
dependable systems.
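Arbitrary failures are typically masked by replicating the server and voting on the replies. Tolerating k truly Byzantine servers in general requires at least 3k+1 replicas and an agreement protocol, but a minimal majority-vote sketch (names illustrative) conveys the basic idea:

```python
from collections import Counter

def vote(replies):
    """Return the strict-majority answer among replica replies, or None
    when no majority exists (the client must then treat the result as suspect)."""
    answer, count = Counter(replies).most_common(1)[0]
    return answer if count > len(replies) // 2 else None

# Three replicas answer a query; one Byzantine server replies maliciously.
print(vote([42, 42, 99]))  # 42   (the faulty reply is outvoted)
print(vote([42, 17, 99]))  # None (no majority; result cannot be trusted)
```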
Fault Tolerance – Fail-Stop Failure
Crash failures as defined above are the most benign way for a server to halt.
They are also referred to as fail-stop failures. In effect, a fail-stop server simply stops producing
output in such a way that its halting can be detected by other processes.
In the best case, the server may have been friendly enough to announce that it was about to crash;
otherwise, it simply stops.
Fault Tolerance – Fail-Safe Failure
Finally, there are also occasions in which the server is producing random output, but this output can
be recognized by other processes as plain junk.
The server is then exhibiting arbitrary failures, but in a benign way; such failures are also referred
to as being fail-safe.
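Output can be made recognizable as junk by framing it with a checksum, so that receivers drop corrupted messages instead of acting on them. A small sketch using CRC32 (the framing scheme is an illustrative assumption, not from the lecture):

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Sender appends a CRC32 so receivers can recognize junk output."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def accept(message: bytes):
    """Receiver: return the payload, or None if the message is plain junk."""
    payload, crc = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None  # drop corrupted output: fail-safe behaviour
    return payload

good = frame(b"result=7")
print(accept(good))  # b'result=7'

corrupted = good[:-1] + bytes([good[-1] ^ 0xFF])  # flip bits in the checksum
print(accept(corrupted))  # None
```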
GPU computing
GPU computing is the use of a GPU (graphics processing unit) as a co-processor to accelerate CPUs
for general-purpose scientific and engineering computing.
The GPU accelerates applications running on the CPU by offloading some of the compute-intensive and
time-consuming portions of the code.
The rest of the application still runs on the CPU.
From a user's perspective, the application runs faster because it's using the massively parallel
processing power of the GPU to boost performance.
This is known as "heterogeneous" or "hybrid" computing.
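A rough sketch of this heterogeneous model in plain Python, where a thread pool stands in for the GPU's many cores (all names are illustrative): the host code drives the application and hands only the data-parallel, compute-intensive kernel to the pool.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_kernel(args):
    """y = a*x + y for one chunk; each element is independent of the others,
    which is exactly what makes the work offloadable to many cores."""
    a, xs, ys = args
    return [a * x + y for x, y in zip(xs, ys)]

def saxpy(a, x, y, workers=4):
    """Host side: split the data, dispatch chunks, gather the results."""
    n = len(x)
    step = max(1, n // workers)
    chunks = [(a, x[i:i + step], y[i:i + step]) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        out = []
        for part in pool.map(saxpy_kernel, chunks):
            out.extend(part)
    return out

print(saxpy(2.0, [1, 2, 3, 4], [10, 20, 30, 40]))  # [12.0, 24.0, 36.0, 48.0]
```

On a real system the kernel would be written in something like CUDA or OpenCL and launched over thousands of GPU threads, but the division of labour between host and accelerator is the same.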
GPU computing
A typical CPU has four to eight cores, while a GPU consists of hundreds of smaller cores.
This massively parallel architecture is what gives the GPU its high compute performance.
There are a number of GPU-accelerated applications that provide an easy way to access high-
performance computing (HPC).
How is a GPU different from a CPU?
- Extremely parallel: different pixels and elements of the image can be operated on independently.
- Hundreds of cores execute at the same time to take advantage of this fundamental parallelism.
- Originally, software developers could set parameters (textures, light reflection colors, blend
  modes), but the function was completely controlled by the hardware.
- Pipeline stages are now programs running on processor cores inside the GPU, instead of
  fixed-function ASICs.
- Vertex shaders = programs running on vertex processors; fragment shaders = programs running on
  fragment processors.
- Different types of shader cores are combined into a single unified shader core.
- Dynamic task scheduling balances the load on all cores:
  - Frames with many "edges" (vertices) require more vertex shader work.
  - Frames with large primitives require more pixel shader work.
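The dynamic scheduling idea above can be sketched as a shared work queue from which idle workers pull whatever task comes next, regardless of shader type, so the load balances itself for any mix of vertex and pixel work. This is a hypothetical software simulation; a real GPU performs this scheduling in hardware.

```python
import queue
import threading

def run_frame(tasks, n_workers=4):
    """Workers (stand-ins for unified shader cores) pull tasks until empty."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    done, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                kind, payload = work.get_nowait()
            except queue.Empty:
                return  # no work left for this core
            result = f"{kind}:{payload}"  # stands in for running the shader
            with lock:
                done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

# A frame with a mix of vertex-heavy and pixel-heavy work.
frame = [("vertex", i) for i in range(3)] + [("pixel", i) for i in range(5)]
print(len(run_frame(frame)))  # 8
```

Because every worker draws from the same queue, a vertex-heavy frame simply sees more cores spend their time on vertex tasks, with no static partitioning of cores by shader type.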
Solution: Unified Shader
Pixel shaders, geometry shaders, and vertex shaders run on the same core - a unified shader core
Shader cores are programmed using graphics APIs like OpenGL and Direct3D
References
1. A. S. Tanenbaum and M. van Steen, Distributed Systems: Principles and Paradigms, 2nd ed.,
Prentice Hall, 2007.
2. K. Hwang, J. Dongarra, and G. C. Fox, Distributed and Cloud Computing: Clusters, Grids, Clouds,
and the Future Internet, 1st ed., Elsevier.