Lecture 4
Computing
CS-401
IBM BG/Q Compute Chip with 18 cores (PU) and 16 L2 Cache units (L2)
Networks connect multiple stand-alone computers (nodes) to make larger parallel computer clusters.
Network connections
Ø For example, the schematic below shows a typical LLNL parallel computer cluster:
• Each compute node is a multi-processor parallel computer in itself
• Multiple compute nodes are networked together with an Infiniband network
• Special-purpose nodes, also multi-processor, are used for tasks other than computation
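As a minimal sketch of how a program uses such a cluster, the MPI example below starts several cooperating processes (possibly on different compute nodes) and has each report its rank and host. It assumes an MPI implementation such as Open MPI or MPICH is available; the file name and launch command are illustrative.

/* hello_mpi.c - minimal sketch: each MPI process reports its rank and the node it runs on.
   Build:  mpicc hello_mpi.c -o hello_mpi
   Run:    mpirun -np 4 ./hello_mpi        (4 processes, possibly spread over several nodes) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                    /* join the parallel job            */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* my id within the job             */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes        */
    MPI_Get_processor_name(host, &name_len);   /* which node this process runs on  */

    printf("task %d of %d running on node %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}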
Benefits of Parallel Programming
Ø SAVE TIME AND/OR MONEY
• In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings.
• Parallel computers can be built from cheap, commodity components.
Collaborative networks
Ø TAKE ADVANTAGE OF NON-LOCAL RESOURCES
• Using compute resources on a wide area network, or even the Internet, when local compute resources are scarce or insufficient.
• Example: SETI@home (setiathome.berkeley.edu) has over 1.7 million users in nearly every country in the world (May 2018).
von Neumann Computer Architecture
Ø Comprised of four main components: Memory, Control Unit, Arithmetic Logic Unit, Input/Output
Ø Read/write, random access memory is used to store both program instructions and data
Ø Program instructions are coded data which tell the computer to do something
Ø Control unit fetches instructions/data from memory, decodes the instructions and then sequentially coordinates operations to
accomplish the programmed task.
Ø Arithmetic Logic Unit performs basic arithmetic and logical operations
Ø Input/Output is the interface to the human operator
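To make the fetch-decode-execute cycle concrete, here is a toy sketch in C of a von Neumann style machine: a single memory array holds both instructions and data, and a loop plays the role of the control unit. The tiny four-opcode instruction set (LOAD/ADD/STORE/HALT) is invented purely for illustration.

/* toy_vonneumann.c - illustrative sketch of the fetch-decode-execute cycle. */
#include <stdio.h>

enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

int main(void)
{
    /* One memory for both instructions and data (von Neumann). Each instruction
       occupies two words: opcode, operand address. Data lives at addresses 20+. */
    int mem[32] = {
        LOAD,  20,     /* acc = mem[20]        */
        ADD,   21,     /* acc = acc + mem[21]  */
        STORE, 22,     /* mem[22] = acc        */
        HALT,  0
    };
    mem[20] = 7;
    mem[21] = 35;

    int pc = 0, acc = 0, running = 1;
    while (running) {
        int opcode  = mem[pc];                 /* fetch            */
        int operand = mem[pc + 1];
        pc += 2;
        switch (opcode) {                      /* decode + execute */
        case LOAD:  acc = mem[operand];  break;
        case ADD:   acc += mem[operand]; break;
        case STORE: mem[operand] = acc;  break;
        case HALT:  running = 0;         break;
        }
    }
    printf("result at mem[22] = %d\n", mem[22]);   /* prints 42 */
    return 0;
}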
Flynn's Classical Taxonomy
Ø There are a number of different ways to classify parallel computers. Examples are available in the references.
Ø One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy.
Ø Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified
along the two independent dimensions of Instruction Stream and Data Stream. Each of these dimensions can
have only one of two possible states: Single or Multiple.
Ø The matrix below defines the 4 possible classifications according to Flynn:

                                Single Data (SD)    Multiple Data (MD)
    Single Instruction (SI)     SISD                SIMD
    Multiple Instruction (MI)   MISD                MIMD
Single Instruction, Single Data (SISD)
Ø A serial (non-parallel) computer
Ø Single Instruction: Only one instruction stream is being acted on by the CPU during any one clock cycle
Ø Single Data: Only one data stream is being used as input during any one clock cycle
Ø Deterministic execution
Ø This is the oldest type of computer
Ø Examples: older generation mainframes, minicomputers, workstations and single processor/core PCs.
Single Instruction, Multiple Data (SIMD)
Ø A type of parallel computer
Ø Single Instruction: All processing units execute the same instruction at any given clock cycle
Ø Multiple Data: Each processing unit can operate on a different data element
Ø Examples: GPUs and the vector (SIMD) units of modern CPUs
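As a rough sketch of SIMD-style execution in C, the loop below applies the same multiply-add to many data elements; a vectorizing compiler can map such a loop to SIMD instructions. The omp simd pragma is only a hint and assumes a compiler with OpenMP 4.0+ support (e.g. gcc -fopenmp); the flags and file name are assumptions.

/* saxpy.c - the same operation applied to many data elements; a vectorizing
   compiler can issue SIMD instructions for this loop.
   Build (assumed): gcc -O2 -fopenmp saxpy.c -o saxpy */
#include <stdio.h>

#define N 1000

int main(void)
{
    float a = 2.0f, x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }

    #pragma omp simd            /* hint: one instruction, multiple data elements */
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[10] = %f\n", y[10]);   /* 2*10 + 1 = 21 */
    return 0;
}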
Nodes in a supercomputer
Ø Task
A logically discrete section of computational work. A task is typically a program or program-like set of instructions
that is executed by a processor. A parallel program consists of multiple tasks running on multiple processors.
Ø Pipelining
Breaking a task into steps performed by different processor units, with inputs streaming through, much like an
assembly line; a type of parallel computing.
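As a rough illustration of the assembly-line idea (not a real threaded pipeline), the sketch below simulates a three-stage pipeline over a stream of items: once the pipeline fills, all three stages are busy in every "clock step", each working on a different item. Stage names and item count are arbitrary.

/* pipeline_sim.c - sequential simulation of a 3-stage pipeline. */
#include <stdio.h>

#define STAGES 3
#define ITEMS  6

int main(void)
{
    const char *stage[STAGES] = { "read", "compute", "write" };

    /* at clock step t, stage s works on item (t - s), if that item exists */
    for (int t = 0; t < ITEMS + STAGES - 1; t++) {
        printf("clock %d:", t);
        for (int s = 0; s < STAGES; s++) {
            int item = t - s;
            if (item >= 0 && item < ITEMS)
                printf("  %s(item %d)", stage[s], item);
        }
        printf("\n");
    }
    return 0;
}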
Ø Shared Memory
Describes a computer architecture where all processors have direct access to common physical memory. In a
programming sense, it describes a model where parallel tasks all have the same "picture" of memory and can directly
address and access the same logical memory locations regardless of where the physical memory actually exists.
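A minimal shared-memory sketch in C using OpenMP: all threads directly access the same array in the process's common address space, and a reduction combines their partial sums. It assumes a compiler with OpenMP support (e.g. gcc -fopenmp); the file name and flags are illustrative.

/* shared_sum.c - all threads directly access the same logical memory.
   Build (assumed): gcc -O2 -fopenmp shared_sum.c -o shared_sum */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];               /* one copy, visible to every thread */
    for (int i = 0; i < N; i++) a[i] = 1.0;

    double sum = 0.0;
    /* each thread works on part of the same shared array; the reduction
       combines the per-thread partial sums at the end */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f (threads = %d)\n", sum, omp_get_max_threads());
    return 0;
}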
Ø Parallel Overhead
The execution time required to coordinate parallel tasks, as opposed to time spent doing useful work. Parallel overhead can
include factors such as:
•Task start-up time
•Synchronizations
•Data communications
•Software overhead imposed by parallel languages, libraries, operating system, etc.
•Task termination time
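One way to see such overhead directly: time an OpenMP parallel region that does no useful work, so the measured cost is essentially thread start-up and synchronization. This is a rough sketch with assumed build flags; the numbers depend entirely on the system.

/* overhead.c - rough measurement of parallel-region fork/join cost.
   Build (assumed): gcc -O2 -fopenmp overhead.c -o overhead */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int trials = 1000;

    double t0 = omp_get_wtime();
    for (int i = 0; i < trials; i++) {
        #pragma omp parallel
        { /* empty region: no useful work, only start-up and synchronization */ }
    }
    double t1 = omp_get_wtime();

    printf("average fork/join overhead: %g seconds\n", (t1 - t0) / trials);
    return 0;
}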
Ø Massively Parallel
Refers to the hardware that comprises a given parallel system - having many processing elements. The meaning of "many"
keeps increasing, but currently, the largest parallel computers are comprised of processing elements numbering in the
hundreds of thousands to millions.
Ø Embarrassingly (Ideally) Parallel
Solving many similar, but independent tasks simultaneously; little to no need for coordination between the tasks.
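A classic embarrassingly parallel sketch: a Monte Carlo estimate of pi, where every sample is independent, so the threads need no coordination beyond combining their hit counts at the end. OpenMP and the POSIX rand_r function are assumed; seeds and sample counts are arbitrary.

/* mc_pi.c - embarrassingly parallel Monte Carlo estimate of pi.
   Build (assumed): gcc -O2 -fopenmp mc_pi.c -o mc_pi */
#define _POSIX_C_SOURCE 200809L   /* for rand_r */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const long samples = 10000000;
    long hits = 0;

    /* every sample is independent: no communication until the final reduction */
    #pragma omp parallel reduction(+:hits)
    {
        unsigned int seed = 1234u + (unsigned int)omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < samples; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
    }
    printf("pi is approximately %f\n", 4.0 * hits / samples);
    return 0;
}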
Ø Scalability
Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel
speedup with the addition of more resources. Factors that contribute to scalability include:
• Hardware - particularly memory-cpu bandwidths and network communication properties
• Application algorithm
• Associated parallel overhead
• Characteristics of your specific application
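These scalability limits are often quantified with Amdahl's law, speedup(p) = 1 / ((1 - f) + f/p), where f is the fraction of the work that can run in parallel and p is the number of processors. The sketch below tabulates it for an assumed f = 0.95.

/* amdahl.c - Amdahl's law: speedup(p) = 1 / ((1 - f) + f / p).
   The value f = 0.95 is just an assumed example. */
#include <stdio.h>

int main(void)
{
    const double f = 0.95;          /* parallelizable fraction (assumed) */
    for (int p = 1; p <= 1024; p *= 2) {
        double speedup = 1.0 / ((1.0 - f) + f / p);
        printf("processors = %4d   speedup = %6.2f\n", p, speedup);
    }
    return 0;
}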
Types of Parallelism:
1. Bit-level parallelism –
The form of parallel computing based on increasing the processor word size. A larger word size reduces the number of
instructions the system must execute to perform an operation on data wider than the word length.
Example: Consider a scenario where an 8-bit processor must compute the sum of two 16-bit integers. It must first
sum up the 8 lower-order bits, then add the 8 higher-order bits, thus requiring two instructions to perform the
operation. A 16-bit processor can perform the operation with just one instruction.
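The 16-bit example can be written out directly: on an 8-bit machine the sum needs two add steps (low byte, then high byte plus the carry), while a 16-bit machine does it in one. A sketch of the two-step version in C, with arbitrary example operands:

/* add16_on_8bit.c - simulating a 16-bit addition with two 8-bit additions,
   as an 8-bit processor would have to do it. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t a = 0x1234, b = 0x0FCD;

    /* step 1: add the low-order bytes, remember the carry */
    uint8_t lo    = (uint8_t)((uint8_t)a + (uint8_t)b);
    uint8_t carry = lo < (uint8_t)a;           /* overflow of the 8-bit add */

    /* step 2: add the high-order bytes plus the carry */
    uint8_t hi    = (uint8_t)((uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry);

    uint16_t two_step = (uint16_t)(((uint16_t)hi << 8) | lo);
    uint16_t one_step = (uint16_t)(a + b);     /* a 16-bit processor: one instruction */

    printf("two 8-bit adds: 0x%04X   one 16-bit add: 0x%04X\n", two_step, one_step);
    return 0;
}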
2. Instruction-level parallelism –
A single processor can issue and execute more than one instruction per clock cycle. The instructions of a program are re-
ordered and grouped so that independent instructions execute concurrently without affecting the result of the program.
This is called instruction-level parallelism.
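A small sketch of what the hardware and compiler can exploit: in the first group the operations are independent and may be issued in the same cycle, while in the second each line needs the previous result and must wait for it. What actually executes in parallel is decided by the compiler and the CPU, not by this code.

/* ilp.c - independent vs. dependent instruction chains (illustrative only). */
#include <stdio.h>

int main(void)
{
    int a = 3, b = 5, c = 7, d = 11;

    /* independent: no operation reads another's result, so a superscalar
       processor can execute them in the same clock cycle */
    int u = a + b;
    int v = c * d;
    int w = a - c;

    /* dependent chain: each operation needs the previous result,
       so they must execute one after the other */
    int x = a + b;
    int y = x * c;
    int z = y - d;

    printf("%d %d %d %d %d %d\n", u, v, w, x, y, z);
    return 0;
}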
3. Task Parallelism –
Task parallelism decomposes a task into subtasks and allocates each subtask to a processor for execution. The processors
then execute the subtasks concurrently; a minimal sketch is given below.
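A minimal task-parallel sketch in C with OpenMP sections: two different subtasks (here, summing and finding the maximum of an array, chosen purely for illustration) run concurrently on different threads. A compiler with OpenMP support is assumed (e.g. gcc -fopenmp).

/* tasks.c - task parallelism: two different subtasks run concurrently.
   Build (assumed): gcc -O2 -fopenmp tasks.c -o tasks */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double total(const double *a)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i];
    return s;
}

static double maximum(const double *a)
{
    double m = a[0];
    for (int i = 1; i < N; i++) if (a[i] > m) m = a[i];
    return m;
}

int main(void)
{
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = i % 100;

    double s = 0.0, m = 0.0;
    #pragma omp parallel sections     /* each section is an independent subtask */
    {
        #pragma omp section
        s = total(a);                 /* subtask 1 */
        #pragma omp section
        m = maximum(a);               /* subtask 2 */
    }
    printf("sum = %.0f   max = %.0f\n", s, m);
    return 0;
}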