Different Parallel Processing Architecture
Contents
• Introduction to associative memory processors
• Multithreaded architectures – principles of multithreading
• Latency hiding
• Scalable coherent multiprocessor model with distributed shared memory
Introduction
• Computers are basically designed to execute instructions, which are stored as programs in memory.
• These instructions are executed sequentially and execution is therefore slow, since the next instruction can be executed only after the result of the previous instruction has been obtained.
Introduction
• Parallelism in a multiprocessor can, in principle, be implemented in three main ways:
Instruction-Level Parallelism (ILP)
• The potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.
• Instruction-level parallelism is obtained primarily in two ways in uniprocessors: through pipelining and through keeping multiple functional units busy executing multiple instructions at the same time.
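The throughput gain from pipelining can be captured with a standard idealized formula (an illustrative model, not taken from the slides): a k-stage pipeline with no hazards finishes n instructions in k + n - 1 cycles, versus n * k cycles without pipelining.

```python
# Illustrative model: an ideal k-stage pipeline finishes n instructions
# in k + n - 1 cycles, versus n * k cycles on a non-pipelined processor.

def pipeline_cycles(n_instructions: int, k_stages: int) -> int:
    """Cycles for n instructions on an ideal k-stage pipeline (no hazards)."""
    return k_stages + n_instructions - 1

def speedup(n_instructions: int, k_stages: int) -> float:
    """Speedup over a non-pipelined processor taking k cycles per instruction."""
    return (n_instructions * k_stages) / pipeline_cycles(n_instructions, k_stages)

# As n grows, the speedup approaches the stage count k.
print(pipeline_cycles(1000, 5))      # 1004
print(round(speedup(1000, 5), 2))    # 4.98, close to 5
```

The model shows why deeper pipelines help only when the instruction stream is long and hazard-free; hazards (discussed later in this deck) reduce the achievable speedup.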
Data Level Parallelism
• The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop.
• This type of parallelism is often called loop-level parallelism; the vector processor is an example of hardware that exploits it.
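A short sketch of what loop-level parallelism looks like (the data values are made up for illustration): every iteration of the loop below is independent, so a vector processor could execute the whole loop as a single vector instruction.

```python
# A loop whose iterations are independent -- the kind of loop-level
# parallelism a vector processor exploits with one vector instruction.
# (Hypothetical data; Python is used here only to illustrate the idea.)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Scalar (one-iteration-at-a-time) version.
c = [0.0] * len(a)
for i in range(len(a)):        # iteration i never touches c[j] for j != i,
    c[i] = a[i] + b[i]         # so all iterations could run in parallel

# "Vector" version: the whole operation expressed as one element-wise add,
# which a vector unit would execute as a single instruction.
c_vec = [x + y for x, y in zip(a, b)]

print(c, c == c_vec)
```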
Thread Level Parallelism
• Thread-level parallelism (TLP) exploits parallelism between independent threads of execution. Applications with abundant TLP are often found on machines that have a high workload, such as web servers.
• TLP is a popular ground for current research due to the rising popularity of multi-core and multiprocessor systems, which allow different threads to truly execute in parallel.
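A minimal sketch of thread-level parallelism in the web-server spirit described above (the function and request names are illustrative, not from the slides): several independent request handlers run as concurrent threads, and their waits overlap.

```python
# Thread-level parallelism sketch: independent "request handlers" run as
# concurrent threads; their simulated I/O waits overlap in time.

import threading
import time

results = {}
lock = threading.Lock()

def handle_request(request_id: int) -> None:
    time.sleep(0.05)                 # simulated I/O latency
    with lock:                       # protect the shared results dict
        results[request_id] = f"response-{request_id}"

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(4)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The four 0.05 s waits overlap, so total time is far less than 4 * 0.05 s.
print(sorted(results), round(elapsed, 2))
```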
Introduction to Associative Memory Processors
Associative memory processors
• All the PEs operate under the control of the CU, and each PE is essentially an ALU with its own local memory.
• The CU has its own main memory for storing programs; the OS and user programs execute under it.
• Scalar and control-type instructions are executed by the CU, while vector instructions are carried out by the PEs, which achieve parallelism through duplicated arithmetic units.
• An associative processor is normally interfaced to a host computer through the control unit. The host computer is a general-purpose machine which serves as the coordinator of the entire system, consisting of the associative processor, and also handles resource management and peripheral and I/O supervision.
Principles of Multithreading
• In the multithreaded execution model, a program is a collection of partially ordered threads, and a thread consists of a sequence of instructions that are executed in the conventional von Neumann manner.
• Multithreading is the process of executing multiple threads concurrently on a processor.
Principles of multithreading
• Multithreading demands that the processor be designed to handle multiple contexts simultaneously on a context-switching basis.
Multithreaded computation model
• Consider a system in which the memories are distributed to form a global address space. The machine parameters on which such a machine is analyzed are:
a. The latency (L): this includes the network delay, the cache-miss penalty, and delays caused by contention.
b. The number of threads (N): the number of threads that can be interleaved in each processor. A thread is represented by a context consisting of a program counter, a register set, and the required context status words.
c. The context-switching overhead (C): this refers to the cycles lost in performing a context switch in a processor. It depends on the switching mechanism and the amount of processor state devoted to maintaining active threads.
d. The interval between switches (R): this refers to the cycles between switches triggered by remote references.
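The four parameters above are often combined into a simple back-of-the-envelope utilization model (a sketch of the commonly used analysis, not taken verbatim from the slides): with run length R, switch overhead C, and latency L, the processor saturates once enough threads exist to cover the latency, and efficiency is N·R/(R+L+C) below saturation and R/(R+C) at saturation.

```python
# Back-of-the-envelope model of multithreaded processor efficiency
# (an illustrative sketch combining the parameters L, N, C, R above):
#   R = run length between remote references (interval between switches)
#   L = remote-reference latency, C = context-switch overhead, N = threads.

def efficiency(n_threads: int, run: float, latency: float, switch: float) -> float:
    """Fraction of cycles doing useful work with n interleaved threads."""
    saturated = run / (run + switch)                  # enough threads to hide L
    linear = n_threads * run / (run + latency + switch)
    return min(saturated, linear)

# With R = 10, L = 100, C = 2: one thread leaves the processor mostly idle,
# while many threads hide almost all of the remote-reference latency.
print(round(efficiency(1, 10, 100, 2), 3))    # 0.089
print(round(efficiency(16, 10, 100, 2), 3))   # 0.833
```

The crossover point, N = (R + L + C) / (R + C), gives the number of threads needed to keep the processor busy; beyond it, extra threads no longer help.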
Multiple context processor
• Multithreaded systems are constructed with multiple-context processors.
• For example, in the Horizon and Tera machines, the compiler detects such data dependences and the hardware enforces them by switching to another context when a dependence is detected.
• This is implemented by inserting into each instruction a field which indicates its minimum number of independent successors over all possible control flows.
Context switching policies.
• Switching from one thread to another is
performed according to one of the following
policies :
Switching on every instruction: the processor
switches from one thread to another every cycle.
Switching on block of instructions: blocks of
instructions from different threads are
interleaved.
Context switching policies.
Switching on every load: whenever a thread encounters a load instruction, the processor switches to another thread after that load instruction is issued. The context switch occurs irrespective of whether the data is local or remote.
Switch on cache miss: this policy corresponds to the case where a context is preempted when it causes a cache miss.
Data flow computers
• Data flow machines are an alternative way of designing a stored-program computer. The aim of designing a parallel architecture is to obtain high-performance machines.
• The design of such new computers is based on the following three principles:
To achieve high performance
To match technological progress
To offer better programmability in application areas
Data flow computers
• Before studying data flow computers in detail, let us revise the drawbacks of processors based on a pipelined architecture. The major hazards are:
Structural hazards
Data hazards due to
• true dependences (RAW), or
• false dependences, also called name dependences: anti (WAR) and output (WAW) dependences
Control hazards
If data dependences can be removed, the performance of the system will definitely improve.
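The three data-hazard classes above can be identified mechanically from the read and write sets of a pair of instructions. A small sketch (illustrative, not from the slides):

```python
# Classify data hazards between two instructions, given the register sets
# each one reads and writes:
#   RAW (true), WAR (anti), WAW (output) dependences.

def classify(first_reads, first_writes, second_reads, second_writes):
    hazards = set()
    if first_writes & second_reads:
        hazards.add("RAW")   # true: the second reads what the first wrote
    if first_reads & second_writes:
        hazards.add("WAR")   # anti: the second overwrites a source of the first
    if first_writes & second_writes:
        hazards.add("WAW")   # output: both write the same register
    return hazards

# i1: r1 = r2 + r3   followed by   i2: r4 = r1 * r2   -> RAW on r1
print(classify({"r2", "r3"}, {"r1"}, {"r1", "r2"}, {"r4"}))   # {'RAW'}
```

Only the RAW case is a genuine data dependence; WAR and WAW arise from register-name reuse and can be removed by renaming, which is exactly why removing them improves performance.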
Data flow computers
• Data flow computers are based on the principle of data-driven computation, which is very different from the von Neumann architecture: the von Neumann architecture is based on control flow, whereas the data flow architecture is driven by the availability of data.
• Hence they are also called data-driven computers.
Types of flow computers
• From a designer's perspective, there are various possible ways in which one can design a system, depending on the way instructions are executed. Two possible ways are:
• Control flow computers: the next instruction is executed when the last instruction, as stored in the program, has been executed.
• Data flow computers: an instruction is executed when the data (operands) required for executing that instruction are available.
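The data flow firing rule can be made concrete with a toy interpreter (an illustrative sketch, not an actual machine design): each node fires as soon as all its operand tokens have arrived, with no program counter imposing an order.

```python
# Toy dataflow interpreter for the graph (a + b) * (a - b):
# a node fires as soon as both of its operand tokens are present.

import operator

nodes = {
    "add": {"op": operator.add, "operands": {}, "needs": 2, "dest": ("mul", 0)},
    "sub": {"op": operator.sub, "operands": {}, "needs": 2, "dest": ("mul", 1)},
    "mul": {"op": operator.mul, "operands": {}, "needs": 2, "dest": None},
}
final = []   # results leaving the graph

def send(token, node_name, port):
    """Deliver a data token to an input port; fire the node when complete."""
    node = nodes[node_name]
    node["operands"][port] = token
    if len(node["operands"]) == node["needs"]:        # the firing rule
        result = node["op"](node["operands"][0], node["operands"][1])
        if node["dest"] is None:
            final.append(result)
        else:
            send(result, *node["dest"])

a, b = 7, 3
send(a, "add", 0); send(b, "add", 1)    # "add" fires -> token 10 to "mul"
send(a, "sub", 0); send(b, "sub", 1)    # "sub" fires -> 4; then "mul" fires
print(final)                            # [40]
```

Note that "add" and "sub" have no ordering between them: either could fire first, which is exactly the parallelism a dataflow machine exploits.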
Data driven computing and languages
• In order to understand how dataflow differs from control flow, let us look at the working of the von Neumann architecture, which is based on the control-flow computing model.
• Here each program is a sequence of instructions which are stored in memory.
• This series of addressable instructions stores information about the operation to perform, along with the memory locations that hold the operands. In the case of an interrupt or a function call, an instruction stores the address of the location to which control has to be transferred; in the case of a conditional transfer, it specifies the status bits to be checked and the location to which control is transferred.
Data Flow Computer architecture
• Pure dataflow computers are further classified as:
• static
• dynamic
Data Flow Computer architecture
• The basic principle of any dataflow computer is data-driven execution: it executes a program by receiving, processing, and sending out tokens.
Data Flow Computer architecture
• The type of operation to be performed by an instruction has to be fetched from the instruction store, where it is kept as tag information. This information contains details about:
• what operation has to be performed on the data
• how to transform the tags.
Data Flow Computer architecture
• There are a variety of static, dynamic, and also hybrid dataflow computing models.
• In the static model, only one token may be placed on an edge at any time.
• When firing an actor, no token is allowed on the output edge of that actor.
• Control tokens must be used to acknowledge the proper timing of transferring a data token from one node to another.
Data Flow Computer architecture
• The dynamic model of dataflow computer architecture allows more than one token to be placed on an edge at the same time.
• To implement this feature of the architecture, the concept of tagging tokens is used. Each token is tagged, and the tag identifies the conceptual position of the token in the token flow, i.e., the label attached to each token uniquely identifies the context in which that particular token is used.
Data Flow Computer architecture
• Static and dynamic data flow architectures have a pipelined ring structure, the ring having four resource sections:
The memories, used for storing the instructions
The processing units, which form the task force for parallel execution of enabled instructions
The routing network, used to pass result data tokens to their destination instructions
The input/output unit, which serves as an interface between the data flow computer and the outside world.
Static Data Flow architecture
• The data flow graph used in the Dennis machine must follow the static execution rule that only one token is allowed to exist on any arc at any given time;
• otherwise successive sets of tokens could not be distinguished. Thus, instead of a FIFO design storing tokens at an arc, a simpler design is used in which an arc can hold at most one data token.
• This is called static because here tokens are not labeled, and control tokens are used for acknowledgement purposes so that proper timing in transferring data tokens from node to node can take place.
Static Data Flow architecture
• Here the complete program is loaded into memory before execution begins.
• The same storage space is used for storing both the instructions and the data.
• In order to implement this, acknowledge arcs are implicitly added to the dataflow graph; they go in the opposite direction to each existing arc and carry an acknowledgement token.
• Some examples of static data flow computers are the MIT Static Dataflow machine, the DDM1 Utah Data Driven machine, the LAU System, the TI Distributed Data Processor, and the NEC Image Pipelined Processor.
Dynamic Dataflow Architecture
• In dynamic machines, data tokens are tagged (labeled or colored) to allow multiple tokens to appear simultaneously on any input arc of an operator.
• No control tokens are needed to acknowledge the transfer of data tokens among the instructions.
Dynamic Dataflow Architecture
• Tagging is achieved by attaching a label to each token which uniquely identifies the context of that particular token.
• This dynamically tagged data flow model suggests that maximum parallelism can be exploited from the program graph.
• While this is the conceptual view of the tagged-token model, in reality only one copy of the graph is kept in memory, and tags are used to distinguish between tokens that belong to each invocation.
• A general instruction format contains the opcode, the number of constants stored in the instruction, and the number of destinations for the result token.
Dynamic Dataflow Architecture
• Each destination is identified by four fields, namely the destination address, the input port at the destination instruction, the number of tokens needed to enable the destination, and the assignment function used in selecting a processing element for the execution of the destination instruction.
• The dynamic architecture has the following characteristics that differ from the static architecture:
• Program nodes can be instantiated at run time, unlike in the static architecture, where the program is loaded in its entirety at the beginning.
• Also, in the dynamic architecture several instances of a data packet can be enabled, and separate storage space is used for instructions and data.
Dynamic Dataflow Architecture
• The dynamic architecture requires storage space for the unmatched tokens.
• A first-in, first-out token queue for storing the tokens is not suitable.
• A tag contains a unique subgraph invocation ID, as well as an iteration ID if the subgraph is a loop.
• These pieces of information, taken together, are commonly known as the color of the token.
• However, no acknowledgement mechanism is required. The term "coloring" is used for the token labeling operations, and tokens with the same color belong together.
CONTD..
• The matching of token tags (labels or colors) is performed to merge tokens for instructions requiring more than one operand token.
• Therefore, additional hardware is needed to attach tags onto data tokens and to perform tag matching.
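The tag-matching hardware described above can be sketched in software (an illustrative model, not an actual machine design): a matching store holds a token until its partner with the same (destination, color) tag arrives, at which point the instruction fires.

```python
# Sketch of tagged-token (colored) matching in a dynamic dataflow machine.
# A matching store holds each token until its partner with the same
# (destination, color) tag arrives; then the instruction fires.

matching_store = {}   # (dest_instruction, color) -> first operand to arrive
fired = []            # record of (instruction, color, operand0, operand1)

def arrive(dest, color, port, value):
    key = (dest, color)
    if key not in matching_store:
        matching_store[key] = (port, value)       # wait for the partner token
    else:
        other_port, other_value = matching_store.pop(key)
        operands = {port: value, other_port: other_value}
        fired.append((dest, color, operands[0], operands[1]))

# Two loop iterations (colors 0 and 1) in flight on the same "add" node:
arrive("add", color=0, port=0, value=1)
arrive("add", color=1, port=0, value=10)   # different color -> kept separate
arrive("add", color=1, port=1, value=20)   # matches color 1 -> fires
arrive("add", color=0, port=1, value=2)    # matches color 0 -> fires

print(fired)   # [('add', 1, 10, 20), ('add', 0, 1, 2)]
```

Because tokens of different colors never match each other, several iterations of a loop can be in flight through the same graph node simultaneously, which is precisely what the static model forbids.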
LATENCY HIDING
• Massively parallel and scalable systems typically use distributed shared memory.
• Access to remote memory significantly increases memory latency. Furthermore, processor speeds have been increasing at a much faster rate than memory speeds.
• Thus any scalable multiprocessor or large-scale multicomputer must rely on the use of latency-reducing, latency-tolerating, or latency-hiding mechanisms.
• Four latency-hiding mechanisms are studied below for enhancing scalability and programmability.
LATENCY HIDING
Latency hiding can be accomplished through four complementary approaches:
i. using prefetching techniques, which bring instructions or data close to the processor before they are actually needed;
ii. using coherent caches, supported by hardware, to reduce cache misses;
iii. using relaxed memory consistency models, by allowing buffering and pipelining of memory references; and
iv. using multiple-context support to allow a processor to switch from one context to another when a long-latency operation is encountered.
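Approach (i) can be sketched in a few lines (an illustrative software analogue, with made-up delays, not a hardware implementation): while the processor computes on item k, a helper fetches item k+1, so fetch latency overlaps with useful work.

```python
# Latency hiding by prefetching: while the main loop processes item k,
# a background worker fetches item k+1, overlapping fetch latency with
# computation. Delays and functions here are illustrative.

import time
from concurrent.futures import ThreadPoolExecutor

def slow_fetch(i: int) -> int:
    time.sleep(0.05)          # simulated memory/network latency
    return i * i

def process(x: int) -> int:
    time.sleep(0.05)          # simulated computation on the fetched data
    return x + 1

with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.time()
    future = pool.submit(slow_fetch, 0)              # prefetch the first item
    results = []
    for i in range(4):
        data = future.result()                       # waits only if not ready
        if i + 1 < 4:
            future = pool.submit(slow_fetch, i + 1)  # prefetch the next item
        results.append(process(data))                # compute while it fetches
    elapsed = time.time() - start

# Without prefetching: 4 * (0.05 + 0.05) = 0.4 s. With it: about 0.25 s.
print(results, round(elapsed, 2))
```

The same overlap principle underlies hardware prefetching: the fetch is issued early enough that its latency is hidden behind independent work.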