Vector Processor

This document provides an overview of vector computer architecture. It discusses how vector computers use pipelines to efficiently perform operations on vector (array) elements in parallel. Key points include:

- Vector computers contain special arithmetic units called pipelines that overlap the execution of the different parts of operations on vector elements.
- Pipelining vector operations makes them much more efficient than performing the same operations sequentially on each element.
- Vector registers hold multiple vector elements to feed the pipelines efficiently; scalar registers allow scalar values to operate on whole vectors.
- Chaining pipelines together can further improve performance by allowing the output of one pipeline to feed directly into the next.

Uploaded by Adedokun Abayomi
Copyright: © Attribution Non-Commercial (BY-NC)

A vector computer or vector processor is a machine designed to efficiently handle arithmetic operations on elements of arrays, called vectors.

Such machines are especially useful in high-performance scientific computing, where matrix and vector arithmetic are quite common. The Cray Y-MP and the Convex C3880 are two examples of vector processors used today. This tutorial provides a general overview of the architecture of a vector computer. This includes an introduction to vectors and vector arithmetic, a discussion of performance measurements used to evaluate this type of machine, and a comparison of the characteristics of particular vector computers. A brief history of vector processors is provided as well, with a focus on the Cray vector architectures.

Vectors and vector arithmetic


To understand the concepts behind a vector processor, we first present a short review of vectors and vector arithmetic.

Vector computing architectural concepts


We continue by showing the application of these ideas to the hardware in vector processors.

Vector computing performance


We then discuss performance and performance metrics, providing these figures for a few specific vector processors.

The evolution of vector computers


Finally we relate the history of some vector processors. In particular, we focus on Cray vector processors.

Vector computing architectural concepts


A vector computer contains a set of special arithmetic units called pipelines. These pipelines overlap the execution of the different parts of an arithmetic operation on the elements of the vector, producing a more efficient execution of the arithmetic operation. In many respects, a pipeline is similar to an assembly line in a factory where different steps of the assembly of an automobile, for example, are performed at different stages of the line. In this section, we discuss how a vector pipeline operates, the advantages of this type of architecture, and other architectural features found in vector processors.

The stages of a floating-point operation


Consider the steps or stages involved in a floating-point addition on a sequential machine with IEEE arithmetic hardware: s = x + y.

[A:] The exponents of the two floating-point numbers to be added are compared to find the number with the smaller exponent.

[B:] The significand of the number with the smaller exponent is shifted so that the exponents of the two numbers agree.

[C:] The significands are added.

[D:] The result of the addition is normalized.

[E:] Checks are made to see if any floating-point exceptions occurred during the addition, such as overflow.

[F:] Rounding occurs.

Figure 1 shows a step-by-step example of such an addition. The numbers to be added are x = 1234.00 and y = -567.8. In deference to the human reader, these are represented in decimal notation with a four-digit mantissa.

Now consider this scalar addition performed on all the elements of a pair of vectors (arrays) of length n. Each of the six stages must be executed for every pair of elements. If each stage of the execution takes tau units of time, then each addition takes 6*tau units of time (not counting the time required to fetch and decode the instruction itself or to fetch the two operands). So the number of time units required to add all the elements of the two vectors in a serial fashion is Ts = 6*n*tau. These execution stages are shown in figure 2 with respect to time.
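The six stages can be sketched as code. The following is a toy decimal model, not IEEE hardware: the function `staged_add` and its decomposition into sign, mantissa, and exponent are illustrative assumptions, and it assumes nonzero inputs.

```python
import math

def staged_add(x: float, y: float, digits: int = 4) -> float:
    """Toy decimal model of the six stages [A]-[F]; assumes nonzero inputs."""
    def decompose(v):
        # represent v as sign * mantissa * 10**exp with mantissa in [1, 10)
        sign = -1 if v < 0 else 1
        v = abs(v)
        exp = math.floor(math.log10(v))
        return sign, v / 10**exp, exp

    sx, mx, ex = decompose(x)
    sy, my, ey = decompose(y)
    # [A] compare exponents; make (mx, ex) the operand with the larger exponent
    if ex < ey:
        (sx, mx, ex), (sy, my, ey) = (sy, my, ey), (sx, mx, ex)
    # [B] shift the smaller-exponent significand so the exponents agree
    my /= 10 ** (ex - ey)
    # [C] add the (signed) significands
    m = sx * mx + sy * my
    # [D] normalize the result back into [1, 10)
    sign = -1 if m < 0 else 1
    m, exp = abs(m), ex
    while m >= 10:
        m, exp = m / 10, exp + 1
    while 0 < m < 1:
        m, exp = m * 10, exp - 1
    # [E] exception checks (overflow and the like) are omitted in this sketch
    # [F] round to the given number of mantissa digits
    m = round(m, digits - 1)
    return sign * m * 10 ** exp

staged_add(1234.00, -567.8)   # close to 666.2, matching the Figure 1 example
```

Tracing the Figure 1 inputs: 1234.00 becomes 1.234 x 10^3 and -567.8 becomes -5.678 x 10^2; stage [B] shifts the latter to -0.5678 x 10^3, stage [C] yields 0.6662 x 10^3, and stage [D] renormalizes to 6.662 x 10^2 = 666.2.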

An arithmetic pipeline
Suppose the addition operation described in the last subsection is pipelined; that is, one of the six stages of the addition for a pair of elements is performed at each stage in the pipeline. Each stage of the pipeline has a separate arithmetic unit designed for the operation to be performed at that stage. Once stage A has been completed for the first pair of elements, these elements can be moved to the next stage (B) while the second pair of elements moves into the first stage (A). Again, each stage takes tau units of time. Thus, the flow through the pipeline can be viewed as shown in figure 3, where the stages of the pipelined addition execute with respect to time as in figure 4. (Compare figure 2 to figure 4.)

Observe that it still takes 6*tau units of time to complete the sum of the first pair of elements, but that the sum of the next pair is ready in only tau more units of time, and this pattern continues for each succeeding pair. This means that the time, Tp, to do the pipelined addition of two vectors of length n is Tp = 6*tau + (n-1)*tau = (n + 5)*tau. The first 6*tau units of time are required to fill the pipeline and to obtain the first result. After the last result, xn + yn, is completed, the pipeline is emptied out, or flushed. Comparing the equations for Ts and Tp, it is clear that (n + 5)*tau < 6*n*tau for n > 1. Thus, the pipelined version of addition is faster than the serial version by almost a factor of the number of stages in the pipeline. This is an example of what makes vector processing more efficient than scalar processing: for large n, the pipelined addition for this sample pipeline is about six times faster than scalar addition.

In this discussion, we have assumed that the floating-point addition requires six stages and takes 6*tau units of time. There is nothing magic about the number 6; for some architectures, the number of stages in a floating-point addition may be more or fewer than six. Further, the individual stages may be quite different from the ones listed in the section on pipelined addition. The operations at each stage of a pipeline for floating-point multiplication are slightly different from those for addition; a multiplication pipeline may even have a different number of stages than an addition pipeline. There may also be pipelines for integer operations. As shown in figure 8, pipelines to perform vector operations on the Cray-1 have from one to fourteen stages, depending on the type of operation performed by the pipeline.

Vector registers

Some vector computers, such as the Cray Y-MP, contain vector registers. A general-purpose or floating-point register holds a single value; vector registers contain several elements of a vector at one time. For example, the Cray Y-MP vector registers contain 64 elements, while the Cray C90 vector registers hold 128 elements. The contents of these registers may be sent to (or received from) a vector pipeline one element at a time.

Scalar registers

Scalar registers behave like general-purpose or floating-point registers; they hold a single value. However, these registers are configured so that they may be used by a vector pipeline: the value in the register is read once every tau units of time and put into the pipeline, just as a vector element is released from a vector register. This allows the elements of a vector to be operated on by a scalar.
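The timing model for serial versus pipelined execution can be sketched directly; the function names and default parameters here are illustrative, with the stage count and tau taken from the discussion above.

```python
def serial_time(n, stages=6, tau=1.0):
    # Ts = 6*n*tau: each element passes through every stage before the next starts
    return stages * n * tau

def pipelined_time(n, stages=6, tau=1.0):
    # Tp = (n + 5)*tau for six stages: stages*tau to fill the pipeline,
    # then one result every tau thereafter
    return (stages + n - 1) * tau

n = 1000
serial_time(n)                      # 6000.0
pipelined_time(n)                   # 1005.0
serial_time(n) / pipelined_time(n)  # about 5.97, approaching 6 for large n
```

For n = 1000 the speedup is already within half a percent of the stage count, which is the "almost a factor of the number of stages" claim above.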
To compute y = 2.5 * x, the 2.5 is stored in a scalar register and fed into the vector multiplication pipeline every tau units of time, to be multiplied by each element of x to produce y.

Chaining

Figure 4 is a diagram of a single pipeline. As mentioned in the section on pipelined addition, most vector architectures have more than one pipeline, and they may contain different types of pipelines. Some vector architectures provide greater efficiency by allowing the output of one pipeline to be chained directly into another pipeline. This feature is called chaining, and it eliminates the need to store the result of the first pipeline before sending it into the second pipeline. Figure 5 demonstrates the use of chaining in the computation of a saxpy vector operation: a*x + y, where x and y are vectors and a is a scalar constant.
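The element-wise arithmetic that the chained multiply and add pipelines perform for saxpy can be sketched as follows; chaining itself is a hardware feature, so this models only the data flow, not the timing.

```python
def saxpy(a, x, y):
    # the multiply pipeline forms a*xi from the scalar register and each element
    # of x; with chaining, that product feeds the add pipeline directly, where
    # the corresponding element of y is added
    return [a * xi + yi for xi, yi in zip(x, y)]

saxpy(2.5, [1.0, 2.0, 3.0], [10.0, 10.0, 10.0])   # [12.5, 15.0, 17.5]
```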

Chaining can double the number of floating-point operations completed in tau units of time. Once both the multiplication and addition pipelines have been filled, one floating-point multiplication and one floating-point addition (a total of two floating-point operations) are completed every tau time units. Conceptually, it is possible to chain more than two functional units together, providing an even greater speedup; however, this is rarely (if ever) done because of difficult timing problems.

Scatter and gather operations

Sometimes only certain elements of a vector are needed in a computation. Most vector processors are equipped to pick out the appropriate elements (a gather operation) and put them together into a vector or a vector register. If the elements to be used are in a regularly spaced pattern, the spacing between the elements to be gathered is called the stride. For example, if the elements x1, x5, x9, x13, ..., x[4*floor((n-1)/4)+1] are to be extracted from the vector (x1, x2, x3, x4, x5, x6, ..., xn) for some vector operation, we say the stride is equal to 4. A scatter operation reformats the output vector so that the elements are spaced correctly. Scatter and gather operations may also be used with irregularly spaced data.

Vector-register vector processors

If a vector processor contains vector registers, the elements of the vector are read from memory directly into the vector register by a load vector operation. The vector result of a vector operation is put into a vector register before it is stored back in memory by a store vector operation; this permits it to be used in another computation without needing to be reread, and it allows the store to be overlapped by other operations. On these machines, all arithmetic and logical vector operations are register-register operations; that is, they are performed only on vectors that are already in the vector registers. For this reason, these machines are called vector-register vector processors.
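A stride-4 gather and its matching scatter can be sketched with list slicing; note that Python indexing is 0-based, so the element written x1 above lives at index 0, and the function names and defaults are illustrative.

```python
def gather(x, start=0, stride=4):
    # collect every stride-th element into a contiguous vector
    return x[start::stride]

def scatter(vals, n, start=0, stride=4, fill=0.0):
    # spread a contiguous result vector back out at the given stride
    out = [fill] * n
    out[start::stride] = vals
    return out

x = list(range(1, 14))   # x1..x13 represented as the values 1..13
g = gather(x)            # [1, 5, 9, 13] -- the elements x1, x5, x9, x13
scatter(g, 13)           # puts 1, 5, 9, 13 back at indices 0, 4, 8, 12
```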
Memory-memory vector processors

Another type of vector processor allows the vector operands to be fetched directly from memory into the different vector pipelines and the results to be written directly to memory; these are called memory-memory vector processors. Because the elements of the vector must come from memory instead of a register, it takes a little longer to start a vector operation; this is due partly to the cost of a memory access. One example of a memory-memory vector processor is the CDC Cyber 205.

Because of the ability to overlap memory accesses and the possible reuse of vector registers, vector-register vector processors are usually more efficient than memory-memory vector processors. However, as the length of the vectors in a computation increases, this difference in efficiency diminishes; in fact, memory-memory vector processors may prove more efficient if the vectors are long enough. Nevertheless, experience has shown that shorter vectors are more common.

Quantum

Quantum's primary interface is a programmatic RESTful API. The abstractions over which it operates are, by design, extremely simple.

The Quantum API allows for the creation and management of virtual networks, each of which can have one or more ports. A port on a virtual network can be attached to a network interface, where a network interface is anything that can source traffic, such as a vNIC exposed by a virtual machine, an interface on a load balancer, and so on. These abstractions offered by Quantum (virtual networks, virtual ports, and network interfaces) are the building blocks for building and managing logical network topologies. Of course, the technology that implements Quantum is fully decoupled from the API (that is, the backend is pluggable).

So, for example, the logical network abstraction could be implemented using simple VLANs, L2-in-L3 tunneling, or any other mechanism one can imagine and build. The only requirement is that the actual implementation provide the L2 connectivity described by the logical model. While the native Quantum API does not support more sophisticated network services such as, say, QoS or ACLs, it does provide an API extensibility mechanism that plugins can use to expose them. This is the conduit by which developers and vendors in the OpenStack ecosystem can innovate within Quantum. If an extension proves useful and generally applicable, it may become a part of the core Quantum API in a future version.

Quantum Internals

There are three key functional layers of abstraction that make up the Quantum service:

1) REST API layer: This layer is responsible for implementing the Quantum API and routing API requests to the correct endpoint within Quantum's pluggable infrastructure. The REST API layer also contains various infrastructure glue around launching the Quantum service, marshalling and unmarshalling requests and responses, and validating data format and correctness. This layer can also contain security and stability infrastructure, such as rate-limiting logic on inbound API calls to protect against denial-of-service attacks and ensure that the service remains responsive under load.

REST API Extensions: Quantum provides an extensibility mechanism that enables anybody to extend the Core API and add features and functionality that are not currently part of it. Taking today's Core API as an example, one could use the extensibility mechanism to create a QoS extension that enables setting Quality of Service parameters on Quantum networks. Similarly, multiple parties can integrate advanced networking functionality using Quantum's extensibility mechanism. The Quantum community is actively working on implementing the extensibility framework (to follow the progress, check out the blueprint here).

Key Quantum API methods:

Method: Create Network
REST URL: POST /tenants/{tenant-id}/networks
HTTP Request Body: Specifies the symbolic name for the network being created, e.g. { "network": { "name": "symbolic name for network1" } }
Description: This operation creates a Layer-2 network in Quantum based on the information provided in the request body.

Method: List all networks for a particular tenant
REST URL: GET /tenants/{tenant-id}/networks
HTTP Request Body: Not applicable
Description: This operation returns the list of all networks currently defined in Quantum.

Method: Update Network
REST URL: PUT /tenants/{tenant-id}/networks/{network-id}
HTTP Request Body: Specifies a new symbolic name for a particular Quantum network, e.g. { "network": { "name": "new symbolic name" } }
Description: This operation updates the symbolic name of an existing Quantum network.
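The three methods above can be sketched as client calls using Python's stdlib urllib. This is a hedged sketch: the BASE endpoint and the tenant and network ids are hypothetical placeholders, and the requests are only constructed here, never actually sent.

```python
import json
from urllib.request import Request

BASE = "http://quantum.example.com"   # hypothetical service endpoint
HEADERS = {"Content-Type": "application/json"}

def create_network(tenant_id, name):
    # POST /tenants/{tenant-id}/networks with the symbolic name in the body
    body = json.dumps({"network": {"name": name}}).encode()
    return Request(f"{BASE}/tenants/{tenant_id}/networks",
                   data=body, headers=HEADERS, method="POST")

def list_networks(tenant_id):
    # GET /tenants/{tenant-id}/networks, no request body
    return Request(f"{BASE}/tenants/{tenant_id}/networks", method="GET")

def update_network(tenant_id, network_id, new_name):
    # PUT /tenants/{tenant-id}/networks/{network-id} with the new name
    body = json.dumps({"network": {"name": new_name}}).encode()
    return Request(f"{BASE}/tenants/{tenant_id}/networks/{network_id}",
                   data=body, headers=HEADERS, method="PUT")

req = create_network("tenant-1", "symbolic name for network1")
req.get_method()   # 'POST'
req.full_url       # 'http://quantum.example.com/tenants/tenant-1/networks'
```

Each helper returns a `urllib.request.Request` object that could be passed to `urllib.request.urlopen` against a real Quantum endpoint.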
