Systolic Array
Presentation Overview
Systolic arrays
Introduction
Structures
Matrix Multiplication
Applications
Systolic computers pump data through an array of processing elements (PEs). The architectures are not general-purpose but tied to specific algorithms.
[Figure: Memory connected to a single PE, and Memory connected to a chain of PEs (PE, PE, ..., PE)]
Each cell performs a sequence of operations on the data that flows between cells. Generally the operations are the same in each cell: a cell performs one operation, or a small number of operations, on a data item and then passes it to its neighbour. Systolic arrays compute in lock-step, with each cell alternating between compute and communicate phases.
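The lock-step compute/communicate cycle can be illustrated with a small software simulation. The sketch below is only an assumption-laden example, not taken from the slides: it assumes a linear array in which cell j permanently holds x[j], partial sums enter from the host on the left, and the function name systolic_matvec is hypothetical. It computes the matrix-vector product y = A x.

```python
def systolic_matvec(A, x):
    """Simulate y = A x on a linear array; cell j permanently holds x[j]."""
    m, n = len(A), len(x)
    latch = [None] * n              # partial sum waiting at the input of each cell
    y = []
    for t in range(m + n - 1):      # one iteration = one global clock tick
        # The host injects a fresh, empty partial sum for row t (if any rows remain).
        latch[0] = 0 if t < m else None
        # Compute phase: every busy cell performs the same multiply-accumulate.
        out = [None] * n
        for j in range(n):
            if latch[j] is not None:
                i = t - j           # row whose partial sum is in cell j at tick t
                out[j] = latch[j] + A[i][j] * x[j]
        # Communicate phase: all cells hand their result to the right neighbour
        # at once; the last cell returns its finished sum to the host.
        if out[n - 1] is not None:
            y.append(out[n - 1])
        latch = [None] + out[:-1]
    return y
```

Each row's partial sum visits every cell exactly once, so m rows finish after m + n - 1 ticks rather than m * n sequential steps.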
[Figure: SIMD array versus systolic array. Labels: Control Unit, Processing Units, Data Bus, Interconnection Network (Local)]
A SIMD array usually loads data into its local memories before starting the computation. A systolic array usually pipes data in from an outside host and pipes the results back to the host.
Figure Ref [1] Systolic Computing Fundamentals, http://web.cecs.pdx.edu/~mperkows/temp/May13/systolic.pdf
Processor capability ranges through:
Trivial: just an ALU
ALU with several registers
Simple CPU: registers, runs its own program
Powerful CPU: local memory also
1D Linear Array
[Figure: the linear array at successive time steps T0 to T7]
Figure Ref [2] Jason HandUber, Systolic Arrays, February 12, 2003, http://web.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf
Planar array with perimeter I/O. This configuration allows I/O only through its boundary cells.
Focal Plane array with 3D I/O. This configuration allows I/O to each systolic cell.
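To make the perimeter I/O idea concrete, here is a minimal sketch of matrix multiplication on a planar array. It assumes an output-stationary design (each cell (i, j) accumulates C[i][j]), with rows of A entering only through the left boundary cells and columns of B only through the top boundary cells, each skewed by one tick per row or column. The function name systolic_matmul and the register layout are illustrative assumptions, and the step of draining C back out through the boundary is omitted.

```python
def systolic_matmul(A, B):
    """Simulate C = A B on an n x m planar array fed through its boundary."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]   # values travelling left to right
    b_reg = [[0] * m for _ in range(n)]   # values travelling top to bottom
    for t in range(n + m + k - 2):
        # Communicate phase: shift a right and b down, then feed the boundary.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i                      # row i is delayed by i ticks
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j                      # column j is delayed by j ticks
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Compute phase: every cell multiplies its two inputs and accumulates.
        for i in range(n):
            for j in range(m):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C
```

The skew ensures that A[i][s] and B[s][j] meet in cell (i, j) at tick s + i + j, so interior cells never need their own connection to the host.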
Systolic Disadvantages
Complicated, both in hardware and in software.
In fact, entire volumes exist outlining systolic array verification.
Expensive in comparison to uni-processor systems, although much faster. A systolic array used as attached array processor, integrated into
Fault Tolerance
One for One Redundancy
Each PE of the systolic array (SA) has a redundant standby PE.
The standby PE monitors the active one at all times.
It becomes active if the active PE fails.
It has to keep itself synchronized with the operations of the active unit.
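A minimal sketch of one-for-one redundancy follows. The classes MacPE and RedundantPair are hypothetical, the PE is assumed to be a simple multiply-accumulate cell, and failure detection is modelled as an exception, which is a simplification of whatever monitoring mechanism real hardware would use.

```python
class MacPE:
    """Hypothetical PE: a multiply-accumulate cell with one state register."""

    def __init__(self):
        self.acc = 0
        self.failed = False

    def step(self, a, b):
        if self.failed:
            raise RuntimeError("PE failed")
        self.acc += a * b
        return self.acc


class RedundantPair:
    """An active PE shadowed by a standby PE that mirrors every operation."""

    def __init__(self):
        self.active = MacPE()
        self.standby = MacPE()

    def step(self, a, b):
        # The standby executes the same operation so its state stays in sync.
        shadow = self.standby.step(a, b) if self.standby is not None else None
        try:
            return self.active.step(a, b)
        except RuntimeError:
            if self.standby is None:
                raise                      # no redundancy left
            # Failure detected: the synchronized standby becomes the active PE.
            self.active, self.standby = self.standby, None
            return shadow
```

Because the standby has seen every operand, it can take over without replaying any history; the cost is twice the hardware for each PE.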
N + X Redundancy
The array consists of N + X PEs, where X is typically much smaller than N. Whenever any of the N modules fails, one of the X modules takes over its functions.
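The takeover step can be sketched as simple bookkeeping. This is an illustration under assumptions, not a description of any particular hardware: the function replace_failed is hypothetical, PEs are identified by integers, and spares are used in first-come order.

```python
def replace_failed(slots, spares, failed_pe):
    """Let a spare take over the position served by failed_pe.

    slots[i] is the physical PE currently serving position i of the array;
    spares lists the idle spare PEs. Both lists are updated in place.
    Returns the position that was repaired.
    """
    position = slots.index(failed_pe)
    if not spares:
        raise RuntimeError("no spare PE left")
    slots[position] = spares.pop(0)
    return position
```

For example, with N = 4 and X = 1, slots = [0, 1, 2, 3] and spares = [4]; after PE 2 fails, slots becomes [0, 1, 4, 3] and a second failure can no longer be covered.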
Load Sharing
All the PEs that are equipped to perform the SA function share the load.
A higher-level module performs load distribution and maintains the health status of the PEs. If one load-sharing PE fails, the higher-level module starts distributing the load among the remaining units.
Figure Ref [6] Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
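The load-sharing scheme described above can be sketched as a small dispatcher. The class LoadSharer, its method names, and the round-robin policy are assumptions made for illustration; the slides do not specify how the higher-level module divides the load.

```python
class LoadSharer:
    """Hypothetical higher-level module that tracks PE health and spreads load."""

    def __init__(self, num_pes):
        self.healthy = [True] * num_pes

    def mark_failed(self, pe):
        self.healthy[pe] = False

    def distribute(self, tasks):
        """Assign tasks round-robin to the PEs that are still healthy."""
        live = [pe for pe, ok in enumerate(self.healthy) if ok]
        if not live:
            raise RuntimeError("no healthy PE left")
        assignment = {pe: [] for pe in live}
        for index, task in enumerate(tasks):
            assignment[live[index % len(live)]].append(task)
        return assignment
```

Unlike the standby schemes, no PE sits idle here: a failure reduces throughput instead of consuming a spare.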
Gracefully degradable linear systolic arrays
TMR fault detection/correction
Time redundancy achieved: concurrent error correction/detection
Figure Ref [7] Majumdar, A.; Raghavendra, C.S.; Breuer, M.A., "Fault tolerance in linear systolic arrays using time redundancy," Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences, 1988, Vol. I: Architecture Track, pp. 311-320
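The voting step behind triple modular redundancy (TMR) can be shown in a few lines. This is a generic majority voter written as an assumption-labelled sketch; it is not the specific time-redundancy scheme of reference [7], and the function name tmr_vote is hypothetical.

```python
from collections import Counter


def tmr_vote(r0, r1, r2):
    """Majority-vote three redundant results.

    Returns (value, error_detected): the majority value, and whether one of
    the three results disagreed with it (an error that the vote corrected).
    """
    value, count = Counter([r0, r1, r2]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: at least two results are faulty")
    return value, count < 3
```

The three results may come from three PEs computing in parallel (hardware redundancy) or from the same computation repeated at different times (time redundancy); the voter is the same in both cases.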
[Figure labels: Array A, Array B]
Local Switches
Switches placed immediately around each PE. Information entering a faulty PE can be directed to one of its neighbours without processing.
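A rough software analogue of local switches is shown below. It assumes a single row of PEs that together apply a fixed sequence of stage functions, with at least one spare PE in the row; the function run_row and the idea of assigning stages to healthy PEs in order are illustrative assumptions.

```python
def run_row(stages, faulty, num_pes, x):
    """Push datum x through a row of PEs, bypassing the faulty ones.

    stages: functions to apply in order; faulty: set of faulty PE indices.
    A faulty PE's switch forwards the datum to the next neighbour unprocessed,
    so the next healthy PE performs the pending stage instead.
    """
    pending = list(stages)
    for pe in range(num_pes):
        if pe in faulty:
            continue                # switch routes the datum around this PE
        if pending:
            x = pending.pop(0)(x)
    if pending:
        raise RuntimeError("not enough healthy PEs for all stages")
    return x
```

With three stages and four PEs, any single faulty PE can be bypassed without changing the result.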
Bus-structured switches
The PEs are in a collinear layout, with bundles of communication lines running parallel to the row and connected to the PEs.
Address renaming
Each processor has a modifiable address, and redundant processors and links are provided. Once a faulty PE is detected, the processor addresses are rearranged so that the faulty PE is excluded and a redundant PE is included.
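Address renaming can be sketched as recomputing a logical-to-physical map. The function rename_addresses and the policy of giving logical address k to the k-th healthy physical PE are assumptions chosen for illustration.

```python
def rename_addresses(num_physical, faulty, num_logical):
    """Map logical PE addresses onto healthy physical PEs, in order.

    num_physical counts all PEs including the redundant ones; faulty is the
    set of physical PEs known to be bad. Returns {logical: physical}.
    """
    healthy = [pe for pe in range(num_physical) if pe not in faulty]
    if len(healthy) < num_logical:
        raise RuntimeError("too many faulty PEs to rebuild the array")
    return dict(enumerate(healthy[:num_logical]))
```

With five physical PEs and four logical addresses, a fault in physical PE 1 changes the map from {0: 0, 1: 1, 2: 2, 3: 3} to {0: 0, 1: 2, 2: 3, 3: 4}: the faulty PE drops out and the redundant PE 4 is brought in.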
Processor-Switch Lattice
Figure Ref [6] Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
References
1. Systolic Computing Fundamentals, http://web.cecs.pdx.edu/~mperkows/temp/May13/systolic.pdf
2. Jason HandUber, Systolic Arrays, February 12, 2003, http://web.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf
3. Shaaban, Systolic Architectures, http://web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrixmultiplication-systolic.pdf
4. Jop Sibeyn, Systolic Matrix Product, http://users.informatik.uni-halle.de/~jopsi/dpar03/chap3.shtml
5. I.N. Tselepis and M.P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
6. Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
7. Majumdar, A.; Raghavendra, C.S.; Breuer, M.A., "Fault tolerance in linear systolic arrays using time redundancy," Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences, 1988, Vol. I: Architecture Track, pp. 311-320
Thank you