Algorithm and Architectural Level Methodologies for Low Power

By: BHOGESHRAO (2SD24LDE02)
Introduction
• With ever increasing integration levels, power has
become a critical design parameter.
• Consequently, a lot of effort has gone into achieving
lower dissipation at all levels of the design process.
• It has been demonstrated by several researchers
that algorithm and architecture level design
decisions can have a dramatic impact on power
consumption.
• In this chapter we explore some of the known
synthesis, optimization and estimation techniques
applicable at the algorithm and architectural levels.
Introduction
• The techniques mentioned in this chapter are
targeted for DSP applications but can readily
be adapted for more general applications.
• Two examples, a vector quantizer encoder and an FIR filter, are used throughout the chapter to illustrate how the methodologies may be applied.
• While the former is evaluated and optimized
for ASIC design, the latter is targeted for a
programmable processor.
Design Flow
• A design environment oriented towards power
minimization must embody optimization and estimation
tools at all levels of the design flow. A top-down approach,
with examples of associated tools, is illustrated in Figure
11.1.
• The most effective design decisions derive from choosing
and optimizing algorithms at the highest levels.
• However, implementation details cannot be accurately
modeled or estimated at this level of abstraction so
relative metrics must be judiciously used in making design
selections.
• More information is available at the architectural level,
hence estimates are more accurate and the effectiveness
of optimizations can be more accurately quantified.
Design Example 1: Vector Quantization, Introduction
• Throughout this chapter we will be using the
design of a video vector quantizer to illustrate
design flow at the different levels.
• Vector Quantization (VQ) is a data compression
method used in voice recognition and video systems.
• This example implements 16-to-1 video compression.
• In this approach to vector quantization, an image is
broken up into a sequence of 4x4 pixel images (Figure
11.2)
• Each pixel is represented by an 8-bit word indicating
luminance.
Design Example 1: Vector Quantization, Introduction

• The 4x4 image, therefore, can be thought of as a vector of 16 words, each 8 bits in length.
• Each of these vectors is compared with a previously generated codebook of, in this case, 256 different vectors.
• This codebook is generated a priori with the
intention of covering enough of the vector space to
give a good representation of all probable vectors.
• After compression, an 8-bit word is generated indicating the address of the codevector that best approximates the original 4x4 image vector.
• This corresponds to a compression ratio of 16:1, since sixteen 8-bit words are now represented by a single 8-bit word.
• Our design is directed toward a 240x128 pixel grey-scale display, i.e., 1920 4x4 blocks per frame. Processing the standard 30 frames/sec moving picture therefore requires that one 4x4 pixel vector be compressed every 17.3 µs (1 / (1920 × 30) s).
• Distortion calculation: each input block is compared with all 256 code words stored in the codebook, and the closest match is determined by computing the Mean Square Error (MSE) between the input and each code word.
Design Example 2: FIR Filter, Introduction
• The second example that will be used throughout this chapter is a 14-tap, low-pass Finite Impulse Response (FIR) filter. The algorithm will be optimized for targeted architectures, and various implementations of the filter (using dedicated and programmable hardware) will be analyzed in terms of their power consumption characteristics.
Algorithm level: Analysis and Optimization
• Estimation (Analysis)
• The sources of power consumption on a CMOS chip
can be classified as dynamic power, short circuit
currents and leakage.
• At the algorithm level, it makes sense to only consider
the dynamic power.
• The contributions of short-circuit currents and leakage are mostly determined at the circuit level and are only marginally affected by algorithm-level decisions.
• The power dissipated can be described by the following well-known equation for dynamic power:

P = Ceff · V² · f

where f is the frequency of operation, V is the supply voltage, and Ceff is the effective capacitance switched.
• Ceff combines two factors: C, the physical capacitance being charged/discharged, and α, the corresponding switching probability:

Ceff = α · C
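To make the relationship concrete, here is a minimal Python sketch of this equation; all numeric values are illustrative assumptions, not measured data.

```python
# Minimal sketch of the dynamic power equation P = Ceff * V^2 * f,
# with Ceff = alpha * C. All numbers below are illustrative.

def dynamic_power(alpha, c_physical, v_supply, freq):
    """Return dynamic power in watts.

    alpha      -- switching probability (0..1)
    c_physical -- physical capacitance switched per cycle, in farads
    v_supply   -- supply voltage, in volts
    freq       -- operating frequency, in hertz
    """
    c_eff = alpha * c_physical          # effective switched capacitance
    return c_eff * v_supply ** 2 * freq

# Example: 100 pF switched with 30% activity at 5 V, 20 MHz.
print(dynamic_power(alpha=0.3, c_physical=100e-12, v_supply=5.0, freq=20e6))
# -> 0.015 W; halving the voltage to 2.5 V cuts this by 4x,
# which is why voltage reduction is the dominant low-power lever.
```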
• For the purpose of estimation, we can divide
power dissipation into two components:
1) algorithm-inherent dissipation, and
2) implementation overhead.
• The algorithm-inherent dissipation comprises
the power of the execution units and memory.
This is "inherent" in the sense that it is
necessary to achieve the basic functionality of
the algorithm, and cannot be avoided
irrespective of the implementation.
• On the other hand, the implementation
overhead includes control, interconnect and
registers. The power consumed by this
component depends largely on the choice of
architecture/implementation.
1. Estimating the algorithm-inherent dissipation -The
algorithm-inherent dissipation refers to the power
consumed by the execution units and memory.
• This component is fundamental to a given algorithm
and is the prime factor for comparisons between
different algorithms as well as for quantifying the
effect of algorithm level design decisions.
• Its dissipation can be estimated by a weighted sum of the number of operations in the algorithm.
• The weights used for the different operations must reflect the respective capacitances switched.
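As a minimal sketch of such a weighted sum, the Python fragment below uses hypothetical per-operation capacitance weights; real weights would come from characterizing the target hardware library.

```python
# Sketch: algorithm-inherent switched capacitance as a weighted sum
# of operation counts. The weights (pF switched per operation) are
# assumed placeholders, not characterized library values.

CAP_WEIGHTS_PF = {
    "memory_access": 50.0,   # memory accesses tend to dominate
    "multiply":      40.0,
    "add":            5.0,
}

def inherent_capacitance_pf(op_counts):
    """Weighted sum of operation counts, in pF switched per task."""
    return sum(CAP_WEIGHTS_PF[op] * n for op, n in op_counts.items())

# Illustrative example: one output sample of a 14-tap FIR filter,
# counting 14 coefficient accesses, 14 multiplies and 13 additions.
fir_ops = {"memory_access": 14, "multiply": 14, "add": 13}
print(inherent_capacitance_pf(fir_ops), "pF per output sample")
```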
2 .Estimating the implementation overhead –
• The implementation overhead consists of the control,
interconnect and implementation related
memory/register power.
• The power consumed by these components depends
on the specific architecture platform chosen and on
the mapping of the algorithm onto the hardware
• Since this overhead is not essential to the basic
functionality of a given algorithm, several estimation
tools ignore its effect for algorithm level comparisons
• However, the power consumed by these components is often comparable to, if not greater than, the algorithm-inherent dissipation.
• It is clear, therefore, that it is important to get reasonable estimates of the implementation overhead for realistic comparisons between algorithms and to guide high-level decisions.
• This is a formidable task without a complete architecture description.
• Fortunately, it is possible to produce first-order predictions of the
overhead component given some properties of both the algorithm and
the targeted hardware platform or architecture.
• Two structural properties of an algorithm are relevant here: regularity and locality.
• Regularity refers to consistent and repeatable computation or data-access patterns (e.g., matrix multiplication). Regular algorithms are easy to map onto hardware and are efficient for parallelization and hardware sharing.
• Spatial locality: accessing data elements stored near each other (e.g., sequential traversal of an array).
• Temporal locality: reusing data within short intervals of time.
• Both forms of locality improve cache usage, reduce memory power consumption and minimize bus interconnect.
• A spatially-local algorithm lends itself more easily to efficient partitioning on hardware, allowing highly capacitive global buses to be used sparingly.
• A temporally-local algorithm tends to require less temporary storage and smaller register files, leading to lower capacitances.
• In terms of memory/register access, spatial locality
refers to distance between the addresses of items
referenced close together in time and temporal locality
refers to the probability of future accesses to items
referenced in the recent past. A spatially-local memory
access pattern allows partitioning of memory into
smaller blocks that require less power per access.
• Given a targeted hardware platform and a number of
algorithm properties, techniques can be developed for
early prediction of the implementation overhead.
• Consider the interconnect power in a custom ASIC implementation. It has been established that, in general, the average length and, hence, the physical capacitance of the buses are proportional to the predicted die area.
• In turn, the active area is a function of algorithmic parameters such as the number of operations to be performed and their concurrency pattern.
• The switching activity can be derived from the number
of bus accesses, which is proportional to the number
of edges in the computational graph of the algorithm.
Design Example 1: Vector Quantization, Algorithmic
Estimation
• We have established that at this high level of design
description, there is no means to accurately estimate
absolute power. However by using such metrics as
operation count and first-order estimates of the critical
path, design decisions can be made.
• To illustrate this level of estimation we use the most straightforward method of coding the vector, a full search through the entire codebook (FSVQ), combined with the standard Mean Square Error (MSE) distortion measure:

MSE = Σᵢ (Xᵢ - Cᵢ)²,  i = 1, ..., 16

Design Example 1: Vector Quantization, Algorithmic Estimation
• where C is the codebook code vector, X is the original 4x4 vector representation, and i is the index of the individual pixel word.
• The computational complexity per vector can be quantified by enumerating the operations (e.g. memory accesses, multiplications, additions) required to search the codebook.
• This gives a reasonable first order approximation of relative
power consumption. Computing the MSE between two vectors
requires 16 memory accesses, 16 subtractions, 16 multiplies and
16 additions.
• In FSVQ, this is done for each of the 256 vectors in the codebook, and each result is compared with the leading (lowest) MSE candidate found so far.
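A minimal Python sketch of the full-search encoder described above, assuming a 256-entry codebook held as a list of length-16 vectors (the names and data layout are illustrative):

```python
# Sketch of Full Search Vector Quantization (FSVQ) with MSE distortion.
# codebook: 256 codevectors, each a length-16 sequence of 8-bit values.
# block:    the input 4x4 block flattened to a length-16 sequence.

def mse(x, c):
    """Mean square error between input vector x and codevector c:
    16 memory accesses, 16 subtractions, 16 multiplies, 16 additions."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def fsvq_encode(block, codebook):
    """Return the 8-bit index of the codevector closest to the block."""
    best_index, best_mse = 0, float("inf")
    for index, codevector in enumerate(codebook):   # 256 comparisons
        d = mse(block, codevector)
        if d < best_mse:                            # leading candidate
            best_index, best_mse = index, d
    return best_index                               # the compressed word
```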
• Algorithm-inherent dissipation - Operation count can now be used to estimate the switching capacitance inherent to the algorithm if the targeted hardware library is known.
• Using black-box capacitance models of the hardware and making assumptions on the bit-widths of each operator, a first-order estimate of capacitance can be made.

• Knowing that memory accesses and multiplications are power hungry, the first-order analysis produces the insight that these are the functions most in need of optimization. This picture can be refined by introducing some architectural constraints.
• Assume for instance that a single one-ported memory
is used to store the codebook.
• This sets the maximum concurrency of the memory accesses to one, i.e. it imposes a sequential execution of the algorithm.
• A full search requires 256 × 16 = 4096 codebook accesses per block, so to meet the real-time constraint of 17.3 µs per block the memory access time must be smaller than 17.3 µs / 4096 ≈ 4.2 ns.
• For a power efficient implementation, it is obvious
that either a more complex memory architecture
(dual port memory/ multi port memory), or a revised
algorithm is necessary.
Design Example 2: FIR Filter, Algorithmic Exploration
• Consider a direct-form structure (Figure 11.4) of the FIR filter and assume a throughput constraint of 3.125 MHz. As the voltage is reduced, it is necessary to choose faster hardware to meet the required time constraint.
• Availability of power and area estimates of the different modules in the cell library allows us to evaluate the power and area trade-offs involved in using different library cells.
• Though the ripple adder dissipates less power than the carry select adder (CSA), it fails to meet the required throughput below 5 V, whereas the CSA continues to meet the throughput requirement down to 3 V.
• Table 11.3 summarizes the energy and area estimates
obtained using the techniques described in Section
11.3.1. Using the carry select adder reduces the power
consumption to a third of its original value with
minimal area penalty.
• Design space exploration (estimation tool) provides
an interactive environment giving quick feedback to
the designer about the effect of design choices on
specified performance metrics and allows the user
to make intelligent decisions.
• It provides guidance for the selection of algorithms, serves as a cost function for transformations, and aids hardware selection, resulting in large power savings.
Power Minimization Techniques at the Algorithm Level
• After examining methods for estimating power
consumption at the algorithm level, the next
logical step is to examine power minimization
techniques at this level.
• We will start by mentioning some of the
general approaches for power minimization
and then look at specific techniques that can
be used for minimization of both the
algorithm-inherent dissipation and the
implementation overhead.
• The recurring theme in low power design at all levels
of abstraction is voltage reduction. At the algorithm
level, functional pipelining, retiming, algebraic
transformations and loop transformations can be
used to increase speed and allow lower voltages.
• Be aware that these approaches often translate into
larger silicon area implementations, hence the
approach has been termed trading area for power.
• Estimation and exploration tools help us decide how
much we can drop the voltage while still meeting the
required performance constraints, as well as the
associated area penalty.
• Another technique for low power design is
avoiding wasteful activity. At the algorithm
level, the size and complexity of a given
algorithm (e.g. operation counts, word lengths)
determine the activity.
• If there are several algorithms for a given task, the one with the fewest operations is generally preferable.
• Reducing the algorithm-inherent dissipation –
• Important transformations in this category
include operation reduction and strength
reduction.
• Operation reduction includes common sub-expression elimination, algebraic transformations (e.g. reverse distributivity) and dead-code elimination.
• Strength reduction refers to replacing energy consuming
operations by a combination of simpler operations.
• The most common in this category is expansion of
multiplications by constants into shift and add
operations. Though this transformation typically results
in lower power, it may sometimes have the opposite
effect if it results in an increase in critical path.
• Another drawback is that it introduces extra overhead in
the form of registers and control.
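As an illustration of strength reduction, a multiplication by a constant can be expanded into shifts and adds; the sketch below uses the constant 10, chosen arbitrarily.

```python
# Sketch of strength reduction: replace a constant multiplication
# with shift-and-add operations. 10 = 8 + 2 = (1 << 3) + (1 << 1),
# so x * 10 becomes (x << 3) + (x << 1).

def times_10_strength_reduced(x):
    """Compute x * 10 using only two shifts and one addition."""
    return (x << 3) + (x << 1)

assert times_10_strength_reduced(7) == 70
# The shift/add chain may lengthen the critical path, which is why
# this transformation can occasionally increase power instead.
```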
• Another important component of the algorithm-inherent dissipation is the memory power. Transformations that reduce it include conversion of background memory to foreground register files, and reduction of memory size using loop-reordering and loop-merging transformations.
• Minimization of the implementation overhead is a more challenging problem.
• It was explained earlier how certain algorithms have potentially less overhead than others because they possess structural properties such as locality (which reduces data movement) and regularity (a predictable, repetitive structure that reduces complexity and promotes resource sharing).
• For selection of algorithms, therefore, we must be able to
detect these properties.
• Optimizations on the algorithm level should
enhance and preserve them.
• Spatial locality can be detected and used to
guide partitioning.
• Regular algorithms typically require less
control and interconnect overhead.
• One other way to reduce the implementation overhead is to reduce the chip area, as this typically translates into reduced bus capacitances.
Design Example 1: Vector Quantization, Algorithmic
Optimization
• Continuing with this design example, the properties of operation count and critical path will be used to aid in the choice and optimization of algorithms. Using the algorithm-inherent dissipation estimates of Table 11.1, memory access has been identified as the main hurdle to achieving a low-power design. To achieve significant power savings, other algorithms are investigated.
• Tree Search Vector Quantization (TSVQ) - TSVQ encoding [5] requires far less computation.
• TSVQ performs a binary search of the vector space instead of a full search.
• As a result, the computational complexity is proportional to log2(N) rather than N, where N is the number of vectors in the codebook.
• Figure 11.5 diagrams the structure of the tree search. At each level of the tree, the input vector is compared with two codebook entries.
• If at level 1, for example, the input vector is closer to the left entry, then the right branch of the tree is not analyzed further and an index bit 0 is transmitted. This process is repeated until a leaf of the tree is reached. Hence only 2 · log2(256) = 16 distortion calculations have to be made, compared to 256 distortion calculations in the FSVQ.
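A minimal sketch of this binary tree search, assuming the codebook tree is stored as a complete binary tree in a list (the storage layout and names are illustrative, not from the original text):

```python
# Sketch of Tree Search VQ (TSVQ): a binary search over a depth-8 tree
# instead of a full search. tree[n] holds the codevector at node n;
# the children of node n are nodes 2n+1 and 2n+2.

def mse(x, c):
    """Mean square error between input vector x and codevector c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def tsvq_encode(block, tree, depth=8):
    """Return the 8-bit index built from the left/right decisions,
    one bit per tree level: log2(256) = 8 levels, 16 MSE evaluations."""
    node, index = 0, 0
    for _ in range(depth):
        left, right = 2 * node + 1, 2 * node + 2
        if mse(block, tree[left]) <= mse(block, tree[right]):
            node, bit = left, 0        # closer to the left entry
        else:
            node, bit = right, 1       # closer to the right entry
        index = (index << 1) | bit     # transmit one index bit per level
    return index
```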
• Mathematical Optimizations - In TSVQ, there is a large computational reduction available by mathematically rearranging the computation of the difference between the input vector X and the two code vectors Ca and Cb, originally given by

d = Σᵢ (Xᵢ - Caᵢ)² - Σᵢ (Xᵢ - Cbᵢ)²

• Since a given node in the comparison tree always compares the same two code vectors, the calculation of the errors can be combined under one summation.
• With the quadratics expanded, this yields

d = Σᵢ (Caᵢ² - Cbᵢ²) + Σᵢ 2(Cbᵢ - Caᵢ)·Xᵢ

• The first summation can be precomputed once the codebook is known and stored in a single memory location. The quantities 2(Cbᵢ - Caᵢ) may also be calculated and pre-stored.
• Therefore, at each level of the tree the number of multiplications/additions/subtractions is reduced by almost 50%.
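A minimal sketch of the resulting per-node test using the precomputed constants (the helper names are illustrative); the sign of d tells which child is closer:

```python
# Sketch of the optimized TSVQ node test. For each tree node, once the
# codebook is known, we precompute:
#   const_term = sum(ca_i**2 - cb_i**2)          (one stored scalar)
#   diff2      = [2 * (cb_i - ca_i) for each i]  (16 stored words)

def precompute_node(ca, cb):
    const_term = sum(a * a - b * b for a, b in zip(ca, cb))
    diff2 = [2 * (b - a) for a, b in zip(ca, cb)]
    return const_term, diff2

def closer_to_left(block, const_term, diff2):
    """True if the input block is closer to codevector Ca than to Cb.

    d = MSE(X, Ca) - MSE(X, Cb) = const_term + sum(diff2_i * x_i);
    only 16 multiplies and 16 adds remain, roughly half the original
    per-node work of two full MSE computations.
    """
    d = const_term + sum(w * x for w, x in zip(diff2, block))
    return d <= 0
```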
• The impact of the algorithm selection and the mathematical transformations is summarized in Table 11.4 for a 256-vector codebook.
• Therefore, the Optimized Search is chosen as the preferred algorithm.
Design Example 2: FIR Filter, Algorithmic Optimization
• The algorithmic transformations described in this section represent one of the most powerful and widely applicable classes of optimization techniques.
• We revisit our FIR example to demonstrate their advantages.
• As mentioned before, the throughput required is 3.125 MHz.
• The direct form has 13 additions and 1 multiplication in the
critical path and cannot meet the throughput constraint
below 3 V for the given hardware library.
• We use retiming to reduce the critical path. Figure 11.6
shows the structure of the retimed version.
• The critical path (after retiming) is now reduced to only 1
multiplication and 1 addition operation.
• This allows for a reduction in supply voltage below 3 V while maintaining the same throughput.
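To make the critical-path difference concrete, the sketch below contrasts a direct-form FIR with a transposed form, which is one common way a retimed structure like Figure 11.6 is realized; the coefficients are placeholders, and the mapping to the figure's exact structure is an assumption.

```python
# Sketch contrasting the critical path of a direct-form FIR filter with
# a transposed (retimed) form. The 14 coefficients are placeholders; a
# real low-pass design would use computed tap values.

COEFFS = [0.1] * 14                       # hypothetical 14-tap filter

def fir_direct(x, state):
    """Direct form; state holds the last 14 inputs (init: [0.0] * 14).
    One output needs a chain of 13 additions after the multiplies,
    so the critical path is 1 multiply + 13 additions."""
    state.insert(0, x)
    state.pop()
    return sum(c * s for c, s in zip(COEFFS, state))

def fir_transposed(x, delays):
    """Transposed form; delays holds 13 partial sums (init: [0.0] * 13).
    Each delay register absorbs one partial sum per sample, so the
    critical path is only 1 multiply + 1 addition, permitting a lower
    supply voltage at the same throughput."""
    y = COEFFS[0] * x + delays[0]
    for i in range(1, len(COEFFS) - 1):
        delays[i - 1] = COEFFS[i] * x + delays[i]
    delays[-1] = COEFFS[-1] * x
    return y
```

Both functions compute the same output sequence; only the order in which the partial sums are accumulated (and hence the critical path) differs.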
• The area-energy trade-offs for both versions as the supply voltage varies, as generated by the algorithmic estimation tools, are shown in Figure 11.7.
• The retimed version allows the voltage to be reduced to 1.5 V, thus reducing the power consumption drastically.
• However, the area penalty may be prohibitive.
• The designer can choose the voltage that best suits the design, simultaneously taking into account area, throughput and energy.
THANK YOU
