Embedded Systems - 16CS402: Department of Computer Science and Engineering, Dayananda Sagar University, Bengaluru
COMPONENTS FOR EMBEDDED
PROGRAMS
▪ Consider code for three structures or components that are commonly used in
embedded software: the state machine, the circular buffer, and the queue. State
machines are well suited to reactive systems such as user interfaces; circular buffers
and queues are useful in digital signal processing.
▪ State Machines: The reaction of most systems can be characterized in terms of the
input received and the current state of the system. This leads naturally to a finite-state
machine style of describing the reactive system’s behavior.
▪ The state machine style of programming is also an efficient implementation of such
computations.
▪ Finite-state machines are usually first encountered in the context of hardware design.
Programming
▪ Example 5.1 shows how to write a finite-state machine in a high-level programming
language.
State machine example
[Figure: state diagram of the seat belt controller with states idle, seated, belted, and buzzer. Transitions are labeled input/output, for example seat/timer on, belt/buzzer off, and no belt/timer on.]
COMPONENTS FOR EMBEDDED
PROGRAMS
▪ The controller’s job is to turn on a buzzer if a person sits in a seat and does not fasten
the seat belt within a fixed amount of time.
▪ This system has three inputs and one output. The inputs are a sensor for the seat to
know when a person has sat down, a seat belt sensor that tells when the belt is
fastened, and a timer that goes off when the required time interval has elapsed.
▪ The output is the buzzer. A state diagram describes the seat belt controller’s
behavior.
▪ The idle state is in force when there is no person in the seat. When the person sits
down, the machine goes into the seated state and turns on the timer.
▪ If the timer goes off before the seat belt is fastened, the machine goes into the buzzer
state. If the seat belt goes on first, it enters the belted state. When the person leaves
the seat, the machine goes back to idle.
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
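switch (state) {
case IDLE:   /* seat occupied: start timer, go to SEATED */ break;
case SEATED: /* belt fastened: go to BELTED; timer expired: buzzer on, go to BUZZER */ break;
case BELTED: /* belt released: start timer, go to SEATED; seat empty: go to IDLE */ break;
case BUZZER: /* belt fastened: buzzer off, go to BELTED; seat empty: buzzer off, go to IDLE */ break;
}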
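The seat belt controller described above can be written out as a complete C function. This is a minimal sketch rather than the text of Example 5.1: the record `belt_ctrl`, the function `belt_step`, and the boolean input names are assumptions made for illustration, and the transition from seated back to idle follows the statement that the machine returns to idle when the person leaves the seat.

```c
#include <stdbool.h>

/* States of the seat belt controller. */
enum belt_state { IDLE, SEATED, BELTED, BUZZER };

/* Hypothetical controller record: current state plus the two outputs. */
struct belt_ctrl {
    enum belt_state state;
    bool timer_on;   /* request to run the timer */
    bool buzzer_on;  /* buzzer output */
};

/* Advance the machine by one step given the three sensor inputs. */
void belt_step(struct belt_ctrl *c, bool seat, bool belt, bool timer)
{
    switch (c->state) {
    case IDLE:
        if (seat) { c->state = SEATED; c->timer_on = true; }
        break;
    case SEATED:
        if (!seat)      { c->state = IDLE;   c->timer_on = false; }
        else if (belt)  { c->state = BELTED; c->timer_on = false; }
        else if (timer) { c->state = BUZZER; c->timer_on = false; c->buzzer_on = true; }
        break;
    case BELTED:
        if (!seat)      { c->state = IDLE; }
        else if (!belt) { c->state = SEATED; c->timer_on = true; }
        break;
    case BUZZER:
        if (!seat)      { c->state = IDLE;   c->buzzer_on = false; }
        else if (belt)  { c->state = BELTED; c->buzzer_on = false; }
        break;
    }
}
```

Calling `belt_step` once per sensor sample keeps all of the reactive behavior in one switch statement, which is what makes the state machine style easy to check against the state diagram.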
Stream-Oriented Programming and
Circular Buffers
▪ The data stream style makes sense for data that comes in regularly and must be
processed on the fly.
▪ By contrast, in a batch process we would read in all the samples over a given
interval from a file and then compute the results all at once.
▪ The circular buffer is a data structure that lets us handle streaming data in an
efficient way. Figure 5.1 illustrates how a circular buffer stores a subset of the
data stream.
▪ At each point in time, the algorithm needs a subset of the data stream that
forms a window into the stream.
▪ The window slides with time as we throw out old values no longer needed and
add new values. Since the size of the window does not change, a fixed-size buffer
can hold the current data.
▪ Appearing below are the declarations for the circular buffer and filter coefficients,
assuming that N, the number of taps in the filter, has been previously defined.
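A minimal sketch of such declarations, together with update and filter routines, might look as follows. The value chosen for N, the coefficient values, and the names `circ`, `pos`, `coef`, `circ_update`, `circ_get`, and `fir_step` are all assumptions made for illustration.

```c
#define N 4  /* number of filter taps; assumed value for illustration */

static int circ[N];  /* circular buffer holding the last N samples */
static int pos = 0;  /* index of the most recently written sample */
static const int coef[N] = { 1, 2, 3, 4 };  /* coef[0] weights the newest sample */

/* Overwrite the oldest sample with a new one. */
void circ_update(int x)
{
    pos = (pos + 1) % N;
    circ[pos] = x;
}

/* Return the sample that arrived `age` steps ago (0 = newest, age < N). */
int circ_get(int age)
{
    return circ[(pos - age + N) % N];
}

/* Accept one sample and return the FIR output over the current window. */
int fir_step(int x)
{
    int acc = 0;
    circ_update(x);
    for (int i = 0; i < N; i++)
        acc += coef[i] * circ_get(i);
    return acc;
}
```

Because the index wraps around with the modulo operation, no samples are ever copied when the window slides; only one entry is overwritten per new sample.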
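Signal processing and circular buffer
[Figure 5.1: a data stream x1 through x6 with sliding windows at times t1, t2, and t3, and the circular buffer that holds the current window; new samples x5, x6, and x7 replace x1, x2, and x3.]
Circular buffers
[Figure: a four-entry buffer holding d1 through d4, with the input position at d1 and the use position at d4. After d5 overwrites d1, the use position is at d5 and the input position at d2.]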
▪ Queues are used whenever data may arrive and depart at somewhat
unpredictable times or when variable amounts of data may arrive.
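A queue for such data can be built on a fixed-size array. The following is a minimal sketch under assumed names (`struct queue`, `queue_put`, `queue_get`) and an assumed capacity `Q_SIZE`; it reports a full or empty queue to the caller rather than blocking.

```c
#include <stdbool.h>

#define Q_SIZE 8  /* capacity; assumed value for illustration */

struct queue {
    int data[Q_SIZE];
    int head;   /* index of the next element to remove */
    int count;  /* number of elements currently stored */
};

void queue_init(struct queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Add a value at the tail; returns false if the queue is full. */
bool queue_put(struct queue *q, int v)
{
    if (q->count == Q_SIZE)
        return false;
    q->data[(q->head + q->count) % Q_SIZE] = v;
    q->count++;
    return true;
}

/* Remove the oldest value into *v; returns false if the queue is empty. */
bool queue_get(struct queue *q, int *v)
{
    if (q->count == 0)
        return false;
    *v = q->data[q->head];
    q->head = (q->head + 1) % Q_SIZE;
    q->count--;
    return true;
}
```

Unlike the circular buffer of a filter, which always holds exactly one window of samples, the number of stored elements here varies as data arrives and departs.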
▪ As the name implies, the CDFG has constructs that model both data
operations (arithmetic and other computations) and control operations
(conditionals).
▪ Part of the power of the CDFG comes from its combination of control and
data constructs.
Data flow graph
Original basic block:
x = a + b;
y = c - d;
z = x * y;
y = b + d;
Single-assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
[Figure: data flow graph of the single-assignment form, with inputs a, b, c, d; operator nodes +, -, *, +; and outputs z and y1.]
▪ Partial order: a+b, c-d; b+d, x*y.
▪ Can do pairs of operations in any order.
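Control-data flow graph
[Figure: CDFG node types. A data flow node holds a basic block such as x = a + b; y = c + d. A control node tests cond and branches on T or F, or tests a value and branches on v1 through v4, shown in equivalent forms.]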
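CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
[Figure: the corresponding CDFG. cond1 branches on T to bb1() and on F to bb2(), both of which lead to bb3(); test1 then branches on c1, c2, or c3 to bb4(), bb5(), or bb6().]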
for loop
[Figure: each HLL file is compiled to assembly, and each assembly file is then assembled.]
▪ The assembler takes care of instruction formats and does part of the job of
translating labels into addresses.
▪ However, since the program may be built from many files, the final steps in
determining the addresses of instructions and data are performed by the linker,
which produces an executable binary file.
▪ That file may not necessarily be located in the CPU’s memory, however, unless
the linker happens to create the executable directly in RAM.
▪ The program that brings the program into memory for execution is called a
loader.
▪ The simplest form of the assembler assumes that the starting address of the
assembly language program has been specified by the programmer. The
addresses in such a program are known as absolute addresses.
Multiple-module programs
▪ Major tasks:
▪ generate binary for symbolic instructions;
▪ translate labels into addresses;
▪ handle pseudo-ops (data, etc.).
▪ Pass 1:
▪ generate symbol table
▪ Pass 2:
▪ generate binary instructions
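Pass 1 can be sketched as a walk over the program with a location counter, recording the address of every label so that pass 2 can look it up. This is a minimal sketch under stated assumptions: source lines are already split into an optional label and instruction text, every instruction occupies `INSTR_BYTES` bytes, and the names `src_line`, `symtab`, `pass1`, and `lookup` are hypothetical.

```c
#include <string.h>

#define MAX_SYMS 16
#define INSTR_BYTES 4  /* assumed fixed instruction size */

/* One source line: an optional label (NULL if none) and the instruction text. */
struct src_line { const char *label; const char *text; };

struct symbol { const char *name; unsigned addr; };

static struct symbol symtab[MAX_SYMS];
static int nsyms = 0;

/* Pass 1: advance the location counter over the program and record the
   address of each label. Returns the number of symbols, or -1 on overflow. */
int pass1(const struct src_line *prog, int n, unsigned start)
{
    unsigned plc = start;  /* program location counter */
    nsyms = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            if (nsyms == MAX_SYMS)
                return -1;
            symtab[nsyms].name = prog[i].label;
            symtab[nsyms].addr = plc;
            nsyms++;
        }
        plc += INSTR_BYTES;
    }
    return nsyms;
}

/* Used during pass 2: translate a label into its address.
   Returns 0 on success, -1 if the label is undefined. */
int lookup(const char *name, unsigned *addr)
{
    for (int i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i].name, name) == 0) {
            *addr = symtab[i].addr;
            return 0;
        }
    }
    return -1;
}
```

Two passes are needed because an instruction may refer to a label that is defined later in the file; the table must be complete before any label can be translated.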
Symbol table
module1
module2
module3
Dynamic linking
▪ Compilation flow.
▪ Basic statement translation.
▪ Basic optimizations.
▪ Interpreters and just-in-time compilers.
Compilation
[Figure: compilation flow from HLL through machine-independent optimizations and machine-dependent optimizations to assembly.]
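Statement translation and optimization
Expression: a*b + 5*(c-d)
[Figure: DFG of the expression, with inputs a, b, c, d and the constant 5; node 1 computes a*b, node 2 computes c-d, node 3 multiplies by 5, and node 4 adds the two products.]
Arithmetic expressions, cont’d.
ADR r4,a       ; get address of a
LDR r1,[r4]    ; load a
ADR r4,b       ; get address of b
LDR r2,[r4]    ; load b
MUL r3,r1,r2   ; node 1: a*b
ADR r4,c       ; get address of c
LDR r1,[r4]    ; load c
ADR r4,d       ; get address of d
LDR r5,[r4]    ; load d
SUB r6,r1,r5   ; node 2: c-d
MOV r2,#5      ; MUL takes no immediate operand, so load the constant
MUL r7,r6,r2   ; node 3: 5*(c-d)
ADD r8,r7,r3   ; node 4: a*b + 5*(c-d)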
Control code generation
if (a+b > 0)
    x = 5;
else
    x = 7;
[Figure: CDFG in which the test a+b>0 branches to x=5 or to x=7.]
Control code generation, cont’d.
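ADR r5,a         ; get address of a
LDR r1,[r5]      ; load a
ADR r5,b         ; get address of b
LDR r2,[r5]      ; load b
ADDS r3,r1,r2    ; compute a+b and set the condition flags
BLE label3       ; if a+b <= 0, take the else branch
MOV r3,#5        ; true branch: x = 5
ADR r5,x
STR r3,[r5]
B stmtent        ; skip the else branch
label3 MOV r3,#7 ; false branch: x = 7
ADR r5,x
STR r3,[r5]
stmtent ...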
Procedure linkage
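proc1(int a) {
    proc2(5);
}
[Figure: stack during the call. The stack grows from the proc1 frame, marked by the frame pointer (FP), to the proc2 frame, where the argument 5 is accessed relative to the stack pointer (SP).]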
ARM procedure linkage
[Figure: one-dimensional array layout. a points to a[0]; a[1] = *(a + 1); a[2] follows.]
Two-dimensional arrays
▪ Row-major layout of an array with N rows and M columns:
a[0,0], a[0,1], ..., a[1,0], a[1,1], ...
▪ Element a[i,j] is located at a[i*M+j].
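The index computation i*M+j can be checked with a short helper. This is a minimal sketch; the function name `row_major_index` is an assumption, and the array is taken to be stored row by row with M elements per row.

```c
#include <stddef.h>

/* Offset of element [i][j] in an array with M columns stored row by row. */
size_t row_major_index(size_t i, size_t j, size_t M)
{
    return i * M + j;
}
```

For an array with M = 4 columns, element [1][1] lands at offset 5: one full row of four elements, plus one.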
Structures
struct {
    int field1;
    char field2;
} mystruct;
[Figure: aptr points to the start of the structure; field1 occupies the first 4 bytes, and field2 is accessed at *(aptr+4).]
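The same field access can be reproduced in C with a byte pointer and the standard `offsetof` macro, which reports the offset the compiler actually chose. This is a minimal sketch: the type name `mystruct_t` and the function `read_field2` are assumptions, and the offset equals 4 only on targets where `int` is 4 bytes.

```c
#include <stddef.h>

struct mystruct_t {
    int field1;
    char field2;
};

/* Read field2 through a byte pointer plus the compiler-reported offset. */
char read_field2(const struct mystruct_t *aptr)
{
    const char *base = (const char *)aptr;
    return *(base + offsetof(struct mystruct_t, field2));
}
```

The compiler performs this base-plus-offset computation at compile time, so a field access costs no more than a load from a fixed displacement.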
▪ Constant folding:
▪ 8+1 = 9
▪ Algebraic:
▪ a*b + a*c = a*(b+c)
▪ Strength reduction:
▪ a*2 = a<<1
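Each of these rewrites leaves the result unchanged, which can be checked by writing both forms side by side. This is a minimal sketch with hypothetical function names; unsigned operands are used so that the equalities hold exactly, including on wraparound.

```c
/* Constant folding: the compiler evaluates 8+1 at compile time. */
enum { FOLDED = 8 + 1 };

/* Algebraic simplification: two multiplies and an add ... */
unsigned expanded(unsigned a, unsigned b, unsigned c) { return a * b + a * c; }
/* ... become one add and one multiply. */
unsigned factored(unsigned a, unsigned b, unsigned c) { return a * (b + c); }

/* Strength reduction: a multiply by 2 ... */
unsigned times2_mul(unsigned a) { return a * 2; }
/* ... becomes a cheaper shift. */
unsigned times2_shift(unsigned a) { return a << 1; }
```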
Dead code elimination
▪ Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
▪ Can be eliminated by analysis of control flow, constant folding.
Procedure inlining
z = w + x + y;
Loop transformations
▪ Goals:
▪ reduce loop overhead;
▪ increase opportunities for pipelining;
▪ improve memory system performance.
Loop unrolling
for (i=0; i<2; i++) {
a[i*2] = b[i*2] * c[i*2];
a[i*2+1] = b[i*2+1] * c[i*2+1];
}
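The loop above is the unrolled form of a four-element multiply. A minimal sketch that places the rolled and unrolled versions side by side follows; the function names and the generalization to an even length n are assumptions.

```c
/* Original loop: one multiply per iteration, n loop tests and increments. */
void mul_rolled(int *a, const int *b, const int *c, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by two: half as many loop tests and increments.
   Assumes n is even. */
void mul_unrolled2(int *a, const int *b, const int *c, int n)
{
    for (int i = 0; i < n / 2; i++) {
        a[i*2]     = b[i*2]     * c[i*2];
        a[i*2 + 1] = b[i*2 + 1] * c[i*2 + 1];
    }
}
```

With n = 4 the second function is exactly the loop shown above. Unrolling trades code size for reduced loop overhead, so the best unrolling factor depends on the target.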
Loop fusion and distribution
before after
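As an illustration of the before and after forms, the following minimal sketch shows two loops over the same index range and their fused equivalent. The arrays and operations are assumptions chosen for illustration; fusion is legal here because neither loop reads what the other writes.

```c
/* Before fusion (equivalently, after distribution): two separate loops. */
void loops_distributed(int *a, int *w, const int *b, const int *c,
                       const int *d, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * 5;
    for (int j = 0; j < n; j++)
        w[j] = c[j] * d[j];
}

/* After fusion: one loop does both bodies, halving the loop overhead. */
void loops_fused(int *a, int *w, const int *b, const int *c,
                 const int *d, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
}
```

Distribution is the reverse transformation: splitting one loop into two can help when the separate loops pipeline better or fit the cache better.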
Register allocation
▪ Goals:
▪ choose register to hold each variable;
▪ determine lifespan of variable in the register.
w = a + b; /* t=1 */
x = c + w; /* t=2 */
y = c + d; /* t=3 */
[Figure: lifetime chart of the variables a, b, c, d, w, x, and y over times 1, 2, and 3.]
Instruction scheduling
[Figure: expression templates for code generation: a multiply node (MUL), an add node (ADD), and a multiply feeding an add (MADD).]
Using your compiler
▪ Need to understand performance in detail:
▪ Real-time behavior, not just typical.
▪ On complex platforms.
▪ Program performance ≠ CPU performance:
▪ Pipeline, cache are windows into program.
▪ We must analyze the entire program.
Complexities of program performance
▪ Cache effects.
▪ Instruction-level performance variations:
▪ Pipeline interlocks.
▪ Fetch times.
How to measure program performance
if (a || b) { /* T1 */
if ( c ) /* T2 */
y = r-t; /* A4 */
}
Path table (columns a, b, c, path): 0 0 0: T1=F, T3=F: no assignments
Paths in a loop
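[Figure: flowchart of the loop. A test comparing i with N exits on N and continues on Y to the body f = f + c[i] * x[i], followed by the increment i = i + 1.]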
Instruction timing
▪ Trace-driven:
▪ Instrument the program.
▪ Save information about the path.
i = i+1;
Induction variable elimination
▪ Rather than recompute i*M+j for each array in each iteration, share
induction variable between arrays, increment at end of loop body.
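The transformation can be sketched on a simple array copy. This is a minimal illustration under assumed names (`copy_indexed`, `copy_induction`, and the shared index `k`); both arrays are stored row by row with `cols` elements per row.

```c
/* Before: i*cols + j is recomputed for each array in each iteration. */
void copy_indexed(int *z, const int *b, int rows, int cols)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            z[i*cols + j] = b[i*cols + j];
}

/* After: one induction variable is shared between the two arrays and
   incremented at the end of the loop body. */
void copy_induction(int *z, const int *b, int rows, int cols)
{
    int k = 0;  /* always equal to i*cols + j */
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            z[k] = b[k];
            k++;
        }
    }
}
```

The second version replaces a multiply and an add per array access with a single increment per iteration.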
Cache analysis
[Figure: memory and cache layout for the arrays, starting at a[0,0]; the addresses 1024 and 4099 are marked.]
▪ Array elements conflict because they are in the same line, even if not
mapped to same location.
▪ Solutions:
▪ move one array;
▪ pad array.
Performance optimization hints
while (TRUE)
a();
Sources of energy consumption
▪ First-order optimization:
▪ high performance = low energy.
▪ General rules:
▪ Don’t use function calls.
▪ Keep loop body small to enable local repeat (only forward branches).
▪ Use unsigned integer for loop counter.
▪ Use <= to test loop counter.
▪ Make use of the compiler: global optimization, software pipelining.
Single-instruction repeat loop example
STM #4000h,AR2 ; load pointer to source
STM #100h,AR3 ; load pointer to destination
RPT #(1024-1)
MVDD *AR2+,*AR3+ ; move
Optimizing for program size
▪ Goal:
▪ reduce hardware cost of memory;
▪ reduce power consumption of memory units.
▪ Two opportunities:
▪ data;
▪ instructions.
Data size minimization
▪ Possible criteria:
▪ Execute every statement at least once.
▪ Execute every branch direction at least once.
▪ Equivalent for structured programs.
▪ Not true for gotos.
Basis paths
▪ Approximate CDFG with undirected graph.
▪ Undirected graphs have basis paths:
▪ All paths are linear combinations of basis paths.
Cyclomatic complexity
▪ Cyclomatic complexity is a bound on the size of basis sets:
▪ e = # edges
▪ n = # nodes
▪ p = number of graph components
▪ M = e – n + 2p.
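The formula is simple enough to evaluate directly. The following minimal sketch uses a hypothetical function name and takes the three graph counts as plain integers.

```c
/* Cyclomatic complexity M = e - n + 2p for a graph with e edges,
   n nodes, and p connected components. */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}
```

As a worked example, a single if-else has four nodes (test, then-branch, else-branch, join) and four edges in one component, so M = 4 - 4 + 2 = 2, matching its two independent paths. Straight-line code with two nodes and one edge gives M = 1.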
Branch testing
▪ Correct:
▪ if (a || (b >= c)) { printf("OK\n"); }
▪ Incorrect:
▪ if (a && (b >= c)) { printf("OK\n"); }
▪ Test:
▪ a = F
▪ (b >= c) = T
▪ Example:
▪ Correct: [0 || (3 >= 2)] = T
▪ Incorrect: [0 && (3 >= 2)] = F
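The point of the example is that a test input exposes the bug only when the correct and incorrect conditions evaluate differently on it. A minimal sketch of that check follows; the function names are assumptions.

```c
#include <stdbool.h>

/* The intended condition and the faulty one from the example. */
bool branch_correct(bool a, int b, int c)   { return a || (b >= c); }
bool branch_incorrect(bool a, int b, int c) { return a && (b >= c); }

/* A test input reveals the bug only if the two conditions disagree on it. */
bool test_reveals_bug(bool a, int b, int c)
{
    return branch_correct(a, b, c) != branch_incorrect(a, b, c);
}
```

With a = 0, b = 3, c = 2 the two conditions disagree, so the test catches the error. With a = 1 and b >= c both evaluate to true, and that input would let the bug pass unnoticed.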
Another branch testing example
▪ Variable def-use:
▪ Def when value is assigned (defined).
▪ Use when used on right-hand side.
▪ Exercise each def-use pair.
▪ Requires testing correct path.
Loop testing
▪ Random tests.
▪ May weight distribution based on software specification.
▪ Regression tests.
▪ Tests of previous versions, bugs, etc.
▪ May be clear-box tests of previous versions.
How much testing is enough?
▪ Software modem.
Theory of operation
▪ Frequency-shift keying:
▪ separate frequencies for 0 and 1.
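[Figure: FSK waveform over time, with one frequency for a 0 bit and another for a 1 bit.]
FSK encoding
[Figure: the bit stream 0110101 drives a bit-controlled waveform generator.]
FSK decoding
[Figure: the A/D converter output feeds a zero filter followed by a detector, which produces the 0 bit.]
[Figure: class diagram. Line-in* connects to Receiver (sample-in(), input(), bit-out()); Transmitter (bit-in(), output(), sample-out()) connects to Line-out*.]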
System architecture
▪ Transmitter.
▪ Receiver.
Transmitter
time
Receiver
▪ CPU.
▪ A/D converter.
▪ D/A converter.
▪ Timer.
Component design and testing