Writing Optimized C Code For Microcontroller Applications
By Wilson Chan
Toshiba America Electronics Components, Inc.
Email: wilson.chan@taec.toshiba.com
INTRODUCTION
If you have a microcontroller project that requires only a small program, or the
application has very limited memory resources, you may prefer to program in assembly
language. As the performance of microcontrollers has improved, however, application
systems have become larger and more complicated, and such programs can no longer be
coded easily in assembly language. To improve development efficiency, many
microcontroller-based products are now programmed in C. Generally, code compiled
from C is less efficient than an equivalent hand-written assembly language program.
To narrow this gap, most C compilers apply optimization techniques, and the output
code is often optimized for size, for execution speed, or for both. Besides relying
on the C compiler to generate efficient code, the programmer can lend it a helping
hand by adopting certain programming styles. This paper provides an overview of
common optimizing techniques used by C compilers and recommends C programming
guidelines that will result in optimized code for microcontroller applications.
PROGRAMMING MODEL
Some microcontrollers do not have hardware support for a C stack. If you plan to
develop your embedded applications in C, you should select a microcontroller with a
stack-based architecture. If the microcontroller has dedicated address-specifying/index
registers, they will also help the C compiler to generate more efficient code.
In this paper, we'll use a C compiler for a microcontroller with the programming
model shown in Figure 1 to illustrate the effect of various optimization methods on
the quality of the generated machine code. The W, A, B, C, D, E, H and L registers are
8-bit general-purpose registers. They can be used in pairs as four 16-bit
general-purpose registers: WA, BC, DE and HL. The IX and IY registers are
special-purpose 16-bit registers used as address-specifying registers under register
indirect addressing mode and as index registers under index addressing mode. The SP
register is a 16-bit stack pointer. The PC register is a 16-bit program counter. The
PSW is a 16-bit program status word register. JF is the jump status flag, ZF is the
zero flag, CF is the carry flag, HF is the half carry flag, SF is the sign flag, and
VF is the overflow flag.
Figure 1 Programming Model

    General-Purpose (8-bit x 8):    W   A
                                    B   C
                                    D   E
                                    H   L
    Special-Purpose (16-bit):       IX, IY, SP, PC, PSW
Without Optimization
_test:
    ld      WA,0x1
    ld      BC,0x2
    add     WA,BC
    ld      (_i),WA
    ld      WA,0x1
    ld      IY,_a
    ld      (IY),WA
    inc     WA
    inc     WA
    ld      IX,_b
    ld      (IX),WA
    ld      WA,(IY)
    add     WA,(IX)
    ld      (_c),WA
    ret
With Optimization
_test:
    ld      WA,0x3
    ld      (_i),WA
    ld      WA,0x1
    ld      (_a),WA
    ld      WA,0x3
    ld      (_b),WA
    ld      WA,0x4
    ld      (_c),WA
    ret
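The constants stored by the optimized listing (3, 1, 3 and 4) are consistent with a source function of the following shape, in which every right-hand side can be evaluated at compile time. This is only a sketch under that assumption, not necessarily the program the listings were compiled from; the function name test_fold is hypothetical.

```c
/* Globals matching the symbols _i, _a, _b and _c in the listings. */
int i, a, b, c;

/* Hypothetical source: every value is known at compile time, so an
 * optimizing compiler can store the constants 3, 1, 3 and 4 directly. */
void test_fold(void)
{
    i = 1 + 2;   /* folds to 3 */
    a = 1;
    b = a + 2;   /* a is known to be 1, so this folds to 3 */
    c = a + b;   /* 1 + 3 folds to 4 */
}
```

Writing the constants out by hand gives the same result on any compiler, at some cost in readability.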
Dead-Code Elimination
This optimization method deletes code whose results are never used, such as
assignments to variables that are never read. Consider the C program example in
Figure 3. With dead-code elimination, the C compiler eliminates the C statement in
line 3.
C Language Program
    1 int test(){
    2     int a;
    3     a = 1;
    4     return 0;
    5 }
Without Optimization
_test:
    ld      WA,0x1
    xor     WA,WA
    ret
With Optimization
_test:
    xor     WA,WA
    ret
Strength Reduction
This optimization method replaces expensive operations with less expensive ones.
Consider the C program example in Figure 4. Multiplying by two is done most
efficiently with a left shift rather than an integer multiplication. Without
optimization, the generated code calls a multiplication function supplied by the C
run-time library, which takes much longer than a left-shift operation.
C Language Program
    int i;
    test() {
        i *= 2;
    }
Without Optimization
_test:
    ld      BC,0x2
    ld      WA,(_i)
    cal     C87C_muli
    ld      (_i),WA
    ret
With Optimization
_test:
    ld      IY,_i
    ld      WA,(IY)
    shlca   WA
    ld      (IY),WA
    ret
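Where a compiler does not perform this reduction, it can be applied by hand. The sketch below is illustrative only: the function names are hypothetical, and unsigned 16-bit operands are assumed so that the shift and the multiplication are exactly equivalent, including on overflow.

```c
#include <stdint.h>

/* Hypothetical helper: multiply by two with the multiplication operator. */
uint16_t times_two_mul(uint16_t x)
{
    return (uint16_t)(x * 2u);
}

/* Hypothetical helper: the same result with a single one-bit left shift. */
uint16_t times_two_shift(uint16_t x)
{
    return (uint16_t)(x << 1);
}
```

Both functions truncate the result to 16 bits, so they agree for every input value.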
C Language Program
    int test(int *a, int *b, int i)
    {
        return(a[i+1] + b[i+1]);
    }
Without Optimization
_test:
    ld      WA,(SP+0x7)
    inc     WA
    shlca   WA
    ld      IX,WA
    add     IX,(SP+0x3)
    ld      WA,(SP+0x7)
    inc     WA
    shlca   WA
    ld      DE,WA
    add     DE,(SP+0x5)
    ld      WA,(DE)
    add     WA,(IX)
    ret
With Optimization
_test:
    ld      WA,(SP+0x7)
    inc     WA
    shlca   WA
    ld      IX,WA
    add     IX,(SP+0x3)
    ld      DE,WA
    add     DE,(SP+0x5)
    ld      WA,(DE)
    add     WA,(IX)
    ret
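In the optimized listing the scaled index (i + 1) is computed once and reused for both arrays. The same effect can be obtained in the source by naming the shared subexpression. The following is a minimal sketch; the function names sum_next_naive and sum_next_cse are hypothetical.

```c
/* Straightforward form: the index i + 1 is written twice. */
int sum_next_naive(const int *a, const int *b, int i)
{
    return a[i + 1] + b[i + 1];
}

/* Hand-applied form: the shared index is computed once and reused. */
int sum_next_cse(const int *a, const int *b, int i)
{
    int k = i + 1;
    return a[k] + b[k];
}
```

A compiler that already gathers common subexpressions should generate the same code for both forms, so the rewrite mainly helps when optimization is weak or disabled.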
Code Motion
This optimization method is often used to optimize loops. Generally speaking,
most of the program execution time is spent in loops, so it is important for C
compilers to optimize them. Consider the C program example in Figure 6. First, the
loop-invariant operation, b + c, is moved outside of the loop. Second, array address
calculations that use an induction variable (one updated at each iteration) are
reduced to incrementing an accumulator. The optimized code is not only smaller
(26 bytes versus 39 bytes), but also executes much faster.
C Language Program
    int a[10], b, c;
    test(){
        int i;
        for (i = 0; i < 10; i++)
            a[i] = b + c;
    }
Without Optimization
_test:
    ld      BC,0x0
    cmp     BC,0xa
    j       sge,L1
L2:
    ld      WA,BC
    shlca   WA
    ld      DE,WA
    add     DE,_a
    ld      WA,(_b)
    add     WA,(_c)
    ld      (DE),WA
    inc     BC
    cmp     BC,0xa
    j       slt,L2
L1:
    ret
With Optimization
_test:
    ld      BC,(_b)
    add     BC,(_c)
    ld      WA,_a
    ld      DE,WA
    add     WA,0x14
L2:
    ld      (DE),BC
    inc     DE
    inc     DE
    cmp     DE,WA
    j       lt,L2
    ret
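The two transformations visible in the optimized listing, hoisting b + c and replacing the indexed store with a moving pointer compared against an end address, can also be written by hand. The sketch below assumes the same globals as the example; the function name test_hoisted and the macro N are hypothetical.

```c
#define N 10

int a[N], b, c;

/* Hand-optimized variant of the loop: the invariant sum is computed once
 * and the array is walked with a pointer instead of an index. */
void test_hoisted(void)
{
    int sum = b + c;      /* loop-invariant value, evaluated before the loop */
    int *p = a;           /* plays the role of DE in the listing */
    int *end = a + N;     /* plays the role of WA (_a + 0x14) in the listing */

    while (p < end)
        *p++ = sum;
}
```

This form removes the per-iteration address calculation even when loop optimization is not enabled, at the cost of a slightly less obvious loop.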
Without Optimization
_test:
    ld      WA,(_a)
    cmp     WA,0x1
    j       t,L1
    ld      WA,(_a)
    ret
L1:
    ld      WA,(_a)
    dec     WA
    ret
With Optimization
_test:
    ld      WA,(_a)
    cmp     WA,0x1
    j       t,L1
    ret
L1:
    dec     WA
    ret
C Language Program 1
    unsigned char j;
    test1(unsigned char i) {
        switch(i) {
        case 1:
            j = 1;
            break;
        case 2:
            j = 2;
            break;
        case 3:
            j = 3;
            break;
        default:
            break;
        }
    }
C Language Program 2
    unsigned char j;
    test2(unsigned char i) {
        switch(i) {
        case 1:
            j = 1;
            break;
        case 2:
            j = 2;
            break;
        case 3:
            j = 3;
            break;
        case 4:
            j = 4;
            break;
        default:
            break;
        }
    }
With Optimization (C Language Program 1)
_test1:
    ld      A,(SP+0x3)
    ld      W,0x0
    cmp     WA,0x3
    j       t,L10
    cmp     WA,0x2
    j       t,L9
    cmp     WA,0x1
    j       f,L6
    ld      (_j),0x1
    ret
L9:
    ld      (_j),0x2
    ret
L10:
    ld      (_j),0x3
L6:
    ret
With Optimization (C Language Program 2)
S50000:
    db      1
    db      2
    db      3
    db      4
_test2:
    ld      A,(SP+0x3)
    ld      W,0x0
    dec     WA
    cmp     WA,0x3
    j       gt,L14
    ld      IX,S50000
    add     IX,WA
    ld      A,(IX)
    ld      (_j),A
L14:
    ret
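In the listing for test2, the four consecutive case values let the compiler replace the chain of compares and branches with a range check and a lookup in the data table S50000. The same structure can be written directly in C. This is a sketch under that reading of the listing; the names test2_table and table are hypothetical.

```c
unsigned char j;

/* Values assigned by cases 1..4, in case order (mirrors the db table). */
static const unsigned char table[4] = { 1, 2, 3, 4 };

/* Table-driven equivalent of test2(): one range check, one indexed load. */
void test2_table(unsigned char i)
{
    unsigned char idx = (unsigned char)(i - 1);   /* i == 0 wraps to 255 */

    if (idx < 4)          /* anything outside 1..4 is the default case */
        j = table[idx];
}
```

The table form costs the same regardless of which case is taken, whereas a compare chain gets slower for later cases.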
With Optimization
_test:
    ld      IY,_a
    clr     (IY).0
    set     (IY).2
    ret
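Single-bit clr and set instructions of this kind are what a compiler can emit for mask operations that touch exactly one bit. The fragment below is one possible C source that clears bit 0 and sets bit 2 of a global; it is an assumption offered for illustration, and the function name test_bits and the 8-bit type of a are hypothetical.

```c
/* Assumed 8-bit global, matching the symbol _a in the listing. */
unsigned char a;

/* Hypothetical source: clear bit 0 and set bit 2 of a. */
void test_bits(void)
{
    a &= (unsigned char)~0x01u;   /* candidate for: clr (IY).0 */
    a |= 0x04u;                   /* candidate for: set (IY).2 */
}
```

Masks with a single bit set give the compiler the best chance of using bit instructions instead of a load, logical operation and store.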
Level   Function
0       Minimum optimization (default)
            Stack release absorption. Branch instruction optimization.
            Deletion of unnecessary instructions.
1       Basic block optimization
            Propagation of copying restricted ranges.
            Gathering of common partial expressions in restricted ranges.
2       Optimization of more than basic blocks
            Propagation of copying whole functions.
            Gathering of common partial expressions of whole functions.
3       Maximum optimization
            Loop optimization and other miscellaneous optimization.
Figure 12 C Compiler's Optimization Options Example
The C compiler may have an option that minimizes program size. Use it if it is available.
A possible side effect of this option is that in certain situations, the resultant code may be
smaller in terms of number of bytes, but it may execute slower than the code generated
without specifying the option.
Format        -XS
Function      Specifies the output of minimum object code size.
Description   When this option is specified, part of optimization is skipped. The
              default, when this option is not specified, is the output of code
              with execution speed priority.
Figure 13 C Compiler Code Size Optimization Example
Example:
    int a0 = 0, a1;                    /* default memory area */
    int __tiny at0 = 0, __tiny at1;    /* to tiny area */
    void fcn(void) {
        a1 = a0;
        at1 = at0;
    }

    Opcode
                    _fcn:
                    ; 16-bit address offset
    E1000048 R          ld      WA,(_a0)
    F1000068 R          ld      (_a1),WA
                    ; 8-bit address offset
    E00048   R          ld      WA,(_at0)
    F00068   R          ld      (_at1),WA
    FA                  ret
C Language Program 1, With Optimization
_test:
    ld      (_j),0x0
L2:
    ld      IY,_a
    ld      DE,(IY)
    ld      WA,(IY)
    inc     WA
    ld      (IY),WA
    ld      (DE),0x0
    inc     (_j)
    cmp     (_j),0x64
    j       lt,L2
    ret
C Language Program 2, With Optimization
_test:
    ld      BC,(_a)
    xor     A,A
L6:
    ld      DE,BC
    inc     BC
    ld      (DE),0x0
    inc     A
    cmp     A,0x64
    j       lt,L6
    ret
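The first listing reloads and stores the global pointer _a and the global counter _j in memory on every iteration, while the second keeps both values in registers. Source code of the following shape would be consistent with that difference. It is a sketch under that assumption only: the function names, the element type and the exact loop form are hypothetical.

```c
unsigned char j;       /* global loop counter, as in the first listing */
unsigned char *a;      /* global pointer to the buffer being cleared */

/* Program 1 style: the counter and the pointer are globals, so every
 * iteration reads and writes them in memory. */
void clear_global(void)
{
    for (j = 0; j < 100; j++)
        *a++ = 0;
}

/* Program 2 style: local copies can live in registers for the whole loop.
 * Note that the global pointer a is not advanced in this variant. */
void clear_local(void)
{
    unsigned char *p = a;
    unsigned char i;

    for (i = 0; i < 100; i++)
        *p++ = 0;
}
```

If the caller relies on the updated pointer or counter, the local copies have to be written back after the loop, which is still cheaper than updating the globals on every pass.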
Example:
    struct field {
        unsigned char a:1;
        unsigned char b:3;
        unsigned char c:3;
        unsigned char d:1;
    };
    struct field array[10];
    void fcn( ) {
        unsigned char i;
        for (i=0; i < 10; i++) {
            array[i].b = 5;
            array[i].c = 7;
        }
    }

_fcn:
    ld      IY,_array
    ld      IX,IY
    ld      BC,IY
    add     IY,0xa
L4:
    ld      DE,BC
    ld      A,(DE)
    and     A,0x8f
    or      A,0x50
    ld      (DE),A
    or      (IX),0xe
    inc     BC
    inc     IX
    cmp     IX,IY
    j       lt,L4
    ret
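The masks in the listing show what each bit field assignment costs: writing 5 to b needs a read, an AND with 0x8f to clear the field and an OR with 0x50, whereas writing 7 to c needs only an OR with 0x0e because all three bits are set. The sketch below performs the same operations on a plain byte. The bit layout (a in bit 7, b in bits 6 to 4, c in bits 3 to 1, d in bit 0) is inferred from those masks and is specific to this compiler; the function name set_b5_c7 is hypothetical.

```c
#include <stdint.h>

/* Mask-based equivalent of "x.b = 5; x.c = 7;" for the inferred layout:
 * a = bit 7, b = bits 6..4, c = bits 3..1, d = bit 0. */
uint8_t set_b5_c7(uint8_t byte)
{
    byte = (uint8_t)((byte & 0x8Fu) | 0x50u);   /* clear b, then store 5 in it */
    byte |= 0x0Eu;                              /* c = 7: all bits set, no clear needed */
    return byte;
}
```

Bits a and d pass through unchanged, which is why the field cannot simply be overwritten with a single store.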
Example:
    char a_src[41] = {"Hello"};
    char a_des[41];
    void testcpy(void) {
        register char *p_src = a_src;
        register char *p_des = a_des;
        while (*p_src)
            *p_des++ = *p_src++;
        *p_des = '\0';
    }

_testcpy:
    push    HL
    ld      IY,_a_src
    ld      IX,_a_des
    j       L3
L2:
    ld      DE,IX
    inc     IX
    ld      HL,IY
    inc     IY
    ld      A,(HL)
    ld      (DE),A
L3:
    cmp     (IY),0x0
    j       f,L2
    ld      (IX),0x0
    pop     HL
    ret
Example:
    int g1, g2, g3, g4, sum;
    int __adecl RegParm( int p1,
                         int p2, int p3, int p4) {
        g1 = p1;
        g2 = p2;
        g3 = p3;
        g4 = p4;
        return(p1+p2);
    }
    void test( ) {
        sum = RegParm(2, -2, 88, 88);
    }

.RegParm:
    ld      (_g1),WA
    ld      (_g2),BC
    ld      (_g3),DE
    ld      DE,(SP+0x3)
    ld      (_g4),DE
    add     WA,BC
    pop     DE
    pop     BC
    j       DE
_test:
    ld      WA,0xffa8
    push    WA
    ld      DE,0x58
    ld      WA,0x2
    ld      BC,0xfffe
    cal     .RegParm
    ld      (_sum),WA
    ret
Example:
    float array4[10], vf;
    fcn1( ) {
        unsigned char i;
        for (i=0; i < 10; i++)
            array4[i] = vf;
    }

_fcn1:
    push    DE
    push    HL
    ld      (SP+0x4),0x0
L2:
    ld      A,(SP+0x4)
    ld      W,0x0
    shlca   WA
    shlca   WA
    ld      BC,_array4
    ld      HL,WA
    add     HL,BC
    ld      BC,_vf
    ld      WA,HL
    cal     ._fld_ff
    inc     (SP+0x4)
    cmp     (SP+0x4),0xa
    j       lt,L2
    pop     HL
    pop     DE
    ret
Example:
    char c1;
    int i1, i2, i3;
    fcn1( ) {
        i1 = c1 + i3;
    }
    fcn2( ) {
        i1 = i2 + i3;
    }

_fcn1:
    ld      A,(_c1)
    test    A.7
    subb    W,W
    add     WA,(_i3)
    ld      (_i1),WA
    ret
_fcn2:
    ld      WA,(_i2)
    add     WA,(_i3)
    ld      (_i1),WA
    ret
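The two extra instructions in _fcn1 (test A.7 followed by subb W,W) appear to widen the signed 8-bit value to 16 bits before the addition: the upper byte becomes 0xFF when bit 7 of the lower byte is set and 0x00 otherwise. The following sketch spells out that sign extension on plain integers; the function name sign_extend8 is hypothetical and the reading of the two instructions is an assumption.

```c
#include <stdint.h>

/* Widen an 8-bit two's complement value to a 16-bit bit pattern by
 * copying bit 7 into every bit of the upper byte. */
uint16_t sign_extend8(uint8_t low)
{
    uint8_t high = (low & 0x80u) ? 0xFFu : 0x00u;   /* upper byte from the sign bit */

    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

Mixing char and int operands therefore costs a widening step on every use, which _fcn2 avoids by working with int throughout.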
Example:
    struct s1 {
        char *text;
        int count;
    };
    extern struct s1 ays1[5];
    int sum;

_SumCount:
    ld      IX,(SP+0x3)
    xor     IY,IY
    ld      WA,IY
    ld      DE,(SP+0x5)
    cmp     DE,0x0
    j       sle,L1
L2:
    ld      BC,(IX+0x2)
    add     WA,BC
    inc     IY
    add     IX,0x4
    cmp     IY,DE
    j       slt,L2
L1:
    ret
_test3:
    ld      WA,0x5
    push    WA
    ld      WA,_ays1
    push    WA
    cal     _SumCount
    ld      SP,SP+0x4
    ld      (_sum),WA
    ret
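In these listings _test3 pushes only two words, an element count and the address of ays1, so the array of structures is passed by address rather than copied onto the stack. A C function of the following shape would be consistent with the _SumCount listing. It is a sketch under that assumption; the parameter names and the body are hypothetical.

```c
struct s1 {
    char *text;
    int count;
};

/* Hypothetical reconstruction: add up the count members of n elements,
 * reached through a pointer to the first element. */
int SumCount(struct s1 *p, int n)
{
    int total = 0;
    int k;

    for (k = 0; k < n; k++) {
        total += p->count;   /* the listing reads the member at offset 2 */
        p++;                 /* the listing advances IX by one element (4 bytes there) */
    }
    return total;
}
```

A caller would then write sum = SumCount(ays1, 5), which matches the two arguments pushed by _test3.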
CONCLUSION
We have described some of the commonly used optimizing methods of C
compilers. Not all C compilers are created equal. Therefore, it is important to choose a
C compiler that incorporates good optimizing methods in generating code for your
particular microcontroller. We have also discussed a set of C programming guidelines
that will help improve code efficiency of your microcontroller applications.