M16C Optimization Options
Optimization Levels [-O1] to [-O5]
1. Introduction
This document explores the optimization levels [-O1] to [-O5]. These compiler options optimize the
source code to reduce ROM size and increase execution speed. They can be found in HEW
under the Build -> Renesas M16C Standard Toolchain menu, as shown in Figure 1.
Figure 1 Optimization Level Menu
These options optimize the code for speed and ROM size. Open HEW and click Build -> Renesas M16C
Standard Toolchain -> C -> Category. Select Optimize from the drop-down menu to open the
optimization dialog box, then select one of the following optimization levels from the
Optimization level drop-down menu.
[-O1] Makes -O3, -ONB, -ONBSD, -ONFCF and -ONS valid
[-O2] Same as -O1
[-O3] Optimizes speed and ROM size to the maximum
[-O4] Makes -O3 and -Oconst valid
[-O5] Effects the best possible optimization
2. Optimization level [-O3]
Optimization level [-O3] includes [-O1] and [-O2] levels.
Select [-O3] option from the Optimization level drop down menu.
The [-O3] option performs the following:
V1.04
2.1 Remove meaningless comparison statements
In the example below, ucComparison is initialized to 5 immediately before it is tested, so the
result of the “if” comparison is known at compile time and the statement is meaningless. With
optimization enabled, the assembly code for this “if” statement is removed: the code size is
reduced by 16 bytes and execution is faster by 26 machine cycles.
Code before optimization
(Size: 22 bytes, Speed: 35 Machine cycles *1)
void main(void){
    F0028 7CF202  _main  ENTER #02H
volatile uchar8 ucComparison = 5;
    F002B C605FF  MOV.B #05H,-1H[FB]
volatile uchar8 ucCounter = 0;
    F002E B6FE    MOV.B #0,-2H[FB]
if(ucComparison == 5)
    F0030 0AFF    MOV.B -1H[FB],R0L
    F0032 B3      MOV.B #0,R0H
    F0033 D150    CMP.W #5H,R0
    F0035 6E04    JNE F0022H
ucCounter++;
    F0037 A6FE    INC.B -2H[FB]
    F0039 61      JMP.S F0024H
else
ucCounter = 0;
    F003A B6FE    MOV.B #0,-2H[FB]
}
    F003C 7DF2    EXITD

Code after optimization
(Size: 6 bytes, Speed: 9 Machine cycles *1)
void main(void){
    F0028 C405    _main  MOV.B #05h,R0L
volatile uchar8 ucComparison = 5;
volatile uchar8 ucCounter = 0;
    F002A B4      MOV.B #0,R0L
if(ucComparison == 5)
ucCounter++;
    F002B C401    MOV.B #01H,R0L
else
ucCounter = 0;
}
    F002D F3      RTS
Example 1 Remove Meaningless Comparison Statements
*1: Code execution speed is calculated from the information provided in the software manual
“M16C/60, M16C/20, M16C/Tiny Series Rev. 4.00, Revision date: Jan 21, 2004”.
Note:
Optimization option [-OR] also performs the same optimization.
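The folding in Example 1 can be modelled on a host compiler. The sketch below is an illustration only: the function names are hypothetical, `unsigned char` stands in for `uchar8` (assumed here to be an 8-bit unsigned type, since its definition is not shown), and the counter is returned so that the result can be observed.

```c
#include <assert.h>

/* Written form: the value being compared is known at compile time. */
unsigned char counter_as_written(void)
{
    unsigned char ucComparison = 5;
    unsigned char ucCounter = 0;

    if (ucComparison == 5)
        ucCounter++;
    else
        ucCounter = 0;
    return ucCounter;
}

/* Folded form: what remains once the always-true test is removed. */
unsigned char counter_as_folded(void)
{
    return 1;
}

/* The folding is only valid if both forms give the same result. */
void check_folding(void)
{
    assert(counter_as_written() == counter_as_folded());
}
```

Calling `check_folding()` from a test harness confirms that removing the comparison does not change the observable result.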
2.2 Remove dead code
In the example below, the C code only modifies local variables whose values are never used, and it
does not use any microcontroller-specific feature or function. Hence, it is dead code. After
optimization, no assembly code is generated for this C code, reducing the code size by 21 bytes
and making execution faster by 29 machine cycles.
Code before optimization
(Size: 22 bytes, Speed: 35 Machine cycles)
void main(void){
    F0028 7CF202  _main  ENTER #02H
uchar8 ucComparison = 5;
    F002B C605FF  MOV.B #05H,-1H[FB]
uchar8 ucCounter = 0;
    F002E B6FE    MOV.B #0,-2H[FB]
if(ucComparison == 5)
    F0030 0AFF    MOV.B -1H[FB],R0L
    F0032 B3      MOV.B #0,R0H
    F0033 D150    CMP.W #5H,R0
    F0035 6E04    JNE F0022H
ucCounter++;
    F0037 A6FE    INC.B -2H[FB]
    F0039 61      JMP.S F0024H
else
ucCounter = 0;
    F003A B6FE    MOV.B #0,-2H[FB]
}
    F003C 7DF2    EXITD

Code after optimization
(Size: 1 byte, Speed: 6 Machine cycles)
void main(void){
uchar8 ucComparison = 5;
uchar8 ucCounter = 0;
if(ucComparison == 5)
ucCounter++;
else
ucCounter = 0;
}
    F0028 F3      main  RTS
    F0029 04      NOP
Example 2 Remove Dead Code
Note:
Optimization option [-OR] also performs the same optimization.
To suppress dead code optimization, use the type qualifier “volatile”. As shown in the example
below, using “volatile” will prevent removal of the code.
Code before using volatile
(Size: 1 byte, Speed: 6 Machine cycles)
void main(void){
uchar8 ucData;
ucData = 5;
}
    F0028 F3      main  RTS
    F0029 04      NOP

Code after using volatile
(Size: 5 bytes, Speed: 11 Machine cycles)
void main(void){
volatile uchar8 ucData;
ucData = 5;
    F002B C605FF  MOV.B #05H,-1H[FB]
}
    F002E 7DF2    EXITD
Example 3 Suppression of Dead Code Optimization using Volatile
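The contrast in Example 3 can be sketched for a host compiler as follows. The function names are hypothetical and `unsigned char` again stands in for `uchar8`. Whether a particular compiler really drops the plain store depends on its optimization settings; the sketch only shows where the language permits it.

```c
#include <assert.h>

/* Plain local: the store has no observable effect, so an optimizer may drop it. */
void store_plain(void)
{
    unsigned char ucData;
    ucData = 5;
    (void)ucData;   /* silences "set but not used" warnings */
}

/* volatile local: every access counts as a side effect, so the store must stay. */
unsigned char store_volatile(void)
{
    volatile unsigned char ucData;
    ucData = 5;
    return ucData;  /* reads the value back from the object */
}

/* Both functions behave the same from the caller's point of view. */
void check_store(void)
{
    store_plain();
    assert(store_volatile() == 5);
}
```

The two functions differ only in the generated code, which is why the size and speed figures in Example 3 change while the program's behaviour does not.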
2.3 Allocate CPU Registers to variables
As shown in Example 1, before optimization the compiler stores the variable ucComparison in
the stack area. After optimization, the compiler keeps ucComparison in the CPU register R0L.
Hence, the code size is reduced by 16 bytes and execution is faster by 26 machine cycles.
Note:
The compiler will not allocate a CPU register to a variable if the compiler does not have enough
free CPU registers.
2.4 Grouping of bit manipulation
In this optimization setting, the compiler uses a single instruction to assign constant values to
bit fields that are mapped to the same memory area and used in the same routine.
As shown in the example below, before optimization the compiler generates three instructions to
write the values to the bit-field members cBit0 and cBit; after optimization it generates two
instructions for the same writes. Hence, the code size is reduced by 4 bytes and execution is
faster by 3 machine cycles.
Code before optimization
(Size: 13 bytes, Speed: 15 Machine cycles)
struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit : 2;
};
#pragma BIT sflag
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010 7E9FE020  _main  BSET 0,041CH
sflag.cBit = 1 ;
    F0014 97F31C04  AND.B #F3H,041CH
    F0018 7E9FE220  BSET 2,041CH
}
    F001C F3        RTS

Code after optimization
(Size: 9 bytes, Speed: 12 Machine cycles)
struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit : 2;
};
#pragma BIT sflag
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010 9F051C04  _main  OR.B #05H,041CH
sflag.cBit = 1 ;
    F0014 7E8FE320  BCLR 3,041CH
}
    F0018 F3        RTS
Example 4 Grouping of Bit Manipulation
Note:
Grouping of bit manipulation instructions is not suitable for I/O variables; use the [-ONB] option
to suppress it. In the example below, the structure variable sflag is mapped to an I/O address.
Hence, the [-ONB] option is used to suppress the optimization.
Code before [-ONB] enable
(Size: 11 bytes, Speed: 14 Machine cycles)
struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit2 : 2;
char cBit3 : 1;
};
#pragma BIT sflag
#pragma ADDRESS sflag 006ch
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010 0B6C00  _main  MOV.B 006CH,R0L
    F0013 94F2    AND.B #F2H,R0L
    F0015 9C05    OR.B #05H,R0L
    F0017 036C00  MOV.B R0L,006CH
sflag.cBit2 = 1;
}
    F001A F3      RTS

Code after [-ONB] enable
(Size: 15 bytes, Speed: 17 Machine cycles)
struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit2 : 2;
char cBit3 : 1;
};
#pragma BIT sflag
#pragma ADDRESS sflag 006ch
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010 7E9F6003  _main  BSET 0,006CH
sflag.cBit2 = 1;
    F0014 0B6C00    MOV.B 006CH,R0L
    F0017 94F3      AND.B #F3H,R0L
    F0019 9C04      OR.B #04H,R0L
    F001B 036C00    MOV.B R0L,006CH
}
    F001E F3        RTS
Example 5 Suppression of Grouping of Bit Manipulation using [-ONB]
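The masks in the Example 5 listings can be checked on a host. The sketch below models the byte at the sflag address as a plain `uint8_t` and assumes the layout implied by the listings (cBit0 in bit 0, cBit2 in bits 2 and 3). The function names are hypothetical. It only compares the resulting byte values; it does not model how many times the I/O location is read or written, which is where the two forms differ.

```c
#include <assert.h>
#include <stdint.h>

/* Ungrouped: one update per C statement, as in the [-ONB] listing. */
uint8_t update_separately(uint8_t reg)
{
    reg = (uint8_t)(reg | 0x01u);             /* sflag.cBit0 = 1 : BSET 0       */
    reg = (uint8_t)((reg & 0xF3u) | 0x04u);   /* sflag.cBit2 = 1 : AND.B, OR.B  */
    return reg;
}

/* Grouped: both assignments folded into one AND/OR pair. */
uint8_t update_grouped(uint8_t reg)
{
    return (uint8_t)((reg & 0xF2u) | 0x05u);
}

/* The two forms must produce the same byte for every starting value. */
void check_forms_agree(void)
{
    for (unsigned v = 0; v <= 0xFFu; v++)
        assert(update_separately((uint8_t)v) == update_grouped((uint8_t)v));
}
```

Because the final values are identical, the grouped form is safe for ordinary RAM variables. For an I/O location the number and order of accesses can also matter, which this model deliberately leaves out.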
3. Optimization level [-O4]
Select [-O4] option from Optimization level drop down menu.
Optimization level [-O4] includes [-O3] option.
The [-O4] option performs the following:
3.1 Replace the reference of a constant variable with a constant
In the example below, before optimization the compiler generates 11 bytes of code to copy the
constant data cDATA to the variable cResult. After optimization, the compiler replaces the
reference to the constant data with the constant value 5 and generates 8 bytes of code.
Therefore, code size is reduced by 3 bytes and execution is faster by 3 machine cycles.
Code before optimization
(Size: 11 bytes, Speed: 18 Machine cycles)
const char cDATA = 5;
void main(void){
    F002A 7CF201        _main  ENTER #01H
volatile char cResult;
cResult = cDATA;
    F002D 748BFF14000F  LDE.B F0000H,-1H[FB]
}
    F0033 7DF2          EXITD

Code after optimization
(Size: 8 bytes, Speed: 15 Machine cycles)
const char cDATA = 5;
void main(void){
    F002A 7CF201        _main  ENTER #01H
volatile char cResult;
cResult = cDATA;
    F002D C605FF        MOV.B #05H,-1H[FB]
}
    F0030 7DF2          EXITD
Example 6 Replace the Reference of a Constant Variable with a Constant Data
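At source level, the replacement in Example 6 amounts to the following pair. This is a host-side sketch with hypothetical function names; the value is returned so that the two forms can be compared.

```c
#include <assert.h>

const char cDATA = 5;

/* As written: the value is fetched from the const object. */
char read_via_object(void)
{
    volatile char cResult;
    cResult = cDATA;
    return cResult;
}

/* As optimized: the reference is replaced by the literal value 5. */
char read_via_literal(void)
{
    volatile char cResult;
    cResult = 5;
    return cResult;
}

/* The replacement is valid because both forms store the same value. */
void check_const_replacement(void)
{
    assert(read_via_object() == read_via_literal());
}
```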
3.2 Optimizing the standard library functions
In this optimization setting, the compiler uses optimized versions of the standard library
functions in place of the regular ones.
As shown in the example below, before optimization the compiler calls the standard library function
strcpy() to copy the string into the array cRead. After optimization, the compiler calls the
optimized routine $_n_n_st to copy the string. Hence, execution is faster by 120 machine
cycles and code size is reduced by 45 bytes.
Code before optimization
(Size: 129 bytes, Speed: 145 Machine cycles)
#include "string.h"
void main(void){
    F0010 7CF228    _main  ENTER #28H
char cData[20] = "Optimization";
    F0013 C64FEC    MOV.B #4FH,-14H[FB]
    F0016 C670ED    MOV.B #70H,-13H[FB]
    F0019 C674EE    MOV.B #74H,-12H[FB]
    F001C C669EF    MOV.B #69H,-11H[FB]
    F001F C66DF0    MOV.B #6DH,-10H[FB]
    F0022 C669F1    MOV.B #69H,-FH[FB]
    F0025 C67AF2    MOV.B #7AH,-EH[FB]
    F0028 C661F3    MOV.B #61H,-DH[FB]
    F002B C674F4    MOV.B #74H,-CH[FB]
    F002E C669F5    MOV.B #69H,-BH[FB]
    F0031 C66FF6    MOV.B #6FH,-AH[FB]
    F0034 C66EF7    MOV.B #6EH,-9H[FB]
    F0037 B6F8      MOV.B #0,-8H[FB]
    F0039 B6F9      MOV.B #0,-7H[FB]
    F003B B6FA      MOV.B #0,-6H[FB]
    F003D B6FB      MOV.B #0,-5H[FB]
    F003F B6FC      MOV.B #0,-4H[FB]
    F0041 B6FD      MOV.B #0,-3H[FB]
    F0043 B6FE      MOV.B #0,-2H[FB]
    F0045 B6FF      MOV.B #0,-1H[FB]
char cRead[20];
strcpy(cRead,cData);
    F0047 7DE20000  PUSH.W #0000H
    F004B 7D9BEC    PUSHA -14H[FB]
    F004E 7DE20000  PUSH.W #0000H
    F0052 7D9BD8    PUSHA -28H[FB]
    F0055 FD28070F  JSR.A _strcpy F0728H
    F0059 7CEB08    ADD.B #8H,SP
}
    F005C 7DF2      EXITD

Code after optimization
(Size: 84 bytes, Speed: 73 Machine cycles)
#include "string.h"
void main(void){
    F0010 7CF228      _main  ENTER #28H
char cData[20]= "Optimization";
    F0013 75CBEC4F70  MOV.W #704FH,-14H[FB]
    F0018 75CBEE7469  MOV.W #6974H,-12H[FB]
    F001D 75CBF06D69  MOV.W #696DH,-10H[FB]
    F0022 75CBF27A61  MOV.W #617AH,-EH[FB]
    F0027 75CBF47469  MOV.W #6974H,-CH[FB]
    F002C 75CBF66F6E  MOV.W #6E6FH,-AH[FB]
    F0031 D90BF8      MOV.W #0H,-8H[FB]
    F0034 D90BFA      MOV.W #0H,-6H[FB]
    F0037 D90BFC      MOV.W #0H,-4H[FB]
    F003A D90BFE      MOV.W #0H,-2H[FB]
char cRead[20];
strcpy(cRead,cData);
    F003D EB2BEC      MOVA -14H[FB],R2
    F0040 EB1BD8      MOVA -28H[FB],R1
    F0043 FD14070F    JSR.A $_n_n_st F0714H
}
    F0047 7DF2        EXITD
Example 7 Optimization of Standard Library Functions
Note:
Use the [-ONS] option to suppress the optimization of standard library functions.
Optimization options [-OR] and [-OS] perform the same optimization.
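The substitution shown in Example 7 is made by the compiler; the C source keeps calling strcpy() in the usual way. The following host-side sketch repeats the copy from the example in a form that can be checked. The function name `copy_example` and its size parameter are hypothetical additions for illustration.

```c
#include <assert.h>
#include <string.h>

/* Copies the string from Example 7 into cRead and returns its length. */
size_t copy_example(char *cRead, size_t size)
{
    char cData[20] = "Optimization";   /* remaining bytes are zero-filled */

    assert(size >= sizeof cData);      /* destination must hold the whole array */
    strcpy(cRead, cData);
    return strlen(cRead);
}
```

A caller would pass a buffer of at least 20 bytes, matching the cRead array in the example.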
4. Optimization level [-O5]
Select [-O5] option from Optimization level drop down menu.
The [-O5] option performs the following:
4.1 Optimization of bit manipulation instructions
As shown in the example below, before optimization the compiler generates two instructions to test
and clear the value of sData.cBit0. After optimization, the compiler uses the dedicated
Bit Test and Clear (BTSTC) instruction to test and clear sData.cBit0 in one step. Hence,
the code size is reduced by 4 bytes and execution is faster by 2 machine cycles.
Code before optimization
(Size: 11 bytes, Speed: 14 Machine cycles)
#pragma ADDRESS sData 006Ch
struct {
char cBit0 : 1;
char cBit1 : 1;
} sData;
void main(){
while(sData.cBit0==0)
    F0010 7EBF6003  _main  BTST 0,006CH
    F0014 6AFB      JEQ _main F0010H
;
sData.cBit0 = 0;
    F0016 7E8F6003  BCLR 0,006CH
}
    F001A F3        RTS

Code after optimization
(Size: 7 bytes, Speed: 12 Machine cycles)
#pragma ADDRESS sData 006Ch
struct {
char cBit0 : 1;
char cBit1 : 1;
} sData;
void main(){
while(sData.cBit0==0)
;
sData.cBit0 = 0;
    F0010 7E0F6003  _main  BTSTC 0,006CH
    F0014 6AFB      JEQ _main F0010H
}
    F0016 F3        RTS
Example 8 Optimization of Bit Manipulation Instructions
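The combined test-and-clear step in Example 8 can be modelled in portable C as below. This is a host-side sketch, not the instruction itself: the helper names are hypothetical, the register is passed as a pointer instead of being fixed at 006CH, and the model is not atomic in the way a single instruction is.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the previous state of the bit and leaves the bit cleared. */
int test_and_clear(volatile uint8_t *reg, unsigned bit)
{
    assert(bit < 8u);
    uint8_t mask = (uint8_t)(1u << bit);
    int was_set = (*reg & mask) != 0;
    *reg = (uint8_t)(*reg & (uint8_t)~mask);
    return was_set;
}

/* Stand-in for: while (sData.cBit0 == 0) ; sData.cBit0 = 0; */
void wait_and_clear(volatile uint8_t *reg)
{
    while (!test_and_clear(reg, 0u))
        ;   /* spins until bit 0 is seen set; the same call clears it */
}
```

Clearing the bit on every pass is harmless while the bit is still 0, which is why the loop and the final assignment can share one operation.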
4.2 Optimization of the pointer
As shown in the example below, after optimization the compiler optimizes the pointer operations
used to load iRefData. Hence, the code size is reduced by 10 bytes and execution is faster by 10
machine cycles.
Code before optimization
(Size: 39 bytes, Speed: 36 Machine cycles)
int iData = 3;
int *pData = &iData;
void main(void){
    F0014 7CF202        _main  ENTER #02H
int iRefData;
*pData = 9;
    F0017 73F40204      MOV.W 0402H,A0
    F001B 75C60900      MOV.W #0009H,[A0]
iData = 10;
    F001F 75CF00040A00  MOV.W #000AH,0400H
iRefData = *pData;
    F0025 73F40204      MOV.W 0402H,A0
    F0029 736BFE        MOV.W [A0],-2H[FB]
if(iRefData == 9)
    F002C 778BFE0900    CMP.W #0009H,-2H[FB]
    F0031 DF09000414    STZX #09H,#14H,0400H
    F0036 B70104        MOV.B #0,0401H
iData = 9;
else
iData = 20;
}
    F0039 7DF2          EXITD

Code after optimization
(Size: 29 bytes, Speed: 26 Machine cycles)
int iData = 3;
int *pData = &iData;
void main(void){
int iRefData;
*pData = 9;
    F0014 73F40204      _main  MOV.W 0402H,A0
    F0018 75C60900      MOV.W #0009H,[A0]
iData = 10;
    F001C 75CF00040A00  MOV.W #000AH,0400H
iRefData = *pData;
    F0022 7360          MOV.W [A0],R0
if(iRefData == 9)
    F0024 77800900      CMP.W #0009H,R0
    F0028 DF09000414    STZX #09H,#14H,0400H
    F002D B70104        MOV.B #0,0401H
iData = 9;
else
iData = 20;
}
    F0030 F3            RTS
Example 9 Optimization of Pointer
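In the optimized listing of Example 9, the address held in pData is loaded into A0 once and reused, while the pointed-to value is still read again after the direct write to iData. The host-side sketch below repeats the same statements and shows why that second read cannot be skipped. The function name `run_example` is hypothetical, and iData is returned so the outcome can be checked.

```c
#include <assert.h>

int iData = 3;
int *pData = &iData;

/* Same statements as Example 9, returning iData so the result is observable. */
int run_example(void)
{
    int iRefData;

    *pData = 9;          /* writes iData through the pointer            */
    iData = 10;          /* direct write to the same object             */
    iRefData = *pData;   /* must observe 10: pData still aliases iData  */

    if (iRefData == 9)
        iData = 9;
    else
        iData = 20;
    return iData;
}
```

Since pData points at iData, the value read back is 10 and the else branch is taken. Reusing the pointer is safe here; reusing the earlier stored value 9 would not be.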
5. Effect of Optimization
ROM usage: Decreases
Execution speed: Increases
Stack usage: Decreases
6. References
For more details, please refer to the Compiler User Manual (nc30ue.pdf).
Revision History
Ver. No. | Date
1.01 | 2007/7/04
1.02 | 2007/8/13
1.03 | 2007/8/30
1.04 | 2008/01/10
Section No. | Changes | Reason for Changes
1 | Heading change “Function” to “Introduction” | RTA and RSO comments
1 | Format change | RTA and RSO comments
2.1, 2.2, 2.3 | 3 examples of the option “O1” are combined into one option “O3” | RTA and RSO comments
2.4 | Miscellaneous options [-ONS], [-ONFCF], [-ONB] removed | RTA and RSO comments
2 | Format change | RTA and RSO comments
3 | Format change | RTA and RSO comments
3.2 | 2 examples for option “-O4” | RTA and RSO comments
4 | Format change | RTA and RSO comments
4.2 | 2 examples for option “-O5” | RTA and RSO comments
6 | New section added “References” | RTA and RSO comments
1, 2, 3, 4 | Bytes changed to bytes | RTA and RSO comments
1, 2, 3, 4, 5, 6 | Format and grammar | RTA and RSO comments
1 | Remove “” and path is shown using an arrow symbol | RTA and RSO comments
2.2 | Description reframe | KPIT review comments
2.2, 2.3, 3.1, 3.2, 4.1, 4.2 | Changed “execution increases by” to “execution is faster by” | RTA review