M16C Optimization Options
Optimization Levels [-O1] to [-O5]
V1.04

1. Introduction

This document explores the optimization levels [-O1] to [-O5]. These compiler options optimize the source code to reduce ROM size and increase execution speed. They can be found in HEW under Build, Renesas M16C Standard Toolchain, as shown in Figure 1.

Figure 1 Optimization Level Menu

Open HEW and click Build, Renesas M16C Standard Toolchain, C, Category. Select Optimize from the drop-down menu. This opens the optimization dialog box. Select one of the following optimization levels from the Optimization level drop-down menu.

Option   Effect
[-O1]    Makes -O3, -ONB, -ONBSD, -ONFCF and -ONS valid
[-O2]    No different from -O1
[-O3]    Optimizes speed and ROM size to the maximum
[-O4]    Makes -O3 and -Oconst valid
[-O5]    Effects the best possible optimization

2. Optimization level [-O3]

Optimization level [-O3] includes the [-O1] and [-O2] levels. Select the [-O3] option from the Optimization level drop-down menu. The [-O3] option performs the following:

2.1 Remove meaningless comparison statements

In the example below, the variable ucComparison has already been initialized to 5, so the “if” statement is meaningless. With optimization enabled, the assembly code for this “if” statement is removed. The code size is reduced by 16 bytes and execution is faster by 26 machine cycles.
Code before optimization (Size: 22 bytes, Speed: 35 machine cycles *1)

void main(void){
    F0028  7CF202  _main  ENTER   #02H
volatile uchar8 ucComparison = 5;
    F002B  C605FF         MOV.B   #05H,-1H[FB]
volatile uchar8 ucCounter = 0;
    F002E  B6FE           MOV.B   #0,-2H[FB]
if(ucComparison == 5)
    F0030  0AFF           MOV.B   -1H[FB],R0L
    F0032  B3             MOV.B   #0,R0H
    F0033  D150           CMP.W   #5H,R0
    F0035  6E04           JNE     F0022H
ucCounter++;
    F0037  A6FE           INC.B   -2H[FB]
    F0039  61             JMP.S   F0024H
else
ucCounter = 0;
    F003A  B6FE           MOV.B   #0,-2H[FB]
}
    F003C  7DF2           EXITD

Code after optimization (Size: 6 bytes, Speed: 9 machine cycles *1)

void main(void){
    F0028  C405    _main  MOV.B   #05H,R0L
volatile uchar8 ucComparison = 5;
volatile uchar8 ucCounter = 0;
    F002A  B4             MOV.B   #0,R0L
if(ucComparison == 5)
ucCounter++;
    F002B  C401           MOV.B   #01H,R0L
else
ucCounter = 0;
}
    F002D  F3             RTS

Example 1 Remove Meaningless Comparison Statements

*1: Code execution speed is calculated from the information provided in the software manual “M16C/60, M16C/20, M16C/Tiny Series Rev. 4.00 Revision date: Jan 21, 2004”.

Note: Optimization option [-OR] also performs the same optimization.

2.2 Remove dead code

In the example below, the C code does not use any microcontroller-specific feature or function, so the compiler treats it as dead code. After optimization, no assembly code is generated for this C code, which reduces the code size by 21 bytes and makes execution faster by 29 machine cycles.
Code before optimization (Size: 22 bytes, Speed: 35 machine cycles)

void main(void){
    F0028  7CF202  _main  ENTER   #02H
uchar8 ucComparison = 5;
    F002B  C605FF         MOV.B   #05H,-1H[FB]
uchar8 ucCounter = 0;
    F002E  B6FE           MOV.B   #0,-2H[FB]
if(ucComparison == 5)
    F0030  0AFF           MOV.B   -1H[FB],R0L
    F0032  B3             MOV.B   #0,R0H
    F0033  D150           CMP.W   #5H,R0
    F0035  6E04           JNE     F0022H
ucCounter++;
    F0037  A6FE           INC.B   -2H[FB]
    F0039  61             JMP.S   F0024H
else
ucCounter = 0;
    F003A  B6FE           MOV.B   #0,-2H[FB]
}
    F003C  7DF2           EXITD

Code after optimization (Size: 1 byte, Speed: 6 machine cycles)

void main(void){
uchar8 ucComparison = 5;
uchar8 ucCounter = 0;
if(ucComparison == 5)
ucCounter++;
else
ucCounter = 0;
}
    F0028  F3      main   RTS
    F0029  04             NOP

Example 2 Remove Dead Code

Note: Optimization option [-OR] also performs the same optimization.

To suppress dead code optimization, use the type qualifier “volatile”. As shown in the example below, declaring the variable “volatile” prevents removal of the code.

Code before using volatile (Size: 1 byte, Speed: 6 machine cycles)

void main(void){
uchar8 ucData;
ucData = 5;
}
    F0028  F3      main   RTS
    F0029  04             NOP

Code after using volatile (Size: 5 bytes, Speed: 11 machine cycles)

void main(void){
volatile uchar8 ucData;
ucData = 5;
    F002B  C605FF         MOV.B   #05H,-1H[FB]
}
    F002E  7DF2           EXITD

Example 3 Suppression of Dead Code Optimization using Volatile

2.3 Allocate CPU Registers to variables

As shown in Example 1, before optimization the compiler stores the variable ucComparison in the stack area. After optimization, the compiler keeps ucComparison in the CPU register R0L. Hence, the code size is reduced by 16 bytes and execution is faster by 26 machine cycles.

Note: The compiler will not allocate a CPU register to a variable if it does not have enough free CPU registers.

2.4 Grouping of bit manipulation

In this optimization setting, when constant values are assigned to bit fields that are mapped to the same memory area and used in the same routine, the compiler performs the assignments with a single instruction.
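The effect of grouping can be checked at source level with a small host-side model. This is a minimal sketch, not compiler output: it assumes the bit layout used in Example 4 (cBit0 in bit 0, cBit in bits 2 and 3 of one byte), and the function names write_separately, write_grouped and count_mismatches are hypothetical helpers introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of the ungrouped sequence: each bit-field write
   is its own read-modify-write step (BSET 0 / AND.B #F3H / BSET 2). */
static uint8_t write_separately(uint8_t b)
{
    b = (uint8_t)(b | 0x01u);   /* cBit0 = 1: set bit 0            */
    b = (uint8_t)(b & 0xF3u);   /* cBit  = 1: clear bits 2 and 3   */
    b = (uint8_t)(b | 0x04u);   /*            then set bit 2       */
    return b;
}

/* Hypothetical model of the grouped sequence: both "set" operations
   are merged into one OR (OR.B #05H), followed by one clear (BCLR 3). */
static uint8_t write_grouped(uint8_t b)
{
    b = (uint8_t)(b | 0x05u);   /* set bit 0 and bit 2 together    */
    b = (uint8_t)(b & 0xF7u);   /* clear bit 3                     */
    return b;
}

/* Compare both sequences for every possible starting byte value and
   return how many starting values give different results. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned v = 0; v < 256u; v++) {
        if (write_separately((uint8_t)v) != write_grouped((uint8_t)v)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Under these assumptions the two sequences leave the byte in the same state for every starting value, which is why the compiler may merge them when the target is ordinary RAM. The model says nothing about I/O registers, where the number and order of accesses can matter.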
As shown in the example below, before optimization the compiler generates three instructions to write the values to the bit-field members cBit0 and cBit. After optimization, the compiler generates two instructions to write the same values. Hence, the code size is reduced by 4 bytes and execution is faster by 3 machine cycles.

Code before optimization (Size: 13 bytes, Speed: 15 machine cycles)

struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit : 2;
};
#pragma BIT sflag
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010  7E9FE020  _main  BSET    0,041CH
sflag.cBit = 1 ;
    F0014  97F31C04         AND.B   #F3H,041CH
    F0018  7E9FE220         BSET    2,041CH
}
    F001C  F3               RTS

Code after optimization (Size: 9 bytes, Speed: 12 machine cycles)

struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit : 2;
};
#pragma BIT sflag
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010  9F051C04  _main  OR.B    #05H,041CH
sflag.cBit = 1 ;
    F0014  7E8FE320         BCLR    3,041CH
}
    F0018  F3               RTS

Example 4 Grouping of Bit Manipulation

Note: Grouping of bit manipulation instructions is not suitable for I/O variables. Use the [-ONB] option to suppress the grouping of bit manipulation instructions. As shown in the example below, the structure variable sflag refers to an I/O address. Hence, the [-ONB] option is used to suppress the optimization.
Code before enabling [-ONB] (Size: 11 bytes, Speed: 14 machine cycles)

struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit2 : 2;
char cBit3 : 1;
};
#pragma BIT sflag
#pragma ADDRESS sflag 006ch
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010  0B6C00    _main  MOV.B   006CH,R0L
    F0013  94F2             AND.B   #F2H,R0L
    F0015  9C05             OR.B    #05H,R0L
    F0017  036C00           MOV.B   R0L,006CH
sflag.cBit2 = 1;
}
    F001A  F3               RTS

Code after enabling [-ONB] (Size: 15 bytes, Speed: 17 machine cycles)

struct bit {
char cBit0 : 1;
char cBit1 : 1;
char cBit2 : 2;
char cBit3 : 1;
};
#pragma BIT sflag
#pragma ADDRESS sflag 006ch
struct bit sflag;
void main (void){
sflag.cBit0 = 1 ;
    F0010  7E9F6003  _main  BSET    0,006CH
sflag.cBit2 = 1;
    F0014  0B6C00           MOV.B   006CH,R0L
    F0017  94F3             AND.B   #F3H,R0L
    F0019  9C04             OR.B    #04H,R0L
    F001B  036C00           MOV.B   R0L,006CH
}
    F001E  F3               RTS

Example 5 Suppression of Grouping of Bit Manipulation using [-ONB]

3. Optimization level [-O4]

Select the [-O4] option from the Optimization level drop-down menu. Optimization level [-O4] includes the [-O3] option. The [-O4] option performs the following:

3.1 Replace the reference of a constant variable with a constant

In the example below, before optimization the compiler generates 11 bytes of code to copy the constant data cDATA to the variable cResult. After optimization, the compiler replaces the reference to the constant data with the constant value 5 and generates 8 bytes of code. Therefore, the code size is reduced by 3 bytes and execution is faster by 3 machine cycles.
Code before optimization (Size: 11 bytes, Speed: 18 machine cycles)

const char cDATA = 5;
void main(void){
    F002A  7CF201        _main  ENTER   #01H
volatile char cResult;
cResult = cDATA;
    F002D  748BFF14000F         LDE.B   F0000H,-1H[FB]
}
    F0033  7DF2                 EXITD

Code after optimization (Size: 8 bytes, Speed: 15 machine cycles)

const char cDATA = 5;
void main(void){
    F002A  7CF201        _main  ENTER   #01H
volatile char cResult;
cResult = cDATA;
    F002D  C605FF               MOV.B   #05H,-1H[FB]
}
    F0030  7DF2                 EXITD

Example 6 Replace the Reference of a Constant Variable with a Constant Data

3.2 Optimizing the standard library functions

In this optimization setting, the compiler uses optimized versions of the standard library functions in place of the standard ones. As shown in the example below, before optimization the compiler calls the standard library function strcpy() to copy the string into the array cRead. After optimization, the compiler calls the optimized routine _n_n_st to copy the string. In the listings below, execution time drops from 145 to 73 machine cycles and the code size is reduced by 45 bytes, from 129 to 84 bytes.
Code before optimization (Size: 129 bytes, Speed: 145 machine cycles)

#include "string.h"
void main(void){
    F0010  7CF228      _main  ENTER   #28H
char cData[20] = "Optimization";
    F0013  C64FEC             MOV.B   #4FH,-14H[FB]
    F0016  C670ED             MOV.B   #70H,-13H[FB]
    F0019  C674EE             MOV.B   #74H,-12H[FB]
    F001C  C669EF             MOV.B   #69H,-11H[FB]
    F001F  C66DF0             MOV.B   #6DH,-10H[FB]
    F0022  C669F1             MOV.B   #69H,-FH[FB]
    F0025  C67AF2             MOV.B   #7AH,-EH[FB]
    F0028  C661F3             MOV.B   #61H,-DH[FB]
    F002B  C674F4             MOV.B   #74H,-CH[FB]
    F002E  C669F5             MOV.B   #69H,-BH[FB]
    F0031  C66FF6             MOV.B   #6FH,-AH[FB]
    F0034  C66EF7             MOV.B   #6EH,-9H[FB]
    F0037  B6F8               MOV.B   #0,-8H[FB]
    F0039  B6F9               MOV.B   #0,-7H[FB]
    F003B  B6FA               MOV.B   #0,-6H[FB]
    F003D  B6FB               MOV.B   #0,-5H[FB]
    F003F  B6FC               MOV.B   #0,-4H[FB]
    F0041  B6FD               MOV.B   #0,-3H[FB]
    F0043  B6FE               MOV.B   #0,-2H[FB]
    F0045  B6FF               MOV.B   #0,-1H[FB]
char cRead[20];
strcpy(cRead,cData);
    F0047  7DE20000           PUSH.W  #0000H
    F004B  7D9BEC             PUSHA   -14H[FB]
    F004E  7DE20000           PUSH.W  #0000H
    F0052  7D9BD8             PUSHA   -28H[FB]
    F0055  FD28070F           JSR.A   _strcpy F0728H
    F0059  7CEB08             ADD.B   #8H,SP
}
    F005C  7DF2               EXITD

Code after optimization (Size: 84 bytes, Speed: 73 machine cycles)

#include "string.h"
void main(void){
    F0010  7CF228      _main  ENTER   #28H
char cData[20]= "Optimization";
    F0013  75CBEC4F70         MOV.W   #704FH,-14H[FB]
    F0018  75CBEE7469         MOV.W   #6974H,-12H[FB]
    F001D  75CBF06D69         MOV.W   #696DH,-10H[FB]
    F0022  75CBF27A61         MOV.W   #617AH,-EH[FB]
    F0027  75CBF47469         MOV.W   #6974H,-CH[FB]
    F002C  75CBF66F6E         MOV.W   #6E6FH,-AH[FB]
    F0031  D90BF8             MOV.W   #0H,-8H[FB]
    F0034  D90BFA             MOV.W   #0H,-6H[FB]
    F0037  D90BFC             MOV.W   #0H,-4H[FB]
    F003A  D90BFE             MOV.W   #0H,-2H[FB]
char cRead[20];
strcpy(cRead,cData);
    F003D  EB2BEC             MOVA    -14H[FB],R2
    F0040  EB1BD8             MOVA    -28H[FB],R1
    F0043  FD14070F           JSR.A   $_n_n_st F0714H
}
    F0047  7DF2               EXITD

Example 7 Optimization of Standard Library Functions

Note: Use the [-ONS] option to suppress the optimization of standard library functions. Optimization options [-OR] and [-OS] perform the same optimization.

4. Optimization level [-O5]

Select the [-O5] option from the Optimization level drop-down menu.
The [-O5] option performs the following:

4.1 Optimization of bit manipulation instructions

As shown in the example below, before optimization the compiler generates two instructions to test and clear the value of the variable sData.cBit0. After optimization, the compiler uses the dedicated Bit Test and Clear (BTSTC) instruction to test and clear the value of sData.cBit0. Hence, the code size is reduced by 4 bytes and execution is faster by 2 machine cycles.

Code before optimization (Size: 11 bytes, Speed: 14 machine cycles)

#pragma ADDRESS sData 006Ch
struct {
char cBit0 : 1;
char cBit1 : 1;
} sData;
void main(){
while(sData.cBit0==0)
    F0010  7EBF6003  _main  BTST    0,006CH
    F0014  6AFB             JEQ     _main F0010H
;
sData.cBit0 = 0;
    F0016  7E8F6003         BCLR    0,006CH
}
    F001A  F3               RTS

Code after optimization (Size: 7 bytes, Speed: 12 machine cycles)

#pragma ADDRESS sData 006Ch
struct {
char cBit0 : 1;
char cBit1 : 1;
} sData;
void main(){
while(sData.cBit0==0)
;
sData.cBit0 = 0;
    F0010  7E0F6003  _main  BTSTC   0,006CH
    F0014  6AFB             JEQ     _main F0010H
}
    F0016  F3               RTS

Example 8 Optimization of Bit Manipulation Instructions

4.2 Optimization of the pointer

As shown in the example below, after optimization the compiler optimizes the pointer operations used to load iRefData. Hence, the code size is reduced by 10 bytes and execution is faster by 10 machine cycles.
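One way to read this optimization at source level, ahead of the compiler listing: the value of pData is fetched once and kept in a register, while the object it points to is still re-read, because iData and *pData name the same storage. The following is a minimal hand-written sketch for a host compiler, not NC30 output. The function names run_reloading_pointer and run_cached_pointer are hypothetical and exist only for this illustration.

```c
/* Same globals as the example in this section. */
static int iData = 3;
static int *pData = &iData;

/* Unoptimized shape: pData is fetched from memory for every dereference. */
static int run_reloading_pointer(void)
{
    int iRefData;

    *pData = 9;             /* fetch pData, store 9 through it          */
    iData = 10;             /* direct store to the same object          */
    iRefData = *pData;      /* fetch pData again, then read the object  */
    if (iRefData == 9)
        iData = 9;
    else
        iData = 20;
    return iData;
}

/* Optimized shape: the pointer value is fetched once and reused.
   The pointed-to object is still re-read after "iData = 10",
   because that store changes the value seen through the pointer. */
static int run_cached_pointer(void)
{
    int *p = pData;         /* single fetch of the pointer value        */
    int iRefData;

    *p = 9;
    iData = 10;             /* aliases *p                               */
    iRefData = *p;          /* reads 10, not the 9 stored earlier       */
    if (iRefData == 9)
        iData = 9;
    else
        iData = 20;
    return iData;
}
```

Both functions take the else branch and leave iData at 20, so caching the pointer value does not change the result. Caching the pointed-to value across the store to iData would change it, which is why only the pointer fetch is a safe candidate for removal here.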
Code before optimization (Size: 39 bytes, Speed: 36 machine cycles)

int iData = 3;
int *pData = &iData;
void main(void){
    F0014  7CF202        _main  ENTER   #02H
int iRefData;
*pData = 9;
    F0017  73F40204             MOV.W   0402H,A0
    F001B  75C60900             MOV.W   #0009H,[A0]
iData = 10;
    F001F  75CF00040A00         MOV.W   #000AH,0400H
iRefData = *pData;
    F0025  73F40204             MOV.W   0402H,A0
    F0029  736BFE               MOV.W   [A0],-2H[FB]
if(iRefData == 9)
    F002C  778BFE0900           CMP.W   #0009H,-2H[FB]
    F0031  DF09000414           STZX    #09H,#14H,0400H
    F0036  B70104               MOV.B   #0,0401H
iData = 9;
else
iData = 20;
}
    F0039  7DF2                 EXITD

Code after optimization (Size: 29 bytes, Speed: 26 machine cycles)

int iData = 3;
int *pData = &iData;
void main(void){
int iRefData;
*pData = 9;
    F0014  73F40204      _main  MOV.W   0402H,A0
    F0018  75C60900             MOV.W   #0009H,[A0]
iData = 10;
    F001C  75CF00040A00         MOV.W   #000AH,0400H
iRefData = *pData;
    F0022  7360                 MOV.W   [A0],R0
if(iRefData == 9)
    F0024  77800900             CMP.W   #0009H,R0
    F0028  DF09000414           STZX    #09H,#14H,0400H
    F002D  B70104               MOV.B   #0,0401H
iData = 9;
else
iData = 20;
}
    F0030  F3                   RTS

Example 9 Optimization of Pointer

5. Effect of Optimization

ROM usage: Decreases
Execution speed: Increases
Stack usage: Decreases

6. References

For more details, please refer to the Compiler User Manual (nc30ue.pdf).

Revision History

Ver. No.   Date
1.01       2007/7/04
1.02       2007/8/13
1.03       2007/8/30
1.04       2008/01/10

Section No. | Changes | Reason for Changes
1 | Heading change “Function” to “Introduction” | RTA and RSO comments
1 | Format change | RTA and RSO comments
2.1, 2.2, 2.3 | 3 examples of the option “O1” are combined into one option “O3” | RTA and RSO comments
2.4 | Miscellaneous options [-ONS], [ONFCF], [-ONB], removed. | RTA and RSO comments
2 | Format change | RTA and RSO comments
3 | Format change | RTA and RSO comments
3.2 | 2 examples for option “-O4” | RTA and RSO comments
4 | Format change | RTA and RSO comments
4.2 | 2 examples for option “-O5” | RTA and RSO comments
6 | New section added “References” | RTA and RSO comments
1, 2, 3, 4 | Bytes changed to bytes | RTA and RSO comments
1, 2, 3, 4, 5, 6 | Format and grammar | RTA and RSO comments
1 | Remove “” and path is shown using | RTA and RSO comments
2.2 | Description reframe | KPIT review comments
2.2, 2.3, 3.1, 3.2, 4.1, 4.2 | Changed “execution increases by” to “execution is faster by” | RTA review