Code Optimization of PSoC® 1 Project when using ImageCraft Compiler
AN60486

Author: Archana Yarlagadda
Associated Project: No
Associated Part Families: CY8C29x66, CY8C28xxx, CY8C27x43, CY8C24x94, CY8C24x23A, CY8C24x33, CY8C23x33, CY8C21x34, CY8C21x23
Software Version: PSoC Designer™ 5.0
Associated Application Notes: AN2017, AN2218, AN2129
Document No. 001-60486 Rev. **, July 27, 2010

Application Note Abstract

This application note shows the basic guidelines for optimizing code using the ImageCraft compiler with PSoC Designer while developing PSoC® 1 projects.

Introduction

This application note shows some methods for code optimization for a project in PSoC Designer (PD) using the ImageCraft compiler. The Build tab in the Output status window reports the amount of ROM and RAM used by the project. A portion of the PSoC Designer build window showing the memory usage is shown in Figure 1.

Figure 1. Build Message showing ROM and RAM Usage

Several code optimizing methods to decrease the use of ROM (also called Flash or code space) are shown in this application note.

Function Calls in Interrupt Service Routines (ISR)

When using the ImageCraft compiler, function calls in Interrupt Service Routines (ISRs) use more ROM. The active state of the processor’s registers might get changed when a function is called from an ISR. Consequently, these registers need to be saved and restored for a proper jump to and return from the ISR’s function call. The ImageCraft C compiler uses up to 15 virtual registers to store temporary data on the stack. They are r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, rX, rY, and rZ, and can be found in the .mp file.

When a C function is called from an ISR, the compiler saves (pushes onto the stack) and restores (pops from the stack) all the virtual registers, since the registers used by the called function are unknown to the ISR. PSoC chips with more than 256 bytes of RAM use a paging system. The programming model that handles the paging scheme is called the Large Memory Model (LMM). A single page model is known as the Small Memory Model (SMM). In the case of larger PSoC chips with LMM, four page pointers are also saved and restored along with the virtual registers: CUR_PP, IDX_PP, MVW_PP, and MVR_PP. For more information about the LMM, refer to “Design Aids - Large Memory Model Programming for PSoC” AN2218.

Consider the following simple example of inline code and a function call from the ISR. Since the assignment in Code 1 can be done without use of any virtual registers, none are stored.

Code 1

    BYTE bVar1;
    #pragma interrupt_handler SleepTimerHandler;
    void SleepTimerHandler(void)
    {
        bVar1 = 1;
    }

If, however, the same functionality is implemented using a function call as shown in Code 2, then 15 virtual registers and 4 page pointers must be saved and restored. Each register requires the following additional overhead code: MOV [2 bytes] + PUSH [1 byte] + POP [1 byte] + MOV [2 bytes], for a total of 6 bytes per register. Therefore, the Code 2 method takes an additional (15 + 4) x 6 = 114 bytes of code. There is also the additional call and return code, which is comparatively negligible.

Code 2

    BYTE bVar1;
    void TestFunc()
    {
        bVar1 = 1;
    }
    #pragma interrupt_handler SleepTimerHandler;
    void SleepTimerHandler(void)
    {
        TestFunc();
    }

Function calls from ISRs written in C should be avoided to help optimize code size.

Relocatable Start Code Address

When a C program is compiled, the ImageCraft C compiler converts the C files into assembly files. The assembler then converts these into relocatable object files, which are then mapped by the linker to obtain the executable .hex file. The PSoC Designer IDE gives an option to specify the address where the code has to be placed in the .hex file, and thus relocated. This option is given under Project > Settings > Linker, and the popup window is shown in Figure 2.

Figure 2. Relocatable Start Code Address Selection

Boot code is not included in this area and is placed at the start of memory. For efficient use of memory, the relocatable code address should be placed immediately following the end of boot code. The default value is set higher than required, to support all the devices, and can be changed using the above tool. The end of the boot code can be found by looking into the map file (.mp), which can be opened through PD as shown in Figure 3.

Figure 3. Map File in PSoC Designer

The map file shows the start and end address of different areas. The boot section is shown by “TOP”, and the code area is shown by “lit” in the .mp file shown in Figure 4. The “Relocatable code start address” can be set to the End address of the “TOP” area for the most efficient use of space. For example, the “Relocatable code start address” is set to 0x150 in Figure 2, and the End of the “TOP” section in the map file was at 0xD1. Thus, by setting the right value, 127 bytes (0x150 - 0xD1 = 0x7F) were saved.

Figure 4. “TOP” and “lit” Area in .mp File

If the start address is set to a value lower than that required for boot code, the build reports an error notifying the user that the code space already contains a value. For example, if the value is set to 0xD0 and the boot code ends at 0xD1, then the message “!E psocconfigtbl.asm(112): Code address 0:0xd0 already contains a value” is displayed during build.

Sublimation and Condensation

PSoC Designer gives code compression tools that can be set under Project > Settings > Compiler. The selection window is shown in Figure 5.

Condensation

When the Condensation option is chosen, a subroutine is formed for segments of code that are repeated in a project. Therefore, rather than inline repetition of code, a jump to a subroutine is added in the executable.
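The effect of Condensation can be pictured at the C source level. The sketch below is only an analogy: the tool works on the generated code, not on C source, and the function names SetTestValues, InlineTwice, and CondensedTwice are hypothetical names chosen for illustration.

```c
/* Analogy only: Condensation operates on generated code, not on C source. */
int iTest1, iTest2, iTest3;

/* Hypothetical helper: the repeated segment, stored once in ROM. */
static void SetTestValues(void)
{
    iTest1 = 1;
    iTest2 = iTest1 + 1;
    iTest3 = iTest1 + 2;
}

/* Without condensation: the same segment is emitted inline at each site. */
void InlineTwice(void)
{
    iTest1 = 1;
    iTest2 = iTest1 + 1;
    iTest3 = iTest1 + 2;

    iTest1 = 1;
    iTest2 = iTest1 + 1;
    iTest3 = iTest1 + 2;
}

/* With condensation: each site shrinks to a jump to the shared subroutine. */
void CondensedTwice(void)
{
    SetTestValues();
    SetTestValues();
}
```

Both functions leave the variables in the same state. The saving comes from storing the segment once, at the cost of call and return overhead at each site, which is why very short or unrepeated segments gain nothing.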
A simple piece of code, see Code 3, was repeated four times in a test project to verify the Condensation option. With the Condensation option, 193 bytes were saved as shown in Figure 7.

Code 3

    iTest1 = 1;
    iTest2 = iTest1 + 1;
    iTest3 = iTest1 + 2;
    iTest4 = iTest1 + 3;
    iTest5 = iTest1 + 4;

Figure 7. ROM Saved with Condensation

If there is no repeated code in a program and the Condensation option is chosen, the following note is displayed in the Build message: “program code in 'text' area too small for worthwhile code compression”.

Figure 5. Sublimation and Condensation Selection

Sublimation

When the Sublimation option is chosen, PSoC Designer deletes the unused user module APIs and thus saves space. In a test project, PGA and PWM user modules were placed and started. The first compilation was with no Sublimation and the second was with Sublimation. As shown in Figure 6, 88 bytes of memory were saved for these modules due to elimination of unused user module APIs.

Figure 6. ROM Saved with Sublimation

In some cases, all the APIs are used in the project. In this event, the following note is displayed in the Build message: “no dead symbol found”.

Treat const as ROM vs. const as RAM

This option can be accessed from Project > Settings > Compiler, as shown in Figure 5. This is not a code optimization technique for ROM. The “Treat const as ROM” selection handles constants in a way that is compliant with standard C. The “Treat const as RAM” selection is for backward compatibility with previous versions of the ImageCraft compiler. The “Treat const as ROM” selection uses less RAM, and the ROM usage remains the same with either selection.

Configuration Initialization Type

During startup of PSoC, all the initialization values, for example gain settings for PGAs, routing, and more, are written into the configuration registers. This can be done through two methods: “Loop” and “Direct Write”, as shown in Figure 8. In the “Loop” selection, a table is created with the register addresses and initial values. A function is used to traverse the table and load the values into the respective addresses. In the case of “Direct Write”, the assignment is done through a MOV instruction for each register.

Figure 8. Configuration Initialization Selection

The difference in code between the two selections can be observed in the configuration file (PSoCConfigTBL.asm) as shown in Figure 9. The compiler makes this change in code based on the selection. The user does not need to change anything in the configuration file.

Figure 9. Code Difference between Loop and Direct Write

The two methods differ in memory usage in two aspects. The first difference is that the Loop method occupies a fixed amount of memory for the traverse function (94 bytes), which is not required in the Direct Write method. The second difference is that the Loop method uses 2 bytes of ROM per register, whereas the Direct Write method uses 3 bytes per register for the MOV instruction. For N registers, the Loop method therefore costs 94 + 2N bytes and the Direct Write method costs 3N bytes, so the “Loop” selection gives the smaller code size when the number of registers is more than 94. In programs that have multiple user modules, the Loop method is usually recommended.

Direct Access vs. Index Addressing

Using “direct access” addressing, as with global or static variables, is more efficient with the SMM. Using “indexed addressing”, as with local variables, is more efficient when using the LMM. This is because in the LMM, the page pointer is set every time a global or static variable is accessed. Thus, when multiple variables are being accessed in LMM, it is more ROM efficient to access local variables.

Optimization in Firmware

There are many general coding optimization techniques that are not IDE, platform, or chip specific. This section presents several of them.

Use of Unsigned Integers

When integer arithmetic is used in a program, it adds math library functions into the code space as required. Depending on the size and type of the variables used (8, 16, or 32 bit, signed or unsigned), different functions are added to the code. The details about the byte usage of these functions can be found in the “Libraries user guide” in the PSoC Designer documentation folder (Help > Documentation).

The math functions in the M8C processor in PSoC are for unsigned variables by default. When other variables are used, there are additional functions to handle the conversion and value checking. Thus, it is recommended to use unsigned integers when possible. For example, the use of unsigned integers instead of signed integers as loop variables will help optimize the memory usage.

Shift and Add in Place of Multiply or Divide

Some math library functions can be kept out of the code space. For unsigned integers, a bitwise shift and add can be used in place of a multiply or divide. A single bitwise shift right is equivalent to a divide by 2, and a shift left is equivalent to a multiplication by 2. By using shift and add, as shown in the example below, the multiplication and division functions can be avoided in a few cases.

In the following two similar pieces of code, the Code 4 implementation uses 50 bytes more than Code 5. This is due to the addition of the “__mul16” function into the code.

Code 4

    unsigned int iTest1, iTest2;
    void main(void)
    {
        iTest1 = iTest2 * 3;
    }

Code 5

    unsigned int iTest1, iTest2;
    void main(void)
    {
        /* iTest2 * 3 written as (iTest2 * 2) + iTest2 */
        iTest1 = (iTest2 << 1) + iTest2;
    }

Avoiding Floating Point Math

Floating-point math should be avoided when possible because of the overhead of the libraries. Anytime a floating-point operation is used, utility functions such as rounding, normalization, and checking special conditions are added to the code on top of the floating point parent function.
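A floating-point operation can enter a project unnoticed through a floating constant: in standard C, an integer multiplied by a constant such as 0.5 is evaluated in floating point. The sketch below contrasts that form with an integer-only equivalent. The function names HalfViaFloat and HalfViaShift are hypothetical, and the actual ROM difference for a given project is an assumption to be confirmed in the Build message.

```c
/* Hypothetical helpers for illustration; not part of any PSoC Designer API. */

/* 0.5 is a floating constant, so the multiplication is carried out in
   floating point: the integer is converted, multiplied, and converted back. */
unsigned int HalfViaFloat(unsigned int uValue)
{
    return (unsigned int)(uValue * 0.5);
}

/* Integer-only equivalent: a right shift by one divides an unsigned value by 2. */
unsigned int HalfViaShift(unsigned int uValue)
{
    return uValue >> 1;
}
```

Both functions return the same result for unsigned inputs, but only the first one requires floating-point support from the libraries.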
The byte usage of the floating-point functions is provided below as an estimate of memory usage. The byte sizes differ based on the small or large memory model and the version of PSoC Designer used. The complete details of the floating point libraries are also given in the “Libraries user guide”, which can be accessed through Help > Documentation > Libraries user guide.

    Comparisons (*_fpcmp): 78 bytes
    Addition (*_fpadd): 250 bytes
    Subtraction (*_fpsub + *_fpadd): 9 + 250 = 259 bytes
    Multiplication (*_fpmul + i_mulu8_block_util): 292 + 29 = 321 bytes
    Division (*_fpdiv): 221 bytes
    Floating point utility functions (*_util): 180 bytes

The floating-point utility functions (180 bytes) are common to all the functions except for the comparison functions. Thus, the total memory usage of a floating point function is obtained by adding the byte size of the utility functions to that of the parent floating point function. For example, the total memory usage for the floating point addition function is 250 + 180 = 430 bytes.

The floating-point math functions use the integer math libraries as the base. The floating-point math libraries use more code space than the integer math libraries. In place of floating-point math, the variables can sometimes be scaled up so that integer math can be used. For example, in the following two pieces of code, the Code 6 method uses 492 bytes more than Code 7.

Code 6

    int iTest2, iTest3;
    float fTest1;
    void main(void)
    {
        fTest1 = iTest2 * 2.42;
        if(fTest1 > 7.5)
        {
            iTest3 = 2;
        }
        else
        {
            iTest3 = 1;
        }
    }

Code 7

    int iTest1, iTest2, iTest3;
    void main(void)
    {
        /* 2.42 and 7.5 scaled by 100 so that integer math can be used */
        iTest1 = iTest2 * 242;
        if(iTest1 > 750)
        {
            iTest3 = 2;
        }
        else
        {
            iTest3 = 1;
        }
    }

In some instances, a look up table can be used in place of either floating-point or integer math to save code space.

Look up Table in place of Calculation

The use of a formula for calculation can include multiple integer or floating-point math library functions into the code space. Instead of using a formula, a Look Up Table (LUT) method can be used to obtain results and save code space. There are multiple tradeoffs, such as speed and accuracy, along with the code space, in choosing one over the other. The choice is based on the type of application.

For example, the project given in “Thermistor-Based Thermometer, PSoC Style” AN2017 offers an option for floating point and LUT method implementation. The use of a LUT in place of floating point math in this project saves 1920 bytes of memory.

Array Indexing vs. Pointer

Embedded system platforms implement array indexing and pointer access differently. Whether the variables are local or global can also change the memory usage. For example, in the following two similar pieces of code, Code 8 uses three bytes more than Code 9. When the variables are configured as local instead of global, Code 8 uses two bytes less than Code 9.

Code 8

    BYTE bVar1;
    BYTE array[10];
    void main(void)
    {
        bVar1 = 0;
        while(array[bVar1] != 0)
        {
            bVar1++;
        }
    }

Code 9

    BYTE *ptr;
    BYTE array[10];
    void main(void)
    {
        ptr = array;
        while(*ptr != 0)
        {
            ptr++;
        }
    }

The number of bytes saved for the simple example is only a few bytes. When the code is part of a structure or other user defined variable, the difference in the type of access will lead to a large variation in memory usage. For example, consider the following two code examples. Code 10 uses 60 bytes more than Code 11.

Code 10

    typedef struct
    {
        int iData;
        BYTE bData;
    }sData;
    typedef struct
    {
        sData myArray[10];
    }sArray;
    sArray myTest;
    sData* myPtr;
    void main(void)
    {
        myPtr = myTest.myArray;
        myPtr->iData = 100;
        myPtr->bData = 10;
        myPtr++;
        myPtr->iData = 200;
        myPtr->bData = 20;
        myPtr++;
        myPtr->iData = 200;
        myPtr->bData = 20;
    }

Code 11

    typedef struct
    {
        int iData;
        BYTE bData;
    }sData;
    typedef struct
    {
        sData myArray[10];
    }sArray;
    sArray myTest;
    sData* myPtr;
    void main(void)
    {
        myTest.myArray[1].iData = 100;
        myTest.myArray[1].bData = 10;
        myTest.myArray[2].iData = 200;
        myTest.myArray[2].bData = 20;
        myTest.myArray[3].iData = 300;
        myTest.myArray[3].bData = 30;
    }

Thus, a careful observation of the type of access method being used (array index vs. pointer) is important for code optimization. There are a number of variations in the type of access and variable types. Providing every combination of array-index and pointer access comparison is beyond the scope of this application note.

Part of Code in Assembly

Writing a program in assembly avoids compiler interpretations and allows complete optimization by the user. Though writing an entire program in assembly is tedious and cumbersome, converting a part of the code into assembly language can optimize code size and performance. For more information, refer to “Interfacing Assembly and C Source Files” AN2129.

IF-ELSE vs. Switch

For a decision on a single byte variable (BYTE), the ImageCraft compiler produces more efficient code using an if-else construct as compared to a switch construct. The switch statement uses 9 bytes, plus 5 bytes per case, more than the equivalent if-else. For example, a four case switch statement with a default clause as shown in Code 12 will use (9 + 5 * 4) = 29 more bytes than the equivalent Code 13.

Code 12

    BYTE bTest1, bTest2;
    void main(void)
    {
        switch(bTest1)
        {
            case 4:
            {
                bTest2 = 1;
                break;
            }
            case 3:
            {
                bTest2 = 2;
                break;
            }
            case 2:
            {
                bTest2 = 3;
                break;
            }
            default:
            {
                bTest2 = 4;
            }
        }
    }

Code 13

    BYTE bTest1, bTest2;
    void main(void)
    {
        if(bTest1 == 4)
        {
            bTest2 = 1;
        }
        else if(bTest1 == 3)
        {
            bTest2 = 2;
        }
        else if(bTest1 == 2)
        {
            bTest2 = 3;
        }
        else
        {
            bTest2 = 4;
        }
    }

When the switch statement is for a two-byte variable (WORD), the resulting code size is nearly identical for either the switch or the if-else implementation.

Initializing Global Variables

Global variables are initialized to zero by default. Reinitializing the global variables to zero explicitly adds additional code. Thus, for code optimization, global variables should not be explicitly set to zero.

Conclusion

This application note discusses the basic methods of code optimization. Some of these are specific to the ImageCraft compiler, and a few are general. There are many more general optimization techniques to be explored beyond what has been given here.

About the Author

Name: Archana Yarlagadda
Title: Applications Engineer
Background: Applications Engineer at Cypress with focus on PSoC. Masters in Analog VLSI from University of Tennessee, Knoxville.
Contact: yara@cypress.com

Document History

Document Title: Code Optimization of PSoC® 1 Project when using ImageCraft Compiler
Document Number: 001-60486

Revision   ECN       Orig. of Change   Submission Date   Description of Change
**         2994124   YARA              07/27/2010        New Application Note.

PSoC is a registered trademark of Cypress Semiconductor Corp. "Programmable System-on-Chip," PSoC Designer, and PSoC Express are trademarks of Cypress Semiconductor Corp. All other trademarks or registered trademarks referenced herein are the property of their respective owners.

Cypress Semiconductor
198 Champion Court
San Jose, CA 95134-1709
Phone: 408-943-2600
Fax: 408-943-4730
http://www.cypress.com/

© Cypress Semiconductor Corporation, 2010. The information contained herein is subject to change without notice.
Cypress Semiconductor Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in a Cypress product. Nor does it convey or imply any license under patent or other rights. Cypress products are not warranted nor intended to be used for medical, life support, life saving, critical control or safety applications, unless pursuant to an express written agreement with Cypress. Furthermore, Cypress does not authorize its products for use as critical components in life-support systems where a malfunction or failure may reasonably be expected to result in significant injury to the user. The inclusion of Cypress products in life-support systems application implies that the manufacturer assumes all risk of such use and in doing so indemnifies Cypress against all charges. This Source Code (software and/or firmware) is owned by Cypress Semiconductor Corporation (Cypress) and is protected by and subject to worldwide patent protection (United States and foreign), United States copyright laws and international treaty provisions. Cypress hereby grants to licensee a personal, non-exclusive, non-transferable license to copy, use, modify, create derivative works of, and compile the Cypress Source Code and derivative works for the sole purpose of creating custom software and or firmware in support of licensee product to be used only in conjunction with a Cypress integrated circuit as specified in the applicable agreement. Any reproduction, modification, translation, compilation, or representation of this Source Code except as specified above is prohibited without the express written permission of Cypress. Disclaimer: CYPRESS MAKES NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Cypress reserves the right to make changes without further notice to the materials described herein. 
Cypress does not assume any liability arising out of the application or use of any product or circuit described herein. Cypress does not authorize its products for use as critical components in life-support systems where a malfunction or failure may reasonably be expected to result in significant injury to the user. The inclusion of Cypress’ product in a life-support systems application implies that the manufacturer assumes all risk of such use and in doing so indemnifies Cypress against all charges. Use may be limited by and subject to the applicable Cypress software license agreement.