‘C’ for Microcontrollers, Just Being Efficient

Lloyd Moore, President
Lloyd@CyberData-Robotics.com
www.CyberData-Robotics.com

Agenda
- Microcontroller Resources
- Knowing Your Environment
- Memory Usage
- Code Structure
- Interrupts
- Math Tricks
- Optimization

Disclaimer
- Some microcontroller techniques necessarily trade one benefit for another, typically giving up maintainability for lower resource usage
- Point of this presentation is to point out various techniques that can be used as needed
- Use these suggestions when necessary
- Feel free to suggest better solutions as we go along

Microcontroller Resources
- EVERYTHING resides on one die inside one package: RAM, Flash, Processor, I/O
- Cost is a MAJOR design consideration
  - Typical costs are $0.25 to $25 each (in quantities of 1000s)
- RAM: 16 BYTES to 32K Bytes typical
- Flash/ROM: 384 BYTES to 256K Bytes
- Clock Speed: 4MHz to 80MHz typical
  - Much lower for battery saving modes (32KHz)
- Bus is 8, 16, or 32 bits wide (just like the old days)

Other Considerations
- No floating point hardware
  - May have other math hardware (MAC, CRC)
- Specialized resources often present
  - Counters, UART, USB PHY, LCD Controller
- Portability inside families a big concern
  - Across families, not so much
- Typically no operating system present
  - May have hardware centric API, or just raw registers!
- No protected memory / MMU
  - Do have specialized memory segments

Power Consumption
- Microcontrollers typically used in battery operated devices
- Power requirements can be EXTREMELY tight
  - Energy harvesting applications
  - Long term battery installations (remote controls, hard to reach devices, etc.)
- EVERY instruction executed consumes power, even if you have the time!

Know Your Environment
- Traditionally we ignore hardware details
- Need to tailor code to hardware available
  - Specialized hardware MUCH more efficient
- Compilers typically have extensions
  - Interrupt – specifies code as being ISR
  - Memory model – may handle banked memory and/or simultaneous access banks
  - Multiple data pointers / address generators
- Debugger may use some resources

Memory Usage
- Use ‘const’ to put data into program memory
- Alignment / padding issues
  - Typically NOT an issue, non-aligned access ok
- Avoid dynamic memory allocation
  - Takes extra space and processing time
  - Memory fragmentation a big issue
- Use and reuse static buffers
  - Reduces variable passing overhead
  - Allows for smaller / faster code due to reduced indirections
  - Does bring back overwrite bugs if not done carefully
- Use the appropriate variable type
  - Don’t use int and double for everything!!
  - Affects processing time as well as storage

Char vs. Int Increment on 8051

    char cX;
    cX++;

    000A  900000  MOV   DPTR,#cX
    000D  E0      MOVX  A,@DPTR
    000E  04      INC   A
    000F  F0      MOVX  @DPTR,A

- 6 Bytes of Flash
- 4 instructions

    int iX;
    iX++;

    0000  900000  MOV   DPTR,#iX
    0003  E4      CLR   A
    0004  75F001  MOV   B,#01H
    0007  120000  LCALL ?C?IILDX

- 10 Bytes of Flash + subroutine overhead
- Many more than 4 instructions executed once the LCALL is included

Code Structure
- Count down instead of up
  - Saves a subtraction on all processors
  - DJNZ style instruction on some processors
- Pointers vs. array notation
  - Generally better using pointers
- Bit Shifting
  - May not always generate what you think
  - May or may not have barrel shifter hardware
  - May or may not have logical vs. arithmetic shifts

Shifting Example

    cX = cX << 3;

    0006  33      RLC   A
    0007  33      RLC   A
    0008  33      RLC   A
    0009  54F8    ANL   A,#0F8H

    cA = 3;
    cX = cX << cA;

    000B  900000          MOV   DPTR,#cA
    000E  E0              MOVX  A,@DPTR
    000F  FE              MOV   R6,A
    0010  EF              MOV   A,R7
    0011  A806            MOV   R0,AR6
    0013  08              INC   R0
    0014  8002            SJMP  ?C0005
    0016          ?C0004:
    0016  C3              CLR   C
    0017  33              RLC   A
    0018          ?C0005:
    0018  D8FC            DJNZ  R0,?C0004

- Constant shift counts turn into separate instructions
- Variable shift counts turn into loops
- Both of these can be one instruction with a barrel shifter

More Code Structure
- Actual parameters typically passed in registers if available
  - Keep function parameters to fewer than 3
  - May also be passed on stack or special parameter area
  - May be more efficient to pass pointer to struct
- Global variables
  - While generally frowned upon for most code, can be very helpful here
  - Typically ends up being a direct access
- Read assembly code for critical areas
- Know which optimizations are present
  - Small compilers do not always have common optimizations
  - Inline, loop unrolling, loop invariant, pointer conversion

Indexed Array vs Pointer on M8C

    ucMode = g_Channels[uc_Channel].ucMode;

    01DC  52FC    mov   A,[X-4]
    01DE  5300    mov   [__r1],A
    01E0  5000    mov   A,0
    01E2  08      push  A
    01E3  5100    mov   A,[__r1]
    01E5  08      push  A
    01E6  5000    mov   A,0
    01E8  08      push  A
    01E9  5007    mov   A,7
    01EB  08      push  A
    01EC  7C0000  xcall __mul16
    01EF  38FC    add   SP,-4
    01F1  5F0000  mov   [__r1],[__rX]
    01F4  5F0000  mov   [__r0],[__rY]
    01F7  060000  add   [__r1],<_g_Channels
    01FA  0E0000  adc   [__r0],>_g_Channels
    01FD  3E00    mvi   A,[__r1]
    01FF  5403    mov   [X+3],A

    ucMode = pChannel->ucMode;

    01ED  5201    mov   A,[X+1]
    01EF  5300    mov   [__r1],A
    01F1  3E00    mvi   A,[__r1]
    01F3  5405    mov   [X+5],A

- Does the same thing
- Saves 29 bytes of memory AND a call to a 16 bit multiplication routine!
- Pointer version will be at least 4x faster to execute as well, maybe 10x
- Most compilers not this bad – but you do find some!
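
The indexed and pointer forms compared on the M8C slide can be sketched in portable C as below. This is only an illustration: the `Channel` layout, the filler field, `NUM_CHANNELS`, and the function names are assumptions, chosen so that the record is 7 bytes like the multiply by 7 in the listing suggests. Whether the pointer form actually wins depends on the compiler, so the generated listing is still the final word.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 7 byte channel record; only ucMode appears in the slides. */
typedef struct {
    unsigned char ucMode;
    unsigned char aucOther[6];   /* assumed filler so the record is 7 bytes */
} Channel;

#define NUM_CHANNELS 4           /* assumed table size */

static Channel g_Channels[NUM_CHANNELS];

/* Indexed form: the address is base + index * sizeof(Channel),
   which can cost a multiply on every access. */
unsigned char ModeByIndex(unsigned char ucChannel)
{
    assert(ucChannel < NUM_CHANNELS);
    return g_Channels[ucChannel].ucMode;
}

/* Pointer form: the address was computed once by the caller. */
unsigned char ModeByPointer(const Channel *pChannel)
{
    assert(pChannel != NULL);
    return pChannel->ucMode;
}

/* Walk the table with a pointer and a count down loop,
   so each step is an add of sizeof(Channel) rather than a multiply. */
unsigned int SumModes(void)
{
    const Channel *pChannel = g_Channels;
    unsigned char ucLeft = NUM_CHANNELS;
    unsigned int uiSum = 0;

    while (ucLeft--) {
        uiSum += pChannel->ucMode;
        pChannel++;
    }
    return uiSum;
}
```

A caller that touches several fields of the same channel would take `&g_Channels[n]` once and pass that pointer around, which is where the saving on the slide comes from.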

Interrupts
- Generally implemented as individual hardware vectors with a small amount of program memory at the location
  - Generally an “interrupt” statement provided
- ISR is what you get – no OS, no threads, no IST (interrupt service thread)
  - Can use a flag with main loop to get IST behavior for less time critical code
- Also very common to use interrupts to simulate threads
  - Interrupt itself takes the place of the WaitFor_XXX or signal
  - Follows very naturally for hardware tasks and timers

Interrupt Example

    //volatile: the flag is written by the ISR and read by the main loop
    static volatile unsigned char g_TimerTriggered;

    void main()
    {
        ConfigureTimer0();
        g_TimerTriggered = 0;
        GlobalEnableInterrupt();
        while(1)
        {
            if(g_TimerTriggered)
            {
                g_TimerTriggered = 0;  //Could also disable the timer interrupt here
                DoTimerTask();         //to avoid a race condition resetting g_TimerTriggered
            }
            //Can put optional sleep here, interrupts can wake up processor
        }
    }

    //Interrupt source 1; in Keil C51 style syntax "using 2" selects register bank 2
    void Timer0ISR(void) interrupt 1 using 2
    {
        g_TimerTriggered = 1;
        //Can put other small, quick work here
    }

Switch Statement Implementation
- Switch statements can be implemented in various ways
  - Sequential compares
  - In line table look up for case block
  - Special function with look up table
- Specific implementation can also vary based on the case clauses
  - Clean sequence (1, 2, 3, 4, 5)
  - Gaps in sequence (1, 10, 30, 255)
  - Ordering of sequence (5, 4, 1, 2, 3)
- Knowing which method gets implemented is critical to optimizing!
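
When the case values form a clean sequence and each case only selects a value, one option is to write the look up table by hand instead of relying on the compiler to pick that implementation. The sketch below is an assumption-laden illustration: the function names, the table, and the gain values are hypothetical and not from the slides. It shows a dense switch, a `const` table with a single bounds check, and a host-side check that the two agree.

```c
#include <assert.h>

/* Dense case values 0..3: a compiler may use a table here,
   but may also emit sequential compares. */
unsigned char GainFromSwitch(unsigned char ucRange)
{
    switch (ucRange) {
    case 0:  return 1;
    case 1:  return 2;
    case 2:  return 8;
    case 3:  return 64;
    default: return 0;
    }
}

/* Hand written equivalent: a const table (program memory on many parts)
   plus one bounds check that plays the role of the default clause. */
static const unsigned char s_aucGain[] = { 1, 2, 8, 64 };

unsigned char GainFromTable(unsigned char ucRange)
{
    if (ucRange < sizeof s_aucGain) {
        return s_aucGain[ucRange];
    }
    return 0;
}

/* Host-side check that both forms agree for every 8 bit input. */
void CheckGainForms(void)
{
    unsigned int ui;

    for (ui = 0; ui <= 255u; ui++) {
        assert(GainFromSwitch((unsigned char)ui) ==
               GainFromTable((unsigned char)ui));
    }
}
```

This only applies when the case bodies reduce to data. For gaps such as (1, 10, 30, 255) a table indexed by the raw value would waste memory, so sequential compares may well be the better outcome there.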

Switch Statement Example

    switch(cA)
    {
        case 0: cX = 4; break;
        case 1: cX = 10; break;
        case 2: cX = 30; break;
        default: cX = 0; break;
    }

    0006  900000          MOV   DPTR,#cA
    0009  E0              MOVX  A,@DPTR
    000A  FF              MOV   R7,A
    000B  EF              MOV   A,R7
    000C  120000          LCALL ?C?CCASE
    000F  0000            DW    ?C0003
    0011  00              DB    00H
    0012  0000            DW    ?C0002
    0014  01              DB    01H
    0015  0000            DW    ?C0004
    0017  02              DB    02H
    0018  0000            DW    00H
    001A  0000            DW    ?C0005
    001C          ?C0002:
    001C  900000          MOV   DPTR,#cX
    001F  7404            MOV   A,#04H
    0021  F0              MOVX  @DPTR,A
    0022  8015            SJMP  ?C0006

    ...More blocks follow for each case

Bit Variables
- Some processors have special memory areas and op-codes for single bit storage
  - Saves overhead of masking operations
- Some key off bit field notation, some need a keyword (frequently ‘bit’)

    struct { unsigned int foo : 1; } flags;
    unsigned int my_bit : 1;
    bit my_bit;

Math Tricks
- Floating point math VERY expensive on microcontrollers
  - No hardware support
  - Typically 32 bits for float, 64 bits for double
  - Support provided by a BIG library
- Can use fixed point math in many cases
  - Basically the same as integer math, but with the binary point moved inside the integer
  - Binary number is really: 2^7 + 2^6 + … + 2^2 + 2^1 + 2^0
  - To make a fixed point number just adjust the exponents: 2^6 + 2^5 + … + 2^1 + 2^0 + 2^-1 (Note: 2^-1 = 0.5)
  - Assume 8 bit value: Range = [0, 255]
  - Assume one bit after the binary point, XXXXXXX.X: Range is now [0, 127.5]
  - All the internal math stays the same so long as only fixed point numbers with the same binary point location are used together!

More Math Tricks
- You may not have multiply and/or divide ops!
- Decomposing operations can help
  - X * 5 = X * 4 + X
  - (X * 4) can become 2 shift left operations
- Formulas should also be restructured for math available:
  - Y = ax^2 + bx + c : 1 Pow or Mult, 2 Mult, 2 Add
  - Y = x(ax + b) + c : 2 Mult, 2 Add
- Lookup tables can be great for limited domain problems

Optimization
- Step 0 – Before coding anything think about risk points and prototype unknowns!!!
- Step 1 – Get it working!!
  - Fast but wrong is of no use to anyone
  - Optimization will typically reduce readability
- Step 2 – Profile to know where to optimize
  - Usually only one or two routines are critical
  - You need to have specific performance metrics to target

Optimization
- Step 3 – Let the tools do as much as they can
  - Turn off debugging!
  - Select the correct memory model
  - Select the correct optimization level
- Step 4 – Do it manually
  - Read the generated code! Might be able to make a simple code or structure change.
  - Last – think about assembly coding

Summary
- Microcontroller hardware is much simpler than most of us are used to
- Be familiar with the hardware in your microcontroller
- Be familiar with your compiler options and how it translates your code
- For time or space critical code look at the assembly listing from time to time

Questions?
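
As a backup to the Math Tricks slides, here is a minimal C sketch of the ideas described there: an 8 bit value with one fractional bit, X * 5 built from shifts and an add, and the restructured polynomial. The type name `Fix7_1` and all function names are assumptions made for illustration. Addition carries over unchanged when both operands share the binary point; a product carries twice the fractional bits, so the sketch shifts one back out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 7.1 fixed point format (XXXXXXX.X): stored value = real value * 2,
   so the range is 0 to 127.5 in steps of 0.5. */
typedef uint8_t Fix7_1;

/* Addition needs no adjustment when both operands share the binary point.
   The result wraps on overflow, like any 8 bit add. */
Fix7_1 FixAdd(Fix7_1 a, Fix7_1 b)
{
    return (Fix7_1)(a + b);
}

/* The raw product has two fractional bits; shift right once to get back
   to one. The discarded bit is truncated, not rounded. */
Fix7_1 FixMul(Fix7_1 a, Fix7_1 b)
{
    uint16_t wide = (uint16_t)((uint16_t)a * b);
    return (Fix7_1)(wide >> 1);
}

/* X * 5 decomposed as X * 4 + X: two left shifts and one add. */
uint16_t TimesFive(uint8_t x)
{
    return (uint16_t)(((uint16_t)x << 2) + x);
}

/* y = x(ax + b) + c: two multiplies and two adds.
   8 bit inputs keep every intermediate well inside 32 bits. */
int32_t PolyRestructured(int8_t a, int8_t b, int8_t c, int8_t x)
{
    return (int32_t)x * ((int32_t)a * x + b) + c;
}
```

For example, 1.5 is stored as 3 and 2.5 as 5, so `FixAdd(3, 5)` yields 8, which reads back as 4.0. `FixMul(3, 5)` yields 7, which reads back as 3.5 rather than the exact 3.75, because the format cannot represent quarters.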