The University of Zambia
School of Engineering
Department of Electrical and Electronic Engineering
MICROCONTROLLER TECHNOLOGY AND EMBEDDED SYSTEMS
Lecture #3 – Assembly Language Basics
By Louis N. Mumba (2022)
louis.mumba@unza.zm
louis.mumba.eng@gmail.com
EEE 4135

Introduction to Assembly Language
AVR Assembly tools
SFR Instructions (Execution Time)
Loops
Implementation of delay

While the CPU can work only in binary, it can do so at a very high speed. It is tedious and slow for humans, however, to deal with 0s and 1s in order to program the computer. A program that consists of 0s and 1s is called machine language. In the early days of the computer, programmers coded programs in machine language. Although the hexadecimal system was used as a more efficient way to represent binary numbers, working in machine code was still cumbersome for humans.

Eventually, Assembly languages were developed, which provided mnemonics for the machine code instructions, plus other features that made programming faster and less prone to error. The term mnemonic is frequently used in computer science and engineering literature to refer to codes and abbreviations that are relatively easy to remember.

Assembly language programs must be translated into machine code by a program called an assembler. Assembly language is referred to as a low-level language because it deals directly with the internal structure of the CPU. To program in Assembly language, the programmer must know all the registers of the CPU and the size of each, as well as other details.

There are several computer programming languages in use today. An engineer should therefore be proficient in a number of them and understand how they differ in terms of:
I. Uses: what the language is good for, e.g. scripts (Ruby, Python, PHP, JavaScript), games (C++, C, C#, Java), servers (C, C++, Java, PHP, Python)
II. Syntax: what the code looks like (keywords, weakly/strongly typed, OOP/procedural)
III. Runtime: how the code executes, compiled (C, C++) or interpreted (Python, Ruby, PHP)
IV. Level: low level (Assembly) or high level (C, C++, Java, Python)

In this course, assembly language will be used to understand the AVR architecture. For the rest of the course, Embedded C will be used. [Same as C: same syntax, same conditional statements, recursion etc., with only additional libraries and registers]

What is Assembly Language?
Alphanumeric representation of machine code
Each line is an instruction telling the MCU to do a task
Instructions are specific to the MCU architecture

The process through which the processor controls the execution of instructions is referred to as the fetch-decode-execute cycle or the execution cycle. It consists of three continuous steps:
Fetching the instruction from memory
Decoding or identifying the instruction
Executing the instruction

Each assembly language statement consists of four fields:
Label (followed by a colon)
Opcode
Operand
Comment (preceded by a semicolon)

Example:
    tampa:   LDI  R17, 0x55    ;load GPR 17 with hex 55
             MOV  R3, R17      ;copy the hex 55 in R17 to R3
    ElNino:  COM  R3           ;one's complement the contents of R3
             JMP  ElNino       ;loop back to label "ElNino"

A directive is an instruction to the assembler to carry out certain configurations before it assembles the code; for example, the .include directive tells the assembler to include certain 'library files' before assembly. Below are some of the commonly encountered directives. Note that assembler directives start with a full stop (comparable to the # that starts preprocessor directives in C and C++).

.EQU: This directive is used to assign a name to a constant value that cannot be changed later, e.g. .EQU COUNT = 100
.DEF: This directive is also called DEFine register.
It defines a synonym for a register, e.g. .DEF MyRegisterForAge = R18
.ORG: This directive specifies the location in memory (program or data) where the code following the directive is to be placed, i.e. the program origin. It is like initializing the program counter.

AVR programming in Atmel Studio allows inclusion of definition files in which the SFRs have been declared by name, so that you can refer to SFRs by name and not by address, e.g. PORTB instead of 0x25. This definition file must be included using .include "m328Pdef.inc". The 328P part must be edited to match the controller in use.

Assuming that the program below is burned into the ROM of an AVR chip, the following is a step-by-step description of the action of the AVR upon applying power to it:
1. When the AVR is powered up, the PC (program counter) holds 00000 and the CPU starts to fetch the first instruction from location 00000 of the program ROM. In the case of the program below, the first machine code is the one for loading the value 0x25 into R16. Upon executing the code, the CPU places the value 0x25 in R16. Now one instruction is finished. Then the program counter is incremented to point to 00001 (PC = 00001), which contains the machine code for the instruction "LDI R17, 0x34".
2. Upon executing the machine code, the value 0x34 is loaded into R17. Then the program counter is incremented to 00002.
3. ROM location 00002 has the machine code for the instruction "LDI R18, 0x31". This instruction is executed and now PC = 00003.
4. This process goes on until all the instructions are fetched and executed.

To write code for embedded systems, a developer needs a text editor (source code editor), a compiler/assembler program, a linker and a debugger. Atmel uses Atmel Studio as an IDE for both ARM and AVR devices. It is a free software package that has a large library of free source code examples.
For Atmel Studio, the output file that gets downloaded into the flash memory of the MCU is the machine code (HEX file). For example, if the source file is created in assembly language, it will have a .asm extension. After assembling, the assembler will produce several files, namely .eep, .hex, .map, .lst and .obj, as shown on the next slide.
AVRPROG.exe is an Atmel software component that actually burns the HEX file into the MCU. AVRDude (AVR Downloader Uploader) is another free program that can be used to burn the HEX file into the ROM.

CODE EDITOR >> ASSEMBLER PROGRAM >> output files:
code.eep    Downloaded to AVR EEPROM
code.hex    Burnt to Code ROM
code.map    Shows labels and their values
code.lst    Shows the code in binary and in hexadecimal
code.obj    Used by simulator

Recall from EEE3131 that flash memory requires some voltage and proper addressing to be written to. This is implemented by special hardware called a PROGRAMMER or BURNER. For AVR, there are a number of programmers in use. The programmers can be 10 pin or 6 pin. The common types of programmers are:
1) USBaspISP: Possibly the cheapest. It is composed of an ATmega88 or ATmega8 and a few passive components that allow writing of data to the target chip. The code is therefore directed from the computer through the USBasp (MASTER) to the target chip (SLAVE). [Cost: about $8 on Amazon]
2) USBTinyISP: A slight improvement over USBasp. Note that USBTiny has limitations on the size of memory it can program. It uses an ATtiny2313. [Cost: about $20]
3) Atmel-ICE: Official programmer by Atmel for their AVR chips. It comes complete with a debugger. It is the most expensive of the three. [Cost: over $120]

[Figures: USBaspISP (interior and exterior)]
[Figures: USBTinyISP (interior and exterior)]
[Figures: Atmel-ICE (exterior and interior)]

Previously, we introduced assembly language mnemonics for dealing with data within GPRs [LDI Rd, K], [ADD Rd, Rr], [MOV Rd, Rr].
We'll now look at the mnemonics for data transfer from any section of the RAM space to GPRs [LDS Rd, K] and from GPRs to any part of the RAM space [STS K, Rr]. We then introduce assembly instructions that specifically transfer data from SFRs to GPRs [IN instruction] and the other way round, from GPRs to SFRs [OUT instruction]. These are helpful for writing values for output to a PORT (or to other SFRs) or reading data as input from a PORT (or from other SFRs).

LDS: LoaD direct from data Space.
Syntax: LDS Rd, K ;load GPR d with the value at address K, from anywhere in the data memory space
Features:
Loads data from anywhere in the RAM space (GPR, SFR or SRAM) into any GPR.
K (the source location) should always be specified as an address.
For example, to add data held in some sections of SRAM, we first need to load it into GPRs and then add:
    LDS R0, 0x300    ;contents of 0x300 are copied into R0
    LDS R1, 0x302    ;contents of 0x302 are copied into R1
    ADD R1, R0       ;add R0 to R1
The instructions above are executed assuming 0x300 and 0x302 were pre-loaded with data (or that the addressed locations are SFRs holding data that comes from an operation). A pictorial representation of the execution is shown in the next slide.

STS: STore direct to data Space.
Syntax: STS K, Rr ;store data from any GPR to any location (addressed by K) of the data memory space
Features:
Stores data to any part of RAM (GPR, SFR or SRAM) from any GPR.
K (the destination location) should always be specified as an address.
For example, we can write (store) some user-defined data to the output ports (data memory addresses on the ATmega32: PORTB = 0x38, PORTC = 0x35, PORTD = 0x32):
    LDI R16, 0x55    ;load R16 with hex 55
    STS 0x38, R16    ;store contents of R16 to PORTB
    STS 0x35, R16    ;store contents of R16 to PORTC
    STS 0x32, R16    ;store contents of R16 to PORTD

A few points to note when dealing with SFR (IO memory) instructions: IO memory has two kinds of addresses, data RAM addresses and IO addresses.
Taking the ATmega32 for example, the SFRs can be addressed using data RAM addresses, 32 to 95 (0x20 to 0x5F), as well as using unique IO memory addresses, which run from 0 to 63 (0x00 to 0x3F).

IN Rd, A ;load any GPR with data from IO address A
         ;0 ≤ d ≤ 31 and 0 ≤ A ≤ 63
Features:
The IN instruction fetches data from the SFRs (IO memory) only (64 address locations, 0 to 63). To that effect, the IN instruction uses IO addresses and not data memory addresses.
For example, to copy the contents of the SFR at IO address 0x10 into GPR number 19, we use:
    IN R19, 0x10    ;load R19 with data from SFR location 0x10 (from the SFR memory map, 0x10 = PIND)
In short, the instruction above reads data from PIND.

LDS is a four-byte instruction, i.e. it is divided into two 16-bit pieces (two words). The first word is a mixture of the opcode and the destination register address. The second word contains only the source memory address (16 bits).
    LDS Rd, K ;load from memory location K to GPR register Rd
    0 ≤ d ≤ 31       5-bit addresses for the 32 bytes of GPRs
    0 ≤ K ≤ 65535    16-bit addresses for the 64K RAM

On the other hand, the IN instruction is a two-byte instruction (one 16-bit word): the first five bits are the opcode and the rest of the bits hold the IO address of the source SFR (6 bits) mixed with the address of the destination GPR (5 bits).
    IN Rd, A ;load from address A of IO memory into register Rd

From the instruction lengths indicated for LDS and IN, we can see that although either one can be used to read a value from an IO register, LDS takes two clock cycles to fully execute (32 bits) while IN takes only one clock cycle (16 bits). So if a clock of 8 MHz is in use, one clock cycle is (1/8000000) seconds = 0.125 microseconds, meaning that LDS takes 0.25 microseconds while IN takes 0.125 microseconds.
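As an illustration of the two addressing schemes, the sketch below reads PIND and writes PORTB on an ATmega32 both ways. It assumes the data memory address of an SFR is its IO address plus 0x20, so PIND (IO address 0x10) sits at 0x30 and PORTB (IO address 0x18) at 0x38. The choice of R16, R19 and R20 is arbitrary, and the times quoted in the comments assume the 8 MHz clock used above.

```asm
; Sketch for an ATmega32: IO address + 0x20 = data memory address
        IN   R19, 0x10      ; PIND via its IO address: 1 word, 1 cycle (0.125 us at 8 MHz)
        LDS  R20, 0x0030    ; PIND via its data memory address: 2 words, 2 cycles (0.25 us)

        LDI  R16, 0x55      ; value to write
        OUT  0x18, R16      ; PORTB via its IO address: 1 word, 1 cycle
        STS  0x0038, R16    ; PORTB via its data memory address: 2 words, 2 cycles
```

Both reads deliver the same byte and both writes reach the same register; the IO-address forms are simply shorter and faster, while the data-address forms can also reach locations outside the 64 IO addresses.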
OUT A, Rr ;store GPR r to IO location A
          ;0 ≤ r ≤ 31 and 0 ≤ A ≤ 63
The OUT instruction is equivalent to IN but with data moving in the opposite direction: from GPR to SFR. STS achieves the same result as OUT, but it should be noted that OUT (just like its opposite, IN) is a two-byte instruction (single clock cycle) while STS is a four-byte instruction (two clock cycles). STS and LDS have the advantage of a wider range of addresses that they can take as operands.
    STS K, Rr    0 ≤ r ≤ 31    0 ≤ K ≤ 65535
    OUT A, Rr    0 ≤ r ≤ 31    0 ≤ A ≤ 63

To run a loop more than 255 times, a nested loop is used (a loop inside a loop). The maximum number of times the loop body is repeated becomes the product of the loop counters. The code below loops 700 times, i.e. it complements the bits on PORTB 700 times:
    .include "m328pdef.inc"
            LDI  R16, 0x55     ;load R16 with 0x55
            OUT  PORTB, R16    ;send the contents of R16 to PORTB
            LDI  R20, 10       ;load decimal 10 into R20 (counter for outer loop)
    LOOP_1: LDI  R21, 70       ;load R21 with decimal 70 (counter for inner loop)
    LOOP_2: COM  R16           ;one's complement the contents of R16
            OUT  PORTB, R16    ;send the complemented value to PORTB
            DEC  R21           ;decrement R21 by one and store in R21 (inner loop)
            BRNE LOOP_2        ;repeat the decrement 70 times
            DEC  R20           ;decrement R20 by one and store in R20 (outer loop)
            BRNE LOOP_1        ;repeat the outer loop 10 times

    .INCLUDE "m328pdef.inc"
    .ORG 0x00
            LDI  R16, HIGH(RAMEND)    ;load R16 with the higher byte of RAMEND
            OUT  SPH, R16             ;higher byte of SP will have the higher byte of RAMEND
            LDI  R16, LOW(RAMEND)     ;load R16 with the lower byte of RAMEND
            OUT  SPL, R16             ;lower byte of SP will have the lower byte of RAMEND
            LDI  R16, 0x55            ;load R16 with 0x55
    BACK:   COM  R16                  ;one's complement of contents of R16
            OUT  PORTB, R16           ;send the contents of R16 to port B register (actual pins)
            CALL DELAY_1S             ;call a function called DELAY_1S
            RJMP BACK                 ;relative jump to BACK, i.e. keep doing this indefinitely
DELAY_1S is shown on the next page.

    DELAY_1S:
            LDI  R20, 32     ;number of decrements for outer loop
    L1:     LDI  R21, 200    ;number of decrements for middle loop
    L2:     LDI  R22, 250    ;number of decrements for innermost loop
    L3:     NOP
            NOP
            DEC  R22
            BRNE L3
            DEC  R21
            BRNE L2
            DEC  R20
            BRNE L1
            RET
Neglecting the overhead instructions of the middle loop and the outermost loop, this delay function is approximately 1 second in duration. Shown next is how the 1 second is derived.

Using a clock of 8 MHz, a clock cycle is 1/8000000 s = 125 nanoseconds long. Each instruction in the inner loop takes the following number of clock cycles:
NOP => 1
NOP => 1
DEC => 1
BRNE => 2 (when the branch is taken; 1 when it is not)
Total for the inner loop is 5 clock cycles.
This means that the five clock cycles of the inner loop are repeated 250 times (DEC R22), the inner loop is itself repeated 200 times by the middle loop (DEC R21) and, furthermore, the middle loop is repeated 32 times by the outermost loop (DEC R20). Therefore, the five clock cycles of the inner loop are executed 250 x 200 x 32 = 1 600 000 times, i.e. four instructions (5 clocks) done 1 600 000 times with each clock lasting 125 nanoseconds, which implies a total duration of:
5 x 1 600 000 x 125 nanoseconds = 1 sec
[exact is (3x250 + 2x249 + 1) x 200 x 32 = 7 993 600 clocks, and 7 993 600 x 125 nanoseconds = 0.999 s]
So why did we use only the inner loop for approximating the total duration of the delay?

In DELAY_1S, the instructions outside the inner loop contribute as follows:
DEC R21 and BRNE L2 (middle loop): two instructions >> 3 clock cycles >> 200 decrements done 32 times (outer loop) >> 3x200x32 = 19200 clocks >> 19200 clocks x 125 nanoseconds/clock = 0.0024 sec
DEC R20 and BRNE L1 (outer loop): two instructions >> 3 clock cycles >> 32 decrements >> 3x32 = 96 clocks >> 96 clocks x 125 nanoseconds/clock = 0.000012 sec

LDI is one (1) machine cycle and RET is four (4) machine cycles.
With this in mind, we have neglected:
LDI R20, 32, which is executed once per call to DELAY_1S, thus 125 ns = 0.000000125 sec.
LDI R21, 200, which is executed 32 times per call to DELAY_1S, thus 32 x 125 ns = 0.000004 sec.
LDI R22, 250, which is executed 200 x 32 times per call to DELAY_1S, thus 200 x 32 x 125 ns = 0.0008 sec.
RET, which is executed once per call to DELAY_1S, thus 4 x 125 ns = 0.0000005 sec.
The total neglected time, consisting of the above instructions plus the DEC and BRNE in the middle and outer loops, is
0.000000125 + 0.000004 + 0.0008 + 0.0000005 + 0.0024 + 0.000012 = 0.003216625 seconds
You can see how negligible the neglected components are (about 0.3% of the 1 second delay). In most cases it is therefore safe to use only the inner loop when calculating your delay time.
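The same bookkeeping can be written next to the code. The sketch below repeats DELAY_1S with the three loop counts named through .EQU and the cycle cost of every instruction in the comments. The names OUTER, MIDDLE and INNER are introduced here for illustration only. The counts assume BRNE costs 2 cycles when the branch is taken and 1 when it falls through, and they do not include the CALL that enters the routine.

```asm
.EQU OUTER  = 32            ; hypothetical names for the three loop counts
.EQU MIDDLE = 200
.EQU INNER  = 250

DELAY_1S:
        LDI  R20, OUTER     ; 1 cycle, executed once
L1:     LDI  R21, MIDDLE    ; 1 cycle, executed 32 times
L2:     LDI  R22, INNER     ; 1 cycle, executed 32 x 200 = 6400 times
L3:     NOP                 ; 1 cycle
        NOP                 ; 1 cycle
        DEC  R22            ; 1 cycle
        BRNE L3             ; 2 cycles if taken, 1 on the final pass
                            ; one full run of L3: 5 x 250 - 1 = 1249 cycles
        DEC  R21            ; 1 cycle
        BRNE L2             ; 2 cycles if taken, 1 on the final pass
                            ; one full run of L2: 200 x (1 + 1249 + 3) - 1 = 250 599 cycles
        DEC  R20            ; 1 cycle
        BRNE L1             ; 2 cycles if taken, 1 on the final pass
                            ; one full run of L1: 32 x (1 + 250 599 + 3) - 1 = 8 019 295 cycles
        RET                 ; 4 cycles
; Total: 1 (LDI R20) + 8 019 295 + 4 (RET) = 8 019 300 cycles
; At 125 ns per cycle: 8 019 300 x 125 ns = 1.0024125 s
```

Counted this way, the routine runs about 0.24% longer than the 1 second estimate. To retune the delay, only the three .EQU values need to change, and the inner-loop product INNER x MIDDLE x OUTER x 5 cycles remains a good first estimate.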