EEE 4135 2022 Lecture #2

The University of Zambia
School of Engineering
Department of Electrical and Electronic Engineering
MICROCONTROLLER TECHNOLOGY AND EMBEDDED SYSTEMS
Lecture #3 – Assembly Language Basics
By Louis N. Mumba (2022)
louis.mumba@unza.zm
louis.mumba.eng@gmail.com
• Introduction to Assembly Language
• AVR Assembly tools
• SFR instructions (execution time)
• Loops
• Implementation of delay
EEE 4135
While the CPU can work only in binary, it can do so at a very high speed. For humans, however, it is
tedious and slow to program the computer in 0s and 1s.
A program that consists of 0s and 1s is called machine language. In the early days of the computer,
programmers coded programs in machine language. Although the hexadecimal system was used as a
more efficient way to represent binary numbers, the process of working in machine code was still
cumbersome for humans.
Eventually, Assembly languages were developed, which provided mnemonics for the machine code
instructions, plus other features that made programming faster and less prone to error. The term
mnemonic is frequently used in computer science and engineering literature to refer to codes and
abbreviations that are relatively easy to remember.
Assembly language programs must be translated into machine code by a program called an assembler.
Assembly language is referred to as a low-level language because it deals directly with the internal
structure of the CPU. To program in Assembly language, the programmer must know all the registers of
the CPU and the size of each, as well as other details.
• There are several computer programming languages in use today.
• An engineer should therefore have sufficient proficiency in a number of them and understand the
differences in terms of:
I. Uses: which languages suit scripts (Ruby, Python, PHP, JavaScript), games (C++, C, C#, Java) and servers (C, C++, Java,
PHP, Python)
II. Syntax: what the code looks like (keywords, weakly/strongly typed, OOP/procedural)
III. Runtime: how the code executes. Compiled (C, C++) or interpreted (Python, Ruby, PHP)
IV. Level: low level (Assembly) or high level (C, C++, Java, Python)
• In this course, assembly language will be used to understand the AVR architecture. For the rest of the
course, Embedded C will be used. [It is the same as C: same syntax, same conditional statements, recursion, etc.,
with only additional libraries and register definitions.]
What is Assembly Language?
• An alphanumeric representation of machine code
• Each line is an instruction telling the MCU to do a task
• Instructions are specific to the MCU architecture
The process through which the processor controls the execution of instructions is referred to as the fetch-decode-execute cycle, or the execution cycle. It consists of three steps that repeat continuously:
• Fetching the instruction from memory
• Decoding or identifying the instruction
• Executing the instruction
Each assembly language statement consists of four fields:
• Label (followed by a full colon)
• Opcode
• Operand
• Comment (preceded by a semicolon)
Example:
tampa:  LDI R17, 0x55   ;load GPR 17 with hex 55
        MOV R3, R17     ;copy the hex 55 in R17 to R3
ElNino: COM R3          ;one's complement the contents of R3
        JMP ElNino      ;loop back to label "ElNino"
A directive is an instruction to the assembler itself, acted on during assembly rather than executed by the
MCU; for example, the .include directive tells the assembler to include certain 'library files' before assembly. Below
are some of the commonly encountered directives. Note that assembler directives start with a full
stop (analogous to the # that starts preprocessor directives in C and C++).
• .EQU: assigns a name to a constant value that cannot be changed later, e.g.
.EQU COUNT = 100
• .DEF: the DEFine register directive. It defines a synonym for a register, e.g.
.DEF MyRegisterForAge = R18
• .ORG: specifies the location in memory (program or data) where the code
following the directive is to be placed, i.e. the program origin. It is like initializing the program
counter.
AVR programming in Atmel Studio allows inclusion of definition files in which the SFRs have been declared by name, so that
you can refer to SFRs by name and not by address, e.g. PORTB instead of 0x25. This definition file must be included using
.include "m328Pdef.inc". The 328P part of the file name must be changed to match the controller in use.
Assuming that the program below is burned into the ROM of an AVR chip, the following is a step-by-step
description of the action of the AVR upon applying power to it:
1. When the AVR is powered up, the PC (program counter) holds 0000 and the CPU starts to fetch the first
instruction from location 0000 of the program ROM. In the case of the program below, the first
code is the code for loading the value 0x25 into R16. Upon executing the code, the CPU places the
value 0x25 in R16. Now one instruction is finished. Then the program counter is incremented to
point to 0001 (PC = 0001), which contains the machine code for the instruction "LDI R17, 0x34".
2. Upon executing the machine code, the value 0x34 is loaded into R17. Then the program counter is
incremented to 0002.
3. ROM location 0002 has the machine code for the instruction "LDI R18, 0x31". This instruction is
executed and now PC = 0003.
4. This process goes on until all the instructions are fetched and executed.
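The power-up sequence described above can be sketched as a small Python model. This is only an illustration, not how the AVR is implemented: the three instructions are the ones named in the steps, the tuple layout and the `run` function are made up for this sketch, only LDI is modelled, and the model simply stops at the end of the list.

```python
# Toy model of the power-up sequence: the PC starts at 0 and steps
# through program memory one word at a time. Only LDI is modelled.
program = [
    ("LDI", 16, 0x25),  # location 0000
    ("LDI", 17, 0x34),  # location 0001
    ("LDI", 18, 0x31),  # location 0002
]


def run(program):
    registers = [0] * 32  # R0..R31
    pc = 0                # the program counter holds 0 after reset
    trace = []
    while pc < len(program):
        mnemonic, rd, value = program[pc]      # fetch
        if mnemonic != "LDI":                  # decode
            raise ValueError(f"unsupported instruction: {mnemonic}")
        registers[rd] = value & 0xFF           # execute
        pc += 1                                # LDI occupies one word
        trace.append((pc, rd, registers[rd]))
    return registers, trace


registers, trace = run(program)
for pc, rd, value in trace:
    print(f"R{rd} = 0x{value:02X}, PC = {pc:04X}")
```

Each printed line shows the register that was just loaded and the value the PC holds once that instruction has finished.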
• To write code for embedded systems, a developer needs a text editor (source code editor), a compiler/assembler
program, a linker and a debugger.
• Atmel uses Atmel Studio as the IDE for both its ARM and AVR devices. It is a free software package that has a large
library of free source code examples.
• For Atmel Studio, the output file that gets downloaded into the flash memory of the MCU is the machine code
(HEX file).
• A source file created in assembly language has a .asm extension.
• After assembling, the assembler produces several files, as shown on the next slide: .eep, .hex, .map, .lst
and .obj.
• AVRPROG.exe is an Atmel software component that actually burns the HEX file into the MCU.
• AVRDUDE (AVR Downloader/UploaDEr) is another free program that can be used to burn the HEX file into the
ROM.
[Figure: the source file from the CODE EDITOR goes to the ASSEMBLER PROGRAM, which produces the following files]
• code.eep: downloaded to the AVR EEPROM
• code.hex: burnt to the code ROM
• code.map: shows labels and their values
• code.lst: shows the code in binary and in hexadecimal
• code.obj: used by the simulator
• Recall from EEE3131 that flash memory requires a certain voltage and proper addressing in order to be written to. This is
implemented by special hardware called a PROGRAMMER or BURNER.
• For AVR, there are a number of programmers in use. The programmers can be 10 pin or 6 pin. The common types
of programmers are:
1) USBaspISP: Possibly the cheapest. It is composed of an ATmega88 or ATmega8 and a few passive components
that allow writing of data to the target chip. The code is therefore directed from the computer through the USBasp
(MASTER) to the target chip (SLAVE). [Cost: about $8 on Amazon]
2) USBTinyISP: A slight improvement over USBasp. Note that USBTiny has limitations on the size of memory it can
program. It uses an ATtiny2313. [Cost: about $20]
3) Atmel-ICE: The official programmer by Atmel for their AVR chips. It also serves as a debugger. It is the most
expensive of the three. [Cost: over $120]
[Figures: USBaspISP (interior and exterior); USBTinyISP (interior and exterior); Atmel-ICE (exterior and interior)]
• Previously, we introduced assembly language mnemonics for dealing with data within GPRs [LDI Rd, K],
[ADD Rd, Rr], [MOV Rd, Rr].
• We'll now look at the mnemonics for data transfer from any section of the RAM space to GPRs [LDS Rd, K]
and also from GPRs to any part of the RAM space [STS K, Rr].
• We will then introduce assembly instructions that specifically transfer data from SFRs to GPRs [IN instruction] and
the other way round, from GPRs to SFRs [OUT instruction].
• These are helpful for writing values to a PORT (or to other SFRs) as output, or reading data from a
PORT (or from other SFRs) as input.
LDS: LoaD direct from data Space.
Syntax:
LDS Rd, K ;load GPR d with the value at address K, which can be anywhere in the data memory space
Features
• Loads data from anywhere in the RAM space (GPR, SFR or SRAM) into any GPR.
• K (the source location) should always be specified as an address.
For example, to add data held in some locations of SRAM, we first need to load it into GPRs and then add:
LDS R0, 0x300 ;contents of 0x300 are copied into R0
LDS R1, 0x302 ;contents of 0x302 are copied into R1
ADD R1, R0 ;add R0 to R1
The instructions above assume that 0x300 and 0x302 were pre-loaded with data, or that they are SFRs
holding data that comes from an operation. A pictorial representation of the execution is shown in the next
slide.
STS: STore direct to data Space.
Syntax:
STS K, Rr ;store data from any GPR to any location (addressed by K) of the data memory space
Features
• Stores data to any part of RAM (GPR, SFR or SRAM) from any GPR.
• K (the destination location) should always be specified as an address.
For example, we can write (store) some user-defined data to the output ports (data-space addresses on the
ATmega32: PORTB = 0x38, PORTC = 0x35, PORTD = 0x32):
LDI R16, 0x55 ;load R16 with hex 55
STS 0x38, R16 ;store contents of R16 to PORTB
STS 0x35, R16 ;store contents of R16 to PORTC
STS 0x32, R16 ;store contents of R16 to PORTD
A few points to note when dealing with SFR (IO memory) instructions:
• IO memory has two kinds of addresses: data RAM addresses and IO addresses.
Taking the ATmega32 for example, the SFRs can be addressed by using RAM addresses, 32 to 95 (0x20 to 0x5F), as
well as by using unique IO memory addresses, which run from 0 to 63 (0x00 to 0x3F).
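The two address ranges above differ by a fixed offset of 0x20 (the 32 GPRs occupy the first 32 data addresses). The sketch below shows the conversion in Python. The function names are made up for this illustration, and the ranges are the ATmega32 ones quoted above.

```python
IO_OFFSET = 0x20  # the 32 GPRs occupy data addresses 0x00 to 0x1F


def io_to_data_address(io_addr):
    """Convert an IO address (0x00..0x3F) to its data RAM address."""
    if not 0x00 <= io_addr <= 0x3F:
        raise ValueError("IO address must be in 0x00..0x3F")
    return io_addr + IO_OFFSET


def data_to_io_address(data_addr):
    """Convert a data RAM address (0x20..0x5F) to its IO address."""
    if not 0x20 <= data_addr <= 0x5F:
        raise ValueError("data address must be in 0x20..0x5F")
    return data_addr - IO_OFFSET


# PORTB on the ATmega32: IO address 0x18, data RAM address 0x38
print(hex(io_to_data_address(0x18)))  # 0x38
print(hex(data_to_io_address(0x38)))  # 0x18
```

This is why the same SFR is written as 0x38 in an STS instruction but as 0x18 in an OUT instruction.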
IN Rd, A ;load any GPR with data from IO address A
;0 ≤ d ≤ 31 and 0 ≤ A ≤ 63
Features:
• The IN instruction fetches data from the SFRs (IO memory) only (64 address locations, 0 to 63). For that
reason, the IN instruction uses IO addresses and not data memory addresses.
For example, to read the SFR at IO address 0x10 into GPR number 19, we use:
IN R19, 0x10 ;load R19 with data from SFR location 0x10 (from the SFR memory map, 0x10 = PIND)
In short, the instruction above reads data from PIND.
• LDS is a four-byte instruction, i.e. it is divided into two 16-bit pieces (two words). The first word is a
mixture of the opcode and the destination register address. The second word
contains only the source memory address (16 bits).
LDS Rd, K ;load from memory location K into GPR Rd
0 ≤ d ≤ 31 (5-bit addresses for the 32 bytes of GPRs)
0 ≤ K ≤ 65535 (16-bit addresses for the 64K RAM space)
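As a rough illustration of the two-word layout, the sketch below packs an LDS instruction in Python. The bit pattern 1001 000d dddd 0000 for the first word is taken from the AVR instruction set manual rather than from this slide, and the function name `encode_lds` is made up for the example.

```python
def encode_lds(d, k):
    """Return the two 16-bit words of 'LDS Rd, K'."""
    if not 0 <= d <= 31:
        raise ValueError("d must be in 0..31")
    if not 0 <= k <= 0xFFFF:
        raise ValueError("K must be in 0..65535")
    first_word = 0x9000 | (d << 4)  # opcode bits mixed with the 5-bit d
    second_word = k                 # the 16-bit source address on its own
    return first_word, second_word


# LDS R0, 0x300 and LDS R1, 0x302 from the earlier example
for d, k in [(0, 0x300), (1, 0x302)]:
    w1, w2 = encode_lds(d, k)
    print(f"LDS R{d}, 0x{k:03X} -> 0x{w1:04X} 0x{w2:04X}")
```

The second word is just the address, which is why LDS can reach the whole 64K data space.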
On the other hand, the IN instruction is a two-byte instruction (one 16-bit word): the first five
bits are the opcode and the remaining bits hold the IO address of the source SFR (6 bits)
mixed with the address of the destination GPR (5 bits).
IN Rd, A ;load from address A of IO memory into register Rd
0 ≤ d ≤ 31, 0 ≤ A ≤ 63
From the instruction lengths indicated for LDS and IN, we can see that although either
one can be used to read values from an IO register, LDS takes two clock cycles to fully execute (32 bits)
while IN takes only one clock cycle (16 bits).
So if a clock of 8 MHz is in use, one clock cycle is (1/8 000 000) seconds = 0.125 microseconds, meaning
that LDS takes 0.25 microseconds while IN takes 0.125 microseconds.
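The single-word layout of IN and the timing comparison can be sketched in Python as follows. The bit pattern 1011 0AAd dddd AAAA is taken from the AVR instruction set manual, not from this slide, and the function names are made up for the illustration. The cycle counts are the ones quoted above.

```python
def encode_in(d, a):
    """Return the single 16-bit word of 'IN Rd, A'."""
    if not 0 <= d <= 31:
        raise ValueError("d must be in 0..31")
    if not 0 <= a <= 63:
        raise ValueError("A must be in 0..63")
    # 5-bit opcode 10110, then A5:A4, then d4:d0, then A3:A0
    return 0xB000 | ((a & 0x30) << 5) | (d << 4) | (a & 0x0F)


def execution_time_us(cycles, clock_hz):
    """Execution time in microseconds for a given number of clock cycles."""
    return cycles * 1_000_000 / clock_hz


print(hex(encode_in(19, 0x10)))         # IN R19, 0x10 -> 0xb330
print(execution_time_us(2, 8_000_000))  # LDS: 0.25
print(execution_time_us(1, 8_000_000))  # IN: 0.125
```

Because the 6-bit IO address has to share one word with the opcode and the register number, IN can only reach the 64 IO locations.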
OUT A, Rr ;store GPR r to IO location A
;0 ≤ r ≤ 31 and 0 ≤ A ≤ 63
The OUT instruction is the counterpart of IN, with data moving in the opposite direction: from GPR to SFR.
STS achieves the same result as OUT, but it should be noted that OUT (just like its opposite, IN) is a two-byte
instruction (single clock cycle) while STS is a four-byte instruction (two clock cycles).
STS and LDS have the advantage of a larger range of addresses that they can take as operands:
STS K, Rr ;0 ≤ r ≤ 31, 0 ≤ K ≤ 65535
OUT A, Rr ;0 ≤ r ≤ 31, 0 ≤ A ≤ 63
To run a loop more than 255 times, a nested loop is used (a loop inside a loop). The maximum number of
times a particular loop is repeated becomes the product of the counters of the loops.
The code below loops 700 times, i.e. it complements the bits on PORTB 700 times:
.include "m328pdef.inc"
        LDI R16, 0x55     ;load R16 with 0x55
        OUT PORTB, R16    ;send the contents of R16 to PORTB
        LDI R20, 10       ;load decimal 10 into R20 (counter for outer loop)
LOOP_1: LDI R21, 70       ;load R21 with decimal 70 (counter for inner loop)
LOOP_2: COM R16           ;complement the contents of R16
        OUT PORTB, R16    ;send the complemented value to PORTB
        DEC R21           ;decrement R21 by one and store in R21 (inner loop)
        BRNE LOOP_2       ;repeat the inner loop until R21 is zero (70 times)
        DEC R20           ;decrement R20 by one and store in R20 (outer loop)
        BRNE LOOP_1       ;repeat the outer loop until R20 is zero (10 times)
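To check the count of 700, the nested loop above can be mirrored in Python, one statement per assembly instruction. This is only a model for counting passes: the function name and its parameters are made up for the sketch, and registers are treated as 8-bit values.

```python
def run_nested_loop(outer=10, inner=70, start=0x55):
    """Mirror the nested loop and count the writes to PORTB inside it."""
    r16 = start
    port_writes = 0
    r20 = outer                      # LDI R20, 10
    while True:
        r21 = inner                  # LOOP_1: LDI R21, 70
        while True:
            r16 = ~r16 & 0xFF        # LOOP_2: COM R16
            port_writes += 1         # OUT PORTB, R16
            r21 = (r21 - 1) & 0xFF   # DEC R21
            if r21 == 0:             # BRNE LOOP_2 falls through at zero
                break
        r20 = (r20 - 1) & 0xFF       # DEC R20
        if r20 == 0:                 # BRNE LOOP_1 falls through at zero
            break
    return port_writes, r16


writes, final = run_nested_loop()
print(writes, hex(final))  # 700 0x55
```

The inner loop body runs 10 x 70 = 700 times, and since 700 is even, R16 ends up back at its starting value of 0x55.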
.INCLUDE "m328pdef.inc"
.ORG 0x00
        LDI R16, HIGH(RAMEND)   ;loads R16 with the higher byte of RAMEND
        OUT SPH, R16            ;higher byte of SP will have the higher byte of RAMEND
        LDI R16, LOW(RAMEND)    ;loads R16 with the lower byte of RAMEND
        OUT SPL, R16            ;the lower byte of SP will have the lower byte of RAMEND
        LDI R16, 0x55           ;load R16 with 0x55
BACK:   COM R16                 ;one's complement of contents of R16
        OUT PORTB, R16          ;send the contents of R16 to port B register (actual pins)
        CALL DELAY_1S           ;call a function called DELAY_1S
        RJMP BACK               ;relative jump to BACK, i.e. keep doing this indefinitely
DELAY_1S is shown on the next slide.
DELAY_1S:
        LDI R20, 32     ;number of decrements for outer loop
L1:
        LDI R21, 200    ;number of decrements for middle loop
L2:
        LDI R22, 250    ;number of decrements for innermost loop
L3:
        NOP
        NOP
        DEC R22
        BRNE L3
        DEC R21
        BRNE L2
        DEC R20
        BRNE L1
        RET
Neglecting the overhead of the middle loop and the outermost loop, this delay function is approximately 1 second in duration.
Shown next is how the 1 second is derived.
Using a clock of 8 MHz, one clock cycle is 1/8 000 000 s = 125 nanoseconds long. Each instruction in the
inner loop takes the following number of clock cycles:
NOP => 1
NOP => 1
DEC => 1
BRNE => 2
The total for one pass of the inner loop is 5 clock cycles.
These five clock cycles are repeated 250 times (DEC R22), the inner
loop is in turn repeated 200 times by the middle loop (DEC R21), and furthermore the middle loop is
repeated 32 times by the outermost loop (DEC R20).
Therefore, the five clock cycles of the inner loop are done 250 x 200 x 32 = 1 600 000 times.
That is, four instructions (5 clocks) done 1 600 000 times, with each clock lasting 125 nanoseconds, give
a total duration of 5 x 1 600 000 x 125 nanoseconds = 1 sec. [The exact count is (3x250 + 2x249 + 1) x 200 x 32
= 7 993 600 clocks, i.e. about 0.999 s, because BRNE takes only 1 clock on the final pass, when the branch is not taken.]
So why did we use only the inner loop for approximating the total duration of the delay?
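The cycle count for DELAY_1S can also be worked out exactly with a few lines of Python. This is a sketch under the cycle counts quoted in these slides (NOP, DEC and LDI 1 cycle each, BRNE 2 cycles when taken and 1 when not, RET 4 cycles). The helper name `counted_loop_cycles` is made up for the example, and the cost of the CALL instruction itself is left out.

```python
CLOCK_HZ = 8_000_000  # 8 MHz clock, as assumed on the slide


def counted_loop_cycles(count, body_cycles):
    """Cycles for a loop of the form: body, DEC, BRNE back to the top."""
    taken = (count - 1) * 2   # BRNE taken: 2 cycles each time
    not_taken = 1             # final BRNE falls through: 1 cycle
    return count * (body_cycles + 1) + taken + not_taken


inner = counted_loop_cycles(250, 2)           # body = NOP, NOP
middle = counted_loop_cycles(200, 1 + inner)  # body = LDI R22 + inner loop
outer = counted_loop_cycles(32, 1 + middle)   # body = LDI R21 + middle loop
total = 1 + outer + 4                         # LDI R20 + all loops + RET

print(inner)             # 1249
print(total)             # 8019300
print(total / CLOCK_HZ)  # duration in seconds
```

Under these assumptions the whole routine takes 8 019 300 cycles, which is about 1.0024 s at 8 MHz, so the 1 second estimate from the inner loop alone is within about 0.25% of the full count.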
For the DELAY_1S listing, the DEC and BRNE instructions of the middle and outer loops contribute as follows:
Middle loop (DEC R21, BRNE L2): two instructions >> 3 clock cycles >> 200 decrements done 32 times (outer loop)
>> 3 x 200 x 32 = 19200 clocks >> 19200 clocks x 125 nanoseconds/clock = 0.0024 sec
Outer loop (DEC R20, BRNE L1): two instructions >> 3 clock cycles >> 32 decrements
>> 3 x 32 = 96 clocks >> 96 clocks x 125 nanoseconds/clock = 0.000012 sec
LDI takes one (1) machine cycle and RET takes four (4) machine cycles. With this in mind, we have neglected:
• LDI R20, 32, which is done once per call to DELAY_1S: 125 ns = 0.000000125 sec.
• LDI R21, 200, which is done 32 times per call to DELAY_1S: 32 x 125 ns = 0.000004 sec.
• LDI R22, 250, which is done 200 x 32 times per call to DELAY_1S: 200 x 32 x 125 ns = 0.0008 sec.
• The RET instruction, which is done once per call to DELAY_1S: 4 x 125 ns = 0.0000005 sec.
The total neglected time, consisting of the above instructions plus the DEC and BRNE in the middle and outer loops, is
0.000000125 + 0.000004 + 0.0008 + 0.0000005 + 0.0024 + 0.000012 = 0.003216625 seconds
This is about 0.3% of the 1 second delay, so the neglected components are negligible. It is therefore safe in most cases
to use only the inner loop when calculating your delay time.