Defence Against the Dark ASM

Harun Šiljak

Preface

This short text is meant to serve as a quick reference for Nios II assembly language. It does not attempt to replace the official Nios II processor reference guide, which remains the most authoritative reference and provides comprehensive information and all necessary details. This booklet, on the other hand, gives the basic information needed for drafting relatively simple code for Nios II, troubleshooting assembly language and hex files, and getting a brief overview of the Nios II architecture and assembly syntax from a practical perspective. Technical details on how to use Altera software for the hardware implementation of Nios II and for its programming are not provided, as they are outside the scope of this text.

Trademarks for Harry Potter references belong to J.K. Rowling, her publishers and Warner Bros. Nios II is a registered trademark of the Altera Corporation. This text is not an official publication and no copyright infringement is intended.

[Margin note: Marginal notes added by the Half-Blood Prince are usually comments about exceptions and other unexpected behaviour. If you still think copyright is infringed, confundo.]

LOAD MOV OR TRIM DEL

Contents

Preface
Chapter 1. Marauder's Map of Nios II
Chapter 2. ASM Spells with examples for muggles
    2.1. Types of instructions
    2.2. Arithmetic & logic instructions
    2.3. Comparison instructions
    2.4. Branching instructions
    2.5. Subroutines and exception handling instructions
    2.6. Miscellaneous instructions
    2.7. Moving and data manipulation instructions
    2.8. Assembler macros
Chapter 3. ASM spells for wizards
    3.1. Instruction fields
    3.2. Machine code example

CHAPTER 1
Marauder's Map of Nios II

The Nios II based media computer used in this course is built on Altera's DE2-115 development board. Its schematic is given in Figure 1.

Figure 1.0.1. The DE2-115 media computer

Memory: The computer has 128 MB of SDRAM organised as 32M x 32 bits. The SRAM on the board is organised as 1M x 16 bits, but has a 32-bit interface. Finally, 8 Kbyte of memory on the FPGA chip itself is used as a character buffer for the video-out port and is organised as 8K x 8 bits.

Parallel ports: Parallel ports have up to four 32-bit registers: a writable or readable Data register, an optional Direction register for input-output ports, and the Mask and Edge registers used for interrupts. The 18 red and 9 green LEDs have only a Data register, as they are output ports. Similarly, the 18 slider switches have only a Data register, as they are input ports. The parallel port of the four pushbuttons can be used for interrupts (except for KEY0, which is the reset button for the Nios II). Figure 2 shows the organisation of registers for the pushbuttons. When accessing the Data register of the general purpose pins on JP5, take note that some of the pins are not used as inputs/outputs but as supply pins (see Figure 3).

[Margin note: Upper bits in word access to these Data registers are ignored.]

Figure 1.0.2. Pushbutton parallel port

Figure 1.0.3. General purpose pins and their Data register numeration

Communication: The JTAG port provides communication between the DE2-115 media computer and the host computer, for programming and monitoring. It also includes a UART, and the registers for it are shown in Figure 4. The serial port of the media computer also implements a UART, in this case connected to an RS-232 chip. Its register organisation is equivalent to the JTAG one.

Figure 1.0.4. UART registers
The JTAG port can also be used to query the System ID module and confirm that the media computer is properly configured.

[Margin note: You can use UART to send an owl to the home computer and vice versa. UART can trigger an interrupt, don't forget that.]

Timer: The media computer also features an interval timer driven by the 50 MHz clock. It can produce interrupts, and its register structure is shown in Figure 5.

Figure 1.0.5. Registers of the interval timer

Media components: The LCD display is controlled with two 8-bit registers, one for the instruction that places the cursor and the other for the character to be placed (Figure 6). For the video output, each pixel can be coloured with an RGB value. For the audio port, four related registers are shown in Figure 7. For the PS/2 port (for keyboard/mouse), see Figure 8.

Figure 1.0.6. LCD display registers

Figure 1.0.7. Audio port registers

Figure 1.0.8. PS/2 registers

In Table 1, base and end addresses of I/O peripherals are given for reference. The relevant registers are positioned between these two addresses and may be accessed for peripheral control. For more details on the use of the more complicated peripherals, check the Altera University Program Media Computer Manual.

    Base address   End address   I/O peripheral
    0x00000000     0x07FFFFFF    SDRAM
    0x08000000     0x081FFFFF    SRAM
    0x10003020     0x1000302F    Pixel buffer control
    0x09000000     0x09001FFF    On-chip memory character buffer
    0x10003030     0x10003037    Character buffer control
    0x10000000     0x1000000F    Red LED parallel port
    0x10000010     0x1000001F    Green LED parallel port
    0x10000020     0x1000002F    7-segment HEX3-HEX0 displays parallel port
    0x10000030     0x1000003F    7-segment HEX7-HEX4 displays parallel port
    0x10000040     0x1000004F    Slider switch parallel port
    0x10000050     0x1000005F    Pushbutton parallel port
    0x10000060     0x1000006F    JP5 expansion parallel port
    0x10000100     0x10000107    PS/2 port
    0x10000108     0x1000010F    PS/2 port dual
    0x10001000     0x10001007    JTAG UART port
    0x10001010     0x10001017    Serial port
    0x10002000     0x1000201F    Interval timer
    0x10002020     0x10002027    System ID
    0x10003000     0x1000301F    Audio/video configuration
    0x10003040     0x1000304F    Audio port
    0x10003050     0x10003051    LCD display port

Table 1. Memory map of DE2-115 Media Computer

CHAPTER 2
ASM Spells with examples for muggles

2.1. Types of instructions

There are three basic types of instructions in Nios II assembly language.

• R-type instructions are executed with values from registers (at most three of them, usually denoted in this text as RA, RB, RC) and possibly a 5-bit constant (immediate constant, usually denoted as IMM5).
• I-type instructions are executed on at most two registers (usually denoted in this text as RA, RB) and a 16-bit constant (immediate constant, usually denoted as IMM16).
• J-type instructions are executed on a 26-bit constant (immediate constant, usually denoted as IMM26) and perform a jump to the address defined by IMM26.

Note that there is a certain number of pseudo-instructions in the Nios II instruction set: they are translated into real instructions before the machine code is produced, i.e. they exist only in the assembly source.
Also bear in mind that assembler macros in Nios II assembly are used to extract high and low bits from constants efficiently, and they are not instructions. Which instructions are pseudo-instructions and which are real doesn't matter for the muggles. For wizards investigating the machine code, it is important, as the pseudo-instructions have no machine code of their own.

2.2. Arithmetic & logic instructions

2.2.1. Basic arithmetic. Nios II assembly provides instructions for addition, subtraction, division and multiplication, although not all Nios II processors may have the latter two implemented.

If you wish to add numbers in two registers and place the result into a third register, use the R-type instruction add RC, RA, RB. If you are adding a 16-bit immediate constant to the contents of a register, use the I-type instruction addi RB, RA, IMM16. Note that the constant is sign-extended to 32 bits, so if the constant was 0x8000, it would be padded with ones to 0xFFFF8000, and if it was 0x7000 it would be padded with zeros to 0x00007000. This is arithmetical padding.

In the same fashion, subtraction is done by using the R-type instruction sub RC, RA, RB and the I-type subi RB, RA, IMM16. However, subi is a pseudo-instruction, implemented as addi RB, RA, -IMM16.

Division, if implemented in the particular Nios II processor, is only possible with numbers already in registers and it results in the integer part of the quotient. The two possible R-type instructions are div RC, RA, RB, which assumes the two numbers in the input registers are signed, and divu RC, RA, RB, where the numbers being divided are taken as unsigned.

[Margin note: If division and/or multiplication are not supported by processor, the exception Unimplemented instruction is generated.]

If the processor has multiplication implemented, there are five possible ways to perform it. If an immediate 16-bit constant is multiplied with the contents of a register, use the I-type instruction muli RB, RA, IMM16. The result is the 32 low-order bits of the product. Analogously, mul RC, RA, RB results in the 32 low-order bits of the product of the numbers in two registers. If you need the 32 high-order bits and you are multiplying two signed integers in registers, use mulxss RC, RA, RB. If the integers are to be taken as unsigned, use mulxuu RC, RA, RB. Finally, the instruction mulxsu RC, RA, RB treats the contents of RA as a signed and the contents of RB as an unsigned integer.

As an example, we will calculate the division remainder for numbers in two registers.

    div r6, r4, r5
    mul r7, r6, r5
    sub r8, r4, r7

Note that in this case we know that the multiplication is not going to produce a number larger in magnitude than the number stored in r4, hence the product fits entirely in the 32 low-order bits.

2.2.2. Basic bitwise logical operations. Logical bitwise operations directly implemented in Nios II assembly are AND, OR, XOR and partially NOR.

Logical bitwise conjunction of the contents of two registers is performed by the R-type instruction and RC, RA, RB. When a 16-bit constant is used, it can be ANDed with the 16 low-order bits of a register (i.e. the constant is padded with 16 zeros to the left before conjunction) using the I-type instruction andi RB, RA, IMM16. If the conjunction of the constant should be performed with the 16 high-order bits of a register (i.e. the constant is padded with 16 zeros to the right first), the instruction andhi RB, RA, IMM16 is used. Note that there is no sign extension: padding is always done with zeros. This is called logical padding.

Logical bitwise disjunction is performed in the same manner: for numbers stored in two registers, the R-type instruction or RC, RA, RB is used.
If an immediate 16-bit constant is used with the 16 low-order bits of a register, the instruction is ori RB, RA, IMM16, while disjunction with the 16 high-order bits is performed with orhi RB, RA, IMM16.

In the same manner, xor RC, RA, RB is used for exclusive disjunction of two registers. If an immediate 16-bit constant is used with the 16 low-order bits of a register, the instruction is xori RB, RA, IMM16, while exclusive disjunction with the 16 high-order bits is performed with xorhi RB, RA, IMM16.

The bitwise logical NOR operation only exists in R-type instruction form for two registers, as nor RC, RA, RB.

As an example, we will set the lowest and the highest bit of a register.

    ori  r4, r4, 0x0001
    orhi r4, r4, 0x8000

2.2.3. Shifting. The contents of a register can be rotated (i.e. circularly shifted) either to the left or to the right. Rotation to the left can be performed by means of the R-type instruction rol RC, RA, RB, where the first register will contain the contents of the second register shifted to the left by n positions, and n is the number represented by the 5 least significant bits of the third register (the other 27 bits are ignored). Since this is rotation, the bits leaving the register on the most significant side reappear on the least significant side. If the number of rotation positions should be given as an immediate value, another R-type instruction is used, roli RC, RA, IMM5. Note that this is an R-type instruction as it takes a 5-bit immediate value, and not a 16-bit one like the I-type instructions.
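The two left-rotation forms described above can be sketched as follows. This is only an illustration: the register choices (r4 as the source, r5 as the count, r6 and r7 as results) and the sample value are assumptions, not taken from the text.

```asm
# Hypothetical registers: r4 = value to rotate, r5 = rotation count.
# Suppose r4 holds 0x12345678.
        addi  r5, r0, 8        # rotation count 8 (only bits 4..0 of r5 are used)
        rol   r6, r4, r5       # r6 = 0x34567812, top byte wrapped around to the bottom
        roli  r7, r4, 16       # r7 = 0x56781234, the two half-words swapped
```

Rotating by 16 is a compact way to swap the upper and lower half-word of a register, since no bits are lost in a rotation.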
If you need a right circular shift (rotation), it can be done as ror RC, RA, RB, where again the first register takes the value of the second register after rotation to the right by n positions, and n is the number represented by the 5 least significant bits of the third register. However, in the default implementation of the Nios II processor there is no rori, but its behaviour can be achieved by using roli with 32 - n as the immediate value.

[Margin note, on division: Another exception, Division error, detects divide instructions that produce a quotient that can't be represented: division by zero and division of the largest negative number by -1.]

If the shift is supposed to be non-circular, then the places freed in the register whose contents are being shifted are filled with zeros. Left logical shift is done either with the R-type instruction sll RC, RA, RB or with another R-type instruction, slli RC, RA, IMM5. The logic is the same as in the case of rotation. Logical shift to the right is similarly done with srl RC, RA, RB or with srli RC, RA, IMM5.

Another type of shift to the right is the arithmetic shift, done with sra RC, RA, RB or with srai RC, RA, IMM5. Unlike the logical shift, the newly freed places in the register are in this case filled by duplicating the sign bit, just like in the arithmetic operations covered at the beginning of this chapter. Notice that the arithmetic shift is not implemented for the left shift: in a left shift the empty places appear at the least significant bits, which makes padding with the sign bit nonsensical.

As an example, we will show multiplication by 4, signed division by 4 and unsigned division by 4, respectively, using shifts. Each line is an independent example. Note that for negative numbers srai rounds towards minus infinity, whereas div truncates towards zero.

    slli r4, r4, 0x02
    srai r4, r4, 0x02
    srli r4, r4, 0x02

2.3. Comparison instructions

Comparison instructions place a boolean value (zero or one) in a register based on a comparison A R B, where R is a relation, A is a register, and B can be a register or an immediate value.

The R-type instruction cmpeq RC, RA, RB compares RA and RB and if they are equal, places 1 in RC, otherwise 0. The I-type equivalent cmpeqi RB, RA, IMM16 does the same but with an immediate value. Inversely, cmpne RC, RA, RB compares RA and RB and if they are not equal, places 1 in RC, otherwise 0. The I-type equivalent cmpnei RB, RA, IMM16 does the same but with an immediate value.

Similarly, cmpge RC, RA, RB compares RA and RB and if RA ≥ RB, places 1 in RC, otherwise 0, while the I-type equivalent cmpgei RB, RA, IMM16 does the same but with an immediate value. If RA and RB are supposed to be considered unsigned values, this comparison is done with cmpgeu RC, RA, RB for registers and cmpgeui RB, RA, IMM16 for a register and an immediate value.

Similarly, cmplt RC, RA, RB compares RA and RB and if RA < RB, places 1 in RC, otherwise 0, while the I-type equivalent cmplti RB, RA, IMM16 does the same but with an immediate value. If RA and RB are supposed to be considered unsigned values, this comparison is done with cmpltu RC, RA, RB for registers and cmpltui RB, RA, IMM16 for a register and an immediate value.

The rest of the comparison instructions are pseudo-instructions, implemented using those above. The register forms are implemented by swapping the two source registers. The immediate forms cannot swap a register with a constant, so they are implemented by adjusting the constant by one instead.

cmpgt RC, RA, RB places 1 in RC if RA > RB, 0 otherwise, and it is implemented as cmplt with swapped parameters. Its immediate equivalent cmpgti RB, RA, IMM16 is implemented as cmpgei RB, RA, IMM16+1. The unsigned equivalents cmpgtu RC, RA, RB and cmpgtui RB, RA, IMM16 are implemented as cmpltu with swapped parameters and as cmpgeui RB, RA, IMM16+1, respectively.

cmple RC, RA, RB places 1 in RC if RA ≤ RB, 0 otherwise, and it is implemented as cmpge with swapped parameters.
Its immediate equivalent cmplei RB, RA, IMM16 is implemented as cmplti RB, RA, IMM16+1, since an immediate value cannot swap places with a register. The unsigned equivalents cmpleu RC, RA, RB and cmpleui RB, RA, IMM16 are implemented as cmpgeu with swapped parameters and as cmpltui RB, RA, IMM16+1, respectively.

2.4. Branching instructions

Nios II assembly offers several branching instructions. The I-type instruction beq RA, RB, label moves the execution of the program to the address denoted by the label, encoded as PC+4+IMM16, if the contents of the two registers are the same. Otherwise, execution continues with the next instruction.

The I-type instruction bge RA, RB, label does the same under the condition RA ≥ RB for signed values in registers. The I-type instruction bgeu RA, RB, label does the same with unsigned values.

Signed comparison RA < RB branching is performed with the I-type instruction blt RA, RB, label, while the unsigned version of it is bltu RA, RB, label.

Signed comparison RA > RB branching is performed with the pseudo-instruction bgt RA, RB, label, which is interpreted as blt with swapped parameters. The unsigned version of it, bgtu RA, RB, label, is interpreted as bltu with swapped parameters.

Signed comparison RA ≤ RB branching is performed with the pseudo-instruction ble RA, RB, label, which is interpreted as bge with swapped parameters. The unsigned version of it, bleu RA, RB, label, is interpreted as bgeu with swapped parameters.

Branching if the two registers are not equal is done with the I-type instruction bne RA, RB, label.

An unconditional branch (a GOTO) is performed by the I-type instruction br label. If the address where the program should continue is not a constant (an immediate value) but a calculated value in a register, then the R-type instruction jmp RA is used. Finally, if the full address in the 256 MB range of the PC has to be provided, the J-type instruction jmpi label is used, where label is an IMM26. The jump is performed to PC[31..28]:IMM26x4.
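A conditional branch is most often seen closing a loop. The following is a minimal sketch, not taken from the original text: the labels loop and done and the register choices are arbitrary, and the program simply adds the integers from 10 down to 1.

```asm
# Sum 10 + 9 + ... + 1 into r5. Registers and labels are illustrative.
        addi  r4, r0, 10       # r4 = loop counter
        add   r5, r0, r0       # r5 = running sum, cleared to zero
loop:   add   r5, r5, r4       # sum = sum + counter
        addi  r4, r4, -1       # counter = counter - 1
        bne   r4, r0, loop     # branch back while counter is not zero
done:   br    done             # r5 now holds 55; stay here
```

The assembler turns the label loop into the IMM16 offset relative to PC+4, so the programmer never writes the offset by hand.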
[Margin note: Since the desired address may happen to be not divisible by 4 (i.e. not to end in 00 as an address is supposed to), all branching instructions except jmpi can throw "misaligned destination address" exception.]

2.5. Subroutines and exception handling instructions

Subroutines in Nios II assembly language are called with the J-type instruction call label. Here, the label denotes the address in the 256 MB space determined by the highest four bits of the PC, and hence defines the next 26 bits of the PC (the last two bits of the PC are zero for alignment). So, the label (address) actually represents an IMM26 value, as expected in a J-type instruction. The instruction performs the following:

    ra <- PC + 4
    PC <- PC[31..28] : IMM26 : 00

which essentially saves the return address (the address of the instruction right after the call instruction) for the happy return from the subroutine and moves the PC to the place the label given to call points at.

It is also possible to call a subroutine through a register, i.e. to point to an instruction by giving its address (the whole address, i.e. the whole new content of the PC) in a register, with the R-type instruction callr RA.

[Margin note: Since the content of the register may happen to be not divisible by 4 (i.e. not to end in 00 as an address is supposed to), this instruction can throw "misaligned destination address" exception.]

Calls of subroutines access the PC, or to be more precise, get the address of the next instruction and place it in the ra register. The only way to access the PC directly (actually, PC+4 again) is the R-type instruction nextpc RC. The content of the PC incremented by four is saved in the specified register.

Return from a subroutine simply returns the content of the ra register to the PC, and it is performed by the R-type instruction ret, which takes no parameters. Return from an exception is done with the R-type instruction eret. The content of the ea register moves to the PC and the content of estatus moves to status.

The R-type instruction trap is used either as trap or as trap IMM5 to save the address of the next instruction in the ea register, copy the contents of status to estatus, disable interrupts and start the exception handler. IMM5 is used only for debugging purposes.

Registers like status, estatus etc. are called control registers. They are read with the R-type instruction rdctl RC, N, which copies the contents of the Nth control register to register RC, and written with the instruction wrctl N, RA, which writes the contents of register RA into the Nth control register.

The I-type instruction rdprs RB, RA, IMM16 reads register RA in the previous register set, adds the sign-extended value IMM16 to it and places the result in RB. This only functions if the version of Nios II used allows shadow register sets. Writing to the previous register set is done via the R-type instruction wrprs RC, RA. It copies the value of register RA in the current register set to register RC in the previous register set. Note that to write to an arbitrary register set, software can insert the desired register set number in status.PRS prior to executing wrprs.

2.6. Miscellaneous instructions

The most powerful magical unforgivable curse of the Nios II language is an R-type instruction named custom. It enables the introduction of 256 different custom user-designed instructions to Nios II assembly.
You design a custom hardware structure, using a hardware description language, adjacent to the Nios II ALU. It can use two registers as inputs and one as an output (but it doesn't have to, it can use its own custom registers). The syntax is custom N, xresult, xone, xtwo, where x can stand either for R, a general purpose Nios II register, or for C, a custom register. The part about the machine code of custom instructions will provide more explanations.

Most assembly languages include an instruction which does nothing, and it is usually called nop. In Nios II assembly language nop is implemented as a pseudo-instruction. The instruction behind it is add r0, r0, r0. It is used to lose one instruction cycle for timing purposes.

Debuggers place breakpoints using special R-type instructions. Such instructions are used exclusively by debuggers and hence they should not appear in exception handling routines, user programs and operating systems. The syntax of the breakpoint placement instruction is either break or break IMM5, where the 5-bit immediate constant can be used by the debugger as the descriptor of the breakpoint type. The effect of the breakpoint is

    bstatus <- status
    PIE <- 0
    U <- 0
    ba <- PC + 4
    PC <- break handler address

On the other hand, the bret instruction returns from the break by performing the following:

    status <- bstatus
    PC <- ba

[Margin notes to Sections 2.5 and 2.6:
• On ret and eret: Since the content of the ra or ea register may happen to be not divisible by 4 (i.e. not to end in 00 as an address is supposed to), these instructions can throw "misaligned destination address" exception.
• On shadow register sets: Otherwise, it throws the illegal operation exception.
• Manipulation of control registers, register sets, traps and eret can throw supervisor-only instruction exception.
• On custom: Off to Azkaban with you... or Intel.
• On break: It throws a "break" exception when executed.
• On bret: It is possible to have a misaligned address in ba register. Then the bret instruction can lead to "misaligned destination address" exception. If it is accessed in user mode, and not in supervisor mode, it throws the "supervisor-only instruction" exception.]

2.7. Moving and data manipulation instructions

Cache manipulation in Nios II assembly is straightforward. Initialisation of a cache line is done by using the I-type instruction initd IMM16(RA), which initialises the data cache line associated with address RA+IMM16 regardless of whether the data at that address is currently cached. On the other hand, the I-type instruction initda IMM16(RA) initialises the cache line only when the data at that address is currently cached. The R-type instruction initi RA initialises the instruction cache line associated with address RA.

Similarly, the I-type instruction flushd IMM16(RA) flushes the data cache line associated with address RA+IMM16 regardless of whether the data at that address is currently cached. On the other hand, the I-type instruction flushda IMM16(RA) flushes the cache line only when the data at that address is currently cached. The R-type instruction flushi RA flushes the instruction cache line associated with address RA.
Finally, the R-type instruction flushp flushes the processor pipeline of any prefetched instructions.

Loading data from memory or I/O peripherals is performed using a set of paired instructions, which can be explained on the example of the basic instruction for loading a byte from memory or an I/O peripheral, ldb RB, IMM16(RA) or ldbio RB, IMM16(RA). Both of these instructions load a byte into RB from the address specified in RA, offset by the value of IMM16, but the former may return the value from the cache if a cache is implemented. That is why the latter is preferred for input/output devices. If there is no cache implemented, they perform the same operation.

If the byte loaded should be unsigned (zero-extended in the register), use the I-type instruction ldbu RB, IMM16(RA) or ldbuio RB, IMM16(RA). Loading a half-word (16 bits) from memory or an I/O peripheral is done by using the I-type instruction ldh RB, IMM16(RA) or ldhio RB, IMM16(RA), and if the half-word should be unsigned (zero-padded when loaded in the register), use the I-type instruction ldhu RB, IMM16(RA) or ldhuio RB, IMM16(RA). Finally, if the whole word should be loaded, the I-type instruction ldw RB, IMM16(RA) or ldwio RB, IMM16(RA) is used.

Storing data to memory or I/O peripherals is done by similar instructions: stb RB, IMM16(RA) or stbio RB, IMM16(RA) stores a byte from RB to the address specified in RA, offset by the value of IMM16. However, the effect of the former can be delayed by the cache, so the latter is preferred for I/O peripherals. If a half-word should be written, sth RB, IMM16(RA) or sthio RB, IMM16(RA) is used, while for whole words, stw RB, IMM16(RA) or stwio RB, IMM16(RA) is used.

Moving data from register to register and moving immediate values to registers is performed by using pseudo-instructions. Moving from register to register is done by mov RC, RA, which is actually add RC, RA, r0. Moving a signed immediate to a register is done by movi RB, IMM16, which is actually addi RB, r0, IMM16.
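Returning to the load and store instructions described above, the following minimal sketch shows the cache-bypassing forms on two peripherals. The base addresses are the ones listed in Table 1 (red LEDs at 0x10000000, slider switches at 0x10000040); the label poll and the register choices are illustrative assumptions.

```asm
# Copy the slider switches to the red LEDs forever.
# Addresses come from Table 1; registers and the label are arbitrary.
        orhi   r8, r0, 0x1000      # r8 = 0x10000000, base of the red LED port
poll:   ldwio  r9, 0x40(r8)        # read the slider switch Data register at 0x10000040
        stwio  r9, 0(r8)           # write that value to the red LED Data register
        br     poll                # repeat
```

Using ldwio and stwio rather than ldw and stw means the switches are actually sampled on every pass, even on a processor with a data cache.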
Moving an unsigned immediate to a register is done by movui RB, IMM16, which is actually ori RB, r0, IMM16. Moving an immediate to the high half-word is done by movhi RB, IMM16, which is implemented as orhi RB, r0, IMM16. Moving an immediate address into a register is done by movia RB, IMM32, which is in turn implemented as the pair orhi RB, r0, %hiadj(IMM32) and addi RB, RB, %lo(IMM32) (see the next section about assembler macros for reference).

If you are writing a whole word (32-bit constant) to a register, you can do it in two steps, as:

    movhi RB, %hi(value)
    ori   RB, RB, %lo(value)

or

    movhi RB, %hiadj(value)
    addi  RB, RB, %lo(value)

The second form is exactly what the movia pseudo-instruction expands to.

[Margin note, on load and store instructions: It can throw a number of exceptions: supervisor-only data access, misaligned data address, TLB permission violation, fast or double TLB miss or MPU region violation.]

2.8. Assembler macros

The following four assembler macros are implemented for convenience:

• %hiadj(expression) Extract the upper 16 bits of expression and add one if bit 15 is set. This compensates for the sign extension that addi applies to the lower half: when bit 15 of the lower half is set, addi effectively subtracts one from the upper half, and the adjustment cancels that out.
• %hi(expression) Extract the upper 16 bits of expression. If expression was a signed 16-bit constant, the upper 16 bits might be padded with zeros or ones.
• %lo(expression) Extract the lower 16 bits of expression.
• %gprel(expression) Subtract the value of the symbol _gp (global pointer) from expression. The intention of the %gprel relocation is to have a fast small area of memory which only takes a 16-bit immediate to access.

CHAPTER 3
ASM spells for wizards

3.1. Instruction fields

The following table lists machine code for all instructions.
Pseudo-instructions have no machine code, as they are translated into real instructions before conversion to machine code. A, B and C in the table denote register arguments; IMM26, IMM16 and IMM5 denote immediate constants; N is the number of the control register or of the custom instruction; reada, readb and readc are bits determining the use of registers A, B and C in custom instructions. The number in parentheses denotes the number of bits used.

    Instruction   Instruction fields
    add           A(5) B(5) C(5) 0x31(6) 0x0(5) 0x3A(6)
    addi          A(5) B(5) IMM16(16) 0x04(6)
    and           A(5) B(5) C(5) 0x0E(6) 0x0(5) 0x3A(6)
    andhi         A(5) B(5) IMM16(16) 0x2C(6)
    andi          A(5) B(5) IMM16(16) 0x0C(6)
    beq           A(5) B(5) IMM16(16) 0x26(6)
    bge           A(5) B(5) IMM16(16) 0x0E(6)
    bgeu          A(5) B(5) IMM16(16) 0x2E(6)
    blt           A(5) B(5) IMM16(16) 0x16(6)
    bltu          A(5) B(5) IMM16(16) 0x36(6)
    bne           A(5) B(5) IMM16(16) 0x1E(6)
    br            0x0(5) 0x0(5) IMM16(16) 0x06(6)
    break         0x0(5) 0x0(5) 0x1E(5) 0x34(6) IMM5(5) 0x3A(6)
    bret          0x1E(5) 0x0(5) 0x1E(5) 0x09(6) 0x0(5) 0x3A(6)
    call          IMM26(26) 0x0(6)
    callr         A(5) 0x0(5) 0x1F(5) 0x1D(6) 0x0(5) 0x3A(6)
    cmpeq         A(5) B(5) C(5) 0x20(6) 0x0(5) 0x3A(6)
    cmpeqi        A(5) B(5) IMM16(16) 0x20(6)
    cmpge         A(5) B(5) C(5) 0x08(6) 0x0(5) 0x3A(6)
    cmpgei        A(5) B(5) IMM16(16) 0x08(6)
    cmpgeu        A(5) B(5) C(5) 0x28(6) 0x0(5) 0x3A(6)
    cmpgeui       A(5) B(5) IMM16(16) 0x28(6)
    cmplt         A(5) B(5) C(5) 0x10(6) 0x0(5) 0x3A(6)
    cmplti        A(5) B(5) IMM16(16) 0x10(6)
    cmpltu        A(5) B(5) C(5) 0x30(6) 0x0(5) 0x3A(6)
    cmpltui       A(5) B(5) IMM16(16) 0x30(6)
    cmpne         A(5) B(5) C(5) 0x18(6) 0x0(5) 0x3A(6)
    cmpnei        A(5) B(5) IMM16(16) 0x18(6)
    custom        A(5) B(5) C(5) reada(1) readb(1) readc(1) N(8) 0x32(6)
    div           A(5) B(5) C(5) 0x25(6) 0x0(5) 0x3A(6)
    divu          A(5) B(5) C(5) 0x24(6) 0x0(5) 0x3A(6)
    eret          0x1D(5) 0x1E(5) C(5) 0x01(6) 0x0(5) 0x3A(6)
    flushd        A(5) 0x0(5) IMM16(16) 0x3B(6)
    flushda       A(5) 0x0(5) IMM16(16) 0x1B(6)
    flushi        A(5) 0x0(5) 0x0(5) 0x0C(6) 0x0(5) 0x3A(6)
    flushp        A(5) 0x0(5) 0x0(5) 0x04(6) 0x0(5) 0x3A(6)
    initd         A(5) 0x0(5) IMM16(16) 0x33(6)
    initda        A(5) 0x0(5) IMM16(16) 0x13(6)
    initi         A(5) 0x0(5) 0x0(5) 0x29(6) 0x0(5) 0x3A(6)
    jmp           A(5) 0x0(5) 0x0(5) 0x0D(6) 0x0(5) 0x3A(6)
    jmpi          IMM26(26) 0x01(6)
    ldb           A(5) B(5) IMM16(16) 0x07(6)
    ldbio         A(5) B(5) IMM16(16) 0x27(6)
    ldbu          A(5) B(5) IMM16(16) 0x03(6)
    ldbuio        A(5) B(5) IMM16(16) 0x23(6)
    ldh           A(5) B(5) IMM16(16) 0x0F(6)
    ldhio         A(5) B(5) IMM16(16) 0x2F(6)
    ldhu          A(5) B(5) IMM16(16) 0x0B(6)
    ldhuio        A(5) B(5) IMM16(16) 0x2B(6)
    ldw           A(5) B(5) IMM16(16) 0x17(6)
    ldwio         A(5) B(5) IMM16(16) 0x37(6)
    mul           A(5) B(5) C(5) 0x27(6) 0x0(5) 0x3A(6)
    muli          A(5) B(5) IMM16(16) 0x24(6)
    mulxss        A(5) B(5) C(5) 0x1F(6) 0x0(5) 0x3A(6)
    mulxsu        A(5) B(5) C(5) 0x17(6) 0x0(5) 0x3A(6)
    mulxuu        A(5) B(5) C(5) 0x07(6) 0x0(5) 0x3A(6)
    nextpc        0x0(5) 0x0(5) C(5) 0x1C(6) 0x0(5) 0x3A(6)
    nor           A(5) B(5) C(5) 0x06(6) 0x0(5) 0x3A(6)
    or            A(5) B(5) C(5) 0x16(6) 0x0(5) 0x3A(6)
    orhi          A(5) B(5) IMM16(16) 0x34(6)
    ori           A(5) B(5) IMM16(16) 0x14(6)
    rdctl         A(5) B(5) C(5) 0x26(6) N(5) 0x3A(6)
    rdprs         A(5) B(5) IMM16(16) 0x38(6)
    ret           0x1F(5) 0x0(5) 0x0(5) 0x05(6) 0x0(5) 0x3A(6)
    rol           A(5) B(5) C(5) 0x03(6) 0x0(5) 0x3A(6)
    roli          A(5) 0x0(5) C(5) 0x02(6) IMM5(5) 0x3A(6)
    ror           A(5) B(5) C(5) 0x0B(6) 0x0(5) 0x3A(6)
    sll           A(5) B(5) C(5) 0x13(6) 0x0(5) 0x3A(6)
    slli          A(5) 0x0(5) C(5) 0x12(6) IMM5(5) 0x3A(6)
    sra           A(5) B(5) C(5) 0x3B(6) 0x0(5) 0x3A(6)
    srai          A(5) 0x0(5) C(5) 0x3A(6) IMM5(5) 0x3A(6)
    srl           A(5) B(5) C(5) 0x1B(6) 0x0(5) 0x3A(6)
    srli          A(5) 0x0(5) C(5) 0x1A(6) IMM5(5) 0x3A(6)
    stb           A(5) B(5) IMM16(16) 0x05(6)
    stbio         A(5) B(5) IMM16(16) 0x25(6)
    sth           A(5) B(5) IMM16(16) 0x0D(6)
    sthio         A(5) B(5) IMM16(16) 0x2D(6)
    stw           A(5) B(5) IMM16(16) 0x15(6)
    stwio         A(5) B(5) IMM16(16) 0x35(6)
    sub           A(5) B(5) C(5) 0x39(6) 0x0(5) 0x3A(6)
    sync          0x0(5) 0x0(5) 0x0(5) 0x36(6) 0x0(5) 0x3A(6)
    trap          0x0(5) 0x0(5) 0x1D(5) 0x2D(6) IMM5(5) 0x3A(6)
    wrctl         A(5) 0x0(5) 0x0(5) 0x2E(6) N(5) 0x3A(6)
    wrprs         A(5) 0x0(5) C(5) 0x14(6) 0x0(5) 0x3A(6)
    xor           A(5) B(5) C(5) 0x1E(6) 0x0(5) 0x3A(6)
    xorhi         A(5) B(5) IMM16(16) 0x3C(6)
    xori          A(5) B(5) IMM16(16) 0x1C(6)

3.2. Machine code example

Let us take a sample code:

    START_TIMER = 0xF68C
    label = 5000
    orhi r8, r0, %hiadj(label)
    addi r8, r8, %lo(label)
    subi r8, r8, 1
    bne r8, r0, START_TIMER

Now, the question is how to translate the four instructions to machine code. Note that the code uses two assembler macros. All four instructions are I-type, so each word is laid out as A|B|IMM16|opcode.

(1) The value 5000 is 00000000000000000001001110001000 in binary. Its upper 16 bits are all zeros and bit 15 is clear, so %hiadj(5000) is 0000000000000000. With A = r0 (00000), B = r8 (01000) and the orhi opcode 0x34 (110100), the instruction is going to be 00000|01000|0000000000000000|110100.

(2) Now IMM16 is %lo(5000), the lower 16 bits of the same value, 0001001110001000, so the instruction is going to be 01000|01000|0001001110001000|000100.

(3) subi is a pseudo-instruction implemented as addi with a negative IMM16, so the instruction is going to be 01000|01000|1111111111111111|000100.

(4) Finally, the last instruction is 01000|00000|1111011010001100|011110. Here the value 0xF68C is placed directly in the IMM16 field for simplicity. For a label inside a program, the assembler stores the offset of the target relative to PC+4, as described in Section 2.4.

The whole code is then

    00000010000000000000000000110100
    01000010000001001110001000000100
    01000010001111111111111111000100
    01000000001111011010001100011110
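As a cross-check, the four 32-bit words can be regrouped into hexadecimal and emitted as raw data. This is a sketch under two assumptions: that the assembler accepts a .word directive producing one 32-bit value, and that each word follows the I-type layout A(5) B(5) IMM16(16) opcode(6) from the table in Section 3.1, with r0 in the A field and r8 in the B field of the orhi word.

```asm
# Each .word is one instruction from the example, written out by hand.
        .word 0x02000034    # orhi r8, r0, 0      A=r0, B=r8, IMM16=%hiadj(5000)=0
        .word 0x4204E204    # addi r8, r8, 5000   IMM16=%lo(5000)=0x1388
        .word 0x423FFFC4    # addi r8, r8, -1     what subi r8, r8, 1 expands to
        .word 0x403DA31E    # bne  r8, r0         with 0xF68C in the IMM16 field
```

Comparing such hand-assembled words against the hex file produced by the tools is a quick way to spot a swapped register field or a wrongly extended immediate.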