Section 1 2. (See Mano, pg 208, problem 6-2) The following is a list of instructions in hexadecimal code with their corresponding memory locations. PC is 100 at the start of the program. When this program is run, what are the locations and symbolic instructions that are executed, and what is the value in AC when the program halts? 100: 101: 102: 103: 104: 105: 106: 2105 7010 7200 3106 7001 CAB4 0 Instruction Location comment AC (Symbolic Op codes) load AC with the data in memory 105 100 LDA 105 M[105] AC <-- M[105] Skip if AC is positive If AC(15) = 0 then PC <-- PC+1 Complement AC Store AC data to memory at 106. M[106] <-- AC Halt the computer 101 SPA 102 CMA 103 STA 106 104 105 106 HLT BUN AND STA BUN 0000 Store sum here M[105] ̅̅̅̅̅̅̅̅̅̅ 𝑀[105] ̅̅̅̅̅̅̅̅̅̅ 𝑀[105] ̅̅̅̅̅̅̅̅̅̅ 𝑀[105] ̅̅̅̅̅̅̅̅̅̅ 𝑀[105] ̅̅̅̅̅̅̅̅̅̅ 𝑀[105] ̅̅̅̅̅̅̅̅̅̅ The value in AC when the program halts: 𝑀[105] 1. Addressing Modes. Create an example similar to the one Mano presents on pg 265, Figure 8-7, and the associated Table 8-4 on pg 266, using your own numbers. Addressing Mode Effective address Value of the operand loaded into the AC a) Direct Address PC+1=185+1=[186] 833 Immediate Address PC+1=185+1=186 300 PC+1=185+1=[[186]] 756 b) c) Indirect Address d) Relative Address 185+2+[185+1]=187+186=30D 256 e) Indexed Address 90+[185+1]=90+300=390 726 f) Register R1=216 216 g) Register Indirect [R1]=[216] 465 h) Autoincrement [R1]=[216] 465 i) Autodecrement [R1-1]=[215] 370 Registers Memory Address Memory 185 Load to AC | Mode 186 Address = 300 212 Next Instruction 215 370 216 465 300 756 30D 256 390 726 PC = 185 R1 = 216 XR = 90 AC You may find the following table helpful: http://sandsduchon.org/duchon/cs311/addressing.html 2. RPN practice. Fill in the following table. As before, do enough that you feel you have the hang of it. Infix notation RPN notation Tree notation Value - 15 - 10 15 10 - 15 10 5 / - (55 - 22) / 11 11 55 22 - 11 / 3 22 55 / 55 55 - 22 / 11 55 22 11 / - 53 11 22 / + + (14 + 13*2)/(3+5) 14 14 13 2 * + 3 5 + / * 3 13 5 5 2 42 + 13 (42 – (13 + (36 / (15 - 16 + 17)))) / + 36 42 13 36 15 16 17 + -/+- 15 17 16 31 + 42 ((42 + 13) - (36 + (15 /(16 – 17)))) + 13 36 42 13 + 36 15 16 17 -/+- / 15 34 16 17 You may find the following applet helpful: http://sandsduchon.org/duchon/math/trees.html 3. Instruction formats. Fill in some of the cells in the following table. Statement 3 address 2 address 1 address 0 address PUSH A SUB b POP x PUSH A PUSH B SUB POP X PUSH a SUB b DIV c POP x PUSH a PUSH b SUB POP a PUSH c DIV POP X x = (a-b*c)/e MULT b, c MULT b, b, c SUB a, b SUB a, a, b DIV a, e DIV x, a, e STA x, a PUSH b MULT c POP b PUSH a SUB b DIV e POP x PUSH b PUSH c MULT POP b PUSH a PUSH b SUB POP a PUSH a PUSH e DIV POP x x = (a*b-c*d)/(e-f) MULT a, a, b MULT a, b MULT c, c, d MULT c, d PUSH a MULT b PUSH a PUSH b x = a - b x = (a - b) / c SUB x,a,b SUB x, a, b DIV x, x, c SUB a,b STA x, a SUB a, b DIV a, c STA x, a SUB a, a, c SUB e, e, f DIV x, a, e SUB x, c SUB e, f DIV x, e STA x, a POP a PUSH c MULT d POP c PUSH a SUB c POP a PUSH e SUB f POP e PUSH a SUB e POP x You may find the following page helpful: http://sandsduchon.org/duchon/exTrees/ 1. (See Mano, pg 293, problem 8-12) Write a program to evaluate the following arithmetic statement: 𝑋= 𝐴 − 𝐵 + 𝐶 ∗ (𝐷 ∗ 𝐸 − 𝐹) 𝐺 +𝐻∗𝐾 Assume that the computer has the following arithmetic instructions: ADD SUB MULT DIV Using 3, 2, 1 and 0 address instructions a. b. c. d. Using a general register computer with 3-address instructions. Using a general register computer with 2-address instructions. Using an accumulator type computer with 1-address instructions. Using a stack organized computer with 0-address instructions. Hint: Converting the expression to RPN makes some of these programs easier. MULT POP a PUSH c PUSH d MULT POP c PUSH a PUSH c SUB POP a PUSH e PUSH f SUB POP e PUSH a PUSH e DIV POP x 3-address 2-address 1-address 0-address MULT D, D, E SUB D, D, F MULT C, C, D SUB A, A, B ADD A, A, C MULT H, H, K ADD G, G, H DIV X, A, G MULT D, E SUB D, F MULT C, D SUB A, B ADD A, C MULT H, K ADD G, H DIV A, G STA X, A PUSH H MULT K ADD G POP G PUSH D MULT E SUB F MULT C ADD B POP B PUSH A SUB B POP A DIV G POP X PUSH D PUSH E MULT PUSH F SUB PUSH C MULT PUSH B ADD POP B PUSH A PUSH B SUB POP A PUSH H PUSH K MULT PUSH G ADD POP G PUSH A PUSH G DIV POP X Conventions: a. b. c. d. 3-address: SUB R1,R2,R3 // R1 <-- R2 - R3 2-address SUB R1,R2 // R1 <-- R1 - R2 1-address SUB A // ACC <-- ACC - M[A] 0-address PUSH A // TOS <-- M[A] PUSH B // TOS <-- M[B] SUB // TOS <-- M[A] - M[B] 2. (See Mano, pp 265-266) A two-word instruction is stored at location 350 with its address field at location 351 (all numbers in hexadecimal). The first word of the instruction specifies the operation code and mode; the second word specifies the address part. The values of the program counter (PC register), a general register (R1), the index register (XR), the base register (BR), and certain addresses in memory are as shown below. Evaluate the effective address and the value that is loaded into the AC if the addressing mode of the instruction is: Addressing Mode Effective address Value of the operand loaded into the AC a) Direct Address PC+1=350+1=[351] 833 Immediate Address PC+1=350+1=351 400 c) Indirect Address PC+1=350+1=[[351]] 900 d) Relative Address 350+2+[350+1]=352+400=752 189 e) Indexed Address 125+[350+1]=125+400=525 735 f) Register R1=834 834 g) Register Indirect [R1]=[834] 950 h) Autoincrement [R1]=[834] 950 i) Autodecrement [R1-1]=[833] 900 b) Registers Memory Address Memory 350 Load to AC | Mode 351 Address = 400 PC = 350 352 Next Instruction R1 = 834 833 900 XR = 125 834 950 AC 400 833 900 456 752 189 525 735 Section 2 Find a reference to a processor that implements an instruction pipeline and describe some of its key features. You should say how many segments are used and whether the processor is CISC or RISC, among other issues. The ARM processor is a RISC processor which features a 5 stage, 4 segment pipeline. Pipeline description Advanced processing of instruction fetch and branch prediction - unblocks branch resolution from potential memory latency-induced instruction stalls. Up to four instruction cache line prefetch-pending - further reduces the impact of memory latency so as to maintain instruction delivery. Between two and four instructions per cycle forwarded continuously into instruction decode ensures efficient superscalar pipeline utilization. Fast-loop mode - provides low power operation while executing small loops. Superscalar decoder - capable of decoding two full instructions per cycle. Speculative execution of instructions - enabled by dynamic renaming of physical registers into an available pool of virtual registers. Increased pipeline utilization - removing data dependencies between adjacent instructions and reducing interrupt latency. Virtual renaming of registers - accelerating code through an effective hardware based unrolling of loops without the additional costs in code size and power consumption. Any of the four subsequent pipelines can select instructions from the issue queue - providing out of order dispatch further increasing pipeline utilization independent of developer or compiler (page 6) Reference: The ARM Cortex-A9 Processors, Whitepaper Be sure to include citations. Determine the number of clock cycles that it takes to process 500 tasks in a 8-segment pipeline. See Mano, pg 329, 9-3. k = 8 segments n = 500 tasks cycles = (k+n-1) = 8+500-1 = 507 cycles A non-pipeline system takes 50 ns to process a task. The same task can be processed in a 8-segment pipeline with a clock cycle of 10 ns. Determine the speedup ratio of the pipeline for 100 tasks. What is the maximum speedup that can be achieved? See Mano pg 329, 9-4 𝑆= 𝑛 ∗ 𝑡𝑛 100 ∗ 50 5000 = = = 4.67 (𝑘 + 𝑛 + 1)𝑡𝑝 (8 + 100 − 1) ∗ 10 1070 𝑆𝑚𝑎𝑥 = Section 3 Pick a number between 16 and 128 - call this A \ A = 112 Pick another number between 5 and 10 - call this T T=8 𝑡𝑛 50 = =5 𝑡𝑝 10 The numbers you select should be different from those selected by other students. Explain how a direct mapping cache would work with a computer using A bits for memory addressing and T bits for a direct mapping cache. The A bit memory address will be divided into two fields, T bits for the index field and A-T bits for the tag field. Given that A = 112 and T = 8, the index field is 8 bits and the tag field is 104 bits. The direct mapping cache organization uses the full 112 bit address to access the main memory and the 8 bit index to access the cache. How many words will the cache have? 28 = 256 words Ram sizing: o How many 256x8 RAM chips are needed to provide a memory capacity of 16,384 bytes? 256x8 RAM chips have 256 bytes per chip so 16384/256 = 64 RAM chips o How many lines of the address bus must be used to access 16,384 bytes of memory? How many of these lines will be common to all the chips? 14 bits are necessary to specify each byte of memory (214 = 16384), each chip has 256 1 byte addresses, so 8 lines will be common to all chips (2 8 = 256). o How many lines must be decoded for chip select? Specify the size of the decoders. 6 bits are necessary to specify the specific chips (2 6 = 64) A computer uses RAM chips of 1024x1 capacity o How many chips are needed, and how should their address lines be connected to provide a memory capacity of 4096 bytes? 1024x1 * 8 (to make each a byte) * 4 = 4096 bytes, so 8*4 = 32 chips o How many chips are needed to provide a memory capacity of 64K bytes? Explain in words how the chips are to be connected to the address bus. 4096 * 4 =16kB 32 chips (for 4096 bytes) * 4 = 128 chips to get 64kB Each chip must be connected to the address bus to take the lines necessary to determine which memory location is being addressed. The chips are connected in groups of eight, so that each of these groups form one word. This means that the design is repeated eight fold for the chips, such that for a given memory location each bit of the word is stored (as the chips are of x1 construction). How many tag bits are needed in a direct mapped cache of 1024 words in a computer with 1M words? It requires 20 bits to address 1M words, so the tag size would be 1024-20 = 1004.