Uploaded by kshreesh5

quiz2 with solutions

Quiz2 for Section B,C and RGIT
Subject – Computer Organization and Architecture (COA)
Answer all questions
Full Marks : 30 [6+8+8+8]
Time : 1hr
Q1. Consider a direct-mapped cache with 16 cache lines, indexed 0 to 15, where each cache line can
contain 32 integers (block size: 128 bytes). Consider a two-dimensional, 32 * 32 array of integers a. This
array is laid out in memory so that a[0][0] is next to a[0][1], and so on. Assume the cache is initially empty,
but that a[0][0] maps to the first word of cache line 0. Consider the following column-first traversal:
int sum = 0;
for (int i = 0; i < 32; i++) {
    for (int j = 0; j < 32; j++) {
        sum += a[i][j];
    }
}
and the following row-first traversal:
int sum = 0;
for (int i = 0; i < 32; i++) {
    for (int j = 0; j < 32; j++) {
        sum += a[j][i];
    }
}
Compare the number of cache misses produced by the two traversals, assuming the oldest cache line is
evicted first. Assume that i, j, and sum are stored in registers. Assume that no part of the array a is saved in
registers; it is always accessed through the cache.
Ans:
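Number of cache misses in the column-first traversal = 32 (each 32-integer row fills exactly one block, so
only the first access to each row misses). Miss rate = 32/1024 = 3.1%.
Number of cache misses in the row-first traversal = 32 * 32 = 1024 (consecutive accesses touch different
rows, and rows j and j + 16 map to the same cache line, so every block is evicted before it is reused).
Miss rate = 100%.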
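The two miss counts can be checked with a small simulation. The following is a minimal sketch in Python, assuming the cache described in the question (16 lines, 32 integers per block, direct mapping by block number); the function name count_misses and its parameters are illustrative and not part of the quiz.

```python
def count_misses(inner_index_is_column, n=32, ints_per_block=32, num_lines=16):
    """Count misses of a direct-mapped cache while summing an n x n int array."""
    lines = [None] * num_lines            # block number currently held by each line
    misses = 0
    for i in range(n):
        for j in range(n):
            # a[i][j] when the inner loop walks along a row, a[j][i] otherwise
            row, col = (i, j) if inner_index_is_column else (j, i)
            word = row * n + col          # offset of the element from a[0][0], in integers
            block = word // ints_per_block
            line = block % num_lines      # direct mapping: block number -> cache line
            if lines[line] != block:      # miss: fetch the block, evicting the old one
                misses += 1
                lines[line] = block
    return misses


misses_ij = count_misses(inner_index_is_column=True)    # sum += a[i][j]
misses_ji = count_misses(inner_index_is_column=False)   # sum += a[j][i]
print(misses_ij, misses_ji)   # 32 1024
```

Because the cache is direct-mapped, each block has only one possible line, so the eviction policy mentioned in the question does not affect the counts.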
Q2. Pipelined arithmetic units are found in high speed computers. They are used to implement floating point
operations. Floating point operations can be divided into sub-operations which can be performed by
different segments of the pipeline. For example, the floating point adder can be divided into the following
four sub-operations that are performed by four segments:
i. Compare the exponents; ii. Align the mantissas; iii. Add or Subtract the mantissas; iv. Normalize the result
a. Construct the pipeline for computing the floating point addition of 100 pairs of numbers.
b. The time delays for the four segments are: t1 = 50 ns, t2 = 30 ns, t3 = 95 ns, t4 = 45 ns. The interface
register delay is tr = 5 ns. How long would it take to add the 100 pairs of numbers using the pipelined floating
point adder?
Ans:
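(a) [Figure not reproduced: a four-segment pipeline (compare exponents, align mantissas, add or subtract
mantissas, normalize result) with interface registers between the segments.]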
(b) The clock cycle time of the pipeline is set by the slowest segment, i.e. segment 3, plus the interface
register delay.
Therefore, clock cycle = t3 + tr = 95 + 5 = 100 ns.
For n = 100, k = 4, tp = 100 ns:
Time to add 100 pairs of numbers = (k + n – 1) tp = (4 + 99) * 100 = 10,300 ns = 10.3 μs
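The same arithmetic can be written out as a short check. This is a sketch in Python using only the values given in the question; the variable names are illustrative.

```python
segment_delays_ns = [50, 30, 95, 45]   # t1..t4 from the question
register_delay_ns = 5                  # tr
n = 100                                # pairs of numbers to add
k = len(segment_delays_ns)             # number of pipeline segments

# The clock period must cover the slowest segment plus the interface register.
tp_ns = max(segment_delays_ns) + register_delay_ns

# The first result appears after k cycles; each remaining result takes one more cycle.
total_ns = (k + n - 1) * tp_ns
print(tp_ns, total_ns)   # 100 10300
```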
Q3. (a) Draw the circuit diagram for the instruction fetch unit in the Simple RISC processor
Ans :
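[Figure: fetch unit. A multiplexer, controlled by isBranchTaken, selects branchPC (input 1) or pc + 4
(input 0) as the next 32-bit value of pc. The pc register is triggered by a negative clock edge and addresses
the instruction memory, which outputs the 32-bit instruction inst.]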
(b) Show the microcode implementation of the load or store instructions.
Ans:
.begin:
mloadIR
mdecode
madd pc, 4
mswitch
/* load instruction: transfer rs1 to register A */
mmov regSrc, rs1, <read>
mmov A, regVal
/* calculate the effective address */
mmov B, immx, <add> /* ALU operation */
/* perform the load */
mmov mar, aluResult, <load>
/* write the loaded value to the register file */
mmov regData, ldResult
mmov regSrc, rd, <write>
/* jump to the beginning */
mb .begin
(c) Write an ARM assembly program to find out if a number is prime using a recursive algorithm
Ans :
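/* l1: count in r2 how many values in r4..r6 divide r3 */
l1:
mod r5, r3, r4 /* r5 = r3 mod r4 (mod is used as a pseudo-instruction; ARM has no modulo instruction) */
cmp r5, #0
addeq r2, r2, #1 /* r4 divides r3: increment the divisor count */
add r4, r4, #1 /* next candidate divisor */
cmp r4, r6
ble l1 /* repeat while r4 <= r6 */
mov pc, lr /* return to the caller */
/* prime: input n in r0; output r7 = 1 if n is prime, otherwise 0 */
prime:
mov r7, #0
mov r3, r0
mov r6, r0, lsr #1 /* r6 = n/2, the largest divisor to test */
mov r4, #1 /* first candidate divisor */
mov r2, #0 /* divisor count */
bl l1
cmp r2, #1 /* only 1 divides n in the range 1..n/2: n is prime */
moveq r7, #1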
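To make the logic of the listing easier to follow, here is a sketch in Python that mirrors the divisor-counting loop register by register. It is an illustration, not a replacement for the assembly; the function name is hypothetical, and it assumes n >= 2, because the routine as written also reports 0 and 1 as prime.

```python
def is_prime_by_divisor_count(n):
    """Mirror of the listing: count the divisors of n in 1..n/2 (assumes n >= 2)."""
    count = 0          # r2: number of divisors found
    d = 1              # r4: current candidate divisor
    limit = n >> 1     # r6: n / 2 (logical shift right by 1)
    while True:        # the body runs at least once, like the l1 loop
        if n % d == 0: # mod r5, r3, r4 ; cmp r5, #0 ; addeq r2, r2, #1
            count += 1
        d += 1         # add r4, r4, #1
        if d > limit:  # ble l1 falls through when r4 > r6
            break
    return 1 if count == 1 else 0   # cmp r2, #1 ; moveq r7, #1


primes_below_20 = [n for n in range(2, 20) if is_prime_by_divisor_count(n)]
print(primes_below_20)   # [2, 3, 5, 7, 11, 13, 17, 19]
```

Note that the loop is iterative: the divisor 1 always contributes one to the count, so a count of exactly 1 means no other divisor up to n/2 exists.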
Q4. (a) Which addressing modes are preferable in a machine with very few registers?
Ans: Immediate addressing, register base addressing, and PC-relative addressing.
(b) Write a program in Simple RISC assembly to convert an integer stored in memory from the little-endian
to the big-endian format.
Ans: Assume the integer 0x12345678, stored in little-endian format, is at location 0x1234.
Register r1 contains 0x1234, the address of the integer.
ld r0,[r1] /* load the contents of the location pointed to by r1 into r0; r0 = 0x12345678 */
lsl r2,r0,24 /* shift r0 left by 24 positions to move 78 into the most significant 8 bits; r2 = 0x78000000 */
lsr r3,r0,24 /* shift r0 right by 24 positions to move 12 into the least significant 8 bits; r3 = 0x00000012 */
mov r6,0xff00 /* mask for the second byte; intended value 0x0000ff00 */
mov r7,0xff0000 /* mask for the third byte; intended value 0x00ff0000 */
and r4,r0,r6 /* extract 56 from r0 by ANDing with 0x0000ff00 */
lsl r4,r4,8 /* move the extracted 56 left by 8 bits so that r4 = 0x00560000 */
and r5,r0,r7 /* extract 34 from r0 by ANDing with 0x00ff0000 */
lsr r5,r5,8 /* move the extracted 34 right by 8 bits so that r5 = 0x00003400 */
add r3,r3,r5 /* add the contents of r3 and r5; r3 = 0x00003412 */
add r2,r2,r4 /* add the contents of r2 and r4; r2 = 0x78560000 */
add r2,r2,r3 /* add the contents of r2 and r3; r2 = 0x78563412 */
st r2,[r1] /* store r2, now in big-endian byte order, back to the location pointed to by r1 */
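The shift-and-mask steps of the answer can be checked with a short sketch in Python. The function name swap_bytes32 is illustrative; the extra masking after the left shift is needed only because Python integers are not limited to 32 bits.

```python
def swap_bytes32(word):
    """Reverse the byte order of a 32-bit word using shifts and masks."""
    byte0_to_top = (word << 24) & 0xFF000000    # lsl r2, r0, 24 (kept to 32 bits)
    byte3_to_bottom = (word >> 24) & 0xFF       # lsr r3, r0, 24
    byte1_up = (word & 0x0000FF00) << 8         # and r4, r0, r6 ; lsl r4, r4, 8
    byte2_down = (word & 0x00FF0000) >> 8       # and r5, r0, r7 ; lsr r5, r5, 8
    # The four fields do not overlap, so OR gives the same result as the adds.
    return byte0_to_top | byte1_up | byte2_down | byte3_to_bottom


swapped = swap_bytes32(0x12345678)
print(hex(swapped))   # 0x78563412
```

Applying the function twice returns the original word, which is a quick sanity check for any byte-swap routine.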
(c) Differentiate between the polling and interrupt-driven I/O schemes of data transfer. Assume that a single
polling operation takes 200 cycles on a processor running at 1 MHz. The processor polls a printer 1000 times
per minute. What percentage of its time does the processor spend polling?
Ans: In polling, the CPU periodically checks whether any I/O device needs service. Polling takes CPU time
even when no requests are pending. This overhead can be reduced by polling less often, at the expense of
response time.
In an interrupt-driven scheme, instead of the CPU checking for I/O requests periodically, the device signals
the processor when it needs service. Each device uses a wire (interrupt line) to signal the processor. When
an interrupt is signaled, the processor stops the normal flow of instruction execution and executes a routine
called an interrupt handler to deal with the interrupt. Interrupts are asynchronous with respect to the
program currently being executed.
The request for the CPU to execute the interrupt handler can come from several sources:
• External hardware devices. A common example is pressing a key on the keyboard, which causes the
keyboard to send an interrupt to the microcontroller so that it reads the pressed key.
• The processor itself, as a result of executing the program, to report an error in the code. For example,
division by 0 causes an interrupt.
• Other processors. In a multiprocessor system, the processors can send interrupts to each other as a way
to communicate.
Interrupt Handler / Interrupt Service Routine (ISR)
For every interrupt, there is a fixed location in memory that holds the address of its ISR. The group of
memory locations set aside to hold the addresses of ISRs is called the interrupt vector table. The programmer
does not have to know the exact locations of these vectors; the compiler handles this. Interrupts are the
better solution when the CPU must react quickly, or when it must detect an external signal that occurs
relatively rarely but lasts for a very short interval.
For example, suppose the CPU must detect a pulse that lasts 1 ms and appears once every 10 s at a random
time. With polling, it would have to check, say, every 500 µs so that the pulse is not missed. With interrupts,
the ISR is triggered and executed only once, at the moment the pulse occurs. In this case the interrupt
method is much more efficient.
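To put a number on the pulse example, the following sketch in Python counts how many polls are made per detected pulse, assuming the 500 µs polling interval and the 10 s pulse period used above; the variable names are illustrative.

```python
pulse_period_us = 10 * 1_000_000   # one pulse every 10 s, in microseconds
poll_interval_us = 500             # polling interval chosen in the example

polls_per_pulse = pulse_period_us // poll_interval_us   # checks made per pulse
isr_runs_per_pulse = 1                                  # the ISR runs once per pulse
print(polls_per_pulse, isr_runs_per_pulse)   # 20000 1
```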
Printer polling: cycles spent polling per second = 200 * 1000/60 = 3333.33 cycles/s
Percentage of processor time spent polling = (3333.33 / (1 * 10^6)) * 100 = 0.33%
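The polling overhead can be recomputed with a short sketch in Python, using only the figures from the question (1 MHz clock, 200 cycles per poll, 1000 polls per minute); the variable names are illustrative.

```python
clock_hz = 1_000_000        # 1 MHz processor
cycles_per_poll = 200
polls_per_minute = 1000

polling_cycles_per_second = cycles_per_poll * polls_per_minute / 60
percent_time_polling = polling_cycles_per_second / clock_hz * 100
print(round(polling_cycles_per_second, 2), round(percent_time_polling, 2))   # 3333.33 0.33
```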