[EEE2007]

NEWCASTLE UNIVERSITY
_______________________
2018-19
_______________________

COMPUTER SYSTEMS AND MICROPROCESSORS

Time allowed - THREE hours

Candidates should attempt FOUR questions in total, including at least ONE question from each section. The one question in section A carries 40% of the marks; questions in sections B and C carry 20% each.

Marks shown in subsections are indicative only

Section A

The question in this section is compulsory.

A1

a) A processor's level 1 (L1) cache has a miss ratio of 40%. When an instruction/data is found in the L1 cache, the average cycles per instruction (CPI) is 5; however, when the instruction/data is not found, the average CPI is 20. What is the typical CPI of the processor? If the processor has a clock frequency of 1 GHz, how many instructions will be effectively processed per second?
[5 marks]

b) Write a C++ class, Coordinate, with the following properties:
   (i) private: floating point types for coordinates x, y and z;
   (ii) private: floating point types for calibration x_cal, y_cal, z_cal;
   (iii) public: 3 methods to read each calibrated coordinate: for example, x = x + x_cal; y = y + y_cal; and z = z + z_cal;
   (iv) public: a method to read the normalization of current coordinates from origin.
[8 marks]

c) Explain the 3 key stages in processor execution flow.
[4 marks]

d) Discuss the scalability of 3 major interconnect types in modern computing systems.
[3 marks]

e) Sketch the waveform diagram for a read cycle on a typical asynchronous microprocessor bus. Show with a circuit diagram how you might attach a typical RAM device to such a bus.
[5 marks]

f) Using any assembly-like language, write a polling routine that transfers data from a peripheral device to successive locations in memory. With an execution rate of 10 million instructions / sec (MIPS), calculate the maximum throughput of this routine.
[5 marks]

g) Using an assembly-like language, write an interrupt service routine that transfers data from a peripheral device to a fixed location in memory. With an execution rate of 10 million instructions / sec (MIPS), and assuming an interrupt response latency equivalent to the execution time of 3 instructions, calculate the maximum throughput.
[5 marks]

h) What is the difference between a three-state output and a standard logic output with pull-up and pull-down transistors? Under what circumstances would a three-state output be used?
[5 marks]

Section B

Attempt at least ONE question from this section.

B1

a) Draw the memory hierarchy in a microprocessor and discuss the tradeoffs.
[5 marks]

b) Explain the WRITE operation in a non-volatile flash memory cell.
[4 marks]

c) A software algorithm is compiled for both RISC and CISC processors. The RISC version generated 29,659 instructions, while the CISC version produced 19,678 instructions. If the average cycles per instruction (CPI) for RISC and CISC processors are 1.78 and 1.11 respectively, which processor can execute the algorithm faster, given similar operating frequencies? If the average active energy per cycle is 2.35 uJ and 7.15 uJ for RISC and CISC processors respectively, which processor consumes more energy?
[7 marks]

d) Describe the two major packet routing algorithms used in Networks on Chip.
[4 marks]

B2

a) What are the differences between Harvard and von Neumann architectures?
[4 marks]

b) A computer designer is considering interconnect options for 4 processors sharing the main memory. What advantages and disadvantages would be expected if a multi-layered shared bus is used rather than a single layer shared bus?
[5 marks]

c) What is a cache coherence algorithm? What is the impact of cache sizing on this algorithm?
[5 marks]

d) Machine A has a dual port DRAM and features the same clock for both pipelined (with 4 pipeline stages) and unpipelined implementations.
Machine B has a single port DRAM, but its pipelined implementation has a 1.05 times slower clock rate and a pipeline depth of 5. Considering ideal cycles per instruction (CPI) of 1 for both, what are their comparative speedups when load instructions are 40% of the total instructions executed?
[6 marks]

Section C

Attempt at least ONE question from this section.

C1

a) Explain how a chip enable (CE) signal would normally be used within an address decoding arrangement.
[5 marks]

b) Devise an address map for a microprocessor system containing the following devices. Give your answer in hexadecimal.
   A 32 kbyte flash memory;
   Two separate 32 kbyte blocks of random access memory;
   4 I/O registers, each having a chip enable, occupying a byte of address space each.
[5 marks]

c) The following is an extract from the data sheet for a 2-4 decoder. Show how the map you have designed in part (b) might be implemented using one or more of these devices.

   [Figure: 2-4 DECODER symbol with inputs A1, A0 and EN, and outputs X0, X1, X2 and X3]

   Truth Table

   EN  A1  A0  |  X3  X2  X1  X0
   ----------------------------
   1   x   x   |  1   1   1   1
   0   0   0   |  1   1   1   0
   0   0   1   |  1   1   0   1
   0   1   0   |  1   0   1   1
   0   1   1   |  0   1   1   1
[5 marks]

d) The following is another extract from the data sheet for this decoder.

   Switching Characteristics

   Parameter         Min (ns)   Max (ns)
   -------------------------------------
   EN to any X       2          5
   Any A to any X    2          7

   What is the maximum time between the processor issuing an address and the chip enable becoming active at (i) one of the memories and (ii) one of the registers?
[5 marks]

C2

A digital signal processor contains an on-chip timer, an ADC and a DAC. It takes samples of an incoming audio signal at a rate of 100,000 samples / sec, processes the samples, and outputs the processed results at the same rate. The samples are held in one of three arrays a, b and c, each of size 10,000. The arrays rotate their function, so that while one is receiving input samples, another is being processed, and the third is supplying processed output samples.
The processing is programmed in C and is identical for all three arrays, as shown below for array a. Note that the processing for all three arrays shares an additional array v, also of size 10,000.

   1   int a[10000], v[10000];
   2   int i, x;
   3   for (i = 0; i < 10000; i++)
   4   {
   5       x = a[i];
   6       a[i] = a[i] + (v[i] >> 1);
   7       v[i] = x;
   8   }

a) Observe the one-bit right shift operation in line 6. What effect will this have on the value of the variable being shifted?
[3 marks]

b) Briefly describe the overall function of this audio processing system. Quantify your answer wherever possible.
[3 marks]

c) Discuss whether the input and output interfaces would be better programmed with polled or interrupt driven techniques, and state your conclusion as to which you would use.
[4 marks]

d) Using any typical assembly-like language, show the equivalent of the code above.
[5 marks]

e) Neglecting the processing required for the input and output operations, calculate, in millions of instructions / sec (MIPS), the minimum processing speed required for correct operation of this system.
[5 marks]

[END OF PAPER]