CS1104 Tutorial 1 – UNOFFICIAL ANSWERS
Prepared by: Colin Tan (ctank@comp.nus.edu.sg)

Tutorial 1 Part 1

The general idea of this tutorial is to give you some familiarity with the various terms in computer architecture. Specifically, you should be familiar with what machine organization and instruction set architecture are, how they are related and how they differ. We will now go through each question in this set, and examine why each MCQ option is the right or wrong answer. The answers that I give here are heavy, but you do not need to know them all the first time round. However, they are handy to keep around as we venture onward through CS1104.

Introductory Ideas

Computer architecture is a study of 2 major fields: computer (or machine) organization, and instruction set architecture. In the CS1103 course last semester we studied how digital logic gates can be combined to form hardware units like adders, subtractors and comparators. Unfortunately we stopped short of showing why we would want to build these things.

Computer Organization is the field of study that looks at how hardware units (functional units) may be combined to solve a particular problem in the most efficient manner possible. By efficient, we mean that a particular problem should be solved in the fastest manner possible. In Computer Organization we will learn how to connect these functional units together, and how data may be efficiently carried from one unit to another.

Having built our hardware units, our next challenge is how to make use of these units to solve a computational problem (it's obvious that a computer cannot iron your clothes for you). The way we go about doing this is to specify the steps that we need to take in order to solve these problems. Consider this problem (expressed in C):

    x = a + b;

How do we get our functional units to load up the data from variables a and b, and how do we get the adder to add these pieces of data together?
Way back in the 1940s the way to do this would be to use little toggle switches (like the ones you used in your CS1103 labs) to enter the numbers into registers, then manually connect (using wires, just like in the CS1103 lab) the registers to an adder, then manually connect the outputs of the adder to another register. Thankfully, today we don't need to do this anymore. Instead, computer-organization designers arrange the functional units in a standard way, together with the pathways between these units for data to flow from one unit to another (e.g. from the registers to the adder and back). The pathways are controlled by little switches that channel data from one functional unit to another, as you will learn. We, the programmers, are given an instruction set to control these pathways that route data from registers, through the functional unit we want to use (e.g. the multiplier) and back to other registers.

Instruction Set Architecture refers to the design of this instruction set. There are 4 main aspects of ISA design:

i) Instruction Types

Given a processor that has functional units for adding, subtracting, comparing and manipulating data, we need to be able to tell the processor how to route data to the units that we want, and how to route the results to the desired destination. The ISA design specifies what instructions the programmer is given in order to do this. Some of the more common instruction types include:

(a) Load/Store Instructions

This class of instructions deals with the movement of data from memory to registers, and vice versa. E.g.:

    lw R1, 0x0132  ; Load the data in memory location 0x0132 into register R1
    sw 0x0132, R2  ; Store the data in register R2 to memory location 0x0132

(b) Arithmetic Instructions

This class of instructions deals with the normal arithmetic operations like adding, subtracting etc.
E.g.:

    add R1, R2, R3  ; Add the numbers stored in registers R2 and R3 and store the result in register R1

(c) Bit-wise Operators

This is a curious set of instructions that deals with the bits that make up the data. E.g.:

    rol R1, 16  ; Rotate the bits of the number in register R1 to the left by 16 positions

(See CS1103 Lecture 13 if you cannot remember what a rotate operation is.)

(d) Branching Instructions

This class of instructions allows us to jump to a different part of a program in response to certain conditions. E.g.:

    beq R1, label2  ; Jump to the portion of the program labelled "label2" if register R1 contains a 0

ii) Instruction Formats

So far we have seen that instructions allow us to specify where data should come from, which functional units should deal with the data, and where the results should be sent to. In practice, how exactly do we specify all of this? The Instruction Format aspect of ISA design deals with this: given an instruction, which portion of the instruction tells the processor what to do (e.g. to add? to subtract?), and which portion tells the processor where to fetch the data from and where to send the results to? The diagram below shows an example of an instruction format:

    | opcode | source 1 | source 2 | destination |

The opcode field tells the processor which functional unit to use (e.g. the adder), while the remaining fields tell it where to get the data from (source 1 and source 2) and where to store the result (destination).

iii) Data Formats

In CS1103 we learnt about integer number types (2's complement, 1's complement, sign+magnitude) and floating point number formats. When we design an ISA we must specify what types of data an instruction can operate on. This comes under the data formats aspect of ISA design.

iv) Addressing Modes

There are several ways in which we can access data to be operated on.
Some examples are given here, and you will see more of these in future tutorials and lectures:

i) Direct Addressing

In direct addressing, the address in memory where the data is to be fetched from is specified directly in the instruction. (For now, just assume that memory is a huge pool where data is stored, and each piece of data is given an identifier, called its address.) E.g.:

    lw R12, 0x0134

Here the processor is told to load up the data at memory address 0x0134 (the "0x" portion of the address 0x0134 means that "0134" is specified in hexadecimal).

ii) Immediate Addressing

In immediate addressing, the data to be operated on (i.e. the operand) is given directly in the instruction. E.g.:

    add R1, #0001, #0002  ; R1 = 1 + 2

iii) Register Indirect Addressing

This is a slightly more complicated form of addressing, where the address of a piece of data is put inside another register. E.g.:

    lw R12, [R1]

Here register R1 contains the address of the data to be fetched.

The Addressing Modes aspect of ISA design specifies which of these ways of accessing data are to be supported.

Assembly Language vs. Machine Language

Remember from CS1103 that computers can only understand numbers. This is why we have things like the "ASCII" codes that allow us to map text to numbers and back. Likewise, all instructions to the processor must be in the form of numbers. Take the following instruction sequence for example:

    lw R1, 0x0134
    lw R2, 0x0154
    add R1, R1, R2
    sw 0x0155, R2

Processor instructions given in human-readable form like this are called "assembly". This is not usable directly by the computer. An assembler must be used to convert this sequence of assembly code into machine code. For example, suppose the lw, sw and add instructions are represented by the numbers 03, 05 and 06 respectively (this mapping to numbers is defined by the assembler and ISA), while registers R1 and R2 are represented by the numbers 01 and 02 respectively.
Then the above assembly language program will be translated into this machine language program:

    03 01 0134
    03 02 0154
    06 01 0102
    05 02 0155

This sequence of numbers can be understood directly by the machine. That's why it's called "machine code".

Exercise I Question 1

By definition, computer architecture is the study of computer (machine) organization and instruction set architecture. Hence the only correct option is (d). The study of programming languages (option b) is termed "comparative languages" and does not come under computer architecture.

Exercise I Question 2

As was mentioned earlier, the instruction set architecture (ISA) provides programmers with an interface to the hardware functional units. Hence the ISA is an important interface between application software and hardware organization. Let's look at the other options and see why they are wrong. The interface between (a) Digital Circuit and Data Path Control does not, as far as I know, exist. The interface between (b) Application software and operating systems is the Application Programmer's Interface, or API. In (d), there is no interface between a compiler and a programming language: the programming language defines the compiler. Saying that there is an interface between a compiler and a programming language makes as much sense as saying that there is an interface between a circle and the concept of a round shape. For (e), the interface between a high-level programming language and assembly language is the compiler.

Exercise II Question 3

Based on the definition of ISA, the only correct option is (b), compilation of a high-level C language program into a machine language program. It never makes sense to include the definition of a high-level language in the ISA design, as this would tie the processor to a particular high-level language.
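As an aside before the next question: the assembly-to-machine-code translation from the "Assembly Language vs. Machine Language" section above can be sketched as a tiny assembler. This is a minimal sketch of a hypothetical mini-ISA, not any real processor: the opcode and register numbers (lw = 03, sw = 05, add = 06; R1 = 01, R2 = 02) follow the example mapping, and the field layout (opcode, register, then address or packed source registers) is simply inferred from the example listing.

```python
# A toy assembler for the hypothetical mini-ISA used above. The numeric
# mappings follow the example; the field layout is an assumption read
# off the example machine-code listing.
OPCODES = {"lw": "03", "sw": "05", "add": "06"}
REGS = {"R1": "01", "R2": "02"}

def hex4(literal):
    """Turn an address literal like '0x0134' into a 4-digit field."""
    return literal.removeprefix("0x").zfill(4)

def assemble(line):
    """Translate one assembly instruction into a machine-code string."""
    mnemonic, *ops = line.replace(",", " ").split()
    if mnemonic == "lw":    # lw Rd, addr      -> opcode reg addr
        return f"{OPCODES['lw']} {REGS[ops[0]]} {hex4(ops[1])}"
    if mnemonic == "sw":    # sw addr, Rs      -> opcode reg addr
        return f"{OPCODES['sw']} {REGS[ops[1]]} {hex4(ops[0])}"
    if mnemonic == "add":   # add Rd, Rs1, Rs2 -> opcode dest src1src2
        return f"{OPCODES['add']} {REGS[ops[0]]} {REGS[ops[1]]}{REGS[ops[2]]}"
    raise ValueError(f"unknown instruction: {mnemonic}")

program = ["lw R1, 0x0134", "lw R2, 0x0154", "add R1, R1, R2", "sw 0x0155, R2"]
for insn in program:
    print(assemble(insn))
```

Running this prints the machine-language program line by line; on another ISA the same numbers would mean something completely different, which is exactly the point of question 5(c) below.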
Exercise II Question 4

Option (a) is incorrect, as ISA deals only with the instructions themselves; hence hardware design and implementation does not come under ISA. Hardware implementation deals with how to build adders, how to build subtractors, how to control data flow etc. This is clearly irrelevant to instruction set design.

Option (b) is incorrect. As mentioned earlier, the interface between assembly and machine language is the assembler, not the ISA.

Option (c) is correct. In general, the same generation of processors from the same manufacturer will have the same ISA. However, it is important to remember that different generations of processors will have different ISAs even if they are from the same manufacturer (e.g. Pentium II vs. Pentium III).

Option (d) is wrong, by the definition of computer organization and computer architecture.

Exercise III Question 5

Option (a) is clearly wrong. To see why, consider a sign-and-magnitude number. S+Mag numbers have this format:

    | Sign | Magnitude |

Suppose that this number is sitting in memory. Any processor that agrees that an S+Mag number looks like the diagram above will be able to make sense of this number in memory. The ISAs do not necessarily have to be the same.

Option (b) is also wrong. The Pentium II 450 MHz and Pentium II 500 MHz in question (4) are a counter-example: both have the same ISA but different performance.

Option (c) is correct. Let's take the machine language code we saw earlier:

    03 01 0134
    03 02 0154
    06 01 0102
    05 02 0155

In our particular example, the first 2 lines are load instructions, the 3rd line is an add instruction, and the 4th line is a store instruction (see the "Assembly Language vs. Machine Language" section above). If we were to bring this to another processor with a different ISA, the numbers would mean completely different things, and the program would not execute correctly.

Option (d) is again incorrect.
The ISA has an influence over the hardware implementation, but for each ISA we have a huge range of choices of how to implement the hardware. It would be incorrect to say that a particular ISA defines the hardware implementation that we should use.

Exercise IV Question 6

This statement is false. To see why: suppose we have designed an ISA. We might have several different computer organizations to support this ISA. The diagram below shows how this is possible:

                      Instruction Set Architecture
                     /              |               \
    Pipeline          Superscalar          Scoreboard
    Organization      Organization         Superscalar

Here we see a single ISA design being supported by 3 possible machine organizations. The hardware implementation for each organization will obviously be different. Hence if we switch between, say, the pipeline organization and the scoreboard superscalar organization, we are changing the hardware machine organization. However, since both organizations support the same ISA, the ISA itself is not changed. Hence this statement is false.

Exercise IV Question 7

Option (a) is incorrect. The specification of high-level languages like C and Java is called the "language specification" (duuuuh). Again, the language specification is kept separate from the ISA specification, so that a machine's architecture will not be tied to a particular language. There is an exception to this: Java machines are built specifically to run Java byte-codes, and are hence built around the Java language specification.

Option (b) is incorrect. This comes under the "compiler specification". Again, it would not make sense to include compiler specifications in the ISA specs.

Option (c) comes under the "instruction types" aspect of ISA design, and is hence a part of ISA design.

Option (d) comes under the "data formats" and "addressing modes" aspects of ISA design, and is hence a part of ISA design.

Option (e) comes under machine organization, since we are dealing with hardware units. Hence option (e) is not a part of ISA design.
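Question 6's point, that a single ISA can be supported by several internally different implementations, can be made concrete with a small sketch. The three-instruction "ISA" and both interpreter variants here are invented for illustration; the only claim is that two implementations which differ internally are indistinguishable to the programmer as long as they agree on the instruction set's meaning.

```python
# One invented "ISA", two different implementations of it.
PROGRAM = [("li", "r1", 7), ("li", "r2", 5), ("add", "r3", "r1", "r2")]

def run_simple(program):
    """A straightforward one-instruction-at-a-time implementation."""
    regs = {}
    for op, dest, *srcs in program:
        if op == "li":                       # load immediate
            regs[dest] = srcs[0]
        elif op == "add":                    # add two registers
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
    return regs

def run_fancy(program):
    """A differently organized implementation of the SAME ISA:
    instructions are first 'compiled' to closures, then executed."""
    regs = {}
    steps = []
    for op, dest, *srcs in program:
        if op == "li":
            steps.append(lambda d=dest, v=srcs[0]: regs.__setitem__(d, v))
        elif op == "add":
            steps.append(lambda d=dest, a=srcs[0], b=srcs[1]:
                         regs.__setitem__(d, regs[a] + regs[b]))
    for step in steps:
        step()
    return regs

# Same ISA, same results -- the programmer cannot tell them apart.
print(run_simple(PROGRAM) == run_fancy(PROGRAM))
```

Swapping `run_simple` for `run_fancy` is a change of "machine organization"; the ISA itself is untouched, which is why the statement in Question 6 is false.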
Tutorial 1 Part 2

Question 1

In this question all 4 factors will affect program execution time. To see why, we will look at each factor in turn:

i) Compiler Technology

Compiler technology plays an important role in program execution time. Depending on how "clever" the compiler is, it can:

a) Expand and re-arrange loop constructs for efficient execution.

b) Inline short functions into the procedures that call them. E.g.:

    int f(int x) {
        return x + 2;
    }

    int main() {
        int y = 5;
        y = f(y);
    }

The function call f(y) is replaced directly, and main becomes:

    int main() {
        int y = 5;
        y = y + 2;
    }

This saves execution cycles, as function calls incur overheads such as state saving, saving the Program Counter (see later lectures) onto the stack etc. By "inlining" the function call, we have eliminated it completely.

c) Re-schedule machine instructions to prevent pipeline stalls (to be covered in Dr. Kato's section), thus lowering the overall CPI.

d) Choose the best mix of instructions to minimize overall CPI.

All of these strategies will help to shorten execution time.

ii) Instruction Set Architecture

ISA design can affect execution time for 2 reasons:

a) The ISA restricts the scope of optimization that can be done by the compiler. For example, if the ISA does not support floating-point operations (this was common in the old Intel architectures like the 8086, 8088, 80186, 80286 and 80386, as well as the 80486SX processors), then the compiler has to implement its own floating-point operations in software, which are often slow and inefficient.

b) The ISA affects the complexity of the hardware organization. E.g. with a Reduced Instruction Set Computer (RISC) architecture, where the ISA is highly simplified, the control units that coordinate execution can be made simple and fast. Complex Instruction Set Computer (CISC) architectures with complex ISA designs require "micro-coded" control units that are many times slower than the simple control units.
iii) Integrated Circuit (IC) Chip Technology

Depending on how sophisticated the IC technology is, it may be possible to build functional units very close together. This reduces signal propagation time (i.e. the time taken for a digital signal to move from one functional unit to another), allowing for faster clock rates. Certain substrates (e.g. gallium arsenide, or GaAs) allow signals to propagate more rapidly, and this again allows for faster clock rates. Note that IC technology will not affect the overall CPI of instructions executed on the processor. This is because CPI is affected mainly by the hardware organization and not by the IC technology.

iv) Application Software

In poorly written software, work is often duplicated (e.g. two loops to achieve something that a single loop can achieve), resulting in slow execution. Other problems include excessive reading and writing to disk, resulting in poor performance. Hence software design is extremely important in maximizing execution speed.

The correct answer is therefore (d), All of the above.

Question 2

In this question, we examine how we can improve execution time.

Option (i) is wrong, as combining simple instructions into complex ones will not always result in good performance. To understand this, let's look at an example. Suppose we wanted to add two numbers stored in memory locations with addresses 0x0001 and 0x0003, and wish to store the result in address 0x0005 (for now, just take "memory" as a huge pool where data is stored, and the data may be retrieved by providing "addresses"). Suppose we have a very simple instruction set that consists only of instructions to load data from memory to registers and vice versa, and arithmetic operations that can only operate on registers.
So our program would look like this:

    lw r1, 0x0001   ; Load up data from memory location 0x0001
    lw r2, 0x0003   ; Load up data from memory location 0x0003
    add r1, r1, r2  ; Add r1+r2 and store the result back in r1
    sw 0x0005, r1   ; Store the result back in location 0x0005

If each instruction takes 5 cycles, then this entire program will take 20 cycles to execute. Suppose we introduce a new instruction addi that allows arithmetic operations on data still in memory. Then we can simplify the entire program to just:

    addi 0x0005, 0x0001, 0x0003

This may look simpler and faster. However, we must remember that additional hardware must be put in place to support this new instruction. Extra clock cycles may be needed to compute addresses and fetch data. Extra pipeline penalties (to be covered in Dr. Kato's part) may be incurred, and all in all the CPI of this one instruction may add up to more than the 20 cycles needed for the previous program. Hence it may be cheaper to use the simpler instructions than the complex one.

Option (ii) is correct if and only if we assume a ceteris paribus condition. That is, we assume that when the clock speed is improved, CPI and all other factors remain unchanged. This is, in general, not possible: improving the clock rate often (but not always) causes CPI to go up.

Option (iii) is incorrect. As we saw in option (i), a shorter sequence of code is not always a faster sequence. In fact, modern compilers often give users a choice between generating the smallest code and generating the fastest code.

The correct answer is therefore (a).

Question 3

This question examines your understanding of the CPI formula. To start off, let's look at how the overall CPI is defined. The overall CPI is, by definition, the average number of clock cycles required per instruction in a program. Therefore:

    CPI = Cp / Np

Where Cp is the total number of clock cycles used by the program, and Np is the total number of instructions in the program.
Let us assume that this particular processor has 4 instruction classes: A, B, C and D. Each class has an average CPI of CPI_A, CPI_B, CPI_C and CPI_D respectively. This means that every instruction in class x will take an average of CPI_x clock cycles to complete. Let us further suppose that in a particular program there are fA class A instructions, fB class B instructions, fC class C instructions and fD class D instructions. Cp is then trivial to compute:

    Cp = fA x CPI_A + fB x CPI_B + fC x CPI_C + fD x CPI_D

The overall CPI is therefore:

    CPI = (fA/Np) x CPI_A + (fB/Np) x CPI_B + (fC/Np) x CPI_C + (fD/Np) x CPI_D    (EQ1)

The ratios fA/Np, fB/Np etc. are often expressed as percentages or fractions instead, as we have seen in tutorial 2. With EQ1 in mind, let's look at each of the options in turn.

Option (a) is incorrect. As EQ1 shows, what affects the overall CPI (assuming that the class CPIs are constant) is the ratio fx/Np, i.e. the ratio of the instruction count for class x to the total number of instructions. To decrease the overall CPI, the ratio of fast instructions must be increased relative to the ratio of slow instructions. Reducing the number of instructions executed will not in general guarantee this.

Option (b) is incorrect. It is basically saying that:

    CPI = CPI_A + CPI_B + CPI_C + CPI_D

This is obviously different from EQ1 and is therefore incorrect.

Option (c) is similar to option (b) and is also incorrect:

    CPI = (CPI_A + CPI_B + CPI_C + CPI_D) / 4

Option (d) is incorrect. The CPI of a given program is determined solely by the ISA, the compiler technology and, in some ways, the hardware organization. Increasing the clock rate may require hardware organization changes that may increase CPI (remember that hardware organization does affect CPI). Therefore, in general, increasing the clock rate will not decrease CPI.

The correct answer is therefore (e), none of the above.

Question 4

Assuming all other hardware and software parameters remain the same, the total number of instructions (i.e.
the instruction count) executed depends entirely on the ISA design and on the compiler. Option (a) is therefore correct: changing the ISA will affect the instruction count.

Option (b) is incorrect. Only the compiler and the ISA affect the instruction count; CPI does not play a part in determining it.

Option (c) is identical to (b) and is therefore incorrect.

Option (d) is again wrong, as CPI does not affect the instruction count.

Option (e) is incorrect since option (a) is correct.

Question 5

The word "always" plays a very big part in this question. It means that each option must always, without exception, improve the performance of a program in terms of execution time. We now look at each option in turn:

(i) Changing the instruction set architecture from a complex one to a simple one. In question 2 above we saw that changing from a simple ISA to a complex one may in fact increase execution time. Unfortunately, the converse is also true: changing from a complex ISA to a simple one may also increase execution time. Simple instructions are generally faster than complex ones. However, we would need many more simple instructions to serve the same function as one complex instruction, and the total number of cycles required by these many simple instructions may exceed that required by the complex instruction, resulting in poorer performance. Hence this option is incorrect.

(ii) Changing from a simple ISA to a complex one. This option is again incorrect. See question 2 for details.

(iii) Changing the compiler so that a lower overall CPI value (and a different program code sequence) is obtained. (Note: I gave the wrong answer during the tutorial, as I had misinterpreted the idea behind "lower overall CPI".) Having a lower overall CPI value may not necessarily improve program execution time if a different program code sequence is obtained.
To understand why, let's look at the relationship between CPI and execution time:

    Texec = CPI x IC x Tcycle    (EQ2)

Where CPI is the overall CPI, IC is the instruction count, and Tcycle is the time in seconds taken by each clock cycle. Tcycle is a constant, and is the reciprocal of the processor clock rate. By lowering the CPI, Texec should go down. Unfortunately, a different program code sequence may also result in IC increasing. If IC increases faster than CPI decreases, Texec will go up, and the program takes longer to execute. Thus this option is incorrect.

Since all 3 options are incorrect, the only correct answer is (e), None of the above.

Question 6

Which of the following affect the overall CPI of a given ISA?

(i) Examining EQ1 in question 3 above, it is obvious that if the CPIs of the individual classes change, the overall CPI will also change. Hence this option is correct.

(ii) Examining EQ1 again, it can be seen that if the number of instructions in a program is changed, it is possible that the ratios fx/Np may change (here fx is the number of instructions in the program belonging to class x, and Np is the total instruction count). This will result in a change in the overall CPI. This option is therefore correct.

(iii) The overall CPI of a program depends on the ISA, the compiler and the individual class CPIs. The clock speed does not play a part (unless a change in clock speed results in a change in organization). Hence this option is incorrect.

From here, we see that the correct answer is (d).

Question 7

A program takes 5 seconds to execute on MM1 and 10 seconds on MM2. Which of the following MUST be true?

Option (a) is not always true. If MM1 can execute 10,000,000 instructions per second while MM2 can execute 2,000,000 instructions per second, then it is possible that, even though the program on MM1 has twice as many instructions as on MM2, it will still execute faster on MM1.

Option (b) is not always true.
If the clock rate of MM1 is 4 times that of MM2, then it is possible for the overall CPI on MM1 to be twice that of MM2, and yet for the program on MM1 to execute twice as fast.

Option (c) is not always true. If the clock speed of MM1 is half that of MM2, the execution time on MM1 can still be half the execution time on MM2 if the overall CPI on MM1 is four times smaller than the overall CPI on MM2.

Since (a), (b) and (c) are all not necessarily true, (d) must be false, leaving only (e) as the correct answer.

Question 8

For this question, we assume that if we change one factor (e.g. the # of required cycles), all other factors remain constant. Without this assumption, the question cannot be done. E.g., will decreasing the # of instructions executed increase the overall CPI, and subsequently the # of required cycles? We don't know this for sure.

To improve the performance of a program, you would:

(a) Decrease the required # of cycles. As can be seen from EQ2, if the # of cycles (given by CPI x IC) decreases, Texec will also decrease. This assumes that IC remains constant.

(b) Decrease the # of instructions executed. From EQ2, we see that if we decrease IC, Texec decreases, assuming CPI remains constant.

(c) Decrease the cycle time. This can be seen from EQ2.

(d) Since the cycle time is the reciprocal of the clock rate, we can decrease execution time by increasing the clock rate, since this will result in a drop in the cycle time. Note, however, that here we must assume that CPI remains constant in order to answer this question.

Question 9

In this question, we explore what happens when we change certain factors in the program execution time. For [1], if we change the instruction distribution from A:10%, B:20%, C:30% and D:40% to A:40%, B:30%, C:20% and D:10%, the overall CPI will go down. This can be seen from the fact that the instructions get successively slower (i.e. require more clock cycles) going from class A to class D.
The new distribution of instructions results in a larger proportion of faster instructions, and thus the overall CPI goes down. Assuming that the instruction count IC remains constant, and that Tcycle remains unchanged (it should, unless there was a change in processor clock frequency), the execution time must go down.

For [2], it is trivial to prove from EQ1 that the new overall CPI will be double the original overall CPI. Since the overall CPI is increased, if we assume that IC remains constant, the program execution time will also increase.

Question 10

State whether the following statements are true or false:

Statement (1) is false. ISA is not concerned with the circuit design of microprocessors. It is concerned with the design of the processor's instruction set, not the circuitry.

Statement (2) is false. A higher clock rate does not necessarily translate to a faster processor. It is possible that the PC has a clock rate of 400 MHz but an overall CPI of 3.0, while the Sun450 system has a clock rate of 350 MHz but an overall CPI of 1.0. It is easily shown using EQ2 that the PC is then slower, assuming a fixed IC.

Statement (3) is false. When a program is submitted for execution, it may not execute immediately. The operating system may set the program aside and do more important work first. Only when the more important work is done will the submitted program be executed. Execution time is measured from the time the program starts to execute to the time it finishes executing.

Question 11

This question is similar to Question 9, and can be tackled the same way.

[1] If we had an equal distribution (i.e. 25%) across all the classes, then there would be more instructions from the faster classes A and B, and fewer instructions from the slower classes C and D. This would result in a lower overall CPI. Assuming that the instruction count IC remains the same, the program execution time will also decrease.

[2] In general, doubling the clock rate often affects the overall CPI.
However, since this has not yet been covered in the lectures, we will assume that the overall CPI is affected only by compiler technology and ISA design. So for this option we will assume that the overall CPI remains constant. Since the clock rate does not affect the instruction count, IC remains constant. Tcycle will now be halved (Tcycle being the reciprocal of the clock rate), and the execution time will decrease.

Question 12

In this question we explore the effects on overall CPI, instruction count and clock speed if a different compiler or a new hardware implementation is used.

[1] Different compiler: The compiler affects the relative distribution of instruction classes (i.e. it affects the frequencies), causing the overall CPI to change. It also affects the number of instructions produced, thus changing the instruction count. Finally, the clock rate is determined completely by the hardware organization, and the compiler will not affect the CPU clock speed.

[2] A new hardware implementation is likely to change the CPI of each individual class of instructions, thus affecting the overall CPI (see EQ1). The instruction count is completely determined by the ISA and the compiler, and since neither is changed, the instruction count remains constant. Finally, the clock speed is affected by the hardware implementation, and will also change.
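The EQ1/EQ2 arithmetic used throughout Part 2 can be checked with a short Python sketch. The per-class CPI values below (1, 2, 3 and 4 cycles for classes A to D) are made-up illustrations chosen only so that the classes get slower from A to D, as Question 9 describes; they are not the actual figures from the question paper, and the instruction count and clock rate are likewise assumed.

```python
# Worked sketch of EQ1 (overall CPI) and EQ2 (execution time).
# CLASS_CPI holds assumed per-class CPIs: classes get slower A -> D.
CLASS_CPI = {"A": 1, "B": 2, "C": 3, "D": 4}

def overall_cpi(mix):
    """EQ1: CPI = sum over classes of (f_x / Np) x CPI_x.
    'mix' maps each class to its fraction f_x / Np of the program."""
    return sum(frac * CLASS_CPI[cls] for cls, frac in mix.items())

def exec_time(mix, ic, t_cycle):
    """EQ2: Texec = CPI x IC x Tcycle."""
    return overall_cpi(mix) * ic * t_cycle

# Question 9 [1]: reversing the distribution favours the fast classes.
old_mix = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40}
new_mix = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}

print(round(overall_cpi(old_mix), 3))  # higher CPI: mostly slow classes
print(round(overall_cpi(new_mix), 3))  # lower CPI: mostly fast classes

# With IC and Tcycle held constant, Texec falls along with the CPI.
ic, t_cycle = 1_000_000, 1e-9  # assumed: 1M instructions, 1 GHz clock
print(exec_time(old_mix, ic, t_cycle) > exec_time(new_mix, ic, t_cycle))
```

Under these assumed numbers the overall CPI drops from 3 to 2 when the distribution is reversed, and since IC and Tcycle are held fixed the execution time drops in the same proportion, which is exactly the reasoning used in Questions 9 and 11.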