CS1104 Tutorial 1 – UNOFFICIAL ANSWERS

Prepared by: Colin Tan (ctank@comp.nus.edu.sg)
Tutorial 1 Part 1
The general idea of this tutorial is to give you some familiarity with the various terms
in computer architecture. Specifically you should be familiar with what machine
organization and instruction set architecture are, how they are related and how they
differ.
We will now go through each question in this set, and examine why each MCQ option
is the right or wrong answer. The answers I give here go into considerable detail, and you
do not need to absorb all of it the first time round. However, they are handy to keep
around as we venture onward through CS1104.
Introductory Ideas
Computer architecture is a study of 2 major fields: Computer (or machine)
organization, and instruction set architecture.
In the CS1103 course last semester we studied how digital logic gates can be
combined to form hardware units like adders, subtractors and comparators.
Unfortunately we stopped short of showing why we would want to build these things.
Computer Organization is the field of study that looks at how hardware units
(functional units) may be combined to solve a particular problem, in the most
efficient manner possible. By efficient, we mean that a particular problem should be
solved in the fastest manner possible. In Computer Organization we will learn how to
connect these functional units together, and how data may be efficiently carried
from one unit to another.
Having built our hardware units, our next challenge is to make use of these units to
solve a computational problem (it’s obvious that a computer cannot iron your clothes
for you). We do this by specifying the steps that we need to take in order to solve the
problem. Consider this problem (expressed in C):
x = a + b;
How do we get our functional units to load up the data from variables a and b, and
how do we get the adder to add these pieces of data together? Way back in the 1940s
the way to do this would be to use little toggle switches (like the ones you used in
your CS1103 labs) to enter the numbers into registers, then manually connect (using
wires, just like CS1103 lab) the registers to an adder, then manually connect the
outputs of the adder to another register.
Thankfully today we don’t need to do this anymore. Instead, computer-organization
designers arrange the functional units in a standard way, together with the pathways
that let data flow from one unit to another (e.g. from the registers to the adder and
back). As you will learn, these pathways are controlled by little switches that channel
data from one functional unit to another. We, the programmers, are given an
instruction set to control these pathways, routing data from registers, through the
functional unit we want to use (e.g. the multiplier), and back to other registers.
Instruction Set Architecture refers to the design of this instruction set. There are 4
main aspects of ISA design:
i) Instruction Types
Given a processor that has functional units for adding, subtracting,
comparing and manipulating data, we need to be able to tell the
processor how to route data to the units that we want, and how to route
the results to the desired destination. The ISA design would specify
what instructions the programmer is to be given in order to do this.
Some of the more common instruction types include:
(a) Load/Store Instructions
This class of instructions deals with the movement of data from
memory to registers, and vice versa. E.g.:
lw R1, 0x01032 ; Load the data in memory location 0x01032 into register R1
sw 0x0132, R2  ; Store the data in register R2 to memory location 0x0132
(b) Arithmetic Instructions
This class of instructions deals with the normal arithmetic operations
like adding, subtracting etc. E.g.:
add R1, R2, R3 ; Add the numbers stored in registers R2 and R3 and store the result in register R1
(c) Bit-wise Operators
This is a curious set of instructions that deals with the individual bits that
make up the data. E.g.:
rol R1, 16 ; Rotate the bits of the number in register R1 to the left by 16 positions.
(See CS1103 Lecture 13 if you cannot remember what a rotate operation is)
(d) Branching Instructions
This class of instructions allows us to jump to a different part of a
program in response to certain conditions. E.g.:
beq R1, label2 ; Jump to the portion of the program labelled “label2” if register R1 contains a ‘0’
ii) Instruction Formats
So far we have seen that instructions allow us to specify where data should
come from, which functional units should deal with the data, and where the
results should be sent to. In practice, how exactly do we specify all of this?
The Instruction Format aspect of ISA design deals with this: given an
instruction, which portion of the instruction tells the processor what to do (e.g.
to add? To subtract?), and which portion tells the processor where to fetch the
data from and where to send the results to? The diagram below shows an
example of an instruction format:
| opcode | source 1 | source 2 | destination |
The opcode field tells the processor which functional unit to use (e.g. the
adder), the source 1 and source 2 fields tell it where to get the data from, and
the destination field tells it where to store the result.
iii) Data Formats
In CS1103 we learnt about integer number types (2’s complement, 1’s
complement, sign+magnitude) and floating point number formats. When we
design an ISA we must specify what types of data an instruction can operate
on. This comes under the data formats aspect of ISA.
iv) Addressing Modes
There are several ways in which we can access data to be operated on. Some
examples are given here, and you will see more of this in future tutorials and
lectures:
i) Direct Addressing
In Direct Addressing, the address in memory (for now just assume that
memory is a huge pool where data is stored, and each piece of data is
given an identifier, called its address) where the data is to be fetched
from is specified directly in the instruction. E.g.:
lw R12, 0x0134;
Here the processor is told to load up the data at memory address
0x0134 (the “0x” portion of the address 0x0134 means that “0134” is
specified in hexadecimal).
ii) Immediate Addressing
In immediate addressing, the data to be operated on (i.e. the operand)
is given directly in the instruction. E.g.:
add R1, #0001, #0002; R1 = 1 + 2
iii) Register Indirect Addressing
This is a slightly more complicated form of addressing, where the address of
a piece of data is held in another register. E.g.:
lw R12, [R1]
Here register R1 contains the address of the data to be fetched.
Addressing Modes specify which of these ways of accessing data is to be
supported by the ISA.
Assembly Language vs. Machine Language
Remember from CS1103 that computers can only understand numbers. This is why
we have things like the “ASCII” codes that allow us to map text to numbers and back.
Likewise all instructions to the processor must be in the form of numbers. Take the
following instruction sequence for example:
lw R1, 0x0134
lw R2, 0x0154
add R1, R1, R2
sw 0x0155, R2
Processor instructions given in human-readable form like this are called “assembly”.
Assembly cannot be used directly by the computer. An assembler must be used to convert
this sequence of assembly instructions to machine code. For example, suppose the lw, add
and sw instructions are represented by the numbers 03, 05 and 06 respectively (this
mapping to numbers is defined by the ISA and used by the assembler), while registers R1
and R2 are represented by the numbers 01 and 02 respectively. Then the above assembly
language program will be translated to this machine language program:
03 01 0134
03 02 0154
05 01 0102
06 02 0155
This sequence of numbers can be understood directly by the machine. That’s why it is
called “machine code”.
Exercise 1 Question 1
By definition, computer architecture is the study of computer (machine) organization
and instruction set architecture. Hence the only correct option is (d). The study of
programming languages (option b) is a separate field of computer science and does
not come under computer architecture.
Exercise 1 Question 2
As was mentioned earlier, the instruction set architecture (ISA) provides programmers
with an interface to the hardware functional units. Hence the ISA is an important
interface between Application software and hardware organization.
Let’s look at the other options and see why they are wrong:
The interface between (a) Digital Circuit and Data Path Control does not, as far as I
know, exist. The interface between (b) Application software and operating systems is
the Application Programming Interface, or API. In (d), there is no interface between a
compiler and a programming language: the programming language definition determines
what the compiler must do. Saying that there is an interface between a compiler and a
programming language makes as much sense as saying that there is an interface between
a circle and the concept of a round shape. For (e), the interface between a high level
programming language and assembly language is the compiler.
Exercise II question 3
Based on the definition of ISA, the only correct option is (b) Compilation of a high
level C language program into the machine language program. It never makes sense to
include the definition of a high-level language in the ISA design, as this will tie the
processor to a particular high-level language.
Exercise II question 4
Option (a) is incorrect as ISA deals only with the instructions themselves, and hence
hardware design and implementation does not come under ISA. Hardware
implementation deals with how to build adders, how to build subtractors, how to
control data flow etc. This is clearly irrelevant to instruction set design.
Option (b) is incorrect. As mentioned earlier, the interface between assembly and
machine language is the assembler, not the ISA.
Option (c) is correct. In general, processors of the same generation from the same
manufacturer will have the same ISA. However, it is important to remember that
different generations of processors may have different ISAs even if they are from the
same manufacturer (e.g. the Pentium III added the SSE instructions, which the Pentium II lacks).
(d) is wrong, by the definition of computer organization and computer architecture.
Exercise III question 5
Option (a) is clearly wrong. To see why:
Consider a sign-and-magnitude number. S+Mag numbers have this format:
| Sign | Magnitude |
Suppose that this number is sitting in memory. Any processor that agrees that an
S+Mag number looks like the diagram above will be able to make sense of this
number in memory. The ISA does not necessarily have to be the same.
Option (b) is also wrong. The Pentium II 450 MHz and Pentium II 500 MHz in
question (4) are a counter-example: both have the same ISA but different performance.
Option (c) is correct. Let’s take the machine language codes we saw earlier:
03 01 0134
03 02 0154
05 01 0102
06 02 0155
In our particular example, the first 2 lines are load instructions, the 3rd line is an add
instruction, while the 4th line is a store instruction (see the “Assembly Language vs.
Machine Language” section above). If we were to bring this to another processor with
a different ISA, the numbers would mean completely different things, and the
program will not execute correctly.
Option (d) is again incorrect. The ISA has an influence over the hardware
implementation, but with each ISA we can have a huge range of choices of how to
implement the hardware. It would be incorrect to say that given a particular ISA, this
ISA defines the hardware implementation that we should use.
Exercise IV question 6
This statement is false. To see why:
Suppose we have designed an ISA. We might have several different computer
organizations to support this ISA. The diagram below shows how this is possible:
[Diagram: one Instruction Set Architecture at the top, supported by three machine organizations: Pipeline Organization, Superscalar Organization and Scoreboard Superscalar]
Here we see a single ISA design being supported by 3 possible machine
organizations. The hardware implementation for each organization will obviously be
different. Hence if we switch between, say, the pipeline organization and the
scoreboard superscalar organization, we are changing the hardware machine
organization. However since both organizations support the same ISA, the ISA itself
is not changed. Hence this statement is false.
Exercise IV question 7
Option (a) is incorrect. The specification of high-level languages like C and Java is
called the “language specification” (duuuuh). Again, language specification is kept
separate from ISA specification, so that a machine’s architecture will not be tied to a
particular language. There is an exception to this: Java processors are built specifically
to run Java byte-codes, and are hence built around the Java Virtual Machine’s byte-code
specification.
Option (b) is incorrect. This comes under the “compiler specifications”. Again it will
not make sense to include compiler specifications in ISA specs.
Option (c) comes under the “instruction type” perspective of ISA design, and is hence
a part of ISA design.
Option (d) comes under the “data formats” and “addressing modes” aspects of ISA
design, and is hence a part of ISA design.
Option (e) comes under machine organization, since we are dealing with hardware
units. Hence option (e) is not a part of ISA design.
Tutorial 1 Part 2
Question 1
In this question all 4 factors will affect program execution time. To see why, we will
look at each factor in turn:
i) Compiler Technology:
Compiler technology plays an important role in program execution time.
Depending on how “clever” the compiler is, it can:
a) Expand and re-arrange loop constructs for efficient execution.
b) Inline short functions into their callers. E.g.
int f(int x)
{
    return x + 2;
}
int main(void)
{
    int y = 0;
    y = f(y);
    return 0;
}
The function call f(y) is replaced directly, and main becomes:
int main(void)
{
    int y = 0;
    y = y + 2;
    return 0;
}
This saves execution cycles, as function calls incur overheads such as
state saving, saving the Program Counter (see later lectures) onto the
stack etc. By “inlining” the function call, we have eliminated it
completely.
c) Re-schedule machine instructions to prevent pipeline stalls (to
be covered in Dr. Kato’s section), thus lowering the overall
CPI.
d) Choose the best mix of instructions to minimize overall CPI.
All of these strategies will help to shorten execution time.
ii) Instruction Set Architecture:
ISA design can affect execution time for 2 reasons:
a) The ISA restricts the scope of optimization that can be done by the
compiler. For example, if the processor does not support floating-point
operations (this was common in the old Intel architectures like the
8086, 8088, 80186, 80286 and 80386 without a separate floating-point
coprocessor, as well as the 80486SX processors), then the compiler has to
emulate floating-point operations in software, which is often slow and inefficient.
b) The ISA affects the complexity of the hardware organization. E.g.
with a Reduced Instruction Set Computer (RISC) architecture,
where the ISA is highly simplified, the control units that coordinate
execution can be made simple and fast. Complex Instruction Set
Computer (CISC) architectures, with their complex ISA designs, typically
require “micro-coded” control units, which are generally slower than the
simple hard-wired control units.
iii) Integrated circuit (IC) chip technology:
Depending on how sophisticated the IC technology is, it may be possible to
build functional units very close together. This reduces signal propagation
time (i.e. the time taken for a digital signal to move from one functional unit to
another), allowing for faster clock rates. Certain substrates (e.g. gallium
arsenide or GaAs) allow signals to propagate more rapidly, and this again
allows for faster clock rates.
Note that IC technology will not affect the overall CPI of instructions executed
on the processor. This is because CPI is affected mainly by hardware
organization and not by the IC technology.
iv) Application Software:
In poorly written software work is often duplicated (e.g. two loops to achieve
something that a single loop can achieve), resulting in slow execution. Other
problems include excessive reading and writing to disk, resulting in poor
performance. Hence software design is extremely important in maximizing
execution speeds.
The correct answer is therefore (d) All of the above.
Question 2
In this question, we examine how we can improve execution time.
Option (i) is wrong, as combining simple instructions into complex ones will not
always result in good performance. To understand this, let’s look at an example.
Suppose we wanted to add two numbers stored in memory locations with
addresses 0x0001 and 0x0003, and store the result in address 0x0005
(for now, just take “memory” as a huge pool where data is stored, and the data
may be retrieved by providing “addresses”). Suppose we have a very simple
instruction set that consists only of instructions to load data from memory into
registers and vice versa, and arithmetic operations that can only operate on
registers. So our program would look like this:
lw r1, 0x0001     ; Load up data from memory location 0x0001
lw r2, 0x0003     ; Load up data from memory location 0x0003
add r1, r1, r2    ; Add r1 + r2 and store the result back in r1
sw 0x0005, r1     ; Store the result back in location 0x0005
If each instruction takes 5 cycles, then this entire program will take 20 cycles
to execute.
Suppose we introduce a new instruction addi that allows arithmetic operations
on data still in memory. Then we can simplify the entire program to just:
addi 0x0005, 0x0001, 0x0003
This may look simpler and faster. However, we must remember that additional
hardware must be put in place to support this new instruction. Extra clock
cycles may be needed to compute addresses and fetch data, extra pipeline
penalties (to be covered in Dr. Kato’s part) may be incurred, and all in all the
CPI of this instruction may add up to more than the 20 cycles needed by the
previous program. In that case it would be cheaper to use the simpler
instructions than the complex one.
Option (ii) is correct if and only if we assume a ceteris paribus condition. That is,
we assume that when clock speed is improved, CPI and all other factors remain
unchanged. In general this is not the case: improving the clock rate often, but not
always, causes CPI to go up.
Option (iii) is incorrect. As we saw in option (i), a shorter sequence of code is not
always a faster sequence. In fact, modern compilers often give users a choice between
generating the smallest code and generating the fastest code.
The correct answer is therefore (a).
Question 3
This question examines your understanding of the CPI formula. To start off, let’s look
at how the overall CPI is defined.
Overall CPI is, by definition, the average number of clock cycles per instruction
executed in a program. Therefore:
$$\mathrm{CPI} = \frac{C_p}{N_p}$$
Where Cp is the total number of clock cycles used by the program, and Np is the total
number of instructions in the program. Let us assume, then, that this particular
processor has 4 instruction classes A, B, C and D, with average CPIs of CPIA, CPIB,
CPIC and CPID. This means that every instruction in class x will take an average of
CPIx clock cycles to complete. Let us further suppose that in a particular program,
there are fA class A instructions, fB class B instructions, fC class C instructions
and fD class D instructions. Cp is then trivial to compute:
$$C_p = f_A\,\mathrm{CPI}_A + f_B\,\mathrm{CPI}_B + f_C\,\mathrm{CPI}_C + f_D\,\mathrm{CPI}_D$$
CPI is therefore:
$$\mathrm{CPI} = \frac{f_A}{N_p}\,\mathrm{CPI}_A + \frac{f_B}{N_p}\,\mathrm{CPI}_B + \frac{f_C}{N_p}\,\mathrm{CPI}_C + \frac{f_D}{N_p}\,\mathrm{CPI}_D \tag{EQ1}$$
The ratios fA/Np, fB/Np etc. are often expressed as percentages or fractions instead, as
we have seen in tutorial 2.
With this EQ1 in mind, let’s look at each of the options in turn.
Option (a) is incorrect. As EQ1 shows, the important thing that affects the overall CPI
(assuming that the class CPIs are constant) is the ratio fx/Np, i.e. the instruction
count for class x relative to the total number of instructions. To decrease the
overall CPI, the mix must shift toward fast instructions: the fractions of low-CPI
classes must rise at the expense of high-CPI classes. Reducing the number of
instructions executed does not in general guarantee this.
Option (b) is incorrect. It is basically saying that:
$$\mathrm{CPI} = \mathrm{CPI}_A + \mathrm{CPI}_B + \mathrm{CPI}_C + \mathrm{CPI}_D$$
This is obviously different from EQ1 and is therefore incorrect.
Option (c) is similar to option (b) and is also incorrect:
$$\mathrm{CPI} = \frac{\mathrm{CPI}_A + \mathrm{CPI}_B + \mathrm{CPI}_C + \mathrm{CPI}_D}{4}$$
Option (d) is incorrect. The CPI of a given program is determined by the ISA, the
compiler technology and, to some extent, the hardware organization. Increasing the
clock rate may require hardware organization changes that may increase CPI
(remember that hardware organization does affect CPI). Therefore, in general,
increasing the clock rate will not necessarily decrease CPI.
The correct answer is therefore (e) none of the above.
Question 4
Assuming all other hardware and software parameters remain the same, the total
number of instructions (i.e. the instruction count) executed depends entirely on the ISA
design and on the compiler.
Option (a) is therefore correct, that changing the ISA will affect instruction count.
Option (b) is incorrect. Only the compiler and the ISA affect instruction count, and
CPI does not play a part in determining it.
Option (c) is identical to (b) and is therefore incorrect.
Option (d) is again wrong, as CPI does not affect instruction count.
Option (e) is incorrect since option (a) is correct.
Question 5
The word “always” plays a very big part in this question. This means that each option
must always, without exception, improve the performance of a program in terms of
execution time. We now look at each option in turn:
(i) Changing the instruction set architecture from a complex one to a simple one:
In question 2 above we saw that changing from a simple ISA to a complex one
may in fact increase execution time. Unfortunately the converse is also true;
changing from a complex ISA to a simple one may also increase execution
time.
Simple instructions are generally faster than complex ones. However we
would need many more simple instructions to serve the same function as a
complex one, and the total number of cycles required by these many simple
instructions may exceed that required by the complex instruction, resulting in
poor performance.
Hence this option is incorrect.
(ii) Changing from a simple ISA to a complex one. This option is again incorrect.
See question 2 for details.
(iii) Changing the compiler so that a lower overall CPI value (and a different
program code sequence) is obtained.
(Note: I had given the wrong answer during the tutorial, as I had
misinterpreted the idea behind “lower overall CPI”)
Having a lower overall CPI value may not necessarily improve program
execution time if a different program code sequence is obtained. To
understand why, let’s look at the relationship between CPI and execution time:
$$T_{exec} = \mathrm{CPI} \times IC \times T_{cycle} \tag{EQ2}$$
Where CPI is the overall CPI, IC is the instruction count, and Tcycle is the time
in seconds taken by each clock cycle. Tcycle is a constant and is the reciprocal
of the processor clock rate. Lowering the CPI would lower Texec if IC stayed the same.
Unfortunately, a different program code sequence may also cause IC to increase.
If IC increases by a larger factor than CPI decreases, Texec will go up, and the
program takes longer to execute.
Thus this option is incorrect.
Since all 3 options are incorrect, the only correct answer is (e) None of the above.
Question 6
Which of the following affect the overall CPI of a given ISA?
(i) Examining EQ1 in question 3 above, it is obvious that if the CPIs of individual
classes change, the overall CPI will also change. Hence this option is correct.
(ii) Examining EQ1 again, it can be seen that if the number of instructions in a
program is changed, the ratios fx/Np may change (here fx is the number of
instructions in a program belonging to class x, and Np is the total instruction
count). This will result in a change in the CPI. This option is therefore correct.
(iii) The overall CPI of a program depends on the ISA, the compiler and the
individual class CPIs. The clock speed does not play a part (unless a change in
clock speed also requires a change in organization). Hence this option is incorrect.
From here, we see that the correct answer is (d).
Question 7
A program takes 5 seconds to execute on MM1 and 10 seconds on MM2. Which of
the following MUST be true?
Option (a) is untrue. Suppose MM1 executes 8,000,000 instructions per second while
MM2 executes 2,000,000 instructions per second. Then the program on MM1 can contain
twice as many instructions as the program on MM2 (e.g. 40,000,000 versus 20,000,000)
and still finish in 5 seconds on MM1 versus 10 seconds on MM2.
Option (b) is not always true. If both machines execute the same number of
instructions and the clock rate on MM1 is 4 times that of MM2, then the overall CPI
on MM1 can be twice that of MM2, and yet the program on MM1 still executes twice as
fast.
Option (c) is not always true. If the clock rate of MM1 is half that of MM2, the
execution time on MM1 can still be half of the execution time on MM2 (again assuming
the same instruction count) if the overall CPI on MM1 is four times smaller than the
overall CPI on MM2.
Since (a), (b) and (c) are all false, (d) must be false, leaving only (e) as the correct
answer.
Question 8
For this question, we assume that if we change one factor (e.g. # of required cycles),
all other factors remain constant. Without this assumption, the question cannot be
answered. E.g., will decreasing the # of instructions executed increase the overall CPI,
and subsequently the # of required cycles? We don’t know this for sure.
To improve the performance of a program, you would:
(a) Decrease the required # of cycles. As can be seen from EQ2, if the # of cycles
(given by CPI x IC) decreases, Texec will also decrease. This assumes that the
cycle time remains constant.
(b) Decrease the # of instructions executed. From EQ2, we see that if we
decrease IC, Texec decreases, assuming CPI remains constant.
(c) Decrease the cycle time. This can be seen from EQ2.
(d) Since the cycle time is the reciprocal of the clock rate, we can decrease
execution time by increasing the clock rate, since this will result in a drop
in the cycle time. Note however that here we must assume that CPI
remains constant in order to answer this question.
Question 9
In this question, we explore what happens when we change certain factors in the
program execution time.
For [1], if we change the instruction distribution from A:10%, B:20%, C:30% and
D:40% to A:40%, B:30%, C:20% and D:10%, the overall CPI will go down. This can
be seen from the fact that the instructions get successively slower (i.e. require more
clock cycles) going from class A to class D. The new distribution contains a larger
proportion of fast instructions, and thus the overall CPI goes down.
Assuming that the instruction count IC remains constant, and that Tcycle remains
unchanged (it should, unless there was a change in processor clock frequency), then
the execution time must go down.
For [2], it is trivial to prove from EQ1 that the CPI will be double the original overall
CPI. Since the overall CPI is increased, if we assume that IC remains constant, the
program execution time will also increase.
Question 10
State whether the following statements are true or false:
Option (1) is false: ISA is not concerned with the circuit design of microprocessors. It
is concerned with the design of the processor instruction set, not the circuitry.
Option (2) is false. A higher clock rate does not necessarily translate to a faster
processor. It is possible that the PC has a clock rate of 400 MHz but an overall CPI of
3.0, while the Sun450 system has a clock rate of 350 MHz but an overall CPI of 1.0. It is
easily shown using EQ2 that the PC is then slower, assuming a fixed IC.
Option (3) is false. When a program is submitted for execution, it may not execute
immediately. The operating system may set aside the program and do more important
work first. Only when the more important work is done will the submitted program be
executed. Execution time is measured from the time the program starts to execute to
the time it finishes executing.
Question 11
This question is similar to Question 9, and can be tackled the same way.
[1] If we had equal distribution (i.e. 25%) across all the classes, then there will be
more instructions from the faster classes A and B, and fewer instructions from the
slower classes C and D. This will result in a lower overall CPI. Assuming that the
instruction count IC remains the same, the program execution time will also decrease.
[2] In general, doubling the clock rate often affects the overall CPI. However, since this
has not yet been covered in the lectures, we will assume that overall CPI is affected
only by compiler technology and ISA design. So for this option we will assume that
overall CPI remains constant. Since clock rate does not affect instruction count, IC
remains constant. Tcycle will now be halved (Tcycle being the reciprocal of the clock
rate), and so the execution time will be halved.
Question 12
In this question we explore the effects on overall CPI, instruction count and clock
speed if a different compiler or a new hardware implementation is used.
[1] Different compiler: The compiler affects the relative distribution of instruction
classes (i.e. it affects the frequency), causing the overall CPI to change. It also affects
the number of instructions produced, thus changing the instruction count. Finally,
clock rate is determined completely by the hardware organization, and the compiler
will not affect the CPU clock speed.
[2] A new hardware implementation is likely to change the CPI of each individual
class of instructions, thus affecting the overall CPI (see EQ1). The instruction count is
completely determined by the ISA and the compiler, and since neither is changed the
instruction count remains constant. Finally, the clock speed is affected by the hardware
implementation, and will also change.