Data Representation and Architecture Modelling

Revision
Binary system
1. Conversion
   1. Convert decimal to binary
   2. Convert binary to decimal and hexadecimal
2. Integer representation
   1. Unsigned notation
   2. Signed notation
   3. Excess notation
   4. Two’s complement
3. Advantages of using two’s complement
Floating point representation
1. What decimal floating point number is represented
by the following 32 bits (single precision format)?
Show your workings.
   1 10000111 00010100000000000000000
   (sign | exponent | mantissa)
2. What is the range of negative numbers in this
representation?
3. Define negative overflow and underflow in this
representation.
Solution
1. Method
 the sign bit is 1, so the number is negative
 biased exponent = 10000111 = 128 + 4 + 2 + 1 = 135
 real exponent = 135 - 127 = 8
 normalized mantissa (fraction field) = 000 1010 0000 0000 0000 0000
 real mantissa = (1.000101)_2
 value represented = -(1.000101)_2 x 2^8 = -(100010100)_2 = -(256 + 16 + 4) = -276
2. Negative range: -(2 - 2^-23) x 2^127 to -2^-127
3. Negative overflow and underflow
 Negative overflow: value less than -(2 - 2^-23) x 2^127.
 Negative underflow: -2^-127 < value < 0.
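The decoding steps in the method can be checked with a short Python sketch. The function name `decode_single` is illustrative only, and the sketch assumes a normalized number (hidden leading 1, bias 127); it does not handle zero, subnormals, infinities or NaN.

```python
import struct

# The 32 bits from the question: sign | exponent | mantissa
bits = "1" + "10000111" + "00010100000000000000000"

def decode_single(bits):
    """Decode a normalized single-precision bit string by hand (hypothetical helper)."""
    assert len(bits) == 32
    sign = int(bits[0], 2)
    biased_exp = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)
    mantissa = 1 + fraction / 2**23          # restore the hidden leading 1
    value = (-1) ** sign * mantissa * 2 ** (biased_exp - 127)
    return sign, biased_exp, mantissa, value

sign, biased_exp, mantissa, value = decode_single(bits)
print(sign, biased_exp, biased_exp - 127, mantissa, value)

# Cross-check against the machine's own interpretation of the same 4 bytes
raw = int(bits, 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0])
```

Both the hand decoding and the `struct` cross-check should give -276.0, matching the worked answer.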
CPU
 CPU registers:
 PC, IR, AC, MAR, MBR
 System bus
 Data bus, address bus, and control bus
 Pipelining
 Role of pipelining
 Pipelining hazards (control hazards, data hazards, and
structural hazards)
 What is the disadvantage of using a very long stage
pipeline?
Exercise
Suppose you have designed a processor implementation whose five
pipeline stages take the following amounts of time:
IF (instruction fetch) = 20 ns,
ID (instruction decode) = 10 ns,
EX (execution) = 20 ns,
MEM (memory operation) = 35 ns and
WB (write back) = 10 ns.
(a) What is the minimum clock period for which your processor
functions properly?
(b) What should be redesigned first to improve this processor's
performance?
(c) Assume this processor is redesigned with 50 pipeline stages. Is
it true to say that the new processor is 10 times faster than the
previous design with 5 pipeline stages?
Solution
(a) The minimum clock period is the time of the longest
stage: MEM, which takes 35 ns.
(b) The MEM stage should be redesigned first, since it sets the
clock cycle.
(c) Probably not.
 Longer pipelines can be faster because they allow higher clock rates, but
 the clock rate is unlikely to be 10x faster, because of uneven
pipeline stages and pipeline register overheads.
 Furthermore, longer pipelines tend to make data and control
hazards require longer stalls.
 A higher clock-rate processor is also likely to be more power-
hungry, roughly in proportion to the increase in clock speed.
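The arithmetic behind answers (a) and (b) can be sketched in Python. The helper names are hypothetical, and the pipelined timing assumes an ideal pipeline with no hazards or stalls, which the answer to (c) explains is optimistic.

```python
# Stage latencies in nanoseconds, taken from the exercise
stages_ns = {"IF": 20, "ID": 10, "EX": 20, "MEM": 35, "WB": 10}

def min_clock_period(stages):
    """The clock must be long enough for the slowest stage."""
    return max(stages.values())

def bottleneck(stages):
    """Name of the stage that sets the clock period."""
    return max(stages, key=stages.get)

def total_time(stages, n, pipelined=True):
    """Time in ns to run n instructions (ideal case, no stalls assumed)."""
    if pipelined:
        # The first instruction fills the pipeline, then one completes per clock
        return (len(stages) + n - 1) * min_clock_period(stages)
    return n * sum(stages.values())

print(min_clock_period(stages_ns), bottleneck(stages_ns))  # 35 MEM
print(total_time(stages_ns, 1000, pipelined=False))        # 95000
print(total_time(stages_ns, 1000, pipelined=True))         # 35140
```

For 1000 instructions the ideal speedup is about 2.7x rather than 5x, because every stage is clocked at the pace of the 35 ns MEM stage.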
Question 2
An instruction requires four stages to execute:
stage 1 (instruction fetch) requires 30 ns,
stage 2 (instruction decode) = 9 ns,
stage 3 (instruction execute) = 20 ns and
stage 4 (store results) = 10 ns.
An instruction must proceed through the stages in sequence.
1) What is the minimum asynchronous time for any single
instruction to complete?
2) We want to set this up as a pipelined operation. How
many stages should we have and at what rate should we
clock the pipeline?
Hints
1) The minimum time is the time it takes to execute all 4 stages
of an instruction, one after the other.
2) We have 4 natural stages given and no information on
how we might be able to further subdivide them, so we
use 4 stages in our pipeline.
 Clock rate?
 Use the longest stage as the clock period,
 or use a clock period close to the shortest stage that
divides evenly into the other stage times. Discuss
each case.
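As a starting point for the discussion, the two clocking options can be compared numerically. This is only a sketch: the 10 ns clock for the second case is an assumption (it is close to the 9 ns stage and divides 30, 20 and 10 evenly), the function name is illustrative, and a multi-cycle stage is assumed to hold one instruction at a time.

```python
import math

stages_ns = [30, 9, 20, 10]  # fetch, decode, execute, store (from the question)

# 1) Asynchronous: each stage starts as soon as the previous one finishes
async_latency_ns = sum(stages_ns)

def pipeline_figures(stages, clock_ns):
    """Return (clocks per stage, latency in ns, ns between completed instructions)."""
    clocks = [math.ceil(t / clock_ns) for t in stages]
    latency = sum(clocks) * clock_ns
    interval = max(clocks) * clock_ns  # the slowest stage limits throughput
    return clocks, latency, interval

case_a = pipeline_figures(stages_ns, 30)  # clock period = longest stage
case_b = pipeline_figures(stages_ns, 10)  # assumed 10 ns clock

print(async_latency_ns)  # 69
print(case_a)            # ([1, 1, 1, 1], 120, 30)
print(case_b)            # ([3, 1, 2, 1], 70, 30)
```

Under these assumptions both cases complete one instruction every 30 ns; the faster clock mainly shortens the latency of a single instruction from 120 ns to 70 ns.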
Question 3
 The pipeline for these instructions runs with a 100
MHz clock with the following stages:
 instruction fetch = 2 clocks,
 instruction decode = 1 clock,
 fetch operands = 1 clock,
 execute = 2 clocks, and
 store result = 1 clock.
Hints for Question 3
1) The longest stage takes two cycles. Hence we can
execute one instruction every 2 cycles. What is the rate
then?
2) The Operand Fetch unit must wait until the prior
instruction stores its result before it can retrieve one
of its operands (e.g. Op Fetch for instruction 2 must wait
until Op Store for instruction 1 completes). As a result,
things begin backing up in the pipeline, and we produce
one instruction output only every 4 cycles.
No dependencies
Execute an instruction every 2 cycles. Clock rate?
Dependency
From the table we still begin fetching instructions every two cycles.
However, the operand fetch for instruction 2 must wait until Op Store for
instruction 1 completes (a wait of another 2 cycles). Hence, the rate?
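The two rates asked for in the hints follow directly from the clock frequency and the number of cycles between completed instructions. A minimal sketch, with an illustrative function name:

```python
CLOCK_HZ = 100_000_000  # 100 MHz, from the question

def instructions_per_second(clock_hz, cycles_between_completions):
    """Throughput when one instruction completes every given number of cycles."""
    return clock_hz / cycles_between_completions

no_dependency_rate = instructions_per_second(CLOCK_HZ, 2)  # one per 2 cycles
dependency_rate = instructions_per_second(CLOCK_HZ, 4)     # one per 4 cycles

print(f"{no_dependency_rate / 1e6:.0f} million instructions/s")  # 50
print(f"{dependency_rate / 1e6:.0f} million instructions/s")     # 25
```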
Memories
 CPU registers
 Cache memory
 Main memory (electronic memory)
 Magnetic memory (hard drive)
 Optical memory
 Magnetic tape
Cache memory
 Cache memory enhances computer performance
using:
 Temporal locality principle
 Spatial locality principle
 Cache mapping
 Associative Mapped Cache
 Direct-Mapped Cache
 Set-Associative Mapped Cache
Why is cache memory needed?
 The CPU is slowed down by the main memory
 When a program references a memory location, it is
likely to reference that same memory location again
soon.
 A memory location that is near a recently referenced
location is more likely to be referenced than a memory
location that is far away.
Cache memory
 Resides between the CPU and the main memory
 Operates at a speed near to that of the CPU
 Data is exchanged between CPU and main memory
through the cache memory
 Cache memory uses locality principles to enhance
computer performance.
 Temporal locality principle
 Spatial locality principle
Temporal locality principle
 When a program references a memory location, it is
likely to reference that same memory location again
soon.
 Cache memory keeps copies of recently used
data.
Spatial locality principle
 A memory location that is near a recently referenced
location is more likely to be referenced than a memory
location that is far away.
 Cache memory copies not only the recently referenced
memory location but also nearby locations.
Cache mapping
Commonly used methods:
 Associative Mapped Cache
 Direct-Mapped Cache
 Set-Associative Mapped Cache
Associative Mapped Cache
 Any main memory block can be mapped into any
cache slot.
 To keep track of which of the 2^27 possible blocks is in
each slot, a 27-bit tag field is added to each slot.
 Valid bit is needed to indicate whether or not the slot
holds a line that belongs to the program being
executed.
 Dirty bit keeps track of whether or not a line has been
modified while it is in the cache.
 The mapping from main memory blocks to cache slots
is performed by partitioning an address into fields.
 For each slot, if the valid bit is 1, then the tag field of
the referenced address is compared with the tag field
of the slot.
 How an access to the memory location (A035F014)_16 is
mapped to the cache:
 If the addressed word is in the cache, it will be found
in word (14)_16 of a slot that has a tag of (501AF80)_16,
which is made up of the 27 most significant bits of the
address.
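The field split for the associative mapped cache can be sketched in Python. The function name is illustrative; the 5-bit word field and 27-bit tag come from the example in the slides.

```python
WORD_BITS = 5  # 32-bit address = 27-bit tag + 5-bit word

def split_associative(address):
    """Split a 32-bit address into (tag, word) for an associative mapped cache."""
    word = address & ((1 << WORD_BITS) - 1)  # low 5 bits
    tag = address >> WORD_BITS               # remaining 27 bits
    return tag, word

tag, word = split_associative(0xA035F014)
print(f"tag={tag:X} word={word:X}")  # tag=501AF80 word=14
```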
Advantages
 Any main memory block can be placed into any cache
slot.
 Regardless of how irregular the data and program
references are, if a slot is available for the block, it can
be stored in the cache.
Disadvantages
 Considerable hardware overhead needed for cache
bookkeeping.
 There must be a mechanism for searching the tag
memory in parallel.
Direct-Mapped Cache
 Each cache slot corresponds to an explicit set of main
memory blocks.
 In our example we have 2^27 memory blocks and 2^14
cache slots.
 A total of 2^27 / 2^14 = 2^13 main memory blocks can be
mapped onto each cache slot.
 The 32-bit main memory address is partitioned into a
13-bit tag field, followed by a 14-bit slot field, followed
by a five-bit word field.
 When a reference is made to the main memory
address, the slot field identifies in which of the 2^14
slots the block will be found.
 If the valid bit is 1, then the tag field of the referenced
address is compared with the tag field of the slot.
 How an access to memory location (A035F014)_16 is
mapped to the cache:
 If the addressed word is in the cache, it will be found
in word (14)_16 of slot (2F80)_16, which will have a tag of
(1406)_16.
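The direct-mapped field split can be checked the same way. A minimal sketch with an illustrative function name, using the 13-bit tag, 14-bit slot and 5-bit word fields from the example:

```python
TAG_BITS, SLOT_BITS, WORD_BITS = 13, 14, 5  # 13 + 14 + 5 = 32 address bits

def split_direct(address):
    """Split a 32-bit address into (tag, slot, word) for a direct-mapped cache."""
    word = address & ((1 << WORD_BITS) - 1)
    slot = (address >> WORD_BITS) & ((1 << SLOT_BITS) - 1)
    tag = address >> (WORD_BITS + SLOT_BITS)
    return tag, slot, word

tag, slot, word = split_direct(0xA035F014)
print(f"tag={tag:X} slot={slot:X} word={word:X}")  # tag=1406 slot=2F80 word=14
```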
Advantages
 Simple and inexpensive
 The tag memory is much smaller than in associative
mapped cache.
 No need for an associative search, since the slot field is
used to direct the comparison to a single field.
Disadvantages
 Fixed location for a given memory block.
 If a program repeatedly accesses two blocks that map to the
same slot, the cache miss rate is very high.
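The conflict problem can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: only tags are tracked (no data, no dirty bits), a missing dictionary entry stands for an invalid slot, and the function name and access patterns are made up for the illustration.

```python
def count_misses(addresses, slot_bits=14, word_bits=5):
    """Count misses in a direct-mapped cache that stores one tag per slot."""
    slots = {}  # slot index -> tag currently held (absent = invalid)
    misses = 0
    for address in addresses:
        block = address >> word_bits
        slot = block & ((1 << slot_bits) - 1)
        tag = block >> slot_bits
        if slots.get(slot) != tag:
            misses += 1
            slots[slot] = tag  # the new block evicts whatever was there
    return misses

a = 0xA035F014
b = a + (1 << 19)  # same slot field as a, different tag

conflicting = [a, b] * 100      # two blocks fighting over one slot
friendly = [a, a + 32] * 100    # neighbouring blocks use different slots

print(count_misses(conflicting))  # 200: every access misses
print(count_misses(friendly))     # 2: only the first touch of each block misses
```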
Set-Associative Mapped Cache
 Combines the simplicity of direct mapping with the
flexibility of associative mapping
 For this example, two slots make up a set. Since there
are 2^14 slots in the cache, there are 2^14 / 2 = 2^13 sets.
 When an address is mapped to a set, the direct
mapping scheme is used, and then associative
mapping is used within a set.
 The format for an address has 13 bits in the set field,
which identifies the set in which the addressed word
will be found. Five bits are used for the word field and
14 bits for the tag field.
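Applying this format to the address used in the earlier examples, (A035F014)_16, gives the split sketched below. The function name is illustrative, and the resulting tag and set values are computed here rather than taken from the slides.

```python
SET_BITS, WORD_BITS = 13, 5  # 14-bit tag + 13-bit set + 5-bit word = 32 bits

def split_set_associative(address):
    """Split a 32-bit address into (tag, set index, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word

tag, set_index, word = split_set_associative(0xA035F014)
print(f"tag={tag:X} set={set_index:X} word={word:X}")  # tag=280D set=F80 word=14
```

The set field selects one set directly, and the tag is then compared against both slots in that set.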
Typical exam question
 Explain the difference between direct mapped cache
and associative mapped cache.
 Explain how cache memory uses temporal and spatial
locality principles to enhance computers
performance.
Web languages (HTML, XML, XHTML)
 Difference between these languages
 Disadvantages of using html
 How does XHTML solve these problems
 Advantages of CSS
 Difference between HTML selector, CLASS selectors
and ID selectors
 HTML selector:
h1 {
    background-color: green;
    color: red;
    font-weight: bold;
}
 Class selector:
.section {
    color: red;
    font-weight: bold;
}
 ID selector:
#section {
    color: red;
    font-weight: bold;
}
 An ID selector applies styles to an element in the same way as a class.
The main difference between an ID selector and a class is that an ID can be
used only once on each page, whereas a class can be used many times.
Computer networks
 Network classes and default mask
 TCP/IP model (internet model)
 The role of each layer
 Examples of protocols at each layer and their roles.
 TCP vs UDP
 How are error and flow control achieved? Which layer is
responsible for this?
 Subnetting
 Role of subnetting
 Subnet address
 Host address
 Broadcast address
 Range of addresses in a subnet
Exercise
 Given a host configuration with an IP address
192.158.15.33 and a subnet mask 255.255.255.248:
 What is the subnet address?
 What is the host address?
 What is the broadcast address?
 What is the number of possible hosts and range of host
addresses in this subnet?
Solution
 192.168.10.32
 0.0.0.1
 192.168.10.39
 The number if bits for the host is 3 and therefore the
number if hosts allowed in in this subnet is 23-2=6
 The range of address is 192.168.10.33 - 192.168.10.38.
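The subnet calculation can be reproduced with Python's standard `ipaddress` module. The sketch below uses the IP address and mask given in the exercise statement; the variable names are illustrative.

```python
import ipaddress

# Address and mask as given in the exercise statement
iface = ipaddress.ip_interface("192.158.15.33/255.255.255.248")
net = iface.network

subnet_address = net.network_address
broadcast_address = net.broadcast_address
# Host part: keep only the bits not covered by the subnet mask
host_part = ipaddress.ip_address(int(iface.ip) & int(net.hostmask))
hosts = list(net.hosts())  # usable addresses, excluding subnet and broadcast

print(subnet_address)        # 192.158.15.32
print(host_part)             # 0.0.0.1
print(broadcast_address)     # 192.158.15.39
print(len(hosts), hosts[0], hosts[-1])  # 6 192.158.15.33 192.158.15.38
```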
Exam
 Duration: 1 hour 30 minutes
 3 questions: 30 minutes each
 Time: May
 Preparation:
 Past exam papers
 Revise all the questions given in two assignments
 Consult revision slides
 Concentrate on the preparation list
 Attempt the mock exam on my website
 Mock exam next week
Fin
Good Luck