Tutorial 8 Longer Solution

1.
The Test-and-Set instruction puts 1 into a memory location after first reading
out its content, while the very similar Exchange instruction puts the value
in a register into a memory location after reading out its content, and puts that
into the register. Both swap something in the CPU with something in memory.
a.
List the bus operations needed to complete a TAS/EXG instruction.
TAS
data = read(address)
wait                    ; Memory latency.
test data               ; Test operation.
data = 1
write(address, data)
wait                    ; Memory latency.
EXG
data = read(address)
wait                    ; Memory latency.
write(address, reg)
wait                    ; Memory latency.
reg = data
b.
A split transaction bus may allow another TAS/EXG to use the bus
between the read and write cycles of an earlier TAS/EXG and thus
lead to a hazard. How can this be prevented? (Remember that memory
modules may be pipelined.)
Hazards occur because the second process may read the data before the first
process gets to write it back. E.g. two EXG operations:

CC   Process I              Process II
1    data = read(address)
2    wait                   data = read(address)
3    write(address, reg)    wait
4    wait                   write(address, reg)
5    reg = data             wait
6                           reg = data

At CC2 Process II gets the wrong (stale) value from memory. Hazard!
Solution:
In a pipelined memory module the stages used by the EXG/TAS are
locked. Subsequent accesses are blocked until the EXG/TAS writes
back. Correct operation now:

CC   Process I              Process II
1    data = read(address)
2    wait                   data = read(address) (blocked)
3    write(address, reg)    (blocked)
4    wait                   (blocked)
5    reg = data             data = read(address) (successful)
6                           wait
…    …                      …

At CC5 Process II will now get the correct value from memory.
c.
A cache coherence system ensures that any change to the lock value by
another processor is delivered to any processor trying to read it.
Describe the system activities when a processor does a TAS/EXG on
i)
currently open lock no one has locked before;
No earlier attempt to read this lock – cache miss.
The unlocked value comes into the processor's cache.
The processor closes the lock.
The locked value is broadcast (or updated in the directory).
ii)
currently closed lock recently locked by someone else;
Cache miss (due to lock value being updated by another
processor)
Locked value comes to processor’s cache.
iii)
currently closed lock, after trying unsuccessfully
earlier;
Earlier attempt on this lock. Cache hit.
Locked value is read from processor’s cache.
iv)
currently open lock recently released by someone else.
Cache miss (due to lock value being updated to open by
other processor)
Unlocked value comes to processor’s cache.
2.
We studied interconnect structures such as the crossbar, omega network,
hypercube and token ring. Evaluate each structure for the construction of (a) a
uniform shared memory system, (b) a non-uniform shared memory system and (c) a
distributed memory system. For each, describe the envisaged procedure for
accessing shared memory. Remember that processors all have caches, so
memory accesses are always in blocks of bytes ranging from a hundred or so
bytes to a fraction of a page.
Crossbar (connect processors and memory to the network):
(a) Uniform shared: each processor can access memory by just flipping
switches. Uniform access times if no processor has memory of its own.
(b) Non-uniform shared: if each processor has memory of its own, then the time
to access its own memory is faster than the time to access shared memory.
Non-uniform access times.
(c) Distributed: each processor has its own memory (and there is no shared
memory). Processors access other processors' memories through the crossbar
network.

Omega (connect processors on one side of the switches and memory on the other):
(a) Uniform shared: can be as few as N/2 parallel paths depending on the
switches, so not all processors can access shared memory at once. Hence not
practical.
(b) Non-uniform shared: each processor has its own memory. Access to its own
memory is faster than access to shared memory across the network.
(c) Distributed: each processor has its own memory and there is no shared
memory. Access is by flipping switches until the request reaches the processor
holding the memory to be accessed.

Hypercube (connect processors and memory to the network):
(a) Uniform shared: flip switches along a path from processor to memory.
Varying connection lengths, complex, and uniform access is not always possible.
(b) Non-uniform shared: each processor has its own memory. Access to its own
memory is faster than access to shared memory across the network.
(c) Distributed: each processor has its own memory (and there is no shared
memory). Processors access each other's memory through the network.

Token ring:
(a) Uniform shared: not possible; contention for the token results in
variable-length latencies.
(b) Non-uniform shared: each processor has its own memory. Contention for the
token already results in non-uniform access times.
(c) Distributed: each processor has its own memory. Access is by grabbing the
token and sending a request.
3.
Outline a hypercube node program to sum 2**n values, with each node
contributing one. It is structurally similar to the broadcasting program of the
last tutorial: each node i receives from nodes i+2**m for m = n-1 downward as
long as 2**m > i, and then sends its own value plus the received values to
i-2**m for the next m.
Analysis:
i)
Set sum = own value, m = n-1
ii)
While(m >= 0 and 2**m > i)
i. Receive x from node i+2**m
ii. sum = sum + x
iii. m = m - 1
Then, if i > 0, send sum to node i-2**m (node 0 ends up holding the total)
iii)
This algorithm has the effect of flipping bits to read from nodes around a
particular node i:
E.g. for i = 3 (0011b), n = 4:
Before: k = 2**4 = 16
Iteration 1: k = k/2 = 8, 8 > 3: read from 3 + 8 = 11 (1011b)
Iteration 2: k = k/2 = 4, 4 > 3: read from 3 + 4 = 7 (0111b)
Iteration 3: k = k/2 = 2, 2 <= 3: exit, send to 3 - 2 = 1 (0001b)
All addresses (1011b, 0111b, 0001b) differ from the node address (0011b) by
exactly one bit, i.e. reads and writes are to neighbouring nodes.
Algorithm:
k = 2**n;                       // i.e. 1 << n
for(j=0; j<n; j++)              // n iterations: m = n-1 down to 0
{
k = k / 2;                      // k = 2**m
if(k <= i)
break;                          // Stop summing; time to send
sum = sum + receive(i+k);
}
if(i > 0)
send(i-k, sum);                 // Node 0 never sends: it holds the total