Multiprocessors

Symmetric and CC-NUMA
Scope
Design experiences of SMPs and Cache-Coherent
Nonuniform Memory Access (CC-NUMA)
NUMA
Natural extension of SMP systems
Architectures
[Figure: Shared memory logical structure. Processors with caches connected through an interconnect to memory and I/O.]
[Figure: SMP architecture. Processors with caches connected through a bus/crossbar to memory and I/O.]
[Figure: CC-NUMA architecture. Nodes 1 to N; each node has processors with caches on a bus/crossbar, plus I/O, memory, and a remote cache.]
Advantages of shared memory
systems (SMP or CC-NUMA)
Symmetry
Any processor can access any memory location and
I/O device
Single address space
Single system image
One copy of OS, database app, etc
Reside in the shared memory
User need not control data distribution or redistribution
Single OS schedules processes
Easy workload management, dynamic load balancing
Advantages of shared memory
systems (SMP or CC-NUMA)
Caching
Data locality supported in the hierarchy
Coherency
Enforced by the hardware?
MESI-like snoopy protocol
Memory Communication
Low latency
Simple load/store instructions
Hardware generates coherency information
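The snoopy coherency idea above can be illustrated with a toy model. This is a minimal sketch, assuming a single shared bus on which every cache observes every other cache's reads and writes; the names `SnoopyBus`, `Cache`, and `State` are hypothetical, and write-back of data, replacement, and timing are left out.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class SnoopyBus:
    """Toy bus: all attached caches can be queried on every transaction."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)
        cache.bus = self

    def others_holding(self, requester, addr):
        # Caches other than the requester that hold a valid copy of addr.
        return [c for c in self.caches
                if c is not requester and c.state(addr) is not State.I]


class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # addr -> State; a missing entry means Invalid
        self.bus = None

    def state(self, addr):
        return self.lines.get(addr, State.I)

    def read(self, addr):
        if self.state(addr) is not State.I:
            return  # read hit: no bus traffic, state unchanged
        sharers = self.bus.others_holding(self, addr)
        for c in sharers:
            # Holders in M or E downgrade to S (an M holder would also
            # write the data back; data movement is not modeled here).
            c.lines[addr] = State.S
        self.lines[addr] = State.S if sharers else State.E

    def write(self, addr):
        if self.state(addr) in (State.M, State.E):
            self.lines[addr] = State.M  # sole copy: upgrade without bus traffic
            return
        # From S or I: invalidate every other copy, then take ownership.
        for c in self.bus.others_holding(self, addr):
            c.lines[addr] = State.I
        self.lines[addr] = State.M


bus = SnoopyBus()
p0, p1 = Cache("P0"), Cache("P1")
bus.attach(p0)
bus.attach(p1)

p0.read(0x40)   # P0 is the only holder
p1.read(0x40)   # both caches now share the line
p1.write(0x40)  # P1 invalidates P0's copy
```

The program only issues plain reads and writes; all state changes happen in the (simulated) hardware, which is the point of the "simple load/store instructions" bullet.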
Basic Issues that SMPs must
address
Availability
Biggest problem
Failure of the bus, memory, or OS!
Bottleneck
Processors compete for the memory bus and shared memory
Packet-switched bus (split transactions)
Latency
Low latency, but still large compared to CPU speed
Memory bandwidth vs. processor speed vs. memory
capacity
Scalability
A bus is not scalable
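A back-of-the-envelope calculation shows why a shared bus limits scaling. This is a rough sketch, assuming the bus bandwidth is fixed and divided evenly among processors; the function names and the figures in the example (a 1000 MB/s bus, 100 MB/s of memory traffic per CPU) are hypothetical, not measurements.

```python
def per_processor_bandwidth(bus_bandwidth_mb_s, n_processors):
    """Even share of a fixed bus bandwidth when n processors compete for it."""
    return bus_bandwidth_mb_s / n_processors


def max_processors(bus_bandwidth_mb_s, demand_per_cpu_mb_s):
    """Largest processor count the bus can serve before it saturates."""
    return int(bus_bandwidth_mb_s // demand_per_cpu_mb_s)


# Hypothetical figures: the share shrinks as 1/N while the bus stays the same.
for n in (2, 4, 8, 16):
    print(n, "CPUs:", per_processor_bandwidth(1000.0, n), "MB/s each")

print("bus saturates at", max_processors(1000.0, 100.0), "CPUs")
```

Adding processors beyond the saturation point adds demand but no bandwidth, so each processor simply waits longer for the bus.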
CC-NUMA
Extends SMPs by connecting several SMP
nodes into a larger system
Employs a directory-based cache coherence
protocol
Attacks the scalability problem while
maintaining the advantages of SMP
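The directory idea can be sketched as follows. This is a simplified illustration, assuming one directory that records, for each memory block, which nodes hold a copy and which node (if any) holds a modified copy; the class `Directory` and the message names are hypothetical, and data transfer is not modeled. Unlike a snoopy bus, coherency messages go only to the nodes listed in the directory rather than being broadcast.

```python
class Directory:
    """Toy directory: tracks sharers and the dirty owner of each block."""

    def __init__(self):
        self.sharers = {}  # block -> set of node ids holding a copy
        self.dirty = {}    # block -> node id holding a modified copy

    def read(self, block, node):
        """Record a read by `node`; return the messages the directory sends."""
        messages = []
        owner = self.dirty.get(block)
        if owner is not None and owner != node:
            # The modified copy must be written back before others read it.
            messages.append(("writeback", owner))
            del self.dirty[block]
        self.sharers.setdefault(block, set()).add(node)
        return messages

    def write(self, block, node):
        """Record a write by `node`; all other copies are invalidated."""
        holders = self.sharers.get(block, set())
        messages = [("invalidate", n) for n in sorted(holders - {node})]
        self.sharers[block] = {node}
        self.dirty[block] = node
        return messages


d = Directory()
d.read("A", 1)
d.read("A", 2)
print(d.write("A", 3))  # point-to-point invalidations to nodes 1 and 2 only
print(d.read("A", 1))   # node 3 must write back its modified copy
```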
Distributed shared memory
enhances:
Scalability
Memory capacity and I/O capability increase as
more nodes are added
Bandwidth
An app can access multiple local memories
concurrently
Availability
Multiple copies of a portion of the OS can run on multiple
nodes
Failure of one node will not disrupt the entire system
Programming
We said that
“data structures get distributed”
“Cache coherency then tracks the changes”
Any issues? (remote cache vs local memory)
P, Q: processes
A, B: arrays
P:
    Phase 1:
        use(A)
    Phase 2:
        use(B)
Q:
    use(B)
    use(A)
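The issue raised by this example can be made concrete with a small sketch. It assumes that P and Q run on different nodes, that array A is placed in the local memory of P's node and B in the local memory of Q's node (for instance because each process touched its array first), and that Q's two uses happen in order, first B then A. The node names and the helper `classify_accesses` are hypothetical.

```python
# Assumed placement: each array lives in the memory of the node that used it first.
HOME = {"A": "node_P", "B": "node_Q"}

# Assumed mapping of processes to nodes.
RUNS_ON = {"P": "node_P", "Q": "node_Q"}

# Order in which each process uses the arrays, taken from the example above.
SCHEDULE = {"P": ["A", "B"], "Q": ["B", "A"]}


def classify_accesses(schedule, runs_on, home):
    """Label each use as 'local' or 'remote' relative to the array's home node."""
    result = {}
    for proc, arrays in schedule.items():
        result[proc] = [
            "local" if home[a] == runs_on[proc] else "remote" for a in arrays
        ]
    return result


print(classify_accesses(SCHEDULE, RUNS_ON, HOME))
```

Under these assumptions each process starts on local data and ends on remote data: the single address space keeps the program correct, but the second use of each array is served through the remote cache rather than local memory unless the data is migrated or replicated.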