Symmetric Multiprocessors (SMP) and CC-NUMA

Scope
- Design experiences of SMPs and Cache-Coherent Non-Uniform Memory Access (CC-NUMA)
- NUMA is a natural extension of SMP systems

Architectures
[Figure: SMP architecture and the shared-memory logical structure -- processors with caches connected over a bus/crossbar interconnect to shared memory and I/O]
[Figure: CC-NUMA architecture -- Node 1 through Node N, each with processors and caches, a bus/crossbar, local memory, I/O, and a remote cache, joined by an interconnect]

Advantages of shared-memory systems (SMP or CC-NUMA)
- Symmetry: any processor can access any memory location and any I/O device
- Single address space
- Single system image: one copy of the OS, database application, etc. resides in the shared memory; the user need not manage data distribution or redistribution
- A single OS schedules all processes: easy workload management and dynamic load balancing

Advantages of shared-memory systems (SMP or CC-NUMA), continued
- Caching: data locality is supported throughout the memory hierarchy
- Coherency: enforced by the hardware (a MESI-like snoopy protocol)
- Memory communication: low latency, simple load/store instructions, and the hardware generates the coherency information

Basic issues that SMPs must address
- Availability: the biggest problem; a failure of the bus, the memory, or the OS brings down the whole system
- Bottleneck: processors compete for the memory bus and the shared memory; a packet-switched bus (split transactions) helps
- Latency: low, but still large compared to CPU speed; memory bandwidth vs. processor speed vs. memory capacity
- Scalability: a bus is not scalable

CC-NUMA
- Extends SMPs by connecting several SMP nodes into a larger system
- Employs a directory-based cache-coherence protocol
- Attacks the scalability problem while maintaining the advantages of shared memory

Distributed shared memory enhances:
- Scalability: memory capacity and I/O capability grow by adding more nodes
- Bandwidth: an application can access multiple local memories concurrently
- Availability: multiple copies of a portion of the OS can run on multiple nodes, so the failure of one node does not disrupt the entire system

Programming
- We said that "data structures get distributed" and "cache coherency then tracks the changes"
- Any issues? (remote cache vs. local memory)
- P, Q: processes; A, B: arrays
      P:  Phase 1: use(A)    Phase 2: use(B)
      Q:  Phase 1: use(B)    Phase 2: use(A)
- With a fixed placement (say A in P's local memory and B in Q's), Phase 1 is entirely local, but in Phase 2 both processes reach across the interconnect; coherence keeps the values correct, yet it does not make the placement good (see the sketch below)
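To make the phase example concrete, here is a minimal C sketch (not from the slides). It models P and Q as two pthreads and A and B as two large arrays, and it assumes a Linux-like first-touch page-placement policy, so the array each thread initializes lands in that thread's local node memory. The array size, the use() loop, and the thread names are illustrative only; a real experiment would also pin the two threads to different NUMA nodes (e.g., with numactl or pthread_setaffinity_np), which is omitted here for brevity.

/*
 * Sketch of the P/Q example, assuming first-touch page placement.
 * Thread P first-touches A and thread Q first-touches B, so each array
 * is placed on the initializing thread's node.  Phase 1 then runs out
 * of local memory; in Phase 2 each thread sweeps the other array, so
 * every access goes to the remote cache or remote memory.
 * Build with:  gcc -O2 -pthread numa_phases.c   (file name is arbitrary)
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N (16 * 1024 * 1024)          /* 16M doubles per array (illustrative) */

static double *A, *B;
static pthread_barrier_t phase;       /* keeps the two phases in lockstep */

/* Stand-in for "use(X)": a simple read-modify-write sweep. */
static void use(double *x)
{
    for (size_t i = 0; i < N; i++)
        x[i] += 1.0;
}

static void *run_P(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N; i++) A[i] = 0.0;   /* first touch: A -> P's node */
    pthread_barrier_wait(&phase);
    use(A);                                      /* Phase 1: local accesses  */
    pthread_barrier_wait(&phase);
    use(B);                                      /* Phase 2: remote accesses */
    return NULL;
}

static void *run_Q(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N; i++) B[i] = 0.0;   /* first touch: B -> Q's node */
    pthread_barrier_wait(&phase);
    use(B);                                      /* Phase 1: local accesses  */
    pthread_barrier_wait(&phase);
    use(A);                                      /* Phase 2: remote accesses */
    return NULL;
}

int main(void)
{
    pthread_t p, q;

    A = malloc(N * sizeof *A);
    B = malloc(N * sizeof *B);
    if (!A || !B) { fprintf(stderr, "out of memory\n"); return 1; }

    pthread_barrier_init(&phase, NULL, 2);
    pthread_create(&p, NULL, run_P, NULL);
    pthread_create(&q, NULL, run_Q, NULL);
    pthread_join(p, NULL);
    pthread_join(q, NULL);
    pthread_barrier_destroy(&phase);

    printf("done: A[0]=%f B[0]=%f\n", A[0], B[0]);
    free(A);
    free(B);
    return 0;
}

The point of the sketch is the one the slide raises: the coherence hardware guarantees that both threads always see consistent values of A and B, but it does not move pages closer to whoever is using them, so Phase 2 pays remote-access latency regardless. Fixing it is a data-placement or migration decision (or a help from the per-node remote cache), not something the coherence protocol provides.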