Symmetric Multiprocessors (SMP) and CC-NUMA

Scope
- Design experiences of SMPs and Cache-Coherent Non-Uniform Memory Access (CC-NUMA)
- NUMA is a natural extension of SMP systems

Architectures
[Figure: SMP architecture and the shared-memory logical structure -- processors with caches connected over a bus/crossbar interconnect to shared memory and I/O]
[Figure: CC-NUMA architecture -- Node 1 through Node N, each with processors and caches, a bus/crossbar, local memory, I/O, and a remote cache, joined by an interconnect]

Advantages of shared-memory systems (SMP or CC-NUMA)
- Symmetry: any processor can access any memory location and any I/O device
- Single address space
- Single system image: one copy of the OS, database application, etc. resides in the shared memory; the user need not manage data distribution or redistribution
- A single OS schedules all processes: easy workload management and dynamic load balancing

Advantages of shared-memory systems (SMP or CC-NUMA), continued
- Caching: data locality is supported throughout the memory hierarchy
- Coherency: enforced by the hardware (a MESI-like snoopy protocol)
- Memory communication: low latency, simple load/store instructions, and the hardware generates the coherency information

Basic issues that SMPs must address
- Availability: the biggest problem; a failure of the bus, the memory, or the OS brings down the whole system
- Bottleneck: processors compete for the memory bus and the shared memory; a packet-switched bus (split transactions) helps
- Latency: low, but still large compared to CPU speed; memory bandwidth vs. processor speed vs. memory capacity
- Scalability: a bus is not scalable

CC-NUMA
- Extends SMPs by connecting several SMP nodes into a larger system
- Employs a directory-based cache-coherence protocol
- Attacks the scalability problem while maintaining the advantages of shared memory

Distributed shared memory enhances:
- Scalability: memory capacity and I/O capability grow by adding more nodes
- Bandwidth: an application can access multiple local memories concurrently
- Availability: multiple copies of a portion of the OS can run on multiple nodes, so the failure of one node does not disrupt the entire system

Programming
- We said that "data structures get distributed" and "cache coherency then tracks the changes"
- Any issues? (remote cache vs. local memory)
- P, Q: processes; A, B: arrays
      P:  Phase 1: use(A)    Phase 2: use(B)
      Q:  Phase 1: use(B)    Phase 2: use(A)
- With a fixed placement (say A in P's local memory and B in Q's), Phase 1 is entirely local, but in Phase 2 both processes reach across the interconnect; coherence keeps the values correct, yet it does not make the placement good (see the sketch below)
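To make the phase example concrete, here is a minimal C sketch (not from the slides). It models P and Q as two pthreads and A and B as two large arrays, and it assumes a Linux-like first-touch page-placement policy, so the array each thread initializes lands in that thread's local node memory. The array size, the use() loop, and the thread names are illustrative only; a real experiment would also pin the two threads to different NUMA nodes (e.g., with numactl or pthread_setaffinity_np), which is omitted here for brevity.

/*
 * Sketch of the P/Q example, assuming first-touch page placement.
 * Thread P first-touches A and thread Q first-touches B, so each array
 * is placed on the initializing thread's node.  Phase 1 then runs out
 * of local memory; in Phase 2 each thread sweeps the other array, so
 * every access goes to the remote cache or remote memory.
 * Build with:  gcc -O2 -pthread numa_phases.c   (file name is arbitrary)
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N (16 * 1024 * 1024)          /* 16M doubles per array (illustrative) */

static double *A, *B;
static pthread_barrier_t phase;       /* keeps the two phases in lockstep */

/* Stand-in for "use(X)": a simple read-modify-write sweep. */
static void use(double *x)
{
    for (size_t i = 0; i < N; i++)
        x[i] += 1.0;
}

static void *run_P(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N; i++) A[i] = 0.0;   /* first touch: A -> P's node */
    pthread_barrier_wait(&phase);
    use(A);                                      /* Phase 1: local accesses  */
    pthread_barrier_wait(&phase);
    use(B);                                      /* Phase 2: remote accesses */
    return NULL;
}

static void *run_Q(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N; i++) B[i] = 0.0;   /* first touch: B -> Q's node */
    pthread_barrier_wait(&phase);
    use(B);                                      /* Phase 1: local accesses  */
    pthread_barrier_wait(&phase);
    use(A);                                      /* Phase 2: remote accesses */
    return NULL;
}

int main(void)
{
    pthread_t p, q;

    A = malloc(N * sizeof *A);
    B = malloc(N * sizeof *B);
    if (!A || !B) { fprintf(stderr, "out of memory\n"); return 1; }

    pthread_barrier_init(&phase, NULL, 2);
    pthread_create(&p, NULL, run_P, NULL);
    pthread_create(&q, NULL, run_Q, NULL);
    pthread_join(p, NULL);
    pthread_join(q, NULL);
    pthread_barrier_destroy(&phase);

    printf("done: A[0]=%f B[0]=%f\n", A[0], B[0]);
    free(A);
    free(B);
    return 0;
}

The point of the sketch is the one the slide raises: the coherence hardware guarantees that both threads always see consistent values of A and B, but it does not move pages closer to whoever is using them, so Phase 2 pays remote-access latency regardless. Fixing it is a data-placement or migration decision (or a help from the per-node remote cache), not something the coherence protocol provides.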