Memory Network: Enabling Technology for Scalable Near-Data Computing
Gwangsun Kim, John Kim (Korea Advanced Institute of Science and Technology)
Jung Ho Ahn (Seoul National University)
Yongkee Kwon (SK Hynix)

Memory Network
[Figure: a Hybrid Memory Cube (HMC) with stacked DRAM layers partitioned into vaults, per-vault controllers, an intra-HMC network on the logic layer, and high-speed link I/O ports; multiple HMCs are connected into a memory network.]
"Near"-data processing is straightforward when one memory module holds all the operands, but what if the host issues "compute A+B" while Data A and Data B reside in different modules? One operand is then "far" data (a back-of-the-envelope traffic sketch follows the summary).
A memory network enables scalable near-data computing by letting memory modules exchange data directly.

DIVA Processing-in-Memory (PIM) Chip
Targeted multimedia and irregular applications.
Proposed a memory network for PIM modules.
– Simple low-dimensional network (e.g., ring) → high packet hop count → performance and energy inefficiency.
Advanced technology is now available – high off-chip bandwidth.
Draper et al., "The Architecture of the DIVA Processing-in-Memory Chip," ICS '02.

Memory Networks from Micron
Network-attached memories.
2D mesh topology.
Local memories.
D. R. Resnick, "Memory Network Methods, Apparatus, and Systems," US Patent Application Publication US20100211721 A1, 2010.

Memory Network Design Issues
Difficult to leverage high-radix topologies (see the hop-count sketch after the summary).
– Low-radix vs. high-radix topology: a high-radix topology → smaller network diameter.
– But the number of ports on a memory module is limited.
Adaptive routing requirement.
– Can increase network cost.
– Depends on the traffic pattern, memory mapping, etc.

Memory-centric Network
Host-memory bandwidth still matters.
– Conventional applications must be supported while adopting NDP.
– NDP itself involves communication with the host processors.
Processor-centric Network (PCN), e.g., Intel QPI or AMD HyperTransport: bandwidth is dedicated to processor-to-processor links, so a separate network is required for NDP.
Memory-centric Network (MCN) [PACT'13]: bandwidth can be utilized flexibly, and the same network can be used for NDP.

Memory Network for Heterogeneous NDP
NDP not only for CPUs but also for GPUs.
Unified memory network for multi-GPU systems [MICRO'14].
Extending the memory network to heterogeneous NDP: CPUs, GPUs, and FPGAs sharing a unified memory network.

Hierarchical Network
With the intra-HMC network, the memory network becomes a hierarchical network.
NDP requires additional processing elements at the logic layer.
Various types of traffic must be supported:
– local (on-chip) traffic vs. global traffic,
– conventional memory-access traffic vs. NDP-induced traffic.
[Figure: Hybrid Memory Cube with stacked DRAM, vault controllers, on-chip channels, and I/O ports; a concentrated-mesh-based intra-HMC network [PACT'13].]

Issues with Memory Network-based NDP
Power management.
– A memory network can have a large number of channels.
– Power-gating, DVFS, and other circuit-level techniques.
Data placement and migration.
– Optimal placement of shared data.
– Migration within the memory network.
Consistency and coherence.
– Direct memory access by multiple processors.
– Heterogeneous processors.

Summary
A memory network can enable scalable near-data processing.
Leverage recent memory network research:
– memory-centric network [PACT'13],
– unified memory network [MICRO'14].
Intra-HMC network design considerations.
Further issues remain.
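
The "far data" concern on the Memory Network slide can be made concrete with a small byte-counting sketch. This script is not from the talk: the vector size (1 Mi 64-bit elements), the 64-byte command packet, and the assumption that the NDP result stays in memory are illustrative choices. It only compares bytes crossing the host's memory links against bytes moving between memory modules inside the memory network.

```python
# Hedged back-of-the-envelope sketch (not from the talk): bytes crossing the
# host's memory links when adding two vectors A and B that live in two
# different memory modules.  All sizes below are assumed for illustration.
ELEMS = 1 << 20          # 1 Mi elements per vector (assumed)
ELEM_BYTES = 8           # 64-bit elements (assumed)
CMD_BYTES = 64           # one "compute A+B" command packet (assumed)

vector_bytes = ELEMS * ELEM_BYTES

# Host-centric: A and B are both read into the CPU, and the sum is written back.
host_centric_host_link = 3 * vector_bytes

# Near-data with a memory network: the host sends one command; the module
# holding A pulls B over the memory network, computes the sum locally,
# and leaves the result in memory.
ndp_host_link = CMD_BYTES
ndp_memory_network = vector_bytes    # B travels between memory modules

print(f"host-centric traffic on host links : {host_centric_host_link / 2**20:.1f} MiB")
print(f"NDP traffic on host links          : {ndp_host_link} B")
print(f"NDP traffic inside memory network  : {ndp_memory_network / 2**20:.1f} MiB")
```

Under these assumptions roughly 24 MiB crosses the host links in the host-centric case versus a single command packet with NDP, while the 8 MiB transfer of B stays inside the memory network between the modules.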
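
To make the hop-count argument on the Design Issues slide concrete, here is a small sketch (again, not part of the talk) that computes the diameter and average hop count of a 16-node bidirectional ring and a 4x4 flattened butterfly using breadth-first search. The node count and the choice of flattened butterfly as the high-radix example are assumptions for illustration only.

```python
# Hedged sketch: diameter and average hop count of a low-radix ring versus a
# higher-radix 2-D flattened butterfly for a hypothetical 16-module system.
from collections import deque

def ring(n):
    """Bidirectional ring: node i connects to (i-1) % n and (i+1) % n."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def flattened_butterfly(rows, cols):
    """2-D flattened butterfly: every node links to all nodes that share
    its row or its column, giving a switch radix of (rows-1)+(cols-1)."""
    adj = {i: set() for i in range(rows * cols)}
    for i in adj:
        r, c = divmod(i, cols)
        for cc in range(cols):              # all nodes in the same row
            if cc != c:
                adj[i].add(r * cols + cc)
        for rr in range(rows):              # all nodes in the same column
            if rr != r:
                adj[i].add(rr * cols + c)
    return adj

def hop_stats(adj):
    """BFS from every node; returns (diameter, average hop count)."""
    total, longest, pairs = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for d in dist.values():
            if d > 0:
                total += d
                pairs += 1
                longest = max(longest, d)
    return longest, total / pairs

for name, adj in [("16-node ring", ring(16)),
                  ("4x4 flattened butterfly", flattened_butterfly(4, 4))]:
    diameter, avg = hop_stats(adj)
    print(f"{name}: diameter = {diameter}, average hops = {avg:.2f}")
```

On this configuration the ring has diameter 8 and about 4.3 average hops, while the flattened butterfly has diameter 2 and 1.6 average hops; that is the gap the slide attributes to radix. The catch, as the slide also notes, is port count: each flattened-butterfly node here needs 6 network ports instead of the ring's 2, which is hard to provide on a memory module.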