• Storage area network and System area network (SAN)
  – What are they?
  – Network requirements
  – Hardware/software issues
  – References:
    • Ulf Troppens, Rainer Erkens, and Wolfgang Muller, “Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, iSCSI and InfiniBand”, John Wiley & Sons, 2004.
    • W. J. Dally and B. Towles, “Principles and Practices of Interconnection Networks”, Morgan Kaufmann, 2004.
    • Ajay V. Bhatt, “Creating a Third Generation I/O Interconnect,” available at http://www.express-lane.org
• Storage area network (SAN):
  – Server-centric IT architecture: storage devices exist only as attachments to servers
  – Storage-centric IT architecture: SCSI cables are replaced by a network (storage is now independent of servers).
• Storage area network (SAN) requirements:
  – Serial transmission for high speed and long distance
  – Low transmission error rate
  – Low delay of transmitted data
    • Using the SAN needs to feel like using a local disk
• Low delay is a relative term:
  – The disk subsystem itself has around 1ms to 10ms of latency.
  – The communication protocol should not consume CPU time.
• Current storage area network (SAN) technology (IBM):
  – Fibre Channel
  – TCP/IP + Gigabit Ethernet (iSCSI)
  – InfiniBand
• System area network: a network with high bandwidth and low latency that connects the computers in a distributed computer system.
• Why system area network:
  – Historically, a system area network came with a particular parallel machine (supercomputer, e.g. Cray T3D, Cray T3E, SGI Origin 2000, IBM SP, Thinking Machines CM-5, Intel Paragon)
    • The network is very expensive due to low volume
    • The CPU is two generations behind
  – A more cost-effective way to build these systems is to decouple the processor technology from the networking technology.
  – Clusters of workstations built with off-the-shelf system area network technology are cheaper than traditional supercomputers.
  – Current system area networks:
    • Myrinet, Quadrics, InfiniBand
• System area network requirements:
  – Low latency and high bandwidth at the application level.
    • Not just at the hardware level
    • Not just at the system level
  – Implication:
    • Hardware, network interface, and software messaging layer must work together to achieve the goal.
  – InfiniBand is designed as both a storage area network and a system area network.
• Hardware issues:
  – High speed links:
    • InfiniBand (signalling rate = data rate after 8b/10b encoding): 2.5Gbps = 250MBps, 10Gbps = 1GBps, 30Gbps = 3GBps
    • Fibre Channel: 100MBps, 200MBps, 400MBps, 1GBps.
    • Myrinet: up to 9.6Gbps
    • As a reference, PCI bus: roughly 100MBps
  – The NIC may need to attach to the memory bridge
    • A typical PC: [figure]
    • A workstation connected to a system area network: [figure]
• When the number of end points is large, multiple switches will be needed.
  – Topology
  – Switching
  – Routing
• Topology
  – Static arrangement of channels and nodes in an interconnection network
  – Trade-off between cost and performance
    • Cost: the number and complexity of chips, density and length of the interconnections, etc.
    • Performance:
      – Bandwidth and latency: these also depend on factors other than topology
  – Topology performance metrics: bisection bandwidth, diameter, nodal degree, channel load
    • A cut of a network is a set of channels whose removal partitions the set of all nodes into two disjoint sets.
    • A bisection of a network is a cut that partitions the network nodes roughly in half.
    • The bisection bandwidth of a network is the minimum bandwidth over all bisections of the network.
    • The diameter of a network is the largest minimal hop count over all pairs of nodes.
    • Under a particular traffic pattern, the channel that carries the largest fraction of traffic determines the maximum channel load of the topology.
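The diameter and bisection definitions above can be checked on small networks. The following is a minimal sketch, not taken from the lecture: the helper names `torus`, `diameter`, and `bisection_width` are hypothetical, links are assumed bidirectional with equal bandwidth (so bisection bandwidth is the link count times the per-link bandwidth), and the bisection search is brute force, which is only feasible for a handful of nodes with an even node count.

```python
from collections import deque
from itertools import combinations, product


def torus(k, n):
    """Adjacency sets of a k-ary n-cube: k nodes per dimension, wraparound links."""
    adj = {}
    for node in product(range(k), repeat=n):
        nbrs = set()
        for d in range(n):
            for step in (-1, 1):
                nb = list(node)
                nb[d] = (nb[d] + step) % k
                nbrs.add(tuple(nb))
        nbrs.discard(node)  # only relevant for the degenerate case k = 1
        adj[node] = nbrs
    return adj


def diameter(adj):
    """Largest minimal hop count over all node pairs (BFS from every node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


def bisection_width(adj):
    """Minimum number of links cut over all even splits (assumes an even node count)."""
    nodes = sorted(adj)
    half = len(nodes) // 2
    first, rest = nodes[0], nodes[1:]
    best = None
    # Pin the first node to one side so each split is examined only once.
    for chosen in combinations(rest, half - 1):
        side = set(chosen) | {first}
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        if best is None or cut < best:
            best = cut
    return best


ring8 = torus(8, 1)    # 8-node ring
torus4x4 = torus(4, 2)  # 4-ary 2-cube
print(diameter(ring8), bisection_width(ring8))
print(diameter(torus4x4), bisection_width(torus4x4))
```

Going from the 8-node ring to the 4-ary 2-cube doubles the node count and keeps the diameter the same while raising the bisection width, at the price of a higher nodal degree. That is the cost/performance trade-off the metrics are meant to expose.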
• Example topologies:
  – Regular or irregular
  – Regular topologies are mostly derived from two main families: butterflies (k-ary n-flies) or tori (k-ary n-cubes)
• Switching: how a packet passes through a switch
  – Message/packet/flit
• Traditional scheme: store-and-forward
  – Time = H (S + P/B) (H: number of hops, S: per-switch delay, P: packet size, B: link bandwidth)
• Cut-through switch:
  – Forward to the next link as soon as the header flit is received. Stop only when the next hop buffer is not available.
  – Time = H S + P/B; when S << P/B, the time is nearly independent of the number of hops!
  – Illustration (made-up values): with H = 5, S = 1µs, and P/B = 100µs, store-and-forward takes 5 × 101µs = 505µs, while cut-through takes 5µs + 100µs = 105µs.
• Wormhole routing:
  – Cut-through switches still allocate buffers for whole packets, which may require a large amount of buffer space.
  – Wormhole routing allocates buffer space for only one flit of each packet.
  – Latency is the same as with cut-through switching.
  – When the packet is blocked, the whole flit “train” is blocked, occupying links.
    • Solution: add more virtual channels.
• The deadlock problem in wormhole routing:
  – A deadlock-free routing scheme is needed to select the right path
• Cut-through and wormhole switches are widely used in system area networks
  – Routing in such systems is an issue!
  – Shortest path routing may result in deadlock.
  – Deadlock-free routing:
    • Basic idea: assign a fixed priority to each channel and use channels only in order of increasing priority.
    • Example: up/down routing
• Up/down routing:
  – Select a node as the root
  – Build a spanning tree from the root
  – Nodes are partitioned into layers based on their position in the spanning tree
  – The channel from a lower layer node to a higher layer node is an up link, the channel from a higher layer node to a lower layer node is a down link, and channels between nodes in the same layer are marked as up or down links based on the node number
  – In a valid route, an up channel cannot follow a down channel.
  – There exists at least one valid path between each pair of nodes.
• Problems with deadlock-free routing:
  – Load balancing is a problem: traffic is not evenly distributed
  – The non-adaptive version of the deadlock-free routing scheme is also a problem
    • How should routes be mapped to get good performance (metric: maximum channel load?)
    • More on this problem will be discussed later.
• Hardware/software codesign and software API issues:
  – What functionality should be implemented in hardware?
    • E.g. adaptive routing may imply out-of-order packets
  – The Chien’04 paper gives good answers to some of these questions.
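The up/down routing rules described earlier can be tried on a small network. This is only a sketch under stated assumptions, not the scheme of any particular product: the function names (`bfs_levels`, `is_up`, `updown_route`) and the 6-switch ring are hypothetical, a "higher layer" is assumed to mean closer to the root, and links between nodes in the same layer are assumed to point up toward the lower node number.

```python
from collections import deque


def bfs_levels(adj, root):
    """Layer of each node = hop distance from the root in a BFS spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """True if channel u -> v is an up channel (toward the root; ties by node number)."""
    return (level[v], v) < (level[u], u)


def updown_route(adj, level, src, dst):
    """Shortest legal route: zero or more up channels, then zero or more down channels."""
    start = (src, False)  # second field: has a down channel been used yet?
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, went_down = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt in adj[node]:
            up = is_up(node, nxt, level)
            if went_down and up:
                continue  # an up channel may not follow a down channel
            nxt_state = (nxt, went_down or not up)
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


# Hypothetical example: six switches in a ring, node 0 chosen as the root.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
level = bfs_levels(ring, 0)
print(updown_route(ring, level, 2, 4))  # [2, 1, 0, 5, 4]
```

In this example the shortest path 2, 3, 4 is rejected because it goes down (2 to 3) and then up (3 to 4), so the legal route detours through the root. That illustrates both the non-minimal paths and the traffic concentration near the root listed under the problems with deadlock-free routing.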