The University of Sunderland Cluster Computer
IET Lecture by John Tindle
Northumbria Network, ICT Group
Monday 11 February 2008

Overview of talk
- SRIF3 and potential vendors
- General requirements
- Areas of application
- Development team
- Cluster design
- Cluster system hardware and software
- Demonstrations

United Kingdom - Science Research Investment Fund (SRIF)
The Science Research Investment Fund (SRIF) is a joint initiative by the Office of Science and Technology (OST) and the Department for Education and Skills (DfES). The purpose of SRIF is to contribute to higher education institutions' (HEIs) long-term sustainable research strategies and address past under-investment in research infrastructure.

SRIF3
- Funding: SRIF3 90%, UoS 10%
- Project duration about two years
- Made operational by late December 2007
- Heriot-Watt University - coordinator

Potential grid computer vendors
- Dell - selected vendor
- CompuSys - SE England
- Streamline - Midlands
- Fujitsu - Manchester
- ClusterVision - Netherlands
- OCF - Sheffield

General requirements
- High performance general purpose computer
- Built using standard components: commodity off the shelf (COTS), low cost PC technology
- Reuse existing skills - Ethernet
- Easy to maintain - hopefully

Designed for networking experiments
- Requires a flexible networking infrastructure, modifiable under program control
- A managed switch is required (an unmanaged switch is often employed in standard cluster systems)
- Fully connected programmable intranet

System supports
- Rate limiting
- Quality of service (QoS)
- Multiprotocol Label Switching (MPLS)
- VLANs and VPNs
- IPv4 and IPv6 supported in hardware
- Programmable queue structures
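A note on the QoS support listed above: applications can also mark their own traffic so that the managed switch can prioritise it. The following minimal Java sketch sets the DSCP bits on a UDP socket; the destination address, port and DSCP value (Expedited Forwarding) are placeholders, not values taken from the USCC configuration, and the switch must still be configured to honour the marking.

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    // Illustrative only: marks outgoing UDP packets with DSCP Expedited
    // Forwarding (46) so a QoS-enabled switch can prioritise them.
    public class QosMarkedSender {
        public static void main(String[] args) throws Exception {
            try (DatagramSocket socket = new DatagramSocket()) {
                // The traffic class byte carries the DSCP value in its top 6 bits.
                socket.setTrafficClass(46 << 2);
                byte[] data = "priority test".getBytes(StandardCharsets.UTF_8);
                InetAddress dest = InetAddress.getByName("192.0.2.10"); // documentation/example address
                socket.send(new DatagramPacket(data, data.length, dest, 6000)); // placeholder port
            }
        }
    }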
Special requirements 1
- Operation at normal room temperature
- Typical existing systems require a low air inlet temperature (< 5 degrees C) and a dedicated server room with air conditioning
- Low acoustic noise output
- Dual boot capability: Windows or Linux in any proportion

Special requirements 2
- Concurrent processing, for example boxes with 75% of cores running Windows and 25% of cores running Linux
- CPU power control - 4 levels
- High resolution displays for media and data visualisation

Advantages of design
- Heat generated is not vented to the outside atmosphere
- Air conditioning running costs are not incurred
- Heat is used to heat the building
- Compute nodes (height 2U) use relatively large diameter, low noise fans

Areas of application
1. Media systems - 3D rendering
2. Networking experiments (MSc Network Systems - large cohort)
3. Engineering computing
4. Numerical optimisation
5. Video streaming
6. IP television

Areas of application (cont. 1)
7. Parallel distributed computing
8. Distributed databases
9. Remote teaching experiments
10. Semantic web
11. Searching large image databases
12. Search engine development
13. Web based data analysis

Areas of application (cont. 2)
14. Computational fluid dynamics
15. Large scale data visualisation using high resolution colour computer graphics

UoS Cluster Development Team
- From left to right: Kevin Ginty, Simon Stobart, John Tindle, Phil Irving, Matt Hinds
- Note - all wearing Dell T-shirts

UoS Team

UoS Cluster Work Area
- At last, all up and running!

UoS Estates Department
Very good project work was completed by the UoS Estates Department:
- Electrical network design
- Building air flow analysis (Computing Terraces)
- Heat dissipation - finite element (FE) study and analysis
- Work area refurbishment

Cluster Hardware
- The system has been built using Dell compute nodes and Cisco networking components
- Grid design contributions from both Dell and Cisco

Basic building block - compute nodes
- Dell PE2950 server, height 2U
- Two dual core processors, four cores per box
- RAM 8 GB, 2 GB per core
- http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/

Compute nodes
- Network interface cards: 3 off
- Local disk drives: 250 GB SATA II
- The large amount of RAM facilitates virtual computing experiments (VMware Server and MS Virtual PC)

Cisco 6509 switch
- Cisco 6509 (1 off) - central network switch for the cluster
- Cisco 720 supervisor engines (2 off)
- RSM - router switch module
- Provides 720 Gbps full duplex switching (4 off port cards)
- Virtual LANs (VLANs) and virtual private networks (VPNs)
- Link bandwidth throttling
- Traffic prioritisation, QoS
- Network experimentation

Cluster intranet
The network has three buses:
1. Data
2. IPC
3. IPMI

1. Data bus
- User data bus
- A normal data bus required for interprocessor communication between user applications

2. IPC bus
- Inter process communication (IPC)
- "The Microsoft Windows operating system provides mechanisms for facilitating communications and data sharing between applications. Collectively, the activities enabled by these mechanisms are called interprocess communications (IPC). Some forms of IPC facilitate the division of labor among several specialized processes. Other forms of IPC facilitate the division of labor among computers on a network." (Ref: Microsoft website)
- IPC is controlled by the OS
- For example, IPC is used to transfer and install new disk images on compute nodes
- Disk imaging is a complex operation

3. IPMI bus
- The Intelligent Platform Management Interface (IPMI) specification defines a set of common interfaces to computer hardware and firmware which system administrators can use to monitor system health and manage the system
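To make the IPMI management bus more concrete, the sketch below shows one way a management host could poll a node's temperature and fan sensors by invoking the standard ipmitool utility. It is illustrative only: it assumes ipmitool is installed on the management host, the BMC address and credentials are placeholders, and this is not the tooling actually used to manage the USCC.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Illustrative IPMI poll: runs "ipmitool ... sensor" against one node's BMC
    // and prints temperature and fan readings.
    public class IpmiSensorPoll {
        public static void main(String[] args) throws Exception {
            String bmcAddress = "192.0.2.20";   // placeholder BMC address
            ProcessBuilder pb = new ProcessBuilder(
                    "ipmitool", "-I", "lanplus",
                    "-H", bmcAddress, "-U", "admin", "-P", "secret",  // placeholder credentials
                    "sensor");
            pb.redirectErrorStream(true);
            Process p = pb.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Crude filter; sensor names vary between vendors.
                    String lower = line.toLowerCase();
                    if (lower.contains("temp") || lower.contains("fan")) {
                        System.out.println(bmcAddress + ": " + line);
                    }
                }
            }
            p.waitFor();
        }
    }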
Master Rack A
- Linux and Microsoft
- 2 - PE2950 control nodes
- 5 - PE1950 web servers
- Cisco Catalyst 6509 with 720 supervisor engines (2 * 720 supervisors)
- 4 * 48-port cards (192 ports)

Master Rack A (cont.)
- Compute nodes require 40 * 3 = 120 connections
- Disk storage: 1 - MD1000
- http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/
- Master rack resilient to mains failure
- Power supply: 6 kVA APC (hard wired, 24 Amp)

Master Rack A KVM switch
- Ethernet KVM switch (keyboard, video display, mouse)
- Provides user access to the head nodes
- Windows head node, named "Paddy"
- Linux head node, named "Max"

Movie: USCC MVI_6991.AVI

Rack B - InfiniBand
- InfiniBand is a switched fabric communications link primarily used in high performance computing
- Its features include quality of service and failover, and it is designed to be scalable
- The InfiniBand architecture specification defines a connection between processor nodes and high performance I/O nodes

InfiniBand Rack B
- 6 - PE2950, each with two HCAs
- 1 - Cisco 7000P router
- Host channel adapter (HCA): http://157.228.27.155/website/CLUSTER-GRID/Ciscodocs1/HCA/
- InfiniBand: http://en.wikipedia.org/wiki/InfiniBand

Cisco InfiniBand
- Cisco 7000P
- High speed bus, 10 Gbit/s
- Low latency, < 1 microsecond
- 6 compute nodes, 24 CPU cores
- High speed serial communication, many parallel channels
- PCI Express bus (serial DMA)
- Direct memory access (DMA)

General compute Rack C
- 11 - PE2950 compute nodes

Product details - racks
- A * 1: 2 control nodes (+ 5 servers), GigE
- B * 1: 6 InfiniBand (overlay)
- C * 3: 11 each (33), GigE
- N * 1: 1 (Cisco Netlab + VoIP)
- Total compute nodes: 2 + 6 + 33 + 1 = 42

Rack layout
- C C B A C N F
- Future expansion - F
- KVM video: MVI_6994.AVI

Summary - Dell Server 2950
- Number of nodes: 40 + 1 (Linux head) + 1 (Windows head)
- Number of compute nodes: 40
- Intel Xeon Woodcrest 2.66 GHz, two dual core processors
- GigE NICs - 3 off per server
- RAM 8 GB, 2 GB per core
- Disks 250 GB SATA II

Summary - cluster speedup
- Compare the time taken to complete a task
- Time on the cluster = 1 hour
- Time using a single CPU = 160 hours, or 160/24 = 6.6 days, approximately one week
- (40 nodes * 4 cores = 160 cores, so this corresponds to near-linear speedup)
- Facility available for use by companies, e.g. "Software City" startup companies

Data storage
- Master nodes connect via PERC5e to an MD1000 with 15 * 500 GB SATA drives
- Disk storage: 7.5 TB
- Linux: 7 disks; MS 2003 Server HPC: 8 disks
- MD1000: http://157.228.27.155/website/CLUSTER-GRID/Dell-docs2/

Power
- Total maximum load generated by the Dell cluster cabinets: 20,742 W (approximately 20.7 kW)
- Power and noise values determined using Dell's integrated system design tool

Web servers
- PE1950, height 1U, five servers
- Web services: domain controller, DNS, DHCP, etc.
- http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/

Access workstations
- Dell workstations (10 off), operating system WinXP Pro
- HD displays: LCD (4 off), size 32 inch, wall mounted
- Graphics: NVS285 - 8 * 2 GPUs; NVS440 - 2 * 4 GPUs
- Support for HDTV

Block diagram - University of Sunderland Cluster Computer (USCC)
Key points recovered from the diagram:
- Cluster gateway between the campus network and the Cisco 6509 switch: 2 * 720 supervisor engines, 720 Gbps aggregate bandwidth, 40 Gbps per slot, 4 * 48-port line cards (4 * 48 = 192 ports), 1 Gb copper Ethernet links
- Switch support for VPNs, QoS, MPLS, rate limiting, private VLANs, IPv4 and IPv6 routing in hardware
- Web servers: 5 * PE1950 (3 * Windows 2003 Server, 2 * Linux) providing online support for users
- Local access workstations: 10 * Dell PCs; data visualisation display area with 4 * 37 inch LCD flat screens
- GigE intranets: 3 * LANs (data, control/monitor, spare)
- Compute nodes: 2 CPUs per node, 2 cores per CPU, 4 cores per node; 42 nodes in total, 2 head nodes (Linux/Windows) and 40 compute nodes; RAM 8 GB per node
- Storage: NAS 7.5 TB (8 * 500 GB Linux, 7 * 500 GB Windows); distributed storage 40 * 250 GB SATA = 10 TB
- InfiniBand overlay: 6 nodes, 2 HCAs per node, 10 Gbps links, InfiniBand switch - 7000P

Movie: USCC MVI_6992.AVI

Cluster software
Compute node operating systems:
- Scientific Linux (based on Red Hat)
- MS Windows Server 2003 - high performance computing (HPC)

Scali Management
- Scali is used to control the cluster software: to manage high performance cluster computers, start and stop processes, upload data/code and schedule tasks
- Scali datasheet: http://www.scali.com/
- http://157.228.27.155/website/CLUSTER-GRID/Scali/

Other software
- Apache web services
- Tomcat, Java server side programming
- Compilers: C++, Java
- Servers: FTPD
- 3D modelling and animation: Blender, Autodesk 3DS Max
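As a small illustration of the "Tomcat, Java server side programming" item above, a hypothetical servlet is sketched below. The class name and its output are invented for this example and it is not part of the USCC software; it simply reports which node served the request and how many cores the JVM can see.

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical status servlet, deployable under Tomcat (web.xml mapping omitted).
    public class ClusterStatusServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");
            PrintWriter out = resp.getWriter();
            out.println("Node: " + java.net.InetAddress.getLocalHost().getHostName());
            out.println("Cores visible to the JVM: "
                    + Runtime.getRuntime().availableProcessors());
            out.println("Server time: " + new java.util.Date());
        }
    }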
Virtual computing

Virtual network security experiment - example
VMware appliances / components:
1. NAT router
2. WinXP SP2 - attacks FC5 across the network
3. Network hub - interconnection
4. Firewall - protection
5. Fedora Core 5 (FC5) - target system

Network security experiment (summary of the topology diagram)
- VMware host: XP Pro SP2, Eth0, NAT (VMnet8)
- SW2: Red eth1 (Ethernet 2) and FC5 Green eth0 (Ethernet)
- NAT firewall: forward port 80 from Red to FC5's IP
- Hub (VMnet4), Eth0
- FC5 loads the Apache (httpd) web server

Security experiment
- A total of 5 virtual networking devices using just one compute box
- Port scanning attack (Nessus)
- Intrusion detection (Snort)
- Tunnelling using SSH and PuTTY
- RAM required: 500K+ for each network component

Cisco Netlab
- Cisco Netlab provides remote access to network facilities for experimental purposes
- Netlab is installed in the network cabinet (rack N)
- Plus a VoIP demonstration system for teaching purposes

Network research
Current research: network planning

Network planning research
- Network model using OOD
- Hybrid parallel search algorithm based upon features of the parallel genetic algorithm (GA) and particle swarm optimisation (PSO)
- Ring of communicating processes

Network planning research (cont.)
- Web services
- Server side programs - JSP
- FTP daemon, URL objects, XML

Pan Reif solver
- Steve Turner, PhD student
- Based on Newton's method
- Thesis submission May 2008 - the first to use the USCC

UoS Cluster Computer (USCC) - hybrid GA
- Telecom network planning - DSL for ISPs
- DSL network plan - schematic diagram
- Numerical output from the GA optimiser - PON equipment
- Data visualisation - multidimensional data structure: location, time and service types

Demonstrations
1. IPTV
2. Java test program

Demonstration 1 - IPTV
- IP (internet protocol) television demonstration
- VideoLAN client - VLC
- Number of servers and clients - 10
- Standard definition video streams, 4 to 5 Mbps
- Multicasting, Class D addressing

IPTV (cont.)
- IGMP - Internet Group Management Protocol
- HD video streams 16 Mbps; HD uses only 1.6% of a 1 Gbps link
- Clips: Rudolf Nureyev dancing; Six Five Special (1957), Don Lang and the Frantic Five; new dance demonstration - the Bunny Hop

Demonstration 2 - Java test program
- Compute node processes: 40
- Workstation server: 1
- Communication via UDP
- Graphical display on the local server of data sent from the compute nodes
- Network configuration - star

Star network (diagram)
- Cluster demonstration program: Node 1, Node 2, ... Node 39, Node 40 connected to a central server node
- Star and ring configurations

Cluster configuration file - description of ipadd.txt
1               node id
192.168.1.50    hub server address
192.168.1.5     previous compute node
192.168.1.7     next compute node
192.168.1.51    hub 2 (spare)

Equation used to generate the test data:
double val = 100 * (0.5 + Math.exp(-t/tau) * 0.5 * Math.sin(theta));

Screenshot: hub server bar graph display

USCC configuration
- Single demo in a compute node
- All compute nodes: 40 * 5 = 200
- Workstations: 10
- Directories: 1 + 4 = 5 (top level + one per core)
- 20 * 200 = 2,000
- Ten demos: 10 * 2,000 = 20,000 directories to set up
- A Java program is used to configure the cluster

UoS Cluster Computer Inaugural Event
- Date: Thursday 24 April 2008
- Time: 5.30pm
- Venue: St Peter's Campus
- Three speakers (20 minutes each):
  John MacIntyre - UoS
  Robert Starmer - Cisco, San Jose
  TBA - Dell Computers

USCC Inaugural Event
- Attendance is free
- Anyone wishing to attend is asked to register beforehand to facilitate catering
- Contact via email: john.tindle@sunderland.ac.uk

The End
- Thank you for your attention
- Any questions?
- Slides and further information available at: http://157.228.27.155/website/CLUSTER-GRID/
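Appendix - an illustrative sketch of a compute node sender for Demonstration 2. It follows the description above (ipadd.txt supplying the node id and hub address, the decaying-sine test equation, and UDP transmission to the hub), but the class name, packet format, port number and timing constants are assumptions rather than the real demonstration code.

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    // Hypothetical compute node process: reads ipadd.txt, generates the test
    // signal and sends one reading per second to the hub server over UDP.
    public class DemoNodeSender {
        public static void main(String[] args) throws Exception {
            // ipadd.txt layout (per the slides): node id, hub server address,
            // previous compute node, next compute node, spare hub address.
            List<String> cfg = Files.readAllLines(Paths.get("ipadd.txt"));
            int nodeId = Integer.parseInt(cfg.get(0).trim());
            InetAddress hub = InetAddress.getByName(cfg.get(1).trim());

            double tau = 5.0;   // assumed decay constant
            try (DatagramSocket socket = new DatagramSocket()) {
                for (int t = 0; t < 60; t++) {
                    double theta = 2 * Math.PI * t / 10.0;   // assumed phase step
                    // Test signal from the slides, scaled to the range 0..100.
                    double val = 100 * (0.5 + Math.exp(-t / tau) * 0.5 * Math.sin(theta));
                    byte[] payload = (nodeId + " " + val).getBytes(StandardCharsets.UTF_8);
                    socket.send(new DatagramPacket(payload, payload.length, hub, 5000)); // placeholder port
                    Thread.sleep(1000);
                }
            }
        }
    }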