1. The Cluster Computing Project at John Jay College.
2. Computer Clusters for Curricular Improvements in Computer Networking and Parallel/Distributed Computing

1. Institutional Background - College and Dept.
  Four-year college; specialized liberal arts college; 12,500 students, including 1,300 graduate students.
  Undergraduate/Graduate Degrees: law and police science, criminal justice, forensic science, public management, security management, computer information systems, forensic computing (2004), Ph.D. in Criminal Justice.
  Mission: Advance the theory and practice of criminal justice and public through research and by providing the required trained work force.
  Computer networking and distributed computing: law enforcement and criminal justice agencies need effective distributed systems.
    Investigate abuse and misuse of computers and computer networks.
    Security of networked and distributed information systems.
    Data mining techniques in criminal investigations and criminal justice research.
    Highly available, high-throughput database systems for emergency response.
    Expertise in cryptography and cryptographic/security protocols to thwart and prosecute computer and cyber crime, and to protect the national information infrastructure and the privacy of individuals.
  High Performance Computing (modeling and simulation): standards and codes to protect the public from unsafe buildings and toxic substances; efficient and safe transportation systems.

2. Cluster Computing for curricular improvements in computer networking and parallel/distributed computing
  Immediate Goal: Ensure the Computer Information Systems major offers an up-to-date curriculum and state-of-the-art facilities for computer networking and parallel/distributed computing.
  Prerequisite: Develop departmental expertise and capabilities for research in high performance computing, networking, network security, and solving large-scale compute- and data-intensive problems.
  Long Term Goal: Develop a center of expertise in the Department to serve the College and the criminal justice communities, especially in the areas of databases and distributed computing. Database and network technologies are critical.
  Mechanism: NASA-developed research and technologies in high performance computing and computer networking (GISS, NAS at NASA Ames, JPL), especially in the area of cluster computing.

2. Why are computer networking and distributed computing important to us?
  The Computer is the Network
    1980s: networked file systems, e-mail, remote login, FTP.
    1990s: mature client/server applications (web browser/server, database applications).
    2000s: web information systems, distributed database applications, cluster computing (Google search engine), web services (XML, SOAP).
  Law Enforcement and Public Agencies
    Critical tools: distributed database systems, data mining in law enforcement systems, trained personnel.
    Networks and information systems are subject to increasing abuse and misuse.
    Network forensic techniques, high availability and fault tolerance.
  CIS Major at John Jay
    Core courses: basic computer science (programming fundamentals, algorithms and complexity, operating systems).
    Capstone courses: systems areas and applications of interest to criminal justice and law enforcement agencies (computer networking, database systems, distributed systems, security, and forensic analysis).
    Mathematics foundation: calculus, discrete mathematics and operations research.
    Students: 500 majors; 44% female; 30% African American, 32% Hispanic, 12% White, 12% Asian, 14% other.
  High performance computing for compute- and data-intensive problems (cluster computing for highly available, high-throughput database systems; computational clusters for forensic analysis, modeling and solving large-scale problems, and data mining in heterogeneous databases).

3. Professional Guidelines for CS Majors
  1991 ACM/IEEE - elective courses in computer networking.
  Computing Curricula 2001 - Net-Centric Computing: computer networking, web-based systems, network security.
  Rapid Development of Distributed Computing - grid technologies, web services (XML/SOAP), distributed information and database systems, information assurance and infrastructure security, cluster computing for high performance.
  Computer Networking - communication protocols at or below the transport level (TCP, IP, data link layer protocols, IPSec).
  Distributed Computing - protocols above the transport level (HTTP, SOAP, CORBA, grid technologies, SSL).
  Parallel Computing - a type of distributed computing for the solution of computationally intensive problems (Myrinet networks, MPI, PVM).

4. NASA Technologies and Initiatives in HPC and Computer Networking
  Beowulf Clusters - high capability, high capacity computing at minimal cost.
    For earth and space science applications: NASA Goddard Space Flight Center, NASA High Performance Computing and Communications program. Thomas Sterling and Don Becker.
    1995, Hrothgar: 240 Mflops, sixteen Pentiums (100 MHz), 64 Mbytes per processor, Fast Ethernet.
    Computer clusters now used throughout NASA (Ames, JPL, Glenn).
    Spawned an entire industry, with vendors such as Scyld and others.
    512-processor SGI Origin 2000 systems at NASA Ames, single system image.
    Article by Sterling in Linux Magazine, July 2003.
  Grid Technologies - NASA Information Power Grid.
    Largest application of the Globus toolkit; links supercomputers, mass storage devices, and large clusters of computers at three NASA centers.
    Information Power Grid tutorial in February helped us understand what grid computing is and its current state.
    Launch Pad to hide the complexities of the Grid.
  Global Climate Modeling - uses high performance computers and software.
    OpenMP on clusters.
    Legacy codes that require loop parallelization.
    Simulate shared memory on a distributed memory cluster.
    Performance.
    Multithreaded loop-level parallelization.
  Research in Commodity Clusters - Portable Batch System for scheduling jobs in a cluster. New high-speed cluster interconnects (InfiniBand).
  Open Source Initiative within NASA. Relied heavily on open source software (GNU General Public License). Palmisano (IBM) comments at LinuxWorld 2001 in NYC.
  NAS Parallel Benchmarks.

5. What do Clusters offer us?
  Supercomputer performance for large-scale computation (e.g., CFD, molecular modeling (Gaussian), modeling and simulation, computer security - computational number theory).
  High throughput and high availability - criminal justice databases.
  Laboratory for Computer Science: use the latest developments in OS (scheduling, file systems, I/O, administration), high performance computer networking (InfiniBand) and parallel/distributed computing (MPI, OpenMP). Rely on developments in discrete mathematics (e.g., graph partitioning for load balancing).

6. JJ Computer Cluster (Hardware)
  Nodes:
    Twelve (12) dual-processor computation nodes.
    One management node (world node).
    Six dual-processor 933 MHz Pentium III computers.
    Six dual-processor 1.8 GHz Xeon computers.
    One dual-processor 933 MHz Pentium III.
    Vendor-supplied blade solutions vs. home-built PCs.
  Interconnect:
    Commodity: 3Com Gigabit Ethernet switch (12 ports + 4-port module) (24-port Fast Ethernet).
    High-performance interconnects: Myrinet (245 Mbytes/s), SCI (no switch, 300 Mbytes/s), QsNet (340 Mbytes/s, direct RMA access).
    Multiple interconnects: message passing (e.g., Myrinet), file systems (Fibre Channel), management (Ethernet, TCP/IP).
    High performance: 50% or more of the cost of a node.
  Network Infrastructure: world node + cluster nodes; world node used for access and launching jobs. Router directs to the world node only.

7. JJ Computer Clusters (system and middleware)
  Turnkey Solution vs. Build It Yourself: NPACI Rocks, OSCAR from the Open Cluster Group, or Scyld Computing (SSI, other benefits), Portland Group Cluster Development Kit.
  File System: multiple computers share a single view of storage. NFS, GFS (Sistina Software; required Fibre Channel, Gigabit Ethernet available now), PVFS (parallel file system).
  Operating system: Linux. Configuration issues:
    Trusted login (rsh, rexec, rlogin - xinetd, rhosts).
    Host naming information (/etc/hosts).
    Shared file system (NFS: exporting on the NFS server and mounting on clients).
    Node generation (Red Hat Kickstart) and maintenance (version skew).
    X Window (security for MPE).
    Portable Batch System.
  Cluster Middleware: Message Passing Interface (MPICH - configuration, machines file), MPE for visualization, cluster tools, testing with the MPICH-supplied utility mpptest.
  Monitoring and Testing utilities: Netperf (TCP/IP performance), LMbench (file system performance), NetPIPE (MPI).
  Compilers: GNU Fortran and C, Portland Group parallel Fortran compilers.
  Applications: Intel BLAS, BLACS, PBLAS, ScaLAPACK (linear algebra library for distributed memory machines: large dense systems of equations, singular value decompositions). Test it with the BLACS; the BLACS can find errors. High Performance Linpack benchmark.
  Present results here:
    Fast Ethernet: *****
    Gigabit Ethernet: 12 nodes + world node (no computation), 1 CPU, 7.95 Gflops, block size 32 or 64, 3 by 4 processor array, 6000 by 6000 matrix, 18 seconds.
    Top 500 Supercomputers: 134 Gflops.
    Residual check: $\|Ax - b\|_\infty / (\epsilon \, \|A\|_\infty \, \|x\|_\infty) = 0.0025836$, with $\epsilon = 1.1 \times 10^{-16}$.
    933 MHz: 700 Mflops per CPU.
    According to Linpack performance data (www.netlib.org): Pentium III, 256 KB cache, 32-bit bus, n = 1000: 550 Mflops; 933 Mflops theoretical peak.

8. Conclusion: Clusters - Much needs to be done.
  Nodes: channel architecture (InfiniBand), for now PCI-X bus (64-bit, 133 MHz); power consumption excessive (5 Watt Transmeta).
  Interconnect: commodity open standard (InfiniBand); direct memory access; supports messaging, file transfers and management; requires software development; automated diagnostic tools!
  OS: SSI (single process space, single point of control and management), single entry point, single job launch point, etc.
  General Characteristics:
    Management - monitoring and diagnosis; installation, configuration and update.
    Middleware - effective OpenMP for clusters.
    Application Software and Libraries - easier to use and deploy.
  Cluster Computing White Paper, Ed. Mark Baker, 2002, University of Portsmouth, UK, http://www.dsg.port.ac.uk/mab/tfcc/WhitePaper/final-paper.pdf

8. JJ Database Cluster
  4 nodes: Microsoft SQL Server, Oracle 9i Server (Linux), MS Remote Access server, and a Web server.
  Web-based information systems (ASP, SQL).
  FBI NIBRS: flat file (1.5 Gbytes of data per year), converted to a relational database (records ***), web interface, student projects and criminal justice research.
    Continue to add each year's data to the system (1999 and 2000 available).
    Make it available nationwide to law enforcement and criminal justice agencies.
    Use cluster technologies for high availability and fault tolerance.
  Four students work on this.

9. Distributed Computing Lab (Linux Lab), Joint Venture with Science Dept.
  Open source software tools for networking (Ethereal - packet analyzer, NTOP - network resource utilization, POSIX sockets).
  Lab for distributed applications (Java SDK development, Perl implementation of the Bellman-Ford algorithm).
  Lab for operating systems courses (view kernel software, task management utilities to understand how systems work, POSIX threads, systems software).
  Lab for computer/network security (intrusion detection - SNORT, NetSaint).
  Mathematical software (Maple 8, Matlab).
  Hard (Linux expertise) but easy (much is available for free).

10. NASA CIPA Network Infrastructure
  PUBSSH - remote access to the environment. Load balancing, SSH.
  Management Utilities: automated system backup, automated node generation, problem reporting system.
  Web sites for administration (PHP, MySQL): literature reference database, expense database (expenses and inventory), information on the NASA/CIPA project and CIS program (web.math.jjay.cuny.edu).

11. Research, Partnerships and Interdisciplinary Work
  Provide a computational resource for researchers at the College and in the law enforcement and criminal justice communities. Facilitate quantitative research at John Jay and meet the growing demand for computer expertise at the College.
  Science Dept. - Molecular modeling with the Science Dept.; the area is toxicology. Solving the Schrödinger equation (Gaussian, etc.).
  Fire Science Institute - Modeling smoke propagation in buildings: reaction-diffusion-convection models (stiff ODEs and large linear systems).
  Queens College - Statistical analysis of data and information-theoretic models (singular value decomposition of large matrices). SVD and principal component analysis of high-dimensional data, genome studies, image identification, Google page index. Data mining.
  Mathematics Dept. - Large-scale database systems (NIBRS database, web interfaces, database cluster); clusters for high availability and fault tolerance.
  Mathematics Dept. - Distributed systems security (NASA CIPA network infrastructure, secure web services environment, large number library for cryptographic studies - Miller-Rabin algorithm for primality testing).
  Mathematics Dept. - Parallel programming and parallel algorithms; real Schur form of a matrix used in solving ODEs in flight control (Boeing contract).

12. Educational Benefits (CIS major and new M.S. program in Forensic Computing)
  Computing Facilities, Faculty Expertise and Curriculum Development
  Curriculum: Added or revised courses and curricula in operating systems, computer networking, database systems, and parallel computing. Also added systems analysis, computer graphics, computer network security.
    Revise mathematics requirements: discrete mathematics (graph theory, graph partitioning, shortest path algorithms, cryptography), operations research (queuing theory).
    Developed M.S. major: secure operating systems, computer networking and security of distributed systems.
  New Laboratories: exposure to the Linux environment; tools for network and distributed computing; access to computational and database clusters; remote access from off campus using SSH.
  Faculty Expertise: new faculty hired in computer networking and distributed computing. Faculty training and Cluster Computing Colloquium Series.
  Student Opportunities: 15 students work on the program. All are attending graduate programs in CS and working in the field. Over 400 CIS majors have benefited from NASA CIPA facilities (esp. the enterprise-level database system).

13. Summary
  Faculty and Staff Development:
  Facilities Development:
  Curriculum and Students:
  Overall Effect on the Institution:
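
Worked check of the High Performance Linpack figures reported in section 7 (6000 by 6000 matrix, 18 seconds, 7.95 Gflops, 3 by 4 processor array). The sketch below recomputes the rate from the problem size and run time. It assumes the conventional operation count for a dense LU solve, 2/3 n^3 + 2 n^2 (implementations differ slightly in the lower-order term), and 8-byte double-precision matrix entries. The function names are illustrative only and are not part of HPL.

```python
def lu_solve_flops(n: int) -> float:
    """Approximate flop count for solving a dense n x n system by LU.

    Assumption: the conventional Linpack count 2/3 n^3 + 2 n^2.
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Sustained rate in Gflops for an n x n solve that took `seconds`."""
    return lu_solve_flops(n) / seconds / 1e9


def matrix_megabytes(n: int, bytes_per_entry: int = 8) -> float:
    """Storage for the n x n coefficient matrix, in units of 10^6 bytes."""
    return n * n * bytes_per_entry / 1e6


if __name__ == "__main__":
    n = 6000          # matrix order reported in section 7
    seconds = 18.0    # reported run time (assumed to be rounded)
    processes = 12    # 3 by 4 processor array

    rate = gflops(n, seconds)
    print(f"operation count: {lu_solve_flops(n):.3e} flops")
    print(f"aggregate rate:  {rate:.2f} Gflops")
    print(f"per process:     {1000.0 * rate / processes:.0f} Mflops")
    print(f"matrix storage:  {matrix_megabytes(n):.0f} MB in total")
```

With the rounded 18-second run time this gives about 8.0 Gflops in aggregate, roughly 667 Mflops per process, and about 288 MB of matrix storage in total. The small gap from the reported 7.95 Gflops is consistent with the run time having been rounded to whole seconds.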