1 - CUNY.edu

1. The Cluster Computing Project at John Jay College.
2. Computer Clusters for Curricular Improvements in Computer Networking and
Parallel/Distributed Computing
1. Institutional Background – College and Dept.
Four-year specialized liberal arts college; 12,500 students, including 1,300
graduate students.
Undergraduate/Graduate Degrees: law and police science, criminal justice, forensic
science, public management, security management, computer information systems,
forensic computing (2004), Ph.D. in Criminal Justice.
Mission: Advance the theory and practice of criminal justice and public through research
and by providing the required trained work force.
Computer networking and distributed computing:
Law enforcement and criminal justice agencies need effective distributed systems
Investigate abuse and misuse of computers and computer networks
Security of networked and distributed information systems
Data mining techniques in criminal investigations and criminal justice research
Highly available, high-throughput database systems for emergency response
Expertise in cryptography and cryptographic/security protocols to thwart and prosecute
computer and cyber crime
Protect the national information infrastructure and the privacy of individuals
High Performance Computing (modeling and simulation):
Standards and codes to protect the public from unsafe buildings and toxic substances
Efficient and safe transportation systems
2. Cluster Computing for curricular improvements in computer networking and
parallel/distributed computing
Immediate Goal: Ensure the Computer Information Systems (CIS) major offers an up-to-date
curriculum and state-of-the-art facilities for computer networking and parallel/distributed
computing.
Prerequisite: Develop departmental expertise and capabilities for research in high
performance computing, networking, network security, and solving large-scale compute-
and data-intensive problems.
Long-Term Goal: Develop a center of expertise in the Department to serve the College
and the criminal justice community, especially in the areas of databases and distributed
computing. Database and network technologies are critical.
Mechanism: NASA-developed research and technologies in high performance computing
and computer networking (GISS, NAS at NASA Ames, JPL), especially in the area of
cluster computing.
2. Why are computer networking and distributed computing important to us?
The Computer is the Network
80’s networked file systems, e-mail, remote login, FTP.
90’s mature client/server applications: web browser/server, database applications.
00’s web information systems, distributed database applications, cluster
computing (Google Search Engine), web services (XML, SOAP)
Law Enforcement and Public Agencies
Critical tools – distributed database systems, data mining in law enforcement
systems, trained personnel
Networks and information systems are the subject of increasing abuse and misuse.
Network forensic techniques, high availability, and fault tolerance.
CIS Major at John Jay
Core courses: Basic computer science (programming fundamentals, algorithms
and complexity, operating systems).
Capstone courses: Systems areas and applications of interest to criminal justice
and law enforcement agencies (computer networking, database systems,
distributed systems, security, and forensic analysis).
Mathematics foundation: Calculus, Discrete Mathematics and Operations
Research.
Students: 500 majors; 44% female; 30% African American, 32% Hispanic, 12%
White, 12% Asian, 14% other.
High performance computing for compute- and data-intensive problems (cluster
computing for highly available, high-throughput database systems; computational
clusters for forensic analysis, modeling, and solving large-scale problems; data
mining in heterogeneous databases).
3. Professional Guidelines for CS Majors:
1991 ACM/IEEE - elective courses in computer networking
Computing Curricula 2001 - Net-Centric Computing: computer networking, web-based systems, network security.
Rapid development of distributed computing - grid technologies, web services
(XML/SOAP), distributed information and database systems, information
assurance and infrastructure security, cluster computing for high performance.
Computer Networking – Communication protocols at or below the transport
layer (TCP, IP, data link layer protocols, IPSec)
Distributed Computing – Protocols above the transport layer (HTTP, SOAP,
CORBA, Grid technologies, SSL)
Parallel Computing – A type of distributed computing for solving
computationally intensive problems (Myrinet networks, MPI, PVM).
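To make the split between transport-level and higher-level protocols concrete, here is a minimal Python sketch (not part of the original course material) in which an HTTP-style request, an application-level protocol, travels as plain bytes over a TCP connection on the loopback interface. The request path and the response text are hypothetical; only standard-library socket calls are used.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one TCP connection and answer a single HTTP-style request."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        first_line = request.split(b"\r\n", 1)[0]        # e.g. b"GET /nibrs HTTP/1.0"
        body = b"you asked for " + first_line.split(b" ")[1]
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: "
                     + str(len(body)).encode() + b"\r\n\r\n" + body)
    # Leaving the with-block closes the connection, which tells the client the reply is complete.

def fetch(path: str) -> bytes:
    # Transport level: TCP gives a reliable byte stream between two endpoints.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Application level: the HTTP request is just text carried over that stream.
        client.sendall(f"GET {path} HTTP/1.0\r\nHost: localhost\r\n\r\n".encode())
        reply = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:             # server closed the connection
                break
            reply += chunk
    server.join()
    listener.close()
    return reply

if __name__ == "__main__":
    print(fetch("/nibrs").decode())
```

TCP neither knows nor cares that the bytes form an HTTP request; the same stream could carry SOAP messages or an MPI library's own framing.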
4. NASA Technologies and Initiatives in HPC and Computer Networking
Beowulf Clusters – High capability, high capacity computing at minimal cost
For earth and space science applications: NASA Goddard Space Flight Center,
NASA High Performance Computing and Communications program; Thomas
Sterling and Don Becker. 1995: Hrothgar, 240 Mflops, sixteen 100 MHz Pentiums,
64 Mbytes per processor, Fast Ethernet. Computer clusters are now used
throughout NASA (Ames, JPL, Glenn) and spawned an entire industry with
vendors such as Scyld. 512-processor SGI Origin 2000 systems at NASA
Ames, single system image. Article by Sterling in Linux Magazine, July 2003.
Grid Technologies – NASA Information Power Grid: the largest application of the
Globus Toolkit, linking supercomputers, mass storage devices, and large clusters of
computers at three NASA centers. An Information Power Grid tutorial in February
helped us understand what grid computing is and its current state. Launch Pad
hides the complexities of the Grid.
Global Climate Modeling – Uses high performance computers and software.
OpenMP on clusters: legacy codes that require loop parallelization, simulating
shared memory on a distributed-memory cluster, performance, multithreaded
loop-level parallelization.
Research in Commodity Clusters – Portable Batch System for scheduling jobs in a
cluster. New high speed cluster interconnects (Infiniband).
Open Source Initiative within NASA: relied heavily on open source software
(GNU General Public License). Palmisano (IBM) comments at LinuxWorld 2001 in
NYC.
NAS Parallel Benchmarks.
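Loop-level parallelization, as OpenMP applies it to legacy codes (for example with `#pragma omp parallel for schedule(static)` in C), hands the iterations of a loop that has no cross-iteration dependences to different threads. The Python sketch below imitates a static schedule with a thread pool. It illustrates the decomposition only: the loop body is hypothetical, and CPython threads will not actually speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def update_cell(i: int) -> float:
    # Hypothetical loop body: iteration i depends only on i (no loop-carried
    # dependence), so iterations may run in any order on any thread.
    return 0.5 * i * i + 1.0

def parallel_loop(n: int, num_threads: int = 4) -> list[float]:
    """Static schedule: thread t runs one contiguous block of iterations."""
    out = [0.0] * n
    chunk = (n + num_threads - 1) // num_threads   # ceiling division

    def run_block(t: int) -> None:
        start, stop = t * chunk, min(n, (t + 1) * chunk)
        for i in range(start, stop):
            out[i] = update_cell(i)   # each thread writes a disjoint slice of out

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every block and re-raises any worker exception.
        list(pool.map(run_block, range(num_threads)))
    return out

if __name__ == "__main__":
    print(parallel_loop(8, 3))
```

The parallel result must match the serial loop exactly; that check is the basic correctness test for any loop that is parallelized this way.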
5. What do Clusters offer us?
Supercomputer performance for large-scale computation (e.g., CFD, molecular
modeling (Gaussian), modeling and simulation, computer security – computational
number theory)
Clusters offer high throughput and high availability – criminal justice databases.
Laboratory for Computer Science: Use the latest developments in OS (scheduling,
file system, I/O, administration), high performance computer networking
(InfiniBand) and parallel/distributed computing (MPI, OpenMP).
Rely on developments in discrete mathematics (e.g., graph partitioning for load
balancing.)
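Graph partitioning for load balancing assigns the vertices of a computational mesh to processors so that each processor gets about the same work while few edges (that is, little communication) cross between partitions. Production codes would use a dedicated partitioner such as METIS; the sketch below instead uses a simple breadth-first greedy heuristic on a hypothetical 2D grid mesh, then reports the edge cut.

```python
from collections import deque

def grid_graph(rows: int, cols: int) -> dict[int, list[int]]:
    """Adjacency lists for a rows x cols grid mesh (4-neighbour stencil)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(rr * cols + cc)
            adj[r * cols + c] = nbrs
    return adj

def bfs_partition(adj: dict[int, list[int]], k: int) -> dict[int, int]:
    """Greedy heuristic: list vertices in breadth-first order, then cut that
    order into k contiguous pieces of (nearly) equal size. BFS order keeps
    each piece spatially compact, which tends to keep the edge cut small."""
    order, seen = [], set()
    for start in adj:                 # outer loop also covers disconnected graphs
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    n = len(order)
    return {v: (i * k) // n for i, v in enumerate(order)}

def edge_cut(adj: dict[int, list[int]], part: dict[int, int]) -> int:
    """Edges whose endpoints lie in different parts (a proxy for communication)."""
    return sum(1 for v in adj for w in adj[v] if v < w and part[v] != part[w])

if __name__ == "__main__":
    mesh = grid_graph(8, 8)
    parts = bfs_partition(mesh, 4)
    print("edge cut:", edge_cut(mesh, parts))
```

Balance is guaranteed by construction here; cut quality is where real partitioners (multilevel methods, spectral bisection) do much better than this heuristic.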
6. JJ Computer Cluster (Hardware)
Nodes: Twelve (12) dual-processor computation nodes and one management node (world node)
Computation: six dual-processor 933 MHz Pentium III computers and
six dual-processor 1.8 GHz Xeon computers
Management: one dual-processor 933 MHz Pentium III
Vendor-supplied blade solutions vs. home-built PCs
Interconnect (commodity):
3Com Gigabit Ethernet switch (12 ports + 4-port module)
(24-port Fast Ethernet)
Interconnect (high performance):
Myrinet (245 Mbytes/s)
SCI (no switch, 300 Mbytes/s)
QsNet (340 Mbytes/s, remote memory access)
Multiple interconnects: message passing (e.g., Myrinet),
file systems (Fibre Channel), management (Ethernet, TCP/IP)
High-performance interconnect cost: 50% or more of the cost of a node
Network Infrastructure: world node + cluster nodes; users access the cluster and launch
jobs from the world node. The router directs external traffic to the world node only.
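A common first-order (Hockney-style) model for a point-to-point message is time = latency + size / bandwidth. The sketch below plugs in the peak bandwidths listed for Myrinet, SCI, and QsNet. It assumes 1 Mbyte = 10^6 bytes, and the latency defaults to zero because no latencies are given here; measured values (for example from NetPIPE) should be passed in for a realistic estimate.

```python
# Peak bandwidths quoted on the hardware slide, in bytes per second
# (assumption: 1 Mbyte = 10**6 bytes).
BANDWIDTH_BYTES_PER_S = {
    "Myrinet": 245e6,
    "SCI": 300e6,
    "QsNet": 340e6,
}

def transfer_time(size_bytes: float, bandwidth: float, latency_s: float = 0.0) -> float:
    """Startup latency plus size over bandwidth, in seconds.
    latency_s defaults to 0.0 because no latency figures are available."""
    return latency_s + size_bytes / bandwidth

if __name__ == "__main__":
    one_mb = 1_000_000
    for name, bw in BANDWIDTH_BYTES_PER_S.items():
        print(f"{name}: {transfer_time(one_mb, bw) * 1e3:.2f} ms per 1 MB message")
```

For large messages the bandwidth term dominates; for the many small messages typical of tightly coupled MPI codes, latency is usually the more important number.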
7. JJ Computer Clusters (System and Middleware)
Turnkey solution vs. build it yourself: NPACI Rocks, OSCAR from the Open Cluster
Group, Scyld Computing (single system image (SSI) and other benefits), or the Portland
Group Cluster Development Kit.
File system: multiple computers share a single view of storage. NFS; GFS (Sistina
Software; originally required Fibre Channel, Gigabit Ethernet now available); PVFS
(Parallel Virtual File System).
Operating system: Linux. Configuration issues: trusted login (rsh, rexec, rlogin via
xinetd, .rhosts), host naming information (/etc/hosts), shared file system (NFS: exporting
on the NFS server and mounting on clients), node generation (Red Hat Kickstart) and
maintenance (version skew), X Window System (security for MPE), Portable Batch System.
Cluster middleware: Message Passing Interface (MPICH: configuration, machines file),
MPE for visualization, cluster tools, testing with MPICH-supplied utilities such as MPPTEST.
Monitoring and Testing utilities: NetPerf (TCP/IP performance), LMBench (File System
Performance), NetPipe (MPI,
Compilers: GNU Fortran and C, Portland Group Parallel Fortran Compilers.
Applications: Intel BLAS, BLACS, PBLAS, ScaLAPACK (linear algebra library for
distributed-memory machines: large dense systems of equations, singular value
decompositions).
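ScaLAPACK (and the High Performance Linpack run on this cluster) spreads a dense matrix over a P by Q process grid using a two-dimensional block-cyclic layout. The sketch below computes which process owns a global entry, where that entry sits in the owner's local array, and how many rows or columns each process stores. It assumes zero-based indices, square NB by NB blocks, and the first block on process (0, 0); `local_extent` follows the same logic as ScaLAPACK's NUMROC routine. The example values mirror the 3 by 4 grid, block size 64, and 6000 by 6000 matrix of the Linpack run.

```python
def owner(i: int, j: int, nb: int, p: int, q: int) -> tuple[int, int]:
    """Process-grid coordinates (prow, pcol) that own global entry (i, j).
    Block row i // nb is dealt to process row (i // nb) mod p, and block
    column j // nb to process column (j // nb) mod q."""
    return (i // nb) % p, (j // nb) % q

def local_index(g: int, nb: int, nprocs: int) -> int:
    """Position of global index g in its owner's local array (one dimension).
    An owner stores every nprocs-th block, packed contiguously."""
    block = g // nb
    return (block // nprocs) * nb + g % nb

def local_extent(n: int, nb: int, iproc: int, nprocs: int) -> int:
    """Rows (or columns) of an n-long dimension stored on process iproc."""
    nblocks = n // nb                       # number of full blocks
    count = (nblocks // nprocs) * nb        # every process gets this many
    extra = nblocks % nprocs                # leftover full blocks
    if iproc < extra:
        count += nb                         # one extra full block
    elif iproc == extra:
        count += n % nb                     # the trailing partial block
    return count

if __name__ == "__main__":
    print([local_extent(6000, 64, r, 3) for r in range(3)])   # rows per process row
    print([local_extent(6000, 64, c, 4) for c in range(4)])   # columns per process column
```

The cyclic dealing of blocks is what keeps the LU factorization load-balanced: as the active trailing submatrix shrinks, every process still owns a share of it.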
Test it with the BLACS (the BLACS tests can find errors) and the High Performance Linpack
(HPL) benchmark. Results: Fast Ethernet: *****. Gigabit Ethernet, 12 nodes + world node
(no computation), 1 CPU per node: 7.95 Gflops (block size 32 or 64, 3 by 4 process grid,
6000 by 6000 matrix, 18 seconds). Top 500 Supercomputers: 134 Gflops.
Scaled residual: $\|Ax - b\|_\infty / (\varepsilon \, \|A\|_\infty \, \|x\|_\infty) = 0.0025836$. 933 MHz: 700 Mflops per CPU.
According to the Linpack performance report at www.netlib.org, a Pentium III (256 KB cache,
32-bit bus) reaches 550 Mflops for n = 1000, against a theoretical peak of 933 Mflops;
$\varepsilon = 1.1 \times 10^{-16}$.
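Two of the numbers above can be cross-checked. HPL credits a run of order n with 2/3 n^3 + 3/2 n^2 floating-point operations, so n = 6000 in 18 seconds is about 8.0 Gflops, consistent with the reported 7.95 Gflops (which corresponds to roughly 18.1 s). The sketch below does that arithmetic and, on a small hypothetical 3 by 3 system, computes the same scaled residual using Gaussian elimination with partial pivoting. HPL itself uses a blocked, distributed LU factorization, so this is only a serial illustration of the check, not of HPL's algorithm.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflop/s as HPL reports it: (2/3 n^3 + 3/2 n^2) operations / wall time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9

def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting (serial, unblocked)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]        # augmented copy [A | b]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))  # largest pivot in column k
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def scaled_residual(a: list[list[float]], x: list[float], b: list[float],
                    eps: float = 2.0**-53) -> float:
    """||Ax - b||_inf / (eps * ||A||_inf * ||x||_inf), with eps = 2^-53 ~ 1.1e-16."""
    r = max(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi) for row, bi in zip(a, b))
    norm_a = max(sum(abs(v) for v in row) for row in a)   # infinity norm: max row sum
    norm_x = max(abs(v) for v in x)
    return r / (eps * norm_a * norm_x)

if __name__ == "__main__":
    print(f"{hpl_gflops(6000, 18.0):.3f} Gflops")
    A = [[4.0, -2.0, 1.0], [-2.0, 4.0, -2.0], [1.0, -2.0, 4.0]]
    b = [11.0, -16.0, 17.0]
    x = solve(A, b)
    print(x, scaled_residual(A, x, b))
```

A run is usually considered correct when this scaled residual is of order 1; the sample input file distributed with HPL uses a pass threshold of 16, which the reported 0.0025836 is well below.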
8. Conclusion: Clusters – Much needs to be done.
Nodes: channel architecture (InfiniBand); for now, the PCI-X bus (64-bit, 133 MHz); power
consumption is excessive (cf. the 5-watt Transmeta).
Interconnect: commodity open standard (InfiniBand); direct memory access; supports
messaging, file transfers, and management; requires software development and automated
diagnostic tools!
OS: SSI (single process space, single point of control and management),
single entry point, single job launch point, etc.
General Characteristics:
Management: monitoring and diagnosis; installation, configuration, and update.
Middleware: effective OpenMP for clusters.
Application software and libraries: easier to use and deploy.
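For scale, the theoretical peak of the PCI-X bus listed under Nodes follows from its width and clock rate (protocol overhead ignored). It comes to roughly three to four times the 245 to 340 Mbytes/s quoted for Myrinet, SCI, and QsNet on the hardware slide:

```latex
B_{\text{PCI-X}} = \frac{64\ \text{bit}}{8\ \text{bit/byte}} \times 133 \times 10^{6}\ \text{s}^{-1}
\approx 1.06 \times 10^{9}\ \text{byte/s}
```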
Cluster Computing White Paper, ed. Mark Baker, 2002, University of Portsmouth, UK,
http://www.dsg.port.ac.uk/mab/tfcc/WhitePaper/final-paper.pdf
8. JJ Database Cluster
4 Nodes: Microsoft SQL Server, Oracle 9i Server (Linux), MS Remote Access
Server, and a web server.
Web-based information systems (ASP, SQL).
FBI NIBRS (National Incident-Based Reporting System): flat files (1.5 Gbytes of data
per year) converted to a relational database (records ***) with a web interface, used
for student projects and criminal justice research.
Continue to add each year's data to the system (1999 and 2000 are available), make
it available nationwide to law enforcement and criminal justice agencies, and use
cluster technologies for high availability and fault tolerance. Four students work
on this project.
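The flat-file-to-relational conversion can be sketched with Python's built-in `sqlite3` module. The fixed-width layout below (column names, offsets, and widths) is hypothetical and is not the actual NIBRS record specification; a real loader would follow the FBI's published segment layouts and would target the SQL Server or Oracle instances on the database cluster rather than SQLite.

```python
import sqlite3

# Hypothetical fixed-width layout: (column name, start offset, width).
LAYOUT = [
    ("state", 0, 2),
    ("agency_id", 2, 7),
    ("incident_id", 9, 12),
    ("incident_date", 21, 8),   # YYYYMMDD
]

def parse_record(line: str) -> tuple[str, ...]:
    """Slice one fixed-width line into trimmed field values."""
    return tuple(line[start:start + width].strip() for _, start, width in LAYOUT)

def load(lines, conn: sqlite3.Connection) -> int:
    """Create the table if needed, bulk-insert all non-blank records,
    and return the number of rows inserted."""
    col_defs = ", ".join(f"{name} TEXT" for name, _, _ in LAYOUT)
    conn.execute(f"CREATE TABLE IF NOT EXISTS incident ({col_defs})")
    cols = ", ".join(name for name, _, _ in LAYOUT)
    placeholders = ", ".join("?" for _ in LAYOUT)
    rows = [parse_record(line) for line in lines if line.strip()]
    conn.executemany(f"INSERT INTO incident ({cols}) VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    sample = ["NY" + "NY03030" + "000000000001" + "20000115"]
    print(load(sample, db), db.execute("SELECT * FROM incident").fetchall())
```

Using parameterized `executemany` rather than string-built SQL keeps the load fast and avoids quoting problems in the raw data; indexes on the query columns would be added after the bulk load.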
9. Distributed Computing Lab (Linux Lab), Joint Venture with Science Dept.
Open source software tools for networking (Ethereal: packet analyzer; NTOP:
network resource utilization; POSIX sockets)
Lab for distributed applications (Java SDK development, Perl implementation
of the Bellman-Ford algorithm)
Lab for Operating Systems Courses (View kernel software, Task Management
Utilities to understand how systems work, POSIX Threads, systems software)
Lab for Computer/Network Security (Intrusion Detection – SNORT, NetSaint)
Mathematical Software (Maple 8, Matlab)
Hard (requires Linux expertise) but easy (much is available for free).
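The lab's Bellman-Ford exercise is the same computation that distance-vector routing protocols such as RIP carry out in distributed form. A compact Python version follows; it is a sketch, not the lab's Perl code, and the example topology is made up. It relaxes every edge up to |V| - 1 times and then checks for negative cycles.

```python
import math

def bellman_ford(n: int, edges: list[tuple[int, int, float]], source: int):
    """Single-source shortest paths on nodes 0..n-1 with edges (u, v, weight).
    Returns (dist, pred); raises ValueError if a negative cycle is reachable."""
    dist = [math.inf] * n
    pred = [None] * n
    dist[source] = 0.0
    for _ in range(n - 1):              # a shortest path uses at most n-1 edges
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:   # relax edge (u, v)
                dist[v] = dist[u] + w
                pred[v] = u
                changed = True
        if not changed:                 # distances are stable: stop early
            break
    for u, v, w in edges:               # any further improvement means a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist, pred

if __name__ == "__main__":
    links = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
    print(bellman_ford(4, links, 0))
```

In the routing version, each router holds only its own row of `dist` and exchanges it with neighbors; the relaxation step is the same.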
10. NASA CIPA Network Infrastructure
PUBSSH – remote access to the environment over SSH, with load balancing.
Management Utilities: Automated system backup, automated node generation,
problem reporting system.
Web sites for administration (PHP, MySQL): Literature Reference Database,
Expense Database (expenses and inventory), Information on NASA/CIPA project
and CIS program (web.math.jjay.cuny.edu)
11. Research, Partnerships and Interdisciplinary Work - Provide a computational
resource for researchers at the College and in the law enforcement and criminal
justice communities. Facilitate quantitative research at John Jay and meet the growing
demand for computer expertise at the College.
Science Dept. – Molecular modeling in toxicology: solving the Schrödinger equation
(Gaussian, etc.)
Fire Science Institute – Modeling smoke propagation in buildings: reaction-diffusion-
convection models (stiff ODEs and large linear systems)
Queens College - Statistical analysis of data and information-theoretic models
(singular value decomposition of large matrices): SVD and principal component
analysis of high-dimensional data, genome studies, image identification, Google
page indexing, data mining.
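Large-scale SVD work relies on libraries such as ScaLAPACK, but the idea behind computing the dominant singular value (the quantity behind the first principal component) can be sketched with power iteration on A^T A; power iteration is also the basic method used to compute PageRank. This pure-Python sketch assumes a dense matrix stored as lists and a gap between the top two singular values; it finds only the largest singular value and is not how a full SVD is computed.

```python
import math
import random

def top_singular_value(a: list[list[float]], iters: int = 1000, tol: float = 1e-13) -> float:
    """Estimate sigma_1(A) by power iteration on A^T A.
    For a unit vector v, ||A v|| approaches sigma_1 as v converges to the top
    right singular vector."""
    n = len(a[0])
    rng = random.Random(0)                     # fixed seed: reproducible start vector
    v = [rng.random() + 0.1 for _ in range(n)]
    norm_v = math.sqrt(sum(t * t for t in v))
    v = [t / norm_v for t in v]
    sigma = 0.0
    for _ in range(iters):
        av = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]        # A v
        new_sigma = math.sqrt(sum(t * t for t in av))                        # ||A v||
        y = [sum(row[j] * t for row, t in zip(a, av)) for j in range(n)]    # A^T (A v)
        norm_y = math.sqrt(sum(t * t for t in y))
        if norm_y == 0.0:                      # A v = 0: start vector in the null space
            return new_sigma
        v = [t / norm_y for t in y]
        if abs(new_sigma - sigma) <= tol * new_sigma:   # estimate has settled
            return new_sigma
        sigma = new_sigma
    return sigma

if __name__ == "__main__":
    print(top_singular_value([[1.0, 2.0], [3.0, 4.0]]))
```

Convergence speed depends on the ratio of the top two eigenvalues of A^T A (the squared singular values), which is why a clear gap matters.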
Mathematics Dept. – Large Scale Database Systems (NIBRS Database, Web
Interfaces, Database Cluster), Clusters for high availability and fault tolerance
Mathematics Dept. – Distributed Systems Security (NASA CIPA Network
Infrastructure, secure web services environment, large-number library for
cryptographic studies, e.g., the Miller-Rabin primality test)
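The Miller-Rabin test is short enough to show in full. This Python sketch uses Python's built-in arbitrary-precision integers in place of the project's large-number library (which is not described here), and the choice of bases is ours: with the first twelve primes as bases, the test is known to be deterministic for every n < 2^64, while for larger n a True result means "probably prime".

```python
def is_probable_prime(n: int) -> bool:
    """Miller-Rabin primality test using the first twelve primes as bases."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:                  # settles every n <= 37 and small factors
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)             # modular exponentiation a^d mod n
        if x == 1 or x == n - 1:
            continue                 # a is not a witness; try the next base
        for _ in range(s - 1):
            x = x * x % n            # square: a^(d * 2^r) mod n
            if x == n - 1:
                break
        else:
            return False             # a is a witness: n is composite
    return True

if __name__ == "__main__":
    print(is_probable_prime(2**61 - 1), is_probable_prime(561))
```

Carmichael numbers such as 561 fool the simpler Fermat test but not Miller-Rabin, which is why Miller-Rabin is the standard choice for generating cryptographic primes.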
Mathematics Dept. – Parallel Programming and Parallel Algorithms, Real Schur
Form of a Matrix used in solving ODEs in flight control (Boeing Contract)
12. Educational Benefits (CIS major and new program in M.S. Forensic Computing)
Computing Facilities, Faculty Expertise and Curriculum Development
Curriculum: Added or revised courses and curricula in operating systems,
computer networking, database systems, and parallel computing. Also added
systems analysis, computer graphics, and computer network security. Revised
mathematics requirements: discrete mathematics (graph theory, graph
partitioning, shortest path algorithms, cryptography) and operations research (queuing
theory). Developed M.S. courses in secure operating systems, computer networking,
and security of distributed systems.
New Laboratories: Exposure to Linux Environment Tools for network and
distributed computing, access to Computational and Database Clusters, Remote
access from off campus using SSH.
Faculty Expertise: New faculty hired in computer networking and distributed
computing. Faculty training and Cluster Computing Colloquium Series.
Student Opportunities: 15 students work on the program; all are attending
graduate programs in CS and working in the field. Over 400 CIS majors have
benefited from NASA CIPA facilities (especially the enterprise-level database system).
13. Summary
Faculty and Staff Development:
Facilities Development:
Curriculum and Students:
Overall Effect on the Institution: