Parallel Processing: Past, Present and Future
Dr. G. Young
CS 370

What is a Supercomputer?
Let us run a contest: who gives the most up-to-date explanation?

Supercomputer (AllWords.com)
A very fast, powerful mainframe computer, used in advanced military and scientific applications.

Supercomputer (M-W.com, Merriam-Webster's Collegiate Dictionary)
A large, very fast mainframe used especially for scientific computations.

Supercomputer (Dictionary.com)
A mainframe computer that is among the largest, fastest, or most powerful of those available at a given time.

Supercomputer (FOLDOC.doc.ic.ac.uk)
A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research. A less serious definition, reported from about 1990 at The University Of New South Wales, states that a supercomputer is any computer that can outperform IBM's current fastest, thus making it impossible for IBM to ever produce a supercomputer.

Supercomputer (PCWebopaedia.com)
The fastest type of computer. Supercomputers are very expensive and are employed for specialized applications that require immense amounts of mathematical calculations. For example, weather forecasting requires a supercomputer. Other uses of supercomputers include animated graphics, fluid dynamic calculations, nuclear energy research, and petroleum exploration. The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently.

Supercomputer (ComputerUser.com)
A very fast and powerful computer, outperforming most mainframes, and used for intensive calculation, scientific simulations, animated graphics, and other work that requires sophisticated and high-powered computing. Cray Research and Intel are well-known producers of supercomputers.

Supercomputer (PrenHall.com)
The category that includes the largest and most powerful computers.

Who is the winner?
- AllWords.com
- M-W.com, Merriam-Webster's Collegiate Dictionary
- Dictionary.com
- FOLDOC.doc.ic.ac.uk
- ComputerUser.com
- PCWebopaedia.com
- PrenHall.com
- Geek.com

Contest Winner: Supercomputer (Geek.com, 2001)
This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.

About the winner: geek.com @ 2001 (led by Chief Geek Joel Evans). The site is used to tell people all about Geek, for example to check whether you are a Beginner Geek, Intermediate Geek, Advanced Geek or Super Geek.

Topics of Discussion
- Introduction
- Computer Networks
- Parallel and Distributed Processing
- Affordable Supercomputer
- Future Trend and Challenge
- Conclusion
- Q&A

Introduction
- Why do we need supercomputers?
- Supercomputer Vendors
- Supercomputer Products
- Top Supercomputers
- How to evaluate the power of a supercomputer?
- Top 10 Supercomputers
- Theoretical Implication of Parallel machines
- Areas of Research in Supercomputing
- Supercomputing Journals

Why do we need supercomputers?
- Even though processor speed has increased dramatically, it is still not fast enough for our needs.
- Using multiple processors is the way to go.
- Areas that need supercomputers generally involve intensive computation:
  Aerospace, Weather, Finance, Defense, Energy, Internet, Government, Chemistry, Geophysics, Telecom, Academic, Database, Mechanics, Automotive, Transportation, Electronics, Manufacturing, Fluid Dynamics, Petroleum

Supercomputer Vendors
- They use different technologies: processor, OS, connection structure, proprietary hardware and software

Supercomputer Products
- The Avalon A12
- The Cambridge Parallel Processing Gamma II Plus
- The Compaq AlphaServer SC Series
- The Fujitsu AP3000
- The Fujitsu VPP5000 series
- The Hitachi SR8000 system
- The HP Exemplar V2600
- The IBM RS/6000 SP
- The NEC Cenju-4
- The NEC SX-5
- The SGI Origin 2000 series
- The Sun E10000 Starfire
- The Tera/Cray SV1
- The Tera/Cray T3E

How to evaluate the power of a supercomputer?
- Peak performance (theoretical)
- Run-time performance (benchmarks)
  - Linpack benchmark (Top500)
  - Finding the largest Mersenne prime number

Benchmarks: LINPACK
- The LINPACK Benchmark (introduced by Jack Dongarra) solves a dense system of linear equations.
- It is used to rank the Top500 supercomputers.
- This performance does not reflect the overall performance of a given system, as no single number ever can.
- Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance.
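To make the LINPACK idea above concrete, here is a minimal sketch in plain Python. It is not the LINPACK or HPL code: it simply solves a random dense system by Gaussian elimination with partial pivoting, times the solve, and converts the time to a rate using the customary operation count 2/3 n^3 + 2 n^2. The function names `solve_dense` and `linpack_style_rate`, the matrix size and the random seed are all assumptions made for illustration.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linpack_style_rate(n=100, seed=0):
    """Time one dense solve and return (seconds, estimated Mflop/s)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = time.perf_counter() - start
    # Customary LINPACK operation count for an n x n solve (an estimate, not a measurement).
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return elapsed, flops / elapsed / 1e6
```

Calling `linpack_style_rate(200)` gives a rough Mflop/s figure for the interpreter on one core. The number is far below what a tuned benchmark reports, which illustrates the point on the slide: a single benchmark figure depends heavily on how regular the problem is and how well the code is tuned.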
Prime Number
- The Greek mathematician Euclid proved that there are infinitely many prime numbers.
- Primes do not occur in a regular sequence, and there is no formula for generating them.
- Discovery of new primes requires generating and testing millions of numbers.

Largest known Mersenne Prime Numbers* before 2000

Prime          Digits     Year  Name
2^21701-1      6533       1978  Landon Curt Noll (with Laura Nickel, Ariel Glenn)
2^23209-1      6987       1979  Landon Curt Noll
2^44497-1      13395      1979  David Slowinski (with Harry Nelson)
2^86243-1      25962      1982  David Slowinski
2^132049-1     39751      1983  David Slowinski
2^216091-1     65050      1985  David Slowinski
2^756839-1     227832     1992  David Slowinski, Paul Gage
2^859433-1     258716     1994  David Slowinski, Paul Gage
2^1257787-1    378632     1996  David Slowinski, Paul Gage
2^1398269-1    420921     1996  Joel Armengaud (GIMPS)
2^2976221-1    895932     1997  Gordon Spence (GIMPS)
2^3021377-1    909526     1998  Roland Clarkson (GIMPS)
2^6972593-1    2098960 #  1999  Nayan Hajratwala (GIMPS)

* Mersenne prime numbers are prime numbers of the form 2^<integer> - 1
# 67 pages long if printed in a newspaper

The current largest known Mersenne prime numbers (of the form 2^n - 1) can be found at http://www.mersenne.org/

$$$ The Electronic Frontier Foundation is offering a $100,000 award for discovering the next largest (ten million digits) prime number.

Finding the Largest Mersenne Prime Number
Slowinski (SGI, Cray): "The prime finder program rigorously tests all elements of a system -- from the logic of the processors, to the memory, the compiler and the operating and multitasking systems. For high performance systems with multiple processors, this is an excellent test of the system's ability."
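The slides do not say how a Mersenne candidate is tested. The standard method is the Lucas-Lehmer test, and a small Python sketch of it is given here as an illustration only; it is not the code used by Slowinski or by the project at mersenne.org. The function names `is_small_prime`, `lucas_lehmer` and `mersenne_digits` are chosen for this example.

```python
import math


def is_small_prime(p):
    """Trial division, adequate for the small exponents tested here."""
    if p < 2:
        return False
    for d in range(2, math.isqrt(p) + 1):
        if p % d == 0:
            return False
    return True


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p.

    For an odd prime p, start from s = 4 and repeat s = s*s - 2 (mod 2**p - 1)
    exactly p - 2 times; 2**p - 1 is prime exactly when the final s is 0.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the recurrence covers odd p only
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits of 2**p - 1, computed without building the number."""
    return int(p * math.log10(2)) + 1


# Exponents below 130 that give Mersenne primes.
small_exponents = [p for p in range(2, 130) if is_small_prime(p) and lucas_lehmer(p)]
```

Each step squares a number with about p bits, so the cost grows quickly with p. Running the test on exponents in the millions is what turns it into the whole-system stress test described in the quotation above.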
Top 10 Supercomputers (number of systems by country)

Country      2006  2007  2008
USA          6     8     6
Japan        2     -     -
Spain        -     1     -
India        -     -     1
Germany      1     1     1
France       1     -     2

Country      2012 (Nov)  2013 (June)  2013 (Nov)
USA          5           5            5
China        1           2            1
Japan        1           1            1
Germany      2           2            2
Italy        1           -            -
Switzerland  -           -            1

Top Supercomputers
- Timeline: http://www.top500.org/timeline/
- Top #1 System: http://www.top500.org/featured/top-systems/

Theoretical Implication of Parallel machines
- A parallel machine with an unbounded number of processors behaves like a non-deterministic machine.
- A statement like Guess({S1,S2}) can be added to our familiar deterministic program.
- Suddenly, NP-complete problems (e.g. the decision version of the Traveling Salesman Problem) can be solved in polynomial time.

Supercomputing Journals
- ACM J. of Experimental Algorithmics
- BIT
- Cluster Computing
- Computing and Visualization in Science
- IEEE Trans. on Computers
- IEEE Trans. on Parallel and Distributed Systems
- International J. of Computer Research
- International J. of Computers and Their Applications
- International J. of High Performance Computing and Networking
- International J. of High Speed Computing
- International J. of Parallel Programming
- J. of Interconnection Networks
- J. of Parallel and Distributed Computing
- J. of Performance Evaluation and Modeling of Computer Systems
- J. of Supercomputing
- J. of Visual Languages & Computing
- Parallel Algorithms and Applications
- Parallel Computing
- Parallel and Distributed Computing Practices
- Parallel Processing Letters
- SIAM J. of Computing
- SIAM J. of Scientific Computing

Areas of Research in P&D Computing
- Parallel and Distributed Architectures
- Parallel and Distributed Algorithms
- Parallel Programming Languages
- Scientific Computing
- Signal & Image Processing Systems
- Special Purpose Processors
- VLSI and Configurable Logic Systems
- Performance Modeling/Evaluation
- Memory Hierarchy Issues in Parallel and Distributed Processing
- Programming Environments and Tools for Parallel and Distributed Platforms
- Compilers and Optimizations for Parallel and Distributed Processing
- Operating System and Runtime Support for Parallel and Distributed Computing
- Parallel and Distributed Network Protocols and Implementations
- Applications of Parallel and Distributed Computing
- Nontraditional Processor Technologies (Optical, Quantum, DNA, etc.)

Computer Networks

Network/Parallel Computer Architecture
- Homogeneity
  - Same kind of computers
  - Examples: a network of PCs, a network of Sun workstations, …
- Heterogeneity
  - A mixture of different computers
  - Example: Internet

Computer Networks (topology diagrams): Chain, Ring, Star, Tree, Mesh, Cube, Hypercube

Proprietary Parallel Computers
- HP Exemplar V2600: Ring
- Cambridge Parallel Processing Gamma II Plus: Mesh
- Fujitsu AP3000: Torus
- Tera/Cray Research Inc. T3E: Torus
- SGI Origin series: Hypercube

Parallel and Distributed Processing
- Hardware Structure of Parallel Computers
  - Architectural Classes
  - Memory Systems
- Distributed Processing
  - PVM & MPI
- Parallel Applications
- Task Assignment

Hardware Structure of Parallel Computers
- Classification is based on how instruction and data streams are manipulated.
- 4 main architectural classes [Flynn, 1972]
  - Multiple/Single Instruction (MI/SI)
  - Multiple/Single Data (MD/SD)

M.J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, C-21, pp. 948-960, 1972.

Architectural Classes
- SISD machines:
  - Accommodate one instruction stream that is executed serially.
  - These are the conventional systems that contain one CPU.
- MISD machines:
  - Multiple instructions should act on a single stream of data.
  - No practical machine of this class.
- SIMD machines:
  - Such systems often have thousands of processing units that execute the same instruction on different data.
  - Example: Hitachi S3600
- MIMD machines:
  - Execute several instruction streams in parallel on different data.
  - Run many sub-tasks in parallel.
  - Large variety of MIMD systems.

Memory Systems
- Shared memory systems: have multiple CPUs, all of which share the same address space.
- Distributed memory systems: each CPU has its own associated memory.

Distributed Processing
- Takes the DM-MIMD (distributed-memory MIMD) concept one step further.
- Instead of many integrated processors in one or several boxes, workstations are connected by (Gigabit) Ethernet, FDDI, or otherwise and set to work concurrently on tasks in the same program.
- Communication between processors is often orders of magnitude slower.

PVM & MPI
- Packages to realize distributed processing:
  - PVM (Parallel Virtual Machine) [Geist et al., 1994]
  - MPI (Message Passing Interface) [Snir et al. and Gropp et al., 1998]
- This style of programming, called the "message passing" model, has been widely accepted.
- PVM and MPI have been adopted by virtually all major vendors of distributed-memory MIMD systems, and even on shared-memory MIMD systems for compatibility reasons.

A. Geist, A. Beguelin, J. Dongarra, R. Manchek, W. Jiang, and V. Sunderam, PVM: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, 1994.
M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra, MPI: The Complete Reference, Vol. 1, The MPI Core, MIT Press, Cambridge, MA, 1998.
W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, M. Snir, MPI: The Complete Reference, Vol. 2, The MPI Extensions, MIT Press, Cambridge, MA, 1998.

Parallel Applications
- Parallel Algorithms
  - Fine grain/Coarse grain
- Parallel Programming
  - ParBegin/ParEnd
  - PVM/MPI APIs

Task Assignment
- Performance Measures
  - Completion Time
  - Throughput (Stone, 1977): E + ITI + ITC
- Overheads for P&D Processing
  - Execution Time for tasks (E)
  - Intra-task Interference cost (ITI)
  - Inter-task Communication cost (ITC)

H. Stone, Multiprocessor Scheduling with the Aid of Network Flow Algorithms, IEEE Transactions on Software Engineering, Vol. SE-3, No. 1, pp. 85-93, 1977.

Affordable Supercomputer
Computer Networks with Off-the-Shelf Hardware, Powered by Parallel and Distributed Processing Tools
- Computer networks with off-the-shelf hardware
- Powered by parallel and distributed software tools
- Advantages over conventional supercomputers
- System of Homogeneous Network
  - A network of PCs with SCSI link
  - SPVM
- System of Heterogeneous Network
  - Internet
  - JMPI
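The message-passing model behind PVM and MPI can be illustrated without either package. The sketch below is an assumption-laden stand-in: Python threads play the role of processors and `queue.Queue` objects play the role of send/receive channels, so no PVM or MPI call is used and nothing here reflects the real API of those libraries. The names `worker` and `parallel_sum` are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers, sum it, and send the partial result back."""
    chunk = inbox.get()              # blocking receive, in the spirit of a message-passing recv
    outbox.put((rank, sum(chunk)))   # send the partial sum to the master


def parallel_sum(values, n_workers=4):
    """Master/worker sum in which all data moves through explicit messages."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker r receives every n_workers-th element, starting at index r.
    for r in range(n_workers):
        inboxes[r].put(values[r::n_workers])
    # Gather: collect exactly one partial sum from each worker.
    total = sum(results.get()[1] for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

For example, `parallel_sum(list(range(1, 101)))` scatters one hundred numbers over four workers and gathers their partial sums. On a real cluster each worker would be a process on a different machine, and the two queue operations would become network messages, which is where the inter-task communication cost (ITC) comes from.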
Advantages over Conventional Supercomputer
- Decomposable
- Reusable
- Scale up and down easily
- Off-the-shelf
- Third World friendly
- Economical
- Reconfigurable interconnection topology
- Easy to upgrade: bus, processor, software
- Collaborative R&D environment
- General-purpose
- Multi-usage

Homogeneous Network
- A network of Pentium PCs

Heterogeneous Network

Future Trend and Challenge
- The PVM and MPI community continues to grow
- Cheaper and faster processors and interconnections
- More use of clusters of workstations for high-performance computing
- More freely available software tools

Future Trend and Challenge
- Race between proprietary supercomputers and cluster computers
- How fast can a supercomputer go?
- How will heterogeneous computing evolve?
- Will a cluster of computers over the Internet be the fastest computer in the world?
- Processing power on demand as a service?
- Processor sharing?

Conclusion
- Practical
- Affordable
- Educational
- Research topics
- Knowledge sharing through major forums (e.g. IEEE TFCC, Top500, TopClusters)
- One key issue is how to compare, evaluate and rank their performance

Conclusion
- Such an exciting area of research
- Powered by state-of-the-art parallel and distributed processing tools, a high-speed computer network of powerful workstations will become a very attractive, affordable, highly scalable and highly available solution for the high-performance computing world.
Build Your Own Supercomputer (Cluster)
- Heterogeneous system
- Employ new COTS (Commercial Off-the-Shelf) components
- Classification
- Benchmarks
- Performance tracking tools
- System administration software

Top 500 Supercomputers Update
- Trend of cluster computers versus proprietary supercomputers
- The TOP 500 Supercomputer List: http://www.top500.org/

Q&A