[1] Top 500 Supercomputer Site. Available from:
[2] Powell, R. and Y. Deng, Analysis of power and Linpack efficiencies of the world's Top 500 supercomputers. Journal of Parallel Computing, 2010 (submitted 7/2010).
[3] Green 500 Supercomputer Site. Available from:
[4] Inoguchi, Y. and S. Horiguchi, Shifted Recursive Torus Interconnection for High Performance Computing, in Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97. 1997, IEEE Computer Society.
[5] Yang, Y., et al., Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers. IEEE Transactions on Parallel and Distributed Systems, 2001. 12(7): p. 701-715.
[6] Zhang, P., R. Powell, and Y. Deng, Interlacing Bypass Rings to Torus Networks for More Efficient Networks. IEEE Transactions on Parallel and Distributed Systems, 29 Apr. 2010. IEEE Computer Society Digital Library. IEEE Computer Society, <>
[7] Stojmenovic, I., Honeycomb Networks: Topological Properties and Communication Algorithms. IEEE Transactions on Parallel and Distributed Systems, 1997. 8(10): p. 1036-1042.
[8] Decayeux, C. and D. Seme, 3D Hexagonal Network: Modeling, Topological Properties, Addressing Scheme, and Optimal Routing Algorithm. IEEE Transactions on Parallel and Distributed Systems, 2005. 16(9): p. 875-884.
[9] Preparata, F.P. and J. Vuillemin, The cube-connected cycles: a versatile network for parallel computation. Communications of the ACM, 1981. 24(5): p. 300-309.
[10] Tzeng, N.F., A Cube-Connected Cycles Architecture with High Reliability and Improved Performance. IEEE Transactions on Computers, 1993. 42(2): p. 246-253.
[11] Myricom, Inc. Available from:
[12] Barker, K.J., et al., Entering the petaflop era: the architecture and performance of Roadrunner, in Proceedings of the 2008 ACM/IEEE conference on Supercomputing. 2008, IEEE Press: Austin, Texas. p. 1-11.
[13] Blue Waters at the University of Illinois at Urbana-Champaign. Available from:
[14] PRAC January 2010 webinar presentation. Available from:
[15] Advanced Micro Devices (AMD), Inc. Available from:
[16] Intel Inc. Available from:
[17] Silicon Mechanics. Available from:
[18] International Business Machines (IBM) Corp. Available from:
[19] May, B., Why Cloud Computing Makes Business Owners Nervous. April 17, 2010. Available from:
[20] Swansea University. Available from:
[21] University Corporation for Atmospheric Research (UCAR). Available from:
[22] Science and Technology Review Magazine, July/August 2010. Available from:
[23] Argonne National Laboratory INCITE Program. Available from:
[24] University of Chicago. Available from:
[25] Science and Technology Review Magazine, January/February 2008. Available from:
[26] Bhanot, G., et al., Optimizing task layout on the Blue Gene/L supercomputer. IBM Journal of Research and Development, 2005. 49: p. 12.
[27] Agarwal, T., et al., Topology-aware task mapping for reducing communication contention on large parallel machines, in Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium. 2006.
[28] Chen, Y. and Y. Deng, Task mapping on supercomputers with cellular networks. Computer Physics Communications, 2008. 179(7): p. 479-485.
[29] Roig, C., A. Ripoll, and F. Guirado, A New Task Graph Model for Mapping Message Passing Applications. IEEE Transactions on Parallel and Distributed Systems, 2007. 18(12): p. 1740-1753.
[30] Orduna, J.M., F. Silla, and J. Duato, On the development of a communication-aware task mapping technique. Journal of Systems Architecture, 2004. 50(4): p. 207-220.
[31] Orduna, J.M., F. Silla, and J. Duato, A New Task Mapping Technique for Communication-Aware Scheduling Strategies. International Conference on Parallel Processing Workshops, 2001: p. 349-354.
[32] Bokhari, S.H., On the Mapping Problem. IEEE Transactions on Computers, 1981. 30(3): p. 207-214.
[33] Yu, H., I.-H. Chung, and J. Moreira, Topology Mapping for Blue Gene/L Supercomputer. ACM/IEEE SC 2006 Conference (SC'06), 2006: p. 52.
[34] Smith, B.E. and B. Bode, Performance Effects of Node Mappings on the IBM BlueGene/L Machine, in Euro-Par 2005 Parallel Processing. 2005. p. 1005-1013.
[35] Bailey, D.H., et al., The NAS Parallel Benchmarks. International Journal of High Performance Computing Applications, 1991.
5(3): p. 63-73.
[36] Pješivac-Grbović, J., et al., Performance analysis of MPI collective operations. Cluster Computing, 2007. 10(2): p. 127-143.
[37] Rabenseifner, R., Automatic Profiling of MPI Applications with Hardware Performance Counters, in Recent Advances in
Parallel Virtual Machine and Message Passing Interface. 1999. p. 22.
[38] Eleftheriou, M., et al., Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer, in Euro-Par 2005
Parallel Processing. 2005. p. 795-803.
[39] Fagg, G.E., S.S. Vadhiyar, and J. Dongarra, ACCT: Automatic Collective Communications Tuning, in Proceedings of the 7th
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface.
2000, Springer-Verlag.
[40] Faraj, A., et al., MPI Collective Communications on the Blue Gene/P Supercomputer: Algorithms and Optimizations. 2009
17th IEEE Symposium on High Performance Interconnects, 2009: p. 63-72.
[41] Gannon, D.B. and J.V. Rosendale, On the Impact of Communication Complexity on the Design of Parallel Numerical
Algorithms. IEEE Transactions on Computers, 1984. 33(12): p. 1180-1194.
[42] Dongarra, J.J., P. Luszczek, and A. Petitet, The LINPACK Benchmark: past, present and future. Concurrency and
Computation: Practice and Experience, 2003. 15(9): p. 803-820.
[43] Mitra, P., et al., Fast Collective Communication Libraries, Please, in Proceedings of the Intel Supercomputing Users' Group Meeting. 1995: University of Texas at Austin.
[44] Huse, L.P., Collective Communication on Dedicated Clusters of Workstations, in Proceedings of the 6th European PVM/MPI
Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface. 1999, Springer-Verlag.
[45] Vadhiyar, S.S., G.E. Fagg, and J. Dongarra, Automatically tuned collective communications, in Proceedings of the 2000
ACM/IEEE conference on Supercomputing (CDROM). 2000, IEEE Computer Society: Dallas, Texas, United States.
[46] Thakur, R. and W. Gropp, Improving the Performance of Collective Operations in MPICH, in Recent Advances in Parallel
Virtual Machine and Message Passing Interface. 2003. p. 257-267.
[47] Kielmann, T., H.E. Bal, and S. Gorlatch, Bandwidth-Efficient Collective Communication for Clustered Wide Area Systems, in Proceedings of the 14th International Symposium on Parallel and Distributed Processing. 2000, IEEE Computer Society. p. 492.
[48] Faraj, A., X. Yuan, and D. Lowenthal, STAR-MPI: self tuned adaptive routines for MPI collective operations, in Proceedings of
the 20th annual international conference on Supercomputing. 2006, ACM: Cairns, Queensland, Australia.
[49] Chan, E., et al., Collective communication on architectures that support simultaneous communication over multiple links, in
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming. 2006, ACM:
New York, USA.
[50] Yang, Y. and J. Wang, Near-Optimal All-to-All Broadcast in Multidimensional All-Port Meshes and Tori. IEEE Transactions
on Parallel and Distributed Systems, 2002. 13(2): p. 128-141.
[51] Kumar, S., et al., Optimization of All-to-All Communication on the Blue Gene/L Supercomputer. 2008 37th International
Conference on Parallel Processing, 2008: p. 320-329.