REFERENCES                                   Top 500 Supercomputer Site. Available from: http://www.top500.org. Powell, R. and Y. Deng, Analysis of power and Linpack efficiencies of the world's Top 500 supercomputers, Journal of Parallel Computing (submitted 7/2010) 2010. Green 500 Supercomputer Site. Available from: http://www.green500.org. Inoguchi, Y. and S. Horiguchi, Shifted Recursive Torus Interconnection for High Performance Computing, in Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97. 1997, IEEE Computer Society. Yang, Y., et al., Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers. IEEE Transactions on Parallel and Distributed Systems, 2001. 12(7): p. 701-715. Zhang, P., R. Powell and Y. Deng, "Interlacing Bypass Rings to Torus Networks for More Efficient Networks," IEEE Transactions on Parallel and Distributed Systems, 29 Apr. 2010. IEEE computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.89> Stojmenovic, I., Honeycomb Networks: Topological Properties and Communication Algorithms. IEEE Transactions on Parallel and Distributed Systems, 1997. 8(10): p. 1036-1042. Decayeux, C. and D. Seme, 3D Hexagonal Network: Modeling, Topological Properties, Addressing Scheme, and Optimal Routing Algorithm. IEEE Transactions on Parallel and Distributed Systems, 2005. 16(9): p. 875-884. Preparata, F.P. and J. Vuillemin, The cube-connected cycles: a versatile network for parallel computation. Communications of the ACM, 1981. 24(5): p. 300-309. Tzeng, N.F., A Cube-Connected Cycles Architecture with High Reliability and Improved Performance. IEEE Transactions on Computers, 1993. 42(2): p. 246-253. Myricom, Inc. Available from http://www.myri.com/. Barker, K.J., et al., Entering the petaflop era: the architecture and performance of Roadrunner, in Proceedings of the 2008 ACM/IEEE conference on Supercomputing. 2008, IEEE Press: Austin, Texas. p. 1-11. Blue Waters in University of Illinois at Urbana-Champaign. Available from: http://www.ncsa.illinois.edu/BlueWaters/. PRAC January 2010 webinar presentation. Available from: http://www.ncsa.illinois.edu/BlueWaters/pdfs/webinar_Prospective_PRAC.pdf. Advanced Micro Devices (AMD), Inc. Available from: http://www.ams.com/. Intel Inc.; Available from: http://www.intel.com/. Silicon Mechanics. Available from: http://www.siliconmechanics.com/. International Business Machines (IBM) Corp.; Available from: http://www.ibm.com/. May, B. Why Cloud Computing Makes Business Owners Nervous. April 17, 2010; Available from: http://www.briankeithmay.com/. Swansea University. Available from: http://www.swansea.ac.uk/. University Corporation of Atmospheric Research (UCAR). Available from: http://www2.ucar.edu/. Science and Technology Review Magazine, July/August 2010. Available from: https://str.llnl.gov/JulAug10/pdfs/7.10.pdf. Argonne National Laboratory INCITE Program. Available from: http://www.alcf.anl.gov/collaborations/index.php. University of Chicago. Available from: http://www.uchicago.edu/. Science and Technology Review Magazine, January/February 2008. Available from: https://www.llnl.gov/str/JanFeb08/pdfs/01.08.2.pdf. Bhanot, G., et al., Optimizing task layout on the Blue Gene/L supercomputer. IBM Journal of Research and Development, 2005. 49: p. 12. Agarwal, T., et al. Topology-aware task mapping for reducing communication contention on large parallel machines. In Proceedings 20th IEEE International Parallel and Distributed Processing Symposium. 2006. Chen, Y. and Y. Deng, Task mapping on supercomputers with cellular networks. Computer Physics Communications, 2008. 179(7): p. 479-485. Roig, C., A. Ripoll, and F. Guirado, A New Task Graph Model for Mapping Message Passing Applications. IEEE Transactions on Parallel and Distributed Systems, 2007. 18(12): p. 1740-1753. Orduna, J.M., F. Silla, and J. Duato, On the development of a communication-aware task mapping technique. Journal of Systems Architecture, 2004. 50(4): p. 207-220. Ordu, J.M., F. Silla, and J. Duato. A New Task Mapping Technique for Communication-Aware Scheduling Strategies. International Conference on Parallel Processing Workshops, 2001: p. 349 - 354 Bokhari, S.H., On the Mapping Problem. IEEE Transactions on Computers, 1981. 30(3): p. 207-214. Yu, H., I.-H. Chung, and J. Moreira, Topology Mapping for Blue Gene/L Supercomputer. ACM/IEEE SC 2006 Conference (SC'06), 2006: p. 52. Smith, B.E. and B. Bode, Performance Effects of Node Mappings on the IBM BlueGene/L Machine, in Euro-Par 2005 Parallel Processing. 2005. p. 1005-1013.  Bailey, D.H., et al., The NAS Parallel Benchmarks. International Journal of High Performance Computing Applications, 1991. 5(3): p. 63-73.  Pješivac-Grbović, J., et al., Performance analysis of MPI collective operations. Cluster Computing, 2007. 10(2): p. 127-143.  Rabenseifner, R., Automatic Profiling of MPI Applications with Hardware Performance Counters, in Recent Advances in Parallel Virtual Machine and Message Passing Interface. 1999. p. 22.  Eleftheriou, M., et al., Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer, in Euro-Par 2005 Parallel Processing. 2005. p. 795-803.  Fagg, G.E., S.S. Vadhiyar, and J. Dongarra, ACCT: Automatic Collective Communications Tuning, in Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface. 2000, Springer-Verlag.  Faraj, A., et al., MPI Collective Communications on the Blue Gene/P Supercomputer: Algorithms and Optimizations. 2009 17th IEEE Symposium on High Performance Interconnects, 2009: p. 63-72.  Gannon, D.B. and J.V. Rosendale, On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms. IEEE Transactions on Computers, 1984. 33(12): p. 1180-1194.  Dongarra, J.J., P. Luszczek, and A. Petitet, The LINPACK Benchmark: past, present and future. Concurrency and Computation: Practice and Experience, 2003. 15(9): p. 803-820.  Mitra, P., et al. Fast Collective Communication Libraries, Please. in Proceedings of the Intel Supercomputing Users' Group Meeting. 1995: University of Texas at Austin.  Huse, L.P., Collective Communication on Dedicated Clusters of Workstations, in Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface. 1999, Springer-Verlag.  Vadhiyar, S.S., G.E. Fagg, and J. Dongarra, Automatically tuned collective communications, in Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM). 2000, IEEE Computer Society: Dallas, Texas, United States.  Thakur, R. and W. Gropp, Improving the Performance of Collective Operations in MPICH, in Recent Advances in Parallel Virtual Machine and Message Passing Interface. 2003. p. 257-267.  Kielmann, T., H.E. Bal, and S. Gorlatch, Bandwidth-Efficient Collective Communication for Clustered Wide Area Systems, in Proceedings of the 14th International Symposium on Parallel and Distributed Processing. 2000, IEEE Computer Society; p. 492.  Faraj, A., X. Yuan, and D. Lowenthal, STAR-MPI: self tuned adaptive routines for MPI collective operations, in Proceedings of the 20th annual international conference on Supercomputing. 2006, ACM: Cairns, Queensland, Australia.  Chan, E., et al., Collective communication on architectures that support simultaneous communication over multiple links, in Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming. 2006, ACM: New York, USA.  Yang, Y. and J. Wang, Near-Optimal All-to-All Broadcast in Multidimensional All-Port Meshes and Tori. IEEE Transactions on Parallel and Distributed Systems, 2002. 13(2): p. 128-141.  Kumar, S., et al., Optimization of All-to-All Communication on the Blue Gene/L Supercomputer. 2008 37th International Conference on Parallel Processing, 2008: p. 320-329.