[1] Accelerated Strategic Computing Initiative (ASCI). A report by the US Department of Energy, Lawrence Livermore, Los Alamos, and Sandia National Laboratories, 1996.
[2] Duato J, Yalamanchili S, Ni L. Interconnection Networks. Morgan Kaufmann, 2002.
[3] Boden N J et al. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, Feb. 1995.
[4] Stenstrom P, Joe T, Gupta A. Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures. Proc. of the 19th International Symposium on Computer Architecture, IEEE Computer Society Press, 1992: 80-91.
[5] Adve S, Hill M, Vernon M. Comparison of Hardware and Software Cache Coherence Schemes. Proc. of the 18th Annual International Symposium on Computer Architecture, Jun. 1991: 298-308.
[8] Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.
[9] Silicon Graphics. Origin 200 and Origin 2000. Technical Report, 1996.
[10] Wheat S R, Mattson T G, Scott D. A TeraFLOPS Supercomputer in 1996: The ASCI TFLOP System. Proc. of the 1996 International Parallel Processing Symposium, 1996.
[11] Anderson T, Culler D, Patterson D, and the NOW Team. A Case for NOW (Networks of Workstations). IEEE Micro, 1995, 15(1): 54-64.
[12] Brent R P. The Parallel Evaluation of General Arithmetic Expressions. Journal of the ACM, 1974, 21(2): 201-206.
[13] Amdahl G. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. AFIPS Conf. Proc. 30, Thompson Books, Washington D.C., April 1967: 483-486.
[14] Gustafson J L. Reevaluating Amdahl's Law. Comm. of the ACM, 1988, 31(5): 532-533.
[15] Sun X H, Ni L M. Another View on Parallel Speedup. Proc. Supercomputing '90, 1990: 324-333.
[16] Kumar V, Rao V N. Parallel Depth-First Search, Part II: Analysis. Int'l J. of Parallel Programming, 1987, 16(6): 501-519.
[17] Sun X H, Rover D T. Scalability of Parallel Algorithm-Machine Combinations. IEEE Trans. on Parallel and Distributed Systems, 1994, 5(6): 519-613.
[18] Zhang X D, Yan Y, He K Q. Latency Metric: An Experimental Method for Measuring and Evaluating Parallel Program and Architecture Scalability. J. of Parallel and Distributed Computing, 1994, 22: 392-410.
[19] Fortune S, Wyllie J. Parallelism in Random Access Machines. Proc. 10th Annual ACM Symp. on Theory of Computing, San Diego, California, 1978: 114-118.
[20] Chen Guoliang. Scalability Analysis of Parallel Algorithms (并行算法的可扩放性分析). Journal of Chinese Computer Systems (小型微型计算机系统), 1995, 16(2): 10-16.
[21] Juurlink B H H, Wijshoff H A G. A Quantitative Comparison of Parallel Computation Models. ACM Trans. Comput. Syst., 1998, 16(3): 271-318.
[22] Goudreau M, Lang K, Rao S, Suel T, Tsantilas T. Towards Efficiency and Portability: Programming with the BSP Model. Proc. SPAA 1996: 1-12.
[23] Cheatham T, Fahmy A, Stefanescu D C, Valiant L G. Bulk Synchronous Parallel Computing: A Paradigm for Transportable Software. Proc. of the 28th Hawaii International Conference on System Sciences, Vol. 2: Software Technology, 1995: 268-275.
[24] Chlebus B, Vrto I. Parallel Quicksort. Journal of Parallel and Distributed Computing, 1991, 11: 332-337.
[25] Dekel E, Nassimi D, Sahni S. Parallel Matrix and Graph Algorithms. SIAM J. on Computing, 1981, 10: 657-673.
[26] Galil Z. Optimal Parallel Algorithms for String Matching. Information and Control, 1985, 67(1-3): 144-157.
[27] Hoare C A R. Quicksort. Computer Journal, 1962, 5: 10-15.
[28] JaJa J. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[29] Knuth D E, Morris J H, Pratt V R. Fast Pattern Matching in Strings. SIAM J. Computing, 1977, 6(2): 323-350.
[30] Sedgewick R. Implementing Quicksort Programs. Communications of the ACM, 1978, 21(10): 847-857.
[31] Singh V, Kumar V, Agha G et al. Efficient Algorithms for Parallel Sorting on Mesh Multicomputers. International Journal of Parallel Programming, 1991, 20(2): 95-131.
[32] Vishkin U. Optimal Parallel Pattern Matching in Strings. Information and Control, 1985, 67(1-3): 91-113.
[33] Wagar B A. Hyperquicksort: A Fast Sorting Algorithm for Hypercubes. Proc. of the Second Conference on Hypercube Multiprocessors, 1987: 292-299.
[34] Horowitz E, Zorat A. Divide-and-Conquer for Parallel Processing. IEEE Trans. Comput., June 1983, 32: 582-585.
[35] Hirschberg D S. Parallel Algorithms for the Transitive Closure and the Connected Component Problems. Proc. STOC 1976: 55-57.
[36] Kung H T. Why Systolic Architectures? IEEE Computer, 1982, 15(1): 37-46.
[37] Cole R, Vishkin U. Deterministic Coin Tossing with Applications to Optimal Parallel List Ranking. Information and Control, July 1986, 70(1): 32-53.
[38] Goldberg A V, Plotkin S A, Shannon G E. Parallel Symmetry-Breaking in Sparse Graphs. SIAM J. Discrete Math., 1989, 1: 434-446.
[39] JaJa J. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[40] Wah B W, Li G J, Yu C F. Multiprocessing of Combinatorial Search Problems. IEEE Computer, 1985, 18(6): 93-108.
[41] Parnas D L, Clements P C. A Rational Design Process: How and Why to Fake It. IEEE Transactions on Software Engineering, Feb. 1986, SE-12(2): 251-257.
[42] Fox G et al. Solving Problems on Concurrent Processors. Prentice Hall, 1988.
[43] Fox G C, Williams R D, Messina P C. Parallel Computing Works! Morgan Kaufmann, 1994.
[44] Nicol D M, Saltz J H. An Analysis of Scatter Decomposition. IEEE Transactions on Computers, November 1990: 1153-1161.
[45] Foster I. Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley, 1995.
[46] Feng T Y. A Survey of Interconnection Networks. IEEE Computer, 1981, 14(12): 12-27.
[47] Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.
[48] Kumar V, Grama A, Gupta A et al. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, 1994.
[49] Berntsen J. Communication Efficient Matrix Multiplication on Hypercubes. Parallel Computing, 1989, 12: 335-342.
[50] Bertsekas D P, Tsitsiklis J N. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
[51] Cannon L E. A Cellular Computer to Implement the Kalman Filter Algorithm. Ph.D. thesis, Montana State Univ., 1969.
[52] Fox G C, Otto S W, Hey A J G. Matrix Algorithms on a Hypercube I: Matrix Multiplication. Parallel Computing, 1987, 4: 17-31.
[53] Golub G H, Van Loan C F. Matrix Computations, 2nd ed. The Johns Hopkins Univ. Press, 1989.
[54] Gupta A, Kumar V. The Scalability of Matrix Multiplication Algorithms on Parallel Computers. Proc. Int'l Conf. on Parallel Processing, 1993: III-115 to III-119.
[55] Ho C T, Johnsson S L, Edelman A. Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage. Proc. Int'l Conf. on Parallel Processing, 1991: 447-451.
[56] Kumar V, Gupta A, Rao V. Scalable Load Balancing Techniques for Parallel Computers. J. Parallel & Distributed Computing, 1994, 22(1): 60-79.
[57] Kumar V, Grama A, Gupta A et al. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, 1994.
[58] Heller D. A Survey of Parallel Algorithms in Numerical Linear Algebra. SIAM Review, 1978, 20: 740-777.
[59] Ortega J M. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, 1989.
[60] Gallivan K A, Plemmons R J, Sameh A H. Parallel Algorithms for Dense Linear Algebra Computations. SIAM Review, 1990, 32(1): 54-135.
[61] Heath M T, Ng E, Peyton B W. Parallel Algorithms for Sparse Linear Systems. SIAM Review, 1991, 33: 420-460.
[62] Ortega J M, Voigt R G. Solution of Partial Differential Equations on Vector and Parallel Computers. SIAM Review, 1985, 27: 149-240.
[63] Zhang Baolin et al. Principles and Methods of Numerical Parallel Computation (数值并行计算原理与方法). National Defense Industry Press, 1999.
[64] Cooley J W, Tukey J W. An Algorithm for the Machine Calculation of Complex Fourier Series. Math. Comp., April 1965, 19: 297-301.
[65] Nussbaumer H J. Fast Fourier Transform and Convolution Algorithms, 2nd ed. Springer-Verlag, New York, 1982.
[66] Swarztrauber P N. Multiprocessor FFTs. Parallel Computing, 1987, 5: 197-210.
[67] Averbuch A, Gabber E, Gordissky B, Medan Y. A Parallel FFT on an MIMD Machine. Parallel Computing, 1990, 15: 61-74.
[68] Blumrich M A, Dubnicki C, Felten E W et al. Protected User-Level DMA for the SHRIMP Network Interface. Proc. 2nd Int'l Symp. on High-Performance Computer Architecture, 1996.
[69] Comer D E. Internetworking with TCP/IP, 3rd ed. Prentice-Hall, 1995.
[70] Lauria M, Chien A. MPI-FM: High Performance MPI on Workstation Clusters. J. of Parallel and Distributed Computing, 1997, 40(1): 4-18.
[71] Mellor-Crummey J M, Scott M L. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Trans. Computer Systems, 1991, 9(1): 21-65.
[72] Messina P, Sterling T (Eds). System Software and Tools for High Performance Computing Environments. SIAM, 1993.
[73] Pancake C M. Software Support for Parallel Computing: Where Are We Headed? Comm. of the ACM, 1991, 34(11): 53-64.
[74] Pfister G F. In Search of Clusters. Prentice-Hall PTR, 1995.
[75] IEEE. POSIX P1003.4a: Threads Extension for Portable Operating Systems. IEEE, 1994.
[76] Snir M et al. The Communication Software and Parallel Environment of the IBM SP2. IBM Systems Journal, 1995, 34(2): 205-221.
[77] Stallings W. Operating Systems, 2nd ed. Prentice-Hall, 1995.
[78] Agha G. Concurrent Object-Oriented Programming. Comm. of the ACM, 1990, 33(9): 125-141.
[79] Allan S J, Oldehoeft R. HEP SISAL: Parallel Functional Programming. In Kowalik (Ed.), Parallel MIMD Computation: HEP Supercomputers and Applications. MIT Press, 1985.
[80] ANSI Technical Committee X3H5. Parallel Processing Model for High-Level Programming Languages, 1993.
[81] Bal H E, Steiner J G, Tanenbaum A S. Programming Languages for Distributed Computing Systems. ACM Computing Surveys, 1989, 21(3): 261-322.
[82] OpenMP Standards Board. OpenMP: A Proposed Industry Standard API for Shared Memory Programming, Oct. 1997.
[83] OpenMP Standards Board. OpenMP Fortran Application Program Interface, Version 1.0, Oct. 1997.
[84] IEEE. POSIX P1003.4a: Threads Extension for Portable Operating Systems. IEEE, 1994.
[85] Silicon Graphics. IRIS Power C User's Guide. Silicon Graphics Computer Systems, 1989.
[86] Wilson G V, Lu P (Eds). Parallel Programming Using C++. MIT Press, 1996.
[87] Xu Z, Hwang K. Coherent Parallel Programming in C//. Proc. of Int'l Conf. on Advances in Parallel and Distributed Computing, IEEE Computer Society Press, Mar. 1997: 116-122.
[88] Adams J et al. The Fortran 90 Handbook. McGraw-Hill, 1992.
[89] Adams J et al. The Fortran 95 Handbook. MIT Press, 1997.
[90] Chapman B et al. Extending HPF for Advanced Data-Parallel Applications. IEEE Parallel & Distributed Technology, 1994, 2(3): 15-27.
[91] Fox G et al. FORTRAN D Language Specification. Rice Univ., 1992.
[92] Geist A et al. PVM: Parallel Virtual Machine: A User's Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994.
[93] Hillis W D, Steele G L. Data Parallel Algorithms. Comm. of the ACM, 1986, 29(12): 1170-1183.
[94] Hwang K, Xu Z. Scalable Parallel Computing: Technology, Architecture, Programming. WCB/McGraw-Hill, 1998.
[95] Koelbel C et al. The High Performance Fortran Handbook. MIT Press, 1994.
[96] Mehrotra P et al. High Performance Fortran: History, Status and Future. Parallel Computing, 1998, 24: 325-354.
[97] MPI Forum. MPI: A Message Passing Interface. Proc. of Supercomputing '93, IEEE Computer Society, 1993: 878-883.
[98] Zima H et al. Vienna FORTRAN: A Language Specification, Version 1.1. ICASE, 1992.
[99] Alliant. Alliant Product Summary. Alliant Computer Systems Corporation, 1989.
[100] Babaoglu O et al. Paralex: An Environment for Parallel Programming in Distributed Systems. Proc. of ACM Int'l Conf. on Supercomputing, 1992.
[101] Banerjee U. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, 1988.
[102] Beguelin A et al. Visualization and Debugging in a Heterogeneous Environment. IEEE Computer, 1993, 26(6).
[103] Boudier G et al. An Overview of PCTE+. SIGPLAN, 1982, 2(24): 248-257.
[104] Brown J S. Debuggers for High Performance Computers. Proc. of Supercomputing '93, 1993.
[105] Cheng Y. A Survey of Parallel Programming Languages and Tools. Technical Report RND-93
[106] Gosling J. Unix Emacs. Carnegie-Mellon Computer Science Dept., 1982.
[107] Gupta A, Kumar V. The Scalability of Matrix Multiplication Algorithms on Parallel Computers. Proc. Int'l Conf. on Parallel Processing, 1993: III-115 to III-119.
[108] Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.
[109] Kacsuk P et al. A Graphical Development and Debugging Environment for Parallel Programs. Parallel Computing, 1997, 22: 1747-1770.
[110] Luque E et al. Overview and New Trend on PSEE. IEEE Software, 1992.
[111] Newton P, Browne J C. The CODE 2.0 Graphical Parallel Programming Language. Proc. of ACM Int'l Conf. on Supercomputing, 1992.
[112] Reiss S P. Software Tools and Environments. ACM Computing Surveys, 1996, 28(1): 281-284.
[113] Ries B. The Paragon Performance Monitoring Environment. Proc. of Supercomputing '93, 1993.
[114] Ross D T. Applications and Extensions of SADT. IEEE Computer, 1985, 18(4): 25-35.
[115] Rumbaugh J et al. Object-Oriented Modeling and Design. Prentice-Hall, 1991.
[116] Scheidler C, Schafers L. TRAPPER: A Graphical Parallel Programming Environment for Industrial High Performance Applications. Proc. of PARLE '93: Parallel Architectures and Languages, 1993.
[117] Wolfe M. High-Performance Compilers for Parallel Computing. Addison-Wesley, 1996.
[118] NASA Ames Research Center, 1993.
[119] Cheng D, Hood R. A Portable Debugger for Parallel and Distributed Programs. Proc. of Supercomputing '94, 1994.
[120] Banerjee U. Dependence Analysis. Kluwer Academic Publishers, Boston, 1996.
[121] Blume W, Eigenmann R. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs. IEEE Trans. on Parallel and Distributed Systems, 1992, 3(6): 643-656.
[122] Blume W et al. Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing. IEEE Parallel and Distributed Technology, 1994, 2(3): 37-47.
[123] Blume W et al. Parallel Programming with Polaris. IEEE Computer, 1996, 29(12): 78-82.