Computing in 2D Space (course for 2014/2015)
March 2014

(a) Title: Computing in 2D Space

(b) Course Aims: This course introduces students to methods and techniques for computing in space, as opposed to sequential computing in time. We explore computation on the two-dimensional surface of a chip, rather than the conventional one-dimensional, recipe-style computation of sequential computing systems. In exploring spatial computing we will look at the implications for the design of application software, the complexity of algorithms, system-level bottleneck analysis, data choreography, and arithmetic-level optimizations. Students are not expected to be familiar with spatial programming; however, experience in programming and logic design, basic complexity analysis, and knowledge of algorithms and data structures are expected. The course emphasis is on vertical optimization of computation, from algorithm development all the way down to computer arithmetic and data encoding. The course uses a hands-on, problem-solving approach, with students encouraged to understand and use the various opportunities of computing in space while solving well-specified computing challenges both top down and bottom up. Students will explore several fundamental algorithms and data structures in computer science, such as polynomial approximation and stencil computation, and will map them to the spatial context. In addition, students will be introduced to the energy and power consumption improvements associated with spatial computing. The course will be based on the industry-standard Open Spatial Computing Language (OpenSPL).

(c) Learning Outcomes: After completing this course students will be able to:
- split applications into control flow and dataflow, and use the available instantiation of spatial computing to implement high-performance parallel algorithms in space (a minimal illustration follows this list);
- use a powerful analytical model to predict improvements in performance, computational density and power consumption (see the illustrative sketch at the end of this description);
- program and debug real spatial computing systems;
- reason about the spatial complexity of algorithms, of arithmetic operations and of data movement, and make trade-offs to improve system performance;
- optimize the available arithmetic area and bandwidth in order to gain maximal performance out of a specific spatial computer implementation;
- use the knowledge and experience gained to develop succinct and efficient solutions to unseen, but well-specified, problems of small to large scale.
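As a minimal illustration of the control-flow/dataflow split named above, consider a 3-point moving-average stencil. The sketch below is plain Java, not OpenSPL (the class and method names are hypothetical): computing in time reuses one arithmetic unit step by step, whereas computing in space lays the operators out on the chip as pipeline stages through which the data streams.

    // Illustrative sketch only (plain Java, not OpenSPL): contrasts computing
    // "in time" with a spatially pipelined view of the same 3-point stencil.
    public class TimeVsSpace {

        // Computing in time: one processor reuses the same ALU, iteration by iteration.
        static double[] movingAverageInTime(double[] x) {
            double[] y = new double[x.length];
            for (int i = 1; i < x.length - 1; i++) {
                y[i] = (x[i - 1] + x[i] + x[i + 1]) / 3.0; // same hardware reused each step
            }
            return y;
        }

        // Computing in space (conceptually): two adders and a scaling unit are laid
        // out as pipeline stages; once full, the pipeline emits one result per cycle.
        // Software can only emulate this, so each "stage" is a plain function here.
        static double stage1(double a, double b) { return a + b; }      // first adder
        static double stage2(double sum, double c) { return sum + c; }  // second adder
        static double stage3(double sum) { return sum / 3.0; }          // scaling unit

        static double[] movingAverageInSpace(double[] x) {
            double[] y = new double[x.length];
            for (int i = 1; i < x.length - 1; i++) {
                // On real spatial hardware the three stages process three
                // different data items simultaneously.
                y[i] = stage3(stage2(stage1(x[i - 1], x[i]), x[i + 1]));
            }
            return y;
        }

        public static void main(String[] args) {
            double[] x = {1, 2, 3, 4, 5, 6};
            System.out.println(java.util.Arrays.toString(movingAverageInTime(x)));
            System.out.println(java.util.Arrays.toString(movingAverageInSpace(x)));
        }
    }

Both methods compute the same values; the point of the spatial view is that the dataflow description exposes the parallelism between stages, which a spatial computer exploits directly in hardware.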
(d) Syllabus: programming models, execution models, and memory models; high-performance and scientific programs; system architecture and networking; spatial arithmetic and number representation; spatial data choreography, communication, locality management, custom encoding, globally optimal scheduling, and custom patterns; OpenSPL; performance estimation, debugging, minimum-frequency computing, total cost of ownership, and benchmarking.

(e) Prerequisites: Java, Computer Architecture I, Logic Design, Algorithms and Data Structures.

(f) Reading List:
1. G. M. Amdahl, "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities," AFIPS Spring Joint Computer Conference, IBM, Sunnyvale, California, 1967.
2. M. J. Flynn, "Some Computer Organizations and Their Effectiveness," IEEE Transactions on Computers, C-21(9):948-960, September 1972.
3. M. J. Flynn, Computer Architecture: Pipelined and Parallel Processor Design (Chapters 1-7), May 1995.
4. S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb, "iWarp: An Integrated Solution to High-Speed Parallel Computing," in Proceedings of IEEE/ACM SC '88, pages 330-339, Orlando, Florida, November 1988.
5. J. E. Smith, "Decoupled Access/Execute Computer Architectures," ACM Transactions on Computer Systems, 2(4):289-308, November 1984.
6. OpenSPL Specification, v1.0, http://www.openspl.org
7. N. Brisebarre, J. M. Muller, and A. Tisserand, "Sparse Coefficient Polynomial Approximations for Hardware Implementations," in Proceedings of the 38th Asilomar Conference on Signals, Systems and Computers, pages 532-535, California, USA, 2004.
8. "Moving from Petaflops to Petadata," Communications of the ACM, 56(5), May 2013.
9. "Finding the Right Level of Abstraction for Minimizing Operational Expenditure," Workshop on High Performance Computational Finance at SC11, November 2011.
10. "Rapid Computation of Value and Risk for Derivatives Portfolios," Concurrency and Computation: Practice and Experience, Special Issue Paper, July 2011.
11. "Beyond Traditional Microprocessors for Geoscience High-Performance Computing Applications," IEEE Micro, 31(2), March/April 2011.

(g) Teaching Methods / Assessment: weekly lectures, tutorials, small-group tutorials, and timetabled laboratory sessions with a small individual project. There is an unassessed practice test (formative assessment only), a 'driving test' (20%) and a final 'main test' (80%), all of which are taken in the laboratory under exam conditions using the Lexis test administration system. Students can also undertake independent self-assessment through unassessed exercises, for which model answers are available.
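To give a flavour of the analytical modelling named in the learning outcomes (and of Amdahl's argument in reading 1), the sketch below is an illustrative, course-independent Java example: if a fraction p of an application's runtime can be moved onto a spatial dataflow engine that accelerates it by a factor s, Amdahl's law bounds the overall speedup at 1 / ((1 - p) + p / s).

    // Illustrative back-of-envelope model (not course material): Amdahl's law
    // bounds the overall speedup when only part of an application is accelerated.
    public class AmdahlModel {

        // p: fraction of runtime moved onto the spatial engine (0 <= p <= 1)
        // s: speedup achieved on that fraction
        static double overallSpeedup(double p, double s) {
            return 1.0 / ((1.0 - p) + p / s);
        }

        public static void main(String[] args) {
            // Even with an effectively infinite spatial speedup, the
            // sequential 10% of the code caps the overall gain at 10x.
            System.out.printf("p=0.9, s=50  -> %.2fx%n", overallSpeedup(0.9, 50));
            System.out.printf("p=0.9, s=1e9 -> %.2fx%n", overallSpeedup(0.9, 1e9));
        }
    }

This is the kind of bottleneck reasoning the course applies at system level: before optimizing the spatial part, first establish how much of the application can actually be moved into space.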