Chapter 18
Educational Outcomes of Par Lab
James Demmel, Kurt Keutzer, and David Patterson
18.1 Education
Other chapters in this book describe how the switch to parallelism leads to large changes in the entire hardware/software stack, from applications to algorithms to software to hardware. But an even bigger challenge is the change
needed at a level above this stack: educating users. Users, by which we also mean programmers, need to learn to think
parallel, and to use these new tools productively in order to create efficient code. And of course users include people with a wide range of skills, from domain experts whose expertise lies in an application domain, to computer scientists familiar
with the technical challenges of parallel computing, to the much larger number of programmers with yet other levels
of training (“Einstein, Elvis, and Mort” have also been used to describe the range of users). Given the ubiquity of
parallelism, this means educating and training an enormous number of people now and in the future, a task to which
companies, universities, and other educational organizations will all need to contribute.
When we began the Par Lab, this need to add more parallelism to the curriculum was apparent to us as faculty. We
now describe some of the ways in which we not only improved our own curriculum, but also made our courses widely
available on-line and through local short courses.
A common theme in all these courses is the use of motifs and patterns, as discussed in Chapters 1 and 8. Computational patterns (and eventually structural patterns) turned out to be not only the basis of the Par Lab research
agenda [1, 2], but also the right way to teach parallel computing to students from a broad array of backgrounds. They
provide a common language that non-computer science students can understand and use to architect their applications
and understand performance, and that computer science students can think about ways to implement, optimize, and
compose. And this common language makes it easier to build interdisciplinary teams that can communicate and
collaborate effectively.
Indeed, our graduate course CS267, Applications of Parallel Computers [3], which has been taught every year since
1991, had long used computational patterns (the 7 dwarfs) as an organizing principle. Typically about half the students
are from the Electrical Engineering and Computer Science department (EECS) and the other half from many other
science and engineering departments, including the Haas Business School; see [10] for a master’s thesis that started
as a class project. Besides recognizing and using patterns, and learning algorithms for them, the curriculum includes parallel programming
using shared memory, distributed memory, GPUs, and cloud computing; tools for debugging, performance analysis,
and autotuning; programming frameworks for building larger applications; and guest lecturers presenting exciting
applications from diverse fields including climate modeling, astrophysics, and materials science. All slides and videos
of lectures are freely available on-line [3]. In addition to several programming assignments, students do class projects
that they choose themselves, typically based on their own research goals; see [4] for examples. We continue to update
the course each time it is offered, based on recent progress in the Par Lab and elsewhere.
Given the need to teach as many undergraduates about parallelism as possible, we also introduced a new undergraduate parallel computing course, CS194, Engineering Parallel Software [6], in fall 2011. Also organized around patterns, this course uses a software platform built on our variant of the Smoke 2.0 video game as a running example. The video
game is used to demonstrate computational and structural patterns, as well as implementation and optimization techniques. In addition to programming assignments and lab sessions, student teams work on a project consisting of an
enhancement to a component of the video game (artificial intelligence, physics, graphics, or special effects). Our Par
Lab collaborator Tim Mattson from Intel has been a guest lecturer.
We have also taught a condensed version of this material every August since 2009 as a 3-day short course [7].
In addition to Par Lab faculty and graduate students, we have guest lecturers from among our Microsoft and Intel
collaborators. Our most recent (4th annual) short course had a record attendance of 397 participants, 136 on-site and
261 on-line, from 39 companies and 92 universities and labs world-wide. This is on top of the 991 participants in the
previous 3 offerings.
The success of these courses led the NSF-funded XSEDE project [11] to adopt CS267, CS194, and our 3-day short
course for nation-wide broadcast. The first such offering, CS267, was launched in spring 2013. XSEDE also gives
free accounts on NSF parallel computing facilities for remote students to do our homework assignments, using our
autograders. As this scales up, we hope to reach even larger numbers of students world-wide.
If parallelism is indeed ubiquitous, it should be introduced into the curriculum as early as possible. We did this through a major redesign of our third-semester, lower-division course CS61C, renamed Great Ideas in Computer
Architecture [5]. As one example, we taught MapReduce using public cloud services and the standard Hadoop API,
carrying out scalability benchmarking assignments that would not have been possible otherwise. Students were excited
by the assignment, with 90% saying they thought it should be retained in future course offerings [9]. As another
example, we used performance tuning of matrix multiplication as an assignment to teach not just OpenMP parallelism
but also many other kinds of optimizations. Surprisingly, one team of sophomores even beat the highly tuned Intel
MKL implementation of matrix multiplication on some matrix sizes.
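To give a flavor of that assignment, the following is a minimal sketch (not the actual course code) of the kind of kernel students tune: a cache-blocked, OpenMP-parallel double-precision matrix multiply in C. The function name, the BLOCK size, the row-major layout, and the assumption that n is a multiple of BLOCK are all choices made for this illustration.

    /* Illustrative sketch only: cache-blocked, OpenMP-parallel C = C + A*B
     * for n x n row-major matrices; n is assumed to be a multiple of BLOCK. */
    #include <omp.h>

    #define BLOCK 32   /* tile edge chosen so a few BLOCKxBLOCK tiles fit in the L1 cache */

    void dgemm_blocked(int n, const double *A, const double *B, double *C)
    {
        #pragma omp parallel for collapse(2)      /* thread-level parallelism over tiles of C */
        for (int ii = 0; ii < n; ii += BLOCK)
            for (int jj = 0; jj < n; jj += BLOCK)
                for (int kk = 0; kk < n; kk += BLOCK)     /* cache blocking */
                    for (int i = ii; i < ii + BLOCK; i++)
                        for (int j = jj; j < jj + BLOCK; j++) {
                            double sum = C[i*n + j];
                            for (int k = kk; k < kk + BLOCK; k++)
                                sum += A[i*n + k] * B[k*n + j];
                            C[i*n + j] = sum;
                        }
    }

Each thread owns a distinct tile of C, so the updates are race-free, and keeping the kk loop inside the parallel region lets a tile of C be reused while it is still in cache.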
As a result of the success of infusing parallelism into 61C, the next edition of Computer Organization and Design [8] embraces this parallel perspective. It makes matrix multiplication the running example through the last four chapters of the book, showing how small changes made with an understanding of parallelism lead to dramatic performance improvements:
• Data-level parallelism in Chapter 3 improves performance by a factor of almost four by executing four 64-bit floating-point operations in parallel using 256-bit operands, demonstrating the value of SIMD (see the sketch after this list).
• Instruction-level parallelism in Chapter 4 more than doubles performance again by unrolling loops to give the
out-of-order execution hardware more instructions to schedule.
• Cache optimizations in Chapter 5 improve performance for matrices that don't fit in the L1 data cache by
another factor of 2.0 to 2.5 by using cache blocking to reduce cache misses.
• Thread-level parallelism in Chapter 6 improves the performance of matrices that don’t fit into a single L1 data
cache by another factor of 4 to 14 by utilizing 16 cores, demonstrating the value of MIMD.
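To make the data-level-parallelism step concrete, here is a minimal sketch in C of an AVX matrix-multiply kernel in the spirit of Chapter 3; it is not the book's code. The row-major layout, the unaligned loads, and the assumption that n is a multiple of 4 are choices made for this illustration; the loop unrolling of Chapter 4 and the blocking of Chapter 5 would be layered on top of this kernel.

    /* Illustrative sketch only: C = C + A*B for n x n row-major matrices,
     * computing four consecutive doubles of a row of C per 256-bit AVX register.
     * n is assumed to be a multiple of 4. */
    #include <immintrin.h>

    void dgemm_avx(int n, const double *A, const double *B, double *C)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j += 4) {                      /* 4 doubles per register */
                __m256d c = _mm256_loadu_pd(&C[i*n + j]);         /* C[i][j..j+3] */
                for (int k = 0; k < n; k++) {
                    __m256d a = _mm256_broadcast_sd(&A[i*n + k]); /* replicate A[i][k] */
                    __m256d b = _mm256_loadu_pd(&B[k*n + j]);     /* B[k][j..j+3] */
                    c = _mm256_add_pd(c, _mm256_mul_pd(a, b));    /* four multiply-adds at once */
                }
                _mm256_storeu_pd(&C[i*n + j], c);
            }
    }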
Using the ideas in this book and tailoring the software to this computer added just 24 lines of code. Depending on the
size of the matrix, the overall performance speedup from these ideas realized in those two-dozen lines of code is more
than a factor of 200. As Computer Organization and Design is the most popular textbook for undergraduate computer
architecture courses, this edition will help make parallelism the norm in undergraduate education.
In summary, Par Lab significantly impacted the teaching of parallel computing not just at Berkeley, but nationwide.
Bibliography
[1] K. Asanović, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 18, 2006.
[2] K. Asanović, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. Commun. ACM, 52(10):56–67, 2009.
[3] CS267 - Applications of Parallel Computers. http://www.cs.berkeley.edu/~demmel/cs267_Spr12, 2012.
[4] J. Demmel. CS267 - Applications of Parallel Computers - Class Projects. http://www.cs.berkeley.edu/~demmel/cs267_Spr09/posters.html, 2009.
[5] D. Garcia. CS61C - Great Ideas in Computer Architecture. http://www-inst.eecs.berkeley.edu/~cs61c/sp13, 2012.
[6] K. Keutzer. CS194 - Engineering Parallel Software. http://www.cs.berkeley.edu/~demmel/cs267_Spr12, 2012.
[7] 4th Annual Short Course on Parallel Programming. http://parlab.eecs.berkeley.edu/2012bootcamp, 2012.
[8] D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, Fifth Edition. Morgan Kaufmann, 2013.
[9] A. Rabkin, C. Reiss, R. Katz, and D. Patterson. Using clouds for MapReduce measurement assignments. ACM Trans. Computing Education, 13, January 2013.
[10] N. Thompson. Firm Software Parallelism: Building a measure of how firms will be impacted by the changeover to multicore chips. Master's thesis, EECS Department, University of California, Berkeley, December 2012.
[11] Extreme Science and Engineering Discovery Environment (XSEDE). http://www.xsede.org/, 2013.