2008-arch1-sigcse-04 - Robust Autonomic Systems

advertisement
Responding to Java-Centric Curriculum: Combined
Introduction of C and Assembly Language in a
Course on Computer Organization
Eric A. Freudenthal and Brian A. Carter
University of Texas at El Paso,
efreudenthal@utep.edu, bacarter@miners.utep.edu
ABSTRACT
1. INTRODUCTION
This paper describes the reform of a sophomore-level course in
computer organization for the Computer Science BS curriculum at
the University of Texas at El Paso, an urban minority-serving
institution, where Java and integrated IDEs have been adopted as
the only language and development environments used in the first
three semesters of study. This reform, which interleaves the
teaching of C and assembly language within a junior-level course
in computer organization, was motivated by faculty observations
and industry feedback indicating that upper-division students and
graduates were failing to achieve mastery of non-garbage-collected,
strictly imperative languages, such as C.
Until the mid-1990’s, the introductory sequence of many Computer
Science curricula included two courses that introduced students to,
programming, and common data structures using one of a variety of
procedural languages that directly exposed students to management
of memory and the use of pointers. These concepts are necessary
for understanding the relationship between high level languages,
run-time
environments,
and
underlying
architecture.
Understandings and skills at this level are common with EE
curriculum still in use.
The similarity of C semantics to the underlying machine model
enables simultaneous mastery of both C and assembly-language
programming and exposes implementation details that are difficult
to teach independently, such as subroutine linkage and management
of stack frame. An online lab manual has been developed for this
course that is freely available for extension or use by other
institutions. In this paper, we report on pedagogical techniques for
facilitating student understanding of the relationships between
high-level language constructs, such as algebraic expression syntax,
block-structured control-flow structures, and composite data types,
and their implementations in machine code.
Categories and Subject Descriptors
C.0.0 [Hardware/software interfaces]
C.1.0 [General]
D.3.4 [Processors]
D.3.3 [Language Constructs and Features]
General Terms
Algorithms, Design, Languages,.
Index Terms - computer architecture, CSE, assembly language,
compilation
In response to the wide adoption of object-oriented languages with
“reference” abstractions and automatic garbage collection such as
Java that reduce the difficulty of engineering and debugging
complex systems, a now-popular “objects first” curriculum design
was introduced and is now listed by the ACM as an acceptable
implementation strategy for Computer Science education. [7]
The adoption of object oriented languages and their seamlessly
integrated IDEs has enabled lower division students to construct
more complex programs at the cost of decreasing their
understanding of memory management and system-level tools such
as compilers and linkers. These understandings are critical for
developers of operating systems and embedded controllers. This
learning deficit has been observed by industry and faculty at UTEP
and other institutions that adopted objects-first curricula. [1]
This paper describes initial observations from an experimental
reform to a junior-level undergraduate course whose principal
learning outcome was an understanding of computer system
organization and assembly language. In the reformed course, the C
programming language, which is commonly used for low-level
system implementation, bridges between the familiar syntax of Java
and the semantics of the underlying system.
The pre-reform computer organization course focused on
foundational concepts such as machine instructions, registers, the
random-access memory model, and a generalized fetch-execute
cycle [2]. Projects included assembly-language programming of a
Motorola M68HC11 processor embedded within a robot. The
reformed course [3][4], which uses a different embedded target,
integrates the study of C and is thus also able to focus on the
implementation of high-level language features and the linkage
between C and assembly-language routines. Student labs use
traditional command-line tools including bash, gcc, as, ld, and
make.
The reformed course’s outcomes are a superset of the original, with
extensions including (1) understanding of C and its runtime
environment, (2) parse trees, (3) implementation of simple dynamic
memory management, and (4) usage of traditional command-line
development tools. While no formal evaluation has yet been
conducted, examinations and projects indicate that most students
develop competency in the both the previous and extended
outcomes, and students have expressed enthusiasm for learning C
due to its high perceived commercial relevance.
Ironically, this work was inspired by the recent work of Yale Patt
who developed a architecture-first (a.k.a. “breadth-first”) curricula
that introduces students to underlying architecture and machine
language concepts prior to high-level language programming in C
[5], [6]. The approach described here is complementary to Patt’s
since it exploits understanding gained from the prior study of highlevel (and even object-oriented) languages to facilitate the
combined understanding of C and computer organization.
structured control-flow constructs (e.g if-then-else clauses, switch
statements and and for or while-loops) to equivalent branching
assembly-language code. Branching instructions have similar
semantics to the C (and FORTRAN) goto statement. Rather than
introducing translation templates, as fixed (and content-free)
idioms, the course exploits C’s support for both (conditional) goto
and block-structured primitives to enable the students to explore the
logical reduction of block-structured programs to “goto C.” This
approach exposes an opportunity for students to generate and
validate generalized translation templates themselves. As is
illustrated by Figure 2, the subsequent translation of goto C to
(MSP430) assembly language is straightforward.
2. ADOPTION OF THE TI MSP430
MICROCONTROLLER
In the spring of 2008, the TI MSP430 embedded controller was
adopted. Like the HC11, the MPS430 is supported by the free
GNU development tools. Advantages of this controller over the
previously used M68HC11 include a smaller orthogonal (and thus
easier to learn) instruction set and inexpensive (~$20) development
kits.
3. DECONSTRUCTING HIGH-LEVEL
PROGRAMMING CONSTRUCTS INTO
ASSEMBLY LANGUAGE
Assembly language programming is taught through examination of
systematic strategies for translating high level C-language
constructs into assembly language. This is accomplished through
the discussion and manual application of techniques that generally
are reserved for courses on compiler construction.
For example, students frequently have difficulty converting
algebraic expressions into sequences of operations. This process
requires (1) identification of sub-expressions, (2) allocation of
temporary variables to hold intermediate results, and (3)
determination of appropriate evaluation orderings. As illustrated in
Figure 1, the introduction of manually generated "parse trees"
facilitates this process because the tree’s structure directly
represents the relationship between sub-expressions and their
intermediate values, and also expose the the full range of
appropriate (bottom-up) evaluation orderings. Parse trees are
introduced as part of an active-learning exercise in which students
first identify challenges in translation of algebraic expressions.
FIGURE 1
ILLUSTRATION OF A PARSE TREE USED AS AN AID TO CONVERT A C-STYLE
ARITHMETIC EXPRESSION TO ASSEMBLY-LANGUAGE CODE.
CONTROL FLOW
Students who are only familiar with block-structured
programming can easily be confused by the translation of block-
FIGURE 2
REDUCTION OF BLOCK-STRUCTURED C TO BRANCHING CODE
4. COMPOSITE DATA TYPES
The course also includes labs in which students manipulate
composite data types declared using C’s struct primitive.
Lectures and exercises examine the memory representation of
abstract data types such as trees and linked-lists whose members
are defined as structs. A linked list example is illustrated in Figure
3. Lab projects include programs that manipulate abstract data
types (such as linked lists) both accessed by mainline code in C and
by I/O handlers written in assembly language. This figure also
illustrates a technique used by two-pass assemblers to determine
the address of instructions prior to opcode generation.
5. EVALUATION AND LESSONS
LEARNED
The reforms described herein were implemented incrementally over
three semesters. Grading artifacts indicate that most students
acquire substantial mastery of C and code generation to assembly
language.
Initially, students are highly motivated to learn C due to their
awareness of its value in the job market. However, their continued
interest appears to heavily depend upon the nature of early
programming assignments. Paradoxically, student motivation was
sustained during earlier semesters prior to the construction of a
laboratory manual that apparently over-simplified early “getting
started” labs into cookbook recipes and therefore prevented
students from practicing problem-solving skills necessary for
success in later labs.
Students who attended the pre-reform course in computer
organization had substantial difficulty with systems programming
assignments in C involving pointer manipulation. We have
recently begun evaluation of student retention of concepts taught in
the computer organization course that will examine whether they
have a greater ability to successfully complete laboratory
assignments.
The first group of students to attend the reformed course are now
attending a senior-level course in operating systems. Initial quizzes
indicate that student recall of concepts introduced in the course in
computer organization is stronger than in previous years.
Furthermore, informal reports from potential employers
interviewing our students indicate that they detect that students who
attended the reformed course possess dramatically stronger
understandings of concepts underlying system programming and C
than students who attended the pre-reform course.
REFERENCES
This reform also appears to increase student interest in further
study of compilation techniques: Our department has been unable
to offer an undergraduate course in compilation for several years
due to lack of student interest. However, the offering after the
implementation of this reform attracted a large number of students.
[3] Freudenthal, E., Carter, B., Kautz, F., Ogrey, A., Preston, R., Walton,
A., “Integration of C into an Introductory Course in Computer
Organization,” ASEE 2007 ECE Division Program.
6. SYNOPSIS
[5] Patt, Yale, “Education in Computer Science and Computer Engineering
Starts with Computer Architecture,” ACM 1996 proc. 1996 Workshop
on Computer Architecture Education.
The integrated C and Assembly Language course in Computer
Architecture has been taught for three semesters. Initial evaluation
indicates that the students both learn the original and extended
course outcomes. In addition, the course appears to increase
student interest in attending subsequent courses in compilation.
Future research will include longitudinal study of student success in
systems-oriented
[1] Dewar, Robert and Schonberg, Edmond, “Computer Science Education:
Where are the Software Engineers of Tomorrow,” STSC Crosstalk,
January 2008.
[2] Teller, Patricia; Nieto, Manuel; and Roach, Steve, “Combining learning
strategies and tools in a first course in computer architecture,” IEEE
proc. 2003 Workshop on Computer Architecture Education.
[4] Freudenthal, E, Carter, B., Kautz, F., Ogrey, A. and Preston, R, “Work
In Progress: Combined Introduction to C and Assembly,” Proc.,
Frontiers in Education 2008., ASEE and IEEE, pub.
[6] Patt, Yale and Patel, Sanjay, Introduction to Computing Systems.
McGraw Hill, 2004. ISBN 0-07-121503-4.
[7] Association for Computing Machinery, Computing Curricula 2001
Computer Science, ACM, http://www.sigcse.org/cc2001/cc2001.pdf
AUTHOR INFORMATION
Eric A. Freudenthal is a member of the Computer Science Faculty
at the University of Texas at El Paso.
Brian A. Carter is an undergraduate studying Computer Science
at
the
University
of
Texas
at
El
Paso.
FIGURE 3
COMPOSITE DATA TYPE EXAMPLE AND ILLUSTRATION OF MEMORY ALLOCATION FOR INSTRUCTION WORDS.
Download