An Evaluation of TDD Training Methods in a Programming Curriculum

Li-Ren Chien1, 2, Daniel J. Buehrer1, Chin-Yi Yang2 and Chyong-Mei Chen3
1. Department of Computer and Science Engineering, National Chung Cheng University
2. Hsin Kuo High School
3. Department of Applied Mathematics, Providence University
168 University Road, Min-Hsuing Chia-Yi, Taiwan, R.O.C.
clj@cs.ccu.edu.tw
Abstract
This paper evaluates an innovative
training method which is based on TDD
(Test Driven Development) [4] and
implemented in an automatic online judge
system named DICE [9].
After running the automatic grading
system DICE at Hsing Kuo High School in
Taiwan for years, we found that some
students were left out by the DICE system.
We needed a more sophisticated mechanism
to assist underachievers. Our solution was to
utilize TDD as an extension of the DICE
system to promote learning performance in
programming.
We implemented DICE with TDD and
have applied the innovative training method
in the programming curriculum at Hsin Kuo
High School in Taiwan for one semester.
Simultaneously we conducted an experiment
with a control and experimental group to
estimate the efficiency of DICE with TDD.
Our finding is that DICE with TDD
improves the mean scores of learners by
50.88% over the control group.
1 Introduction
Studies of programming can be
generally divided into two categories -- those
with a software engineering perspective, and
those with a psychological and educational
perspective [14].
We first utilized software engineering
technology to establish an automatic grading
system DICE to lessen the assessment work
of instructors. DICE has been used in Hsing
Kuo High School’s computer programming
courses for three years. Over 2,400 students
have used the system, and they have acquired
programming skills more quickly than earlier
students who were taught by traditional
training methods.
We ran into some well-known
problems of a test-based grader, which
caused the underachievers to be eliminated
from DICE. So we needed a more
sophisticated testing mechanism for the
underachievers.
TDD is a code development strategy in
which one always writes a test case before
adding new code [1]. The benefits of TDD
are to build software better and faster and
give the programmer a great degree of
confidence in the correctness of his code
[10]. TDD seems so attractive that
computing and information technology
educators have begun to call for the
introduction of TDD into the curriculum [6].
We referred to the TDD concept from a
pedagogical perspective to implement DICE
with TDD.
After implementing DICE with TDD,
we designed training material based on
the ACM UVA online judge problems
[18]. We applied DICE with TDD in the
computer programming curriculum for a
semester. At the same time, we used a
post-test only control group design to
estimate the effectiveness of DICE with
TDD.
We found that DICE with TDD can
benefit learners in programming more than
the traditional method. According to a
multiple regression analysis, the mean score
of the group trained with DICE and TDD was
50.88% above that of the group trained
without TDD.
2 TDD in Education
TDD is a code development strategy in
which one always writes a test case before
adding new code [1]. The benefits of TDD
are to help build software better and faster
and give the programmer a greater degree of
confidence in the correctness of his code [4].
TDD is so attractive that computing and
information technology educators have
begun to call for the introduction of TDD
into the curriculum [6].
2.1 Test Driven Development
TDD (Test Driven Development) is a
code development strategy that has been
popularized by extreme programming [1].
TDD is an evolutionary approach to
development which combines test-first
development, in which one writes a test
before writing just enough production code
to fulfill that test, with refactoring. In TDD,
one always writes a test case before adding
new code.
The following sequence of a TDD cycle,
shown in Figure 2-1, is based on Beck’s
description. The first step is to quickly add a
test, basically just enough code to fail. Next
you run your tests, often the complete test
suite, although for the sake of speed you may
decide to run only a subset, to ensure that the
new test does in fact fail. You then update
your functional code to make it pass the new
test. The fourth step is to run your tests
again. If they fail, you need to update your
functional code and retest. Once the tests
pass, you refactor any duplication out of
your design as needed, and then start over
with the next test [1].
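The cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `gcd` task, the test cases, and the helper name `run_tests` are assumptions made for this example and are not taken from DICE or from Beck's book.

```python
def run_tests(candidate):
    """Run the test suite against `candidate` and return the failing cases."""
    # Step 1: the test cases are written before any production code exists.
    cases = [((12, 18), 6), ((7, 5), 1), ((0, 9), 9)]
    failures = []
    for args, expected in cases:
        try:
            actual = candidate(*args)
        except NotImplementedError:
            actual = None  # an unimplemented stub counts as a failure
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


def gcd_stub(a, b):
    """Just enough code for the tests to run, and to fail."""
    raise NotImplementedError


# Step 2: run the tests and confirm that the new tests do in fact fail.
failures_before = run_tests(gcd_stub)


def gcd(a, b):
    """Step 3: write just enough functional code to satisfy the tests."""
    while b:
        a, b = b, a % b
    return a


# Step 4: run the tests again; an empty list means every test passes.
failures_after = run_tests(gcd)
```

Once `failures_after` is empty, the next iteration would start with a new test, after any duplication has been refactored out.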
The biggest benefit of TDD is to help
build software better and faster. It offers
more than just simple validation of
correctness; it can also drive the design of a
program. By focusing on the test cases first,
one must imagine how the functionality will
be used by clients (in this case, the test
cases). Therefore, the programmer is only
concerned with the interface, and not the
implementation.
This benefit is
complementary to “design by contract”, as it
approaches code through test cases rather
than through mathematical assertions or
preconceptions [1]. What is the primary goal
of TDD? One view is that the goal of TDD
is specification and not validation [13]. In
other words, it’s one way to think through
your design before writing the functional
code. Another view is that TDD is a
programming technique. As Ron Jeffries
argues, the goal of TDD is to write
clean code that works [5].
However, TDD has a limitation; it is
difficult to use in situations where full
functional tests are required to determine
success or failure. Examples of these are
GUIs (graphical user interfaces), programs
working with relational databases, and some
that depend on specific network
configurations. Management support is
essential. Without the entire organization
believing that TDD is going to improve the
product, management will feel that time
spent writing tests is wasted [17].
Figure 2-1: Test Driven Development

2.2 Cases of TDD in Learning
TDD provides benefits that learners experience for themselves.
It is applicable on small projects with minimal training. It gives
the programmer a great degree of confidence in the correctness
of his code [11]. It is easier for learners to understand and relate
to than traditional testing approaches. It promotes incremental
development, promotes the concept of always having a “running
version” of the program at hand, and promotes early detection of
errors introduced by coding changes [4]. Finally, it encourages
students to test features and code as they are implemented.
As TDD seems attractive, the idea of using TDD in the classroom
is not revolutionary. Computing and information technology
educators have begun to call for the introduction of TDD into the
curriculum [6]. Over the past five years, the idea of including
software testing practices in programming assignments within the
undergraduate computer science curriculum has grown from a
fringe practice to a recurring theme [4]. Some researchers may
argue that starting too early with a test-first approach can lead to
the “paralysis of analysis” [3]. TDD has gone to school since
2001; Table 2-1 is a review of TDD studies applied in learning.

Table 2-1: Previous Studies of TDD in Learning [6]
Study            Dependent Variables            Results
Edwards, 2003    Software quality/reliability   TDD significantly higher
                 Programmer confidence          TDD significantly higher
                 Preference of TDD              Significantly higher after TDD
Kaufmann, 2003   Software quality/reliability   TDD significantly higher
                 Programmer productivity        TDD significantly higher
                 Programmer confidence          TDD significantly higher
Muller, 2002     Software quality/reliability   No significant difference
                 Programmer productivity        No significant difference
                 Program understanding          TDD significantly higher

3 DICE TDD Model
Since 2005, we have established the
DICE system as a test-based assignment,
tutoring, and problem solving environment
[9]. All training work, including assigned
practice, turn-in and assessment, can be run
on the DICE system. DICE has been
working at Hsing Kuo High School in
Taiwan R.O.C. for over 2 years. A running
DICE System is shown in Figure 3-1.
Figure 3-1: A running DICE system
After running DICE for years, we found some well-known
problems of a test-based grader. These caused the underachievers
to be eliminated from the DICE system. One problem was that
only clearly defined questions with a completely specified
interface could be used. This led students to focus on output
correctness first and foremost, and it did not encourage or reward
good performance while testing [16]. Another perceived
shortcoming was that its inflexibility prevented assessment of
more complex questions [2]. When a complex question arrived,
we found that some underachievers just sat in front of their
computers and waited for the bell to ring. So we needed a more
sophisticated mechanism to help underachievers.
In the second stage, we referred to training method criteria and
TDD concepts to establish a new training model for learning
programming, which was named the DICE TDD Model [10]. It
provided sixteen kinds of training methods for learners.
Figure 3-2: The TDD Model in DICE
In the third stage, we used Kolb’s [7] [8] learning style
instrument as a test of individual differences. We found the best
fit between learning styles and training methods which would
result in satisfactory learning outcomes. We showed that different
learners needed different training methods in the DICE system
[11].
At this time, a typed mind map model was developed as a
knowledge representation for DICE [12].
4 Evaluation
DICE with TDD has been implemented since March 2008. We
introduced DICE with TDD into the computer science curriculum
with programming in the 10th grade of Hsin Kuo High School in
Taiwan. There were 15 classes, with 800 students in total, taking
a course taught by three teachers.
We ran an experiment to compare the effectiveness of the
training methods in an experimental group and a control group.
The experimental group used DICE with the TDD training
methods, whereas the control group used only DICE, denoted
Non-TDD.
4.1 Research Model
As mentioned, some learners needed to be given more guidance.
Consequently, we followed TDD concepts to guide learners to
solve problems. An instructor needs to divide the problem into
sub problems and let students conquer each sub problem. After
they have conquered every sub problem, the whole problem will
have been solved. This is more intensive than Non-TDD.
According to the literature review by Jones in 2004 [6],
organized in Table 2-1, most of the studies report good results for
TDD in learning. In discussions of the Teaching Council at Hsin
Kuo High School, teachers observed that most learners, and
especially the slower learners, needed the intensive method in
learning programming. Hence, we have Hypothesis 1.
Hypothesis 1: Participants in the TDD
group will score significantly higher on
learning performance measures than
participants in the Non-TDD group.
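To make the sub-problem idea concrete, the following Python sketch contrasts per-sub-problem scoring with all-or-nothing scoring. It is a hypothetical illustration only: the function names `passes` and `grade`, the point values, and the toy sub problems are assumptions for this example and do not describe the actual DICE implementation.

```python
def passes(func, cases):
    """Return True if `func` gives the expected result for every test case."""
    try:
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False  # a crashing submission fails the sub problem


def grade(submission, sub_problems, partial_credit):
    """Score a submission (dict of name -> function) against sub problems.

    With partial_credit=True (TDD style), each solved sub problem earns
    its points. With partial_credit=False (Non-TDD style), points are
    awarded only when every sub problem is solved.
    """
    earned, total, all_solved = 0, 0, True
    for name, cases, points in sub_problems:
        total += points
        func = submission.get(name)
        if func is not None and passes(func, cases):
            earned += points
        else:
            all_solved = False
    if partial_credit:
        return earned
    return total if all_solved else 0


# Hypothetical problem split into two sub problems by the instructor.
SUB_PROBLEMS = [
    ("is_even", [((2,), True), ((3,), False)], 40),
    ("count_even", [(([1, 2, 4],), 2)], 60),
]

# A learner who has solved only the first sub problem so far.
partial_submission = {"is_even": lambda n: n % 2 == 0}

tdd_score = grade(partial_submission, SUB_PROBLEMS, partial_credit=True)
non_tdd_score = grade(partial_submission, SUB_PROBLEMS, partial_credit=False)
```

In this sketch the learner who has finished one sub problem already sees a non-zero score under the TDD-style scheme, while the all-or-nothing scheme still reports zero.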
Figure 4-1: Research Model (Training Method, TDD or Non-TDD,
affects Learning Performance through H1)

4.2 Experimental Design
The experiment proceeded under one teacher. We used random
assignment to place 167 participants in the TDD group and 163
in the Non-TDD group, for a total of 330 participants with no
missing data. Table 4-1 is the distribution of the samples.
Our training material was C-language programming. We
designed the TDD sub problems from the ACM UVA online
judge problems [18] and trained learners with the TDD and
Non-TDD methods for 40 days. Then we held an examination to
measure learning performance, with scores from 0 to 100.

Table 4-1: Distribution of Samples
Item              Category   Frequency   Percent
Training Method   TDD        167         50.6%
                  Non-TDD    163         49.4%

4.3 Data Analysis
From the box-plot in Figure 4-2 and the descriptive statistics of
learning performance for the two groups, we see that the TDD
training method seems to lead learners to better performance
than the Non-TDD method. Nevertheless, the TDD group has a
much wider range of grades and a larger variance.

Figure 4-2: Box-plot of Learning Performance
Table 4-2: Descriptive Statistics of Learning Performance
Category   Mean    Mid.   Max   S.D.    Range
TDD        21.51   6.4    100   27.14   100
Non-TDD    14.25   1.6    80    19.79   80

We conducted the Kolmogorov-Smirnov test to examine
statistical significance. First, we considered the homogeneity of
variance and the normality of the learning performance scores.
We conducted a Bartlett test and a Levene test to assess
homogeneity of variance between the two groups. Both tests
indicated that the two groups have different variances (P-value
< 0.05). We then used a non-parametric method, the
Kolmogorov-Smirnov test, to check normality. The result
(P-value < 0.05) indicated that the learning performance in each
training group did not follow a normal distribution. Therefore,
we could not analyze the data with t-tests (or one-way ANOVA).
Based on these checks, we decided to use a Kolmogorov-Smirnov
test to analyze the relationship between training method and
learning performance in the two groups.
The empirical CDF (Cumulative Distribution Function) of scores
for each group is estimated and plotted in Figure 4-3. It is
apparent that grades in the TDD group are stochastically larger
than those in the Non-TDD group. The Kolmogorov-Smirnov test
was conducted to test the null hypothesis that the true
distribution function of TDD grades is not less than that of
Non-TDD grades, versus the alternative hypothesis that the true
distribution function of TDD grades is less than that of Non-TDD
grades. That is, if we denote $F_1(x)$ and $F_0(x)$ as the CDFs
of the TDD and Non-TDD groups, respectively, the
Kolmogorov-Smirnov method tests the hypotheses
$H_0: F_1(x) \ge F_0(x)$ vs. $H_1: F_1(x) < F_0(x)$ for all $x$,
where $H_1$ means that grades of TDD are stochastically larger
than those of Non-TDD. The test statistic of 0.155, with a P-value
of 0.01897 for a one-sided test, rejects the null hypothesis and
shows that TDD grades are stochastically larger than Non-TDD
grades.
According to the means of the two training groups, DICE with
TDD increases the mean from 14.25325 in the Non-TDD group to
21.50521, an increase of 50.87934%. Furthermore, we evaluated
the effect of TDD with a simple regression analysis; in spite of
the heteroscedasticity and the invalidity of the normality
assumption, the estimated improvement in scores from TDD
training is about 50.88052%.

Figure 4-3: Empirical CDF Plot of Learning Performance
(x-axis: Grades, 0 to 100; y-axis: Empirical CDF, 0.0 to 1.0;
curves: TDD and Non-TDD)
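As a rough illustration of the one-sided two-sample comparison described in this section, the following Python sketch builds the empirical CDFs and computes the largest amount by which the Non-TDD CDF exceeds the TDD CDF. The grade lists are invented for the example and are not the study's data, the function names are hypothetical, and the large-sample P-value formula is a standard approximation that may or may not be what the authors' statistical software used.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_one_sided(tdd, non_tdd):
    """One-sided statistic: the largest value of F0(x) - F1(x).

    F1 is the empirical CDF of the TDD grades and F0 that of the
    Non-TDD grades. Both are step functions, so it is enough to
    evaluate the difference at the observed grades.
    """
    f1, f0 = ecdf(tdd), ecdf(non_tdd)
    points = sorted(set(tdd) | set(non_tdd))
    return max(0.0, max(f0(x) - f1(x) for x in points))


def approx_p_value(d, n1, n0):
    """Large-sample approximation exp(-2 * d^2 * n1 * n0 / (n1 + n0))."""
    return math.exp(-2.0 * d * d * n1 * n0 / (n1 + n0))


# Hypothetical grades, for illustration only.
tdd_grades = [0, 10, 50, 100]
non_tdd_grades = [0, 5, 10, 20]
d = ks_one_sided(tdd_grades, non_tdd_grades)
```

Under this approximation, the reported statistic of 0.155 with group sizes 167 and 163 gives a P-value of roughly 0.019, which is in line with the reported 0.01897.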
4.4 Results
The Kolmogorov-Smirnov test showed a statistically significant
improvement in the performance of TDD over Non-TDD.
According to the means of the two training groups and multiple
regression, TDD with DICE improves scores by about 50.88%.
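The arithmetic behind the 50.88% figure can be checked with a short Python sketch. The two group means are the ones reported in this paper; everything else (the function names and the small score lists) is hypothetical and serves only to show that a regression of score on a 0/1 group indicator yields the control-group mean as intercept and the difference in means as slope.

```python
def percent_improvement(control_mean, treatment_mean):
    """Relative increase of the treatment mean over the control mean, in %."""
    return (treatment_mean - control_mean) / control_mean * 100.0


def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Means reported in the paper for the Non-TDD and TDD groups.
improvement = percent_improvement(14.25325, 21.50521)

# Hypothetical scores (not the study's data): control mean 10, TDD mean 20.
control_scores = [10, 20, 0, 10]
tdd_scores = [30, 10, 20]
indicator = [0] * len(control_scores) + [1] * len(tdd_scores)
intercept, slope = simple_ols(indicator, control_scores + tdd_scores)
relative_effect = slope / intercept * 100.0
```

For the hypothetical scores, the intercept equals the control mean and the slope equals the difference in means, so `relative_effect` is the same quantity that `percent_improvement` computes from the two means.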
5 Conclusions and Future Work
The objective of this study is to illustrate how TDD can be used
to improve learning performance in programming language
courses. The results show a statistically significant advantage for
TDD, which improves scores by about 50.88%.
From a practical point of view, TDD promotes a climate of
discussion between instructors and learners. To use DICE with
TDD, the instructors need to design a set of sub problems with
TDD methodology. This is a challenge to the instructors. Before
announcing the assignments, instructors need to think over how
to guide learners to solve problems by using TDD. This will
enhance the professional ability of instructors. Regarding the
learners, we observe that in the TDD group the learners rely on
discussions with their classmates to solve problems, while
Non-TDD group members feel abandoned by DICE. Most
learners do not know how to begin and tend to depend on
instructors. From a pedagogical perspective, a TDD scoring
mechanism does provide positive reinforcement for learners, as
they can earn scores after every sub function is solved. However,
learners in the Non-TDD group will not get scores until the
whole problem is solved.
Future research will address the impact of the training method
(TDD and Non-TDD) based on individual differences.
Researchers in instructional psychology have demonstrated that
adapting instructional methods and teaching strategies to
accommodate key individual differences has led to improved
performance [15]. Next we will analyze the relationship between
the effects of the TDD and Non-TDD approaches for various
learning styles, to unearth the interaction of these two factors on
learning performance.

References
1. Beck, K. Test Driven Development: By
Example, Addison-Wesley, 2003
2. Christopher, D., David, L. and Jams, O.
Automatic Test-Based Assessment of
Programming: A Review, ACM Journal
of Educational Resources in Computing,
Vol. 5, No 3, September 2005. Article 4
3. Don Colton, Leslie Fife, and Andrew
Thompson. A Web-based Automatic
Program Grader, Proc ISECON 2006,
v23
4. Edwards, S. H. “Teaching Software
Testing: Automatic Grading Meets
Test-First Coding.” In Proceedings of
The OOPSLA’03 Conference. Poster
presentation, 2003b, 318-319
5. John Wiley and Sons, Agile Database
Techniques: Effective Strategies for the
Agile Software Developer, Wiley, 2007
6. Jones, C.G., "Test-driven Development
Goes to School," Journal of Computing
Sciences in Colleges, 2004, vol. 20, pp.
220-231
7. Kolb, D.A. and Fry, R. Toward an
applied theory of experiential learning.
In Theories of Group Process, G. L.
Cooper (ed.), John Wiley and Sons, Inc.,
New York, NY, pp. 33-54
8. Kolb, D.A. The Learning Style Inventory
Technical Manual, Mcber and Company,
Boston, MA
9. Li-Ren Chien, D. Buehrer and Chin Yi
Yang. (2007) “Dice: A Parse-Tree Based
On-Line Assessment System for a
Programming Language Course”,
International Conference on Teaching
and Learning (iCTL 2007), Putrajaya,
Malaysia
10. Li-Ren Chien, D. Buehrer and Chin Yi
Yang. (2007) “Using Test-Driven
Development in a Parse-tree Based
On-line Assessment System” , IADIS
International Conference e-Learning,
Lisbon, Portugal
11. Li-Ren Chien, D. J. Buehrer, Chin-Yi
Yang. (2007) “An Adaptive Learning
Environment in the DICE System with a
TDD Model”, in Interactive Computer
Aided Learning (ICL 2007), Villach,
Austria
12. Li-Ren Chien, D. Buehrer, “Using a
Typed Mind Map as Knowledge
Representation in a TDD DICE System”,
30th International Conference on
Information Technology Interfaces,
Cavtat/Dubrovnik, Croatia, 2008
13. Robert C. Martin and Micah Martin,
“Agile Software Development,
Principles, Patterns, and Practices”,
Prentice Hall PTR, Upper Saddle River,
2003
14. Robin, A., Rountree J. and Rountree N.
“Learning and Teaching Programming: A
Review and Discussion, ” Computer
Science Education, (33-2), 2003, pp.
137-172
15. Snow, R.E. “Individual Difference in the
Design of Educational Programs,”
American Psychologist (41:10), October
1986, pp. 1029-1039
16. Stephen H. Edwards and Manuel A.
Pérez-Quiñon, “Experiences using
test-driven development with an
automated grader”, Journal of
Computing Sciences in Colleges, Volume
22, Issue 3, January 2007
17. Steven Loughran, “Working
Specification”, HP Laboratory, 2006
18. http://acm.uva.es/problemset/