An Evaluation of TDD Training Methods in a Programming Curriculum

Li-Ren Chien1,2, Daniel J. Buehrer1, Chin-Yi Yang2 and Chyong-Mei Chen3
1. Department of Computer Science and Engineering, National Chung Cheng University, 168 University Road, Min-Hsiung, Chia-Yi, Taiwan, R.O.C.
2. Hsing Kuo High School
3. Department of Applied Mathematics, Providence University
clj@cs.ccu.edu.tw

Abstract

This paper evaluates an innovative training method which is based on TDD (Test Driven Development) [4] and implemented in an automatic online judge system named DICE [9]. After running the automatic grading system DICE at Hsing Kuo High School in Taiwan for years, we found that some students were left behind by the DICE system, and we needed a more sophisticated mechanism to assist these underachievers. Our solution was to utilize TDD as an extension of the DICE system to improve learning performance in programming. We implemented DICE with TDD and applied this innovative training method in the programming curriculum at Hsing Kuo High School for one semester. Simultaneously, we conducted an experiment with a control group and an experimental group to estimate the effectiveness of DICE with TDD. Our finding is that DICE with TDD improves the mean scores of learners by 50.88% over the control group.

1 Introduction

Studies of programming can generally be divided into two categories: those with a software engineering perspective, and those with a psychological and educational perspective [14]. We first utilized software engineering technology to establish an automatic grading system, DICE, to lessen the assessment workload of instructors. DICE has been used in Hsing Kuo High School's computer programming courses for three years. Over 2,400 students have used the system, and they have been shown to acquire programming skills more quickly than students who were taught by traditional training methods in the past.
We ran into some well-known problems of a test-based grader, which caused the underachievers to be eliminated from DICE, so we needed a more sophisticated testing mechanism for them. TDD is a code development strategy in which one always writes a test case before adding new code [1]. The benefits of TDD are to help build software better and faster and to give the programmer a greater degree of confidence in the correctness of his code [10]. TDD seems so attractive that computing and information technology educators have begun to call for the introduction of TDD into the curriculum [6]. We adopted the TDD concept from a pedagogical perspective to implement DICE with TDD. After implementing DICE with TDD, we began designing training material from the ACM UVA online judge problems [18]. We applied DICE with TDD in the computer programming curriculum for a semester. At the same time, we used a posttest-only control-group design to estimate the effectiveness of DICE with TDD. We found that DICE with TDD benefits learners in programming more than the traditional method does: according to a regression analysis, the mean score of the TDD with DICE training group was 50.88% above that of the training group without TDD.

2 TDD in Education

TDD is a code development strategy in which one always writes a test case before adding new code [1]. The benefits of TDD are to help build software better and faster and to give the programmer a greater degree of confidence in the correctness of his code [4]. TDD is so attractive that computing and information technology educators have begun to call for the introduction of TDD into the curriculum [6].

2.1 Test Driven Development

TDD (Test Driven Development) is a code development strategy that has been popularized by extreme programming [1].
TDD is an evolutionary approach to development which combines test-first development, in which one writes a test before writing just enough production code to fulfill that test, with refactoring. In TDD, one always writes a test case before adding new code. The following sequence of a TDD cycle, shown in Figure 2-1, is based on Beck's formulation [1]. The first step is to quickly add a test, basically just enough code to fail. Next you run your tests, often the complete test suite, although for the sake of speed you may decide to run only a subset, to ensure that the new test does in fact fail. You then update your functional code to make it pass the new test. The fourth step is to run your tests again. If they fail, you need to update your functional code and retest. Once the tests pass, the next step is to start over. You may first need to refactor any duplication out of your design as needed [1].

The biggest benefit of TDD is to help build software better and faster. It offers more than just simple validation of correctness; it can also drive the design of a program. By focusing on the test cases first, one must imagine how the functionality will be used by clients (in this case, the test cases). Therefore, the programmer is concerned only with the interface, and not the implementation. This benefit is complementary to “design by contract”, as it approaches code through test cases rather than through mathematical assertions or preconditions [1].

What is the primary goal of TDD? One view is that the goal of TDD is specification, not validation [13]. In other words, it is one way to think through your design before writing the functional code. Another view is that TDD is a programming technique; as Ron Jeffries argues, the goal of TDD is to write clean code that works [5]. However, TDD has a limitation: it is difficult to use in situations where full functional tests are required to determine success or failure.
Examples of these are GUIs (graphical user interfaces), programs working with relational databases, and programs that depend on specific network configurations. Management support is also essential: without the entire organization believing that TDD is going to improve the product, management will feel that time spent writing tests is wasted [17].

Figure 2-1: Test Driven Development

2.2 Cases of TDD in Learning

As TDD seems attractive, the idea of using TDD in the classroom is not revolutionary. Computing and information technology educators have begun to call for the introduction of TDD into the curriculum [6]. Over the past five years, the idea of including software testing practices in programming assignments within the undergraduate computer science curriculum has grown from a fringe practice to a recurring theme [4]. Some researchers may argue that starting too early with a test-first approach can lead to the “paralysis of analysis” [3]. TDD has gone to school since 2001; Table 2-1 reviews TDD studies applied in learning.

TDD provides benefits that learners experience for themselves. It is applicable to small projects with minimal training. It gives the programmer a great degree of confidence in the correctness of his code [11]. It is easier for learners to understand and relate to than traditional testing approaches. It promotes incremental development, the concept of always having a “running version” of the program at hand, and early detection of errors introduced by coding changes [4]. Finally, it encourages students to test features and code as they are implemented.

Table 2-1: Previous Studies of TDD in Learning [6]

Study           Dependent Variables           Results
Edwards, 2003   Software quality/reliability  TDD significantly higher
                Programmer confidence         TDD significantly higher
Kaufmann, 2003  Preference of TDD             Significantly higher after TDD
                Software quality/reliability  TDD significantly higher
                Programmer productivity       TDD significantly higher
                Programmer confidence         TDD significantly higher
Müller, 2002    Software quality/reliability  No significant difference
                Programmer productivity       No significant difference
                Program understanding         TDD significantly higher

3 DICE TDD Model

Since 2005, we have established the DICE system as a test-based assignment, tutoring, and problem-solving environment [9]. All training work, including assigned practice, submission, and assessment, can be run on the DICE system. DICE has been in use at Hsing Kuo High School in Taiwan, R.O.C. for over two years. A running DICE system is shown in Figure 3-1.

In the third stage, we administered Kolb's [7][8] learning style instrument as a test of individual differences. We found the best fit between learning styles and training methods that would result in satisfactory learning outcomes, and showed that different learners needed different training methods in the DICE system [11].

Figure 3-1: A running DICE system

After running DICE for years, we found some well-known problems of a test-based grader, which caused the underachievers to be eliminated from the DICE system. One problem was that only clearly defined questions with a completely specified interface could be used. This led students to focus on output correctness first and foremost, and it did not encourage or reward good performance in testing [16]. Another perceived shortcoming was that its inflexibility prevented the assessment of more complex questions [2]. When a complex question arrived, some underachievers just sat in front of their computers and waited for the bell to ring. So we needed a more sophisticated mechanism to help underachievers.
In the second stage, we referred to training method criteria and TDD concepts to establish a new training model for learning programming, named the DICE TDD Model [10]. It provides sixteen kinds of training methods for learners. At that time, a typed mind map model was also developed as a knowledge representation for DICE [12].

Figure 3-2: The TDD Model in DICE

4 Evaluation

DICE with TDD has been in place since March 2008. We introduced DICE with TDD into the 10th-grade computer programming curriculum at Hsing Kuo High School in Taiwan. There were 15 classes, comprising 800 students, taking a course taught by three teachers. We held an experiment to compare the effectiveness of the experimental and control groups. Our experimental group applied the DICE with TDD training methods, whereas the control group applied only DICE, denoted by Non-TDD.

4.1 Research Model

As mentioned, some learners needed to be given more guidance. Consequently, we followed TDD concepts to guide learners in solving problems. An instructor divides a problem into sub-problems and lets students conquer each sub-problem; after they have conquered every sub-problem, the whole problem has been solved. This is more intensive than the Non-TDD approach. According to the literature review by Jones in 2004, organized in Table 2-1, most of the studies report good performance for TDD-based learning. The Teaching Council at Hsing Kuo High School observed that most learners, and especially the slower learners, needed this intensive method when learning programming. Hence, we propose Hypothesis 1.

Hypothesis 1: Participants in the TDD group will score significantly higher on learning performance measures than participants in the Non-TDD group.

Table 4-1 shows the distribution of the samples. Our training material was C-language programming.
We designed the TDD sub-problems from the ACM UVA online judge problems [18] and trained learners with the TDD and Non-TDD methods for 40 days. Then we held an examination to measure learning performance, with scores from 0 to 100.

Table 4-1: Distribution of Samples

Item             Category  Frequency  Percent
Training Method  TDD       167        50.6%
                 Non-TDD   163        49.4%

Figure 4-1: Research Model (Training Method → Learning Performance, H1)

4.2 Experimental Design

The experiment proceeded under one teacher. We used random assignment to place 167 participants in the TDD group and 163 in the Non-TDD group, for a total of 330 participants with no missing data.

4.3 Data Analysis

From the box-plot in Figure 4-2 and the descriptive statistics of learning performance, the TDD training method seems to lead learners to better performance than the Non-TDD method. Nevertheless, the TDD group has a much wider range of grades and a larger variance.

Figure 4-2: Box-plot of Learning Performance

Table 4-2: Descriptive Statistics of Learning Performance

Category  Mean   Mid.  Max  S.D.   Range
TDD       21.51  6.4   100  27.14  100
Non-TDD   14.25  1.6   80   19.79  80

We conducted the Kolmogorov-Smirnov test to examine statistical significance. First, we considered the homogeneity and normality of the learning performance distributions. We conducted Bartlett and Levene tests of the homogeneity of variance of the scores between the two groups; both indicated that the two groups' variances differ significantly (p < 0.05). A Kolmogorov-Smirnov normality test (p < 0.05) showed that the learning performance in each training group did not follow a normal distribution. Therefore, we could not analyze the data with t tests (or one-way ANOVA). Based on these checks, we decided to use a two-sample Kolmogorov-Smirnov test to analyze the relationship between the training methods and learning performance.
The empirical CDF (cumulative distribution function) of the scores for each group is estimated and plotted in Figure 4-3. It is apparent that the grades in the TDD group are stochastically larger than those in the Non-TDD group. The Kolmogorov-Smirnov test was conducted to test the null hypothesis that the true distribution function of the TDD grades is not less than the distribution function of the Non-TDD grades, versus the alternative hypothesis that the true distribution function of the TDD grades is less than that of the Non-TDD grades. That is, if we denote F1(x) and F0(x) as the CDFs of the TDD and Non-TDD groups, respectively, the Kolmogorov-Smirnov method tests the hypotheses H0: F1(x) ≥ F0(x) for all x versus H1: F1(x) < F0(x) for some x, where H1 means that the TDD grades are stochastically larger than the Non-TDD grades. The test statistic of 0.155, with a p-value of 0.01897 for a one-sided test, rejects the null hypothesis and shows that the grades of the TDD group are stochastically larger than those of the Non-TDD group.

According to the means of the two training groups, DICE with TDD increases the mean from 14.25 in the Non-TDD group to 21.51, an increment of (21.51 − 14.25)/14.25 ≈ 50.88%. Furthermore, we evaluated the effect of TDD with a simple regression analysis; despite the heteroscedasticity and the violation of the normality assumption, the regression also estimates that TDD training improves scores by about 50.88%.

Figure 4-3: Empirical CDF Plot of Learning Performance

4.4 Results

After conducting the Kolmogorov-Smirnov test, we found a statistically significant improvement in the performance of the TDD group over the Non-TDD group. According to the means of the two training groups and the regression analysis, TDD with DICE improves performance by about 50.88%.

5 Conclusions and Future Work

The objective of this study is to illustrate how TDD can be used to improve learning performance in programming language courses.
There is statistically significant evidence that TDD performs well, improving progress by about 50.88%. From a practical point of view, TDD promotes a climate of discussion between instructors and learners. To use DICE with TDD, instructors need to design a set of sub-problems following the TDD methodology. This is a challenge for instructors: before announcing the assignments, they need to think through how to guide learners to solve problems using TDD. This can enhance the professional ability of instructors. Regarding the learners, we observed that learners in the TDD group relied on discussions with their classmates to solve problems, while Non-TDD group members felt abandoned by DICE; most did not know how to begin and tended to depend on their instructors. From a pedagogical perspective, a TDD scoring mechanism provides positive reinforcement for learners, as they acquire scores after every sub-function is solved, whereas learners in the Non-TDD group do not get scores until the whole problem is solved.

Future research will address the impact of the training method (TDD and Non-TDD) with respect to individual differences. Researchers in instructional psychology have demonstrated that adapting instructional methods and teaching strategies to accommodate key individual differences leads to improved performance [15]. Next we will analyze the relationship between the effects of the TDD and Non-TDD approaches for various learning styles, to unearth the interaction of these two factors on learning performance.

References

1. Beck, K. Test Driven Development: By Example, Addison-Wesley, 2003.
2. Douce, C., Livingstone, D. and Orwell, J. "Automatic Test-Based Assessment of Programming: A Review," ACM Journal of Educational Resources in Computing, vol. 5, no. 3, September 2005, Article 4.
3. Colton, D., Fife, L. and Thompson, A. "A Web-based Automatic Program Grader," Proc. ISECON 2006, v23.
4. Edwards, S. H.
"Teaching Software Testing: Automatic Grading Meets Test-First Coding," in Proceedings of the OOPSLA '03 Conference, poster presentation, 2003, pp. 318-319.
5. Ambler, S. W. Agile Database Techniques: Effective Strategies for the Agile Software Developer, Wiley, 2003.
6. Jones, C. G. "Test-driven Development Goes to School," Journal of Computing Sciences in Colleges, vol. 20, 2004, pp. 220-231.
7. Kolb, D. A. and Fry, R. "Toward an Applied Theory of Experiential Learning," in Theories of Group Process, C. L. Cooper (ed.), John Wiley and Sons, New York, NY, pp. 33-54.
8. Kolb, D. A. The Learning Style Inventory: Technical Manual, McBer and Company, Boston, MA.
9. Chien, L.-R., Buehrer, D. and Yang, C.-Y. "DICE: A Parse-Tree Based On-Line Assessment System for a Programming Language Course," International Conference on Teaching and Learning (iCTL 2007), Putrajaya, Malaysia, 2007.
10. Chien, L.-R., Buehrer, D. and Yang, C.-Y. "Using Test-Driven Development in a Parse-Tree Based On-Line Assessment System," IADIS International Conference e-Learning, Lisbon, Portugal, 2007.
11. Chien, L.-R., Buehrer, D. J. and Yang, C.-Y. "An Adaptive Learning Environment in the DICE System with a TDD Model," International Conference on Interactive Computer Aided Learning (ICL 2007), Villach, Austria, 2007.
12. Chien, L.-R. and Buehrer, D. "Using a Typed Mind Map as Knowledge Representation in a TDD DICE System," 30th International Conference on Information Technology Interfaces (ITI 2008), Cavtat/Dubrovnik, Croatia, 2008.
13. Martin, R. C. and Martin, M. Agile Software Development: Principles, Patterns, and Practices, Prentice Hall PTR, Upper Saddle River, NJ, 2003.
14. Robins, A., Rountree, J. and Rountree, N. "Learning and Teaching Programming: A Review and Discussion," Computer Science Education, vol. 13, no. 2, 2003, pp. 137-172.
15. Snow, R. E. "Individual Differences in the Design of Educational Programs," American Psychologist, vol. 41, no. 10, October 1986, pp. 1029-1039.
16. Edwards, S. H. and Pérez-Quiñones, M. A. "Experiences Using Test-Driven Development with an Automated Grader," Journal of Computing Sciences in Colleges, vol. 22, no. 3, January 2007.
17. Loughran, S. "Working Specification," HP Laboratory, 2006.
18. ACM UVA Online Judge problem set, http://acm.uva.es/problemset/