Working Paper

Implementing the Personal Software Process (PSP) with Undergraduate Students

by M. F. Murphy

Abstract

This paper describes an empirical case study which used techniques and concepts from the Personal Software Process (PSP), developed by Watts Humphrey (1995), to teach software process improvement to undergraduate computing students. The PSP claims to provide individual software developers with a structured and systematic way to improve the quality and predictability of the software they write (Humphrey, 1995). The objectives of this case study were twofold: first, to study the impact that learning an adapted version of the PSP had on the estimating ability, the programming habits, and the quality of work produced by a group of PSP-trained students; second, to compare the software development processes of this group with those of a control group of non-PSP-trained students. A number of hypotheses were tested, dealing with four aspects of the PSP, namely: size estimation, time estimation, time management and software quality management. A post-course survey was also administered to the PSP-trained students. The results of the case study are described and discussed in this paper and recommendations are made for the future practice of the PSP.

1. Emerging Need for Software Process Education

Attention has focused in recent years on the need to provide software process improvement education to industrial software developers. This need arose because of the acute problem of low-quality software produced by the software industry. Methods which support a disciplined and systematic approach to software development and improvement have been introduced into industry, e.g. the Capability Maturity Model (CMM) developed by the Software Engineering Institute (SEI) and the SPICE standard.
However, it has more recently been suggested that the best way to introduce software process improvement into industry is to educate students in efficient, disciplined methods and quality programming practices during the course of their formal computing studies.

2. Personal Software Process

There has been a recent movement to include the Personal Software Process (PSP), developed by Watts Humphrey (1995), as a topic on undergraduate and postgraduate computing courses. The PSP was developed as a means of improving the processes of individual software engineers (Humphrey, 1995). It allows developers to keep track of their personal performance and to estimate their future performance based on these records. It has proved to be an effective methodology for learning a disciplined process for software development and for improving the quality of software produced by an individual.

Learning the PSP requires students to work on ten small programming exercises at the rate of one per week. The students keep precise measurements such as the number of lines of code (new, reused, changed), defects found, phase of defect injection and removal, time spent fixing defects, time spent on each phase of the programming exercise, and the estimated and actual values of program size and development time. A new concept is introduced each week, such as:

- Measuring and Tracking the project
- Software Project Planning
- Methods for estimating Size and Time
- Code Reviews and Defect Prevention Techniques
- Structured Design Methods
- Cyclic Personal Process

PSP data is recorded on PSP logs and forms and PSP summary reports.

3. Research Method

3.1 Subjects

The subjects consisted of 32 undergraduate students who were entering their second year of computing studies. They were assigned alphabetically into two groups of 16 students each: an experimental group, referred to as the PSP group, and a control group, referred to as the Non-PSP group.
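As an illustration of how the records described in Section 2 can feed an estimate, the sketch below derives a productivity figure from completed programs and uses it to turn a size estimate into a time estimate. It is a minimal sketch under stated assumptions, not the forms used in the course: the names ProgramRecord, productivity_loc_per_hour, estimate_minutes and percent_error are hypothetical, productivity is assumed to be total LOC divided by total hours, the error formula is one common convention, and the figures are invented.

```python
from dataclasses import dataclass


@dataclass
class ProgramRecord:
    """One completed exercise: actual size and actual development time."""
    name: str
    actual_loc: int        # lines of code measured on completion
    actual_minutes: float  # total time summed over all phases


def productivity_loc_per_hour(history):
    """Aggregate productivity over completed programs (assumed definition)."""
    total_loc = sum(r.actual_loc for r in history)
    total_minutes = sum(r.actual_minutes for r in history)
    if total_minutes <= 0:
        raise ValueError("history must contain some recorded time")
    return total_loc / (total_minutes / 60.0)


def estimate_minutes(estimated_loc, history):
    """Time estimate = size estimate divided by historical productivity."""
    return 60.0 * estimated_loc / productivity_loc_per_hour(history)


def percent_error(estimate, actual):
    """Signed error relative to the estimate; positive means an under-estimate."""
    return 100.0 * (actual - estimate) / estimate


# Invented figures for illustration only; they are not data from the study.
history = [
    ProgramRecord("Program 1", actual_loc=60, actual_minutes=120),
    ProgramRecord("Program 2", actual_loc=90, actual_minutes=180),
]
planned_minutes = estimate_minutes(100, history)
```

With only one or two records in the history, a single unusual program shifts the productivity figure considerably, so estimates built this way can swing from one exercise to the next.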
3.2 Content of PSP Course

Students were taught an abridged version of the PSP course, which lasted 6 weeks and involved writing 5 programs. The course included the following subset of PSP skills: size and time estimation, how to record program size and program development time, how to record defect data (injection and removal) and how to perform code reviews.

3.3 Evaluation Methods Employed

The impact of PSP training was evaluated from three different perspectives:

- A longitudinal study of the effectiveness of training on the PSP group
- A comparative study between the PSP group and the Non-PSP group
- A post-course survey administered to the PSP group

3.3.1 Longitudinal Study of PSP Group

PSP data collected at the beginning of the course of instruction was compared with PSP data collected during the course and with data collected at the end of the course. Paired one-tailed Student's t-tests were used to determine the accuracy of the estimates made. Data was collected on the following PSP measurements:

- Program size (estimate vs. actual)
- Time to complete program (estimate vs. actual)
- Time/effort spent per program phase (estimate vs. actual)
- Defect data (defects injected/removed per program phase)

3.3.2 Comparative Study of the Non-PSP and PSP Group

The results of a pre-test programming exercise and a post-test programming exercise administered to both the PSP and the Non-PSP group were compared. Two-sample t-tests were used to test the difference between the two groups on a number of the variables recorded.

3.3.3 Post-Course Survey

An anonymous, voluntary, post-course survey was also administered to the PSP-trained students.
The purpose of this survey was:

- To ascertain the students’ opinions of the usefulness of the PSP techniques
- To discover their attitude towards the PSP
- To elicit their evaluation of the quality of the data they recorded in their exercises

3.4 Hypotheses

A number of hypotheses were tested, dealing with four aspects of the PSP, namely: size estimation, time estimation, time management and software quality management. The hypotheses, and their corresponding results, are detailed in Table 4.1.

3.5 Instrumentation

The students logged their primary PSP data onto simplified PSP forms which were available in paper format and also implemented in Excel. The Excel forms had built-in equations and supported data transfer between various cells and forms. The following PSP forms were used in the case study: Time/Size Estimation Log, Time Log, Time Summary Log, LOC Summary Log, Time/Size/Defects Estimation, Defect Recording Log, Time/Size/Defects Summary. Students were provided with a standard code review document, or checklist, for use during code review. The checklist concentrated primarily on checking for syntax errors.

3.6 Data Gathering Approach

The PSP process activities associated with each of the programming exercises are listed in Table 3.1 below.

Program Exercise 1:
- Estimate of time required
- Collect time by phase
- Measure of size on completion (LOC)

Program Exercise 2:
- Estimate of size (LOC)
- Calculate productivity from Program 1
- Estimate of time based on size estimate and productivity
- Collect time by phase
- Measure of size on completion (LOC)

Program Exercise 3:
- Estimate of size (LOC)
- Calculate productivity from Programs 1 and 2
- Estimate of time based on size estimate and productivity
- Collect time by phase
- Measure of size on completion (LOC)
- Collect defect data by phase

Program Exercise 4:
- Estimate of size (LOC)
- Calculate productivity from Programs 1, 2 and 3
- Estimate of time based on size estimate and productivity
- Collect time by phase
- Measure of size on completion (LOC)
- Carry out a Code Review
- Collect defect data by phase

Program Exercise 5:
- Estimate of size (LOC)
- Calculate productivity from Programs 1, 2, 3 and 4
- Estimate of time based on size estimate and productivity
- Collect time by phase
- Measure of size on completion (LOC)
- Carry out a Code Review
- Collect defect data by phase

Table 3.1 Research Data Gathering Requirements (Adapted from Coleman and O’Connor (2000))

4. Results

The main results of the empirical study are summarised in Table 4.1, according to the various hypotheses proposed in the case study.

Size Estimation

Hypothesis 1: The PSP group will become more accurate, over time, at estimating the size measurement requirements of their programs.
Hypothesis 2: The PSP group will become more accurate, over time, at estimating the size measurement requirements of their programs than the Non-PSP group.

Study: PSP Group
- Underestimated size at start
- More accurate at end
- Hypothesis 1 accepted

Study: Non-PSP V PSP Group
- Both groups underestimated at start
- PSP group more accurate than Non-PSP group at end
- Hypothesis 2 accepted

Time Estimation

Hypothesis 3: The PSP group will become more accurate, over time, at estimating the time/effort measurement requirements of their programs.
Hypothesis 4: The PSP group will become more accurate, over time, at estimating the time/effort measurement requirements of their programs than the Non-PSP group.

Study: PSP Group
- Underestimated time at start
- More accurate at end
- Hypothesis 3 accepted

Study: Non-PSP V PSP Group
- Both groups underestimated at start
- PSP group more accurate than Non-PSP group at end
- Hypothesis 4 accepted

Time Distribution

Hypothesis 5: The PSP group will spend longer on the earlier phases of their programs and less time compiling and testing, over the period of the study.
Hypothesis 6: The PSP group will spend longer on the earlier phases of their programs and less time on compiling and testing than the Non-PSP group.

Study: PSP Group
- Bulk of time spent in Compile and Test at start
- More time spent on front-end activities at end
- Decrease in Compile and Test time at end
- Hypothesis 5 accepted

Study: Non-PSP V PSP Group
- No difference between groups at start
- PSP group spent longer on front-end activities than Non-PSP group at end
- Non-PSP group spent longer on Compile and Test at end
- PSP group spent longer on Compile than Non-PSP group but shorter on combined Compile and Test
- Hypothesis 6 accepted

Quality Management

Hypothesis 7: The number of defects detected in the Compile and Test phases in the PSP group’s programs will reduce considerably, over the period of the study, as a result of using code reviews.
Hypothesis 8: The PSP group will detect fewer defects in the Compile and Test phases than the Non-PSP group.

Study: PSP Group
- Defect densities improved (but similar to Non-PSP group)
- Partial success with code reviews
- Bulk of time still spent in Compile and Test phases
- Many defects found in Test phase
- Hypothesis 7 rejected

Study: Non-PSP V PSP Group
- Defect densities similar for both groups at the start
- Similar improvement in defect densities for both groups at end
- Similar proportion of defects removed in Compile
- Smaller proportion of defects found by PSP group in Test phase
- PSP group better at using code reviews
- Non-PSP group spent twice as long in the Test phase
- Both groups reworked code substantially in the Test phase
- Hypothesis 8 accepted

Table 4.1 Summary of Results

5. Discussion

The following sections discuss the findings of the case study and the understanding gained from its empirical aspects.

5.1 Size and Time Estimation

The early size and time estimates made by the PSP group tended to be optimistic, with the majority of the group underestimating the size of their programs and the time to complete them. Later estimates indicated that the group used feedback from previous estimates to inform their subsequent estimates. Some students managed to gain good control over their estimates whilst others displayed considerable variability in their individual performance. In some instances, estimates for individual students oscillated between under-estimates and over-estimates, or vice versa, from one program to the next. These oscillations occurred because students had only a small data set on which to base their estimates. There were more fluctuations in the time data than in the size estimation data. These can be attributed to various factors, ranging from the difficulty of the PSP programming exercises being developed to the data recording activities required for the specific PSP process being implemented.

The findings from the case study suggested that the size and the time estimation skills of the PSP group improved as a result of training. However, no firm conclusions can be drawn about an improvement in size or time estimating accuracy in the group. This is because there is evidence of large estimation errors in the study and considerable variability in individual performance. Furthermore, the number of data points in the case study was extremely limited.

The Non-PSP group under-estimated the size of both Program 1 and Program 5. They also under-estimated the time it would take them to develop the programs.
This data provides strong evidence that students, without direction or guidance, will produce optimistic estimates of the sizes of the programs they are required to develop. The time estimates from the Non-PSP group also provided strong evidence that students, in the absence of historical data on time estimations, will produce optimistic estimates of the length of time it will take them to develop a program.

5.2 Time Distribution

The time distribution data for the Non-PSP group and for the early programs of the PSP group revealed how beginner programmers approach programming. Minimal effort was put into planning or design activities. Students equated coding with design, and spent the bulk of their development time in coding, compiling and testing activities. Neither group was required to submit any design material with their PSP data. In the absence of this requirement, the majority of students did not adopt a disciplined approach to programming.

The need to follow a disciplined approach and to put effort into planning and design activities was re-emphasised during each PSP session. As a consequence of this repeated message, the PSP group spent increasing proportions of their time on these activities and less time compiling and testing.

The time distribution data for Program 5 indicated that the PSP group spent longer (42%) in the compile phase than the Non-PSP group. This was because the PSP group wrote longer, more complete first versions of their program than the Non-PSP group. The Non-PSP group spent more than double the length of time the PSP group spent in the test phase because they initially wrote an abridged version of the program. This required less time to compile than the equivalent, more complete, version of the program written by the PSP group.
The Non-PSP group then developed their programs iteratively, using frequent edit-code-debug cycles in the test phase, to address design requirements omitted from the original version of the program they wrote.

5.3 Defects

The PSP group produced better quality code for Programs 4 and 5 than they did for Program 3. This improvement, however, cannot be attributed to the impact of learning the PSP or to the use of code reviews, as the Non-PSP group produced code of a similar quality for Program 5. It was due to a combination of performing regular programming exercises and increased familiarity with the syntax of the programming language.

An analysis of the phases when defects were injected and removed provided an insight into the quality of code produced by the PSP group. The vast majority of defects were injected in the coding phase and were removed in the compile phase. This pattern provided evidence that the students were not fluent with the syntax of the programming language.

Both groups injected and removed a high number of defects in the test phase. According to the time distribution data, both groups spent a large amount of time in this phase, addressing design deficiencies that had been overlooked when developing the programs. This indicates that both groups still adopted a “hacking” approach to developing software rather than a process approach.

An analysis of the defects collected by the PSP group during code reviews indicates that they used code reviews to remove compile errors rather than to remove logic errors. These compile defects could probably have been found faster by the compiler than by review methods. Before the introduction of code reviews, the PSP group relied exclusively on the compile and test phases for defect removal. It is apparent that they continued to rely on these phases even after the introduction of code reviews, but to a lesser extent.
The results from the two code review sessions in this case study (25% and 40% respectively) were moderately successful. These results are encouraging for providing students with more code reviewing skills and for the continuation of code reviews in the future.

5.4 Student Perspective on the PSP

Initially the students were quite receptive to the concepts and techniques used in the PSP. However, as the number of processes increased, students adopted a more negative attitude. This coincided with the introduction of defect gathering activities.

The voluntary post-course survey administered to the PSP-trained students was completed by 75% of the group. Of the respondents, 78% indicated that they felt they learnt the PSP easily and 80% that it gave them a better understanding of the software development process. The most negative response (100%) concerned the increased workload caused by using the PSP. Students frequently complained about the overhead involved in recording PSP processes manually. The most common complaint was that it distracted them from their principal task of producing a working program. They also complained about the lack of an automatic tool to simplify the task of logging time and error data.

Many students objected to recording and submitting defect data. They considered that it took them longer to record some defects than it did to actually correct them. A large proportion of students experienced difficulty understanding the purpose of statically desk-checking code and performing code reviews. Only half the respondents in the post-course survey acknowledged the usefulness of code reviews. The majority of students (80%) attributed any improvement in the code quality of their programs to an improvement in their programming skills, which was primarily due to increased practice in writing code and not due to using the PSP.

A large proportion of students in the survey acknowledged that they did not see the benefits or relevance of doing size estimations.
More students, however, felt it was worthwhile to collect the time data. As the students had no experience of large projects, they did not see project planning as an essential programming skill. The majority of students thought the time distribution data was very informative and even motivational. Most students considered that collecting defect data was worthwhile, but that there needed to be an easier method of collecting and recording it. They acknowledged that defect collection made them aware of their personal weaknesses and that they could use this information to improve the quality of their code in future.

Respondents to the post-course survey indicated that their size data (100%) was more accurate than either their time or defect data (80%). This was due to the fact that they became engrossed in programming activities and may have overlooked recording PSP data at the time of its occurrence.

The PSP group was asked if they applied the PSP techniques to their programming activities during their non-PSP session. None of them did so. These replies indicate that the students had not yet developed a PSP mind-set. They also indicate that students often do not apply or transfer skills they learn in one area to another area, unless they are explicitly required or told to do so. The students’ negative attitude to the PSP is reflected in the fact that over 80% did not intend to use the PSP in the future.

Though the majority of students stated that they found learning the PSP easy, they found applying it more difficult. The strict waterfall approach of the PSP was difficult to adhere to. A number of students had difficulty analysing and interpreting the outputs from the PSP techniques and applying these results to future estimates. Other students initially had difficulty understanding the distinction between the various phases in the software development cycle.
Some students acknowledged that using the PSP summary forms helped them to gain an understanding of the phases of software development.

5.5 Instructor’s Perspective on the PSP

There were particular challenges for the author associated with teaching and implementing the PSP. The students who participated in this case study were all essentially novice programmers. Thus, the students’ lack of fluency in Java, coupled with the data collection activities imposed by the PSP, meant that the students often felt overwhelmed by both the programming and the PSP tasks. It was also difficult to teach the PSP material as an add-on to the existing syllabus. The manual cross-checking and reviewing of student data for validity and consistency was an onerous administrative task for the author, particularly with increasing levels of PSP activities.

One side-benefit of doing the PSP was that the author was provided with a wealth of data on how students approached problem-solving and the types of defects they injected into their programs. The modifications the PSP group made to their programming practices, as a result of being taught the subset of PSP skills, provided evidence to the instructor that students learn what they are taught. The results from the Non-PSP group also provided evidence that PSP techniques and a disciplined method need to be instilled in students.

6. Conclusions and Recommendations

6.1 Conclusions

The data from the PSP group indicated that their performance improved in the four areas of interest over the course of PSP instruction. However, this data did not provide conclusive proof of the effectiveness of the PSP. Due to the short duration of the case study, there were too few data points on which to base definitive conclusions. Furthermore, the students exhibited wide variability within their own individual performances.

To sum up, the empirical results indicate partial success in realising the objectives of the case study.
They also provide encouraging indicators for future research into the most valuable subset of PSP skills that beginning programmers should be taught.

The most significant finding from the case study, from a teaching perspective, is that students learn what they are taught. If students are not taught planning and estimating activities, they are likely to grossly under-estimate the size of their software products and the time it will take to develop them. If they are not encouraged to use a disciplined method to develop software, they will adopt a trial-and-error approach. If they are not encouraged to spend more time on the front-end activities of the life-cycle, they will spend their time “hacking code” in order to produce a working program. If students’ attention is not drawn to the quantity and nature of the time-consuming defects in their own data, they are not likely to find efficient methods of preventing them.

A side-benefit of the quantitative data collected for the case study was that it provided the author with a valuable insight into how each individual student approached software development. This data can be used to target instruction towards areas of weakness and to provide more focused attention and assistance to individual students. In this instance of the PSP, the data clearly indicates that the students need to develop better conceptualising skills. They also need to master both the syntax and structure of the Java language. Additionally, they need to acquire more effective desk-checking skills, which will enable them to perform more thorough and careful code reviews.

The anonymous survey conducted at the end of the course indicated that several features of the PSP were issues for the students, for example: the requirement to collect data manually, the practice of reviewing code before the first compile, the recording of defect data and the overhead in collecting PSP data.
Section 6.2, Recommendations for the Future Practice of the PSP, addresses the issue of providing automatic tool support for the collection of PSP data. It also looks at adapting aspects of PSP instruction to make it more amenable to students.

In many ways, the qualitative results of this study are consistent with the findings of other studies on implementing the software process with undergraduate students and with novice programmers in particular. Indeed, many of these studies advise against teaching the PSP to novice programmers. The author would concur with some of this advice, on the following grounds: the students in this study had not mastered the syntax and constructs of the programming language, nor had they a sufficiently strong background in programming to appreciate the value of some of the concepts taught.

Nevertheless, the author believes that teaching some type of personal process improvement is worthwhile and that the PSP can be used as a tool to teach good software practices. The benefits of teaching the PSP should outweigh some of the difficulties inherent in the process. Students should therefore acquire a subset of PSP skills before they have developed ingrained poor habits of software development. The questions then are: what subset of PSP skills should students be taught, and when should they be taught them? PSP skills should also be reinforced throughout the students’ computing studies so that students develop a PSP mind-set. The education system will then have addressed the industry’s need for software developers with good skills, in whom quality practices have been instilled.

6.2 Recommendations for the Future Practice of the PSP

Future work by the author with the PSP would include adaptations to the initial instruction given on the PSP and adaptations to the method of data collection. It is hoped that this would improve the students’ acceptance of the PSP and actively engage them in learning the concepts of the PSP.
The remainder of this section looks at specific ways in which both the instruction and the acceptance of the PSP could be improved in the future. These would include the provision of automated tool support for collecting PSP data.

Students would not be introduced to the PSP until they had a sufficiently strong programming background and were fluent with the syntax and constructs of a programming language. In addition to being taught the syntax of a language, they would be taught program design skills which would enable them to conceptualise a solution. Students would also be required to hand up program design material with their PSP data.

In order to convince students of the value of the PSP, students would need to implement the PSP for a longer period than the short period of this case study. In addition, each PSP process should be repeated several times before moving on to the next one. This would reduce the cognitive overload on the students and should also help students to develop a PSP mind-set.

If the case study were repeated by the author, the study would omit the collection of size data and focus on getting the students to acquire good programming habits. Code reviewing skills would also be introduced at an earlier stage. Some further adaptations would be made to the PSP forms to simplify data recording.

References

Humphrey, Watts S., 1995. “A Discipline for Software Engineering”, Addison-Wesley, Reading, MA.

O’Connor, R., Duncan, H. et al., 2001. “Improving the Professional Software Skills in Industry”, Dublin City University, Working Paper Series, CA-0201.