Forming Groups for Collaborative Learning in
Introductory Computer Programming Courses Based
on Students’ Programming Styles:
An Empirical Study
Eustáquio São José de Faria1, Juan Manuel Adán-Coello2, Keiji Yamanaka3
1 Instituto de Informática, Pontifícia Universidade Católica de Minas Gerais – Arcos, Brasil, eustaquio@pucminas.br
2 Faculdade de Engenharia de Computação, Pontifícia Universidade Católica de Campinas, juan@puc-campinas.edu.br
3 Faculdade de Engenharia Elétrica, Universidade Federal de Uberlândia, Brasil, keiji@ufu.br
Abstract – This paper describes and evaluates an approach
for constructing groups for collaborative learning of
computer programming. Groups are formed based on
students' programming styles. The style of a program is
characterized by simple well known metrics, including
length of identifiers, size and number of modules
(functions/procedures),
and numbers of indented,
commented and blank lines. A tool was implemented to
automatically assess the style of programs submitted by
students. For evaluating the tool and approach used to
construct groups, some experiments were conducted
involving Information Systems students enrolled in a
course on algorithms and data structures. The experiments
showed that collaborative learning was very effective for
improving the programming style of students, particularly
for students that worked in heterogeneous groups (formed
by students with different levels of knowledge of
programming style).
Index Terms – Collaborative Learning; Group Formation;
Computer Programming; Programming Style.
INTRODUCTION
Learning computer programming is one of the first and most
challenging tasks encountered by computing students. These
difficulties are reflected in high failure rates and in poor
performance in courses that depend directly on the ability to
program, to reason logically, and to solve problems.
In part, this results from the difficulty instructors have in
effectively guiding students during their programming lab
activities, due to the large number of students per class.
The literature indicates that collaborative work involving
groups of students can help solve this problem, provided that
adequate mechanisms are available to construct and mediate
group work. When this is done, students have been observed to
improve in performance, critical thinking, and cooperative
behavior [1] [2].
Collaborative work is based on the assumption that two or
more individuals working together can reach a state of
equilibrium in which ideas are exchanged and distributed
among group members, generating new ideas and knowledge
as a result [3] [4].
Some research projects have focused on the use of
collaboration for learning abilities related to Computer
Science. A good example of this type of work is given by
COLER, an environment for collaborative learning of
entity-relationship modeling [5]. In COLER, an
entity-relationship modeling problem is presented to the
students, who construct individual solutions in a private
workspace. Then the students
use a shared workspace to collaboratively construct a new
entity-relationship diagram. The design of COLER was based
on socio-cognitive conflict and cognitive dissonance theories
[6] [7]. COLER’s main functions are to detect learning
opportunities and to coach collaboration. COLER recognizes
learning opportunities by identifying a number of relevant
syntactic dissimilarities between individual and group
diagrams.
Doise and Mugny [6] understand socio-cognitive conflict
as a social interaction induced by a confrontation of divergent
solutions from the participating subjects. They highlight the
importance of socio-cognitive conflict in the process of
learning and, drawing on the Piagetian theory of equilibration
[4], argue that through such interaction individuals can reach
higher states of knowledge.
Based on the above ideas, a collaborative strategy for
learning computer programming can benefit from the
construction of groups where students can learn from
controversy when discussing their solutions for a given
problem. One way of constructing such groups is to analyze
students' programs to find characteristics that evidence
significant differences in some respect, for instance program
quality. The differences should be relevant enough to motivate
the students to discuss them.
SOFTWARE QUALITY METRICS
Besides meeting its functional specification, a program has
several other attributes that reflect its quality. These attributes
are not directly related to what the program does. Instead, they
reflect the structure and organization of the source code and
the related documentation and are usually considered
important to discern good from bad programs and
programmers [8].
The attributes that reflect the quality of a program are
known empirically by expert programmers but are difficult to
teach to novices, because they demand considerable abstraction
and experience and because many of them are not well founded.
Several metrics have been proposed to measure the quality
of programs, and some tools have been developed to
automatically assess it [9] [10] [11] [12] [13] [14] [15]. Most
of the available tools generate a report that is presented to the
user, and some return a single numeric value that represents
the overall quality of the evaluated program [9] [10].
Most of the metrics proposed to measure the quality of
programs are related to the style used in their construction.
Programming style is scarcely discussed in programming
courses and books, and its importance for producing good
programs is usually not adequately emphasized. Yet, contrary
to common intuition, source code is written primarily to be
read by people, not to be processed by compilers: it is read,
reviewed, and modified by different people at different times
for different tasks, but it will be processed by a compiler only
a few times [16].
In this research, the quality of a program is assessed using
some of the most common style metrics in the literature. They
measure the student's capacity to write well-organized code:
1. Identifier length;
2. Percentage of identifiers defined as constants (define /
const);
3. Module length (functions and procedures);
4. Number of modules;
5. Percentage of indented lines;
6. Percentage of comment lines;
7. Percentage of blank lines.
The importance of each of these attributes can be justified
as follows:
• Identifier length: the number of different identifiers used
in a single module can reach several hundred. Considering
all the modules of a medium-size program, several
thousand different elements can be in use, each with its
own name. Memorizing the meaning of all these names is
very hard. It is therefore desirable that, when reading the
source code, the precise meaning, declaration point, and
computational properties of each name can be determined
without consulting complementary documentation, or
even the variable and function declarations. Very short
identifiers usually do not convey their meaning
adequately.
• Percentage of identifiers defined as constants: embedding
constants as literal values directly in the source code (for
example, 3.14 or "parameter missing") complicates
program maintenance. When performing any modification
that affects one of these constants, it is necessary to find
all the places where it is used. Moreover, constants with
different meanings can have the same value, making it
difficult to modify the program correctly.
• Module length and number of modules: problem
decomposition is a key factor in complexity reduction.
Complexity is synonymous with variety, that is, the
number of different situations the problem can present.
Thus, when decomposing a problem into subproblems, the
programmer invariably also divides its complexity and, as
a consequence, simplifies the solution.
• Percentage of indented lines: indentation is the technique
of shifting code text to the right or to the left whenever
there is a change in scope/block. Indentation improves
code readability, helping to make the logical structure of a
program evident.
• Percentage of comment lines: comments constitute the
main component of the internal documentation of a
program. They help the reader understand the objective of
each code unit, and they help explain the logic of difficult
sections. Novice programmers are rarely instructed on
how to write good comments, although writing good
comments is perhaps as important and as difficult as
writing good programs. Good comments cannot redeem
bad code, but bad comments can seriously degrade good
code.
• Percentage of blank lines: blank lines help the
programmer perceive the borders between blocks. They
influence the readability of a program much as they do in
natural-language texts.
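For illustration, the short C fragment below was constructed
for this discussion (it is not drawn from the students'
submissions); it exhibits the attributes just listed: a named
constant, a short single-purpose module with descriptive
identifiers, indentation that follows block nesting, comments,
and blank lines separating logical units.

    #include <string.h>

    #define MAX_GUESTS 100   /* named constant instead of a literal value */

    /* Returns how many of the first guest_count guests come from the given
       city. A short, single-purpose module with descriptive identifiers. */
    int count_guests_from_city(const char *cities[], int guest_count,
                               const char *city)
    {
        int matches = 0;

        for (int i = 0; i < guest_count && i < MAX_GUESTS; i++) {
            if (strcmp(cities[i], city) == 0) {  /* indentation mirrors nesting */
                matches++;
            }
        }

        return matches;
    }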
A TOOL FOR PROGRAM QUALITY ASSESSMENT (PQA-C)
PQA-C is a tool implemented to assess the quality of student
programs written in the C programming language, according
to the set of style metrics presented above.
PQA-C computes a score for each style metric, using a
method similar to Berry and Meekings' scoring method [9],
illustrated in Figure 1. In the graph, L is the point below which
no score is obtained; S is the start of the ideal range for the
metric; F is the end of the ideal range; and H is the point above
which no score is obtained.
If vi is the value found for metric i (the metric value), the
score attributed to this metric, si, is computed as follows:

si = 100, if S ≤ vi ≤ F
si = ((vi − L) / (S − L)) × 100, if L ≤ vi < S
si = ((H − vi) / (H − F)) × 100, if F < vi ≤ H
si = 0, if vi < L or vi > H

FIGURE 1
METRIC VALUE CALCULATION
[Score as a function of the metric value: 0 below L, rising
linearly from L to S, 100 between S and F, falling linearly
from F to H, and 0 above H.]

Using the scores for each attribute, a global score (GS)
representing the overall quality of the program is computed:

GS = Σi si
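A minimal sketch of this scoring scheme in C is shown below.
The parameter names follow Figure 1, but the function names
and the example thresholds are illustrative assumptions, not
PQA-C's actual code or ideal ranges (which the tool
deliberately hides from students).

    #include <stdio.h>

    /* Trapezoidal score for one metric: 0 outside [L, H], 100 inside the
       ideal range [S, F], and linear ramps in between (see Figure 1). */
    double metric_score(double v, double L, double S, double F, double H)
    {
        if (v < L || v > H)
            return 0.0;                        /* si = 0 */
        if (v >= S && v <= F)
            return 100.0;                      /* si = 100 */
        if (v < S)
            return (v - L) / (S - L) * 100.0;  /* rising edge, L <= v < S */
        return (H - v) / (H - F) * 100.0;      /* falling edge, F < v <= H */
    }

    /* Global score: the sum of the seven per-metric scores (maximum 700). */
    double global_score(const double values[7],
                        const double L[7], const double S[7],
                        const double F[7], const double H[7])
    {
        double gs = 0.0;
        for (int i = 0; i < 7; i++)
            gs += metric_score(values[i], L[i], S[i], F[i], H[i]);
        return gs;
    }

    int main(void)
    {
        /* Hypothetical range for "percentage of comment lines": ideal 10-25%. */
        printf("%.0f\n", metric_score(18.0, 5.0, 10.0, 25.0, 40.0)); /* 100 */
        printf("%.0f\n", metric_score(7.0, 5.0, 10.0, 25.0, 40.0));  /* 40  */
        return 0;
    }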
Besides computing the scores for the 7 quality metrics
and the GS, PQA-C generates a Report of Anomalies and
Solutions. Based on the score of each metric, this report tells
students the main stylistic deficiencies found in their programs
and gives suggestions on how to remove them, without
revealing the ideal value for each metric.
EXPERIMENTAL EVALUATION
A number of experiments were conducted to evaluate the
proposed group-formation strategy and the supporting PQA-C
tool. The experiments involved students majoring in
Information Systems at PUC-Minas (Pontifícia Universidade
Católica de Minas Gerais, Brasil), campus Arcos, enrolled in
an Algorithms and Data Structures course.
I. Forming groups
The experiments involved 32 students: 24 distributed into 6
groups of 4 students each and 8 working individually. Groups
had 4 students because groups with many members tend to
have difficulty organizing their work and sharing information,
while groups that are too small may not provide a sufficiently
rich environment for idea generation and discussion.
The experiments measured and compared the learning
gain of students working in homogeneous and heterogeneous
groups and individually.
The students had to accomplish three programming tasks.
All tasks were first developed by each student individually and
then redone by the groups. All programs produced by the
individuals and by the groups were evaluated by the PQA-C
tool. Students working individually developed the same
program twice, the second time after having the first version
evaluated by PQA-C.
Groups were formed using the scores computed for the
first program developed individually. The maximum global
score (GS) for a program is 700. A GS lower than 210 (30%
of the maximum GS) was considered low; a GS between 210
and 400 (60% of the maximum GS) was considered medium;
a GS greater than 400 was considered high.
Of the 32 students who performed the first programming
task, 8 got a high GS, 8 a medium GS, and 16 a low GS. Using
these scores, students were grouped as follows (a sketch of one
way to automate this assignment appears after the list):
• Groups 1 and 2 (heterogeneous): 2 students with high GS
and 2 with low GS each;
• Group 3 (homogeneous-high): 4 students with high GS;
• Group 4 (homogeneous-low): 4 students with low GS;
• Groups 5 and 6 (homogeneous-medium): 4 students with
medium GS each.
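The sketch below, in C, shows one way the banding thresholds
above and the heterogeneous pairing could be automated; the
function names and the group-assembly strategy are
illustrative assumptions, not the authors' actual procedure.

    /* Band a global score using the thresholds given in the text:
       low < 210, medium 210-400, high > 400 (the maximum GS is 700). */
    typedef enum { GS_LOW, GS_MEDIUM, GS_HIGH } Band;

    Band band_of(int gs)
    {
        if (gs < 210)
            return GS_LOW;
        if (gs <= 400)
            return GS_MEDIUM;
        return GS_HIGH;
    }

    /* Assemble one heterogeneous group of four from caller-prepared lists
       of high-GS and low-GS student ids: two students from each band. */
    void form_heterogeneous_group(const int high_ids[], const int low_ids[],
                                  int group[4])
    {
        group[0] = high_ids[0];
        group[1] = high_ids[1];
        group[2] = low_ids[0];
        group[3] = low_ids[1];
    }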
In addition to the above groups, 8 students with low GS
worked individually on all three tasks, in order to allow
comparing the effects of collaborative versus individual work
on the learning task considered. No student with an initial
medium or high GS worked individually, for two reasons: (1)
the number of students with these scores was relatively low,
and (2) there was particular interest in focusing the study on
the effects of collaborative work on weaker students. Future
experiments should evaluate the effects of collaborative versus
individual work on students with medium and high GS as well.
Groups worked together during lab classes. Students
always had their programs and the corresponding reports
generated by PQA-C at hand. The members of each group
were asked to exchange their programs and reports and to try
to find, in their colleagues' programs, the anomalies mentioned
in the respective reports. Each student then discussed with
their group mates the likely sources of the anomalies in the
examined program and how to improve it.
In each group, a randomly chosen leader, previously
briefed by the instructor, mediated the collaboration among
group members and led the construction of a new version of
the program. The new program was submitted to PQA-C, and
a new report was generated and presented to the students. They
were asked to compare and contrast the programs they had
produced individually with the program produced by the
group, along with the respective reports, and to comment on
the salient differences between their programs and the group's
program.
In parallel, the students working individually also
developed two versions of the same program. The second
version was written after the students read the report generated
by PQA-C for the first version.
The same cycle was repeated for tasks 2 and 3. All
programs produced, and the respective reports, were stored for
later analysis by the instructor.
II. The Programming Tasks
Programming task 1 required the development of a program to
read data for a telephone directory with a constant number of
entries. After initializing the phone list entries, a menu had to
be presented to the user with the following options: (1) search
phone number by subscriber name; (2) search subscriber by
phone number; (3) exit.
Programming task 2 required constructing and maintaining a
list of N (a given constant) students enrolled in a course. The
list had to be kept ordered using the bubble sort algorithm. A
menu had to be presented to the user with the following
options: (1) insert a student; (2) sort the list by name; (3) sort
the list by student grade; (4) display all approved students; (5)
display all students; (6) exit.
Programming task 3 consisted of writing a program to
control a dynamic ordered list of guests for a party. The user
had to be guided by a menu with the following options: (1)
insert a guest in the list; (2) remove a guest from the list; (3)
display the number of guests; (4) display the names of all
guests; (5) display the names of all guests from a given city.

TABLE I
GLOBAL SCORES FOR INDIVIDUALS (IGS) AND GROUPS (GGS)
[For each group (1 and 2 heterogeneous; 3 homogeneous-high;
4 homogeneous-low; 5 and 6 homogeneous-medium) and each of its
members, the table lists the individual scores IGS1, IGS2, IGS3
and the group scores GGS1, GGS2, GGS3 for tasks 1-3, together
with the ratios IGS3/IGS1 and GGS3/GGS1. The individual cell
values could not be reliably recovered from the source.]
III. Results
Table I shows the GSs for the programs produced by the
members of the groups and by the groups themselves. Table II
presents the results for the students who worked individually.
Comparing the scores of students 1, 17, 5, and 20 from the
heterogeneous groups, who achieved an initial high GS, with
those of the members of the homogeneous-high group, one can
see that the scores of the heterogeneous-group students had, on
average, a higher increase. This suggests that stronger students
also benefit from collaborating with weaker students: the
teaching activity gives new perspectives on the object of study,
consolidating previously acquired knowledge.
Table I also shows some interesting data concerning the
evolution of the scores of students with an initial low GS. The
students with an initial low GS who worked in heterogeneous
groups (students 4, 9, 25, and 26) increased their individual
scores by more than 574% on average; the scores of students
from the homogeneous-low group had an average increase of
216%; finally, the 4 highest increases among the students who
worked individually averaged only 51% (students 12, 14, 27,
and 28). This is consistent with the literature [17] [18]:
heterogeneous groups are usually more effective for individual
learning because heterogeneity naturally produces controversy
more frequently. Nevertheless, the homogeneous-low group
(group 4) also presented a good level of individual evolution.
As expected, programs developed by groups scored
higher than programs developed by their members. This can
be credited to the collaboration inside the groups, but also to
the fact that group members had already developed a version
of the programs and had it evaluated by PQA-C.
Comparing the GSs for tasks 1 and 3, it is evident that the
programs of the heterogeneous groups accounted for an
increase well above what was verified for the remaining
groups. Table I shows that the increase for the heterogeneous
groups (groups 1 and 2) was 13% and 22%, respectively, while
for the homogeneous-high group (group 3) the increase was
only 4%. In the remaining groups, the GS decreased. This was
an unexpected result, because the GSs for the programs of
those groups' members had a positive increase and, in most
cases, the GSs for the programs written by the groups were
noticeably higher than the GSs of their members.

TABLE II
GLOBAL SCORES FOR STUDENTS WORKING INDIVIDUALLY (IGS)
(two score columns per task, corresponding to the first and
second versions described above; a negative ratio indicates a
decrease)

Student | Task 1    | Task 2    | Task 3    | IGS3 / IGS1
   2    | 203 | 175 | 175 | 233 | 200 | 225 | -1,01
   8    | 150 |  75 | 175 | 275 | 167 | 225 |  1,11
  12    | 100 | 150 | 133 | 100 | 150 | 150 |  1,50
  21    | 167 | 200 | 133 | 167 | 125 | 175 | -1,25
  22    | 100 | 133 | 125 | 233 | 133 | 133 |  1,33
  24    | 100 | 142 | 175 | 225 | 150 | 233 |  1,50
  27    | 100 | 150 | 133 | 250 | 167 | 267 |  1,67
  28    | 200 | 200 | 225 | 100 | 275 | 367 |  1,38
Average IGS3 / IGS1: 1,28

Although more experiments have to be conducted, the
reported experiments indicate that collaboration has a very
positive impact on learning how to program, at least in
improving students' programming style.
The experiments also verified that the mediator is
fundamental. When the mediator was not present,
collaboration did not occur or was very precarious. In
heterogeneous groups, in the absence of a mediator, high GS
students tried to impose their positions on low GS students
without any discussion. In the homogeneous-high group, the
absence of a coordinator led to generalized chaos, with each
member believing that his or her position should prevail. In the
two homogeneous-medium groups, collaboration occurred
naturally in the absence of the mediator. Finally, in the
homogeneous-low group, when the mediator was absent,
students became apathetic, without even knowing how to start
the discussion.
CONCLUSION
The experiments make it evident that collaborative work
resulted in more understandable programs. Besides the
observed increase in the global scores (GS) of programs at
each activity, a manual inspection of the produced programs,
performed by the instructor at the end of each experiment,
showed that the programs were indeed progressively closer to
the expected readability standards.
The highest increases in GS were observed among the
students who had an initial low GS and worked in groups,
especially in heterogeneous groups. On the other hand, while
heterogeneous groups accounted for the highest increases in
GS, they were the most resistant to collaboration at the
beginning of the experiments. It was also observed in these
groups that students with low GS tended to be submissive
towards high GS students. This submissiveness was only
overcome when the instructor joined the groups and stimulated
the participation of all group members. By contrast, students
were more collaborative in homogeneous groups, and conflicts
in those groups were usually more balanced.
The experiments also verified that students with an initial
low GS had much higher increases in GS when working in
groups than when working individually.
FUTURE WORK

There are three main directions foreseen for continuing this
work. The first is the development of new experiments to help
confirm and refine the results described above; the second is
the extension of the PQA tool; and the third is the construction
of a mediator agent.
The PQA tool can be extended in a number of ways. For
example, it currently supports only the C programming
language; it would be very helpful if it could also analyze
programs written in other languages used to teach
programming to novices, such as Java. The metrics used to
assess program quality and the method employed for scoring
and comparing programs can also be improved. For example,
PQA-C does not take into consideration the semantics of the
identifiers used in a program; it only verifies their lengths.
Although longer identifiers are usually more expressive than
short ones, strictly speaking there is no direct relation between
identifier length and meaning. Ontologies could be used here
to verify whether a given identifier is meaningful in the
context of a given program domain.
Currently, all metrics have the same weight when
computing a program's GS. However, expert programmers
know that not all style metrics contribute equally to producing
readable, well-styled programs, though the relative importance
of each is not evident.
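A weighted variant of the GS computation sketched earlier
illustrates the change; the weights here are hypothetical
placeholders, not values proposed by this study.

    /* Weighted global score: each per-metric score is scaled by a weight.
       If the weights sum to 7, the 0-700 range of the original GS is kept. */
    double weighted_global_score(const double scores[7], const double weights[7])
    {
        double gs = 0.0;
        for (int i = 0; i < 7; i++)
            gs += weights[i] * scores[i];
        return gs;
    }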
The experiments have also shown that the effectiveness of
group work in many cases depends heavily on the mediation of
the members' interactions. Because it is difficult for a single
instructor to adequately mediate the work of a large number of
groups, a computer-supported environment for collaborative
learning would be most helpful. This environment should
integrate tools for group formation, group interaction, and
group mediation.
REFERENCES
[1] Suthers, D. D., "Computer Aided Education and Training Initiative", Technical Report, Learning Research and Development Center, University of Pittsburgh, 1998.
[2] Gokhale, A. A., "Collaborative Learning Enhances Critical Thinking", Journal of Technology Education, Vol. 7, No. 1, 1995.
[3] Lévy, P., "Collective Intelligence: Mankind's Emerging World in Cyberspace", Perseus Books, January 2000.
[4] Piaget, J., "The Development of Thought: Equilibration of Cognitive Structures", New York: Viking Penguin, 1977.
[5] Constantino-Gonzalez, M. A., Suthers, D. D., and de los Santos, J. G. E., "Coaching Web-based Collaborative Learning based on Problem Solution Differences and Participation", International Journal of Artificial Intelligence in Education, Vol. 13, No. 2-4, 2003, pp. 263-299.
[6] Doise, W., and Mugny, G., "The Social Development of the Intellect", Oxford: Pergamon Press, 1984.
[7] Festinger, L., "A Theory of Cognitive Dissonance", Stanford University Press, 1957.
[8] Sommerville, I., "Software Engineering", 7th Ed., Addison Wesley, 2004.
[9] Berry, R. E., and Meekings, B. A. E., "A Style Analysis of C Programs", Communications of the ACM, Vol. 28, No. 1, January 1985, pp. 80-88.
[10] Hung, S., Kwok, L., and Chan, R., "Automatic Program Assessment", Computers and Education, Vol. 20, No. 2, 1993, pp. 183-190.
[11] Schorsch, T., "CAP: An Automated Self-Assessment Tool to Check Pascal Programs for Syntax, Logic and Style Errors", SIGCSE '95, Nashville, TN, USA, March 1995.
[12] Mengel, S. A., and Yerramilli, V., "A Case Study of the Static Analysis of the Quality of Novice Student Programs", SIGCSE '99, New Orleans, LA, USA, March 1999.
[13] Xenos, M., Stavrinoudis, D., Zikoulli, K., and Christodoulakis, D., "Object-Oriented Metrics: A Survey", Proc. of FESMA (Federation of European Software Measurement Associations), Madrid, Spain, 2000.
[14] Jackson, D., "A Semi-Automated Approach to Online Assessment", ITiCSE 2000, Helsinki, Finland, July 2000.
[15] Purao, S., and Vaishnavi, V., "Product Metrics for Object-Oriented Systems", ACM Computing Surveys, Vol. 35, No. 2, June 2003, pp. 191-221.
[16] Oman, P. W., and Cook, C. R., "A Programming Style Taxonomy", Journal of Systems and Software, Vol. 15, No. 3, July 1991.
[17] Ellis, A. P. J., Hollenbeck, J. R., Ilgen, D. R., Porter, C. O. L. H., and West, B. J., "Team Learning: Collectively Connecting the Dots", Journal of Applied Psychology, Vol. 88, No. 5, 2003, pp. 821-835.
[18] Maier, N., "Problem Solving and Creativity in Individuals and Groups", Brooks/Cole, Belmont, CA, 1970.