Does Having Had a Course in Logic Help Those Tackling Logic?

Exploring the Relationship between Logic Background and
Performance on the Analytical Section of the GRE
Elizabeth L. Bringsjord, Ph.D.
Assistant Provost, Office of Academic Affairs
State University of New York, Albany NY
Selmer Bringsjord, Ph.D.
Professor of Logic, Cognitive Science, and Computer Science
Associate Chair, Department of Cognitive Science
Director, Minds and Machines Laboratory
Rensselaer Polytechnic Institute (RPI), Troy NY
The test description for the Analytical section of the Graduate Record Examination
(GRE)—a test of general reasoning ability that includes both logical reasoning (LR) and
analytical reasoning (AR) items—states that the section does not assume any formal training in
logic. What is not discussed in that description is the relationship between logic background
and performance—that is, does the section test, at least in part, for the kind of reasoning taught
in symbolic logic courses, or only for a kind of reasoning acquired more generally? Is there
any advantage accruing to examinees formally trained in logic? If so, what sort of advantage:
speed, accuracy, or both? Would such an advantage suggest that the Analytical section of the
GRE is coachable by way of logic training, in any meaningful sense? The purpose of this
study was to explore the relationship between logic background and performance on the
analytical section of the GRE. Specifically, the following questions were investigated: Is
formal training in logic associated with shorter response time on both GRE Analytical item
types? One of them? Neither? Likewise, is formal logic training associated with greater
accuracy? Does test mode—i.e., computerized-adaptive (CAT) versus paper-and-pencil
(P&P)—moderate these effects in any way? If so, how? How do problem-solving strategies
differ, if at all, between those examinees with extensive logic background and those with less
training? Do examinees point to training in logic as important or useful for this section of the
GRE?
Using a unified theory of cognition and an information-processing framework, this study
aimed to dissect the cognitive processes of individuals with varying levels of logic
preparation exposed to two different test environments. Accordingly, the study adopted
Anderson’s (1983, 1989, 1993; Anderson & Lebiere, 1998) theory of cognition, Adaptive
Control of Thought (ACT-R), and focused on the mental activity of examinees taking the
analytical section of the GRE. Predictions based on ACT-R were tested. For example, ACT-R
predicts that extensive background in logic should be associated with shorter response times
and increased accuracy—if that background is relevant to the tasks—because of higher
activation of declarative and procedural memory. According to ACT-R, humans almost always
isolate one goal at a time, and then invoke a production (an if-then rule) for realizing that goal.
Often more than one production can be used to reach a goal. In ACT-R, mediation among
competing goals and productions is handled through a process called "conflict resolution":
each time a goal becomes the current goal, a search for a production is triggered. Most of the
time more than one production will be found, and conflict resolution will begin. One testable
consequence of this cognitive processing is reduced response time for individuals with
extensive background because they: have access to well-rehearsed, relevant productions; can
more easily identify the most efficient procedures; and are less likely to experience prolonged
cognitive conflict that would slow down processing time. The theory further predicts that
additional cognitive demands in the form of, for example, a cognitively demanding test mode
(e.g., CAT), may moderate the relationship between relevant background knowledge and test
performance.
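The mechanism behind these predictions can be stated compactly with the standard ACT-R
activation and retrieval-latency equations (Anderson & Lebiere, 1998); the following is a
simplified rendering, with the symbols taken from that source.

```latex
% Activation of declarative chunk i: base-level activation B_i plus
% activation spreading from the elements j of the current goal,
% weighted by source activations W_j and associative strengths S_ji.
A_i = B_i + \sum_j W_j \, S_{ji}

% Retrieval latency falls off exponentially with activation
% (F is a scaling constant), so well-rehearsed, strongly associated
% knowledge is retrieved faster.
\mathrm{Time}_i = F e^{-A_i}
```

On this reading, relevant training raises the activation quantities for logic-related chunks and
productions, which shortens retrieval time and, summed over the items of a test, predicts shorter
response times for well-prepared examinees.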
This study is part of a larger investigation (E. Bringsjord, 2001a, 2001b, 2000) exploring
individual differences (including differences related to logic background) in examinee
experience, particularly the cognitive experience, of taking the GRE Analytical test in CAT
versus P&P environments. The methodology included both qualitative and quantitative
dimensions. Data were collected using paper-and-pencil instruments, videotaped recordings of
behaviors, observation, and interview. In addition, verbal protocol analysis (Ericsson &
Simon, 1993) was used to elucidate differences in cognitive processes across individuals—
with varying logic background—and test condition. Also included in the discussion are
findings from a relevant study by Rinella, Bringsjord, and Yang (2001) which demonstrated
pretest-posttest differences among undergraduate college students (n = 100) following
instruction in symbolic logic on reasoning tasks purported by Cheng and Holyoak (1985) to be
insensitive to such training.
Theoretical Framework
Anderson’s ACT-R
This study took John Anderson’s (1993; and Anderson & Lebiere, 1998) Adaptive
Control of Thought (ACT-R) architecture to be an accurate model of information processing-based cognition, particularly cognition involved in problem solving. But what is ACT-R?
Why is it appropriate for this study? And what does it imply in connection with the
performance of examinees tackling the Analytical section of the GRE?
ACT-R is intended by Anderson (Anderson & Lebiere, 1998) to fulfill Alan Newell’s
(1973) dream that psychology would eventually yield an information-processing model of
such complexity and comprehensiveness that it would capture all of human cognition. "ACT-R
consists of a theory of the nature of human knowledge, a theory of how this knowledge is
deployed, and a theory of how this knowledge is acquired" (Anderson & Lebiere, 1998).
Another way to look at this is to say that ACT-R (or perhaps a descendant), if meticulously
implemented in a computer program, would give rise to an artificial agent as smart and
flexible as a human person. Obviously, if ACT-R is good enough to model human cognition
completely, it should include the information processing engaged in by examinees taking the
GRE and other such tests.
ACT-R is the result of more than 20 years of refinement of previous architectures. The
sequence of such architectures starts with ACTE (Anderson, 1976), then moves to ACT* (i.e.,
‘ACT Star’, Anderson, 1983), and then to ACT-R version 2.0 (Anderson, 1993), and finally to
ACT-R v. 4.0 (Anderson & Lebiere, 1998), the theory used here. This sequence grew out of
the Human Associative Memory (HAM) theory of human declarative memory
(Anderson & Bower, 1973). HAM was much more limited than even the first member of
Anderson’s ‘ACT’ sequence, ACTE. The fundamental reason is that ACTE and its
descendants postulate much more than declarative memory; for example, they assume that
human cognition is also composed of processing over IF-THEN rules called productions (or
production rules), which are described below. Thus, ACT-R assumes both declarative and
procedural memory.
Figure 1 provides an overview of ACT-R. As that diagram indicates, ACT-R is
composed of four main components: Current Goal, Goal Stack, Declarative Memory, and
Procedural Memory. Current Goal and Goal Stack, together, pretty much amount to what is
often called “working memory” in cognitive psychology: it is the “working scratchpad” that
holds sensory information coming in from the environment, and the result of processing that
information (sometimes in conjunction with information from long-term memory). The Goal
Stack holds the hierarchy of those things the agent intends to reach or make happen. Anderson
and Lebiere (1998) liken the Goal Stack to a stack of trays at a cafeteria; the first one in is the
last one out and the last one in is the first one out.
Figure 1
Architecture of Anderson’s (Anderson & Lebiere, 1998) ACT-R cognitive model
In the case of goals, when one is “popped,” or removed, the next most recent is
retrieved. A goal stack records the hierarchy of intentions, in which the bottom goal is the
most general and the goals above it are subgoals set in its service. (Anderson & Lebiere, 1998,
p. 10)
The ‘Current Goal’, as its label suggests, reflects that which the agent has encoded and
is currently focussed on obtaining. To take as an example one given by Anderson and Lebiere
(1998), suppose a human agent is confronted with an arithmetic problem, say 3 + 4 = ? If the
agent in question has “zeroed in” on this problem, and has for the time being allowed goals
like “Pass my upcoming math test” to be left out of current processing, and hence left that goal
in the goal stack, then Current Goal will be set to “Solve the arithmetic problem: 3 + 4 = ?”
Once the current goal is fixed, productions (i.e., production rules) that relate to the Current Goal
are retrieved and activated. This may strike the reader as a trivial example, but problems on
the analytical section of the GRE—particularly the analytical reasoning items—would seem to
give rise to very similar embodiments of ACT-R in those who try to solve these items.
Now what about ‘Declarative Memory’? This area of the architecture holds factual,
propositional information that remains fairly stable over time, and episodic information as
well. For example, you know that George W. Bush is President of the United States, that New
York City is north of Miami, that 56 multiplied by 2 is 112, that such-and-such happened to
you last holiday season, and that you ate dinner last night after doing such-and-such, which
happened after you did this-and-that. Figure 2 shows a network representation of the fact that
3 + 4 equals 7. If you study this diagram for a minute, you might wonder what the ‘Bi’ and
‘Sji’ strings are there for. ‘Bi’ here is a variable that can hold the overall activation value of the
fact. (As you read this, a fact such as "the GRE includes an analytical section" probably has a
high activation level in your mind, whereas a fact like "Buffalo is west of Boston" probably
has a low level.) ‘Sji’ is also a variable, one whose values indicate the strength of the
association between the concept at the start of the arrows (3, 4, 7) and the fact itself (3 + 4 =
7). (The connection between the number 3 and the fact that 3 + 4 = 7 is probably a fairly
strong one for you, whereas the associative connection between ‘green’ and this fact is perhaps
as low as zero.) Presumably, level of background knowledge can impact these quantities. For
example, if an examinee recognizes the structure of a logical reasoning item as a “classic”
example of a logical fallacy, then relevant rules of inference may acquire higher activation
levels and incidental information relating to (say) the contextual features of the problem
should have lower activation levels.
Figure 2
Example of Declarative Information in ACT-R architecture
One last point about declarative memory: it was said above that this memory is
“propositional” in nature. This is indeed true, despite the rather exotic-looking network in
Figure 2. To realize this, you have only to note that sometimes information in declarative
memory is represented by Anderson and his colleagues in textual rather than network form, as
in Figure 3 below.
Figure 3
Textual representation of an addition fact (from Anderson & Lebiere, 1998, p. 23)
Fact 3+4
    isa        ADDITION-FACT
    addend1    Three
    addend2    Four
    sum        Seven
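As an illustration only (the slot names mirror Figure 3, and all numeric values are invented for
the example rather than taken from Anderson), the following Python sketch represents this
chunk as a small data structure and computes a toy activation value as base-level activation
plus weighted associative strengths, in the spirit of the quantities shown in Figure 2.

```python
# Illustrative sketch only: a declarative chunk with slots (cf. Figure 3)
# and a toy activation computation in the spirit of ACT-R
# (activation = base level + sum of source weights * associative strengths).
# All numbers are invented for the example.

fact_3_plus_4 = {
    "isa": "ADDITION-FACT",
    "addend1": "Three",
    "addend2": "Four",
    "sum": "Seven",
}

base_level = 1.2  # Bi: how well rehearsed the fact itself is

# Sji: strength of association from each source concept to the fact
associative_strength = {"Three": 0.8, "Four": 0.8, "Seven": 0.6}

# Wj: attentional weighting of the sources in the current goal
# (here, the addends of the problem "3 + 4 = ?")
source_weights = {"Three": 0.5, "Four": 0.5}

activation = base_level + sum(
    source_weights.get(concept, 0.0) * strength
    for concept, strength in associative_strength.items()
)

print(f"Activation of the 3 + 4 = 7 chunk: {activation:.2f}")
```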
Now for ‘Procedural Memory’, which is one of the distinctive aspects of ACT-R. This
component is composed of procedural knowledge, which essentially means “knowing how”
(rather than “knowing that,” which is the kind of knowledge in declarative memory).
Procedural memory stands at the very heart of ACT-R because it points toward the basic
building block for the architecture as a whole, viz., a production. A production is a
conditional, or an IF-THEN statement; it says that if some condition C is true then some action
A should be performed. It is very important to realize that the actions involved needn’t be
physical actions; they can be mental in nature. And productions can be chained together.
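To see the pieces described so far working together, here is a deliberately small Python sketch
of one production cycle: a goal stack, IF-THEN productions, and conflict resolution among the
productions that match the current goal. It is a toy illustration under invented assumptions, not
Anderson's implementation.

```python
# Toy production-system sketch of the cycle described above:
# pop the current goal, find productions whose IF-part matches it,
# resolve conflicts by picking the highest-utility match, and fire it.
# Illustration only; not Anderson's implementation.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Production:
    name: str
    condition: Callable[[str], bool]      # IF-part: does this goal match?
    action: Callable[[List[str]], None]   # THEN-part: may push subgoals
    utility: float                        # used for conflict resolution

def solve_addition(goal_stack: List[str]) -> None:
    print("Retrieve 3 + 4 = 7 from declarative memory; answer the item.")

def set_up_diagram(goal_stack: List[str]) -> None:
    goal_stack.append("draw diagram of the conditions")

productions = [
    Production("retrieve-arithmetic-fact",
               lambda g: g.startswith("solve 3 + 4"), solve_addition, 0.9),
    Production("diagram-the-problem",
               lambda g: g.startswith("solve"), set_up_diagram, 0.4),
]

# Last in, first out: the arithmetic subgoal is handled before the larger goal.
goal_stack = ["pass my upcoming math test", "solve 3 + 4 = ?"]

while goal_stack:
    current_goal = goal_stack.pop()              # most recent goal first
    matches = [p for p in productions if p.condition(current_goal)]
    if not matches:
        print(f"No production matches {current_goal!r}; goal popped.")
        continue
    chosen = max(matches, key=lambda p: p.utility)   # conflict resolution
    print(f"Goal {current_goal!r}: firing {chosen.name}")
    chosen.action(goal_stack)
```

In this toy run the well-rehearsed retrieval production wins conflict resolution over the slower
diagramming production, which is the kind of advantage the framework attributes to examinees
with extensive relevant background.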
Cognitive Load and Interference in CAT-Based Analytical Reasoning Items
When certain types of items are rendered in the CAT mode, the examinee is forced to
work in multiple modes that are not present when these items are presented in traditional P&P
form. For example, the analytical reasoning (AR) items on the GRE Analytical subtest
“scream out” for scrap paper in order to carry out diagrammatic reasoning. (Kaplan, Princeton
Review, etc. all prepare students for these items by trying to teach them how to use diagrams
on scrap paper. See also Stenning, Cox, and Oberlander’s 1995 paper, regarding diagrams
created by subjects attempting to solve similar problems.) This means that the moment such
items are rendered in CAT form, test-takers are confronted with the prospect of having to deal
with pencil, paper, keyboard, and computer screen. The increase in cognitive load is patent
from an intuitive standpoint. Presumably the behavior of subjects on the test in question will
provide data sufficient to ascertain whether or not there has in fact been interference and
additional load. In particular, response time should be affected by such conditions.
In summary, ACT-R suggests that extensive logic preparation is likely to be associated
with enhanced performance on the GRE Analytical section, a test of reasoning, and that CAT,
a cognitively demanding test mode, is likely to be associated with reduced performance. The
former effect is attributed to higher levels of activation of relevant declarative and procedural
knowledge, the latter to cognitive interference.
Methodology
Subjects
College undergraduates who had never taken the Graduate Record Exam were
recruited from Rensselaer Polytechnic Institute, Troy, NY. A total of 42 volunteer subjects
participated, but two subjects had to be dropped due to equipment malfunction in one case and
failure to complete the study in the other, leaving a total sample of 40 subjects (n = 20 per
group). Subjects ranged in age from 18 to 21 years with a mean of 19.67 years. All subjects
were full-time students; more than half indicated they were pursuing a dual major and roughly
one third identified a minor. Academic majors among subjects were predominantly computer
science and/or engineering (55%), with 35% of subjects identifying information technology
majors and the remaining 10% either management or mathematics-related majors. The sample
was largely male (80% or n = 32) and predominantly White, non-Hispanic (77.5% or n = 31)
with 17.5% Asian (n = 7) and 5% Black (n = 2). Subjects were randomly assigned to one of
two testing conditions (n = 20 per group). All took the analytical section of the Graduate
Record Examination—one group using the POWERPREP software (Educational Testing
Service, 1997) and the other using a paper and pencil version. At the end of the testing session,
all subjects received a printed score report along with information about score interpretation.
Design
The cognitive experience of examinees—with varying levels of logic background—
exposed to two modes of testing was studied using a mixed method design (Greene, Caracelli,
& Graham, 1989). The quantitative portion of the study included pre and post measures as well
as data collected during the testing session (e.g., response time per item). The qualitative
aspects of the design primarily involved observational data collected during the test, analysis
of artifacts (e.g., scrap paper), and posttest interviews—including use of verbal protocol
analysis for two analytical reasoning items on each subject’s test. All subjects were video
recorded throughout the testing session and interviewed briefly following the testing session.
Thus, qualitative and quantitative data were collected on all subjects. It was anticipated that
the qualitative data would complement the quantitative data, by providing a ‘window’ into the
cognitions of these test-takers. Furthermore, the procedure provided opportunity for validation
of qualitative self-report data with simultaneously collected measures such as mean response
time and accuracy.
This study followed an independent two-group true experimental design in which
subjects were randomly assigned to one of two treatment conditions. The design is depicted
schematically in Table 1.
Table 1
Research Design
Assignment    Group             Before    Treatment    After
R             Experimental 1    O1        X1           O2
R             Experimental 2    O1        X2           O2
Before-treatment observations (O1) included background in logic and SAT score.
Treatment was one of two testing modes: X1 represents the CAT mode; X2 represents the
paper and pencil mode. Post-treatment observations (O2) included test scores, response time,
scrap paper use, and self-reported cognitions.
Analytical Section of GRE
Paper-and-pencil and computerized-adaptive versions of the analytical section of the
Graduate Record Examination (GRE) were administered to subjects assigned randomly to one
of two experimental groups. The tests used were those published by the Educational Testing
Service itself: the "PowerPrep" (Educational Testing Service, 1997) software bundle for the
CAT group, and General Test Form GRE 90-16 (Educational Testing Service, 1990) for the
P&P group.
As the name implies, the analytical section(s) of the GRE are intended to measure the
ability to think analytically. Two types of items are found in the analytical subtest: analytical
reasoning (AR) items and logical reasoning (LR) items. AR items test
…the ability to understand a given structure of arbitrary relationships among fictitious
persons, places, things, or events, and to deduce new information from the
relationships given. Each analytical reasoning group consists of (1) a set of about three
to seven related statements or conditions (and sometimes other explanatory material)
describing a structure of relationships, and (2) three or more questions that test
understanding of that structure and its implications. Although each question in a group
is based on the same set of conditions, the questions are independent of one another;
answering one question in a group does not depend on answering any other question.
(Educational Testing Service, 1994, p. 34)
And LR items test
…the ability to understand, analyze, and evaluate arguments. Some of the abilities
tested by specific questions include recognizing the point of an argument, recognizing
assumptions on which an argument is based, drawing conclusions and forming
hypotheses, identifying methods of [an] argument, evaluating arguments and counterarguments, and analyzing evidence. (Educational Testing Service, 1994, p. 40)
Procedure
After obtaining informed consent, subjects were asked to complete a questionnaire that
included demographic and other background information, such as background in logic and
computer experience—including familiarity with computer-based testing. Following the
collection of background and baseline information, each subject received a set of scripted
directions designed to simulate a true test-taking situation. The script also made use of
imagery. Subjects were told to imagine they were about to take the Graduate Record Exam for
real, that the test was very important to them, and that their performance would be a major
factor in determining whether they were accepted into their first choice graduate program.
Subjects were instructed to provide their best answer to each question. In order to encourage
effortful participation (a concern in any study when the test doesn’t really “count”), subjects
were told that participation would likely benefit them in two ways. First, they would be better
able to gauge their level of preparedness; and second, they would get practice taking part of
the GRE, which could lead to improved performance on the actual test. All subjects received a
score report at the end of the testing session with score interpretation information. The
researcher discussed the score report with each subject and answered questions about the
report and the test. It was hoped that this ‘self-diagnostic’ aspect of the study would provide
sufficient intrinsic motivation for optimum performance.
After appropriate instruction, the test session began. In both test conditions, all relevant
activity was video recorded for subsequent analysis. Immediately after subjects finished the
testing session (typically 60 minutes) they completed an investigator-created questionnaire
concerning their perceptions of and attitude toward the test and testing mode. Finally, all
subjects were interviewed using a short verbal protocol of approximately 15 minutes wherein
they solved two problems from the first analytical reasoning (AR) problem set encountered on
the test.
Results
Logic Background and Response Time
Did subjects with extensive relevant background knowledge (e.g., procedural
knowledge as might be acquired through a GRE prep course or content knowledge from
coursework in logic) have shorter mean response times compared to those with less
background, as predicted by Anderson’s ACT-R?
None of the subjects in this study reported that they had taken a GRE preparation
course, nor did any report preparing for the GRE on their own. In fact, most of the subjects
indicated that they did not know what the Graduate Record Exam was. On the other hand, ten
(n = 10) subjects had taken two or more college-level courses in logic—which was used as a
proxy for “extensive relevant background knowledge.” Table 2 shows that indeed subjects
with more background (i.e., two or more courses) in logic had shorter response times on the
logical reasoning (LR) items than both subjects with no background (n = 16) and subjects with
some preparation (n = 14). In addition to the total sample, this difference held for each
experimental group; it also held after adjusting the means using the SAT combined score as a
covariate. Consistent with the hypothesis, pairwise comparisons revealed a significant
difference in mean response time between subjects with extensive preparation in logic (i.e.,
two or more courses in logic) versus those with some preparation at p < .05. Although the
difference between mean response time for subjects with extensive preparation in logic versus
those with “no preparation” was in the direction predicted by ACT-R (i.e., those with
extensive background had shorter mean response times on LR items), the difference did not
reach statistical significance.
Table 2
Mean and Adjusted Mean Response Time on Logical Reasoning (LR) Problems across Level
of Logic Background for Experimental Groups and All Subjects
Background in Logic            Mean LR RT (sec)    N     SD      Adj. Mean LR RT (sec)a    Std. Error

Computerized Adaptive Testing
  No preparation                87.45               8    18.72    87.58                     5.17
  Some coursework               89.47               6    18.53    90.18                     5.97
  2 or more college courses     77.21               6    21.23    81.36                     6.07

Paper and Pencil Testing
  No preparation                75.53               8     5.57    71.06                     5.30
  Some coursework               83.12               8    22.02    84.40                     5.18
  2 or more college courses     67.22               4     6.42    66.01                     7.32

All subjects*
  No preparation                81.49              16    14.69    79.32                     3.70
  Some coursework               85.84              14    20.09    87.29                     3.96
  2 or more college courses     73.21              10    17.05    73.68                     4.73

a Means adjusted using SAT composite score as the covariate.
* Adjusted means for subjects with 2 or more college courses vs. those with some coursework differ significantly at p < .05.
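For readers who want the form of the analysis behind the adjusted means above, the sketch
below shows one way a covariate-adjusted group comparison of this kind could be run; the
column names and the small data frame are hypothetical, not the study's data.

```python
# Sketch of a covariate-adjusted group comparison (ANCOVA-style), of the
# kind used to adjust mean LR response times for SAT composite score.
# The data frame here is hypothetical; it is not the study's data.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "lr_rt_sec":  [88, 91, 76, 74, 84, 66, 80, 86, 72],   # mean LR response time
    "background": ["none", "some", "extensive"] * 3,      # logic background level
    "sat":        [1180, 1240, 1300, 1150, 1220, 1350, 1200, 1260, 1310],
})

# Response time modeled as a function of logic background,
# with SAT composite entered as the covariate.
model = smf.ols("lr_rt_sec ~ C(background) + sat", data=df).fit()
print(anova_lm(model, typ=2))   # F test for background after adjusting for SAT
print(model.params)             # coefficients give the adjusted group contrasts
```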
An unexpected finding was that those subjects who reported “no preparation” in logic
had shorter response times on LR items, within and across experimental groups, than those
subjects reporting they had taken some coursework. Given that logic instruction is frequently
embedded in kindergarten through 12th grade math curricula throughout the United States, and
in particular in New York (where 35% of participants reported they attended high school),
some subjects may not have recognized that they did in fact have some formal preparation in
logic.
As discussed below, a scatter plot showing GRE Analytical scores across level of logic
background (see Figure 6) suggested a trend consistent with this explanation; that is, the
deviations of scores from the line of “best fit” are greatest at the lowest level of logic
preparation.
Figure 4 below depicts graphically the adjusted (using SAT composite as the covariate)
mean response times on LR items across levels of logic background for CAT and P&P
subjects. The trends across level of logic preparation for the two experimental groups are quite
similar, although mean response time for CAT subjects tended to be longer than that for P&P
subjects, regardless of level of logic preparation.
Figure 4
Adjusted Mean Response Time on Logical Reasoning (LR) Problems across Level of Logic
Background for CAT versus P&P Experimental Group
[Line graph: adjusted mean LR response time in seconds (Y axis, 60 to 140) plotted against logic background (no preparation, some coursework, 2 or more courses) for the CAT and paper-and-pencil groups.]
In sharp contrast to the trends noted on LR items, AR response time across level of
logic background—specifically, at the higher end—showed divergent patterns for the two
testing modes (see Table 3 and Figure 5). That is, CAT subjects with extensive background in
logic had longer mean and adjusted mean response times than their less-prepared counterparts,
while P&P subjects with extensive background had shorter response times than those with
some course work. Although the graphed means would seem to suggest an interaction, no
significant interaction was noted.
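The kind of interaction test referred to here (logic background crossed with test mode, with
SAT as the covariate) can be written out as follows; the variable names and data are
placeholders rather than the study's.

```python
# Sketch of a background x test-mode interaction test on AR response time,
# with SAT composite as the covariate. Variable names and data are placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "ar_rt_sec":  [100, 113, 129, 79, 88, 78, 102, 110, 131, 76, 90, 80],
    "background": ["none", "some", "extensive"] * 4,
    "mode":       ["CAT"] * 3 + ["P&P"] * 3 + ["CAT"] * 3 + ["P&P"] * 3,
    "sat":        [1180, 1240, 1300, 1150, 1220, 1350,
                   1210, 1230, 1290, 1190, 1250, 1320],
})

# The C(background):C(mode) term carries the interaction; a non-significant
# F for that term corresponds to the "no significant interaction" finding.
model = smf.ols("ar_rt_sec ~ C(background) * C(mode) + sat", data=df).fit()
print(anova_lm(model, typ=2))
```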
Table 3
Mean and Adjusted Mean Response Time on Analytical Reasoning (AR) Problems across
Level of Logic Background for Experimental Groups and All Subjects
Background in Logic            Mean AR RT (sec)    N     SD      Adj. Mean AR RT (sec)a    Std. Error

Computerized Adaptive Testing
  No preparation               100.14               8    25.99   100.23                     5.97
  Some coursework              112.99               6    20.30   113.45                     6.89
  2 or more college courses    128.53               6    21.38   131.22                     7.00

Paper and Pencil Testing
  No preparation                78.61               8     3.72    75.72                     6.12
  Some coursework               87.99               8    12.66    88.82                     5.98
  2 or more college courses     78.85               4    11.63    78.07                     8.45

All subjects*
  No preparation                89.37              16    21.10    87.97                     4.27
  Some coursework               98.70              14    20.24   101.14                     4.57
  2 or more college courses    108.66              10    30.94   104.65                     5.46

a Means adjusted using SAT composite score as the covariate.
* Means for subjects with 2 or more college courses vs. those with no preparation and those with some coursework differ significantly at p < .05.
Figure 5
Adjusted Mean Response Time on AR Problems across Level of Logic Background for CAT
versus P&P Experimental Group
[Line graph: adjusted mean AR response time in seconds (Y axis, 60 to 140) plotted against logic background (no preparation, some coursework, 2 or more courses) for the CAT and paper-and-pencil groups.]
Although the differences in mean AR response times between levels of logic
preparation were, for the most part, in the direction opposite to that predicted by ACT-R, the
differences were significant, with F(2, 33) = 3.56, p = .04. Specific pairwise comparisons of
means between subjects with “no preparation” versus those with some coursework and
between subjects with “no preparation” versus those with two or more college courses were
also found to be statistically significant at p < .05. Thus, whereas extensive background in
logic appeared to be associated with decreased response time on LR items (as predicted), it
was associated with longer mean response times on AR items for CAT subjects and shorter
mean response times for P&P subjects.
Logic Background and Performance (Accuracy)
To explore whether logic background was associated with improved performance as
measured by GRE Analytical score, first, level of logic preparation was treated as a
continuously distributed variable. That is, levels of logic preparation (i.e., 1 = no preparation, 2
= high school course, 3 = college course, 4 = two college courses, and 5 = more than two
college courses) were assumed to represent equal intervals along a degree-of-logic-preparation
scale. Next, a Pearson product-moment correlation was calculated between GRE Analytical
score and logic preparation. The calculation revealed that more extensive logic preparation
was associated with higher GRE Analytical scores (r = .361, p = .022).
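As a pointer to how such a coefficient is obtained, the snippet below computes a Pearson
product-moment correlation on made-up preparation levels (coded 1 through 5 as above) and
scaled scores; the values are illustrative only, not the study's data.

```python
# Pearson product-moment correlation between logic-preparation level
# (coded 1-5 as described above) and GRE Analytical scaled score.
# The data here are made up for illustration.
from scipy.stats import pearsonr

logic_prep = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
gre_score  = [480, 620, 550, 590, 600, 640, 660, 700, 690, 760]

r, p = pearsonr(logic_prep, gre_score)
print(f"r = {r:.3f}, p = {p:.3f}")
```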
Figure 6
Scatter plot of GRE Analytical Scores across Levels of Logic Background with a Line of “Best
Fit” Superimposed
[Scatter plot: GRE Analytical score (Y axis, 300 to 800) plotted against level of logic background (X axis, 0 to 6), with a line of best fit superimposed.]
A scatter plot of GRE Analytical scores across levels of logic background is shown in
Figure 6. Superimposed on the plot is a line of “best fit” which suggests (supported by the
correlation coefficient) a positive relationship between the two variables. Of interest, as noted
previously, is that there is a much greater spread of scores on the left-hand side of the X
axis among subjects reporting “no preparation” in logic. It may be the case, as already alluded
to, that since logic instruction is often integrated in primary and secondary math curricula,
some subjects may have underreported their preparation. On the other hand, it may also be the
case that given only pre-college level logic instruction, embedded as part of a K-12 math
curriculum, a wide range of actual “logic background” is likely to be associated with that
preparation.
To further examine the relationship between logic background and performance, a one-way ANOVA was conducted, which revealed significant group differences across the three
levels of logic background (i.e., no coursework, some coursework, and 2 or more courses),
with F(2,37) = 3.484, p = .041. Consistent with the correlational analysis and scatterplot (Figure
6), higher GRE Analytical scores were associated with stronger background in logic for
subjects across and within groups (see Table 4). When ability was controlled for, using the
SAT composite score as the covariate, those subjects with the strongest background in logic
(i.e., two or more courses) outperformed their less-prepared colleagues. Subjects who reported
“no preparation” in logic had higher adjusted mean scores than those with “some coursework”
in the paper-and-pencil group; this difference carried over into the total sample as well. Once
again, embedded K-12 logic instruction may offer some explanation.
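A minimal version of a one-way ANOVA of this kind looks like the following; the three score
lists are invented placeholders for the three background groups, not the study's data.

```python
# One-way ANOVA comparing GRE Analytical scores across the three
# logic-background groups. Scores below are invented placeholders.
from scipy.stats import f_oneway

no_prep      = [540, 600, 610, 650, 700, 580]
some_courses = [560, 620, 640, 600, 680, 660]
two_or_more  = [690, 720, 700, 760, 740, 710]

F, p = f_oneway(no_prep, some_courses, two_or_more)
print(f"F = {F:.3f}, p = {p:.3f}")
```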
Table 4
Mean and Adjusted Mean GRE Analytical Score across Level of Logic Background for
Experimental Groups and All Subjects
Background in Logic            Mean Scaled Score    N     SD       Adj. Mean Scaled Scorea    Std. Error

Computerized Adaptive Testing
  No preparation               638.75                8    107.63   637.72                      28.17
  Some coursework              653.33                6     72.33   647.90                      32.54
  2 or more college courses    726.67                6     47.61   694.81                      33.06

Paper and Pencil Testing
  No preparation               598.75                8    146.33   632.95                      28.88
  Some coursework              606.25                8    129.83   596.42                      28.23
  2 or more college courses    717.50                4     41.93   726.75                      39.87

All subjects*
  No preparation               618.75               16    125.80   635.34                      20.16
  Some coursework              626.43               14    108.03   622.16                      21.56
  2 or more college courses    723.00               10     43.22   710.78                      25.80

a Means adjusted using SAT composite score as the covariate.
* Means and adjusted means for subjects with 2 or more college courses differ significantly from those with some coursework and those with no preparation at p < .05.
Figure 7 depicts graphically the adjusted mean GRE Analytical scores (using SAT
composite as the covariate) across level of logic background for CAT and P&P subjects.
Although the graphed means intersect, the interaction failed to reach statistical significance at
p < .05. A univariate test of the effect of logic background, based on linearly independent
pairwise comparisons of marginal adjusted means, was significant, with F(2, 33) = 3.877, p =
.031. In particular, those with two or more courses in logic had significantly higher mean
scores than their less-prepared peers even when scores were adjusted for general ability (using
the SAT covariate). Test mode effects were not significant. Thus, regardless of test mode,
subjects with two or more courses in logic outperformed their less-prepared
counterparts.
Figure 7
Adjusted Mean GRE Analytical Score across Levels of Logic Background for CAT versus
P&P group
[Line graph: adjusted mean GRE Analytical score (Y axis, 500 to 800) plotted against logic background (no preparation, some coursework, 2 or more courses) for the CAT and paper-and-pencil groups.]
How do problem-solving strategies (on GRE analytical items) differ, if at all, between those
examinees with extensive logic background and those with less training?
To answer this question, scrap paper use and verbal protocols were analyzed. Scrap
paper analysis was carried out for all subjects (n = 40). Although all subjects participated in a
posttest think aloud, due to time constraints, a sub-sample (n = 14) of verbal protocols was
randomly selected for in-depth analysis.
Scrap paper analysis revealed that virtually no subjects used scrap paper for LR items.
In sharp contrast, there was extensive reliance on scrap paper to solve AR problems across
logic background, with nearly the same amount of reliance among those subjects with some
course work in logic (M = 97.79% of AR problems) and those with 2 or more courses (M =
97.08%), followed by those with no preparation (M = 92.24%). Overall, CAT subjects tended
to use scrap paper more extensively than their P&P counterparts. However, ANOVA revealed
no significant interaction between level of logic background and test mode.
Differences were also noted with respect to the detail of problem representation on
scrap. Subjects with “no preparation” in logic had the lowest mean modeling rating for AR
problems, 3.13 (on a 5-point scale, with 5 being the most sophisticated), vs. 3.57 for those with
some coursework and 3.33 for those with 2 or more logic courses.
Ten subjects employed symbolic logic to answer questions. Interestingly, those who
did so had higher mean scores than their less logic-driven peers (M = 683.00, SD = 116.05 vs.
M = 635.67, SD = 108.97). As seen in the graphed means below (Figure 8), those subjects who
used symbolic logic had slightly higher GRE scores across all levels of logic background.
Figure 8
Adjusted Mean GRE Scores for Subjects who Used Symbolic Logic to Solve AR Problems
across Levels of Logic Preparation
[Line graph: adjusted mean GRE score (Y axis, 500 to 800) plotted against logic background (no preparation, some coursework, 2 or more courses) for subjects who did and did not use symbolic logic.]
Perhaps contributing to the success among subjects using logic was that they tended to
model AR problems on scrap in more detail than others (M = 3.7 vs. M = 3.2, on a 5-point
scale with 5 being the most detailed), and this difference was significant with F(1,38) = 5.008, p
= .031.
Do examinees point to training in logic as important or useful for this section of the GRE?
On the posttest questionnaire, when asked to indicate their level of agreement with the
statement “My reasoning skills are strong,” those subjects with 2 or more college courses
indicated stronger agreement (M = 4.1 on a 5-point Likert scale with 5 being the strongest
level of agreement) than those with some course work (M = 3.7) and those with no preparation
in logic (M = 3.6).
Discussion
Consistent with predictions derived from Anderson’s ACT-R, subjects with extensive
background knowledge in logic (i.e., 2 or more college courses) had shorter mean response
times on logical reasoning (LR) items than did subjects with less background (i.e., some
coursework). The difference in mean response time reached statistical significance for the total
sample; this finding suggests that relevant background knowledge—both declarative and
procedural—may have freed up working memory so that problem solving was more efficient
in the case of LR items.
On the other hand, logic background was generally not associated with shorter
response times on analytical reasoning (AR) items. In fact, a significant difference in the
inverse direction was noted for CAT subjects: that is, subjects with two or more college
courses in logic showed significantly longer response times than their less-prepared
counterparts. Three possible explanations are offered: 1) Logic background may be less salient
with AR problems than with LR problems; 2) Those with logic background may have taken
more care in representing and working out AR problems—especially if speed was less critical;
and/or 3) Test mode effects may be more salient with AR items than with LR items.
Let’s examine the first explanation. Whereas LR items call primarily for analysis of a
line of reasoning or determining the logical coherence of arguments, AR items call for the
sorting and organization of conditions—often requiring mechanical spatial manipulation in the
form of diagrammatic representation. In the verbal protocols, several subjects indicated that
the AR items seemed relatively easy—once you plugged all the information into a diagram
and worked it through—compared to LR items which, according to one subject, “made you
think.” Recall also that subjects were much less likely to use scrap paper on LR items,
especially CAT subjects. Because of the number of variables that must be juggled, AR
problems cannot be easily done in the head. In fact, the representation of conditions—either
pictorially or syntactically—serves to free up working memory for determining the solution. A
certain amount of time is required for transcription of problems, no matter how elementary the
representation may be. But when the diagrams are done well, which may involve considerably
more time, the deductions arise automatically.
With respect to the second explanation, in contrast to less well-prepared subjects, those
with logic background may have taken more care in solving problems; indeed, they may have
wanted to take problem solutions to another level once arrived at mechanically, so to speak. In
other words, those trained in logic may not have been satisfied with a mechanical plug-and-chug
problem-solving approach, especially given the luxury of less-speeded conditions, as
was the case in the CAT mode. Verbal protocol analysis also revealed that CAT subjects often
solved problems completely before referring to the possible answers. In any event, those
trained in logic had higher scores than those with less preparation, so there appeared to be
some advantage from logic background with respect to accuracy, if not consistently with
respect to speed.
Finally, the possibility that test mode may be more salient on AR problems than LR
problems receives support from observational data and from subjects’ post-test questionnaires.
In general, subjects were observed to use scrap paper consistently for AR problems, whereas
scrap paper was used very little on LR items; CAT subjects, however, were observed to use
scrap paper on more items than P&P subjects (M = 97% vs. M = 93%, respectively). Use of scrap paper in the
CAT mode entailed more transcription and lots of back and forth eye movements from screen
to paper and back. When asked to indicate their level of agreement with the statement, “It was
frustrating to have to move back and forth between the test and scrap paper” CAT subjects
indicated significantly stronger agreement than their P&P colleagues, with F(1,38) = 7.997, p
= .007.
The results of the present study are consistent with those obtained in a study conducted
by Rinella, Bringsjord, and Yang (2001). There is insufficient space here to present the data
from this study. The main point is that these three researchers found that students who took a
first course in symbolic logic were generally able to solve reasoning problems long held—on
the strength of a famous study carried out by Cheng and Holyoak (1985)—to be impervious to
training in formal logic. Rinella et al. conducted a pretest-posttest experiment. Both these
tests were formally similar; that is, from the standpoint of symbolic logic, the questions on the
pretest and posttest had similar underlying mathematical structure—but the English that
“clothed” this structure was different in the pre and post cases. Items on these two tests
included not only items which Cheng and Holyoak (1985) had used in their own pre and
posttests (e.g., the Wason selection task), but also items similar to those found in the analytical
section of the GRE.
Implications
Background in logic seemed to have differential effects on AR problem response time
between the two test modes. More extensive logic background was associated with decreased
response time for the P&P group and increased response time for the CAT group. On LR
problems, more extensive background in logic was associated with shorter response time
overall, regardless of test mode. What are the implications of these differential effects? First,
more research is needed. This study should be replicated with a larger sample size and also
with a pretest of logic. That would provide greater confidence in the relationship between logic
background and response time. Second, there may be implications for test preparation. That is,
if those trained in logic are more likely to spend more time working AR items out, because
they choose to represent the problem both pictorially and syntactically (let us assume), this
could jeopardize performance. On the other hand, logic background was associated with
higher scores. Both of these findings have potential implications for practice in the form of
providing appropriate guidance in the test preparation materials.
At least potentially, there are other, “deeper” implications. It is one thing to conclude
that studying symbolic logic can enable cognition that secures higher scores on standardized
tests, but it is quite another thing to conclude that such training can help students become more
successful than they would otherwise be in today’s economy. The present study, at best,
supports the former, more humble, conclusion. But perhaps it is safe to say that the present
study, especially combined with the aforementioned concordant one carried out by Rinella,
Bringsjord, and Yang (2001), does at least suggest that further research should be carried out
to determine if the second, far-reaching conclusion can in fact be justifiably drawn. In
connection with this issue, a recent paper by Stanovich and West (2000) should at least be
mentioned here. Stanovich and West (2000) hold that “life is becoming more like the tests”
(p. 714); and the tests they refer to include the GRE. The basic idea is that our increasingly
“technologized” world demands cognition that is abstract and symbolic, rather than concrete
and anchored to physical circumstances. As an example, they give the challenge of making a
rational decision as to how to apportion investments made to a retirement fund—and they
provide many other examples as well. If Stanovich and West (2000) are right (and there are
others who perceive a link between mastery of symbolic logic and “real world” competency:
e.g., see Adler, 1984), it may be that teaching symbolic logic from the standpoint of mental
metalogic (Yang & Bringsjord, 2001) can give students an increased ability to thrive in the
high-tech economy of today.
Whether or not this is so will hinge on subsequent research, to which we hope to contribute.
References
Adler, J. E. (1984). Abstraction is uncooperative. Journal for the Theory of Social
Behavior, 14, 165-181.
Anderson, J. R. (1976). Language, memory, and thought. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Anderson, J. R. (1983). The Architecture of Cognition. Cambridge, MA: Harvard
University Press.
Anderson, J. R. (1989). A theory of human knowledge. Artificial Intelligence, 40, 313-351.
Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum.
Anderson, J. R. & Bower, G. H. (1973). Human associative memory. Mahwah, NJ:
Lawrence Erlbaum Associates.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ:
Lawrence Erlbaum.
Bringsjord, E. (2001a). Computerized-adaptive versus paper-and-pencil testing
environments: An experimental analysis of examinee experience (Doctoral dissertation,
University at Albany, State University of New York, 2001). Dissertation Abstracts
International (in press).
Bringsjord, E. (2001b). Computerized-adaptive versus paper-and-pencil testing
environments: An experimental analysis of examinee experience. Paper presented in the
Distinguished Paper Series at the annual meeting of the American Educational Research
Association, April 15, 2001, Seattle, WA.
Bringsjord, E. (2000, October). Computerized-Adaptive versus Paper-and-Pencil Testing
Environments: An experimental analysis of examinee experience. Paper presented at the
annual meeting of the Northeastern Educational Research Association, Ellenville, NY.
Cheng, P. W., & Holyoak, K. J. (1985). Pragmatic versus syntactic approaches to training
deductive reasoning. Cognitive Psychology, 17, 391-416.
Educational Testing Service (1990). The Graduate Record Examinations General Test.
Princeton, NJ: Educational Testing Service.
Educational Testing Service (1994). [“The official guide”] GRE: Practicing to take the
General Tests (9th ed.). Princeton, NJ: Educational Testing Service.
Educational Testing Service (1997). POWERPREP®—Preparing for the GRE General
Test. Princeton, NJ: Educational Testing Service.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data.
Cambridge, MA: MIT Press.
Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a framework for mixed
method evaluation designs. Educational Evaluation and Policy Analysis, 11, 255-274.
Newell, A. (1973). Production systems: Models of control structures. In W.G. Chase
(Ed.), Visual information processing (pp. 463-526). New York, NY: Academic Press.
Rinella, K., Bringsjord, S., & Yang, Y. (2001). Efficacious logic instruction: People are
not irremediably poor deductive reasoners. In Stenning, K. & Moore, J.D., eds., Proceedings
of the 23rd Annual Conference of the Cognitive Science Society (Mahwah, NJ: Lawrence
Erlbaum), pp. 851-856.
Stanovich, K. & West, R. (2000). Individual differences in reasoning: Implications for the
rationality debate? Behavioral and Brain Sciences, 23(5), 645-665, 701-726.
Stenning, K., Cox, R., & Oberlander, J. (1995). Contrasting the cognitive effects of graphical
and sentential logic teaching: Reasoning, representation, and individual differences. Language
and Cognitive Processes, 10(3/4), 333-354.
Yang, Y. & Bringsjord, S. (2001). Mental metalogic: A new paradigm for psychology of
reasoning. Proceedings of the 3rd Annual International Conference on Cognitive Science
(Hefei, China: Press of the University of Science and Technology of China), pp. 199-204.