What Can We Learn from
Quantitative Data in Statistics
Education Research?
Sterling Hilton Brigham Young University
Andy Zieffler University of Minnesota
John Holcomb Cleveland State University
Marsha Lovett Carnegie Mellon University
University of Minnesota
Educational Psychology
Introduction
 Components of a research program
   Generate ideas (pre-clinical)
     Develop a conceptual framework
   Frame question (pre-clinical, Phase I)
     Constructs and Measurement
     Design and Methods
     Pilot study
   Examine question (Phase I, Phase II)
     Establish efficacy (small)
   Generalize findings (Phase III)
     Larger studies in varied settings
   Extend findings (Phase IV)
     Longitudinal studies
     Different populations
Introduction
 Quantitative methods in research program
   Framing: measurement development
     Validity and reliability
   Framing: pilot study
   Examine
   Generalize
   Extend
 Statistics education research is primarily in the “generate” and “frame” phases
Introduction
 Purpose: Introduce two instruments that are in different stages of development and discuss how they have been and might be used in statistics education research
   Comprehensive Assessment of Outcomes in a First Statistics course (CAOS)
   Survey of Attitudes Toward Statistics (SATS)
Assessment Resource Tools for Improving Statistical Thinking
Several online assessments
  ARTIST Topic Scales
  Comprehensive Assessment of Outcomes in a First Statistics course (CAOS)
  Statistics Thinking and Reasoning Test (START)
ARTIST Topic Scales
7-15 multiple-choice (MC) items
Many topics
  Data Collection
  Data Representation
  Measures of Center
  Measures of Spread
  Normal Distribution
  Probability
  Bivariate Quantitative Data
  Bivariate Categorical Data
  Sampling Distributions
  Confidence Intervals
  Significance Tests
CAOS Test
40 MC items
Designed to assess students’ statistical reasoning after any first course in statistics.
The CAOS test focuses on statistical literacy and conceptual understanding, with an emphasis on reasoning about variability.
Developed through a three-year process of acquiring and writing items, testing and revising items, and gathering evidence of reliability and validity.
CAOS Test
Reliability Analysis
  Sample of 10,287
  Cronbach’s alpha coefficient of .77
Content Validity Evidence
  18 expert raters
  Unanimous agreement that CAOS measures important basic learning outcomes
  All raters agreed with the statement “CAOS measures outcomes for which I would be disappointed if they were not achieved by students who succeed in my statistics courses.”
  Some raters indicated topics that they felt were missing from the scale, but there was no agreement among these raters about which topics were missing.
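The reliability figure reported for CAOS is a Cronbach's alpha coefficient. As a minimal sketch of how such a coefficient can be computed from a students-by-items score matrix, the following uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy 0/1 scores are hypothetical and are not CAOS data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student)."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item across students.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of each student's total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 students, 4 items scored 0 (wrong) or 1 (right).
toy_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
toy_alpha = cronbach_alpha(toy_scores)
print(round(toy_alpha, 3))
```

Population variances are used for both the items and the totals; using sample variances for both gives the same ratio and therefore the same coefficient.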
START Test
14 MC items
Identified through a principal components analysis performed on CAOS data gathered in Fall 2005 and Spring 2006 (n = 1470).
The alpha coefficient for that data set was 0.74.
Use of Quantitative Measures in a Phase I Study
Exploratory Studies
  What can we find out about students’ understanding?
  Where are students having difficulties?
  Are there inconsistencies in students’ reasoning?
Example Item 1
Measured Learning Outcome
Understanding the interpretation of a
median in the context of boxplots.
Example Item 1
The two boxplots below display final exam scores for all students in two different sections of the same course.
Example Item 1
Which section has a greater percentage of students with scores at or above 80?
a) Section A
b) Section B
c) Both sections are about equal.
Example Item 1
 How did students answer this
item?
Example Item 1
Response (N = 754)               Pretest   Posttest
Section A                         73.7%     65.6%
Section B                          6.6%      6.1%
Both sections are about equal.    19.6%     28.2%
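A pretest/posttest response table like this one can be produced directly from raw answer choices. The sketch below is one possible way to tabulate the percentage of students choosing each option and the pre-to-post shift in percentage points. The function names and the toy answer lists are hypothetical and do not reproduce the CAOS responses.

```python
from collections import Counter


def response_percentages(responses, options):
    """Percentage of respondents choosing each option, rounded to one decimal."""
    if not responses:
        raise ValueError("no responses")
    counts = Counter(responses)
    n = len(responses)
    return {opt: round(100 * counts[opt] / n, 1) for opt in options}


def pre_post_shift(pre, post, options):
    """Posttest minus pretest percentage for each option (percentage points)."""
    pre_pct = response_percentages(pre, options)
    post_pct = response_percentages(post, options)
    return {opt: round(post_pct[opt] - pre_pct[opt], 1) for opt in options}


# Hypothetical toy data: answer letters from 10 students at pretest and posttest.
options = ["a", "b", "c"]
pre_answers = ["a"] * 7 + ["b"] * 1 + ["c"] * 2
post_answers = ["a"] * 6 + ["b"] * 1 + ["c"] * 3

pre_table = response_percentages(pre_answers, options)
shift = pre_post_shift(pre_answers, post_answers, options)
print(pre_table)
print(shift)
```

A shift toward a distractor from pretest to posttest is the kind of pattern that can flag an item for follow-up.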
Example Item 1
 Is this surprising?
 What can we learn from
students’ responses to this
item?
 Implications/Directions for
research? Teaching?
Example Item 2
Measured Learning Outcome
Understanding that correlation does
not imply causation.
Example Item 2
Researchers surveyed 1,000 randomly
selected adults in the U.S. A
statistically significant, strong
positive correlation was found
between income level and the number
of containers of recycling they
typically collect in a week. Please
select the best interpretation of this
result.
Example Item 2
a) We can not conclude whether earning more
money causes more recycling among U.S.
adults because this type of design does not
allow us to infer causation.
b) This sample is too small to draw any
conclusions about the relationship between
income level and amount of recycling for
adults in the U.S.
c) This result indicates that earning more money
influences people to recycle more than people
who earn less money.
Example Item 2
 How did students answer this
item?
Example Item 2
Response (N = 743)                                   Pretest   Posttest
We can not conclude whether earning more money
causes more recycling among U.S. adults because
this type of design does not allow us to infer
causation.                                            54.6%     52.6%
This sample is too small to draw any conclusions
about the relationship between income level and
amount of recycling for adults in the U.S.            18.3%     11.4%
This result indicates that earning more money
influences people to recycle more than people
who earn less money.                                  27.1%     35.9%
Example Item 2
 Is this surprising?
 What can we learn from
students’ responses to this
item?
 Implications/Directions for
research? Teaching?
Example Item 3
Measured Learning Outcome
Ability to match a scatterplot to a
verbal description of a bivariate
relationship.
Example Item 3
Bone density is typically measured as a standardized
score with a mean of 0 and a standard deviation of 1.
Lower scores correspond to lower bone density. Which
of the following graphs shows that as women grow
older they tend to have lower bone density?
Example Item 3
a) Graph A
b) Graph B
c) Graph C
Example Item 3
 How did students answer this
item?
Example Item 3
Response (N = 748)   Pretest   Posttest
Graph A               90.5%     92.5%
Graph B                6.1%      6.6%
Graph C                3.3%      0.9%
Example Item 3
 Is this surprising?
 What can we learn from
students’ responses to this
item?
 Implications/Directions for
research? Teaching?
Example Item 4
Measured Learning Outcome
Understanding of the purpose of
randomization in an experiment.
Example Item 4
A recent research study randomly divided
participants into groups who were given
different levels of Vitamin E to take daily.
One group received only a placebo pill.
The research study followed the
participants for eight years to see how
many developed a particular type of cancer
during that time period. Which of the
following responses gives the best
explanation as to the purpose of
randomization in this study?
Example Item 4
a) To increase the accuracy of the research
results.
b) To ensure that all potential cancer
patients had an equal chance of being
selected for the study.
c) To reduce the amount of sampling
error.
d) To produce treatment groups with
similar characteristics.
e) To prevent skewness in the results.
Example Item 4
 How did students answer this
item?
Example Item 4
Response (N = 754)                                Pretest   Posttest
To increase the accuracy of the research
results.                                           41.4%     31.8%
To ensure that all potential cancer patients
had an equal chance of being selected for the
study.                                             13.5%     19.8%
To reduce the amount of sampling error.            22.7%     29.4%
To produce treatment groups with similar
characteristics.                                    8.5%     12.3%
To prevent skewness in the results.                13.9%      6.6%
Example Item 4
 Is this surprising?
 What can we learn from
students’ responses to this
item?
 Implications/Directions for
research? Teaching?
How Can We Use the Results?
Begin to look for underlying reasons students are having difficulties
 Examine the research literature
 Interview students to gain a more in-depth understanding of their reasoning
 Compare results with data from other classes (other teachers, schools)
How Can We Use the Results?
They can inform our instruction
  Reconsider how difficult or easy some concepts are for students
  Rethink how we currently teach these ideas
  Add new activities or tools
  Re-allocate classroom time
  Change the way we assess students
    Assessment items better aligned with learning outcomes
    Assessment items that probe students’ reasoning
SATS
 Survey of Attitudes Toward Statistics
 Candace Schau and Tom Dauphinee
(http://www.unm.edu/~cschau/satshomepage.htm)
 Twenty-eight item survey
 Seven-point Likert scale response
   1 (Strongly Disagree)   2   3   4 (Neither agree nor disagree)   5   6   7 (Strongly Agree)
SATS
 Original four subscales
   Value (9 items; α range .80 - .90): “Statistics is worthless.”
   Affect (6 items; α range .80 - .85): “I like statistics.”
   Cognitive Competence (6 items; α range .77 - .85): “I have no idea of what’s going on in statistics.”
   Difficulty (7 items; α range .64 - .79): “Statistics is a complicated subject.”
SATS
 Two additional subscales
   Interest (4 items): “I am interested in using statistics.”
   Effort (4 items): “I plan to complete all of my statistics assignments.”
SATS
 Attitude is a multifaceted outcome
 Issues to consider
   Pre-existing attitudes
   Direction and magnitude of changes over a semester
   Relevance of items to study
Using the SATS: A Case Study
Assessment of a project-rich
introductory statistics course

Fall 2004, at Cleveland State
University

Class 1: 30 students Pre/Post

Class 2: 16 students Pre/Post

SATS administered first day and
final exam day
Class 1: Project-Rich
4 team projects that used/required
 Real data
 Computer Software
 Collaboration
 Writing
Individualized Mid-Term and Take-home Data Analysis Exams
http://academic.csuohio.edu/holcombj/eku/index.html
Login: holcomb pwd: projects22
Class 2
 TI-83
   In-class demos
   Homework and Exams
Comparison of Pre Data
 No significant difference between Class 1 and Class 2
[Figures: six plots of pre-course SATS component scores by Class (1, 2), each on a vertical axis from 1 to 7: PreAFFECT, PreCOGCOMP, PreVALUE, PreDIFFICULTY, PreINTEREST, PreEFFORT]
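The slides do not say which test was used to compare the two classes on the pre-course scores. As one hedged illustration of how such a comparison could be run on small classes, the sketch below uses a two-sided Monte Carlo permutation test on the difference in mean component scores. The function name, the number of resamples, the seed, and the toy scores are all assumptions and are not the study's data or method.

```python
import random


def permutation_test(group1, group2, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n1, n2 = len(group1), len(group2)
    observed = abs(sum(group1) / n1 - sum(group2) / n2)
    pooled = list(group1) + list(group2)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of students to classes
        diff = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical pre-course component scores (1-7 scale) for two small classes.
class1_pre = [4.2, 3.8, 5.0, 4.5, 3.9, 4.1, 4.8, 4.4]
class2_pre = [4.0, 4.6, 3.7, 4.3, 4.9, 4.2]
p_value = permutation_test(class1_pre, class2_pre)
print(p_value)
```

A large p-value here would be consistent with "no significant difference" between the classes at the start of the course, though it would not show that the classes are equivalent.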
Class 1: Change from Pre to Post
(2-sided tests)
 Significant Differences for:
   Cognitive Competence
   Value
   Difficulty*
   Interest
 Insignificant Differences for:
   Affect
   Effort
*(Not Significant with Nonparametric Test)
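Pre-to-post changes like these are paired comparisons: each student contributes a pre score and a post score. The sketch below shows one way to compute a paired t statistic together with a simple nonparametric check. The slides do not name the nonparametric test that was used, so the exact sign test here is a stand-in chosen because it needs only the standard library. The function names and toy scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for the mean of post - pre differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def sign_test_p_value(pre, post):
    """Exact two-sided sign test on the paired differences (ties dropped)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)
    smaller = min(k, n - k)
    tail = sum(math.comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical component scores (1-7 scale) for six students.
pre_scores = [3.0, 4.0, 3.5, 5.0, 4.5, 3.0]
post_scores = [4.0, 4.5, 4.5, 5.5, 4.0, 4.0]

t_stat, df = paired_t_statistic(pre_scores, post_scores)
sign_p = sign_test_p_value(pre_scores, post_scores)
print(round(t_stat, 3), df, sign_p)
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` cover the paired t-test and the Wilcoxon signed-rank test respectively.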
[Figure: Six Components for Class 1: Pre - Post; vertical axis from -6.00 to 6.00]
Component        p-value
diffAFFECT       0.541
diffVALUE        0.018
diffCOGCOMP      0.038
diffINTEREST     0.049
diffDIFFICULTY   0.006
diffEFFORT       0.881
Class 2: Change from Pre to Post
(2-sided tests)
 Significant Differences
   Affect (wrong direction)
 Insignificant Differences
   Cognitive Competence
   Value
   Difficulty
   Interest
   Effort
[Figure: Six Components for Class 2: Pre - Post; vertical axis from -3.00 to 4.00]
Component        p-value
diffAFFECT       0.020
diffVALUE        0.522
diffCOGCOMP      0.247
diffINTEREST     0.303
diffDIFFICULTY   0.062
diffEFFORT       0.051
Multivariate Analysis of Post Data
Class Significant vs Insignificant
 Significant Differences
   Affect
   Value
   Interest
 Insignificant Differences
   Cognitive Competence
   Difficulty
   Effort
Does SATS Ask the Right Questions?
 Value Component Questions
   Statistics is worthless.
   Statistics should be a required part of my professional training.
   Statistical skills will make me more employable.
   Statistics is not useful to the typical professional.
   Statistical thinking is not applicable in my life outside my job.
   I use statistics in my everyday life.
   Statistics conclusions are rarely presented in everyday life.
   I will have no application for statistics in my profession.
   Statistics is irrelevant in my life.
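Several of the Value items are negatively worded, so a raw response of 7 can signal a low opinion of statistics. A common way to handle this on a seven-point scale is to reverse those items (1 becomes 7, 2 becomes 6, and so on) before averaging. The sketch below illustrates that idea. Which items count as negatively worded is an assumption made here from the wording of the nine statements in the order listed, not taken from an official scoring key, and the function name and toy responses are hypothetical.

```python
# Assumption: True marks a Value item judged negatively worded from its text,
# in the order the nine statements appear on the slide.
VALUE_ITEM_IS_NEGATIVE = [True, False, False, True, True, False, True, True, True]


def score_component(responses, is_negative, scale_max=7):
    """Mean item score after reversing negatively worded items."""
    if len(responses) != len(is_negative):
        raise ValueError("one response is required per item")
    adjusted = []
    for response, negative in zip(responses, is_negative):
        if not 1 <= response <= scale_max:
            raise ValueError("response outside the 1..scale_max range")
        # Reverse-score: on a 1-7 scale, 1 <-> 7, 2 <-> 6, 3 <-> 5.
        adjusted.append(scale_max + 1 - response if negative else response)
    return sum(adjusted) / len(adjusted)


# Hypothetical responses from one student to the nine Value items.
student_responses = [2, 6, 6, 2, 3, 5, 3, 2, 1]
value_score = score_component(student_responses, VALUE_ITEM_IS_NEGATIVE)
print(round(value_score, 2))
```

After reversal, higher component scores correspond to more positive attitudes, which makes pre-to-post differences easier to read.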
What are the Questions You
Want to Ask?
 ADD ANSWERS HERE
Instructors: Do try this at home!
 But first, set your expectations
   Results may not be as high as you desire by the end of your course (e.g., CAOS)
   Results may not change from the beginning to the end of your course or in the direction you anticipate (e.g., SATS)
   The same is true for other instruments, too
How might you use such data?
 To better understand students’ learning of particular concepts and skills
 To identify different patterns of student performance
 To establish a starting point for further inquiry
 To make your teaching and students’ learning more effective
 To assess where students start and to reveal areas of difficulty during the course
Some Practical Considerations
 Motivating students to take these instruments seriously
   Grading?
   Feedback
 Instrument integrity
 Time to administer
 Others?
INQUERI Project
 INQUERI = Initiative for Quantitative Education Research Infrastructure
   To build a research infrastructure by focusing on the development, deployment, user training, and archiving of high-quality research methods, instruments, and data
   To disseminate these methods and results
   To catalyze research collaborations
   See www.inqueri.org
Back to the Big Picture
 Focus on the question/goal you want to address and relate that to past research
 Start small
   Using existing instruments is one way
   Working within your own course to start
 Share with colleagues, connect with the literature, and then extend
References
delMas, R., Garfield, J., Ooms, A., & Chance, B. (2006). Assessing students' conceptual understanding after a first course in statistics. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Garfield, J., delMas, R., & Chance, B. (n.d.). Assessment Resource Tools for Improving Statistical Thinking. Retrieved May 8, 2007, from https://app.gen.umn.edu/artist/index.html.
References
 http://www.unm.edu/~cschau/satshomepage.htm
 Dauphinee, T. L., Schau, C., & Stevens, J. J. (1997). Survey of Attitudes Toward Statistics: Factor structure and factorial invariance for females and males. Structural Equation Modeling, 4, 129-141.
 Schau, C., Stevens, J., Dauphinee, T. L., & Del Vecchio, A. (1995). The development and validation of the Survey of Attitudes Toward Statistics. Educational and Psychological Measurement, 55, 868-875.
 Hilton, S. C., Schau, C., & Olsen, J. A. (2003). Survey of Attitudes Toward Statistics: Factor structure invariance by gender and by administration time. Structural Equation Modeling, 11, 92-109.
Contact Information
 Sterling Hilton

hiltons@byu.edu
 Andy Zieffler

zief0002@umn.edu
 John Holcomb

j.p.holcomb@csuohio.edu
 Marsha Lovett

lovett@csuohio.edu