Using Assessment to Improve Statistics Instruction (the Web ARTIST

Using Assessment to
Improve and Evaluate
Student Learning in
Introductory Statistics
Bob delMas (Univ. of MN)
GAISE Guidelines





Assessment needs to be aligned with learning
goals
Focus on learning key ideas (not only skills,
procedures, and computed answers)
Include formative assessments as well as
summative
Use timely feedback to promote learning
It is possible to implement good assessments
even in large classroom settings
Slide 2 of 41
Suggestions for Teachers




Integrate assessment as an essential component of the
course (timing of assessment and activities).
Use a variety of assessment methods to provide a more
complete evaluation of student learning.
Assess statistical literacy using assessments such as
interpreting or critiquing articles in the news and graphs
in media.
Assess statistical thinking using assessments such as
student projects and open-ended investigative tasks.
Slide 3 of 41
Suggestions for Large
Classes




Use small group projects instead of individual
projects.
Use peer review of projects to provide feedback
and improve projects before grading.
Use items that focus on choosing good
interpretations of graphs or selecting appropriate
statistical procedures.
Use discussion sections for student
presentations.
Slide 4 of 41
AAHE 9 Principles of Assessment









Assessment begins with educational values.
Treats understanding of learning as multidimensional,
integrated, and revealed over time.
Requires clear, explicitly stated purposes.
Requires attention to experiences that lead to outcomes.
Works best when ongoing, not episodic.
Fosters wider improvement when representatives from across
the educational community are involved (all stakeholders).
Represents issues and questions that people really care
about.
Most likely to lead to improvement if part of a larger set of
conditions that promote change.
Meets responsibilities to students and to the public.
Slide 5 of 41
Assessment Triangle
National Research Council (2001), Knowing What Students Know



Cognition: the aspects of achievement or competencies that
are to be assessed
Observation: the tasks used to collect evidence about
students’ achievement (i.e., the assessments)
Interpretation: the methods used to analyze the evidence
resulting from the tasks
The three elements are
interdependent
A successful assessment
synchronizes all three elements
Slide 6 of 41
COGNITION

A theory or set of beliefs about:
  How students represent knowledge
  How students develop competence


Used to identify important knowledge
and skills

Based on a learning model that
provides a level of detail sufficient to
accomplish the assessment
Slide 7 of 41
OBSERVATION

What we typically consider to be the “assessment”

Careful design of tasks that will provide evidence that
can be linked to the learning model

Context and Purpose are Important

A national assessment can indicate relative
standing, but is not sensitive to nuances of instruction

Instructor-constructed assessment can be tied to
classroom instruction, but may not generalize to a
larger population
Slide 8 of 41
INTERPRETATION

Set of assumptions and models that are used to
interpret evidence from observation

Links observations to competencies (cognition)

National assessments may use formal, statistical
models that identify patterns indicative of competency
levels

Classroom assessment is typically more qualitative
and identifies categories of competency based on
observations

Important to identify what has not developed as well
as what has developed

Important to identify misunderstandings and
misconceptions as well as correct understanding
Slide 9 of 41
Example: Understanding p-Values

A p-value is the probability of obtaining results as
extreme as or more extreme than the observed results,
given that the null hypothesis is true.

A p-value IS NOT the probability that the null
hypothesis is true.

A p-value IS NOT the probability that the
alternative hypothesis is true.
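
A minimal sketch of the definition above, using a hypothetical coin-flip example that is not part of the presentation: 60 heads observed in 100 flips, with the null hypothesis that the coin is fair. The function name `binomial_upper_tail` and the data are assumptions chosen for illustration.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical data (an assumption for illustration): 60 heads in 100 flips.
n_flips, heads_observed = 100, 60

# The p-value is computed ASSUMING the null hypothesis (fair coin, p = 0.5)
# is true: the chance of a result at least as extreme as the one observed.
p_value = binomial_upper_tail(n_flips, heads_observed, 0.5)
print(f"one-sided p-value = {p_value:.4f}")
```

Because the calculation conditions on the null hypothesis being true, the result cannot be read as the probability that either hypothesis is true.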
Slide 10 of 41
Example: Understanding p-Values
A research article reports the results of a new drug test. The drug is to be used
to decrease vision loss in people with Macular Degeneration. The article
gives a p-value of .04 in the analysis section. Indicate if the following
interpretation of this p-value is valid or invalid.
Statement (N = 1617) | Valid (%) | Invalid (%)
The probability of getting results as extreme or more extreme than the ones in this study if the drug is actually not effective. | |
Slide 11 of 41
Example: Understanding p-Values
A research article reports the results of a new drug test. The drug is to be used
to decrease vision loss in people with Macular Degeneration. The article
gives a p-value of .04 in the analysis section. Indicate if the following
interpretation of this p-value is valid or invalid.
Statement (N = 1617) | Valid (%) | Invalid (%)
The probability of getting results as extreme or more extreme than the ones in this study if the drug is actually not effective. | 57.7 | 42.3
Slide 12 of 41
Example: Understanding p-Values
A research article reports the results of a new drug test. The drug is to be used to
decrease vision loss in people with Macular Degeneration. The article gives a p-value
of .04 in the analysis section. Items 25, 26, and 27 present three different
interpretations of this p-value. Indicate if each interpretation is valid or invalid.
Statement (N = 1617) | Valid (%) | Invalid (%)
25. The probability of getting results as extreme or more extreme than the ones in this study if the drug is actually not effective. | 57.7 | 42.3
26. The probability that the drug is not effective. | |
27. The probability that the drug is effective. | |
Slide 13 of 41
Example: Understanding p-Values
A research article reports the results of a new drug test. The drug is to be used to
decrease vision loss in people with Macular Degeneration. The article gives a p-value
of .04 in the analysis section. Items 25, 26, and 27 present three different
interpretations of this p-value. Indicate if each interpretation is valid or invalid.
Statement (N = 1617) | Valid (%) | Invalid (%)
25. The probability of getting results as extreme or more extreme than the ones in this study if the drug is actually not effective. | 57.7 | 42.3
26. The probability that the drug is not effective. | 39.9 | 60.1
27. The probability that the drug is effective. | 45.2 | 54.8

9% of all students made an incorrect choice for all 3 items
Of those who chose “valid” for Item 25 (N = 933)
  55% chose “valid” for at least one of the other items
  7% chose “valid” for all 3 items.
Slide 14 of 41
Assessment Cycle
(from Beth Chance; see G. Wiggins 1992, 1998)

Set goals
  What should students know, be able to do?
  At what point in the course?
  Identify assessable learning outcomes that match goals
Select methods
  Identify an assessment that matches the type of learning outcome
  Consider minute papers, article reviews, newspaper assignments, projects,
  short answer items, multiple choice
  Can the assessment be built into the activity?
Gather evidence (i.e., administer the assessment)
Draw inference
  Don’t use results just to assign a grade
  Consider what responses indicate about student understanding
Take action
  Provide feedback
  What can be done to remedy a misunderstanding (an activity; extra reading;
  more experience with a procedure or a concept)?
  Re-examine goals and methods
Slide 15 of 41
Embedding Assessment
into Classroom Activities
Sorting Distributions
  Goal: Learn to associate labels with shapes of distributions
Normal Distribution
  Goal: Learn to find areas for the standard normal distribution
Sampling Distributions
  Learn the characteristics of sampling distributions
  Understand effect of sample size
Slide 16 of 41
ARTIST Website
https://app.gen.umn.edu/artist







Item Database (Assessment Builder): A collection of about 1100
items, in a variety of item formats, organized according to statistical topic
and type of learning outcome assessed.
Resources: Information, guidelines, and examples of alternative
assessments. Copies of articles or direct links to articles on assessment in
statistics. References and links for other related assessment resources.
Research Instruments: Instruments that can be used for research and
evaluation projects that involve assessments of outcomes related to
teaching and learning statistics.
Implementation issues: Questions and answers on practical issues
related to designing, administering, and evaluating assessments.
Presentations: Copies of conference papers and presentations on the
ARTIST project, and handouts from ARTIST workshops.
Events: Information on ARTIST events.
Participation: Ways to participate as a class tester for ARTIST materials.
Slide 17 of 41
ARTIST Topic Tests


There are 11 scales, each consisting of 8-12 multiple-choice items, that can
be administered online. Our goal is to develop high-quality, valid, and
reliable scales that can be used for a variety of purposes (e.g.,
research, evaluation, review, or self-assessment).
TOPICS











Data Collection (data types, types of study, study design)
Data Representation (choose appropriate graphs, interpret graphs)
Measures of Center (estimate, when to use, interpret, properties)
Measures of Spread (estimate, when to use, interpret, properties)
Normal Distribution (characteristics, empirical rule, areas under the curve)
Probability (interpret, independence, relative frequency, simulation)
Bivariate Quantitative Data (scatterplots, correlation, descriptive and inferential
methods, outliers, diagnostics, influential observations)
Bivariate Categorical Data (two-way tables and chi-square test, association)
Sampling Distributions (types of samples, sample variability, sampling
distributions, Central Limit Theorem)
Confidence Intervals (interpret, confidence level, standard error, margin of error)
Tests of Significance (hypothesis statements, p-values, Type I and II error,
statistical and practical significance)
Slide 18 of 41
Comprehensive Assessment of
Outcomes in Statistics (CAOS)





Forty-item test that can be administered online to
evaluate the attainment of desired student outcomes.
CAOS items are designed to represent the big ideas and the
types of reasoning, thinking and literacy skills deemed
important for all students across first courses in statistics.
Unifying focus is on reasoning about variability: in univariate
and bivariate distributions, in comparing groups, in samples,
and when making estimates and inferences.
Not intended to be used exclusively as a final exam or as the
sole assessment to assign student grades.
CAOS can provide very informative feedback to instructors
about what students have learned and not learned in an
introductory statistics course (e.g., administered as pretest and
posttest).
Slide 19 of 41
Data Representation Item (page 19)
A baseball fan likes to keep track of statistics for the local high school baseball team. One of the statistics she
recorded is the proportion of hits obtained by each player based on the number of times at bat as shown in the
table below. Which of the following graphs gives the best display of the distribution of proportion of hits in
that it allows the baseball fan to describe the shape, center and spread of the variable, proportion of hits?
Slide 20 of 41
Data Representation Item (page 19)
A baseball fan likes to keep track of statistics for the local high school baseball team. One of the statistics she
recorded is the proportion of hits obtained by each player based on the number of times at bat as shown in the
table below. Which of the following graphs gives the best display of the distribution of proportion of hits in
that it allows the baseball fan to describe the shape, center and spread of the variable, proportion of hits?
What percents would you predict for your students?
RESPONSE | PERCENT (N = 1643)
Graph A |
Graph B |
Graph C |
Graph D |
Slide 21 of 41
Data Representation Item (page 19)
A baseball fan likes to keep track of statistics for the local high school baseball team. One of the statistics she
recorded is the proportion of hits obtained by each player based on the number of times at bat as shown in the
table below. Which of the following graphs gives the best display of the distribution of proportion of hits in
that it allows the baseball fan to describe the shape, center and spread of the variable, proportion of hits?
RESULTS: US undergraduates, 2005-2006
RESPONSE | PERCENT (N = 1643)
Graph A | 11.1
Graph B | 46.4
Graph C | 29.1
Graph D | 13.4
Slide 22 of 41
Data Representation Item (page 20)
A local running club has its own track and keeps accurate records of each member's individual
best lap time around the track, so members can make comparisons with their peers. Here are
graphs of these data. Which of the graphs allows you to most easily see the shape of the
distribution of running times?
RESPONSE | PERCENT (N = 1345)
Graph A |
Graph B |
Graph C |
All of the above |
Slide 23 of 41
Data Representation Item (page 20)
A local running club has its own track and keeps accurate records of each member's individual
best lap time around the track, so members can make comparisons with their peers. Here are
graphs of these data. Which of the above graphs allows you to most easily see the shape of the
distribution of running times?
RESPONSE | PERCENT (N = 1345)
Graph A | 43.8
Graph B | 48.9
Graph C | 3.6
All of the above | 3.7
Slide 24 of 41
First Small Group Exercise
Designate one person to be the recorder.
Discuss the following questions (pages 21-22 of handout)
with respect to the Data Representation items:
Why do you think students are selecting the incorrect
responses for these items? (3-5 minutes)
Outline an instructional activity to help students develop
the correct understanding. (10 minutes)
Slide 25 of 41
Second Small Group Exercise
Choose a Topic:
Sampling Variability
Confidence Intervals
Tests of Significance
Bivariate Quantitative Data
Discuss the following questions:
Why do you think students are selecting the incorrect
responses for each item? (5-10 minutes)
Outline an instructional activity to help students develop
the correct understanding. (10-15 minutes)
Slide 26 of 41
Sampling Variability Item 1
A certain manufacturer claims that they produce 50% brown
candies. Sam plans to buy a large family size bag of these
candies and Kerry plans to buy a small fun size bag. Which bag
is more likely to have more than 70% brown candies?
RESPONSE | PERCENT (N = 1608)
Sam, because there are more candies, so his bag can have more brown candies. | 5.3
Sam, because there is more variability in the proportion of browns among larger samples. | 11.6
Kerry, because there is more variability in the proportion of browns among smaller samples. | 32.4
Kerry, because most small bags will have more than 50% brown candies. | 1.7
Both have the same chance because they are both random samples. | 48.9
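
One way to check the reasoning behind this item is an exact binomial calculation. The bag sizes below (20 candies for Kerry's fun size bag, 200 for Sam's family size bag) are assumptions for illustration; the item does not state them, and the helper name `prob_share_above` is hypothetical.

```python
from math import comb


def prob_share_above(n: int, p: float, percent: int) -> float:
    """P(more than `percent`% of n candies are brown) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if 100 * k > percent * n  # integer comparison avoids rounding issues
    )


# Assumed bag sizes (not given in the item).
small_bag, large_bag = 20, 200

p_small = prob_share_above(small_bag, 0.5, 70)
p_large = prob_share_above(large_bag, 0.5, 70)
print(f"small bag (n = {small_bag}): {p_small:.4f}")
print(f"large bag (n = {large_bag}): {p_large:.2e}")
```

With these assumed sizes, the small bag exceeds 70% brown about 2% of the time, while the large bag almost never does, which reflects the greater variability of proportions in smaller samples.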
Slide 27 of 41
Sampling Variability Item 2
Consider the distribution of average number of hours that college students spend
sleeping each weeknight. This distribution is very skewed to the right, with a mean of
5 and a standard deviation of 1. A researcher plans to take a simple random sample of
18 college students. If we were to imagine that we could take all possible random
samples of size 18 from the population of college students, the sampling distribution
of average number of hours spent sleeping will have a shape that is
RESPONSE | PERCENT (N = 872)
Exactly normal. | 18.8
Less skewed than the population. | 34.4
Just like the population (i.e., very skewed to the right). | 34.7
It's impossible to predict the shape of the sampling distribution. | 12.0
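
A simulation sketch for this item, assuming a shifted exponential population (4 plus an exponential with rate 1) as a stand-in for a "very skewed to the right" distribution with mean 5 and standard deviation 1. The population model, the seed, and the number of replications are assumptions, not part of the item.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = fmean(values)
    sd = pstdev(values, mu=mean)
    return fmean(((v - mean) / sd) ** 3 for v in values)


def draw_hours():
    """One draw from the assumed population (mean 5, SD 1, right-skewed)."""
    return 4 + random.expovariate(1.0)


random.seed(2024)  # arbitrary seed so the sketch is repeatable

population_draws = [draw_hours() for _ in range(50_000)]
# Each entry is the mean of one simulated random sample of 18 students.
sample_means = [fmean(draw_hours() for _ in range(18)) for _ in range(20_000)]

pop_skew = skewness(population_draws)
mean_skew = skewness(sample_means)
print(f"skewness of population draws:      {pop_skew:.2f}")
print(f"skewness of sample means (n = 18): {mean_skew:.2f}")
```

Under these assumptions the sample means remain right-skewed, though much less so than the population, which corresponds to "Less skewed than the population" rather than "Exactly normal."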
Slide 28 of 41
Confidence Interval Item 1
Suppose two researchers want to estimate the proportion of American college
students who favor abolishing the penny. They both want to have about the same
margin of error to estimate this proportion. However, Researcher 1 wants to
estimate with 99% confidence and Researcher 2 wants to estimate with 95%
confidence. Which researcher would need more students for her study in order to
obtain the desired margin of error?
RESPONSE | PERCENT (N = 1296)
Researcher 1. | 51.9
Researcher 2. | 25.9
Both researchers would need the same number of subjects. | 9.1
It is impossible to obtain the same margin of error with the two different confidence levels. | 13.1
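
The sample-size reasoning can be sketched with the usual large-sample formula for a proportion, n = (z*/E)^2 p(1 - p). The margin of error (0.03) and the planning value p = 0.5 are assumptions for illustration; the item gives neither, and the function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin (normal approximation)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p * (1 - p))


MARGIN = 0.03  # assumed common margin of error for both researchers

n_99 = required_sample_size(0.99, MARGIN)  # Researcher 1
n_95 = required_sample_size(0.95, MARGIN)  # Researcher 2
print(f"Researcher 1 (99% confidence): n = {n_99}")
print(f"Researcher 2 (95% confidence): n = {n_95}")
```

For a fixed margin of error, the higher confidence level has the larger critical value z*, so it requires the larger sample.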
Slide 29 of 41
Confidence Interval Item 2
A high school statistics class wants to estimate the average number of chocolate chips
per cookie in a generic brand of chocolate chip cookies. They collect a random
sample of cookies, count the chips in each cookie, and calculate a confidence interval
for the average number of chips per cookie (18.6 to 21.3). Indicate if the following
interpretations are valid or invalid.
Statement (N = 1609) | Valid (%) | Invalid (%)
We are 95% certain that each cookie for this brand has approximately 18.6 to 21.3 chocolate chips. | 51.2 | 48.8
We expect 95% of the cookies to have between 18.6 and 21.3 chocolate chips. | 34.1 | 65.9
We would expect about 95% of all possible sample means from this population to be between 18.6 and 21.3 chocolate chips. | 53.1 | 46.9
We are 95% certain that the confidence interval of 18.6 to 21.3 includes the true average number of chocolate chips per cookie. | 75.7 | 24.3
Slide 30 of 41
Test of Significance Item 1
A newspaper article claims that the average age for people who receive food stamps is
40 years. You believe that the average age is less than that. You take a random sample
of 100 people who receive food stamps, and find their average age to be 39.2 years.
You find that this is significantly lower than the age of 40 stated in the article (p <
.05). What would be an appropriate interpretation of this result?
RESPONSE | PERCENT (N = 1101)
The statistically significant result indicates that the majority of people who receive food stamps is younger than 40. | 33.8
Although the result is statistically significant, the difference in age is not of practical importance. | 50.5
An error must have been made. This difference is too small to be statistically significant. | 15.7
Slide 31 of 41
Test of Significance Item 2
A researcher compares men and women on 100 different variables using a two-sample
t-test. He sets the level of significance to .05 and then carries out 100
independent t-tests (one for each variable) on data from the same sample. If, in each
case, the null hypothesis actually is true for every test, about how many "statistically
significant" findings will this researcher report?
RESPONSE | PERCENT (N = 1160)
0 | 30.2
5 | 45.7
10 | 7.1
None of the above | 17.1
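
A short simulation sketch for this item, assuming the 100 tests behave independently and every null hypothesis is true, so that each test has a .05 chance of producing a false positive. The seed and the number of replications are arbitrary assumptions.

```python
import random

ALPHA = 0.05      # significance level from the item
N_TESTS = 100     # number of t-tests from the item
N_REPS = 10_000   # assumed number of simulated "studies"

random.seed(1)  # arbitrary seed so the sketch is repeatable

# Under a true null hypothesis a p-value is uniform on (0, 1), so each test
# is declared "significant" with probability ALPHA.
false_positive_counts = [
    sum(random.random() < ALPHA for _ in range(N_TESTS)) for _ in range(N_REPS)
]

expected_count = ALPHA * N_TESTS
average_count = sum(false_positive_counts) / N_REPS
print(f"expected false positives per study:  {expected_count:.1f}")
print(f"simulated average over {N_REPS} studies: {average_count:.2f}")
```

The count varies from study to study, but it averages about 5 significant findings even though no real effects exist.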
Slide 32 of 41
Bivariate Quantitative Data Item 1
The number of people living on American farms has declined steadily during the
last century. Data gathered on the U.S. farm population (millions of people) from
1910 to 2000 were used to generate the following regression equation: Predicted
Farm Population = 1167 - .59 (YEAR). What method would you use to predict the
number of people living on farms in 2050?
RESPONSE | PERCENT (N = 1591)
Substitute the value of 2050 for YEAR in the regression equation, and compute the predicted farm population. | 19.8
Plot the regression line on a scatterplot, locate 2050 on the horizontal axis, and read off the corresponding value of population on the vertical axis. | 15.6
Neither method is appropriate for making a prediction for the year 2050 based on these data. | 28.4
Both methods are appropriate for making a prediction for the year 2050 based on these data. | 36.2
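
Substituting 2050 into the regression equation from the item shows what extrapolating 50 years beyond the data would produce. The function name `predicted_farm_population` is hypothetical; the coefficients are the ones stated in the item.

```python
def predicted_farm_population(year: float) -> float:
    """Regression equation from the item, in millions of people."""
    return 1167 - 0.59 * year


# 2050 lies well outside the 1910-2000 range used to fit the equation.
prediction_2050 = predicted_farm_population(2050)
print(f"predicted farm population in 2050: {prediction_2050:.1f} million")
```

The result is about -42.5 million people, an impossible value, which illustrates why neither substitution nor reading the value off the fitted line is appropriate so far outside the range of the data.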
Slide 33 of 41
Bivariate Quantitative Data Item 2
A statistics instructor wants to use the number of hours studied to predict exam scores in his
class. He wants to use a linear regression model. Data from previous years shows that the
average number of hours studying for a final exam in statistics is 8.5, with a standard deviation
of 1.5, and the average exam score is 75, with a standard deviation of 15. The correlation is .76.
Should the instructor use linear regression to predict exam scores from hours studied?
RESPONSE | PERCENT (N = 850)
Yes, there is a high correlation, so it is alright to use linear regression. | 21.2
Yes, because linear regression is the statistical method used to make predictions when you have bivariate quantitative data. | 27.1
Linear regression could be appropriate if the scatterplot shows a clear linear relationship. | 46.2
No, because there is no way to prove that more hours of study causes higher exam scores. | 5.5
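
A small illustration of why a high correlation alone does not establish a linear relationship. The data below (y equal to x squared for x = 1 to 10) are hypothetical and unrelated to the instructor's class; `pearson_r` is a hand-written helper, not a library call.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical, clearly curved data: y = x squared.
xs = list(range(1, 11))
ys = [x**2 for x in xs]

r = pearson_r(xs, ys)
print(f"correlation for curved data: r = {r:.3f}")
```

The correlation here is above .9 even though the relationship is a curve, so the summary statistics in the item cannot show by themselves that a line is a suitable model; a scatterplot is needed.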
Slide 34 of 41
Assessment Builder: Search
Slide 35 of 41
Assessment Builder: Results
Slide 36 of 41
Assessment Builder: Results
Slide 37 of 41
Assessment Builder: Item Set
Slide 38 of 41
Assessment Builder: Download
Slide 39 of 41
Assessment Builder: Download
Slide 40 of 41
ARTIST Website
https://app.gen.umn.edu/artist
We invite you to contact the ARTIST team with any
comments and suggestions you have regarding this
presentation, or any of the materials at the ARTIST
website.
Thank you for your participation in
today’s session.
Slide 41 of 41