Please check, just in case…
Announcements
1. Terminology Treasure Hunt due in two
weeks (Oct 29). Please check the resources
provided in the folder on e-reserves and
bring them with you to class NEXT WEEK.
2. I need two volunteers to meet me at my
office next week before class to help with
materials. Volunteers needed after class
too.
3. Make an appointment to meet with me
with any questions you have about
upcoming assignments or class
topics/concepts.
Quick
questions,
quandaries,
comments, or
concerns?
APA Tip of the Day: Ampersand
When there are two authors for a
reference you cite, you need to cite
both of them every time. When you
cite them in a sentence, but not within
parentheses, use “and.” When you put
the citation within parentheses, use
an ampersand (“&”).
Examples of ampersand and
“and”:
• According to Gomez and Garcia (2012),
“this is very interesting” (p. 107).
• “This is very interesting” (Gomez &
Garcia, 2012, p. 107).
Topic: Psychometric
terminology
October 8, 2013
Tonight’s Terminology
• Reliability (review)
• Validity (review)
• Chronological age
• Raw score
• Norm-referenced (quick review)
• Normal curve
Definitions
Reliability:
1) “Reliability refers to the results
obtained with an assessment
instrument and not to the
instrument itself.”
2) “An estimate of reliability always
refers to a particular type of
consistency” (e.g., over time, interrater reliability, with different
tasks).
Definitions, cont.
Reliability:
3) “Reliability is a necessary but not
sufficient condition for validity.”
4) “Reliability is primarily statistical.”
(Linn & Gronlund, 2000, pp. 108-109)
Definitions, cont.
Validity:
1.) “Validity refers to the
appropriateness of the
interpretation of the results of an
assessment procedure for a given
group of individuals, not to the
procedure itself.”
2.) “Validity is a matter of degree; it
does not exist on an all-or-none
basis.”
Definitions, cont.
Validity:
3.) “Validity is always specific to
some particular use or
interpretation. No assessment is
valid for all purposes.”
4.) “Validity is viewed as a unitary
concept based on various kinds
of evidence.”
Definitions, cont.
Validity:
5.) “Validity involves an overall
evaluative judgment. It requires
an evaluation of the degree to
which interpretations and use of
assessment results are justified by
supporting evidence and in terms
of the consequences of those
interpretations and uses.”
(Linn & Gronlund, 2000, pp. 75-76)
Definitions, cont.
“Validity is an evaluation of
the adequacy and
appropriateness of the
interpretations and use of
assessment results.”
(Linn & Gronlund, 2000, p. 73)
Chronological Age?????
Chronos = sequential time
Raw score
The number of items correct on a test.
Without other information on the test,
the raw score is meaningless – it must
be interpreted for each child using the
information from the test manual.
Norm-referenced Tests
Describe “performance in terms of
the relative position held in some
known group (e.g., typed better than
90 percent of the class members).”
(Linn & Gronlund, 2000, p. 42)
NR assessments compare individual
performance against others’ performance.
What is a Normal Distribution?
It is the idea that for a number of human
characteristics, such as height or weight, most
of the “cases” will cluster around the middle,
with fewer at the high and low ends. So, if you
plot the number of cases at each possible data
point, you end
up with the “bell curve.”
Ways to talk about the “middle”
(measures of central tendency)
• Mean: the statistical average
• Median: the score where half fall
above and half fall below.
• Mode: the most frequent score
In a normal distribution, the mean,
median, and mode are exactly the same.
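As a small illustration of the three measures above, here is a sketch using Python's standard `statistics` module. The ten scores are made up for the example and do not come from any real test.

```python
import statistics

# Hypothetical raw scores for ten students (made up for illustration).
scores = [12, 14, 15, 15, 16, 16, 16, 17, 18, 21]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value: half above, half below
mode_score = statistics.mode(scores)      # most frequent score

print(mean_score, median_score, mode_score)
```

In this roughly symmetric set all three measures come out to 16, which is the pattern a normal distribution shows. In a lopsided set of scores they would drift apart.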
Why is a “normal curve”
important?
• Norm-referenced assessments are based on
the assumption that human abilities, like
height and shoe size, are “normally
distributed.”
• A normal distribution allows us to calculate
a number of important statistics, which
allow us to compare an individual’s score
with that of their norm group.
Testing terminology related to the
“normal curve”:
Measures of Variability:
• Range
• Standard deviation
Standard score:
• Z-score
• Deviation IQ
• Stanine
• Percentile rank
• Age equivalency
• Grade equivalency
Measures of Variability
These tell us how spread out (or
dispersed) the scores are.
#1 - Range
• A very rough measure of variability.
• It tells us the distance between the
highest and the lowest score.
• If there are unusually high and/or low
scores, this can be misleading.
• What is the range of height in our class?
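The sketch below shows how a single unusual value can distort the range. The heights are hypothetical numbers chosen only for illustration, and `score_range` is a helper name invented for this example.

```python
# Hypothetical heights in inches for a small class (made up for illustration).
heights = [61, 63, 64, 65, 66, 67, 68]

def score_range(values):
    """Distance between the highest and the lowest value."""
    return max(values) - min(values)

typical_range = score_range(heights)

# Add one unusually tall person and the range jumps,
# even though everyone else is unchanged.
with_outlier_range = score_range(heights + [84])

print(typical_range, with_outlier_range)
```

The range goes from 7 inches to 23 inches because of one person, which is why the range is only a rough measure of variability.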
#2 - Standard Deviation
“A measure of the
variability, or dispersion,
of a distribution of scores”
(Harcourt, 2000, p. 8).
Standard Deviation:
A standard deviation corresponds to a
particular percentage of the scores, both
above and below the mean.
About two thirds (68%) of scores fall within 1
standard deviation of the mean. More than
95% of all scores fall within 2 standard
deviations of the mean. Fewer than 2.5% of
scores fall more than 2 standard deviations
below the mean.
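These percentages can be checked against an ideal normal curve. The sketch below assumes a perfectly normal distribution and uses `NormalDist` from Python's standard `statistics` module. Real test data will only approximate these figures.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
curve = NormalDist(mu=0, sigma=1)

within_1_sd = curve.cdf(1) - curve.cdf(-1)   # share of scores within 1 SD of the mean
within_2_sd = curve.cdf(2) - curve.cdf(-2)   # share of scores within 2 SD of the mean
below_minus_2_sd = curve.cdf(-2)             # share more than 2 SD below the mean

print(f"{within_1_sd:.1%} {within_2_sd:.1%} {below_minus_2_sd:.1%}")
```

The three shares work out to roughly 68.3%, 95.4%, and 2.3%, in line with the figures on the slide.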
Standard Deviation
This tells us how closely grouped
together or how far apart the raw scores
on a particular test are. A standard
deviation corresponds to a particular
percentage of the scores, both above
and below the average score. For
example, just over 2/3 (68%) of scores
fall within 1 standard deviation of the
average. More than 95% of all scores fall
within 2 standard deviations of the
average.
Standard Deviation, cont.
So, if the scores are very closely
grouped, there will not be a lot of
distance between the high and low
raw scores for the majority of the
students. If there is a large amount
of difference between most
students’ scores (variance), one
standard deviation will include a
wider range of raw scores.
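To make the contrast concrete, the sketch below compares two hypothetical classes with the same mean but different spreads. The scores are made up, and the population formula (`statistics.pstdev`) is an assumption, since a test manual may use a different formula.

```python
import statistics

# Two hypothetical classes with the same mean raw score (made-up numbers).
tight_class = [48, 49, 50, 50, 51, 52]    # scores closely grouped
spread_class = [30, 40, 50, 50, 60, 70]   # scores far apart

for name, scores in [("tight", tight_class), ("spread", spread_class)]:
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population standard deviation
    # One standard deviation on either side of the mean, in raw-score units.
    print(f"{name}: mean={mean}, SD={sd:.1f}, "
          f"+/-1 SD spans {mean - sd:.1f} to {mean + sd:.1f}")
```

Both classes average 50, but one standard deviation covers only a point or so in the tightly grouped class and about 13 points in the spread-out class.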
Standard Deviation, cont.
“Standard deviation is such an accurate
measure of variability that if a distribution is
reasonably normal, then by knowing only two
numbers, the mean and the standard
deviation, it is possible to reconstruct and
redraw the distribution. Whenever the shape
of a distribution of scores approaches
normal, the standard deviation can be used
as a measuring rod to lay off distances from
the mean.”
http://library.athabascau.ca/caap6/613M1lesson1.pdf
“Normal” has a statistical definition:
scores within the average range – not really,
really high and not really, really low.
Really high and really low are usually
set in terms of standard deviations.
Scores more than two standard deviations above
or below the mean are the scores that
typically qualify someone for special
education (either as gifted or with a
disability), because more than 95% of
scores fall within 2 standard
deviations of the mean.
So…
Standard deviations are a way
of considering a particular score
with reference to its distance from
the mean and the percentage of
scores which fall within that
distance. They help us figure out
how high or low a particular
score is, in comparison to other
scores.
Also…
Standard deviation is an
important statistic,
because it lets us
calculate “standard
scores.”
Standard Score:
This is a way of representing
raw scores that tells us how far
above or below average that
score is, in a way that is
comparable across tests and
across time. Standard scores
“translate” raw scores into a
common way of representing
scores.
Different Types of Standard Scores:
• Z-Score
• Deviation IQ
• Stanine
• T-score
• Percentile
• Age Equivalency
• Grade Equivalency
Z-Scores
This simply tells you how many
standard deviations above or below
the mean a student’s score fell.
If a student’s raw score is 1 ½ standard
deviations below the mean, what is
his/her z-score?
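A z-score is the raw score minus the mean, divided by the standard deviation. The sketch below works one case through. The test mean of 40 and standard deviation of 8 are hypothetical values, and `z_score` is a helper name invented for this example.

```python
def z_score(raw_score, mean, standard_deviation):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw_score - mean) / standard_deviation

# Hypothetical test: mean raw score of 40 with a standard deviation of 8.
# A raw score of 28 is 12 points (1.5 standard deviations) below the mean.
z = z_score(28, mean=40, standard_deviation=8)
print(z)
```

Scores below the mean give negative z-scores, scores above the mean give positive ones, and a score exactly at the mean gives zero.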
Standard Deviation
& z-scores
Deviation IQ
People popularly refer to
this as someone’s “IQ.”
This is where 100 equals the
average score, and each
standard deviation equals
15 points.
Deviation IQ
This is where 100 equals the average
score, and each standard deviation
equals 15 points. So, hypothetically about 68%
(roughly 2/3) of the population would receive a
deviation IQ score of between 85 and 115.
People popularly refer to this number as
someone’s “IQ.” It is really just one of a
number of ways of changing a raw score
into a different kind of score that can be
compared across tests or people.
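The conversion described above is simple arithmetic on a z-score. The sketch below uses the 100-point mean and 15-point standard deviation given on the slide. The function name `deviation_iq` is invented for this example.

```python
def deviation_iq(z):
    """Convert a z-score to a deviation IQ (mean 100, 15 points per standard deviation)."""
    return 100 + 15 * z

# One standard deviation below and above the mean bracket the middle ~68% of scores.
low, high = deviation_iq(-1), deviation_iq(1)
print(low, high)
```

One standard deviation either side of the mean gives 85 and 115, the range quoted above. Two standard deviations below the mean gives 70.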
Stanine
This is another kind of standard score,
where the entire range of possible raw
scores is divided up into nine groups. The
lowest group of scores falls in Stanine 1.
The highest group of scores falls in Stanine
9. Stanine 5 includes scores right in the
middle (average) -- scores that fall
between the 40th and 60th percentiles
would be in Stanine 5.
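One way to see how the nine groups line up with percentiles is the sketch below. It assumes the conventional stanine boundaries (cumulative 4%, 11%, 23%, 40%, 60%, 77%, 89%, 96%). Placing a score that lands exactly on a boundary in the higher stanine is a choice made for this example, and a particular test manual may handle it differently.

```python
from bisect import bisect_right

# Conventional upper percentile boundaries for stanines 1 through 8;
# anything at or above the 96th percentile falls in stanine 9.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(percentile_rank):
    """Map a percentile rank (0-100) to a stanine (1-9)."""
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1

for pr in (2, 45, 50, 70, 99):
    print(pr, stanine_from_percentile(pr))
```

A percentile rank of 45 or 50 lands in Stanine 5, matching the 40th-to-60th-percentile band described above.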
Stanines
• Good for BROAD
measurement of
performance.
• Pros
• Can compare scores
across tests
• Intervals are equal
• Cons
• Rough estimate of
performance (not as
sensitive as other
standard scores)
Percentile Rank
This tells us what percentage of
children scored at or below your child
on a particular measure (e.g., height,
weight, test score).
Percentile Rank
This tells us what percentage of children
scored below your child on a particular
measure (e.g., height, weight, test score).
For example, if your infant daughter is in
the 35th percentile for weight, that means
that 35% of girls at the same age as your
daughter weigh less than her. 65% of girls
at the same age weigh the same or more.
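The weight example can be sketched as a simple count. The 20 weights below are made-up data, and `percentile_rank` is a helper written for this illustration using the "scored below" wording from this slide.

```python
def percentile_rank(score, group_scores):
    """Percentage of the comparison group scoring below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

# Hypothetical weights (in pounds) of 20 girls of the same age; made-up data.
weights = [14, 15, 15, 16, 16, 17, 17, 18, 18, 18,
           19, 19, 20, 20, 21, 21, 22, 23, 24, 25]

print(percentile_rank(18, weights))
```

Seven of the 20 girls weigh less than 18 pounds, so a girl weighing 18 pounds is at the 35th percentile. Some definitions count scores "at or below" instead, which would mean changing `<` to `<=` in the helper.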
Standard Deviation
& Percentile Rank
Percentile Ranks
Pros
• Generally easy to understand.
Cons
• Need to be careful, because the
intervals between ranks are not equal.
For example, the score difference between
the 50th & 60th percentiles is smaller
than the difference between the 70th & 80th
percentiles.
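The unequal intervals can be seen by converting percentiles back to z-scores. The sketch below assumes a perfectly normal distribution and uses `NormalDist.inv_cdf` from Python's standard library. The helper name `z_at_percentile` is invented for this example.

```python
from statistics import NormalDist

curve = NormalDist()  # standard normal: mean 0, SD 1

def z_at_percentile(p):
    """z-score below which p percent of a normal distribution falls."""
    return curve.inv_cdf(p / 100)

# Each pair of percentiles is 10 points apart, but the z-score gaps differ.
gap_50_to_60 = z_at_percentile(60) - z_at_percentile(50)
gap_70_to_80 = z_at_percentile(80) - z_at_percentile(70)
gap_89_to_99 = z_at_percentile(99) - z_at_percentile(89)

print(f"{gap_50_to_60:.2f} {gap_70_to_80:.2f} {gap_89_to_99:.2f}")
```

The same 10-percentile step covers about a quarter of a standard deviation near the middle of the curve and more than a full standard deviation near the top, so equal percentile differences do not mean equal score differences.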
Age Equivalency
This tells us that a student’s score is
similar to the average score of
children at X age.
Ex. If your son’s language development (as
measured by a particular test) is at the
age equivalent of 4-3, it means that he
scored on that particular test at the same
level as the average child who was 4
years and 3 months old.
Age Equivalents
Pros
• Easy for parents to understand.
Cons
• Not equal units of measurement.
• Basically just a “ballpark” figure.
• Gives the erroneous idea that individuals
with developmental delays are just like
children, but in older bodies.
Grade Level Equivalency
• Very misunderstood concept.
• This compares a child’s score to the
average performance of students at
different grade levels, if they were
to take the same test as your
student.
Grade Level Equivalency Example
If your second grader has a grade level
equivalency of 3.1 in language arts, it
means that your child scored the same as
the average third grader, in the first
month of the year, would have scored on
the second grade test. This does not mean
that your child is doing third grade work
or that he should be moved up into the
third grade. It does mean that he is
performing, on second grade work, more
like a third grader would.
Please take a
minute for the
minute paper.