Norms and Basic Statistics for Testing 1
Definitions:
descriptive statistics: methods which provide a concise description of a collection of quantitative information.
inferential statistics: methods used to make inferences to a population from observations from a sample.
constant: a specific unchanging number, a characteristic that is the same for everyone in the population.
variable: a symbol that can take on a variety of numerical values, a characteristic that differs for individuals in the population.
qualitative variable: a variable that differs by a characteristic value, rather than a numeric value.
quantitative variable: a variable that differs by a numeric value, rather than a characteristic value.
parameter: a value computed from an entire population to describe that population, usually denoted by Greek letters.
Scales of Measurement
Measurement can be thought of as the application of specific procedures for assigning numbers to objects in order to transform qualities of attributes into numbers.
There are three properties of different scales of measurement:
1. Magnitude
2. Equal Intervals
3. Absolute Zero
Four different scales have been defined based on these three properties:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
The level of measurement is important, because it determines which mathematical operations can be applied to your data.
1 Note that there is a typo in your book. On page 47 on the second line after the heading Percentiles and Z scores the value of 21.6 should be -1.6.
Frequency Distributions
Knowing how well you did on a test depends on how well you did in comparison to others. A score of 75 means something different if the average score is 90 than if the average score is 65.
A frequency distribution is a tabular (i.e. frequency table) or graphical (i.e. histogram) summary of scores for a group of individuals. It depicts the number of people who obtained each score.
Typically scores are arranged in order from lowest to highest. When a frequency distribution is graphical, the x-axis represents the scores and the y-axis represents the frequency of each score occurring. One can also represent the data in what is known as a frequency polygon.
The normal distribution, with its symmetric relationships, occurs very frequently. Keep in mind that this distribution is simply a frequency distribution with special properties.
Other special distributions that occur frequently include a positively skewed distribution and a negatively skewed distribution.
Frequency distributions can be grouped for a large number of observations, but some information will be lost. Frequency distributions must be grouped with continuous data.
If data is grouped then the width of the class interval must be determined. It should be small enough so that the information presented is meaningful and large enough so that you have a manageable number of intervals.
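As a rough illustration of the ideas above, the following Python sketch builds an ungrouped frequency table and a grouped one. The score list, the function names, and the class interval width of 10 are hypothetical choices made for this example only, and the grouped version simply skips intervals that contain no scores.

```python
from collections import Counter

def frequency_table(scores):
    """Return (score, frequency) pairs ordered from lowest to highest score."""
    counts = Counter(scores)
    return sorted(counts.items())

def grouped_frequency_table(scores, width):
    """Group scores into class intervals of the given width.

    Each row is (lower, upper, frequency) and covers lower <= score < upper.
    Intervals with no scores are omitted in this simple sketch.
    """
    counts = Counter((score // width) * width for score in scores)
    return [(low, low + width, counts[low]) for low in sorted(counts)]

# Hypothetical test scores, invented for illustration.
scores = [65, 70, 70, 75, 75, 75, 80, 80, 85, 90]

for score, freq in frequency_table(scores):
    print(score, freq)

for low, high, freq in grouped_frequency_table(scores, 10):
    print(f"{low} to under {high}:", freq)
```

Grouping with a width of 10 reduces six distinct scores to four intervals, which is easier to scan but no longer shows, for example, how many people scored exactly 75.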
Percentile Ranks
Percentile ranks adjust for the number of scores in the group and convey the percentage of scores that fall below a particular score. These are percentages and may be depicted as decimal values or as whole numbers.
Percentiles indicate the particular score below which a defined percentage of scores fall.
Quartiles and deciles are sometimes used to describe test scores. These values refer to groupings of the percentile scale.
Quartiles are points that divide a frequency distribution into four equal intervals such that the first quartile (i.e. $Q_1$) is the 25th percentile, the second quartile (i.e. $Q_2$) is the 50th percentile, and the third quartile (i.e. $Q_3$) is the 75th percentile. The interquartile range is the interval of scores between $Q_1$ and $Q_3$, or between the 25th and 75th percentile.
Deciles are points that divide a frequency distribution into ten equal intervals. They are similar to quartiles.
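The short Python sketch below shows one way these quantities might be computed. The score list and the `percentile_rank` helper are hypothetical. Several conventions exist for locating quartiles in a small data set; this sketch uses the default method of the standard library's `statistics.quantiles`, so other software may give slightly different cut points.

```python
import statistics

def percentile_rank(scores, score):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores, invented for illustration.
scores = [65, 70, 70, 75, 75, 75, 80, 80, 85, 90]

# Six of the ten scores are below 80.
print(percentile_rank(scores, 80))  # 60.0

# Three cut points divide the distribution into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Interquartile range runs from", q1, "to", q3)

# Nine cut points divide the distribution into ten parts.
deciles = statistics.quantiles(scores, n=10)
print("Deciles:", deciles)
```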
Measures of Central Tendency and Variability
Measures of central tendency indicate where the center of a distribution lies - the three most commonly used indices of central tendency are mean, median, and mode.
The mean is the average of all observations, denoted by $\mu$ for a population and $\bar{X}$ for a sample:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
where N = the number of observations in the population.
It can be thought of as the balancing point of the distribution.
It is greatly affected by skewness because each and every score affects it.
It is unbiased, meaning that when calculated from a sample it does not systematically over- or underestimate the population mean.
It is the preferred measure for test score distributions because, for many of the distributions encountered in testing, it is more stable than either the median or the mode.
The median is the value that half of the observations fall at or below.
It is not sensitive to the magnitude of outliers or extreme values.
It is the preferred measure of central tendency for highly skewed distributions.
The mode is the most frequently occurring observation or score in a set of data.
It is not a very reliable or stable measure with quantitative data.
It is the preferred statistic for qualitative data.
For data that is normally distributed the mean = median = mode. For data that is positively skewed the mean > median > mode. For data that is negatively skewed the mean < median < mode.
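A small Python sketch may make these relationships concrete. The two score lists are hypothetical and were chosen so that the first is positively skewed; the calls are from Python's standard `statistics` module.

```python
import statistics

# Hypothetical, positively skewed scores (a tail toward high values).
scores = [70, 70, 75, 80, 95]

mean = statistics.mean(scores)      # 78
median = statistics.median(scores)  # 75
mode = statistics.mode(scores)      # 70
print(mean, median, mode)

# Replace the highest score with an extreme value: the mean shifts,
# while the median stays where it was.
with_outlier = [70, 70, 75, 80, 195]
print(statistics.mean(with_outlier))    # 98
print(statistics.median(with_outlier))  # 75
```

The first list follows the mean > median > mode ordering described for positively skewed data, and the second shows why the median is preferred when extreme values are present.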
Measures of variability indicate the degree to which observations vary – the three most commonly used indices of variability are range, variance, and standard deviation.
The range indicates the difference between the largest and smallest observation occurring in the distribution.
The standard deviation (denoted by $\sigma$ for a population and by s for a sample) represents, roughly, the typical distance between observations and the mean. It can only be obtained by first finding the variance (denoted by $\sigma^2$ for a population and $s^2$ for a sample), which represents the average squared difference between observations and the mean.
A deviation score is the difference, or distance, between an individual score and the group’s mean, denoted by x (i.e. $x = X - \mu$).
Because deviation scores always sum to zero, their plain average is uninformative; the standard deviation instead summarizes their typical size as the square root of the average squared deviation score.
The variance can be obtained by the following formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where N = number of observations
The standard deviation is simply the square root of the variance. It is easier to interpret because it is expressed in linear units, rather than squared units and can be obtained by the following formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
where N = number of observations
It should be noted that if we are dealing with a sample, as opposed to a population, then we need to divide by N - 1, and $\mu$ is typically replaced by $\bar{X}$. However, in testing situations we oftentimes have the population of interest.
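The formulas above can be sketched in a few lines of Python. The function names, the `population` flag, and the eight scores are hypothetical, chosen so the arithmetic is easy to check by hand (the mean is 5 and the squared deviations sum to 32).

```python
import math

def variance(scores, population=True):
    """Average squared deviation from the mean.

    Divides by N for a population and by N - 1 for a sample.
    """
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    divisor = n if population else n - 1
    return sum(squared_deviations) / divisor

def standard_deviation(scores, population=True):
    """Square root of the variance, expressed in the original score units."""
    return math.sqrt(variance(scores, population))

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Deviation scores sum to zero, which is why they are squared first.
deviation_scores = [x - sum(scores) / len(scores) for x in scores]
print(sum(deviation_scores))  # 0.0

print(variance(scores))                    # 4.0 (population: 32 / 8)
print(standard_deviation(scores))          # 2.0
print(variance(scores, population=False))  # sample: 32 / 7
```

Python's standard `statistics` module offers `pvariance`, `pstdev`, `variance`, and `stdev` for the same population and sample quantities.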
Z-scores
A person’s score on any psychological measure is not very informative in itself. A score of 90/100 or 100% may mean different things depending on the makeup of the test.
The strengths and weaknesses of a person, school, or district are more clearly illustrated when performance is compared to a larger representative group.
Z-scores transform scores into standardized units that are easier to compare by making use of the mean and standard deviation. A z-score represents the difference between a score and the mean, divided by the standard deviation. In other words:
$$Z = \frac{X - \mu}{\sigma} \quad \text{or} \quad Z = \frac{X - \bar{X}}{S}$$
Z-scores always have a mean of zero and a standard deviation of one.
It is important to be able to understand how percentile ranks correspond to z-scores.
Percentile ranks corresponding to z-scores represent the percentage of scores that fall below an observed z-score.
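The following Python sketch illustrates both ideas. It assumes the scores are normally distributed when converting a z-score to a percentile rank, which the notes do not require in general. The function names, the score list, and the standard deviation of 10 are all hypothetical.

```python
import statistics
from statistics import NormalDist

def z_score(x, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (x - mean) / sd

def percentile_rank_from_z(z):
    """Percentage of scores below z, ASSUMING a normal distribution."""
    return 100 * NormalDist().cdf(z)

# A raw score of 75 on two tests with different means
# (hypothetical standard deviation of 10 on both).
z_hard = z_score(75, 65, 10)  # 1.0
z_easy = z_score(75, 90, 10)  # -1.5
print(z_hard, percentile_rank_from_z(z_hard))
print(z_easy, percentile_rank_from_z(z_easy))

# Converting a whole set of scores gives z-scores with mean 0 and SD 1.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.fmean(scores)
sigma = statistics.pstdev(scores)
z_scores = [z_score(x, mu, sigma) for x in scores]
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

Under the normality assumption, a z-score of 0 corresponds to the 50th percentile rank and a z-score of 1.0 to roughly the 84th.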
There are many other ways to transform raw scores so that they are more meaningful. Z-scores are not widely used when reporting test results. Oftentimes, T-scores (McCall’s T in your book) are reported to avoid reporting negative standard scores. T-scores are simply standard scores with a mean of 50 and a standard deviation of 10. To transform a raw score into a T-score, the raw score must first be converted into a z-score. Then:
T-score = 50 + 10 (z-score)
In reality, z-scores can be converted into any standard score scale, each defined by a mean and a standard deviation using the following formula:
Standard score = New mean + New standard deviation (z-score)
Standard scores (either expressed as z-scores or t-scores, or any other standard score) from different test administrations can be directly compared since the mean and standard deviation for all tests will be the same.
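As a minimal sketch, the general conversion formula and the T-score special case might be written in Python as follows. The function names are hypothetical, and the scale with mean 100 and standard deviation 15 is an arbitrary illustration rather than a scale described in these notes.

```python
def standard_score(z, new_mean, new_sd):
    """Standard score = new mean + new standard deviation * z-score."""
    return new_mean + new_sd * z

def t_score(z):
    """T-score: a standard score with mean 50 and standard deviation 10."""
    return standard_score(z, 50, 10)

print(t_score(0.0))   # 50.0, a score exactly at the mean
print(t_score(1.0))   # 60.0, one standard deviation above the mean
print(t_score(-1.5))  # 35.0, negative z-scores become positive T-scores

# The same z-score on a hypothetical scale with mean 100 and SD 15.
print(standard_score(1.0, 100, 15))  # 115.0
```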
Stanine scores were developed and used by the air force in WWII. These standard scores have a mean of 5 and a standard deviation of approximately 2.0. This scoring system forces all raw scores into a discrete number from 1 – 9. These scores lack precision, which is the price of the convenience of having only nine score categories.
NCE scores are standard scores with a mean of 50 and a standard deviation of approximately 21. These scores are easily confused with percentiles, and therefore their use is not recommended.
Grade equivalency scores are not typically standard scores! These types of scores are computed by simply finding the mean or median scale score for students at different grade levels (e.g. 4, 5, and 6) and interpolating between grade levels by dividing the difference in scores between two consecutive grade levels into 10 equal steps.
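The interpolation step can be sketched in Python as below. The median scale scores are invented for illustration, the function name is hypothetical, and the sketch assumes that medians increase strictly from one grade to the next. It is one plausible reading of the procedure, not a publisher's actual method.

```python
def grade_equivalent(score, grade_medians):
    """Interpolate a grade equivalent, in tenths of a grade, for a scale score.

    grade_medians maps a grade level to the median scale score at that grade.
    Medians are ASSUMED to increase strictly from grade to grade.
    """
    grades = sorted(grade_medians)
    for low, high in zip(grades, grades[1:]):
        low_score, high_score = grade_medians[low], grade_medians[high]
        if low_score <= score <= high_score:
            fraction = (score - low_score) / (high_score - low_score)
            return round(low + (high - low) * fraction, 1)
    raise ValueError("score is outside the range covered by the grade medians")

# Hypothetical median scale scores for grades 4, 5, and 6.
medians = {4: 200, 5: 220, 6: 235}

print(grade_equivalent(210, medians))  # 4.5, halfway between grades 4 and 5
print(grade_equivalent(229, medians))  # 5.6
```

The second call shows why these are not standard scores: a 9-point gain above the grade 5 median is worth six tenths of a grade, while a 10-point gain above the grade 4 median is worth only five tenths, because the gap between consecutive medians differs.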
Norms
Norms refer to information regarding the performance of a particular reference group on a particular measure, against which a person’s score can be compared.
The size of the reference group used for norming is not as important as how representative that group is of the relevant population. Norming groups are selected to correspond to the total population in terms of geographic region, sex, age, urban vs. rural, community size, parents’ occupational and educational level, and ethnicity.
Norms are NOT standards. A norm is not necessarily representative of what is desired, but rather a measure of what is. However, sometimes norms are used to define standards.
The purpose of establishing norms is to determine how a test taker compares with others.
A norm-referenced test compares each person with a norm. A criterion-referenced test describes the specific types of skills, tasks, or knowledge that the test taker has demonstrated. Currently, many large-scale test developers try to report both types of information, especially in the academic achievement domain.