
451 cheat sheet 2

Level of measurement – the relationship among the numbers assigned to information; critical to describing and interpreting psych tests and measurement results. Used at the item, scale, and test-result level.
Nominal (name) scale – numbers assigned to represent groups or categories of information. The numbers serve as labels only; no calculations. Ex: Are you currently happy? 1 = yes, 2 = no. Identifies only.
Categorical data – data grouped according to a common property. Few ways to describe/manipulate the data beyond frequencies.
Ordinal (order) scale – numbers assigned to order or rank on the attribute being measured (shortest to tallest: shortest = 1, tallest = 20). Indicates an individual's or object's value only relative to others in the group. Ex: ranking high schoolers by GPA (class standing). A rank has meaning only within the group being tested; it provides no info about the group as a whole and no info about how closely two individuals or objects are related.
Interval scale – each number represents a point an equal distance from the points adjacent to it (equal gaps). Can perform statistical calculations (mean, SD); allows comparison of one group's performance to another, test norms, standard scores. Ex: temperature.
Ratio scale – has a point representing an absolute absence of the property being measured; allows ratio comparisons ("twice as much"). Ex: bathroom scale.
Raw scores – the basic scores calculated from a psych test. Tell very little about how an individual performed, how they performed in comparison to others, or how one test compares to another; not useful without additional interpretive info.
Frequency distribution – orderly arrangement of a group of numbers (test scores). Shows the actual number (or %) of observations that fall into each range or category; provides a summary and picture of group data.
Histogram – represents frequency data; used when a continuous variable can fall anywhere along the axis. Frequency on the y-axis, values on the x-axis.
Bar graph – bars separated by spaces; used for categorical data.
Normal distribution – theoretical distribution that is perfectly symmetrical (bell curve). Very rare in practice.
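The frequency-distribution idea above can be sketched in a few lines of Python; the scores below are made-up illustrative data, not from the notes:

```python
from collections import Counter

# Hypothetical raw test scores (invented for illustration)
scores = [72, 85, 85, 90, 72, 60, 85, 90, 72, 85]

# Frequency distribution: how many observations fall at each value
freq = Counter(scores)
for value in sorted(freq):
    print(value, freq[value])
# Plotting these counts as touching bars gives a histogram;
# separated bars over categories give a bar graph.
```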
Skewed distribution – the majority of scores fall to one side of the median.
Positively skewed distribution – tail points to the right; more low scores than high. Undesirable for tests.
Negatively skewed distribution – tail points to the left; most people did well and got high scores.
SD determines the height and width of the distribution.
Transformed scores (percentiles) – the % of scores in a distribution that fall at or below a given raw score. A score at the 60th percentile means 60% of individuals in the comparison group scored at or below that score.
Standardized scores – universally understood units in testing that let test users make inferences about a person's performance.
Composite scores – each individual part is scored, then combined into an overall score.
Mean – average score in a distribution/sample. Best measure of central tendency when the distribution is not skewed; most affected by outliers.
Median – middle score in a group of scores.
Mode – most common score in a distribution. Median or mode is better with skewed data.
Outliers – a few values significantly higher or lower than most values.
Range – highest score in the distribution minus the lowest score.
Variance – tells whether individual scores tend to be similar to or differ greatly from the mean; depends on the range of scores.
Standard deviation – square root of the variance; describes the spread of the distribution.
Chi-square – used for categorical data.
Line of best fit – the straight line that best represents the data on a scatter plot.
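Python's statistics module can illustrate the central-tendency and spread definitions above, including why the median is preferred with skewed data or outliers (all numbers are hypothetical):

```python
import statistics

scores = [70, 75, 80, 85, 90]          # symmetric: mean == median == 80
with_outlier = scores + [300]          # one extreme high score

print(statistics.mean(scores), statistics.median(scores))              # 80, 80
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # ~116.67, 82.5
# The mean jumps toward the outlier; the median barely moves,
# which is why median/mode are preferred with skewed data.

# Variance and SD (SD = square root of variance)
print(statistics.pvariance(scores))    # 50
print(statistics.pstdev(scores))       # ~7.07
```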
Correlation – the extent to which 2 or more variables fluctuate together. Positive correlation: the variables increase or decrease together. Negative correlation: one variable increases as the other decreases.
Norms – test scores achieved by an identified group of individuals. Age/grade norms are common; used to determine at what age/grade level an individual is performing.
Percentile rank – a way to rank individuals on a scale from 1 to 100%.
Reliability/precision – consistency of test scores. Concept: X = T + E, where X = actual (observed) score, T = true score, E = error.
Systematic error – happens each time and affects scores the same way every time; of more interest.
Random error – random influences that can affect results; hard to control.
Variance (SD²) – some fluctuation is common. True variance: variation in scores due to the underlying construct being measured; what we want to see. Error variance: variation due to error; need to disentangle natural vs. error variance.
4 sources of error variance:
1. Person – variability in the person not connected with the true level of the trait/construct being measured; can contribute to increased or decreased scores.
2. Test – errors in content sampling (content not relevant to the trait being measured); errors in item construction (tricky, unclear, easily misunderstood items).
3. Test administration – inconsistent administration.
4. Scoring – one scorer is inconsistent, or two scorers don't agree.
Types of reliability estimates:
1. Test-retest – give the same test to the same group twice and correlate. Issues: floor/ceiling effects, carry-over effects, time between testings, assumptions about the trait measured; only works when measuring a stable characteristic (something we should expect to stay stable).
2. Parallel/alternate forms – two different forms of the same test, same day; correlate. Hard to do. Problems: practice effects, fatigue effects, forms may not be exactly equal.
3. Internal consistency – how related the items on a test are to each other (e.g., split the test in half and correlate the halves; fewer questions = lower reliability). Assumes the test doesn't measure multiple constructs. Homogeneity – all items measuring the same thing.
4. Inter-rater/inter-observer – amount of consistency among scorers' judgments. Problems: base rates, training and experience, cultural background of the rater/coder.
Rules of thumb: .8 and above = very reliable; .3 and below = not very reliable; in between = okay.
Enhancing reliability – increase test length, remove inconsistent items, standardize administration and scoring, increase test-taker cooperation.
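The "increase test length" point is usually quantified with the Spearman-Brown prophecy formula; the notes don't name it, but it is the standard psychometric result for how reliability changes when a test is lengthened:

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability after making a test n times longer,
    given its current reliability r (Spearman-Brown prophecy formula)."""
    return n * r / (1 + (n - 1) * r)

# Doubling a test whose current reliability is .60:
print(spearman_brown(0.60, 2))  # ~0.75
```

Diminishing returns apply: each doubling adds less reliability than the last, so at some point adding items stops being worth the extra testing time.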
Validity – accuracy: the test measures what it is supposed to, meaningfully and appropriately. Want as much as possible; reliability doesn't matter if the test isn't valid.
Types of validity:
Face validity – how well the test appears to measure what it's supposed to. Superficial and can be misleading.
Content validity – are the behaviors sampled a representative sample of the attribute assessed? Watch for irrelevant or missing content. Always start by looking at the literature.
Criterion validity – the test predicts performance on a measure of interest. Criterion = another relevant test, behavior, or outcome. Predictor = the assessment tool. Validity coefficient = correlation between the criterion and the predictor. Ex: criterion = job performance measured by number of sales made in 3 months; predictor = score on the assessment tool; validity coefficient = correlation between number of sales and assessment score. Two designs:
Concurrent – get a sample of existing employees, administer the test, compute scores (predictor), record number of sales over 3 months (criterion), correlate.
Predictive – administer the test to all applicants, compute scores (predictor), hire everyone, wait, record number of sales over 3 months (criterion), correlate.
Construct validity – degree to which the test measures the hypothetical construct or trait it claims to measure. Want the construct to correlate with related behaviors and not correlate with unrelated behaviors.
Evidence for construct validity:
1. Predicted age changes – if the construct theoretically changes with age, test scores should change with age.
2. Pre-test/post-test changes – if an intervention changes the construct, test scores should change after the intervention.
3. Variation in distinct groups – if group membership theoretically changes scores, then different groups should show different scores.
4. Convergent validity – test scores should correlate positively with scores on previously validated tests measuring the same or a similar construct.
5. Discriminant validity – test scores should not correlate with scores on previously validated tests measuring other constructs.