Level of measurement- the relationship among the numbers assigned to information; critical to describing and interpreting psych tests and measurement results. Used at the item, scale, and test-result level.
Nominal (name) scale- assigns #s to represent groups or categories of information. The #s serve as labels only; no calculations. Ex: Are you currently happy? 1 = yes, 2 = no. Identifies only. Categorical data- grouped according to a common property; few ways to describe/manipulate the data beyond frequencies.
Ordinal (order) scale- #s assigned to order or rank on the attribute being measured (shortest to tallest: shortest = #1, tallest = #20). Indicates an individual's or object's value based on its relation to others in the group. Ex: ranking high schoolers by GPA, class standing. The #/rank only has meaning within the group being tested; provides no info about the group as a whole, and no info about how closely two individuals or objects are related.
Interval scale- each # represents a point an equal distance from the points adjacent to it (equal gaps/distances). Can perform stat calculations: mean and SD, comparison of one group's performance to another, test norms, standard scores. Ex: temperature.
Ratio scale- has a point that represents an absolute absence of the property being measured (a true zero). Ex: bathroom scale. Allows ratio comparisons (2x as much).
Raw scores- basic scores calculated from a psych test. Tell very little about how an individual performed on the test, performed in comparison to others, or performed on one test vs. another. Not useful without additional interpretive info.
Frequency distribution- orderly arrangement of a group of #s (test scores). Shows the actual # (or %) of observations that fall into each range or category; provides a summary and picture of the group's data. Histograms- represent frequency data in stats; a continuous variable can fall anywhere; frequency on the y-axis, values on the x-axis. Bar graphs- spaces between bars; used for categorical data.
Normal distribution- theoretical distribution that is perfect and symmetrical, with a bell curve. Very rare in practice. SD determines the curve's height & width.
Skewed distribution- the majority of scores fall to one side of the median. Positively skewed- tail points to the right; more low scores than high. Don't want this for tests. Negatively skewed- tail points to the left; most people did well and got high scores.
Transformed scores- e.g., percentiles: the % of scores in a distribution that fall at or below a given raw score. Score at the 60th percentile: 60% of individuals in the comparison group scored at or below that score. (See the percentile/z-score sketch below.)
Standardized scores- universally understood units in testing that allow test users to evaluate and make inferences about a person's performance.
Composite scores- score each individual part, then combine into an overall score.
Mean- average score in a distribution/sample. Best measure of central tendency when the data are not skewed; most affected by outliers. Median- middle score in a group of scores. Mode- most common score in a distribution. Median or mode are better with skewed data. Outliers- a few values significantly higher or lower than most values.
Range- highest score in the distribution minus the lowest score. Variance- tells whether individual scores tend to be similar to or differ greatly from the mean; depends on the range of scores. Standard deviation- square root of the variance; describes the spread of the distribution. (Worked examples below.)
Chi-square- statistic used with categorical data.
Line of best fit- straight line that best represents the data on a scatter plot. Correlations- extent to which 2 or more variables fluctuate together. Positive correlation- extent to which variables increase or decrease together; negative correlation- extent to which one variable increases as the other decreases. (Sketch below.)
Norms- test scores achieved by an identified group of individuals. Age/grade norms are common- determine what age/grade level the individual is performing at.
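A minimal sketch of the central-tendency and variability measures above, using Python's statistics module; the scores are made-up numbers for illustration:

import statistics

scores = [72, 85, 85, 90, 61, 78, 85, 95, 88, 70]  # hypothetical raw test scores

print(statistics.mean(scores))       # mean: average; most affected by outliers
print(statistics.median(scores))     # median: middle score; better when skewed
print(statistics.mode(scores))       # mode: most common score (85 here)
print(max(scores) - min(scores))     # range: highest minus lowest
print(statistics.pvariance(scores))  # variance: average squared deviation from the mean
print(statistics.pstdev(scores))     # SD: square root of the variance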
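Percentile rank and standardized (z) scores fall straight out of their definitions; a sketch with an invented comparison group (the helper names percentile_rank and z_score are just for illustration):

import statistics

group = [72, 85, 85, 90, 61, 78, 85, 95, 88, 70]  # hypothetical comparison group

def percentile_rank(raw, scores):
    # % of scores in the distribution at or below the given raw score
    return 100 * sum(1 for s in scores if s <= raw) / len(scores)

def z_score(raw, scores):
    # standardized score: distance from the mean in SD units
    return (raw - statistics.mean(scores)) / statistics.pstdev(scores)

print(percentile_rank(85, group))  # 70.0 -> 70% scored at or below 85
print(z_score(85, group))          # ~0.41 -> about 0.41 SDs above the mean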
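Pearson correlation and the least-squares line of best fit, sketched with made-up paired data (statistics.correlation and statistics.linear_regression need Python 3.10+):

import statistics

x = [2, 4, 5, 7, 9]       # hypothetical: hours studied
y = [60, 68, 70, 80, 88]  # hypothetical: test scores

r = statistics.correlation(x, y)          # +r: rise together; -r: one rises as the other falls
fit = statistics.linear_regression(x, y)  # line of best fit: y = slope*x + intercept
print(r)                                  # ~0.997: strong positive correlation
print(fit.slope, fit.intercept)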
Percentile rank- a way to rank individuals on a scale from 1 to 100%.
Reliability/precision- consistency of test scores. Concept: X = T + E, where X = actual (observed) score, T = true score, E = error. (Toy simulation below.)
Systematic error- happens each time and affects the score the same way every time; of more interest. Random error- random influences that can affect results; hard to control.
Variance (error)- SD²; some fluctuation in scores is common. True variance- variability due to the underlying construct being measured; scores vary; this is what we want to see. Error variance- needs to be disentangled from natural (true) variance.
4 sources of variance: 1. The person- true level of the trait/construct being measured, plus variability in the person not connected with the trait; can contribute to increases or decreases in scores. 2. The test- content should be relevant to the trait being measured; errors in content sampling (tricky, unclear items); errors in item construction (easily misunderstood items). 3. Test administration- inconsistent administration. 4. Scoring- one scorer is inconsistent, or two scorers don't agree.
Types of reliability estimates: 1. Test-retest- give the same test to the same group twice. Issues- floor/ceiling effects; carryover effects; time between testings; assumptions about the trait measured (only works for stable characteristics; shouldn't be used for something we don't expect to stay stable). 2. Parallel/alternate forms- two different forms of the same test, same day; correlate them. Hard to do. Problems- practice effects, fatigue effects, forms may not be exactly equal. 3. Internal consistency- how related the items on a test are to each other. Errors- splitting the test in half means fewer questions per half, and fewer questions = lower reliability; can't measure multiple constructs at once. Homogeneity- all items measuring the same thing. (Split-half sketch with the Spearman-Brown correction below.) 4. Inter-rater/inter-observer- amount of consistency among scorers' judgments. Problems- base rates, training and experience, cultural background of the rater/coder.
Reliability coefficients: .8 and above = very reliable; .3 and below = not very reliable; in between = okay.
Enhancing reliability- increase test length, remove inconsistent items, standardize administration and scoring, increase test-taker cooperation.
Validity- accuracy: is the test meaningful and appropriate? Want more of it. Reliability doesn't matter if the test isn't valid.
Types of validity: Face validity- how well the test appears to measure what it's supposed to; superficial and can be misleading. Content validity- are the behaviors sampled a representative sample of the attribute assessed? Watch for irrelevant or missing content; always start by looking at the literature. Criterion validity- the test predicts performance on a measure of interest. Criterion- another relevant test, behavior, or outcome. Predictor- the assessment tool. Validity coefficient- the correlation between the criterion and the predictor. Ex: criterion- job performance measured by # of sales made in 3 months; predictor- score on the assessment tool; validity coefficient- correlation between # of sales and assessment score. Two designs: Concurrent- get a sample of existing employees, administer the test, compute scores (predictor), record # of sales in 3 months (criterion), correlate. Predictive- administer the test to all applicants, compute scores (predictor), hire everyone, wait, record # of sales in 3 months (criterion), correlate. (Sketch after this section.)
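A toy simulation of X = T + E, assuming independent, normally distributed true scores and errors: observed variance should come out near true variance + error variance (all parameters invented):

import random
import statistics

random.seed(1)
true_scores = [random.gauss(100, 15) for _ in range(10000)]  # T: construct-driven variance (15^2 = 225)
errors = [random.gauss(0, 5) for _ in range(10000)]          # E: random error (5^2 = 25)
observed = [t + e for t, e in zip(true_scores, errors)]      # X = T + E

# true + error variance ~ 225 + 25 = 250; observed variance should be close to that
print(statistics.pvariance(true_scores))
print(statistics.pvariance(errors))
print(statistics.pvariance(observed))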
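A split-half internal-consistency sketch: correlate odd-item and even-item half scores, then apply the Spearman-Brown correction, the standard fix for the "fewer questions = lower reliability" problem (the item matrix is invented; needs Python 3.10+ for statistics.correlation):

import statistics

# hypothetical item scores: rows = people, columns = 6 test items
items = [
    [4, 5, 4, 5, 3, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 2, 1, 1],
]

odd_half = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6

r_half = statistics.correlation(odd_half, even_half)  # reliability of a half-length test
r_full = (2 * r_half) / (1 + r_half)                  # Spearman-Brown full-length estimate
print(r_half, r_full)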
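The validity coefficient from the sales example is just the predictor-criterion correlation; a concurrent-design sketch with invented numbers:

import statistics

# hypothetical concurrent design: existing employees take the assessment now
test_scores = [55, 70, 62, 80, 45, 90, 68]  # predictor: assessment scores
sales = [12, 18, 15, 22, 9, 25, 16]         # criterion: # of sales in 3 months

validity_coefficient = statistics.correlation(test_scores, sales)
print(validity_coefficient)  # higher r -> test better predicts the criterion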
Construct validity- degree to which the test measures the hypothetical construct or trait it says it is measuring; want the construct to correlate with related behaviors and not correlate with unrelated behaviors. Evidence for construct validity: 1. Predicted age changes- if the construct theoretically changes with age, test scores should change with age. 2. Pre-test/post-test changes- if an intervention changes the construct, test scores should theoretically change after the intervention. 3. Variation in distinct groups- if group membership theoretically changes scores, then different groups should show different scores. 4. Convergent validity- test scores should correlate positively with scores on other, previously validated tests measuring the same/similar construct. 5. Discriminant validity- test scores should not correlate with scores on previously validated tests measuring other constructs. (Correlation-pattern sketch below.)
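Convergent and discriminant evidence reduce to a pattern of correlations: high with validated tests of the same construct, near zero with tests of other constructs. A sketch with invented scores:

import statistics

# hypothetical scores for the same 6 people on three measures
new_anxiety_test = [10, 14, 8, 20, 16, 12]
validated_anxiety_test = [11, 15, 9, 19, 17, 13]  # same construct
validated_math_test = [75, 88, 70, 82, 68, 90]    # different construct

# convergent: expect a high positive r
print(statistics.correlation(new_anxiety_test, validated_anxiety_test))
# discriminant: expect r near zero
print(statistics.correlation(new_anxiety_test, validated_math_test))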