Types of Measures
1. Observational
2. Physiological and neuroscientific
3. Self-report (the majority of social & behavioral science research)

Self-Report Measures
People's replies to written questionnaires or interviews. Can measure:
▪ thoughts (cognitive self-reports)
▪ feelings (affective self-reports)
▪ actions (behavioral self-reports)

Self-reported momentary emotions: the Positive and Negative Affect Schedule (PANAS; Watson, Clark & Tellegen, 1988). Sample items:
▪ Indicate the extent you feel this way right now: enthusiastic. 1 (not at all enthusiastic) to 5 (very enthusiastic)
▪ Indicate the extent you feel this way right now: upset. 1 (not at all upset) to 5 (very upset)

Scales of Measurement
The distinction between scales lies in what the numbers mean for the thing being measured:
1. Nominal scale: the numbers assigned are only labels (e.g., Hot = 1, Warm = 3, Cold = 2).
2. Ordinal scale: a rank ordering (e.g., 1st place through 5th place samples).
3. Interval scale: each number is equidistant from the next, but there is no true zero point (the majority of measures).
4. Ratio scale: each number is equidistant and there is a true zero point.

Type of Scale Determines Statistics and Power

Scale      Statistics                               Power
Nominal    Chi-square                               Low
Ordinal    Rank-order tests                         Moderate
Interval   Parametric tests (F-tests, t-tests)      High
Ratio      Parametric tests and math operations     High

Reliability and Validity
Valid: the measure assesses the construct it is intended to and is not influenced by other factors.
Reliable: the consistency of a measure; does it provide the same result repeatedly?

Reliable but not valid: a dependable measure that does not measure what it should. Example: arm length as a measure of self-esteem.
Valid but not reliable: measures what it should, but not dependably. Example: the stone as a measure of weight in Great Britain.
[Figure: target diagrams illustrating reliability vs. validity; the central dot is the construct we are seeking to measure.]

Types of Reliability
Test-retest reliability: the measure is administered at two points in time to assess consistency. Works best for constructs that do not change over time (e.g., intelligence).
Internal consistency reliability: judgments of the consistency of results across items in the same test administration session.
1. Intercorrelation: Cronbach's α (> .65 is preferred); the formula appears in the sketch below.
2. Split-halves reliability.

Types of Validity
Content validity: does the measure represent the range of possible items it should cover, based on the meaning of the measure?
Predictive validity: the measure predicts criterion measures that are assessed at a later time. Example: does an aptitude assessment predict later success?
Construct validity: does the measure actually tap into the intended construct?

Developing Scale Items
▪ Gather guided spontaneous responses from individuals in the sample population (thought listings, essay questions...).
▪ Face-valid items: develop items that appear to measure your construct.
▪ Pilot test a larger set of items and choose those that are most reliable and valid.
▪ Reverse-coded items indicate whether participants are paying attention (see the scoring sketch below).

Response Formats
Likert scale: To what extent do you agree with the following statement...? (0 to 9, strongly disagree to strongly agree)
Semantic differential: What is your response to (insert person, object, place, issue)? (-5 to +5: good-bad, like-dislike, warm-cold)

Cautions
▪ Check first whether the measure already exists in the literature.
▪ Restriction of range: responses cluster at either the high or low end of the scale (skew).
▪ Can you trust responses? Watch for social desirability, demand characteristics, and satisficing.
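For reference, these are the standard textbook formulas behind the two internal consistency statistics named above; the slides state only the α > .65 rule of thumb, so the formulas are supplied here as general background:

```latex
% Cronbach's alpha for k items, where \sigma^2_{Y_i} is the variance of
% item i and \sigma^2_X is the variance of the total score:
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)

% Spearman-Brown correction for split-halves reliability, where r_{hh}
% is the correlation between the scores on the two halves of the test:
r_{SB} = \frac{2\, r_{hh}}{1 + r_{hh}}
```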
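Below is a minimal sketch of reverse coding and scale scoring, as flagged in the item-writing list above. The item names, the responses, and the 1-to-5 format are hypothetical, invented purely for illustration:

```python
import pandas as pd

# Hypothetical responses on a 1-to-5 agreement scale; item3 is worded in
# the opposite direction from item1 and item2, so it must be reverse coded.
data = pd.DataFrame({
    "item1": [4, 5, 2],
    "item2": [4, 4, 1],
    "item3": [2, 1, 5],  # oppositely worded item
})

SCALE_MIN, SCALE_MAX = 1, 5

# Reverse code: a response of 1 becomes 5, 2 becomes 4, and so on.
data["item3rev"] = (SCALE_MIN + SCALE_MAX) - data["item3"]

# Score the scale as the mean of the consistently keyed items.
data["score"] = data[["item1", "item2", "item3rev"]].mean(axis=1)
print(data)
```

Inattentive responders tend to give the same answer to an item and its reversal, which then shows up as a low or negative correlation between the reversed item and the rest of the scale.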
Tips for New Scales
1. Develop subjective and objective versions of a new scale. Example: a Contact with Blacks scale. Objective: the percentage of your neighborhood growing up. Subjective: "No Blacks" to "A lot of Blacks."
2. Using 5+ items worded similarly greatly increases reliability and the likelihood of success.
3. Human targets are rarely evaluated below the midpoint of the scale, so use more scale points (9 instead of 5).
Most important: if you have a larger study ready and a great idea for a new scale comes up, build something and give it a shot!

Other Types of Measures
▪ Response time measures
▪ Physiological measures
▪ Neuroscience: fMRI and other brain imaging
▪ Indirect measures: projective tests, etc.
▪ Facial and other behavior coding schemes (verbal/nonverbal)
▪ Cognitive measures (memory, perception...)
▪ Task performance: academic, physical...
▪ Game theory: prisoner's dilemma...

Cronbach's α in SPSS
Analyze → Scale → Reliability Analysis. Pull over all scale items, click Statistics, select inter-item correlations, then click OK. Try the Van Camp, Barden & Sloan (2010) data file, Centrality1 through Centrality8, and compare the output to the manuscript. Many other reliability analyses involve correlations (test-retest, split halves) or probabilities (inter-rater reliability). A Python cross-check of this analysis appears at the end of the handout.

Case Processing Summary
                 N       %
Valid           109    86.5
Excluded(a)      17    13.5
Total           126   100.0
a. Listwise deletion based on all variables in the procedure.

Reliability Statistics
Cronbach's Alpha: .706
Cronbach's Alpha Based on Standardized Items: .743
N of Items: 8

Inter-Item Correlation Matrix (c1rev = centrality1rev, etc.)
         c1rev     c2     c3  c4rev     c5     c6     c7  c8rev
c1rev    1.000   .244   .069   .297   .082   .170   .148   .208
c2        .244  1.000   .298   .323   .509   .411   .588   .031
c3        .069   .298  1.000   .206   .398   .337   .398   .042
c4rev     .297   .323   .206  1.000   .213   .160   .350   .284
c5        .082   .509   .398   .213  1.000   .589   .637  -.063
c6        .170   .411   .337   .160   .589  1.000   .475   .075
c7        .148   .588   .398   .350   .637   .475  1.000  -.041
c8rev     .208   .031   .042   .284  -.063   .075  -.041  1.000

Other Validity and Reliability Tools
Factor analysis: determines the factor structure of measures (does your measure assess one construct or multiple constructs? Is your proposed construct coherent?).
Multi-trait multi-method matrix: uses a combination of existing measures and manipulations to establish the convergent and divergent validity of a measure.
Inter-rater reliability: independent judges score participant responses, and the percentage of agreement is assessed to indicate reliability. Used particularly for measures requiring coding (video coding, spontaneous responses...). A sketch of the computation follows.
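Here is a minimal sketch of the percent-agreement computation just described; the two judges' codes are made-up values for ten hypothetical participant responses:

```python
# Hypothetical codes assigned by two independent judges to the same
# ten participant responses.
rater_a = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg", "pos"]
rater_b = ["pos", "neg", "neu", "neu", "pos", "neg", "neu", "pos", "pos", "pos"]

# Percent agreement: the share of responses both judges coded identically.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"Agreement: {percent_agreement:.1f}%")  # 80.0% for these made-up codes
```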
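Finally, as a cross-check on the SPSS steps above, here is a sketch of the same reliability analysis in Python with pandas. The file name is a placeholder, and the column names are assumed to match the inter-item correlation matrix (centrality1rev through centrality8rev); adjust both to the actual data file:

```python
import pandas as pd

# Placeholder loading step; substitute the actual Van Camp, Barden &
# Sloan (2010) data file.
df = pd.read_csv("vancamp_2010.csv")

items = df[["centrality1rev", "centrality2", "centrality3", "centrality4rev",
            "centrality5", "centrality6", "centrality7", "centrality8rev"]]
items = items.dropna()  # mirrors SPSS listwise deletion

# Inter-item correlation matrix (compare to the SPSS output above).
print(items.corr().round(3))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")  # SPSS reports .706 for these items
```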