SOC497/L: SOCIOLOGY RESEARCH METHODS
Elementary Statistics: Choices & Implications
Ellis Godard
SOC497 @ CSUN w/ Ellis Godard
7/20/2015

Sections: Admin | Choices | Levels | Statistics | Measures | Choosing | SPSS

Admin
Clickers…
1. Rock my world
2. Add some value
3. Whatever…
4. Have problems
5. Really, really suck.
[Response chart: 57%, 21%, 14%, 7%, 0%]

Admin
1. How many full chapters of reading in the text were assigned for this lecture?
1. 1
2. 2
3. 3
4. 4
5. None of the above
[Response chart: 38%, 38%, 13%, 13%, 0%]

Last Lab Assigned
- Threats to Internal Validity
  - Some you can plan for, some you can’t (history)
  - Either way, choices in research design have implications for what happens during data collection
This lecture & lab:
- Choices in measurement have implications for what happens (or can happen) during analysis

Outline for Today
- Operationalization Choices
- Evolution, Components
- Levels of Measurement
- Review of Introductory Statistics
- Central Tendency & Dispersion
- Choosing Statistics
- Distributions & Shapes
- Example & Lab

Evolution of Operationalizations
- Kinds
  - Recoding & computing
  - Indices & Scales
  - Select cases
  - Crosstabs and other bivariate analyses
- Timing
  - Ideally, would have spelled out in advance
  - Often, arise during data analysis
- Consequence:
  - Changing the measurement changes the meaning!

Components of Variation (sets)
- Attribute: characteristic or quality of something
  - Single measurement’s value, (as if) for 1 case
  - Examples: young, female, Armenian, queer, wealthy, plumber
- Variable: a logical set of related attributes
  - One of those e.g.’s could apply to many cases
  - Examples: age, gender, ethnicity, sexual orientation, SES, occupation
- Two primary characteristics/requirements:
  - Exhaustiveness: able to classify every observation (just use NOI)
  - Mutual exclusivity: each case fits one and only one value

Levels of Measurement
- Nominal variables:
  - Only exhaustive & mutually exclusive – just names
  - Values cannot be ordered/ranked – apples/oranges
  - Examples: gender, race, religion, department
- Ordinal variables:
  - Also rank-ordered – more/less, higher/lower
  - Ranks are relative, not absolute
  - Difference between 2 values or cases is unclear
  - Range covered by each value may be unclear too
  - Examples: short/medium/tall; <HS/HS/BA/+

2. Which variable has rankable values?
1. Age
2. Gender
3. Race
4. Religion
5. All of the above
[Response chart: 67%, 22%, 11%, 0%, 0%]

Interval Variables
- Distance btwn attributes is measured & uniform
  - Meaningful standard intervals
  - Meaningful spectrum of values
  - Distance between successive values is clear
  - Distance between any two values can be calculated
  - Difference between any two cases can be calculated
- No “true zero”
- Ratio measures originate/end at zero
  - Values can be compared, e.g. Jill has twice as much as Joe
  - Examples: age, Kelvin
  - No other differences for our purposes

3. Which value is from an interval variable?
1. <8 years of education
2. 12-14 years old
3. >5 sexual partners
4. $18K-$20K in income
5. None of the above
[Response chart: 20%, 20%, 20%, 20%, 20%]
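The four levels described above can be read as a ladder: each level keeps what the one below allows and adds one more kind of comparison. The sketch below is only an illustration of that idea, not part of the lecture. The names `Level`, `can_rank`, `can_measure_distance`, `can_compare_ratios`, and `EXAMPLES` are hypothetical, and the Celsius example is an added assumption (the slides give Kelvin as the ratio example).

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered by how many comparisons they support."""
    NOMINAL = 1   # exhaustive, mutually exclusive names only
    ORDINAL = 2   # names plus rank order
    INTERVAL = 3  # rank order plus uniform distances, no true zero
    RATIO = 4     # uniform distances plus a true zero


def can_rank(level):
    """More/less comparisons need at least an ordinal variable."""
    return level >= Level.ORDINAL


def can_measure_distance(level):
    """Differences between values need at least an interval variable."""
    return level >= Level.INTERVAL


def can_compare_ratios(level):
    """'Twice as much' statements need a ratio variable (true zero)."""
    return level >= Level.RATIO


# Hypothetical variables; Celsius is an added example, not from the slides.
EXAMPLES = {
    "religion": Level.NOMINAL,
    "height (short/medium/tall)": Level.ORDINAL,
    "temperature in Celsius": Level.INTERVAL,
    "temperature in Kelvin": Level.RATIO,
}

if __name__ == "__main__":
    for name, level in EXAMPLES.items():
        print(name, level.name,
              "rank:", can_rank(level),
              "distance:", can_measure_distance(level),
              "ratio:", can_compare_ratios(level))
```

Because the levels are ordered, a single comparison such as `level >= Level.INTERVAL` is enough to ask whether a statistic that needs subtraction is allowed.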
Question 2 from 1st Day Quiz
2. Which LOM (level of measurement) is appropriate for…
- College major: nominal
- Socioeconomic status (low, medium, high): ordinal
- Average GPA: interval
- Occupation (plumber, accountant, teacher, etc.): nominal
- Able to compose web pages in HTML (yes or no): nominal (because only 2 values)
- Verbal complexity (on a 100-point continuum): interval

Implications of Level Chosen
- Analysis techniques require min. level(s)
  - Anticipate appropriate conclusions
  - Review statistics first, then pair w/ levels
- Sometimes need >1 level, >1 indicator
  - Computing variables > new variable
  - Indices & Scales – coming lecture
  - Not the same as recoding (values > new values)

Why Do Levels Matter?
Associated w/ Different Statistics, 2 ways…
- Each level is described differently
  - Each requires different univariate procedures
  - Combinations require different bivariate procedures
  - Can compute most frequent (modal) religion
  - Can’t compute “average religion”
- Stat. techniques require a (min.) level
  - Each has a set of assumptions about data
  - Inc. mathematical manipulation of the values
  - Addition, subtraction, multiplication, division
  - Require at least interval level of measurement

Introduction to Statistics
- Meaning
  - Numbers vs. Procedures
- Basic Targets
  - Parameters (about populations) vs. Statistics (about samples)
- Purposes
  - Descriptive statistics – numerically summarize observations
  - Inferential statistics – generalize beyond a sample
- Complexity
  - Univariate analysis – single variables, not relationships, causes, etc.
  - Bivariate analysis – two variables; test relationships, causes, etc.
  - Multivariate analysis – more than two variables

Course Progress
Typically learn statistics in this order:
- Descriptive statistics for univariate data
- Inferential statistics for univariate data
- Descriptions of bivariate relationships
- Inferences from bivariate relationships

Univariate Analysis
- Describing a sample in terms of a single variable
  - Gender is a good example: how many of the respondents were women, as compared to men?
  - How answers are distributed across possible responses
- Including discussion of Distributions
  - What is the shape of the distribution? (…)
  - Where is this distribution centered? (typical value)
  - How spread out is the distribution? (dispersion)

Shapes of Distributions
- Normal – not same as “bell-shaped”
- Skewed – left/right? heavy/slight?
- Oddities
  - U-shaped (polarized), log, etc.
  - Flat/even/uniform
  - Logarithmic
- Warnings
  - May be nothing distinctive
  - Don’t exaggerate – almost certainly not normal!
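The univariate questions above (how answers are distributed, and what shape the distribution has) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lab's SPSS procedure: the function names are hypothetical, the data are made up, skewness is computed as the average cubed z-score, and the 0.5 cutoff for calling a sample "skewed" is an arbitrary choice for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def frequency_table(values):
    """Return {value: (count, percent of cases)}, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100 * c / n) for v, c in sorted(counts.items())}


def skewness(values):
    """Average cubed z-score: 0 if symmetric, > 0 if the long tail is on the right.

    Assumes an interval-level variable with at least two distinct values.
    """
    m = mean(values)
    s = pstdev(values)  # standard deviation with n in the denominator
    return mean(((v - m) / s) ** 3 for v in values)


def describe_shape(values, cutoff=0.5):
    """Label the shape; the cutoff is an arbitrary assumption for illustration."""
    g = skewness(values)
    if g > cutoff:
        return "skewed right"
    if g < -cutoff:
        return "skewed left"
    return "roughly symmetric"


if __name__ == "__main__":
    # Hypothetical sample: number of children reported by nine respondents.
    children = [0, 0, 0, 1, 1, 2, 2, 3, 7]
    for value, (count, percent) in frequency_table(children).items():
        print(f"{value}: {count} cases ({percent:.1f}%)")
    print(describe_shape(children))
```

A label like "roughly symmetric" says nothing about whether the data are normal, which is the warning the slide makes: a histogram is still needed before describing the shape in words.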
Univariate Descriptive Statistics
- Central Tendency (what is typical?)
  - Mode: most frequently occurring value
  - Median: middle value, when sample ordered
  - Mean: arithmetic average, “center of gravity”
- Dispersion (how far are they spread out?)
  - Variation ratio: percent that isn’t the mode
  - Range (Max – Min) & IQR (middle 50%)
  - Variance & Standard Deviation

Measures of Central Tendency
- Mean
  - Formula: simple average (the sum of all the values, divided by the number of values), $\bar{Y} = \frac{\sum Y_i}{n}$
  - Appropriate if: interval (unless heavily skewed)
- Median
  - Formula: if even number of cases, the median case (not value) is the (n/2)th case. Otherwise, it is the [(n+1)/2]th case
  - Appropriate if: interval (if skewed), ordinal (mode too?)
- Mode
  - Formula: highest (relative) frequency
  - Appropriate if: any, but best for nominal (only choice)

Measures of Dispersion
- Variation Ratio
  - Formula: the percent of cases not in the modal category
  - Appropriate if: nominal (because nothing else works)
- Range
  - Formula: simple subtraction of the lowest value (the “minimum”) from the highest (the “maximum”)
  - Appropriate if: ordinal (works best), and interval (esp. if sample range is less than population range)
- Interquartile range (IQR)
  - Formula: same as the range, but only of the middle half of the cases when ordered, i.e. from the 25th percentile to the 75th
  - Appropriate if: interval (also perhaps okay for ordinal, if the sample is not small and the range is not short)

More Measures of Dispersion
- Variance
  - Formula: the average squared difference between each value and the mean, $s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n}$ (for population: $\sigma^2$)
  - Appropriate if: interval
- Standard Deviation
  - Formula: square root of variance, $s = \sqrt{s^2} = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n}}$ (for population: $\sigma$)
  - Appropriate if: interval

Standard Deviations & Standard Errors
- Deviations across sample distributions:
  - About 68% of the values in a normal distribution will fall within 1 standard deviation of the mean, 95% within 1.96, and 99.7% within 3.
- Deviations across sampling distributions:
  - Same idea, same distribution, same %’s
  - But the standard deviation of a sampling distribution is called the standard error
  - Don’t confuse that with sampling error

Criteria for Selection
What else makes a measure of central tendency or dispersion “appropriate”? (in order of importance)
1. Scale of measurement – no medians or means for nominal data; don’t trust means for ordinal data
2. Shape of distribution – e.g. for skewed interval data, use both – the mean will differ from the median in the direction of skew (i.e. higher if skewed right, lower if skewed left)
3. Robustness – here, a statistic is “robust” if it resists sampling deviations. The mean is fairly robust, but the median is less misleading if there are scraggly tails
4. Efficiency – use the highest level of precision available (modes are least precise)
5. When in doubt, use more than one
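The first criterion above (scale of measurement) can be sketched as a small dispatcher that picks a measure of central tendency and a measure of dispersion from the variable's level. This is an illustration only: `describe` and `variation_ratio` are hypothetical names, the data are made up, and ordinal values are assumed to be stored as numeric rank codes. `median_low` is used because it returns the (n/2)th case for an even number of cases, matching the slide's rule, and `pstdev` divides by n as in the slide's formula.

```python
from collections import Counter
from statistics import mean, median_low, pstdev


def variation_ratio(values):
    """Share of cases that are NOT in the modal category."""
    modal_count = Counter(values).most_common(1)[0][1]
    return 1 - modal_count / len(values)


def describe(values, level):
    """Pick central tendency and dispersion by level of measurement."""
    if level == "nominal":
        # With ties, Counter reports the category it encountered first.
        mode = Counter(values).most_common(1)[0][0]
        return {"mode": mode, "variation_ratio": variation_ratio(values)}
    if level == "ordinal":
        # Assumes numeric rank codes, e.g. 1 = low, 2 = medium, 3 = high.
        return {"median": median_low(values), "range": max(values) - min(values)}
    if level == "interval":
        return {"mean": mean(values), "sd": pstdev(values)}
    raise ValueError(f"unknown level of measurement: {level!r}")


if __name__ == "__main__":
    # Hypothetical samples, one per level.
    print(describe(["Catholic", "Catholic", "Protestant", "Jewish", "None"], "nominal"))
    print(describe([1, 2, 2, 3], "ordinal"))
    print(describe([2, 4, 4, 4, 5, 5, 7, 9], "interval"))
```

For a skewed interval variable the slide's second criterion still applies: report the median alongside the mean rather than relying on this default alone.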
4. To measure nominal dispersion use…
1. Mode
2. Median
3. Variance
4. Variation Ratio
5. Standard Deviation
[Response chart: 44%, 22%, 17%, 11%, 6%]

Typical Choices
- Nominal: mode and variation ratio (because there are no alternatives)
- Ordinal: median and range (because you can at least put the values in order)
- Interval: mean and standard deviation (taking advantage of equal increments between values)

Question 1 from 1st Day Quiz
- Interval
  - Central tendency: mean (median if skewed)
  - Dispersion: standard deviation (mode “ “)
- Ordinal
  - Central tendency: median (& mode?)
  - Dispersion: range (IQR?)
- Nominal
  - Central tendency: mode (only!)
  - Dispersion: variation ratio (Index of Qual. Variation)

Questions 3-6 from 1st Day Quiz
3. Which level(s) appropriate for crosstabs?
   - Ord, Nom
4. Which level(s) for t test or F test (ANOVA)?
   - Interval (t: 2 levels of Ord or Nom; F: >2 of O)
5. Which level(s) for regression?
   - Any (though most of you did interval, if that)
6. Formula for z (or t)
   - Math 140!

SPSS Tips for Today’s Lab
Two options for basic stats (used the 2nd)
- ANALYZE – DESCRIPTIVE STATISTICS – DESCRIPTIVES
  - Mean, stdev, a few others – but not all you need!
- ANALYZE – DESCRIPTIVE STATISTICS – FREQUENCIES
  - Stats & choose which ones you want
  - More options, more output, tables, etc.
  - Use this one!!

SPSS Tips for HW5 (Happy etc)
- Chi-square
  - ANALYZE – DESCRIPTIVE STATISTICS – CROSSTABS
  - DV = row; IV = column
  - Stats – Chi-square
- Correlations
  - ANALYZE – CORRELATE – BIVARIATE
- Just get those stats! Don’t worry about crosstab or corr. matrix

Lab Exercise
- Pick a dataset (from 497 or 364 sites)
- Pick 3 variables
  - One for each level of measurement (I, O, N)
  - Do NOT use the “measure” column in SPSS to pick!!
  - Look at values column and/or codebook and/or freq tables
- Submit
  - Printout of frequency tables & histograms
  - Description of the shape of each distribution
  - Report central tendency & dispersion of each
- Report the pieces and tell a story
  - Use the statistics to describe the sample

5. To measure nominal dispersion use…
1. Mode
2. Median
3. Variance
4. Variation Ratio
5. Standard Deviation
[Response chart: 76%, 18%, 6%, 0%, 0%]

Quiz Scores by Clicker Attitude
[Bar chart: Points by Team. Teams: Whatever…; Really, really suck.; Rock my world; Add some value. Points: 2.39, 2, 2, 1.33]
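For the HW5 chi-square tip above (DV in rows, IV in columns), the statistic that SPSS reports can be reproduced by hand from the crosstab. The sketch below is a plain-Python illustration, not SPSS output: `chi_square` is a hypothetical helper, and the counts are invented for the example. It assumes every row and column total is greater than zero.

```python
def chi_square(table):
    """Pearson chi-square for a crosstab given as a list of rows of counts.

    Returns (statistic, degrees of freedom). Expected counts come from
    row total * column total / n, the usual independence assumption.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


if __name__ == "__main__":
    # Hypothetical counts. Rows = DV (happy, not happy); columns = IV (group A, group B).
    crosstab = [
        [10, 20],
        [20, 10],
    ]
    statistic, df = chi_square(crosstab)
    print(f"chi-square = {statistic:.2f}, df = {df}")
```

With these invented counts every expected cell is 15, so the statistic is 20/3 (about 6.67) on 1 degree of freedom, which exceeds the conventional 0.05 critical value of 3.84.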