EDF 802
Dr. Jeffrey Oescher
Topic 1 - Statistical Inferences
Revised - 23 January 2014

I. SPSS data
   A. Variables
      1. Researchers loosely call the constructs being studied variables
      2. Examples of variables are gender, educational level, achievement, anxiety, attitudes toward school, etc.
      3. Variables are described as categorical or continuous (see below), independent or dependent, predictor or criterion
         a) Independent variables (IVs) are those being manipulated by the researcher that influence other variables; IVs are usually categorical
         b) Dependent variables (DVs) are those on which the effect of the manipulation of the IV is observed; DVs are usually continuous
         c) Predictor variables are those from which a prediction is made, while a criterion variable is that which is being predicted
      4. Variables are the focus of most statistical analyses
      5. Variables are represented on the horizontal axis (the columns) in an SPSS data set
   B. Observations
      1. The members of the samples being studied
      2. Also known as subjects or cases
      3. Represented on the vertical axis (the rows) in an SPSS data set
   C. SPSS
      1. Differences between the DATA VIEW and the VARIABLE VIEW in a saved SPSS data file
      2. Creating data sets
         a) Identifying variables and coding them - horizontal axis
         b) Identifying subjects - vertical axis

Table 1
Example Data

Subject   Pretest   Posttest   Sex   Ethnicity
   1        81         92       1        1
   2        75         94       2        2
   3        79         89       1        3
   4        85         95       2        1
   5        83         93       1        2
   6        78         89       2        3
   7        86         91       1        1
   8        74         88       2        2
   9        72         86       1        3
  10        70         84       2

      3. Cleaning data
         a) FREQUENCIES for categorical data (e.g., sex, ethnicity)
         b) DESCRIPTIVES for continuous data (e.g., pretest, posttest)
         c) Correcting mistakes
            (1) Obvious problems
            (2) Not-so-obvious problems
            (3) Outliers
            (4) Substituting computed values (e.g., the mean of the non-missing items)
      4. Managing data sets, variables, and analyses of data
         a) Data sets
            (1) Operating on the entire data set
            (2) The DATA pull down menu
            (3) Examples
                (a) Select specific cases from within the entire data set
                (b) Sort the data on a specific variable or combination of variables
                (c) Using the D1.SAV data set, how might you limit this data set to include only those subjects in Group 1?
         b) Variables
            (1) Operating horizontally on each row
            (2) The TRANSFORM pull down menu
            (3) Examples
                (a) Computing a new variable like a total score across items
                (b) Counting missing values within an attitudinal scale
                (c) Again looking at D1.SAV, how might you compute the average on the five items?
         c) Analyzing data
            (1) Operating vertically on each variable across all subjects
            (2) The ANALYZE pull down menu
            (3) Examples
                (a) Compute the mean and other descriptive statistics for a given variable
                (b) Run an inferential analysis comparing groups on a dependent variable
                (c) Again looking at D1.SAV, how might you compute the mean of ITEM1 across all subjects?

II. Descriptive statistics
   A. Succinct summaries of numerical data
      1. Central tendency - What is the “middle”?
      2. Variation - How varied are the scores?
      3. Relationships - How do variables relate to one another?
   B. Types of variables
      1. Categorical
      2. Continuous
   C. Types of descriptive statistics
      1. Central tendency: mean, median, mode
      2. Variation: range, variance, standard deviation
      3. Relationship: correlations, regression
      4. Frequency: frequency of occurrence, proportions
   D. Interpreting descriptive statistics
      1. Categorical variables
         a) Frequencies and percentages
         b) Typically reported and interpreted in the narrative
         c) Using the Example Data found on the first page of this handout (EXAMPLE.SAV), how would you compute the frequency data on SEX and ETHNICITY and how would you summarize it in narrative form?
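The handout expects these summaries to come from the SPSS FREQUENCIES and DESCRIPTIVES procedures. For readers who want to check the arithmetic outside SPSS, the following is a minimal Python sketch using the Table 1 (Example Data) values. It makes two assumptions: the numeric codes for Sex and Ethnicity are reported as-is because the handout does not label them, and the one Ethnicity value that does not appear in the handout is treated as missing. The helper name `frequency_table` is hypothetical.

```python
from collections import Counter
import statistics

# Values transcribed from Table 1 (Example Data).
pretest = [81, 75, 79, 85, 83, 78, 86, 74, 72, 70]
posttest = [92, 94, 89, 95, 93, 89, 91, 88, 86, 84]
sex = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
# Only nine Ethnicity values appear in the handout; None marks the value
# that is assumed to be missing.
ethnicity = [1, 2, 3, 1, 2, 3, 1, 2, 3, None]


def frequency_table(values):
    """Return {code: (count, percent of non-missing cases)}."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return {code: (n, round(100 * n / len(valid), 1))
            for code, n in sorted(counts.items())}


# Categorical variables: counts and percentages.
sex_freq = frequency_table(sex)
ethnicity_freq = frequency_table(ethnicity)

# Continuous variables: mean and sample standard deviation (n - 1 divisor).
pretest_mean = statistics.mean(pretest)
pretest_sd = statistics.stdev(pretest)
posttest_mean = statistics.mean(posttest)
posttest_sd = statistics.stdev(posttest)

for name, table in (("SEX", sex_freq), ("ETHNICITY", ethnicity_freq)):
    print(name)
    for code, (n, pct) in table.items():
        print(f"  code {code}: n = {n} ({pct}%)")

print(f"PRETEST  mean = {pretest_mean:.2f}, SD = {pretest_sd:.2f}")
print(f"POSTTEST mean = {posttest_mean:.2f}, SD = {posttest_sd:.2f}")
```

A narrative summary built from these counts would note that the ten subjects are evenly split across the two Sex codes, with 5 (50%) in each, and would report the Pretest and Posttest means against the 100-point maximum.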
An examination of the data indicates the total sample consisted of 30 students. Each group had 15 (50%) students participating. The sample was almost evenly split on gender with 16 (53%) males and 14 (47%) females. Students’ ages ranged from 7 to 10. Slightly less than one-half of the students were nine-year-olds. Eight-year-old and ten-year-old students each accounted for less than one-fourth of the sample, while seven-year-olds represented a very small proportion of the total sample.

      2. Continuous variables
         a) Means, standard deviations, and correlations
         b) Typically reported in tables and interpreted in narrative form
            (1) Norm referenced interpretations compare a statistic to the scores of others
                (a) John’s score was in the 90th percentile (i.e., John performed better than 90% of the other students.)
                (b) The subjects in Group 1 had an average score that was statistically significantly higher than the average score of the students in Group 2.
            (2) Criterion referenced interpretations compare a statistic to the underlying continuum of the variable being measured
                (a) John’s score indicates he has mastered 90% of the objectives for the unit.
                (b) Generally speaking, the first grade students in the study can add and subtract single digit numbers but cannot multiply or divide them.
         c) Using the Example Data found on the first page of this handout (EXAMPLE.SAV), how would you compute the mean of the PRETEST and POSTTEST and how would you interpret them if they are both tests of 100 points?

Table 2
Descriptive Statistics for Age, Attitudinal and Cognitive Measures for the Total Sample

Variable     N     Mean      SD
Age         30     8.77     0.90
Attitude    30     3.87     0.51
Exam 1      30    51.67    10.17
Exam 2      30    57.87    10.22

An examination of Table 2 indicates the average age for students in the sample was just under nine years. Scores on the Attitude Subscale indicated students had relatively positive attitudes based on an underlying five point scale. Students on average answered correctly about two-thirds of the items on Exam 1 regardless of the group. On Exam 2, students answered correctly approximately three-fourths of the items. Variation in the scores across all four variables appears small, indicating a relatively homogeneous sample.

   E. APA format for tables
      1. Only three horizontal lines
      2. No vertical lines
      3. Data in cells is centered horizontally
      4. Creating tables in Word

III. Inferential statistics
   A. Populations, samples, and statistical inferences
      1. Populations and parameters
      2. Samples and statistics
      3. Parameters and statistics - See Table 5.1, p. 95 in Huck

Table 3
Notation for Common Statistics and Parameters

Statistical Focus      Statistic     Parameter
Mean                   $\bar{X}$     $\mu$
Variance               $s^2$         $\sigma^2$
Standard deviation     $s$           $\sigma$
Proportion             $p$           $P$
Sample size            $n$           $N$

   B. Sampling subjects
      1. Probability samples
         a) Simple random
         b) Stratified random
         c) Systematic
         d) Cluster
      2. Non-probability samples
         a) Purposive
         b) Maximum variation
         c) Reputation
         d) Typical case
         e) Extreme case
         f) Convenience (pre-existing groups)
         g) Snowball
      3. Generating samples
         a) SPSS - DATA SELECT CASES
         b) Table of random numbers
      4. The need to generalize from sample statistics to population parameters
         a) Sampling error
         b) Probability models

IV. Hypothesis testing
   A. Sampling distributions - the foundation of hypothesis testing
      1. A distribution (i.e., frequency distribution) of sample statistics
      2. Different sampling distributions
         a) Sampling distribution of the mean
            (1) $\bar{X}$
            (2) t-distribution
         b) Sampling distribution of the difference between two means
            (1) $(\bar{X}_1 - \bar{X}_2)$
            (2) t-distribution
         c) Sampling distribution of the ratio of two variances
            (1) $MS_b / MS_w$
            (2) F-distribution
         d) Sampling distribution of the difference between two proportions
            (1) $p_1 - p_2$
            (2) Chi-square distribution
      3. Characteristics of sampling distributions
         a) Central tendency
            (1) What is the “middle” or “typical” statistic?
            (2) The parameter being examined
         b) Variation
            (1) How do the statistics vary within the sampling distribution?
            (2) Sampling error
                (a) The difference between the sample statistic and the population parameter
                (b) The value of the parameter is not known
                (c) The “standard deviation” of a sampling distribution is known as a “standard error”
                (d) Sample data is used to estimate a sampling error
   B. Statistical inferential tests
      1. Statistical hypotheses about parameters
         a) Null - the assumption of no difference or no relationship
            (1) Notation - $H_0$
            (2) No difference between two means - $H_0: \mu_1 - \mu_2 = 0$
            (3) No relationship - $H_0: \rho = 0$
         b) Alternative - the existence of differences or relationships
            (1) Notation - $H_1$
            (2) A difference exists between two means - $H_1: \mu_1 - \mu_2 \neq 0$
            (3) A relationship exists between variables - $H_1: \rho \neq 0$
      2. The comparison of the observed statistic to the hypothesized parameter in standardized terms
         a) For most test statistics the general formula is as follows
            $\text{Test statistic} = \dfrac{\text{Statistic} - \text{Parameter}}{\text{Standard error of the statistic}}$
         b) One sample comparison of the mean
            $t = \dfrac{\bar{X} - \mu}{s_{\bar{X}}}$
         c) Comparison of two means
            $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{(\bar{X}_1 - \bar{X}_2)}}$
         d) A specific relationship between two variables
            $z = \dfrac{z_r - z_\rho}{s_{z_r}}$
         e) The comparison of two variances
            $F = \dfrac{MS_b}{MS_w}$
         f) One sample goodness of fit (i.e., observed and expected proportions)
            $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
         g) See the attached handout for specific hypotheses and the tests associated with them
   C. Six steps for testing the null hypothesis
      1. State $H_0$ and $H_1$
      2. Set alpha ($\alpha$) level
         a) Type I error
         b) Type II error
      3. Assume $H_0$ is true and generate a sampling distribution of the appropriate test statistic
      4. Calculate the observed test statistic from the sample data
      5. Map the observed test statistic into the sampling distribution of the test statistic
      6. Ascertain if the observed test statistic is typical (i.e., accept $H_0$) or atypical (i.e., reject $H_0$) of the values of the test statistics in the sampling distribution
   D. Specific examples
      1. EX1 - Exam 2 compared for Groups 1 and 2
      2. EX1 - Exam 1 compared to a score of 55 for the entire sample
      3. EX1 - Exam 1 correlated with Exam 2 for the entire sample
   E. Issues of importance
      1. Knowledge of what is being tested and why
      2. SPSS programming
      3. Lack of statistical theory and formulas
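The six steps for testing the null hypothesis can be traced by hand with the one sample comparison of the mean, where t is the difference between the sample mean and the hypothesized mean divided by the standard error of the mean. The sketch below is only an illustration under stated assumptions: the EX1 data set is not reproduced in this handout, so it uses the Pretest scores from the Example Data instead; the hypothesized mean of 75 is a hypothetical value chosen for the example; and the critical value 2.262 is the tabled two-tailed t for 9 degrees of freedom at alpha = .05, entered by hand rather than computed.

```python
import math
import statistics

# Pretest scores transcribed from the Example Data table.
pretest = [81, 75, 79, 85, 83, 78, 86, 74, 72, 70]

# Step 1: state the hypotheses. H0: mu = MU_0 versus H1: mu != MU_0.
# MU_0 is a hypothetical value used only for illustration.
MU_0 = 75.0

# Step 2: set the alpha level (two-tailed test).
ALPHA = 0.05

# Step 3: assume H0 is true. The test statistic then follows a
# t-distribution with n - 1 degrees of freedom. The critical value is
# taken from a printed t table for df = 9 and alpha = .05 (two-tailed).
n = len(pretest)
df = n - 1
T_CRITICAL = 2.262

# Step 4: calculate the observed test statistic from the sample data.
sample_mean = statistics.mean(pretest)
standard_error = statistics.stdev(pretest) / math.sqrt(n)
t_observed = (sample_mean - MU_0) / standard_error

# Steps 5 and 6: locate the observed t in the sampling distribution and
# decide whether it is typical or atypical of values expected under H0.
reject_h0 = abs(t_observed) > T_CRITICAL

print(f"mean = {sample_mean:.2f}, SE = {standard_error:.3f}, df = {df}")
print(f"t = {t_observed:.3f}, critical t = {T_CRITICAL}")
print("atypical: reject H0" if reject_h0 else "typical: do not reject H0")
```

In SPSS the same comparison would be run from the ANALYZE pull down menu, which also reports an exact probability in place of a tabled critical value.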