Statistical Handout

EDF 802
Dr. Jeffrey Oescher
Topic 1 - Statistical Inferences
Revised - 23 January 2014
I.
SPSS data
A.
Variables
1.
Researchers loosely call the constructs being studied variables
2.
Examples of variables are gender, educational level, achievement, anxiety,
attitudes toward school, etc.
3.
Variables are described as categorical or continuous (see below), independent or
dependent, predictor or criterion
a)
Independent variables (IV) are those manipulated by the researcher
that influence other variables; IVs are usually categorical
b)
Dependent variables (DV) are those on which the effect of the
manipulation of the IV is observed; DVs are usually continuous
c)
Predictor variables are those from which a prediction is made, while a
criterion variable is that which is being predicted
4.
Variables are the focus of most statistical analyses
5.
Variables are represented on the horizontal axis in an SPSS data set
B.
Observations
1.
The members of the samples being studied
2.
Also known as subjects or cases
3.
Represented on the vertical axis in an SPSS data set
C.
SPSS
1.
Differences between the DATA VIEW and the VARIABLE VIEW in a saved SPSS
data file
2.
Creating data sets
a)
Identifying variables and coding them – horizontal axis
b)
Identifying subjects – vertical axis
Table 1
Example Data
Subject   Pretest   Posttest   Sex   Ethnicity
1         81        92         1     1
2         75        94         2     2
3         79        89         1     3
4         85        95         2     1
5         83        93         1     2
6         78        89         2     3
7         86        91         1     1
8         74        88         2     2
9         72        86         1     3
10        70        84         2
3.
Cleaning data
a)
FREQUENCIES for categorical data (e.g., sex, ethnicity)
b)
DESCRIPTIVES for continuous data (e.g., pretest, posttest)
c)
Correcting mistakes
(1)
Obvious problems
(2)
Not-so-obvious problems
(3)
Outliers
(4)
Substituting computed values (e.g., the mean of the non-missing
items)
4.
Managing data sets, variables, and analyses of data
a)
Data sets
(1)
Operating on the entire data set
(2)
The DATA pull down menu
(3)
Examples
(a)
Select specific cases from within the entire data set
(b)
Sort the data on a specific variable or combination of
variables
(c)
Using the D1.SAV data set, how might you limit this data
set to include only those subjects in Group 1?
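The two DATA menu operations named above (selecting cases and sorting) can be sketched outside SPSS. This is only an illustration in Python: the contents of D1.SAV are not shown in this handout, so the rows below and the variable names ID, GROUP, and ITEM1 are hypothetical stand-ins.

```python
# Hypothetical stand-in for D1.SAV: each dict is one observation (row),
# each key is one variable (column). All values are invented.
cases = [
    {"ID": 1, "GROUP": 1, "ITEM1": 4},
    {"ID": 2, "GROUP": 2, "ITEM1": 2},
    {"ID": 3, "GROUP": 1, "ITEM1": 5},
    {"ID": 4, "GROUP": 2, "ITEM1": 3},
    {"ID": 5, "GROUP": 1, "ITEM1": 1},
]

# Select cases: keep only the subjects in Group 1.
group1 = [case for case in cases if case["GROUP"] == 1]

# Sort cases: order by GROUP, then by ITEM1 within each group.
sorted_cases = sorted(cases, key=lambda case: (case["GROUP"], case["ITEM1"]))

group1_ids = [case["ID"] for case in group1]
sorted_ids = [case["ID"] for case in sorted_cases]
print(group1_ids)  # [1, 3, 5]
print(sorted_ids)  # [5, 1, 3, 2, 4]
```

Both operations act on whole rows, which is why they belong with the data set rather than with any single variable.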
b)
Variables
(1)
Operating horizontally on each row
(2)
The TRANSFORM pull down menu
(3)
Examples
(a)
Computing a new variable like a total score across items
(b)
Counting missing values within an attitudinal scale
(c)
Again looking at D1.SAV, how might you compute the
average on the five items?
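Operating "horizontally" means each new value is computed within a single row. A minimal Python sketch of the two examples above follows; the item names ITEM1 to ITEM5 and the scores are assumptions, since the actual D1.SAV variables are not listed here, and `None` is used to mark a missing response.

```python
from statistics import mean

# Hypothetical item names; the real D1.SAV names may differ.
ITEMS = ["ITEM1", "ITEM2", "ITEM3", "ITEM4", "ITEM5"]

# Invented rows. None marks a missing response.
cases = [
    {"ID": 1, "ITEM1": 4, "ITEM2": 5, "ITEM3": 3, "ITEM4": 4, "ITEM5": 4},
    {"ID": 2, "ITEM1": 2, "ITEM2": None, "ITEM3": 3, "ITEM4": 1, "ITEM5": 2},
]

for case in cases:
    # Work across the columns of this one row only.
    answered = [case[item] for item in ITEMS if case[item] is not None]
    case["N_MISSING"] = len(ITEMS) - len(answered)
    # Average of the non-missing items; None if nothing was answered.
    case["ITEM_MEAN"] = mean(answered) if answered else None

for case in cases:
    print(case["ID"], case["N_MISSING"], case["ITEM_MEAN"])
```

Averaging only the non-missing items is one possible choice; requiring all five items to be present before computing a score would be another.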
c)
Analyzing data
(1)
Operating vertically on each variable across all subjects
(2)
The ANALYZE pull down menu
(3)
Examples
(a)
Compute the mean and other descriptive statistics for a
given variable
(b)
Run an inferential analysis comparing groups on a
dependent variable
(c)
Again looking at D1.SAV, how might you compute the
mean of ITEM1 across all subjects?
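Operating "vertically" means one variable is summarized down all rows. The sketch below, again with invented data standing in for D1.SAV, computes the mean and standard deviation of a hypothetical ITEM1 and then the group means that an inferential comparison would start from. It stops at description and does not run the inferential test itself.

```python
from statistics import mean, stdev

# Invented rows standing in for D1.SAV.
cases = [
    {"GROUP": 1, "ITEM1": 4},
    {"GROUP": 2, "ITEM1": 2},
    {"GROUP": 1, "ITEM1": 5},
    {"GROUP": 2, "ITEM1": 3},
    {"GROUP": 1, "ITEM1": 3},
    {"GROUP": 2, "ITEM1": 1},
]

# Pull one column out of every row, then summarize it.
item1 = [case["ITEM1"] for case in cases]
item1_mean = mean(item1)
item1_sd = stdev(item1)  # sample standard deviation (n - 1 in the denominator)

# Descriptive comparison: mean of ITEM1 within each group.
scores_by_group = {}
for case in cases:
    scores_by_group.setdefault(case["GROUP"], []).append(case["ITEM1"])
group_means = {group: mean(scores) for group, scores in scores_by_group.items()}

print(item1_mean, round(item1_sd, 2))
print(group_means)
```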
II.
Descriptive statistics
A.
Succinct summaries of numerical data
1.
Central tendency - What is the “middle”?
2.
Variation – How varied are the scores?
3.
Relationships – How do variables relate to one another?
B.
Types of variables
1.
Categorical
2.
Continuous
C.
Types of descriptive statistics
1.
Central tendency: mean, median, mode
2.
Variation: range, variance, standard deviation
3.
Relationship: correlations, regression
4.
Frequency: frequency of occurrence, proportions
D.
Interpreting descriptive statistics
1.
Categorical variables
a)
Frequencies and percentages
b)
Typically reported and interpreted in the narrative
c)
Using the Example Data found on the first page of this handout
(EXAMPLE.SAV), how would you compute the frequency data on SEX
and ETHNICITY and how would you summarize it in narrative form?
An examination of the data indicates the total sample consisted of 30 students. Each group had
15 (50%) students participating. The sample was almost evenly split on gender with 16 (53%)
males and 14 (47%) females. Students’ ages ranged from 7 to 10. Slightly less than one-half of
the students were nine-year-olds. Eight-year-old and ten-year-old students each accounted for less
than one-fourth of the sample, while seven-year-olds represented a very small proportion of the total
sample.
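For the Example Data question, the counts and percentages that a FREQUENCIES run reports can be sketched in Python as below. The SEX and ETHNICITY codes are copied from the Example Data table; only nine ETHNICITY codes appear in this copy of the handout, so that variable is tallied on nine cases. The handout does not say which code stands for which category, so the output is labeled by code only. The function name `frequency_table` is illustrative.

```python
from collections import Counter

# Codes as listed in the Example Data table.
sex = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
ethnicity = [1, 2, 3, 1, 2, 3, 1, 2, 3]  # only nine codes are listed


def frequency_table(values):
    """Return {code: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {
        code: (n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    }


sex_table = frequency_table(sex)
ethnicity_table = frequency_table(ethnicity)

for name, table in (("SEX", sex_table), ("ETHNICITY", ethnicity_table)):
    print(name)
    for code, (n, pct) in table.items():
        print(f"  code {code}: n = {n} ({pct}%)")
```

A narrative summary would then read along the lines of "the sample was evenly split on sex, with 5 (50%) students in each category."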
2.
Continuous variables
a)
Means, standard deviations, and correlations
b)
Typically reported in tables and interpreted in narrative form
(1)
Norm referenced interpretations compare a statistic to the scores
of others
(a)
John’s score was in the 90th percentile (i.e., John
performed better than 90% of the other students.)
(b)
The subjects in Group 1 had an average score that was
statistically significantly higher than the average score of
the students in Group 2.
(2)
Criterion referenced interpretations compare a statistic to the
underlying continuum of the variable being measured
(a)
John’s score indicates he has mastered 90% of the
objectives for the unit.
(b)
Generally speaking the first grade students in the study
can add and subtract single digit numbers but cannot
multiply or divide them.
c)
Using the Example Data found on the first page of this handout
(EXAMPLE.SAV), how would you compute the mean of the PRETEST
and POSTTEST and how would you interpret them if they are both tests
of 100 points?
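As a rough check on the question above, the PRETEST and POSTTEST means can be computed from the ten Example Data cases in a few lines of Python. The scores are copied from the Example Data table; the variable names are illustrative, and the 100-point maximum is the assumption stated in the question.

```python
from statistics import mean, stdev

# Scores for Subjects 1-10, copied from the Example Data table.
pretest = [81, 75, 79, 85, 83, 78, 86, 74, 72, 70]
posttest = [92, 94, 89, 95, 93, 89, 91, 88, 86, 84]

MAX_POINTS = 100  # both tests are assumed to be out of 100 points

pretest_mean = mean(pretest)
posttest_mean = mean(posttest)

# Criterion referenced reading: average share of the available points earned.
pretest_pct = 100 * pretest_mean / MAX_POINTS
posttest_pct = 100 * posttest_mean / MAX_POINTS

print(f"Pretest:  M = {pretest_mean:.1f}, SD = {stdev(pretest):.2f}")
print(f"Posttest: M = {posttest_mean:.1f}, SD = {stdev(posttest):.2f}")
```

With 100-point tests the means read directly as percentages: students earned on average about 78% of the points on the pretest and about 90% on the posttest.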
Table 2
Descriptive Statistics for Age, Attitudinal and Cognitive Measures for the Total Sample
Variable    N     Mean     SD
Age         30    8.77     0.90
Attitude    30    3.87     0.51
Exam 1      30    51.67    10.17
Exam 2      30    57.87    10.22
An examination of Table 2 indicates the average age for students in the sample was just under
nine years. Scores on the Attitude Subscale indicated students had relatively positive attitudes
based on an underlying five-point scale. Students on average answered correctly about two-thirds
of the items on Exam 1 regardless of the group. On Exam 2, students answered correctly
approximately three-fourths of the items. Variation in the scores across all four variables appears
small, indicating a relatively homogeneous sample.
E.
APA format for tables
1.
Only three horizontal lines
2.
No vertical lines
3.
Data in cells is centered horizontally
4.
Creating tables in Word
III.
Inferential statistics
A.
Populations, samples, and statistical inferences
1.
Populations and parameters
2.
Samples and statistics
3.
Parameters and statistics - See Table 5.1, p. 95 in Huck
Table 3
Notation for Common Statistics and Parameters
Statistical Focus     Statistic   Parameter
Mean                  X̄           µ
Variance              s²          σ²
Standard deviation    s           σ
Proportion            p           P
Sample size           n           N
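The distinction between parameters and statistics can be made concrete with a small simulation. The sketch below is only an illustration: it assumes a hypothetical population of 1,000 normally distributed scores (mean 50, SD 10), draws one simple random sample of 30, and compares the sample statistics with the population parameters. The seed and all variable names are arbitrary choices.

```python
import random
from statistics import mean, pvariance, variance

random.seed(802)  # arbitrary seed so the run is repeatable

# Hypothetical population of N = 1000 scores.
population = [random.gauss(50, 10) for _ in range(1000)]
mu = mean(population)            # parameter: population mean
sigma2 = pvariance(population)   # parameter: population variance (divides by N)

# Simple random sample of n = 30 drawn without replacement.
sample = random.sample(population, 30)
x_bar = mean(sample)             # statistic: sample mean
s2 = variance(sample)            # statistic: sample variance (divides by n - 1)

# Sampling error: how far this sample's statistic falls from the parameter.
sampling_error = x_bar - mu

print(f"mu = {mu:.2f}, X-bar = {x_bar:.2f}, sampling error = {sampling_error:.2f}")
print(f"sigma^2 = {sigma2:.2f}, s^2 = {s2:.2f}")
```

In practice the parameter is unknown, so the sampling error cannot be computed directly; it is only available here because the whole population was simulated.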
B.
Sampling subjects
1.
Probability samples
a)
Simple random
b)
Stratified random
c)
Systematic
d)
Cluster
2.
Non-probability samples
a)
Purposive
b)
Maximum variation
c)
Reputation
d)
Typical case
e)
Extreme case
f)
Convenience (pre-existing groups)
g)
Snowball
3.
Generating samples
a)
SPSS - DATA SELECT CASES
b)
Table of random numbers
4.
The need to generalize from sample statistics to population parameters
a)
Sampling error
b)
Probability models
IV.
Hypothesis testing
A.
Sampling distributions - the foundation of hypothesis testing
1.
A distribution (i.e., frequency distribution) of sample statistics
2.
Different sampling distributions
a)
Sampling distribution of the mean
(1)
$\bar{X}$
(2)
t-distribution
b)
Sampling distribution of the difference between two means
(1)
$(\bar{X}_1 - \bar{X}_2)$
(2)
t-distribution
c)
Sampling distribution of the difference between two proportions
(1)
$P_1 - P_2$
(2)
Chi-square distribution
d)
Sampling distribution of the ratio of two variances
(1)
$MS_b / MS_w$
(2)
F-distribution
3.
Characteristics of sampling distributions
a)
Central tendency
(1)
What is the “middle” or “typical” statistic?
(2)
The parameter being examined
b)
Variation
(1)
How do the statistics vary within the sampling distribution?
(2)
Sampling error
(a)
The difference between the sample statistic and the
population parameter
(b)
The value of the parameter is not known
(c)
The “standard deviation” of a sampling distribution is
known as a “standard error”
(d)
Sample data is used to estimate a sampling error
B.
Statistical inferential tests
1.
Statistical hypotheses about parameters
a)
Null – the assumption of no difference or no relationship
(1)
Notation – H0
(2)
No difference between two means – H0: µ1 − µ2 = 0
(3)
No relationship – H0: ρ = 0
b)
Alternative – the existence of differences or relationships
(1)
Notation – H1
(2)
A difference exists between two means – H1: µ1 − µ2 ≠ 0
(3)
A relationship exists between variables – H1: ρ ≠ 0
2.
The comparison of the observed statistic to the hypothesized parameter in
standardized terms
a)
For most test statistics the general formula is as follows
$\text{Test Statistic} = \dfrac{\text{Statistic} - \text{Parameter}}{\text{Standard Error of the Statistic}}$
b)
One sample comparison of the mean
$t = \dfrac{\bar{X} - \mu}{S_{\bar{X}}}$
c)
Comparison of two means
$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{(\bar{X}_1 - \bar{X}_2)}}$
d)
A specific relationship between two variables
$z = \dfrac{z_r - z_\rho}{S_{z_r}}$
e)
The comparison of two variances
$F = \dfrac{MS_b}{MS_w}$
f)
One sample goodness of fit (i.e., observed and expected proportions)
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
g)
See the attached handout for specific hypotheses and the tests
associated with them
C.
Six steps for testing the null hypothesis
1.
State H0 and H1
2.
Set alpha (α) level
a)
Type I error
b)
Type II error
3.
Assume H0 is true and generate a sampling distribution of the appropriate
test statistic
4.
Calculate the observed test statistic from the sample data
5.
Map the observed test statistic into the sampling distribution of the test
statistic
6.
Ascertain if the observed test statistic is typical (i.e., fail to reject H0) or
atypical (i.e., reject H0) of the values of the test statistics in the sampling
distribution
D.
Specific examples
1.
EX1 - Exam 2 compared for Groups 1 and 2
2.
EX1 - Exam 1 compared to a score of 55 for the entire sample
3.
EX1 - Exam 1 correlated with Exam 2 for the entire sample
E.
Issues of importance
1.
Knowledge of what is being tested and why
2.
SPSS programming
3.
Lack of statistical theory and formulas
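The six steps can be traced in code for the one-sample case (an exam mean compared to a score of 55). The sketch below is a hedged illustration, not the EX1 analysis itself: the ten scores are invented, the variable names are arbitrary, and the critical value 2.262 is the tabled two-tailed t for α = .05 with 9 degrees of freedom, hard-coded because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Invented exam scores for ten students (not the EX1 data).
scores = [48, 52, 61, 45, 57, 50, 63, 44, 55, 49]

# Step 1: state H0: mu = 55 and H1: mu != 55.
mu_0 = 55

# Step 2: set the alpha level (the accepted risk of a Type I error).
alpha = 0.05

# Step 3: assuming H0 is true, the test statistic follows a t-distribution
# with n - 1 degrees of freedom. Tabled two-tailed critical value for
# alpha = .05 and df = 9.
n = len(scores)
df = n - 1
t_critical = 2.262

# Step 4: calculate the observed test statistic,
# (statistic - parameter) / standard error of the statistic.
x_bar = mean(scores)
standard_error = stdev(scores) / sqrt(n)
t_observed = (x_bar - mu_0) / standard_error

# Steps 5 and 6: locate the observed t in the sampling distribution and
# decide whether it is typical or atypical under H0.
reject_h0 = abs(t_observed) > t_critical

print(f"t({df}) = {t_observed:.2f}, critical = ±{t_critical}, alpha = {alpha}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

For these invented scores the observed t falls inside the critical values, so the sample mean is a typical result under H0 and the null hypothesis is not rejected.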