Hypothesis Testing and p-value portion of the presentation

Statistics Primer
ORC Staff:
Xin Xin (Cindy)
Ryan Glaman
Brett Kellerstedt
Quick Overview of Statistics
Descriptive vs. Inferential Statistics

Descriptive Statistics: summarize and describe data (central tendency, variability,
skewness)

Inferential Statistics: procedure for making inferences about population parameters
using sample statistics
[Figure: a sample drawn from a population]
Measures of Central Tendency
Raw data
Mode: Pick out the value(s) occurring more often than any other value.
Median:
1. Order the data.
2. Determine the median position = (n+1)/2.
3. Locate the median based on step 2.
Mean: Add up all the data values and divide by the number of values.
$\frac{\sum X}{N}$ or $\frac{\sum X}{n}$

Simple frequency distribution
Mode: Pick out the value(s) with the highest frequency.
Median:
1. Order the data.
2. Determine the median position = (n+1)/2.
3. Locate the median based on step 2 using the freq. column.
Mean: Find the product of each value and its frequency, add all the products, and divide by the total frequency.
$\frac{\sum fX}{\sum f}$

Group frequency distribution
Mode: $L_1 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$
Median: $L_m + \left(\frac{\frac{n}{2} - C.F.}{f_{med}}\right) c$
Mean: Find the product of each midpoint and its frequency, add all the products, and divide by the total frequency.
$\frac{\sum fX}{\sum f}$

Notations
$\Delta_1$ = difference between the freq. of the modal class and the freq. of the next lower class
$\Delta_2$ = difference between the freq. of the modal class and the freq. of the next higher class
$L_1$ = lower class boundary of the modal class
c = class width of the modal class (in the mode formula)
$L_m$ = lower class boundary of the median class
n = sample size
C.F. = sum of all frequencies lower than the median class
$f_{med}$ = frequency of the median class
c = class width of the median class (in the median formula)
X = the actual values (for raw data and ungrouped freq. dist.) or the midpoints (for group freq. dist.)
f = frequency
N = population size
$\sum$ = summation or sum of
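The procedures above can be sketched in Python. This is a minimal illustration, not part of the original slides: the raw scores and the grouped distribution (class lower boundaries 0, 10, 20 with width 10) are hypothetical, and the helper names are made up for this example.

```python
from statistics import mean, median, multimode

# Hypothetical raw data
raw = [6, 5, 3, 5, 6]
print(mean(raw), median(raw), multimode(raw))


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * X) / sum(f).

    For a group frequency distribution, pass the class midpoints as values.
    """
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def grouped_median(lower_bounds, freqs, width):
    """Median of a grouped distribution: Lm + ((n/2 - C.F.) / f_med) * c."""
    n = sum(freqs)
    cumulative = 0  # C.F.: frequencies below the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= n / 2:  # this is the median class
            return lower + ((n / 2 - cumulative) / f) * width
        cumulative += f


def grouped_mode(lower_bounds, freqs, width):
    """Mode of a grouped distribution: L1 + (d1 / (d1 + d2)) * c."""
    i = freqs.index(max(freqs))  # modal class (first one if tied)
    below = freqs[i - 1] if i > 0 else 0
    above = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1, d2 = freqs[i] - below, freqs[i] - above
    return lower_bounds[i] + d1 / (d1 + d2) * width


# Hypothetical grouped data: classes 0-10, 10-20, 20-30
lower_bounds, freqs, width = [0, 10, 20], [2, 5, 3], 10
midpoints = [lb + width / 2 for lb in lower_bounds]
print(frequency_mean(midpoints, freqs))
print(grouped_median(lower_bounds, freqs, width))
print(grouped_mode(lower_bounds, freqs, width))
```

Note that `multimode` returns every value tied for the highest count, which matches the "value(s)" wording in the mode procedure.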
Measures of Variability
Range
Description: Difference between the largest and the smallest value in the data.
Applicability:
1. Interval/ratio
2. No outliers exist
Advantage:
1. Simple to calculate
Disadvantage:
1. Highly influenced by outliers.
2. Does not use all data

Mean deviation
Description: It measures the average absolute deviations from the mean. Uncommonly used.
Applicability:
1. Interval/ratio
2. When no outliers exist
Advantage:
1. Use all the data
2. Easy to interpret
Disadvantage:
1. Not resistant to outliers
2. Does not yield any further useful statistical properties.

Variance/standard deviation
Description: Variance is the average squared deviations from the mean. Standard deviation is square root of the variance. Commonly used.
Applicability:
1. Interval/ratio
2. When no outliers exist
Advantage:
1. Provides good statistical properties, by avoiding the use of absolute values.
2. Use all the data
Disadvantage:
1. Not resistant to outliers.
2. Variance depends on the units of measurement, therefore not easy to make comparisons.

Sum of Squares
Description: Measures variability of the scores, the total variation of all scores
Applicability:
1. Interval/ratio
2. When no outliers exist
Advantage:
1. Effect size calculation
Disadvantage:
1. Not resistant to outliers.
Variance and Sum of Squares
$SS = \sum (x - \bar{x})^2$

x    x − x̄    (x − x̄)²
6      1         1
5      0         0
3     -2         4
5      0         0
6      1         1

Mean = 5

$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
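The sum of squares, variance, and standard deviation for the five scores on this slide can be checked with a few lines of Python. This is a small sketch added for illustration; the variable names are arbitrary.

```python
from math import sqrt

x = [6, 5, 3, 5, 6]
n = len(x)
x_bar = sum(x) / n                          # sample mean

deviations = [xi - x_bar for xi in x]       # x - x_bar
squared = [d ** 2 for d in deviations]      # (x - x_bar)^2

ss = sum(squared)                           # sum of squares
variance = ss / (n - 1)                     # sample variance S^2
sd = sqrt(variance)                         # sample standard deviation S

print(x_bar, ss, variance, round(sd, 3))
```

Dividing by n - 1 rather than n follows the sample formulas shown on the slide.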
Empirical Rule

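The empirical rule states that a symmetric, approximately normal distribution with population mean μ and standard deviation σ has the following properties:
About 68% of the values fall within μ ± 1σ.
About 95% of the values fall within μ ± 2σ.
About 99.7% of the values fall within μ ± 3σ.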
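For an exactly normal distribution, the proportion of values within k standard deviations of the mean can be computed from the error function. The sketch below is an illustration rather than part of the original slides; `prob_within` is a hypothetical helper name.

```python
from math import erf, sqrt


def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal distribution."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The result does not depend on μ or σ, because the interval is expressed in standard deviation units.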
Sampling Distribution
All possible outcomes are shown below in Table 1.
Table 1. All possible outcomes when two balls are sampled with replacement.
Outcome   Ball 1   Ball 2   Mean
1         1        1        1.0
2         1        2        1.5
3         1        3        2.0
4         2        1        1.5
5         2        2        2.0
6         2        3        2.5
7         3        1        2.0
8         3        2        2.5
9         3        3        3.0
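The nine outcomes in Table 1 can be enumerated programmatically to build the sampling distribution of the mean. This sketch assumes the population is three balls numbered 1, 2, and 3, which is inferred from the table rather than stated in the slides.

```python
from collections import Counter
from itertools import product

population = [1, 2, 3]  # assumed ball labels, inferred from Table 1

# Every ordered pair of draws when sampling two balls with replacement
outcomes = list(product(population, repeat=2))
sample_means = [(a + b) / 2 for a, b in outcomes]

# Sampling distribution: how often each sample mean occurs
distribution = Counter(sample_means)
for m in sorted(distribution):
    print(m, distribution[m], round(distribution[m] / len(outcomes), 3))

# The mean of all sample means equals the population mean
mean_of_means = sum(sample_means) / len(sample_means)
print(mean_of_means, sum(population) / len(population))
```

Sample means near the population mean occur most often, while extreme means such as 1.0 and 3.0 each occur only once.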
Sampling Error
As has been stated before, inferential statistics involves using a representative sample to make judgments about a population. Let's say that we wanted to determine the nature of the relationship between county and achievement scores among Texas students. We could select a representative sample of, say, 10,000 students to conduct our study. If we find a statistically significant relationship in the sample, we could then generalize it to the entire population.
However, even the most representative sample is not going to be exactly the same as its population. Given this, there is always a chance that the things we find in a sample are anomalies and do not occur in the population that the sample represents. This error is referred to as sampling error.
Sampling Error
A formal definition of sampling error is as follows:
Sampling error occurs when random chance produces a sample statistic that is not equal to the population parameter it represents.
Due to sampling error, there is always a chance that we are making a mistake when rejecting or failing to reject our null hypothesis.
Remember that inferential procedures are used to decide which of the statistical hypotheses the data support. This is done by rejecting or failing to reject the null hypothesis at the end of a procedure.
Sampling Distribution and Standard Error (SE)

https://www.youtube.com/watch?v=hvIDuEmWt2k
Hypothesis Testing

Null Hypothesis Statistical Significance Testing (NHSST)


Testing p-values using statistical significance tests
Effect Size

Measure magnitude of the effect (e.g., Cohen’s d)
Null Hypothesis Statistical Significance Testing

Statistical significance testing answers the following question:


Assuming the sample data came from a population in which the null hypothesis is
exactly true, what is the probability of obtaining the sample statistic one got for
one’s sample data with the given sample size? (Thompson, 1994)
Alternatively:

Statistical significance testing is used to examine a statement about a relationship
between two variables.
Hypothetical Example

Is there a difference between the reading abilities of boys and girls?

Null Hypothesis (H0): There is not a difference between the reading abilities
of boys and girls.

Alternative Hypothesis (H1): There is a difference between the reading
abilities of boys and girls.

Alternative hypotheses may be non-directional (above) or directional (e.g., boys
have a higher reading ability than girls).
Testing the Hypothesis

Use a sampling distribution to calculate the probability of a statistical
outcome.

pcalc = probability of obtaining a result at least as extreme as the sample's result, assuming H0 is true

pcalc < pcritical: reject H0

pcalc ≥ pcritical: fail to reject H0
Level of Significance (pcrit)

Alpha level (α) determines:

The probability at which you reject the null hypothesis

The probability of making a Type I error (typically .05 or .01)
                     True Outcome in Population
Observed Outcome     H0 is true           H0 is false
Reject H0            Type I error (α)     Correct Decision
Fail to reject H0    Correct Decision     Type II error (β)
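The link between α and the Type I error rate can be illustrated with a small simulation. The sketch below is not from the original slides. It assumes both groups are drawn from the same hypothetical normal population (mean 0, SD 1), so H0 is true by construction. The group size of 10 and the critical value 2.101 (two-tailed, α = .05, df = 18) are illustrative choices, and the function names are made up.

```python
import random
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples t statistic using the pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def type_i_error_rate(n_sims=2000, n=10, t_crit=2.101, seed=1):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Both groups come from the same population, so any
        # "significant" difference is a Type I error.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    return rejections / n_sims


print(type_i_error_rate())
```

With enough simulated studies the rejection rate should settle close to .05, the chosen alpha level.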
Example: Independent t-test

Research Question: Is there a difference between the reading abilities of boys
and girls?

Hypotheses:

H0: There is not a difference between the reading abilities of boys and girls.

H1: There is a difference between the reading abilities of boys and girls.
Dataset

Reading test scores (out of 100)
Boys   Girls
88     88
82     90
70     95
92     81
80     93
71     86
73     79
80     93
85     89
86     87
Significance Level

α = .05, two-tailed test

df = n1 + n2 – 2 = 10 + 10 – 2 = 18

Use t-table to determine tcrit

tcrit = ±2.101
Decision Rules

If |tcalc| > |tcrit|, then pcalc < pcrit

Reject H0

If |tcalc| ≤ |tcrit|, then pcalc ≥ pcrit

Fail to reject H0

[Figure: t distribution with a rejection region of p = .025 in each tail, beyond t = -2.101 and t = 2.101]
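The equivalence between comparing t values and comparing p-values can be checked numerically. The sketch below is an illustration, not the method used in the slides. It obtains a two-tailed p-value by integrating the t density with Simpson's rule, an assumption made here to avoid third-party libraries. The function names are hypothetical.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - P(-|t| <= T <= |t|).

    The central area is twice the integral of the density over [0, |t|],
    approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total


def decide(t_calc, df, alpha=0.05):
    """Apply the decision rule using the p-value."""
    return "reject H0" if two_tailed_p(t_calc, df) < alpha else "fail to reject H0"


print(round(two_tailed_p(2.101, 18), 3))  # p at the critical value
print(decide(-2.586, 18))
```

At the critical value the p-value is approximately equal to α, which is why the two forms of the decision rule agree.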
Computations
                         Boys     Girls
Frequency (N)            10       10
Sum (Σ)                  807      881
Mean (X̄)                 80.70    88.10
Variance (S²)            55.34    26.54
Standard Deviation (S)   7.44     5.15
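These descriptive statistics can be reproduced from the dataset with Python's standard library. This is a small sketch added for illustration. Note that `variance` and `stdev` use the sample (n − 1) formulas.

```python
from statistics import mean, stdev, variance

boys = [88, 82, 70, 92, 80, 71, 73, 80, 85, 86]
girls = [88, 90, 95, 81, 93, 86, 79, 93, 89, 87]

for label, scores in (("Boys", boys), ("Girls", girls)):
    print(
        label,
        len(scores),                  # frequency (N)
        sum(scores),                  # sum
        round(mean(scores), 2),       # mean
        round(variance(scores), 2),   # sample variance S^2
        round(stdev(scores), 2),      # sample standard deviation S
    )
```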
Computations cont.

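Pooled variance
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = 40.944$

Standard Error
$SE_{\bar{X}_1 - \bar{X}_2} = \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{40.944 \left(\frac{1}{10} + \frac{1}{10}\right)} = 2.862$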
Computations cont.

Compute tcalc
$t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}} = \frac{80.70 - 88.10}{2.862} = -2.586$

Decision: Reject H0. Girls scored statistically significantly higher on the reading test than boys did.
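The pooled variance, standard error, and t statistic can be reproduced from the raw scores. The sketch below assumes the standard pooled-variance form of the independent t-test, which is consistent with df = n1 + n2 – 2. The function name `independent_t` is made up for this example.

```python
from math import sqrt
from statistics import mean, variance

boys = [88, 82, 70, 92, 80, 71, 73, 80, 85, 86]
girls = [88, 90, 95, 81, 93, 86, 79, 93, 89, 87]


def independent_t(a, b):
    """Return (pooled variance, standard error, t) for two independent samples."""
    n1, n2 = len(a), len(b)
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return pooled_var, se, t


pooled_var, se, t_calc = independent_t(boys, girls)
print(round(pooled_var, 3), round(se, 3), round(t_calc, 3))

t_crit = 2.101  # two-tailed, alpha = .05, df = 18
print("Reject H0" if abs(t_calc) > t_crit else "Fail to reject H0")
```

The sign of t is negative only because the boys' mean is entered first. The decision depends on its absolute value.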
Confidence Intervals

Sample means provide a point estimate of our population means. Due to
sampling error, our sample estimates may not perfectly represent our
populations of interest. It would be useful to have an interval estimate of our
population means so we know a plausible range of values that our population
means may fall within.

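95% confidence intervals do this.

Can help reinforce the results of the significance test.
CI95 = (X̄1 − X̄2) ± tcrit (SE)
= -7.4 ± 2.101(2.862) = [-13.412, -1.387]
Here the interval is built around the difference between the two sample means. Because it does not contain 0, it agrees with the decision to reject H0.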
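The interval can be computed directly from the mean difference, the standard error, and the critical t value. This is a minimal sketch using the rounded values from the slides. The function name is hypothetical, and the limits may differ from the slide in the third decimal place depending on how the inputs are rounded.

```python
def confidence_interval(mean_diff, se, t_crit):
    """Interval estimate: mean difference plus or minus t_crit * SE."""
    margin = t_crit * se
    return mean_diff - margin, mean_diff + margin


lower, upper = confidence_interval(mean_diff=-7.4, se=2.862, t_crit=2.101)
print(round(lower, 3), round(upper, 3))

# An interval that excludes 0 is consistent with rejecting H0
print(not (lower <= 0 <= upper))
```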
Statistical Significance vs. Importance of Effect


Does finding that p < .05 mean the finding is relevant to the real world?

Not necessarily…

https://www.youtube.com/watch?v=5OL1RqHrZQ8
Effect size provides a measure of the magnitude of an effect


Practical significance
Cohen’s d, η², and R² are all types of effect sizes
Cohen’s d

Equation:
$d = \frac{\bar{X}_1 - \bar{X}_2}{S_p} = \frac{80.70 - 88.10}{\sqrt{40.944}} = -1.16$


Guidelines:

d = .2 = small

d = .5 = moderate

d = .8 = large
Not only is our effect statistically significant, but the effect size is large.
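Cohen's d for this example can be computed from the group means and the pooled variance. The sketch below assumes d is the mean difference divided by the pooled standard deviation, which reproduces the value on the slide. Treating the guideline values as lower bounds for each label is also an assumption made for illustration, and both function names are hypothetical.

```python
from math import sqrt


def cohens_d(mean1, mean2, pooled_var):
    """Mean difference in pooled standard deviation units."""
    return (mean1 - mean2) / sqrt(pooled_var)


def describe(d):
    """Label the magnitude of d using the guideline values as lower bounds."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


d = cohens_d(80.70, 88.10, 40.944)
print(round(d, 2), describe(d))
```

The sign of d reflects only which group is entered first. The magnitude is what the guidelines describe.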