Lesson 1

Chapter 1
Review of Variables:
Types:
Discrete: e.g. counts such as family size or indicator variables (0 or 1; success or failure). When plotted, say in a dot plot or bar graph, there are consistent gaps between values.
Continuous: e.g. measured variables such as age, height, and weight. When plotted, say in a histogram or boxplot, there are either no gaps or the gaps are inconsistent.
Levels of Measurement
Nominal: categories are named rather than measured, e.g. sex, eye color, race
Ordinal: categories are ordered by name, e.g. shirt size (S, M, L), agreement
(disagree, agree, strongly agree).
Interval: equal differences between numbers represent equal differences in the variable or attribute being measured, e.g. degrees Fahrenheit, year (A.D.). The zero point is arbitrary. For instance, zero degrees Fahrenheit does not indicate an absence of heat, and thus 100 degrees F does not mean "twice as hot" as 50 degrees F.
Ratio: numbers represent equal amounts from absolute zero. Scores compared as
ratios, such as age, dollars, time, and height. With ratio measurements the comparisons
have value, e.g. a person who is 6 feet tall is twice the height of someone who is 3 feet
tall.
Review of Statistical Concepts
Measures of Central Tendency – Mode (most frequently occurring), median (half
the observations fall at or below this point) and mean (the mathematical average equal to
the sum of the observations divided by the number of observations)
\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}
Measures of Spread – range, quantiles, variance
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}  or  S^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}
and the standard deviation, S, is the square root of the variance.
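For reference, these summaries can be computed with Python's standard library; a minimal sketch (the data values below are made up purely for illustration):

```python
import statistics

# Illustrative data (not from the text)
x = [2, 4, 4, 5, 7, 9]

print(statistics.mode(x))      # most frequently occurring value: 4
print(statistics.median(x))    # half the observations fall at or below this point: 4.5
print(statistics.mean(x))      # sum of the observations / number of observations: ~5.17
print(statistics.variance(x))  # sample variance S^2, using the n - 1 divisor: ~6.17
print(statistics.stdev(x))     # standard deviation S = sqrt(S^2): ~2.48
```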
Random Variables
Binomial: consists of a fixed number n of independent trials, each with the same probability p of success and q = 1 − p of failure; X counts the number of successes. The probability of exactly x successes in n trials can be found by:

P(X = x) = \binom{n}{x} p^x q^{n - x}, where x = 0, 1, ..., n
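A minimal sketch of this formula in Python (the trial count n = 10 and success probability p = 0.3 are illustrative values, not from the text):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), computed straight from the formula."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Illustrative values: n = 10 trials, success probability p = 0.3
n, p = 10, 0.3
print(binom_pmf(3, n, p))                              # ~0.2668
print(sum(binom_pmf(x, n, p) for x in range(n + 1)))   # probabilities sum to 1
```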
Normal: X ~ N(μ, σ²) has a normal distribution with parameters mean μ and variance σ². Standardizing X by Z = (X − μ)/σ produces a standard normal variable with a mean of 0 and a variance of 1, probabilities for which can be found in Table B.1 of your text.
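As a quick illustration of standardizing (a minimal sketch; the values μ = 100, σ = 15, and the cutoff 130 are made up, and the error function stands in for a Table B.1 lookup):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Illustrative example: X ~ N(100, 15^2); find P(X <= 130).
mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma         # standardize: Z = (X - mu) / sigma
print(z)                     # 2.0
print(std_normal_cdf(z))     # ~0.9772, the value Table B.1 would give for z = 2.00
```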
Properties relating the sample mean, X̄, and the normal distribution.
1. For x1, x2, …, xn a random sample taken from N(μ, σ²), the distribution of X̄ follows a normal distribution with mean μ and variance σ²/n.
2. For any random variable (not necessarily normal), the Central Limit Theorem gives X̄ ~ N(μ, σ²/n) approximately. This theorem typically holds for n > 30 when there is little skewness and there are few outliers (see the simulation sketch below).
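A minimal simulation sketch of point 2 (the exponential distribution, sample size n = 50, and 10,000 replications are illustrative choices, not from the text):

```python
import random
import statistics

random.seed(1)

# Repeated samples from a skewed, non-normal distribution:
# Exponential with rate 1, which has mean 1 and variance 1.
n = 50          # sample size
reps = 10_000   # number of samples

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# By the Central Limit Theorem the means are approximately N(1, 1/50).
print(statistics.mean(sample_means))      # ~1.0
print(statistics.variance(sample_means))  # ~0.02 = 1/50
```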
t-distribution: Under assumption 1 above,

T = \frac{\bar{X} - \mu}{S / \sqrt{n}}

has a t-distribution with n − 1 degrees of freedom; written T ~ t_{n-1}.
Chi-square (χ²) distribution: For a normal random sample,

\chi^2 = \frac{(n - 1) S^2}{\sigma^2}

has a χ² distribution with n − 1 degrees of freedom.
F-distribution (used in ANOVA): For two independent normal samples of sizes n1 and n2 with variances σ1² and σ2², and assuming σ1² = σ2², the ratio

F = \frac{S_1^2}{S_2^2}

has an F distribution with degrees of freedom n1 − 1 and n2 − 1, written F ~ F_{n1-1, n2-1}.
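Where Tables B.1 and B.2 of the text give percentiles of these distributions, the same values can be obtained from scipy.stats; a minimal sketch (the degrees of freedom shown are just the ones that appear elsewhere in this lesson):

```python
from scipy import stats

# 99.5th percentile of t with 28 df (used in the example below)
print(stats.t.ppf(0.995, df=28))          # ~2.763, the Table B.2 value

# 95th percentile of chi-square with n - 1 = 14 df
print(stats.chi2.ppf(0.95, df=14))        # ~23.68

# 95th percentile of F with (14, 14) degrees of freedom
print(stats.f.ppf(0.95, dfn=14, dfd=14))  # ~2.48
```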
Confidence Intervals and Hypothesis Testing
Example: two-sample t-test. Two groups of mental patients are undergoing different treatments for the same disorder. A measure of change in health status is based on a questionnaire, and the responses are assumed normal with the same variance for each group.
Group 1: n1 = 15, S1 = 2.5, X̄1 = 15.1
Group 2: n2 = 15, S2 = 3.0, X̄2 = 12.3
a. Find a 99% confidence interval for the true mean difference, u1 – u2.
b. Test to see if the true means are equal or not equal at alpha = 1%
Solution (a): Since the variances are assumed equal we use the "pooled" two-sample t-procedure. In this case we have the following:

T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, where S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} and T ~ t_{n1+n2-2}

The 99% confidence interval is

(\bar{X}_1 - \bar{X}_2) \pm t^* S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

where t* comes from the t distribution with degrees of freedom (df) = n1 + n2 − 2 and probability 1 − α/2, i.e. t_{28, 0.995}, which from Table B.2 is 2.763. Here S_p^2 = [(15 − 1)(2.5)² + (15 − 1)(3.0)²]/28 = 7.625, so S_p ≈ 2.76, and the interval is

(15.1 − 12.3) \pm 2.763 \times 2.76 \sqrt{\frac{1}{15} + \frac{1}{15}} = 2.80 \pm 2.78

or a 99% confidence interval of (0.02, 5.58).
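The same computation in Python (a minimal sketch; scipy.stats.t.ppf replaces the Table B.2 lookup and the variable names are ours):

```python
from math import sqrt
from scipy import stats

# Summary statistics from the example
n1, s1, xbar1 = 15, 2.5, 15.1
n2, s2, xbar2 = 15, 3.0, 12.3

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance: 7.625
sp = sqrt(sp2)                                     # ~2.76

t_star = stats.t.ppf(0.995, df)                    # ~2.763, as in Table B.2
margin = t_star * sp * sqrt(1/n1 + 1/n2)           # ~2.79

diff = xbar1 - xbar2                               # 2.8
print((diff - margin, diff + margin))              # ~(0.01, 5.59); (0.02, 5.58) with the text's rounding
```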
Solution (b): We wish to test Ho: μ1 = μ2 versus Ha: μ1 ≠ μ2. We reject Ho if the p-value for the test statistic is "too small", i.e. less than alpha, where the test statistic is found by:
T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, and under Ho, μ1 − μ2 = 0, so

T = \frac{2.80}{2.76 \sqrt{\frac{1}{15} + \frac{1}{15}}} = 2.78
From Table B.2 with 28 degrees of freedom, 2.78 lies between 2.763 and 3.047. Since Ha is two-sided, the p-value is twice the one-tailed probability associated with these two boundaries, so 0.005 < p-value < 0.01. We reject Ho and conclude that there is a difference between the two means at the 1% level. Alternatively, since the 99% confidence interval does not contain zero, we could reach the same conclusion.
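The same test can be run directly from the summary statistics (a minimal sketch; scipy's ttest_ind_from_stats performs the pooled calculation when equal_var=True):

```python
from scipy import stats

# Pooled two-sample t-test from summary statistics (equal variances assumed)
result = stats.ttest_ind_from_stats(
    mean1=15.1, std1=2.5, nobs1=15,
    mean2=12.3, std2=3.0, nobs2=15,
    equal_var=True,
)
print(result.statistic)  # ~2.78, matching the hand computation above
print(result.pvalue)     # ~0.0097, inside the table-based bound 0.005 < p < 0.01
```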