Stat303 - EXAM 2 MATERIAL
Exam Date: March 25, 26 or 27
Week 4: Probability and Inference
IPS - Ch 4
I) Inference - Using information in a sample to make a guess about a population
A) Parameters - associated with population
1) Do not vary - fixed
2) Common parameters
a) μ - population mean
b) σ - population standard deviation
c) π - population proportion
B) Statistics - associated with sample
1) Sampling variability - statistics can change from sample to sample
2) Common statistics
a) xbar - sample mean
b) s - sample standard deviation
c) p - sample proportion
C) Probability - how we make statements about populations using samples
1) Probability of an event = proportion of time the event occurs in the population
2) Notation: P(event)=Probability of an event
Week 5: Normal distributions
IPS - Ch 1.3
II) Normal Distribution Revisited
A) Recall - Empirical Rule : If distribution is approximately normal
1) 68% probability of being within 1 SD of mean
2) 95% probability of being within 2 SD of mean
3) 99.7% probability of being within 3 SD of mean
B) Notation: X~N(μ, σ²) = the variable X is normal with mean = μ and SD = σ
1) Special case
a) Standard Normal - N(0, 1²)
b) Notation: Z = standard normal variable
C) Z table - gives exact probabilities for normal variables
1) Using table to find probabilities for standard normal variables
a) z = column heading plus row heading
b) P(standard normal variable is less than the value z) given in table
c) Notation: P(Z<z)=P(standard normal variable is less than the value z)
2) Finding probabilities for X~N(μ, σ²)
a) P(your variable is less than the value x) = P(Z < z) where z = (x-μ)/σ
Since Z = (X-μ)/σ is a new variable with N(0,1) distribution
b) Notation: P(X < x) = P(your variable is less than the value x)
c) Intuitive interpretation of table: z = (x-μ)/σ is the number of SD's x is from the mean
P(Z < z) = probability a normal variable is less than z SD's away from the mean
If z > 0: P(Z < z) = P(normal var. no more than z SD's above mean)
P(Z > z) = P(normal var. more than z SD's above mean)
If z < 0: P(Z < z) = P(normal var. more than z SD's below mean)
P(Z > z) = P(normal var. no more than z SD's below mean)
3) Useful properties of normal distribution
a) Symmetric -> P(Z > z) = P(Z < -z)
b) Total probability under curve equals 1 -> P(Z > z) = 1-P(Z< z)
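The table look-ups, the symmetry property, and the complement property can all be checked in code. A quick sketch using Python's `statistics.NormalDist` (available since Python 3.8) in place of the printed Z table; it also verifies the Empirical Rule percentages from Week 5:

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)        # standard normal variable

# P(Z < z) -- exactly what the Z table gives
p = Z.cdf(1.96)                      # ~0.9750

# Symmetry: P(Z > z) = P(Z < -z)
assert abs((1 - Z.cdf(1.5)) - Z.cdf(-1.5)) < 1e-12

# Empirical Rule check: probability within 1, 2, 3 SD's of the mean
within1 = Z.cdf(1) - Z.cdf(-1)       # ~0.6827 (68%)
within2 = Z.cdf(2) - Z.cdf(-2)       # ~0.9545 (95%)
within3 = Z.cdf(3) - Z.cdf(-3)       # ~0.9973 (99.7%)
print(round(p, 4), round(within1, 4), round(within2, 4), round(within3, 4))
```

Note the Empirical Rule's "95% within 2 SD" is an approximation; the exact cutoff is 1.96.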
D) Types of problems you need to solve using the Z table
1) Value, z, that corresponds to x
z = (x-μ)/σ
2) Value, x, that corresponds to z
x = μ + z(σ)
3) Probabilities for a variable X~N(μ, σ²) for the following
a) P(X < x) = P(X is less than the value x)
Find how many SD's x is from μ: z = (x-μ)/σ
Get P(Z < z) directly from table
b) P(X > x) = P(X is greater than the value x)
Find how many SD's x is from μ: z = (x-μ)/σ
Get P(Z > z) as 1 - P(Z < z)
c) P(X is less than x1 or greater than x2) = P(X < x1 or X > x2)
Find how many SD's x1 and x2 are from μ
z1 = (x1-μ)/σ and z2 = (x2-μ)/σ
Get P(Z < z1 or Z > z2) as P(Z < z1) + P(Z > z2)
d) P(X is between x1 and x2) = P(x1 < X < x2)
Find how many SD's x1 and x2 are from μ
z1 = (x1-μ)/σ and z2 = (x2-μ)/σ
Get P(z1 < Z < z2) as P(Z < z2) - P(Z < z1)
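The four problem types above, worked in code. The numbers (μ = 100, σ = 15, x = 120, x1 = 85, x2 = 130) are made-up example values, not from the notes:

```python
from statistics import NormalDist

mu, sigma = 100, 15                  # hypothetical mean and SD
X = NormalDist(mu, sigma)
Z = NormalDist(0, 1)

x = 120
z = (x - mu) / sigma                 # number of SD's x is from mu

# a) P(X < x): standardizing gives the same answer as the direct cdf
assert abs(Z.cdf(z) - X.cdf(x)) < 1e-12

# b) P(X > x) = 1 - P(Z < z)
p_above = 1 - Z.cdf(z)

# c) P(X < x1 or X > x2) = P(Z < z1) + P(Z > z2)
x1, x2 = 85, 130
z1, z2 = (x1 - mu) / sigma, (x2 - mu) / sigma
p_tails = Z.cdf(z1) + (1 - Z.cdf(z2))

# d) P(x1 < X < x2) = P(Z < z2) - P(Z < z1)
p_between = Z.cdf(z2) - Z.cdf(z1)

assert abs(p_tails + p_between - 1) < 1e-12   # tails + middle = 1
print(round(p_above, 4), round(p_between, 4))
```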
4) Values of X~N(μ, σ²) or Z satisfying given probabilities
a) The value x* satisfying P(X < x*) = α
Find α in table
Add row and column headings to get z* satisfying P(Z < z*) = α
x* = μ + z*σ
b) The value x* satisfying P(X > x*) = α
Find 1-α in table
Add row and column headings to get z* satisfying P(Z > z*) = α
x* = μ + z*σ
c) The value z* satisfying P(Z is more than z* from 0) = α
Find α/2 in table
Add row and column headings to get the z*....denoted z*α/2
To get the x*'s corresponding to -z* and +z*
x1* = μ - z*σ   x2* = μ + z*σ
d) The value z* satisfying P(-z* < Z < z*) = P(Z within z* of 0) = 1-α
Find z* satisfying P(Z more than z* from 0) = α (see part c above)
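The inverse problems (finding z* and x* from a given probability) use the table in reverse; in code that is `inv_cdf`. A sketch with hypothetical values μ = 100, σ = 15, α = 0.05:

```python
from statistics import NormalDist

Z = NormalDist(0, 1)
mu, sigma = 100, 15                  # hypothetical mean and SD
alpha = 0.05

# a) x* with P(X < x*) = alpha: look up alpha (gives a negative z*)
x_star_lo = mu + Z.inv_cdf(alpha) * sigma

# b) x* with P(X > x*) = alpha: look up 1 - alpha
x_star_hi = mu + Z.inv_cdf(1 - alpha) * sigma

# c) z*_{alpha/2} with P(Z more than z* from 0) = alpha
z_half = -Z.inv_cdf(alpha / 2)       # ~1.96 for alpha = 0.05
x1_star, x2_star = mu - z_half * sigma, mu + z_half * sigma

# d) check: 1 - alpha of the distribution lies between x1* and x2*
X = NormalDist(mu, sigma)
assert abs((X.cdf(x2_star) - X.cdf(x1_star)) - (1 - alpha)) < 1e-9
print(round(z_half, 3))
```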
Week 6: Sampling Distributions
IPS - Ch 5
I) Sampling Distributions - distribution of a statistic calculated from a sample
A) Overview
1) Meaning of sampling distribution
a) Associated with each sample is a value of the statistic
b) Value of statistic changes from sample to sample
c) Distribution - what values of statistic occur and how often each value occurs
2) Why sampling distributions important
a) If distribution of X unknown - can't make probability statements about X
b) Under certain circumstances, many statistics (like xbar & p) are normal
Can make probability statements about Xbar and p
B) Sampling distribution of the sample proportion, p
1) Meaning - what values the sample proportion takes on, and how often each
value occurs
2) Under certain conditions p is approximately normally distributed
a) Mean = population proportion, π
b) Standard Error of p, SE(p) = sqrt[π(1-π)/n]
Sometimes called SD(p)
c) Z = (p-π)/sqrt[π(1-π)/n] is approximately N(0,1)
d) P(Z < z) = P(sample proportion is less than z SE(p)'s away from π)
If z > 0: P(Z < z) = P(sample proportion no more than z SE(p)'s above π)
P(Z > z) = P(sample proportion more than z SE(p)'s above π)
If z < 0: P(Z < z) = P(sample proportion more than z SE(p)'s below π)
P(Z > z) = P(sample proportion no more than z SE(p)'s below π)
3) Properties of Standard error of p
a) Bigger samples->Better estimates more often->less spread->smaller SE
4) Conditions that must hold
a) All n observations in a sample independent
b) Large sample size and π not close to 0 or 1
n*π ≥ 10 AND
n*(1-π) ≥ 10
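A worked sketch of the sampling distribution of p: check the conditions, compute SE(p), and find a tail probability. The values π = 0.30 and n = 200 are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

pi_, n = 0.30, 200                   # hypothetical population proportion, sample size

# conditions: n*pi >= 10 AND n*(1-pi) >= 10
assert n * pi_ >= 10 and n * (1 - pi_) >= 10

se_p = sqrt(pi_ * (1 - pi_) / n)     # standard error of p
phat = NormalDist(pi_, se_p)         # approximate sampling distribution of p

# e.g. P(sample proportion exceeds 0.35), by standardizing
z = (0.35 - pi_) / se_p
p_above = 1 - NormalDist(0, 1).cdf(z)
assert abs(p_above - (1 - phat.cdf(0.35))) < 1e-12   # same answer directly
print(round(se_p, 4), round(p_above, 4))
```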
C) Sampling distribution of the sample mean, Xbar
1) Interpretation - what values the sample mean takes on, and how often each
value occurs
2) Under certain conditions, Xbar is approximately normally distributed
a) Mean = population mean, μ
b) Standard Error of Xbar, SE(Xbar) = σ/sqrt(n)
Sometimes called SD(Xbar) or σ_xbar
c) Z = (Xbar-μ)/[σ/sqrt(n)] is approximately N(0,1)
d) P(Z < z) = P(sample mean is less than z SE's away from μ)
If z > 0: P(Z < z) = P(sample mean no more than z SE's above μ)
P(Z > z) = P(sample mean more than z SE's above μ)
If z < 0: P(Z < z) = P(sample mean more than z SE's below μ)
P(Z > z) = P(sample mean no more than z SE's below μ)
e) Properties of Standard error of Xbar
Bigger samples ->Better estimates more often ->less spread ->smaller SE
3) Conditions that must hold
a) Variable you're sampling from is normally distributed(check normal
quantile plot of sample) OR sample size large (n>30)
b) All n observations in a sample independent
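The same calculation for the sample mean: SE(Xbar) = σ/sqrt(n), and bigger n gives a smaller SE. The values μ = 50, σ = 10, n = 36 are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 10, 36            # hypothetical population mean, SD, sample size

se_xbar = sigma / sqrt(n)            # SE(Xbar) = sigma / sqrt(n)
xbar = NormalDist(mu, se_xbar)       # approximate sampling distribution of Xbar

# Bigger samples -> smaller SE (quadrupling n halves the SE)
assert abs(sigma / sqrt(4 * n) - se_xbar / 2) < 1e-12

# e.g. P(Xbar within 2 units of mu)
p_within = xbar.cdf(mu + 2) - xbar.cdf(mu - 2)
print(round(se_xbar, 4), round(p_within, 4))
```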
4) t Distribution - when s is used in place of σ
a) s/sqrt(n) is an estimate of SE(Xbar)
b) T = (Xbar-μ)/[s/sqrt(n)] has t distribution with n-1 df
c) P(T < t) = P(sample mean is less than t estimated SE's away from μ)
If t > 0: P(T < t) = P(sample mean no more than t estimated SE's above μ)
P(T > t) = P(sample mean more than t estimated SE's above μ)
If t < 0: P(T < t) = P(sample mean more than t estimated SE's below μ)
P(T > t) = P(sample mean no more than t estimated SE's below μ)
d) Properties of estimated standard error of Xbar
Changes from sample to sample even for same sample size
Bigger samples->Better estimates of σ more often->t more like Z
Bigger samples->Better estimates more often->less spread->smaller SE
e) Conditions that must hold
i) Variable you're sampling from is normally distributed(check normal
quantile plot of sample) OR sample size large (n>30)
ii) All n observations in a sample independent
f) Using t table on web
Find df = n-1
Table set up the same as a Z table
g) Types of problems you need to solve using t tables
Same as those for Z table
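Python's standard library has no t table, but the t cdf can be approximated by numerically integrating its density (Simpson's rule). A self-contained sketch; the observed statistic t = 2.20 with df = 15 is a made-up example:

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    # density of Student's t; lgamma avoids overflow for large df
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T < t) via Simpson's rule on [0, |t|], using symmetry about 0."""
    b = abs(t)
    if b == 0:
        return 0.5
    h = b / steps                    # steps must be even for Simpson's rule
    s = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = s * h / 3                 # P(0 < T < |t|)
    return 0.5 + half if t > 0 else 0.5 - half

# two-tail p value for an observed t statistic with df = n-1
t_obs, df = 2.20, 15
p_two_tail = 2 * (1 - t_cdf(t_obs, df))
print(round(p_two_tail, 4))
```

As df grows the t distribution approaches the standard normal, which the code confirms: `t_cdf(1.96, 200)` is close to 0.975.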
Week 7: Confidence Intervals
IPS - Ch 6
I) Confidence Intervals
A) Overview
1) Purpose: estimate a parameter with a range of plausible values (interval)
2) Why use interval - sampling variability
3) Basic idea: Suppose X~N(μ, σ²)
a) 95% of the X's within 1.96 SD's of μ (empirical rule)
-> for 95% of the x's, the interval (x-1.96σ, x+1.96σ) contains μ
b) Choose just one observation, x
c) "95% confident" that x is one of the values within 1.96 SD's of μ
-> "95% confident" that μ is within (x - 1.96σ, x + 1.96σ)
d) Change 1.96 to z*α/2 for other levels of confidence
See section II) Normal distribution, D, 4, c and d for finding z*α/2
B) Confidence intervals for the probability/proportion in the population, π
1) Recall: under certain conditions p~N(π, π(1-π)/n)
a) 95% of the p's within 1.96 SE's of π
-> for 95% of the sample proportions p, (p-1.96 SE(p), p+1.96 SE(p)) contains π
b) Choose just one sample proportion, p
c) 95% confident that p is one of the sample proportions that is
within 1.96 SE's of π
-> 95% confident that π is in the interval (p-1.96 SE(p), p+1.96 SE(p))
d) Estimate SE(p) = sqrt[π(1-π)/n] with sqrt[p(1-p)/n]
e) Change 1.96 to z*α/2 for other levels of confidence
2) When to use
a) Large sample size and π not close to 0 or 1
Estimate π with p
Check that n*p ≥ 10 AND n*(1-p) ≥ 10
b) All n observations are independent
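The steps above, as a worked 95% confidence interval for π. The data (64 successes in n = 200 trials) are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

count, n = 64, 200                   # hypothetical: 64 successes in 200 trials
p = count / n                        # sample proportion, 0.32

# conditions: n*p >= 10 AND n*(1-p) >= 10
assert n * p >= 10 and n * (1 - p) >= 10

conf = 0.95
z_star = NormalDist(0, 1).inv_cdf(1 - (1 - conf) / 2)   # z*_{alpha/2} ~ 1.96

se = sqrt(p * (1 - p) / n)           # estimated SE(p)
lo, hi = p - z_star * se, p + z_star * se
print(f"{conf:.0%} CI for pi: ({lo:.3f}, {hi:.3f})")
```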
C) Confidence intervals for the mean in the population, μ
1) Recall: under certain conditions, Xbar~N(μ, σ²/n)
a) 95% of the xbar's within 1.96 SE's of μ
-> for 95% of the xbar's, (xbar-1.96 SE(Xbar), xbar+1.96 SE(Xbar)) contains μ
b) Choose just one sample mean, xbar
c) 95% confident that xbar is one of the sample means that is within
1.96 SE's of μ
-> 95% confident that μ is in (xbar-1.96 SE(Xbar), xbar+1.96 SE(Xbar))
d) Change 1.96 to z*α/2 for other levels of confidence
2) When to use
a) Original variable is normally distributed OR sample size large (n>30)
b) All n observations are independent
c) True SD of variable, σ, is known
3) Recall: when estimating σ with s
T = (Xbar-μ)/[s/sqrt(n)] has a t distribution with n-1 df
a) 95% confident μ is in (xbar - t*n-1,0.025 s/sqrt(n), xbar + t*n-1,0.025 s/sqrt(n))
b) Change t*n-1,0.025 to t*n-1,α/2 for other levels
c) When to use
Variable you're sampling from is normally distributed(check normal
quantile plot of sample) OR sample size large (n>30)
All n observations are independent
True SD of variable NOT known
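A worked CI for μ for the σ-known case (section C, 1-2). The values xbar = 52.3, σ = 8.0, n = 64 are hypothetical; when σ is unknown, s replaces σ and t*n-1,α/2 from a t table replaces z*:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 52.3, 8.0, 64       # hypothetical sample mean, known sigma, n > 30
conf = 0.95
z_star = NormalDist(0, 1).inv_cdf(1 - (1 - conf) / 2)   # ~1.96

se = sigma / sqrt(n)                 # SE(Xbar) = sigma / sqrt(n)
lo, hi = xbar - z_star * se, xbar + z_star * se
print(f"{conf:.0%} CI for mu: ({lo:.2f}, {hi:.2f})")
# If sigma were unknown: se = s / sqrt(n) and z_star -> t*_{n-1, alpha/2}
```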
Week 8: Hypothesis Testing
IPS Ch 7.1
I)Hypothesis Tests - answering questions about parameter(s) in population(s)
A) Overview
1) Logic behind hypothesis testing
a) Make claim about parameter in popln (usually Alternate Hypothesis, Ha)
b) Assume to the contrary that your claim is false (Null hypothesis, Ho)
c) See if data are too unlikely given your assumption to the contrary
Too unlikely - reject Ho (your claim is supported)
Not too unlikely - Can't reject Ho (your claim is NOT supported)
2) Steps in hypotheses tests
a) Make a claim about the population (usually Alternate Hypothesis, Ha)
Construct Null hypothesis, Ho, as opposite of Alternate Hypothesis, Ha
b) Decide on a probability that will be called "too unlikely" (α)
Commonly used α: 0.05, 0.01
c) Collect sample
d) Form a test statistic
e) See how likely data is if Ho really is true (p value)
f) Reject Ho if your data are too unlikely (p value < some set limit, α)
or fail to reject Ho if data are not too unlikely (p value NOT < α)
B) Deciding on α
1) Types of errors that can be made
a) Rejecting Ho when you shouldn't have - Type I
b) Failing to reject Ho when you should have - Type II
2) If Type I more critical - choose small α
3) If Type II more critical - choose large α
II) Questions about single populations
A) Z Tests
1) Two tail
a) Ho: μ = μo    Ha: μ ≠ μo
b) Decide on an α = probability that will be considered "too unlikely"
c) Test statistic: z = (xbar-μo)/[σ/sqrt(n)] = # of SE's xbar is from μo
d) p value = P[seeing data at least this extreme if Ho true (μ = μo)]
= P(Xbar is at least this many SE's away from μo)
= 2 P(Z < -z) if z is positive, or 2 P(Z < z) if z is negative
e) Reject Ho (accept your claim, Ha) if p value < α or
fail to reject Ho (fail to accept claim, Ha) if p value is NOT < α
f) Type I error - Reject Ho when μ really is = μo
Type II error - Fail to Reject Ho when μ really is ≠ μo
2) One tail
a) Ho: μ ≤ μo    Ha: μ > μo
(or switch inequalities to test Ho: μ ≥ μo)
b) Set α
c) Test statistic: z = (xbar-μo)/[σ/sqrt(n)]
d) p value = P[seeing data at least this extreme if Ho is true (μ ≤ μo)]
= P(Xbar is more than z SE's above μo)
(or below if Ho: μ ≥ μo)
= P(Z > z)
(or switch to < if Ho: μ ≥ μo)
e) Reject Ho (accept your claim, Ha) if p value < α or
fail to reject Ho (fail to accept claim, Ha) if p value is not < α
f) Type I error - Reject Ho when μ really is ≤ μo
Type II error - Fail to Reject Ho when μ really is > μo
(or switch inequalities)
3) When to use Z tests
a) True SD of variable, σ, is known
b) Variable you're sampling from is normally distributed (check normal
quantile plot of sample) OR sample size large (n > 30)
c) All n observations are independent
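The z-test mechanics, worked end to end. The setup (Ho: μ = 50, σ = 12, n = 36, xbar = 54.5, α = 0.05) is a made-up example:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup: Ho: mu = 50 vs Ha: mu != 50, sigma known
mu0, sigma, n, xbar, alpha = 50.0, 12.0, 36, 54.5, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))     # test statistic: SE's from mu0
Z = NormalDist(0, 1)

p_two = 2 * Z.cdf(-abs(z))               # two-tail p value: 2*P(Z < -|z|)
p_one = 1 - Z.cdf(z)                     # one-tail p value for Ha: mu > mu0

reject = p_two < alpha                   # reject Ho if data too unlikely
print(round(z, 3), round(p_two, 4), reject)
```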
B) t Tests
1) Two tail
a) Ho: μ = μo    Ha: μ ≠ μo
b) Decide on an α = probability representing "too unlikely"
c) Test statistic: t = (xbar-μo)/[s/sqrt(n)]    df = n-1
d) p value = Probability of seeing data at least this extreme if Ho is true
= P(Xbar is at least this many SE's away from μo)
= 2 P(T < -t) if t is positive, or 2 P(T < t) if t is negative
e) Reject Ho (accept your claim, Ha) if p value < α or
fail to reject Ho (fail to accept claim, Ha) if p value is NOT < α
f) Type I error - Reject Ho when μ really is = μo
Type II error - Fail to Reject Ho when μ really is ≠ μo
2) One tail
a) Ho: μ ≤ μo    Ha: μ > μo
(or switch inequalities to test Ho: μ ≥ μo)
b) Set α
c) Test statistic: t = (xbar-μo)/[s/sqrt(n)]    df = n-1
d) p value = Probability of seeing data at least this extreme if Ho is true
= P(Xbar is more than this many SE's above μo)
(or below if Ho: μ ≥ μo)
= P(T > t)
(or switch to < if Ho: μ ≥ μo)
e) Reject Ho (accept your claim, Ha) if p value < α or
fail to reject Ho (fail to accept claim, Ha) if p value is not < α
f) Type I error - Reject Ho when μ really is ≤ μo
Type II error - Fail to Reject Ho when μ really is > μo
(or switch inequalities)
3) When to use t tests
a) True SD of variable is NOT known
b) Variable you're sampling from is normally distributed (check normal
quantile plot of sample) OR n is large (> 30)
c) All n observations are independent
C) Situations where z and t tests should NOT be used
1) Observations not independent
2) Data are not sampled from a normal distribution and small sample size (n<30)
YOU SHOULD KNOW FOR EXAMS/QUIZZES:
How to Calculate:
z tables 1. P(Z < z), P(Z > z), P(Z < - z), P(Z > -z)
2. P(Z more than z from 0) = P(Z< -z or Z > z)
3. P(Z within z of 0) = P( -z < Z < z)
4. z* given P(Z < z*) = α
5. z* given P(Z > z*) = α
6. z* given P(Z farther than z* from 0) = α
7. z* given P(Z within z* of 0) = 1-α
8. The z corresponding to a given x
9. The x corresponding to a given z
10. P(a variable, X, is farther than z SD's from μ) given X~N(μ, σ²)
11. P(X < x), P(X > x), P(X < -x), P(X > -x) given x and X~N(μ, σ²)
12. P(X is within z SD's of μ) given z and X~N(μ, σ²)
13. P(Xbar < xbar), P(Xbar > xbar), P(Xbar < -xbar), P(Xbar > -xbar) given xbar, n, σ
and X~Normal or n>30
14. P(Xbar is farther than z SE(xbar)'s from μ) given xbar, n, σ
and X~Normal or n>30
15. P(Xbar is within z SE(xbar)'s of μ) given xbar, n, σ and X~Normal or n>30
16. P(p < some#), P(p > some#), P(p < -some#), P(p > -some#) given some#, π, and n
17. P(p is farther than z SE(p)'s from π) given π and n
18. P(p is within z SE(p)'s of π) given π and n
t tables 19. P(T < t), P(T > t), P(T < - t), P(T > -t) for some given t and n
20. P(T farther than t from 0) for some given t and n
21. P(T within t of 0) for some given t and n
22. t* given n and P(T < t*) = α
23. t* given n and P(T > t*) = α
24. t* given n and P(T farther than t* from 0) = α
25. t* given n and P(T within t* of 0) = 1-α
26. t corresponding to xbar, given n and s and X~Normal with mean = μ
27. xbar corresponding to t, given n and s and X~Normal with mean = μ
28. P(Xbar < xbar), P(Xbar > xbar), P(Xbar < -xbar), P(Xbar > -xbar) given xbar, n, s,
and X~Normal or n>30
29. P(xbar farther than t estimated SE's from μ) given t, n, s and X~Normal or n>30
30. P(xbar within t estimated SE's of μ) given t, n, s and X~Normal or n>30
Standard Errors 31. Standard error of xbar, given σ and n
32. Estimated standard error of xbar, given s and n
33. Standard error of p, given π and n
34. Estimated Standard Error of p, given p and n
Confidence intervals 35. 95% confidence interval for μ given xbar, n, and s
36. 95% confidence interval for π given p and n
How to perform Hypothesis tests: Given description of problem
1. Decide which test to use (if any)
2. Know how to check if conditions of test are satisfied
3. Decide on Ho and Ha
4. Choose between 2 given α's
5. Compute test statistic
6. Compute and interpret p value of test statistic (one tail, two tail)
7. Reject Ho or fail to reject Ho based on α and p value
8. State conclusion in terms of the problem
Facts:
1. Conditions for sample mean to be normally distributed
2. Conditions for sample proportion to be normally distributed
3. When to construct confidence interval for π using z table
4. When to construct confidence interval for μ using z table
5. When to construct confidence interval for μ using t table
6. When to use each test
7. When not to use z or t tests
How to Identify:
1. Type I or Type II error in a given problem
2. Reasons why Xbar might not have normal distribution in a given problem
3. Reasons why p might not have normal distribution in a given problem
4. Reasons why z or t tests should not be used in a given problem
The Definition of:
sample
population
parameter
statistic
xbar
s
p
normal distribution
standard normal distribution
probability of an event
sampling distribution
standard error of Xbar
standard error of p
t distribution
degrees of freedom
independent observations
sampling variability
confidence interval
null hypothesis
alternate hypothesis
Type I error
Type II error
one tail
two tail
p value