Discrete Probability Distributions

ECONOMICS 4630: REVIEW FOR MIDTERM II
General:
The test will consist of problems much like those on your homework sets (see also the problems in the text). In addition, there will be a True/False/Uncertain section. You should be able not only to solve problems but also to demonstrate an understanding of what you are doing and why. I will provide you with a formula sheet (see attached), so there is no need to memorize formulae. Bring a calculator. This midterm covers topics 5–10 on the syllabus (chapters 7–11 in the seventh edition of the textbook).
V. Continuous Probability Distributions
   A. Probability density
   B. Continuous distributions in general
   C. The normal distribution
      1. The standard normal
      2. The general normal
VI. Sampling
   A. Why sample?
   B. Probability sampling methods
      1. Simple random sampling
      2. Systematic random sampling
      3. Stratified random sampling
      4. Stratified cluster sampling
   C. Sampling error and sampling distributions
   D. The central limit theorem
VII. Point Estimates and Confidence Intervals
   B. When σ is known or when the sample size is at least 30
   C. When σ is unknown and the sample size is less than 30
   D. Confidence intervals for proportions
   E. Selecting the proper sample size
VIII. One-Sample Hypothesis Testing
   A. Hypotheses and hypothesis testing in general
   B. Testing procedures
   C. One-tailed tests vs. two-tailed tests
   D. When σ is known
   E. When σ is unknown
   F. Hypothesis testing regarding proportions
IX. Two-Sample Hypothesis Testing
   A. When σ is known
   B. When σ is unknown
   C. Hypothesis testing regarding proportions
FORMULA SHEET
To find percentiles (grouped data)
$$q\text{th percentile} = L + \frac{qn - CF}{f}\,(i)$$
where:
L = the lower limit of the class containing the percentile of interest
q = the percentile of interest, stated in decimal terms (e.g., the 75th percentile would be 0.75)
n = total number of frequencies
f = frequency in the class containing the percentile of interest
CF = cumulative number of frequencies in the classes preceding the class containing the percentile of interest
i = class interval
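As a rough check on the grouped-data formula, the Python sketch below also locates the class that contains the percentile by accumulating frequencies until qn is reached. The function name `grouped_percentile` and the frequency table are hypothetical, chosen only for illustration, and the sketch assumes every class has the same width i.

```python
def grouped_percentile(q, lower_limits, freqs, width):
    """Return L + (q*n - CF) / f * i for grouped data.

    q            -- percentile of interest as a decimal (0.75 = 75th percentile)
    lower_limits -- lower limit L of each class, in ascending order
    freqs        -- frequency f of each class
    width        -- class interval i (assumed equal for every class)
    """
    n = sum(freqs)
    target = q * n              # position qn in the cumulative frequencies
    cf = 0                      # CF: frequencies in the classes before this one
    for L, f in zip(lower_limits, freqs):
        if f > 0 and cf + f >= target:
            # First class whose cumulative frequency reaches qn
            return L + (target - cf) / f * width
        cf += f
    raise ValueError("q must be between 0 and 1")


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40 with n = 50
limits = [0, 10, 20, 30]
freqs = [5, 15, 20, 10]
print(grouped_percentile(0.75, limits, freqs, 10))  # 28.75
```

Here qn = 37.5 and 20 values lie below 20, so L = 20, CF = 20, f = 20, and the percentile is 20 + (17.5/20)(10) = 28.75.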
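To find percentiles (raw data)
$$\text{Position of the } Q\text{th percentile} = (n + 1)\,\frac{Q}{100},$$
where n is the number of observations (arranged from smallest to largest) and Q is the percentile of interest stated in percent terms (e.g., the 75th percentile would be 75).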
Interquartile Range
IQR = Q3 - Q1, where Q1 is the 25th percentile and Q3 is the 75th percentile
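The raw-data position formula and the IQR can be sketched as follows. When the position (n + 1)Q/100 is not a whole number, this sketch interpolates linearly between the two neighboring ordered values; that is an assumed convention (common in textbooks, but check how your text handles it). The data are made up.

```python
def percentile_position(n, Q):
    """Position of the Q-th percentile (Q in percent) among n ordered values."""
    return (n + 1) * Q / 100


def raw_percentile(data, Q):
    xs = sorted(data)                       # formula assumes ascending order
    pos = percentile_position(len(xs), Q)   # 1-based position
    if pos <= 1:
        return xs[0]
    if pos >= len(xs):
        return xs[-1]
    k = int(pos)                            # whole part of the position
    frac = pos - k                          # fractional part
    # Interpolate between the k-th and (k+1)-th ordered values
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


data = [3, 5, 7, 8, 12, 13, 14, 18]         # hypothetical sample, n = 8
q1 = raw_percentile(data, 25)               # position 2.25 -> 5.5
q3 = raw_percentile(data, 75)               # position 6.75 -> 13.75
print(q3 - q1)                              # IQR = 8.25
```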
Linear Combinations of Random Variables
If $Y = a + bX$, then $\mu_Y = a + b\,\mu_X$ and $\sigma_Y = |b|\,\sigma_X$.
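A quick numerical check of these rules on a small, made-up distribution for X (the names `mean_sd`, `X`, `a`, and `b` are illustrative). Using a negative b shows why the standard deviation scales by |b| rather than by b.

```python
from math import sqrt, isclose


def mean_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: prob}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)


X = {0: 0.2, 1: 0.5, 2: 0.3}              # hypothetical distribution of X
a, b = 10, -3
Y = {a + b * x: p for x, p in X.items()}  # Y = a + bX takes values 10, 7, 4

mu_x, sd_x = mean_sd(X)
mu_y, sd_y = mean_sd(Y)
print(mu_y, a + b * mu_x)     # both about 6.7
print(sd_y, abs(b) * sd_x)    # both about 2.1
```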
Special rule of multiplication
If A, B, C, …, Z are events that are independent of one another (that is, the occurrence of any one has no effect on the probability of the occurrence of any other), then
$$P(A \cap B \cap C \cap \dots \cap Z) = P(A)\,P(B)\,P(C)\cdots P(Z).$$
Combination formula
$$\#\text{ of possible combinations} = \frac{n!}{r!\,(n - r)!},$$
where n = # of objects and r = # of objects used at one time.
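The combination formula translates directly into code. This sketch builds it from factorials and compares it with Python's built-in `math.comb` (available since Python 3.8); the committee example is hypothetical.

```python
from math import comb, factorial


def combinations(n, r):
    """Number of ways to choose r objects from n, ignoring order."""
    return factorial(n) // (factorial(r) * factorial(n - r))


# Hypothetical: how many 3-person committees can be formed from 10 people?
print(combinations(10, 3))         # 120
print(comb(10, 3))                 # 120, same answer from the standard library
```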
Unions and Intersections
$$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$$
This is the “general rule of addition.”
If X and Y are mutually exclusive, then
$$P(X \cup Y) = P(X) + P(Y)$$
This is the “special rule of addition.”
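To see both addition rules at work, the sketch below enumerates the outcomes of one roll of a fair die and represents events as Python sets (`&` is intersection, `|` is union). The events chosen are illustrative.

```python
from fractions import Fraction

OUTCOMES = {1, 2, 3, 4, 5, 6}          # one roll of a fair die


def P(event):
    """Classical probability: favorable outcomes / possible outcomes."""
    return Fraction(len(event), len(OUTCOMES))


X = {2, 4, 6}          # roll is even
Y = {4, 5, 6}          # roll is greater than 3
# General rule of addition: subtract the overlap {4, 6}, which was counted twice
print(P(X) + P(Y) - P(X & Y), P(X | Y))    # 2/3 2/3

A = {1, 2}             # mutually exclusive with B
B = {5, 6}
# Special rule of addition: no overlap, so just add
print(P(A) + P(B), P(A | B))               # 2/3 2/3
```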
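Conditional Probability
$$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}$$
or, using probability distribution notation,
$$P(x \mid y) = \frac{P(x, y)}{P(y)}$$
Rearranged as $P(X \cap Y) = P(Y)\,P(X \mid Y)$ (or $P(x, y) = P(y)\,P(x \mid y)$), these give the “general rule of multiplication.”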
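Here is a small sketch of conditional probability from a joint distribution table. The table values and function names are invented for illustration; exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(x, y) for two 0/1 random variables
joint = {
    (0, 0): F(1, 10), (0, 1): F(3, 10),
    (1, 0): F(2, 10), (1, 1): F(4, 10),
}


def p_y(y):
    """Marginal P(y): add P(x, y) over every x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def p_x_given_y(x, y):
    """P(x | y) = P(x, y) / P(y)."""
    return joint[(x, y)] / p_y(y)


print(p_x_given_y(1, 1))                              # 4/7
# General rule of multiplication: P(x, y) = P(y) * P(x | y)
print(p_y(1) * p_x_given_y(1, 1) == joint[(1, 1)])    # True
```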
Independence
Using set notation, two events, X and Y, are independent if
$$P(X \cap Y) = P(X)\,P(Y) \quad\text{or if}\quad P(X \mid Y) = P(X).$$
Using probability distribution notation, X and Y are independent if
$$P(x, y) = P(x)\,P(y) \text{ for all } x, y \quad\text{or if}\quad P(x \mid y) = P(x) \text{ for all } x, y.$$
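The "for all x, y" condition can be checked mechanically: compute both marginals and compare every cell of the joint table with the product of the marginals. The function name and both tables below are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def is_independent(joint):
    """True if P(x, y) == P(x) * P(y) for every pair (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == px[x] * py[y]
               for x, y in product(xs, ys))


# Built as P(x) * P(y) with P(X=1) = 3/4 and P(Y=1) = 3/5, so independent
indep = {(0, 0): F(1, 10), (0, 1): F(3, 20),
         (1, 0): F(3, 10), (1, 1): F(9, 20)}
# P(0, 0) = 1/10, but P(X=0) * P(Y=0) = (4/10)(3/10) = 12/100, so dependent
dep = {(0, 0): F(1, 10), (0, 1): F(3, 10),
       (1, 0): F(2, 10), (1, 1): F(4, 10)}

print(is_independent(indep), is_independent(dep))   # True False
```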
Mean, Variance, and Standard Deviation (population formulae)
$$\mu = \sum x\,P(x)$$
$$\sigma^2 = \sum (x - \mu)^2\,P(x) \quad\text{or}\quad \sigma^2 = \sum x^2\,P(x) - \mu^2$$
$$\sigma = \sqrt{\sigma^2}$$
Mean, Variance, and Standard Deviation (sample formulae)
$$\bar{X} = \frac{1}{n}\sum X_i$$
$$S^2 = \frac{1}{n - 1}\sum \left(X_i - \bar{X}\right)^2$$
$$S = \sqrt{S^2}$$
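The sample formulas divide by n − 1 rather than n. This sketch computes them by hand and compares them with Python's `statistics` module, whose `variance` and `stdev` functions also use n − 1; the data and the name `sample_stats` are hypothetical.

```python
import statistics
from math import sqrt


def sample_stats(xs):
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # n - 1, not n
    return xbar, s2, sqrt(s2)


data = [4, 8, 6, 5, 7]
xbar, s2, s = sample_stats(data)
print(xbar, s2)                        # 6.0 2.5
print(statistics.variance(data))       # 2.5
```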
Covariance
$$\sigma_{XY} = \sum_X \sum_Y (x - \mu_X)(y - \mu_Y)\,P(x, y) \quad\text{or}\quad \sigma_{XY} = \sum_X \sum_Y xy\,P(x, y) - \mu_X \mu_Y$$
Correlation (Discrete Case)
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{COV(X, Y)}{SD(X)\,SD(Y)}$$
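Covariance and correlation for a small joint distribution can be computed straight from the formulas above. The joint table is hypothetical; the sketch checks that both covariance formulas agree and that ρ falls between −1 and 1.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical joint distribution P(x, y)
joint = {(0, 0): F(1, 10), (0, 1): F(3, 10),
         (1, 0): F(2, 10), (1, 1): F(4, 10)}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())

# Definitional and shortcut forms of the covariance
cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
cov_short = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y
rho = cov_def / (sqrt(var_x) * sqrt(var_y))

print(cov_def, cov_short)      # -1/50 -1/50
print(round(rho, 4))           # -0.0891
```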
Uniform Probability Distribution
$$P(x) = \frac{1}{s},$$
where x = a, a+1, a+2, …, a+(s−1), and where a and s are integers and s > 0. a is the smallest value X takes on; s is the total number of possible values that X can take on.
Mean and variance of uniform
$$\mu = a + \frac{s - 1}{2}, \qquad \sigma^2 = \frac{s^2 - 1}{12}$$
Binomial Probability Distribution
$$P(x) = \binom{n}{x}\,\pi^x (1 - \pi)^{n - x}$$
where:
π = probability of success
n = # of trials
X = # of successes in n trials
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!} = \frac{n(n-1)(n-2)\cdots(1)}{[x(x-1)(x-2)\cdots(1)]\,[(n-x)(n-x-1)(n-x-2)\cdots(1)]}$$
Mean and variance of binomial
$$\mu_X = n\pi, \qquad \sigma_X^2 = n\pi(1 - \pi)$$
Hypergeometric Probability Distribution
$$P(X) = \frac{\dfrac{S!}{X!\,(S - X)!}\cdot\dfrac{(N - S)!}{(n - X)!\,[(N - S) - (n - X)]!}}{\dfrac{N!}{n!\,(N - n)!}}$$
where
S = number of successes in the population
n = sample size (# of trials)
N = population size
N − S = # of failures in the population
X = # of successes in the sample
Mean and variance of hypergeometric
$$\mu_X = \frac{nS}{N}, \qquad \sigma_X^2 = \frac{nS(N - S)}{N^2}\cdot\frac{N - n}{N - 1}$$
Poisson Probability Distribution
$$P(x) = \frac{\mu^x e^{-\mu}}{x!},$$
where e = 2.7183
X = # of successes
μ = average (mean) number of successes
Mean and variance of Poisson
$$\mu_X = \mu, \qquad \sigma_X^2 = \mu$$
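A sketch of the Poisson formula, with a numerical check that the mean and the variance both equal μ. The value μ = 2 is hypothetical, `math.exp` stands in for e = 2.7183, and truncating the infinite sum at x = 60 is an assumption that is harmless here because the probabilities beyond that point are negligible for μ = 2.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)


mu = 2.0                                   # hypothetical mean number of successes
xs = range(61)                             # truncate the infinite sum at x = 60
mean = sum(x * poisson_pmf(x, mu) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)

print(round(poisson_pmf(0, mu), 4), round(poisson_pmf(3, mu), 4))   # 0.1353 0.1804
print(round(mean, 6), round(var, 6))                                # 2.0 2.0
```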
Standard Normal Probability Distribution
$$P(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
General Normal Probability Distribution
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
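The two density formulas can be coded as one function, since the standard normal is the special case μ = 0, σ = 1. The result is the height of the density curve, not a probability; probabilities come from areas under the curve. The sketch (with an illustrative function name) compares against `statistics.NormalDist` from Python's standard library (3.8+).

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-0.5 * ((x - mu)/sigma)^2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


print(round(normal_pdf(0), 4))                            # 0.3989, peak of the standard normal
print(normal_pdf(70, 60, 8), NormalDist(60, 8).pdf(70))   # should agree
```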
‘Standardizing’ a Normally Distributed Random Variable
If X is distributed normally with mean μ and standard deviation σ, then
$$Z = \frac{X - \mu}{\sigma}$$
will be distributed standard normal.
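Standardizing is what lets one standard normal table serve every normal distribution. In the sketch below (a hypothetical X with μ = 60 and σ = 8), the cumulative area that a z table would give comes from `NormalDist().cdf`.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Z = (x - mu) / sigma."""
    return (x - mu) / sigma


z = standardize(70, mu=60, sigma=8)    # 1.25
p_below = NormalDist().cdf(z)          # area to the left of z under N(0, 1)
print(z, round(p_below, 4))            # 1.25 0.8944
```

The same area comes out of `NormalDist(60, 8).cdf(70)` without standardizing, which is a useful cross-check on table work.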
The Central Limit Theorem
In repeated random samples of a given size n, the sampling distribution of the sample mean is approximately normal (the approximation improves as n grows), with mean μ and standard deviation
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
(If σ is unknown, we use our estimate of it, S.)
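A short simulation illustrates the theorem. It draws many samples of size n = 30 from a fair die (a decidedly non-normal population with μ = 3.5) and compares the spread of the sample means with σ/√n. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(4630)                     # fixed seed so the run is reproducible
population = list(range(1, 7))        # fair die: mu = 3.5, sigma = sqrt(35/12)
sigma = pstdev(population)            # population standard deviation
n, reps = 30, 20_000

# Mean of each of `reps` random samples of size n (drawn with replacement)
sample_means = [sum(random.choices(population, k=n)) / n for _ in range(reps)]

print(round(mean(sample_means), 2))                               # close to mu = 3.5
print(round(stdev(sample_means), 3), round(sigma / sqrt(n), 3))   # close to each other
```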
Margin of Error
$$E = z\,\frac{S}{\sqrt{n}},$$
where E is the tolerable margin of error,
z is the critical value associated with a given confidence level from the standard normal table,
S is the sample standard deviation, and
n is the sample size.
NOTE: If the sample size is less than 30, replace z with t, the critical value from the t-table.
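Solving E = zS/√n for n gives n = (zS/E)², which is how a required sample size is usually found; since n must be a whole number, the sketch rounds up. The numbers (z = 1.96, S = 15) and the function names are illustrative.

```python
from math import ceil, sqrt


def margin_of_error(z, s, n):
    """E = z * S / sqrt(n)."""
    return z * s / sqrt(n)


def required_sample_size(z, s, E):
    """Smallest whole n with z * S / sqrt(n) <= E, i.e. n >= (z * S / E)^2."""
    return ceil((z * s / E) ** 2)


print(round(margin_of_error(1.96, 15, 100), 2))   # 2.94
print(required_sample_size(1.96, 15, 2))          # 217, since (14.7)^2 = 216.09 rounds up
```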
Confidence Intervals, Large Samples
$$\bar{X} \pm z\,\frac{S}{\sqrt{n}},$$
where z depends on the level of confidence (for example, if α = 0.05, z = 1.96).
NOTE: If σ is known, substitute it for S.
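The large-sample interval as a Python sketch. Instead of a table lookup, it takes z from `NormalDist().inv_cdf`, which gives about 1.95996 for 95% confidence (the table's 1.96). The sample values and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, confidence=0.95):
    """X-bar +/- z * S / sqrt(n), with z the two-tailed critical value."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 when alpha = 0.05
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


lo, hi = z_interval(xbar=50, s=12, n=36)
print(round(lo, 2), round(hi, 2))              # 46.08 53.92
```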
Confidence Intervals, Small Samples
$$\bar{X} \pm t\,\frac{S}{\sqrt{n}},$$
where t depends on the level of confidence (for example, if α = 0.01 and degrees of freedom = 9, t = 3.25). NOTE: Use n − 1 degrees of freedom.
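Python's standard library has no t-distribution quantile function, so this sketch takes t as an argument and copies a few two-tailed 0.05 values from the t table on this sheet into a small dictionary (the dictionary and function name are just for illustration). The data are hypothetical.

```python
from math import sqrt

# 95% confidence (two-tailed 0.05) critical values, copied from the t table
T_95 = {9: 2.262, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}


def t_interval(xbar, s, n, t_crit):
    """X-bar +/- t * S / sqrt(n); t_crit must use n - 1 degrees of freedom."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


n = 10
lo, hi = t_interval(xbar=20, s=4, n=n, t_crit=T_95[n - 1])
print(round(lo, 2), round(hi, 2))     # 17.14 22.86
```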
Confidence Intervals, Proportions (Large Samples)
$$p \pm z\sqrt{\frac{p(1 - p)}{n}},$$
where z depends on the level of confidence (for example, if α = 0.05, z = 1.96), and p is the sample proportion.
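The proportion interval in the same style. The values p = 0.40 and n = 200 are a made-up example, the function name is illustrative, and z again comes from `NormalDist().inv_cdf` rather than the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p, n, confidence=0.95):
    """p +/- z * sqrt(p (1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


lo, hi = proportion_interval(0.40, 200)
print(round(lo, 3), round(hi, 3))     # 0.332 0.468
```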
Confidence Intervals, Proportions (Small Samples)
$$p \pm t\sqrt{\frac{p(1 - p)}{n}},$$
where t depends on the level of confidence (for example, if α = 0.01 and degrees of freedom = 9, t = 3.25), and p is the sample proportion. NOTE: Use n − 1 degrees of freedom.
Test Statistic (for large sample sizes)
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \quad\text{or, if } \sigma \text{ is unknown,}\quad z = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
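Computing the z statistic and comparing it with a critical value might look like this. The hypothetical sample has X̄ = 103, S = 12, and n = 64, testing H₀: μ = 100 against a two-tailed alternative at α = 0.05; the p-value line goes slightly beyond the formula sheet and is optional.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(xbar, mu0, sd, n):
    """z = (X-bar - mu) / (sd / sqrt(n)); pass sigma if known, else S."""
    return (xbar - mu0) / (sd / sqrt(n))


z = z_statistic(xbar=103, mu0=100, sd=12, n=64)   # 3 / 1.5 = 2.0
critical = 1.96                                    # two-tailed, alpha = 0.05
print(z, abs(z) > critical)                        # 2.0 True -> reject H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(p_value, 4))                           # 0.0455
```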
Test Statistic, Comparing Two Means (for large sample sizes)
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
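The two-sample z statistic as a sketch; the summary statistics for the two hypothetical samples and the function name are invented.

```python
from math import sqrt


def two_sample_z(x1, x2, s1, s2, n1, n2):
    """z = (X1-bar - X2-bar) / sqrt(S1^2/n1 + S2^2/n2)."""
    return (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


z = two_sample_z(x1=52, x2=49, s1=8, s2=6, n1=50, n2=50)
print(round(z, 3))      # 2.121, i.e. 3 / sqrt(64/50 + 36/50) = 3 / sqrt(2)
```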
Test Statistic, Proportions (for large sample sizes)
$$z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}},$$
where p is the sample proportion and π is the population proportion.
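For proportions, the standard error uses the hypothesized π, not the sample p, exactly as in the formula above. Hypothetical example: 224 successes in n = 400 (p = 0.56) tested against H₀: π = 0.5; the function name is illustrative.

```python
from math import sqrt


def proportion_z(p, pi0, n):
    """z = (p - pi) / sqrt(pi (1 - pi) / n), with pi the hypothesized proportion."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)


z = proportion_z(p=224 / 400, pi0=0.5, n=400)
print(round(z, 2))      # 2.4, since 0.06 / 0.025 = 2.4
```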
Test statistic, t-distribution (for small samples)
$$t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}},$$
where μ₀ is the value of the mean hypothesized under the null.
NOTE: Use n − 1 degrees of freedom.
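A one-sample t statistic from raw data, for a hypothetical sample of n = 10 and H₀: μ = 12. The critical value 2.262 (df = 9, two-tailed α = 0.05) is read from the t table on this sheet; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data, mu0):
    """t = (X-bar - mu0) / (S / sqrt(n)), with S computed using n - 1."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))


data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # X-bar = 13.5, S^2 = 2.5
t = t_statistic(data, mu0=12)
df = len(data) - 1
print(round(t, 3), df, abs(t) > 2.262)   # 3.0 9 True -> reject H0
```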
Test statistic comparing two means (for small sample sizes)
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad\text{where}\quad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
NOTE: Use (n₁ + n₂ − 2) degrees of freedom.
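The pooled-variance t statistic, sketched with invented summary statistics for two small samples. The critical value 2.086 (df = 20, two-tailed α = 0.05) comes from the t table on this sheet; the function name is illustrative.

```python
from math import sqrt


def pooled_t(x1, x2, s1, s2, n1, n2):
    """Return (t, Sp^2, df) for the pooled two-sample t test."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, n1 + n2 - 2


t, sp2, df = pooled_t(x1=25, x2=22, s1=4, s2=5, n1=10, n2=12)
print(round(sp2, 2), round(t, 3), df)   # 20.95 1.531 20
print(abs(t) > 2.086)                   # False -> do not reject H0
```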
Test Statistic, Proportions (for small sample sizes)
$$t = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}},$$
where p is the sample proportion and π is the population proportion.
NOTE: Use (n − 1) degrees of freedom.
Test statistic for correlation coefficient, ρ
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}},$$
where r is the sample correlation coefficient.
NOTE: Use (n − 2) degrees of freedom.
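The correlation test statistic in code, for a hypothetical sample correlation r = 0.6 from n = 12 pairs, testing H₀: ρ = 0 (the usual null for this statistic). The value 2.228 is the two-tailed 0.05 critical value for df = 10 from the t table; the function name is illustrative.

```python
from math import sqrt


def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t = correlation_t(0.6, 12)
print(round(t, 3), abs(t) > 2.228)   # 2.372 True -> reject H0: rho = 0
```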
Binomial Coefficients
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
n\x       0       1       2       3       4       5       6       7       8       9      10
  1       1       1
  2       1       2       1
  3       1       3       3       1
  4       1       4       6       4       1
  5       1       5      10      10       5       1
  6       1       6      15      20      15       6       1
  7       1       7      21      35      35      21       7       1
  8       1       8      28      56      70      56      28       8       1
  9       1       9      36      84     126     126      84      36       9       1
 10       1      10      45     120     210     252     210     120      45      10       1
 11       1      11      55     165     330     462     462     330     165      55      11
 12       1      12      66     220     495     792     924     792     495     220      66
 13       1      13      78     286     715   1,287   1,716   1,716   1,287     715     286
 14       1      14      91     364   1,001   2,002   3,003   3,432   3,003   2,002   1,001
 15       1      15     105     455   1,365   3,003   5,005   6,435   6,435   5,005   3,003
 16       1      16     120     560   1,820   4,368   8,008  11,440  12,870  11,440   8,008
 17       1      17     136     680   2,380   6,188  12,376  19,448  24,310  24,310  19,448
 18       1      18     153     816   3,060   8,568  18,564  31,824  43,758  48,620  43,758
 19       1      19     171     969   3,876  11,628  27,132  50,388  75,582  92,378  92,378
 20       1      20     190   1,140   4,845  15,504  38,760  77,520 125,970 167,960 184,756
Note: 0! = 1
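Each entry in the binomial coefficient table is the sum of the two entries above it in the previous row (Pascal's rule), which gives an easy way to regenerate or spot-check it. For x greater than 10, which the table omits, the symmetry C(n, x) = C(n, n − x) applies. The sketch below (function name illustrative) rebuilds the rows and compares them with `math.comb`.

```python
from math import comb


def pascal_rows(max_n):
    """Rows 0..max_n of Pascal's triangle; rows[n][x] is C(n, x)."""
    rows = [[1]]
    for _ in range(max_n):
        prev = rows[-1]
        # Interior entries: C(n, x) = C(n-1, x-1) + C(n-1, x)
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


rows = pascal_rows(20)
print(rows[20][10])     # 184756, the last entry in the table
print(all(rows[n][x] == comb(n, x) for n in range(21) for x in range(n + 1)))  # True
```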
Student’s t Distribution
df        80%     90%     95%     98%     99%   99.9%   Confidence Intervals
        0.100   0.050   0.025   0.010   0.005  0.0005   Level of Significance for One-Tailed Test
         0.20    0.10    0.05    0.02    0.01   0.001   Level of Significance for Two-Tailed Test
  1     3.078   6.314  12.706  31.821  63.657 636.619
  2     1.886   2.920   4.303   6.965   9.925  31.599
  3     1.638   2.353   3.182   4.541   5.841  12.924
  4     1.533   2.132   2.776   3.747   4.604   8.610
  5     1.476   2.015   2.571   3.365   4.032   6.869
  6     1.440   1.943   2.447   3.143   3.707   5.959
  7     1.415   1.895   2.365   2.998   3.499   5.408
  8     1.397   1.860   2.306   2.896   3.355   5.041
  9     1.383   1.833   2.262   2.821   3.250   4.781
 10     1.372   1.812   2.228   2.764   3.169   4.587
 11     1.363   1.796   2.201   2.718   3.106   4.437
 12     1.356   1.782   2.179   2.681   3.055   4.318
 13     1.350   1.771   2.160   2.650   3.012   4.221
 14     1.345   1.761   2.145   2.624   2.977   4.140
 15     1.341   1.753   2.131   2.602   2.947   4.073
 16     1.337   1.746   2.120   2.583   2.921   4.015
 17     1.333   1.740   2.110   2.567   2.898   3.965
 18     1.330   1.734   2.101   2.552   2.878   3.922
 19     1.328   1.729   2.093   2.539   2.861   3.883
 20     1.325   1.725   2.086   2.528   2.845   3.850
 21     1.323   1.721   2.080   2.518   2.831   3.819
 22     1.321   1.717   2.074   2.508   2.819   3.792
 23     1.319   1.714   2.069   2.500   2.807   3.768
 24     1.318   1.711   2.064   2.492   2.797   3.745
 25     1.316   1.708   2.060   2.485   2.787   3.725
 26     1.315   1.706   2.056   2.479   2.779   3.707
 27     1.314   1.703   2.052   2.473   2.771   3.690
 28     1.313   1.701   2.048   2.467   2.763   3.674
 29     1.311   1.699   2.045   2.462   2.756   3.659
 30     1.310   1.697   2.042   2.457   2.750   3.646
 40     1.303   1.684   2.021   2.423   2.704   3.551
 60     1.296   1.671   2.000   2.390   2.660   3.460
120     1.289   1.658   1.980   2.358   2.617   3.373
  ∞     1.282   1.645   1.960   2.326   2.576   3.291