Probability Distributions

Statistics and Quantitative Analysis U4320
Segment 4: Statistics and Quantitative Analysis
Prof. Sharyn O’Halloran
Probability Distributions

A. Distributions: How do simple
probability tables relate to distributions?

1. What is the probability of getting a head? (1 coin toss)
[Figure: probability (1/2) plotted against proportion of heads (0 and 1).]

2. Now say we flip the coin twice. The picture now looks like:
[Figure: probability against proportion of heads: 1/4 at 0 (0 heads), 1/2 at 1/2 (1 head), 1/4 at 1 (2 heads).]

As the number of coin tosses increases, the distribution looks like a bell-shaped curve:
[Figure: bell-shaped curve over proportion of heads, with axis marks at 0, 1/2, and 1.]

3. General: Normal Distribution


Probability distributions are idealized bar graphs or
histograms. As we get more and more tosses, the
probability of any one observation falls to zero.
Thus, the final result is a bell-shaped curve
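
The progression from one toss to many can be sketched numerically. This is a minimal illustration, assuming a fair coin; the helper name `heads_distribution` is made up for this example.

```python
from math import comb

def heads_distribution(n):
    """Return {proportion of heads: probability} for n tosses of a fair coin."""
    return {k / n: comb(n, k) / 2**n for k in range(n + 1)}

one_toss = heads_distribution(1)     # bars at 0 and 1
two_tosses = heads_distribution(2)   # bars at 0, 1/2 and 1

# With many tosses the bars trace a bell shape and each single bar shrinks.
many = heads_distribution(100)
largest_bar = max(many.values())

print(one_toss)
print(two_tosses)
print(round(largest_bar, 4))
```

Even the tallest bar for 100 tosses is well below the 1/2 seen for one or two tosses, which is the sense in which the probability of any one observation falls toward zero.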

B. Properties of a Normal Distribution

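1. Formulas: Mean and Variance

Mean
Population: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
Sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

Variance
Population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$
Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$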

2. Note Difference with the Book

The population mean is written as:
$\mu = \sum_x x \, p(x)$,

Variance as:
$\sigma^2 = \sum_x (x - \mu)^2 \, p(x)$.

Example: Two tosses of a coin
Number of Heads x    Probability p(x)
0                    1/4
1                    1/2
2                    1/4

So the average, or expected, number of heads in
two tosses of a coin is:
0*1/4 + 1*1/2 + 2*1/4 = 1.
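
The same calculation, together with the variance from the book's formula, can be checked with exact fractions. This is a sketch using the two-toss table above; the variable names are illustrative.

```python
from fractions import Fraction

# Probability table for the number of heads in two tosses.
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean = sum(x * px for x, px in p.items())                    # sum of x * p(x)
variance = sum((x - mean) ** 2 * px for x, px in p.items())  # sum of (x - mean)^2 * p(x)

print(mean, variance)
```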

3. Expected Value
$E(x) = \mu$ = Average or Mean
$E[(x - \mu)^2] = \sigma^2$ = Variance (the expected squared deviation from the mean)

C. Standard Normal Distribution

1. Definition: a normal distribution with mean 0 and standard deviation 1.

Z-values are points on the x-axis that show how many standard deviations that point is away from the mean $\mu$.
[Figure: standard normal curve over Z-values; total area under the curve = 1.]

2. Characteristics

symmetric
unimodal
continuous distribution

3. Example: Heights of people are normally distributed with mean 5'7".

What is the proportion of people taller than 5'7"?
[Figure: normal curve centered at 5'7"; total area under the curve = 1, area on one side of the mean = 1/2.]

D. How to Calculate Z-scores


Definition: Z-value is the number of standard deviations away from the mean.
Definition: Z-tables give the probability of observing a value beyond a particular z-value (the area in the tail).

1. What is the area under the curve that is
greater than 1 ? Prob (Z>1)

The entry in the table is 0.159, which is the total
area to the right of 1.
[Figure: standard normal curve with the area 0.159 to the right of z = 1; total area under the curve = 1.]

2. What is the area to the right of 1.64? Prob
(Z > 1.64)

The table gives 0.051, or about 5%.
[Figure: standard normal curve with the area 0.051 to the right of 1.64; total area under the curve = 1.]

3. What is the area to the left of -1.64? Prob (Z < -1.64)

Because the curve is symmetric, this equals the area to the right of 1.64: 0.051.
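
The tail areas read from the table can be reproduced without a table. This sketch uses the complementary error function from Python's standard library; the function names `upper_tail` and `lower_tail` are made up for this example.

```python
from math import erfc, sqrt

def upper_tail(z):
    """Prob(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

def lower_tail(z):
    """Prob(Z < z); by symmetry this equals Prob(Z > -z)."""
    return upper_tail(-z)

print(round(upper_tail(1.0), 3))
print(round(upper_tail(1.64), 3))
print(round(lower_tail(-1.64), 3))
```

The computed values agree with the table entries up to rounding in the third decimal place.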

4. What is the probability that an observation lies between 0 and 1? Prob (0 < Z < 1)

Prob (Z > 0) - Prob (Z > 1) = 0.50 - 0.159 = 0.341

5. How would you figure out the area between 1 and 1.5 on the graph? Prob (1 < Z < 1.5)

Prob (Z > 1) - Prob (Z > 1.5) = 0.159 - 0.067 = 0.092

6. What is the area between -1 and 2? Prob (-1 < Z < 2)

P (-1 < Z < 0) = .341
P (0 < Z < 2) = .50 - .023 = .477
P (-1 < Z < 2) = .341 + .477 = .818

7. What is the area between -2 and 2?
Prob(-2<Z<2)
1- Prob (Z< -2) - Prob (Z>2) = 1 - .023 - .023 = .954
[Figure: standard normal curve with tail areas of 0.023 to the left of -2.00 and to the right of 2.00.]
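
Every "area between" question above follows one pattern: subtract the upper-tail area at the right endpoint from the upper-tail area at the left endpoint. A minimal sketch, with `area_between` as a hypothetical helper name:

```python
from math import erfc, sqrt

def upper_tail(z):
    """Prob(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

def area_between(a, b):
    """Prob(a < Z < b), computed as Prob(Z > a) - Prob(Z > b)."""
    return upper_tail(a) - upper_tail(b)

for a, b in [(0, 1), (1, 1.5), (-1, 2), (-2, 2)]:
    print(a, b, round(area_between(a, b), 3))
```

Small differences in the last digit relative to the slides come from the table values being rounded before they are added or subtracted.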

E. Standardization
1. Standard Normal Distribution: a very special case where the mean of the distribution equals 0 and the standard deviation equals 1.

2. Case 1: Standard deviation differs from 1
For a normal distribution with mean 0 and some standard deviation $\sigma$, you can convert any point x to the standard normal distribution by changing it to $x/\sigma$.
[Figure: two normal curves centered at 0, one with SD = 1 and one with SD = 3, with the points -2.00', -2.00, 2.00, and 2.00' marked on the axis.]

3. Case 2: Mean differs from 0
So starting with any normal distribution with mean $\mu$ and standard deviation 1, you can convert to a standard normal by taking $x - \mu$ and using this as your Z-value.
Now, for a distribution with mean 50 and SD = 1, what would be the area under the graph between 50 and 51?

Prob (0 < Z < 1) = .341

4. General Case: Mean not equal to 0 and SD not equal to 1
Say you have a normal distribution with mean $\mu$ and standard deviation $\sigma$. You can convert any point x in that distribution to the same point in the standard normal by computing

$Z = \frac{x - \mu}{\sigma}$.

This is called standardization.
The Z-value corresponds to x.
The Z-table lets you look up the probability associated with any Z-value.
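
The three cases reduce to a single formula. A short sketch, where `standardize` is a hypothetical helper, the point 6 in the first call is made up, and the second call assumes the Case 2 distribution has mean 50:

```python
def standardize(x, mean, sd):
    """Convert a point x from a normal distribution with the given mean and SD to a Z-value."""
    return (x - mean) / sd

print(standardize(6, 0, 3))    # Case 1: mean 0, SD 3
print(standardize(51, 50, 1))  # Case 2: mean 50 (assumed), SD 1
```

Case 1 is the general formula with mean 0, and Case 2 is the general formula with SD 1.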

5. Trout Example:

a. The lengths of trout caught in a lake are normally distributed with mean 9.5" and standard deviation 1.4". There is a law that you can't keep any fish below 12". What percent of the trout can be kept (that is, are longer than 12")?

Step 1: Standardize

Find the Z-score of 12 for Prob (x > 12):
Z = (12 - 9.5) / 1.4 = 1.79.

Step 2: Find the probability

Find Prob (Z > 1.79)
Look up 1.79 in your table; only .037, or about 4%, of the fish could be kept.

b. Now they're thinking of changing the standard to 10" instead of 12". What proportion of fish could be kept under the new limit?

Step 1: Standardize

Prob (x > 10)
Z = (10 - 9.5) / 1.4 = 0.36.

Step 2: Find the probability: Prob (Z > .36)
In your tables, this gives .359, so almost 36% of the fish could be kept under the new law.
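
Both parts of the trout example can be checked in a few lines. This is a sketch under the example's stated mean and standard deviation; `prob_longer_than` is a made-up name, and the tail area is computed with `erfc` rather than read from a table, so it uses the unrounded Z-value.

```python
from math import erfc, sqrt

def prob_longer_than(limit, mean=9.5, sd=1.4):
    """Return (Z-value, Prob(length > limit)) for normally distributed trout lengths."""
    z = (limit - mean) / sd          # Step 1: standardize
    p = 0.5 * erfc(z / sqrt(2))      # Step 2: upper-tail area for that Z-value
    return z, p

z12, p12 = prob_longer_than(12)
z10, p10 = prob_longer_than(10)
print(round(z12, 2), round(p12, 3))
print(round(z10, 2), round(p10, 3))
```

The probabilities differ slightly from the table answers in the third decimal because the table lookup rounds Z to two decimals first.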
Joint Distributions

A. Probability Tables

1. Example: Toss a coin 3 times. How many heads and how many runs do we observe?

Def: A run is a sequence of one or more of the same event in a row.

Possible outcomes
Toss    Probability    Heads x    Runs y
TTT     1/8            0          1
TTH     1/8            1          2
THT     1/8            1          3
THH     1/8            2          2
HTT     1/8            1          2
HTH     1/8            2          3
HHT     1/8            2          2
HHH     1/8            3          1
2. Joint Distribution Table
                    Heads x
Runs y      0      1      2      3      Total
1           1/8    0      0      1/8    1/4
2           0      1/4    1/4    0      1/2
3           0      1/8    1/8    0      1/4
Total       1/8    3/8    3/8    1/8    1
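
The joint table can be rebuilt by listing all eight outcomes and counting heads and runs in each. A minimal sketch, assuming a fair coin; `count_runs` is a hypothetical helper built on `itertools.groupby`.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby, product

def count_runs(tosses):
    """Number of runs: maximal stretches of the same face in a row."""
    return sum(1 for _ in groupby(tosses))

joint = Counter()
for tosses in product("TH", repeat=3):   # TTT, TTH, ..., HHH
    x = tosses.count("H")                # number of heads
    y = count_runs(tosses)               # number of runs
    joint[(x, y)] += Fraction(1, 8)      # each outcome has probability 1/8

for (x, y), p in sorted(joint.items()):
    print(x, y, p)
```

Combinations that never occur, such as 3 heads with 3 runs, are simply absent from the counter and read back as 0.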
3. Definition: The joint probability of x and y is the probability that both x and y occur.
p(x,y) = Pr(X and Y)
p(0, 1) = 1/8, p(1, 2) = 1/4, and p(3, 3) = 0.
B. Marginal Probabilities

1. Def: Marginal probability is the sum of the joint probabilities across a row or down a column of the table. It is the overall probability of an event occurring.

$p(x) = \sum_y p(x, y)$.

So the probability that there is just 1 head is the prob of 1 head and 1 run + 1 head and 2 runs + 1 head and 3 runs
= 0 + 1/4 + 1/8 = 3/8
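
Summing the joint probabilities over the other variable gives each marginal distribution. A sketch with the nonzero cells of the heads-and-runs table typed in by hand; the names `p_x` and `p_y` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

# Nonzero cells of the joint table, keyed by (heads x, runs y).
joint = {(0, 1): F(1, 8), (1, 2): F(1, 4), (1, 3): F(1, 8),
         (2, 2): F(1, 4), (2, 3): F(1, 8), (3, 1): F(1, 8)}

p_x = defaultdict(F)  # marginal distribution of heads
p_y = defaultdict(F)  # marginal distribution of runs
for (x, y), p in joint.items():
    p_x[x] += p       # sum over y for each x
    p_y[y] += p       # sum over x for each y

print(dict(p_x))
print(dict(p_y))
```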
C. Independence

A and B are independent if P(A|B) = P(A).
$P(A|B) = \frac{P(A \,\&\, B)}{P(B)}$;
$P(A) = P(A|B) \Rightarrow P(A, B) = P(A)\,P(B)$.
Are the # of heads and the # of runs independent?

                      # Runs y
# heads x      1      2      3      marg dist
0                                   1/8
1                                   3/8
2                                   3/8
3                                   1/8
marg dist      1/4    1/2    1/4    1
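
If heads and runs were independent, every cell of the joint table would equal the product of its two marginal probabilities. The check below is a sketch using the joint probabilities from the three-toss example; the variable names are illustrative.

```python
from fractions import Fraction as F

# Nonzero cells of the joint table, keyed by (heads x, runs y).
joint = {(0, 1): F(1, 8), (1, 2): F(1, 4), (1, 3): F(1, 8),
         (2, 2): F(1, 4), (2, 3): F(1, 8), (3, 1): F(1, 8)}

p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in range(4)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in range(1, 4)}

# Compare each cell with the product of its marginals.
mismatches = [(x, y) for x in p_x for y in p_y
              if joint.get((x, y), F(0)) != p_x[x] * p_y[y]]
independent = not mismatches

print(independent)
print(joint[(0, 1)], p_x[0] * p_y[1])
```

For instance, p(0, 1) is 1/8 while the product of the marginals is 1/8 × 1/4 = 1/32, so the two variables are not independent.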
Correlation and
Covariance

A. Covariance

1. Definition of Covariance

Covariance is defined as the expected value of the product of the differences from the means.

$\sigma_{x,y} = E[(X - \mu_x)(Y - \mu_y)] = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)(Y_i - \mu_y) = \sum_x \sum_y (x - \mu_x)(y - \mu_y)\, p(x, y)$.
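
The probability-weighted form of the covariance can be applied directly to a joint table. As a sketch, this uses the heads-and-runs joint distribution from the three-toss example; the variable names are illustrative.

```python
from fractions import Fraction as F

# Nonzero cells of the joint table, keyed by (heads x, runs y).
joint = {(0, 1): F(1, 8), (1, 2): F(1, 4), (1, 3): F(1, 8),
         (2, 2): F(1, 4), (2, 3): F(1, 8), (3, 1): F(1, 8)}

mu_x = sum(x * p for (x, _), p in joint.items())   # expected number of heads
mu_y = sum(y * p for (_, y), p in joint.items())   # expected number of runs

# Sum of (x - mu_x)(y - mu_y) p(x, y) over all cells.
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

print(mu_x, mu_y, cov)
```

The covariance comes out to exactly 0 here even though heads and runs are not independent, which shows that zero covariance does not by itself imply independence.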

2.Graph

B. Correlation

1. Definition of Correlation

$\rho = \frac{\sigma_{x,y}}{\sigma_x \sigma_y} = \frac{\text{Covariance}}{SD_x \times SD_y}$
2. Characteristics of Correlation

$-1 \le \rho \le 1$

if $\rho = 1$ then all points lie on an upward-sloping straight line
[Figure: y plotted against x.]

if $\rho = -1$ then all points lie on a downward-sloping straight line
[Figure: y plotted against x.]

if $\rho = 0$ then there is no linear relationship between x and y
[Figure: y plotted against x.]

Why?

$\rho = \frac{\sigma_{x,y}}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y) / N}{\sqrt{\sum_{i=1}^{N} (x_i - \mu_x)^2 / N}\ \sqrt{\sum_{i=1}^{N} (y_i - \mu_y)^2 / N}}$
Sample Homework
GET /FILE 'gss91.sys'.
The SPSS/PC+ system file is read from file gss91.sys
The SPSS/PC+ system file contains 1517 cases, each consisting of 203 variables (including system variables).
203 variables will be used in this session.
------------------------------
COMPUTE AFFAIRS = XMARSEX.
RECODE AFFAIRS (0,5,8,9 = SYSMIS) (1,2 = 0) (3,4 = 1).
VALUE LABELS AFFAIRS 0 'BAD' 1 'OK'.
The raw data or transformation pass is proceeding
1517 cases are written to the compressed active file.
***** Memory allows a total of 10345 Values, accumulated across all Variables.
There also may be up to 1293 Value Labels for each Variable.
------------------------------------------------------------------------------
AFFAIRS
                                               Valid      Cum
Value Label      Value   Frequency   Percent   Percent   Percent
BAD                .00         870      57.4      90.2      90.2
OK                1.00          94       6.2       9.8     100.0
                     .         553      36.5   Missing
                           -------   -------   -------
                 Total        1517     100.0     100.0
------------------------------------------------------------------------------
AFFAIRS
Mean            .098     Std err          .010     Median        .000
Mode            .000     Std dev          .297     Variance      .088
Kurtosis       5.398     S E Kurt         .157     Skewness     2.718
S E Skew        .079     Range           1.000     Minimum       .000
Maximum        1.000     Sum            94.000
Valid cases      964     Missing cases     553
COMPUTE MONEY = INCOME91.
RECODE MONEY (0,22,98,99 = SYSMIS) (1 THRU 15 = 0) (16 THRU 21 = 1).
VALUE LABELS MONEY 0 'LOW' 1 'HIGH'.
FREQUENCIES /VARIABLES AFFAIRS MONEY /STATISTICS ALL.
MONEY
                                               Valid      Cum
Value Label      Value   Frequency   Percent   Percent   Percent
LOW                .00         787      51.9      57.5      57.5
HIGH              1.00         581      38.3      42.5     100.0
                     .         149       9.8   Missing
                           -------   -------   -------
                 Total        1517     100.0     100.0
------------------------------------------------------------------------------
MONEY
Mean            .425     Std err          .013     Median        .000
Mode            .000     Std dev          .494     Variance      .245
Kurtosis      -1.910     S E Kurt         .132     Skewness      .305
S E Skew        .066     Range           1.000     Minimum       .000
Maximum        1.000     Sum           581.000
Valid cases     1368     Missing cases     149
CROSSTABS /TABLES=MONEY BY AFFAIRS /CELLS /STATISTICS=CORR.
Memory allows for 7,021 cells with 2 dimensions for general CROSSTABS.
------------------------------------------------------------------------------
MONEY by AFFAIRS
                            AFFAIRS
MONEY                    BAD        OK     ROW TOTAL
LOW      COUNT           450        55           505
         ROW %          89.1      10.9          57.9
         COL. %         57.0      66.3
         TOTAL %        51.6       6.3
HIGH     COUNT           339        28           367
         ROW %          92.4       7.6          42.1
         COL. %         43.0      33.7
         TOTAL %        38.9       3.2
COLUMN TOTAL             789        83           872
                        90.5       9.5         100.0
                                                          Approximate
Statistic                  Value       ASE1     T-value   Significance
--------------------    --------   --------   ---------   ------------
Pearson's R              -.05487     .03267    -1.62089         .10540
Spearman Correlation     -.05487     .03267    -1.62089         .10540
Number of Missing Observations: 645
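
The Pearson's R in this output can be reproduced from the four cell counts of the crosstab. The sketch below is written in Python rather than SPSS and uses the standard shortcut for the correlation between two 0/1 variables; it is an independent check, not how SPSS computes the statistic internally, and the variable names are illustrative.

```python
from math import sqrt

# Cell counts from the MONEY by AFFAIRS crosstab (MONEY: LOW=0, HIGH=1; AFFAIRS: BAD=0, OK=1).
low_bad, low_ok = 450, 55
high_bad, high_ok = 339, 28

row_low, row_high = low_bad + low_ok, high_bad + high_ok
col_bad, col_ok = low_bad + high_bad, low_ok + high_ok

# Correlation of two 0/1 variables from a 2x2 table.
r = (high_ok * low_bad - high_bad * low_ok) / sqrt(row_low * row_high * col_bad * col_ok)

print(round(r, 5))
```

The sign is negative because a smaller share of the HIGH income group (7.6%) than of the LOW income group (10.9%) falls in the OK column.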