Probability and Expected Value

Probability Introduction

The primary objective of this section is to learn how probability can
be used to help understand and quantify uncertainty, in order to make
more informed decisions.

To this point we have covered the first of the four areas of the class,
namely descriptive statistics. Now we begin the second area,
probability. Recall that probability is a numerical measure of chance
or likelihood, with higher numbers indicating a higher degree of
likelihood.

With probability, we assume that we know something about the entire
population, and use that knowledge to characterize samples from the
population.

We need some definitions to begin.
A random experiment is one in which the outcome is uncertain
before the experiment is performed.


An experimental outcome is one realization of the experiment.

The sample space is the collection of all possible outcomes.

An event is a collection of experimental outcomes.

Events are mutually exclusive if only one can occur at a time.
A set of events is collectively exhaustive if the set describes
every possible outcome of the experiment.

Probability - 1

There are two fundamental requirements of probability:
1. The probability of an experimental outcome must be between 0
   and 1.
2. The sum of the probabilities of a set of mutually exclusive,
   collectively exhaustive outcomes or events is 1.
Assigning Probability

There are three methods used to assign probabilities to outcomes or
events.
1. The classical method assumes that all experimental outcomes
   are mutually exclusive and equally likely, and hence the outcomes
   have the same probability.
   Examples:
2. The relative frequency method uses historical data to calculate
   how frequently the event or outcome has occurred in the past.
   Examples:
3. The subjective method relies on degree of belief. How likely
   do we think the outcome is?
   Examples:
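
The first two methods can be illustrated with a short Python sketch. The fair die and the shipment history below are made-up examples, not data from the class; the subjective method has no formula, so it is not shown.

```python
from fractions import Fraction

# Classical method: a fair six-sided die (hypothetical example),
# so all six outcomes are assumed equally likely.
sample_space = [1, 2, 3, 4, 5, 6]
even = [o for o in sample_space if o % 2 == 0]
p_even_classical = Fraction(len(even), len(sample_space))

# Relative frequency method: a made-up historical record of 20 shipments.
history = ["on time"] * 17 + ["late"] * 3
p_late_relative = Fraction(history.count("late"), len(history))

print(p_even_classical, p_late_relative)
```

The classical probability comes from counting outcomes in the sample space, while the relative frequency estimate depends entirely on the history that happens to be available.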
Important Relationships

The complement of an event consists of every outcome in the sample
space that is not part of the event in question.

When two events occur together, they intersect, and the probability of
their intersection is called the joint probability. The key word here is
AND.

The probability of the union of two events is the probability that one
or the other or both occur. The key word here is OR.

The probability that one event occurs, given that we know another
event has already occurred, is called the conditional probability. Key
words here include GIVEN, WHEN and OF.


With these definitions, we can define some useful mathematical
relationships.

Addition:

Multiplication:
We can also define what it means if two events are statistically
independent:
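
The notes leave these relationships to be filled in. For reference, the standard textbook forms are sketched below for two generic events A and B (the event names are chosen here only for illustration):

```latex
\begin{align*}
\text{Addition:} \quad & P(A \cup B) = P(A) + P(B) - P(A \cap B) \\
\text{Multiplication:} \quad & P(A \cap B) = P(A \mid B)\,P(B) \\
\text{Independence:} \quad & P(A \mid B) = P(A),
  \text{ or equivalently } P(A \cap B) = P(A)\,P(B)
\end{align*}
```

If A and B are mutually exclusive, the intersection term in the addition rule is 0.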
Contingency Tables

A contingency table gives us one way to look at all of these
probabilities and relationships. Consider the following example.
Suppose that 800 individuals participated in a market research study.
They were asked about a particular product, and a television
advertisement for the product. Each participant indicated whether or
not they had purchased the product, and whether or not they could
recall the ad. The results are shown in the table below:
                  Purchased   Did not purchase   Total
Recalled ad          160            240           400
Did not recall        80            320           400
Total                240            560           800
Define R to be the event that an individual could recall the ad, and B
to be the event he or she purchased (bought) the product.
P(R) =
P( B ) =
$P(R \cap B)$ =
P(B | R) =
Are purchasing the product and recalling the ad statistically
independent?
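
One way to check hand calculations for this table is a short Python sketch. The counts are taken from the table above; the variable names are chosen here for illustration, and the joint probability is taken to be that of recalling the ad AND purchasing.

```python
# Counts from the market research table (800 participants).
counts = {
    ("recalled", "purchased"): 160,
    ("recalled", "not purchased"): 240,
    ("not recalled", "purchased"): 80,
    ("not recalled", "not purchased"): 320,
}
total = sum(counts.values())

# Marginal probabilities: add up a whole row or a whole column.
p_R = sum(n for (ad, _), n in counts.items() if ad == "recalled") / total
p_B = sum(n for (_, buy), n in counts.items() if buy == "purchased") / total

# Joint probability (recalled AND purchased) and conditional probability.
p_R_and_B = counts[("recalled", "purchased")] / total
p_B_given_R = p_R_and_B / p_R

# Independence would require P(B | R) to equal P(B).
independent = abs(p_B_given_R - p_B) < 1e-12

print(p_R, p_B, p_R_and_B, p_B_given_R, independent)
```

Comparing P(B | R) with P(B) is only one of several equivalent checks; comparing the joint probability with P(R)P(B) leads to the same conclusion.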
Creating Contingency Tables in Excel

Excel has a utility called a Pivot Table that allows us to create and
analyze tabular summaries (contingency tables) of qualitative data. It
can also be used with quantitative data or combinations of
quantitative and qualitative data.

To use the pivot table feature, data must be entered in columns and
each column must have a title or header. Before invoking the
procedure, be sure that the cursor is in one of the cells containing a
header or data.

To start the “wizard,” go to Data/PivotTable and PivotChart Report.
In the first step, just click on Next (the default values are what we
want). In the second step, verify that the data range shown contains
all of the data that you want to analyze, then click on Next again.

In step 3, click on the button called “Layout.” You will be presented
with the following dialog box (except the buttons on the right will
change according to the data set you are using).

At this point, click on and drag the button corresponding to the
variable that you want to be on the rows of your output table to the
area labeled “Row” and the variable you want in columns to the area
that says “Column.” Then drag either of the two buttons that you just
used to the “Data” area. I recommend always dragging one of the
qualitative variables’ buttons. The button should change to say
“Count of VARIABLE,” where VARIABLE is the name of the variable
that you dragged to the middle. Then click OK.

To complete the procedure there are a few other options you can
change if you desire, but I usually just click on Finish at this point
and change options later if the output is not what I desire. If you have
used a quantitative variable, you will likely want to group it. To do
so, right-click on the variable name in the table. One item in the
pop-up menu should say Group. Choose it, and then specify how you
want the variable to be grouped.

The pivot table can display several different types of summary
measures. The default or “normal” state is to display total counts.
There may be times that you want to display the numbers in the table
as overall percentages, as row percentages, etc. To change the
display, click anywhere in the table and go again to the
Data/PivotTable and PivotChart Report menu item. You should be at
step 3 again. Click on Layout and then double click what is in the
middle of the table (it should say “Count of…”). Then select options.
A drop down menu that says “Show Data As” will be in the middle of
the dialog box. Use the drop down menu to say how you want to
display the data. Then exit out of all of the boxes.
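
For readers working outside Excel, the same kind of tabulation can be sketched in Python. This is only an illustration of what the pivot table computes, not a replacement for the Excel steps; the data rows and the "price band" variable are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one (rating, price_band) pair per restaurant.
rows = [
    ("Good", "Low"), ("Good", "High"), ("Very Good", "Low"),
    ("Excellent", "High"), ("Good", "Low"), ("Excellent", "High"),
]

cell_counts = Counter(rows)                 # body of the table ("Count of ...")
row_totals = Counter(r for r, _ in rows)    # totals for each row category
col_totals = Counter(c for _, c in rows)    # totals for each column category

# Analogue of "Show Data As" row percentages: each cell over its row total.
row_pct = {cell: n / row_totals[cell[0]] for cell, n in cell_counts.items()}

for cell in sorted(cell_counts):
    print(cell, cell_counts[cell], round(row_pct[cell], 3))
```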
A Note on Lists in Excel:

By default, Excel lists the categories of qualitative variables
alphabetically. You may want them listed in some kind of logical
ascending order (for example, you may want to list class standing as
Freshman, Sophomore, Junior and Senior). To tell Excel how you
want the labels to be ordered, go to the Tools menu, select options,
and then click on the tab called “Custom Lists.” Then you can type in
the list items in the order you want them (separate them with a comma
or return) in the List Entries section. Or you can import the list in the
order that you want by identifying the cells where they are listed.

Below is a portion of an Excel worksheet with both qualitative and
quantitative variables. It shows both a portion of the original data
and the resulting pivot table. I created a custom list in Excel as
“Good, Very Good, Excellent.”
The COUNTIF Function

Sometimes rather than create a contingency table, we just need to
count the frequency of each category within one qualitative variable.
The easiest way to do so in Excel is with the COUNTIF function.

The COUNTIF function has 2 arguments: a reference to the data set,
and a condition to be met. The function then looks at the cells
indicated by the first argument, and counts the number of times that
the condition (in the second argument) is met. If you want to look for
an exact value, then the second argument is just that value (or a
reference to the value). If, on the other hand, you want to enter a
range to look for, then the condition needs to be in quotes. For
example, if I wanted to look for the number of times the cells A1:A30
have values that exceed 10, I would enter
=COUNTIF(A1:A30,">10").

Using the same example as above, suppose that all we are interested
in knowing is the number of restaurants that received each of the
three ratings.
The ratings of the restaurants are listed in cells B2:B301. The first
thing I would do is enter the different possible ratings. For example, I
might type Good, Very Good, and Excellent in cells E8:E10. Then in
cell F8 I can enter =COUNTIF($B$2:$B$301,E8). This function says
to look in cells B2:B301 for what is entered in cell E8 (“Good” in this
case). Every time it finds Good it counts it. The final result will be
the total frequency of the word Good in cells B2:B301. To count the
frequency for the other two ratings, I can copy my formula in F8 to
cells F9:F10.
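
The counting logic behind COUNTIF can also be sketched in Python. The two short lists below are hypothetical stand-ins for the worksheet columns, not the actual restaurant data.

```python
from collections import Counter

# Hypothetical stand-in for the ratings in cells B2:B301.
ratings = ["Good", "Excellent", "Very Good", "Good", "Excellent", "Good"]

# Analogue of =COUNTIF($B$2:$B$301,E8) copied down for each rating.
frequency = Counter(ratings)
for label in ["Good", "Very Good", "Excellent"]:
    print(label, frequency[label])

# Analogue of =COUNTIF(A1:A30,">10") on a hypothetical numeric column.
values = [4, 12, 10, 25, 7, 11]
count_over_10 = sum(1 for v in values if v > 10)
print(count_over_10)
```

As in Excel, an exact match is counted by value, while a range condition such as "greater than 10" needs an explicit comparison.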
Random Variables

Another way to represent the results from probabilistic experiments is
with random variables. Random variables are variables which have a
set of possible values, one for each experimental outcome.

There are two main types of random variables:
Discrete random variables take on a countable number of values. For
example, the number of defective items in a production batch is a
discrete random variable.
A discrete random variable, X, has an associated probability mass
function (pmf), f(x), where f(x) = P(X=x). The pmf must meet the
following:
a. $0 \le f(x) \le 1$,
b. $\sum_{\text{all } x} f(x) = 1$.

The random variable X also has a cumulative mass function
(cmf) (also called the distribution function of X), denoted F(x),
where $F(x) = P(X \le x)$.

Example:
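
As one possible example, the Python sketch below builds a pmf and its cmf for a small discrete random variable. The probabilities are hypothetical, chosen only so that the two pmf requirements hold.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical pmf: X = number of defective items in a batch of three.
values = [0, 1, 2, 3]
pmf = [Fraction(5, 10), Fraction(3, 10), Fraction(1, 10), Fraction(1, 10)]

# Check the two pmf requirements: each f(x) in [0, 1], and the f(x) sum to 1.
assert all(0 <= p <= 1 for p in pmf)
assert sum(pmf) == 1

# F(x) = P(X <= x) is the running total of the pmf.
cmf = list(accumulate(pmf))
for x, f, F in zip(values, pmf, cmf):
    print(x, f, F)
```

Exact fractions are used here so that the sum-to-one check is not affected by floating-point rounding.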
Continuous random variables take on continuous or interval values
(there are an infinite number of possibilities). For example, the width
of an extruded bar is a continuous random variable.
A continuous random variable, Y, has an associated probability
density function (pdf), f(y). The pdf must meet the following:
a. $f(y) \ge 0$,
b. $\int_{\text{all } y} f(y)\,dy = 1$.
The probability that a continuous random variable is exactly equal to
a single value is 0. Instead, we quantify the probability that the
random variable falls within a certain interval:
$$P(a \le Y \le b) = \int_a^b f(y)\,dy.$$
This is equivalent to saying that the probability is equal to the area
under the pdf curve between a and b.
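
The area interpretation can be checked numerically. The sketch below assumes a hypothetical density, f(y) = 2y on the interval from 0 to 1, and approximates the integral with the midpoint rule; neither the density nor the helper names come from the notes.

```python
def pdf(y):
    """Hypothetical density: f(y) = 2y on [0, 1], and 0 elsewhere."""
    return 2.0 * y if 0.0 <= y <= 1.0 else 0.0


def area(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total_area = area(pdf, 0.0, 1.0)   # requirement b: should be close to 1
p_interval = area(pdf, 0.2, 0.5)   # P(0.2 <= Y <= 0.5); exact value is 0.21

print(round(total_area, 6), round(p_interval, 6))
```

For this density the exact answer is available by hand (0.5 squared minus 0.2 squared), which makes it a convenient check on the numerical approach.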
We will show some examples of continuous variables shortly.
Expected Value and Variance

Probability distributions are often summarized using 2 measures
referred to as the expected value and the variance. The expected
value gives us information about the center of the distribution (it is
another name for population mean), and the variance tells us about
the spread. When we are talking about probability distributions, these
quantities are population quantities.

Mathematical representation:

The expected value is denoted E(X) or more commonly as $\mu$.
The variance is denoted Var(X), or more commonly as $\sigma^2$.
Expected Value and Variance for a discrete random variable:
$$\mu = E(X) = \sum_{\text{all } x} x f(x),$$
$$\sigma^2 = \mathrm{Var}(X) = \sum_{\text{all } x} (x - \mu)^2 f(x) = \left( \sum_{\text{all } x} x^2 f(x) \right) - \mu^2.$$
Expected Value and Variance for a continuous random variable:
$$\mu = E(Y) = \int_{\text{all } y} y f(y)\,dy,$$
$$\sigma^2 = \mathrm{Var}(Y) = \int_{\text{all } y} (y - \mu)^2 f(y)\,dy = \left( \int_{\text{all } y} y^2 f(y)\,dy \right) - \mu^2.$$
As before, the standard deviation can also be used and is denoted $\sigma$.
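
For the discrete case, the formulas translate directly into a few lines of Python. The pmf below is hypothetical and is used only to show that the two variance formulas agree.

```python
# Hypothetical pmf for a discrete random variable X.
pmf = {0: 0.5, 1: 0.3, 2: 0.1, 3: 0.1}

# Expected value: sum of x * f(x) over all x.
mu = sum(x * p for x, p in pmf.items())

# Variance by the definition: sum of (x - mu)^2 * f(x).
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance by the shortcut: sum of x^2 * f(x), minus mu^2.
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mu**2

sigma = var_definition ** 0.5

print(round(mu, 4), round(var_definition, 4), round(var_shortcut, 4), round(sigma, 4))
```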

Examples:

Some rules of expected value and variance:
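$E(cX) = cE(X)$
$E(X_1 + X_2) = E(X_1) + E(X_2)$
$\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$
$\mathrm{Var}(c_1X_1 + c_2X_2) = c_1^2\,\mathrm{Var}(X_1) + c_2^2\,\mathrm{Var}(X_2) + 2c_1c_2\,\mathrm{Cov}(X_1, X_2)$
$= c_1^2\,\mathrm{Var}(X_1) + c_2^2\,\mathrm{Var}(X_2)$ if $X_1$ and $X_2$ are
independent.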

Application to Portfolio Management
When looking at financial investments, financial managers use
expected or average return to measure a security's return. The
standard deviation or variance of return is the proper measure of risk.
It is common for investors to hold more than one security in an
investment portfolio to try to reduce the amount of risk. This is called
hedging.
Suppose we have two securities, A and B, in a portfolio. Let $p_A$ be
the proportion of money invested in security A and $p_B$ be the
proportion invested in security B. Also, let $\sigma_A^2$ be the variance in
return of security A, $\sigma_B^2$ be the variance in return of security B, and
$\sigma_{AB}$ be the covariance between securities A and B. Then the variance
of return for the portfolio is
$$\mathrm{Var}(\text{portfolio}) = p_A^2 \sigma_A^2 + p_B^2 \sigma_B^2 + 2 p_A p_B \sigma_{AB}.$$
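
The formula is easy to wrap in a small Python function. The function name and the inputs below (equal weights, variances of 0.04 and 0.09, three trial covariances) are hypothetical and serve only to show how the covariance term moves the result.

```python
def portfolio_variance(p_a, p_b, var_a, var_b, cov_ab):
    """Variance of return for a two-security portfolio."""
    return p_a**2 * var_a + p_b**2 * var_b + 2 * p_a * p_b * cov_ab


# Hypothetical inputs: equal weights, variances 0.04 and 0.09.
for cov in (0.03, 0.0, -0.03):
    print(cov, round(portfolio_variance(0.5, 0.5, 0.04, 0.09, cov), 4))
```

With the weights and variances held fixed, a negative covariance lowers the portfolio variance and a positive covariance raises it.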

Example
Suppose that an investor holds two securities in her portfolio. She
has $500 invested in Andrews stock and $1000 in Dean stock. She
has no information on the variance or covariance of returns, so she
takes a random sample of 4 years of returns and finds the results in
the table below.
Year    Andrews    Dean
1        0.08     -0.05
2        0.25      0.55
3        0.10      0.19
4        0.04      0.30
We want to estimate the variance of return of the investor's portfolio.

First, what are pA and pB?

Next we need estimates of the variances and covariances of the
individual securities.

Now we can estimate the variance of the portfolio.

How does the risk of the portfolio compare to that of the individual
securities?
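
The steps above can be sketched in Python as a check on hand or spreadsheet work. The sketch assumes sample variances and a sample covariance with divisor n - 1, which is one reasonable reading of "estimate" here; the variable names are chosen for illustration.

```python
from statistics import mean, variance

# Four years of returns from the example.
andrews = [0.08, 0.25, 0.10, 0.04]
dean = [-0.05, 0.55, 0.19, 0.30]

# Proportions of the $1,500 total invested in each stock.
p_a = 500 / 1500
p_b = 1000 / 1500

# Sample variances (divisor n - 1) of each stock's returns.
var_a = variance(andrews)
var_b = variance(dean)

# Sample covariance, computed by hand with the same n - 1 divisor.
mean_a, mean_b = mean(andrews), mean(dean)
cov_ab = sum(
    (a - mean_a) * (d - mean_b) for a, d in zip(andrews, dean)
) / (len(andrews) - 1)

# Portfolio variance from the two-security formula.
var_portfolio = p_a**2 * var_a + p_b**2 * var_b + 2 * p_a * p_b * cov_ab
sd_portfolio = var_portfolio ** 0.5

print(round(var_a, 6), round(var_b, 6), round(cov_ab, 6), round(var_portfolio, 6))
```

Under these sample estimates, the portfolio variance falls between the variances of the two individual stocks.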
Portfolio Management Practice Problem
Security F has an expected return of 10% and a standard deviation of 5%
per year. Security G has an expected return of 20% and a standard
deviation of 60% per year.
a. What is the expected return on a portfolio composed of 40% of Security
F and 60% of Security G?
b. Find the variance of the portfolio described in part a if the correlation
coefficient between F and G is .5.
c. Repeat part b if the correlation coefficient is -.5. What is the effect of
the correlation coefficient’s sign on the variance of the portfolio?
Common Probability Distributions

Our goal for this section is to learn the assumptions and usefulness of
three common probability distributions. We will discuss their
applications and computations using spreadsheets. The binomial
distribution is discrete and the uniform and normal are continuous.
The Binomial Distribution

The binomial distribution describes probabilistic experiments with 2
possible outcomes. For example, if we are inspecting finished goods,
they can be classified as either good or bad.

There are three important assumptions of the binomial distribution.
1. We have n independent trials;
2. There are only 2 possible outcomes for each trial;
3. The probability of a "success," p, is constant from trial to
trial.

If we let X = the number of successes in n independent trials,
then
$$P(X = i) = \binom{n}{i} p^i (1 - p)^{n - i}, \quad \text{for } i = 0, 1, \ldots, n,$$
where
$$\binom{n}{i} = \frac{n!}{i!\,(n - i)!}.$$

Fortunately for us, Excel will find binomial probabilities so we don’t
have to do the computation above. The formula is
=BINOMDIST(x,n,p,I) where I is 0 if we want P(X=x) and I is 1 if we
want $P(X \le x)$.

For the binomial distribution, $\mu = np$ and $\sigma^2 = np(1-p)$.

Example: In a population of sales invoices, 5 percent have no
shipping document attached. If an auditor takes a random sample of
50 invoices, what is the likelihood that 3 will have missing shipping
documents? What is the probability that there will be fewer than 1
with missing documents? What is the expected number of invoices
with missing documents?
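
The invoice example can be worked in Python as a cross-check on BINOMDIST. The helper functions below are written for this illustration and are not part of any spreadsheet; "fewer than 1" is read as P(X = 0).

```python
from math import comb


def binom_pmf(i, n, p):
    """P(X = i) for a binomial random variable with n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_cdf(x, n, p):
    """P(X <= x), the sum of the pmf from 0 through x."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


n, p = 50, 0.05
p_exactly_3 = binom_pmf(3, n, p)       # like =BINOMDIST(3,50,0.05,0)
p_fewer_than_1 = binom_cdf(0, n, p)    # like =BINOMDIST(0,50,0.05,1)
expected = n * p                       # mu = np
variance = n * p * (1 - p)             # sigma^2 = np(1 - p)

print(round(p_exactly_3, 4), round(p_fewer_than_1, 4), expected, variance)
```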
Common Continuous Distributions

We now begin to discuss continuous distributions. Recall that
finding probabilities with continuous distributions is equivalent to
finding areas under the curve.

Areas can always be found by integration and sometimes can be
found geometrically. We will see both types in our examples.

The most common continuous distribution is the normal distribution,
so we will spend most of our time with it. We will also discuss the
uniform distribution, since it occurs naturally in some cases, and
mainly because it gives insight into working with continuous
distributions.
The Uniform Distribution

The uniform distribution is similar to the classical method of
assigning probability in that it assumes that every outcome is equally
likely. There is a discrete uniform distribution (e.g., tossing a die),
but we will discuss the continuous uniform distribution.

The uniform distribution has density function
$$f(y) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le y \le b, \\ 0 & \text{otherwise.} \end{cases}$$

Because of its density, the uniform distribution is also called the
rectangular distribution.
[Figure: the uniform density f(y) plotted against y. The vertical axis
f(y) runs from 0 to 0.5 and the horizontal axis y runs from 0 to 6.]

Examples:
P(Y=2) =
P(Y>2) =
P(Y<2) =
$P(3 \le Y \le 4)$ =

For the uniform distribution, $\mu = \dfrac{a + b}{2}$ and $\sigma^2 = \dfrac{(b - a)^2}{12}$. So for our
example,
$\mu$ =
and
$\sigma^2$ =
The Normal Distribution

Last time we talked about the bell-shaped rule. We want to say a
little more about this now. The numbers (68%, 95%, etc.) are derived
from the normal or Gaussian distribution.

The normal distribution is the most commonly used continuous
distribution, and is assumed for many problems in statistical
inference. We would use it whenever we believe or have evidence
that the data we are working with have a most likely value in the
center, and that as we move to values away from that most likely
value in either direction, the probability of obtaining such values
declines.

The normal probability density function is
$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}, \quad -\infty < y < \infty,$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the
distribution.
The normal pdf is a symmetric bell shaped curve.

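[Figure: the standard normal density f(y) plotted against the number of
standard deviations from the mean of y, from -4 to 4. Labeled areas on
each side of the mean: 0.3413 between 0 and 1 standard deviation,
0.1359 between 1 and 2, 0.0214 between 2 and 3, and 0.00135 beyond 3.]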

Using the Normal Distribution:
In order to use the distribution to make probability calculations, you
probably learned in earlier classes to first transform the given
distribution to the standard normal distribution. The standard normal
is a normal distribution with mean 0 and variance 1. It is usually
represented by the random variable Z. Once the Z value is calculated,
it is possible to look up areas in a table.

We, however, will let the spreadsheet do this work for us. In the
spreadsheet we can find normal probabilities of the form P(X<x) by
using =NORMDIST(x,$\mu$,$\sigma$,1).

Even with the NORMDIST function, it is very important to draw a
picture of the probability that you want. That is because the
NORMDIST function only finds left-tail areas. If we want other
types of areas, we need to manipulate the desired probability to get it
in terms of left tail areas. The best way to visualize what areas we
want is to draw a picture.

Example:
Suppose we have a process which produces rods with a mean
diameter of .625 in. and a standard deviation of .01 in. The customer
requires that no diameter exceed .65 in., or be smaller than .6 in.
What proportion of parts will not meet these specifications?
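
The same left-tail reasoning can be sketched in Python with the standard library's NormalDist class, whose cdf method plays the role of NORMDIST with a final argument of 1. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

# Rod diameters: mean 0.625 in., standard deviation 0.01 in.
diameter = NormalDist(mu=0.625, sigma=0.01)

# cdf(x) is the left-tail area P(X < x), like =NORMDIST(x,0.625,0.01,1).
p_too_small = diameter.cdf(0.60)

# A right-tail area must be rewritten as 1 minus a left-tail area.
p_too_large = 1 - diameter.cdf(0.65)

p_out_of_spec = p_too_small + p_too_large

print(round(p_too_small, 5), round(p_too_large, 5), round(p_out_of_spec, 5))
```

Both limits sit 2.5 standard deviations from the mean, so the two tail areas are equal; a sketch of the curve with both tails shaded makes the subtraction step easier to see.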