Applied Statistics I Liang Zhang June 17, 2008

advertisement
Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
June 17, 2008
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
1 / 22
Random Variables
Definition
A dicrete random variable is an rv whose possible values either constitute
a finite set or else can be listed in an infinite sequence in which there is a
first element, a second element, and so on (“countably” infinite).
A random variable is continuous if both of the following apply:
1. Its set of possible values consists either of all numbers in a single
interval on the number line (possibly infinite in extent, e.g., (−∞, ∞) ) or
all numbers in a disjoint union of such intervals (e.g., [0, 10] ∪ [20, 30]).
2. No possible value of the variable has positive probability, that is,
P(X = c) = 0 for any possible value c.
Examples
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
2 / 22
Random Variables
Examples:
1. Assume we toss a coin. Then S = {H, T}. We can define a rv X by
X (H) = 1 and X (T) = 0
2. A techincian is going to check the quality of 10 prodcuts. For each
product the outcome is either successful (S) or defective (D). Then we can
define a rv Y by
(
1, successful
Y =
0, defective
Definition
Any random variable whose only possible values are 0 abd 1 is called a
Bernoulli random variable.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
3 / 22
Random Variables
More examples:
3. (Example 3.3) We are investigating two gas stations. Each has six gas
pumps. Consider the experiment in which the number of pumps in use at
a particular time of day is determined for each of the stations.
Define rv’s X , Y and U by
X = the total number of pumps in use at the two stations
Y = the difference between the number of pumps in use at station 1
and the number in use at station 2
U = the maximum of the numbers of pumps in use at the two stations
If this experiment is performed and s = (3, 4) results, then
X ((3, 4)) = 3 + 4 = 7, so we say that the observed value of X was x = 7.
Similarly, the observed value of Y would be y = 3 − 4 = −1, and the
observed value of U would be u = max(3, 4) = 4.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
4 / 22
Random Variables
More examples:
4. Assume we toss a coin until we get a Head. Then the sample space
would be S = {H, TH, TTH, TTTH, . . . } If we define a rv X by X
X = the number we totally tossed
Then X ({H}) = 1, X ({TH}) = 2, X ({TTH}) = 3, . . . , and so on.
In this case, the random variable X can be any positive integer, which in
all is infinite.
5. Assume we are going to measure the length of 100 desks. Define the rv
Y by
Y = the length of a particular desk
Y can also assume infinitly possible values.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
5 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
6 / 22
Probability Distributions for Discrete RV
An example:
Assume we toss a coin 3 times and record the outcomes. Let Xi be a
random variable defined by
(
1, if the i th outcome is Head;
Xi =
0, if the i th outcome is Tail;
Let X be the random variable such that X = X1 + X2 + X3 , then X
represents the total number of Heads we could get from the experiment.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
6 / 22
Probability Distributions for Discrete RV
An example:
Assume we toss a coin 3 times and record the outcomes. Let Xi be a
random variable defined by
(
1, if the i th outcome is Head;
Xi =
0, if the i th outcome is Tail;
Let X be the random variable such that X = X1 + X2 + X3 , then X
represents the total number of Heads we could get from the experiment.
If the probability for getting a Head for each toss is 0.7, then the
probabilities for all the outcomes are tabulated as following:
s
x
p(x)
HHH
3
0.343
HHT
2
0.147
Liang Zhang (UofU)
HTH
2
0.147
HTT
1
0.063
THH
2
0.147
Applied Statistics I
THT
1
0.063
TTH
1
0.063
TTT
0
0.027
June 17, 2008
6 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
7 / 22
Probability Distributions for Discrete RV
Example continued:
HHH HHT
s
x
3
2
p(x) 0.343 0.147
Liang Zhang (UofU)
HTH
2
0.147
HTT
1
0.063
THH
2
0.147
Applied Statistics I
THT
1
0.063
TTH
1
0.063
TTT
0
0.027
June 17, 2008
7 / 22
Probability Distributions for Discrete RV
Example continued:
HHH HHT HTH HTT THH
s
x
3
2
2
1
2
p(x) 0.343 0.147 0.147 0.063 0.147
We can re-tabulate it only for the x values:
x
0
1
2
3
p(x) 0.027 0.189 0.441 0.343
Liang Zhang (UofU)
Applied Statistics I
THT
1
0.063
TTH
1
0.063
TTT
0
0.027
June 17, 2008
7 / 22
Probability Distributions for Discrete RV
Example continued:
HHH HHT HTH HTT THH
s
x
3
2
2
1
2
p(x) 0.343 0.147 0.147 0.063 0.147
We can re-tabulate it only for the x values:
x
0
1
2
3
p(x) 0.027 0.189 0.441 0.343
Now we can answer various questions.
Liang Zhang (UofU)
Applied Statistics I
THT
1
0.063
TTH
1
0.063
TTT
0
0.027
June 17, 2008
7 / 22
Probability Distributions for Discrete RV
Example continued:
HHH HHT HTH HTT THH THT
s
x
3
2
2
1
2
1
p(x) 0.343 0.147 0.147 0.063 0.147 0.063
We can re-tabulate it only for the x values:
x
0
1
2
3
p(x) 0.027 0.189 0.441 0.343
Now we can answer various questions.
The probability that there are at most 2 Heads is
TTH
1
0.063
TTT
0
0.027
P(X ≤ 2) = P(x = 0 or 1 or 2) = p(0) + p(1) + p(2) = 0.657
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
7 / 22
Probability Distributions for Discrete RV
Example continued:
HHH HHT HTH HTT THH THT
s
x
3
2
2
1
2
1
p(x) 0.343 0.147 0.147 0.063 0.147 0.063
We can re-tabulate it only for the x values:
x
0
1
2
3
p(x) 0.027 0.189 0.441 0.343
Now we can answer various questions.
The probability that there are at most 2 Heads is
TTH
1
0.063
TTT
0
0.027
P(X ≤ 2) = P(x = 0 or 1 or 2) = p(0) + p(1) + p(2) = 0.657
The probability that the number of Heads are is strictly between 1 and
3 is
P(1 < X < 3) = P(X = 2) = p(2) = 0.441
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
7 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
8 / 22
Probability Distributions for Discrete RV
Definition
The probability distribution or probability mass function (pmf) of a
discrete rv is defined for every number x by
p(x) = P(X = x) = P(all s ∈ S : X (s) = x).
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
8 / 22
Probability Distributions for Discrete RV
Definition
The probability distribution or probability mass function (pmf) of a
discrete rv is defined for every number x by
p(x) = P(X = x) = P(all s ∈ S : X (s) = x).
In words, for every possible value x of the random variable, the pmf
specifies the probability of observing that value
P when the experiment is
performed. (The conditions p(x) ≥ 0 and all possible x p(x) = 1 are
required for any pmf.)
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
8 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
9 / 22
Probability Distributions for Discrete RV
Example 3.8
Six lots of components are ready to be shipped by a certain supplier. The
number of defective components in each lot is as follows:
Lot
Number of defectives
1
0
2
2
3
0
4
1
5
2
6
0
One of these lots is to be randomly selected for shipment to a particular
customer. Let X be the number of defectives in the selected lot.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
9 / 22
Probability Distributions for Discrete RV
Example 3.8
Six lots of components are ready to be shipped by a certain supplier. The
number of defective components in each lot is as follows:
Lot
Number of defectives
1
0
2
2
3
0
4
1
5
2
6
0
One of these lots is to be randomly selected for shipment to a particular
customer. Let X be the number of defectives in the selected lot.
The three possible X values are 0, 1 and 2. The pmf for X is
3
p(0) = P(X = 0) = P(lot 1 or 3 or 6 is selected) = = 0.500
6
1
p(1) = P(X = 1) = P(lot 4 is selected) = = 0.167
6
2
p(2) = P(X = 2) = P(lot 2 or 5 is selected) = = 0.333
6
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
9 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
10 / 22
Probability Distributions for Discrete RV
Example 3.10:
Consider a group of five potential blood donors — a, b, c, d, and e — of
whom only a and b have type O+ blood. Five blood smaples, one from
each individual, will be typed in random order until an O+ individual is
identified. Let the rv Y = the number of typings necessary to identify an
O+ individual. Then what is the pmf of Y ?
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
10 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
11 / 22
Probability Distributions for Discrete RV
Example:
Consider whether the next customer coming to a certain gas station buys
gasoline or diesel. Let
(
1, if the customer purchases gasoline
X =
0, if the customer purchases diesel
If 30% of all customers in one month purchase diesel, then the pmf for X is
p(0) = P(X = 0) = P(nextcustomerbuysdiesel) = 0.3
p(1) = P(X = 1) = P(nextcustomerbuysgasoline) = 0.7
p(x) = P(X = x) = 0 for x 6= 0 or 1
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
11 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
12 / 22
Probability Distributions for Discrete RV
Example:
Consider whether the next customer coming to a certain gas station buys
gasoline or diesel. Let
(
1, if the customer purchases gasoline
X =
0, if the customer purchases diesel
If 100α% of all customers in one month purchase diesel, then the pmf for
X is
p(0) = P(X = 0) = P(nextcustomerbuysdiesel) = α
p(1) = P(X = 1) = P(nextcustomerbuysgasoline) = 1 − α
p(x) = P(X = x) = 0 for x 6= 0 or 1
here α is between 0 and 1.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
12 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
13 / 22
Probability Distributions for Discrete RV
Definition
Suppose p(x) depends on a quantity that can be assigned any one of a
number of possible values, with each different value determining a
different probability distribution. Such a quantity is called a parameter of
the distribution. The collection of all probability distributions for different
values of the parameter is called a family of probability distribution.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
13 / 22
Probability Distributions for Discrete RV
Definition
Suppose p(x) depends on a quantity that can be assigned any one of a
number of possible values, with each different value determining a
different probability distribution. Such a quantity is called a parameter of
the distribution. The collection of all probability distributions for different
values of the parameter is called a family of probability distribution.
For the previous example, the quantity α is a parameter. Each different
value of α between 0 and 1 determines a different member of a family of
distributions; two such members are




0.3 if x = 0
0.25 if x = 0
p(x) = 0.7 if x = 1
p(x) = 0.75 if x = 1




0
otherwise
0
otherwise
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
13 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
14 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e.
there are 100 · p ♠’s. Assume the successive drawings are independent and
define X = the number of drawings. Then
p(1) = P(X = 1) = P({♠}) = p
p(2) = P(X = 2) = P({♠♠}) = (1 − p) · p
p(3) = P(X = 3) = P({♠♠♠}) = (1 − p) · (1 − p) · p
...
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
14 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e.
there are 100 · p ♠’s. Assume the successive drawings are independent and
define X = the number of drawings. Then
p(1) = P(X = 1) = P({♠}) = p
p(2) = P(X = 2) = P({♠♠}) = (1 − p) · p
p(3) = P(X = 3) = P({♠♠♠}) = (1 − p) · (1 − p) · p
...
A general formula would be
(
(1 − p)x−1 · p
p(x) =
0
Liang Zhang (UofU)
Applied Statistics I
x = 1, 2, 3, . . .
otherwise
June 17, 2008
14 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
15 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e.
there are 100 · p ♠’s. Assume the successive drawings are independent and
define X = the number of drawings.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
15 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e.
there are 100 · p ♠’s. Assume the successive drawings are independent and
define X = the number of drawings.
If we know that there are 20 ♠’s, i.e. p = 0.2, then what is the probability
for us to draw at most 3 times? More than 2 times?
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
15 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e.
there are 100 · p ♠’s. Assume the successive drawings are independent and
define X = the number of drawings.
If we know that there are 20 ♠’s, i.e. p = 0.2, then what is the probability
for us to draw at most 3 times? More than 2 times?
P(X ≤ 3) = p(1) + p(2) + p(3) = 0.2 + 0.2 · 0.8 + 0.2 · (0.8)2 = 0.488
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
15 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e.
there are 100 · p ♠’s. Assume the successive drawings are independent and
define X = the number of drawings.
If we know that there are 20 ♠’s, i.e. p = 0.2, then what is the probability
for us to draw at most 3 times? More than 2 times?
P(X ≤ 3) = p(1) + p(2) + p(3) = 0.2 + 0.2 · 0.8 + 0.2 · (0.8)2 = 0.488
P(X > 2) = p(3)+p(4)+p(5)+· · · = 1−p(1)−p(2) = 1−0.2−0.2·0.8 = 0.64
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
15 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
16 / 22
Probability Distributions for Discrete RV
Definition
The cumulative distribution function (cdf) F (x) of a discrete rv X with
pmf p(x) is defined for every number x by
X
F (x) = P(X ≤ x) =
p(y )
y :y ≤x
For any number x, F(x) is the probability that the observed value of X will
be at most x.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
16 / 22
Probability Distributions for Discrete RV
Definition
The cumulative distribution function (cdf) F (x) of a discrete rv X with
pmf p(x) is defined for every number x by
X
F (x) = P(X ≤ x) =
p(y )
y :y ≤x
For any number x, F(x) is the probability that the observed value of X will
be at most x.
F (x) = P(X ≤ x) = P(X is less than or equal to x)
p(x) = P(X = x) = P(X is exactly equal to x)
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
16 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
17 / 22
Probability Distributions for Discrete RV
Example 3.10 (continued):


0





0.4
F (y ) = 0.7


0.9




1
Liang Zhang (UofU)
if
if
if
if
if
y <1
1≤y <2
2≤y <3
3≤y <4
y ≥2
Applied Statistics I
June 17, 2008
17 / 22
Probability Distributions for Discrete RV
Example 3.10 (continued):


0





0.4
F (y ) = 0.7


0.9




1
Liang Zhang (UofU)
if
if
if
if
if
y <1
1≤y <2
2≤y <3
3≤y <4
y ≥2
Applied Statistics I
June 17, 2008
17 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
18 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let α = P({♠}), i.e.
there are 100 · α ♠’s. Assume the successive drawings are independent and
define X = the number of drawings. The pmf would be
(
(1 − α)x−1 · α x = 1, 2, 3, . . .
p(x) =
0
otherwise
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
18 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let α = P({♠}), i.e.
there are 100 · α ♠’s. Assume the successive drawings are independent and
define X = the number of drawings. The pmf would be
(
(1 − α)x−1 · α x = 1, 2, 3, . . .
p(x) =
0
otherwise
Then for any positive interger x, we have
F (x) =
X
y ≤x
p(y ) =
x
x−1
X
X
(1 − α)(y −1) · α = α
(1 − α)y
y =1
(
1 − (1 − α)x
=
0
Liang Zhang (UofU)
y =0
x ≥1
x <1
Applied Statistics I
June 17, 2008
18 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
19 / 22
Probability Distributions for Discrete RV
Example:
Assume we are drawing cards from a 100 well-shuffled cards with
replacement. We keep drawing until we get a ♠. Let α = P({♠}), i.e.
there are 100 · α ♠’s. Assume the successive drawings are independent and
define X = the number of drawings. The pmf would be
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
19 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
20 / 22
Probability Distributions for Discrete RV
pmf =⇒ cdf:
F (x) = P(X ≤ x) =
X
p(y )
y :y ≤x
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
20 / 22
Probability Distributions for Discrete RV
pmf =⇒ cdf:
F (x) = P(X ≤ x) =
X
p(y )
y :y ≤x
It is also possible cdf =⇒ pmf:
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
20 / 22
Probability Distributions for Discrete RV
pmf =⇒ cdf:
F (x) = P(X ≤ x) =
X
p(y )
y :y ≤x
It is also possible cdf =⇒ pmf:
p(x) = F (x) − F (x−)
where “x−” represents the largest possible X value that is strictly less
than x.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
20 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
21 / 22
Probability Distributions for Discrete RV
Proposition
For any two numbers a and b with a ≤ b,
P(a ≤ X ≤ b) = F (b) − F (a−)
where “a−” represents the largest possible X value that is strictly less
than a. In particular, if the only possible values are integers and if a and b
are integers, then
P(a ≤ X ≤ b) = P(X = a or a + 1 or . . . or b)
= F (b) − F (a − 1)
Taking a = b yields P(X = a) = F (a) − F (a − 1) in this case.
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
21 / 22
Probability Distributions for Discrete RV
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
22 / 22
Probability Distributions for Discrete RV
Example (Problem 23):
A consumer organization that evaluates new automobiles customarily reports the
number of major defects in each car examined. Let X denote the number of major
defects in a randomly selected car of a certain type. The cdf of X is as follows:


x <0
0



0.06 0 ≤ x < 1



0.19 1 ≤ x < 2



0.39 2 ≤ x < 3
F (x) =

0.67 3 ≤ x < 4





0.92 4 ≤ x < 5




0.97 5 ≤ x < 6



1
x ≤6
Calculate the following probabilities directly from the cdf: (a)p(2), (b)P(X > 3)
and (c)P(2 ≤ X < 5).
Liang Zhang (UofU)
Applied Statistics I
June 17, 2008
22 / 22
Download