Binomial and Chi

Presenters: Nouruddin Boojhawoonah & Poonam Gopaul
Notes taken from the statistics tutorial: Probability distribution.
J.CRAWSHAW and J.CHAMBERS
To understand probability distributions, it is important to understand
variables, random variables, and some notation.
•A variable is a symbol (A, B, x, y, etc.) that can take on any of a specified
set of values.
•When the value of a variable is the outcome of a statistical experiment, that
variable is a random variable.
Generally, statisticians use a capital letter to represent a random variable and
a lower-case letter, to represent one of its values. For example,
•X represents the random variable X.
•P(X) represents the probability of X.
•P(X = x) refers to the probability that the random variable X is equal to a
particular value, denoted by x. As an example, P(X = 1) refers to the
probability that the random variable X is equal to 1.
Probability Distributions
An example will make clear the relationship between random variables and probability
distributions. Suppose you flip a coin two times. This simple statistical experiment can have four
possible outcomes: HH, HT, TH, and TT. Now, let the variable X represent the number of Heads
that result from this experiment. The variable X can take on the values 0, 1, or 2. In this
example, X is a random variable because its value is determined by the outcome of a statistical
experiment.
A probability distribution is a table or an equation that links each outcome of a statistical
experiment with its probability of occurrence. Consider the coin flip experiment described above.
The table below, which associates each outcome with its probability, is an example of a
probability distribution.
The table below represents the probability distribution of the random variable X.
Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25
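As a minimal sketch of how this table arises, the four equally likely outcomes can be enumerated and the heads counted. The variable names here are illustrative and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two coin flips: HH, HT, TH, TT.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

# X = number of heads; count how many outcomes give each value of X.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome has probability 1/4, so P(X = x) = (outcomes with x heads) / 4.
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(head_counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {float(p):.2f}")
```

Exact fractions are used so that the probabilities sum to exactly 1.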
Cumulative Probability Distributions
A cumulative probability refers to the probability that the value of a random variable
falls within a specified range.
Let us return to the coin flip experiment. If we flip a coin two times, we might ask:
What is the probability that the coin flips would result in one or fewer heads? The
answer would be a cumulative probability. It would be the probability that the coin
flip experiment results in zero heads plus the probability that the experiment results
in one head.
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75
Like a probability distribution, a cumulative probability distribution can be
represented by a table or an equation. In the table below, the cumulative probability
refers to the probability that the random variable X is less than or equal to x.
Number of heads: x    Probability: P(X = x)    Cumulative probability: P(X ≤ x)
0                     0.25                     0.25
1                     0.50                     0.75
2                     0.25                     1.00
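The cumulative column is a running total of the probability column. A short sketch, with illustrative variable names:

```python
from itertools import accumulate

values = [0, 1, 2]                  # number of heads, x
probabilities = [0.25, 0.50, 0.25]  # P(X = x)

# P(X <= x) is the running sum of P(X = x) over all values up to x.
cumulative = list(accumulate(probabilities))
cdf = dict(zip(values, cumulative))

for x in values:
    print(f"P(X <= {x}) = {cdf[x]:.2f}")
```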
Uniform Probability Distribution
The simplest probability distribution occurs when all of the values of a random
variable occur with equal probability. This probability distribution is called the
uniform distribution.
Uniform Distribution. Suppose the random variable X can assume k different values.
Suppose also that P(X = xk) is the same for every value xk. Then,
P(X = xk) = 1/k
Example 1
Suppose a die is tossed. What is the probability that the die will land on 6 ?
Solution: When a die is tossed, there are 6 possible outcomes represented by: S = { 1,
2, 3, 4, 5, 6 }. Each possible outcome is a value of the random variable X, and each outcome is
equally likely to occur. Thus, we have a uniform distribution. Therefore, P(X = 6) =
1/6.
Example 2
Suppose we repeat the dice tossing experiment described in Example 1. This time, we
ask what is the probability that the die will land on a number that is smaller than 5 ?
Solution: When a die is tossed, there are 6 possible outcomes represented by: S = { 1,
2, 3, 4, 5, 6 }. Each possible outcome is equally likely to occur. Thus, we have a
uniform distribution.
This problem involves a cumulative probability. The probability that the die will land
on a number smaller than 5 is equal to:
P( X < 5 ) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1/6 + 1/6 + 1/6 + 1/6 = 2/3
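Both die examples can be checked with exact fractions. This is a sketch only; the helper `probability` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p_face = Fraction(1, len(faces))  # uniform distribution: P(X = x) = 1/k with k = 6


def probability(event):
    """Add 1/k for every face that satisfies the event (a predicate on a face)."""
    return sum((p_face for face in faces if event(face)), Fraction(0))


p_six = probability(lambda face: face == 6)        # Example 1
p_below_five = probability(lambda face: face < 5)  # Example 2

print(p_six, p_below_five)
```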
If a variable can take on any value between two specified values, it is called a continuous
variable; otherwise, it is called a discrete variable.
Some examples will clarify the difference between discrete and continuous variables.
•Suppose the fire department mandates that all fire fighters must weigh between 150 and 250
pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire
fighter's weight could take on any value between 150 and 250 pounds.
•Suppose we flip a coin and count the number of heads. The number of heads could be any
integer value between 0 and plus infinity. However, it could not be any number between 0 and
plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be
a discrete variable.
Just like variables, probability distributions can be classified as discrete or continuous.
Discrete Probability Distributions
If a random variable is a discrete variable, its probability distribution is called a discrete
probability distribution.
Binomial Distribution
To understand binomial distributions and binomial probability, it helps to understand binomial experiments and
some associated notation; so we cover those topics first.
Binomial Experiment
A binomial experiment (a fixed number of repeated Bernoulli trials) is a statistical experiment that has the following
properties:
•The experiment consists of n repeated trials.
•Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a
failure.
•The probability of success, denoted by P, is the same on every trial.
•The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin
lands on heads. This is a binomial experiment because:
•The experiment consists of repeated trials. We flip a coin 2 times.
•Each trial can result in just two possible outcomes - heads or tails.
•The probability of success is constant - 0.5 on every trial.
•The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other
trials.
Notation
The following notation is helpful when we talk about binomial probability.
•x: The number of successes that result from the binomial experiment.
•n: The number of trials in the binomial experiment.
•P: The probability of success on an individual trial.
•Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
•b(x; n, P): Binomial probability - the probability that an n-trial binomial experiment results in exactly x
successes, when the probability of success on an individual trial is P.
•nCr: The number of combinations of n things, taken r at a time.
Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The
probability distribution of a binomial random variable is called a binomial distribution (the special case n = 1 is
the Bernoulli distribution).
Suppose we flip a coin two times and count the number of heads (successes). The binomial random variable is
the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is presented below.
The binomial distribution has the following properties:
•The mean of the distribution (μx) is equal to n * P.
•The variance (σ²x) is n * P * ( 1 - P ).
•The standard deviation (σx) is sqrt[ n * P * ( 1 - P ) ].
Binomial Probability
The binomial probability refers to the probability that a binomial experiment results in exactly x successes.
For example, in the coin-flip distribution (n = 2, P = 0.5), the binomial probability of getting exactly one head in two coin
flips is 0.50.
Given x, n, and P, we can compute the binomial probability based on the following formula:
Binomial Formula. Suppose a binomial experiment consists of n trials and results in x successes. If the
probability of success on an individual trial is P, then the binomial probability is:
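b(x; n, P) = nCx * P^x * (1 - P)^(n - x)
Equivalently, writing r for x, p for P and q = 1 - p:
P(X = r) = nCr * p^r * q^(n - r)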
Binomial distribution for the number of heads in two coin flips:
Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25
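The binomial formula can be sketched in a few lines using the standard-library function `math.comb` for nCr. The function name `binomial_pmf` is an illustrative choice, not something defined in the original notes.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p), computed as nCr * p^r * q^(n - r)."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)


# Two coin flips: n = 2, p = 0.5.
two_flips = [binomial_pmf(r, 2, 0.5) for r in range(3)]
print(two_flips)
```

For the two-flip experiment this gives the same three probabilities as the table for 0, 1 and 2 heads.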
Let's work out an example
30% of pupils in a school travel by bus. From a sample of ten pupils chosen at random,
find the probability that
(a) exactly three travel by bus,
(b) fewer than half travel by bus.
Hint: we need to identify n = ? and p = ?
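One way to check an answer to this example is sketched below. It assumes the hint is read as n = 10 (ten pupils) and p = 0.3 (30% travel by bus), and that "half" of ten pupils means X < 5. The function name is illustrative.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 10, 0.3  # assumed reading of the hint

# (a) exactly three pupils travel by bus
p_three = binomial_pmf(3, n, p)

# (b) fewer than half of the ten pupils, i.e. X = 0, 1, 2, 3 or 4
p_fewer_than_half = sum(binomial_pmf(r, n, p) for r in range(5))

print(f"(a) {p_three:.4f}")
print(f"(b) {p_fewer_than_half:.4f}")
```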
Other examples
(1) The random variable X~Bin(6, .042). Find
(a) P(X= 6)
(b) P(X= 4)
(c) P(X≤ 2)
(2) A fair coin is tossed six times. Find the probability
of throwing at least four heads.
(3) X~Bin(n, 0.3). Find the least possible value of n such
that P(X≥1) ≥ 0.8.
(4) Assuming that a couple are equally likely to produce
a boy or a girl, find the probability that in a family of
five children there are more boys than girls.
(5) X~Bin(4, p) and P(X=4)= 0.0256. Find P(X=2).
(6) Charlie finds that when she takes a cutting from a particular plant,
the probability that it roots successfully is 1/3.
(a) She takes nine cuttings. Find the probability that
(i) more than five cuttings root successfully,
(ii) at least three cuttings root successfully,
(b) Find the number of cuttings that she should take in order to be
99% certain that at least one cutting roots successfully.
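Exercises (3) and (6)(b) both ask for the smallest n at which P(X ≥ 1) = 1 - q^n reaches a target. The sketch below assumes that reading, and the helper name `least_trials` is hypothetical.

```python
def least_trials(p, target):
    """Smallest n such that P(X >= 1) = 1 - (1 - p)^n is at least target."""
    n = 1
    while 1 - (1 - p) ** n < target:
        n += 1
    return n


# Exercise (3): p = 0.3, target probability 0.8.
n_exercise_3 = least_trials(0.3, 0.8)

# Exercise (6)(b): p = 1/3, target probability 0.99.
n_exercise_6b = least_trials(1 / 3, 0.99)

print(n_exercise_3, n_exercise_6b)
```

A search is used instead of solving with logarithms because n must be a whole number.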
Example to illustrate Diagrammatic representation of the Binomial Distribution
In a survey on washing powder, it is found that the probability that a shopper chooses
Soapsuds is 0.35. Using a sample of seven shoppers, illustrate the information in a
diagram.
Solution:
X~Bin(7, 0.35)
P(X = r) = 7Cr * p^r * q^(7 - r), with p = 0.35 and q = 0.65
P(X=0)= 0.0490
P(X=1)= 0.1847
P(X=2)= 0.2984
P(X=3)= 0.2678
P(X=4)= 0.1442
P(X=5)= 0.0466
P(X=6)= ???
P(X=7)= ???
[Diagram: probability p plotted against X for X~Bin(7, 0.35)]
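A rough text version of such a diagram can be produced as follows. This is only a sketch: the bar scaling (one `#` per 0.01 of probability) is an arbitrary choice, not taken from the original slide.

```python
from math import comb

n, p = 7, 0.35
q = 1 - p

# P(X = r) for r = 0, 1, ..., 7
probabilities = [comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)]

# Draw one bar per value of r; bar length is proportional to the probability.
for r, prob in enumerate(probabilities):
    bar = "#" * round(prob * 100)
    print(f"{r} | {bar} {prob:.4f}")
```

The tallest bar is at r = 2, which matches the largest probability in the list.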
Expectation and Variance of the Binomial Distribution
If X~Bin(n, p)
E(X)=np
VAR(X)=npq, where q= 1-p
Computation of Expectation and Variance for a probability distribution table
E(X) = Σ r P(X = r)
E(X^2) = Σ r^2 P(X = r)
VAR(X) = E(X^2) - [E(X)]^2
The random variable X~Bin(4, 0.8). Construct the probability distribution for X and find
the expectation and variance. Verify that E(X)= np and Var(X)= npq
X~Bin(4, 0.8)
so n = 4 and p = 0.8
P(X=0) = 0.2^4 = 0.0016
P(X=1) = 4C1*0.2^3*0.8 = 0.0256
P(X=2) = 4C2*0.2^2*0.8^2 = 0.1536
P(X=3) = 4C3*0.2*0.8^3 = 0.4096
P(X=4) = 0.8^4 = 0.4096
Probability distribution table for X:
X         0         1         2         3         4
P(X=r)    0.0016    0.0256    0.1536    0.4096    0.4096
E(X) = Σ r P(X = r)
= 0*0.0016 + 1*0.0256 + 2*0.1536 + 3*0.4096 + 4*0.4096
= 3.2
E(X^2) = Σ r^2 P(X = r)
= (0^2*0.0016) + (1^2*0.0256) + (2^2*0.1536) + (3^2*0.4096) + (4^2*0.4096)
= 10.88
VAR(X) = E(X^2) - [E(X)]^2
= 10.88 - (3.2^2)
= 0.64
Now,
np = 4*0.8 = 3.2
npq = 4*0.8*0.2 = 0.64
Therefore, E(X) = np
and VAR(X) = npq
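The same verification can be sketched in code: build the probability table for Bin(4, 0.8), compute E(X) and VAR(X) from the table, and compare with np and npq. Variable names are illustrative.

```python
from math import comb, isclose

n, p = 4, 0.8
q = 1 - p

# Probability distribution table: r -> P(X = r)
table = {r: comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)}

# E(X) = sum of r * P(X = r); E(X^2) = sum of r^2 * P(X = r)
mean = sum(r * prob for r, prob in table.items())
mean_of_squares = sum(r**2 * prob for r, prob in table.items())
variance = mean_of_squares - mean**2

print(f"E(X) = {mean:.2f}, np = {n * p:.2f}")
print(f"VAR(X) = {variance:.2f}, npq = {n * p * q:.2f}")
```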
The χ² test is a significance test that enables us to decide
whether it is valid to use a particular distribution, such as the
binomial, Poisson or normal, as a model so that we can
interpret observed data. We can also use the χ² test to
decide whether two variables are independent.
Example:
A farmer kept a record of the number of heifer calves born to each of his
cows during the first five years of breeding of each cow. The results are
summarized below.
Number of heifers    0     1     2     3     4     5
Number of cows       4     19    41    52    26    8
Test, at the 5% level of significance, whether or not the binomial distribution with
parameters n = 5, p = 0.5 is an adequate model for these data.
Procedure
1. Consider a set of data with observed frequencies, O.
Number of heifers         0     1     2     3     4     5
Observed frequency (O)    4     19    41    52    26    8
2. Make the null hypothesis (H0) concerning the distribution followed by the
data.
Let X be the r.v. 'the number of heifer calves born to a cow in the first five years of
breeding'.
H0: X~Bin(5, 0.5)
3. Calculate the expected frequencies, E, according to this hypothesis.
The expected frequencies are given by
150 * P(X = x), where P(X = x) = 5Cx * (0.5)^(5 - x) * (0.5)^x
= 5Cx * (0.5)^5
Number of heifers         0      1       2       3       4       5      Total
Observed frequency (O)    4      19      41      52      26      8      150
Expected frequency (E)    4.7    23.4    46.9    46.9    23.4    4.7    150
For example, the first three expected frequencies are 150 * 5C0 * (0.5)^5, 150 * 5C1 * (0.5)^5 and 150 * 5C2 * (0.5)^5.
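The expected frequencies under H0 can be reproduced with a short sketch: multiply the total number of cows by each Bin(5, 0.5) probability. Variable names are illustrative.

```python
from math import comb

observed = [4, 19, 41, 52, 26, 8]  # cows with 0, 1, ..., 5 heifer calves
total = sum(observed)              # 150 cows

n, p = 5, 0.5  # parameters under the null hypothesis

# E = total * P(X = x), with P(X = x) = 5Cx * (0.5)^x * (0.5)^(5 - x)
expected = [total * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, (o, e) in enumerate(zip(observed, expected)):
    print(f"x = {x}: O = {o:3d}, E = {e:.1f}")
```

Rounded to one decimal place, these agree with the expected frequencies 4.7, 23.4 and 46.9 used in the slides.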
Since the expected frequencies for the first and last cells are less than 5, we
must combine each of them with the adjacent cell.
Number of heifers         0 or 1    2       3       4 or 5    Total
Observed frequency (O)    23        41      52      34        150
Expected frequency (E)    28.1      46.9    46.9    28.1      150
(Each pooled expected frequency is 4.7 + 23.4 = 28.1.)
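The pooling step can be sketched as a small routine that merges an end cell into its neighbour while its expected frequency is below 5. The function name `pool_tails` and its `minimum` parameter are hypothetical, and the expected frequencies are the rounded values from the slides.

```python
def pool_tails(labels, observed, expected, minimum=5):
    """Merge end cells into their neighbours until both end cells have E >= minimum."""
    labels, observed, expected = list(labels), list(observed), list(expected)
    # Pool from the left end.
    while len(expected) > 1 and expected[0] < minimum:
        labels[1] = f"{labels[0]} or {labels[1]}"
        observed[1] += observed[0]
        expected[1] += expected[0]
        del labels[0], observed[0], expected[0]
    # Pool from the right end.
    while len(expected) > 1 and expected[-1] < minimum:
        labels[-2] = f"{labels[-2]} or {labels[-1]}"
        observed[-2] += observed[-1]
        expected[-2] += expected[-1]
        del labels[-1], observed[-1], expected[-1]
    return labels, observed, expected


pooled_labels, pooled_observed, pooled_expected = pool_tails(
    ["0", "1", "2", "3", "4", "5"],
    [4, 19, 41, 52, 26, 8],
    [4.7, 23.4, 46.9, 46.9, 23.4, 4.7],
)
print(pooled_labels)
print(pooled_observed)
print([round(e, 1) for e in pooled_expected])
```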
4. Work out the number of degrees of freedom ν, where ν = number of cells - number of restrictions.
The number of restrictions depends on the null hypothesis.
The number of cells = 4.
There is one restriction, that the total expected frequency is 150.
Therefore, ν = 4 - 1 = 3.
5. Decide on the level of the test and the rejection criterion, looking up the
critical values in the χ² tables.
The χ²(3) distribution is considered.
From the table:
Degrees of freedom    99%        95%    90%    70%    50%    30%    10%    5%      1%
1                     0.00016
2                     0.020
3                     0.12                                                 7.82    11.34
4                     0.30
We test at the 5% level and reject H0 if χ² > χ²5%(3), i.e. if χ² > 7.82.
O            E            (O-E)^2/E
23           28.1         0.925
41           46.9         0.742
52           46.9         0.554
34           28.1         1.2387
Total 150    Total 150    3.461
χ² = Σ (O - E)^2 / E
= 3.461
Since χ² < 7.82, we do not reject H0 and we conclude
that the binomial distribution with n = 5 and p = 0.5 is
an adequate model for the data.
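The test statistic and the decision can be reproduced with the sketch below. It uses the pooled observed frequencies and the rounded expected frequencies from the slides, with the 5% critical value 7.82 typed in from the table, not computed. Variable names are illustrative.

```python
observed = [23, 41, 52, 34]          # pooled cells: 0 or 1, 2, 3, 4 or 5
expected = [28.1, 46.9, 46.9, 28.1]  # rounded expected frequencies from the slides

# Test statistic: sum of (O - E)^2 / E over the cells.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One restriction (the totals agree), so degrees of freedom = cells - 1.
degrees_of_freedom = len(observed) - 1

critical_value = 7.82  # 5% point of the chi-squared(3) distribution, from the table
reject_h0 = chi_squared > critical_value

print(f"chi-squared = {chi_squared:.3f} on {degrees_of_freedom} degrees of freedom")
print("reject H0" if reject_h0 else "do not reject H0")
```

With unrounded expected frequencies (28.125 and 46.875) the statistic changes slightly in the third decimal place, but the conclusion is the same.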
Questions?
Thank you all for your kind attention. If any doubt remains,
do feel free to ask me after the lecture session.