Biostatistics

Unit 4
Probability
Probability
Probability theory developed from the study
of games of chance like dice and cards. A
process like flipping a coin, rolling a die or
drawing a card from a deck is called a
probability experiment. An outcome is a
specific result of a single trial of a probability
experiment.
Probability distributions
• Probability theory is the foundation for
statistical inference. A probability
distribution indicates the values that a
random variable may take, together with
the probability of each value.
• There are two categories of random
variables. These are:
–discrete random variables, and
–continuous random variables.
Discrete random variable
The probability distribution of a discrete
random variable specifies all possible values
of a discrete random variable along with their
respective probabilities.
Discrete random variable
It can be tabulated as a
• Frequency distribution
• Probability distribution (relative frequency
distribution)
• Cumulative frequency distribution
Examples of discrete probability distributions
are the binomial distribution and the
Poisson distribution.
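As a sketch, the three tabulations listed above can be computed from raw counts. The data set and the function name `frequency_table` are hypothetical; `Fraction` keeps the relative frequencies exact so that the cumulative column ends at exactly 1.

```python
from collections import Counter
from fractions import Fraction

def frequency_table(values):
    """Rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = Fraction(0)
    for value in sorted(counts):
        relative = Fraction(counts[value], n)  # probability (relative frequency)
        cumulative += relative                  # running total of relative frequencies
        rows.append((value, counts[value], relative, cumulative))
    return rows

# Hypothetical data: number of children in 10 families
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]
for value, freq, rel, cum in frequency_table(data):
    print(value, freq, rel, cum)
```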
Binomial distribution
A binomial experiment is a probability
experiment with the following properties.
1. Each trial can have only two outcomes,
which can be considered success or failure.
2. There must be a fixed number of trials.
3. The outcomes of each trial must be
independent of each other.
4. The probability of success must remain the
same in each trial.
Binomial distribution
The number of successes in a binomial
experiment, together with the probability of
each possible count, is called a binomial
distribution.
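The slides do not state the binomial formula; the standard probability mass function for n trials with success probability p is P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal Python sketch, assuming the four conditions above hold; `binomial_pmf` is a hypothetical helper name and the coin example is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: number of heads in 4 flips of a fair coin
n, p = 4, 0.5
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(dist)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

The probabilities over k = 0, ..., n sum to 1, as any probability distribution must.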
Poisson distribution
The Poisson distribution is based on the
Poisson process.
1. The occurrences of the events are
independent in an interval.
2. An infinite number of occurrences of the event
are possible in the interval.
3. The probability of a single event in the interval
is proportional to the length of the interval.
4. In an infinitely small portion of the interval, the
probability of more than one occurrence of the
event is negligible.
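For a Poisson process with an average of λ occurrences per interval, the standard probability of exactly k occurrences is P(X = k) = λ^k e^(-λ) / k!. The sketch below assumes a hypothetical rate of λ = 2; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam**k * e**(-lam) / k!."""
    if k < 0:
        return 0.0
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical rate: an average of 2 events per interval
lam = 2.0
probs = [poisson_pmf(k, lam) for k in range(6)]
print([round(p, 4) for p in probs])
```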
Continuous variable
A continuous variable can assume any
value within a specified interval of values.
As the number of class intervals in a
frequency distribution of such a variable
increases (and the intervals narrow), the
frequency polygon begins to resemble a
smooth curve.
Continuous variable
• A continuous probability distribution is
described by a probability density function.
• The area under the smooth curve is equal to
1, and the relative frequency of occurrence of
values between any two points equals the
area under the curve between the two points
and above the x-axis.
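To illustrate that probabilities for a continuous variable are areas, the sketch below integrates a simple hypothetical density, f(x) = 2x on [0, 1], with the trapezoidal rule. The density and the helper `area_under` are assumptions chosen for illustration, not part of the original slides.

```python
def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

print(area_under(density, 0, 1))    # total area, approximately 1
print(area_under(density, 0, 0.5))  # P(0 < X < 0.5), approximately 0.25
print(area_under(density, 0.5, 1))  # P(0.5 < X < 1), approximately 0.75
```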
The normal distribution
• The normal distribution is the most
important distribution in biostatistics. It is
frequently called the Gaussian distribution.
• The two parameters of the normal
distribution are the mean (μ) and the
standard deviation (σ).
• Its graph is the familiar bell-shaped curve.
Properties of a normal distribution
1. It is symmetrical about μ.
2. The mean, median and mode are all equal.
3. The total area under the curve above the x-axis is 1 square unit. Therefore 50% is to the
right of μ and 50% is to the left of μ.
4. Perpendiculars erected at:
μ ± σ enclose about 68%;
μ ± 2σ enclose about 95%;
μ ± 3σ enclose about 99.7%
of the area under the curve.
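The 68%, 95% and 99.7% figures can be checked numerically. This sketch computes normal areas with Python's `math.erf` rather than a table; the mean and standard deviation used are hypothetical, since any values of μ and σ give the same percentages.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 100.0, 15.0  # hypothetical parameters; any values give the same areas
coverage = {
    k: normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)
    for k in (1, 2, 3)
}
for k, area in coverage.items():
    print(f"mu ± {k} sigma: {area:.4f}")
```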
Table of Normal Curve Areas
The Standard Normal Distribution
• A normal distribution is determined by μ
and σ, so there is a whole family of normal
distributions, one for each combination of
values of μ and σ.
• The standard normal distribution has
μ = 0 and σ = 1.
Standard z score
• The standard z score is obtained by
creating a variable z whose value is
$z = \frac{x - \mu}{\sigma}$
• Given the values of μ and σ, we can convert
a value of x to a value of z and find its
probability using the table of normal curve
areas.
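A sketch of the conversion in Python; `z_score` is a hypothetical helper and the sample values are invented. Standardizing a whole data set this way yields z values with mean 0 and standard deviation 1, the parameters of the standard normal distribution (standardizing does not by itself make data normal).

```python
import statistics

def z_score(x, mu, sigma):
    """Standardize x: z = (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0

# Standardizing a (hypothetical) data set gives mean 0 and standard deviation 1.
sample = [85, 100, 115, 130, 70]
mu, sigma = statistics.mean(sample), statistics.pstdev(sample)
zs = [z_score(x, mu, sigma) for x in sample]
print(round(statistics.mean(zs), 10), round(statistics.pstdev(zs), 10))
```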
Finding normal curve areas
1. The Table of Normal Curve Areas gives areas
between $-\infty$ and the value of $z$, that is,
the area under the curve to the left of z.
2. Find the z value to the tenths place in the column at the left
margin and locate its row. Find the hundredths
place in the appropriate column.
Finding normal curve areas
3. Read the value of the area (P) from the body of
the table where the row and column intersect.
Note that P is the probability that a randomly chosen
value of z is less than the given value, i.e., the area to its left.
Values of P are decimals given to four places; multiplying
by 100 gives a percentage (e.g., .0250 = 2.50%).
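Assuming the table is cumulative from the left, as the worked examples imply, its entries can be reproduced from the standard normal cumulative distribution function Φ(z) = ½[1 + erf(z/√2)]. A sketch using Python's `math.erf`; `table_area` is a hypothetical name, and rounding to four places mimics the printed table.

```python
import math

def table_area(z):
    """Area under the standard normal curve from -infinity to z, to four places."""
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return round(phi, 4)

for z in (-1.96, -0.80, 0.00, 1.00, 1.96):
    print(f"z = {z:5.2f}   P = {table_area(z):.4f}")
```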
Finding probabilities
(a) What is the probability that z < -1.96?
(1) Sketch a normal curve
(2) Draw a line for z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of
the line: P(z < -1.96) = .0250
Finding probabilities
(b) What is the probability that -1.96 < z < 1.96?
(1) Sketch a normal curve
(2) Draw lines for lower z = -1.96, and
upper z = 1.96
(3) Find the area in the table corresponding to
each value
(4) The answer is the area between the values.
Subtract lower from upper:
P(-1.96 < z < 1.96) = .9750 - .0250 = .9500
Finding probabilities
(c) What is the probability that z > 1.96?
(1) Sketch a normal curve
(2) Draw a line for z = 1.96
(3) Find the area in the table
(4) The answer is the area to the right of the
line. It is found by subtracting the table
value from 1.0000:
P(z > 1.96) =1.0000 - .9750 = .0250
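Cases (a) to (c) all reduce to one cumulative function: the left tail is Φ(z), an interval is Φ(upper) - Φ(lower), and the right tail is 1 - Φ(z). A sketch in Python that computes Φ with `math.erf` instead of reading the table; `phi` is an illustrative name.

```python
import math

def phi(z):
    """Standard normal cumulative area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_a = phi(-1.96)              # (a) P(z < -1.96): area to the left
p_b = phi(1.96) - phi(-1.96)  # (b) P(-1.96 < z < 1.96): upper minus lower
p_c = 1 - phi(1.96)           # (c) P(z > 1.96): 1 minus the area to the left
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.025 0.95 0.025
```

Because the curve is symmetric about 0, (a) and (c) give the same area.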
Applications of the normal
distribution
• The normal distribution is used as a model
to study many different variables.
• We can use the normal distribution to
answer probability questions about random
variables.
• Some examples of variables that are
approximately normally distributed are
human height and intelligence.
Solving normal distribution
application problems
(1) Write the given information
(2) Sketch a normal curve
(3) Convert x to a z score
(4) Find the appropriate value(s) in the table
(5) Complete the answer
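The five steps can be folded into one function. This is a sketch under the assumption that X is normal with known μ and σ; `normal_prob` is a hypothetical helper, the table lookup is replaced by `math.erf`, and step (2), the sketch, has no code counterpart.

```python
import math

def normal_prob(mu, sigma, lower=-math.inf, upper=math.inf):
    """P(lower < X < upper) for X ~ Normal(mu, sigma)."""
    # Step (1): the given information is mu, sigma and the x limit(s).
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Step (3): convert each x limit to a z score (infinite limits stay infinite).
    z_low = (lower - mu) / sigma
    z_high = (upper - mu) / sigma

    # Step (4): cumulative area to the left of z, computed instead of looked up.
    def phi(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # Step (5): the answer is the area between the two z values.
    return phi(z_high) - phi(z_low)

# Usage with hypothetical parameters mu = 100, sigma = 15
print(normal_prob(100, 15, upper=130))            # P(X < 130)
print(normal_prob(100, 15, lower=85, upper=115))  # P(85 < X < 115)
print(normal_prob(100, 15, lower=130))            # P(X > 130)
```

Leaving a limit at its infinite default covers the "less than" and "greater than" cases with the same subtraction.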
Example: fingerprint count
Total fingerprint ridge count in humans is
approximately normally distributed with a mean
of 140 and a standard deviation of 50. Find the
probability that an individual picked at random
will have a ridge count less than 100. We
follow the steps to find the solution.
Example: fingerprint count
(1) Write the given information
μ = 140
σ = 50
x = 100
Example: fingerprint count
(2) Sketch a normal curve.
Example: fingerprint count
(3) Convert x to a z score:
$z = \frac{x - \mu}{\sigma} = \frac{100 - 140}{50} = -0.8$
Example: fingerprint count
(4) Find the appropriate value(s) in the table
A value of z = -0.8 gives an area of .2119,
which corresponds to the probability
P(z < -0.8).
Example: fingerprint count
(5) Complete the answer.
The probability that x is less than 100 is .2119.
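The same calculation in Python, as a check on the table value; `math.erf` stands in for the table lookup, and the numbered comments follow the steps of the example.

```python
import math

mu, sigma, x = 140, 50, 100                 # (1) given information
z = (x - mu) / sigma                        # (3) z = (100 - 140) / 50
p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # (4) area to the left of z
print(z, round(p, 4))                       # -0.8 0.2119
```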
Distortions of Normal Curve
• Data may not be normally distributed.
• Outliers in one tail may distort the mean,
pulling it away from the center. The measure
of this asymmetry is skew.
• Data may be bunched about the mean in a
non-normal fashion. The measure of this is
kurtosis.
Normal Distribution Graph-Box Plot
Skewed Data
• Data may have a positive skew (long tail to
the right) or a negative skew (long tail to
the left).
Positive Skew
Negative Skew
Kurtosis
• Kurtosis indicates data that are bunched
together or spread out.
• Data that are bunched together give a tall,
thin distribution which is not normal. This
is called leptokurtic.
• Data that are spread out give a low, flat
distribution which is not normal. This is
called platykurtic.
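Skew and kurtosis can be quantified with moment-based statistics. A minimal sketch using the population moment formulas; `shape_stats` and the data sets are hypothetical, and statistical packages often apply small-sample corrections, so their values may differ slightly. Positive skew indicates a long right tail; excess kurtosis (kurtosis minus 3, the value for a normal distribution) is positive for leptokurtic and negative for platykurtic data.

```python
def shape_stats(data):
    """Return (skewness, excess kurtosis) using population moment formulas.

    Assumes the data are not all equal, so the second moment is positive.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skew, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]

for name, values in [("symmetric", symmetric),
                     ("right_tailed", right_tailed),
                     ("left_tailed", left_tailed)]:
    skew, kurt = shape_stats(values)
    print(f"{name}: skew = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

The flat, evenly spread `symmetric` set has zero skew and negative excess kurtosis (platykurtic); negating the right-tailed data flips the sign of the skew.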
fin