Statistics for clinicians
• Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center
University of South Florida, College of Nursing
Professor, College of Public Health
Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute
Morsani College of Medicine
Tampa, FL, USA
SECTION 2.1
Module Overview
and Introduction
Probability theory and discrete and
continuous sampling distributions
SECTION 2.4
Bayes Theorem
Bayes Theorem
• Procedure for updating a probability based on new information.
• Rule can be used to compute a conditional probability based on
specific, available information (i.e. links degree of belief in a
proposition before and after accounting for evidence).
• Can represent a subjective degree of belief that changes over time
to account for new evidence.
• Often used in meta-analyses and the synthesis of evidence.
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$
Bayesian Methods (example)
Question: What is the probability that Daphne’s next male child
will be affected by the X-linked disorder?
[Figure: family pedigree showing Aaron, Bart, Carlos, Desmond, Earl, Barbara, Cathy, Daphne, and Joseph]
Naïve: 1/2, 1/4, 1/8, 1/16
[Figure: the same pedigree, annotated with prior and posterior probabilities of carrier status]
Cathy:
  Prior: carrier 1/2, non-carrier 1/2
  Posterior: carrier 1/3, non-carrier 2/3
Daphne:
  Prior: carrier 1/6, non-carrier 5/6
  Posterior: carrier 1/21, non-carrier 20/21
Thus, the probability that Daphne’s
next male child will be affected is:
1/2 x 1/21 = 1/42.
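The prior-to-posterior updates in this example can be reproduced with Bayes' theorem. The sketch below is only an illustration: the pedigree figure is not available in the text, so the evidence used here (one unaffected son for Cathy, two unaffected sons for Daphne) is an assumption inferred from the slide's numbers, and the function name `posterior_carrier` is hypothetical.

```python
from fractions import Fraction

def posterior_carrier(prior, unaffected_sons):
    """Posterior probability of being a carrier, given that all of the
    woman's observed sons are unaffected (Bayes' theorem)."""
    half = Fraction(1, 2)
    # A carrier's son is unaffected with probability 1/2;
    # a non-carrier's son is always unaffected.
    likelihood_carrier = half ** unaffected_sons
    likelihood_non_carrier = Fraction(1)
    numerator = likelihood_carrier * prior
    evidence = numerator + likelihood_non_carrier * (1 - prior)
    return numerator / evidence

# ASSUMPTION: Cathy has one unaffected son, Daphne has two.
cathy_posterior = posterior_carrier(Fraction(1, 2), unaffected_sons=1)
daphne_prior = cathy_posterior * Fraction(1, 2)  # a daughter inherits the allele half the time
daphne_posterior = posterior_carrier(daphne_prior, unaffected_sons=2)
next_son_affected = Fraction(1, 2) * daphne_posterior

print(cathy_posterior, daphne_prior, daphne_posterior, next_son_affected)
```

Exact fractions are used so the results can be compared directly with the values on the slide.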
SECTION 2.5
Binomial
Distribution Model
Probability Model:
Mathematical equation or formula used to generate probabilities
based on certain assumptions about the process. Very important
for statistical inference.
Binomial Model:
• Two possible outcomes – often labeled as “success” or
“failure”, or as “disease” or “no disease”.
• Allows computation of the probability of observing a specified
number of responses (e.g. successes) when the process is repeated a
specific number of times (e.g. among a set of patients).
• For the binomial model with a set number of trials:
p = probability of success, and
q=1–p
Binomial distribution model
n = number of times process is repeated (e.g. # of patients)
x = number of successes (outcomes)
p = probability of outcome for any individual (i.e. independent)
! = factorial
$$P(x \text{ outcomes}) = \frac{n!}{x!\,(n-x)!}\; p^{x} (1-p)^{n-x}$$
Example:
Assume that a medication is effective 80% (0.80) of the time (i.e. p=0.8)
Assume that the medication will be given to 10 patients
(i.e. n=10)
What is the probability the medication will be effective in exactly 7 patients? (x=7)
(i.e. if we had to guess, we would think it was most likely that
the medication would be effective in 8 patients)
$$P(7 \text{ successes}) = \frac{10!}{7!\,(10-7)!}\; 0.80^{7} (1-0.80)^{10-7}$$
Binomial distribution model
$$\frac{10!}{7!\,(10-7)!} = \frac{10(9)(8)(7)(6)(5)(4)(3)(2)(1)}{[7(6)(5)(4)(3)(2)(1)]\,[(3)(2)(1)]} = \frac{10(9)(8)}{3(2)} = 120$$
$$P(7 \text{ successes}) = (120)\,(0.80^{7})\,(1-0.80)^{10-7}$$
P(7 successes) = (120)(0.2097)(0.008) = 0.2013
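As a quick check of the hand calculation, the binomial formula can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice, not something from the course.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Medication example: n = 10 patients, p = 0.80, exactly x = 7 successes.
prob_7 = binomial_pmf(7, 10, 0.80)
print(comb(10, 7))       # 120, the factorial term
print(round(prob_7, 4))  # 0.2013
```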
Binomial distribution model
Many times we do not want to know a probability of an outcome in
an exact number of persons, but rather a certain number or more
persons.
For example, using our previous scenario, what is the probability
that the medication will be effective in at least 7 of the 10 patients?
(i.e. P(≥ 7 successes))
To do this, we need to compute the individual probabilities for 7, 8, 9,
and 10 successes and add them together.
P(7 successes) = 0.2013
P(8 successes) = 0.3020
P(9 successes) = 0.2684
P(10 successes) = 0.1074
P(≥ 7 successes) = 0.2013 + 0.3020 + 0.2684 + 0.1074 = 0.8791
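The "at least" probability is just a sum of exact-count probabilities, which is easy to sketch in code. The helper names below (`binomial_pmf`, `binomial_at_least`) are illustrative assumptions; the complement form shown at the end is an equivalent route, not a different method from the slide.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_at_least(k, n, p):
    """Probability of k or more successes: sum the exact probabilities
    for k, k+1, ..., n."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

prob_at_least_7 = binomial_at_least(7, 10, 0.80)
# Equivalent: one minus the probability of 6 or fewer successes.
prob_via_complement = 1 - sum(binomial_pmf(x, 10, 0.80) for x in range(0, 7))

print(round(prob_at_least_7, 4))  # 0.8791
```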
Binomial distribution model (practice)
$$P(x \text{ outcomes}) = \frac{n!}{x!\,(n-x)!}\; p^{x} (1-p)^{n-x}$$
Assume that a drug is effective 90% of the time
Assume that the medication will be given to 12 patients
What is the probability the drug will be effective in exactly 10
patients?
Complete the formula below:
P(10 successes) = ------------------
Binomial distribution model (practice)
$$P(10 \text{ successes}) = \frac{12!}{10!\,(12-10)!}\; 0.90^{10} (1-0.90)^{12-10}$$
$$\frac{12!}{10!\,(12-10)!} = \frac{12(11)(10)(9)(8)(7)(6)(5)(4)(3)(2)(1)}{[10(9)(8)(7)(6)(5)(4)(3)(2)(1)]\,[(2)(1)]} = \frac{12(11)}{2} = 66$$
$$P(10 \text{ successes}) = (66)\,(0.90^{10})\,(1-0.90)^{12-10}$$
P(10 successes) = (66)(0.3487)(0.01) = 0.2301
Binomial distribution model
(calculating standard deviation)
Our Example:
Assume a medication is effective 80% (0.80) of the time (i.e. p=0.8)
Assume the medication will be given to 10 patients
(i.e. n=10)
What is the probability the medication will be effective in exactly 7
patients? (i.e. x=7)
The expected number of outcomes of a binomial population is: µ = np
So, in our example, µ = (10 x 0.8) = 8
The standard deviation (σ) = sqrt[(n(p))(1-p)]
So, in our example, σ = sqrt[(10 x 0.8) x (1-0.8)]
σ = sqrt[(8 x 0.2)]
σ = sqrt[1.6]
σ = 1.265
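The same two formulas, µ = np and σ = sqrt[np(1-p)], can be sketched in a few lines. The function name `binomial_mean_sd` is a hypothetical helper used only for illustration.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Expected number of successes and its standard deviation
    for a binomial model with n trials and success probability p."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd

# Medication example: n = 10 patients, p = 0.8.
mean, sd = binomial_mean_sd(10, 0.8)
print(mean, round(sd, 3))
```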
Binomial distribution model (practice)
(calculating standard deviation)
Example:
Assume that a medication is effective 90% of the time
Assume that the medication will be given to 20 patients
What is the expected number of outcomes: µ = np
So, in this example, µ = ______________
Calculate the standard deviation (σ) = sqrt[(n(p))(1-p)]
So, in this example, σ = _______________________
What is the expected number of outcomes: µ = np
So, in this example, µ = (20 x 0.9) = 18
Calculate the standard deviation (σ) = sqrt[(n(p))(1-p)]
So, in this example, σ = sqrt[(20 x 0.9) x (1-0.9)]
σ = sqrt[(18 x 0.1)]
σ = sqrt[1.8]
σ = 1.34
SECTION 2.6
Poisson
Distribution Model
Poisson Distribution Model
• A discrete probability distribution that expresses the probability of
a given number of events occurring in a fixed interval of time (i.e.
X = 0,1,2,3, ….)
• Usually associated with rare events
• Approximates the binomial distribution when N is large (i.e. >100)
and p is small (i.e. <0.01)
Requirements for the Poisson Distribution
a) Length of time period is fixed in advance;
b) Events occur at a constant average rate;
c) Events can be counted in whole numbers
d) Numbers of events occurring in disjoint intervals are statistically
independent.
Poisson Distribution Model
Illustration:
Assume deaths from typhoid fever in a given population are Poisson
distributed with a mean of 2.3 deaths per year. What is the
probability distribution of deaths in this population?
[Figure: plot of Pr(X = k) against k for a Poisson distribution with a mean of 2.3 deaths per year]
Poisson Distribution Model
Poisson Formula
Suppose we conduct a Poisson study in which the
average number of health events within a given time
period is μ. Then, the Poisson probability is:
$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$
where x is the actual number of events that result from
the study,
and
e is approximately equal to 2.71828.
Poisson Formula
Example:
The average number of colds for toddlers in day care is
2 per year. Knowing this, what is the estimated
probability that a new toddler to day care will have
exactly 3 colds during the following year?
$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$
μ = 2; since 2 colds per year, on average.
x = 3; since we want to find the likelihood that 3 colds will occur in the next year.
e = 2.71828; since e is a constant equal to approximately 2.71828.
Plug these values into the Poisson formula:
$$P(3; 2) = \frac{(2.71828^{-2})\,(2^{3})}{3!}$$
P(3; 2) = (0.13534) (8) / 6
P(3; 2) = 0.180
Thus, the probability of a toddler having 3 colds in the next year is 0.180.
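The Poisson formula translates directly into code. The sketch below, with a hypothetical helper `poisson_pmf`, reproduces the toddler example and also tabulates the first few probabilities for the typhoid illustration (mean of 2.3 deaths per year); the cut-off at k = 6 is an arbitrary choice for display.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x; mu) = e^(-mu) * mu^x / x! for a count x and mean mu."""
    return exp(-mu) * mu**x / factorial(x)

# Toddler colds example: mean of 2 colds per year, exactly 3 colds.
p_three_colds = poisson_pmf(3, 2)
print(f"P(3; 2) = {p_three_colds:.3f}")  # P(3; 2) = 0.180

# Typhoid illustration: probabilities of k = 0..6 deaths in a year.
for k in range(7):
    print(k, f"{poisson_pmf(k, 2.3):.4f}")
```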
Poisson Formula (Practice)
Assume that the average number of cases of tuberculosis
within a nursing home is 4 per year. Knowing this, what
is the estimated probability that exactly 4 new cases of
TB will occur during the following 6-months?
$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$
μ = ________
x = ________
e = ________
Plug these values into the Poisson formula:
Poisson Formula (Practice)
$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$
μ = 2; since 4 cases of TB per year = 2 cases of TB per 6 months, on average.
x = 4; since we want to find the likelihood that 4 cases will occur in the next 6 months.
e = 2.71828; since e is a constant equal to approximately 2.71828.
Plug these values into the Poisson formula:
$$P(4; 2) = \frac{(2.71828^{-2})\,(2^{4})}{4!}$$
P(4; 2) = (0.13534) (16) / 24
P(4; 2) = 0.0902
Thus, the probability of exactly 4 new cases of TB in the next 6-months is 0.09.
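The key step in this practice problem is rescaling the yearly rate to the 6-month interval before applying the formula. A minimal sketch of that step follows; the function name and its `rate` and `interval` parameters are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf_for_interval(x, rate, interval):
    """Poisson probability of x events when events occur at `rate` per
    unit time and the observation window lasts `interval` units."""
    mu = rate * interval  # expected count in the window
    return exp(-mu) * mu**x / factorial(x)

# 4 TB cases per year on average, window of half a year, exactly 4 cases.
p_four_cases = poisson_pmf_for_interval(4, rate=4, interval=0.5)
print(round(p_four_cases, 4))  # 0.0902
```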
http://statpages.org/ctab2x2.html
SECTION 2.7
Properties of the
Normal Distribution
Normal Distribution
Appropriate for a continuous outcome if:
• Mean = median = mode
• Symmetric around the mean
• P(x > µ) = P(x < µ) where x is a continuous variable and µ is the mean
• ~68% of all values fall within 1 SD of the mean
(i.e. P(µ - σ < x < µ + σ) = 0.68)
• ~95% of all values fall within 2 SD of the mean
(i.e. P(µ - 2σ < x < µ + 2σ) = 0.95)
• ~99.7% of all values fall within 3 SD of the mean
(i.e. P(µ - 3σ < x < µ + 3σ) = 0.997)
Normal Distribution
~68% of values fall within 1 SD of the mean
(~95% - ~68%) / 2 = ~13.6% of all values fall
between -1 and -2 SD, and the same between +1 and +2 SD
(~99.7% - ~95%) / 2 = ~2% of all values fall
between -2 and -3 SD, and the same between +2 and +3 SD
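The rounded percentages above come from areas under the standard normal curve. As a rough check, the sketch below computes the unrounded areas with Python's `statistics.NormalDist` (Python 3.8+); the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1, within_2, within_3 = area_within(1), area_within(2), area_within(3)
band_1_to_2 = (within_2 - within_1) / 2  # one side, between 1 and 2 SD
band_2_to_3 = (within_3 - within_2) / 2  # one side, between 2 and 3 SD

for label, value in [("within 1 SD", within_1), ("within 2 SD", within_2),
                     ("within 3 SD", within_3), ("1 to 2 SD", band_1_to_2),
                     ("2 to 3 SD", band_2_to_3)]:
    print(f"{label}: {value:.4f}")
```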
Normal Distribution (Practice)
Assume that BMI is normally distributed
with µ = 29.4 and σ = 4.6
Body Mass Index (BMI)
1. Put in the appropriate values on the
normal distribution curve for BMI values
plus or minus 1, 2, and 3 SD from the mean
2. What is the median BMI value? _______
3. Approximately what percentage of the
population has a BMI > 34? _________
4. Approximately what percentage of the
population has a BMI between 15.6
and 20.2? __________
5. What is the approximate minimum and
maximum BMI in the population?
Min: _________ Max: __________
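[Figure: blank normal curve for BMI with the mean (µ = 29.4) marked and the values at ±1, ±2, and ±3 SD left to fill in]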
Normal Distribution (Practice)
Assume that BMI is normally distributed
with µ = 29.4 and σ = 4.6
1. Put in the appropriate values on the
normal distribution curve for BMI values
plus or minus 1, 2, and 3 SD from the mean
2. What is the median BMI value? 29.4
3. Approximately what percentage of the
population has a BMI > 34? 16.0%
(i.e. 13.6% + 2.2% + 0.2%)
4. Approximately what percentage of the
population has a BMI between 15.6
and 20.2? 2.2%
5. What is the approximate minimum and
maximum BMI in the population?
Min: ~15 Max: ~44
[Figure: normal curve for Body Mass Index (BMI) with the following values and areas]
Position   BMI
µ-3σ       15.6
µ-2σ       20.2
µ-1σ       24.8
µ          29.4
µ+1σ       34.0
µ+2σ       38.6
µ+3σ       43.2
Region           Area
below 15.6       0.2%
15.6 to 20.2     2.2%
20.2 to 24.8     13.6%
24.8 to 29.4     34%
29.4 to 34.0     34%
34.0 to 38.6     13.6%
38.6 to 43.2     2.2%
above 43.2       0.2%
Normal Distribution
Question: What do we do if we want to calculate a probability when the value of interest
is not the mean or a multiple of the standard deviation?
Answer: Compute a z-score and use a table of probabilities for a “standard” normal
distribution with mean of 0 and standard deviation of 1.
[Figure: standard normal distribution (µ=0, σ=1), with the horizontal axis marked at -3, -2, -1, µ, 1, 2, 3]
Normal Distribution
Standardized Score (z-score)
$$Z = \frac{x - \mu}{\sigma}$$
Where:
x is the value of interest
µ is the mean
σ is the standard deviation
Example: Body Mass Index
x is 35
µ is 29.4
σ is 4.6
$$Z = \frac{35 - 29.4}{4.6} = 1.22$$
Body Mass Index (BMI)
If x = 34, then z = 1
$$\text{i.e. } \frac{34 - 29.4}{4.6} = 1$$
Thus, P(z > 1) = 0.136 + 0.022 + 0.002 = 0.16
[Figure: normal curve for BMI (µ = 29.4, σ = 4.6) with the values 15.6, 20.2, 24.8, 29.4, 34.0, 38.6, and 43.2 marked, and areas of 0.2%, 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%, and 0.2% from left to right]
If x = 35, then z = 1.22
$$\text{i.e. } \frac{35 - 29.4}{4.6} = 1.22$$
Thus, P(z > 1.22) is the area under the curve to the right of 35.0.
Refer to Appendix Table 1:
P(z > 1.22) = 0.111
In Appendix Table 1:
The probability value of 0.889 (and
1 – 0.889 = 0.111) is determined by
first looking at the left column for z
to 1 decimal place (1.2), and then across
the top row for z to the second
decimal place (0.02).
[Figure: normal curve for BMI with the area under the curve to the right of 35.0 shaded]
Can also get the exact probability for z = 1.22
http://stattrek.com/online-calculator/normal.aspx
Cumulative probability: P(Z < 1.22) = 0.889
Thus, probability: P(Z > 1.22) = (1 - 0.889) = 0.111
Standard Normal Distribution (Practice)
Body Mass Index
µ is 29.4
σ is 4.6
$$Z = \frac{x - \mu}{\sigma}$$
A. What is the probability that a person in the population
will have a BMI > 37.0?
z = _______________
P(Z > ???) = __________
B. What is the probability that a person in the population
will have a BMI < 27.0?
z = _______________
P(Z < ???) = __________
After calculating z for questions A
and B, refer to Appendix Table 1
[Figure: two normal curves for Body Mass Index (BMI), one with the area under the curve below 27.0 shaded and one with the area above 37.0 shaded]
Standard Normal Distribution (Practice)
Body Mass Index
µ is 29.4
σ is 4.6
$$Z = \frac{x - \mu}{\sigma}$$
A. What is the probability that a person in the population
will have a BMI > 37.0?
z = (37 – 29.4) / 4.6 = 1.65
P(Z > 1.65) = 0.0495
B. What is the probability that a person in the population
will have a BMI < 27.0?
z = (27 – 29.4) / 4.6 = -0.52
P(Z < -0.52) = 0.3015
After calculating z for questions A
and B, refer to Appendix Table 1
[Figure: two normal curves for Body Mass Index (BMI), one with the area under the curve below 27.0 shaded and one with the area above 37.0 shaded]
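The table lookups for questions A and B can also be checked in software. The sketch below uses Python's `statistics.NormalDist` as a stand-in for Appendix Table 1 (an assumption, since the table itself is not reproduced here). Because it does not round z to two decimals first, its probabilities differ slightly from the table values in the third or fourth decimal place.

```python
from statistics import NormalDist

MU, SIGMA = 29.4, 4.6          # BMI mean and standard deviation
bmi = NormalDist(mu=MU, sigma=SIGMA)

# A. P(BMI > 37.0): upper-tail area.
z_a = (37.0 - MU) / SIGMA
p_a = 1 - bmi.cdf(37.0)

# B. P(BMI < 27.0): lower-tail area.
z_b = (27.0 - MU) / SIGMA
p_b = bmi.cdf(27.0)

print(f"A: z = {z_a:.2f}, P = {p_a:.4f}")
print(f"B: z = {z_b:.2f}, P = {p_b:.4f}")
```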
Percentiles from Standard Normal Distribution
Standard normal distribution can also be used to compute
percentiles.
x = µ + zσ
Example: Calculate the 90th percentile for BMI:
In Appendix Table 1:
The z value for the 90th percentile is
found by looking in the body of the
table for a value of 0.90, or the
nearest value. In this case, it is 0.8997
which corresponds to a z value of
1.28 (the actual z value is 1.282)
So,
x = 29.4 + (1.282 x 4.6) = 35.3
[Figure: normal curve for Body Mass Index (BMI), µ = 29.4 and σ = 4.6, with the 90th percentile marked to the right of the mean]
Percentiles from Standard Normal Distribution (Practice)
x = µ + zσ
Calculate the 33rd percentile for BMI:
z = _________
so, x = _________________________
[Figure: normal curve for Body Mass Index (BMI), µ = 29.4 and σ = 4.6, with the 33rd percentile marked to the left of the mean]
Percentiles from Standard Normal Distribution (Practice)
x = µ + zσ
Calculate the 33rd percentile for BMI:
z = -0.44
so, x = 29.4 + (-0.44 x 4.6) = 27.4
[Figure: normal curve for Body Mass Index (BMI), µ = 29.4 and σ = 4.6, with the 33rd percentile marked to the left of the mean]
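The percentile calculation x = µ + zσ can be sketched with the inverse of the standard normal cumulative distribution, which plays the role of reading the table "backwards". This uses `statistics.NormalDist.inv_cdf` from the Python standard library (3.8+); the helper name `bmi_percentile` is hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 29.4, 4.6  # BMI mean and standard deviation

def bmi_percentile(percentile):
    """Return (z, x) for the given percentile (0-100) of BMI."""
    z = NormalDist().inv_cdf(percentile / 100)  # z for that cumulative area
    return z, MU + z * SIGMA

z90, x90 = bmi_percentile(90)
z33, x33 = bmi_percentile(33)

print(f"90th percentile: z = {z90:.3f}, BMI = {x90:.1f}")
print(f"33rd percentile: z = {z33:.2f}, BMI = {x33:.1f}")
```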
Standard Normal Distribution
Z values for common percentiles:
Percentile    Z
1st           -2.326
2.5th         -1.960
5th           -1.645
10th          -1.282
25th          -0.675
50th          0.0
75th          0.675
90th          1.282
95th          1.645
97.5th        1.960
99th          2.326
$$Z = \frac{x - \mu}{\sigma}$$
SECTION 2.8
Sampling Distributions
and Central Limit
Theorem
Sampling Distributions
• In estimating the mean of a continuous variable in a population, the
mean of a representative sample is a good estimate of the unknown
population mean, but it is only an estimate.
• When making estimates about population parameters based on
sample statistics, it is very important to quantify the precision of
the parameter estimates (e.g. standard error of the mean).
Illustration:
Assume a population of 6 measurements of self-reported pain (on a 0
to 100 scale) after total hip replacement with scores as follows:
25, 50, 80, 85, 90, 100
The population mean (μ) is:
$$\mu = \frac{\sum X}{N} = 71.7$$
and the standard deviation is:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^{2}}{N}} = 25.9$$
Suppose we did not have population data and wanted to estimate the
mean from a sample, taking a sample size of 4.
There are 15 different possible samples with n=4 when sampling
without replacement is used (i.e. each individual can only be sampled
once in a given sample).
The probability of selecting any one of the 15 possible samples is
1/15 – see next slide.
Sample   Observations in Sample   Sample Mean (X̄)
1        25   50   80   85        60.0
2        25   50   80   90        61.3
3        25   50   80   100       63.8
4        25   50   85   90        62.5
5        25   50   85   100       65.0
6        25   50   90   100       66.3
7        25   80   85   90        70.0
8        25   80   85   100       72.5
9        25   80   90   100       73.8
10       25   85   90   100       75.0
11       50   80   85   90        76.3
12       50   80   85   100       78.8
13       50   80   90   100       80.0
14       50   85   90   100       81.3
15       80   85   90   100       88.8
The table represents the sampling
distribution of the sample means.
So, with the original population
of N=6, the population mean
is: µ = ∑X / N = 71.7 with SD of 25.9.
The mean of the sample means,
denoted as µX, is also 71.7, but their
standard deviation is σX = 8.5.
Note that µ = µX (71.7), yet σ (25.9)
is much larger than σX (8.5).
This is because the range of the
population data (25 to 100) is much
larger than the range of the sample
means (60 to 88.8).
These properties are formally stated
in the Central Limit Theorem.
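The sampling distribution in this illustration is small enough to enumerate completely. The sketch below lists all samples of size 4 drawn without replacement and summarizes their means. One assumption worth flagging: the value of 8.5 for the standard deviation of the sample means is reproduced by the n - 1 ("sample") formula, `statistics.stdev`; dividing by 15 instead (`statistics.pstdev`) gives about 8.2.

```python
from itertools import combinations
from statistics import mean, pstdev, stdev

population = [25, 50, 80, 85, 90, 100]  # pain scores from the illustration

# Every possible sample of size 4, drawn without replacement.
samples = list(combinations(population, 4))
sample_means = [sum(s) / len(s) for s in samples]

print(len(samples))                      # number of possible samples
print(round(mean(population), 1))        # population mean
print(round(pstdev(population), 1))      # population SD (divides by N)
print(round(mean(sample_means), 1))      # mean of the sample means
print(round(stdev(sample_means), 1))     # SD of sample means (divides by 14)
print(round(pstdev(sample_means), 1))    # SD of sample means (divides by 15)
```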
Central Limit Theorem
If we take simple random samples of size n from the population with
replacement, then for large samples (n > 30), the sampling distribution
of the sample means is approximately normal with:
µX = µ and σX = σ / sqrt(n)
Because the distribution of the sample means is approximately
normal, the normal probability model can be used to make
inferences about a population mean.
The parameter σX = σ / sqrt(n), as noted above, is the standard error
(meaning the standard deviation of the sample means).
For a dichotomous outcome, the theorem holds for samples in which
Minimum[np, n(1-p)] > 5, where n is the sample size and p is the
probability of the outcome in the population.
Central Limit Theorem (Practice)
                          Sample Population         Sample Means
Characteristic            N     µ       σ           µX      σX
Age (in years)            60    56.2    9.1         ____    ____
Systolic blood pressure   60    135.6   20.8        ____    ____
Body mass index           60    26.3    6.4         ____    ____
Resting heart rate        60    71.0    8.6         ____    ____
µX = µ
σX = σ / sqrt(n)
Central Limit Theorem (Practice)
                          Sample Population         Sample Means
Characteristic            N     µ       σ           µX      σX
Age (in years)            60    56.2    9.1         56.2    1.17
Systolic blood pressure   60    135.6   20.8        135.6   2.69
Body mass index           60    26.3    6.4         26.3    0.83
Resting heart rate        60    71.0    8.6         71.0    1.11
µX = µ
σX = σ / sqrt(n)
Note that µX = µ because the sample population mean is an
unbiased estimate of the true mean
Whereas σX and σ are different quantities:
σ is an estimate of variability in the sample
σX is an estimate of the precision of the mean estimate
Thus, as n increases, σX gets smaller, but σ may increase or decrease
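The standard errors in the practice table follow from σX = σ / sqrt(n). A minimal sketch, using the values from the table and a hypothetical helper `standard_error`:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)

# (mean, standard deviation) for each characteristic, with N = 60.
characteristics = {
    "Age (in years)": (56.2, 9.1),
    "Systolic blood pressure": (135.6, 20.8),
    "Body mass index": (26.3, 6.4),
    "Resting heart rate": (71.0, 8.6),
}

standard_errors = {
    name: round(standard_error(sigma, 60), 2)
    for name, (mu, sigma) in characteristics.items()
}
for name, se in standard_errors.items():
    print(f"{name}: mean of sample means = {characteristics[name][0]}, SE = {se}")
```

Quadrupling the sample size halves the standard error, which is why σX shrinks as n grows while σ itself does not systematically change.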
Central Limit Theorem -- Application
Assume that in adults over age 50 that HDL cholesterol has:
µ = 54 and σ = 17.
Suppose a physician has 40 patients (older than 50) and wants to
determine the probability that their mean HDL is >60 i.e. P(X>60)
Intuitively, this should appear very unlikely…..
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
$$Z = \frac{60 - 54}{17 / \sqrt{40}} = \frac{6}{2.7} = 2.22$$
From appendix Table 1, P(z > 2.22) =
(1 – 0.9868) = 0.0132 (i.e. very unlikely)
Central Limit Theorem – Application (Practice)
Assume that in adults over age 50 that HDL cholesterol has:
µ = 44 and σ = 16.
Suppose a physician has 50 patients (older than 50) and wants to
determine the probability that their mean HDL is <40 i.e. P(X<40)
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
X = ____
µ = ____
σ = ____
n = ____
Z = _______________
From appendix Table 1, P(z < ????) = ____
Central Limit Theorem – Application (Practice)
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
X = 40
µ = 44
σ = 16
n = 50
$$Z = \frac{40 - 44}{16 / \sqrt{50}} = \frac{-4}{2.26} = -1.77$$
From appendix Table 1, P(z < -1.77) = 0.0384
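The same calculation can be sketched in code: standardize the sample mean using the standard error, then look up the lower-tail area. Here `statistics.NormalDist` stands in for Appendix Table 1 (an assumption), and the function name `prob_sample_mean_below` is hypothetical. Since z is not rounded before the lookup, the probability may differ from the table value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(xbar, mu, sigma, n):
    """Return (z, P(sample mean < xbar)) under the Central Limit Theorem."""
    standard_error = sigma / sqrt(n)
    z = (xbar - mu) / standard_error
    return z, NormalDist().cdf(z)

# HDL practice example: mu = 44, sigma = 16, n = 50 patients, mean below 40.
z, p = prob_sample_mean_below(40, 44, 16, 50)
print(f"z = {z:.2f}, P = {p:.4f}")
```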
SECTION 2.9
SPSS – Calculation
of Z-Scores
SPSS – Calculation of Z-Scores
Example: Age
Analyze > Descriptive Statistics > Descriptives
Before you click OK, be sure that the box
marked "Save Standardized Values as
Variables" is checked.
Run a Frequency distribution for the standardized variable
SPSS – Calculation of Z-Scores
Example: Age
GET
  FILE='G:\NGR 7848 2012\Datasets\baseline_random.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
DESCRIPTIVES VARIABLES=SCR_AGE
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.
Descriptive Statistics
                     N     Minimum   Maximum   Mean    Std. Deviation
Age (years)          503   45        74        59.16   7.409
Valid N (listwise)   503
SPSS – Calculation of Z-Scores
Example: Age
FREQUENCIES VARIABLES=ZSCR_AGE
  /FORMAT=NOTABLE
  /STATISTICS=STDDEV MEAN SKEWNESS SESKEW
  /HISTOGRAM
  /ORDER=ANALYSIS.
Statistics
Zscore: Age (years)
N Valid                  503
N Missing                0
Mean                     0E-7
Std. Deviation           1.00000000
Skewness                 .097
Std. Error of Skewness   .109
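The output above shows the defining property of standardized values: a mean of (essentially) zero and a standard deviation of one. The sketch below illustrates the same standardization outside SPSS. The ages are made-up values for illustration only, not the course dataset, and the use of the n - 1 standard deviation is an assumption chosen to mirror the Std. Deviation reported above.

```python
from statistics import mean, stdev

# HYPOTHETICAL ages, for illustration only (not the baseline_random.sav data).
ages = [45, 52, 58, 59, 61, 66, 74]

age_mean = mean(ages)
age_sd = stdev(ages)  # sample standard deviation (n - 1 denominator)

# z-score for each person: distance from the mean in SD units.
z_scores = [(age - age_mean) / age_sd for age in ages]

print([round(z, 2) for z in z_scores])
print(round(mean(z_scores), 6), round(stdev(z_scores), 6))
```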