(Sungkyunkwan University School of Medicine, Graduate Lecture 1, 20120912)

Statistical Inference
Samsung Biomedical Research Institute
Statistical Support Team
Kim Seonwoo (김선우)
Statistical Inference
• Drawing conclusions about a population on the basis of statistical methods and data (a sample)
• Estimation
• Testing
Population vs Sample
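• Population
- The entire set of subjects about which information is sought
- Because the population is defined differently depending on the information of interest, it is important to define clearly what you want to know
- Specify the time point, region, etc.
• Sample
- A subset of the population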
Target (Study) population
• The group that we wish to study
• The sample is selected from the study population
Parameter vs Statistic
• Parameter
- A quantitative characteristic of the population
- Population mean, Population standard deviation,
Population proportion, …
• Statistic
- A quantitative characteristic computed from the sample in order to estimate a parameter
- Sample mean, Sample standard deviation,
Sample proportion, …
• Example 6.8
- Wish to characterize the distribution of birthweight of
all liveborn infants that were born in the US in 1988.
- Parameter of interest: Mean, SD of birthweight

• Sample
- Statistic: Mean, SD of birthweight from sample
Random sample
• A selection of some members of the population such
that each member is independently chosen and has a
known non-zero probability of being selected.
• A simple random sample
- A random sample in which each group member has
the same probability of being selected.
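A minimal sketch of drawing a simple random sample in Python. The sampling frame of 1,000 ID numbers and the sample size of 10 are hypothetical values chosen only for illustration; random.sample draws without replacement, so every member has the same chance of being selected.

```python
import random

# Hypothetical sampling frame: ID numbers 1..1000 (an assumption for illustration).
population_ids = list(range(1, 1001))

rng = random.Random(20120912)  # fixed seed so the draw is reproducible

# Draw 10 distinct members; each ID has the same selection probability.
sample_ids = rng.sample(population_ids, k=10)

print(sorted(sample_ids))
```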
Randomized Clinical Trial
(RCT)
• A type of research design for comparing different
treatments, in which the assignment of treatments to
patients is by some random mechanism (randomization).
• Randomization
- Assignment of treatment to an individual is independent of
the assignment of treatment to other individuals.
- The types of patients assigned to different treatment
modalities will be similar if the sample sizes are large. If
sample sizes are small, then patient characteristics of
treatment groups may not be comparable. Thus, it is
customary to present a table of characteristics of different
treatment groups in RCTs, to check that the randomization
process is working well.
Design features of randomized
clinical trials
• Block randomization
- For comparing two treatments (A, B), a block size of
2n is determined in advance, where for every 2n
patients entering the study, n patients are randomly
assigned to treatment A and the remaining n patients
are assigned to treatment B. A similar approach can be
used in clinical trials with more than 2 treatment
groups.
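The block scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function name block_randomize, the block size of 4 (n = 2), the 12 patients, and the seed are all hypothetical, and a real trial would use a validated randomization system.

```python
import random

def block_randomize(n_patients, n_per_arm_in_block, seed=None):
    """Assign treatments A and B in permuted blocks of size 2n.

    Within every complete block, exactly n patients get A and n get B.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        block = ["A"] * n_per_arm_in_block + ["B"] * n_per_arm_in_block
        rng.shuffle(block)          # random order within the block
        assignments.extend(block)
    return assignments[:n_patients]  # last block may be cut short

# Hypothetical example: 12 patients, blocks of size 4 (n = 2).
schedule = block_randomize(n_patients=12, n_per_arm_in_block=2, seed=2012)
print(schedule)
```

Because the arms are balanced after every complete block, the two groups can never differ in size by more than n at any point during enrollment.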
• Stratification
- Patients are subdivided into subgroups, or strata,
according to characteristics that are thought to be
important for patient outcome. Separate
randomization lists are maintained for each stratum to
ensure that there are comparable patient populations
within each stratum. Either random selection or block
randomization might be used for each stratum.
• Blinding
- Double blind if neither the physician nor the patient
knows which treatment the patient is getting.
- Single blind if the patient is blinded as to treatment
assignment but the physician is not (or vice versa).
- Blinding is always preferable, to prevent biased
reporting of outcome by the patient and/or the
physician. However, it is not feasible in every
research setting.
- Double dummy: each patient receives either target drug +
placebo of standard drug, or standard drug + placebo of
target drug
Estimation
• Point estimation
• Interval estimation
Point estimation
• A natural estimator to use for estimating the
population mean is the sample mean.
• This estimator has desirable properties (unbiased,
minimum variance).
• Sampling distribution of the sample mean
- The distribution of values of the sample mean over all
possible samples of the same size that could have been
selected from the study population. (Figure 6.1)
• Standard Error of the Mean (SEM)
- The standard deviation of the sample means
- A quantitative measure of the variability of sample
means obtained from repeated random samples of
size n drawn from the same population.
- SEM = σ/√n, where σ is the population standard deviation

- The larger the sample size, the more precise the sample
mean is as an estimator of the population mean.
• Example 6.23
- SEM is estimated by s/√n, where s is the sample SD: 22.44/√10 ≈ 7.10
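A small simulation can illustrate that the standard deviation of sample means is close to σ/√n. This is only a sketch: it assumes a normal population, uses a hypothetical population mean of 100, and borrows s = 22.44 from Example 6.23 as if it were the population σ.

```python
import random
import statistics
from math import sqrt

def simulated_sem(mu, sigma, n, n_samples=20000, seed=0):
    """SD of the sample means over many random samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)

sigma, n = 22.44, 10                  # sigma taken from s in Example 6.23 (assumption)
theoretical = sigma / sqrt(n)         # SEM formula: sigma / sqrt(n)
empirical = simulated_sem(mu=100.0, sigma=sigma, n=n)  # mu = 100 is hypothetical

print(f"sigma/sqrt(n) = {theoretical:.2f}, simulated SD of means = {empirical:.2f}")
```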
• Each estimator (e.g., sample variance, sample proportion)
has its own standard error.
Ex) SE of sample proportion (p) = √(p(1-p)/n)
• The larger the sample size, the more precisely an
estimator estimates the corresponding parameter.
• SD vs SEM
• SD
- Variability of raw data
• SEM
- Variability of the sample means
Interval estimation
• A point estimate is computed from a given sample, so it can
change when the sample changes; it therefore has its own
variability (e.g., the SEM).
• Ex)
Sample mean of cholesterol in SMC = 192
Sample mean of cholesterol in SNU = 181
Sample mean of cholesterol in CMC = 185
…
• It is therefore necessary to estimate an interval that is likely
to contain the parameter of interest.
• 95% Confidence interval for μ (population mean)
- Over the collection of all 95% confidence intervals that
could be constructed from repeated random samples
of size n, 95% will contain the parameter μ. (figure 6.6)
• Factors affecting the length of a CI
- As the sample size increases, the length of CI
decreases.
- As the SD, which reflects the variability of individual
observations, increases, the length of CI increases.
- As the confidence desired (ex; 95%) increases, the
length of CI increases.
• Example 6.30, 6.32, 6.33
• Body temperature
• 95% CI for the mean
= (sample mean-1.96*SE, sample mean+1.96*SE)
( SE = standard error of the mean = SD/√n)
• Sample mean=97.2, SD=0.2 for n=10
• 95% CI for the mean = (97.08, 97.32)
• Sample mean=97.2, SD=0.2 for n=100
• 95% CI for the mean = (97.16, 97.24)
• Sample mean=97.2, SD=0.4 for n=10
• 95% CI for the mean = (96.95, 97.45)
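The three body temperature intervals above can be reproduced with a short sketch. It follows the slide's normal-based formula (sample mean ± z × SD/√n); the function name mean_ci is hypothetical, and for n = 10 a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sd, n, confidence=0.95):
    """Normal-based confidence interval: mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# The three (SD, n) settings from the body temperature example.
results = {}
for sd, n in [(0.2, 10), (0.2, 100), (0.4, 10)]:
    results[(sd, n)] = mean_ci(97.2, sd, n)
    lo, hi = results[(sd, n)]
    print(f"SD={sd}, n={n}: ({lo:.2f}, {hi:.2f})")
```

Comparing the three rows shows the two effects listed earlier: a larger n shortens the interval, and a larger SD lengthens it.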
• A confidence interval can be computed using either an
asymptotic method or an exact method.
• For binary outcome,
Asymptotic method for 95% CI for the proportion;
(sample proportion-1.96*SE, sample proportion+1.96*SE)
( SE = Standard error of the proportion = √(p(1-p)/n) )
Example 6.45
p=0.04, n=10,000
95% CI for proportion = (0.036, 0.044)
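A sketch of the asymptotic interval for a proportion, applied to the values of Example 6.45. The function name proportion_ci is hypothetical; the formula is the normal approximation with SE = √(p(1-p)/n).

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Asymptotic (normal approximation) CI for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

lo, hi = proportion_ci(0.04, 10_000)
print(f"95% CI for proportion = ({lo:.3f}, {hi:.3f})")
```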
• When np(1-p) < 5, an exact interval based on the
binomial distribution should be used.
• Example 6.47
n=20, p=0.1; np(1-p)=1.8<5
Exact 95% CI for p = (0.01, 0.32)
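One common exact method is the Clopper-Pearson interval; the slides do not name the method, so treating it as the one intended here is an assumption. The sketch below finds the limits by bisection on binomial tail probabilities, using only the standard library, with x = 2 events out of n = 20 (p = 0.1). All function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, where f increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, confidence=0.95):
    """Clopper-Pearson interval for x events in n trials."""
    alpha = 1 - confidence
    # Lower limit: P(X >= x | p) = alpha/2; this tail probability increases with p.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

lower, upper = exact_ci(x=2, n=20)
print(f"Exact 95% CI for p = ({lower:.2f}, {upper:.2f})")
```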
Testing
• Research objective
• Research question

• Research hypothesis
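Hypothesis
• Whereas the research objective is stated in abstract terms, a
hypothesis is stated specifically and clearly enough that the study
can actually be carried out (from design through reporting). A
hypothesis should:
- Be consistent with the research objective
- Specify the subjects to be analyzed
- Clearly specify the comparison groups
- Describe the variable being compared in terms of a variable that is actually measured
- Reflect what the investigator expects to find
- Be written so that it can be tested statistically as stated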
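• Research objective: compare the efficacy of a new anticancer drug with that of an existing anticancer drug
• Hypothesis?
• The new anticancer drug and the existing anticancer drug differ in efficacy.
→ Wrong!
• In patients with stage IV breast cancer, the 3-month response rate
differs between the group receiving the new anticancer drug and the
group receiving the existing anticancer drug.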
Null hypothesis vs Alternative hypothesis
• Alternative hypothesis (H1)
- A statement of what the investigator wants to demonstrate
• Null hypothesis (H0)
- The hypothesis opposite to the alternative hypothesis
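• Alternative hypothesis: In patients with stage IV breast cancer, the
3-month response rate differs between the group receiving the new
anticancer drug and the group receiving the existing anticancer drug.
(Non-equality test) → what we want to demonstrate
• Null hypothesis: In patients with stage IV breast cancer, the
3-month response rate does not differ between the group receiving the
new anticancer drug and the group receiving the existing anticancer drug.
• Alternative hypothesis: In patients undergoing prostate surgery, the
proportion with abnormal PSA levels at 3 months does not differ between
open surgery and robot surgery. (Equivalence test) → what we want to demonstrate
• Null hypothesis: In patients undergoing prostate surgery, the
proportion with abnormal PSA levels at 3 months differs between open
surgery and robot surgery.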
Statistical testing
• Using statistical methods and data to decide whether or not to
reject the null hypothesis
• The test is carried out under the assumption that the null
hypothesis is true, and the null hypothesis is rejected only when
there is sufficient evidence against it
• If the null hypothesis is rejected, there is sufficient evidence
against the null hypothesis, and the alternative hypothesis
is supported.
• If the null hypothesis is not rejected, either the null hypothesis
is true, or the alternative hypothesis is true but the evidence is
not sufficient to reject the null hypothesis. Thus, failing to
reject the null hypothesis does not show that the null hypothesis
is true.
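• Alternative hypothesis: In patients with stage IV breast cancer, the
3-month response rate differs between the group receiving the new
anticancer drug and the group receiving the existing anticancer drug.
(Non-equality test) → what we want to demonstrate
• Null hypothesis: In patients with stage IV breast cancer, the
3-month response rate does not differ between the group receiving the
new anticancer drug and the group receiving the existing anticancer drug.

• If the null hypothesis is rejected: the 3-month response rate differs
between the two groups.
• If the null hypothesis is not rejected: we cannot say that the 3-month
response rate differs between the two groups.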
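• Alternative hypothesis: In patients undergoing prostate surgery, the
proportion with abnormal PSA levels at 3 months does not differ between
open surgery and robot surgery. (Equivalence test) → what we want to demonstrate
• Null hypothesis: In patients undergoing prostate surgery, the
proportion with abnormal PSA levels at 3 months differs between open
surgery and robot surgery.

• If the null hypothesis is rejected: the proportion with abnormal PSA
levels at 3 months does not differ between the two groups.
• If the null hypothesis is not rejected: we cannot say that the
proportion with abnormal PSA levels at 3 months is the same in the two groups.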
• Four possible outcomes in hypothesis testing

Decision from testing | Truth: Null hypothesis | Truth: Alternative hypothesis
Do not reject null | Correct | Incorrect (Type II error; False negative error)
Reject null | Incorrect (Type I error; False positive error) | Correct
• The probability of a type I error is usually denoted by α and
is commonly referred to as the significance level of a test
(the maximum tolerable probability of a false positive error).
• The probability of a type II error is usually denoted by β.
• The power of a test is defined as 1-β.
• The general aim in hypothesis testing is to use statistical
tests that make α and β as small as possible. This goal
requires compromise, since making α small involves
rejecting the null hypothesis less often, whereas making β
small involves accepting the null hypothesis less often.
→ These two aims conflict: as α increases, β decreases, and vice
versa. The general strategy is to fix α at some specific level
(e.g., 0.1, 0.05, 0.01) and to use the test that minimizes β (or,
equivalently, maximizes the power).
One-sided test vs Two-sided test
• A one-sided test is a test in which the values of the
parameter being studied (ex. Population mean, μ)
under the alternative hypothesis are allowed to be
either greater than or less than the values of the
parameter under the null hypothesis but not both.
• Example 7.2, 7.10 (SD=25)
- H0: μ=120, H1: μ<120
- Sample mean=115
- How small must the sample mean be in order to reject H0?
→ Need the rejection region (the range of values of
sample mean for which H0 is rejected)
• Use the probability of type I error (α).
• α = P(reject H0 | H0 is true) = P(sample mean < C | μ=120)
(Standardization is needed → Z = (sample mean - μ) / (σ/√n), Z~N(0,1);
that is, Z follows the standard normal distribution with mean 0 and variance 1)
• With SD=25, n=100, α=0.05,
0.05 = P(Z < (C-120)/(25/√100))
→ (C-120)/(25/√100) = Z0.05 = -1.645
→ C=115.89
• Reject H0 if sample mean < 115.89 under α = 0.05.
• From sample data, sample mean=115,
→ reject H0 under α = 0.05
→ This approach depends on the size of type I error (α) to
decide whether the null is rejected.
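The critical value C for the one-sided test (H0: μ=120, H1: μ<120, SD=25, n=100, α=0.05) can be computed with a short sketch. The function name lower_critical_value is hypothetical; the calculation is the standardization step rearranged as C = μ0 + z_α × σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def lower_critical_value(mu0, sigma, n, alpha):
    """Critical value C such that P(sample mean < C | mu = mu0) = alpha."""
    z_alpha = NormalDist().inv_cdf(alpha)   # lower-tail quantile, about -1.645 for 0.05
    return mu0 + z_alpha * sigma / sqrt(n)

c = lower_critical_value(mu0=120, sigma=25, n=100, alpha=0.05)
sample_mean = 115
reject_h0 = sample_mean < c                 # rejection region: sample mean < C

print(f"C = {c:.2f}, reject H0: {reject_h0}")
```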

• Significance tests can be effectively performed at all α
levels by obtaining the p-value for the test.
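• P-value
- A probability, with a value between 0 and 1
- A measure of how far the data support the null hypothesis
- The probability, assuming the null hypothesis is true, of obtaining
a result at least as extreme (that is, at least as favorable to the
alternative hypothesis) as the statistic computed from the data
(e.g., the sample mean)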
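• P-value
= P(sample mean < 115 | μ=120)
= P(standardized sample mean < (115-120)/(25/√100))
= P(Z<-2.0) = 0.02275
• Because the null hypothesis is assumed to be true, the null value
(μ=120) is taken as the reference point, and we look at how far the
observed statistic (e.g., 115) lies from it.
• If it lies far away, the p-value is small and the null hypothesis is
rejected; if it lies close, the p-value is large and the null
hypothesis is not rejected.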
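The one-sided p-value for this example (sample mean 115, μ0=120, SD=25, n=100) can be checked with a minimal sketch using the standard normal CDF. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_lower_p_value(sample_mean, mu0, sigma, n):
    """P(sample mean at or below the observed value | mu = mu0)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # standardized sample mean
    return NormalDist().cdf(z)

p_value = one_sided_lower_p_value(sample_mean=115, mu0=120, sigma=25, n=100)
print(f"p-value = {p_value:.5f}")
```

Since the p-value is below 0.05, the conclusion agrees with the rejection-region approach at α = 0.05, and the same number can be compared against any other α.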
• Significance level (α); a pre-chosen probability
• P-value; a probability calculated after a given study
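• When the sample size is large, the p-value can be small even for a
difference that has no clinical or practical meaning, so that the
difference is declared statistically significant; statistical
significance therefore does not guarantee clinical or practical
significance.
• It is therefore advisable to report the confidence interval
together with the p-value.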
• A two-sided test is a test in which the values of the
parameter being studied under the alternative
hypothesis are allowed to be either greater than or
less than the values of the parameter under the null
hypothesis.
• Example 7.19
- H0: μ=190, H1: μ≠190
• A reasonable decision rule to test for alternative on
either side of the null mean is to reject H0 if sample
mean is either too small or too large.
• α = P(reject H0 | H0 is true)
= P(sample mean < C1 or > C2 | H0 is true)
= P(sample mean < C1 | H0 is true)
+ P(sample mean > C2 | H0 is true)
• For a two-sided test, half of the type I error is, by
convention, assigned to each of the two tail probabilities.

• P(sample mean < C1 | H0 is true)
= P(sample mean > C2 | H0 is true) = α/2
• Sample mean given the data = 181.52
• P-value
= 2*P(sample mean < 181.52 | μ=190)
= 2*P(Z < (181.52-190)/(40/√100))
= 2*P(Z<-2.12) = 2*0.017=0.034
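A minimal sketch of the two-sided p-value calculation for this example (sample mean 181.52, μ0=190, σ=40, n=100). The function name is hypothetical; the p-value is twice the tail probability beyond the observed |Z|.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, mu0, sigma, n):
    """Two-sided z-test p-value: 2 * P(Z < -|z|)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return z, 2 * NormalDist().cdf(-abs(z))

z, p_value = two_sided_p_value(sample_mean=181.52, mu0=190, sigma=40, n=100)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```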
Relationship between hypothesis
testing and confidence interval
• For two-sided cases, H0 (μ=μ0) is rejected with a two-sided
level α test if and only if the two-sided 100%*(1-α)
confidence interval for the parameter does not contain μ0.
• Example 7.40 H0 (μ=190)
- 95% CI for cholesterol mean
= (sample mean-1.96*σ/√n, sample mean+1.96*σ/√n)
= (181.52-1.96*40/ √100, 181.52+1.96*40/√100)
= (173.68, 189.36)
- P-value was 0.034.
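The agreement between the test and the confidence interval in Example 7.40 can be checked with a short sketch. The function name z_test_and_ci is hypothetical, and σ=40 is treated as known, as in the slide's formula.

```python
from math import sqrt
from statistics import NormalDist

def z_test_and_ci(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test p-value and the matching 100*(1-alpha)% CI."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z = (sample_mean - mu0) / se
    p_value = 2 * std_normal.cdf(-abs(z))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return p_value, ci

mu0 = 190
p_value, (ci_low, ci_high) = z_test_and_ci(181.52, mu0, sigma=40, n=100)

rejected = p_value < 0.05                      # decision from the test
mu0_outside_ci = not (ci_low <= mu0 <= ci_high)  # decision from the interval

print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}), p-value = {p_value:.3f}")
print(f"rejected: {rejected}, mu0 outside CI: {mu0_outside_ci}")
```

Both decisions coincide here: 190 lies outside the 95% interval and the two-sided p-value is below 0.05.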