Descriptive and inferential
statistics
Asst. Prof. Georgi Iskrov, PhD
Department of Social Medicine
Before we start
http://www.raredis.work/edu/
Lecture slides to be updated!
Outline
• Statistics
• Sample, population and sampling
• Descriptive and inferential statistics
• Types of variables and level of measurement
• Measures of central tendency and spread
• Normal distribution
• Confidence intervals
• Sample size calculation
• Hypothesis testing
• Significance, power and errors
• Normality tests
Why do we need to use
statistical methods?
• Why do we need to use statistical methods?
– To draw the strongest possible conclusions from limited
amounts of data;
– To generalize from a particular set of data to a more
general conclusion.
• What do we need to pay attention to?
– Bias
– Probability
Population vs Sample
[Figure: Sample → statistics (x̄, s); Population → parameters (μ, σ)]
Population vs Sample
• A population includes all objects of interest, whereas a
sample is only a portion of the population:
– Parameters are associated with populations and
statistics with samples;
– Parameters are usually denoted using Greek letters (μ,
σ) while statistics are usually denoted using Roman
letters (X, s).
• There are several reasons why we do not work with
populations:
– They are usually large and it is often impossible to get
data for every object we are studying;
– Sampling does not usually occur without cost. The
more items surveyed, the larger the cost.
Inferential statistics
[Figure: sampling leads from the population (parameters) to the sample (statistics); inferential statistics leads from the sample back to the population.]
Descriptive vs Inferential statistics
• We compute statistics and use them to estimate
parameters.
• The computation is the first part of the statistical analysis
(Descriptive Statistics) and the estimation is the second
part (Inferential Statistics).
• Descriptive statistics: The procedure used to organize
and summarize masses of data.
• Inferential statistics: The methods used to find out
something about a population, based on a sample.
Sampling
• Individuals in the population vary from one another with
respect to an outcome of interest.
Sampling
• When a sample is drawn, there is no certainty that it will
be representative of the population.
[Figure: two samples, A and B, drawn from the same population can differ both from each other and from the population.]
Sampling
• Random sample: In random sampling, each item or
element of the population has an equal chance of being
chosen at each draw. While this is the preferred way of
sampling, it is often difficult to do. It requires that a
complete list of every element in the population be
obtained. Computer generated lists are often used with
random sampling.
• Properties of a good sample:
– Random selection;
– Representativeness by structure;
– Representativeness by number of cases.
Sampling
• Systematic sampling: The list of elements is “counted
off”. That is, every k-th element is taken. This is similar to
lining everyone up and numbering off “1,2,3,4; 1,2,3,4;
etc”. When done numbering, all people numbered 4
would be used.
• Convenience sampling: In convenience sampling,
readily available data is used. That is, the first people the
surveyor runs into.
Sampling
• Cluster sampling: It is accomplished by dividing the
population into groups (clusters), usually geographically.
The clusters are randomly selected, and every element in
the selected clusters is used.
• Stratified sampling: It divides the population into
groups, called strata. However, this time it is by some
characteristic, not geographically. For instance, the
population might be separated into males and females. A
sample is taken from each of these strata using either
random, systematic, or convenience sampling.
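As a rough illustration (not from the slides), the sketch below draws a simple random sample and a systematic sample from a hypothetical frame of 1,000 subject IDs in Python.

# A minimal sketch of simple random and systematic sampling
# from a hypothetical sampling frame of 1,000 subject IDs.
import random

frame = list(range(1, 1001))          # hypothetical sampling frame

# Simple random sampling: every element has an equal chance at each draw.
random_sample = random.sample(frame, k=50)

# Systematic sampling: take every k-th element after a random start.
k = len(frame) // 50
start = random.randrange(k)
systematic_sample = frame[start::k]

print(len(random_sample), len(systematic_sample))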
Random and systematic
errors
• Random error can be conceptualized as sampling
variability.
• Bias (systematic error) is a difference between an
observed value and the true value due to all causes
other than sampling variability.
• Biased sample: a biased sample is one in which the
method used to create the sample results in samples
that are systematically different from the population.
• Accuracy is a general term denoting the absence of
error of all kinds.
Sample size calculation
• Law of Large Numbers: As the number of trials of a
random process increases, the percentage difference
between the expected and actual values goes to zero.
• Application in biostatistics: Bigger sample size, smaller
margin of error.
• A properly designed study will include a justification for
the number of experimental units (people/animals) being
examined.
• Sample size calculations are necessary to design
experiments that are large enough to produce useful
information and small enough to be practical.
Sample size calculation
• Generally, the sample size for any study depends on:
– Acceptable level of confidence;
– Power of the study;
– Expected effect size and absolute error of precision;
– Underlying scatter in the population.
– You need a large sample size for high power, a small expected effect, and lots of scatter; a small sample size suffices for low power, a large effect, and little scatter.
Sample size calculation
• For quantitative variables:
Z  SD
n
2
d
2
• Z – confidence level;
• SD – standard deviation;
• d – absolute error of precision.
2
Sample size calculation
• For quantitative variables:
Z  SD
n
2
d
2
2
• A researcher is interested in knowing the average
systolic blood pressure in the pediatric age group at a 95%
level of confidence and a precision of 5 mmHg. The standard
deviation, based on previous studies, is 25 mmHg.
1.96  25
n
 96.04
2
5
2
2
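The same calculation can be checked in a couple of lines of Python; the values are taken from the example above.

# Sketch reproducing the slide's calculation: n = (Z^2 * SD^2) / d^2
z, sd, d = 1.96, 25, 5           # 95% confidence, SD = 25 mmHg, precision = 5 mmHg
n = (z**2 * sd**2) / d**2
print(round(n, 2))               # 96.04 -> round up to 97 subjects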
Sample size calculation
• For qualitative variables:
Z  p  (100  p)
n
2
d
2
• Z – confidence level
• p – expected proportion in population
• d – absolute error of precision
Sample size calculation
• For qualitative variables:
Z  p  (100  p)
n
2
d
2
• A researcher is interested in knowing the proportion of
diabetes patients having hypertension. According to a
previous study, the actual proportion is no more than 15%.
The researcher wants to calculate the required sample size
for a 5% absolute precision error and a 95% confidence level.
1.96 15  (100  15)
n
 195.92
2
5
2
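Again, the arithmetic can be checked in Python with the values from the example above.

# Sketch reproducing the slide's calculation: n = (Z^2 * p * (100 - p)) / d^2
z, p, d = 1.96, 15, 5            # 95% confidence, expected proportion 15%, precision 5%
n = (z**2 * p * (100 - p)) / d**2
print(round(n, 2))               # 195.92 -> round up to 196 patients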
Variables
• Different types of data require different kinds of analyses.
                           Nominal   Ordinal   Interval   Ratio
Frequency distribution     Yes       Yes       Yes        Yes
Median, percentiles        No        Yes       Yes        Yes
Mean, standard deviation   No        No        Yes        Yes
Ratio                      No        No        No         Yes
Levels of measurement
• There are four levels of measurement: Nominal, Ordinal,
Interval and Ratio. These go from lowest level to highest
level.
• Data are classified according to the highest level into which
they fit. Each additional level adds something the previous
level did not have.
– Nominal is the lowest level. Only names are
meaningful here;
– Ordinal adds an order to the names;
– Interval adds meaningful differences;
– Ratio adds a zero so that ratios are meaningful.
Levels of measurement
• Nominal scale – e.g., genotype
You can code it with numbers, but the order is arbitrary
and any calculations would be meaningless.
• Ordinal scale – e.g., pain score from 1 to 10
The order matters but not the difference between values.
• Interval scale – e.g., temperature in °C
The difference between two values is meaningful.
• Ratio scale – e.g., height
It has a clear definition of 0. When the variable equals 0,
there is none of that variable. When working with ratio
variables, but not interval variables, you can look at the
ratio of two measurements.
Central tendency and spread
• Central tendency: Mean, mode and median
• Spread: Range, interquartile range, standard deviation
• Mistakes:
– Focusing on only the mean and ignoring the variability
– Standard deviation and standard error of the mean
– Variation and variance
• What is best to use in different scenarios?
– Symmetrical data: mean and standard deviation
– Skewed data: median and interquartile range
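As an illustrative sketch with made-up numbers, these summaries can be computed with NumPy; note how the mean and SD react to a single outlier while the median and IQR do not.

# Minimal sketch with hypothetical data: which summary to report depends on
# whether the distribution is roughly symmetrical or skewed.
import numpy as np

x = np.array([4.1, 4.5, 4.8, 5.0, 5.2, 5.4, 5.9, 12.0])    # hypothetical, right-skewed

print("mean:", x.mean(), "SD:", x.std(ddof=1))              # pulled up by the outlier
print("median:", np.median(x),
      "IQR:", np.percentile(x, 75) - np.percentile(x, 25))  # robust to the outlier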
Normal (Gaussian) distribution
• When data are approximately normally distributed:
– approximately 68% of the data lie within one SD of the mean;
– approximately 95% of the data lie within two SDs of the mean;
– approximately 99.7% of the data lie within three SDs of the mean.
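A quick sketch using SciPy's standard normal CDF reproduces these coverage figures.

# Sketch verifying the 68-95-99.7 rule from the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {coverage:.3%}")    # ~68.3%, ~95.4%, ~99.7%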
Normal (Gaussian) distribution
• Central limit theorem:
– Create a population with a known distribution that is not normal;
– Randomly select many samples of equal size from that
population;
– Tabulate the means of these samples and graph the frequency
distribution.
• Central limit theorem states that if your samples are
large enough, the distribution of the means will
approximate a normal distribution even if the population
is not Gaussian.
• Mistakes:
– Normal vs common (or disease free);
– Few biological distributions are exactly normal.
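The simulation steps listed above can be sketched in a few lines of Python; the exponential population and the sample size of 50 are arbitrary choices for illustration.

# Sketch of the simulation described above: sample means from a clearly
# non-normal (exponential) population become approximately normal.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)   # known, non-normal population

sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

print(np.mean(sample_means), np.std(sample_means))
# A histogram of sample_means would show an approximately normal shape.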
Confidence interval for the
population mean
• Population mean: point estimate vs interval estimate
• Standard error of the mean – how close the sample
mean is likely to be to the population mean.
• Assumptions: a random representative sample,
independent observations, the population is normally
distributed (at least approximately).
• Confidence interval depends on: sample mean, standard
deviation, sample size, degree of confidence.
• Mistakes:
– 95% of the values lie within the 95% CI;
– A 95% CI covers the mean ± 2 SD.
Confidence interval for the
population mean
• The duration of time from first exposure to HIV infection to AIDS
diagnosis is called the incubation period. The incubation periods (in
years) of a random sample of 30 HIV infected individuals are: 12.0,
10.5, 9.5, 6.3, 13.5, 12.5, 7.2, 12.0, 10.5, 5.2, 9.5, 6.3, 13.1, 13.5,
12.5, 10.7, 7.2, 14.9, 6.5, 8.1, 7.9, 12.0, 6.3, 7.8, 6.3, 12.5, 5.2, 13.1,
10.7, 7.2. Calculate the 95% CI for the population mean incubation
period in HIV.
• X = 9.5 years; SD = 2.8 years
• SEM = 0.5 years
• 95% level of confidence => Z = 1.96
• µ = 9.5 ± (1.96 x 0.5) = 9.5 ± 1 years
• 95% CI for µ is (8.5; 10.5 years)
Confidence interval for the
population mean
• X = 9.5 years; SD = 2.8 years
• SEM = 0.5 years
• 95% level of confidence => Z = 1.96
– µ = 9.5 ± (1.96 x 0.5) = 9.5 ± 1 years
– 95% CI for µ is (8.5; 10.5 years)
• 99% level of confidence => Z = 2.58
– µ = 9.5 ± (2.58 x 0.5) = 9.5 ± 1.3 years
– 99% CI for µ is (8.2; 10.8 years)
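The arithmetic on this and the previous slide can be reproduced in a short Python sketch from the summary statistics (mean 9.5 years, SD 2.8 years, n = 30).

# Sketch reproducing the slide's arithmetic from the summary statistics.
import math

mean, sd, n = 9.5, 2.8, 30
sem = sd / math.sqrt(n)                          # ~0.5 years

for z, label in ((1.96, "95%"), (2.58, "99%")):
    half_width = z * sem
    print(f"{label} CI: ({mean - half_width:.1f}; {mean + half_width:.1f}) years")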
Hypothesis testing
Diabetes type 2 study
• Experimental group: mean blood sugar level 103 mg/dl
• Control group: mean blood sugar level 107 mg/dl
Pancreatic cancer study
• Experimental group: 1-year survival rate 23%
• Control group: 1-year survival rate 20%
Is there a difference?
Hypothesis testing
• The general idea of hypothesis testing involves:
– Making an initial assumption;
– Collecting evidence (data);
– Based on the available evidence (data), deciding
whether to reject or not reject the initial assumption.
• Every hypothesis test – regardless of the population
parameter involved – requires the above three steps.
Null hypothesis – H0
• This is the hypothesis under test, denoted as H0.
– The null hypothesis is usually stated as the
absence of a difference or an effect;
– The null hypothesis says there is no effect;
– The null hypothesis is rejected if the significance test
shows the data are inconsistent with the null
hypothesis.
Alternative hypothesis – H1
• This is the alternative to the null hypothesis. It is denoted
as H', H1, or HA.
– It is usually the complement of the null
hypothesis;
– If, for example, the null hypothesis says two
population means are equal, the alternative says the
means are unequal.
Criminal trial
• The criminal justice system assumes “the defendant is
innocent until proven guilty”. That is, our initial
assumption is that the defendant is innocent.
• In the practice of statistics, we make our initial
assumption when we state our two competing
hypotheses – the null hypothesis (H0) and the alternative
hypothesis (HA). Here, our hypotheses are:
• H0: Defendant is not guilty (innocent);
• HA: Defendant is guilty;
• In statistics, we always assume the null hypothesis
is true. That is, the null hypothesis is always our
initial assumption.
Criminal trial
• The prosecution team then collects evidence with the
hopes of finding “sufficient evidence” to make the
assumption of innocence refutable.
• In statistics, the data are the evidence.
• The jury then makes a decision based on the available
evidence:
• If the jury finds sufficient evidence – beyond a
reasonable doubt – to make the assumption of
innocence refutable, the jury rejects H0 and deems the
defendant guilty. We behave as if the defendant is
guilty.
• If there is insufficient evidence, then the jury does not
reject H0. We behave as if the defendant is innocent.
Making the decision
• Recall that it is either likely or unlikely that we would
observe the evidence we did given our initial
assumption.
• If it is likely, we do not reject the null hypothesis;
• If it is unlikely, then we reject the null hypothesis in favor
of the alternative hypothesis;
• Effectively, then, making the decision reduces to
determining “likely” or “unlikely”.
Making the decision
• In statistics, there are two ways to determine whether the
evidence is likely or unlikely given the initial assumption:
– We could take the “critical value approach” (favored
in many of the older textbooks).
– Or, we could take the “p-value approach” (what is
used most often in research, journal articles, and
statistical software).
Making the decision
• Suppose we find a difference between two groups in
survival:
– patients on a new drug have a survival of 15 months;
– patients on the old drug have a survival of 18 months.
• So, the difference is 3 months.
Making the decision
• Suppose we find a difference between two groups in
survival:
– patients on a new drug have a survival of 15 months;
– patients on the old drug have a survival of 18 months.
• So, the difference is 3 months.
• Do we accept or reject the hypothesis of no true
difference between the groups (the two drugs)?
• Is a difference of 3 a lot, statistically speaking – a
huge difference that is rarely seen?
• Or is it not much – the sort of thing that happens all
the time?
Making the decision
• A statistical test tells you how often you would get a
difference of 3, simply by chance, if the null
hypothesis is correct – no real difference between the
two groups.
• Suppose the test is done and its result is that p = 0.32.
This means that you’d get a difference of 3 quite often
just by the play of chance – 32 times in 100 – even when
there is in reality no true difference between the groups.
Making the decision
• A statistical test tells you how often you’d get a
difference of 3, simply by chance, if the null
hypothesis is correct – no real difference between the
two groups.
• On the other hand, if we did the statistical analysis and
p = 0.0001, then you would get a difference as big as 3 by
the play of chance only 1 time in 10,000. That is so rare
that we want to reject our hypothesis of no difference:
there is something different about the new therapy.
Hypothesis testing
• Somewhere between 0.32 and 0.0001 we may not be
sure whether to reject the null hypothesis or not.
• Mostly we reject the null hypothesis when, if the null
hypothesis were true, the result we got would have
happened less than 5 times in 100 by chance. This is
the ‘conventional’ cutoff of 5% or p < 0.05.
• This cutoff is commonly used but arbitrary, i.e. there is no
particular reason why we use 0.05 rather than 0.06 or
0.048.
Hypothesis testing
                            Decision:                Decision:
                            Reject null hypothesis   Do not reject null hypothesis
Null hypothesis is true     Type I error             No error
Null hypothesis is false    No error                 Type II error
Type I and II errors
• A type I error is the incorrect rejection of a true null
hypothesis (also known as a “false positive”
finding).
• The probability of a type I error is denoted by the Greek
letter α (alpha).
• A type II error is incorrectly retaining a false null
hypothesis (also known as a "false negative"
finding).
• The probability of a type II error is denoted by the Greek
letter β (beta).
Level of significance
• Level of significance (α) – the threshold for declaring if
a result is significant. If the null hypothesis is true, α is
the probability of rejecting the null hypothesis.
• α is decided as part of the research design, while the p-value is computed from the data.
• α = 0.05 is most commonly used.
• Small α value reduces the chance of Type I error, but
increases the chance of Type II error.
• Trade-off based on the consequences of Type I (false-positive) and Type II (false-negative) errors.
Power
• Power – the probability of rejecting a false null
hypothesis. Statistical power is inversely related to β or
the probability of making a Type II error (power is equal
to 1 – β).
• Power depends on the sample size, variability,
significance level and hypothetical effect size.
• You need a larger sample when you are looking for a
small effect and when the standard deviation is large.
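As a hedged illustration (not part of the slides), the usual normal-approximation formula for comparing two means shows how the required sample size per group reacts to the effect size and the scatter: n ≈ 2 × (z₁₋α/₂ + z₁₋β)² × SD² / Δ².

# Sketch of the normal-approximation sample-size formula for a
# two-sample comparison of means (alpha and power are illustrative defaults).
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2

print(round(n_per_group(delta=5, sd=10)))    # small effect relative to SD -> larger n
print(round(n_per_group(delta=10, sd=10)))   # larger effect -> smaller n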
Choosing a statistical test
• Choice of a statistical test depends on:
– Level of measurement for the dependent and
independent variables
– Number of groups or dependent measures
– Number of units of observation
– Type of distribution
– The population parameter of interest (mean, variance,
differences between means and/or variances)
Choosing a statistical test
• Multiple comparisons – two or more data sets to be
analyzed, which may be:
– repeated measurements made on the same individuals;
– entirely independent samples.
• Degrees of freedom – the number of scores, items, or
other units in the data set, which are free to vary
• One- and two-tailed tests
– one-tailed test of significance used for directional hypothesis;
– two-tailed tests in all other situations.
• Sample size – number of cases, on which data have
been obtained
– Which of the basic characteristics of a distribution are more
sensitive to the sample size?
Student t-test
t = (X̄₁ − X̄₂) / √(S²ₓ₁ + S²ₓ₂)
2-sample t-test
• Aim: Compare two means
• Example: Comparing pulse rate in people taking two
different drugs
• Assumption: Both data sets are sampled from Gaussian
distributions with the same population standard deviation
• Effect size: Difference between two means
• Null hypothesis: The two population means are identical
• Meaning of P value: If the two population means are
identical, what is the chance of observing such a
difference (or a bigger one) between means by chance
alone?
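A minimal sketch with hypothetical pulse-rate data, using SciPy's independent-samples t-test:

# Sketch with hypothetical pulse-rate data for two independent drug groups.
from scipy import stats

drug_a = [72, 75, 70, 78, 74, 69, 73, 76]
drug_b = [80, 79, 77, 83, 81, 78, 82, 76]

t, p = stats.ttest_ind(drug_a, drug_b)        # assumes equal population SDs by default
print(f"t = {t:.2f}, p = {p:.4f}")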
Paired t-test
• Aim: Compare a continuous variable before and after an
intervention
• Example: Comparing pulse rate before and after taking a
drug
• Assumption: The population of paired differences is
Gaussian
• Effect size: Mean of the paired differences
• Null hypothesis: The population mean of paired
differences is zero
• Meaning of P value: If there is no difference in the
population, what is the chance of observing such a
difference (or a bigger one) between means by chance
alone?
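A minimal sketch with hypothetical before/after pulse rates, using SciPy's paired t-test:

# Sketch with hypothetical before/after pulse rates for the same subjects.
from scipy import stats

before = [78, 82, 75, 80, 77, 84, 79, 81]
after  = [74, 79, 73, 76, 75, 80, 77, 78]

t, p = stats.ttest_rel(before, after)         # tests whether the mean paired difference is zero
print(f"t = {t:.2f}, p = {p:.4f}")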
One-way ANOVA
• Aim: Compare three or more means
• Example: Comparing pulse rate in 3 groups of people,
each group taking a different drug
• Assumption: All data sets are sampled from Gaussian
distributions with the same population standard deviation
• Effect size: Fraction of the total variation explained by
variation among group means
• Null hypothesis: All population means are identical
• Meaning of P value: If the population means are
identical, what is the chance of observing such a
difference (or a bigger one) between means by chance
alone?
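A minimal sketch with hypothetical data for three drug groups, using SciPy's one-way ANOVA:

# Sketch with hypothetical pulse rates for three independent drug groups.
from scipy import stats

drug_a = [72, 75, 70, 78, 74, 69]
drug_b = [80, 79, 77, 83, 81, 78]
drug_c = [74, 76, 73, 77, 75, 72]

f, p = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F = {f:.2f}, p = {p:.4f}")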
Parametric and
non-parametric tests
• Parametric test – the variable we have measured in
the sample is normally distributed in the population
to which we plan to generalize our findings
• Non-parametric test – distribution free, no assumption
about the distribution of the variable in the population
Normality test
• Normality tests are used to determine if a data set is
modeled by a normal distribution and to compute how
likely it is for a random variable underlying the data set to
be normally distributed.
• In descriptive statistics terms, a normality test measures
a goodness of fit of a normal model to the data – if the fit
is poor then the data are not well modeled in that respect
by a normal distribution, without making a judgment on
any underlying variable.
• In frequentist hypothesis testing, data are tested against
the null hypothesis that they are normally distributed.
Normality test
• Graphical methods
• An informal approach to testing normality is to compare
a histogram of the sample data to a normal probability
curve. The empirical distribution of the data (the
histogram) should be bell-shaped and resemble the
normal distribution. This might be difficult to see if the
sample is small.
Normality test
• Frequentist tests
• Tests of univariate normality include the following:
– D'Agostino's K-squared test
– Jarque–Bera test
– Anderson–Darling test
– Cramér–von Mises criterion
– Lilliefors test
– Kolmogorov–Smirnov test
– Shapiro–Wilk test
– Etc.
Normality test
• Kolmogorov–Smirnov test
• K–S test is a nonparametric test of the equality of
distributions that can be used to compare a sample with
a reference distribution (1-sample K–S test), or to
compare two samples (2-sample K–S test).
• K–S statistic quantifies a distance between the empirical
distribution of the sample and the cumulative distribution
of the reference distribution, or between the empirical
distributions of two samples.
• The null hypothesis is that the sample is drawn from the
reference distribution (in the 1-sample case) or that the
samples are drawn from the same distribution (in the 2-sample case).
Normality test
• Kolmogorov–Smirnov test
• In the special case of testing for normality of the
distribution, samples are standardized and compared
with a standard normal distribution. This is equivalent to
setting the mean and variance of the reference
distribution equal to the sample estimates, and it is
known that using these to define the specific reference
distribution changes the null distribution of the test
statistic.
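As an illustrative sketch with simulated data, the 1-sample K–S test (against a standard normal after standardizing) and the Shapiro–Wilk test can both be run with SciPy; note the caveat above about estimating the reference distribution from the sample.

# Sketch: standardize a hypothetical sample and compare it with the standard
# normal distribution (1-sample K–S test); Shapiro–Wilk is shown for comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=120, scale=15, size=40)      # hypothetical data

z = (x - x.mean()) / x.std(ddof=1)              # standardized sample
print(stats.kstest(z, "norm"))                  # 1-sample K–S against N(0, 1)
print(stats.shapiro(x))                         # Shapiro–Wilk normality test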