T. Baska, M.D., Ph.D.
Statistics (from Latin status, meaning "manner of standing") refers to the presentation of obtained information and the process of analyzing and summarizing data (means, standard deviations, rates, ratios, proportions etc.). This is the scope of so-called descriptive statistics. Statistical procedures can also be used to reach conclusions and prognoses applicable to patient care, evaluation of treatment efficiency or public health planning. In other words, statistical methods allow estimating or inferring the characteristics of a larger population from a sample (the process of generalization). This is the scope of so-called inferential (analytic) statistics.
The applications of statistics in medicine and biology are called biostatistics and biometrics.
Presenting data
During observations or measurements, the first objective is to express results in some simple form that permits further calculations and allows the obtained data to be organized and displayed.
According to the type of data, there are three basic categories of measurement scales (the scale in which a characteristic is measured): nominal, ordinal and numerical.
Nominal scales are used for the simplest level of measurement, when data fit into categories. These data are also called qualitative data, because they describe a quality of the studied item (e.g. HIV status - positive or negative, gender - male or female, eye color, etc.), and the data fit into several categories (two or more). Nominal data are generally described in terms of percentages or proportions. Qualitative data are counted.
Ordinal scales are similar to nominal scales, but some observations have "more" or are "greater than" other observations, so categories are assigned a rank order (e.g. staging of tumors according to their degree of development, the Apgar score describing the condition of a newborn, etc.). Although order exists among categories, the difference between two adjacent categories is not the same throughout the scale.
Numerical scales are used when data have numerical properties. They are sometimes called quantitative data because they measure the quantity of something. So, quantitative data, in contrast to qualitative ones, are measured. There are two types of numerical scales: continuous and discrete. A continuous scale has values on a continuum and can be continuously refined to measure more accurately. The measured quantity can take any value within a certain range (e.g. blood cholesterol level, body weight, blood pressure, etc.).
A continuous scale is commonly divided into several intervals (classes) and the data are grouped into the corresponding intervals. The number of cases in each class indicates its frequency. Discrete (discontinuous) scales have values equal to integers and cannot be refined (e.g. number of fractures, number of teeth, etc.).
Distribution of quantitative data
Frequencies of classes displayed in a histogram (Fig. 1) show the distribution of a certain quantity (in our example, the distribution of blood cholesterol levels in a sample of 15 patients). In a histogram, the size of the chosen intervals can vary. The smaller the intervals, the more detailed the histogram. As the intervals become smaller and more numerous, the shape of the histogram becomes increasingly like a smooth curve. If the shape of the distribution curve (approximately) follows the Gaussian curve, i.e. is symmetrical and bell-shaped, such a distribution is called a normal distribution. It has extremely useful characteristics, because a large number of statistical tests and calculations can be used if the observations follow a normal distribution.
Summarizing data
While presenting results we cannot display all the data; instead, we should summarize them in well-arranged indices. Different indices are used for nominal, ordinal and numerical data.
Measures of nominal and ordinal data. Characteristics measured on a nominal scale are counts or frequencies of occurrence and can be expressed by: I) proportions (percentages) - a proportion is the number of observations with a given characteristic divided by the total number of observations; a percentage is the proportion multiplied by 100. II) ratios - a ratio is the number of observations with a given characteristic divided by the number of observations without it. III) rates - a rate is a proportion multiplied by 1,000, 10,000 or 100,000. Vital statistics use rates to describe the health status of a population (e.g. mortality rates, morbidity rates, etc.).
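As a minimal sketch (the counts below are hypothetical), these three indices could be computed in Python as follows:

```python
# Hypothetical counts: 25 diseased persons in a sample of 1,000.
cases = 25        # observations with the given characteristic
non_cases = 975   # observations without it
total = cases + non_cases

proportion = cases / total      # 0.025
percentage = proportion * 100   # 2.5 per cent
ratio = cases / non_cases       # ~0.0256, cases per non-case
rate = proportion * 100_000     # 2,500 per 100,000 population

print(proportion, percentage, ratio, rate)
```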
Measures of numerical data.
A distribution is characterized by measures of central tendency, describing the location of its center, and measures of dispersion, describing the spread of the data. The measures of central tendency frequently used in epidemiology are the mean, median, mode and geometric mean. The most frequently used measures of dispersion are the range, standard deviation and percentiles.
The mean is the arithmetic average of the observations and is calculated by the formula: $\bar{X} = \frac{\sum X}{n}$ ($\bar{X}$ - mean, $\sum X$ - sum of individual observations, $n$ - total number of observations).
The mean can be used only for numerical data, not for ordinal ones.
The median is the middle observation when all observations are ordered from smallest to largest. For an odd number of observations it is the middle value, and for an even number of observations it is defined as the mean of the two middle values (Fig. 1). The median can be used for ordinal data as well.
The mode is the value that occurs most frequently (the "most popular" value). The mode can also be used for ordinal data (Fig. 2).
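A minimal sketch of all three measures, using Python's standard statistics module on hypothetical data:

```python
import statistics

# Hypothetical blood cholesterol values (mmol/l), already ordered.
values = [4.2, 4.8, 5.1, 5.1, 5.6, 6.0, 6.3]

print(statistics.mean(values))    # arithmetic average: 5.3
print(statistics.median(values))  # middle of 7 ordered values: 5.1
print(statistics.mode(values))    # most frequent value: 5.1
```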
The geometric mean is defined by the formula: $GM = \sqrt[n]{X_1 X_2 \cdots X_n}$ ($GM$ - geometric mean, $n$ - total number of observations, $X_1 \ldots X_n$ - individual observations). The geometric mean is not as sensitive to extreme values in a set of observations as the mean and is generally used with data measured on a logarithmic scale (e.g. series of titres in serologic studies).
In a normal distribution the mean, median and mode coincide at the same point. If the distribution is skewed, the median is preferable; the mode is primarily used for bimodal distributions (Fig. 2).
The range is the difference between the largest observation and the smallest observation.
The standard deviation is the most commonly used measure of dispersion of measured values from the mean and is defined by the formula: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ ($s$ - standard deviation, $\sum (X - \bar{X})^2$ - sum of squared deviations of individual observations from the mean, $n$ - total number of observations). In a normal distribution the standard deviation represents the horizontal distance between the mean and the point of inflection of the distribution curve, and approximately 68 per cent of observations lie between $\bar{X} - 1s$ and $\bar{X} + 1s$.
Similarly, about 95 per cent of observations lie between $\bar{X} - 2s$ and $\bar{X} + 2s$ (Fig. 3). The standard deviation also serves for defining normal and deviated measured values (values between $-1s$ and $+1s$ are normal, between $-2s$ and $-1s$ moderately decreased, between $+1s$ and $+2s$ moderately increased, between $-3s$ and $-2s$ subnormal, between $+2s$ and $+3s$ abnormal, less than $-3s$ extremely low and more than $+3s$ extremely high).
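A minimal sketch of the standard deviation computed directly from the formula above (hypothetical data), cross-checked against the standard library:

```python
import math
import statistics

values = [4.2, 4.8, 5.1, 5.1, 5.6, 6.0, 6.3]  # hypothetical observations
x_bar = statistics.mean(values)

# Square root of the sum of squared deviations divided by (n - 1).
s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))
assert abs(s - statistics.stdev(values)) < 1e-12

# Roughly 68 per cent of normally distributed observations fall in this range.
print(x_bar - s, x_bar + s)
```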
A percentile is a number that indicates the percentage of a distribution that is equal to or below that number. Percentiles are used to compare an individual value with a set of norms. In many laboratories, values considered normal are set by the 2.5th and 97.5th percentiles, which delimit the central 95 per cent of the distribution. Sometimes the so-called interquartile range is also used, which is the difference between the 25th and 75th percentiles and contains the central 50 per cent of observations.
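A minimal sketch of these percentile-based limits on hypothetical data; statistics.quantiles with n=40 yields cut points at 2.5, 5, ..., 97.5 per cent:

```python
import statistics

values = list(range(1, 101))  # hypothetical data: 1 .. 100

cuts = statistics.quantiles(values, n=40)   # 39 cut points: 2.5 %, 5 %, ..., 97.5 %
p2_5, p97_5 = cuts[0], cuts[-1]             # limits of the central 95 per cent

q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles: 25th, 50th, 75th
iqr = q3 - q1                                   # central 50 per cent of observations

print(p2_5, p97_5, iqr)
```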
Statistical estimations
A population is a large group of items that have something in common (e.g. the population of mankind includes all human beings in the world, the population of red blood cells includes all red blood cells in the bloodstream, etc.). Only rarely is it possible to study the entire population. This is the reason for sampling.
A sample is a subset of the population, selected in such a way that it is representative of the larger population. Samples can be studied more quickly and easily than the whole population. Statistical methods allow estimating characteristics of the entire population by studying only a small part of it - a sample. If repeated samples are taken from the same population, the statistical measures of central tendency and variability vary among the particular samples, and these measures are themselves approximately normally distributed if the sample sizes are sufficiently large. The standard deviation of the sample means is called the standard error of the mean. It can be calculated by the formula: $SE = \frac{s}{\sqrt{n}}$ ($SE$ - standard error, $s$ - standard deviation of a sample, $n$ - total number of observations). The standard error of the mean indicates the precision of the measured results, i.e. how close the estimate derived from any sample is likely to be to the underlying population value. It is important to keep in mind the difference between the standard deviation and the standard error:
The standard deviation tells us how much variability can be expected among individuals. The standard error of the mean, however, is the standard deviation of the means in the sampling distribution and tells us how much variability can be expected among means in future samples. Because estimates vary from sample to sample, it is important to know how close the estimate derived from one sample is likely to be to the underlying population value. If the means of the samples are normally distributed, we can define a range of values surrounding the estimate which has a specified probability of including the true population value. This range is called the confidence interval of the mean, and the endpoints of this range are called confidence limits. The most frequently used probability is 95 per cent. So, we have 95 per cent confidence that the true mean (the mean of the population) lies somewhere within the limits of the confidence interval (Fig. 4).
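A minimal sketch (hypothetical data) of the standard error and a 95 per cent confidence interval; the multiplier 1.96 assumes a normal sampling distribution, so for small samples a t-based multiplier would be used instead:

```python
import math
import statistics

values = [4.2, 4.8, 5.1, 5.1, 5.6, 6.0, 6.3]  # hypothetical sample
n = len(values)

mean = statistics.mean(values)
se = statistics.stdev(values) / math.sqrt(n)   # standard error of the mean

# 95 per cent confidence limits under the normal approximation.
print(mean - 1.96 * se, mean + 1.96 * se)
```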
Statistical inference
A researcher is frequently faced with the problem of how to assess measured differences among samples (e.g. the difference between a studied group and a control group when testing the efficiency of a new drug against placebo). Are these differences entirely due to chance or not? Was the difference between the control and studied groups caused only by sampling variability, or was it the effect of the tested drug? If so, to what extent? These questions can be answered by so-called significance tests.
How do they work? It has already been discussed that sample means are normally distributed (their average value equals the mean of the population). Similarly, differences between two sample means also have their own distribution (with an average value of zero when the samples come from the same population). Significance tests treat the difference between the studied and control samples as a difference between two samples drawn from one population. At the beginning of testing, two hypotheses should be stated:
1. The null hypothesis claims that there is no real difference between the samples. Observed differences are entirely due to chance; the samples were drawn from one population.
2. The alternative hypothesis claims that the difference is too large to be entirely caused by chance. Another influencing factor must be present (e.g. the effect of the tested drug).
First it is assumed that the differences represent only sampling variability (are due to chance), and then the probability that differences at least as large as those found in the observed data would have occurred by chance is calculated. If this probability (the P value) is large, we accept the null hypothesis and reject the alternative hypothesis. If the P value is small, we assume that another factor besides chance was present and accept the alternative hypothesis. In other words, the difference is statistically significant and there really is a difference.
The level of significance (also called the alpha value, α) serves as the borderline between significant and non-significant results. Traditional values used for α are 0.05, 0.01 and 0.001. If the P value calculated by a significance test is smaller than α, the null hypothesis is rejected and the alternative hypothesis is accepted (the result is statistically significant). In statistical analysis a hypothesis is never proven to be true or false but is only accepted or rejected on the basis of a statistical test. Two types of error are associated with this decision:
1. rejecting the null hypothesis when it is true - type I error
2. accepting the null hypothesis when it is false - type II error
So, the probability of making a type I error is the level of significance.
Practical application of statistical tests
There are many different types of statistical tests, and the researcher must decide upon the appropriate one. Several important facts should be taken into consideration: the type of data (qualitative or quantitative), the size of the sample, the distribution of the measured variable (does it approximately follow a normal distribution or another type of distribution?) and the number of compared samples (two or more).
Comparing two means. If the sample size is more than 30 and the measured characteristic approximately follows a normal distribution, the simple u-test can be used. The test calculates the u value, which represents the distance from the mean in units of standard deviation (e.g. 2 = 2s above the mean, -1.5 = 1.5s below the mean, etc.). The P value can easily be derived from u (P × 100 represents the appropriate percentile). If the sample size is 30 or less, Student's t test is appropriate. Besides these, non-parametric tests can also be applied. Non-parametric tests are known as distribution-free methods because they do not assume that the values are normally distributed. The most frequently used is the Wilcoxon test.
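As an illustrative sketch (hypothetical measurements, SciPy assumed available), both a t test and the Wilcoxon rank-sum test could be run like this:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical cholesterol values in a treated and a control group.
treated = [5.1, 4.8, 5.6, 4.9, 5.3, 5.0, 4.7, 5.2]
control = [5.9, 5.4, 6.1, 5.7, 5.5, 6.0, 5.8, 5.6]

t_stat, p_t = stats.ttest_ind(treated, control)   # Student's t test
w_stat, p_w = stats.ranksums(treated, control)    # Wilcoxon rank-sum test

alpha = 0.05
print(p_t < alpha, p_w < alpha)   # True, True -> reject the null hypothesis
```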
Comparing three or more means. Many research projects employ more than two groups. In such a situation the above-mentioned tests cannot be used, since multiple tests between different pairs of means alter the α level, not for each comparison but for the experiment as a whole. For example, if each comparison is made using α = 0.05, there is a 5 per cent chance that each comparison will falsely be called significant. So in 6 different comparisons there is roughly a 30 per cent chance of declaring one of the comparisons incorrectly significant (approximately 6 × 5 per cent).
The recommended approach for analyzing data in this situation is analysis of variance (ANOVA) or one of the multiple-comparison procedures. Analysis of variance splits the variability observed in a collection of observations into components, to show how much of it appears to be due to each of a number of causes. Multiple-comparison procedures adjust the significance levels of the particular comparisons so that the overall α level is correct.
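A minimal sketch of a one-way ANOVA on three hypothetical groups (SciPy assumed available); a single F test keeps the overall α at the chosen level, unlike repeated pairwise t tests:

```python
from scipy import stats  # assumes SciPy is installed

# Three hypothetical groups of measurements.
g1 = [5.1, 4.8, 5.6, 4.9, 5.3]
g2 = [5.9, 5.4, 6.1, 5.7, 5.5]
g3 = [5.0, 5.2, 4.9, 5.4, 5.1]

f_stat, p = stats.f_oneway(g1, g2, g3)
print(f_stat, p)   # small p -> at least one group mean differs
```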
Comparing nominal data. To compare nominal (qualitative) data, two different approaches can be employed: comparing proportions or comparing absolute numbers. In comparing proportions we use not absolute numbers (numbers of cases) but percentages. There are tests for comparing two proportions as well as tests for comparing more than two proportions. Chi-square contingency tables are used for assessing numbers of cases (neither indices nor continuous numerical data can be used). Usually two tables are compared: the first containing the expected counts assuming independence, and the second containing the observed counts. The chi-square test assesses whether the expected data are significantly different from the observed data - in other words, whether the occurrence of the observed data follows the laws of chance or whether some other factor influences it. The most frequently used tables are 2x2 chi-square tables comparing two samples (e.g. comparing the occurrence of a disease in vaccinated and non-vaccinated children). It should be kept in mind that in 2x2 tables the expected value in each cell must be greater than or equal to five. In larger tables (three or more rows and columns) no more than 20 per cent of cells may have expected values less than five (and the minimum is 1).
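A minimal sketch of a 2x2 chi-square test on a hypothetical vaccination table (SciPy assumed available); chi2_contingency also returns the expected counts, so the rule of thumb about cells below five can be checked directly:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical 2x2 table:  diseased  healthy
observed = [[10, 190],   # vaccinated
            [40, 160]]   # non-vaccinated

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p)
print(expected)   # verify every expected cell is >= 5
```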
Measuring associations between two or more variables
A problem with which the scientist is frequently faced is the relationship between two or more characteristics of a population. The strength of association between two or more variables is measured by a correlation coefficient. This coefficient is usually calculated in the case of two numerical characteristics, two ordinal characteristics or observations of more than two variables. Regression analysis is used for predicting one variable from another.
Two numerical characteristics.
When the variables are normally distributed, Pearson's correlation coefficient is calculated. Its value ranges from -1 to +1: +1 describes a perfect positive linear relationship, zero means there is no relationship at all, and -1 means a perfect negative linear relationship. When the correlation is near zero, the "shape" of the pattern of observations in a scatter diagram is more or less circular. As the value of the correlation gets closer to +1 or -1, the shape becomes more elliptical, until, at +1 and -1, the observations fall directly on a straight line (Fig. 5).
Sometimes the association is not linear: the variables are associated in a nonlinear way and therefore have a low correlation coefficient. Using logarithms of the variables can help to convert the association to a linear one and thus make it more obvious.
Two ordinal characteristics.
In this case Spearman's correlation coefficient is calculated. Similarly to Pearson's coefficient, its values lie between -1 and +1.
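A minimal sketch of both coefficients on hypothetical paired observations (SciPy assumed available):

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical paired observations, e.g. body weight (kg) vs blood pressure (mm Hg).
x = [60, 65, 70, 75, 80, 85, 90]
y = [110, 118, 121, 130, 128, 140, 145]

r, p_r = stats.pearsonr(x, y)        # for two normally distributed numerical variables
rho, p_rho = stats.spearmanr(x, y)   # rank-based, usable for ordinal data
print(r, rho)                        # both close to +1 here
```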
More than two characteristics.
In many real-life situations a simple association between two variables is unlikely to prevail. The effect is more likely brought about by a number of factors, some inter-related and some independent of each other. To detect more precisely the mutual interrelationships among many variables, multiple correlation is measured. Partial correlation helps, while assessing the association of two variables, to exclude the influence of the others.
Regression.
In regression analysis one variable is considered to be the dependent variable, its value varying according to one or more independent variables. A regression line constructed on the basis of the variables enables interpolation and therefore prediction of one variable from another. Other commonly used regression models take into consideration nonlinear relationships between variables: polynomial regression, logistic regression and the proportional hazards model.
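A minimal sketch of simple linear regression on the hypothetical weight/pressure data above (SciPy assumed available), used to predict the dependent variable:

```python
from scipy import stats  # assumes SciPy is installed

weight = [60, 65, 70, 75, 80, 85, 90]           # independent variable (hypothetical)
pressure = [110, 118, 121, 130, 128, 140, 145]  # dependent variable (hypothetical)

fit = stats.linregress(weight, pressure)
predicted = fit.intercept + fit.slope * 72   # interpolated prediction for 72 kg
print(fit.slope, fit.intercept, predicted)
```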
Using modified life tables
Life tables are constructed at a given time for a specified community (e.g. country, region, etc.) and show, for each age, the number of people alive as well as the number of people who died at that age. This enables calculation of the mortality ratio for each age group, which represents the likelihood of dying at that age.
Life tables provide some practical information (a numerical sketch follows this list):
1. The probability of dying between any two selected ages (the ratio of the total deaths between the two ages to the number alive at the initial age).
2. The number of survivors at any given age out of those starting at age 0.
3. The probability of surviving from one age to another.
4. The life expectancy of a given population (community).
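A minimal sketch of points 1-3 on a hypothetical life table, where l[x] is the number of survivors at exact age x out of 100,000 born:

```python
# Hypothetical life-table column l(x): survivors at exact age x per 100,000 born.
l = {0: 100_000, 40: 95_000, 60: 85_000, 80: 50_000}

p_die_40_80 = (l[40] - l[80]) / l[40]   # 1. probability of dying between 40 and 80
survivors_60 = l[60]                    # 2. survivors at age 60 out of 100,000 born
p_survive_40_60 = l[60] / l[40]         # 3. probability of surviving from 40 to 60

print(p_die_40_80, survivors_60, p_survive_40_60)
```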
There are two types of life tables:
1. Current life tables use cross-sectional data on a given population and can show only what would happen under current conditions of mortality (not actual survival data of the population).
2. Cohort life tables use data from one group of subjects followed for a given period of time. These tables show the actual dying out of a defined group of persons, all born at the same time.
The life-table method can be used to determine whether a new medication, a new treatment, or a new procedure will perform better than the one in current use. Similarly, the prognosis of a disease can be expressed in terms of survival data. However, the method can be applied more generally to any defined feature in the follow-up of persons or patients. It is also a useful way of comparing the survival patterns of two or more groups and testing whether they are significantly different.
Such an application of the life-table method is called the logrank method.
This short description of statistical tools used in epidemiology serves only as a foreword to more detailed study, which is beyond the scope of this chapter. Statistical tools and tests are comprehensively described in various textbooks on medical statistics. Similarly, many statistical software packages have been developed (some of them freely available on the internet), and commonly used spreadsheet programs such as Microsoft Excel are also equipped with basic statistical tools.