Descriptive and inferential
statistics
Asst. Prof. Georgi Iskrov, PhD
Department of Social Medicine
Before we start
http://www.raredis.work/edu/
Lecture slides to be updated!
Outline
• Statistics
• Sample, population and sampling
• Descriptive and inferential statistics
• Types of variables and level of measurement
• Measures of central tendency and spread
• Normal distribution
• Confidence intervals
• Sample size calculation
• Hypothesis testing
• Significance, power and errors
• Normality tests
Why do we need to use
statistical methods?
• Why do we need to use statistical methods?
– To draw the strongest possible conclusions from limited
amounts of data;
– To generalize from a particular set of data to a more
general conclusion.
• What do we need to pay attention to?
– Bias
– Probability
Population vs Sample
[Figure: Sample → statistics (x̄, s); Population → parameters (μ, σ)]
Population vs Sample
• A population includes all objects of interest, whereas a
sample is only a portion of the population:
– Parameters are associated with populations and
statistics with samples;
– Parameters are usually denoted using Greek letters (μ,
σ) while statistics are usually denoted using Roman
letters (X, s).
• There are several reasons why we do not work with
populations:
– They are usually large and it is often impossible to get
data for every object we are studying;
– Sampling does not usually occur without cost. The
more items surveyed, the larger the cost.
Inferential statistics
[Figure: sampling leads from the population (parameters) to the sample (statistics); inferential statistics leads from the sample back to the population.]
Descriptive vs Inferential statistics
• We compute statistics and use them to estimate
parameters.
• The computation is the first part of the statistical analysis
(Descriptive Statistics) and the estimation is the second
part (Inferential Statistics).
• Descriptive statistics: The procedure used to organize
and summarize masses of data.
• Inferential statistics: The methods used to find out
something about a population, based on a sample.
Sampling
• Individuals in the population vary from one another with
respect to an outcome of interest.
Sampling
• When a sample is drawn, there is no certainty that it will
be representative of the population.
[Figure: two samples, A and B, drawn from the same population can differ both from each other and from the population.]
Sampling
• Random sample: In random sampling, each item or
element of the population has an equal chance of being
chosen at each draw. While this is the preferred way of
sampling, it is often difficult to do. It requires that a
complete list of every element in the population be
obtained. Computer generated lists are often used with
random sampling.
• Properties of a good sample:
– Random selection;
– Representativeness by structure;
– Representativeness by number of cases.
Sampling
• Systematic sampling: The list of elements is “counted
off”. That is, every k-th element is taken. This is similar to
lining everyone up and numbering off “1,2,3,4; 1,2,3,4;
etc”. When done numbering, all people numbered 4
would be used.
• Convenience sampling: In convenience sampling,
readily available data is used. That is, the first people the
surveyor runs into.
Sampling
• Cluster sampling: It is accomplished by dividing the
population into groups (clusters), usually geographically.
The clusters are randomly selected, and every element in
the selected clusters is used.
• Stratified sampling: It divides the population into
groups, called strata. However, this time it is by some
characteristic, not geographically. For instance, the
population might be separated into males and females. A
sample is taken from each of these strata using either
random, systematic, or convenience sampling.
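As a rough illustration (not from the slides), the sketch below draws a simple random sample and a systematic sample from a hypothetical frame of 1,000 subject IDs in Python.

# A minimal sketch of simple random and systematic sampling
# from a hypothetical sampling frame of 1,000 subject IDs.
import random

frame = list(range(1, 1001))          # hypothetical sampling frame

# Simple random sampling: every element has an equal chance at each draw.
random_sample = random.sample(frame, k=50)

# Systematic sampling: take every k-th element after a random start.
k = len(frame) // 50
start = random.randrange(k)
systematic_sample = frame[start::k]

print(len(random_sample), len(systematic_sample))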
Random and systematic
errors
• Random error can be conceptualized as sampling
variability.
• Bias (systematic error) is a difference between an
observed value and the true value due to all causes
other than sampling variability.
• Biased sample: a biased sample is one in which the
method used to create the sample results in samples
that are systematically different from the population.
• Accuracy is a general term denoting the absence of
error of all kinds.
Sample size calculation
• Law of Large Numbers: As the number of trials of a
random process increases, the percentage difference
between the expected and actual values goes to zero.
• Application in biostatistics: Bigger sample size, smaller
margin of error.
• A properly designed study will include a justification for
the number of experimental units (people/animals) being
examined.
• Sample size calculations are necessary to design
experiments that are large enough to produce useful
information and small enough to be practical.
Sample size calculation
• Generally, the sample size for any study depends on:
– Acceptable level of confidence;
– Power of the study;
– Expected effect size and absolute error of precision;
– Underlying scatter in the population.
– You need a large sample size for high power, a small expected effect, and lots of scatter; a small sample size suffices for low power, a large effect, and little scatter.
Sample size calculation
• For quantitative variables:
Z  SD
n
2
d
2
• Z – confidence level;
• SD – standard deviation;
• d – absolute error of precision.
2
Sample size calculation
• For quantitative variables:
Z  SD
n
2
d
2
2
• A researcher is interested in knowing the average
systolic blood pressure in the pediatric age group at a 95%
level of confidence and a precision of 5 mmHg. The standard
deviation, based on previous studies, is 25 mmHg.
1.96  25
n
 96.04
2
5
2
2
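The same calculation can be checked in a couple of lines of Python; the values are taken from the example above.

# Sketch reproducing the slide's calculation: n = (Z^2 * SD^2) / d^2
z, sd, d = 1.96, 25, 5           # 95% confidence, SD = 25 mmHg, precision = 5 mmHg
n = (z**2 * sd**2) / d**2
print(round(n, 2))               # 96.04 -> round up to 97 subjects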
Sample size calculation
• For qualitative variables:
Z  p  (100  p)
n
2
d
2
• Z – confidence level
• p – expected proportion in population
• d – absolute error of precision
Sample size calculation
• For qualitative variables:
Z  p  (100  p)
n
2
d
2
• A researcher is interested in knowing the proportion of
diabetes patients having hypertension. According to a
previous study, the actual proportion is no more than 15%.
The researcher wants to calculate the required sample size
for a 5% absolute precision error and a 95% confidence level.
1.96 15  (100  15)
n
 195.92
2
5
2
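Again, the arithmetic can be checked in Python with the values from the example above.

# Sketch reproducing the slide's calculation: n = (Z^2 * p * (100 - p)) / d^2
z, p, d = 1.96, 15, 5            # 95% confidence, expected proportion 15%, precision 5%
n = (z**2 * p * (100 - p)) / d**2
print(round(n, 2))               # 195.92 -> round up to 196 patients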
Variables
• Different types of data require different kinds of analyses.
                           Nominal   Ordinal   Interval   Ratio
Frequency distribution     Yes       Yes       Yes        Yes
Median, percentiles        No        Yes       Yes        Yes
Mean, standard deviation   No        No        Yes        Yes
Ratio                      No        No        No         Yes
Levels of measurement
• There are four levels of measurement: Nominal, Ordinal,
Interval and Ratio. These go from lowest level to highest
level.
• Data are classified according to the highest level into which
they fit. Each additional level adds something the previous
level did not have.
– Nominal is the lowest level. Only names are
meaningful here;
– Ordinal adds an order to the names;
– Interval adds meaningful differences;
– Ratio adds a zero so that ratios are meaningful.
Levels of measurement
• Nominal scale – e.g., genotype
You can code it with numbers, but the order is arbitrary
and any calculations would be meaningless.
• Ordinal scale – e.g., pain score from 1 to 10
The order matters but not the difference between values.
• Interval scale – e.g., temperature in °C
The difference between two values is meaningful.
• Ratio scale – e.g., height
It has a clear definition of 0. When the variable equals 0,
there is none of that variable. When working with ratio
variables, but not interval variables, you can look at the
ratio of two measurements.
Central tendency and spread
• Central tendency: Mean, mode and median
• Spread: Range, interquartile range, standard deviation
• Mistakes:
– Focusing on only the mean and ignoring the variability
– Standard deviation and standard error of the mean
– Variation and variance
• What is best to use in different scenarios?
– Symmetrical data: mean and standard deviation
– Skewed data: median and interquartile range
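As an illustrative sketch with made-up numbers, these summaries can be computed with NumPy; note how the mean and SD react to a single outlier while the median and IQR do not.

# Minimal sketch with hypothetical data: which summary to report depends on
# whether the distribution is roughly symmetrical or skewed.
import numpy as np

x = np.array([4.1, 4.5, 4.8, 5.0, 5.2, 5.4, 5.9, 12.0])    # hypothetical, right-skewed

print("mean:", x.mean(), "SD:", x.std(ddof=1))              # pulled up by the outlier
print("median:", np.median(x),
      "IQR:", np.percentile(x, 75) - np.percentile(x, 25))  # robust to the outlier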
Normal (Gaussian) distribution
• When data are approximately normally distributed:
– approximately 68% of the data lie within one SD of the mean;
– approximately 95% of the data lie within two SDs of the mean;
– approximately 99.7% of the data lie within three SDs of the mean.
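A quick sketch using SciPy's standard normal CDF reproduces these coverage figures.

# Sketch verifying the 68-95-99.7 rule from the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {coverage:.3%}")    # ~68.3%, ~95.4%, ~99.7%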
Normal (Gaussian) distribution
• Central limit theorem:
– Create a population with a known distribution that is not normal;
– Randomly select many samples of equal size from that
population;
– Tabulate the means of these samples and graph the frequency
distribution.
• Central limit theorem states that if your samples are
large enough, the distribution of the means will
approximate a normal distribution even if the population
is not Gaussian.
• Mistakes:
– Normal vs common (or disease free);
– Few biological distributions are exactly normal.
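The simulation steps listed above can be sketched in a few lines of Python; the exponential population and the sample size of 50 are arbitrary choices for illustration.

# Sketch of the simulation described above: sample means from a clearly
# non-normal (exponential) population become approximately normal.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)   # known, non-normal population

sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

print(np.mean(sample_means), np.std(sample_means))
# A histogram of sample_means would show an approximately normal shape.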
Confidence interval for the
population mean
• Population mean: point estimate vs interval estimate
• Standard error of the mean – how close the sample
mean is likely to be to the population mean.
• Assumptions: a random representative sample,
independent observations, the population is normally
distributed (at least approximately).
• Confidence interval depends on: sample mean, standard
deviation, sample size, degree of confidence.
• Mistakes:
– 95% of the values lie within the 95% CI;
– A 95% CI covers the mean ± 2 SD.
Confidence interval for the
population mean
• The duration of time from first exposure to HIV infection to AIDS
diagnosis is called the incubation period. The incubation periods (in
years) of a random sample of 30 HIV infected individuals are: 12.0,
10.5, 9.5, 6.3, 13.5, 12.5, 7.2, 12.0, 10.5, 5.2, 9.5, 6.3, 13.1, 13.5,
12.5, 10.7, 7.2, 14.9, 6.5, 8.1, 7.9, 12.0, 6.3, 7.8, 6.3, 12.5, 5.2, 13.1,
10.7, 7.2. Calculate the 95% CI for the population mean incubation
period in HIV.
• X = 9.5 years; SD = 2.8 years
• SEM = 0.5 years
• 95% level of confidence => Z = 1.96
• µ = 9.5 ± (1.96 x 0.5) = 9.5 ± 1 years
• 95% CI for µ is (8.5; 10.5 years)
Confidence interval for the
population mean
• X = 9.5 years; SD = 2.8 years
• SEM = 0.5 years
• 95% level of confidence => Z = 1.96
– µ = 9.5 ± (1.96 x 0.5) = 9.5 ± 1 years
– 95% CI for µ is (8.5; 10.5 years)
• 99% level of confidence => Z = 2.58
– µ = 9.5 ± (2.58 x 0.5) = 9.5 ± 1.3 years
– 99% CI for µ is (8.2; 10.8 years)
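The arithmetic on this and the previous slide can be reproduced in a short Python sketch from the summary statistics (mean 9.5 years, SD 2.8 years, n = 30).

# Sketch reproducing the slide's arithmetic from the summary statistics.
import math

mean, sd, n = 9.5, 2.8, 30
sem = sd / math.sqrt(n)                          # ~0.5 years

for z, label in ((1.96, "95%"), (2.58, "99%")):
    half_width = z * sem
    print(f"{label} CI: ({mean - half_width:.1f}; {mean + half_width:.1f}) years")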
Hypothesis testing
Diabetes type 2 study
• Experimental group: mean blood sugar level 103 mg/dl
• Control group: mean blood sugar level 107 mg/dl
Pancreatic cancer study
• Experimental group: 1-year survival rate 23%
• Control group: 1-year survival rate 20%
Is there a difference?
Hypothesis testing
• The general idea of hypothesis testing involves:
– Making an initial assumption;
– Collecting evidence (data);
– Based on the available evidence (data), deciding
whether to reject or not reject the initial assumption.
• Every hypothesis test – regardless of the population
parameter involved – requires the above three steps.
Null hypothesis – H0
• This is the hypothesis under test, denoted as H0.
– The null hypothesis is usually stated as the
absence of a difference or an effect;
– The null hypothesis says there is no effect;
– The null hypothesis is rejected if the significance test
shows the data are inconsistent with the null
hypothesis.
Alternative hypothesis – H1
• This is the alternative to the null hypothesis. It is denoted
as H', H1, or HA.
– It is usually the complement of the null
hypothesis;
– If, for example, the null hypothesis says two
population means are equal, the alternative says the
means are unequal.
Criminal trial
• The criminal justice system assumes “the defendant is
innocent until proven guilty”. That is, our initial
assumption is that the defendant is innocent.
• In the practice of statistics, we make our initial
assumption when we state our two competing
hypotheses – the null hypothesis (H0) and the alternative
hypothesis (HA). Here, our hypotheses are:
• H0: Defendant is not guilty (innocent);
• HA: Defendant is guilty;
• In statistics, we always assume the null hypothesis
is true. That is, the null hypothesis is always our
initial assumption.
Criminal trial
• The prosecution team then collects evidence with the
hopes of finding “sufficient evidence” to make the
assumption of innocence refutable.
• In statistics, the data are the evidence.
• The jury then makes a decision based on the available
evidence:
• If the jury finds sufficient evidence – beyond a
reasonable doubt – to make the assumption of
innocence refutable, the jury rejects H0 and deems the
defendant guilty. We behave as if the defendant is
guilty.
• If there is insufficient evidence, then the jury does not
reject H0. We behave as if the defendant is innocent.
Making the decision
• Recall that it is either likely or unlikely that we would
observe the evidence we did given our initial
assumption.
• If it is likely, we do not reject the null hypothesis;
• If it is unlikely, then we reject the null hypothesis in favor
of the alternative hypothesis;
• Effectively, then, making the decision reduces to
determining “likely” or “unlikely”.
Making the decision
• In statistics, there are two ways to determine whether the
evidence is likely or unlikely given the initial assumption:
– We could take the “critical value approach” (favored
in many of the older textbooks).
– Or, we could take the “p-value approach” (what is
used most often in research, journal articles, and
statistical software).
Making the decision
• Suppose we find a difference between two groups in
survival:
– patients on a new drug have a survival of 15 months;
– patients on the old drug have a survival of 18 months.
• So, the difference is 3 months.
Making the decision
• Suppose we find a difference between two groups in
survival:
– patients on a new drug have a survival of 15 months;
– patients on the old drug have a survival of 18 months.
• So, the difference is 3 months.
• Do we accept or reject the hypothesis of no true
difference between the groups (the two drugs)?
• Is a difference of 3 a lot, statistically speaking – a
huge difference that is rarely seen?
• Or is it not much – the sort of thing that happens all
the time?
Making the decision
• A statistical test tells you how often you would get a
difference of 3, simply by chance, if the null
hypothesis is correct – no real difference between the
two groups.
• Suppose the test is done and its result is that p = 0.32.
This means that you’d get a difference of 3 quite often
just by the play of chance – 32 times in 100 – even when
there is in reality no true difference between the groups.
Making the decision
• A statistical test tells you how often you’d get a
difference of 3, simply by chance, if the null
hypothesis is correct – no real difference between the
two groups.
• On the other hand, if we did the statistical analysis and
p = 0.0001, then you would get a difference as big as 3 by
the play of chance only 1 time in 10,000. That is so rare
that we want to reject our hypothesis of no difference:
there is something different about the new therapy.
Hypothesis testing
• Somewhere between 0.32 and 0.0001 we may not be
sure whether to reject the null hypothesis or not.
• Mostly we reject the null hypothesis when, if the null
hypothesis were true, the result we got would have
happened less than 5 times in 100 by chance. This is
the ‘conventional’ cutoff of 5% or p < 0.05.
• This cutoff is commonly used but arbitrary, i.e. there is no
particular reason why we use 0.05 rather than 0.06 or
0.048.
Hypothesis testing
                            Decision:                Decision:
                            Reject null hypothesis   Do not reject null hypothesis
Null hypothesis is true     Type I error             No error
Null hypothesis is false    No error                 Type II error
Type I and II errors
• A type I error is the incorrect rejection of a true null
hypothesis (also known as a “false positive”
finding).
• The probability of a type I error is denoted by the Greek
letter α (alpha).
• A type II error is incorrectly retaining a false null
hypothesis (also known as a "false negative"
finding).
• The probability of a type II error is denoted by the Greek
letter β (beta).
Level of significance
• Level of significance (α) – the threshold for declaring if
a result is significant. If the null hypothesis is true, α is
the probability of rejecting the null hypothesis.
• α is decided as part of the research design, while the p-value is computed from the data.
• α = 0.05 is most commonly used.
• Small α value reduces the chance of Type I error, but
increases the chance of Type II error.
• Trade-off based on the consequences of Type I (false-positive) and Type II (false-negative) errors.
Power
• Power – the probability of rejecting a false null
hypothesis. Statistical power is inversely related to β or
the probability of making a Type II error (power is equal
to 1 – β).
• Power depends on the sample size, variability,
significance level and hypothetical effect size.
• You need a larger sample when you are looking for a
small effect and when the standard deviation is large.
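As a hedged illustration (not part of the slides), the usual normal-approximation formula for comparing two means shows how the required sample size per group reacts to the effect size and the scatter: n ≈ 2 × (z₁₋α/₂ + z₁₋β)² × SD² / Δ².

# Sketch of the normal-approximation sample-size formula for a
# two-sample comparison of means (alpha and power are illustrative defaults).
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2

print(round(n_per_group(delta=5, sd=10)))    # small effect relative to SD -> larger n
print(round(n_per_group(delta=10, sd=10)))   # larger effect -> smaller n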
Choosing a statistical test
• Choice of a statistical test depends on:
– Level of measurement for the dependent and
independent variables
– Number of groups or dependent measures
– Number of units of observation
– Type of distribution
– The population parameter of interest (mean, variance,
differences between means and/or variances)
Choosing a statistical test
• Multiple comparisons – two or more data sets to be
analyzed, which may be:
– repeated measurements made on the same individuals;
– entirely independent samples.
• Degrees of freedom – the number of scores, items, or
other units in the data set, which are free to vary
• One- and two-tailed tests
– one-tailed test of significance used for directional hypothesis;
– two-tailed tests in all other situations.
• Sample size – number of cases, on which data have
been obtained
– Which of the basic characteristics of a distribution are more
sensitive to the sample size?
Student t-test
t = (X̄₁ − X̄₂) / √(S²ₓ₁ + S²ₓ₂)
2-sample t-test
• Aim: Compare two means
• Example: Comparing pulse rate in people taking two
different drugs
• Assumption: Both data sets are sampled from Gaussian
distributions with the same population standard deviation
• Effect size: Difference between two means
• Null hypothesis: The two population means are identical
• Meaning of P value: If the two population means are
identical, what is the chance of observing such a
difference (or a bigger one) between means by chance
alone?
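A minimal sketch with hypothetical pulse-rate data, using SciPy's independent-samples t-test:

# Sketch with hypothetical pulse-rate data for two independent drug groups.
from scipy import stats

drug_a = [72, 75, 70, 78, 74, 69, 73, 76]
drug_b = [80, 79, 77, 83, 81, 78, 82, 76]

t, p = stats.ttest_ind(drug_a, drug_b)        # assumes equal population SDs by default
print(f"t = {t:.2f}, p = {p:.4f}")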
Paired t-test
• Aim: Compare a continuous variable before and after an
intervention
• Example: Comparing pulse rate before and after taking a
drug
• Assumption: The population of paired differences is
Gaussian
• Effect size: Mean of the paired differences
• Null hypothesis: The population mean of paired
differences is zero
• Meaning of P value: If there is no difference in the
population, what is the chance of observing such a
difference (or a bigger one) between means by chance
alone?
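A minimal sketch with hypothetical before/after pulse rates, using SciPy's paired t-test:

# Sketch with hypothetical before/after pulse rates for the same subjects.
from scipy import stats

before = [78, 82, 75, 80, 77, 84, 79, 81]
after  = [74, 79, 73, 76, 75, 80, 77, 78]

t, p = stats.ttest_rel(before, after)         # tests whether the mean paired difference is zero
print(f"t = {t:.2f}, p = {p:.4f}")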
One-way ANOVA
• Aim: Compare three or more means
• Example: Comparing pulse rate in 3 groups of people,
each group taking a different drug
• Assumption: All data sets are sampled from Gaussian
distributions with the same population standard deviation
• Effect size: Fraction of the total variation explained by
variation among group means
• Null hypothesis: All population means are identical
• Meaning of P value: If the population means are
identical, what is the chance of observing such a
difference (or a bigger one) between means by chance
alone?
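A minimal sketch with hypothetical data for three drug groups, using SciPy's one-way ANOVA:

# Sketch with hypothetical pulse rates for three independent drug groups.
from scipy import stats

drug_a = [72, 75, 70, 78, 74, 69]
drug_b = [80, 79, 77, 83, 81, 78]
drug_c = [74, 76, 73, 77, 75, 72]

f, p = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F = {f:.2f}, p = {p:.4f}")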
Parametric and
non-parametric tests
• Parametric test – the variable we have measured in
the sample is normally distributed in the population
to which we plan to generalize our findings
• Non-parametric test – distribution free, no assumption
about the distribution of the variable in the population
Normality test
• Normality tests are used to determine if a data set is
modeled by a normal distribution and to compute how
likely it is for a random variable underlying the data set to
be normally distributed.
• In descriptive statistics terms, a normality test measures
a goodness of fit of a normal model to the data – if the fit
is poor then the data are not well modeled in that respect
by a normal distribution, without making a judgment on
any underlying variable.
• In frequentist hypothesis testing, data are tested against
the null hypothesis that they are normally distributed.
Normality test
• Graphical methods
• An informal approach to testing normality is to compare
a histogram of the sample data to a normal probability
curve. The empirical distribution of the data (the
histogram) should be bell-shaped and resemble the
normal distribution. This might be difficult to see if the
sample is small.
Normality test
• Frequentist tests
• Tests of univariate normality include the following:
– D'Agostino's K-squared test
– Jarque–Bera test
– Anderson–Darling test
– Cramér–von Mises criterion
– Lilliefors test
– Kolmogorov–Smirnov test
– Shapiro–Wilk test
– Etc.
Normality test
• Kolmogorov–Smirnov test
• K–S test is a nonparametric test of the equality of
distributions that can be used to compare a sample with
a reference distribution (1-sample K–S test), or to
compare two samples (2-sample K–S test).
• K–S statistic quantifies a distance between the empirical
distribution of the sample and the cumulative distribution
of the reference distribution, or between the empirical
distributions of two samples.
• The null hypothesis is that the sample is drawn from the
reference distribution (in the 1-sample case) or that the
samples are drawn from the same distribution (in the 2-sample case).
Normality test
• Kolmogorov–Smirnov test
• In the special case of testing for normality of the
distribution, samples are standardized and compared
with a standard normal distribution. This is equivalent to
setting the mean and variance of the reference
distribution equal to the sample estimates, and it is
known that using these to define the specific reference
distribution changes the null distribution of the test
statistic.
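As an illustrative sketch with simulated data, the 1-sample K–S test (against a standard normal after standardizing) and the Shapiro–Wilk test can both be run with SciPy; note the caveat above about estimating the reference distribution from the sample.

# Sketch: standardize a hypothetical sample and compare it with the standard
# normal distribution (1-sample K–S test); Shapiro–Wilk is shown for comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=120, scale=15, size=40)      # hypothetical data

z = (x - x.mean()) / x.std(ddof=1)              # standardized sample
print(stats.kstest(z, "norm"))                  # 1-sample K–S against N(0, 1)
print(stats.shapiro(x))                         # Shapiro–Wilk normality test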