Introduction to Biostatistics for Clinical and Translational Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
FRONTIERS: The Heartland Institute of Clinical and Translational Research

Course Information
Jo A. Wick, PhD
Office Location: 5028 Robinson
Email: jwick@kumc.edu
Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Events & Lectures’

Objectives
• Understand the role of statistics in the scientific process and how it is a core component of evidence-based medicine
• Understand features, strengths and limitations of descriptive, observational and experimental studies
• Distinguish between association and causation
• Understand roles of chance, bias and confounding in the evaluation of research

Course Calendar
• July 5: Introduction to Statistics: Core Concepts
• July 12: Quality of Evidence: Considerations for Design of Experiments and Evaluation of Literature
• July 19: Hypothesis Testing & Application of Concepts to Common Clinical Research Questions
• July 26: (Cont.) Hypothesis Testing & Application of Concepts to Common Clinical Research Questions

“No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”
Albert Einstein (1879-1955)

Basic Concepts
Statistics is a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty.
In research, we observe something about the real world. Then we must infer details about the phenomenon that produced what we observed.
A fundamental problem is that, very often, more than one phenomenon can give rise to the observations at hand!

Example: Infertility
Suppose you are concerned about the difficulties some couples have in conceiving a child.
It is thought that women exposed to a particular toxin in their workplace have greater difficulty becoming pregnant compared to women who are not exposed to the toxin.
You conduct a study of such women, recording the time it takes to conceive.
Of course, there is natural variability in time-to-pregnancy attributable to many causes aside from the toxin.
Nevertheless, suppose you finally determine that those females with the greatest exposure to the toxin had the most difficulty getting pregnant.
But what if there is a variable you did not consider that could be the cause? No study can consider every possibility.
It turns out that women who smoke while they are pregnant reduce the chance their daughters will be able to conceive because the toxins involved in smoking affect the eggs in the female fetus.
If you didn’t record whether or not the females had mothers who smoked when they were pregnant, you may draw the wrong conclusion about the industrial toxin.
[Diagram: smoking behaviors of mother, environmental toxins, and natural variability each influence fertility.]

Example: Infertility
Lurking (Confounding) Variable → Bias
• Unexposed to toxin: majority unexposed to smoke in womb
• Exposed to toxin: majority exposed to smoke in womb
• Time-to-conceive measured: prolonged time-to-conceive found
• Type I Error!

Example: Infertility
Lurking (Confounding) Variable → “Noise”
• Unexposed to toxin: some smoking exposure
• Exposed to toxin: some smoking exposure
• Time-to-conceive measured: an insignificant change in time-to-conceive found
• Type II Error!

The Role of Statistics
The conclusions (inferences) we draw always come with some amount of uncertainty due to these unobserved/unanticipated issues.
We must quantify that uncertainty in order to know how “good” our conclusions are.
This is the role that statistics plays in the scientific process.
• P-values (significance levels)
• Level of confidence
• Standard errors of estimates
• Confidence intervals
• Proper interpretation (association versus causation)

The Role of Statistics
Scientists use statistical inference to help model the uncertainty inherent in their investigations.
[Figure: a sample (x1, x2, x3, ..., xn) is drawn from the population (reality); its histogram (observation) is compared with a population model (imagination). Goal: statistical inference, with uncertainty measured by probability.]

Evidence-based Medicine
Evidence-based practice in medicine involves
• gathering evidence in the form of scientific data.
• applying the scientific method to inform clinical practice, establishment or development of new therapies, devices, programs or policies aimed at improving health.

Types of Evidence
Scientific evidence: “empirical evidence, gathered in accordance to the scientific method, which serves to support or counter a scientific theory or hypothesis”
• Type I: descriptive, epidemiological
• Type II: intervention-based
• Type III: intervention- and context-based

Evidence-based Medicine
Evidence-based practice results in a high likelihood of successful patient outcomes and more efficient use of health care resources.

The Scientific Method
[Diagram: a cycle of Observe, Experiment, and Revise; in clinical research, Clinical Evaluation, Design & Hypothesis, Run Experiment, Evidence (Data), and Revise.]

Types of Studies
Purpose of research
1) To explore
2) To describe or classify
3) To establish relationships
4) To establish causality
[Diagram labels: Ambiguity, Control]
Strategies for accomplishing these purposes:
1) Naturalistic observation
2) Case study
3) Survey
4) Quasi-experiment
5) Experiment

Generating Evidence
[Diagram: Studies divide into Descriptive Studies and Analytic Studies. Descriptive Studies cover Populations and Individuals (Case Reports, Case Series, Cross Sectional). Analytic Studies are Observational (Case Control, Cohort) or Experimental (RCT). An arrow labeled "Complexity and Confidence" runs along the hierarchy.]

Observation versus Experiment
A designed experiment involves the investigator assigning (preferably randomly) some or all conditions to subjects.
An observational study includes conditions that are observed, not assigned.

Example: Heart Study
Question: How does serum total cholesterol vary by age, gender, education, and use of blood pressure medication? Does smoking affect any of the associations?
• Recruit n = 3000 subjects over two years
• Take blood samples and have subjects answer a CVD risk factor survey
• Outcome: Serum total cholesterol
• Factors: BP meds (observed, not assigned)
• Confounders?

Example: Diabetes
Question: Will a new treatment help overweight people with diabetes lose weight?
• N = 40 obese adults with Type II (non-insulin dependent) diabetes (20 female/20 male)
• Randomized, double-blind, placebo-controlled study of treatment versus placebo
• Outcome: Weight loss
• Factor: Treatment versus placebo

How to Talk to a Statistician?
“It’s all Greek to me . . .”

Why Do I Need a Statistician?
• Planning a study
• Proposal writing
• Data analysis and interpretation
• Presentation and manuscript development

When Should I Seek a Statistician’s Help?
• Literature interpretation
• Defining the research questions
• Deciding on data collection instruments
• Determining appropriate study size

What Does the Statistician Need to Know?
• General idea of the research (Specific Aims and hypotheses would be ideal)
• What has been done before (Literature review!)
• Outcomes under consideration
• Study population
• Drug/Intervention/Device
• Rationale for the study
• Budget constraints

Vocabulary
Hypotheses: a statement of the research question that sets forth the appropriate statistical evaluation
• Null hypothesis “H0”: statement of no differences or association between variables
• Alternative hypothesis “H1”: statement of differences or association between variables

Disproving the Null
If someone claims that all swans are white, confirmatory evidence (in the form of lots of white swans) cannot prove the assertion to be true.
Contradictory evidence (in the form of a single black swan) makes it clear the claim is invalid.
The Scientific Method
[Diagram: Observation, Hypothesis, Experiment, Results. Either the evidence supports H, or the evidence is inconsistent with H and H is revised.]

Hypothesis Testing
By hypothesizing that the mean response of a population is 26.3, I am saying that I expect the mean of a sample drawn from that population to be ‘close to’ 26.3.
[Figure: distribution of the sample mean under the hypothesis; horizontal axis from 24.5 to 28.0.]

Hypothesis Testing
What if, in collecting data to test my hypothesis, I observe a sample mean of 26? What conclusion might I draw?
What if I observe a sample mean of 27.5? What conclusion might I draw?
What if I observe a sample mean of 30? What conclusion might I draw?

Hypothesis Testing
If the observed sample mean seems odd or unlikely under the assumption that H0 is true, then we reject H0 in favor of H1.
We typically use the p-value as a measure of the strength of evidence against H0.

What is a P-value?
A p-value is the probability of getting a sample mean as favorable or more favorable to H1 than what was observed, assuming H0 is true.
A p-value is the area under the curve of the null distribution for values of the sample mean more extreme than what we observed in the sample we actually gathered.
The tail of the distribution it is in is determined by H1.
• If H1 states that the mean is greater than 26.3, the p-value is as shown (the area to the right of the observed sample mean).
• If H1 states that the mean is less than 26.3, the p-value is the area to the left of the observed sample mean.
• If H1 states that the mean is different than 26.3, the p-value is twice the area shown, accounting for the area in both tails.
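The three tail-area rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides give the hypothesized mean (26.3) but not the outcome's standard deviation or the sample size, so sigma = 2.0 and n = 16 below are hypothetical, and the sample mean is assumed to be normally distributed under H0. The function names are also hypothetical.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value(sample_mean, null_mean, sigma, n, alternative):
    """Tail area for a sample mean, assuming it is normal under H0.

    sigma and n are hypothetical inputs; the slides do not specify them.
    """
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    if alternative == "greater":      # H1: mean > null_mean, right tail
        return upper_tail(z)
    if alternative == "less":         # H1: mean < null_mean, left tail
        return upper_tail(-z)
    if alternative == "different":    # H1: mean != null_mean, both tails
        return 2 * upper_tail(abs(z))
    raise ValueError("alternative must be 'greater', 'less', or 'different'")

# Hypothetical values for illustration only
NULL_MEAN, SIGMA, N = 26.3, 2.0, 16
for observed in (26.0, 27.5, 30.0):
    p = p_value(observed, NULL_MEAN, SIGMA, N, "different")
    print(f"sample mean {observed}: two-tailed p = {p:.4f}")
```

Under these assumed values, a sample mean near 26.3 gives a large p-value, and the p-value shrinks as the observed mean moves farther from 26.3, which mirrors the three "what if" questions on the earlier slides.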
[Figure: null distribution with the observed sample mean marked and the p-value shaded as a tail area; horizontal axis from 24.5 to 28.0.]

Vocabulary
• One-tailed hypothesis: outcome is expected in a single direction (e.g., administration of experimental drug will result in a decrease in systolic BP)
• Two-tailed hypothesis: the direction of the effect is unknown (e.g., experimental therapy will result in a different response rate than that of current standard of care)

Vocabulary
• Type I Error (α): a true H0 is incorrectly rejected
  “An innocent man is proven GUILTY in a court of law”
  Commonly accepted rate is α = 0.05
• Type II Error (β): failing to reject a false H0
  “A guilty man is proven NOT GUILTY in a court of law”
  Commonly accepted rate is β = 0.2
• Power (1 – β): correctly rejecting a false H0
  “Justice has been served”
  Commonly accepted rate is 1 – β = 0.8

Decisions
Conclusion   Truth: H1         Truth: H0
H1           Correct: Power    Type I Error
H0           Type II Error     Correct

Statistical Power
Primary factors that influence the power of your study:
• Effect size: as the magnitude of the difference you wish to find increases, the power of your study will increase
• Variability of the outcome measure: as the variability of your outcome decreases, the power of your study will increase
• Sample size: as the size of your sample increases, the power of your study will increase

Statistical Power
Secondary factors that influence the power of your study:
• Dropouts
• Nuisance variation
• Confounding variables
• Multiple hypotheses
• Post-hoc hypotheses

Hypothesis Testing
We will cover these concepts more fully when we discuss Hypothesis Testing and Quality of Evidence

Descriptive Statistics

Field of Statistics
• Descriptive Statistics: methods for processing, summarizing, presenting and describing data
• Experimental Design: techniques for planning and conducting experiments
• Inferential Statistics: evaluation of the information generated by an experiment or through observation

Field of Statistics
[Diagram: Statistics divides into Descriptive (Graphical, Numerical), Experimental Design, and Inferential (Estimation, Hypothesis Testing).]

Field of Statistics
Descriptive statistics
• Summarizing and describing the data
• Uses numerical and graphical summaries to characterize sample data
Inferential statistics
• Uses sample data to make conclusions about a broader range of individuals (a population) than just those who are observed (a sample)
[Figure: a sample drawn from a population.]
The principal way to guarantee that the sample

Field of Statistics
Experimental Design
• Formulation of hypotheses
• Determination of experimental conditions, measurements, and any extraneous conditions to be controlled
• Specification of the number of subjects required and the population from which they will be sampled
• Specification of the procedure for assigning subjects to experimental conditions
• Determination of the statistical analysis that will be performed

Descriptive Statistics
Descriptive statistics is one branch of the field of Statistics in which we use numerical and graphical summaries to describe a data set or distribution of observations.
[Diagram: Statistics divides into Descriptive (Graphs, Statistics) and Inferential (Hypothesis Testing, Interval Estimates).]

Types of Data
All data contains information.
It is important to recognize that the hierarchy implied in the level of measurement of a variable has an impact on (1) how we describe the variable data and (2) what statistical methods we use to analyze it.

Levels of Measurement
Discrete, qualitative:
• Nominal: difference
• Ordinal: difference, order
Continuous, quantitative:
• Interval: difference, order, equivalence of intervals
• Ratio: difference, order, equivalence of intervals, absolute zero

Types of Data
NOMINAL, ORDINAL, INTERVAL, RATIO: information increases from nominal to ratio.

Ratio Data
Ratio measurements provide the most information about an outcome.
• Different values imply difference in outcomes. 6 is different from 7.
• Order is implied. 6 is smaller than 7.
• Intervals are equivalent. The difference between 6 and 7 is the same as the difference between 101 and 102.
• Zero indicates a lack of what is being measured.
If item A weighs 0 ounces, it weighs nothing.

Ratio Data
Can make statements like: “Person A (t = 10 minutes) took twice as long to complete a task as Person B (t = 5 minutes).”
This is the only type of measurement where statements of this nature can be made.
Examples: age, birth weight, follow-up time, time to complete a task, dose

Interval Data
Interval measurements are one step down on the “information” scale from ratio measurements.
Difference and order are implied and intervals are equivalent.
BUT, zero no longer implies an absence of the outcome. What is the interpretation of 0°C? 0 K?
The Celsius and Fahrenheit scales of temperature are interval measurements; Kelvin is a ratio measurement.

Interval Data
You can tell what is better, and by how much, but ratios don’t make sense due to the lack of a ‘starting point’ on the scale.
60°F is greater than 30°F, but not twice as hot since 0°F doesn’t represent an absence of heat.
Examples: temperature, dates

Ordinal Data
Ordinal measurements are one step down on the “information” scale from interval measurements.
Difference and order are implied.
BUT, intervals are no longer equivalent.
For instance, the difference in performance between the 1st and 2nd ranked teams in basketball isn’t necessarily equivalent to the difference between the 2nd and 3rd ranked teams. The ranking only implies that 1st is better than 2nd, 2nd is better than 3rd, and so on . . . but it doesn’t try to quantify the ‘betterness’ itself.
Examples: Highest level of education achieved, tumor grading, survey questions (e.g., Likert-scale quality of life)

Nominal Data
Nominal measurements collect the least amount of information about the outcome.
Only difference is implied.
Observations are classified into mutually exclusive categories.
Examples: Gender, ID numbers, pass/fail response

Levels of Measurement
It is important to recognize that the hierarchy implied in the level of measurement of a variable has an impact on (1) how we describe the variable data and (2) what statistical methods we use to analyze it.
The levels are in increasing order of mathematical structure (meaning that more mathematical operations and relations are defined), and the higher levels are required in order to define some statistics.

Levels of Measurement
At the lower levels, assumptions tend to be less restrictive and the appropriate data analysis techniques tend to be less sensitive.
In general, it is desirable to have a higher level of measurement.
A summary of the appropriate statistical summaries and mathematical relations or operations is given in the next table.

Levels of Measurement
Level      Statistical Summary                         Mathematical Relation/Operation
Nominal    Mode                                        one-to-one transformations
Ordinal    Median                                      monotonic transformations
Interval   Mean, Standard Deviation                    positive linear transformations
Ratio      Geometric Mean, Coefficient of Variation    multiplication by c > 0

We must know where an outcome falls on the measurement scale: this determines not only how we describe the data (descriptive statistics) but also how we analyze it (inferential statistics).

Using Graphs to Describe Data
Nominal and ordinal measurements are discrete and qualitative, even if they are represented numerically.
• Rank: 1, 2, 3
• Gender: male = 1, female = 0
We typically use frequencies, percentages, and proportions to describe how the data is distributed among the levels of a qualitative variable.
Bar and pie charts are even more useful.

Example: Myopia
A survey of n = 479 children found that those who had slept with a nightlight or in a fully lit room before the age of 2 had a higher incidence of nearsightedness later in childhood.
              Darkness     Nightlight   Full Light   Total
No Myopia     155 (90%)    153 (66%)    34 (45%)     342 (71%)
Myopia        15 (9%)      72 (31%)     36 (48%)     123 (26%)
High Myopia   2 (1%)       7 (3%)       5 (7%)       14 (3%)
Total         172 (100%)   232 (100%)   75 (100%)    479 (100%)

Example: Myopia
[Figure: stacked bar chart of myopia status (None, Some, High) by sleep-time lighting (Darkness, Nightlight, Full Light), on a 0 to 100 percent scale.]

Example: Myopia
As the amount of sleep-time light increases, the incidence of myopia increases.
This study does not prove that sleeping with the light on causes myopia in more children.
There may be some confounding factor that isn’t measured or considered, possibly genetics. Children whose parents have myopia are more likely to suffer from it themselves. It’s also possible that those parents are more likely to provide light while their children are sleeping.

Example: Nausea
How many subjects experienced drug-related nausea?
Dose     Nausea   No Nausea
0 mg     0        9
10 mg    1        10
20 mg    3        10
50 mg    3        11
[Figure: bar chart of nausea and no-nausea counts by dose.]

Example: Nausea
With unequal sample sizes across doses, it is more meaningful to use percent rather than frequency.
Dose     Nausea     No Nausea
0 mg     0 (0%)     9 (100%)
10 mg    1 (9%)     10 (91%)
20 mg    3 (23%)    10 (77%)
50 mg    3 (21%)    11 (79%)
[Figure: bar chart of percent with nausea and without nausea by dose.]

Bar & Pie Charts
[Figure: bar chart and pie chart of race/ethnicity (percent): Caucasian 30, African American 20, Hispanic 17, Asian American 13, Native American 13, Other 7.]

Using Graphs to Describe Data
Interval and Ratio variables are continuous and quantitative and can be graphically and numerically represented with more sophisticated mathematical techniques.
• Height
• Survival Time
We typically use means, standard deviations, medians, and ranges to describe how the variables tend to behave.
Histograms and boxplots are even more useful.
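Returning briefly to the nausea example: the percentages in its second table can be reproduced from the raw counts. The following is a minimal Python sketch; the dictionary layout and the function name are hypothetical choices made for illustration, and the counts are taken from the nausea table.

```python
# Counts from the nausea table: dose -> (nausea, no nausea)
counts = {
    "0 mg": (0, 9),
    "10 mg": (1, 10),
    "20 mg": (3, 10),
    "50 mg": (3, 11),
}

def percent_with_nausea(nausea, no_nausea):
    """Percent of subjects at one dose who reported nausea, rounded to a whole number."""
    total = nausea + no_nausea
    return round(100 * nausea / total)

percents = {dose: percent_with_nausea(*pair) for dose, pair in counts.items()}
for dose, pct in percents.items():
    n = sum(counts[dose])
    print(f"{dose}: {pct}% of {n} subjects reported nausea")
```

Because the group sizes differ (9, 11, 13, and 14 subjects), the same count of 3 subjects corresponds to different percentages at 20 mg and 50 mg, which is why percent is the more meaningful summary here.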
Example: Time-to-death
Suppose that we record the variable x = time-to-death of n = 100 patients in a study.
[Figure: histogram of time x (horizontal axis, 0 to 15) against frequency (vertical axis, 0 to 40).]

Example: Time-to-death
We can quickly observe several characteristics of the data from the histogram:
• For most subjects, death occurred between 0 and 5 months
• For a few subjects, death occurred past 15 months
From this picture, we may wish to identify the distinguishing characteristics of the individuals with unusually long times.

Example: Weight
Suppose we record the weight in pounds of n = 100 subjects in a study.
[Figure: boxplot of x marking Q1, Q2, and Q3, the IQR, fences at Q1 - 1.5 IQR and Q3 + 1.5 IQR, and outliers (*) beyond the fences.]

Example: Tooth Growth
Boxplots represent the same information, but are more useful for comparing characteristics between several data sets.
Right: distributions of tooth growth for two supplements and three dose levels

Using Numbers to Describe Data
Nominal and ordinal measurements are discrete and qualitative, even if they are represented numerically.
• Rank: 1, 2, 3
• Gender: male = 1, female = 0
Interval and Ratio variables are continuous and quantitative and can be graphically and numerically represented with more sophisticated mathematical techniques.
• Height
• Survival Time

Using Numbers to Describe Data
Nominal and ordinal measurements are qualitative, even if they are represented numerically.
We typically describe qualitative data using frequencies and percentages in tables.
Measures of central tendency and variability don’t make as much sense with categorical data, though the mode can be reported.

Describing Data
Interval and ratio measurements are quantitative.
When dealing with a quantitative measurement, we typically describe three aspects of its distribution.
• Central tendency: a single value around which data tends to fall.
• Variability: a value that represents how scattered the data is around that central value; large values are indicative of high scatter.
We also want to describe the shape of the distribution of the sample data values.

Central Tendency (location)
• Mean: arithmetic average of data
• Median: approximate middle of data
• Mode: most frequently occurring value

Central Tendency
Mode, Mo
• The most frequently occurring value in the data set.
• May not exist or may not be uniquely defined.
• It is the only measure of central tendency that can be used with nominal variables, but it is also meaningful for quantitative variables that are inherently discrete (e.g., performance of a task).
• Its sampling stability is very low (i.e., it varies greatly from sample to sample).

Central Tendency: Mode
[Figure: histogram of x (density scale) with the mode (Mo) marked.]

Central Tendency: Mode
[Figure: bar chart for females and males with the mode (Mo) marked; axis from 0 to 16.]

Central Tendency
Median, M
• The middle value (Q2, the 50th percentile) of the variable.
• It is appropriate for ordinal measures and for skewed interval or ratio measures because it isn’t affected by extreme values.
• It is robust to outliers because it takes into account only the relative ordering and number of observations, not the magnitude of the observations themselves.
• It has low sampling stability.

Example: Median
Suppose we have a set of observations: 1 2 2 4
The median for this set is M = 2.
Now suppose we accidentally mismeasured the last observation: 1 2 2 9
The median for this new set is still M = 2.

Central Tendency: Median
[Figure: histogram of x (density scale) with the mode (Mo) and median (M) marked.]

Central Tendency
Mean, x̄
• The arithmetic average of the variable x.
• It is the preferred measure for interval or ratio variables with relatively symmetric observations.
• It has good sampling stability (i.e., it varies the least from sample to sample), implying that it is better suited for making inferences about population parameters.
• It is affected by extreme values because it takes into account the magnitude of every observation.
• It can be thought of as the center of gravity of the variable’s distribution.
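The contrast between the median's robustness and the mean's sensitivity to extreme values can be checked with Python's standard statistics module. This is a minimal sketch that reuses the four observations from the median example; the variable names are hypothetical.

```python
import statistics

original = [1, 2, 2, 4]
mismeasured = [1, 2, 2, 9]   # last observation recorded as 9 instead of 4

for label, data in (("original", original), ("mismeasured", mismeasured)):
    mode = statistics.mode(data)      # most frequently occurring value
    median = statistics.median(data)  # middle of the ordered observations
    mean = statistics.mean(data)      # arithmetic average
    print(f"{label}: mode = {mode}, median = {median}, mean = {mean}")
```

The mode and median are 2 for both sets, while the mean moves from 2.25 to 3.5 because it uses the magnitude of every observation.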
Example: Mean
Suppose we have a set of observations: 1 2 2 4
The median for this set is M = 2, the mean is x̄ = 2.25.
Now suppose we accidentally mismeasured the last observation: 1 2 2 9
The median for this new set is still M = 2, but the new mean is x̄ = 3.5.

Central Tendency: Mean
[Figure: histogram of x (density scale) with the mode (Mo), median (M), and mean (x̄) marked.]

Variability (spread)
• Range: difference between min and max values
• Standard deviation: measures the spread of data about the mean, measured in the same units as the data

Variability
Measures of variability depict how similar observations of a variable tend to be.
Variability of a nominal or ordinal variable is rarely summarized numerically.
The more familiar measures of variability are mathematical, requiring measurement to be of the interval or ratio scale.

Variability
Range, R
• The distance from the minimum to the maximum observation.
• Easy to calculate.
• Influenced by extreme values (outliers).
  1 2 3 4 10: R = 10 - 1 = 9
  1 2 3 4 100: R = 100 - 1 = 99

Variability
Interquartile Range, IQR
• The distance from the 1st quartile (25th percentile) to the 3rd quartile (75th percentile), Q3 - Q1.
• Unlike the range, IQR is not influenced by extreme values.

Variability: IQR
[Figure: boxplot of x marking Q1, Q2, and Q3, the IQR, fences at Q1 - 1.5 IQR and Q3 + 1.5 IQR, and outliers (*) beyond the fences.]

Variability
Standard deviation, s
• Represents the average spread of the data around the mean.
• Expressed in the same units as the data.
• “Average deviation” from the mean.

Variability
Variance, s²
• The standard deviation squared.
• “Average squared deviation” from the mean.

Shape

Distribution Shapes

Summary
Basic Concepts
• Definition and role of statistics
• Vocabulary lesson
• Brief introduction to Hypothesis Testing
• Brief introduction to Design concepts
Descriptive Statistics
• Levels of Measurement
• Graphical summaries
• Numerical summaries
Next time: Study Design Considerations and Quality of Evidence
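As a worked check of the numerical summaries of variability, the range example can be extended to the IQR, standard deviation, and variance. This is a minimal Python sketch under stated assumptions: quartiles are computed with the "inclusive" method of statistics.quantiles (software packages use several quartile conventions, and the slides do not say which one is intended), and the standard deviation and variance use the usual sample (n - 1) formulas. The function name is hypothetical.

```python
import statistics

def describe_spread(data):
    """Return range, IQR, sample standard deviation, and sample variance."""
    # Quartiles via the 'inclusive' convention (an assumption; other conventions exist)
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": statistics.stdev(data),           # same units as the data
        "variance": statistics.variance(data),  # standard deviation squared
    }

typical = [1, 2, 3, 4, 10]
extreme = [1, 2, 3, 4, 100]   # same data with one extreme value

for label, data in (("typical", typical), ("extreme", extreme)):
    print(label, describe_spread(data))
```

With this quartile convention the IQR is the same for both data sets, while the range, standard deviation, and variance all grow when the largest observation is replaced by 100.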