Overview of Research Methods in Dentistry
Robert Weyant, DMD, DrPH
Department of Dental Public Health and Information Management
University of Pittsburgh

What is “Causation”?
• Koch-Henle postulates
• Bradford Hill “criteria”
• Inductionist, refutationist, or hypothetico-deductivist view
• Provides the basis for “intervention”

“Causality. There is no escape from it, we are forever slaves to it. Our only hope, our only peace is to understand it, to understand the why.” (Larry and Andy Wachowski, The Matrix Reloaded)

Hill's Criteria of Causation
• Proposed by Austin Bradford Hill (1897-1991), a British medical statistician, as a way of determining the causal link between a specific factor (e.g., cigarette smoking) and a disease (such as emphysema or lung cancer).
• Hill's criteria form the basis of modern epidemiological research, which attempts to establish scientifically valid causal connections between a disease and its cause.

The criteria:
• Temporal Relationship
• Strength
• Dose-Response Relationship
• Consistency
• Plausibility
• Consideration of Alternate Explanations
• Experiment
• Specificity
• Coherence

Systems
• Deterministic Systems
  • Events are part of an unbroken chain of prior occurrences.
  • Outcomes occur predictably.
  • Newtonian physics
• Stochastic Systems
  • Outcomes are computationally and practically unpredictable.
  • Present state does not fully determine the next state.
  • Biology and medicine are stochastic.

Statistical Causality
• Observational studies (like counting cancer cases among smokers and among non-smokers and then comparing the two) can give hints, but can never establish cause and effect.
  • Hypothesis generation.
• The gold standard for causation here is the randomized experiment.
• One limitation of experiments is that they do a good job of testing for the presence of some causal effect but do less well at estimating the size of that effect in a population of interest.
  • Subject selection may lack generalizability.
[Figure: diagram labeled Med, Exp, Outcome]

Research Designs
In clinical research

Essentials of Research Design
• Basic research
• Clinical research (often experimental)
• Epidemiological research (often observational, know denominator)
• Health services research
Limited to human research (in vivo)

What are our research (and clinical) concerns?
• Exposure
  • Good or bad: chemical, biological, psychological, educational, etc.
• Outcome
  • Good or bad: disease, cure, improved attitude, longer life, etc.
• We generally know one and want to measure the other.
• Concerns are that we measure both accurately and understand what population is represented.

Classification Schemes
• Descriptive vs. Analytical
• Experimental vs. Observational
• Time Referenced
  • Prospective vs. Cross-sectional vs. Retrospective

Describe or Analyze?
• Descriptive
  • Simply describe what was seen (common in surveys), e.g., the prevalence of various conditions.
  • PREVALENCE: the proportion of the population who exhibit the condition of interest.
• Analytic
  • Attempt to determine the associations between disease and possible risk factors/determinants and to quantify risk (common for experimental designs and the search for causality).

Experiment or Observe?
• Experimentation is defined by the degree of control or manipulation the investigator has over the study conditions.
• In a non-experimental (observational) design the investigator has less control over the study conditions.
• The consequences of study design are in the limitations put upon the interpretation of the results of the study.

Time
[Figure: timeline. Retrospective: case-control. Cross-sectional: a single point in time. Prospective: all experimental designs and cohort (observational) studies.]

Classification according to CONTROL / INTERVENTION
• Experimental Designs (Classic Design = RCT)
  • Prospective
  • Investigator alters the conditions under study
  • There is a true control group
  • Randomization MUST occur
• Observational
  • May be prospective/retrospective/cross-sectional
  • No control
  • No intervention

Issues of concern
1. Population
2. Sample size
3. Control group
4. Placebo
5. Control of Operational Procedures
6. Validity and Reliability of Measures
7. Duration
8. Statistical Analysis

1. Population (Relevance)
• When you read a study you must ask:
  • Is the population representative of something I care about?
  • Is it appropriate to answer the question?

How do people get into a “study”?
• They volunteer.
• Often they are in the right place at the right time.
• They have the right disease (severity) or exposure.
• Often, clinic-based studies generalize very poorly to larger populations.

Why people don't get into a study
• Too sick or not sick enough
• Wrong gender, race, etc.
• Don't live in the right place.
• Don't know about the study.

Where do research “subjects” come from?
Generalizability of results narrows at each stage:
1. Population of interest
   Barriers: lack of knowledge, referral issues, fear, transportation
2. Present for study
   Barriers: wrong disease severity, demographic issues
3. Eligible
   Barriers: fear, transportation, not willing to be “randomized”
4. Consent/Enroll
   Barriers: do not adhere to protocol, lost to follow-up (move, die)
5. Complete study and can be found for follow-up (in community)

Is the study relevant and valid?
• External validity
  • Do the study subjects represent a definable population of interest, i.e., “your patients”?
  • Hence, is it relevant?
• Internal validity
  • Is the study well designed and analyzed?
  • Hence, is it valid?

2. Sample Size (did you look at enough people…)
• There must be enough people in the study to ensure that the conclusions are valid.
• The likelihood that a finding will be spurious or incorrect decreases as you increase the number of individuals in the study.
• POWER: the ability of a test to detect a significant difference when one exists. Be particularly attentive to negative studies.
  • Function of effect size, variance, sample size

3. Control Group? (it worked! … compared to what?)
• If we are to conclude that an intervention has an effect, then we must be sure that the groups with and without the intervention were similar before the study began and remained so, except for the intervention.
• If not, bias can result in spurious conclusions.

4. Placebo? (I feel much better... what was that?)
• A placebo is a material, formulation, or intervention that is similar to the test product, but without the active ingredient.
• There is a well documented placebo effect in many situations.
  • Up to 70% in some studies.

5. Control of operational procedures (What exactly did you do, doctor?)
• When reading a study for your own use, it is important that the authors explain precisely what they did. This allows the reader to generalize to his/her own situation and helps to assess relevance.

6. Reliability of measures (That was great… now do it again?)
• One of the most important areas in any study: did the effect occur and how do we know? Someone measured it. We must be able to determine that the investigator(s) measured it accurately and repeatably.
  • INTRA-RATER reliability (same cases over time)
  • INTER-RATER reliability (comparison of same cases among raters)
  • Instrumentation

7. Duration of study (over so fast?)
• Did the trial run long enough to measure the desired effect?
• Caries trials: 2-3 years
• Calculus-preventing agents: 90 days
• Orthodontic outcomes (20 years?)
• Implants (5 years)

8. Statistical Analysis (So, did I find anything “significant?”)
• Were they appropriate to the design, quality of data, and intent of the investigators?
• Statistical analysis is based on the type of data (nominal, ordinal, ratio).
• Type of question being asked
  • Summarize
  • Difference between groups
  • Effect size or risk

Threats to Validity of a Study (Nice result, but what about…)
• Bias: Any systematic error in a study which results in an incorrect estimate of the association between disease and exposure.
• Confounding: results when there is a mixing of the effect of the exposure and disease with a “third factor”.
• Chance: The exposure-disease relationship is spurious as the result of random variation in sampling.

Types of Bias
• Selection
  • Non-representative sample
  • Non-comparable case/control groups
  • Loss to follow-up
  • Differential survival
• Observation (Misclassification error)
  • Disease Classification
  • Exposure Classification
  • Instrumentation

Confounding
• Definition: the bias in the (crude) disease-exposure estimate that can result when the exposure-disease relationship is mixed up with the effect of “extraneous variables”.
• Confounding affects our understanding of the “true” disease-exposure relationship.
• The determination is “data-based”.
• Two methods
  • Stratification
  • Multivariate analysis

Chance
• That's what we have statistics for: to quantify the chance.
• Type 1 (alpha) error (p-value).

Research Designs
[Figure: decision tree for choosing a design]
• Alter the conditions under study?
  • No: Observational studies
    • Do we know disease status of patients before study?
      • Yes: Case-control study
      • No: Will observations be made at more than one time?
        • Yes: Cohort study
        • No: Cross-sectional study
  • Yes: Experimental studies
    • Is there to be a control group?
      • Yes: True experiment
      • No: Quasi-experiment

Observational Designs
Cross Sectional
Case Control (retrospective)
Cohort (prospective)

Cross Sectional Study
• Measure, Classify, Compare
• Used for questionnaires, surveys, prevalence estimates, to generate hypotheses.
• Everything occurs “at once”.

Cross Sectional Design
1. Select population of interest.
2. Select sample.
3. Assess the sample for both disease (outcome) status and risk factor (exposure) status.
[Figure: Population of Interest, then Study Sample, split into Disease Positive (RF+, RF-) and Disease Negative (RF+, RF-)]
Analyze using correlational statistics, but causation is not “provable” due to lack of temporal association.

Cross-Sectional Design
Advantages:
1. Quick and low cost
2. Evaluate a large number of variables
3. Enroll a large number of subjects
Disadvantages:
1. Subject selection may reflect selection bias (volunteers, hospital patients)
2. It is difficult to identify a cause and effect relationship.
Common Uses:
• Questionnaires and Surveys
• Prevalence studies
• Hypothesis Development

Case Control
• Select cases and controls
• Retrospective assessment of risk factors
• Quantify exposure. Since there is no denominator, only relative rates.

Case-Control Design
1. Select group of subjects WITH disease/outcome of interest = CASES
2. Select group of subjects WITHOUT disease/outcome = CONTROLS
3. Measure (retrospectively) risk factors of interest.
4. Analyze using strength of association measures.
[Figure: Cases split into RF+ and RF-; Controls split into RF+ and RF-]
Selection of controls is crucial. Case selection also must be carefully considered.
Common Use:
• Rare Disease (e.g., birth defects)
• Long Latency (e.g., cancer)

Case-Control Design
Advantages:
1. not dependent on natural frequency of disease (thus used to study rare diseases)
2. well suited to study diseases with long latency
3. requires comparatively few cases (2:1 or 3:1 matching)
4. not dependent on previously established cohort
5. allows study of multiple potential causes of disease
6. relatively low cost and quick
7. ethical: disease has already occurred
Disadvantages:
1. case selection may be problematic
2. controls may not be representative of the same population as cases in terms of disease risk or confounders
3. investigators may be biased when they know the disease status of subjects
4. subjects may bias answers (recall) due to disease status
5. factors which are used to match are removed from analysis
6. incidence, prevalence, RR and AR can't be calculated since no "population at risk" denominator is available

Cohort Design
• Select two or more groups (cohorts) that are free of disease but differ on their exposure status.
• May start with one heterogeneous cohort.
• Cohorts have a “denominator” which allows the calculation of true rates.
• Useful when “exposure” varies over time.

Cohort Study Design
1. Select population of interest.
2. Recruit sample WITHOUT disease(s) of interest and measure risk factors.
3. Recall cohort periodically and remeasure risk factors and disease status.
[Figure: Population of Interest, then Disease Free Study Sample (baseline exam), then Visit 2, Visit 3, ... Visit n along a time axis]
Prospective, Observational Design. Uses:
• Determining/quantifying risk factors
• Developing new etiological theory
• Establishing causality

Cohort Design
Advantages:
1. allows risk to be expressed as incidence
2. certain biases are reduced: exposure status, disease status
3. subject characteristics can be related to more than one outcome
Disadvantages:
1. inefficient for study of rare disease
2. assessment of relationships limited to those defined at beginning of study
3. selection bias not controlled
4. loss to follow-up common
5. subjects may change in regards to characteristics (i.e., exposure status)
6. bias may be present if the characteristic studied influences surveillance and if surveillance influences detection of outcome (Berkson's fallacy)
7. expensive and time consuming

Experimental Designs
Clinical Trials (RCTs)
Field Trials

Clinical Trials
• Prospective controlled experiment on human subjects to assess an intervention for a specific disease.
• Asks an important research question
• Clinical event or outcome
• Done in clinical or medical setting
• Evaluates one or more interventions compared with “standard treatment”
• Informed consent and DSMB required

Phases of Clinical Trials
• Phase I: dose finding
• Phase II: efficacy at fixed dose
• Phase III: comparing treatment (RCT)
• Phase IV: late/uncommon effects

Uses of Clinical Trials (experimental studies)
• Test new drug therapy
• Test new surgical interventions
• Test educational/programmatic interventions

Randomized Clinical Trial Design
1. Recruit individuals WITH disease.
2. Randomize into treatment arms.
3. Follow up to assess outcomes.
[Figure: Study Sample with disease, then Randomization into Standard Treatment and New Treatment, each followed to Outcomes]
• Ethical only to the degree that differences in treatment are unknown at time of study initiation (equipoise). Requires DSMB.
• Randomization is essential, and along with strict control of experimental conditions allows for minimal bias.
• Excellent internal validity (but possibly low external validity)

Experimental Design
Advantages:
1. investigator directly controls assignment to study groups
2. investigator directly controls exposure to agent
3. random assignment measures can control extraneous factors
4. blinding of evaluators may be possible
Disadvantages:
1. not immune to problems encountered with other designs (non-compliance, incomplete follow-up, biased observation)
2. may have low external validity
3. may not be feasible for studies of disease etiology (ethical considerations, rare disease)
4. may not be feasible if effective disease prevention exists (can't withhold treatment)
5. can be very expensive

Efficacy vs. Effectiveness
• Efficacy is the potential to provide a clinical benefit.
  • Measured in CTs
• Effectiveness is the benefit provided in routine “real world” use.
  • Measured in surveillance systems (registries), after-market incident reports, etc.

Hierarchy of Research Designs
• Experimental designs
• Cohort studies
• Case-control designs
• Human trial without controls
• Cross-sectional designs
• Descriptive studies
• Case reports
• Personal opinion
Based on control of bias and confounding and ability to make causal arguments

RCT's Strengths
• Minimally biased design
  • Randomization
  • Control of extraneous variables
  • Prospective (causality established)
  • Design issues determined prior to initiation of study.

Problems with (Dental) RCTs
• Difficult to randomize
• Ethical Concerns
• Principle of Equipoise involves the ethical treatment of human subjects in experimental conditions.
A subject should only be submitted to a randomized, controlled design if there is substantial uncertainty about which of the treatments would benefit the subject most.
• RCTs should not be done when patient preference can be elicited (ortho vs. surgical tx)
• Blinding issues (Hawthorne effect)
• Expensive (and often lack sponsor)

What are the current “Issues” in Dental clinical research?
• Diagnosis
• Treatment approach
• Materials
• Long term issues
• Health Services Research
  • Cost Effectiveness
• Harm

Negative Study
• No association
• Sloppy design (poor methods or analysis)
• Bias
• Chance
  • Statistics measures “chance” (expressed as p-value)

Systematic Reviews
Putting it all together

Scientific Truth relies on
• the weight of evidence over many studies that creates confidence in results.
• If it's not published… it didn't happen.

Journalistic Reviews… the “old way”
• Remember the essays you used to write as a student? You would browse through the indexes of books and journals until you came across a paragraph that looked relevant, and copied it out. If anything you found did not fit in with the theory you were proposing, you left it out.
• Or the way it's done by senior academics: take a simmering topic, extract the juice of an argument, add the essence of one filing cabinet, sprinkle liberally with your own publications and sift out the work of noted detractors or adversaries… or

Systematic Reviews… the new way
• In contrast to the old way, systematic reviews use explicit and rigorous methods to identify, critically appraise, and synthesize relevant studies.
• Qualitative: when the results of studies are not statistically combined.
• Quantitative or Meta-analysis: systematic review that uses statistical methods to combine the results of two or more studies

Maturation of Dentistry
Age of Empiricism:
• Dental practice based on observation and experience in ignorance of scientific findings
• All knowledge maintained personally
• Apprentice Model of Education
• Absence of Research
Age of Evidence:
• Dental practice based on high quality evidence of effectiveness
• Textbooks and Journals
• Internet
• Scientific Literature and Knowledge Synthesis-based Education
• RCTs
• Systematic Reviews and Meta-Analysis

Evolution of the Dental Knowledge Base
• store of specialized information
  • diseases
  • treatment methods
  • treatment outcomes
• basis of professional decision-making
• has evolved over time with respect to:
  • creation
  • synthesis
  • dissemination
Bader JEBDP 2004

What is a Systematic Review
• A “systematic review” comprehensively locates, evaluates and synthesizes all the available literature on a given topic using a strict scientific design which must itself be reported in the review.
• Aim of SR is to be:
  • Systematic (e.g. in its identification of literature)
  • Explicit (e.g. in its statement of objectives, materials and methods)
  • Reproducible (e.g. in its methodology and conclusions)
• Goal: To efficiently integrate valid information and provide a basis for rational decision making.

Features of a Systematic Review
• Explicit criteria (reproducible)
• Efficient
  • It is impractical for even an expert to read all the literature published in his or her field.
  • SRs are a succinct but robust format for practitioners who need to keep up to date.
• Well focused (PICO)
• Thorough (unpublished information may be included)
• Provides a context for studies and creates a sense of the “weight of evidence”
• Secondary data analysis

Why Systematic Reviews
• Annually 3 million articles are published in biomedical journals, and the volume of biomedical literature doubles in less than 20 months.
• You would need to read a dozen or more articles per day (365 days/yr.) to stay up to date.
• Not all articles are valid or useful for patient care.
• SRs provide a summary and context of the current state of knowledge (that is lacking if you only read a few articles in an area).

Quality of Evidence Pyramid (strongest evidence first)
• Meta-Analysis
• Systematic Review
• Randomized Controlled Trial
• Cohort studies
• Case Control studies
• Case Series/Case Reports
• Basic Research and Animal research
(A bracket beside the pyramid is labeled “Guidelines”.)

Questions come in two varieties:
• BACKGROUND QUESTIONS
  • Textbooks/Basic Sci Faculty
• FOREGROUND QUESTIONS
  • Clinical Faculty
  • Journal articles
  • Guidelines
[Figure: mix of background and foreground questions, from dental school to professional practice]

Background Questions
• Are general questions about conditions, illnesses, syndromes and patterns of disease, and pathophysiology.
  • “What is the typical clinical presentation of primary oral herpes?” or
  • “Which teeth are most commonly affected during ECC?”
• Novices ask this type of question in a particular knowledge area, in order to gain a general understanding of clinical issues.
• Best resources include textbooks and faculty.

Foreground Questions
• Foreground questions are about issues of patient care and clinical decision-making.
• Best resources:
  • guidelines, systematic reviews
• Remember: Generally, it's not what you don't know that causes problems, it's what you “know” that just ain't so…

Steps in Developing Systematic Reviews

Step 1: Identify an area of Uncertainty
• Diagnosis
  • How well does DIAGNOdent diagnose interproximal caries?
• Therapy
  • Should asymptomatic impacted third molars be extracted?
• Prognosis
  • How long will an implant last when used to replace a single anterior tooth lost due to trauma? Is it different if the tooth loss is due to perio?
• Harm or Causality
  • Do posterior inlays result in greater risk of tooth sensitivity compared with other posterior restorations?
Step 2: Frame it as an Answerable Question (PICO Format)
• P: patients or populations
• I: interventions
• C: comparison group(s) or "gold standard"
• O: outcome(s) of interest

P.I.C.O.
Patient or Problem
  Tips for building questions: Starting with your patient, ask “How would I describe a group of patients similar to mine?” Balance precision with brevity.
  Example: In young adults, will asymptomatic impacted third molars cause ortho relapse or lead to problems better dealt with prophylactically?
Intervention (a cause, prognostic factor, treatment, etc.)
  Tips for building questions: Ask “Which main intervention am I considering?” Be specific.
  Example: Surgical extraction
Comparison Intervention (if necessary)
  Tips for building questions: Ask “What is the main alternative to compare with the intervention?” Be specific.
  Example: Watchful waiting
Outcomes
  Tips for building questions: Ask “What can I hope to accomplish?” or “What could this exposure really affect?” Be specific.
  Example: Reduction of ortho relapse, prevention of oral infections, reduction in surgical complications at an older age.

Step 3: Search for the Evidence
• Philosophy: Find all literature that is relevant and valid
• Eliminate studies with poor design
• Reduce potential for bias
  • Effect size (design effects)
  • Publication (no negative studies)
  • Author (COI)
  • Poor search strategies

Step 3: Search for the Evidence (continued)
• Establish inclusion and exclusion criteria
  • Type of study (RCTs, Cohort, Case-Control, Cross sectional)
  • Type of exposure and outcomes
    • Case Definition
    • Exposure Definition
    • Are Outcomes Important (to whom?)

Step 3: Search for the Evidence (continued)
• Develop Search Strategy
  • Electronic Databases
    • MEDLINE, EMBASE, Cochrane Library, etc.
    • Search filters (are they tested and sensitive/specific?)
  • Hand searching
  • Unpublished studies
    • Gray literature (conference proceedings, dissertations)
  • Reference lists
  • Personal communication

Step 4: Extract Data
• Apply inclusion and exclusion criteria
• Two stage review (title/abstract; full article)
• Two reviewers
• Rules for resolving disagreements
• Use predetermined forms
• Log reason for exclusion

Step 5: Analyze and Present Results
• Evidence Table
  • Research design
  • Subjects
  • Methods
  • Results
• Qualitative Summary
• Quantitative Summary
  • Heterogeneity
  • Meta-analysis
  • Sensitivity analysis
• Methodological Quality
  • allocation concealment
  • blinding
  • statistical analysis
  • funding/sponsorship
  • population (specificity)
  • intervention (specificity)
  • outcomes (specificity)

Step 6: Interpret and Review Results
• Have all the main outcomes been considered?
• Have data been presented about absolute change as a result of the intervention?
• Have any factors that may limit application been considered?
• Are the results consistent?
• Don't confuse “no evidence of an effect” with “evidence of no effect”.

Forest Plots
A quick look at meta-analysis

There's a label to tell you what the comparison is and what the outcome of interest is.

At the bottom there's a horizontal line. This is the scale measuring the treatment effect. Here the outcome is death, and towards the left the scale is less than one, meaning the treatment has made death less likely. Take care to read what the labels say: things to the left do not always mean the treatment is better than the control.
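The point about reading the labels can be made concrete with a small sketch. It assumes the scale is a risk ratio (the slides do not name the statistic) and uses made-up counts; the function names `risk_ratio` and `favoured_arm` are hypothetical, chosen only for this illustration.

```python
def risk_ratio(events_treated, n_treated, events_control, n_control):
    """Risk of the outcome in the treated group divided by risk in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)


def favoured_arm(rr, outcome_is_bad):
    """Say which arm a ratio favours; the answer flips with the outcome's meaning."""
    if rr == 1:
        return "neither"
    treatment_has_fewer_events = rr < 1
    # Fewer events is good news only when the event itself is undesirable.
    return "treatment" if treatment_has_fewer_events == outcome_is_bad else "control"


# Hypothetical counts, not taken from any study in the slides.
rr_death = risk_ratio(10, 100, 20, 100)   # outcome: death (bad)
rr_cure = risk_ratio(60, 100, 40, 100)    # outcome: cure (good)

print(f"death: RR = {rr_death:.2f}, favours {favoured_arm(rr_death, outcome_is_bad=True)}")
print(f"cure:  RR = {rr_cure:.2f}, favours {favoured_arm(rr_cure, outcome_is_bad=False)}")
```

With death as the outcome, a ratio left of one favours the treatment; with cure as the outcome, the treatment is favoured by a ratio right of one.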
The vertical line in the middle is where the treatment and control have the same effect: there is no difference between the two.

For each study there is an id. The data for each trial are divided into the experimental and control groups. Each study is also given a % weight in the pooled analysis.

The data shown in the graph are also given numerically. The label above the graph tells you what statistic has been used.
• Each study is given a blob, placed where the data measure the effect.
• The size of the blob is proportional to the % weight.
• The horizontal line is called a confidence interval and is a measure of how we think the result of this study might vary with the play of chance.
• The wider the horizontal line is, the less confident we are of the observed effect.

The pooled analysis is given a diamond shape, where the widest bit in the middle is located at the calculated best guess (point estimate), and the horizontal width is the confidence interval.
Definition of a 95% confidence interval: if a trial were repeated 100 times and an interval calculated each time, about 95 of those 100 intervals would be expected to contain the true effect.

At the end of the day… What do we really want to know?

Can we believe it?
• Bias-free search and inclusion criteria?
• Appraisal of methodology of primary studies?
• Consistent results from all primary studies?
  • If not, are the differences sensibly explained?
• Are the conclusions supported by the data?

If we believe it, does it apply to our patient?
• Is our patient (or population) so different from those in the primary studies that the results may not apply?
• Consider differences in:
  • time: many things change.
  • culture: both treatments and values of outcomes can be different.
  • stage of illness or prevalence, which can affect results.

We believe it! But… does it matter?
• Is the benefit worthwhile to our patient?
• Ask the patient about cultural values.
• Think about Relative Risk Reduction vs.
Absolute Risk to our patient.
• Potential benefit is the absolute risk avoided in our patient = Absolute Risk Reduction (ARR)!

Is it a systematic review? Does it:
• define a four-part (answerable) clinical question?
• combine Randomized Controlled Trials (RCTs)?
• describe PRE-DEFINED search methods?
• PRE-DEFINED inclusion criteria?
• PRE-DEFINED methodological exclusion criteria?

PICO Practice
Step 1: Key Clinical Question
“What is the effectiveness of semiannual fluoride varnish compared to semiannual fluoride gel in preventing dental caries in permanent teeth among caries-active adults?”
• Population: caries-active adults
• Intervention: semiannual fluoride varnish
• Comparison: semiannual fluoride gel
• Outcome: prevention of dental caries in permanent teeth
Egger et al., 2001

Source of Secondary Information
• Systematic Reviews
  • E.g., Cochrane Collaboration
• Guidelines
  • E.g., National Guideline Clearinghouse

The Cochrane Collaboration
• An international organisation that aims to help people make well-informed decisions about healthcare by preparing, maintaining and promoting the accessibility of systematic reviews of the effects of health care interventions.

Cochrane Centres
Canadian, San Francisco, Nordic, German, UK, Dutch, French, Iberoamerican, San Antonio, Italian, Chinese, New England, Brazilian, South African, Australasian

Cochrane Library

Evaluation of Diagnostic Tests

Topics
1. How do we “know” something?
   • What are the elements and structure of scientific thinking?
2. Scientific Reasoning
   • Facts, Hypotheses, Theories, Paradigms
3. Research Designs and Control of Bias
4. Clinical Epidemiology
   • Sensitivity, Specificity, Predictive Value
5. Measurement in Dentistry
6. The Research Enterprise

Diagnostic Tests
• Purpose: to increase our certainty about the cause of a patient's illness
• Common Types:
  • Physical and history findings
  • Laboratory test
  • Radiography
  • “Other” technological findings (pulp tester, etc.)

Examples of Diagnostic Tests in Dentistry
• Caries
  • Visual, radiography, DIFOTI
• Pulpal necrosis
  • Electrical, thermal
• Soft tissue lesions
  • Biopsy, dye
• Periodontitis
  • Future attachment loss, PSR
• Malocclusion
  • Index, study models, ceph

Reduction of Diagnostic Information
• Scales
• Indexes
• Cut Points
• Basic Decision: Treat or No Treatment

Outcomes in Orthodontics
• Malocclusion is not a disease
• Outcomes based on clinician assumptions of patients' needs/desires
• Many dimensions need to be measured
  • Overjet, overbite, cross bite, etc.

Measurement Issues in Orthodontics
• Index: assign numerical rating
  • Diagnostic (Angle)
  • Epidemiological Index (Summers')
  • Treatment need (HLD, Salzmann, IOTN)
  • Treatment Outcome (PAR)

Valid and Reliable
[Figure: three targets illustrating “Reliable but NOT valid”, “NOT reliable or valid”, and “Reliable and valid”]
A measure can't be valid unless it is reliable.

What is validity in an Ortho Index?
• Measures dimensions of occlusion that are considered clinically important. These could be based upon:
  • Expert opinion
  • Clinical consequences (disease) or change
  • Patient values and desires

How to assess reliability
• Intra-rater
  • Have the same person rate the “case” more than once.
• Inter-rater
  • Have different people rate the “case”.
• Expressed as measures of rater agreement
  • Nominal (Kappa)
  • Categorical (Percent agreement, weighted Kappa)
  • Continuous (Correlation, ICC)

Test Quality
• Diagnosis is an imperfect process: all tests have some inherent inaccuracy.
• The “correct” diagnosis thus becomes a probability.
• Understanding the mathematical performance of a test improves the clinician's decision making process.
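The rater-agreement measures listed under “How to assess reliability” can be sketched for the simplest case: two raters assigning nominal categories to the same cases. This is a minimal illustration of percent agreement and Cohen's kappa (one common form of kappa; the slides do not say which variant is meant), with hypothetical ratings invented for the example.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of cases on which the two raters gave the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    # Chance agreement if each rater assigned categories independently
    # at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical treat / no-treat calls by two raters on the same ten cases.
rater_1 = ["treat"] * 5 + ["no"] * 5
rater_2 = ["treat"] * 4 + ["no"] * 5 + ["treat"]

print(f"percent agreement = {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa     = {cohens_kappa(rater_1, rater_2):.2f}")
```

Here the raters agree on 8 of 10 cases, but half of that agreement would be expected by chance alone, so kappa is lower than the raw percent agreement. The same functions apply to intra-rater reliability if the two lists are one rater's calls on two occasions.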
Measures of the Quality of a Diagnostic Test
• Sensitivity
• Specificity
• Accuracy
• Predictive Value (positive and negative)
• The higher these numbers - the better the test.

The “Gold Standard”
• The definitive diagnostic technique
• Often expensive, elaborate, or difficult to perform.
• We are always looking for faster, cheaper, better ways to diagnose disease (and to determine treatment).

Sensitivity
• The proportion of people with the disease (by the Gold Standard) who have a positive test result.
• Relates Gold Standard to New Test.
• A sensitive test rarely misses people with disease.
• Sensitive tests should be selected when there is an important penalty for missing disease (e.g., cancer diagnosis)

Specificity
• The proportion of people without the disease who have a negative test result.
• A specific test will rarely misclassify people without disease as diseased.
• Specific tests are used to “rule in” a diagnosis that has been suggested by other tests.

Accuracy of a Test
• The overall ability of a test to correctly classify a patient.
• (Sensitivity + Specificity) / 2, which equals the overall proportion correctly classified when prevalence is 50%

Predictive Value
• Positive predictive value is the probability of disease in a patient with an abnormal test.
• Negative predictive value is the probability of no disease in a patient when the test result is normal.

A new diagnostic test for periodontal disease

“PERIOCHECK®”
• A new diagnostic assay that the company claims “predicts” future periodontal attachment loss (LOA).
• Requires a “blood test” of 1 ml of blood placed into the “Periocheck” machine.
• Values of the test range from -5 to +5
• “Gold Standard” is actual attachment loss (measured prospectively).
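The quality measures defined above all come from the four cells of a 2x2 table of test result against gold standard. The sketch below is a minimal illustration under that assumption; the class name `TwoByTwo` and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying a test result against the gold standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        # Among people with disease, the share who test positive.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Among people without disease, the share who test negative.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all patients classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def ppv(self) -> float:
        # Among people who test positive, the share who have disease.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Among people who test negative, the share who do not have disease.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts (not taken from any study in these slides).
table = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
print(f"sensitivity={table.sensitivity:.3f} specificity={table.specificity:.3f}")
print(f"accuracy={table.accuracy:.3f} ppv={table.ppv:.3f} npv={table.npv:.3f}")
```

Sensitivity and specificity are computed down the gold-standard columns, while the predictive values are computed across the test-result rows, which is why only the latter shift when the mix of diseased and non-diseased patients changes.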
A Validation Study for PERIOCHECK
• 300 subjects recruited into study
  • 2 edentulous exclusions
  • 8 medical complication exclusions
  • 4 refused upon consent
• 45% African Am
• Mean age 49 ± 15y
• Upon 2 year follow up
  • 48 lost to follow up
• Final Study Sample
  • 238 (79%)
  • 125 had LOA (52.5%)
  • 113 no LOA (47.5%)

Distribution of Baseline PERIOCHECK values by future LOA
[Figure: overlapping frequency distributions (0 to 35) of Periocheck values (-4 to 4) for people who do NOT develop LOA and people who DO develop LOA, split at the diagnostic cut point ≥ 0 into TN, FP, FN and TP regions.]
• TN - True Negatives: 91
• TP - True Positives: 109
• FN - False Negatives: 16
• FP - False Positives: 22

                          Gold Standard (eventual LOA)
Periocheck                Disease Present   Disease Absent   Total
Positive Test (≥ 0)       109 (TP)          22 (FP)          131
Negative Test (< 0)       16 (FN)           91 (TN)          107
Total                     125               113              238

Prevalence = 125/238 = 52.5%

Quality of Diagnostic Test
• Sensitivity - the proportion of people with disease who have a positive test.
• Specificity - the proportion of people without disease who have a negative test.

From the table above (cut point ≥ 0):
Sensitivity = 109/125 = 87.2%
Specificity = 91/113 = 80.5%

Performance related to “Cut Point”
• The “cut point” is arbitrary and may be changed.
• It is a decision point that a clinician may wish to set for him/herself.
• Sensitivity and Specificity are inversely related to one another and vary with the cut point

Cut point ≥ 0 (TP 109, FP 22, FN 16, TN 91)
Sensitivity = 109/125 = 87.2%
Specificity = 91/113 = 80.5%
Prevalence = 125/238 = 52.5%

Cut point ≥ -2
                          Gold Standard (eventual LOA)
Periocheck                Disease Present   Disease Absent   Total
Positive Test (≥ -2)      124 (TP)          35 (FP)          159
Negative Test (< -2)      1 (FN)            78 (TN)          79
Total                     125               113              238

Sensitivity = 124/125 = 99.2%
Specificity = 78/113 = 69.0%
Prevalence = 125/238 = 52.5%

What we have so far
• At the cut point studied (i.e., 0):
  • For every 100 patients without disease we will correctly classify 80 of them. (Specificity)
  • For every 100 patients with disease we will correctly classify 87 of them. (Sensitivity)

Relationship of Sensitivity/Specificity to Cut Point
Cut Point   Sensitivity (%)   Specificity (%)
-3          100               34
0           95                71
0.5         90                82
1           83                91
3           55                99

ROC Curves
• Relates changes in sensitivity and specificity to changes in cut point.
• Provides overall utility of test
• Suggests “optimal” cut point

ROC Curve
[Figure: sensitivity (0 to 100) plotted against 1 - specificity (0 to 100), with points along the curve labelled by cut point.]
[Figure: the same axes showing a curve with area = .91 against the diagonal with area = .5.]
[Figure: the same axes with the optimal cut point marked.]

What we actually get clinically
• People with a “positive” test
  • And we want to know how many really DO have disease
  • Positive Predictive Value - the proportion of people with a positive test who have disease.
• People with a “negative” test
  • And we want to know how many really DO NOT have disease.
  • Negative Predictive Value - the proportion of people with a negative test who do not have disease.
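The predictive values defined above can also be written in terms of sensitivity, specificity and prevalence (Bayes' theorem). The sketch below is an illustration under that formulation; the function name `predictive_values` is an assumption for this example, and the inputs are the PERIOCHECK counts at cut point ≥ 0 (TP 109, FP 22, FN 16, TN 91).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied in a population with the given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    # Expected share of the whole population falling in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Inputs derived from the PERIOCHECK validation counts at cut point >= 0.
sens = 109 / 125
spec = 91 / 113
prev = 125 / 238

ppv, npv = predictive_values(sens, spec, prev)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Calling the function again with the same sensitivity and specificity but a different `prevalence` shows how the predictive values move with the population tested while the test's own properties stay fixed.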
                          Gold Standard (eventual LOA)
Periocheck                Disease Present   Disease Absent   Total
Positive Test (≥ 0)       109 (TP)          22 (FP)          131
Negative Test (< 0)       16 (FN)           91 (TN)          107
Total                     125               113              238

Positive Pred = 109/131 = 83.2%
Negative Pred = 91/107 = 85.0%
Prevalence = 125/238 = 52.5%

Test performance and prevalence
• Sensitivity and Specificity are stable properties
• PPV and NPV are frequency (Prevalence) dependent properties

The same test in a lower-prevalence population
                          Gold Standard (eventual LOA)
Periocheck                Disease Present   Disease Absent   Total
Positive Test (≥ 0)       109 (TP)          194 (FP)         303
Negative Test (< 0)       16 (FN)           806 (TN)         822
Total                     125               1000             1125

Positive Pred = 109/303 = 36.0%
Negative Pred = 806/822 = 98.1%
Prevalence = 125/1125 = 11%

Remember
• Sensitivity and Specificity are stable with changing prevalence, but will vary inversely with “cut point”.
• PPV/NPV vary by the prevalence of the population in which the test is administered.
• Best to use when uncertainty is high
  • Prevalence close to 50%

HIV Example (ELISA)
When used in premarital screenings
• Sensitivity - 98
• Specificity - 99
• Prevalence - 250/100,000
• PPV = 20%
• 2 million marriages / year in US
• HIV cases = 5,000
• For every 1000 correctly diagnosed, there will be 4000 false positives.

Research Ethics
• Risk/Benefit Ratio
• Subject safety
• Written Consent.
• Informed Consent
• IRB Approved.
• Full disclosure of Risks
• Privacy and Confidentiality
  • The investigator must consider how the information is protected from unauthorized observation, and how participants are to be notified of any unforeseen findings from the research that they may or may not want to know.
• Adverse events
  • The investigator must consider how adverse events will be handled; who will provide care for a participant injured in a study and who will pay for that care are important considerations.
• Equipoise
  • The investigator should be in a state of "equipoise," that is, if a new intervention is being tested against the currently accepted treatment, the investigator should be genuinely uncertain which approach is superior.

Ethical Issues in Human Research
• Autonomy
  • The obligation on the part of the investigator to respect each participant as a person capable of making an informed decision regarding participation in the research study. The investigator must ensure that the participant has received a full disclosure of the nature of the study, the risks, benefits and alternatives, with an extended opportunity to ask questions.
• Beneficence
  • The obligation on the part of the investigator to attempt to maximize benefits for the individual participant and/or society, while minimizing risk of harm to the individual. An honest and thorough risk/benefit calculation must be performed.
• Justice
  • Demands equitable selection of participants, i.e., avoiding participant populations that may be unfairly coerced into participating, such as prisoners and institutionalized children.
• Tuskegee: Study of syphilis in Blacks, without telling them of their participation.
  • Deception and lack of informed consent used.
  • Individuals followed for 40 years without treatment.
Components of Ethical, Valid Consent
• Disclosure
  • The potential participant must be informed as fully as possible of the nature and purpose of the research, the procedures to be used, the expected benefits to the participant and/or society, the potential of reasonably foreseeable risks, stresses, and discomforts, and alternatives to participating in the research.
• Understanding
  • The participant must understand what has been explained and must be given the opportunity to ask questions and have them answered by one of the investigators.
• Voluntariness
  • The participant's consent to participate in the research must be voluntary, free of any coercion or promises of benefits unlikely to result from participation.
• Competence
  • The participant must be competent to give consent. If the participant is not competent due to mental status, disease, or emergency, a designated surrogate may provide consent if it is in the participant's best interest to participate.
• Consent
  • The potential human subject must authorize his/her participation in the research study, preferably in writing, although at times an oral consent or assent may be more appropriate.

The End
Questions?