Desiderata for Computable Representations of Electronic Health Records-Driven Phenotype Algorithms

Supplemental Document (Appendix)

Part 1: Type 2 Diabetes Mellitus Phenotype Algorithm

This section provides a high-level overview of the eMERGE type 2 diabetes mellitus (T2DM) algorithm [1] for extracting T2DM cases from the EHR (see https://phekb.org/phenotype/type-2-diabetes-mellitus for further details). The case selection algorithm requires specific patient-level data elements to be extracted from the EHR, including diagnoses, lab results, medication orders, and physician encounter dates. In particular, the following data elements (defined with standardized concept lists, per Desideratum 7) are required:

– Counts of type 1 diabetes mellitus (T1DM) ICD-9 code assignment dates, by diagnostic source
– Counts of T2DM ICD-9 code assignment dates, by diagnostic source
– T1DM medication (i.e., insulin and Symlin) order or prescription dates; at least the earliest date of Rx
– T2DM medication (metformin and other biguanides, sulfonylureas, etc.) order or prescription dates; at least the earliest date of Rx
– Fasting blood glucose lab values; at least the maximum value
– Random blood glucose lab values; at least the maximum value
– HbA1c lab values; at least the maximum value

For this algorithm, the following definitions and abbreviations apply:

Abnormal lab: a lab value that meets any one of the following thresholds:
    – Random glucose > 200 mg/dl
    – Fasting glucose ≥ 125 mg/dl
    – Hemoglobin A1c ≥ 6.5%

Physician-entered diagnosis: a diagnosis code derived from encounter or problem list sources only (diagnoses entered into billing systems are excluded).

A flowchart expressing the logic of the T2DM case selection algorithm is shown in Figure 1.

Figure 1 (same as main text): Phenotype algorithm for identifying type 2 diabetes mellitus (T2DM) from electronic medical records (EMR, or EHR). T1DM: type 1 diabetes mellitus; Dx: diagnoses, recorded using International Classification of Diseases, 9th Revision (ICD-9) codes; med: medication; physcn: physicians; Rx: prescriptions. (From PheKB.org)

The T2DM algorithm has been implemented as a workflow in the Konstanz Information Miner (KNIME) data analytics platform. The workflow takes as input a comma-separated values (CSV) file in which each row holds the set of patient-level variables for one patient (per Desideratum 2). Set operations (Desideratum 4) are mostly performed at the relational level when generating patient-level values, some of which are counts and dates of first appearance (aggregative operations, Desideratum 5). Figure S2 shows the top-level workflow that implements the algorithm. Figure S3 shows the input data in tabular format: each row contains the data for one patient, and each column is a data item with a specific data type. Figure S4 shows the configuration window for the final Rule Engine node in the workflow. This node implements the logic depicted by the flowchart in Figure 1 (Desideratum 5). One of the rule paths for cases requires that the first oral hypoglycemic medication (the medications preferred for T2DM) start before the first insulin treatment (the medication preferred for T1DM), which is a temporal comparison in the sense of Desideratum 6.

Figure S2: T2DM implementation as a KNIME workflow.

Figure S3: Table of deidentified* input data for the KNIME workflow. Each row contains a set of patient-level variables. (*Patient IDs are removed, and the dates are shifted by a different random offset for each patient to protect patient privacy.)

Figure S4: Configuration window for the final Rule Engine node in Figure S2. This node tests the values of the variables and returns either “case” or “unknown” for each patient. This logic implements the paths through the flowchart in Figure 1.
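The two rule ingredients that Part 1 states explicitly, the abnormal-lab definition and the medication-order comparison, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the KNIME Rule Engine configuration itself: the PatientRow class and all of its field names are hypothetical stand-ins for the CSV columns, and the remaining paths of the Figure 1 flowchart are deliberately not reproduced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientRow:
    """One row of patient-level input. All field names are hypothetical."""
    max_random_glucose: Optional[float] = None   # mg/dl, maximum observed
    max_fasting_glucose: Optional[float] = None  # mg/dl, maximum observed
    max_hba1c: Optional[float] = None            # percent, maximum observed
    first_t1dm_rx: Optional[date] = None         # earliest T1DM medication date
    first_t2dm_rx: Optional[date] = None         # earliest T2DM medication date


def has_abnormal_lab(p: PatientRow) -> bool:
    """True if any maximum lab value meets a threshold from the definition above."""
    return (
        (p.max_random_glucose is not None and p.max_random_glucose > 200)
        or (p.max_fasting_glucose is not None and p.max_fasting_glucose >= 125)
        or (p.max_hba1c is not None and p.max_hba1c >= 6.5)
    )


def t2dm_rx_precedes_t1dm_rx(p: PatientRow) -> bool:
    """True only if both earliest dates exist and the T2DM one is strictly earlier."""
    if p.first_t1dm_rx is None or p.first_t2dm_rx is None:
        return False
    return p.first_t2dm_rx < p.first_t1dm_rx
```

Only maximum lab values and earliest medication dates are needed here, which is why the data element list asks for at least those aggregates: a threshold is crossed by some value exactly when it is crossed by the maximum, and the ordering of first prescriptions depends only on the two earliest dates.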
Part 2: Examples of complex counting rules and scoring rules (for Desideratum 5, aggregative operations)

Modified Duke Criteria (a counting rule) [2]:

Table A: Definitions of criteria

Major criteria

1. Positive blood cultures for IE
    – Typical microorganism for infective endocarditis from two separate blood cultures:
        Viridans streptococci
        Streptococcus gallolyticus (formerly S. bovis), including nutritional variant strains (Granulicatella spp and Abiotrophia defectiva)
        HACEK group: Haemophilus spp, Aggregatibacter (formerly Actinobacillus) actinomycetemcomitans, Cardiobacterium hominis, Eikenella spp, and Kingella kingae
        Staphylococcus aureus
        Community-acquired enterococci, in the absence of a primary focus;
      OR
    – Persistently positive blood culture, defined as recovery of a microorganism consistent with IE from:
        Blood cultures drawn more than 12 hours apart, OR
        All of three or a majority of four or more separate blood cultures, with first and last drawn at least one hour apart
    – Single positive blood culture for Coxiella burnetii or anti-phase I IgG antibody titer >1:800

2. Evidence of endocardial involvement
    – Positive echocardiogram for IE (TEE recommended in patients with prosthetic valves, rated at least "possible IE" by clinical criteria, or complicated IE [paravalvular abscess]; TTE as first test in other patients)
    – Definition of positive echocardiogram:
        Oscillating intracardiac mass, on valve or supporting structures, or in the path of regurgitant jets, or on implanted material, in the absence of an alternative anatomic explanation, OR
        Abscess, OR
        New partial dehiscence of prosthetic valve
    – New valvular regurgitation (increase in or change in preexisting murmur not sufficient)

Minor criteria

1. Predisposition: predisposing heart condition or intravenous drug use
2. Fever: 38.0°C (100.4°F)
3. Vascular phenomena: major arterial emboli, septic pulmonary infarcts, mycotic aneurysm, intracranial hemorrhage, conjunctival hemorrhages, Janeway lesions
4. Immunologic phenomena: glomerulonephritis, Osler's nodes, Roth spots, rheumatoid factor
5. Microbiologic evidence: positive blood culture but not meeting major criterion as noted previously (excluding single positive cultures for coagulase-negative staphylococci and organisms that do not cause endocarditis) OR serologic evidence of active infection with organism consistent with IE
6. Echocardiographic minor criteria eliminated

IE: infective endocarditis; TEE: transesophageal echocardiography; TTE: transthoracic echocardiography.

Table B: Definitions of diagnosis

Definite IE
    Pathologic criteria
        – Microorganism: demonstrated by culture or histology in a vegetation, or in a vegetation that has embolized, or in an intracardiac abscess, OR
        – Pathologic lesions: vegetation or intracardiac abscess, confirmed by histology showing active endocarditis
    Clinical criteria (using the specific definitions listed in Table A)
        – 2 major criteria, OR
        – 1 major and 3 minor criteria, OR
        – 5 minor criteria

Possible IE
        – 1 major criterion and 1 minor criterion, OR
        – 3 minor criteria

Rejected IE
        – Firm alternate diagnosis for manifestations of endocarditis, OR
        – Resolution of manifestations of endocarditis, with antibiotic therapy for four days or less, OR
        – No pathologic evidence of infective endocarditis at surgery or autopsy after antibiotic therapy for four days or less, OR
        – Does not meet criteria for possible infective endocarditis, as above

IE: infective endocarditis.
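To make the distinction between the two rule styles in this part concrete, the following Python sketch shows the clinical criteria of Table B as a counting rule and the CHA2DS2-VASc definition that follows as a weighted sum. It is an illustration only: the function and parameter names are hypothetical, the counts of satisfied major and minor criteria are assumed to have been derived elsewhere, and the pathologic criteria and the other grounds for rejection (for example, a firm alternate diagnosis) are not modeled.

```python
def duke_clinical_category(major: int, minor: int) -> str:
    """Classify by counts of satisfied major and minor criteria (clinical criteria only)."""
    if major < 0 or minor < 0:
        raise ValueError("criterion counts must be non-negative")
    # Definite IE: 2 major, OR 1 major and 3 minor, OR 5 minor.
    if major >= 2 or (major >= 1 and minor >= 3) or minor >= 5:
        return "definite"
    # Possible IE: 1 major and 1 minor, OR 3 minor.
    if (major >= 1 and minor >= 1) or minor >= 3:
        return "possible"
    # Does not meet the criteria for possible IE.
    return "rejected"


def cha2ds2_vasc(
    age: int,
    female: bool,
    heart_failure: bool,
    hypertension: bool,
    diabetes: bool,
    stroke_tia_thromboembolism: bool,
    vascular_disease: bool,
) -> int:
    """Sum the CHA2DS2-VASc points; the two age bands are mutually exclusive."""
    score = 0
    score += 1 if heart_failure else 0
    score += 1 if hypertension else 0
    score += 1 if diabetes else 0
    score += 2 if stroke_tia_thromboembolism else 0
    score += 1 if vascular_disease else 0
    score += 1 if female else 0
    if age >= 75:
        score += 2
    elif age >= 65:
        score += 1
    return score
```

The counting rule branches on combinations of counts, whereas the scoring rule adds fixed weights and leaves interpretation of the total to a lookup table. Both are aggregations over patient-level Boolean findings, which is the capability Desideratum 5 asks a representation to support.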
CHA2DS2-VASc score for anticoagulation therapy in atrial fibrillation (a scoring rule) [3]:

CHA2DS2-VASc definition                                                                  Score
Congestive heart failure                                                                 1
Hypertension                                                                             1
Age ≥ 75 years                                                                           2
Diabetes mellitus                                                                        1
Stroke/transient ischemic attack/thromboembolism                                         2
Vascular disease (prior myocardial infarction, peripheral artery disease, aortic plaque) 1
Age 65 to 74 years                                                                       1
Sex category (i.e., female sex)                                                          1
Maximum score                                                                            9

CHA2DS2-VASc score    Unadjusted ischemic stroke rate (% per year)
0                     0.2%
1                     0.6%
2                     2.2%
3                     3.2%
4                     4.8%
5                     7.2%
6                     9.7%
7                     11.2%
8                     10.8%
9                     12.2%

Part 3: Implementation of PheRM desiderata in PhEMA

The Phenotype Execution Modeling Architecture (PhEMA) project is a multi-institutional collaboration to develop a common Phenotype Representation Model (PheRM) and the infrastructure needed to apply it to multi-center EHR-based biomedical research (e.g., the eMERGE network). Throughout the ongoing PhEMA implementation work, these desiderata (D1 to D10) have been, and continue to be, tested for practicality.

The first PhEMA application being developed is an authoring tool that delivers two services (as in D3): a graphical interface for developing, storing, and exporting human-understandable representations, and a validation service that ensures authored algorithms are executable by translating them into QDM, KNIME workflows, SQL, and Fast Healthcare Interoperability Resources (FHIR) queries. The second PhEMA application being developed is a translator that converts QDM (a mostly rule-based representation, as in D5, and rich in temporal comparisons, as in D6) into KNIME workflows (an execution environment based on relational tables, as proposed in D4). Both efforts use VSAC services and standard HL7 terminologies (as in D7). As a further step, the QDM-to-KNIME translator can construct queries against a local IDR either by directly matching concept identifiers or by converting a concept list into regular expressions for searching.
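The two query strategies just mentioned, direct matching of concept identifiers and searching with a regular expression built from a concept list, can be illustrated with a short Python sketch. How the PhEMA translator actually builds its queries is not described here, so everything below is an assumption made for illustration: the function names are hypothetical, and whole-term, case-insensitive matching is only one plausible choice.

```python
import re
from typing import Iterable, Set


def matches_concept_id(code: str, concept_ids: Set[str]) -> bool:
    """Direct matching: the recorded code must equal one identifier in the list."""
    return code.strip() in concept_ids


def concept_list_to_regex(terms: Iterable[str]) -> re.Pattern:
    """Build one case-insensitive pattern that matches any term as a whole token."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("concept list is empty")
    # Escape metacharacters (e.g., the dot in ICD-9 codes) and try longer terms first.
    escaped = sorted((re.escape(t) for t in cleaned), key=len, reverse=True)
    body = "|".join(escaped)
    # Lookarounds prevent matches inside longer words or longer codes.
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)
```

For example, concept_list_to_regex(["metformin", "glipizide"]) would find "Metformin 500 mg tablet" but not "insulin glargine". Direct identifier matching suits coded fields, while the regular expression route is a fallback for free-text or locally coded fields where identifiers are not available.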
Both PhEMA and eMERGE are actively adopting KNIME as the means to dispatch phenotype algorithms for real-world research projects. As previously described, KNIME workflows can bridge to external packages (e.g., Java, Python, R, Perl) to perform statistical classification and project-specific NLP (as in D9). KNIME workflows can also be adapted to a local IDR with simple last-step queries (as in D2). As an executable solution, KNIME offers a broad collection of generic computational tools that compensate for the limitations of current PheRM implementations, so that phenotyping projects are not hindered from using our PheRM by the lack of features that could not be foreseen as necessary or are not yet available.

References for the Appendix:

1. Kho AN, Hayes MG, Rasmussen-Torvik L, et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. J Am Med Inform Assoc 2012;19:212–8. doi:10.1136/amiajnl-2011-000439
2. Li JS, Sexton DJ, Mick N, et al. Proposed modifications to the Duke criteria for the diagnosis of infective endocarditis. Clin Infect Dis 2000;30:633–8. doi:10.1086/313753
3. Friberg L, Rosenqvist M, Lip GYH. Evaluation of risk stratification schemes for ischaemic stroke and bleeding in 182 678 patients with atrial fibrillation: the Swedish Atrial Fibrillation cohort study. Eur Heart J 2012;33:1500–10. doi:10.1093/eurheartj/ehr488