I. DATABASE QUERY AND REFERENCE PAGES

1. Genotype-Treatment Correlations

The Genotype-Treatment section of the database links to 15 interactive query pages that explore the relationship between treatment with HIV-1 antiretroviral drugs (ARVs) and mutations in HIV reverse transcriptase (RT), protease, and integrase. The grey box at the top of the page provides shortcuts to the query pages. The sections that follow briefly describe five types of interactive query pages (the Advanced Query Pages are being modified and are not reviewed here):

A. Treatment Profiles (Protease and RT inhibitors)
B. Mutation Profiles (Protease and RT mutations)
C. Detailed Treatment Queries (Protease, RT, and integrase inhibitors)
D. Detailed Mutation Queries (Protease, RT, and integrase mutations)
E. Mutation Prevalence According to Subtype and Treatment

A. Treatment Profile Queries

The figures below show two sample Protease Treatment Profile queries. The query form allows users to input information about different types of protease inhibitor (PI) treatment. For example, the user can request information on sequences from individuals who have never received a PI (No. of PI = 0), who have received a specific number of PIs, or who have received one or more PIs (No. of PI = 1-9). Users can also limit the results to particular PIs and to specific viral subtypes, and can specify a percent cutoff below which results are not displayed. For each of the two sample queries below, the top figure shows the query form and the bottom figure shows the query output.

Sample Query 1: Protease variants in individuals who have not received a PI (all subtypes, 1% cutoff). The consensus sequence is indicated in grey. The number of individuals from whom the viruses were obtained (about 15,000, depending on the position) is shown in parentheses beneath the consensus sequence.
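The percent cutoff on the query form can be illustrated with a simple per-position calculation: for each position, the percentage of sequences covering that position that carry each non-consensus residue, with residues below the cutoff suppressed. The sketch below is a hypothetical illustration, not HIVDB's code. It assumes pre-aligned amino-acid strings of equal length and uses '-' for positions a sequence does not cover; how HIVDB itself handles mixtures and partial sequences is not described here.

```python
from collections import Counter


def variant_profile(sequences, consensus, cutoff_pct):
    """Return {position: {residue: percent}} for non-consensus residues at or above cutoff_pct.

    Hypothetical helper: assumes pre-aligned amino-acid strings the same length
    as `consensus`, with '-' marking positions a sequence does not cover.
    """
    profile = {}
    for i, cons_aa in enumerate(consensus):
        covered = [seq[i] for seq in sequences if seq[i] != "-"]
        n = len(covered)  # the denominator differs by position, as in the query output
        if n == 0:
            continue
        variants = {
            aa: round(100 * count / n, 1)
            for aa, count in Counter(covered).items()
            if aa != cons_aa and 100 * count / n >= cutoff_pct
        }
        if variants:
            profile[i + 1] = variants  # report 1-based positions
    return profile


# Toy example: four aligned 4-residue fragments; the last one lacks position 4.
seqs = ["PQIT", "PQVT", "PQIT", "PQI-"]
print(variant_profile(seqs, "PQIT", cutoff_pct=1))  # {3: {'V': 25.0}}
```

Raising `cutoff_pct` to 30 in this toy example suppresses the V variant entirely, which mirrors how a higher cutoff reduces the number of positions reported.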
Reported variants are shown in blue; the percentage of sequences in which each is found is indicated by a superscript. The output shows that 34 of the 99 protease positions contain at least one variant present in at least 1% of PI-naïve individuals.

Sample Query 2: Protease variants in individuals who have received at least one PI (all subtypes, 1% cutoff). The output indicates that 52 of the 99 protease positions have at least one reported variant present in at least 1% of PI-treated individuals. If a lower cutoff had been set (e.g. 0.1%), a higher proportion of positions would have been reported to contain variants.

B. Mutation Profile Queries

The figures below show two sample Mutation Profile queries. The query form allows the user to enter either an amino acid position alone or both an amino acid position and a specific amino acid. The output shows the prevalence of mutations at a given amino acid position in viruses from individuals receiving specific ARV treatments. The RT Mutation Profile query output characterizes viruses according to the nucleoside RT inhibitor (NRTI) and non-nucleoside RT inhibitor (NNRTI) treatment status of the individuals from whom the sequenced viruses were obtained. The Protease Mutation Profile query output characterizes viruses according to the PI treatment status of those individuals. For each of the two sample queries below, the top figure shows the query form and the bottom figure shows the query output.

Sample Mutation Profile Query 1: NRTI and NNRTI treatment status of individuals with viruses with mutations at RT position 106. The output shows that three mutations occur at RT position 106: V106I, V106A, and V106M. V106I is present in 1.7% of viruses from 16,572 untreated individuals (row 1), 1.5% of viruses from 4,467 NRTI-treated (but NNRTI-naïve) individuals (row 2), and 3.6% of viruses from 7,040 NNRTI-treated individuals (row 3).
Therefore, although V106I is weakly associated with NNRTI therapy, it is also a polymorphism that occurs in nearly 2% of individuals who have never received an RT inhibitor. In contrast, both V106A and V106M are strongly associated with NNRTI therapy: each occurs in 1.7% of NNRTI-treated individuals but in no untreated individuals. V106A appears to be selected more strongly by nevirapine (NVP) than by efavirenz (EFV), whereas V106M appears to be selected more strongly by EFV.

Sample Mutation Profile Query 2: PI treatment status of individuals with viruses with mutations at protease position 82. The output shows that eight mutations occur at protease position 82: V82A/I/T/F/S/C/L/M. V82I is a polymorphism that occurs in nearly 7% of untreated individuals and is generally not selected by PIs. The remaining seven mutations are nonpolymorphic (they do not occur in untreated individuals) but do occur in persons receiving PIs. Although V82M and V82C are generally not reported as PI-resistance mutations, these data suggest that they probably should be.

C. Detailed Treatment Queries

The figures below show an example of a Detailed RT Inhibitor query. The query form requests sequences from individuals who have received the combination AZT + 3TC + nevirapine (NVP). The query output shows the top part of the first of two pages of results. The results consist of a table with columns for (i) the published reference (Author and Yr), (ii) individual ID, (iii) isolate ID, (iv) GenBank accession number, (v) NRTIs received, (vi) NNRTIs received, (vii) NRTI-resistance mutations, (viii) NNRTI-resistance mutations, and (ix) subtype. If the query had specified the ‘Complete Mutation List’ option, the NRTI- and NNRTI-resistance mutation columns would have been replaced by a single column containing all of the mutations in each sequence.
The option to confine the query to individuals with viruses belonging to a specific subtype was not selected in this example. The drop-down boxes at the top left of the output page allow users to retrieve the nucleotide sequences that meet the query criteria. The drop-down boxes at the top right allow users to view a composite alignment that summarizes the data in the table in a manner similar to that shown above for the Treatment Profile queries.

D. Detailed Mutation Queries

The figures below show an example of a Detailed Mutation query. The query form requests data from individuals with subtype C viruses containing the NRTI-resistance mutation K65R. The output shows that viruses from 32 individuals (reported in 15 literature references) met the query criteria. The table contains columns with (i) the publication, (ii) subject ID, (iii) isolate ID, (iv) GenBank accession number, (v) list of NRTIs, (vi) list of NNRTIs, (vii) list of NRTI-resistance mutations, (viii) list of NNRTI-resistance mutations, (ix) subtype, and (x-xii) the complete list of ARV regimens received by the persons with subtype C viruses containing K65R. The drop-down box at the top left of the page allows users to retrieve the sequences. The drop-down box at the top right allows users to view the sequences as a ‘Composite Alignment’ that shows the percent prevalence of each mutation at each position in the dataset.

E. Mutation Prevalence According to Subtype and Treatment

The purpose of this query is to identify, for the eight most common subtypes, the frequency of all protease and RT mutations in untreated and treated individuals. The following screen shots show a Mutation Prevalence According to Subtype and Treatment query form and its results, which comprise aggregate data from approximately 25,000 individuals. The query form requests comparisons of the prevalence of mutations in RT inhibitor-naïve and RT inhibitor-experienced (NRTI- and/or NNRTI-experienced) individuals.
Mutations occurring below the selected cutoff of 0.5% are not shown. The query output is shown as two screen shots. The top screen shot shows the table header, indicating that there are 18 columns in total. The first two contain the position and the subtype B consensus amino acid. The next 16 columns show the numbers of RT inhibitor-naïve and RT inhibitor-treated individuals with viruses belonging to each of the eight most common subtypes.

The output table contains 560 rows, one for each of the 560 RT amino acid positions. Because the full table cannot readily be shown here, the bottom screen shot shows the portion of the table containing rows 98 to 108, a region containing several positions associated with NNRTI resistance (98, 100, 101, 103, 106, and 108). Several observations can be made from this part of the table:
(i) A98S, K101R/Q, K103R, and V106I are relatively common polymorphisms that occur in multiple subtypes even in the absence of treatment; V108I occurs in the absence of treatment only in CRF01_AG.
(ii) A98G, L100I, K101E/P/H/N, and V106M/A occur solely in treated individuals in multiple subtypes; V108I occurs at higher proportions in treated individuals in most subtypes.
(iii) V106M occurs preferentially in treated individuals with subtype C viruses. In contrast, V106A is the only mutation at this position that occurs in more than 0.5% of treated individuals infected with subtype B viruses.

2. Genotype-Phenotype Correlations

The main page of the Genotype-Phenotype Correlations section links to four interactive query pages: three dynamically updated data summaries and one regularly updated downloadable dataset.

A. Drug Resistance Positions – Query for levels of resistance associated with known drug-resistance mutations
B. Detailed Phenotype Queries – Queries for levels of resistance associated with individual mutations or mutation combinations at all positions of protease, RT, and integrase
C. Patterns of Drug Resistance Mutations
D. Downloadable Reference Datasets

A. Drug Resistance Positions

The Drug Resistance Positions query form lists known PI, NRTI, and NNRTI drug-resistance positions. In this example, RT position 215, a known NRTI-resistance position, is selected. The results show aggregate drug-susceptibility data on viruses with the most common patterns of mutations that include a mutation at RT position 215. The first row lists viruses with the combination of the following four NRTI-resistance mutations: M41L, M184V, L210W, and T215Y. Phenotypic drug-susceptibility results for AZT were available in the database for 19 viruses with this pattern of mutations. The median reduction in AZT susceptibility (or level of AZT resistance) for these viruses is 7.5-fold, with an interquartile range of 5.2- to 27-fold. It is important to bear in mind that the clinical significance of a given level of resistance varies among drugs. Drug-susceptibility results on viruses with T215Y alone (i.e. without any other accompanying mutations) are available for eight viruses; their median reduction in susceptibility was 13-fold. The higher level of AZT resistance in these viruses, compared with those containing all four mutations, is due to M184V, which increases susceptibility to AZT.

B. Detailed Phenotype Queries

The query form allows users to specify (i) individual mutations or combinations of mutations, (ii) one or more ARVs, and (iii) one or more methods of susceptibility testing. When specifying mutations, it is possible to request sequences with an exact match (i.e. no other major drug-resistance mutations are present in the retrieved sequence) using the check box labeled ‘With no other NRTI/NNRTI mutations’ or ‘With no other PI mutations’. The figures below show two examples of Detailed Phenotype queries.

Sample Query #1: Y181C+V179F and etravirine (ETR) susceptibility (Virco assay).
This query returns four laboratory virus isolates with mutations that emerged during in vitro passage or that were introduced by site-directed mutagenesis. The results show that these two mutations alone are associated with high-level (32- to 130-fold) resistance to ETR.

Sample Query #2: Mutations at positions 10 + 46 + 54 + 82 + 84 + 90, all PIs, and the PhenoSense assay are selected. The output shows that 26 results on five isolates from four references are available. This pattern of mutations is associated with resistance to each of the seven first-generation PIs: ATV, FPV, IDV, LPV, NFV, RTV, and SQV. No PhenoSense susceptibility data are available for the second-generation PIs TPV and DRV for this pattern of mutations. However, running the same query with the Virco assay selected yields six viruses, of which five were tested for TPV susceptibility; four of these had high-level TPV resistance.

C. Patterns of Drug Resistance Mutations

This section contains dynamically updated datasets of PhenoSense drug-susceptibility results for viruses containing the most common patterns of drug-resistance mutations. There are separate datasets for NRTI-, NNRTI-, and PI-resistance mutations. In each table, the first column contains the mutation pattern and the second column contains the number of sequences in the database with that pattern. The remaining columns contain the median fold decrease in susceptibility to the drug listed in the column header; the subscript indicates the number of results available for that mutation pattern and drug. Sequences containing electrophoretic evidence of a mixture of wildtype and mutant variants at a major drug-resistance position are excluded from these tables. The figures below show the top part of the tables for the NRTI and NNRTI mutation datasets. Viruses without drug-resistance mutations (row 1) have median fold reductions in susceptibility of <1.0.
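The per-pattern summaries (a median fold change with the number of results as a subscript), like the median and interquartile range reported on the Drug Resistance Positions page, are grouped statistics over individual susceptibility results. The sketch below illustrates that grouping under stated assumptions: the record layout ('pattern', 'drug', 'fold', 'has_mixture') is hypothetical, and the inclusive quartile method is an assumption, since the method HIVDB uses is not stated here.

```python
import statistics
from collections import defaultdict


def summarize_fold_changes(results):
    """Group fold-change results by (pattern, drug); report n, median, and IQR.

    Each result is a dict with the hypothetical keys 'pattern', 'drug', 'fold',
    and 'has_mixture'. Results flagged as mixtures are skipped, mirroring the
    exclusion of wildtype/mutant mixtures described above.
    """
    groups = defaultdict(list)
    for r in results:
        if not r["has_mixture"]:
            groups[(r["pattern"], r["drug"])].append(r["fold"])

    summary = {}
    for key, folds in groups.items():
        if len(folds) >= 2:
            q1, _, q3 = statistics.quantiles(folds, n=4, method="inclusive")
        else:
            q1 = q3 = folds[0]  # a single result has no spread
        summary[key] = {"n": len(folds), "median": statistics.median(folds), "iqr": (q1, q3)}
    return summary


# Toy data, not real HIVDB results; the last record is excluded as a mixture.
toy = [{"pattern": "T215Y", "drug": "AZT", "fold": f, "has_mixture": False} for f in (2, 5, 8, 12, 30)]
toy.append({"pattern": "T215Y", "drug": "AZT", "fold": 200, "has_mixture": True})
print(summarize_fold_changes(toy)[("T215Y", "AZT")])
# {'n': 5, 'median': 8, 'iqr': (5.0, 12.0)}
```

Using the median rather than the mean keeps a single extreme result from dominating a small group, which matters when many patterns have only a handful of results.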
Low and high levels of decreased susceptibility are shown in pink and red cells, respectively. The high level of cross-resistance among the first-generation NNRTIs is readily apparent. There are insufficient data (particularly with the PhenoSense assay) on the second-generation NNRTI etravirine to include it in the table. However, based on approximately 100 results, obtained primarily with the Virco assay, viruses with most of these patterns of NNRTI-resistance mutations are likely to retain etravirine susceptibility.

D. Downloadable Reference Datasets

This section allows researchers to download all of the drug-susceptibility data in HIVDB. The data are provided as six text files that can be parsed using the description of the fields in each dataset. This section is updated every 6 to 12 months.

3. Genotype-Clinical Correlations

This part of the database has two main sections:

A. Clinical Trials Datasets
B. Summaries of Clinical Studies

A. Clinical Trials Datasets

This section contains data linking ARV treatments, genotypic resistance data, and the virological response (plasma HIV-1 RNA levels) to a new treatment regimen. The data are from past clinical trials, supplemented with datasets from well-characterized retrospective studies. The figure below shows the introductory page to the data from ACTG trial 384. The row labeled ‘References’ has links to PDFs of the study’s publications. The row labeled ‘HIVDB’ provides links to the RT and protease data as they appear in the References section of the database. The row labeled ‘Sequence Quality’ links to the sequences that were excluded from the dataset because of problems with the quality of the sequences or the accompanying data. The row labeled ‘Sequences’ contains links to the complete set of protease and RT sequences from the study, for download (text files) or browsing (HTML files). The row labeled ‘Browse’ contains links to summary figures, each presenting information on one individual.
The row labeled ‘Dataset’ contains links to pages for downloading the complete set of data from the study (ARV treatments, genotypes, and plasma HIV-1 RNA levels).

The figure on the right below shows the page with 902 summary figures. These figures contain a total of 1,158 protease and RT sequences because some individuals had sequences obtained at more than one time point. Beneath these two screen shots are two example summary figures with explanations.

The summary figures provide a quality-control check of the temporal relationship between treatments, genotypes, and plasma HIV-1 RNA levels. The examples that follow summarize data from two individuals who experienced consecutive virological failures. The initial ARV regimens used in these individuals are no longer recommended.

Individual 39659 initiated therapy with the NRTIs d4T + ddI and the PI nelfinavir (NFV). Virological failure developed with the canonical NFV-resistance mutations D30N + N88D and the four NRTI-resistance mutations K65R, D67G, K70E, and Q151M. The subsequent regimen, containing the NRTIs AZT + 3TC and the NNRTI efavirenz (EFV), was pre-determined by the ACTG 384 study protocol; genotypic resistance testing to guide clinical decision making was not yet standard at the time. Although there was an initial response to the second regimen, most likely due to the potent antiviral activity of EFV, the response was short-lived, presumably because the accompanying NRTIs were ineffective against a virus that already carried NRTI-resistance mutations at the start of therapy. Although a follow-up genotype is not shown, one would expect the virus to have subsequently developed EFV resistance.

Individual 40207 initiated therapy with the NRTIs d4T + ddI and EFV. Virological failure ensued with the development of the NNRTI-resistance mutation K103N and the NRTI-resistance mutation T215Y.
Subsequent therapy with AZT + 3TC + NFV resulted in the addition of the 3TC-resistance mutation M184V and the NFV-resistance mutation N88S. Subsequent salvage therapy with ritonavir-boosted amprenavir was successful, possibly because N88S is known to increase HIV-1 susceptibility to amprenavir. It is notable that, despite their virological failures, both individuals had a gradual increase in CD4 count, indicating that therapy is often beneficial even when virological suppression is incomplete.

B. Summaries of Clinical Studies

There have been many studies of the association between pre-therapy drug-resistance mutations and the virological response to a new treatment regimen containing a previously unused ARV. Because the raw data from these studies have not been published, we have summarized the studies in this section. About 50 studies of this type have been published, including more than 30 for PIs, about 20 for NRTIs, and one for NNRTIs. Each of these studies is underpowered because of the many different combinations of mutations often present at baseline and the many covariates associated with the virological response to a new drug: the pre-therapy plasma HIV-1 RNA level and CD4 count, the extent of past ARV treatment, and the drugs used in combination with the ARV being analyzed. The studies are therefore most reliable when they identify mutations for which there is independent evidence of an association with resistance (such as phenotypic data or emergence of the mutation during drug exposure), or when multiple studies identify the same drug-resistance mutations as being associated with virological failure. The screen shots below show parts of the page summarizing baseline protease mutations and the virological response to a new PI-containing regimen.

4. References

This part of the database has two main sections: one with summaries of the data from each of the references in HIVDB, and one in which every primate immunodeficiency virus sequence in GenBank is annotated according to its presence or absence in HIVDB.

A. Studies in HIVDB
B. GenBank <=> HIVDB

A. Studies in HIVDB

This page lists the more than 900 references in the database in alphabetical order by the last name of the first author. The scroll box makes it possible to find references using the name of any of their authors. The first column, which contains the name of the first author and the year of publication, links to the reference itself (the PubMed entry, a meeting poster or abstract, or a description of the origin of the sequences in the study). The last column links to the data from the reference.

The first page of a reference contains a link to the reference and summarizes the number and types of virus isolates in it. In example #1 below, the reference contains 14 clinical integrase isolates. These were the first publicly available isolates from individuals receiving the integrase inhibitor raltegravir. In example #2, the reference contains 3,195 clinical isolates containing RT and protease.

Page with Number of Isolates in a Reference: Example 1
Page with Number of Isolates in a Reference: Example 2

The ‘IN Clinical: 14’ link for the Charpentier study leads to a page with a table summarizing the isolates by Subject ID, Isolate ID, ARV treatment, and the mutations in each sequence, grouped according to whether they are known major or minor mutations, other commonly observed variants, or unusual variants. The drop-down menus at the top of this page allow users to download the complete set of sequences in the study or to obtain four other views of the data: ‘Complete Rx’, ‘Isolate Data’, ‘Mutation Categories’, and ‘Susceptibility Data’.
The ‘PR Clinical: 3195’ link for the Baxter study leads to a page with the same format. Each of the sequences in this study (from the RESIST clinical trial) was submitted to GenBank by the study’s authors with the approval of the study’s sponsor, Boehringer Ingelheim, something that is not done nearly often enough. The authors also provided a large number of phenotypic susceptibility results for viruses obtained from the study subjects. These data can be viewed by choosing ‘Susceptibility Data’ from the drop-down menu on the first page; a sample is shown in the figure below.

B. GenBank <=> HIVDB

This part of the database organizes HIV-1, HIV-2, and non-human primate lentivirus pol (RT, protease, and integrase) sequences according to the primary reference cited in each sequence’s GenBank annotation. This makes it possible to summarize the more than 100,000 pol sequences in GenBank using a list of about 1,300 references. The figure below contains a table listing these references. The scroll box on the left makes it possible to search the GenBank references by author; the scroll box on the right makes it possible to sort the entries in the table by any of its fields. The first field contains the author and year of publication of the primary reference in the GenBank entry; if the reference is in PubMed, the field links to the PubMed abstract. The fourth field is the lowest BLAST E-value among the sequences in the study. The E-value (the number of alignments of equal quality expected by chance) serves here as a measure of a sequence's similarity to the HIV-1 consensus B sequence: the lower the E-value, the more similar the sequence is to consensus B. The ‘# in GB’ field indicates the number of GenBank sequences that cite the reference as their primary reference, and the ‘# in HIVDB’ field indicates how many of those sequences are in HIVDB.
The Annotation field indicates whether the sequences from the study are in HIVDB (‘HIVDB’), are being evaluated for entry into HIVDB (‘Pending’, ‘New’, ‘Unpublished’, and ‘ARV Rx and/or other data are N/A’), or have been intentionally left out of HIVDB (‘Gene fragments’, ‘Sequence quality’, ‘Laboratory/experimental isolate’, and ‘Evolution/quasispecies study’). Each annotation links to a page with a table that defines the annotation and explains the rationale for including or excluding the sequences. ‘Evolution/quasispecies studies’ are studies with many clones and/or sequences from a few individuals that address questions about HIV evolution but not necessarily about drug resistance. The decision to exclude sequences from these studies will be re-evaluated periodically as more resources become available.

5. New Submissions

Approximately every three months, the New Submissions section lists the studies that have been entered into HIVDB. Each study title links to the introductory page of the study in the References section.

6. Database Statistics

Database Statistics (http://hivdb.stanford.edu/cgi-bin/Summary.cgi)

II. INTERACTIVE PROGRAMS

HIVDB has seven main interactive programs:

1. HIVdb Program
   A. Mutation List Analysis
   B. Sequence Analysis
   C. HIVdb Output
   D. Sierra Web Service
   E. Release Notes
   F. Algorithm Specification Interface (ASI)
2. HIValg Program
3. HIVseq Program
4. Calibrated Population Resistance (CPR) tool
5. Mutation ARV Evidence Listing (MARVEL)
6. ART-AiDE
7. Rega HIV-1 Subtyping tool

1. HIVdb Program

The screen shot below shows the introductory page of the HIVdb program. The program can be used in three ways: (i) by entering a list of protease and RT mutations, (ii) by entering a complete sequence containing protease, RT, and/or integrase, or (iii) through a web service. The introductory page also contains links to the release notes for HIVdb, HIVseq, and HIValg.
Because the Mutation List and Sequence analyses yield similar results, their results are reviewed together.

HIVdb Introductory Page (http://sierra2.stanford.edu/sierra/servlet/JSierra)

A. Mutation List Analysis

The screen shot below shows the form for entering a list of mutations for analysis. Mutations can be entered in two ways: (i) in text boxes and (ii) from drop-down lists. The page currently contains forms for RT and protease; a form for integrase is pending.

(i) Text boxes: The quickest way to enter mutations is to use the RT and protease text boxes. Each mutation consists of an amino acid position followed by a capitalized one-letter amino acid code; capital letters are required because lower-case letters are reserved for indicating insertions (ins) and deletions (del). Mutations must be separated by one or more spaces or commas, and their order is not relevant. A preceding consensus amino acid is not necessary. A mixture of more than one amino acid is indicated by listing more than one amino acid after the position; slashes between the amino acids in a mixture are optional.

(ii) Drop-down lists: The drop-down menus are useful primarily when the user does not have a mutation list to copy directly into a text box. They list the amino acids at positions known to be associated with drug resistance. If a mutation is not listed on a drop-down menu, it can be entered by clicking on the asterisk (*) symbol. Alternatively, such mutations can be entered into the text box, which can be used in conjunction with the drop-down lists.

An optional sequence identifier and date can be entered and will appear on the printed report.

HIVdb Program: Mutation List Analysis Form (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseqMutationsInput)

B. Sequence Analysis

The screen shot below shows the form for entering nucleic acid sequences for protease, RT, and/or integrase.
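The text-box conventions of the Mutation List Analysis form (A above) can be illustrated with a small parser. This is a hypothetical sketch, not HIVdb's own code: it assumes that insertions and deletions are written literally as 'ins' and 'del' after the position (e.g. 69ins), and it simply discards the optional consensus residue.

```python
import re

# Optional consensus residue, a position, then one or more capitalized residues
# (with optional '/' between them), or the assumed lower-case markers 'ins'/'del'.
MUTATION_RE = re.compile(r"^([A-Z]?)(\d+)([A-Z](?:/?[A-Z])*|ins|del)$")


def parse_mutation_list(text):
    """Parse a text-box mutation list into sorted (position, residues) tuples."""
    mutations = []
    for token in re.split(r"[\s,]+", text.strip()):
        if not token:
            continue  # empty input or a trailing separator
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        _consensus, position, residues = match.groups()  # consensus residue is ignored
        if residues not in ("ins", "del"):
            # A mixture: drop optional slashes and sort, so K103N/S == K103SN.
            residues = "".join(sorted(residues.replace("/", "")))
        mutations.append((int(position), residues))
    return sorted(mutations)  # input order is irrelevant


print(parse_mutation_list("T215Y, M41L 184V  K103NS"))
# [(41, 'L'), (103, 'NS'), (184, 'V'), (215, 'Y')]
```

Rejecting lower-case residues (for example "k103n" raises ValueError) reflects the rule that lower case is reserved for insertions and deletions.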
There are three ways to enter sequences: (i) pasting one or more sequences into the Text Input box, (ii) uploading a text file containing one or more sequences, or (iii) submitting a GRF XML file for TruGene sequences. A single sequence can be entered either in FASTA format (i.e. preceded by ‘>’, a sequence descriptor, and a new line) or as plain text consisting only of nucleotides. Multiple sequences (up to 100 at a time) must be entered in FASTA format. The QA Analysis, Mutation Scores, and Mutation Comments options are selected by default. An optional sequence identifier and date can be entered and will appear on the printed report.

HIVdb Program: Sequence Analysis Form (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseqSequenceInput)

C. HIVdb Output

This section contains multiple screen shots of the output that results from entering the following sequence into the Text Input box:

>NC599-1997|AY030413
CCTCAAATCACTCTTTGGCAACGACCCATCGTCACAATAAGGATAGGAGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAG
AAATGAATTTGCCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTGTCAAAGTAAGACAGTATGAGCAGATACCCGTAGAAATCTGCGGACA
TAAAGTTATAGGTACAGTATTAGTAGGACCTACACCTGCCAACATAATTGGAAGAAATCTGATGACTCAGCTTGGTTGTACTTTAAATTTTCCCATTAGTCCTATT
GAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAGGAAAAAATAAATGCATTAGTAGAAATTTGTGCAGA
AATGGAAAAGGAAGGGAAAATTTCWAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCYATAAAGAAAAAGAACAGTACTAGATGGAGAAAATTAG
TAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCKCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTG
GATGTGGGTGATGCATACTTTTCAGTTCCCTTATATGAAGACTTTAGAAAGTATACTGCATTTACCATACCTAGTAAAAACAATGAGACACCAGGGATTAGATAC
CAGTATAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGACAACAAAATCCAGACCTAGT
TATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGACAACATCTGTTGAGGTGGGGAT
TTTTCACACCAGATCAAAAACATCAGAARGAACCYCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATACAGCTGCCAGAA

The screenshot below shows the Summary Data and Sequence Quality Assessment sections. The Summary Data section indicates which protease, RT, and/or integrase codons are present in the sequence and whether there are insertions and/or deletions. It also shows the subtype of the protease and RT reference sequence most similar to the submitted sequence, together with the uncorrected distance between the submitted sequence and that subtype reference. This subtype assignment should be considered only a first approximation. As indicated in the release notes, several other programs provide much more reliable HIV-1 subtyping, but they are not implemented here because they may not work on short sequences (e.g. protease alone), may report an inconclusive result, or may take several seconds to minutes to assign a subtype. We are actively working to improve the subtyping in the HIVdb program.

The Sequence Quality Assessment section identifies areas of poor sequence quality, as indicated by stop codons or frame shifts, highly ambiguous nucleotides (B, D, H, V, N), or unusual residues (defined as mutations that occur at a frequency of less than 0.05% in HIVDB). Although one or two highly ambiguous nucleotides or unusual residues may occur in a typical sequence, a localized cluster of them suggests a problem with sequence quality in that region. The figures to the right of the table use red lines to indicate positions with a QA problem. Blue lines indicate differences from consensus B: tall blue lines indicate positions associated with drug resistance and short blue lines indicate other mutations.

HIVdb Output: Summary Data and QA

The screenshot below shows the PI resistance interpretation.
The protease mutations are divided into three categories: major PI-resistance mutations, minor PI-resistance mutations, and other mutations. Major mutations are defined as mutations that by themselves can reduce susceptibility to one or more PIs, or as nonpolymorphic mutations that are widely considered to be important in drug resistance. Minor mutations are generally considered to be accessory mutations. All major and minor mutations receive a score and/or have an associated comment. The ‘other’ category includes mutations that do not receive a score; some of these trigger a comment if they have ever been considered to be associated with drug resistance. Highly unusual mutations at major PI-resistance positions are placed on the first line but may not receive a penalty score; however, they trigger a comment indicating that they are unusual mutations at an important position.

After the mutations are classified, the program estimates the level of resistance to PIs based on the mutations in the submitted sequence. It assigns one of five levels of estimated drug resistance based on the total penalty score of the submitted sequence: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance. The Release Notes explain how the mutation penalty scores are designed and used to assign these levels.

The Comments section has comments for each of the major and minor PI-resistance mutations as well as for some of the other mutations. There are also several special comments, such as the widely publicized genotypic susceptibility scores (GSSs) for TPV and DRV from the RESIST and POWER trials, respectively. The categorization of protease mutations into major, minor, and other is used consistently throughout the database. It is occasionally modified, and the latest categorization can always be found in the Release Notes.
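The step from individual penalty scores to one of the five levels can be sketched as a sum followed by a threshold lookup. This is a simplified, hypothetical illustration: the per-mutation scores below are made up, the threshold values are assumptions that should be checked against the current Release Notes, and any additional scoring rules described there (such as scores for mutation combinations) are ignored.

```python
# Hypothetical penalty scores for a single PI; the real scores are in the Release Notes.
EXAMPLE_SCORES = {"D30N": 60, "N88D": 15, "L90M": 30}

# Assumed lower-bound thresholds for the four non-susceptible levels, checked from
# highest to lowest; verify against the current Release Notes before relying on them.
LEVELS = [
    (60, "high-level resistance"),
    (30, "intermediate resistance"),
    (15, "low-level resistance"),
    (10, "potential low-level resistance"),
]


def estimate_level(mutations, scores=EXAMPLE_SCORES):
    """Sum per-mutation penalty scores and map the total to one of five levels."""
    total = sum(scores.get(m, 0) for m in mutations)  # unscored ('other') mutations add 0
    for threshold, label in LEVELS:
        if total >= threshold:
            return total, label
    return total, "susceptible"


print(estimate_level(["N88D"]))          # (15, 'low-level resistance')
print(estimate_level(["N88D", "L90M"]))  # (45, 'intermediate resistance')
```

Keeping the thresholds in a table separate from the scores mirrors the separation in the text between how penalty scores are assigned and how the total is translated into a level.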
HIVdb output: Protease Mutations and PIs

The next screenshot shows the RTI resistance interpretation. The RT mutations are divided into three categories: NRTI-resistance mutations, NNRTI-resistance mutations, and other mutations. All NRTI- and NNRTI-resistance mutations receive at least one non-zero score. The 'other' category includes mutations that do not receive a score; some of these may trigger a comment if they have ever been considered to be associated with drug resistance. Highly unusual mutations at positions associated with NRTI and NNRTI resistance are placed on the first and second lines, respectively, but may not receive a penalty score. However, such mutations trigger a comment indicating that they are unusual mutations at an important position. After classifying the mutations, the program assigns an estimated level of resistance to NRTIs and NNRTIs based on the mutations in the submitted sequence. The Comments section has comments for each of the NRTI- and NNRTI-resistance mutations, as well as for some of the 'other' mutations. There is also a comment for the etravirine genotypic susceptibility score (GSS) from the DUET trials.

HIVdb output: RT Mutations, NRTIs, and NNRTIs

The final part of the HIVdb output is a list of the mutation penalty scores associated with each mutation in the submitted sequence. Each score is hyperlinked, through the Mutation ARV Evidence Listing (MARVEL) program, to the data supporting the association between the mutation and each ARV. A complete list of all mutation scores can be found in the Release Notes and in several other locations in HIVDB.

HIVdb Output: Mutation Scoring Table

D. Sierra Web Service

Sierra is a web service that allows individuals and institutions to interact programmatically with the HIVdb program. Sierra accepts sequences from registered users and returns an XML file with the HIVdb scores, interpretations, and comments.
The XML output can then be parsed computationally to generate automated and customized reports. The web service has been in use for more than three years and is explained in detail on a separate page.

Sierra Web Service (http://hivdb.stanford.edu/DR/webservices/)

E. Release Notes

The HIVdb Release Notes also cover HIVseq and HIValg. They explain:
- how the three programs work
- how mutations are classified
- how mutation penalty scores are derived
- how the scores are combined to estimate resistance from a submitted sequence

They also contain links to the mutation penalty scores, mutation comments, lists of not-uncommon mutations, program updates, downloadable files containing the HIVdb code, the consensus B protease, RT, and integrase amino acid sequences, and sample sequence datasets.

Release Notes Table of Contents (http://hivdb.stanford.edu/DR/asi/releaseNotes/index.html)

F. Algorithm Specification Interface (ASI)

The ASI is a common platform for coding genotypic interpretation algorithms. It comprises an XML format for specifying an algorithm and a compiler that transforms the XML into executable code. The ASI makes it possible for drug resistance experts to develop and test genotypic interpretation algorithms without the assistance of a computer programmer.

Algorithm Specification Interface (http://hivdb.stanford.edu/DR/asi/index.html)

2. HIValg Program

The HIValg Program provides genotypic resistance interpretations using three algorithms: HIVdb, ANRS, and Rega. It is made available courtesy of researchers at the ANRS (Agence Nationale de Recherches sur le SIDA) and the Rega Institute. Each of the algorithms is implemented through the Algorithm Specification Interface (ASI). Like HIVdb, HIValg can be run using either submitted mutations or sequences.
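The automated reporting that the Sierra web service enables (section D above) amounts to parsing the returned XML and reshaping it. The Python sketch below uses only the standard library's `xml.etree.ElementTree`. The XML layout (`result`, `sequence`, `drug` elements and their attributes) is hypothetical and invented for illustration; it is not the real Sierra schema, which is documented on the Sierra page. The drug scores shown are made-up example values, not HIVdb results.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: element and attribute names are invented for
# illustration and are NOT the real Sierra schema; values are made up.
SAMPLE_RESPONSE = """
<result>
  <sequence name="sample_001">
    <drug class="PI" name="DRV" score="25" level="low-level resistance"/>
    <drug class="NNRTI" name="ETR" score="5" level="susceptible"/>
  </sequence>
</result>
"""


def summarize(xml_text):
    """Return {sequence name: [(drug, score, level), ...]} for the sample layout."""
    root = ET.fromstring(xml_text.strip())
    report = {}
    for seq in root.iter("sequence"):
        report[seq.get("name")] = [
            (drug.get("name"), int(drug.get("score")), drug.get("level"))
            for drug in seq.iter("drug")
        ]
    return report


# Print one tab-separated line per drug, flagging anything not susceptible.
for name, rows in summarize(SAMPLE_RESPONSE).items():
    for drug, score, level in rows:
        flag = "" if level == "susceptible" else "  <-- review"
        print(f"{name}\t{drug}\t{score}\t{level}{flag}")
```

With the real schema, only the element and attribute names in `summarize` would need to change; the report-building loop stays the same.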
HIValg Program: Introductory Page (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivalgs)

The HIValg Program also allows users to interpret sequences with an algorithm of their own design. To do so, users must submit an ASI-compliant algorithm (File Upload in the following figure) along with their sequences.

HIValg Program: Sequence Analysis (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseqSequenceInput)

3. HIVseq Program

The HIVseq Program identifies mutations in a submitted sequence and reports their prevalence in HIVDB according to subtype and treatment history. Like HIVdb and HIValg, HIVseq accepts either complete sequences or lists of mutations.

HIVseq Program: Introductory Page (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseq)

The figure below was generated in response to the mutation list L10F, M36I, I54M, T80S, V82A, and I93L. Several conclusions can be drawn from the results:
(i) although L10I and L10V occur in previously untreated individuals with viruses of all subtypes, L10F occurs only among treated individuals;
(ii) although M36I occurs in only 13% of untreated individuals with subtype B viruses, it is the consensus in all other subtypes;
(iii) I54M occurs only among treated individuals, regardless of subtype; and
(iv) T80S has not been reported in more than 0.5% of sequences of any subtype, regardless of treatment status.

HIVseq Program: Output

Each mutation links to the reports of that mutation in a particular subtype. For example, clicking on the T (1%) at position 36 in PI-naïve individuals with subtype C viruses returns a page with the 20 reports of M36T in PI-naïve individuals with subtype C viruses.

HIVseq Hyperlink Example: M36T in PI-Naïve Subtype C

4. Calibrated Population Resistance (CPR) Tool

CPR is a program for performing routine analysis of sets of HIV-1 protease, RT, and integrase sequences.
The program provides a standardized approach for estimating the prevalence of transmitted HIV drug resistance from population-sampled sequence data, and for general batch analysis of HIV-1 pol gene sequences. The program interface consists of a text box into which users submit a set of sequences in FASTA format. Sequences from individual drug resistance surveillance studies are rarely made publicly available. CPR therefore makes it possible for epidemiologists to analyze different sequence datasets using precisely the same methods as published studies. CPR also ensures consistency in the handling of missing data, such as when sequences are incomplete or of poor quality. CPR uses the 2009 Surveillance Drug Resistance Mutation (SDRM) list as the default for estimating resistance, but each of the previous lists is also available to ensure backward compatibility. Users also have the option to run the STAR genotyping and the HIVdb genotypic resistance interpretation programs on each submitted sequence.

CPR Form (http://cpr.stanford.edu/cpr.cgi)

The CPR Release Notes are separate from the release notes for HIVdb, HIValg, and HIVseq.

CPR Release Notes (http://cpr.stanford.edu/pages/releaseNotes.html)

In addition to explaining the design and output of the program, the CPR Release Notes contain an appendix with lists of mutations used to characterize each sequence. In particular, each sequence is characterized according to:
- a list of surveillance drug-resistance mutations
- a list of borderline/suspicious mutations that are not usually found in untreated individuals
- a list of unusual mutations (defined as those found in fewer than 0.05% of sequences in HIVDB)
- a list of mutations suggesting APOBEC3G-mediated G-to-A hypermutation

CPR: Mutation Lists (http://cpr.stanford.edu/pages/releaseNotes.html#appendix1)

The following figures show the output generated by the CPR program in response to the 283 sequences described in Dilernia DA et al.
AIDS Res Hum Retroviruses 2007;10:1201-7 (HIV-1 genetic diversity surveillance among newly diagnosed individuals from 2003 to 2005 in Buenos Aires, Argentina). The figures below show the three most important sections:
(i) a summary of total and class-specific resistance (section 3);
(ii) a list of the surveillance drug-resistance mutations (as defined by SDRM 2009) present in the dataset (section 5); and
(iii) a list of the sequences with SDRMs (section 6).

The results show an overall prevalence of resistance of 3.9%, including 1.8% with NRTI-resistance SDRMs, 1.4% with NNRTI-resistance SDRMs, and 1.4% with PI-resistance SDRMs. Section 6 shows that two isolates had two-class resistance, with both NRTI- and PI-resistance SDRMs.

CPR Output: Sections 2, 3, 5, and 6

5. Mutation ARV Evidence Listing (MARVEL)

MARVEL provides the underlying data and references that link specific mutations to specific drugs. It brings together genotype-treatment, genotype-phenotype, and genotype-virological outcome correlations. Each mutation penalty score in HIVdb is linked to the MARVEL output for that mutation. The program can also be accessed directly from the form shown below.

MARVEL Query Form (http://hivdb.stanford.edu/cgi-bin/Marvel.cgi)

MARVEL output includes the following mutation-specific summaries:
(i) HIVdb comments and scores;
(ii) mutation prevalence according to subtype and drug class experience;
(iii) mutation prevalence according to treatment with individual ARVs;
(iv) genotype-phenotype correlations;
(v) genotype-phenotype logistic regression coefficients; and
(vi) a list of genotype-clinical outcome correlations, with the mutation of interest shown in bold.
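The CPR prevalence figures reported above for the 283 Buenos Aires sequences are internally consistent once overlapping classes are counted only once. The Python sketch below back-calculates the per-class counts from the rounded percentages, assuming each percentage was rounded to one decimal place. Each reported value turns out to match exactly one integer count. The sketch then applies inclusion-exclusion. It also assumes, as the Section 6 description implies, that the two NRTI + PI isolates are the only overlap. The counts are inferred for illustration; CPR's own output lists the actual sequences.

```python
TOTAL = 283  # sequences analysed in the Dilernia et al. dataset


def pct(n, total=TOTAL):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * n / total, 1)


def counts_matching(reported_pct):
    """All sequence counts that round to the reported percentage."""
    return [n for n in range(TOTAL + 1) if pct(n) == reported_pct]


# Back-calculate class-specific counts from the rounded percentages.
nrti = counts_matching(1.8)    # -> [5]
nnrti = counts_matching(1.4)   # -> [4]
pi = counts_matching(1.4)      # -> [4]

# The two isolates with both NRTI- and PI-resistance SDRMs are counted in two
# class totals; inclusion-exclusion subtracts the double count once.
two_class = 2
any_sdrm = nrti[0] + nnrti[0] + pi[0] - two_class

print(any_sdrm, pct(any_sdrm))  # 11 3.9
```

Summing the class-specific percentages directly (1.8 + 1.4 + 1.4 = 4.6%) would overstate overall prevalence. This is why the CPR summary reports total and class-specific resistance separately.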
MARVEL: Comments and Scores
MARVEL: Mutation Frequency by Subtype and Drug Class Experience
MARVEL: Mutation Frequency by Treatment with Specific ARVs
MARVEL: Genotype-Phenotype Correlations of Common Mutation Patterns
MARVEL: Genotype-Phenotype Logistic Regression Coefficients
MARVEL: Genotype-Clinical Outcome Correlations

6. ART-AiDE

The Antiretroviral Therapy Acquisition and Display Engine (ART-AiDE) makes it possible to generate a permanent electronic and graphical record of a patient’s ARV history, plasma HIV-1 RNA levels, CD4 counts, and, when available, genotypic resistance data. The submitted data can be reviewed, and the underlying XML file saved, on the Graphical Summary page. Using the form below, it is possible to enter data for a new patient or to load pre-existing information saved in an XML file. Here the ART-AiDE summary 3936.xml has been selected from the user’s desktop. This page also contains links to the ART-AiDE Release Notes and to a page with ART-AiDE XML schema updates.

ART-AiDE Entry Point (http://dbpartners.stanford.edu/DDCRP/pages/art_aide.html)

The graphical summary generated by ART-AiDE displays the ARV treatment history, plasma HIV-1 RNA levels, CD4 counts, and the major mutations present in a drug resistance genotype. The menu options at the bottom of the page take the user to a page where the data can be examined in their entirety. There, the patient record can be edited or updated and saved to the user’s desktop. A beta version of the eClinical Antiretroviral Resistance Estimator (eCARE) program is accessible from this page. eCARE uses the ARVs previously received by an individual to query HIVDB for genotypic data from individuals with similar treatment histories.

ART-AiDE: Graphical Summary

7. Rega HIV-1 Subtyping Tool

The Rega HIV-1 Subtyping Tool is the gold standard for HIV-1 subtyping.
It was developed by Tulio de Oliveira, Koen Deforche, Sharon Cassol, Andrew Rambaut, and Anne-Mieke Vandamme. The work was a collaboration between the Evolutionary Group at Oxford (UK), the Immunotherapeutics Program at the University of Pretoria (SA), and the Rega Institute (Belgium). The program has been made available through HIVDB courtesy of its creators. The program accepts a FASTA sequence alignment. The Data Entry form and the subtyping process are shown in the figures below.

Rega HIV-1 Subtyping Tool: Data Entry Form (http://dbpartners.stanford.edu/RegaSubtyping/)
Rega HIV-1 Subtyping Tool: Subtyping Process (http://dbpartners.stanford.edu/RegaSubtyping/)

III. EDUCATIONAL RESOURCES

HIVDB contains several regularly updated sections summarizing data linking RT, protease, and integrase mutations to antiretroviral drugs (ARVs). These sections include:
(i) tabular summaries of the major mutations associated with each ARV class;
(ii) detailed summaries of the major, minor, and accessory mutations associated with each ARV;
(iii) the comments used by the HIVdb program;
(iv) the scores used by the HIVdb program;
(v) clinical studies in which baseline drug resistance mutations have been correlated with the virological response ('clinical outcome') to a specific ARV;
(vi) mutations that can be used for drug resistance surveillance; and
(vii) a two-page PDF handout.

1. Drug Resistance Summaries
A. Tabular Drug Resistance Summaries by ARV Class
B. Detailed Drug Resistance Summaries by ARV
C. Drug Resistance Mutation Comments Used by the HIVdb Program
D. Drug Resistance Mutation Scores Used by the HIVdb Program
E. Genotype-Clinical Outcome Correlation Studies
2. Surveillance Drug-Resistance Mutation List Section
3. PDF Handout

The main Drug Resistance Summaries page is shown below. It is accessed via the links at the lower right side of the home page, and it provides access to items (i) through (v).
Item (vi) (the Surveillance Drug-Resistance Mutation List Section) can be accessed via the ‘Surveillance’ link on the home page. Item (vii) (the PDF Handout) can be downloaded from the link beneath the ‘Drug Resistance Summaries’ box on the home page.

1. Drug Resistance Summaries

A. Tabular Drug-Resistance Summaries by ARV Class

There are five Tabular Drug-Resistance Summaries by ARV class. Each consists of a table associating the major mutations for a drug class with the drugs in that class. Bold red text indicates mutations that are particularly important for drug resistance. Footnotes contain information about polymorphic mutations, minor mutations, rare mutations, and mutations that increase susceptibility to one or more drugs (the example shown below is for PIs).

B. Detailed Drug Resistance Summaries by ARV

This section contains a summary of the major, minor, and accessory mutations for each NRTI, NNRTI, and PI, and for the fusion inhibitor enfuvirtide. It also contains a synopsis of the recommended use of each ARV for initial and salvage therapy. The text is annotated with 20 to 30 important references per ARV. The figures that follow show the summaries for the most recently approved PI (darunavir) and the most recently approved NNRTI (etravirine).

C. Drug Resistance Mutation Comments Used by the HIVdb Program

All of the comments for mutations associated with a specific drug class can be viewed by selecting one of the Comments links in the Supporting Material for the HIVdb Program section. The following figure shows the top section of the list of mutations associated with resistance to integrase inhibitors.

D. Drug Resistance Mutation Scores Used by the HIVdb Program

The HIVdb program uses mutation scores to estimate the level of ARV resistance in a sequenced HIV-1 isolate. Scores have been created for the NRTIs, NNRTIs, and PIs.
Scores have not yet been created for the integrase inhibitors (INIs), because there is only one currently approved INI and published drug resistance data for this class are limited. The HIVdb Release Notes explain the scoring system. The total score is the sum of the scores for all mutations in a sequence, and it is interpreted as follows:
- 0 to 9: susceptible
- 10 to 14: potential low-level resistance
- 15 to 29: low-level resistance
- 30 to 59: intermediate resistance
- 60 or greater: high-level resistance

Each score is hyperlinked to the MARVEL (Mutation ARV Evidence Listing) output summarizing the supporting data. The mutation scoring tables can be sorted by mutation position or by drug. The examples below show the NRTI scores sorted in ascending order by position and the NNRTI scores sorted in descending order by predicted etravirine resistance.

E. Genotype-Clinical Outcome Correlation Studies

This section contains synopses of:
- more than 30 studies in which baseline protease mutations are correlated with the virological response to salvage therapy with a previously unused PI;
- 20 studies in which baseline RT mutations are correlated with the virological response to salvage therapy with a previously unused NRTI; and
- one study of the NNRTI etravirine.

These studies are described in detail in the Database Query and Reference Pages section.

2. Surveillance Drug-Resistance Mutation List Section

The figure below summarizes the content of the Surveillance Mutations section. Two of its features have been reviewed in previous sections of this guide: (i) Mutation Prevalence According to Subtype and Treatment (Database Queries, Genotype-Treatment section) and (ii) the Calibrated Population Resistance (CPR) tool (HIVDB Programs). Two features contain material not described previously: (i) the Surveillance Drug-Resistance Mutation Worksheet and (ii) the WHO 2009 List of Mutations for Surveillance of Transmitted Drug-Resistant HIV Strains.
The Surveillance Drug-Resistance Mutation (SDRM) Worksheet contains tables of RT and protease mutations. These tables can be sorted according to:
- presence or absence on five expert system lists (HIVDB, Rega, ANRS, IAS-USA, and Los Alamos)
- the number of lists on which a mutation appears
- the prevalence of each mutation in untreated individuals infected with viruses of the eight most common subtypes (A, B, C, D, F, G, CRF01_AE, and CRF02_AG)
- the most prevalent mutation in any subtype in untreated individuals
- the most prevalent mutation in any subtype in treated individuals

The figure below shows the worksheet sorted by (i) total number of lists in descending order and (ii) mutation position in ascending order. The figure following the worksheet is the 2009 WHO SDRM list.

3. PDF Handout

The PDF Handout contains a two-page portable summary of drug resistance mutations. Screenshots of each page are shown below.
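The two-key ordering used for the SDRM Worksheet figure can be reproduced on any exported copy of such a table. That ordering is the number of expert lists in descending order, then mutation position in ascending order. The Python sketch below assumes the rows have been loaded into a hypothetical `WorksheetRow` record. The mutation labels, positions, and list counts are placeholders, not worksheet data.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    mutation: str   # placeholder label, not a real worksheet entry
    position: int   # amino acid position in RT or protease
    n_lists: int    # how many of the five expert lists include the mutation


def sort_worksheet(rows):
    """Order rows by number of lists (descending), then position (ascending)."""
    # Negating n_lists turns the descending criterion into an ascending one,
    # so a single sorted() call with a tuple key handles both criteria.
    return sorted(rows, key=lambda r: (-r.n_lists, r.position))


rows = [
    WorksheetRow("mut_a", 90, 3),
    WorksheetRow("mut_b", 30, 5),
    WorksheetRow("mut_c", 46, 3),
    WorksheetRow("mut_d", 84, 5),
]
print([r.mutation for r in sort_worksheet(rows)])
# ['mut_b', 'mut_d', 'mut_c', 'mut_a']
```

The same tuple-key approach covers the single-key orderings used for the HIVdb scoring tables, such as position ascending or a drug's score descending.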