user_guide - HIV Drug Resistance Database

I. DATABASE QUERY AND REFERENCE PAGES
1. Genotype-Treatment Correlations
This Genotype-Treatment section of the database links to 15 interactive query pages that explore
the relationship between treatment with HIV-1 antiretroviral drugs (ARVs) and mutations in HIV
reverse transcriptase (RT), protease, and integrase. The grey box at the top of the page provides
shortcuts to the query pages; the sections below briefly describe the following five types of
interactive query pages. The Advanced Query Pages are being modified and are not reviewed here.
A. Treatment Profiles (Protease and RT inhibitors)
B. Mutation Profiles (Protease and RT mutations)
C. Detailed Treatment Queries (Protease, RT, and integrase inhibitors)
D. Detailed Mutation Queries (Protease, RT, and integrase mutations)
E. Mutation Prevalence According to Subtype and Treatment
A. Treatment Profile Queries
The figures below show two sample Protease Treatment Profile queries. The query form allows
users to input information about different types of protease inhibitor (PI) treatments. For example,
the user can request information on sequences from individuals who have never received a PI (No.
of PI = 0), on those who have received a specific number of PIs, or on those who have received one
or more PIs (No. of PI = 1-9). Users can also limit returns to particular PIs as well as to specific
viral subtypes. The user can also specify a percent cutoff below which results are not displayed.
For the two sample queries below, the top figure shows the query form and the bottom figure
shows the query output.
Sample Query 1: Protease variants in individuals who have not received a PI (all subtypes, 1%
cutoff). The consensus sequence is indicated in grey. The number of individuals from whom the
viruses were obtained (about 15,000, depending on the position) is shown in parentheses beneath
the consensus sequence. Reported variants are shown in blue; the percent of sequences in which
they are found is indicated by superscripts. The output shows that 34 of the protease’s 99 positions
contain at least one variant present in at least 1% of PI-naïve individuals.
Sample Query 2: Protease variants in individuals who have received at least one PI (all subtypes,
1% cutoff). The output indicates that 52 of the protease’s 99 positions have at least one reported
variant present in at least 1% of PI-treated individuals. If a lower cutoff had been set (e.g. 0.1%),
more positions would have been reported as containing variants.
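The effect of the percent cutoff can be illustrated with a minimal Python sketch. It assumes aligned amino acid sequences of equal length and a consensus string of the same length; the function name `variant_profile`, the use of '.' for an unsequenced position, and the toy data are all hypothetical and are not taken from HIVDB.

```python
from collections import Counter

def variant_profile(sequences, consensus, cutoff_pct=1.0):
    """Return {position: {amino_acid: percent}} for non-consensus residues
    whose prevalence is at or above cutoff_pct. Positions are 1-based."""
    profile = {}
    for idx, cons_aa in enumerate(consensus):
        # Residues observed at this position; '.' marks "not sequenced" (assumption).
        column = [seq[idx] for seq in sequences if seq[idx] != "."]
        if not column:
            continue
        counts = Counter(column)
        variants = {
            aa: 100.0 * n / len(column)
            for aa, n in counts.items()
            if aa != cons_aa and 100.0 * n / len(column) >= cutoff_pct
        }
        if variants:
            profile[idx + 1] = variants
    return profile

# Toy alignment of four 3-residue sequences (hypothetical data).
toy_sequences = ["PQI", "PQV", "PQI", "PRI"]
toy_consensus = "PQI"

strict = variant_profile(toy_sequences, toy_consensus, cutoff_pct=30.0)
loose = variant_profile(toy_sequences, toy_consensus, cutoff_pct=1.0)
```

In this toy example each variant is present in 25% of sequences, so the 30% cutoff reports no positions while the 1% cutoff reports two. Lowering the cutoff can only add positions to the output, never remove them.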
B. Mutation Profile Queries
The figures below show two sample Mutation Profile queries. The query form allows the user to
enter either an amino acid position alone or both an amino acid position and a specific amino acid.
The output shows the prevalence of mutations at a given amino acid position in viruses from
individuals receiving specific antiretroviral (ARV) treatments. The RT Mutation Profile query output
characterizes viruses according to the nucleoside RT inhibitor (NRTI) and non-nucleoside RT
inhibitor (NNRTI) treatment status of the individuals from whom the sequenced viruses were
obtained. The Protease Mutation Profile query output characterizes viruses according to the PI
treatment status of the individuals from whom the viruses were obtained. For the two sample
queries below, the top figure shows the query form and the bottom shows the query output.
Sample Mutation Profile Query 1: NRTI and NNRTI treatment status of individuals with viruses with
mutations at RT position 106. The output shows that three mutations occur at RT position 106:
V106I, V106A, and V106M. V106I is present in 1.7% of viruses from 16,572 untreated individuals
(row 1), 1.5% of viruses from 4,467 NRTI-treated (but NNRTI-naïve) individuals (row 2), and 3.6%
of viruses from 7,040 NNRTI-treated individuals (row 3). Therefore, although it is weakly
associated with NNRTI therapy, it is also a polymorphism that occurs in nearly 2% of individuals
who never received an RT inhibitor. In contrast, both V106A and V106M are strongly associated
with NNRTI therapy: each occurs in 1.7% of NNRTI-treated individuals but in no untreated
individuals. V106A appears to be selected for more strongly by treatment with nevirapine (NVP)
than with efavirenz (EFV), whereas V106M appears to be selected for more strongly by treatment
with efavirenz (EFV).
Sample Mutation Profile Query 2: PI treatment status of individuals with viruses with mutations at
protease position 82. The output shows that eight mutations occur at protease position 82:
V82A/I/T/F/S/C/L/M. V82I is a polymorphism that occurs in nearly 7% of untreated individuals and
is generally not selected by PIs. The remaining seven mutations are nonpolymorphic (they do not
occur in untreated individuals) but do occur in persons receiving PIs. Although V82M/C are
generally not reported as PI-resistance mutations, these data suggest they probably should be.
C. Detailed Treatment Queries
The figures below show an example of a Detailed RT Inhibitor query. The query form requests
sequences from individuals who have received a combination of AZT+3TC+nevirapine (NVP). The
query output shows the top part of the first of two pages of results. The results consist of a table
containing columns with (i) the published reference (indicated by Author and Yr), (ii) individual ID, (iii)
isolate ID, (iv) GenBank accession number, (v) NRTIs received, (vi) NNRTIs received, (vii) NRTI-resistance
mutations, (viii) NNRTI-resistance mutations, and (ix) subtype. If the query had specified
‘Complete Mutation List’ as an option, the columns with the NRTI- and NNRTI-resistance mutations
would be replaced with a single column containing all of the mutations in each sequence. The
option to confine the query to individuals with viruses belonging to a specific subtype was not
selected in this example.
The drop-down boxes at the top left of the output page allow users to retrieve nucleotide
sequences meeting the query’s criteria. The drop-down boxes at the top right of the output page
allow users to view a composite alignment that summarizes the data in the table in a manner
similar to that shown above in the Treatment Profile queries.
D. Detailed Mutation Queries
The figures below show an example of a Detailed Mutation query. The query form requests data
from individuals with subtype C viruses with the NRTI mutation K65R. The output shows that
viruses from 32 individuals (reported in 15 literature references) met the query criteria. The table
contains columns with (i) the publication, (ii) subject ID, (iii) isolate ID, (iv) GenBank accession
number, (v) list of NRTIs, (vi) list of NNRTIs, (vii) list of NRTI resistance mutations, (viii) list of
NNRTI resistance mutations, (ix) subtype, and (x-xii) complete list of the ARV regimens received
by the persons with subtype C viruses containing K65R. The drop-down box at the top left of the
page allows users to retrieve the sequences. The drop-down box at the top right allows
users to view the sequences as a ‘Composite Alignment’ that shows the percent prevalence of
each mutation at each position in the dataset.
E. Mutation Prevalence According to Subtype and Treatment
The purpose of this query is to identify, for the eight most common subtypes, the frequency of all
protease and RT mutations in untreated and treated individuals. The following screen shots show a
Mutation Prevalence According to Subtype and Treatment query form and results. The results
comprise aggregate data from approximately 25,000 individuals. The query form requests
comparisons of the prevalence of mutations in RT inhibitor-naïve and RT inhibitor-experienced
individuals (NRTI- and/or NNRTI-experienced). Mutations occurring below the selected cutoff of
0.5% are not shown.
The query output is shown as two screen shots. The top shows the table header indicating that
there are 18 total columns. The first two contain the position and subtype B consensus amino acid.
The next 16 columns show the numbers of RT inhibitor-naïve and RT inhibitor-treated individuals
with viruses belonging to each of the eight most common subtypes.
The output table contains 560 rows, one for each of the 560 RT amino acid positions. Because this
cannot be readily shown here, we show a screenshot of the portion of the table containing rows 98
to 108, a region containing several positions associated with NNRTI resistance (98, 100, 101,
103, 106, 108). Several observations can be made from this part of the table: (i) A98S, K101R/Q,
K103R, and V106I are relatively common polymorphisms that occur in multiple subtypes even in
the absence of treatment, whereas V108I occurs in CRF01_AG alone in the absence of treatment; (ii)
A98G, L100I, K101E/P/H/N, and V106M/A occur solely in treated individuals in multiple subtypes, and
V108I occurs at higher proportions in treated individuals in most subtypes; and (iii) V106M occurs
preferentially in treated individuals with subtype C viruses. In contrast, V106A is the only mutation
at this position that occurs in more than 0.5% of treated individuals infected with subtype B viruses.
2. Genotype-Phenotype Correlations
The main page of the Genotype-Phenotype Correlations section links to four interactive query
pages: three dynamically updated data summaries and one regularly updated downloadable
dataset.
A. Drug Resistance Positions – Query for levels of resistance associated with known drug
resistance mutations
B. Detailed Phenotype Queries – Queries for levels of resistance associated with individual
mutations or mutation combinations at all positions of protease, RT, and integrase
C. Patterns of Drug Resistance Mutations
D. Downloadable Reference Datasets
A. Drug Resistance Positions
The Drug Resistance Positions query form lists known PI, NRTI, and NNRTI drug-resistance
positions. In this example, position 215 in RT, a known NRTI-resistance position, is selected. The
results show aggregate drug-susceptibility data on viruses with the most common patterns of
mutations that also contain mutations at RT position 215. The first row lists viruses with the
combination of the following four NRTI-resistance mutations: M41L, M184V, L210W, and T215Y.
Phenotypic drug-susceptibility results for AZT were available in the database for 19 viruses with
this pattern of mutation. The median reduction in susceptibility (or level of resistance) to AZT for
viruses with these mutations is 7.5 fold, with an inter-quartile range of 5.2 to 27 fold.
It is important to bear in mind that the clinical significance of different levels of resistance varies
among different drugs. Drug susceptibility results on viruses with T215Y alone (i.e. without any
other accompanying mutations) are available for eight viruses. The median reduction in
susceptibility was 13 fold. The higher level of resistance in these viruses compared with those
containing all four mutations is explained by the fact that the mutation M184V increases
susceptibility to AZT.
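The median and inter-quartile range reported for each mutation pattern can be reproduced from a list of fold-change values. The following is a minimal Python sketch; the function name `summarize_fold_changes` and the sample values are hypothetical, and the quartile convention used here (Python's "inclusive" method) is an assumption, since the guide does not state which convention HIVDB uses.

```python
import statistics

def summarize_fold_changes(fold_changes):
    """Return (median, q1, q3) of fold reductions in susceptibility."""
    # quantiles(n=4) returns the three quartile cut points; the middle one is the median.
    q1, _, q3 = statistics.quantiles(fold_changes, n=4, method="inclusive")
    return statistics.median(fold_changes), q1, q3

# Hypothetical fold-change values for viruses sharing one mutation pattern.
example = [4.0, 5.0, 7.5, 20.0, 30.0]
median, q1, q3 = summarize_fold_changes(example)
```

Different quartile conventions give slightly different ranges for small samples, so values computed this way may not match the database output exactly.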
B. Detailed Phenotype Queries
The query form allows users to specify (i) individual mutations or combinations of mutations, (ii)
one or more ARVs, and (iii) one or more methods of susceptibility testing. When specifying
mutations, it is possible to request sequences with an exact match (i.e. no other major drug-resistance
mutations are present in the retrieved sequence) using the check box labeled ‘With no
other NRTI/NNRTI mutations’ or ‘With no other PI mutations’. The figures below show two
examples of Detailed Phenotype queries.
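The difference between a default match and an exact match can be expressed as a set comparison. The sketch below is illustrative only: the function name `matches_pattern` is hypothetical, and the `major` set is a made-up stand-in, not HIVDB's actual list of major drug-resistance mutations.

```python
def matches_pattern(isolate_mutations, query_mutations, major_mutations, exact=False):
    """Return True if the isolate carries every queried mutation.

    With exact=True, also require that the isolate has no major
    drug-resistance mutation other than the queried ones.
    """
    isolate = set(isolate_mutations)
    query = set(query_mutations)
    if not query <= isolate:
        return False
    if exact:
        extra_major = (isolate & set(major_mutations)) - query
        return not extra_major
    return True

# Illustrative stand-in for a list of major mutations (assumption).
major = {"K103N", "V179F", "Y181C", "G190A"}
query = {"Y181C", "V179F"}

isolate_a = {"Y181C", "V179F", "R211K"}   # extra mutation is not in `major`
isolate_b = {"Y181C", "V179F", "K103N"}   # extra mutation is in `major`

default_hits = [matches_pattern(i, query, major) for i in (isolate_a, isolate_b)]
exact_hits = [matches_pattern(i, query, major, exact=True) for i in (isolate_a, isolate_b)]
```

Both isolates satisfy the default query, but only the first satisfies the exact match, because other mutations that are not classified as major do not disqualify a sequence.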
Sample Query #1: Y181C+V179F and etravirine (ETR) susceptibility (Virco assay). This query
returns four laboratory virus isolates with mutations that emerged during in vitro passage or that
were introduced during site-directed mutagenesis. The results show that these two mutations
alone are associated with high-level (32-to-130 fold) resistance to ETR.
Sample Query #2: Mutations at positions 10 + 46 + 54 + 82 + 84 + 90, All PIs, and the
PhenoSense Assay are selected. The output shows that 26 results on five isolates from four
references are available. This pattern of mutations is associated with resistance to each of the
seven first-generation PIs: ATV, FPV, IDV, LPV, NFV, RTV, and SQV. No susceptibility data are
available for the second-generation PIs TPV and DRV for this pattern of mutations using the
PhenoSense Assay. However, running the same query and specifying the Virco Assay yields six
viruses, of which five were tested for TPV susceptibility and four were found to have high-level TPV
resistance.
C. Patterns of Drug Resistance Mutations
This section contains dynamically updated datasets of drug susceptibility results obtained with the
PhenoSense assay of viruses containing the most common patterns of drug resistance mutations.
There are separate datasets for NRTI-, NNRTI-, and PI-resistance mutations. In each table, the
first column contains the mutation pattern. The second column contains the number of sequences
in the database with that pattern. The remaining columns contain the median fold decrease in
susceptibility for the drug listed in the column header. The subscript indicates the number of results
available for a particular mutation pattern and drug. Sequences containing electrophoretic
evidence of a mixture of wildtype and mutant variants at a major drug-resistance position are
excluded from this table.
The figures below show the top part of the tables for the NRTI and NNRTI mutation datasets.
Viruses without drug resistance mutations (row 1) have median fold reductions in susceptibilities of
<1.0. Low and high levels of decreased susceptibility are shown in pink and red cells, respectively.
The high level of cross-resistance among the first-generation NNRTIs is readily apparent. There
are insufficient data (particularly with the PhenoSense assay) on the second-generation NNRTI
etravirine to include it in the table. However, based on approximately 100 results, obtained
primarily with the Virco assay, viruses with most of these patterns of NNRTI-resistance mutations
are likely to retain etravirine susceptibility.
D. Downloadable Reference Datasets
This section provides researchers the opportunity to download all of the drug susceptibility data in
HIVDB. The data are provided as six text files that can be parsed based on the description of the
fields in the dataset. This section is updated every 6 to 12 months.
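A downloaded text file can be read with standard tools once its field layout is known. The sketch below assumes a tab-delimited file with a header line; that layout, the function name `read_susceptibility_rows`, and the column names `IsolateID`, `Drug`, and `Fold` are hypothetical placeholders, so the field description supplied with each dataset should be consulted for the real names.

```python
import csv
import io

def read_susceptibility_rows(handle, delimiter="\t"):
    """Yield one dict per data row, keyed by the names in the header line."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        yield row

# In-memory stand-in for a downloaded file (hypothetical columns and values).
sample_text = "IsolateID\tDrug\tFold\nA1\tAZT\t7.5\nA2\tAZT\t13\n"
rows = list(read_susceptibility_rows(io.StringIO(sample_text)))
azt_folds = [float(r["Fold"]) for r in rows if r["Drug"] == "AZT"]
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="")`.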
3. Genotype-Clinical Correlations
This part of the database has two main sections:
A. Clinical Trials Datasets
B. Summaries of Clinical Studies
A. Clinical Trials Datasets
This section contains data linking ARV treatments, genotypic resistance data, and virological
response (plasma HIV-1 RNA levels) to a new treatment regimen. The data are from past clinical
trials and are supplemented with datasets from well-characterized retrospective studies.
The figure below shows the introductory page to the data from ACTG trial 384. The row labeled
‘References’ has links to PDFs of the study’s publication. The row labeled ‘HIVDB’ provides links to
the RT and protease data as they appear in the References section of the database. The row
labeled ‘Sequence Quality’ links to the sequences that were excluded from the dataset due to an
issue with the quality of the sequences or accompanying data. The row labeled ‘Sequences’
contains links to the complete set of protease and RT sequences from the study for download (text
files) or browsing (html files). The row labeled ‘Browse’ contains links to summary figures with
information on one individual per figure. The row labeled ‘Dataset’ contains links to pages for
downloading complete sets of data from the study (ARV treatments, genotypes, and plasma HIV-1
RNA levels).
The figure on the right below shows the page with 902 summary figures. The 902 figures contain
a total of 1,158 protease and RT sequences because some individuals had sequences obtained at
more than one time point. Beneath these two screen shots are examples of two summary figures
with explanations.
The summary figures provide a quality control check for the correct temporal relationship
between treatments, genotypes, and plasma HIV-1 RNA levels. The examples that follow
summarize data from two individuals who developed consecutive virological failures. The initial
ARV treatment regimens used in these individuals are no longer recommended today.
Individual 39659 initiated therapy with the NRTIs d4T + ddI and the PI nelfinavir (NFV). Virological
failure developed with the canonical NFV-resistance mutations D30N + N88D and the four NRTI-resistance
mutations K65R, D67G, K70E, and Q151M. The subsequent regimen containing the
NRTIs AZT + 3TC and the NNRTI efavirenz (EFV) was pre-determined by the ACTG 384 study
protocol. Genotypic resistance testing to guide clinical decision making was not yet standard at this
time. Although there was an initial response to the second regimen, most likely due to the potent
antiviral activity of EFV, the response was short-lived, presumably because the accompanying
NRTIs were ineffective in the face of the NRTI-resistance mutations that were present at the start
of therapy. Although a follow-up genotype is not shown, one would expect the resulting virus to
have developed EFV resistance.
Individual 40207 initiated therapy with the NRTIs d4T + ddI and EFV. Virological failure ensued
with the development of the NNRTI-resistance mutation K103N and the NRTI-resistance mutation
T215Y. Subsequent therapy with AZT + 3TC + NFV resulted in the addition of the 3TC-resistance
mutation M184V and the NFV-resistance mutation N88S. Subsequent salvage therapy with
ritonavir-boosted amprenavir was successful, possibly because N88S is known to render HIV-1
more susceptible to amprenavir. It is notable that, despite the virological failures, both individuals
exhibited a gradual increase in their CD4 counts, indicating that therapy is often beneficial even if
virological suppression is incomplete.
B. Summaries of Clinical Studies
There have been many studies of the association between pre-therapy drug-resistance mutations
and the virological response to a new treatment regimen containing a previously unused ARV.
Because none of the raw data from such studies have been published, we have summarized these
studies in this section. About 50 studies of this type have been published, including more than 30
for PIs, about 20 for NRTIs, and one for NNRTIs. Each of these studies is underpowered, due to
the many different combinations of mutations often present at baseline and the many covariates
associated with the virological response to a new drug: pre-therapy plasma HIV-1 RNA level and
CD4 count, the extent of past ARV treatment, and the drugs used in combination with the ARV
being analyzed. The studies are therefore most reliable when they identify mutations for which
there is independent evidence of an association with resistance (such as phenotypic data or the
emergence of the mutation during drug exposure) or when multiple studies of this type identify the
same drug resistance mutations as being associated with virological failure. The screen shots
below show parts of the page summarizing baseline protease mutations and the virological
response to a new PI-containing regimen.
4. References
This part of the database has two main sections: one with summaries of the data from each of the
references in HIVDB and one in which every primate immunodeficiency virus sequence in
GenBank is annotated according to its presence or absence in HIVDB.
A. Studies in HIVDB
B. GenBank <=> HIVDB
A. Studies in HIVDB
This page lists the more than 900 references in the database in alphabetical order by the last
name of the first author. The scroll box makes it possible to find references using the name of any
of the references’ authors. The first column, which contains the name of the first author and year of
publication, is linked to the reference itself (the PubMed entry, a meeting poster or abstract, or a
description of the origin of the sequences in the study). The last column contains a link to the data
present in the reference.
The first page of a reference contains a link to the reference and summarizes the number and
types of virus isolates in that reference. In example #1 below, the reference contains 14 clinical
integrase isolates. These were the first publicly available isolates from individuals receiving the
integrase inhibitor raltegravir. In example #2, the reference contains 3,195 clinical isolates
containing RT and protease.
Page with Number of Isolates in a Reference: Example 1
Page with Number of Isolates in a Reference: Example 2
The ‘IN Clinical: 14’ link for the Charpentier study takes us to a page containing a table which
summarizes the isolates by Subject ID, Isolate ID, ARV treatment, and a list of the mutations in the
sequence according to whether the mutations are known major or minor mutations, other
commonly observed variants, or unusual variants. The drop-down menus at the top of this page
allow users to download the complete set of sequences in the study or to obtain four other views of
the data: ‘Complete Rx’, ‘Isolate Data’, ‘Mutation Categories’, and ‘Susceptibility Data’.
The ‘PR Clinical: 3195’ link for the Baxter study takes us to a page with the same format as
described above. Each of the sequences in this study (from the RESIST clinical trial) was
submitted with the approval of the study’s sponsor, Boehringer Ingelheim, to GenBank by the
study’s authors, something that is not done nearly often enough. The authors also provided a
large number of phenotypic susceptibility results for viruses obtained from the study subjects.
The susceptibility data can be viewed by choosing ‘Susceptibility Data’ from the drop-down menu
on the first page. A sample of the data is shown in the figure below.
B. GenBank <=> HIVDB
This part of the database organizes HIV-1, HIV-2, and non-human primate lentivirus pol (RT,
protease, and integrase) sequences according to the sequence’s primary reference cited in the
GenBank annotation. This makes it possible to summarize the more than 100,000 pol sequences
in GenBank using a list of about 1,300 references. The figure below contains a table listing these
references. The scroll box on the left makes it possible to search the GenBank references by
author. The scroll box on the right makes it possible to sort the entries in the table by each of the
fields in the table.
The first field contains the author and year of publication of the primary reference in the GenBank
entry. If it is a PubMed reference then the field links to the PubMed abstract. The fourth field is the
BLAST E-value of the sequence in the study with the lowest value. The E-value is a measure of
the similarity of the sequence to the HIV-1 consensus B sequence; the lower the E-value, the more
similar the sequence is to the HIV-1 consensus B sequence. The ‘# in GB’ field indicates the
number of sequences in GenBank that cite this reference as the primary reference. The ‘# in
HIVDB’ field indicates the number of sequences from this GenBank reference that are in HIVDB.
The Annotation field indicates if the sequences from the study are in HIVDB (‘HIVDB’), if the
sequences are being evaluated for entry into HIVDB (‘Pending’, ‘New’, ‘Unpublished’, and ‘ARV Rx
and/or other data are N/A’), or have been intentionally left out of HIVDB (‘Gene fragments’,
‘Sequence quality’, ‘Laboratory/experimental isolate’, and ‘Evolution/quasispecies study’). The
annotations themselves link to a page with a table that defines each annotation and provides a
rationale for inclusion or exclusion in HIVDB. ‘Evolution/quasispecies studies’ are studies with
many clones and/or sequences from a few individuals that address questions about HIV evolution
but not necessarily about drug resistance. The decision to exclude the sequences from these types
of studies will be periodically re-evaluated as more resources become available.
5. New Submissions
Approximately every three months, the New Submissions section lists the studies that have been
entered into HIVDB. The study title links to the introductory page of the study in the References
section.
6. Database Statistics
Database Statistics (http://hivdb.stanford.edu/cgi-bin/Summary.cgi)
II. INTERACTIVE PROGRAMS
HIVDB has seven main interactive programs.
1. HIVdb Program
A. Mutation List Analysis
B. Sequence Analysis
C. HIVdb Output
D. Sierra Web Service
E. Release Notes
F. Algorithm Specification Interface (ASI)
2. HIValg Program
3. HIVseq Program
4. Calibrated Population Resistance (CPR) tool
5. Mutation ARV Evidence Listing (MARVEL)
6. ART-AiDE
7. Rega HIV-1 Subtyping tool
1. HIVdb Program
The screen shot below shows the introductory page to the HIVdb program. There are three ways in
which the program can be used: (i) entering a list of protease and RT mutations, (ii) entering a
complete sequence containing protease, RT, and/or integrase, and (iii) using a Web Service. The
introductory page also contains links to the release notes for HIVdb, HIVseq, and HIValg. Because
the Mutation List and Sequence analysis yield similar results, these results are reviewed together.
HIVdb Introductory Page (http://sierra2.stanford.edu/sierra/servlet/JSierra)
A. Mutation List Analysis
The screen shot below shows the form for entering a list of mutations for analysis. There are two
ways for entering mutations: (i) text boxes and (ii) drop-down lists. The page currently contains
forms for RT and protease; a form for integrase is pending.
(i) Text boxes: The quickest way to enter mutations is by using the RT and protease text boxes.
Each amino acid must be entered as a capitalized one-letter code because lowercase
letters are reserved to indicate insertions (ins) and deletions (del). Mutations must be
separated by one or more spaces or commas. The order of the mutations is not relevant. A
preceding consensus amino acid is not necessary. Mixtures of more than one amino acid are
indicated by the presence of more than one amino acid following the amino acid position.
Intervening slashes between the mutations in a mixture are optional.
(ii) Drop-down lists: The drop-down menus are useful primarily when the user does not have a
mutation list to copy directly into a text box. The drop-down menus list amino acids at a position
known to be associated with drug resistance. If a mutation is not listed on a drop-down menu, it is
possible to enter it by clicking on the asterisk (*) symbol. Alternatively, it is possible to enter
such mutations into the text box, which can be used in conjunction with the drop-down lists.
An optional sequence identifier and date can be entered and will appear on the printed report.
HIVdb Program: Mutation List Analysis Form
(http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseqMutationsInput)
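The text-box rules described above (space or comma separators, an optional leading consensus amino acid, mixtures with optional slashes, and lowercase reserved for ins and del) can be sketched as a small parser. This is an illustration of the stated rules, not the HIVdb program's own parser; the function name `parse_mutation_list` and the exact spelling accepted for insertions and deletions ("ins", "del") are assumptions.

```python
import re

# Optional consensus letter, position, then one or more residue codes.
TOKEN = re.compile(r"^([A-Z]?)(\d+)([A-Za-z/]+)$")
RESIDUE = re.compile(r"ins|del|[A-Z]")

def parse_mutation_list(text):
    """Parse a free-text mutation list into sorted (position, residues) tuples."""
    parsed = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        _consensus, position, residue_text = match.groups()
        residue_text = residue_text.replace("/", "")  # slashes in mixtures are optional
        residues = RESIDUE.findall(residue_text)
        # Reject leftovers such as stray lowercase letters.
        if "".join(residues) != residue_text:
            raise ValueError(f"unrecognized residue code in {token!r}")
        parsed.append((int(position), tuple(residues)))
    return sorted(parsed)  # input order is not relevant

example = parse_mutation_list("M184V, 41L 215Y/F  K103NS 69ins")
```

Here `example` holds one entry per position, with the mixtures at positions 103 and 215 expanded into two residues each; an entry such as `184v` raises a `ValueError` because lowercase single letters are not valid amino acid codes.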
B. Sequence Analysis
The screen shot below shows the form for entering nucleic acid sequences for protease, RT,
and/or integrase. There are three ways in which sequences can be entered: (i) entering one or
more sequences into the Text Input box, (ii) uploading a text file containing one or more sequences
or (iii) entering a GRF XML file for TruGene sequences.
If a single sequence is entered it can be in FASTA format (i.e. preceded by ‘>’, a sequence
descriptor, and a new-line) or as plain text consisting only of nucleic acids. If multiple sequences
are entered (up to 100 at a time can be entered), they must be in FASTA format.
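The input rules above can be sketched in Python as follows. The function name `read_sequences` is hypothetical, and converting the sequence to upper case is an assumption made for convenience; the limit of 100 sequences comes from the text.

```python
def read_sequences(text, max_sequences=100):
    """Split pasted input into (name, sequence) pairs.

    Plain text without a '>' header is treated as one unnamed sequence.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    if not lines[0].startswith(">"):
        if any(line.startswith(">") for line in lines):
            raise ValueError("multiple sequences must all be in FASTA format")
        return [("", "".join(lines).upper())]
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append([line[1:], []])    # start a new record at each header
        else:
            records[-1][1].append(line.upper())
    if len(records) > max_sequences:
        raise ValueError(f"at most {max_sequences} sequences per submission")
    return [(name, "".join(parts)) for name, parts in records]

fasta_records = read_sequences(">a\nACGT\nAC\n>b\nGG\n")
plain_record = read_sequences("acgt\nac")
```

Sequence lines that wrap across several lines are joined into a single string per record, which matches how a wrapped FASTA entry is normally interpreted.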
The QA Analysis, Mutation Scores, and Mutation Comments options are selected by default. An
optional sequence identifier and date can be entered and will appear on the printed report.
HIVdb Program: Sequence Analysis Form
(http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseqSequenceInput)
C. HIVdb Output
The section below contains multiple screen shots comprising the output that results from entering
the following sequence into the Text Input box.
>NC599-1997|AY030413
CCTCAAATCACTCTTTGGCAACGACCCATCGTCACAATAAGGATAGGAGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAG
AAATGAATTTGCCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTGTCAAAGTAAGACAGTATGAGCAGATACCCGTAGAAATCTGCGGACA
TAAAGTTATAGGTACAGTATTAGTAGGACCTACACCTGCCAACATAATTGGAAGAAATCTGATGACTCAGCTTGGTTGTACTTTAAATTTTCCCATTAGTCCTATT
GAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAGGAAAAAATAAATGCATTAGTAGAAATTTGTGCAGA
AATGGAAAAGGAAGGGAAAATTTCWAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCYATAAAGAAAAAGAACAGTACTAGATGGAGAAAATTAG
TAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCKCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTG
GATGTGGGTGATGCATACTTTTCAGTTCCCTTATATGAAGACTTTAGAAAGTATACTGCATTTACCATACCTAGTAAAAACAATGAGACACCAGGGATTAGATAC
CAGTATAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGACAACAAAATCCAGACCTAGT
TATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGACAACATCTGTTGAGGTGGGGAT
TTTTCACACCAGATCAAAAACATCAGAARGAACCYCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATACAGCTGCCAGAA
The screenshot below shows the Summary Data and Sequence Quality Assessment sections.
The Summary Data section indicates which codons in protease, RT, and/or integrase are present
in the sequence. It notes whether there are insertions and/or deletions and shows the subtype of
the protease and RT reference sequences that are most similar to the submitted sequence
(‘uncorrected distance’) as well as the distance between the submitted sequence and the subtype
reference. This subtype should be considered as only a first approximation. As indicated in the
release notes, there are several programs that provide much more reliable HIV-1 subtyping but
these are not implemented here because these other programs either may not work on short
sequences (e.g. protease alone), may report an inconclusive result, or may take several seconds
to minutes to assign a subtype. We are actively working to improve the subtyping of the HIVdb
program.
The Sequence Quality Assessment section identifies areas of poor sequence quality as
indicated by the presence of stop codons or frame shifts, highly ambiguous nucleotides (B, D, H,
V, N), or unusual residues (defined as mutations that occur at a frequency less than 0.05% in
HIVDB). Although one or two highly ambiguous nucleotides or unusual residues may occur in a
typical sequence, a localized cluster of such nucleotides suggests a problem with sequence quality
in that region. The figures to the right of the table use red lines to indicate positions with a QA
problem. Blue lines indicate differences from consensus B: tall blue lines indicate positions
associated with drug resistance and short blue lines indicate other mutations.
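One of the quality checks described above, flagging highly ambiguous nucleotides and looking for localized clusters, can be sketched as follows. The set B, D, H, V, N comes from the text; the function names and the cluster definition (three or more flagged positions within 30 bases) are illustrative assumptions, not the rule used by the HIVdb program.

```python
HIGHLY_AMBIGUOUS = set("BDHVN")  # IUPAC codes standing for three or four bases

def highly_ambiguous_positions(nucleotides):
    """Return 1-based positions of highly ambiguous nucleotide codes."""
    return [i for i, base in enumerate(nucleotides.upper(), start=1)
            if base in HIGHLY_AMBIGUOUS]

def has_cluster(positions, window=30, min_count=3):
    """True if at least min_count flagged positions fall within `window` bases.

    The window and count are illustrative assumptions.
    """
    positions = sorted(positions)
    for i in range(len(positions) - min_count + 1):
        if positions[i + min_count - 1] - positions[i] < window:
            return True
    return False

flagged = highly_ambiguous_positions("ACGTNWYB")
```

Two-base ambiguity codes such as W, Y, K, and R, which appear in the sample sequence above, are not flagged by this check because they are not in the highly ambiguous set.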
HIVdb Output: Summary Data and QA
The screenshot below shows the PI resistance interpretation. The protease mutations are
divided into three categories: major PI resistance mutations, minor PI resistance mutations, and
other mutations. Major mutations are defined as mutations that by themselves can reduce
susceptibility to one or more PIs or as non-polymorphic mutations that are widely considered to be
important in drug resistance. Minor mutations are generally considered to be accessory mutations.
All major and minor mutations receive a score and/or have an associated comment. The other
category includes mutations that do not receive a score. Some of these mutations may trigger a
comment if they have ever been considered to be associated with drug resistance. Highly unusual
mutations at major PI resistance mutation positions will be placed on the first line but may not
receive a penalty score. However, such mutations will trigger a comment indicating that they are
unusual mutations at an important position.
After the mutations are classified, the program estimates the level of resistance to PIs based on
the mutations in the submitted sequence. This section designates one of five levels of estimated
drug resistance based on the total point score in the submitted sequence: susceptible, potential
low-level resistance, low-level resistance, intermediate resistance, and high-level resistance. The
Release Notes section explains how the mutation penalty scores are designed and used to assign
these various levels.
The Comments section has comments for each of the major and minor PI-resistance mutations
as well as for some of the other mutations. There are also several special comments such as
the widely publicized genotypic susceptibility scores (GSSs) for TPV and DRV from the RESIST
and POWER trials, respectively.
The categorization of protease mutations into major, minor, and other is used consistently
throughout the database. It is occasionally modified, and the latest categorization can always
be found in the Release Notes.
HIVdb output: Protease Mutations and PIs
The next screenshot shows the RTI resistance interpretation. The RT mutations are divided into
three categories: NRTI-resistance mutations, NNRTI-resistance mutations, and other mutations. All
NRTI- and NNRTI-resistance mutations receive at least one non-zero score. The ‘other’ category
includes mutations that do not receive a score. Some of these mutations may trigger a comment if
they have ever been considered to be associated with drug resistance. Highly unusual mutations
at positions associated with NRTI and NNRTI resistance will be placed on the first and second
lines, respectively, but may not receive a penalty score. However, such mutations will trigger a
comment indicating that they are unusual mutations at an important position.
Following the classification of mutations, the program assigns an estimated level of resistance to
NRTIs and NNRTIs based on the mutations in the submitted sequence. The Comments section
has comments for each of the NRTI and NNRTI resistance mutations as well as for some of the
‘other’ mutations. There is also a comment for the etravirine genotypic susceptibility score (GSS)
from the DUET trials.
HIVdb output: RT Mutations, NRTIs, and NNRTIs
The final part of the HIVdb output is a list of the mutation-penalty scores associated with each
mutation in the submitted sequence. Each score is hyperlinked to data supporting the association
between the mutation and each ARV through the Mutation ARV Evidence Listing (MARVEL)
program. A complete list of all mutation scores can be found in the Release Notes and in several
other locations in HIVDB.
HIVdb Output: Mutation Scoring Table
D. Sierra Web Service
Sierra is a web service that allows individuals and institutions to interact programmatically with the
HIVdb program. Sierra accepts sequences from registered users and returns an XML file with the
HIVdb scores, interpretations, and comments. These can then be parsed computationally to
generate automated and customized reports. The web service has been in use for more than three
years and is explained in detail on a separate page.
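The XML returned by Sierra can be parsed with any standard XML library. The sketch below uses Python's xml.etree.ElementTree. The element and attribute names (report, drug, code, score, level, comment) and the sample values are hypothetical placeholders, because the actual Sierra schema is documented on the separate web service page.

```python
import xml.etree.ElementTree as ET

# Hypothetical response for illustration; NOT the real Sierra schema or real scores.
SAMPLE_XML = """
<report>
  <drug code="LPV/r" score="45" level="Intermediate resistance"/>
  <drug code="DRV/r" score="5" level="Susceptible"/>
  <comment>Example comment text.</comment>
</report>
"""


def summarize(xml_text):
    """Collect per-drug (score, level) pairs and comments from a report."""
    root = ET.fromstring(xml_text.strip())
    scores = {
        drug.get("code"): (int(drug.get("score")), drug.get("level"))
        for drug in root.iter("drug")
    }
    comments = [c.text.strip() for c in root.iter("comment") if c.text]
    return scores, comments
```

A report generator would call summarize on each returned document and format the resulting dictionary and comment list as needed.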
Sierra Web Service (http://hivdb.stanford.edu/DR/webservices/)
E. Release Notes
The HIVdb Release Notes also cover HIVseq and HIValg. The release notes explain how the three
programs work, how mutations are classified, how mutation penalty scores are derived and how
the scores are combined to generate an estimate of resistance from a submitted sequence. They
also contain links to the mutation penalty scores, mutation comments, lists of not-uncommon
mutations, program updates, downloadable files containing the HIVdb code, the consensus B
protease, RT, and integrase amino acid sequences, and sample sequence datasets.
Release Notes Table of Contents (http://hivdb.stanford.edu/DR/asi/releaseNotes/index.html)
F. Algorithm Specification Interface (ASI)
The ASI is a common platform for coding genotypic interpretation algorithms. It comprises an XML
format for specifying an algorithm and a compiler that transforms the XML into executable code.
The ASI makes it possible for drug resistance experts to develop and test genotypic interpretation
algorithms without the assistance of a computer programmer.
Algorithm Specification Interface (http://hivdb.stanford.edu/DR/asi/index.html)
2. HIValg Program
The HIValg Program provides genotypic resistance interpretations using three algorithms: HIVdb,
ANRS, and Rega. It is made possible courtesy of researchers at the ANRS (Agence Nationale de
Recherches sur le SIDA) and Rega Institute. Each of the algorithms is implemented through the
Algorithm Specification Interface (ASI). Like HIVdb, HIValg can be run using either submitted mutations
or sequences.
HIValg Program: Introductory Page (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivalgs)
The HIValg Program also allows users to interpret sequences with an algorithm of their own
design. To do so, users must submit an ASI-compliant algorithm (File Upload in the following
figure) with their sequences.
HIValg Program: Sequence Analysis
(http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseqSequenceInput)
3. HIVseq Program
The HIVseq Program identifies mutations in a submitted sequence and reports their prevalence in
HIVDB according to subtype and treatment history. Like HIVdb and HIValg, HIVseq accepts either
complete sequences or lists of mutations.
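Identifying mutations amounts to comparing a submitted amino acid sequence with a reference, position by position. The sketch below assumes the two sequences are already aligned and of equal length; the function name and the short toy sequences are made up for illustration and are not the consensus B sequences used by HIVseq.

```python
def call_mutations(reference, query, start=1):
    """List substitutions between two aligned, equal-length amino acid strings.

    Each mutation is written as reference residue, position, observed residue
    (for example T3S). `start` is the position number of the first residue.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{aa}"
        for pos, (ref, aa) in enumerate(zip(reference, query), start=start)
        if ref != aa
    ]
```

For the toy reference MKTAYI and the query MKSAYI, the function reports a single substitution at position 3. The resulting list uses the same notation as the mutation lists that HIVseq accepts as input.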
HIVseq Program: Introductory Page (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=hivseq)
The figure below was generated in response to the mutation list L10F, M36I, I54M, T80S, V82A,
and I93L. Several conclusions can be drawn from the results: (i) although L10I and L10V occur in
previously untreated individuals with all virus subtypes, L10F occurs only among treated
individuals; (ii) although M36I occurs in only 13% of untreated individuals with subtype B viruses, it
is the consensus in all other subtypes; (iii) I54M occurs only among treated individuals regardless
of subtype; and (iv) T80S has not been reported in more than 0.5% of sequences of any subtype
regardless of treatment status.
HIVseq Program: Output
Each mutation in the output links to the reports of that mutation in a particular subtype. For
example, clicking on the T (1%) at position 36 in PI-naïve individuals with subtype C viruses
returns a page with the 20 reports of M36T in PI-naïve individuals with subtype C viruses.
HIVseq Hyperlink Example: M36T in PI-Naïve Subtype C
4. Calibrated Population Resistance (CPR) Tool
CPR is a program for performing routine analysis of sets of HIV-1 protease, RT, and integrase
sequences. The program provides a standardized approach for estimating the prevalence of
transmitted HIV drug resistance using population-sampled sequence data and for general batch
analysis of HIV-1 pol gene sequences. The program interface consists of a text box in which users
submit a set of sequences in FASTA format.
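For users preparing input, a FASTA file consists of a header line beginning with ">" followed by one or more lines of sequence. The minimal reader below is a sketch for checking a file before submission; it is not part of CPR, and the function name and fallback naming scheme are assumptions.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the name; fall back to a counter.
            name = header.split()[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts).upper() for n, parts in records.items()}
```

Sequences split across several lines are joined, and lowercase input is normalized to uppercase.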
Because sequences from individual drug resistance surveillance studies are rarely made publicly
available, CPR makes it possible for epidemiologists to analyze different sequence datasets using
precisely the same methods as published studies. CPR also ensures consistency in the handling
of missing data, such as when sequences are incomplete or of poor quality. Although CPR
uses the 2009 Surveillance Drug Resistance Mutation (SDRM) list by default to estimate
resistance, each of the previous lists is also available to ensure backward compatibility.
Users also have the option to run the STAR genotyping and the HIVdb genotypic resistance
interpretation programs on each submitted sequence.
CPR Form (http://cpr.stanford.edu/cpr.cgi)
The CPR Release Notes are separate from the release notes for HIVdb, HIValg, and HIVseq.
CPR Release Notes (http://cpr.stanford.edu/pages/releaseNotes.html)
In addition to explaining the design and output of the program, the CPR Release Notes contain
an appendix with lists of mutations used to characterize each sequence. In particular, each
sequence is characterized according to a list of surveillance drug-resistance mutations, a list of
borderline/suspicious mutations that are not usually found in untreated individuals, a list of unusual
mutations (defined as those found in fewer than 0.05% of sequences in HIVDB), and a list of
mutations suggesting APOBEC-3G-mediated G-to-A hypermutation.
CPR: Mutation Lists (http://cpr.stanford.edu/pages/releaseNotes.html#appendix1)
The following figures show the output generated by the CPR program in response to the 283
sequences described in Dilernia DA et al. AIDS Res Hum Retroviruses 2007;10:1201-7 (HIV-1
genetic diversity surveillance among newly diagnosed individuals from 2003 to 2005 in Buenos
Aires, Argentina). The figures below show the three most important sections: (i) a summary of total
and class-specific resistance (section 3), (ii) a list of the surveillance drug-resistance mutations (as
defined by SDRM 2009) present in a dataset (section 5), and (iii) a list of the sequences with
SDRMs (section 6).
The results show an overall prevalence of resistance of 3.9% including 1.8% with NRTI-resistance
SDRMs, 1.4% with NNRTI-resistance SDRMs, and 1.4% with PI-resistance SDRMs. Section 6
shows that two isolates had two-class resistance with NRTI- and PI-resistance SDRMs.
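The class-specific percentages add up to more than the overall prevalence because a sequence with SDRMs from two classes is counted once in the overall figure. The sketch below reproduces the arithmetic. The per-class counts (5, 4, and 4 sequences) are assumptions back-calculated from the rounded percentages and the 283 sequences; they are not values read from the CPR output, and the sequence identifiers are placeholders.

```python
TOTAL_SEQUENCES = 283

# Placeholder identifiers; counts are back-calculated assumptions.
nrti = {"s1", "s2", "s3", "s4", "s5"}
nnrti = {"s6", "s7", "s8", "s9"}
pi = {"s1", "s2", "s10", "s11"}  # s1 and s2 carry both NRTI and PI SDRMs

any_class = nrti | nnrti | pi  # each sequence counted once


def percent(count, total=TOTAL_SEQUENCES):
    """Prevalence as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


overall = percent(len(any_class))
by_class = {
    "NRTI": percent(len(nrti)),
    "NNRTI": percent(len(nnrti)),
    "PI": percent(len(pi)),
}
```

Under these assumed counts, 13 class-specific hits fall in 11 distinct sequences, which is consistent with the reported overall and class-specific percentages.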
CPR Output: Sections 2, 3, 5, and 6
5. Mutation ARV Evidence Listing (MARVEL)
MARVEL provides the underlying data and references that link specific mutations to specific drugs.
It brings together genotype-treatment, genotype-phenotype, and genotype-virological outcome
correlations. Each mutation penalty score in HIVdb is linked to the MARVEL output for that
mutation. The program can also be accessed directly from the form shown below.
MARVEL Query Form (http://hivdb.stanford.edu/cgi-bin/Marvel.cgi)
MARVEL output includes the following mutation-specific summaries: (i) HIVdb comments and
scores, (ii) mutation prevalence according to subtype and drug class experience, (iii) mutation
prevalence according to treatment with individual ARVs, (iv) genotype-phenotype correlations, (v)
genotype-phenotype logistic regression coefficients, and (vi) a list of genotype-clinical outcome
correlations with the mutation of interest shown in bold.
MARVEL: Comments and Scores
MARVEL: Mutation Frequency by Subtype and Drug Class Experience
MARVEL: Mutation Frequency by Treatment with Specific ARVs
MARVEL: Genotype-Phenotype Correlations of Common Mutation Patterns
MARVEL: Genotype-Phenotype Logistic Regression Coefficients
MARVEL: Genotype-Clinical Outcome Correlations
6. ART-AiDE
The Antiretroviral Therapy Acquisition and Display Engine (ART-AiDE) makes it possible to
generate a permanent electronic and graphical record of a patient’s ARV history, plasma HIV-1
RNA levels, CD4 counts, and, when available, genotypic resistance data. The submitted data can
be reviewed and the underlying XML file can be saved on the Graphical Summary page.
Using the form below it is possible to enter data for a new patient or to load pre-existing
information saved in an XML file. Here the ART-AiDE summary 3936.xml has been selected from
the user’s desktop.
This page also contains links to the ART-AiDE Release Notes and to a page with ART-AiDE XML
schema updates.
ART-AiDE Entry Point (http://dbpartners.stanford.edu/DDCRP/pages/art_aide.html)
The graphical summary generated by ART-AiDE displays the ARV treatment history, plasma
HIV-1 RNA levels, CD4 counts, and major mutations present in a drug resistance genotype. The
menu options at the bottom of the page bring the user to a page where the data can be examined
in its entirety and where edits or updates to the patient record can be made and saved to the
user’s desktop. A beta version of the eClinical Antiretroviral Resistance Estimator (eCARE)
program is accessible from this page. eCARE uses the ARVs previously received by an individual
to query HIVDB for genotypic data from individuals with similar treatment histories.
ART-AiDE: Graphical Summary
7. Rega HIV-1 Subtyping Tool
The Rega HIV-1 Subtyping Tool is the gold standard for HIV-1 subtyping. It was developed by
Tulio de Oliveira, Koen Deforche, Sharon Cassol, Andrew Rambaut, and Anne-Mieke Vandamme
as part of a collaboration between the Evolutionary Group at Oxford (UK), the Immunotherapeutics
Program at the University of Pretoria (SA), and the Rega Institute (Belgium). The program has
been made available through HIVDB as a courtesy by the program’s creators.
The program accepts a FASTA sequence alignment. The Data Entry form and the subtyping
process are shown in the figures below.
Rega HIV-1 Subtyping Tool: Data Entry Form (http://dbpartners.stanford.edu/RegaSubtyping/)
Rega HIV-1 Subtyping Tool: Subtyping Process (http://dbpartners.stanford.edu/RegaSubtyping/)
III. EDUCATIONAL RESOURCES
HIVDB contains several regularly updated sections summarizing data linking RT, protease, and
integrase mutations and antiretroviral drugs (ARVs). These sections include (i) tabular summaries
of the major mutations associated with each ARV class, (ii) detailed summaries of the major,
minor, and accessory mutations associated with each ARV, (iii) the comments used by the HIVdb
program, (iv) the scores used by the HIVdb program, (v) clinical studies in which baseline drug
resistance mutations have been correlated with the virological response (‘clinical outcome’) to a
specific ARV, (vi) mutations that can be used for drug resistance surveillance, and (vii) a two-page
PDF handout.
1. Drug Resistance Summaries
A. Tabular Drug Resistance Summaries by ARV Class
B. Detailed Drug Resistance Summaries by ARV
C. Drug Resistance Mutation Comments Used by the HIVdb Program
D. Drug Resistance Mutation Scores Used by the HIVdb Program
E. Genotype-Clinical Outcome Correlation Studies
2. Surveillance Drug-Resistance Mutation List Section
3. PDF Handout
The main Drug Resistance Summaries page is shown below. It is accessed by the links at the
lower right side of the home page and it provides access to items (i) through (v). Item (vi) (the
Surveillance Drug-Resistance Mutation List Section) can be accessed via the ‘Surveillance’ link on
the home page, and item (vii) (the PDF Handout) can be downloaded from the link beneath the
‘Drug Resistance Summaries’ box on the home page.
1. Drug Resistance Summaries
A. Tabular Drug-Resistance Summaries by ARV Class
There are five Tabular Drug-Resistance Summaries by ARV class. Each consists of a table
associating the major mutations for a drug class with the drugs for that class. Bolded red text
indicates mutations particularly important for drug resistance. Footnotes contain information about
polymorphic mutations, minor mutations, rare mutations, and mutations that increase susceptibility
to one or more drugs (the example shown below is for PIs).
B. Detailed Drug Resistance Summaries by ARV
This section contains a summary of the major, minor, and accessory mutations for each NRTI,
NNRTI, and PI, and for the fusion inhibitor enfuvirtide. It also contains a synopsis of the
recommended use of each ARV for initial and salvage therapy. The text is annotated with 20 to 30
important references per ARV. The figures that follow show the summaries for the most recently
approved PI – darunavir – and for the most recently approved NNRTI – etravirine.
C. Drug Resistance Mutation Comments Used by the HIVdb Program
It is possible to view all of the comments for mutations associated with a specific drug class by
selecting one of the Comments links in the Supporting Material for the HIVdb Program section. The
following figure shows the top section of the list of mutations associated with resistance to
integrase inhibitors.
D. Drug Resistance Mutation Scores Used by the HIVdb Program
The HIVdb program uses mutation scores to estimate the level of ARV resistance in a sequenced
HIV-1 isolate. Scores have been created for the NRTIs, NNRTIs, and PIs. Scores have not yet
been created for the INIs because there is only one currently approved INI and published drug
resistance data for this class are limited. The HIVdb Release Notes explain the scoring system: a
total score (sum of the scores for all mutations in a sequence) of 0 to 9 is considered susceptible,
10 to 14 indicates potential low-level resistance, 15 to 29 indicates low-level resistance, 30 to 59
indicates intermediate resistance, and 60 or greater indicates high-level resistance. Each of the
scores is hyperlinked to the MARVEL (Mutation ARV Evidence Listing) output summarizing the
supporting data.
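The scoring rule can be written as a short function. The sketch below sums the penalty scores for the mutations in a sequence and maps the total to one of the five levels. It treats a total of exactly 60 as high-level resistance, an assumption that closes the gap after the 30 to 59 range; the function names and the example scores are hypothetical and are not actual HIVdb values.

```python
def resistance_level(total_score):
    """Map a total mutation penalty score to an estimated resistance level."""
    if total_score < 10:
        return "susceptible"
    if total_score < 15:
        return "potential low-level resistance"
    if total_score < 30:
        return "low-level resistance"
    if total_score < 60:
        return "intermediate resistance"
    return "high-level resistance"  # 60 or greater (assumed boundary)


def interpret(mutation_scores):
    """Sum per-mutation scores for one drug and return (total, level)."""
    total = sum(mutation_scores.values())
    return total, resistance_level(total)


# Hypothetical scores for illustration only; not actual HIVdb values.
example = {"mutation_1": 20, "mutation_2": 15}
```

With the hypothetical scores above, the total of 35 falls in the intermediate range.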
The mutation scoring tables can be sorted by mutation position or drug. The examples below show
the NRTI scores sorted in ascending order by position and the NNRTI scores sorted in descending
order by predicted etravirine resistance.
E. Genotype-Clinical Outcome Correlation Studies
This section contains synopses of more than 30 studies in which baseline protease mutations are
correlated with the virological response to salvage therapy with a previously unused PI, 20 studies
in which baseline RT mutations are correlated with the virologic response to salvage therapy with a
previously unused NRTI, and one study of the NNRTI etravirine. These studies are described in
detail in the Database Query and Reference Pages section.
2. Surveillance Drug-Resistance Mutation List Section
The figure below summarizes the content of the Surveillance Mutations section. Two of the
features have been reviewed in previous sections of this guide: (i) Mutation Prevalence According
to Subtype and Treatment (database queries – Genotype-Treatment section) and (ii) Calibrated
Population Resistance (CPR) tool (HIVDB Programs). Two of the features contain material not
described previously: (i) the Surveillance Drug-Resistance Mutation Worksheet and (ii) the WHO
2009 List of Mutations for Surveillance of Transmitted Drug-Resistant HIV Strains.
The Surveillance Drug-Resistance Mutation (SDRM) Worksheet contains tables of RT and
protease mutations that can be sorted according to their presence or absence on five expert
system lists (HIVDB, Rega, ANRS, IAS-USA, and Los Alamos), the number of lists on which a
mutation appears, the prevalence of each mutation in untreated individuals infected with viruses of
the eight most common subtypes (A, B, C, D, F, G, CRF01_AE, and CRF02_AG), the most
prevalent mutation in any subtype in untreated individuals, and the most prevalent mutation in any
subtype in treated individuals. The figure below shows the worksheet sorted by (i) total number of
lists in descending order and (ii) mutation position in ascending order. The figure following the
worksheet is the 2009 WHO SDRM list.
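The two-level sort used for the worksheet figure (number of lists in descending order, then mutation position in ascending order) can be expressed with a single composite sort key. The rows below are invented placeholders, not entries from the worksheet, and the field names are assumptions.

```python
# Invented placeholder rows; not actual worksheet data.
rows = [
    {"mutation": "mut_a", "position": 82, "n_lists": 5},
    {"mutation": "mut_b", "position": 10, "n_lists": 3},
    {"mutation": "mut_c", "position": 54, "n_lists": 5},
    {"mutation": "mut_d", "position": 36, "n_lists": 1},
]


def sort_worksheet(rows):
    """Sort by number of lists (descending), then by position (ascending)."""
    return sorted(rows, key=lambda r: (-r["n_lists"], r["position"]))


ordered = [r["mutation"] for r in sort_worksheet(rows)]
```

Negating the list count makes the first key sort in descending order while the position key remains ascending.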
3. PDF Handout
The PDF Handout contains a two-page portable summary of drug resistance mutations.
Screenshots of each page are shown below.