64
Standardizing a method for sequencing the entire HIV genome by
massive semiconductor sequencing
Margarita Matias-Florentino¹, Santiago Ávila-Ríos¹, Aarón Lecanda-Sánchez¹ and
Gustavo Reyes-Terán¹. ¹Center for Research in Infectious Diseases, National Institute of
Respiratory Diseases, Mexico City, Mexico.
Next-generation sequencing (NGS) technologies generate large volumes of data
cost-effectively. These advantages make it possible to sequence complete HIV genomes
with deep coverage and allow the detection of low-frequency variants.
However, this approach poses challenges in experimental design and bioinformatic
analysis. We present preliminary results from a project that aims to characterize the
genetic diversity of HIV circulating in the Mesoamerican region using whole-genome
sequences, and to determine the role of minority variants in HIV immune escape and
antiretroviral (ARV) drug resistance.
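Why deep coverage matters for minority variants can be made concrete with a simple binomial model. The sketch below is illustrative only: it assumes each read is an independent draw from the viral population and ignores sequencing and PCR error, both of which matter on real amplicon data. The function names and the 10-read calling threshold in the example are hypothetical, not part of this study's pipeline.

```python
from math import comb
from typing import Optional


def detection_probability(depth: int, freq: float, min_reads: int) -> float:
    """Probability that at least `min_reads` of `depth` reads carry a variant
    present at frequency `freq`, under a binomial model (independent reads,
    no sequencing or PCR error)."""
    # P(X < min_reads) for X ~ Binomial(depth, freq)
    p_fewer = sum(
        comb(depth, k) * freq**k * (1.0 - freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def min_depth_for_detection(
    freq: float, min_reads: int, target: float = 0.95, max_depth: int = 100_000
) -> Optional[int]:
    """Smallest depth at which the detection probability reaches `target`,
    or None if it is not reached by `max_depth`."""
    for depth in range(min_reads, max_depth + 1):
        if detection_probability(depth, freq, min_reads) >= target:
            return depth
    return None


if __name__ == "__main__":
    # Hypothetical calling rule: a 1% variant must be seen in >= 10 reads.
    for depth in (50, 800, 4000):
        print(depth, round(detection_probability(depth, 0.01, 10), 4))
    print("min depth:", min_depth_for_detection(0.01, 10))
```

Under these assumptions a 1% variant is essentially undetectable at 50X but reliably seen at several thousand X, which is why local drops in coverage limit minority-variant analysis.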
Peripheral blood samples were obtained from ARV treatment-naïve, HIV-infected patients
from Mexico and Central America. Viral cDNA was synthesized from cell-free plasma virus
and then amplified by nested PCR, with three amplicons covering the entire viral genome.
Libraries were prepared for multiplexed NGS runs, and sequencing templates were
produced. The enriched templates were sequenced on 316v2 chips using the Ion PGM
platform. Approximately 13 patients were analyzed per run, which optimized read yield
without compromising coverage or depth.
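The trade-off between samples per run and per-patient coverage follows from the mean-coverage relation C = N·L/G (aligned reads times read length over genome length). The sketch below assumes reads are split evenly across barcoded samples and ignores amplicon-to-amplicon bias; the function names and any run yield or read length you plug in are hypothetical, not values reported for the 316v2 chip or this study.

```python
def expected_mean_coverage(
    total_reads: int,
    n_samples: int,
    mean_read_length: float,
    genome_length: int,
    fraction_aligned: float = 1.0,
) -> float:
    """Mean per-sample coverage C = N * L / G, where N is the number of aligned
    reads assigned to one sample (assumes an even split across barcodes)."""
    reads_per_sample = total_reads / n_samples
    return reads_per_sample * fraction_aligned * mean_read_length / genome_length


def max_samples_per_run(
    total_reads: int,
    mean_read_length: float,
    genome_length: int,
    target_coverage: float,
    fraction_aligned: float = 1.0,
) -> int:
    """Largest number of samples whose expected mean coverage stays >= target."""
    aligned_bases = total_reads * fraction_aligned * mean_read_length
    return int(aligned_bases // (target_coverage * genome_length))


if __name__ == "__main__":
    # Hypothetical run: 3,000,000 reads of ~200 bp; 9,719 nt is the HXB2 genome length.
    print(max_samples_per_run(3_000_000, 200, 9_719, target_coverage=4000, fraction_aligned=0.9))
```

Because mean coverage hides local dips, a plan based on this estimate still needs the per-position check discussed for the env region.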
Several strategies for assembling the viral genomes are being tested. Reads were aligned
to two references, HXB2 (K03455) and an HIV subtype B isolate (GQ372988), using the
aligners TMAP, SMALT and Bowtie2 with different parameter settings. TMAP and SMALT gave
the best alignments, with very similar results for both references: 90% of the sequenced
bases aligned, and the average coverage was 4,000X per patient. However, coverage dropped
drastically, to 800X, between positions 6000 and 8000, the region corresponding to the env
gene, and fell below 50X in a problematic stretch. To address this problem, a pipeline
based on TMAP and an iterative three-step alignment strategy is being developed.
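Locating stretches such as the sub-50X zone in env amounts to scanning per-position depth for runs below a threshold. The sketch below assumes depth was exported with `samtools depth -a` (which also reports zero-depth positions) from a BAM aligned to a single reference, so positions are in that reference's coordinates (HXB2 if aligned to K03455). Both helper names are hypothetical and not part of the pipeline described above.

```python
from typing import Iterable, List, Tuple


def depth_from_samtools(lines: Iterable[str], ref_length: int) -> List[int]:
    """Per-position depth (index 0 = position 1) from `samtools depth -a`
    text output for one reference: name<TAB>position<TAB>depth."""
    depth = [0] * ref_length
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _ref, pos, d = line.rstrip("\n").split("\t")[:3]
        depth[int(pos) - 1] = int(d)
    return depth


def low_coverage_regions(
    depth: List[int], threshold: int, min_length: int = 1
) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) runs where depth < threshold and the run
    spans at least `min_length` positions."""
    regions: List[Tuple[int, int]] = []
    run_start = None  # 0-based index where the current low-depth run began
    for i, d in enumerate(depth):
        if d < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start + 1, i))  # run ends at 0-based i - 1
            run_start = None
    # Close a run that extends to the end of the reference.
    if run_start is not None and len(depth) - run_start >= min_length:
        regions.append((run_start + 1, len(depth)))
    return regions
```

For example, `low_coverage_regions(depth, threshold=50)` would list the intervals that an iterative re-alignment step would need to recover.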