BACKGROUND & SIGNIFICANCE
HIV-1 disease progression: AIDS remains one of the deadliest epidemics in human history, having killed more than 25 million people worldwide, including more than 500,000 Americans. African Americans are among the hardest-hit populations in the U.S., accounting for half of all new HIV-1 diagnoses and more than a third of AIDS deaths to date (http://www.unaids.org). In general, clinical progression of HIV-1 disease is relatively slow, taking several years from initial infection to the development of severe immune suppression. A small percentage of HIV-1 infected individuals progress rapidly to AIDS within 1-4 yr after primary infection in spite of antiretroviral therapy; these patients are classified as rapid progressors (RP) (Anzala et al, 1995). HIV-1 infected persons who show no signs of disease progression for 12 yr or more and remain asymptomatic are classified as long term non-progressors (LTNP) (Buchbinder et al, 1994). Several factors that contribute to this variable disease course have been identified. A range of viral and host factors can contribute to the LTNP state, including: (i) infection by attenuated viruses with impaired gene functions (e.g. mutations of vif, vpr, nef); (ii) decreased viral entry secondary to altered host cell co-receptors for the virus (particularly CCR5); (iii) expression of specific genes of the major histocompatibility complex (MHC) (e.g. HLA-B27); and (iv) increased expression of mediators of innate immunity (e.g. IL-10 and RANTES). The appearance of HIV-1 specific CD8+ cytotoxic T lymphocytes (CTLs) early after primary infection correlates with control of HIV-1 viremia (Koup et al, 1994; Borrow et al, 1994). The subtype of the infecting virus is also a factor in the rate of HIV-1 disease progression. Individuals infected with subtypes C, D and G are 8 times more likely to develop AIDS than individuals infected with other subtypes (Kanki et al, 1999).
Co-infections with other pathogens may enhance HIV-1 replication by activating the immune system, which, in turn, facilitates virus entry into target cells, reverse transcription, and provirus transcription (Lawn et al, 2000). HIV-1 infection also affects the cytokine balance by promoting a Th1 to Th0 shift, as HIV-1 replicates preferentially in Th2 and Th0 cells (Maggi et al, 1994). This shift in cytokine balance makes the host more susceptible to HIV-1 infection. Variations in host genetic susceptibility, as manifested by SNP allelic variants of several key chemokines and cytokines, influence susceptibility to HIV-1 infection and the subsequent rate of disease progression to AIDS.
There is no platform that integrates basic biological data on HIV-1 with clinical data from HIV-1 infected patients. Currently available online HIV-1 databases contain genome and protein sequence data that are useful primarily to research scientists. A comprehensive, informatics-based online resource that integrates clinical and experimental data on HIV-1 infections will be valuable for developing new, evidence-based approaches to the management of HIV-1 infected patients. The proposed resource could be used to predict HIV-1 disease prognosis and progression. Further, such a relational database could be useful in the design of novel antiretroviral drugs. Thus, the field of bioinformatics can greatly enhance treatment efforts by serving as a bridge between medical informatics and experimental science. By correlating genetic variation and potential changes in protein structure with clinical risk factors, disease presentation, and differential response to treatment and vaccine candidates, it may be possible to obtain valuable new insights that can guide decision-making at both the clinical and public health levels.
Figure 1: Outline of the genomic and proteomic analyses of clinical samples from HIV-1 infected patients from both cohorts (NP & LTNP).
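To make the idea of a relational resource linking clinical and experimental data more concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, analyte label (GENE_A), and inserted values are all hypothetical placeholders chosen for illustration; they are assumptions and do not represent the actual design or data of the proposed database.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative
# assumption, not the proposal's actual database design.
SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    cohort     TEXT NOT NULL CHECK (cohort IN ('NP', 'LTNP'))
);
CREATE TABLE clinical_visit (
    visit_id   INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    visit_date TEXT NOT NULL,
    cd4_count  INTEGER,
    viral_load REAL
);
CREATE TABLE expression (
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    platform   TEXT NOT NULL CHECK (platform IN ('microarray', '2D-DIGE')),
    analyte    TEXT NOT NULL,
    log2_ratio REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database populated with toy placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "NP"), (2, "LTNP")])
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?, ?)",
        [(1, "microarray", "GENE_A", 1.5), (2, "microarray", "GENE_A", -0.5)],
    )
    return conn


def mean_log2_by_cohort(conn, analyte):
    """Join expression values to cohort labels and average them per cohort."""
    rows = conn.execute(
        """SELECT p.cohort, AVG(e.log2_ratio)
           FROM expression e JOIN patient p USING (patient_id)
           WHERE e.analyte = ?
           GROUP BY p.cohort
           ORDER BY p.cohort""",
        (analyte,),
    ).fetchall()
    return dict(rows)
```

Keeping cohort membership, clinical visits, and expression measurements in separate tables keyed by a shared patient identifier is one common way to let a single query relate an experimental measurement to a clinical grouping.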
Genomics: Global profiling of gene expression is useful for assessing gene functions in pathological processes. Several high-throughput methods for measuring differential gene expression enable functional annotation of sequenced genomes. The use of DNA microarrays to probe global and differential changes in host gene expression during HIV-1 infection can provide a wealth of information on host-pathogen interactions (Figure 1). Genes identified by this method may play a role in host defense mechanisms, transcriptional control, and/or facilitation of the HIV-1 life cycle. The genetic background of the host is also important in the progression of HIV-1 infections. Despite intensive research, it is estimated that >90% of the genes responsible for the development and maintenance of the LTNP phenotype and resistance to HIV-1 infection remain undiscovered (O'Brien and Nelson, 2004).
Proteomics: Identifying unique patterns of protein expression or biomarkers associated with HIV-1 disease is a rapidly emerging area of clinical proteomics. Genomics provides only a partial picture, whereas proteomics identifies the specific proteins produced in response to changes in gene expression. It is therefore desirable to compare differential gene expression data with differential protein expression data. Progress in protein annotation and in our understanding of protein-protein interactions will undoubtedly lead to diagnostic and therapeutic advances in the treatment of HIV-1 infections. Two-dimensional fluorescence difference gel electrophoresis (2D-DIGE) has emerged as a robust method for profiling protein expression in clinical samples (Figures 1 & 2). We shall also use newer quantitative proteomic methods based on stable-isotope labeling, such as isotope coded affinity tag (ICAT), isobaric tag for relative and absolute quantitation (iTRAQ), 18O incorporation, or stable-isotope labeling by amino acids in cell culture (SILAC), that are well validated and available in our proteomic facility.
Thus, a detailed study of the genomic profiles of the NP and LTNP patient cohorts may identify the genes and gene products involved in non-progression of HIV-1 infection. Specific Aim I of this proposal explores the genomic and proteomic profiles of our 2 unique patient cohorts (NP & LTNP). Our collaborators are expert computational scientists who have designed a clustering algorithm to find genes/proteins that are co-regulated across a dataset. Although clustering algorithms are predominantly used in the analysis of gene expression datasets, a host of other data-mining techniques, new algorithms, and visualization methods are being developed; their development constitutes Specific Aim III of this proposal.
Figure 2: Flow diagram of the steps involved in the proteomic analysis of patient samples.
Host genetic variants and HIV-1 infections: Human allelic variants affect susceptibility to and progression of HIV-1 infections. Analyzing the human genome for common genetic variants that influence the host response to HIV-1 could identify promising genes for therapeutic intervention (Hogan and Hammer, 2001). Epidemiological studies of AIDS cohorts have implicated at least 8 human gene loci whose alleles are associated with the pathogenesis of HIV infections (O'Brien and Nelson, 2004). Natural human gene polymorphisms affecting HIV-1 infections are broadly classified into 3 categories: (a) those that control viral entry into susceptible cells (chemokine and chemokine receptor polymorphisms), (b) mutational variants of genes involved in immune regulation (IL-10 and TNF-α), and (c) polymorphisms in genes involved in adaptive immune recognition by T cells (HLA antigens). Specific Aim II of this proposal explores the genetic variants that influence disease progression in our 2 (NP & LTNP) HIV-1 infected patient cohorts.
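The notion of finding genes or proteins that are co-regulated across a dataset, mentioned earlier in this section, can be illustrated with a generic correlation-based grouping. The sketch below is not the collaborators' algorithm, whose details are not given here; it simply links any two expression profiles whose Pearson correlation meets a threshold and reports the resulting connected groups (single-linkage grouping via union-find). The function names, the threshold of 0.9, and the profile identifiers in the usage note are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / sqrt(vx * vy)


def coregulated_groups(profiles, threshold=0.9):
    """Group profile names whose pairwise correlation is >= threshold.

    `profiles` maps a gene/protein name to its expression values across
    samples. Groups are connected components of the "correlated" graph.
    """
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

For example, calling coregulated_groups on four toy profiles in which g1 and g2 rise together across samples while g3 and g4 fall together would place them in two separate groups. Real expression data would normally call for normalization and a more robust clustering method than this sketch provides.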
Polymorphisms in chemokine and chemokine receptors: Genomic analyses have shown that allelic variants in the genes for chemokine receptors that are also HIV-1 entry co-receptors, and in the genes for their natural ligands, the chemokines, can modulate HIV-1 transmission and disease progression (Dean et al, 1996; Smith et al, 1997; Winkler et al, 1998; Martin et al, 1998; McDermott et al, 1998; An et al, 2000; Liu et al, 1999; McDermott et al, 2000). Rapid progressors (RP) usually develop symptoms of AIDS within 3 years of initial infection if untreated. This rapid progression is associated with homozygosity for the wild-type alleles of the chemokine receptor/HIV co-receptor genes (CCR2 +/+ or CCR5 +/+). In some LTNP, delayed progression has been attributed to mutant genotypes for CCR2 (CCR2-64I) and CCR5 (CCR5-∆32). Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome and are of value for genetic mapping. The chemokine RANTES inhibits CCR5-mediated entry of R5 strains of HIV-1 by binding competitively to CCR5 and down-regulating its expression (Cocchi et al, 1995; Trkola et al, 1998). The effect of a major RANTES SNP on the progression of HIV-1 infections appears to be due to down-regulation of the expression of the In1.1C allele. This allele has been implicated in rapid HIV-1 disease progression, while the CCR2b-64I and the CCR5-∆32 alleles are associated with delayed progression.
Polymorphisms in cytokine genes: IL-10 is a Th2 cytokine that limits HIV-1 replication in vivo by inhibiting macrophage/monocyte and T lymphocyte replication and their production of inflammatory cytokines (DeVico and Gallo, 2004). Individuals carrying the IL10-5' 592A (IL10-5'A) promoter allele were found to be at increased risk for HIV-1 infection and, once infected, progressed to AIDS more rapidly than individuals who did not carry this allele (Essner et al, 1989).
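Associations of the kind described above, between carriage of an allele and a clinical group, are often assessed on a 2x2 contingency table (carriers versus non-carriers, by cohort). As a hedged illustration only, the sketch below computes a two-sided Fisher exact p-value with the Python standard library. It is a generic statistical technique, not an analysis method stated in this proposal, and the counts in the usage note are made-up numbers rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Example layout (an assumption for illustration):
        rows    = allele carriers, non-carriers
        columns = cohort 1, cohort 2
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
```

With illustrative counts of 3 carriers and 1 non-carrier in one cohort and 1 carrier and 3 non-carriers in the other, fisher_exact_two_sided(3, 1, 1, 3) gives 34/70, about 0.49, showing that tables this small carry little evidence of association. In practice, multiple alleles would be tested, so a correction for multiple comparisons would also be needed.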
The Th2 cytokine IL-4 differentially regulates the expression of the 2 major HIV-1 co-receptors, CXCR4 and CCR5. In primary CD4+ T lymphocytes, IL-4 increases expression of CXCR4 and decreases expression of CCR5 (Abehsira-Amar et al, 1992). Additionally, IL-4 stimulates viral replication in HIV-1 infected cells via a transcriptional activation mechanism. Increased levels of IL-4 can alter HIV-1 replication kinetics and disease progression (Valentin et al, 1998). A polymorphism in the IL-4 gene with a C to T exchange at position -589 upstream of the open reading frame (IL-4 -589T) has a protective effect against replication of R5 HIV via down-regulation of CCR5 expression (Banchereau et al, 1994). Slower clinical progression in individuals carrying the IL-4 -589T allele, as compared with wild-type homozygotes, is evident early in HIV-1 infection but not in the late period, because the late period is characterized by the emergence of X4 variants. The correlation of persistent TNF-α production with disease progression in patients infected with HIV-1 suggests a role for TNF-α in the pathogenesis of HIV-1 infections (Damle and Doyle, 1989).
Polymorphisms in human leukocyte antigens: Cellular immune responses governed by human leukocyte antigen (HLA) genes influence a wide range of outcomes of HIV-1 infections (Mühl et al, 2002). Cytotoxic T lymphocyte (CTL) responses, activated by HLA presentation, are implicated in the control of HIV replication. In HIV-1+ individuals, CTLs destroy virus-infected cells by targeting HIV-1 peptides bound to HLA class I molecules expressed on the host cell surface, establishing a dynamic equilibrium between a continuously mutating virus population and host-specific recognition of the evolving virus. The mechanisms by which class I alleles differentially bind peptides, restrict the generation of CD8+ CTLs, and govern the clinical response to HIV-1 are as yet undetermined.
Reports suggest that HLA-B27 plays a protective role in HIV disease because B27+ patients mount a strong, specific CTL response to an epitope in p24, a highly conserved HIV protein. The HLA-B27 allele found in LTNP is associated with a favorable prognosis.
Bioinformatics: A major goal of bioinformatics is to transform biological and clinical data for maximal accessibility and utilization by all scientists and health professionals. This proposal offers a paradigm for data organization, management, analysis and interpretation to effect this data-to-knowledge transfer.
Rationale for studying genomic and proteomic fingerprinting in NP and LTNP HIV-1 infected subjects: Completion of the draft sequence of the human genome in 2000 led to the post-genomic era, in which proteomics has come of age. In the post-genomic era, the concept of one gene-one protein must be reconsidered because one gene can give rise to multiple mature mRNAs via alternative splicing and other mechanisms, resulting in the production of multiple proteins. Although genomic studies provide some dynamic information, they are at best an indirect measure of protein expression. Glycosylation, phosphorylation, and degradation can all affect protein function, and mRNA expression studies provide no information on these processes. Proteomics can detect post-translational modifications and thereby complements gene expression studies. Proteomic studies have the additional advantage that proteins are the direct effectors of biological function. Therefore, to achieve high reliability, reproducibility and confidence, we shall use both DNA microarray and proteomic analyses; the results from both will be compared with each other and correlated with our clinical database.
Significance of our study: To understand the clinical significance of genetic variation, sequence analysis should be combined with methods that assess changes in the structural and biological functions of proteins.
The bioinformatics resource that we are building is an initial step in the development of improved methods for extracting and analyzing genomic and proteomic data and converting them into biologically and clinically useful information on the structure, function and physiology of proteins and on their roles in HIV infections. We have the immediate capacity to build a database consisting of basic scientific and clinical information on 2 distinct clinical cohorts of HIV-1 infected patients, NP and LTNP, that differ in their clinical outcomes. This is an early goal of this project and could become a valuable national resource. Our ultimate goal is to develop the computational algorithms to integrate our clinical and basic data such that we can associate disease outcome or progression with specific protein biomarkers from the host. These biomarker proteins may subsequently become new targets for therapy. Information on the proteins of HIV-1 and its host, and the tools for their systematic analysis, are scattered across a wide range of online resources. To facilitate studies of the biological consequences of genetic variation, we propose to develop a user-friendly bioinformatics resource that integrates genomic and proteomic data from unique HIV-1 patient cohorts (NP and LTNP) and correlates them with clinical data. Our preliminary studies (Table I & Figure 7) show significantly different genomic and proteomic expression profiles between HIV-1 infected patients and non-infected controls. Thus, we anticipate that the gene and protein expression patterns of NP and LTNP subjects will also differ significantly, in keeping with their different clinical outcomes. No study has been published to date on the genomic and proteomic fingerprinting of NP and LTNP subjects and their association with disease outcome.
Comparative analysis of gene arrays and proteomic profiles will increase confidence in the identification of unique HIV-1 responsive genes and proteins in these patient cohorts. The use of bioinformatics tools to integrate these genomic and proteomic data with clinical data can lead to unique, new biomarkers for diagnostic, preventive, and therapeutic interventions in HIV-1 disease management.
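One simple way to picture the comparative analysis of gene array and proteomic results is to retain only those analytes that change in the same direction on both platforms. The sketch below is an illustrative assumption rather than the proposal's analysis pipeline: the function name, the use of log2 ratios as input, and the cutoff of 1.0 (a two-fold change) are hypothetical choices.

```python
def concordant_hits(gene_log2, protein_log2, min_abs_log2=1.0):
    """Return analytes changed in the same direction on both platforms.

    Both arguments map an analyte name to a log2 expression ratio between
    two groups. An analyte is kept only if it is present in both inputs,
    passes the magnitude cutoff in both, and has the same sign in both.
    """
    hits = {}
    for name in sorted(gene_log2.keys() & protein_log2.keys()):
        g, p = gene_log2[name], protein_log2[name]
        strong = abs(g) >= min_abs_log2 and abs(p) >= min_abs_log2
        same_direction = (g > 0) == (p > 0)
        if strong and same_direction:
            hits[name] = "up" if g > 0 else "down"
    return hits
```

Requiring agreement between two independent measurement platforms trades sensitivity for specificity: some true changes visible on only one platform are dropped, but the analytes that remain are less likely to be artifacts of either method.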