From Genotype to Phenotype, Drylab 1
March 20th 2014

Aligning genomic and coding DNA sequences

In this first drylab in ‘Genotype to Phenotype’, we want you to learn how most genes are built and to get familiar with exons, introns, and start and stop codons (see figure below). You will work with a human Major Histocompatibility Complex (MHC) sequence, HLA-A (Human Leucocyte Antigen at locus A). You will collect the human genomic HLA-A sequence from chromosome 6 using NCBI and then align it with an HLA-A cDNA in the program BioEdit. Next, you will build a phylogeny of some HLA/MHC sequences in primates and think a bit about balancing selection, trans-species evolution and drift.

Part A. Find the genomic HLA-A sequence and align it with an HLA-A transcript (cDNA)

The human MHC (HLA) is found on chromosome 6, and your task is to locate and download the entire HLA-A genomic sequence.

Go to the GenBank homepage: http://www.ncbi.nlm.nih.gov/guide/
Select ‘Human Genome’ in the featured section (below). http://www.ncbi.nlm.nih.gov/genome/guide/human/
Click on ‘Chromosome 6’.
Click on the locus HLA-A.

Download the genomic HLA-A sequence to the program BioEdit: click FASTA under ‘Genomic regions, transcripts and products’ (Homo sapiens chromosome 6, GRCh38 Primary Assembly, NCBI Reference Sequence: NC_000006.12). Highlight the sequence and copy it (ctrl C), open BioEdit, choose ‘New alignment’ under File, and set the Mode to ‘Edit’ and ‘Insert’. Import the gDNA sequence into BioEdit [File < Import from clipboard]. Rename the sequence to gDNA by double clicking on the name of the sequence. Save this file as HLA_A_genomic.

Now download the transcript of this HLA-A sequence: click GenBank under ‘Genomic regions, transcripts and products’.
Then find mRNA and click on /transcript_id="NM_002116.7" (Homo sapiens major histocompatibility complex, class I, A (HLA-A), transcript variant 1 (A*03:01:0:01 allele), mRNA). You have found the Homo sapiens major histocompatibility complex, class I, A (HLA-A), transcript variant 1, mRNA. Copy the sequence. Import the cDNA sequence (from the mRNA) into the BioEdit file where the gDNA sequence is saved [File < Import from clipboard]. Rename the sequence to cDNA by double clicking on the name of the sequence. Save this file as a new file, HLA_A_cDNA.

Aligning the HLA cDNA and the gDNA sequences in BioEdit:

Mark the two sequences and let the program align them [Sequence < Pairwise alignment < allow ends to slide], or align them manually. Find the start codon, i.e. the first ATG. Check whether the introns start with GT (GU) and end with AG. If they do not, adjust the alignment manually after first changing the mode [Choose mode: Edit and Insert].

How many introns and exons do you find?
How many base pairs long are intron 1, intron 2 and intron 3?
How many base pairs long are exon 1 (count from the start codon ATG), exon 2, exon 3 and exon 4?

For the following, use the HLA-A transcript (cDNA) first (the file called ‘HLA_A_cDNA’); then you can also try to find the features in the genomic sequence if you like.

Find the stop codon in the cDNA sequence. To do this, first find the start codon (ATG), then delete the nucleotides before the start codon. Mark the cDNA sequence and translate it to amino acids [Sequence < Toggle translation or ctrl G]. Find the first * (this is the stop codon). Place the pointer just before the * (between ‘TACKV’ and ‘*’) and then translate back to a nucleotide sequence again [Sequence < Toggle translation]. ‘TGA’, ‘TAA’ and ‘TAG’ are stop codons. What is the stop codon in your sequence and at what site is it found?
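The stop-codon search can also be cross-checked outside BioEdit. The following is a minimal Python sketch, not part of the drylab itself: it reads codon by codon from the first ATG and reports the first in-frame stop codon of the standard genetic code (TAA, TAG or TGA). The function name `find_orf` and the toy sequence are assumptions made for illustration; the toy sequence is not HLA-A.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(cdna):
    """Locate the first ATG and the first in-frame stop codon after it.

    Returns (start_site, stop_site, stop_codon) with 1-based sites that
    point at the first base of each codon, or None if there is no ATG
    or no in-frame stop codon.
    """
    seq = cdna.upper().replace("U", "T")  # accept mRNA (U) or cDNA (T)
    start = seq.find("ATG")
    if start == -1:
        return None
    # Step through the reading frame set by the start codon.
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return start + 1, i + 1, codon
    return None


# Hypothetical toy sequence: 6 bases of 5' UTR, ATG GCC GTC TGA, then 3' UTR.
toy_cdna = "GGCACCATGGCCGTCTGAGGAAATAAAGG"
print(find_orf(toy_cdna))  # → (7, 16, 'TGA')
```

To use it on your own data, paste the cDNA sequence copied from GenBank (with digits and spaces removed) in place of `toy_cdna` and compare the reported site with what you found in BioEdit.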
Find the poly-A signal (AAUAAA in the mRNA, i.e. AATAAA in the cDNA sequence); how many such signals do you find? What is the distance in base pairs from the stop codon to the poly-A tail? Are these features also found in the genomic sequence?

Part B (if time allows). Build a phylogeny with some HLA/MHC sequences in primates

The peptide binding region (PBR; see figure below; page 5) of the MHC molecule is subject to balancing selection, while the structural parts of the MHC molecule are not. You will now construct phylogenetic trees of different MHC (including HLA) alleles belonging to two different MHC loci, MHC-A and MHC-B, in four primates. Your task is to evaluate different kinds of selection on different parts of the MHC gene; use the knowledge of the different exons that you gained in Part A above.

Open the file ‘HLAprimates.fas’ in BioEdit. The sequences are already aligned. First translate the sequences to amino acids [Sequence < Toggle translation]. Delete the non-coding part of each sequence from the stop codon onwards. Adjust the lengths of the sequences so that all sequences have the same length (i.e. according to HLA-B). Translate back to DNA sequences [Sequence < Toggle translation]. Click on the lock icon so that gaps change from ~ to -. Save the file in FASTA format (name it e.g. ‘HLAprimates_coding.fas’).

Import FASTA format files into MEGA5:

Convert the file to MEGA format [File < Convert file to MEGA format]. Locate the BioEdit file in FASTA format (‘HLAprimates_coding.fas’). The file is now converted from BioEdit FASTA format to MEGA format. Check that the file looks all right, save it (e.g. as ‘HLAprimates_coding.meg’) and exit the Editor. Open the converted file [File < Open a file]. The input data is: nucleotide sequences; protein-coding sequences. The genetic code is standard. Open the sequence data explorer and check the infile.
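Before converting the file, it can help to confirm that the trimmed alignment really has equal-length sequences and only - as gap character. The sketch below is one possible check in plain Python and is not a BioEdit or MEGA feature; the function names `parse_fasta` and `check_alignment`, the sequence names and the toy data are all assumptions for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, converting ~ gaps to -."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            records[name] = ""
        elif name is not None:
            records[name] += line.replace("~", "-")
    return records


def check_alignment(records):
    """Return (all_same_length, {name: length}) for aligned sequences."""
    lengths = {name: len(seq) for name, seq in records.items()}
    return len(set(lengths.values())) <= 1, lengths


# Hypothetical toy alignment; in practice, read 'HLAprimates_coding.fas' instead.
toy_fasta = """>seqA
ATGGCC~~~
>seqB
ATGGCCGTC
>seqC
ATGGCC
"""
toy_records = parse_fasta(toy_fasta)
same_length, toy_lengths = check_alignment(toy_records)
print(same_length, toy_lengths)
```

With a real file, replace `toy_fasta` by the file contents, e.g. `open("HLAprimates_coding.fas").read()`. A protein-coding alignment should also have a length that is a multiple of three, which can be checked from the reported lengths.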
Build a tree in MEGA:

We will not use any outgroup because we are interested in knowing more about alleles belonging to two different MHC class I loci, MHC-A and MHC-B, in primates.

Build an NJ tree based on the full sequence with bootstrap values [Phylogeny < construct NJ < phylogeny test]. Check the statistical support for the nodes. Save the tree.

Build trees based on three different parts of the sequence: Exon 2, Exon 3 and Exon 4, respectively. To be able to select these parts you have to:
(i) define them [Data – Select genes and domains – Add three new domains (call them e.g. Exon 2, 3 and 4) – Delete “Data” – Enter regions (i.e. base-pairs xxx-xxx for Exon 2…)]
(ii) select one region at a time [tick the box for the part you wish to analyse, i.e. first Exon 2, then Exon 3 and then Exon 4; also tick coding and untick independents].

Build NJ trees (with bootstrap) based on these parts of the sequence (i.e. you should build three different trees). Check the statistical support for the nodes. Save these trees.

Compare the trees to each other: branch lengths, bootstrap values, and how the sequences divide by species and by alleles (A and B, respectively). How old are the loci? What is the bootstrap support in the different trees, and why?

Detailed information on the HLA-A sequence (HLAA.fas)

DEFINITION Homo sapiens major histocompatibility complex, class I, A (HLA-A), mRNA.

Summary: HLA-A belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons.
Exon 1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-A alleles have been described.

[Figure: MHC class I (HLA-A) molecule with the peptide binding region (PBR), the alpha1, alpha2 and alpha3 domains and beta-2 microglobulin, together with exons 1 to 8 (ex1 to ex8) of the gene.]
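The exon-by-exon comparison in Part B rests on how much two aligned sequences differ within a given region. As a rough illustration only, the Python sketch below computes the proportion of differing sites (the p-distance, one of the simple distances an NJ tree can be built from) for a chosen region of a pairwise alignment. It is not MEGA's implementation; the function name `p_distance`, the toy sequences and the region coordinates are hypothetical and do not correspond to real HLA exons.

```python
def p_distance(seq_a, seq_b, start, end):
    """Proportion of differing sites between two aligned sequences.

    start and end are 1-based, inclusive alignment positions.
    Columns with a gap (-) in either sequence are skipped.
    """
    compared = 0
    differences = 0
    for x, y in zip(seq_a[start - 1:end], seq_b[start - 1:end]):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else float("nan")


# Hypothetical toy alignment with two made-up regions (sites 1-6 and 7-12).
toy_a = "ATGGCCGTCAAA"
toy_b = "ATGGCTGTTAGA"
region_1 = p_distance(toy_a, toy_b, 1, 6)   # one difference in six sites
region_2 = p_distance(toy_a, toy_b, 7, 12)  # two differences in six sites
print(region_1, region_2)
```

A region under balancing selection, such as the PBR-encoding exons, would be expected to show larger distances between alleles than a structural region, which is the pattern to look for when comparing the per-exon trees.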