Aligning genomic and coding DNA sequences

advertisement
From Genotype to Phenotype, Drylab 1
March 20th 2014
Aligning genomic and coding DNA sequences
In this first drylab in ‘genotype to phenotype’, we want you to learn how most genes are built
and to get familiar with exons, introns, start and stop codons (see figure below). You will
work with a human Major Histocompatibility Complex (MHC) sequence, the HLA-A
(Human Leucocyte Antigen at locus A). You will collect the human genomic HLA-A
sequence from chromosome 6 using NCBI and then align it with an HLA-A cDNA in the
program BioEdit. Next, you will build a phylogeny on some HLA/MHC sequences in
primates and think a bit about balancing selection, trans-species evolution and drift.
Part A. Find the genomic HLA-A sequence and align with a HLA-A transcript (cDNA)
The human MHC (HLA) is found on chromosome 6 and your task is to locate and down load
the entire HLA-A genomic sequence.
Go to the GenBank homepage: http://www.ncbi.nlm.nih.gov/guide/
Select ‘Human Genome’ in the featured section (below).
http://www.ncbi.nlm.nih.gov/genome/guide/human/

Click on ‘Chromosome 6’.

Click on the locus HLA-A.
1
From Genotype to Phenotype, Drylab 1
March 20th 2014

Download the genomic HLA-A sequence to the program BioEdit, Click FASTA under
‘Genomic regions, transcripts and products’ (Homo sapiens chromosome 6, GRCh38
Primary Assembly NCBI Reference Sequence: NC_000006.12). Highlight the
sequence and copy (ctrl C), open BioEdit, choose ‘New alignment’ under File and the
Mode ‘Edit’ and ‘Insert’.

Import the gDNA sequence to into BioEdit [File < Import from clipboard]. Rename
the sequence to gDNA by double clicking on the name of the sequence. Save this file
as HLA_A_genomic.
2
From Genotype to Phenotype, Drylab 1
March 20th 2014

Now, download the transcript of this HLA-A sequence, Click GenBank under
‘Genomic regions, transcripts and products’. Then find mRNA and Click on
/transcript_id="NM_002116.7" (Homo sapiens major histocompatibility complex,
class I, A (HLA-A), transcript variant 1 (A*03:01:0:01 allele), mRNA). You have
found the Homo sapiens major histocompatibility complex, class I, A (HLA-A),
transcript variant 1, mRNA. Copy the sequence.

Import the cDNA sequence (from mRNA) into the file in BioEdit where the gDNA
sequence is saved [File < Import from clipboard]. Rename the sequence to cDNA by
double clicking on the name of the sequence. Save this file as a new file,
HLA_A_cDNA.
3
From Genotype to Phenotype, Drylab 1
March 20th 2014
Aligning the HLA cDNA and the gDNA sequences in BioEdit:



Mark the two sequences, and let the program align them [Sequence < Pairwise
alignment < allow ends to slide], or align them manually.
Find start codon, i.e. the first ATG.
Check whether the introns start with GT (GU) and end with AG. If not correct, align
manually after first changing the mode [Choose mode: Edit and Insert].
How many introns and exons do you find?
How many base pairs are intron 1, intron 2 and intron 3?
How many base pairs are exon 1 (count from the start codon ATG), exon 2, exon 3 and
exon 4?
For the following, use the HLA-A transcript (cDNA) first (file called ‘HLA_A_cDNA’);
then you can try to find them also in the genomic sequence if you like


Find the stop codon in the cDNA sequence. To do this, first find the start codon
(ATG), then delete the nucleotides before the start codon, translate the cDNA
sequence to amino acid sequence. Mark the cDNA sequence and then translate it to
amino acids [Sequence < Toggle translation or ctrl G]. Find the first * (this is the stop
codon). Place the pointer just before the * (between ‘TACKV’ and ‘*’) and then
translate it back to a sequence again [Sequence < Toggle translation].
‘TGA’, ‘TTA’ and ‘TAG’ are stop codons. What is the stop codon in your sequence
and at what site is it found?

Find the poly-A signal (AAUAAA) in the cDNA sequence; how many such signals do
you find? What is the distance in base pairs to the poly-A tail from the stop codon?

Are these features fund also in the genomic sequence?
4
From Genotype to Phenotype, Drylab 1
March 20th 2014
If we have time do
Part B. Build a phylogeny with some HLA/MHC sequences in primates
The peptide binding region (PBR; see figure below; page 5) of the MHC molecule is subject
to balancing selection while the structural parts of the MHC molecule are not. You will now
construct phylogenetic trees on different MHC (including HLA) alleles belonging to two
different MHC loci, MHC-A and MHC-B in four primates. Your task is to evaluate different
kinds of selection on different parts of the MHC gene, use the knowledge on different exons
that you learnt in part A above.
Open the file ‘HLAprimates.fas’ in BioEdit.

The sequences are already aligned. First translate the sequences to amino acids
[Sequence < Toggle translation]. Delete the non coding parts of each sequence from the
stop codon. Adjust the lengths of the sequences so that all sequences have the same length
(i.e. according to HLA-B).

Translate back to DNA sequences [Sequence < Toggle translation]. Click on the
locker so that gaps change from ~ to -.Save the file in FASTA format (name it e.g.
‘HLAprimates_coding.fas’).
Import FASTA format files into MEGA5:



Convert file to MEGA format [File < Convert file to MEGA format]. Locate the
BioEdit file in FASTA format (‘HLAprimates_coding.fas’). The file is now converted
from BioEdit FASTA format to MEGA format. Check that the file looks alright, save
file (e.g. as ‘HLAprimates_coding.meg’) and exit the Editor.
Open the converted file [File < Open a file]. The input data is: nucleotide sequences;
protein-coding sequences. The genetic code is standard.
Open the sequence data explorer and check the infile.
Build a tree in MEGA:
We will not use any out group because we are interested in knowing more about alleles
belonging to two different MHC class I loci, MHC-A and MHC-B, in primates.


Build a NJ-tree based on the full sequence with bootstrap values [Phylogeny <
construct NJ < phylogeny test]. Check the nodes statistically. Save the tree.
Build trees based on two different parts of the sequence; Exon 2, Exon 3 and Exon 4,
respectively. To be able to select these parts you have to:
(i)
define them [Data –Select genes and domains –Add three new domains
(call them e.g. Exon 2, 3 and 4) – Delete “Data” – Enter regions (i.e.
base-pairs xxx-xxx for Exon 2…
5
From Genotype to Phenotype, Drylab 1
March 20th 2014
(ii) select one region at the time [tick the box for the part you wish to analyse; i.e.
first Exon 2, Exon 3 and then Exon 4, also tick coding and untick independents].
Build NJ-trees (with bootstrap) based on these parts of the sequence (i.e. you should build two
different trees). Check the nodes statistically. Save these trees.
 Compare the trees to each other, branch lengths, bootstrap values, division on species
and on alleles (A and B, respectively).
How old are the loci?
What is the bootstrap support in the different trees, and why?
6
From Genotype to Phenotype, Drylab 1
March 20th 2014
Detailed information on the HLA-A sequence (HLAA.fas)
DEFINITION
Homo sapiens major histocompatibility complex, class I, A (HLA-A),
mRNA.
Summary: HLA-A belongs to the HLA class I heavy chain paralogues.
This class I molecule is a heterodimer consisting of a heavy chain
and a light chain (beta-2 microglobulin). The heavy chain is
anchored in the membrane. Class I molecules play a central role in
the immune system by presenting peptides derived from the
endoplasmic reticulum lumen. They are expressed in nearly all
cells. The heavy chain is approximately 45 kDa and its gene
contains 8 exons. Exon 1 encodes the leader peptide, exons 2 and 3
encode the alpha1 and alpha2 domains, which both bind the peptide,
exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane
region, and exons 6 and 7 encode the cytoplasmic tail.
Polymorphisms within exon 2 and exon 3 are responsible for the
peptide binding specificity of each class one molecule. Typing for
these polymorphisms is routinely done for bone marrow and kidney
transplantation. Hundreds of HLA-A alleles have been described.
PBR
MHC class I (HLA-A)
1
ex 1
ex2
2
3
ex3
ex4
ex 5
7
ex6
ex7
ex8
2
1
3
 2m
Download