slides - Bioinf! - Trinity College Dublin

GE3M25:
Bioinformatics, class 4
TCD, 03/12/2015
Karsten Hokamp, PhD
Genetics
Trinity College Dublin, The University of Dublin
GE3M25 Data Handling Module Content
• Python Programming
• Bioinformatics
• ChIP-Seq analysis
Overview
• Multiple alignments
• Phylogenetic trees
• Examples and Exercises
http://bioinf.gen.tcd.ie/GE3M25/
Why multiple alignments?
Multiple sequence alignments (MSA) underpin:
• Comparative genomics
• Phylogenetic studies
• Hierarchical function annotation: homologs, domains, motifs
• Gene identification, validation
• Structure comparison, modelling
• Interaction networks
• RNA sequence, structure, function
• Human genetics, SNPs
• Therapeutics, drug design and discovery
[Figure labels: insertion domain, DBD, LBD, binding sites / mutations]
Multiple alignment of upstream sequence from the ‘eve’ gene
across 12 Drosophila species
[Figure labels: eve gene; conserved and non-conserved regions]
Example analysis
Identify a sequence to align
– Heat shock protein beta 8
Find homologous sequences in different species
– BLAST
Retrieve the sequences
Run multiple alignment
Compare two alignment tools
Find gene in NCBI Gene
Scroll down to get
protein sequence
Extract Protein sequence
Run BLAST
BLAST allows users to enter a protein or DNA sequence and
search a database for similar sequences
BLAST RESULTS
Click taxonomy reports to access results by organism
BLAST hits by organism
The top BLAST hit from a particular organism is usually
the ortholog of the sequence that you entered into BLAST
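A minimal Python sketch of this rule of thumb, assuming the BLAST hits have already been exported as (accession, organism, bit score) tuples. The function name, accessions and scores are made-up placeholders, not real results, and the top hit is only a candidate ortholog.

```python
def top_hit_per_organism(hits):
    """Keep the highest-scoring hit for each organism.

    hits: iterable of (accession, organism, bit_score) tuples.
    Returns {organism: accession}.
    """
    best = {}
    for accession, organism, bit_score in hits:
        # Replace the stored hit only if this one scores strictly higher.
        if organism not in best or bit_score > best[organism][1]:
            best[organism] = (accession, bit_score)
    return {organism: accession for organism, (accession, _) in best.items()}


# Placeholder data for illustration only.
example_hits = [
    ("ACC_COW_1", "Bos taurus", 380.0),
    ("ACC_COW_2", "Bos taurus", 150.0),
    ("ACC_FISH_1", "Danio rerio", 210.0),
]
candidates = top_hit_per_organism(example_hits)
```

The accessions collected this way can then be pasted into a file, as on the next slide.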
Select the accession numbers of several orthologs and add to
a file
Paste accession numbers in NCBI Protein to retrieve the protein sequence
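The slide retrieves sequences through the NCBI Protein web page. As a scripted alternative (not part of the original slides), the sketch below builds a request for the NCBI E-utilities efetch service. The accession strings are placeholders, and fetch_fasta needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions):
    """Build an efetch URL that asks for protein records as plain FASTA."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions):
    """Download the FASTA text for the given accessions (requires network)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode()


# Placeholder accessions; replace with the ones collected from BLAST.
url = efetch_url(["ACC_1", "ACC_2"])
```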
List of protein orthologs
Select ‘Display Settings’ to retrieve proteins in FASTA format
Lists FASTA format with summary data
Choose FASTA (text) to just get the FASTA sequence
without the summary data
FASTA format
Copy FASTA data to a file and save
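Once the FASTA data is saved, it can be read back in Python. A minimal sketch of a FASTA reader, assuming every record starts with a '>' header line and that the first word of the header is a usable name. The function name and toy records are illustrative only.

```python
def parse_fasta(text):
    """Return {name: sequence}, where name is the first word after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


# Toy input for illustration only.
toy = ">seqA first toy record\nMKVL\nAS\n>seqB second toy record\nMRTL\n"
sequences = parse_fasta(toy)
```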
Rename FASTA sequences
The alignment program displays the first word after the
‘>’ symbol. Edit FASTA header to include the species
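The header edit can also be scripted. The sketch below assumes NCBI-style headers that end with the organism in square brackets, e.g. '>ACCESSION description [Genus species]'. The function name and the accession in the example are hypothetical.

```python
import re


def species_first(header):
    """Move the bracketed species to the front so it becomes the first word."""
    match = re.match(r">(\S+)(.*?)\s*\[([^\]]+)\]\s*$", header)
    if not match:
        return header  # leave headers without a [species] tag untouched
    accession, description, species = match.groups()
    return f">{species.replace(' ', '_')}_{accession}{description}"


# Made-up accession for illustration only.
renamed = species_first(">XP_0000001.1 heat shock protein beta-8 [Bos taurus]")
```

With this header the alignment program would display 'Bos_taurus_XP_0000001.1' as the sequence name.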
EBI-EMBL MSA
http://www.ebi.ac.uk/Tools/msa/
Select the type of sequence
Paste your FASTA sequence or upload a file
Be notified by email for larger jobs
Highly conserved region
30 identical matches
28 similar matches
Length = 243
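These counts presumably come from the consensus symbols printed under a Clustal-style alignment ('*' for identical columns, ':' for similar ones); that reading is an assumption. A minimal Python sketch of such a consensus line, using the Clustal "strong" residue groups and ignoring the weaker '.' class. The sequences are toy data.

```python
# Clustal "strong" conservation groups: residues with similar properties.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def consensus_line(aligned_seqs):
    """Return one symbol per column: '*' identical, ':' similar, ' ' otherwise."""
    symbols = []
    for column in zip(*aligned_seqs):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")  # any gap means the column is not conserved
        elif len(residues) == 1:
            symbols.append("*")
        elif any(residues <= set(group) for group in STRONG_GROUPS):
            symbols.append(":")
        else:
            symbols.append(" ")
    return "".join(symbols)


# Toy alignment for illustration only.
line = consensus_line(["MKLSD", "MRIT-", "MKVAE"])
identical = line.count("*")
similar = line.count(":")
alignment_length = len(line)
```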
Colour residues by their physicochemical properties
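A minimal Python sketch of property-based colouring. The grouping below is the one commonly documented for the EBI Clustal tools (small/hydrophobic red, acidic blue, basic magenta, hydroxyl/sulfhydryl/amine plus glycine green); it is stated here as an assumption and may not match the tool's current display.

```python
# Assumed colour groups; check the tool's help page before relying on them.
COLOUR_GROUPS = {
    "red": "AVFPMILW",      # small and hydrophobic (including aromatic, except Y)
    "blue": "DE",           # acidic
    "magenta": "RK",        # basic
    "green": "STYHCNGQ",    # hydroxyl, sulfhydryl, amine, and glycine
}


def residue_colour(residue):
    """Return the colour name for one residue; anything else is grey."""
    for colour, residues in COLOUR_GROUPS.items():
        if residue.upper() in residues:
            return colour
    return "grey"


def colour_sequence(sequence):
    """Return a list of colour names, one per residue or gap."""
    return [residue_colour(residue) for residue in sequence]


colours = colour_sequence("MD-K")
```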
Click ‘Results Summary’ to see how
similar each pair of sequences is
Alternative Approach: UniProt
1. Browse to uniprot.org
2. Search for HSPB8
3. Show up to 100 hits
4. Sort by gene name
5. Select 5-8 entries from different species for HSPB8
6. Include a fish, an insect, a bird and mammals (protein
lengths should be similar)
7. Click on 'Align'
Alternative Approach: UniProt
• Edit and resubmit to change headers, remove sequences
• Selection at bottom of page to remove sequences, e.g. fly
Alternative Approach: UniProt
mismatch favoured over gap insertion:
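Why an aligner may prefer a mismatch can be seen from a simple column-wise score. In this minimal Python sketch the scoring values (match +1, mismatch -1, gap -4) are hypothetical and the sequences are toy data; real tools use substitution matrices and separate gap-open and gap-extend penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-4):
    """Sum a per-column score for two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


# Same two toy sequences aligned with one mismatch, or with two gaps instead.
ungapped = alignment_score("ACDEF", "ACEEF")
gapped = alignment_score("ACDE-F", "AC-EEF")
```

With these values the ungapped alignment scores 3 and the gapped one scores -4, so the mismatch is kept.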
Exercise
Copy sequences and
run in a different alignment tool:
http://www.ebi.ac.uk/Tools/msa
Results Summary
Percent Identity Matrix (rows and columns numbered 1 to 7, one per sequence)
What is the percent amino acid sequence identity between human and cow?
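A percent identity matrix can also be computed directly from the aligned sequences. The sketch below uses one common definition (identical residues divided by columns where neither sequence has a gap); the EBI tool may calculate it differently. Names and sequences are toy data, not the real human and cow proteins.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of identical residues over columns without a gap in either sequence."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def identity_matrix(aligned):
    """Return a nested dict: matrix[name1][name2] = percent identity."""
    names = list(aligned)
    matrix = {name: {name: 100.0} for name in names}
    for first, second in combinations(names, 2):
        value = percent_identity(aligned[first], aligned[second])
        matrix[first][second] = matrix[second][first] = value
    return matrix


# Toy alignment for illustration only.
matrix = identity_matrix({"human": "MKV-LA", "cow": "MKVQLS"})
```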
EBI-EMBL MSA
http://www.ebi.ac.uk/Tools/msa/
Matrix:
No matrix
33 identical matches
27 similar matches
Length = 244
Matrix:
PAM 350 matrix
33 identical matches
27 similar matches
Length = 258
A lot of gaps introduced!
Matrix:
BLOSUM 62 matrix
34 identical matches
29 similar matches
Length = 248
Highest identity score
PAM and BLOSUM matrix
Matrix               Best in determining
PAM 40 / BLOSUM 90   Short, similar (conserved) alignments
PAM 250              Longer, more divergent alignments
PAM 160 / BLOSUM 80  Members of protein families
BLOSUM 62            All potential similarities
Adapted from Baxevanis 2005
Exercise
Investigation of the 5-HT (Serotonin) receptors
The serotonin receptors, also known as 5-hydroxytryptamine receptors or
5-HT receptors, are a group of G protein-coupled receptors (GPCRs) and
ligand-gated ion channels (LGICs) found in the central and peripheral
nervous systems.
Serotonin receptors are found in almost all animals, including humans, and
are even known to regulate longevity and behavioral aging in the primitive
nematode Caenorhabditis elegans.
Step 1 – collect sequences
http://uniprot.org
Sort by Entry name
Step 2 – alignment
Step 3 – summary information
Align across species
Step 1 – collect sequences
Search for 5HT1A
and sort by Entry name
Step 2 – alignment
Step 3 – summary
Exercise:
Pick one of
5HT1B, 5HT1D, 5HT1E, 5HT1F
and check for conservation across species
Exercise:
Pick all entries starting with 5HT
and check for conservation within and
across species
Fasta header reformatting
>sp|O42385|5H1AA_TAKRU 5-hydroxytryptamine receptor 1A-alpha OS=Takifugu rubripes GN=htr1aa PE=3 SV=1
MDLRATSSNDSNATSGYSDTAAVDWDEGENATGSGSLPDPELSYQIITSLFLGALILCSI
FGNSCVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQDICDL
FIALDVLCCTSSILHLCAIALDRYWAITDPIDYVNKRTPRRAAVLISVTWLIGFSISIPP
MLGWRSAEDRANPDACIISQDPGYTIYSTFGAFYIPLILMLVLYGRIFKAARFRIRKTVK
KTEKAKASDMCLTLSPAVFHKRANGDAVSAEWKRGYKFKPSSPCANGAVRHGEEMESLEI
IEVNSNSKTHLPLPNTPQSSSHENINEKTTGTRRKIALARERKTVKTLGIIMGTFIFCWL
PFFIVALVLPFCAENCYMPEWLGAVINWLGYSNSLLNPIIYAYFNKDFQSAFKKILRCKF
HRH
Loads of information in the header, but only the first bit shows up in the alignment
Reformat headers to show the organism
Fasta header reformatting
perl -p -i -e 's/>.+\|(5H.+?)_.+OS=(.+?) (.+?)/>${1}_${2}_${3}/' uniprot_5HT.fasta
Achieved through a Perl one-liner
http://bioinf.gen.tcd.ie/pol
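The same header rewrite can be sketched in Python. This is a translation of the idea rather than the course's own script: it uses \S+ for the genus and species words where the Perl pattern uses lazy groups, and the function name is illustrative.

```python
import re

# Capture the entry name prefix (e.g. 5H1AA) and the two words after OS=.
HEADER = re.compile(r"^>.+\|(5H.+?)_.+OS=(\S+) (\S+)")


def reformat_header(line):
    """Rewrite a UniProt header line; other lines are returned unchanged."""
    return HEADER.sub(r">\1_\2_\3", line)


original = (">sp|O42385|5H1AA_TAKRU 5-hydroxytryptamine receptor 1A-alpha "
            "OS=Takifugu rubripes GN=htr1aa PE=3 SV=1")
reformatted = reformat_header(original)
```

Sequence lines do not start with '>' and therefore pass through untouched, so the function can be applied to every line of the FASTA file.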
Fasta header reformatting
>5H1AA_Takifugu_rubripes GN=htr1aa PE=3 SV=1
MDLRATSSNDSNATSGYSDTAAVDWDEGENATGSGSLPDPELSYQIITSLFLGALILCSI
FGNSCVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQDICDL
FIALDVLCCTSSILHLCAIALDRYWAITDPIDYVNKRTPRRAAVLISVTWLIGFSISIPP
MLGWRSAEDRANPDACIISQDPGYTIYSTFGAFYIPLILMLVLYGRIFKAARFRIRKTVK
KTEKAKASDMCLTLSPAVFHKRANGDAVSAEWKRGYKFKPSSPCANGAVRHGEEMESLEI
IEVNSNSKTHLPLPNTPQSSSHENINEKTTGTRRKIALARERKTVKTLGIIMGTFIFCWL
PFFIVALVLPFCAENCYMPEWLGAVINWLGYSNSLLNPIIYAYFNKDFQSAFKKILRCKF
HRH
Important bits at start of line and connected via '_'
Phylogenetic Trees
- based on multiple sequence alignments
- show relation between sequences/species
Darwin, On the Origin of Species
Alignment to Tree
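One simple way to go from an alignment to a tree is distance-based clustering. The sketch below computes pairwise p-distances and joins the closest clusters with UPGMA, returning the topology as a Newick string without branch lengths. It is a teaching sketch on toy sequences, not the method used by the alignment tools in these slides.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Cluster {name: aligned_sequence} by UPGMA; return a Newick topology."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
            for pair in combinations(aligned, 2)}
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)
        a, b = sorted(closest)
        merged = f"({a},{b})"
        merged_size = sizes[a] + sizes[b]
        others = [name for name in sizes if name not in (a, b)]
        for other in others:
            # Average distance, weighted by the number of leaves in each cluster.
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (
                (d_a * sizes[a] + d_b * sizes[b]) / merged_size)
        del dist[closest]
        del sizes[a], sizes[b]
        sizes[merged] = merged_size
    return next(iter(sizes)) + ";"


# Toy alignment for illustration only.
tree = upgma({"human": "MKVLA", "cow": "MKVLS", "fly": "MRTLS"})
```

The resulting Newick string can be pasted into a tree viewer to draw the tree.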
Phylogenetic Trees
Baum, D. (2008) Reading a phylogenetic tree: The meaning of monophyletic groups. Nature Education 1(1):190
Phylogenetic Trees
Baum, D. (2008) Trait evolution on a phylogenetic tree: Relatedness, similarity, and the myth of evolutionary advancement. Nature Education 1(1):191
TreeDraw
http://webconnectron.appspot.com/Treedraw.html
Description of controls
Don't forget to log out!