alignment

Needleman-Wunsch-W3
• ACGTGTGCGTTTGAAC
• GGGTGTAGTCGTTTAAAC
• Apply the Needleman-Wunsch algorithm to
these two sequences
• Score the alignments
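Both steps can be sketched in Python. This is a minimal sketch, not the course's reference solution. The slides do not give a scoring scheme, so match = +1, mismatch = -1 and a linear gap penalty of -1 are assumptions; change the three constants to the scheme used in class. `score_alignment` rescores any alignment (for example one worked out by hand) under the same scheme. When several alignments tie for the best score, the traceback returns only one of them.

```python
# Needleman-Wunsch global alignment: a minimal sketch.
# Assumed scoring scheme (not given on the slide): match +1, mismatch -1, linear gap -1.
MATCH, MISMATCH, GAP = 1, -1, -1


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str):
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + GAP,   # gap in b
                          F[i][j - 1] + GAP)   # gap in a
    # Traceback from the bottom-right cell recovers one optimal alignment
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def score_alignment(top: str, bottom: str) -> int:
    """Rescore an existing alignment with the same scheme."""
    return sum(GAP if "-" in (x, y) else pair_score(x, y) for x, y in zip(top, bottom))


score, top, bottom = needleman_wunsch("ACGTGTGCGTTTGAAC", "GGGTGTAGTCGTTTAAAC")
print(top)
print(bottom)
print("score:", score)
```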
EBI FASTA-W4
• Use the FASTA file you created before
• Run your query on EBI using the FASTA algorithm
with the default settings
• Change the settings and keep track of which
settings you use and how many queries return
the correct sequence as the top hit
• Record the results in Excel (settings vs. % correct)
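Tallying the percentage of correct top hits per settings combination is simple bookkeeping. The sketch below assumes you record, for each run, a settings label, the query, the top hit and the expected sequence; the run data and the `ktup=1` label are hypothetical placeholders. The tab-separated output can be pasted straight into Excel.

```python
from collections import defaultdict


def percent_correct(runs):
    """runs: iterable of (settings_label, query_id, top_hit_id, expected_id).
    Returns {settings_label: percentage of queries whose top hit was correct}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for settings, _query, top_hit, expected in runs:
        totals[settings] += 1
        if top_hit == expected:
            hits[settings] += 1
    return {s: 100.0 * hits[s] / totals[s] for s in totals}


# Hypothetical runs: replace with your own results
runs = [
    ("default", "q1", "seqA", "seqA"),
    ("default", "q2", "seqX", "seqB"),
    ("ktup=1", "q1", "seqA", "seqA"),
    ("ktup=1", "q2", "seqB", "seqB"),
]
for settings, pct in percent_correct(runs).items():
    print(f"{settings}\t{pct:.1f}")  # tab-separated for pasting into Excel
```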

NCBI BLAST
• Use the FASTA file you produced before and
repeat the analysis you did for EBI FASTA,
this time using NCBI BLAST
• Use blastn
• Select the proper database
• Finish EBI FASTA if you could not before

Sequencing-W5
3. Arbitrarily insert line breaks into the resulting
document
1. At least 30 (minimum 10 per copy)
2. Spread them throughout the sequence
4. Add a FASTA definition line after each line
break
– Use >Copy-N-Fragment-X as a template for the
definition line
• Ensure that the overall number of characters
is less than 50,000
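Steps 1 and 2 are not part of this excerpt, so the sketch below assumes the resulting document holds three copies of one sequence (inferred from "at least 30, minimum 10 per copy") and that each fragment sits on its own line. `fragment_copies` and the placeholder sequence are hypothetical, and random cut points only approximate "spread throughout the sequence".

```python
import random


def fragment_copies(seq: str, copies: int = 3, breaks_per_copy: int = 10, seed=None) -> str:
    """Split each copy of seq at random positions and return FASTA text in which
    every fragment is preceded by a >Copy-N-Fragment-X definition line."""
    rng = random.Random(seed)
    records = []
    for n in range(1, copies + 1):
        # Distinct cut points between 1 and len(seq) - 1, so no fragment is empty
        cuts = sorted(rng.sample(range(1, len(seq)), breaks_per_copy))
        bounds = [0] + cuts + [len(seq)]
        for x, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1):
            records.append(f">Copy-{n}-Fragment-{x}\n{seq[start:end]}")
    return "\n".join(records) + "\n"


seq = "ACGT" * 250  # placeholder 1,000-nt sequence; substitute your own
fasta = fragment_copies(seq, seed=1)
assert len(fasta) < 50000, "document exceeds 50,000 characters"
print(fasta[:120])
```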
Restriction Maps
• You sent a sample for sequencing. You may
want to check whether the sequence makes sense
• What is a restriction map?
• www.restrictionmapper.org
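A restriction map records where restriction enzymes cut a sequence, so the predicted fragment pattern can be compared with what you expect for your sample. The sketch below only locates recognition sites (not the exact cut position within each site) for three well-known enzymes; the enzyme table is a small illustrative subset. All three sites are palindromic, so scanning one strand finds them on both.

```python
# Recognition sites (5'->3') of a few common enzymes; an illustrative subset only
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def restriction_map(seq: str, enzymes=ENZYMES) -> dict:
    """Return {enzyme: [1-based start positions of its recognition site]}."""
    seq = seq.upper()
    sites = {}
    for name, site in enzymes.items():
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i + 1)      # report 1-based coordinates
            i = seq.find(site, i + 1)    # keep scanning, allowing overlaps
        sites[name] = positions
    return sites


print(restriction_map("AAGAATTCGGATCCAAGCTT"))
```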
CAP3 Assembly
• Go to: http://pbil.univ-lyon1.fr/cap3.php
• Use the sequences you prepared earlier to assemble them
with cap3
• Analyze the results
– Did you get a full correct assembly?
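To see the principle behind checking an assembly, the toy sketch below greedily merges the pair of reads with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. This is not CAP3's algorithm: CAP3 is considerably more sophisticated (for example, it tolerates sequencing errors), whereas this sketch assumes error-free reads without repeats. `min_len=3` is an arbitrary choice. A single contig equal to your original sequence means the fragments reassembled correctly.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # no sufficient overlap left
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]


print(greedy_assemble(["AAACCC", "CCCGGG", "GGGTTT"]))
```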
Prokaryotic DNA-W6
• Finding protein coding regions
• Finding ORFs
• Go to NCBI and find the entry for
– M68521, gi|147118
– Get the FASTA sequence
– Keep the GenBank entry open
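Finding ORFs can be sketched as scanning all six reading frames for an ATG followed in frame by a stop codon. This is a minimal sketch assuming the standard stop codons, ATG as the only start codon and a hypothetical minimum length of 100 codons; real prokaryotic gene finders also consider alternative start codons and use statistical models. Paste the M68521 FASTA sequence (without its definition line) as input and compare the hits with the CDS features in the GenBank entry.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, frame, start, end) for ATG...stop ORFs of at least min_codons
    codons (stop included). Coordinates are 0-based, end-exclusive, and refer to
    the forward sequence for '+' and to its reverse complement for '-'."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=2))
```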
More Gene Finding Tools
• Large Collection
– http://www.nslij-genetics.org/gene/programs.html
• GENSCAN
– http://genes.mit.edu/GENSCAN.html
• HMMgene
– http://www.cbs.dtu.dk/services/HMMgene/
• GeneBuilder
– http://zeus2.itb.cnr.it/~webgene/genebuilder.html
Finding Genes
• http://rulai.cshl.org/tools/genefinder/human.htm
• Get AF018429 from GenBank
• Enter the FASTA sequence and predict the gene
• Double-check the prediction with GenomeScan:
http://genes.mit.edu/genomescan.html
More Gene Finding Tools
• Gene Prediction Software List
– http://en.wikipedia.org/wiki/List_of_gene_prediction_software
Gathering Sequences-W7
• Retrieve a protein sequence from NCBI
– A translated nucleotide sequence can also be used
• Go to: http://www.expasy.ch/tools/blast
• Paste the sequence into the query box
Gathering Sequences
• Scroll through the results and select about 10
full-length sequences
• Choose them from different levels of similarity,
e.g. different numbers of identities
• Export the collection in FASTA format
Identities in Range?
• Go to: http://www.biolnk.com
– Choose Tools and then MultiIdentity
– Paste your FASTA-formatted sequences
– Set the thresholds
– Check whether all pairwise identities among the
sequences fall within the desired range
• Add or delete sequences accordingly
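The pairwise range check can be approximated in a few lines. This sketch assumes the sequences are already aligned to equal length and computes identity over columns where neither sequence has a gap; tools differ in the denominator they use, so the numbers may not match MultiIdentity exactly. The thresholds and example sequences are your own choice and made up, respectively.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity (%) over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def out_of_range(aligned: dict, low: float, high: float):
    """aligned: {name: aligned_sequence}. Return pairs whose identity lies outside [low, high]."""
    bad = []
    for (n1, s1), (n2, s2) in combinations(aligned.items(), 2):
        pid = percent_identity(s1, s2)
        if not low <= pid <= high:
            bad.append((n1, n2, round(pid, 1)))
    return bad


print(out_of_range({"a": "ACGT", "b": "ACGA", "c": "TTTT"}, 30, 90))
```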
MSA
• http://www.ebi.ac.uk/clustalw
• http://www.tcoffee.org
• http://www.drive5.com/muscle
• Try all the above and compare the resulting
MSAs
Converting Formats
• http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html
• Keep names (>…) to no more than 15 characters
• Different formats retain different data
• Converting between formats can therefore lose data
• Keep a master copy of your data
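Before converting, it can help to list names that exceed the 15-character limit and, more importantly, names that become identical once truncated, since truncation could make two sequences indistinguishable. A minimal sketch; the function name and the example names are hypothetical, and the name is taken to be the first word after ">".

```python
def check_names(fasta_text: str, max_len: int = 15):
    """Return (names longer than max_len, pairs of names that collide when truncated)."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    too_long = [n for n in names if len(n) > max_len]
    seen, clashes = {}, []
    for n in names:
        short = n[:max_len]
        if short in seen and seen[short] != n:
            clashes.append((seen[short], n))
        seen.setdefault(short, n)
    return too_long, clashes


text = ">Arabidopsis_thaliana_1 FtsZ\nMAT\n>Arabidopsis_thaliana_2\nMAS\n>Oryza\nMAT\n"
print(check_names(text))
```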
Editing Alignments
• http://www.jalview.org
• Start the applet
• Choose File – Input Alignment – from Textbox
• Copy and paste the ClustalW alignment
Logo
• http://blocks.fhcrc.org/blocks/process_blocks.html
• Retrieve the FASTA sequences of your
alignment
• Paste them into the box on the page above and create blocks
Logos
• Go to: http://weblogo.berkeley.edu
• Copy and paste one of the blocks and convert it
to FASTA format
• Create the logo
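In a sequence logo, each stack's height is usually the column's information content, R = log2(N) - H, where N is the alphabet size (20 for protein, 4 for DNA) and H = -Σ f·log2(f) is the Shannon entropy of the residue frequencies f; each letter's height is its frequency times R. The sketch computes R per column, ignoring gaps and omitting the small-sample correction that logo tools such as WebLogo typically apply, so its values will be somewhat higher for small blocks. The example block is made up.

```python
import math
from collections import Counter


def column_information(block, alphabet_size=20):
    """Information content R (bits) of each column: log2(alphabet_size) - H.
    Gaps are ignored; no small-sample correction is applied."""
    info = []
    for col in range(len(block[0])):
        residues = [s[col] for s in block if s[col] != "-"]
        if not residues:
            info.append(0.0)
            continue
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in Counter(residues).values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


block = ["AC", "AG", "AT", "AA"]  # made-up DNA block with 2 columns
print(column_information(block, alphabet_size=4))  # → [2.0, 0.0]
```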
Create an MSA-W8
• This time use 20 – 50 sequences
– From different species
• Use ClustalW for alignment
• Most ClustalW servers display a dendrogram
• Confirm this by using a few of them
Gathering Sequences
• Download the sequences as a FASTA file as
well
• Most programs will support this format
Editing Alignments
• http://www.jalview.org
• Start the program
• Choose File – Input Alignment – from Textbox
• Copy and paste the ClustalW alignment
Dendrogram
• Jalview also allows you to view different types
of Dendrograms based on different similarity
measures
• Use Jalview and compare the trees that are
constructed based on the different measures
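The trees differ because each combines a distance (or similarity) measure with a clustering method. As an illustration, the sketch below implements UPGMA (average-linkage clustering), one standard method for building such dendrograms, starting from a given distance matrix; the distances in the example are made up. It returns only the topology as a Newick-style string, without branch lengths.

```python
def upgma(names, dist):
    """names: leaf labels; dist: {(a, b): distance} for every unordered pair.
    Returns the tree topology as a Newick-style string (no branch lengths)."""
    def d0(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    size = {n: 1 for n in names}  # cluster label -> number of leaves
    d = {(a, b): d0(a, b) for a in names for b in names if a != b}
    while len(size) > 1:
        # Closest pair of current clusters (x < y visits each pair once)
        a, b = min(((x, y) for x in size for y in size if x < y), key=lambda p: d[p])
        new = f"({a},{b})"
        na, nb = size.pop(a), size.pop(b)
        for c in size:
            # Average linkage: size-weighted mean of the merged clusters' distances
            dc = (d[(a, c)] * na + d[(b, c)] * nb) / (na + nb)
            d[(new, c)] = d[(c, new)] = dc
        size[new] = na + nb
    return next(iter(size)) + ";"


print(upgma(["A", "B", "C"], {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6}))
```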
Edit your MSA
• Remove blocks consisting of mostly gaps
(using JalView)
• Remove the N- and C-termini if they are poorly conserved
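JalView does this interactively; the same idea as a minimal sketch, assuming "mostly gaps" means that more than half of the sequences have a gap in a column (the 0.5 threshold is an assumption). Trimming poorly conserved termini is left to manual inspection.

```python
def drop_gappy_columns(aligned, max_gap_fraction=0.5):
    """Remove alignment columns whose fraction of gaps exceeds max_gap_fraction."""
    n = len(aligned)
    keep = [col for col in range(len(aligned[0]))
            if sum(s[col] == "-" for s in aligned) / n <= max_gap_fraction]
    return ["".join(s[c] for c in keep) for s in aligned]


print(drop_gappy_columns(["A-C", "--C", "G-C"]))
```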
Easy Tree
• www.ebi.ac.uk/clustalw/
• Paste your alignment
• Select a tree type
• Set the remaining options as required
• Press Run
• Take a screenshot
• Paste it wherever needed
Phylip (More elaborate tree)
• http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html
• Choose protdist from the page
• Paste the MSA
• Set up bootstrapping
Phylip
• Run the query
• Click further analysis
• Click Run
• Select full-screen view
• There is your tree
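Bootstrapping (PHYLIP provides it as the SEQBOOT program) builds many pseudo-alignments by sampling alignment columns with replacement; a tree is built from each, and the support for a clade is the fraction of trees in which it appears. The sketch below shows only the resampling step, on a made-up three-sequence alignment.

```python
import random


def bootstrap_replicate(aligned, rng):
    """Resample alignment columns with replacement, keeping the alignment length."""
    length = len(aligned[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in aligned]


rng = random.Random(42)
msa = ["ACGTAC", "ACGTTC", "TCGAAC"]  # made-up alignment
replicates = [bootstrap_replicate(msa, rng) for _ in range(100)]
print(replicates[0])
```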
Other Resources
• http://en.wikipedia.org/wiki/List_of_phylogenetics_software
• http://itol.embl.de/
NCBI-W9
• http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi
• Browse the webpage for 15 minutes
Available Data
• Search for human data
• How much data is available?
• Find accession ERX628533
• How large is the dataset?
• Why is it so large?
Let’s tackle this problem-W10
• Get a protein from Swiss-Prot
– O82533 (Gene: AtFtsZ2-1)
• Annotation: Chloroplast targeting
• Try a few prediction tools to see if you can
confirm the annotation
Localization Prediction
• Choose tools, for example from ExPASy:
• ChloroP
• SignalP
• Predotar
Summary
• Look at the IPR (InterPro) summary, if any
• Select Table: For all matching proteins
• Select the FASTA option on the following page
• Add your original sequence to the FASTA collection
• Make an MSA
Protein Domains
• Use the same sequence
– Do the same analysis using the NCBI CD server
– http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
• NCBI may have domains that InterProScan
does not have, and vice versa
CD Server
In class assignment-W11
• Choose a protein sequence
– Not too short!
• Perform secondary structure predictions with as
many tools as possible
– Find (e.g. via Google) at least one more tool than
those given in the slides
• Retrieve the predictions and rewrite them in a
three-letter alphabet (H, C, S: helix, coil, sheet)
– Use the search-and-replace function of your word
processor
• Make an MSA of the predicted secondary
structures to compare the results
– Are there gaps?
– Do they occur at transitions from one secondary
structure element to the next?
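The rewriting step can also be scripted. A minimal sketch, assuming a tool that outputs H/E/C (E = extended strand, as PSIPRED does); other tools use other letters, so pass a mapping that matches each tool's legend. Because all predictions cover the same residues, a position-by-position agreement score is a quick complement to the MSA comparison.

```python
def to_hcs(prediction: str, mapping=None) -> str:
    """Convert a per-residue prediction to the H/C/S alphabet. The default mapping
    assumes an H/E/C tool; characters not in the mapping become C (coil)."""
    mapping = mapping or {"H": "H", "E": "S", "C": "C"}
    return "".join(mapping.get(ch, "C") for ch in prediction.upper())


def agreement(p1: str, p2: str) -> float:
    """Percentage of positions at which two equal-length H/C/S strings agree."""
    if len(p1) != len(p2):
        raise ValueError("predictions must cover the same residues")
    return 100.0 * sum(a == b for a, b in zip(p1, p2)) / len(p1)


print(to_hcs("CCHHHEEC"))
print(agreement("HHSC", "HHCC"))
```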
Sec Struct Prediction
http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
http://compbio.soe.ucsc.edu/HMM-apps/T02-query.html
http://distill.ucd.ie/porter/
http://sable.cchmc.org/
http://www.compbio.dundee.ac.uk/www-jpred/advanced.html
http://genamics.com/expression/strucpred.htm
http://www.predictprotein.org/
http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_phd.html
http://www.chemie.uni-erlangen.de/lanig/PMII/sek_str.html
http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_sopma.html
http://molbiol-tools.ca/Protein_secondary_structure.htm
http://mobyle.pasteur.fr/cgi-bin/portal.py?form=predator
http://www.aber.ac.uk/~phiwww/prof/
http://www.expasy.ch/tools/
http://gor.bb.iastate.edu/
Try to predict TMDs
• Find a protein with TMDs
• ExPASy lists several prediction methods:
– DAS - Prediction of transmembrane regions in prokaryotes using the Dense Alignment Surface method (Stockholm University)
– HMMTOP - Prediction of transmembrane helices and topology of proteins (Hungarian Academy of Sciences)
– PredictProtein - Prediction of transmembrane helix location and topology (Columbia University)
– SOSUI - Prediction of transmembrane regions (Nagoya University, Japan)
– TMHMM - Prediction of transmembrane helices in proteins (CBS; Denmark)
– TMpred - Prediction of transmembrane regions and protein orientation (EMBnetCH)
– TopPred - Topology prediction of membrane proteins (France)
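Most of these tools rely on statistical models (TMHMM, for instance, uses a hidden Markov model). The classic, much simpler idea behind them is a hydropathy scan: average the Kyte-Doolittle hydropathy values over a sliding window and flag windows above a threshold. The sketch uses the commonly cited window of 19 residues and threshold of 1.6; treat its hits only as rough candidates to compare with the tools' predictions.

```python
# Kyte-Doolittle hydropathy values
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based start positions of windows whose mean hydropathy exceeds
    threshold (candidate transmembrane segments). Unknown letters count as 0."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        score = sum(KD.get(aa, 0.0) for aa in protein[i:i + window]) / window
        if score > threshold:
            hits.append(i + 1)
    return hits


print(hydrophobic_windows("K" * 10 + "L" * 19 + "K" * 10))
```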
RNA Secondary Structure
• Online
• http://compbio.cs.sfu.ca/taverna/alterna/
• http://www.bioinfo.rpi.edu/applications/mfold/
• Download
• RNAshapes
• RNAfold
• Get RNAs
– http://www.ncrna.org/frnadb/search.html
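Tools such as mfold and RNAfold predict the minimum-free-energy structure using thermodynamic parameters. The simpler Nussinov algorithm, shown here only to illustrate the underlying dynamic programming, instead maximizes the number of nested base pairs (Watson-Crick plus G-U). The minimum hairpin loop of 3 unpaired bases is an assumption, and the result is not comparable to mfold's energies.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by every hairpin."""
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    N = [[0] * n for _ in range(n)]  # N[i][j] = best pair count for rna[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (rna[i], rna[k]) in PAIRS:  # case 2: i pairs with k
                    right = N[k + 1][j] if k + 1 <= j else 0
                    best = max(best, N[i + 1][k - 1] + 1 + right)
            N[i][j] = best
    return N[0][n - 1] if n else 0


print(nussinov("GGGAAACCC"))
```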
3D Structure Prediction?-W12
• Get a protein sequence
• Go to: http://bioinf.cs.ucl.ac.uk/psipred
– Use threading
• Go to: http://www.rcsb.org/pdb
– Find a known structure
• Folding@home
– Ab initio prediction
Crystal structure of a monomeric retroviral protease solved by protein folding game players.
• FoldIt (http://fold.it/portal/)
Increased Diels-Alderase activity through backbone remodeling guided by Foldit players.