Needleman-Wunsch-W3
• ACGTGTGCGTTTGAAC
• GGGTGTAGTCGTTTAAAC
• Apply the Needleman-Wunsch algorithm to these two sequences
• Score the alignments

EBI FASTA-W4
• Use the FASTA file you created before
• Run your query on EBI using the fasta algorithm with the default settings
• Change the settings and keep track of which settings you use and the number of queries that have the correct result as the top hit
• Use Excel (settings, %correct)

NCBI BLAST
• Use the FASTA file you produced before and do the same research using NCBI BLAST that you did for EBI fasta
• Use blastn
• Select the proper database
• Finish EBI FASTA if you couldn't before

Sequencing-W5
3. Arbitrarily add linebreaks into the resulting document
   1. At least 30 (10 per copy min)
   2. Spread out throughout the sequence
4. Add a FASTA definition line after each line break
   – Use >Copy-N-Fragment-X as a template for the definition line
• Ensure that the overall number of characters is less than 50000

Restriction Maps
• You sent a sample for sequencing. You might want to check if the sequence makes sense
• What is a restriction map?
• www.restrictionmapper.org

CAP3 Assembly
• Go to: http://pbil.univ-lyon1.fr/cap3.php
• Use the sequences you prepared earlier to assemble them with cap3
• Analyze the results
  – Did you get a full correct assembly?
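
For the Needleman-Wunsch exercise (W3), a hand-built alignment can be checked against a small script. The following is a minimal sketch, not a reference implementation. The slides do not give a scoring scheme, so the values used here (match +1, mismatch -1, linear gap penalty -1) are assumptions, as are the function names `needleman_wunsch` and `score_alignment`. When several alignments share the best score, the traceback returns only one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    The scoring values are assumptions; the slides do not specify them.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def score_alignment(x, y, match=1, mismatch=-1, gap=-1):
    """Score two already-aligned strings of equal length column by column."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        elif p == q:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGTGTGCGTTTGAAC", "GGGTGTAGTCGTTTAAAC")
    print(x)
    print(y)
    print("score:", s, "rescored:", score_alignment(x, y))
```

`score_alignment` can also be applied to an alignment made by hand, which makes it easy to see whether a manual solution reaches the same score under the assumed scheme.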
Prokaryotic DNA-W6
• Finding protein coding regions
• Finding ORFs
• Go to NCBI and find the entry for
  – M68521, gi|147118
  – Get the FASTA sequence
  – Keep the GenBank entry visible

More Gene Finding Tools
• Large Collection
  – http://www.nslij-genetics.org/gene/programs.html
• GENSCAN
  – http://genes.mit.edu/GENSCAN.html
• HMMgene
  – http://www.cbs.dtu.dk/services/HMMgene/
• GeneBuilder
  – http://zeus2.itb.cnr.it/~webgene/genebuilder.html

Finding Genes
• http://rulai.cshl.org/tools/genefinder/human.htm
• Get AF018429 from GenBank
• Enter the FASTA sequence and predict the gene
• Double check with http://genes.mit.edu/genomescan.html

More Gene Finding Tools
• GENSCAN
  – http://genes.mit.edu/GENSCAN.html
• HMMgene
  – http://www.cbs.dtu.dk/services/HMMgene/
• Gene Prediction Software List
  – http://en.wikipedia.org/wiki/List_of_gene_prediction_software

Gathering Sequences-W7
• Retrieve a protein sequence from NCBI
  – A translated nucleotide sequence could also be tried
• Go to: http://www.expasy.ch/tools/blast
• Paste that sequence into the box

Gathering Sequences
• Scroll through the results and select about 10 full-length sequences
• Choose them from different levels of similarity, e.g. different numbers of identities
• Export the collection as FASTA

Identities in Range?
• Go to: http://www.biolnk.com
  – Choose Tools and then MultiIdentity
  – Paste your FASTA-formatted information
  – Set the thresholds
  – See if all sequences are in the desired range of identities amongst each other
• Add/delete sequences accordingly

MSA
• http://www.ebi.ac.uk/clustalw
• http://www.tcoffee.org
• http://www.drive5.com/muscle
• Try all the above and compare the resulting MSAs

Converting Formats
• http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html
• Names (>…) no longer than 15 characters
• Different formats maintain different data
• Converting between formats can therefore lose data
• Make sure to have a master copy

Editing Alignments
• http://www.jalview.org
• Start the applet
• Choose File – Input Alignment – from Textbox
• Copy and paste the ClustalW alignment

Logo
• http://blocks.fhcrc.org/blocks/process_blocks.html
• Retrieve the FASTA sequence of your alignment
• Paste it into the box above and create blocks

Logos
• Go to: http://weblogo.berkeley.edu
• Copy and paste one of the blocks, turn it into FASTA format
• Create the logo

Create an MSA-W8
• This time use 20 – 50 sequences
  – From different species
• Use ClustalW for alignment
• Most ClustalW servers display a dendrogram
• Confirm this by using a few of them

Gathering Sequences
• Download the sequences as a FASTA file as well
• Most programs will support this format

Editing Alignments
• http://www.jalview.org
• Start the program
• Choose File – Input Alignment – from Textbox
• Copy and paste the ClustalW alignment

Dendrogram
• Jalview also allows you to view different types of dendrograms based on different similarity measures
• Use Jalview and compare the trees that are constructed based on the different measures

Edit your MSA
• Remove blocks consisting of mostly gaps (using Jalview)
• Remove N- and C-termini if not well conserved

Easy Tree
• www.ebi.ac.uk/clustalw/
• Paste your alignment
• Select a tree type
• Other options need to be set (see right)
• Press run
• Make a screen shot
• You can paste it where needed

Phylip (More elaborate tree)
• http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html
• Choose protdist from the page
• Paste the MSA
• Bootstrapping e.g.:

Phylip
• Run the query
• Click further analysis
• Click Run
• Select full screen view
• There is your tree

Other Resources
• http://en.wikipedia.org/wiki/List_of_phylogenetics_software
• http://itol.embl.de/

NCBI-W9
• http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi
• Browse the webpage for 15 minutes

Available Data
• Search for human data
• How much data is available?
• Find accession ERX628533
• How large is the dataset?
• Why is it so large?

Let’s tackle this problem-W10
• Get a protein from Swiss-Prot
  – O82533 (Gene: AtFtsZ2-1)
• Annotation: Chloroplast targeting
• Try a few prediction tools to see if you can confirm the annotation

Localization Prediction
• Choose tools from Expasy for example
• ChloroP
• SignalP
• Predotar

Summary
• Look at the IPR summary, if any
• Select Table: For all matching proteins
• Select the FASTA option on the following page
• Add your original sequence to the FASTA collection
• Make an MSA

Protein Domains
• Use the same sequence
  – Do the same analysis using NCBI CD server
  – http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
• NCBI may have domains that InterProScan doesn’t have and vice versa

CD Server

In class assignment-W11
• Choose a protein sequence
  – Not too short!
• Perform secondary structure predictions with as many tools as possible
  – Google at least one more than given in the slides
• Retrieve and rewrite the predictions such that they all use the same 3-letter code (H, C, S; Helix, Coil, Sheet)
  – Use the search and replace functionality of your word processor
• Make an MSA with the predicted secondary structures to compare the results
  – Are there gaps?
  – Are they within the transition from one secondary structure to the next?
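
The search-and-replace step of the in-class assignment can also be scripted. The sketch below is one possible way to do it, under stated assumptions: the symbol table `TO_HCS` is hypothetical (prediction tools differ in their output alphabets, so check each tool's legend and extend the table), and the names `recode` and `to_fasta` are made up for illustration. It rewrites each prediction into the H/C/S code and emits FASTA so the strings can be pasted into an MSA tool.

```python
# Hypothetical mapping from symbols often seen in prediction output to the
# H/C/S code used in the assignment. Extend it per tool after checking the
# tool's own legend.
TO_HCS = {
    "H": "H",  # helix
    "E": "S",  # extended strand, written as S (sheet) here
    "C": "C",  # coil
    "L": "C",  # loop, treated as coil (assumption)
    "-": "C",  # unassigned, treated as coil (assumption)
    "_": "C",  # unassigned, treated as coil (assumption)
}


def recode(prediction, mapping=TO_HCS):
    """Rewrite one prediction string into the H/C/S code, skipping whitespace."""
    out = []
    for ch in prediction.upper():
        if ch.isspace():
            continue
        if ch not in mapping:
            raise ValueError("no H/C/S mapping for symbol %r" % ch)
        out.append(mapping[ch])
    return "".join(out)


def to_fasta(predictions, width=60):
    """Format {name: prediction} as FASTA text with recoded sequences."""
    lines = []
    for name, pred in predictions.items():
        lines.append(">" + name)
        seq = recode(pred)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example strings, not real tool output.
    example = {
        "tool_A": "CCCHHHHHHHCCEEEEECC",
        "tool_B": "--HHHHHHHH--EEEE---",
    }
    print(to_fasta(example))
```

H, C and S are also valid one-letter amino acid codes, so a protein MSA tool should accept the recoded strings as input. Keeping the names short follows the 15-character limit mentioned for format conversion.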
Sec Struct Prediction
• http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
• http://compbio.soe.ucsc.edu/HMM-apps/T02-query.html
• http://distill.ucd.ie/porter/
• http://sable.cchmc.org/
• http://www.compbio.dundee.ac.uk/www-jpred/advanced.html
• http://genamics.com/expression/strucpred.htm
• http://www.predictprotein.org/
• http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html
• http://www.chemie.uni-erlangen.de/lanig/PMII/sek_str.html
• http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html
• http://molbiol-tools.ca/Protein_secondary_structure.htm
• http://mobyle.pasteur.fr/cgi-bin/portal.py?form=predator
• http://www.aber.ac.uk/~phiwww/prof/
• http://www.expasy.ch/tools/
• http://gor.bb.iastate.edu/

Try to predict TMDs
• Find a protein with TMDs
• Expasy will provide you with prediction methods
  – DAS - Prediction of transmembrane regions in prokaryotes using the Dense Alignment Surface method (Stockholm University)
  – HMMTOP - Prediction of transmembrane helices and topology of proteins (Hungarian Academy of Sciences)
  – PredictProtein - Prediction of transmembrane helix location and topology (Columbia University)
  – SOSUI - Prediction of transmembrane regions (Nagoya University, Japan)
  – TMHMM - Prediction of transmembrane helices in proteins (CBS; Denmark)
  – TMpred - Prediction of transmembrane regions and protein orientation (EMBnet-CH)
  – TopPred - Topology prediction of membrane proteins (France)

RNA Secondary Structure
• Online
  – http://compbio.cs.sfu.ca/taverna/alterna/
  – http://www.bioinfo.rpi.edu/applications/mfold/
• Download
  – RNAshapes
  – RNAfold
• Get RNAs
  – http://www.ncrna.org/frnadb/search.html

3D Structure Prediction?-W12
• Get a protein sequence
• Go to: http://bioinf.cs.ucl.ac.uk/psipred
  – Use threading
• Go to: http://www.rcsb.org/pdb
  – Find known structure
• Folding@home
  – Ab initio prediction

Crystal structure of a monomeric retroviral protease solved by protein folding game players.
• Foldit (http://fold.it/portal/)

Increased Diels-Alderase activity through backbone remodeling guided by Foldit players.