HandsOn

(PSI-)BLAST & MSA via Max-Planck

General Issues
• Where? (to find homologues)
  • Structural templates: search against the PDB
  • Sequence homologues: search against SwissProt or UniProt (recommended!)
• How many?
  • As many as possible, as long as the MSA looks good (next week…)
• How long? (length of homologues)
  • Fragments: short homologues (less than 50-60% of the query’s length) = bad alignment
  • Ensure your sequences exhibit the wanted domain(s)
  • N/C termini tend to vary in length between homologues
• How close? (distance from query sequence)
  • All too close: no information
  • Too many too far: bad alignment
  • Ensure that you have a balanced collection!
• From whom? (which species the sequence belongs to)
  • Don’t care, all homologues are welcome
  • Orthologues/paralogues may be helpful
  • Sequences from distant/close species provide different types of information
• Which method? (BLAST/PSI-BLAST)
  • Depends on the protein, available homologues, the goal in mind…

General Issues: Rules For Choosing Sequences
• Very similar sequences have little information
• Very different sequences cause trouble: <30% identical with more than half of the other sequences in the set
• Choose sequences as distantly related as possible
• Sequences should be 30-80% identical with more than half of the sequences in the set
• The more sequences the better

Overall work steps
1. Run the search
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST: how many rounds?
2. Take out sequences: HSP (slider region) or full sequences
3. Align sequences: choose alignment program
4. View alignment with BioEdit or another program
5. Calculate trees, conservation scores (ConSurf) etc…

(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/sections/search
• Databases: SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA.
• All BLAST programs
Main advantage: you can easily extract and filter the HSPs, on top of full sequences

The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa
Query: DAPB_ECOLI
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAV
KDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLL
EKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATV
RAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL

(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/psi_blast/
• Upload sequence or MSA
• Choose database or databases (select several by holding CTRL)
• E-value threshold can be assessed using the distribution

Forward results to MSA
http://toolkit.tuebingen.mpg.de/sections/alignment
• All marked hits or filter by E-value
• HSP (slider region) or full sequences

Align via Max-Planck
Alignment results: save the alignment

Alignment viewing & editing
BioEdit
• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html
• Easy-to-use sequence alignment editor
• View and manipulate alignments of up to 20,000 sequences.
• Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing which behaves like a text editor.
• Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and NBRF/PIR formats.
Also reads GCG and Clustal formats.

Alignment viewing & editing
Easiest using BioEdit
http://www.mbio.ncsu.edu/BioEdit/bioedit.html
• Find a specific sequence: “Edit -> search -> in titles”
• Erase/add sequences: “Edit -> cut/paste/delete sequence”
• “Sequence Identity matrix” under “Alignment” is useful for a rough evaluation of distances within the alignment.
• After taking out sequences, “Minimize Alignment” under “Alignment” removes unnecessary gaps.
• Save an image using “File -> Graphic View” and then “Edit -> Copy page as BITMAP”

A little of ConSurf
Compute Conservation Scores
• Give it an MSA, or it will compute one for you (given a FASTA sequence: BLAST & MSA)
Main advantage: filters short HSPs, removes redundant sequences
• Shows conservation scores on the sequence or on a protein structure (if available)

ConSurf
http://consurf.tau.ac.il/
http://consurf.tau.ac.il/results/1321532763/output.php
• Sequence conservation
• MSA colored by conservation
• PSI-BLAST result
• Sequences used
• MSA
• Phylogenetic tree
• Jmol: easy web-based viewer

WebLogo
http://weblogo.berkeley.edu/logo.cgi

No “Miracle solution”
• Each sequence is a different story -> adjust parameters:
• BLAST: E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…
• PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…
• Try using HSP or full sequences, different MSA programs…
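
The length and identity rules from the "General Issues" slides can be checked with a few lines of code once an alignment has been saved. The sketch below is only an illustration, not part of the toolkit or BioEdit: the function names, the 60% length cutoff (taken from the upper end of the 50-60% range), and the choice to compute identity only over columns where both sequences have a residue are all assumptions.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def is_fragment(seq: str, query: str, min_fraction: float = 0.6) -> bool:
    """True if the ungapped sequence is shorter than min_fraction of the query."""
    seq_len = len(seq.replace("-", ""))
    query_len = len(query.replace("-", ""))
    return seq_len < min_fraction * query_len


def classify(msa: Dict[str, str], low: float = 30.0, high: float = 80.0) -> Dict[str, str]:
    """Label each sequence by how it relates to more than half of the others.

    Labels (hypothetical names): 'too_distant' (<low), 'informative'
    (low..high), 'too_similar' (>high), or 'mixed' if no class has a majority.
    """
    labels = {}
    names = list(msa)
    for name in names:
        identities = [percent_identity(msa[name], msa[other])
                      for other in names if other != name]
        half = len(identities) / 2
        n_low = sum(1 for i in identities if i < low)
        n_mid = sum(1 for i in identities if low <= i <= high)
        n_high = sum(1 for i in identities if i > high)
        if n_low > half:
            labels[name] = "too_distant"
        elif n_mid > half:
            labels[name] = "informative"
        elif n_high > half:
            labels[name] = "too_similar"
        else:
            labels[name] = "mixed"
    return labels


if __name__ == "__main__":
    toy_msa = {
        "s1": "ACDEFGHIKL",
        "s2": "ACDEFGYYYY",
        "s3": "ACDEYYYYKL",
        "s4": "WWWWWWWWWW",
    }
    for name, label in classify(toy_msa).items():
        print(name, label)
```

The thresholds are exposed as parameters because, as the last slide says, each protein family may need different settings.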

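For a quick look at per-column conservation before submitting an alignment to ConSurf or WebLogo, a simple information-content score can be computed directly. This is a minimal sketch and a stand-in only: it uses plain Shannon entropy over a 20-letter alphabet, which is not ConSurf's scoring method, and it omits any small-sample correction that a logo tool may apply. The function name `column_information` is hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_information(msa: List[str], alphabet_size: int = 20) -> List[float]:
    """Information content (bits) per alignment column: log2(alphabet) - entropy.

    Gaps ('-') are ignored; a column containing only gaps scores 0.0.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must have equal length")
    max_bits = math.log2(alphabet_size)
    scores = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(max_bits - entropy)
    return scores


if __name__ == "__main__":
    toy_msa = ["AC", "AD", "AE", "AF"]
    for position, bits in enumerate(column_information(toy_msa), start=1):
        print(f"column {position}: {bits:.2f} bits")
```

A fully conserved column reaches the maximum of log2(20) bits, while a column with many different residues scores lower.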