HandsOn

(PSI-)BLAST & MSA
via Max-Planck
General Issues
• Where? (to find homologues)
• Structural templates- search against the PDB
• Sequence homologues- search against SwissProt or
UniProt (recommended!)
• How many?
• As many as possible, as long as the MSA looks good
(next week…)
• How long? (length of homologues)
• Fragments- short homologues (shorter than 50-60% of the
query’s length) give a bad alignment
• Ensure your sequences contain the domain(s) of interest
• N- and C-termini tend to vary in length between homologues
• How close? (distance from query sequence)
• All too close- no information
• Too many too far- bad alignment
• Ensure that you have a balanced collection!
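The fragment rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the 0.6 cutoff (the upper end of the 50-60% range on the slide) are assumptions, and the right cutoff depends on the protein.

```python
from typing import Dict


def is_fragment(query_len: int, hit_len: int, min_fraction: float = 0.6) -> bool:
    """Return True if the hit is shorter than min_fraction of the query length."""
    return hit_len < min_fraction * query_len


def drop_fragments(query: str, hits: Dict[str, str],
                   min_fraction: float = 0.6) -> Dict[str, str]:
    """Keep only the hits that are long enough relative to the query."""
    return {
        name: seq
        for name, seq in hits.items()
        if not is_fragment(len(query), len(seq), min_fraction)
    }


if __name__ == "__main__":
    # Toy data (hypothetical): a 10-residue query and two hits.
    query = "MKTAYIAKQR"
    hits = {"long_hit": "MKTAYIAKQ", "short_hit": "MKTA"}
    print(drop_fragments(query, hits))
```

For a 273 aa query such as the one used later in this session, the 0.6 cutoff flags any homologue shorter than about 164 residues.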
• From whom? (which species the sequence belongs to)
• Don’t care, all homologues are welcome
• Orthologues/paralogues may be helpful
• Sequences from distant/close species provide different
types of information
• Which method? (BLAST/PSI-BLAST)
• Depends on the protein, available homologues, the goal
in mind…
Rules For Choosing Sequences
• Very similar sequences have little information
• Very different sequences cause trouble: those <30% identical to
more than half of the other sequences in the set
• Choose sequences as distantly related as possible, while keeping each
sequence 30-80% identical to more than half of the sequences in the set
• The more sequences the better
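A minimal sketch of how these rules could be checked on an existing alignment. It assumes the sequences are already aligned to equal length with "-" as the gap character, and it computes identity over columns where both sequences have a residue (one of several common definitions). The labels "good", "too_different" and "other" are made up for this illustration.

```python
from typing import Dict


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def classify_sequences(aligned: Dict[str, str],
                       low: float = 30.0, high: float = 80.0) -> Dict[str, str]:
    """Label each sequence according to the 30-80% identity rule."""
    labels = {}
    for name, seq in aligned.items():
        others = [s for n, s in aligned.items() if n != name]
        identities = [percent_identity(seq, other) for other in others]
        n_low = sum(i < low for i in identities)
        n_mid = sum(low <= i <= high for i in identities)
        if n_low > len(others) / 2:
            labels[name] = "too_different"
        elif n_mid > len(others) / 2:
            labels[name] = "good"
        else:
            labels[name] = "other"  # e.g. mostly >80% identical: little information
    return labels


if __name__ == "__main__":
    toy = {
        "A": "AAAAAAAAAA",
        "B": "AAAAAAACCC",
        "C": "AAAAACCCCC",
        "D": "WWWWWWWWWW",
    }
    print(classify_sequences(toy))
```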
Overall work steps
1. Run the search
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST- how many rounds?
2. Take out sequences- HSP (slider region) or full sequences
3. Align sequences- choose alignment program
4. View alignment with BioEdit or another program
5. Calculate trees, conservation scores (ConSurf) etc…
(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/sections/search
• Databases- SwissProt, TrEMBL, NR, env, PDB or any
combination for proteins, but only NT for DNA.
• All BLAST programs
Main advantage- you can easily extract and filter the
HSPs, on top of full sequences
The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa
The Query Protein
Query:
DAPB_ECOLI
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAV
KDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLL
EKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATV
RAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
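The query above is in FASTA format: a header line starting with ">" followed by the sequence wrapped over several lines. The sketch below shows how such a record could be read and its length checked. The parser is a simplified illustration (the function name is an assumption) and the example record is a toy, not the real query.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    # Toy record (hypothetical), wrapped over two lines like the query above.
    toy = ">toy_protein\nMHDAN\nIRVAI\n"
    for identifier, seq in parse_fasta(toy).items():
        print(identifier, len(seq), "aa")
```

Running the same function on the DAPB_ECOLI record gives a residue count that can be compared with the sequence length stated on the previous slide.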
(PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/psi_blast/
Upload sequence
or MSA
Choose database or
databases (selecting a
few using CTRL)
(PSI-)BLAST via Max-Planck
E-value threshold can be assessed using the distribution
Forward results to MSA
http://toolkit.tuebingen.mpg.de/sections/alignment
All marked hits or
filter by e-value
HSP (slider region) or
full sequences
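The two choices made when forwarding hits (an E-value cutoff, and HSP versus full sequence) can be sketched as follows. The `Hit` record and its field names are hypothetical and do not reflect the toolkit's output format. HSP coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    name: str
    evalue: float
    full_seq: str
    hsp_start: int  # 1-based, inclusive (assumed convention)
    hsp_end: int    # 1-based, inclusive


def select_sequences(hits: List[Hit], max_evalue: float = 1e-3,
                     use_hsp: bool = True) -> Dict[str, str]:
    """Keep hits at or below the E-value cutoff; return HSP regions or full sequences."""
    selected = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        if use_hsp:
            selected[hit.name] = hit.full_seq[hit.hsp_start - 1:hit.hsp_end]
        else:
            selected[hit.name] = hit.full_seq
    return selected


if __name__ == "__main__":
    hits = [
        Hit("good_hit", 1e-50, "MKTAYIAKQR", 3, 8),
        Hit("weak_hit", 0.5, "MKT", 1, 3),
    ]
    print(select_sequences(hits))                  # HSP regions only
    print(select_sequences(hits, use_hsp=False))   # full sequences
```

Using the HSP keeps only the region that matched the query, which avoids aligning unrelated flanking domains. Using full sequences keeps the termini, at the cost of more length variation.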
Align via Max-Planck
Alignment results:
Save the alignment
Alignment viewing & editing
BioEdit
• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html
• Easy-to-use sequence alignment editor
• View and manipulate alignments up to 20,000 sequences.
• Four modes of manual alignment: select and slide, dynamic grab
and drag, gap insert and delete by mouse click, and on-screen
typing which behaves like a text editor.
• Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and
NBRF/PIR formats. Also reads GCG and Clustal formats
Alignment viewing & editing
Easiest Using BioEdit
http://www.mbio.ncsu.edu/BioEdit/bioedit.html
• Find a specific sequence: “Edit-> search -> in titles”
• Remove/add sequences: “Edit-> cut/paste/delete sequence”
• “Sequence Identity matrix” under “Alignment” is useful for a rough evaluation of distances within the alignment.
• After taking out sequences, “Minimize Alignment” under
“Alignment” removes unnecessary gaps.
• Can save an image using:
“File -> Graphic View” & then “Edit -> Copy page as BITMAP”
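To make the effect of minimizing an alignment concrete, here is a small sketch. It assumes that an unnecessary gap means a column in which every remaining sequence has a gap, which is the usual situation after sequences are removed. It is not BioEdit's own code, and the function name is an assumption.

```python
from typing import Dict


def minimize_alignment(aligned: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Remove alignment columns that contain only gap characters."""
    names = list(aligned)
    rows = [aligned[n] for n in names]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    # Keep a column if at least one sequence has a residue there.
    keep = [i for i, col in enumerate(zip(*rows)) if any(c != gap for c in col)]
    return {n: "".join(aligned[n][i] for i in keep) for n in names}


if __name__ == "__main__":
    # Toy alignment left over after deleting a sequence with an insertion.
    toy = {"s1": "MK--A-", "s2": "M---AT"}
    print(minimize_alignment(toy))
```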
A little of ConSurf
Compute Conservation Scores
• Provide an MSA, or ConSurf will compute one for you (given
a FASTA sequence, it runs BLAST and builds the MSA)
Main advantage:
filters short HSPs, removes redundant sequences
• Shows conservation scores on sequence or on a
protein structure (if available)
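Redundancy removal can be illustrated with a simple greedy filter: walk through the sequences and keep one only if it is not too similar to any sequence already kept. This is a sketch of the general idea, not ConSurf's actual procedure. The 0.95 identity cutoff and the function names are assumptions, and the input is assumed to be an alignment with "-" as the gap character.

```python
from typing import Dict


def identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where both sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(aligned: Dict[str, str],
                     max_identity: float = 0.95) -> Dict[str, str]:
    """Greedy filter: drop a sequence if it exceeds max_identity to any kept one."""
    kept: Dict[str, str] = {}
    for name, seq in aligned.items():
        if all(identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    toy = {
        "query": "AAAAAAAAAA",
        "duplicate": "AAAAAAAAAA",
        "distant": "AAAAACCCCC",
    }
    print(list(remove_redundant(toy)))
```

Because the filter is greedy, the result depends on input order. Putting the query first guarantees that it is kept.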
ConSurf
http://consurf.tau.ac.il/
ConSurf
http://consurf.tau.ac.il/results/1321532763/output.php
ConSurf
Sequence conservation
MSA colored by conservation
PSI-BLAST result
Sequences used
MSA
Phylogenetic tree
ConSurf
Jmol- Easy web-based viewer
WebLogo
http://weblogo.berkeley.edu/logo.cgi
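A sequence logo draws each alignment column with a height proportional to its information content. A simplified version of that calculation is sketched below: information = log2(20) minus the Shannon entropy of the column, in bits, for a 20-letter protein alphabet. Gaps are ignored and no small-sample correction is applied, so the numbers will not match WebLogo's output exactly. The function names are assumptions.

```python
import math
from collections import Counter
from typing import List


def column_information(column: str, alphabet_size: int = 20, gap: str = "-") -> float:
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(rows: List[str]) -> List[float]:
    """Information content for every column of an alignment (equal-length rows)."""
    return [column_information("".join(col)) for col in zip(*rows)]


if __name__ == "__main__":
    toy = ["MKA", "MKC", "MRA", "MRC"]
    for i, bits in enumerate(alignment_information(toy), start=1):
        print(f"column {i}: {bits:.2f} bits")
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits. A column split evenly between two residues loses exactly one bit.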
No “miracle solution”
Each sequence is a different story
Adjust parameters:
• BLAST- E-value, substitution matrix, gap penalties,
database, minimum length, redundancy level, fragment
overlap…
• PSI-BLAST- BLAST parameters + PSSM inclusion
threshold (or choose manually), number of rounds…
• Try using HSP or full sequences, different MSA programs…
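For readers who run the search locally instead of through the web toolkit, the parameters listed above correspond to command-line options of the `psiblast` program in NCBI BLAST+. The sketch below only assembles the command as a list of arguments and does not run anything. It assumes BLAST+ is installed and a local protein database is available. The file name `dapb.fasta`, the database name `swissprot` and the default values are placeholders, not recommendations.

```python
from typing import List


def build_psiblast_command(query_fasta: str, database: str,
                           evalue: float = 1e-3,
                           inclusion_ethresh: float = 2e-3,
                           num_iterations: int = 3,
                           matrix: str = "BLOSUM62",
                           gapopen: int = 11,
                           gapextend: int = 1) -> List[str]:
    """Assemble a psiblast command line (NCBI BLAST+) as an argument list."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),                        # reporting threshold
        "-inclusion_ethresh", str(inclusion_ethresh),  # PSSM inclusion threshold
        "-num_iterations", str(num_iterations),        # number of rounds
        "-matrix", matrix,                             # substitution matrix
        "-gapopen", str(gapopen),                      # gap penalties
        "-gapextend", str(gapextend),
    ]


if __name__ == "__main__":
    # Placeholder inputs; vary one parameter at a time and compare the hit lists.
    for rounds in (1, 3, 5):
        cmd = build_psiblast_command("dapb.fasta", "swissprot", num_iterations=rounds)
        print(" ".join(cmd))
```

Keeping the parameters in one function makes it easy to rerun the same search with a different E-value, matrix or number of rounds and to see how the resulting MSA changes.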