#13 - Star Alignment; HMMs 9/19/07 Lecture 13 #13_Sept19

#13 - Star Alignment; HMMs
9/19/07
BCB 444/544
Lecture 13
Star Alignment & Clustal (for MSA)
Perhaps: Profiles & Hidden Markov Models (HMMs)
#13_Sept19
Required Reading (before lecture)
√ Mon Sept 17 - Lecture 12
Position Specific Scoring Matrices & PSI-BLAST
• Chp 6 - pp 75-78 (but not HMMs)
Wed Sept 19 - Lecture 13 (not covered on Exam 1)
Profiles & Hidden Markov Models
• Chp 6 - pp 79-84
• Eddy: What is a hidden Markov Model? 2004 Nature Biotechnol 22:1315
http://www.nature.com/nbt/journal/v22/n10/abs/nbt1004-1315.html
Fri Sept 21 - EXAM 1
BCB 444/544 F07 ISU Dobbs #13- Star Alignment; HMMs
9/19/07
1
Assignments & Announcements
• √ Sun Sept 16 - Study Guide for Exam 1 was posted
• √ Mon Sept 17 - Answers to HW#2 were posted
• Thu Sept 20 - Lab = Optional Review Session for Exam
• Fri Sept 21 - Exam 1 - Will cover:
  Lectures 2-12 (thru Mon Sept 17)
  Labs 1-4
  HW2
  All assigned reading:
  Chps 2-6 (but not HMMs)
  Eddy: What is Dynamic Programming?
Chp 5 - Multiple Sequence Alignment
SECTION II  SEQUENCE ALIGNMENT
Xiong: Chp 5
Multiple Sequence Alignment
• √ Scoring Function
• √ Exhaustive Algorithms
• Heuristic Algorithms
  • Star Alignment
  • Clustal
• √ Practical Issues
• First, review MSA scoring briefly, then back to Star Alignment & ClustalW
Scoring an Alignment - in Lecture 12, so will be covered on Exam 1
In practice, simple scoring functions are used
Usually, columns are scored independently:
$S(m) = \sum_i S(m_i) + G$
where G is the gap penalty and $m_i$ is the ith column of alignment m
[Figure: example multiple sequence alignments, with one column $m_i$ marked]
Sum of Pairs (SP) Score
• Compute for each column $m_i$:
$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$
where $m_i^l$ is residue l of column $m_i$ and s is a PAM or BLOSUM score
• SP = sum of pairs = sum of scores of all possible pairs of sequences in an MSA, based on a particular scoring matrix
Algorithms & Software for MSA? #1
Exhaustive Methods
• √ Multidimensional dynamic programming (DP)
• Full DP Optimal Global Alignment? Prohibitive in both time & space requirements for more than 10 sequences!!
• Divide-and-Conquer Alignment (DCA) - "semi-exhaustive"
  web-based version available - see textbook
Heuristic Methods
• Progressive (Star Alignment, Clustal)
• Iterative
• Block-based
Example: Calculating SP Score
(I added more colors to this slide)
Alignment M has three sequences and three columns m1 m2 m3:
m1 = F, F, F
m2 = one Y and two gaps
m3 = G, G, D
[Figure: BLOSUM 60 score table for residues F, Y, G, D]
Gap penalty = -8
s(-,-) = 0
S(m) = S(m1) + S(m2) + S(m3)
= 3s(F,F) + 2s(-,Y) + s(-,-) + s(G,G) + 2s(G,D)
= 15 - 16 + 0 + 4 - 6 = -3
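The SP calculation above can be reproduced with a few lines of Python. This is only a sketch: the row order of the alignment (which sequence carries the Y) is an assumption, and the substitution scores s(F,F) = 5, s(G,G) = 4 and s(G,D) = -3 are the values implied by the arithmetic on the slide, not a full scoring matrix.

```python
from itertools import combinations

GAP = "-"
GAP_PENALTY = -8  # from the slide
# Only the substitution scores the slide's arithmetic uses (keys are sorted pairs).
SCORES = {("F", "F"): 5, ("G", "G"): 4, ("D", "G"): -3}


def pair_score(a, b):
    """Score one pair of symbols taken from the same column."""
    if a == GAP and b == GAP:
        return 0  # s(-,-) = 0
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return SCORES[tuple(sorted((a, b)))]


def sp_score(alignment):
    """Sum-of-pairs score: add up every pair in every column."""
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total


# Assumed row order; SP does not depend on it.
print(sp_score(["F-G", "FYG", "F-D"]))  # -3
```

Because every column is scored independently, the function simply walks the columns and sums all k(k-1)/2 pairs in each one.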
Dynamic Programming for MSA
• As with pairwise alignments, MSAs can be computed by dynamic programming*
*(if you're not in a rush!)
[Figure: 2D vs. 3D dynamic programming lattice]
Generalized Needleman-Wunsch Algorithm
Given 3 sequences x, y, and z:
Main iteration loop:
$$
S(i,j,k) = \max \begin{cases}
S(i-1, j-1, k-1) + \sigma(x_i, y_j, z_k) \\
S(i-1, j-1, k) + \sigma(x_i, y_j, -) \\
S(i-1, j, k-1) + \sigma(x_i, -, z_k) \\
S(i-1, j, k) + \sigma(x_i, -, -) \\
S(i, j-1, k-1) + \sigma(-, y_j, z_k) \\
S(i, j-1, k) + \sigma(-, y_j, -) \\
S(i, j, k-1) + \sigma(-, -, z_k)
\end{cases}
$$
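The three-sequence recurrence can be sketched directly in Python. The scoring values (match = 1, mismatch = -1, gap = -2, with σ taken as the sum-of-pairs score of a column) are assumptions chosen for illustration; the lecture does not fix them here. The sketch returns only the optimal score, not the alignment itself.

```python
from itertools import product

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy pairwise score (assumed values, not from the lecture)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sigma(a, b, c):
    """Sum-of-pairs score of one three-row column."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3_score(x, y, z):
    """Optimal SP score of a global alignment of x, y and z."""
    neg_inf = float("-inf")
    S = [[[neg_inf] * (len(z) + 1) for _ in range(len(y) + 1)]
         for _ in range(len(x) + 1)]
    S[0][0][0] = 0
    # Every non-empty choice of which sequences advance: 2^3 - 1 = 7 neighbors.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i, j, k in product(range(len(x) + 1), range(len(y) + 1), range(len(z) + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if min(pi, pj, pk) < 0:
                continue  # predecessor lies outside the lattice
            column = (x[i - 1] if di else GAP,
                      y[j - 1] if dj else GAP,
                      z[k - 1] if dk else GAP)
            S[i][j][k] = max(S[i][j][k], S[pi][pj][pk] + sigma(*column))
    return S[len(x)][len(y)][len(z)]


print(align3_score("AC", "AC", "AC"))  # 6
```

The table has (n+1)^3 cells and each cell looks at 7 predecessors, which is where the exponential growth in the number of sequences comes from.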
What Happens to Computational Complexity?
Given k sequences of length n:
• Space for matrix: $O(n^k)$
• Neighbors/cell: $2^k - 1$
• Time to compute SP score: $O(k^2)$
• Overall runtime: $O(k^2 2^k n^k)$
What's so bad about those exponents?
Example: Running Time of DP for MSA
• Overall runtime: $O(k^2 2^k n^k)$
# Sequences    Running Time
2              1 second
3              2 minutes
4              5 hours
5              3 weeks
6              9 years
Wow!!!
Sequences? Globins only ≈150 aa !!
But: There are fast heuristics
Progressive Alignment
Multiple Alignment by adding sequences
Heuristic procedure:
1. Align most similar sequences first
2. Add sequences progressively
Often: use guide tree to determine order of alignments
2 Examples:
Star Alignment
ClustalW
Guide Trees
Binary tree
• Leaves correspond to sequences
• Internal nodes represent alignments
• Root corresponds to final MSA
[Figure: guide tree with leaves 1-4; short DNA sequences (e.g., ATC, ATG, TCG, TCC) are aligned at the internal nodes]
Star Alignment
• Fast heuristic to compute MSA
• Good approximation of optimal MSA, if scoring scheme satisfies triangle inequality
Algorithm:
1. Compute pairwise similarities
2. Select center $s_c$ that maximizes $\sum_{i \neq c} S(s_c, s_i)$
3. Add sequences in decreasing order of similarity to center $s_c$
4. Produce a multiple alignment M such that, for every i, the induced pairwise alignment of $s_c$ and $s_i$ is the same as the optimal alignment of $s_c$ and $s_i$
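Steps 1-3 of the star alignment algorithm can be sketched as follows, using the four example sequences from these slides (MPE, MKE, MSKE, SKE). The similarity function is a plain Needleman-Wunsch score with assumed parameters (match = 2, mismatch = -1, gap = -2); the slides do not say which scoring scheme was used, so the numbers are illustrative only.

```python
def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score of a and b (assumed toy scoring)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def choose_center(seqs):
    """Return (center index, order in which to add the others, score totals)."""
    totals = [sum(nw_score(s, t) for j, t in enumerate(seqs) if j != i)
              for i, s in enumerate(seqs)]
    center = max(range(len(seqs)), key=totals.__getitem__)
    others = sorted((i for i in range(len(seqs)) if i != center),
                    key=lambda i: nw_score(seqs[center], seqs[i]),
                    reverse=True)  # most similar to the center first
    return center, others, totals


seqs = ["MPE", "MKE", "MSKE", "SKE"]  # s1..s4 from the slides
center, order, totals = choose_center(seqs)
print(seqs[center], [seqs[i] for i in order], totals)
```

With these assumed scores the center is s2 (MKE) and the remaining sequences are added in the order s3, s1, s4, which matches the order shown in the Step 3 example; a different scoring scheme could change both.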
Step 2 - Select center $s_c$ that maximizes $\sum_{i \neq c} S(s_c, s_i)$
Does that function look familiar?
Steiner consensus sequence or string: Given sequences $s_1, \ldots, s_k$, find a sequence $s^*$ that maximizes $\sum_i S(s^*, s_i)$
"String" equivalent of arithmetic mean: consensus sequence is string that minimizes sum of edit distances to members of a family of strings (thus, maximizing similarity score…)
FGGHL-GF
F-GHLPGF
FGGHP-FG
FGGHL-GF
Step 3 - Add sequences in decreasing order of similarity to center $s_c$
s1 : MPE
s2 : MKE
s3 : MSKE
s4 : SKE
Pairwise alignments with the center s2:
s1  MPE
    | |
s2  MKE

s3  MSKE
    | ||
s2  M-KE

s4  SKE
     ||
s2  MKE
Star Alignment - skipped on Monday: will NOT be covered on Exam 1
Back to 2 Examples of Progressive Alignment Heuristics for MSA:
1. STAR Alignment
2. Clustal
Recall:
Consensus sequence = single sequence (more accurately, "model") that represents most common residue of each column in MSA
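The consensus definition recalled above amounts to a column-wise majority vote. A minimal sketch, using the four-row star alignment example from these slides (M-PE, MSKE, M-KE, S-KE) and treating the gap character as just another symbol, which is a simplifying assumption:

```python
from collections import Counter


def consensus(alignment):
    """Most common symbol of each column (gaps count as symbols here)."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*alignment))


print(consensus(["M-PE", "MSKE", "M-KE", "S-KE"]))  # M-KE
```

Ties are not handled specially in this sketch; `most_common` returns whichever of the tied symbols was seen first in the column.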
Step 3 example - building the MSA around the center:
S2+S3:
MSKE
M-KE
+S1:
M-PE
MSKE
M-KE
+S4:
M-PE
MSKE
M-KE
S-KE
Step 4 - Produce a multiple alignment M such that for every i: the induced pairwise alignment of $s_c$ and $s_i$ is same as optimal alignment of $s_c$ and $s_i$
Pairwise alignments:
Sc  AA--CCTT
S1  AATGCC--

Sc  A-ACC-TT
S2  AGACCGT-
Multiple alignment:
S1  A-ATGCC---
Sc  A-A--CC-TT
S2  AGA--CCGT-
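The Step 4 property can be checked mechanically: take two rows of the multiple alignment and drop every column in which both rows have a gap. What remains is the induced pairwise alignment. A small sketch using the Sc, S1, S2 rows from the slide (the function name is illustrative):

```python
def induced_pairwise(msa, a, b):
    """Project an MSA onto rows a and b, dropping gap-only columns."""
    kept = [(x, y) for x, y in zip(msa[a], msa[b])
            if not (x == "-" and y == "-")]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


msa = {"S1": "A-ATGCC---", "Sc": "A-A--CC-TT", "S2": "AGA--CCGT-"}
print(induced_pairwise(msa, "Sc", "S1"))  # ('AA--CCTT', 'AATGCC--')
print(induced_pairwise(msa, "Sc", "S2"))  # ('A-ACC-TT', 'AGACCGT-')
```

Both projections reproduce the pairwise alignments with the center shown on the slide, which is exactly what Step 4 requires.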
Complexity of Star Alignment?
Given k sequences of length n, and an upper bound l for alignment length
We need:
• $O(k^2 n^2)$ to compute the alignments
• $O(k^2)$ to compute the center
• $O(k^2 l)$ to build multiple alignment
Overall: $O(k^2 n^2)$
Duh - Is this really much better than $O(k^2 2^k n^k)$?
YES!
Remember:
k = # of sequences
n = length of sequences
CLUSTAL: Overview
1. Compute pairwise alignments (DP)
2. Convert similarities into distances
   Distance between a pair = # of mismatched positions in alignment (divided by total # of matches)
3. Build guide tree from distances by Neighbor Joining
4. Align with respect to guide tree
[Figure: Pairwise Alignments (1+2, 1+3, 1+4, 2+3, 2+4, 3+4), Distance Matrix, Guide Tree, Progressive Alignment]
CLUSTAL: Example
[Figure: example with numbered sequences]
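Step 2 of the CLUSTAL overview (converting a pairwise alignment into a distance) might look like the sketch below. It reads "total # of matches" as the number of aligned, non-gap position pairs; that reading is an assumption, and real ClustalW versions apply their own distance definition and corrections.

```python
def pairwise_distance(row_a, row_b):
    """Fraction of mismatches among aligned (non-gap) position pairs."""
    compared = 0
    mismatches = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # gap positions are skipped in this sketch
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return mismatches / compared


print(pairwise_distance("MPE", "MKE"))    # one mismatch out of three positions
print(pairwise_distance("MSKE", "M-KE"))  # 0.0
```

Applying this to every pair of sequences fills the distance matrix from which the guide tree is built in step 3.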
One "small" problem?
Finding the Guide Tree
Goal: Given k sequences and their pairwise distances, find a tree, such that all distances correspond to path lengths between leaves
Problem: Such a tree might not exist!
[Figure: Distance Matrix and Guide Tree for sequences 1-5]
CLUSTAL W Tree
Tree calculated from an alignment of >1100 ring finger domains,
using ClustalW 1.83
Algorithms & Software for MSA? #2
√ Exhaustive Methods
• Multidimensional dynamic programming (DP)
• Divide-and-Conquer Alignment (DCA) - "semi-exhaustive"
  web-based version available - see textbook
• Full DP Optimal Global Alignment? Prohibitive in both time & space requirements for more than 10 sequences!!
Heuristic Methods
• √ Progressive (Star Alignment, Clustal)
• Iterative
• Block-based
Algorithms & Software for MSA? #3
will NOT be covered on Exam1
Heuristic Methods - continued
• Progressive alignments (Star Alignment, Clustal)
  • Match closely-related sequences first using a guide tree
  • Others: T-Coffee, DbClustal - see text: can be better than Clustal
• Partial order alignments (POA)
  • Doesn't rely on guide tree; adds sequences in order given
• PRALINE
  • Preprocesses input sequences by building profiles for each
• Iterative methods
  • Idea: optimal solution can be found by repeatedly modifying existing suboptimal solutions (eg: PRRN)
• Block-based Alignment
  • Multiple re-building attempts to find best alignment (eg: DIALIGN2 & Match-Box)
• Local alignments
  • Profiles, Blocks, Patterns - more on these soon!
SECTION II  SEQUENCE ALIGNMENT
Xiong: Chp 6
Chp 6 - Profiles & Hidden Markov Models
Profiles & HMMs
• √ Position Specific Scoring Matrices (PSSMs)
• √ PSI-BLAST
First, review above briefly, then:
• Profiles
• Markov Models & Hidden Markov Models
PSI-BLAST (Covered in Lecture 12, so will be covered on Exam1)
• Position Specific Iterated BLAST
• Intuition: substitution matrices should be "sensitive" to protein context
  • e.g., larger penalty for Ala→Gly substitution if in a helix rather than in a loop
• Basic idea:
  • Use BLAST with high stringency to generate a set of closely related sequences
  • Align those sequences to create a new substitution matrix for each position
  • Use this matrix (iteratively) to find additional sequences
What is a PSSM?
Position-Specific Scoring Matrix
(I added more text to this slide)
Also, sometimes called: Position Weight Matrix (PWM)
Xiong: PSSM = table that contains probability information re: residues at each position of an ungapped MSA
A PSSM is:
• a representation of a motif
• an n by m matrix, where n is size of alphabet & m is length of sequence
• a matrix of scores in which entry at (i, j) is score assigned by PSSM to letter i at the jth position
Example: 8 residue sequence, 20 letter alphabet
PSI-BLAST Pseudocode
Convert query to PSSM (or a Profile)
do {
    BLAST database with PSSM
    Stop if no new homologs are found
    Add new homologs to PSSM    (this step requires a user-defined threshold)
}
Print current set of homologs
Note: Xiong textbook distinguishes between PSSMs (which have no gaps) & Profiles (can include gaps). Thus, based on these definitions, PSI-BLAST uses a Profile to iteratively add new homologs - other authors refer to pattern used by PSI-BLAST as a PSSM.
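The control flow of the PSI-BLAST pseudocode can be written as a small Python loop. This sketch does not call BLAST: `search` is a hypothetical callable supplied by the caller that takes the current set of homologs (standing in for the PSSM built from them) and returns the hits above the user-defined threshold. The toy data below exist only to exercise the loop.

```python
def iterate_search(query, search, max_iterations=5):
    """Repeat the search until no new homologs appear (or the cap is hit)."""
    homologs = {query}
    for _ in range(max_iterations):
        hits = set(search(frozenset(homologs)))  # "BLAST database with PSSM"
        new_hits = hits - homologs
        if not new_hits:                         # "Stop if no new homologs are found"
            break
        homologs |= new_hits                     # "Add new homologs to PSSM"
    return homologs


# Hypothetical toy "database": which sequences each member can pull in.
TOY_LINKS = {"q": {"a"}, "a": {"b"}, "b": set(), "z": {"q"}}


def toy_search(profile_members):
    found = set()
    for member in profile_members:
        found |= TOY_LINKS.get(member, set())
    return found


print(sorted(iterate_search("q", toy_search)))  # ['a', 'b', 'q']
```

The example also shows why a wrongly included sequence matters: anything added in one iteration influences what the next iteration can find.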
PSSM (rows = residues, columns = positions 1-8):
     1   2   3   4   5   6   7   8
A   -1  -2  -1   0  -1  -2   0  -2
R    5   0   5  -2   1  -3  -2   0
N    0   6   0   0   0  -3   0   1
D   -2   1  -2  -1   0  -3  -1  -1
C   -3  -3  -3  -3  -3  -2  -3  -3
Q    1   0   1  -2   5  -3  -2   0
E    0   0   0  -2   2  -3  -2   0
G   -2   0  -2   6  -2  -3   6  -2
H    0   1   0  -2   0  -1  -2   8
I   -3  -3  -3  -4  -3   0  -4  -3
L   -2  -3  -2  -4  -2   0  -4  -3
K    2   0   2  -2   1  -3  -2  -1
M   -1  -2  -1  -3   0   0  -3  -2
F   -3  -3  -3  -3  -3   6  -3  -1
P   -2  -2  -2  -2  -1  -4  -2  -2
S   -1   1  -1   0   0  -2   0  -1
T   -1   0  -1  -2  -1  -2  -2  -2
W   -3  -4  -3  -2  -2   1  -2  -2
Y   -2  -2  -2  -3  -1   3  -3   2
V   -3  -3  -3  -3  -2  -1  -3  -3
"K" at position 3 gets a score of 2
Note: Assumes positions are independent
Creating a PSSM from 1 Sequence
Query: RNRGQFGH
[Figure: for each of the L query positions, the BLOSUM62 (20 by 20) column for that query residue is copied, giving a 20 by L PSSM; this is the PSSM shown above]
Assigning a "Match" Score with a PSSM
PSSM assigns sequence NMFWAFGH a score of:
0 + (-2) + (-3) + (-2) + (-1) + 6 + 6 + 8 = 12
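The match score for NMFWAFGH is just a table lookup per position followed by a sum, which also makes the independence assumption explicit. The sketch below copies only the PSSM rows needed for the two sequences it scores; the function name is illustrative.

```python
# Rows of the slide's PSSM (positions 1-8) needed for this example.
PSSM = {
    "A": [-1, -2, -1, 0, -1, -2, 0, -2],
    "F": [-3, -3, -3, -3, -3, 6, -3, -1],
    "G": [-2, 0, -2, 6, -2, -3, 6, -2],
    "H": [0, 1, 0, -2, 0, -1, -2, 8],
    "M": [-1, -2, -1, -3, 0, 0, -3, -2],
    "N": [0, 6, 0, 0, 0, -3, 0, 1],
    "Q": [1, 0, 1, -2, 5, -3, -2, 0],
    "R": [5, 0, 5, -2, 1, -3, -2, 0],
    "W": [-3, -4, -3, -2, -2, 1, -2, -2],
}


def pssm_score(seq, pssm=PSSM):
    """Sum the per-position scores; positions are treated as independent."""
    width = len(next(iter(pssm.values())))
    if len(seq) != width:
        raise ValueError("sequence length must equal the PSSM width")
    return sum(pssm[residue][pos] for pos, residue in enumerate(seq))


print(pssm_score("NMFWAFGH"))  # 12
print(pssm_score("RNRGQFGH"))  # 47, the query scored against its own PSSM
```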
Creating a PSSM from Multiple Sequences
1. Discard columns that contain gaps in query sequence
2. Compute relative sequence weights
3. Compute PSSM entries, taking into account
• Observed residues in column
• Sequence weights
• Substitution matrix
1- Discard Columns with Gaps in Query
Before:
EEFG----SVDGLVNNA
QKYG----RLDVMINNA
RRLG----TLNVLVNNA
GGIG----PVD-LVNNA
KALG----GFNVIVNNA
ARFG----KID-LIPNA
FEPEGPEKGMWGLVNNA
AQLK----TVDVLINGA
After:
EEFGSVDGLVNNA
QKYGRLDVMINNA
RRLGTLNVLVNNA
GGIGPVD-LVNNA
KALGGFNVIVNNA
ARFGKID-LIPNA
FEPEGMWGLVNNA
AQLKTVDVLINGA
2- Compute Sequence Weights
(Info re: weights was added to this slide)
Sequence        Weight
EEFGSVDGLVNNA   1.2
QKYGRLDVMINNA   1.2
RRLGTLNVLVNNA   0.8
GGIGPVDLLVNNA   0.8
KALGGFNVIVNNA   1.1
ARFGKIDTLIPNA   0.9
FEPEGMWGLVNNA   1.1
AQLKTVDVLINGA   1.3
• Larger weights are assigned to unique sequences
• Smaller weights are assigned to redundant sequences
How are weights determined?
Based on branch lengths in guide tree: value for each sequence is then used to multiply raw alignment scores
Goal of weighting? to decrease matching scores of frequent characters in MSA & increase scores of infrequent characters
3- Compute PSSM Entries
(This slide was modified: simplified version)
Observed residues (first alignment column): E Q R G K A F A
/ Background frequencies (usually derived from large sequence database)
= PSSM column
Background frequencies:
A 0.085   C 0.019   D 0.054   E 0.065   F 0.040
G 0.072   H 0.023   I 0.058   K 0.056   L 0.096
M 0.024   P 0.053   Q 0.042   R 0.054   S 0.072
T 0.063   V 0.073   W 0.016   Y 0.034
PSSM Entries = Log-Odds Scores
(This slide was modified)
Example: observed frequency of residue "A"
1. Estimate probability of observing each residue (probability of A given M, where M is PSSM model)
2. Divide by background probability of observing each residue (probability of A given B, where B is background model)
3. Take log so that can add (rather than multiply) scores
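The three steps can be sketched for residue A in the first alignment column. Two things here are assumptions: the sequence weights are paired with the sequences in the order they are listed on the weights slide, and the observed frequency is a plain weighted fraction. Real PSI-BLAST also mixes in pseudocounts and substitution-matrix information, which this simplified sketch leaves out.

```python
import math


def weighted_frequency(column, weights, residue):
    """Step 1: weighted fraction of the column occupied by `residue`."""
    total = sum(weights)
    observed = sum(w for symbol, w in zip(column, weights) if symbol == residue)
    return observed / total


def log_odds(p_model, p_background):
    """Steps 2 and 3: divide by the background probability, then take log2."""
    return math.log2(p_model / p_background)


column = "EQRGKAFA"  # first column of the example alignment
weights = [1.2, 1.2, 0.8, 0.8, 1.1, 0.9, 1.1, 1.3]  # assumed pairing with rows
background_a = 0.085  # background frequency of A from the slide

p_a = weighted_frequency(column, weights, "A")
print(round(log_odds(p_a, background_a), 2))
```

A positive score means A is seen in this column more often than the background model predicts; summing such scores across positions corresponds to multiplying the underlying probability ratios.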
$\log_2 \left( \frac{\Pr(A \mid M)}{\Pr(A \mid B)} \right)$
where M is the foreground model (i.e., the PSSM) and B is the background model
Why (not) PSI-BLAST?
• Psi-BLAST weights sequences according to observed diversity specific to family under investigation
• Advantage: If sequences used to construct PSSMs are all homologous, sensitivity for a given level of specificity improves significantly
• Disadvantage: However, if any non-homologous sequences are included in PSSMs, they become "corrupted" and "pull in" additional non-homologous sequences, resulting in false positive hits
How to Use PSI-BLAST Effectively
• Set initial thresholds high
• Inspect each iteration's result for suspicious sequences (When in doubt, leave it out!)
• Do several iterations (~5), or until no new sequences are found
• Make initial search very broad
  • First, use NR (large, inclusive database) with up to 5 iterations to set PSSM
  • Then use that PSSM to search in a more restricted domain, if possible
• Be particularly cautious about matches to sequences with highly biased amino acid content
Summary: DP, BLAST & PSI-BLAST
• Dynamic programming is O(NM) for pairwise alignment
• BLAST is O(M)
  • BLAST produces an index of words in query sequence that allows fast matching to the database
  • At NCBI, target databases are also pre-indexed to indicate positions in all database sequences that match each possible search word above some score threshold
• PSI-BLAST iterates BLAST, adding new homologs at each iteration
Applications of MSA
• Building phylogenetic trees
• Finding conserved patterns:
  • Regulatory motifs (TF binding sites)
  • Splice sites
  • Protein domains
• Identifying and characterizing protein families
  • Find out which protein domains have same function
• Finding SNPs (single nucleotide polymorphisms) & mRNA isoforms (alternatively spliced forms)
• DNA fragment assembly (in genomic sequencing)
Rationale: if sequences are homologous (derived from a common ancestor), they may be structurally/functionally equivalent
Application: Discover Conserved Patterns
Is there a conserved cis-acting regulatory sequence?
[Figure: Sequence Logo; TATA box = transcriptional promoter element]
Sequence Motifs (Patterns)
Other types of representations?
• √ Consensus Sequence
• √ PSSM - Position-Specific Scoring Matrix
• √ Sequence Logo - "enhanced" consensus sequence, in which symbol size ∝ information entropy
  • Information entropy??? In information theory, the Shannon entropy or information entropy is a measure of the [decrease in] uncertainty associated with a random variable. Entropy quantifies information in a piece of data. - Wikipedia
  • Check out this fun website: Tom Schneider, NCIF
  • http://www.ccrnp.ncifcrf.gov/~toms/glossary.html#sequence_logo
• Profile
• HMM - Hidden Markov Model
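The "[decrease in] uncertainty" mentioned under Sequence Logo can be made concrete for one alignment column: the information content is the maximum possible entropy (log2 of the alphabet size) minus the Shannon entropy of the observed symbols. The sketch below assumes a DNA alphabet of 4 letters and omits the small-sample correction used in published sequence logos.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Bits of information in one column: log2(alphabet size) - entropy."""
    n = len(column)
    entropy = -sum((count / n) * math.log2(count / n)
                   for count in Counter(column).values())
    return math.log2(alphabet_size) - entropy


print(column_information("AAAA"))  # fully conserved column: 2 bits
print(column_information("AACC"))  # two equally likely bases: 1 bit
print(column_information("ACGT"))  # no conservation: 0 bits
```

In a logo, the total height of the letter stack at a position corresponds to this value, so conserved positions stand out as tall stacks.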