Computational Methods in Biochemistry
정진원
Outline
• Protein Structure and Dynamics
• Bioinformatics
• Comparative modeling
• Other methods
Protein structure and dynamics
• Time scale in biological phenomena
• Newtonian mechanics
• Force field
• CHARMM
• AMBER
• Energy minimization
• Molecular Dynamics
• Example
Time scale in biological phenomena
[Figure: time axis for biological phenomena, running from fs ($10^{-15}$ s) through ps, ns, µs, ms, and s to ~hr]
Force field
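• For a given molecule, the energy is defined from the coordinates (positions) of its atoms.
• Because this value is a numerical construct used to model the state of the molecule, it has no direct relation to the energy of the real phenomenon.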
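As a minimal sketch of this idea, the Python function below maps a set of atomic coordinates to a single energy value using a harmonic bond term plus a Lennard-Jones term. The functional form and every parameter value (`k_bond`, `epsilon`, `sigma`) are illustrative assumptions, not the actual CHARMM or AMBER parameters.

```python
import math
from itertools import combinations

def energy(coords, bonds, k_bond=100.0, epsilon=0.1, sigma=3.0):
    """Toy force-field energy from coordinates alone.

    coords: list of (x, y, z) tuples, one per atom
    bonds:  list of (i, j, r0) with r0 the equilibrium bond length
    All parameters are hypothetical values chosen for illustration.
    """
    bonded = {frozenset((i, j)) for i, j, _ in bonds}
    total = 0.0
    # Bonded term: harmonic spring around the equilibrium length.
    for i, j, r0 in bonds:
        r = math.dist(coords[i], coords[j])
        total += 0.5 * k_bond * (r - r0) ** 2
    # Non-bonded term: Lennard-Jones between all pairs that are not bonded.
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded:
            continue
        r = math.dist(coords[i], coords[j])
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total

# A stretched bond costs energy; the same bond at r0 costs none.
print(energy([(0, 0, 0), (1.5, 0, 0)], [(0, 1, 1.5)]))
print(energy([(0, 0, 0), (2.0, 0, 0)], [(0, 1, 1.5)]))
```

Real force fields add angle, dihedral, and electrostatic terms, but the principle is the same: coordinates in, one number out.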
Newtonian mechanics
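• $F = ma$
• $v = v_0 + at = f(t)$
• $s = v_0 t + \frac{1}{2} a t^2 = g(t)$
• $E = \frac{1}{2} m v^2$
When a force is present and time passes, the position, velocity, and energy of an object change.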
Energy minimization
Optimize the structure!
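One simple way to optimize a structure is steepest descent: move the coordinates a small step against the energy gradient until the gradient vanishes. The sketch below assumes a one-dimensional harmonic bond energy; the step size, force constant, and starting length are hypothetical values, and production programs typically use more elaborate minimizers.

```python
def steepest_descent(grad, x0, step=0.005, tol=1e-8, max_iter=10000):
    """Follow the negative gradient until every component is below tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical bond: E(r) = 0.5 * k * (r - r0)^2, so dE/dr = k * (r - r0).
k, r0 = 100.0, 1.5
bond_gradient = lambda x: [k * (x[0] - r0)]

minimum = steepest_descent(bond_gradient, [2.0])
print(minimum)  # bond length relaxed toward r0
```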
Molecular Dynamics
• $E_{tot} = E_{pot} + E_{kin}$
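Molecular dynamics applies Newton's equations in small time steps. The sketch below uses the velocity Verlet integrator, a common choice but an assumption here since the slides do not name an integrator, on a one-dimensional harmonic oscillator with hypothetical unit mass and force constant. It checks that the total energy (potential plus kinetic) stays nearly constant.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate F = ma with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt      # s = v0*t + a*t^2/2 over one step
        a_new = force(x) / mass
        v += 0.5 * (a + a_new) * dt          # average old and new acceleration
        a = a_new
    return x, v

def total_energy(x, v, mass=1.0, k=1.0):
    """E_tot = E_pot + E_kin for the harmonic oscillator."""
    return 0.5 * k * x * x + 0.5 * mass * v * v

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, force=lambda x: -x, mass=1.0, dt=0.01, n_steps=1000)
print(total_energy(x0, v0), total_energy(x1, v1))
```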
CHemistry at HARvard Macromolecular Mechanics
• CHARMm forcefields
• CHARMm, which derives from CHARMM (CHemistry at HARvard
Macromolecular Mechanics), is a highly flexible molecular mechanics
and dynamics program originally developed in the laboratory of Dr.
Martin Karplus at Harvard University. It was parameterized on the basis
of ab initio energies and geometries of small organic models.
• Applicability
• CHARMm performs well over a broad range of calculations and
simulations, including calculation of geometries, interaction and
conformation energies, local minima, barriers to rotation, time-dependent dynamic behavior, free energy, and vibrational frequencies
(Momany & Rone, 1992). CHARMm is designed to give good (but not
necessarily "the best") results for a wide variety of modelled systems,
from isolated small molecules to solvated complexes of large biological
macromolecules; however, it is not applicable to organometallic
complexes.
Assisted Model Building with Energy Refinement
• AMBER forcefield
• The standard AMBER forcefield (Weiner et al. 1984, 1986) is
parameterized to small organic constituents of proteins and nucleic
acids. Only experimental data were used in parameterization.
• However, AMBER has been widely used not only for proteins and DNA,
but also for many other classes of models, such as polymers and small
molecules. For the latter classes of models, various authors have added
parameters and extended AMBER in other ways to suit their calculations.
The AMBER forcefield has also been made specifically applicable to
polysaccharides (Homans 1990, and see Homans' carbohydrate
forcefield).
• AMBER is used mainly for modeling proteins and nucleic acids. It is
generally lower in accuracy and has a limited range of applicability. The
use of AMBER is recommended mainly for those customers who are
familiar with AMBER and have developed their own AMBER-specific
parameters. It generally gives reasonable results for gas-phase model
geometries, conformational energies, vibrational frequencies, and
solvation free energies.
Application
• protein motion
• protein folding
• enzyme mechanism
• model optimization
In silico protein folding
1 µs = 1,000,000,000 fs (or steps)
644 steps/sec on a 256-CPU CRAY machine
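To put these two numbers together, the short calculation below estimates the wall-clock time for a 1 µs simulation. It assumes, as the slide does, one step per femtosecond and a constant rate of 644 steps per second.

```python
steps = 1_000_000_000          # 1 µs at one step per fs
steps_per_second = 644         # rate quoted for the 256-CPU machine

seconds = steps / steps_per_second
days = seconds / 86_400        # 86,400 seconds in a day
print(f"{seconds:,.0f} s = {days:.1f} days")
```

Under these assumptions the run takes roughly 18 days of continuous computing.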
Simulation of the travel of potassium
Bioinformatics
• Introduction
• Sequence alignment
• Pairwise sequence alignment
• BLAST
• Multiple sequence alignment
• CLUSTALW
• T-COFFEE
• Scoring matrix
• Structure Alignment
• Example
Pairwise alignment
• Smith-Waterman Algorithm
• BLAST – heuristic local alignment
• FASTA – heuristic local alignment
Smith-Waterman Algorithm
• Align S1 = ATCTCGTATGATG with S2 = GTCTATCAC
• Substitution score: $Sbt(x, y) = 2$ if $x = y$, $-1$ otherwise
• Gap penalty: 1 per gap position
• Recurrence:
$$H(i, j) = \max \{\, 0,\; H(i-1, j) - 1,\; H(i, j-1) - 1,\; H(i-1, j-1) + Sbt(S1_i, S2_j) \,\}$$
• Scoring matrix H (S1 across the top, S2 down the side):
          A   T   C   T   C   G   T   A   T   G   A   T   G
      0   0   0   0   0   0   0   0   0   0   0   0   0   0
  G   0   0   0   0   0   0   2   1   0   0   2   1   0   2
  T   0   0   2   1   2   1   1   4   3   2   1   1   3   2
  C   0   0   1   4   3   4   3   3   3   2   1   0   2   2
  T   0   0   2   3   6   5   4   5   4   5   4   3   2   1
  A   0   2   1   2   5   5   4   4   7   6   5   6   5   4
  T   0   1   4   3   4   4   4   6   6   9   8   7   8   7
  C   0   0   3   6   5   6   5   5   5   8   8   7   7   7
  A   0   2   2   5   5   5   5   4   7   7   7  10   9   8
  C   0   1   1   4   4   7   6   5   6   6   6   9   9   8
• The maximum score is 10. Tracing back from that cell gives the local alignment TCGTATGA / TC-TATCA, shown here within the full sequences:
S1: ATCTCGTATGATG
S2:   GTC-TATCAC
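The recurrence can be sketched in a few lines of Python. This is a minimal illustration using the slide's scoring scheme (match 2, mismatch -1, gap 1); the function name and the tie-breaking order in the traceback (diagonal first) are choices made for this example.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Return (best score, aligned part of s1, aligned part of s2)."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    # Fill the matrix with the local-alignment recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sbt = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap,
                          H[i - 1][j - 1] + sbt)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sbt = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sbt:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))

score, top, bottom = smith_waterman("ATCTCGTATGATG", "GTCTATCAC")
print(score)
print(top)
print(bottom)
```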
BLAST
• Basic Local Alignment Search Tool
• Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990). Journal of Molecular Biology 215, 403-410
• Used to search sequence databases for local
alignments to a query
BLAST algorithm
• Keyword search: look up all words of length w from the query (length n) in the database (length m), keeping matches that score above a threshold
• w = 11 for nucleotide queries, 3 for proteins
• Do local alignment extension for each keyword found
• Extend the result until the longest match above the threshold is achieved
• Running time O(nm)
BLAST algorithm (cont’d)
Query: KRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRD
• Keyword: GVK
• Neighborhood words (score): GVK 18, GAK 16, GIK 16, GGK 14, GLK 13, GNK 12, GRK 11, GEK 11, GDK 11
• Neighborhood score threshold: T = 13
• Extension:
Query:  22 VLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK 60
           +++DN +G +   IR L    G+K I+ L+ E+ RG++K
Sbjct: 226 IIKDNGRGFSGKQIRNLNYGIGLKVIADLV-EKHRGIIK 263
• The extended match is a High-scoring Pair (HSP)
Original BLAST
• Dictionary
• All words of length w
• Alignment
• Ungapped extensions until score falls below
some threshold
• Output
• All local alignments with score > statistical
threshold
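The dictionary, ungapped-extension, and output steps above can be sketched as follows. This is a simplified illustration, not the NCBI implementation: the sequences are hypothetical, the scores (+1 match, -1 mismatch) are assumed, and extension stops with an X-drop style rule (`x_drop`) rather than a statistical threshold.

```python
def extend_one_side(query, db, qi, dj, step, match, mismatch, x_drop):
    """Ungapped extension in one direction; returns (best gain, its length)."""
    score = best = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= dj < len(db):
        score += match if query[qi] == db[dj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:      # score fell too far below the best
            break
        qi += step
        dj += step
    return best, best_len

def blast_like_search(query, db, w=4, match=1, mismatch=-1, x_drop=2):
    # Dictionary: every word of length w in the query, with its positions.
    words = {}
    for i in range(len(query) - w + 1):
        words.setdefault(query[i:i + w], []).append(i)
    hsps = set()
    # Scan the database for exact word hits and extend each one.
    for j in range(len(db) - w + 1):
        for i in words.get(db[j:j + w], []):
            right, r_len = extend_one_side(query, db, i + w, j + w, 1,
                                           match, mismatch, x_drop)
            left, l_len = extend_one_side(query, db, i - 1, j - 1, -1,
                                          match, mismatch, x_drop)
            q0, d0, length = i - l_len, j - l_len, l_len + w + r_len
            hsps.add((w * match + left + right,
                      query[q0:q0 + length], db[d0:d0 + length]))
    return sorted(hsps, reverse=True)

hits = blast_like_search("GTAAGGTCC", "ACGAAGTTAGGTCCAGT")
print(hits)
```

Several word hits on the same diagonal collapse into one high-scoring pair because the results are stored in a set.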
Original BLAST: Example
From lectures by Serafim Batzoglou
(Stanford)
[Figure: dot-matrix comparison of two nucleotide sequences, C T G A T C C T G G A T T G C G A and A C G A A G T A A G G T C C A G T, along the axes]
• w = 4
• Exact keyword match of GGTC
• Extend diagonals with mismatches until score is under 50%
• Output result:
GTAAGGTCC
GTTAGGTCC
ClustalW
• Popular multiple alignment tool today
• Several heuristics to improve accuracy:
• Sequences are weighted by relatedness
• Scoring matrix can be chosen “on the fly”
• Position-specific gap penalties
ClustalW (cont’d)
• Often used for protein alignment
• ‘W’ stands for ‘weighted’
• Different parts of alignment are weighted.
• Position/residue specific gap penalties.
• Three-step process
1.) Pairwise alignment
2.) Build Guide Tree
3.) Progressive Alignment
Step 1: Pairwise Alignment
• Aligns each sequence against every other, giving a distance matrix
• Distance = exact matches / sequence length (percent identity)
       S1    S2    S3    S4
S1     -
S2    .17    -
S3    .87   .28    -
S4    .59   .33   .62    -
(.17 means 17% identical)
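A minimal sketch of the percent-identity calculation, assuming the two sequences have already been aligned to equal length. The example sequences and function names are hypothetical.

```python
from itertools import combinations

def percent_identity(a, b):
    """Exact matches divided by alignment length; gap columns never match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return matches / len(a)

def identity_matrix(seqs):
    """Pairwise identities for a dict of name -> aligned sequence."""
    return {(n1, n2): percent_identity(seqs[n1], seqs[n2])
            for n1, n2 in combinations(sorted(seqs), 2)}

example = {"S1": "ACGTACGT", "S2": "ACGTTCGA", "S3": "TTTTACG-"}
print(identity_matrix(example))
```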
Step 2: Guide Tree
• Create Guide Tree using the distance matrix
• ClustalW uses the neighbor-joining method
• Guide tree roughly reflects evolutionary
relations
Step 2: Guide Tree (cont’d)
       S1    S2    S3    S4
S1     -
S2    .17    -
S3    .87   .28    -
S4    .59   .33   .62    -
Guide tree: ((S1, S3), S4), S2
Calculate:
$s_{1,3}$ = consensus($s_1$, $s_3$)
$s_{1,3,4}$ = consensus($s_{1,3}$, $s_4$)
$s_{1,2,3,4}$ = consensus($s_{1,3,4}$, $s_2$)
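The joining order can be reproduced with a small clustering sketch. Note that this uses greedy average-linkage joining on the identity values, a simpler stand-in for the neighbor-joining method that ClustalW actually uses; the matrix values are the ones from the Step 1 table.

```python
from itertools import combinations

def guide_tree(sim):
    """Greedy average-linkage tree from pairwise identities.

    sim maps frozenset({a, b}) -> identity. Each cluster is (tree, members).
    """
    names = sorted({name for pair in sim for name in pair})
    clusters = [(name, [name]) for name in names]

    def average(c1, c2):
        values = [sim[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(values) / len(values)

    while len(clusters) > 1:
        # Join the two clusters that are most similar on average.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: average(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

identities = {
    frozenset(("S1", "S2")): 0.17,
    frozenset(("S1", "S3")): 0.87,
    frozenset(("S2", "S3")): 0.28,
    frozenset(("S1", "S4")): 0.59,
    frozenset(("S2", "S4")): 0.33,
    frozenset(("S3", "S4")): 0.62,
}
tree = guide_tree(identities)
print(tree)
```

S1 and S3 are joined first, then S4, then S2, which matches the order of the consensus calculations.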
Step 3: Progressive Alignment
• Align the two most similar sequences
• Following the guide tree, add in the next
sequences, aligning to the existing alignment
• Insert gaps as necessary
Sample output:
FOS_RAT     PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD
FOS_MOUSE   PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD
FOS_CHICK   SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD
FOSB_MOUSE  PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQ
FOSB_HUMAN  PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ
In the program output, a line of dots, colons, and stars beneath the alignment shows how well conserved each column is.
Scoring Matrix
• BLOSUM
• PAM
• PSSM
PAM
• Percentage of Acceptable point Mutations per $10^8$ years
• Scores are set based on the probability that a given amino acid changes into any other amino acid
• PAM matrices are based on global alignments of closely related
proteins. The PAM 1 is the matrix calculated from comparisons of
sequences with no more than 1% divergence. Scores are derived
from a mutation probability matrix where each element gives the
probability of the amino acid in column X mutating to the amino
acid in row Y after a particular evolutionary time, for example
after 1 PAM, or 1% divergence. A PAM matrix is specific for a
particular evolutionary distance, but may be used to generate
matrices for greater evolutionary distances by multiplying it
repeatedly by itself. However, at large evolutionary distances the
information present in the matrix is essentially degenerated. It is
rare that a PAM matrix would be used for an evolutionary
distance any greater than 256 PAMs.
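The "multiplying it repeatedly by itself" step can be illustrated with a toy mutation probability matrix. The sketch below assumes a hypothetical three-letter alphabet in which each letter stays unchanged with probability 0.99 over 1 PAM; real PAM matrices are 20 x 20 and derived from observed mutations.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Multiply m by itself repeatedly: the PAM-n matrix from PAM 1."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result

# Hypothetical PAM 1: element [row][col] is P(col mutates to row).
pam1 = [[0.990, 0.005, 0.005],
        [0.005, 0.990, 0.005],
        [0.005, 0.005, 0.990]]

pam250 = mat_pow(pam1, 250)
for row in pam250:
    print([round(p, 3) for p in row])
```

At large distances the entries drift toward the background frequencies, which is the loss of information the text describes.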
BLOSUM
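• Developed for use in local alignment
• BLOcks SUbstitution Matrix
• Sequences with a given degree of similarity are collected and aligned, and the scoring matrix is built from the substitution frequencies observed within them
• BLOSUM 62 is built from aligned blocks in which sequences at least 62% identical are grouped together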
Position Specific Scoring Matrix
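• Based on sequence alignments of similar proteins, scores whether a particular amino acid appears at a particular position
• The method used by PSI-BLAST
• Well suited to database-wide searches for proteins that have characteristic sequences or residues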
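A position-specific scoring matrix can be sketched as per-column log-odds scores. The example below assumes a hypothetical four-sequence nucleotide alignment, a uniform background, and a simple pseudocount; PSI-BLAST uses more refined weighting, so this only illustrates the principle.

```python
import math
from collections import Counter

def build_pssm(alignment, alphabet, pseudocount=1.0):
    """One dict of log-odds scores per alignment column."""
    background = 1.0 / len(alphabet)          # assumed uniform background
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            letter: math.log2(((counts[letter] + pseudocount) / total) / background)
            for letter in alphabet
        })
    return pssm

def score(pssm, seq):
    """Sum the position-specific scores along a sequence."""
    return sum(column[letter] for column, letter in zip(pssm, seq))

pssm = build_pssm(["ACG", "ACG", "ATG", "ACC"], "ACGT")
print(score(pssm, "ACG"), score(pssm, "TTT"))
```

A sequence that follows the conserved positions scores higher than one that does not.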
Homology/Comparative modeling
• Introduction
• Method
• Example
Introduction
• Proteins with similar functions have similar structures.
• Ex) hemoglobin/myoglobin, ubiquitin/ubiquitin-like proteins, serine proteases, thioredoxin/glutaredoxin
Method
1. Search for proteins of known structure that share 30% or more homology
2. Pairwise or multiple sequence alignment
3. Using the alignment as a guide, copy the structure or derive distance constraints
4. Optimize the model
Example: Modeling of malonyl-CoA synthetase
Malonyl-CoA synthetase
Firefly luciferase
Other Methods
• Simulated Annealing
• Monte Carlo method
• Docking