Computational Methods in Biochemistry
정진원

Contents
• Protein Structure and Dynamics
• Bioinformatics
• Comparative modeling
• Other methods

Protein structure and dynamics
• Time scale in biological phenomena
• Newtonian mechanics
• Force field
  • CHARMM
  • AMBER
• Energy minimization
• Molecular Dynamics
• Example

Time scale in biological phenomena
[Figure: time-scale axis of biological phenomena, from fs ($10^{-15}$ s) through ps, ns, μs, ms and s up to ~hr]

Force field
• Defines an energy for a given molecule from the coordinates (positions) of its atoms.
• This value is a numerical quantity introduced to model the state of the molecule, so it has no direct relationship to the energy of the real physical phenomenon.

Newtonian mechanics
• $F = ma$
• $v = v_0 + at = f(t)$
• $s = v_0 t + \frac{1}{2} a t^2 = g(t)$
• $E = \frac{1}{2} m v^2$
When a force acts and time passes, the position, velocity, and energy of an object change.

Energy minimization
• Optimizes the structure.

Molecular Dynamics
• $E_{tot} = E_{pot} + E_{kin}$

CHemistry at HARvard Macromolecular Mechanics
• CHARMm forcefields
• CHARMm, which derives from CHARMM (CHemistry at HARvard Macromolecular Mechanics), is a highly flexible molecular mechanics and dynamics program originally developed in the laboratory of Dr. Martin Karplus at Harvard University. It was parameterized on the basis of ab initio energies and geometries of small organic models.
• Applicability
• CHARMm performs well over a broad range of calculations and simulations, including calculation of geometries, interaction and conformation energies, local minima, barriers to rotation, time-dependent dynamic behavior, free energy, and vibrational frequencies (Momany & Rone, 1992). CHARMm is designed to give good (but not necessarily "the best") results for a wide variety of modelled systems, from isolated small molecules to solvated complexes of large biological macromolecules; however, it is not applicable to organometallic complexes.

Assisted Model Building with Energy Refinement
• AMBER forcefield
• The standard AMBER forcefield (Weiner et al. 1984, 1986) is parameterized for small organic constituents of proteins and nucleic acids. Only experimental data were used in parameterization.
• However, AMBER has been widely used not only for proteins and DNA, but also for many other classes of models, such as polymers and small molecules. For the latter classes of models, various authors have added parameters and extended AMBER in other ways to suit their calculations. The AMBER forcefield has also been made specifically applicable to polysaccharides (Homans 1990; see also Homans' carbohydrate forcefield).
• AMBER is used mainly for modeling proteins and nucleic acids. It is generally lower in accuracy and has a limited range of applicability. The use of AMBER is recommended mainly for users who are familiar with AMBER and have developed their own AMBER-specific parameters. It generally gives reasonable results for gas-phase model geometries, conformational energies, vibrational frequencies, and solvation free energies.

Application
• protein motion
• protein folding
• enzyme mechanism
• model optimization

In silico protein folding
• 1 μs = 1,000,000,000 fs (or steps)
• 644 steps/sec on a 256-CPU CRAY machine (about 18 days for $10^9$ steps)

Simulation of the travel of potassium

Bioinformatics
• Introduction
• Sequence alignment
  • Pairwise sequence alignment
    • BLAST
  • Multiple sequence alignment
    • CLUSTALW
    • T-COFFEE
  • Scoring matrix
• Structure Alignment
• Example

Pairwise alignment
• Smith-Waterman Algorithm
• BLAST: local alignment (heuristic)
• FASTA: local alignment (heuristic)

Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG and S2 = GTCTATCAC.

Substitution score (both gap penalties are 1):
$$S_{bt}(x, y) = \begin{cases} 2 & \text{if } x = y \\ -1 & \text{otherwise} \end{cases}$$

Recurrence:
$$H(i, j) = \max \begin{cases} H(i-1, j) - 1 \\ H(i, j-1) - 1 \\ H(i-1, j-1) + S_{bt}(S1_i, S2_j) \\ 0 \end{cases}$$

Filled matrix H (rows: S1, columns: S2):

          G   T   C   T   A   T   C   A   C
      0   0   0   0   0   0   0   0   0   0
  A   0   0   0   0   0   2   1   0   2   1
  T   0   0   2   1   2   1   4   3   2   1
  C   0   0   1   4   3   2   3   6   5   4
  T   0   0   2   3   6   5   4   5   5   4
  C   0   0   1   4   5   5   4   6   5   7
  G   0   2   1   3   4   4   4   5   5   6
  T   0   1   4   3   5   4   6   5   4   5
  A   0   0   3   3   4   7   6   5   7   6
  T   0   0   2   2   5   6   9   8   7   6
  G   0   2   1   1   4   5   8   8   7   6
  A   0   1   1   0   3   6   7   7  10   9
  T   0   0   3   2   2   5   8   7   9   9
  G   0   2   2   2   1   4   7   7   8   8

The maximum score is 10. Resulting alignment:

  ATCTCGTATGATG
    GTC-TATCAC

BLAST
• Basic Local Alignment Search Tool
• Altschul, S.F., Gish, W., Miller, W., Myers, E.W.
& Lipman, D.J. Journal of Molecular Biology v. 215, 1990, pp. 403-410
• Used to search sequence databases for local alignments to a query

BLAST algorithm
• Keyword search of all words of length w from the query of length n in a database of length m, keeping words that score above a threshold
• w = 11 for nucleotide queries, 3 for proteins
• Do local alignment extension for each found keyword
• Extend result until longest match above threshold is achieved
• Running time O(nm)

BLAST algorithm (cont’d)
Query: KRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRD
Keyword: GVK

Neighborhood words and scores (neighborhood score threshold T = 13):

  GVK 18
  GAK 16
  GIK 16
  GGK 14
  GLK 13
  ------ threshold (T = 13)
  GNK 12
  GRK 11
  GEK 11
  GDK 11

Extension, giving a High-scoring Pair (HSP):

  Query:  22 VLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLK 60
             +++DN +G +   IR L    G+K I+ L+ E+ RG++K
  Sbjct: 226 IIKDNGRGFSGKQIRNLNYGIGLKVIADLV-EKHRGIIK 263

Original BLAST
• Dictionary
  • All words of length w
• Alignment
  • Ungapped extensions until score falls below some threshold
• Output
  • All local alignments with score > statistical threshold

Original BLAST: Example
From lectures by Serafim Batzoglou (Stanford)
[Figure: dot matrix with axes labeled by the sequences CTGATCCTGGATTGCGA and ACGAAGTAAGGTCCAGT]
• w = 4
• Exact keyword match of GGTC
• Extend diagonals with mismatches until score is under 50%
• Output result:

  GTAAGGTCC
  GTTAGGTCC

ClustalW
• Popular multiple alignment tool today
• Several heuristics to improve accuracy:
  • Sequences are weighted by relatedness
  • Scoring matrix can be chosen “on the fly”
  • Position-specific gap penalties

ClustalW (cont’d)
• Often used for protein alignment
• ‘W’ stands for ‘weighted’
  • Different parts of alignment are weighted.
  • Position/residue specific gap penalties.
• Three-step process
  1.) Pairwise alignment
  2.) Build Guide Tree
  3.)
Progressive Alignment

Step 1: Pairwise Alignment
• Aligns each sequence against every other sequence, giving a distance matrix
• Distance = exact matches / sequence length (percent identity)

        S1    S2    S3    S4
  S1     -
  S2   .17     -
  S3   .87   .28     -
  S4   .59   .33   .62     -

(.17 means 17% identical)

Step 2: Guide Tree
• Create Guide Tree using the distance matrix
• ClustalW uses the neighbor-joining method
• Guide tree roughly reflects evolutionary relations

Step 2: Guide Tree (cont’d)

        S1    S2    S3    S4
  S1     -
  S2   .17     -
  S3   .87   .28     -
  S4   .59   .43   .62     -

[Figure: guide tree with leaves in the order S1, S3, S4, S2]

Calculate:
  s1,3 = consensus(s1, s3)
  s1,3,4 = consensus((s1,3), s4)
  s1,2,3,4 = consensus((s1,3,4), s2)

Step 3: Progressive Alignment
• Align the two most similar sequences
• Following the guide tree, add in the next sequences, aligning to the existing alignment
• Insert gaps as necessary

Sample output:

  FOS_RAT     PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFD
  FOS_MOUSE   PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFD
  FOS_CHICK   SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFD
  FOSB_MOUSE  PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQ
  FOSB_HUMAN  PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ
              .   . :   ** .     :..  *:.*   *   . *                   **:

Dots and stars show how well-conserved a column is.

Scoring Matrix
• BLOSUM
• PAM
• PSSM

PAM
• Percentage of Acceptable point Mutations per $10^8$ years
• Scores are set from the probability that a given amino acid changes into any other amino acid.
• Matrices are based on global alignments of closely related proteins. The PAM 1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Scores are derived from a mutation probability matrix where each element gives the probability of the amino acid in column X mutating to the amino acid in row Y after a particular evolutionary time, for example after 1 PAM, or 1% divergence. A PAM matrix is specific for a particular evolutionary distance, but may be used to generate matrices for greater evolutionary distances by multiplying it repeatedly by itself.
However, at large evolutionary distances the information present in the matrix is essentially degenerated. It is rare that a PAM matrix would be used for an evolutionary distance any greater than 256 PAMs.

BLOSUM
• Developed for use in local alignment
• BLOcks SUbstitution Matrix
• Sequences with a certain degree of similarity are collected and aligned, and the scoring matrix is built from the substitutions observed within them.
• BLOSUM 62 is built from blocks in which sequences at least 62% identical are clustered together.

Position Specific Scoring Matrix
• Scores whether a particular amino acid appears at a particular position, based on sequence alignments of similar proteins.
• The method used by PSI-BLAST
• Well suited to database-wide searches for proteins that carry characteristic sequences or residues.

Homology/Comparative modeling
• Introduction
• Method
• Example

Introduction
• Proteins with similar functions have similar structures.
• Ex) hemoglobin/myoglobin, ubiquitin/ubiquitin-like proteins, serine proteases, thioredoxin/glutaredoxin

Method
1. Search for proteins with at least 30% homology that have a known structure.
2. Pairwise or multiple sequence alignment
3. Using the alignment as the reference, copy the structure or generate distance constraints.
4. Optimize the model.

Example: Modeling of malonyl-CoA synthetase
[Figure: Malonyl-CoA synthetase and Firefly luciferase]

Other Methods
• Simulated Annealing
• Monte Carlo method
• Docking
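
Supplementary sketch: Smith-Waterman
The recurrence on the Smith-Waterman slide can be tried out with a few lines of Python. This is a minimal sketch, not the code behind the slides. It assumes the slide's scoring reads as +2 for a match, -1 for a mismatch, and a gap penalty of 1. The function name `smith_waterman` and the traceback tie-breaking order (diagonal, then up, then left) are illustrative choices that do not come from the slides.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Local alignment of s1 and s2; returns (score, aligned_s1, aligned_s2).

    Scoring values are assumptions read from the slide: +2 match,
    -1 mismatch, linear gap penalty of 1.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # H[i][j] is the best score of a local alignment ending at s1[i-1], s2[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sbt = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(
                0,                        # start a new local alignment
                H[i - 1][j - 1] + sbt,    # align s1[i-1] with s2[j-1]
                H[i - 1][j] - gap,        # gap in s2
                H[i][j - 1] - gap,        # gap in s1
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sbt = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sbt:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1

    return best, "".join(reversed(a1)), "".join(reversed(a2))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("ATCTCGTATGATG", "GTCTATCAC")
    print(score)
    print(top)
    print(bottom)
```

With the slide's sequences S1 = ATCTCGTATGATG and S2 = GTCTATCAC, this sketch reports a best local score of 10 for TCGTATGA aligned against TC-TATCA. Other scoring values can be passed through the `match`, `mismatch`, and `gap` arguments.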