Biomolecular Modeling: building a 3D protein structure from its sequence
Prof Shoba Ranganathan
Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia & Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore (shoba@bic.nus.edu.sg)

Why protein structure?
In the factory of the living cell, proteins are the workers, performing a variety of tasks
Each protein adopts a particular folding pattern that determines its function
The 3D structure of a protein brings into close proximity residues that are far apart in the amino acid sequence

How does a protein fold?
Most newly synthesized proteins fold without assistance!
Ribonuclease A: denatured protein could refold and recover its activity (C. Anfinsen, 1966)
“Structure implies function”
The amino acid sequence encodes the protein’s structural information

1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model

The basics
Proteins are linear heteropolymers: one or more polypeptide chains
Repeat units: 20 amino acid residues
Chain lengths range from a few tens to thousands of residues
Three-dimensional shapes (“folds”) adopted vary enormously
Experimental methods: X-ray crystallography, electron microscopy and NMR (nuclear magnetic resonance)

The (L-)amino acid
[Figure: an L-amino acid, showing the amino group (N, +), the Cα atom carrying the side chain R (= H, CH3, ...), and the carboxylate group (C, O, O-) along the backbone]

The peptide bond
Coplanar atoms

Levels of protein structure
Zeroth: amino acid composition
Primary: this is simply the order of covalent linkages along the polypeptide chain, i.e.
the sequence itself

Levels of protein structure
Secondary: local organization of the protein backbone: α-helix, β-strand (which assemble into β-sheets), turn and interconnecting loop

Ramachandran / phi-psi plot
[Figure: Ramachandran plot with axes φ and ψ; labelled regions: β-sheet, α-helix (right handed), α-helix (left handed)]

Levels of protein structure
Tertiary: packing of secondary structure elements into a compact spatial unit
“Fold” or domain: this is the level to which structure prediction is currently possible

Levels of protein structure
Quaternary: assembly of homo- or heteromeric protein chains
Usually the functional unit of a protein, especially for enzymes

Structural classes
All-α (helical)
All-β (sheet)
α/β (parallel β-sheet)
α+β (antiparallel β-sheet)

Structural information
Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics
http://www.rcsb.org/pdb
> 45,744 structures of proteins
Also contains structures of DNA, carbohydrates, protein-DNA complexes and numerous small ligand molecules.
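The φ and ψ angles of the Ramachandran plot mentioned above are backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch, not taken from the original slides, of how such a dihedral can be computed from four sets of Cartesian coordinates like those stored in a PDB entry. The function name `dihedral` and the helper names are illustrative assumptions.

```python
import math

def _sub(a, b):
    """Vector a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Scalar product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Vector product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For phi pass C(i-1), N(i), CA(i), C(i);
    for psi pass N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)   # normal to the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With four atoms lying in one plane, the function gives 0° for a cis arrangement and ±180° for a trans arrangement; applying it residue by residue to backbone coordinates yields the (φ, ψ) pairs that a Ramachandran plot displays.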
The PDB data
Text files
Each entry is identified by a unique 4-character code, e.g. 1emg
1emg entry:
Header information
Atomic coordinates in Å (1 Ångstrom = 1.0e-10 m)

PDB Header details
Identifies the molecule, any modifications, date of release of PDB entry:
HEADER    GREEN FLUORESCENT PROTEIN               12-NOV-98   1EMG
TITLE     GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T
TITLE    2 SUBSTITUTION, Q80R)
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: GREEN FLUORESCENT PROTEIN;
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R
COMPND   6 SUBSTITUTION;
COMPND   7 BIOLOGICAL_UNIT: MONOMER
Organism, keywords, method
Authors, reference, resolution if X-ray structure
Sequence, cross-reference to sequence databases

The data itself
Coordinates for each heavy (non-hydrogen) atom from the first residue to the last:
ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75
ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71
ATOM      3  C   SER A   2      26.659   9.634  51.463  1.00 82.64
ATOM      4  O   SER A   2      26.718   8.686  50.686  1.00 81.02
ATOM      5  CB  SER A   2      28.039  11.660  51.932  1.00 75.59
ATOM      6  OG  SER A   2      27.582  12.038  50.639  1.00 43.28
------
ATOM   1737  CD1 ILE A 229      39.535  21.584  52.346  1.00 41.62
TER    1738      ILE A 229
Any ligands (starting with HETATM) follow the biomacromolecule
O atoms of water molecules (also HETATM) at the end

Structural Families
SCOP - Structural Classification Of Proteins: http://scop.mrc-lmb.cam.ac.uk/scop
FSSP - Family of Structurally Similar Proteins: http://www.ebi.ac.uk/dali/fssp/
CATH - Class, Architecture, Topology, Homology: http://www.biochem.ucl.ac.uk/bsm/cath

Structure comparison facts
Proteins adopt a limited number of topologies.
Homologous sequences show very similar structures, with strong conservation in secondary structural elements and variations in non-conserved regions.
In the absence of sequence homology, some folds are preferred by vastly different sequences.
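The ATOM records shown under "The data itself" follow a fixed-column layout, so they can be read by slicing each line at set positions. The sketch below is an illustration only, not part of the original slides: the names `parse_atom_line`, `make_atom_line` and `distance` are hypothetical helpers, and the column positions are those of the standard PDB text format.

```python
import math

def parse_atom_line(line):
    """Slice one ATOM/HETATM record using the fixed PDB column layout."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }

def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Build a correctly spaced ATOM record (hypothetical helper for the demo)."""
    return ("ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
                serial, name, res_name, chain, res_seq, x, y, z, occ, b)

def distance(a, b):
    """Euclidean distance in Angstrom between two parsed atoms."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in ("x", "y", "z")))

# First two atoms of residue Ser-2 from the 1EMG excerpt
n_atom = parse_atom_line(
    make_atom_line(1, " N", "SER", "A", 2, 29.089, 9.397, 51.904, 1.00, 81.75))
ca_atom = parse_atom_line(
    make_atom_line(2, " CA", "SER", "A", 2, 27.883, 10.162, 52.185, 1.00, 79.71))
n_ca_distance = distance(n_atom, ca_atom)
```

For these two atoms `n_ca_distance` comes out at about 1.46 Å, the expected length of an N-Cα bond, which is a quick sanity check that the coordinate columns were read correctly.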
Structure comparison facts
The “active site” (a collection of functionally critical residues) is remarkably conserved, even when the protein fold is different.
Structural models (especially those based on homology) provide insights into possible function for new proteins.
Implications for protein engineering, ligand/drug design, and function assignment of genomic data.

Visualizing PDB information
RASMOL: most popular, available for all platforms (Sayle et al, 2005) http://www.bernstein-plus-sons.com/software/rasmol
DeepView Swiss-PDBViewer: from Swiss-Prot (Guex & Peitsch, 1997) http://tw.expasy.org/spdbv/
Chemscape Chime Plug-in: for PC and Mac http://www.mdli.com/products/framework/chemscape
PyMOL: very good, available for all platforms (DeLano, W.L. The PyMOL Molecular Graphics System, 2002) http://pymol.sourceforge.net

RASMOL views - SH2 domain
All-atom model
Space-filling model
Atom colors: N O C S

RASMOL views - 1sha
Cα trace
Ribbon
Rainbow coloring: N to C
Coloring: by structural units

Homologous folds
Hemoglobin and erythrocruorin: 31% sequence identity

Analogous folds
Hemoglobin and phycocyanin: 9% sequence identity

Surface Properties
Cro repressor - DNA complex
Basic residues in blue
Acidic residues in red

Mapping Functional Regions
Immunoglobulin λ light chain - dimer
Hydrophobic residues in magenta
Hydrophilic and charged residues in cyan

2. A Quick Overview of Sequence Analysis

Siblings and Cousins
Siblings or homologues: sequences with at least 30% sequence identity over an alignment length of at least 125 residues and conservation of function.
Cousins or paralogues: < 30% identity but with conservation of function
Both show structural conservation
Homologues are located using a database search tool such as BLAST (free webserver): http://www.ncbi.nlm.nih.gov/BLAST
Paralogues require a more sensitive method such as PSI-BLAST

Multiple Sequence Alignment
Finding the best way to match the residues of related sequences
Identical residues must be lined up
The rest should be arranged based on:
observed substitution in protein families
chemical similarity
charge similarity
Where it is impossible to get the residues to line up, the biological concept of insertion/deletion is invoked: the ‘gap’ in alignments

MSA Methods
CLUSTALW / CLUSTALX (Thompson et al, 1997): freely available for all platforms and one of the best alignment programs http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
MAXHOM (Sander & Schneider, 1991): alignment based on maximum homology; available via the PredictProtein webserver, free for academics http://cubic.bioc.columbia.edu/predictprotein/
MALIGN (Johnson et al, 1994): freely available UNIX program, based on the structural alignment of protein families http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html

Alignment Checks
Conservation of functionally important residues: e.g. the catalytic triad (Asp-Ser-His) that is essential for serine proteinase activity
Line up of structurally important residues: e.g.
cysteines forming disulfide bonds
Overall, maximizing the alignment of “like” residues
Completely conserved residues usually indicate some conserved structural or functional role, especially buried charges

Sequence Motifs & Patterns
From the analysis of the alignment of protein families
Conserved sequence features, usually associated with a specific function
PROSITE (Hulo et al, 2006) database for protein “signature” patterns: http://www.expasy.ch/prosite

Aligned Sequence Families
From alignments of homologous sequences:
PRINTS
PRODOM: http://www.toulouse.inra.fr/prodom.html
From Hidden Markov Model based methods:
PFAM: http://www.sanger.ac.uk/Pfam

Protein Domains
Most proteins are composed of structural subunits called domains
A domain is a compact unit of protein structure, usually associated with a function.
It is usually a “fold” in the case of monomeric soluble proteins.
A domain normally comprises only one protein chain: rare examples involving 2 chains are known.
Domains can be shared between different proteins: like a LEGO block

Protein Architectures
Beads-on-a-string: sequential location: tyrosine-protein kinase receptor TIE-1 (immunoglobulin, EGF, fibronectin type-3 and protein kinase).
Domain insertions: “plugged-in”: pyruvate kinase (1pyk)
SMART (Simple Modular Architecture Retrieval Tool): smart.embl-heidelberg.de

Dissection into Domains
A sequence longer than about 125 residues should routinely be checked to see how many domains are present.
Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence
E.g. NP_002917 shows similarity to G-protein regulators:

3. Finding a Structural Homologue

Structural Homologues
BLASTP vs.
PDB database or PSI-BLAST: look for 4-character PDB ID
E < 0.005
Domain coverage: at least 60% coverage is recommended
Gaps: we don’t want them. Choose between few gaps and reasonable similarity scores, or lots of gaps and high similarity scores?

Small Proteins: Disulfide bonds
BLAST-type methods may not locate homologues if Conserved Domain search is not turned on.
gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein) 'four-disulfide core'.
CD-Length = 46 residues, 100.0% aligned
Score = 43.9 bits (102), Expect = 1e-06
Q:49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97
D: 1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP 46
Are the Cys residues conserved?
Gaps: where are they on the structure?

Metal-binding domains
C2H2 Zinc Finger
2 Cys & 2 His binding to Zinc
Not detected even by CD search in BLAST
Detected by Pfam & SMART
Sequence Pattern: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

Structure Prediction Methods
Secondary Structure Prediction: identify local structural elements such as helices, strands and loops.
> 75% accuracy achievable
PredictProtein or PHD http://cubic.bioc.columbia.edu/pp/
PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/
SSPro http://promoter.ics.uci.edu/BRNN-PRED/

Folds from Secondary Structure Predictions
Assembling SSEs (secondary structure elements) into folds is a combinatorial problem
Current methods depend on available structural data for mapping predictions:
FORESST http://abs.cit.nih.gov/foresst/foresst.html
TOPITS from the PHD server http://cubic.bioc.columbia.edu/pp

Tertiary Structure Prediction
Fold recognition/Threading: < 20% identity typically
Best results obtained by combining several database search and knowledge-based tools:
3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
FUGUE http://www-cryst.bioc.cam.ac.uk/fugue/

4. Template Selection

One or many templates?
Sequence similarity: extract template sequences and align with query; select the most similar structure
Completeness: missing data?
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     MET A     1
REMARK 465     THR A   230
REMARK 470   M RES CSSEQI  ATOMS
REMARK 470     GLU A   5    OE2
REMARK 470     GLU A   6    CG   CD   OE1  OE2
REMARK 470     GLU A  17    OE1

One or many templates?
X-ray or NMR?
X-ray: the structure with the best resolution (lowest value in Å)
X-ray and then NMR
NMR: average over the assembly (ensemble of models)
One or many?
Structure alignment of Cα atoms
If 2 templates are very close, keep only one
Keep templates that provide new information

Many templates
Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment (SA).
For model building:
1. align templates structurally
2. extract the corresponding SSA

5. Aligning the Query Sequence to the Template Structure(s)

Query - Template Alignment
> 40% identity: any alignment method is OK
Below this, checks are essential.
Collect close sequence homologues (about 10) and align to query to get MSA (multiple sequence alignment)
Collect several structural templates (at least 5) and align them using structure comparison methods: extract the SSA (structural sequence alignment)
Align MSA to SSA using profile alignment
Extract query and selected template(s) from the final alignment: the QTA (query-template alignment).

QTA Checks
Residue conservation checks
Indels
Functional regions
Patterns/motifs conserved?
Combine gaps separated by few residues
Editing the alignment
Move gaps from secondary structures to loops
Within loops, move gaps to loop ends, i.e.
turnaround point of backbone

Visual Inspection of Indels
[Figure: 2-residue deletion from sequence alignment; end-of-loop 2-residue deletion]

6. Building the Model

Input for Model Building
Query sequence
Template structure
Template sequence
Query-template sequence alignment

Methods Available
1. WHATIF (Vriend G, 1990):
High quality models where template is available
Indels not modelled
Side chain rotamers
In silico mutations
In silico disulfide bond creation

Methods Available
2. SWISS-MODEL (Schwede et al, 2003):
Automatic modeling mode with multiple templates
Query + template input
High homology situations
DeepView for input file creation

Methods Available
3. MODELLER (Sali & Blundell, 1993):
High quality models
Sequence alignment
Structure analysis/alignment
Multiple templates
Multiple chains
Ligand/cofactor present
4. ESyPred3D (uses MODELLER):
QTAs from several methods & neural networks
http://www.fundp.ac.be/urbm/bioinfo/esypred/

Methods Available
5. ICM (Abagyan et al, 1994):
High quality models
Loop modelling
Multiple templates not possible
Sequence/Structure alignment/analysis
Ab initio peptide modeling
Secondary structure prediction
6. Geno3D (Combet et al, 2002):
Automated modelling
Distance geometry used for loops
http://geno

Methods Available
7. 3D-JIGSAW (Bates et al, 2001):
Automatic modeling mode
Interactive user mode to select templates
Multiple templates
Multidomain protein modeling

Methods Available
8.
CPH-MODELS (Lund et al, 1997):
Fully automated
FASTA search for templates
Not validated

Automatic or Manual Mode?
Automatic: high homology
Manual:
Medium/low homology
Template from structure prediction
Multiple templates
Multiple chains
Ligand present

How good is the model?
Structural Quality Analysis
PROCHECK (Laskowski et al, 1993)
WHATIF (Vriend G, 1990)
ERRAT (Colovos & Yeates, 1993)

Improving ill-defined regions
Iterative model building
Rebuild or anneal bad regions
Check/edit alignment and rebuild
Molecular dynamics and/or Monte Carlo simulations
Compute intensive
Input files need to be set up
Optional

Molecular Modeling Protocol
Resources required
The query sequence
Personal computer with internet connectivity
RASMOL/DeepView for PDB structure visualization
CLUSTALX sequence alignment software
Access to a UNIX workstation
MODELLER/ICM: UNIX software
WHATIF: UNIX/PC software
PROCHECK: UNIX or Windows software

MM Protocol - Input Files
Minimum requirement
Query sequence
Template structure
Template sequence
Query-Template alignment

Ex 1. High Homology Case
Human SOX9 WT, homologous to SRY (PDB: 1HRY): 49% identity
S9WT: ..AGAACAATGG.. (highest)
SOXCORE: ..GCAACAATCT.. (least)
Mutants (campomelic dysplasia):
F12L: no DNA binding
H65Y: minimal binding
P70R: altered specificity; no SOXCORE
A19V: near WT but normal binding

1. SOX9 Models
WT & P70R models built
Cα overlay: WT-SRY ~ 0.72 Å
SOX9 & P70R based on SRY
J. Biol. Chem. 274 (1999) 24023

Ex 1. SOX9 + DNA Models
[Figure panels: SOX9-WT, SRY, SOX9-P70R]
Observed disease-linked mutations mapped
Other residues in DNA-binding groove determined

Ex 2. Low Homology Situation
Pigments from reef-building corals:
similar to Pocilloporin
fluoresce under UV and visible radiation
similar to the Green Fluorescent Protein, GFP (19.6% identity)
contain ‘QYG’ instead of ‘SYG’ in GFP, as proposed fluorophore

2. Alignment of POC4 & GFP

2. POC4 Model
Barrel ends open
C-ter not included
β-sheet OK
‘QYG’ fits the site!
26 residues within 5 Å of QYG (only 19 in GFP)
Increased thermal stability
UV protection

Ex 3. Small Disulfide-bonded Protein: Complement Factor H
20 tandem homologous units = SCRs (short consensus repeats) or “sushi” regions
Each SCR is ~ 60 aa: conserved Y, P, G
2 disulfide bridges: 1-3 & 2-4
Linkers of 3-8 aa
Heparin binding SCRs: 7 (high affinity) & 20
Previous SCRs required for activity: minimum constructs are fH67 and fH18-20
[Figure: SCR fold, labelled N-ter, C-ter, C1, C2, C3, C4 and HV loop]

3. Sequence Alignment of Close Functional Homologues
Sites marked on the alignment: Site A, Site d, Site B, Site c
hfH       385 CLRKCYFPYLENGYNQNHGRKFVQGKSIDVA
fHR-3      83
bfH       292 CLRQCIFNYLENGHNQHREEKYLQGETVRVH
mfH       385 CVRKCVFHYVENGDSAYWEKVYVQGQSLKVQ
Consensus     CLRKCYFPYLENGYNQNYGRKFVQGNSTEVA

hfH       416 CHPGYALPKA-QTTVTCMENGWSPTPRCIR 444
fHR-3     114 CHPGYGLPKVRQTTVTCTENGWSPTPRCIR 143
bfH       323 CYEGYSLQND-QNTMTCTESGWSPPPRCIR 351
mfH       416 CYNGYSLQNG-QDTMTCTENGWSPPPKCIR 444
[conservation symbol lines omitted]

3. Templates for fH SCRs 6-7
hfH SCRs 15&16 (fH1516; PDB ID: 1HFH)
Vaccinia virus complement control protein domains 3&4 (vcp34; PDB ID: 1VVC)
[Figure: overlays labelled hfH15 and vcp3]
Orientations differ considerably
Vcp34 is 28% identical to hfH67, compared to hfH1516 (25%)!

3. Query-Templates alignment

3. hfH67 model
[Figure labels: sialic acid, heparin disaccharide repeat, hfH67, hfH1516]

3. Locating residues for mutation from model
SCRs 6&7: Lys-410, Lys-388, Arg-387, Lys-405, His-402
SCRs 15&16: Arg-404
Pacific Symposium on Biocomputing 2000, 5:155

Ex 4: Protein Engineering
Thermolysin-like protease unstable at high temperatures (> 40 ºC unlike trypsin)
Homology model built
G8 & N60 suited for disulfide bond
Double mutant functional at 92.5 ºC
J Mansfeld et al.
“Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.

Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Virus capsid
2000 outbreak of HFMD in Singapore: thousands of children affected, 4 deaths (The Lancet, 2000, 356, 1338)
Major etiological agent: EV71 (enterovirus group)
Neurological complications

5. EV71 genome structure
[Figure: EV71 genome map from 5’ to 3’: VPg, AUG, capsid region (VP4, VP2, VP3, VP1), replication region (2A, 2B, 2C, 3A,B, 3C, 3D), UAG, PolyA; pan-enterovirus and EV71 specific primer sites marked]
95/94% (RNA) homology coxsackievirus A16/B3
Only 1% difference between neurovirulent and non-neurovirulent isolates
Most variations in non-capsid regions
Within capsid regions, VP1 shows maximum variability relative to other EVs
Differences in capsid region 1: VP1 & VP2, 2: VP3

5. Picornaviridae Icosahedral Capsid

5. Template hunt
BLASTP against PDB sequences
VP1: 3 templates
1BEV 38.7% (bovine enterovirus)
1EAH 36.5% (poliovirus type 2 strain Lansing)
1FPN 38.0% (human rhinovirus serotype 2)
VP2: 1BEV 56.7%
VP3: 1BEV 54.9%
VP4: 1BEV* 50.0%

5. Fixing the VP1 Alignment
Structural alignment of templates: using VAST (Gibrat, Madej, & Bryant, 1996)
Extract corresponding sequence alignment
Match HFMDV VP1 to aligned templates using profile alignment in CLUSTALW

5. VP1 alignment to templates
[Figure: alignment of EV71 VP1 (EV711) to the template sequences 1BEV1, 1EAH1 and 1FPN1, annotated with 3,10-helices, α-helices, β-strands and pocket-factor binding residues]

5.
Model building steps
Build all 4 capsid proteins (VP1-VP4) together to ensure 3D fit
Use 1BEV alone for VP2-VP4
For VP1: use aligned 1BEV, 1EAH, 1FPN
Check model

5. Round 1: VP1, VP2, VP3, VP4
Clip hanging ends
Re-position problem loops: adjust gaps in alignment
Build again

5. Round 2: Pentamer
Check
Loops look OK
Build pentamer
Publish…. Oops: clash in pentamer assembly. Go back

5. Close encounters of the 3rd Kind
Build only VP3 pentamer
N-terminus of each VP3 hydrogen-bonded
Also, in BEV, Asp-Lys ion pair
First 25 aa overlay very well

5. Fourth foray: build with 5 VP3s
Only first 50 aa of the other 4 VP3s included
Model resulted in knots due to insufficient refinement cycles
However, VP3 pentameric region OK

5. Fifth and final attempt

5. Canyon Pit and Antigenic Sites
[Figure legend: Poliovirus sites; Neurovirulent Polio (mouse); Cardiovirus]

5. Putative antigenic sites
[Figure panels: VP1, VP2]

5. HFMDV Conclusions
Unique surface loops identified for:
Immunodiagnostic assays
Vaccine design
Antibodies being generated
Canyon pit: depth is similar to BEV
Mapping the antigenic regions of other related enteroviruses on the HFMDV surface: specific VP1 and VP2 sites buried
Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C. Phoon: Dept. of Microbiology, NUS
Applied Bioinformatics, Vol 1, issue 1, 43-52: invited research article