Visualization of Macromolecular Structures
Shuchismita Dutta

Molecules Around You
Julian Voss-Andreae, David Goodsell

Why Visualize?
• Communication: the human eye is effective at pattern recognition
• Summarize information
  – what, where, how many (data)?
  – mechanisms
  – change (cause and effect)
• Decision making aid

Visual storytelling
Road sign near Niagara Falls. Picture posted by Tim Roy
http://dd.dynamicdiagrams.com/?p=541

Story: How Hemoglobin Works

What, Where, How Many?
http://popvssoda.com:2998/

What, Where, How Many?
Structure of Kir2.2. (A) Stereoview of a ribbon representation of the Kir2.2 tetramer from the side with the extracellular solution above. Four subunits of the channel are uniquely colored. Approximate boundaries of the lipid bilayer are shown as gray bars. (B) A close-up view of the pore region of a single subunit (in ribbon representation) with the turret, pore helix and selectivity filter labeled. Side chains of residues E139, R149 and a pair of disulfide-bonded cysteines (C123 and C155) are shown as sticks and colored according to atom type: carbon, yellow; nitrogen, blue; oxygen, red; and sulfur, green. Ionized hydrogen bonds are indicated by dashed black lines. The region flanked by the two disulfide-bonded cysteines is colored orange. (C) Electron density (blue wire mesh, 2Fo-Fc, calculated from 50 to 3.1 Å using phases from the final model and contoured at 1.0 σ) is shown for the side chains of E139 and R149 [sticks, colored with the same scheme as in (B)] forming a salt bridge. (D and E) K+ selectivity filter of the Kir2.2 channel (D) compared with that of the Kv1.2-Kv2.1 paddle chimera channel [(E), PDB ID 2R9R]. For clarity, only two of the four subunits [sticks, colored with the same scheme as in (B)] are shown. K+ (green spheres), water molecules (cyan spheres), and hydrogen bonds between R149 and E139 (Kir, dashed black lines), or between D379, M380 and waters (Kv, dashed black lines) are shown.
Tao X, Avalos JL, Chen J, MacKinnon R. Science. 2009 Dec 18;326(5960):1668-74.

Change and Mechanism
Figure 1. Crystal structure of SNARE-induced Ca2+-bound Syt3. (a) Ribbon diagram of the C2AB fragment of Syt3 including the C2A (magenta) and C2B (yellow) domains, and bound Ca2+ ions (gray spheres). (b) Side view illustrating that the Ca2+-binding loops of both C2 domains emerge from the same side of the molecule. Residues Lys427, Lys557, and Arg556 are shown as sticks. (c) Superposition of the crystal structures of SNARE-induced Ca2+-bound (magenta and yellow) and Ca2+-free (blue, PDB ID 1DQV) Syt3 C2AB fragments. We aligned structures by superposition of the Cα positions of their respective C2A domains (see cartoon where the “+” sign indicates the Ca2+-binding regions).
Vrljic M, Strop P, Ernst JA, Sutton RB, Chu S, Brunger AT. Nat Struct Mol Biol. 2010;17(3):325-331.

Decision Making Aid
Treating Chronic Myeloid Leukemia
• Gleevec bound to Abl kinase (PDB ID 1iep)
• Sprycel bound to Abl kinase (PDB ID 2gqg)
• Overlap of Gleevec & Sprycel bound to Abl kinase (PDB IDs 1iep, 2gqg)
Figure labels: GLEEVEC, SPRYCEL, T315I, F317L, Phosphate Binding Loop, Activation Loop
Mira Patel, Student 2008, 2009

Visualization
http://www.umass.edu/microbio/rasmol/history.htm

Early Drawings of Molecules
Roger Hayward, Linus Pauling, Irving Geis, Richard Dickerson
www.umass.edu/microbio/rasmol/history.htm

Physical Models
• Wire models (1958)
• Richards box (1958)
• Ball and Spoke (1960)
• Byron’s bender (1970s)
• Rapid prototyping (1990s)
• Toobers (2000s)
Figure labels: Toobers, Rapid prototyping, Laser crystal, Byron’s bender

Computer Models
• Early computers (1960s-70s)
• TAMS (1980s)
• Evans & Sutherland (1980s-90s)
• Modern computers (1990s …)
• Kinemage, RasMol, Chimera …
Figure labels: Ribbons, Wireframe, Ball and Stick, Space fill, Surface

Visualizing Molecules
• Visual Metaphors
  – assumptions and conventions
• Dealing with Coordinates
  – Biological assembly
  – Missing atoms/residues
  – Split entries
• Aesthetic Choices
  – Use of orientation, color and style
• Scale: Atoms to Cells

Visualization Metaphors
What does a molecule look like?
Figure labels: Wireframe, Ribbons, Backbone, All atoms, Spacefill

Style and Purpose
• Overall shape & structure: space fill
• Fold, classification etc.: ribbons
• Biologically significant regions: ball & stick, stick
  – Understand binding geometry
  – Design activators, inhibitors, drugs
• Because molecules are complex and beautiful
  – mixed or other creative representations
Illustrations from D. Goodsell

Dealing with Coordinates: Biological Assembly

Dealing with Coordinates: Missing Pieces
ATP Synthase: PDB entries 1c17, 1e79, 2a7u, 1l2p
Illustrations from D. Goodsell

Dealing with Coordinates: Split entries

Program Defaults
Waters?

Aesthetic Choices
• Orientation
• Color
• Style
PDB ID 1tim chain B

Scale: Atoms to Molecules to Cells
Illustrations from D. Goodsell
Illustrations that span scales from nanometers to microns, for use in education and science outreach.

Visualize molecules on a computer
1. Coordinate file from PDB
2. Visualization software (RasMol, Chimera, Swiss PDB Viewer etc.)
3. Computer
4. Molecule image
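
To make step 1 of the workflow above concrete: a PDB-format coordinate file is plain text, and visualization software begins by reading its ATOM records column by column. The following is a minimal sketch, not how any particular viewer is implemented. The function name parse_atom_line and the dictionary keys are hypothetical; the sample records are typed in from ATOM lines shown in this deck (entries 1ENM and 1DWD) and the column positions are assumed to follow the fixed-width PDB format.

```python
# Sample ATOM records re-typed from this slide deck (assumed fixed-width layout):
# two alternate conformers of residue 10 (1ENM) and a residue with an
# insertion code (1DWD). Lines from two entries are mixed only for illustration.
SAMPLE_LINES = [
    "ATOM     57  N  ASER A  10       5.404   6.014   4.753  0.33 39.21           N",
    "ATOM     63  N  BGLY A  10       5.404   6.014   4.753  0.67 39.21           N",
    "ATOM      1  N   GLU L   1C     63.677  26.331  17.947  1.00 31.77           N",
]


def parse_atom_line(line):
    """Slice one ATOM/HETATM record into named fields (hypothetical helper)."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16, atom name
        "alt_loc": line[16].strip(),       # column 17, alternate conformer ID
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26, residue number
        "i_code": line[26].strip(),        # column 27, insertion code
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),    # columns 77-78, atom type
    }


# Keep only coordinate records, as a reader of a full file would.
atoms = [parse_atom_line(l) for l in SAMPLE_LINES
         if l.startswith(("ATOM", "HETATM"))]

for atom in atoms:
    label = f"{atom['res_name']} {atom['chain_id']} {atom['res_seq']}{atom['i_code']}"
    print(atom["serial"], atom["name"], atom["alt_loc"] or "-", label,
          atom["occupancy"], atom["b_factor"])
```

Slicing by column rather than splitting on whitespace matters here: the alternate conformer ID sits directly against the residue name (ASER, BGLY) and the insertion code directly against the residue number (1C), so a whitespace split would merge them.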

PDB Format File: a Database Report
• The database is built on the PDB exchange and chemical component dictionaries, which help keep track of all the information
• Validation uses the dictionaries to
  – Check inter-relationships between different data components (PDB exchange dictionary)
  – Match information to the chemical component dictionary

Exploring PDB file
• Meta data
• Coordinates

Title section
  OBSLTE     18-JUL-84 1HHB      2HHB 3HHB 4HHB
  SPLIT      1JGP 1JGQ 1JGO
  CAVEAT     1B86    THERE ARE CHIRALITY ERRORS IN C-ALPHA CENTERS
  REVDAT   4   24-FEB-09 4HHB    1       VERSN
  REVDAT   3   01-APR-03 4HHB    1       JRNL
  REVDAT   2   15-OCT-89 4HHB    3       MTRIX
  REVDAT   1   17-JUL-84 4HHB    0
  SPRSDE     17-JUL-84 4HHB      1HHB

Remarks: the numbers mean something
Biological assembly information
Example of a virus (1AYN)

Remarks
• Compound details
• Missing residues, atoms
• Geometry: close contacts, bond length, angle and torsion deviations, stereochemistry
• Ligand details
• Related entries
• Sequence details

Chemistry sections: Primary Structure & Ligand
  DBREF  1BH0 A    1    29  UNP    P01275   GLUC_HUMAN      53     81
  SEQADV 1BH0 LYS A   17  UNP  P01275    ARG    69 ENGINEERED
  SEQADV 1BH0 LYS A   18  UNP  P01275    ARG    70 ENGINEERED
  SEQADV 1BH0 GLU A   21  UNP  P01275    ASP    73 ENGINEERED
  SEQRES   1 A   29  HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR
  SEQRES   2 A   29  LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU
  SEQRES   3 A   29  MET ASN THR
  MODRES 2F4K NLE A   65  LEU  NORLEUCINE
  MODRES 2F4K NLE A   70  LEU  NORLEUCINE
  HET    PO4  D 147       1
  HET    PO4  B 147       1
  HET    HEM  A 142      43
  HET    HEM  B 148      43
  HET    HEM  C 142      43
  HET    HEM  D 148      43
  HETNAM     PO4 PHOSPHATE ION
  HETNAM     HEM PROTOPORPHYRIN IX CONTAINING FE
  HETSYN     HEM HEME
  FORMUL   5  PO4    2(O4 P 3-)
  FORMUL   7  HEM    4(C34 H32 FE N4 O4)
  FORMUL  11  HOH   *221(H2 O)

Secondary Structure & Connectivity
  HELIX    1  AA SER A    3  GLY A   18  1                                  16
  HELIX    2  AB HIS A   20  SER A   35  1                                  16
  HELIX    3  AC PHE A   36  TYR A   42  1                                   7
  SHEET    1   A 4 ILE A  18  LEU A  23  0
  SHEET    2   A 4 LEU A 111  VAL A 118 -1  O  GLY A 115   N  TRP A  19
  SSBOND   1 CYS A    6    CYS A  127    1555   1555  2.02
  SSBOND   2 CYS A   30    CYS A  115    1555   1555  2.02
  SSBOND   3 CYS A   64    CYS A   80    1555   1555  2.03
  SSBOND   4 CYS A   76    CYS A   94    1555   1555  2.01
  LINK         NE2 HIS A  87                FE   HEM A 143     1555   1555  1.94
  LINK         NE2 HIS B  92                FE   HEM B 147     1555   1555  2.07
  LINK        FE   HEM B 147                 O1  OXY B 150     1555   1555  1.87
  LINK        FE   HEM A 143                 O1  OXY A 150     1555   1555  1.66
  CISPEP   1 PRO A   98    PRO A   99          0         0.53
  CISPEP   2 GLY A  109    PRO A  110          0        -0.01

Crystallographic info, Coordinate Transformations & coordinates
  CRYST1   88.814   95.207   89.164  90.00 104.96  90.00 P 1 21 1      8
  ORIGX1      1.000000  0.000000  0.000000        0.00000
  ORIGX2      0.000000  1.000000  0.000000        0.00000
  ORIGX3      0.000000  0.000000  1.000000        0.00000
  SCALE1      0.011259  0.000000  0.003009        0.00000
  SCALE2      0.000000  0.010503  0.000000        0.00000
  SCALE3      0.000000  0.000000  0.011609        0.00000
  MODEL        1
  ATOM      1  N   SER A  41      -9.122 -10.304  89.511  0.12 51.94           N
  ATOM      2  CA  SER A  41      -8.282 -11.187  88.650  0.12 52.75           C
  ATOM      3  C   SER A  41      -7.051 -11.693  89.414  0.12 52.51           C
  ATOM      4  O   SER A  41      -6.646 -11.108  90.421  0.12 53.15           O
  ATOM      5  CB  SER A  41      -7.845 -10.416  87.393  0.12 51.93           C
  ATOM      6  OG  SER A  41      -7.250 -11.264  86.423  0.12 52.59           O
  ATOM      7  N   THR A  42      -6.473 -12.792  88.935  0.12 51.75           N
  ATOM      8  CA  THR A  42      -5.290 -13.380  89.552  0.12 50.38           C
  ...
  ENDMDL

Coordinate section: A Closer Look
Columns, left to right: record name, S.#, atom name, alternate conformer ID, residue name, chain ID, residue #, x coordinate, y coordinate, z coordinate, occupancy, B-factor, atom type.

Microheterogeneity (1ENM): alternate conformer ID
  ATOM     49  N   GLY A   8       2.326   4.110   1.416  1.00 42.03           N
  ATOM     50  CA  GLY A   8       3.121   3.079   2.065  1.00 42.27           C
  ATOM     51  C   GLY A   8       3.533   3.408   3.476  1.00 42.32           C
  ATOM     52  O   GLY A   8       4.302   2.642   4.092  1.00 44.09           O
  ATOM     53  N   GLY A   9       3.080   4.526   4.038  1.00 40.18           N
  ATOM     54  CA  GLY A   9       3.330   4.880   5.396  1.00 40.11           C
  ATOM     55  C   GLY A   9       4.552   5.685   5.709  1.00 39.75           C
  ATOM     56  O   GLY A   9       4.720   6.098   6.885  1.00 40.96           O
  ATOM     57  N  ASER A  10       5.404   6.014   4.753  0.33 39.21           N
  ATOM     58  CA ASER A  10       6.598   6.814   5.042  0.33 38.11           C
  ATOM     59  C  ASER A  10       6.236   8.234   5.479  0.33 36.87           C
  ATOM     60  O  ASER A  10       5.150   8.733   5.233  0.33 32.77           O
  ATOM     61  CB ASER A  10       7.516   6.864   3.822  0.33 39.46           C
  ATOM     62  OG ASER A  10       8.894   6.884   4.237  0.33 40.79           O
  ATOM     63  N  BGLY A  10       5.404   6.014   4.753  0.67 39.21           N
  ATOM     64  CA BGLY A  10       6.598   6.814   5.042  0.67 38.11           C
  ATOM     65  C  BGLY A  10       6.236   8.234   5.479  0.67 36.87           C
  ATOM     66  O  BGLY A  10       5.150   8.733   5.233  0.67 32.77           O

Residue numbering (1DWD): insertion codes
  ATOM      1  N   GLU L   1C     63.677  26.331  17.947  1.00 31.77           N
  ATOM      2  CA  GLU L   1C     64.338  26.818  16.736  1.00 35.78           C
  ATOM      3  C   GLU L   1C     63.351  27.360  15.717  1.00 41.73           C
  ATOM      4  O   GLU L   1C     63.320  28.565  15.489  1.00 49.37           O
  ATOM      5  CB  GLU L   1C     65.320  25.825  16.101  1.00 38.64           C
  ATOM      6  N   ALA L   1B     62.537  26.499  15.096  1.00 36.03           N
  ATOM      7  CA  ALA L   1B     61.571  26.988  14.116  1.00 33.01           C
  ATOM      8  C   ALA L   1B     60.631  28.018  14.729  1.00 32.42           C
  ATOM      9  O   ALA L   1B     60.238  27.865  15.872  1.00 31.68           O
  ATOM     10  CB  ALA L   1B     60.810  25.845  13.511  1.00 33.36           C
  ATOM     11  N   ASP L   1A     60.262  29.089  14.012  1.00 33.13           N
  ATOM     12  CA  ASP L   1A     59.378  30.016  14.691  1.00 35.05           C
  ATOM     13  C   ASP L   1A     57.965  29.526  14.760  1.00 31.74           C
  ATOM     14  O   ASP L   1A     57.476  28.873  13.851  1.00 36.72           O
  ATOM     15  CB  ASP L   1A     59.593  31.557  14.587  1.00 41.32           C
  ATOM     16  CG  ASP L   1A     58.724  32.268  13.564  1.00 46.17           C
  ATOM     17  OD1 ASP L   1A     57.452  32.455  13.924  1.00 47.60           O
  ATOM     18  OD2 ASP L   1A     59.188  32.658  12.472  1.00 48.99           O
  ATOM     19  N   CYS L   1      57.321  29.802  15.860  1.00 22.52           N
  ATOM     20  CA  CYS L   1      56.005  29.353  16.036  1.00 15.35           C
  ATOM     21  C   CYS L   1      55.351  30.160  17.077  1.00 15.83           C
  ATOM     22  O   CYS L   1      56.002  30.636  17.968  1.00 18.73           O

www.wwpdb.org
Figure: excerpts of a PDB format file and an mmCIF format file

PDB Format vs mmCIF Format
PDB format:
• 80 characters wide
• Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms.
• Includes name, source and sequence of all polymers
• Can include a maximum of 62 chains and 99999 atoms.
mmCIF format:
• Free format
• Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms.
• Includes name, source and sequence of all polymers
• No restriction on the number of chains or atoms in the file.

Dictionaries
• PDB Exchange (pdbx) dictionary (http://mmcif.pdb.org/)
  – Includes the syntax, definitions, relations, boundaries
  – Includes examples for the contents of the mmCIF format file.
• Chemical Component Dictionary
  – Describes all residues in the PDB files (standard and nonstandard amino acids, nucleotides and other ligands: ions, drugs, cofactors, inhibitors)
  – Each component has an identifier of 1-3 alphanumeric characters
  – Includes model & idealized coordinates for components, connectivities, name, formula, SMILES strings
  – Maintained by the wwPDB.
  – Used for data processing and validation of structures

PDB Exchange Dictionary includes syntax & definitions for mmCIF format files
Figure: excerpts of a PDB format file and an mmCIF format file

Instance of valine matched to VAL in Chemical Component Dictionary

Visualization software
• RasMol
• Chimera
• PyMOL
• Jmol
• WebMol
• KiNG
• Cn3D
• MOLMOL
• Swiss PDB Viewer
• MolView
• MIDAS
• VMD

Using Chimera
Checklist
• Upload file and save file/image/session
• Select chain/residue/atoms/neighbors
• Display (and color) atoms/ribbons/surface/labels
• Structure analysis: measure bond lengths, angles
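
The last checklist item, measuring bond lengths and angles, comes down to simple geometry on the x, y, z coordinates stored in the file. The sketch below illustrates the arithmetic only and is not Chimera's implementation. The helper names distance and angle are hypothetical, and the three coordinates are the backbone N, CA and C atoms of GLU L 1C as they appear in the 1DWD ATOM records shown earlier in this deck.

```python
import math

# Backbone atoms of GLU L 1C, in angstroms, copied from the 1DWD ATOM
# records shown in the coordinate section of these slides.
N = (63.677, 26.331, 17.947)
CA = (64.338, 26.818, 16.736)
C = (63.351, 27.360, 15.717)


def distance(a, b):
    """Euclidean distance between two points (hypothetical helper)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def angle(a, b, c):
    """Angle a-b-c in degrees, measured at the middle point b."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    cos_theta = dot / (distance(a, b) * distance(c, b))
    # Clamp to guard against tiny floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


print(f"N-CA bond length: {distance(N, CA):.2f} A")
print(f"CA-C bond length: {distance(CA, C):.2f} A")
print(f"N-CA-C angle:     {angle(N, CA, C):.1f} degrees")
```

For these atoms the N-CA distance comes out near 1.46 Å and the N-CA-C angle near 112 degrees, which is in the range expected for a protein backbone.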