3D Structure Visualizing, Comparing, Classifying David Wishart Athabasca 3-41 david.wishart@ualberta.ca Outline & Objectives* • • • • • • Visualization Programs Vectors & Matrices Difference Distance Matrices Molecular Superposition Measuring Superposition Classifying 3D Structures PDB Viewers Jmol* • Java-based program • Open source applet and application – Compatible with Linux, MacOS, Windows • Menus access by clicking on Jmol icon on lower right corner of applet • Works with all major web browsers – Internet Explorer (Win32) – Mozilla/Firefox (Win32, OSX, *nix) – Safari (Mac OS X) and Opera 7.5.4 WebMol* WebMol* • Both a Java Applet and a downloadable application • Offers many tools including distance, angle, dihedral angle measurements, detection of steric conflicts, interactive Ramachandran plot, diff. distance plot • Compatible with most Java (1.3+) enabled browsers including: – Internet Explorer – Safari on Mac OS – Mozilla 1.6/Firefox on Linux (Redhat 8.0) PDB SimpleViewer Requires Java WebStart (~30 sec install) Chime* Chime* • http://www.umass.edu/microbio/chime/neccsoft.h tm#download_install • Among first PDB viewing programs with limited manipulation capacity • Uses Rasmol for its back end source • View both large and small molecules • Browser Plug-in (Like PDF reader) • Interesting from historical perspective (now mostly phased out) Protein Explorer (Chime) Protein Explorer* • http://www.umass.edu/microbio/chime/pe_beta/ pe/protexpl/ • Uses Chime or Jmol for its back-end • Very flexible, user friendly, well documented, offers morphing, sequence structure interface, comparisons, contextdependent help, smart zooming, off-line • Browser Plug-in (Like PDF reader) • Compatible with Netscape (Mac & Win) QuickPDB Quick PDB* • http://www.sdsc.edu/pb/Software.html • Very simple viewing program with limited manipulation and very limited rendering capacity -- Very fast • Java Applet (Source code available) • Compatible with most browsers and computer platforms Rasmol Rasmol* • http://www.umass.edu/microbio/rasmol/ • Very simple viewing program with limited manipulation capacity, easy to use! • “Grand-daddy” of all visual freeware • Runs as installed “stand-alone” program • Source code available • Runs on Mac, Windows, Linux, SGI and most other UNIX platforms B (Biomer) Biomer (B) • http://casegroup.rutgers.edu/Biomer/index.html • Very sophisticated molecular rendering and modelling package for both large and small molecules (kind of rough) • Supports molecular dynamics & En. min • Written in Java (source code available) • Can run as an applet or stand-alone • Compatible on most platforms Swiss PDB Viewer Swiss PDB Viewer* • http://spdbv.vital-it.ch/ • Among most sophisticated molecular rendering, manipulation and modelling packages (commercial or freeware) • Supports threading, hom. modelling, energy minimization, seq/struc interface • Stand-alone version only • Compatible on Mac, Win, Linux, SGI Swiss PDB Tutorial* http://spdbv.vital-it.ch/TheMolecularLevel/SPVTut/index.html MolMol MolMol* • http://www.mol.biol.ethz.ch/wuthrich/software/molmol/ • Very sophisticated molecular rendering, and manipulation package (among the best graphics of all freeware) • Special focus on NMR compatibility, supports many calculations/plots • Stand-alone version only • Compatible on Win, Unix (nearly all) Summary* Mac Win Unix Rendr SeqView Super E Min Modeling Rasmol + + + ++ - - - - Chime + + - + - - - - Prot. Expl. + + - ++ + + - - Quick PDB + + + + + - - - Biomer + + + ++ - + + + SwP Viewer + + + +++ + + + + MolMol - + + +++ - + - + Visualization Hub http://www.umass.edu/microbio/chime/top5.htm Graphics Formats • • • • • • • • GIF JPEG PNG TIFF (Tag Image) BMP EPS PS RGP (SGI) Graphics Formats* • GIF (Graphical Interchange Format) – pronounced “JIF” – introduced in 1987 by CompuServe – handles 8 bit colour (256 colours) – lossy compression (up to 10 X) – best for drawings, simple B+W or colour diagrams, images with hard edges – supported by Perl graphics library (GD.pm) – supports animation & transparency Graphics Formats* • JPEG (Joint Photographic Experts Group) – pronounced “JAY-peg” – exploits eye’s poor perception of small changes in colour variation – handles 24 bit colour (1.6 million colours) – allows adjustable lossy compression – best for colour pictures of real objects with varied colour, shadow, fuzzy edges – among most common web image formats Graphics Formats* • PNG (Portable Network Graphics) – designed to replace GIF and TIFF – supports lossless compression – supports 24 bit, grayscale and 8 bit – supports transparency & interlacing – offers better compression than GIF (15%) – supported by new GD.pm Perl library – problems with many early browsers in viewing PNG (now fixed) PovRay (www.povray.org) Aliasing & Antialiasing* True Image Aliased Image Anti-aliased Image Ray Casting (from 3D to 2D)* • Ray = beam of light • For each pixel on screen, cast ray from eye thru pixel • Test every object in scene to see if ray intersects object • Each ray intersection nearest to eye is made visible, color pixel Ray Tracing & Reflection* • Used to determine surface appearance • Begins with ray casting, determine intersects, then recursively sends 2ndary rays to see which objects reflect, which are transparent, which absorb, etc. Shadowing* • Uses ray tracing algorithm • Sends out 2ndary rays towards light sources to see if opaque objects are in the way, if so, then surface is in shadow • “shadow feelers” HAV-3C Protease - Alan Gibbs Outline • • • • • • Visualization Programs Vectors & Matrices Difference Distance Matrices Molecular Superposition Measuring Superposition Classifying 3D Structures Vectors Define Bonds and Atomic Positions O H3N+ z y x Origin H CO bond O R Review – Vectors* z (1,2,1) ^ ^ ^ u = 1i + 2j + 1k y u 1 u = 2 1 x (0,0,0) u = (1-0)2 + (2-0)2 + (1-0)2 = 6 Vectors have a length & a direction Review - Vectors • Vectors can be added together • Vectors can be subtracted • Vectors can be multiplied (dot or cross or by a matrix) • Vectors can be transformed (resized) • Vectors can be translated • Vectors can be rotated Matrices* • A matrix is a table or “array” of characters • A matrix is also called a tensor of “rank 2” 6 5 1 6 3 8 7 0 4 4 9 9 1 3 3 4 3 0 5 4 A 5 x 6 Matrix # columns 4 3 0 4 4 # rows 2 1 1 9 3 column row Different Types of Matrices 2 1 1 9 3 3 4 3 0 4 4 6 6 5 1 6 3 7 8 7 0 4 4 9 9 9 1 3 3 1 A square Matrix 4 3 0 5 4 0 2 4 6 8 9 4 4 3 5 7 9 3 6 5 1 0 1 0 8 7 0 4 3 5 9 9 1 3 3 4 4 3 0 5 4 0 A symmetric Matrix 1 3 5 9 7 3 A column Matrix (A vector) Different Types of Matrices* cosq A G M S B H N T C I O U D J P V E K Q W F L R X A rectangular Matrix sinq 0 sinq -cosq 0 24689 0 0 A rotation Matrix 1 A row Matrix (A vector) Review - Matrix Multiplication 2 4 0 1 0 2 1 3 1 x 2 1 3 1 0 0 0 1 0 2x1 2x0 2x2 1x1 1x0 1x2 1x1 1x0 1x2 + + + + + + + + + 4x2 4x1 4x3 3x2 3x1 3x3 0x2 0x1 0x3 + + + + + + + + + 0x0 0x1 0x0 1x0 1x1 1x0 0x0 0x1 0x0 10 4 16 7 4 11 1 0 0 Rotation* 1 0 0 cosq 0 -sinq z q y f x cosf -sinf 0 0 sinq cosq sinf cosf 0 0 0 1 Rotate about x Rotate about z Rotation* Counterclockwise about x Counterclockwise about z 1 0 0 0 0 cosq -sinq sinq cosq Clockwise about x 1 0 0 cosq 0 -sinq 0 sinq cosq cosf -sinf sinf cosf 0 0 0 0 1 Clockwise about z cosf -sinf 0 sinf cosf 0 0 0 1 Rotation X z y x X 1 0 0 cosq 0 -sinq 1 0 0 cosq 0 -sinq 0 sinq cosq 0 sinq cosq = z = y x Rotation (Detail)* 1 0 0 cosq 0 -sinq 1 0 0 cosq 0 -sinq 0 sinq cosq 0 sinq cosq z z = X y y x x 1 1 1 1 cosq + sinq -sinq + cosq = Comparing 3D Structures • • • • • Visual or qualitative comparison Difference Distance Matrices Superimposition or superposition Root mean square deviation (RMSD) Subgraph isomorphisms (Ullman’s algorithm) • Combinatorial extension (CE) Qualitative Comparison Same or Different? Outline • • • • • • Visualization Programs Vectors & Matrices Difference Distance Matrices Molecular Superposition Measuring Superposition Classifying 3D Structures Difference Distance Matrix* D E D C B E C B A A Object A Object B Difference Distance Matrix* A B C A 0 4 5 B 0 4 C 0 D E D 6 6 3 0 E 4 7 6 3 0 A B C A 0 4 5 B 0 4 C 0 D E C A E 4 3 5 3 0 A B C A 0 0 0 B 0 0 C 0 D E D 0 1 0 0 E 0 4 1 0 0 D D E D 6 5 3 0 B E C B A Hinge motion Difference Distance Matrix Difference Distance Matrices or DDM’s* • Simplest method to perform structural comparisons • Requires no transfomations, no rotations or superpositions • Very effective at identifying “hinge” motions or localized changes • Produces a visually pleasing, quantitative measure of similarity Superposition* • Objective is to match or overlay 2 or more similar objects • Requires use of translation and rotation operators (matrices/vectors) • Recall that very three dimensional object can be represented by a plane defined by 3 points Superposition* z z b c’ b c’ c b’ c b’ a a a’ a’ y x y x Identify 3 “equivalence” points in objects to be aligned Superposition z z b c’ c’ c b’ b a c b’ a’ a y x x Translate points a,b,c and a’,b’,c’ to origin y Superposition z c’ z c’ b b c b’ a x b’ q a y c y x Rotate the a,b,c plane clockwise by q about x axis Superposition z z c’ c’ b b’ b’ a f c a y f c x b x Rotate the a,b,c plane clockwise by f about z axis y Superposition z z c’ b’ c’ b’ a a y y y c x b c b x Rotate the a,b,c plane clockwise by y about x axis Superposition z z c’ q’ b’ c’ a a y y b’ c x b c b x Rotate the a’,b’,c’ plane anticlockwise by q’ about x axis Superposition z c’ z a y f‘ b’ c x a b’ y c’ b c b x Rotate the a’,b’,c’ plane anticlockwise by f’ about z axis Superposition z z a c’ c x a b’ y’ y y c’ b c b’ b x Rotate the a’,b’,c’ plane clockwise by y’ about x axis Superposition z z a a y c’ c x y c’ b’ b c b’ b x Apply all rotations and translations to remaining points Superposition z z b c’ c b’ a a’ a y y c’ c x b’ b x Before After Returning to the “red” frame z z b c a a y c’ c y b’ b x x Before After Returning to the “red” frame* • Begin with the superimposed structures on the x-y plane • Apply counterclockwise rot. By y • Apply counterclockwise rot. By f • Apply counterclockwise rot. By q • Apply red translation to red origin Just do things in reverse order! Shortcomings* • Requires some initial assumptions regarding the anchoring points for superposition • Anchoring points can’t always a priori be known or easily calculated • It “privileges” the first point “a” over “c” which is in turn privileged over “b” More General Approaches* • Monte Carlo or Genetic Algorithms • Matrix methods using least squares or conjugate gradient minimization (McLachlan/Kabsch) • Lagrangian multipliers • Rotation Angle Methods • Quaternion-based methods (fastest) Superposition – Applications* • Ideal for comparing or overlaying two or more protein structures • Allows identification of structural homologues (CATH and SCOP) • Allows loops to be inserted or replaced from loop libraries (comparative modelling) • Allows side chains to be replaced or inserted with relative ease Outline • • • • • • Visualization Programs Vectors & Matrices Difference Distance Matrices Molecular Superposition Measuring Superposition Classifying 3D Structures Measuring Superpositions Molecule b Molecule a RMSD - Root Mean Square Deviation* • Method to quantify structural similarity same as standard deviation • Requires 2 superimposed structures (designated here as “a” & “b”) • N = number of atoms being compared RMSD = Si (xai - xbi)2+(yai - ybi)2+(zai - zbi)2 N Superpositions for Multiple Structures RMSD - For Multiple Structures* • Requires multiple superimposed structures over a single “averaged” structure (x,y,z) • N = number of atoms being compared • M = number of structures superimposed RMSD = S a Si (xai - xi)2+(yai - yi)2+(zai - zi)2 N M RMSD without Superposition* A B C A 0 4 5 B 0 4 C 0 D E D 6 6 3 0 E 4 7 6 3 0 A B C A 0 4 5 B 0 4 C 0 D E C A E 4 3 5 3 0 A B C A 0 0 0 B 0 0 C 0 D E D 0 1 0 0 E 0 4 1 0 0 D D E D 6 5 3 0 B E C B A RMS = 1+4+1 = 1.89 10 RMSD* • • • • • • 0.0-0.5 Å <1.5 Å < 5.0 Å 5.0-7.0 Å > 7.0 Å > 12.0 Å • • • • • • Essentially Identical Very good fit Moderately good fit Structurally related Dubious relationship Completely unrelated SuperPose Web Server http://wishart.biology.ualberta.ca/SuperPose/ Outline • • • • • • Visualization Programs Vectors & Matrices Difference Distance Matrices Molecular Superposition Measuring Superposition Classifying 3D Structures Classifying Protein Folds* Lactate Dehydrogenase: Mixed a / b Immunoglobulin Fold: b Hemoglobin B Chain: a Detecting Unusual Relationships Similarity between Calmodulin and Acetylcholinesterase Classifying Protein Folds SCOP Database http://scop.mrc-lmb.cam.ac.uk/scop SCOP • Class folding class derived from secondary structure content • Fold derived from topological connection, orientation, arrangement and # 2o structures • Superfamily clusters of low sequence ID but related structures & functions • Family clusers of proteins with seq ID > 30% with v. similar struct. & function SCOP Structural Classification The eight most frequent SCOP superfolds The CATH Database http://www.cathdb.info CATH • Class [C] derived from secondary structure content (automatic) • Architecture (A) derived from orientation of 2o structures (manual) • Topology (T) derived from topological connection and # 2o structures • Homologous Superfamily (H) clusters of similar structures & functions CATH - Class Class 1: Mainly Alpha Class 2: Mainly Beta Class 3: Mixed Alpha/Beta Class 4: Few Secondary Structures Secondary structure content (automatic) CATH - Architecture Roll Super Roll Barrel 2-Layer Sandwich Orientation of secondary structures (manual) CATH - Topology L-fucose Isomerase Serine Protease Aconitase, domain 4 TIM Barrel Topological connection and number of secondary structures CATH - Homology Alanine racemase Dihydropteroat e (DHP) synthetase FMN dependent fluorescent proteins 7-stranded glycosidases Superfamily clusters of similar structures & functions Other Servers/Databases • • • • • • Dali - http://ekhidna.biocenter.helsinki.fi/dali_server/ VAST - http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml Matras - http://biunit.aist-nara.ac.jp/matras/ CE - http://cl.sdsc.edu/ce.html TopMatch - http://topmatch.services.came.sbg.ac.at/ PDBsum - http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/ CE Search http://cl.sdsc.edu/ce/all-to-all/1-to-all.html CE Search Summary • Many different tools and formats to visualize 3D structure – learn how to use at least one of them • Visualization on computers is mostly about matrix and vector manipulation • Structure comparison also requires the use of linear algebra • Protein structures can be compared and aligned – just like sequences