Structural Biophysics Group
School of Optometry and Vision Sciences
Cardiff University
Outline of Talk
Setting the scene
Data collection
Data analysis software
High throughput computing through Condor
Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds.
Each protein has a unique, genetically defined amino acid sequence which determines its specific shape and function.
Proteins can work together to achieve a particular function, and they often associate to form stable complexes.
Roles and Functions
Enzymes
Structural
Hormones
Immunoglobulins
Oxygen transport
Muscle contraction
Cell signalling
Techniques to investigate protein structures:
X-ray crystallography is the science of determining the arrangement of atoms within a crystal from the manner in which a beam of X-rays is scattered from the electrons within the crystal.
Restrictions:
X-ray crystallography requires good quality crystals
Therefore, a significant fraction of proteins cannot be analysed
Structure of haemoglobin - the iron-containing oxygen-transport metalloprotein in the red blood cells of vertebrates and other animals.
Importance of understanding structure of proteins
• how individual components fit together to build complex systems
• structure - function
Possibility of manipulation
• Drug design
• Drug therapies
Why scatter solutions?
Main advantage - the possibility to study the structure and structural dynamics of native particles in physiological solutions.
• Broad range of sizes and conditions
• Shape
• Complexes
Research Interests:
* Structural organisation at the nanometer length scale.
* Systems in solution / in situ
X-ray Scattering
* ideal for investigating the structure and organisation of particles/molecules in a system.
* provides information about size, shape and arrangement of particles/molecules.
X-ray Diffraction (high angles)
Diffraction from repeating structure (crystal lattice)
Incident X-rays
SAXS (small angles)
Scattering from particles or changes in electron density
Sample
X-rays interact with molecules and are deflected.
We can interpret the deflection.
X-rays -> Sample -> Scattered X-rays -> Detector
Small angle X-ray scattering image
The shape and distribution of the scattering provides information such as size, shape and arrangement of the scattering particles.
The two-dimensional data were converted into one-dimensional linear profiles.
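The conversion from a two-dimensional image to a one-dimensional profile can be sketched as an azimuthal average around the beam centre. This is a minimal illustration with NumPy, not the software used for the data in this talk; the function name `radial_profile`, the synthetic image and the beam-centre position are assumptions made for the example.

```python
import numpy as np

def radial_profile(image, center, n_bins):
    """Azimuthally average a 2-D detector image into a 1-D profile."""
    rows, cols = np.indices(image.shape)
    # Distance of every pixel from the assumed beam centre (in pixels).
    r = np.hypot(rows - center[0], cols - center[1])
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    # Summed intensity and pixel count in each radial bin.
    total, _ = np.histogram(r, bins=edges, weights=image)
    count, _ = np.histogram(r, bins=edges)
    mean = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    radius = 0.5 * (edges[:-1] + edges[1:])  # bin centres
    return radius, mean

# Synthetic isotropic pattern (hypothetical data, for illustration only).
yy, xx = np.indices((101, 101))
image = np.exp(-np.hypot(yy - 50, xx - 50) / 10.0)
radius, intensity = radial_profile(image, center=(50, 50), n_bins=25)
```

In practice the pixel radius would then be converted to a scattering angle or scattering vector using the sample-to-detector distance and the wavelength.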
Background correction: the buffer scattering was subtracted from the protein scattering using PRIMUS.
Konarev, P.V., Volkov, V.V., Sokolova, A.V., Koch, M.H.J. and Svergun, D.I. (2003) J. Appl. Cryst., 36, 1277-1282.
Buffer (background)
Protein
Subtracted/corrected data
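The background correction amounts to a point-by-point subtraction of the buffer profile from the protein profile. The sketch below only illustrates the idea and is not how PRIMUS is implemented; the `scale` parameter and the example curves are hypothetical.

```python
import numpy as np

def subtract_buffer(protein, buffer, scale=1.0):
    """Return the protein profile minus the (optionally scaled) buffer profile."""
    protein = np.asarray(protein, dtype=float)
    buffer = np.asarray(buffer, dtype=float)
    if protein.shape != buffer.shape:
        raise ValueError("profiles must be sampled at the same points")
    return protein - scale * buffer

# Hypothetical profiles: a flat buffer background plus a decaying protein signal.
q = np.linspace(0.01, 0.3, 5)
signal = 10.0 * np.exp(-(q * 20.0) ** 2 / 3.0)
buffer_curve = np.full_like(q, 2.0)
protein_curve = buffer_curve + signal
corrected = subtract_buffer(protein_curve, buffer_curve)
```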
GNOM was used to estimate the particle distance distribution function, p(r), from the experimental scattering data.
GNOM output is entered into DAMMIN and GASBOR.
Semenyuk, A.V. and Svergun, D.I. (1991) J. Appl. Cryst, 24, 537-540.
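The distance distribution function and the scattering intensity are linked by the relation I(q) = 4π ∫ p(r) sin(qr)/(qr) dr, integrated from 0 to the maximum particle dimension. The sketch below evaluates only this forward direction numerically; GNOM addresses the harder inverse problem, and this code is not GNOM's algorithm. The bell-shaped p(r) and the value of `d_max` are toy assumptions.

```python
import numpy as np

def intensity_from_pr(q, r, p):
    """Evaluate I(q) = 4*pi * integral of p(r) * sin(qr)/(qr) dr (trapezoid rule)."""
    q = np.asarray(q, dtype=float)
    # np.sinc(x) is sin(pi*x)/(pi*x), so divide the argument by pi.
    kernel = np.sinc(np.outer(q, r) / np.pi)
    integrand = kernel * p
    dr = np.diff(r)
    integral = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dr, axis=1)
    return 4.0 * np.pi * integral

# Toy p(r): zero at r = 0 and r = d_max, positive in between (hypothetical).
d_max = 10.0
r = np.linspace(0.0, d_max, 201)
p = r**2 * (d_max - r) ** 2
q = np.linspace(0.0, 0.5, 51)
intensity = intensity_from_pr(q, r, p)
```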
Size and shape of molecules in solution can be extracted from the scattering pattern using a series of computer algorithms.
DAMMIN uses an ab initio method to build models of the protein shape by simulated annealing using a single-phase dummy-atom model (Svergun, 1999).
GASBOR uses similar parameters to DAMMIN; however, instead of the dummy-atom model, an ensemble of dummy residues is used to form a chain-compatible model (Svergun et al., 2001).
Svergun, D.I. (1999) Biophys. J., 76, 2879-2886.
Svergun, D.I., Petoukhov, M.V. and Koch, M.H.J. (2001) Biophys. J., 80, 2946-2953.
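Simulated annealing in general can be sketched on a toy problem: sites on a line are either occupied or empty, and the configuration is changed one site at a time until its pair-distance histogram matches a target. This is a generic illustration and not DAMMIN's implementation; the one-dimensional lattice, the energy function, the starting configuration and the cooling schedule are all assumptions chosen for brevity.

```python
import math
import random
import numpy as np

def pair_histogram(occupancy):
    """Count occupied-site pairs at each separation (a toy stand-in for p(r))."""
    x = np.asarray(occupancy, dtype=float)
    return np.correlate(x, x, mode="full")[len(x) - 1:]

def energy(occupancy, target_hist):
    """Squared discrepancy between the model histogram and the target."""
    return float(np.sum((pair_histogram(occupancy) - target_hist) ** 2))

def anneal(target_hist, n_sites, steps=5000, t_start=5.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    state = [1] * n_sites  # every site occupied (arbitrary choice for this toy)
    current = energy(state, target_hist)
    start_energy = current
    best_state, best_energy = list(state), current
    temperature = t_start
    for _ in range(steps):
        site = rng.randrange(n_sites)
        state[site] ^= 1  # propose flipping one site
        proposed = energy(state, target_hist)
        delta = proposed - current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
            if current < best_energy:
                best_state, best_energy = list(state), current
        else:
            state[site] ^= 1  # reject: undo the flip
        temperature *= cooling  # slow geometric cooling
    return best_state, best_energy, start_energy

# Hypothetical "true" shape used only to generate target data.
truth = [0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
target = pair_histogram(truth)
best_state, best_energy, start_energy = anneal(target, n_sites=len(truth))
```

Because each run uses random moves, different seeds can end in different configurations, which is why repeated runs are later averaged.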
FINISH
START
Output files from DAMMIN and GASBOR are entered into a suite of programs (DAMAVER), which aligns the models and produces an average of them.
In order to obtain a reliable representation of the protein shape, DAMMIN and GASBOR need to be repeated a number of times and the results averaged.
The greater the number of repetitions, the more accurate the resulting model.
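The averaging step can be pictured as a vote over grid positions: a position is kept if enough of the repeated models occupy it. The sketch below assumes the models have already been aligned onto a common grid and is a deliberate simplification, not the procedure DAMAVER uses; the function name and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter

def average_models(models, min_fraction=0.5):
    """Keep grid positions occupied in at least min_fraction of the models."""
    counts = Counter()
    for model in models:
        counts.update(set(model))  # each model votes once per position
    threshold = min_fraction * len(models)
    return {voxel for voxel, n in counts.items() if n >= threshold}

# Three hypothetical, pre-aligned models given as sets of grid coordinates.
models = [
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(0, 0, 0), (1, 0, 0), (1, 1, 0)},
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
]
consensus = average_models(models)
```

Positions that appear in only a minority of runs are dropped, so features that recur across repetitions dominate the averaged shape.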
Transglutaminases are a family of enzymes that are capable of introducing isopeptide bonds in or between polypeptide chains.
The average shape of 20 independent simulations produced from DAMMIN.
Using Condor, DAMMIN and GASBOR can be run multiple times for the same protein, and also multiple times for a number of different proteins simultaneously.
Before Condor, 20 repeat runs of approximately 36 mins each would have taken approximately 12 h on one PC.
Using Condor, the 20 repeat runs were performed in approximately 36 mins.
This represents a significant gain in throughput and makes the method more accessible.
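The timing figures above can be checked with a short calculation. It assumes that all 20 runs start at the same time on separate machines and ignores queueing and file-transfer overheads, which is an idealisation.

```python
runs = 20
minutes_per_run = 36  # approximate duration of one run, from the figures above

# One PC: the runs execute one after another.
serial_minutes = runs * minutes_per_run
serial_hours = serial_minutes / 60

# Condor pool (idealised): all runs execute at once on separate machines.
parallel_minutes = minutes_per_run
speedup = serial_minutes / parallel_minutes

print(f"serial: {serial_hours:.0f} h, parallel: {parallel_minutes} min, speed-up: {speedup:.0f}x")
```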
A Submit Script Generator (SSG) was developed by James Osborne to assist running DAMMIN and GASBOR using the Condor toolkit.
The SSG asks the user only once for the necessary information to prepare and submit multiple jobs to Condor, thereby reducing the time taken to submit and process multiple proteins.
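The idea of generating one submit description for many jobs can be sketched as follows. This is not the SSG itself, whose internals are not described here; the function name, the executable name `dammin.exe`, the input file names and the output naming scheme are hypothetical. The submit commands used (`universe`, `executable`, `transfer_input_files`, `output`, `error`, `log`, `queue`) are standard Condor submit description commands.

```python
from pathlib import Path

def make_submit_text(executable, input_files, repeats):
    """Build a Condor submit description: `repeats` jobs per input file."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "log = jobs.log",
        "",
    ]
    for path in input_files:
        stem = Path(path).stem
        lines += [
            f"transfer_input_files = {path}",
            # $(Process) is expanded by Condor to the job number within the cluster.
            f"output = {stem}_$(Process).out",
            f"error = {stem}_$(Process).err",
            f"queue {repeats}",
            "",
        ]
    return "\n".join(lines)

# Hypothetical inputs: two proteins, 20 repeat runs each.
submit_text = make_submit_text(
    "dammin.exe",
    ["input/protein_a.out", "input/protein_b.out"],
    repeats=20,
)
print(submit_text)
```

The user supplies the program, the inputs and the number of repeats once, and the generated text queues every job in a single submission.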
Running Dammin on Condor
========================
1) put your gnom.out files into the input directory
2) double click on make.bat
This runs makesubmit.exe which will ask you some questions
3) double click on submit.bat
The jobs are submitted
You can check the progress of your jobs using "condor_q"
When your jobs are finished
4) copy the input and output directories somewhere safe
5) double click on clean.bat
6) Go to 1
Central Manager: master, collector, negotiator
Submit Nodes: 30 workstations (master, schedd, shadow)
Execute Nodes: 1600 workstations (master, startd, starter)
Run > cmd > condor_q
Summary
User friendly
Easy to use
Overall, Condor has proved invaluable to our research since the work is completed rapidly and efficiently.
Related Publications
Lammie D., Osborne J., Aeschlimann D., Wess T.J. (2007) Rapid shape determination of tissue transglutaminase using high-throughput computing. Acta Crystallographica Section D: Biological Crystallography, 63:1022-1024.
Mankelow T.J., Burton N., Stefansdottir F.O., Spring F.A., Parsons S.F., Pedersen J.S., Oliveira C.L., Lammie D., Wess T., Mohandas N., Chasis J.A., Brady R.L., Anstee D.J. (2007) The Laminin 511/521-binding site on the Lutheran blood group glycoprotein is located at the flexible junction of Ig domains 2 and 3. Blood, 110:3398-406.
Dyksterhuis L.B., Baldock C., Lammie D., Wess T.J., Weiss A.S. (2007) Domains 17-27 of tropoelastin contain key regions of contact for coacervation and contain an unusual turn-containing crosslinking domain. Matrix Biology, 26:125-135.
Baldock C., Siegler V., Bax D.V., Cain S.A., Mellody K.T., Marson A., Haston J.L., Berry R., Wang M.C., Grossmann J.G., Roessle M., Kielty C.M., Wess T.J. (2006) Nanostructure of fibrillin-1 reveals compact conformation of EGF arrays and mechanism for extensibility. Proc Natl Acad Sci U S A, 103:11922-7.
Acknowledgements
Cardiff University:
Tim Wess
Daniel Aeschlimann
James Osborne (E-mail: condor@cardiff.ac.uk)