Protein x-ray crystallography

Two truths of x-ray crystallography
(1.) Models, not structures. Corroborating results suggest that the models are close to biological reality.
(2.) No matter how carefully performed, any experiment will have errors associated with it. Errors in the fitting of the (sparse) electron density maps are some of the most common.

Steps in protein x-ray crystallography
• Molecular biology: Over-express the protein in an expression system.
• More art than science (and more luck than art): Grow crystals of the protein that diffract well (a difficult step that can take from weeks to years!).
• Physics: Obtain the x-ray diffraction data.
• Computation: Compute electron density maps.
• Computation again: Refinement. Calculate an atomic model to fit the electron density; compare the diffraction data computed from the model with the actual data; refine the model to fit the data (iterate).

Protein crystals are “liquid crystals”
They look like normal crystals, but are actually more like gels (20 to 80% solvent).

Crystallization energy diagram

Unit cell

Experimental set-up

Diffraction pattern

Bragg’s law explains why cleavage faces of crystals reflect x-ray beams at certain angles of incidence (diffraction)

$$ 2d \sin\theta = n\lambda $$

d = spacing between molecules in the lattice
θ = angle of observed diffraction
λ = wavelength of x-rays
n = integer for first order, second order, etc.

[Figure labels: X-rays, Crystal, Detector]

Resolution is directly proportional to
In x-ray crystallography, the phrase “2 Å model” means that the analysis included reflections out to a distance of 1/(2 Å) from the center of the diffraction pattern.

Diffraction to electron density (which is not the same as the final model structure)

Fourier Transform
To get from the diffraction pattern to the electron density, you have to use a Fourier transform.
Note: This process is largely done automatically by a computer.
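
To make the Fourier transform step concrete, here is a minimal sketch on a toy one-dimensional "unit cell". It is not how crystallographic software works internally; the grid size, the atom positions and weights, and the function names `structure_factors` and `fourier_synthesis` are all assumptions made for illustration. It shows that the density is recovered when both amplitudes and phases are available, and that it is not recovered from amplitudes alone, which is all a detector measures.

```python
import cmath
import math


def structure_factors(density):
    """Forward discrete Fourier transform: density samples -> complex F(h)."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]


def fourier_synthesis(factors):
    """Inverse transform: complex F(h) -> real-space density samples."""
    n = len(factors)
    return [
        (sum(factors[h] * cmath.exp(2j * math.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]


# Toy 1D "unit cell": two atoms of different weight on a 16-point grid.
# Positions and weights are hypothetical values chosen for illustration.
density = [0.0] * 16
density[3] = 6.0
density[10] = 8.0

F = structure_factors(density)

# With amplitudes AND phases, the original density is recovered.
recovered = fourier_synthesis(F)

# A detector records intensities (|F|^2), so the phases are lost.
# Setting every phase to zero gives a very different map.
amplitudes_only = [abs(f) for f in F]
without_phases = fourier_synthesis(amplitudes_only)

max_error_with_phases = max(abs(a - b) for a, b in zip(density, recovered))
max_error_without_phases = max(abs(a - b) for a, b in zip(density, without_phases))

print("max error with phases:   ", max_error_with_phases)
print("max error without phases:", max_error_without_phases)
```

The gap between the two error values is the reason the missing phases have to be obtained by some other route before a usable map can be computed.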
Phases critically impact model quality

Methods to resolve the phase problem (largely outside our scope)
• Isomorphous Replacement
  – Single Isomorphous Replacement (SIR)
  – Multiple Isomorphous Replacement (MIR)
• Anomalous Dispersion
  – Single-Wavelength Anomalous Dispersion (SAD)
  – Multiple-Wavelength Anomalous Dispersion (MAD)
    • Selenomethionine is commonly used for MAD
• Molecular Replacement
• Direct Methods

From electron density to model
Note: While some manual fitting still occurs, this process is largely done automatically by a few different computer programs.
Final models are determined from a combination of electron density overlap and molecular mechanics (MM) energies.
Note: Because hydrogen atoms have only one electron and scatter x-rays weakly, they are only resolved in the very highest resolution structures.

Key x-ray crystallography model quantities
Quality: Resolution (in Å) and R-factor (values = 0 to 1).
Atom coordinates: Define the mean coordinates of the (heavy) atoms.
B-factors (aka temperature factors): Describe the apparent disorder about the mean. Disorder is spatial (crystal heterogeneity) and temporal (protein flexibility). However, in reality, B-factors in protein crystallography are NOT pure Debye-Waller factors (mobilities). Instead, B-factors are most often best characterized as “fudge factors” used to fit the electron density maps.
Occupancies: Occasionally, a better fit to the electron density can be obtained by assuming that certain atoms can be in more than one location, due to alternate conformations.

Resolution

Resolution statistics

R-factor
The R-factor (aka residual factor or agreement factor) is a measure of the difference between the observed and computed intensities. Note that the structure factor F is related to the intensities from the diffraction pattern (I ∝ |F|²). A similar quality criterion is Rfree, which is calculated from a subset (~10%) of reflections that were not included in the structure refinement.
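
The R-factor and Rfree bookkeeping described above can be sketched as follows, assuming the usual definition R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. The amplitude values are made up, and `r_factor` and `split_free_set` are hypothetical helper names, not functions from any refinement package. In a real refinement the free set is chosen before refinement begins and is never used to fit the model; here no refinement happens, so the code only illustrates how the two numbers are computed.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.10, seed=0):
    """Randomly flag about free_fraction of the reflections as the free set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(free_fraction * n_reflections))
    free = set(indices[:n_free])
    work = [i for i in indices if i not in free]
    return sorted(work), sorted(free)


# Hypothetical structure factor amplitudes for ten reflections.
f_obs = [120.0, 85.0, 40.0, 200.0, 15.0, 64.0, 98.0, 33.0, 150.0, 71.0]
f_calc = [110.0, 90.0, 45.0, 190.0, 20.0, 60.0, 105.0, 30.0, 140.0, 80.0]

work, free = split_free_set(len(f_obs))

# R over the working set, Rfree over the held-out set, R over everything.
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
r_all = r_factor(f_obs, f_calc)

print(f"R(work) = {r_work:.3f}, Rfree = {r_free:.3f}, R(all) = {r_all:.3f}")
```

Because Rfree is computed from reflections the model never saw, it tends to reveal over-fitting that the ordinary R-factor hides.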
$$ R = \frac{\sum \left|\, |F_{obs}| - |F_{calc}| \,\right|}{\sum |F_{obs}|} $$

R values:
0.6: Very bad
0.5: Bad
0.4: Recoverable
0.2: Good for protein
0.05: Good for small organic models
0.0: Perfect

Rfree statistics

Common rules of thumb
A good rule of thumb for defining an acceptability threshold is based on resolution and R-factor. A resolution of 2.0 Å or better (that is, a numerically lower value) and an R-factor of 0.20 or lower is a commonly used threshold in structural bioinformatic analyses.
It is important to remember, though, that there is no such thing as a single structure. Proteins are best described by ensembles.
In the past, NMR structures were considered to be of lower quality than x-ray structures. However, they are increasingly accepted, especially since the environmental conditions (solution vs. liquid crystal) have been argued to be more biological. Unfortunately, there is no magic number that can be used to assess NMR structure quality, or lack thereof.

An example of occupancy != 1.00

Common methods for model evaluation (you will cover this more in Dr. Guo’s class)

Model evaluation via MM force fields (you will cover this more in Dr. Guo’s class)

$$
V(r) = \sum_{bonds} \frac{k_i}{2}\left(l_i - l_{i,0}\right)^2
     + \sum_{angles} \frac{k_i}{2}\left(\theta_i - \theta_{i,0}\right)^2
     + \sum_{torsions} \frac{V_n}{2}\left(1 + \cos(n\omega - \gamma)\right)
     + \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right)
$$

For more info on x-ray crystallography
I strongly recommend this book to anyone doing structural bioinformatics!

Protein NMR

A few comments about protein structure determination via NMR (HSQC + others)

Introduction of a magnetic field will orient the random spins along the external field

The basics of NMR
The extent of the chemical shift is related to local environment (e.g. chemical shifts in 1H NMR)

Chemical shifts
Chemical shifts are determined relative to a reference state, frequently tetramethylsilane (TMS). TMS is great for several reasons...
(1.) Twelve chemically equivalent protons means lots-o-signal
(2.) Electronegativity of silicon << electronegativity of carbon, thus the TMS signal shouldn’t affect things (the strongly shielded TMS protons resonate away from most other signals).
(3.) Low boiling point, so it can be easily removed via heating.

J-Coupling

Q: What is the output of a multidimensional protein NMR experiment?
Distance restraints, angle restraints, and orientation restraints. Distance restraints come from NOE-based experiments (NOESY, etc.).
A series of protein structure models is built that attempt to satisfy as many of the restraints as possible, in addition to general properties of proteins such as bond lengths and angles.
The algorithms convert the restraints and the general protein properties into energy terms, and then try to minimize the energy.
The process results in an ensemble of structures that, if the data were sufficient to dictate a certain fold, will converge.

Q: What is the output of a multidimensional protein NMR experiment?
Answer: A series of models that satisfy the experimental constraints, while still obeying the chemical rules that govern protein structure (as we understand it).
Also: While other NMR experiments do directly quantify flexibility through NMR order parameters (i.e., S²), which is beyond the scope of this class, NMR protein structures do not directly quantify flexibility. Nevertheless, regions where the models vary are frequently used to indirectly identify flexible regions.

Sometimes NMR spectra are informative even when they can’t be resolved

Heteronuclear single quantum correlation

Brief aside: Magnetic resonance imaging (MRI)

Other methods to determine macromolecular structure: Examples from (cryo)-electron microscopy

Other methods to determine macromolecular structure: Small Angle X-Ray Scattering (SAXS)

Current PDB Holdings (as of 4/11/12)

Method               Proteins  Nucleic Acid  Prot/NA Complex  Other  Total
X-ray                   66098          1348             3266      2  70714
NMR                      8190           979              186      7   9362
Electron microscopy       284            22              116      0    422
Hybrid                     44             3                2      1     50
Other                     140             4                5     13    162
Total                   74756          2356             3575     23  80710
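
As a quick check on the holdings table, the sketch below recomputes the row totals and each method's share of all entries. The counts are copied from the table; the variable names are just illustrative.

```python
# Counts per method: (proteins, nucleic acid, protein/NA complex, other),
# copied from the PDB holdings table (as of 4/11/12).
holdings = {
    "X-ray": (66098, 1348, 3266, 2),
    "NMR": (8190, 979, 186, 7),
    "Electron microscopy": (284, 22, 116, 0),
    "Hybrid": (44, 3, 2, 1),
    "Other": (140, 4, 5, 13),
}

# Row totals and the grand total across all methods.
totals = {method: sum(counts) for method, counts in holdings.items()}
grand_total = sum(totals.values())

# Percentage of all entries contributed by each method.
shares = {method: 100.0 * total / grand_total for method, total in totals.items()}

for method, total in totals.items():
    print(f"{method:<20} {total:>6}  {shares[method]:5.1f}%")
print(f"{'Total':<20} {grand_total:>6}")
```

The row totals agree with the table, and x-ray crystallography accounts for the large majority of entries, with NMR a distant second.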