High Throughput Technique in Structural BioinformaticsApplication to Catalase, an enzyme of 57 kDa molecular weight By Prof. D. VELMURUGAN DEPARTMENT OF CRYSTALLOGRAPHY & BIOPHYSICS UNIVERSITY OF MADRAS GUINDY CAMPUS CHENNAI – 600 025 One of the main interests in the molecular biosciences is in understanding structure function relations and X-ray crystallography plays a major role in this. ab initio solutions of the crystal structures of small molecules are possible by using atomic-resolution diffraction data, usually at ~0.8 Å. Most of these small molecular crystal structures are usually solved using direct-methods programs. Macromolecules have mainly been solved at resolutions less than atomic and this has necessitated determination of initial phases either from experimental techniques such as Molecular replacement techniques, MIR or MAD . During the last decade, admirable advances have taken place in the data-collection facilities and techniques available to the macromolecular crystallographer. To get better X-ray intensity data for this purpose, new techniques like cryo temperature data collection, halide soaking and passing of Ar, Ne, Hg gas have been developed. With the above advances, more data sets appear to be coming from atomic-resolution data. The above possibility of gaining atomic resolution data even for macromolecules prompted the direct methods practitioners to make attempts to extend the direct methods using other macromolecular techniques to enable them to tackle the structure solution of macromolecules. X-ray Crystallography has become a central tool in modern drug and target discovery, providing important insights into molecular interactions and biological function. The past few years have seen many advances in the methods underlying macromolecular crystallography such as protein production, crystallization, cryo-crystallography and synchrotron technology. Together these advances mean that X-ray data can be collected extremely quickly for many different crystals and ligand-bound complexes. The challenge is to ensure rapid and accurate interpretation of the data to provide valuable structural information. The High Throughput Crystallography (HTC) Consortium offers scientists a valuable new dimension to the drug discovery process. The HTC Consortium aims to accelerate crystallographic structure determination by developing new science as well as utilizing current technology to go from initial phasing through to structure refinement and analysis while minimizing the amount of human intervention that is required. The ability to examine in atomistic detail the interactions between many different proteins and ligands provides scientists unprecedented insight into the mechanics of drug binding. Rapid and revolutionary developments in genome sciences, combinatorial chemistry, informatics and robotics are having major impacts on drug discovery. Genome sequencing projects in man and micro-organisms have provided an unprecedented number of potential drug targets. These have given impetus to the study of protein expression (proteomics) and structure (structural genomics), and have allowed a clearer description of drug targets as molecular components of disease processes. At the same time, there is rapidly expanding range of screening technologies, as well as consolidations in medicinal chemistry arising from the combinatorial approaches that were pioneered in the 1990s. These developments have created an environment for the emergence of new strategies for drug discovery. High-Throughput Crystallography is essential for structure-based lead discovery – a strategy that combines features of random screening and rational structure-based design. More than 29,000 protein structures are deposited in the Protein Data Bank (PDB) and more than 1,50,000 sequence (SWISS-PROT) entries exist for which the three dimensional structures are not available. In Structural Genomics, one is interested in determining the structure in the fastest way to understand new folds and this has opened up the “High Throughput Crystallography”. An understanding of the three-dimensional structure (fold) correlates the function of the molecule. High Throughput Crystallography using Automated procedures promotes a quicker elimination of the structure having the same fold among the deposited ones when analyzing thousands of macromolecular structures for which functional assignments are yet to be known. ACORN is a comprehensive and efficient phasing procedure involving direct methods for the determination of protein structures when atomic-resolution data are available (better than 1.2 Å) (Foadi et al., 2000; Mcauley et al., 2001; Yao, 2002; Foadi, 2003; Dodson & Yao, 2003). The fragment can be as a small-idealized piece of secondary structure (Rajakannan et al., 2004a, b; Selvanayagam et al., 2004) or an experimental substructure, such as a metal or a set of S, Se or similar atoms which can be located from anomalous scattering measurements. ACORN then uses a combination of approaches, most importantly dynamic density modification, to develop a refined set of phases. Key to the procedure is the use of a correlation factor for the weak amplitudes as a criterion of phase quality. Dynamic Density Modification (DDM) is designed to modify the densities in three steps: ’ = 0 ’ = tanh{0.2[/()]3/2} ’ = kn() if <0 if >0 if ’>kn(), It sets all negative densities to zero. It modifies the positive densities according the ratio /(). It truncates the modified densities to a value of kn(), where k is a constant given by the user (default value is 3); n is the cycle number of DDM. The reflections are divided into three groups (strong, medium and weak) according to their normalized structurefactor (E) values. The strong reflections (E > 1.2) are used in the phase refinement by the Dynamic density modification (DDM) and Patterson superposition (SUPP) procedures. Both strong and weak reflections (E < 0.1) are used in Sayre-equation refinement (SER). The medium reflections (0.1 < E < 1.2) are used to calculate a correlation coefficient (CC) for each potential solution of DDM. An important component of ACORN is a CC that describes the extent to which the magnitudes of the calculated normalized structure factors (Ec) resemble the observed normalized structure-factor amplitudes (Eo). A fragment in a particular position and orientation in the unit cell will have an associated set of structure factors and the CC will be expressed by CC where Ec Eo Ec E c Ε ο Eo , = <E2> - <E2>½ Ec and CC values are calculated from the starting fragment for all reflections to find the correct orientation and position in molecular replacement (MR) or random MR or for single random-atom searching. In phase refinement Ec and CC values are calculated from the modified map for medium reflections, which are not used for computing the map, to indicate solutions of DDM. The ACORN procedure, as implemented in CCP4, is divided into two parts, ACORN-MR and ACORN-PHASE, as illustrated in the flow diagram. Large Motif AMoRe in CCP4 PDBSET in CCP4 Positioned fragment Known phases Patterson superposition Yes No No SUPP? Standard helices Molecular replacement of random molecule replacement by search on CC Structurefactor calculation Initial phase sets SER? Yes Sayre-equation refinement Dynamic density modification Small motif Single randomatom search on CC Known heavy-atom positions Best set of refined phases ACORN-MR ACORN-PHASE ACORN-MR, deals with finding the position of a fragment of the structure, even a single atom, that provides an initial set of estimated phases. This set is passed into ACORN-PHASE, where phase refinement by a number of real-space processes is performed. For locating a single atom, this approach randomly generates thousands of positions in the asymmetric unit. For each random position, the calculated normalized structure factor values and corresponding CCs are calculated for all reflections. 1000 sets with highest CCs are saved as starting points for further calculations. In most cases, the solution is normally found in the top 100 sets. This approach can be used to determine a native protein structure from AR data, if the structure contains at least one heavy atom (sulphur or heavier). Foadi (2003) has given a detailed explanation of the reasons for the failure of ACORN when the resolution is below 1.2 Å. At atomic resolution, two neighbouring atomic peaks will be two separate entities and DDM will enhance both of them. At lower resolutions, these two peaks will merge into a single peak and DDM will just enhance it and no positive phase refinement can be expected in this situation. The present work overcomes the above problem at low resolution using the fragments for seed phasing information. The use of ACORN in solving a 57 kDa macromolecule with atomic resolution (0.88 A) / truncated synchrotron data (1.5Å resolution) Micrococcus lysodeikticus catalase (Murshudov et al., 2002) 12 17 19 18 7 2 1 6 8 5 4 14 3 8 7 6 5 4 9 10 3 11 15,16 13 1 PDB i.d.:1gwe Total residues: 503 20 Details of the crystallographic data, helices, sheets and sets Ab initio phasing using ACORN ACORN was run with 5000 random single atom trials and the 40 positions with highest CCs’ were selected. ACORN refined the phases from the random atom trials using DDM and led to the solution with good agreement of CC. In this run, 78 cycles of DDM increased the CC for medium reflections with E values from 0.0285 to 0.5246 in 14.2 hours of CPU time. In this ab initio case 8 chains could be automatically built with the ARP/wARP (Perrakis et al., 1999) followed by REFMAC (Murshudov et al., 1999) (482 residues). Manual model building was carried out for the missing residues and the final Rw and Rf values are 14.0 and 16.2% respectively. The superposition using PROFIT of the backbone atoms of this structure with the backbone atoms of the same structure solved using conventional technique gives the r.m.s deviation of 0.143 Å. Details of ACORN, ARP/wARP and REFMAC results for ab initio case Applications of truncated data at 1.5 Å resolution For set 23(minimum input), all sheets and one helix (helix4) containing 76 residues were given as input to ACORN. Here, the ACORN-PHASE option was selected for the structure solution. The R-factor and correlation coefficient for the medium E value reflections of the initial model are 54.2% and 0.0469, respectively. Within 56 cycles of DDM the R-factor and correlation coefficient attained 53.9% and 0.0771 indicating a good solution. The phases were then fed to ARP/wARP (Perrakis et al., 1999) followed by REFMAC (Murshudov et al., 1999). After the initial model building by ARP/wARP, the Rw and Rf values were 44.8 and 44.4% respectively. This initial model was refined for ten cycles of auto building along with five cycles of REFMAC in each auto-building cycle. Finally, ARP/wARP was able to build 212 residues. At this stage Rw and Rf values were 28.9 and 36.3% respectively. An iterative cycle carried out with these output phases revealed 481 residues out of 503 residues with a connectivity index of 0.97. Manual model building was carried out in the missing regions as densities were clear. After the manual model building, 20 cycles of maximum-likelihood refinement were performed using REFMAC and solvent atoms were updated after the refinement using ARP/wARP ‘build solvent atoms’ script. The final Rw and Rf values were 13.6 and 15.6% respectively. The backbone of this final model was superimposed with the structure conventionally solved by the molecular replacement method. The root-mean square deviation was 0.176 Å and the details are shown in Table 2. The results for sets 1-16 and 23 are also shown in Table 2. Figs 3a to 3q describe the final models obtained after all the sets were used for ‘seed-phasing’ information to ACORN. Table 2 lists the ACORN statistics and the ARP/wARP details for all these cases. The final results obtained in each case are also mentioned in this table. Table 2. Details of ACORN phasing, ARP/wARP model building and REFMAC refinement PROGRAM ACORN SET 9 R-factor (%) CC R-factor (%) CC R-factor (%) CC R-factor (%) CC Large E (L) 0.399 0.2110 0.402 0.2070 0.411 0.1871 0.417 0.1708 Medium E (M) 0.521 0.1212 0.523 0.1153 0.526 0.1044 0.529 0.0935 H 1,4-5,7,10-11,13-14 &17-20 (151a.a) After 37 cycles of DDM FINAL H 1,4-5,7,10-11,14 &17-20 (145a.a) After 39 cycles of DDM H 1,5,7,10-11,14&17-20 (135 a.a.) After 37 cycles of DDM H 1,5,10-11,14&17-20 (125 a.a) After 41 cycles of DDM L 0.269 0.6329 0.269 0.6330 0.270 0.6331 0.271 0.6308 M 0.525 0.1199 0.526 0.1166 0.526 0.1156 0.528 0.1108 R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree INITIAL 43.5 42.8 43.5 42.8 43.6 43.7 43.8 43.2 FINAL 14.4 18.1 13.6 17.1 15.3 19.3 14.6 18.2 ARP/wARP Details of ARP/wARP result SET 12 STARTING Input AUTOBUILDING : 10 Cycles REFMAC : 5 Cycles for each auto building; Side dock after 6 cycles of auto building SET 11 SET 10 8 chains, 475 residues, missing residues 1 -7, 59, 60, 113, 114, 142, 143, 175, 176, 186, 187,195 202,388, 389,503,dummy atoms 1288,connectivity index 0.96 R-factor (%) Rfree 7 chains, 482 residues, missing residues 1 -7, 59, 60, 113, 114, 142, 143, 174-176, 388, 389, 401, 402, 503, d ummy atoms 1273, connectivity index 0.97 8 chains, 481 residues, missing residues 1 -7, 59, 60, 113, 114, 142, 143, 174,175, 186, 187, 201, 202, 388, 389, 503, dummy atoms 1235, connectivity index 0.97 8 chains, 481 residues, missing residues 1 -7, 59, 60, 1 42, 143, 186, 187, 201, 202, 331, 332, 388, 389, 401, 402,503, dummy atoms1203, connectivity index 0.97 R-factor (%) R-factor (%) R-factor (%) Rfree Rfree Rfree Without dummy atoms made by ARP/wARP 28.4 29.4 27.3 28.2 27.6 28.9 27.2 28.3 After manual model building for missing residues and solvent building 14.2 15.8 13.6 15.4 13.9 15.7 13.8 15.6 r.m.s. deviations of the model with backbone atoms superposed with that of 1gwe 0.145 0.191 0.161 0.169 PROGRAM ACORN Input SET 13 ARP/wARP DETAILS OF ARP/wARP result SET 15 SET 16 R-factor (%) CC R-factor (%) CC R-factor (%) CC R-factor (%) CC Large E (L) Medium E (M) 0.423 0.533 0.1513 0.0821 0.424 0.534 0.1418 0.0760 0.432 0.539 0.1225 0.0614 0.437 0.542 0.1058 0.0544 H1,10-11,14&17-20 (114 a.a) FINAL AUTOBUILDING : 10 Cycles REFMAC : 5 Cycles for each auto building; Side dock after 6 cycles of auto building SET 14 STARTING After 42 cycles of DDM L 0.272 M 0.529 R-factor (%) 0.6286 0.1085 H 1,10-11,14&18-20 (103 a.a) H 10-11,14&18-20 (91 a.a) H 11,14&18-20 (79 a.a) After 46 cycles of DDM 0.274 0.6283 0.531 0.1022 After 55 cycles of DDM 0.273 0.6289 0.534 0.0961 After 55 cycles of DDM 0.272 0.6275 0.536 0.0869 Rfree R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree 44.2 44.2 44.5 44.5 45.0 44.8 44.8 44.4 29.1 35.7 28.9 36.3 INITIAL 43.8 43.8 FINAL 14.9 19.0 6 chains, 485 residues, missing residues 1 7, 59, 60, 113, 114, 142, 143, 174, 175, 388, 389, 503, dummy atoms 1200, connectivity index 0.98 15.2 19.1 8 chains, 474 residues, missing residues 1 -9, 59 60, 1 42, 143, 174-180, 186, 187, 201, 202, 258, 259, 388, 389, 503,dummy atoms 1250, connectivity index 0.97 14.5 18.1 7 chains, 482 residues, missing residues 1 -8, 59, 60, 113, 114, 142, 143, 186,187, 201, 202, 503, dummy atoms 1224, connectivity index 0.97 ARP/wARP AUTOBUILDING : 10 Cycles REFMAC : 5 Cycles for each auto building; Side dock after 6 cycles of auto building Details of ARP/wARP result After manual model building for missing residues and solvent building r.m.s. deviations of the model with backbone atoms superposed with that of 1gwe 20 chains, 3 10 residues, connectivity index 0.88 0.146 22 chains, 212 residues, connectivity index 0.79 R-factor (%) Rfree R-factor (%) Rfree 29.1 35.8 28.9 36.3 13.0 16.6 8 chains, 481 residues, missing residues 1-7, 59, 60, 142, 143, 186, 187, 258, 259, 388, 389, 401, 402, 503, dummy atoms 1337, connectivity index 0.97 Without dummy atoms made by ARP/wARP SET 23 R-factor CC (%) 0.437 0.1143 0.542 0.0469 Sheets (1-8) & H4 (76 a.a) After 56 cycles of DDM 0.273 0.6211 0.539 0.0771 R-factor Rfree (%) 14.0 17.6 7 chains, 481 residues, missing residues 1-9, 59, 60, 142, 143, 174, 175, 186, 187, 388, 389, 401, 402, 5 03, dummy atoms 1287, connectivity index 0.97 R-factor Rfree (%) R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree 27.0 28.2 28.0 29.3 27.3 28.3 27.2 28.1 27.1 28.2 13.3 14.9 15.0 16.9 14.3 15.8 14.6 16.2 13.6 15.6 0.182 0.146 0.204 0.176 Fig. 1 PDB-id: 1GWE Total:503residues Fig. 3b SET 2 Input: 184 residues Auto Built: 470 residues Fig. 3e SET 5 Input: 172 residues Auto Built: 477 residues Fig. 3h SET 8 Input: 157 residues Auto Built: 484 residues Fig. 2 Ab initio Auto Built: 482 residues Fig. 3c SET 3 Input: 181 residues Auto Built: 474 residues Fig. 3f SET 6 Input: 167 residues Auto Built: 472 residues Fig. 3i SET 9 Input: 151 residues Auto Built: 475 residues Fig. 3a SET 1 Input: 187 residues Auto Built: 476 residues Fig. 3d SET 4 Input: 177 residues Auto Built: 482 residues Fig. 3g SET 7 Input: 162 residues Auto Built: 479 residues Fig. 3j SET 10 Input: 145 residues Auto Built: 482 residues Fig. 3k SET 11 Input: 135 residues Auto Built: 481 residues Fig. 3n SET 14 Input: 103 residues Auto Built: 474 residues Fig. 3l SET 12 Input: 125 residues Auto Built: 481 residues Fig.3o SET 15 Input: 91 residues Auto Built: 482 residues Fig. 3q SET 23 Input: 76 residues Auto Built: 481 residues Fig. 3m SET 13 Input: 114 residues Auto Built: 485 residues Fig. 3p SET 16 Input: 79 residues Auto Built: 481 residues Seed phasing using Cα atoms Only the 503 Cα atoms from the known structure were used for seed phasing to ACORN with the truncated data extending to 1.3 Å resolution. Successful model could be built with 474 amino acids (a.a), the backbone atoms of which had an r.m.s deviation 0.132 Å with the actual structure (1gwe). To mimic the above ‘seed feeding’ in real situations, mean positional errors (MPE, hereafter) of 0.1, 0.2 Å were introduced for the above Cα atoms using MOLEMAN (Kleywegt, 1992-2004). Successful model could be built with 483, 481 a.a corresponding to input fragments with MPE of 0.1 and 0.2 Å respectively. The backbone atoms of these had an r.m.s deviation of 0.169, 0.163 Å respectively with the actual structure (1gwe). Results of ACORN and ARP/wARP using only Cα atoms (1gwe) Resolution 20-1.3 Å Cα atoms alone PROGRAM ACORN STARTING Large E (L) R-factor (%) CC 0.1 Å R-factor (%) CC 0.2 Å R-factor (%) CC 0.446 0.0607 0.445 0.0640 0.446 0.0543 Medium E (M) 0.551 0.0368 0.551 0.0348 0.552 0.0303 Input (1 atom /a.a - ~13% of the total structure) 503 atoms 503 atoms After 123 cycles of DDM FINAL L After 121 cycles of DDM After 165 cycles of DDM 0.6542 0.255 0.6520 0.257 0.6452 0.503 0.1944 0.505 0.1904 0.508 0.1826 R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree INITIAL 39.9 40.6 40.5 40.6 40.7 40.4 FINAL 13.3 16.8 14.7 18.4 14.5 18.2 ARP/wARP Details of ARP/wARP result 503 atoms 0.254 M AUTOBUILDING : 10 Cycles REFMAC : 5 Cycles for each auto building; Side dock after 6 cycles of auto building Mean Positional Error (MPE) of Cα atoms 9 chains, 474 residues, missing residues 1-7, 3440,59,60,113,114,142,143,174,175,186,187,388,38 9,401, 402,503, dummy atoms 1389, connectivity index 0.96 R-factor (%) Rfree 7 chains, 483 residues, missing residues 1-7, 59, 60, 113, 114, 142, 143, 186, 187, 331, 332, 388 ,389, 503, dummy atoms 1225, connectivity index 0.97 8 chains, 481 residues, missing residues 1-7, 59, 60, 113, 114, 142, 143, 175 ,176, 201, 202, 331, 332, 388 ,389, 503, dummy atoms 1247, connectivity index 0.97 R-factor (%) R-factor (%) Rfree Rfree Without dummy atoms made by ARP/wARP 28.2 28.9 26.9 28.1 27.5 28.5 After manual model building for missing residues and solvent building 15.1 17.1 14.2 16.0 14.4 16.1 r.m.s. deviations of the model with backbone atoms superposed with that of 1gwe 0.132 0.169 0.163 PDB i.d. : 1gwe Total residues:503 Input: 0.1Angstrom error at calpha atoms Auto built: 483 residues Input: Calpha atoms (503) Auto built: 474 residues Input: 0.2Angstrom error at calpha atoms Auto built: 481 residues Seed phasing using 120 a.a as polyala The first 120 a.a from the actual structure were treated as polyala model and the above procedures were carried out to obtain the final model. Results are detailed in Table. With the 120 residues as polyala model, ARP/wARP was able to build 111 residues in 15 chains when the above procedures were followed. An iterative cycle carried out with this output as input revealed 480 residues out of 503 residues with a connectivity index of 0.98. In the case of first 120 residues of polyala model with 0.1 Å MPE, ARP/wARP initially built only 6948 dummy atoms. Two iterative cycles carried out with this as input finally built 481 residues. These two models have an r.m.s deviation of 0.176, 0.173 Å respectively with the backbone atoms of the actual structure (1gwe). Results of ACORN and ARP/wARP using polyala model (5atoms/a.a) (1gwe) Resolution 20 – 1.5 Å PROGRAM First 120 residues of polyala ACORN STARTING Large E (L) Medium E (M) FINAL ARP/wARP AUTOBUILDING : 10 Cycles REFMAC : 5 Cycles for each auto building; Side dock after 6 cycles of auto building L M CC 0.0915 0.0525 After 56 cycles of DDM 0.276 0.6196 0.537 0.0807 R-factor (%) Rfree 45.1 44.4 45.1 44.9 FINAL 32.3 42.7 24.5 45.1 15 chains, 111 residues, connectivity index 0.88, dummy atoms 4437 0 chains, 0 residues, connectivity index 0.00, dummy atoms 6948 R-factor (%) Rfree R-factor (%) Rfree INITIAL 32.3 42.8 24.5 45.2 FINAL 13.1 16.6 26.8 35.8 6 chains, 480 residues, missing residues 1-9, 59, 60, 110-114, 142, 143, 174, 175, 388, 389, 503, dummy atoms 1354, connectivity index 0.98 Details of ARP/wARP result ARP/wARP AUTOBUILDING : 10 Cycles REFMAC : 10 Cycles for each auto building; Side dock after 6 cycles of auto building After 55 cycles of DDM 0.274 0.6195 0.540 0.0775 R-factor (%) Rfree INITIAL DETAILS OF ARP/wARP result ARP/wARP AUTOBUILDING : 10 Cycles REFMAC : 10 Cycles for each auto building; Side dock after 6 cycles of auto building R-factor (%) 0.439 0.541 First 120 residues of polyala with a mean positional error (MPE) of 0.1 Å R-factor (%) CC 0.442 0.0853 0.541 0.0499 INITIAL FINAL Details of ARP/wARP result Without dummy atoms made by ARP/wARP After manual model building for missing residues and solvent building r.m.s. deviations of the model with backbone atoms superposed with that of 1gwe 20 chains, 282 residues, dummy atoms 3271, connectivity index 0.85 R-factor (%) Rfree 26.9 35.8 13.6 17.5 8 chains, 481 residues, missing residues 1-7, 39, 40, 59, 60, 142, 143, 174, 175, 186, 187, 388, 389, 401, 402, 503, dummy atoms 1201, connectivity index 0.97 R-factor (%) Rfree R-factor (%) Rfree 27.3 28.0 27.1 28.0 14.0 15.7 14.0 15.5 0.176 0.173 PDB i.d. : 1gwe Total residues:503 Input: First 120 a.a as polyala model Auto built: 480 residues Input: First 120 a.a as polyala model after introducing the MPE of 0.1Angstrom Auto built: 481 residues STEREO VIEW OF THE ELECTRON DENSITY (2FO-FC|) MAP SUPERPOSED WITH FINAL MODEL (Input: Polyala model for the first 120a.a with a MPE of 0.1 Å) STEREO VIEW OF THE FINAL ELECTRON DENSITY (2FO-FC|) MAP STARTING WITH THE POLYALA MODEL OF FIRST 120A.A WITH MPE OF 0.1 Å FINAL ELECTRON DENSITY (2FO-FC|) MAP FOR POLY ALA MODEL ELECTRON DENSITY (2FO-FC|) MAP FOR HEME GROUP IN POLYALA MODEL Seed phasing using Ncap, Ccap and Middle portions of helices/sheets Instead of feeding the entire helices or sheets [Selvanayagam et al., 2004 (a minimum of 76 residues were found to be sufficient for seed phasing with 1.5 Å truncated data to solve the three dimensional structure of catalase)] either the N cap/C cap regions or the mid portion in the helices or sheets could also be fed as input for seed phasing. Successful model can be built in these cases also. The results obtained are listed in Table. Results of ACORN and ARP/wARP using Ncap, Ccap and Middle portions of helices/sheets (1gwe) Resolution 20-1.5 Å Ncap region of helices/sheets PROGRAM ACORN STARTING (L) Large E Medium E (M) R-factor (%) CC R-factor (%) CC R-factor (%) CC 0.435 0.1118 0.437 0.1076 0.434 0.1319 0.539 0.0569 0.542 0.0553 0.538 Details of ARP/wARP result After 52 cycles of DDM After 53 cycles of DDM After 51 cycles of DDM L 0.275 0.6270 0.271 0.6307 0.272 0.6248 M 0.534 0.0901 0.533 0.0946 0.533 0.0926 R-factor (%) Rfree R-factor (%) Rfree R-factor (%) Rfree INITIAL 44.5 44.8 44.4 44.9 44.7 44.5 FINAL 14.9 18.6 14.5 18.2 15.2 19.1 FINAL ARP/wARP AUTOBUILDING : 10 Cycles REFMAC : 5 Cycles for each auto building; Side dock after 6 cycles of auto building 0.0623 76 a.a 76 a.a 76 a.a Input Middle region of helices/sheets Ccap region of helices/sheets 10 chains, 470 residues, missing residues 1-7, 39, 40, 59, 60, 113, 114, 142, 143, 174, 175, 176, 195-202, 331, 332, 388, 389, 401, 402, 503, dummy atoms 1308, connectivity index 0.95 R-factor (%) Rfree 9 chains, 474 residues, missing residues 1-7, 39-40, 59, 60, 142, 143, 174, 175, 186, 187, 201, 202, 331, 332, 388, 389, 503, dummy atoms 1277, connectivity index 0.96 Rfree R-factor (%) 8 chains, 479 residues, missing residues 1-7, 60, 110-114, 142, 143, 175, 176, 186, 187, 388, 389, 503, dummy atoms 1230, connectivity index 0.97 R-factor (%) Rfree Without dummy atoms made by ARP/wARP 28.9 29.3 28.3 29.3 27.8 28.7 After manual model building for missing residues and solvent building 13.3 16.2 14.6 17.4 13.0 15.8 r.m.s. deviations of the model with backbone atoms superposed with that of 1gwe 0.218 0.183 0.151 Input: Ncap region of helices/sheets(76 a.a) Auto Built: 470 residues Input: Ccap region of helices/sheets(76 a.a) Auto Built: 474 residues Input: Middle region of helices/sheets(76 a.a) Auto Built: 479 residues Black shaded regions correspond to the input residues from 1gwe Conclusion • Based on the published work and the work being carried out by our group (Rajakannan et al., 2004a; 2004b), it has now become very clear that very little information (15%) is needed to determine the structure of a protein using ACORN. • Ours is the first case of ACORN applications using seedphasing information to solve even larger molecular weight protein (57 kDA) when the resolution extends to 1.5 Å. • Among the multiple solutions, the correct solutions can be obtained in all trials with high reliability by the working of correlation coefficient and hence high resolution and fairly complete diffraction data enable one to solve a protein ab initio, in a relatively short amount of time. • ACORN has the great potential to establish itself as program for high-throughput structure determination. • Currently, in order to extend the applicability of ACORN to lower resolutions, the seed phasing has been obtained from the native structure itself (as the structure had already been solved by traditional macromolecular crystallographic methods). Data mining approach to feed fragments using the PDB entries is in progress. References Banumathi, S., Rajakannan, V., Velmurugan, D., Dauter, Z., Dauter, M., Tsai, M. D. & Sekar, K. (2002). Japanese Crystallographic Society Meeting, Poster, P3-II-27, 123. Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760763. Dodson, E. J. & Yao, J. -X. (2003). Crystallogr. Rev. 9, 67-72. Foadi, J. (2003). Crystallogr. Rev. 9, 43-65. Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Yao, J. -X. & Chao-de, Z. (2000). Acta Cryst. D56, 1137-1147. Kleywegt, G. J. (1992-2004). Uppsala University, Uppsala, Sweden. Unpublished program. McAuley, K. E., Yao, J. –X., Dodson, E. J., Lehmbeck, J., Ostergaard, P. R. & Wilson, K. S. (2001). Acta Cryst. D57, 1571-1578. Murshudov, G. N., Lebedev, A., Vagin, A. A., Wilson, K. S. & Dodson, E. J. (1999). Acta Cryst. D55, 247-255. Murshudov, G. N., Grebenko, A. I., Brannigan, J. A., Antson, A. A., Barynin, V. V., Dodson, G. G., Dauter, Z., Wilson, K. S. & Melik-Adamyan, W. R. (2002). Acta Cryst. D58, 1972-1982. Perrakis, A., Morris, R. M. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458463. Rajakannan, V., Velmurugan, D., Yamane, T., Dauter, Z., Dauter, M., Tsai, M. D. & Sekar, K. (2002). Japanese Crystallographic Society Meeting, Poster, P3I-22, 84. Rajakannan,V., Yamane, T., Shirai, T., Kobayshi, T., Ito, S. & Velmurugan, D. (2003). International Symposium on Diffraction Structural Biology, Tsukuba, Japan, 28-31 May 2003, Poster P-085. Rajakannan, V., Yamane, T., Shirai, T., Kobayshi, T. Ito, S. & Velmurugan, D. (2004a). J. Synchrotron Rad. 11, 64-67. Rajakannan, V., Selvanayagam, S., Yamane, T., Shirai, T., Kobayshi, T., Ito, S. & Velmurugan, D. (2004b). J. Synchrotron Rad. 11, 358-362. Selvanayagam, S., Velmurugan, D., Yamane, T. (2004). Asian Crystallographic Association Meeting (AsCA’04) Poster(P0165). Velmurugan, D., Rajakannan, V., Yamane, T., Dauter, Z. & Sekar, K. (2002). Japanese Crystallographic Society Meeting, Poster, P3-II26, 122. Yao, J. -X. (2002). Acta Cryst. D58, 1941-1947.