Searching for Amino Acids on WebCSD

Purpose of this exercise
The following instructions will guide you through finding a specific molecule (asparagine) in the Cambridge Structural Database and then connecting to the literature paper(s) that describe the structure of that molecule. You will then look within those papers for information about how to crystallize that particular molecule.

About the Cambridge Structural Database
The Cambridge Structural Database (CSD), maintained by the Cambridge Crystallographic Data Centre (CCDC), is the standard reference database for small-molecule crystal structures in organic and metal-organic chemistry. Most of the structures in the database have been determined by X-ray diffraction. The "jurisdiction" of the CCDC is any structure that contains at least one carbon atom. The current CSD contains over 500,000 organic and metal-organic structures, including carbohydrates, amino acids, nucleic acids and small peptides. The database comes in two flavors: a locally installed version (installed on individual computers) and a web version called, appropriately enough, WebCSD. For this exercise we will use only WebCSD.

Accessing the web version of the CSD (WebCSD)
Type the following address into your web browser: http://webcsd.ccdc.cam.ac.uk/
Look for "Licensed to: (your institution)" in the top right corner. If it does not appear, you will not be able to use the full version of the database. You should get a screen that looks like Figure 1.

Figure 1. Typical opening screen for WebCSD

Created by Dean H. Johnston, Otterbein University. Copyright Dean H. Johnston, 2011. This work is licensed under the Creative Commons Attribution Non-commercial Share Alike License. To view a copy of this license visit http://creativecommons.org/about/license/.

Example 1: Substructure Search for Asparagine
In this example, we will search for molecules that have the basic structure found in the amino acid asparagine (see Figure 2).

Figure 2.
Line drawing for the amino acid L-asparagine (zwitterionic form)

Drawing the molecule backbone
We'll be doing a "substructure" search. Click on Substructure Search, either on the picture at the left or on the link at the top; you should get a screen that looks like Figure 3. Start by drawing the basic backbone using the pencil tool (see Figure 3).

Figure 3. Molecular backbone drawn using the structure drawing tools in WebCSD

Assigning atom and bond types
Next, assign atom types and bond types: select an atom type (from the list at the bottom of the screen) or a bond type (from the menu at the bottom of the screen), then click on the appropriate atom or bond (see Figure 4).

Figure 4. Molecule with all atom and bond types assigned correctly

Assigning charges to selected atoms
Note: Amino acids contain both an acid (the –CO2H group) and a base (the –NH2 group). At neutral pH, the amine group takes the proton from the carboxylic acid group and forms a zwitterion: a molecule with positive and negative charges on different atoms within the same molecule.
To assign charges:
Right-click on the singly bonded oxygen atom and select Charge > Negative > –1.
Right-click on the nitrogen atom at the bottom right (the α-amino nitrogen, not the amide nitrogen) and select Charge > Positive > +1.

Adding hydrogen atoms automatically
Note: If we searched on just this structure, we would probably retrieve many molecules we're not interested in, because we have not specified the hydrogen atoms.
Click on the Atom menu and select Hydrogens > Generate > All Atoms. This will show hydrogens on the nitrogen and oxygen atoms. You should end up with something like Figure 5.

Figure 5. Molecule with all charges and hydrogen atoms added. Note: the carbon atoms do carry hydrogens; they are just not shown. You can see them by hovering the pointer over a carbon atom.

Searching the database
Click on the START SEARCH button.
Accept the default option and click the Start button.

Figure 6.
Results from the first substructure search for asparagine

If you are searching for asparagine, you should receive about fifteen "hits", the first nine of which are exactly what we want (see Figure 6). This structure is referred to as entry ASPARM in the database; each entry is identified by a reference code, or REFCODE, built from six letters. Note that many entries are labeled ASPARMxx, where xx is a two-digit number; these are all separate data sets for the exact same compound, because the structure of a molecule is often published more than once.

Now that we have the reference code for the molecule we are interested in, we can run another type of search that will give us even more information.

Searching by REFCODE
Find the Entry Identifier box at the top of the screen.
Type the reference code (ASPARM) into the box and click Find.

Figure 7. Running a refcode (entry ID) search

WebCSD should produce a list of all the structures of L-ASPARagine Monohydrate contained in the Cambridge Database (a total of 11 structures as of January 2011), as shown in Figure 7. Click on each in turn. Is there anything different about the entries? What kind of order are they in?

Accessing the Literature*
*The Cambridge Database links most structures to the scientific paper and journal in which the structural data were published. This tutorial assumes your institution has full access to ACS journals and archives.

Click on entry ASPARM08. On the right side we can see that this entry was published in the Journal of the American Chemical Society (J. Am. Chem. Soc.) in 2000.
Now click on the doi link; it will bring up the title and abstract of the paper (see Figure 8).

Figure 8. Title and abstract for the paper referenced by entry ASPARM08

Now select Full Text HTML to get the text of the entire paper. Scroll down to the Experimental Section and you will see that the authors recrystallized their sample from hot water. (Many other details are listed, most of which we don't need at this point.)
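If you copy a list of hits out of WebCSD (for example into a spreadsheet or a short script), the REFCODE convention described above makes it easy to see which entries belong to the same compound. The following is a minimal Python sketch, assuming that a REFCODE is six uppercase letters optionally followed by a two-digit suffix; split_refcode and group_by_family are hypothetical helper names, not part of any CCDC software. The grouping only reflects the naming convention; the entry details in WebCSD remain the authority on what each structure actually is.

```python
import re
from collections import defaultdict

# Assumed REFCODE shape: six uppercase letters, optionally followed by two digits
# (e.g. ASPARM, ASPARM08). This mirrors the convention described in the text.
REFCODE_PATTERN = re.compile(r"([A-Z]{6})(\d{2})?")


def split_refcode(refcode: str) -> tuple[str, int]:
    """Split a REFCODE into its six-letter family and numeric suffix (0 if absent)."""
    match = REFCODE_PATTERN.fullmatch(refcode.strip().upper())
    if match is None:
        raise ValueError(f"not a REFCODE: {refcode!r}")
    family, suffix = match.groups()
    return family, int(suffix) if suffix else 0


def group_by_family(refcodes: list[str]) -> dict[str, list[str]]:
    """Group REFCODEs by six-letter family, keeping the input order within each family."""
    families: dict[str, list[str]] = defaultdict(list)
    for code in refcodes:
        family, _ = split_refcode(code)
        families[family].append(code)
    return dict(families)


if __name__ == "__main__":
    # Refcodes taken from this exercise; any real hit list would be pasted here instead.
    hits = ["ASPARM", "ASPARM03", "ASPARM08", "ILUXOC"]
    for family, members in group_by_family(hits).items():
        print(family, members)
```

For the refcodes used in this exercise, group_by_family(["ASPARM", "ASPARM03", "ASPARM08", "ILUXOC"]) returns {"ASPARM": ["ASPARM", "ASPARM03", "ASPARM08"], "ILUXOC": ["ILUXOC"]}, separating the L-asparagine monohydrate family from the D-asparagine entry.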
Excerpt from the Experimental Section of the paper for entry ASPARM08:
"Experimental Section. A sample of L-asparagine monohydrate (Sigma, St. Louis, MO) was recrystallized from hot water, and single-crystal X-ray diffraction data were collected at the SUNY X3A1 beamline at the National Synchrotron Light Source, Brookhaven National Laboratory. X-ray crystallographic data for L-asparagine monohydrate are summarized in Table 1…"

Another paper
Click on entry ASPARM03. On the right side we can see that this entry was published in Acta Crystallogr., Sect. B: Struct. Crystallogr. Cryst. Chem. in 1972. Again, click on the doi link at the top of the page; it will show you at least the first page of the article (the Experimental section is shown in Figure 9).

Figure 9. Experimental section from entry ASPARM03

Stereoisomers
Because most amino acids are chiral, they can occur as either of two enantiomers (L or D) or as a racemic mixture. In our case, even though we did not specify the stereochemistry in our search, we found only L-asparagine. But if you look carefully at the Details tab for our search result, you will notice that under Cross References there is a link to the structure of the other enantiomer, under Refcode ILUXOC. Clicking on the ILUXOC link will bring up the structure of D-asparagine monohydrate (a.k.a. (R)-2,4-diamino-4-oxobutanoic acid monohydrate). Note that we can't find a paper for this structure because it was contributed to the database as a Private Communication rather than published in a journal.

Moving On
This completes this brief introduction to WebCSD. There are many more features within WebCSD that have not been covered, including similarity searching, unit cell searching, and full-text searching. Feel free to explore the other search options and see how they work.