Searching for Amino Acids on WebCSD

Purpose of this exercise
The following instructions will guide you through the process of finding a specific molecule
(asparagine) in the Cambridge Database and then connecting to the literature paper(s) that have
described the structure of that molecule. You will then look within those papers for information
about how to crystallize that particular molecule.
About the Cambridge Structural Database
The Cambridge Structural Database (CSD) from the Cambridge Crystallographic Data Centre
(CCDC) is the most useful database for small-molecule structures in organic and inorganic
chemistry. Most of the structures in the database have been determined using X-ray diffraction
techniques. The “jurisdiction” of the CCDC is any structure that contains at least one carbon
atom. The current CSD contains over 500,000 organic and inorganic molecules including
carbohydrates, amino acids, nucleic acids and small peptides. The database comes in two
flavors, a locally installed version (installed on individual computers) and a web version called
(appropriately enough) WebCSD. For this exercise we will only be using WebCSD.
Accessing the web version of the CSD (WebCSD)


• Type the following address into your web browser: http://webcsd.ccdc.cam.ac.uk/
• Look for the Licensed to: (your institution) up in the top right corner – if it does not
show up then you will not be able to use the full version of the database. You should get
a screen that looks like Figure 1 below.
Figure 1. Typical opening screen for WebCSD
Created by Dean H. Johnston, Otterbein University. Copyright Dean H. Johnston, 2011. This
work is licensed under the Creative Commons Attribution Non-commercial Share Alike License.
To view a copy of this license visit http://creativecommons.org/about/license/.
Example 1: Substructure Search for Asparagine
In this example, we will search for molecules that have the basic structure found in the amino
acid asparagine (see Figure 2).
Figure 2. Line drawing for the amino acid L-asparagine (zwitterionic form)
Drawing the molecule backbone
We’ll be doing a “substructure” search…
• Click on Substructure Search, either on the left picture or the top link – you should get a
screen that looks like Figure 3.
• Start by drawing out the basic backbone using the pencil tool (see Figure 3)
Figure 3. Molecular backbone drawn using the structure drawing tools in WebCSD
Assigning atom and bond types
• Then assign atom types and bond types by selecting an atom type (from the list at the
bottom of the screen) or bond type (from the menu at the bottom of the screen) and
clicking on the appropriate atom or bond (see Figure 4)
Figure 4. Molecule with all atoms and bond-types assigned correctly
Assigning charges to selected atoms
Note: Amino acids contain both an acid (the –CO2H part) and a base (the –NH2 part). At
neutral pH, the amine group removes the proton from the acid portion and forms a zwitterion, a
molecule with positive and negative charges on different atoms within the molecule.
To assign charges:
• Right-click on the singly-bonded oxygen atom and select Charge > Negative > –1
• Right-click on the right-hand nitrogen atom at the bottom and select Charge > Positive > +1
Adding hydrogen atoms automatically
Note: If we searched on just this structure, we would probably come up with many molecules
we’re not interested in since we have not specified the hydrogen atoms.

• Click on the Atom menu and select Hydrogens > Generate > All Atoms. This will now
show hydrogens on the nitrogen and oxygen atoms. You should end up with something
looking like Figure 5.
Figure 5. Molecule with all charges and hydrogen atoms added.
Note: There are hydrogens on the carbon atoms; they are just not shown. You can see them if
you hover the arrow over a carbon atom.
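The charge and hydrogen bookkeeping above can be checked with a short sketch. It is only an illustration and is not part of WebCSD: the atom list, the hydrogen counts and the helper names are assumptions based on the standard zwitterionic form of asparagine, with the positively charged nitrogen taken to be the alpha-amino nitrogen.

```python
from collections import Counter

# One tuple per non-hydrogen atom: (element, formal charge, attached hydrogens).
# Assumed layout for zwitterionic asparagine; hydrogens are stored as counts,
# much like the hidden hydrogens on the carbon atoms in the drawing.
ASPARAGINE_ZWITTERION = [
    ("N", +1, 3),  # alpha-amino nitrogen, protonated
    ("C", 0, 1),   # alpha carbon
    ("C", 0, 0),   # carboxylate carbon
    ("O", 0, 0),   # carboxylate oxygen (double bond)
    ("O", -1, 0),  # carboxylate oxygen (single bond)
    ("C", 0, 2),   # side-chain CH2 carbon
    ("C", 0, 0),   # amide carbon
    ("O", 0, 0),   # amide oxygen
    ("N", 0, 2),   # amide nitrogen
]


def net_charge(atoms):
    """Sum the formal charges over all atoms."""
    return sum(charge for _, charge, _ in atoms)


def molecular_formula(atoms):
    """Build a C/H/N/O formula string from the atom list."""
    counts = Counter(element for element, _, _ in atoms)
    counts["H"] = sum(hydrogens for _, _, hydrogens in atoms)
    parts = []
    for element in ("C", "H", "N", "O"):
        n = counts[element]
        if n:
            parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)


print(molecular_formula(ASPARAGINE_ZWITTERION), net_charge(ASPARAGINE_ZWITTERION))
```

The two charges cancel, so the zwitterion is neutral overall even though two of its atoms carry formal charges.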
Searching the database
• Click on the START SEARCH button
• Accept the default option and click the Start button
Figure 6. Results from the first substructure search for asparagine
If you are searching asparagine, you should receive about fifteen “hits” – the first nine of which
are exactly what we want (see Figure 6). This is referred to as entry ASPARM in the database;
each structure receives a six-letter reference code, or REFCODE. Note that many entries are
titled ASPARMXX, where XX is a number – these are all sets of data on the exact same
molecule. In many cases the structure of a molecule may be published more than once.
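The naming pattern described above (six letters, optionally followed by a two-digit number) can be sketched in a few lines of Python. This is a minimal illustration built only on the description in this handout; the function names are hypothetical and the pattern is not an official CCDC validation rule.

```python
import re
from collections import defaultdict

# Six letters for the family, optionally followed by a two-digit number,
# as described in the text above (assumed pattern, not an official rule).
REFCODE_PATTERN = re.compile(r"^([A-Z]{6})(\d{2})?$")


def split_refcode(refcode):
    """Return (family, number); number is None when no digits are present."""
    match = REFCODE_PATTERN.match(refcode.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised refcode: {refcode!r}")
    family, digits = match.groups()
    return family, (int(digits) if digits is not None else None)


def group_by_family(refcodes):
    """Group refcodes that share the same six-letter family."""
    groups = defaultdict(list)
    for code in refcodes:
        family, _ = split_refcode(code)
        groups[family].append(code.strip().upper())
    return {family: sorted(codes) for family, codes in groups.items()}


hits = ["ASPARM08", "ASPARM", "ILUXOC", "ASPARM03"]
print(group_by_family(hits))
```

Grouping the hits this way puts every ASPARM entry in one family, which mirrors the idea that these entries are repeated determinations of the same molecule.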
Now that we have the reference code for the molecule that we are interested in, we can do
another type of search that will give us even more information.
Searching by REFCODE


• Find the Entry Identifier box at the top of the screen
• Type the reference code (ASPARM) into the box and click Find
Figure 7. Running a refcode (entry ID) search
WebCSD should produce a list of all the structures of L-ASPARagine Monohydrate contained in
the Cambridge Database (total of 11 structures as of Jan 2011) as shown in Figure 7.


• Click on each in turn – is there anything different about the entries?
• What kind of order are they in?
Accessing the Literature*
* The Cambridge Database will link most structures to the scientific paper and journal where
the structural data was published. This tutorial assumes your institution has full access to
ACS Journals and archives.


• Click on entry ASPARM08 – we can see on the right side that this entry was published
in the Journal of the American Chemical Society (J. Am. Chem. Soc.) in 2000
• Now click on the DOI link; it will bring up the title and abstract of the paper (see Figure 8)
Figure 8. Title and abstract for the paper referenced by entry ASPARM08


• Now select Full Text HTML to get the text of the entire paper.
• Scroll down to the Experimental Section and you will see that they recrystallized their
sample from hot water. (Many other details are listed, most of which we don’t need at
this point.)
Experimental Section
A sample of l-asparagine monohydrate (Sigma, St. Louis, MO) was recrystallized from hot
water, and single-crystal X-ray diffraction data were collected at the SUNY X3A1 beamline
at the National Synchrotron Light Source, Brookhaven National Laboratory. X-ray
crystallographic data for L-asparagine monohydrate are summarized in Table 1…
Another paper
• Click on entry ASPARM03 – we can see on the right side that this entry was published
in Acta Crystallogr., Sect. B: Struct. Crystallogr. Cryst. Chem. in 1972.
• Again, click on the DOI link at the top of the page and it will show you at least the first
page of the article (the Experimental section is shown in Figure 9)
Figure 9. Experimental section from entry ASPARM03
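Both entries above reach their papers through a DOI link. As a rough sketch of what such a link amounts to, a DOI can be turned into a web address by prefixing it with the public resolver at https://doi.org/. The function name below is hypothetical, and the DOI in the example is a placeholder, not the DOI of either asparagine paper.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a DOI string such as 'doi:10.xxxx/yyyy' into a resolver URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the slash between prefix and suffix; percent-encode anything unusual.
    return DOI_RESOLVER + quote(cleaned, safe="/")


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1000/182"))
```

Whether the full text opens after following the link still depends on your institution's journal subscriptions.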
Stereoisomers
Since most amino acids are chiral, they can occur as a single enantiomer or as a racemic mixture. In
our case, even though we didn’t specify in our search, we found only L-asparagine. But if you
look carefully at the Details tab in our search result, you will notice that under Cross References
there is a link to the structure of the stereoisomer under Refcode ILUXOC. Clicking on the
ILUXOC link will bring up the structure of D-asparagine monohydrate (a.k.a. (R)-2,4-Diamino-4-oxobutanoic acid monohydrate). Note that we can’t find a paper for this structure because it
has been contributed in a Private Communication.
Moving On
This completes this brief introduction to WebCSD. There are many more features within
WebCSD that have not been covered including similarity searching, unit cell searching, and full-text searching. Feel free to explore the other searching options and see how things work.