Virtual Classrooms and E-Learning: Bringing Cheminformatics
Training Into Academic and Industrial Settings
TJ O'Donnell
O'Donnell Associates
Norah E. MacCuish and John D. MacCuish
Mesa Analytics & Computing, LLC
Introduction
The use of computers in chemistry has grown quickly over the past few decades to
the point where it is ubiquitous in the pharmaceutical industry. While
cheminformatics can be thought of as derived from roots in physical chemistry, it
is an integrative discipline, incorporating ideas from physical chemistry, organic
chemistry, and computer science. After 25 years, it may be considered a discipline
in its own right: several textbooks are devoted exclusively to cheminformatics,
and many universities offer courses in it.
We have begun a project that integrates the teaching of concepts with real-world
software applications. The initial goal of the project is to create modules that will
be used in graduate courses in cheminformatics. These modules could be
extended to offer an introduction or training to chemists in industry. They could
also provide educational assistance to students at the undergraduate level, or
even at earlier levels of education. During the first phase of the project, we
demonstrated the feasibility of this approach by creating a web-based module that
introduces the concepts of fingerprints, clustering, and sub-structure
commonality analysis. We also made contact with several universities throughout
the U.S. that expressed a need for tools like the ones we will develop.
In addition to Mesa Analytics & Computing’s software, products from a group of
participating vendors are also included. Together, these cover a broad range of
state-of-the-art cheminformatics and modeling tools. As an example of using a
participating vendor's product, we show the integration of ChemAxon’s Marvin
tools into a prototype module.
Web-based Courses
We have begun the second phase of our research. Our goal is to create web-based
modules that cover distinct topics in cheminformatics, such as molecular
representations, fingerprinting, clustering, databases, and 3D modeling. We
have coordinated our efforts with professors at several universities that are
currently offering courses in cheminformatics. Rather than providing an entire
course including HTML pages of information and links, we have chosen to create
CGI and Java interactive tools, with only a small amount of explanatory text.
This approach relies on the teachers to provide the bulk of the text-based
materials. They will use our tools in ways that integrate best with their current
courses. In addition, they will provide feedback on how well our modules work in
their courses. This will allow us to continually improve the modules and expand
them into other areas. This approach should also work well with potential users
in industry and in undergraduate and earlier educational institutions.
Interactive Modules
While we are using modern techniques to deliver instruction on the internet, our
approach is based on traditional methods of education. Our modules correspond
to chapters in a book, or perhaps even an entire university course. It is exciting
to speculate that entire courses might someday be devoted to the single topics of
fingerprints and clustering, the use of databases in cheminformatics or 3D
modeling of chemical interactions.
Our use of interactive web-based methods, using CGI and Java, corresponds to
the traditional use of laboratories to augment classroom education. As in a
laboratory setting, students using our modules will be directed to accomplish
certain goals, but still be free to experiment with ways of using each of the
particular computer tools at their disposal.
Demonstration
In order to better explain our approach, we demonstrate our first prototype
module. It summarizes aspects of fingerprinting, clustering, and the identification
of sub-structure commonalities among a group of similar chemical structures
using Mesa’s ChemTattoo®. This module uses a web-client browser to display
text and images and allow interaction with the student. It uses a web-server to
process the student’s input and provide the appropriate results.
Fingerprints
Fingerprints are computed using software from Mesa that uses MDLI’s
MACCS 320 keys [1]. The fingerprints can be used to group molecular structures or
to identify interesting and important sub-structural fragments contained in the
set of input structures. Structures can be input in several ways from the
student’s computer: by uploading SMILES or SDF files, by sketching with
ChemAxon's MarvinSketch, or by pasting in a list of SMILES. This variety of
input methods is typical of real-world work in cheminformatics. It also
introduces the concepts of SMILES, sketchers, and connection table files, and
shows how they can all be used to represent the same molecular structures.
MarvinView can be used to verify the correct input of structures after
uploading, pasting, or sketching. Finally, the student asks for the fingerprints to
be computed. She is then shown typical output from the program, including a
graphical representation of the bitstring fingerprint, shown in Figure 1.
[1] J. L. Durant, B. A. Leland, D. R. Henry, and J. G. Nourse, "Reoptimization of
MDL Keys for Use in Drug Discovery", Journal of Chemical Information and Computer
Sciences, 42(6), 2002, 1273-1280.
Figure 1. Fingerprint generation and graphic representation.
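To make the idea of a key-based bitstring fingerprint concrete, the following is a minimal Python sketch. It is not Mesa's software and does not use the MACCS 320 keys: the four keys in `TOY_KEYS` and the helper names are hypothetical, and plain substring matching on the SMILES text stands in for real sub-structure matching, which a cheminformatics toolkit would perform on the molecular graph.

```python
# Toy structural-key fingerprint. The keys and the substring test are
# illustrative assumptions, not the MACCS 320 keys or Mesa's method.
TOY_KEYS = [
    ("C(=O)O group", "C(=O)O"),
    ("aromatic carbon", "c"),
    ("nitrogen", "N"),
    ("chlorine", "Cl"),
]


def parse_smiles_list(text):
    """Split pasted text into SMILES strings, one per non-empty line.

    Only the first whitespace-separated token of each line is kept, so an
    optional name after the SMILES is ignored.
    """
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def fingerprint(smiles):
    """Return a bitstring with one character per key ('1' = key present)."""
    return "".join("1" if pattern in smiles else "0" for _, pattern in TOY_KEYS)


pasted = """
CC(=O)O acetic_acid
c1ccccc1 benzene
NCC(=O)O glycine
Clc1ccccc1 chlorobenzene
"""

fingerprints = {smi: fingerprint(smi) for smi in parse_smiles_list(pasted)}
for smi, bits in fingerprints.items():
    print(f"{smi:12s} {bits}")
```

Each position in the bitstring answers one yes/no question about the structure, which is why two structures can be compared simply by comparing their bits.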
Clustering
Once the fingerprints are computed, students can choose to cluster the input
structures. Figure 2 shows the clustering module setup page. The clustering
results are presented as typical text output and as a graphical display. The
display includes a dendrogram (this example uses hierarchical clustering) with
which the student can interact to view the contents of each cluster using
MarvinView. In addition, an interactive graph of hierarchical level-selection
statistics is displayed to demonstrate the trade-offs inherent in selecting the final
clustering result. Figure 3 shows both views.
Figure 2. Clustering module setup page with MarvinSketch.
Figure 3. Interactive clustering dendrogram (left); level selection and ambiguity
plot (right).
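The effect of level selection on a hierarchical clustering can be sketched in a few lines of Python. The text does not state which similarity measure or linkage the module uses, so the Tanimoto distance and complete linkage below are assumptions, as are the function names and the toy fingerprints (each given as a set of on-bit positions).

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of on-bit positions."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def hierarchical_clusters(fps, cut):
    """Agglomerative clustering that stops merging once the closest pair of
    clusters is farther apart than `cut`.

    Returns (clusters, merges): the clusters at the chosen level as sorted
    lists of indices into `fps`, and the merge history (the dendrogram) as
    (distance, members_a, members_b) tuples.
    """
    clusters = [[i] for i in range(len(fps))]
    merges = []

    def linkage(c1, c2):
        # Complete linkage (an assumption): distance of the farthest pair.
        return max(tanimoto_distance(fps[i], fps[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cut:
            break
        merges.append((d, clusters[x], clusters[y]))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters), merges


fps = [
    {0, 1, 2, 3},
    {0, 1, 2, 4},
    {0, 1, 2, 3, 4},
    {10, 11, 12},
    {10, 11, 13},
]
clusters, merges = hierarchical_clusters(fps, cut=0.5)
print(clusters)                                  # two groups at this level
print(hierarchical_clusters(fps, cut=0.3)[0])    # a tighter cut leaves more groups
```

Moving `cut` up or down the dendrogram changes how many clusters result, which is the trade-off the level-selection plot is meant to expose.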
Sub-Structure Commonalities
Another method that can be used to characterize the set of input structures is to
identify sub-structural features that are common among them. We use
ChemTattoo®, which is analogous to the Stigmata program [2], a public-domain
contributed program that uses Daylight CIS fingerprints. ChemTattoo® finds the
features, as defined by the MACCS 320 keys (or any predefined SMARTS-based key
set), that are common among a group of similar structures. The resulting set of
common features is known as a modal fingerprint. These features can in turn be
displayed and, if they overlap, the frequency with which features intersect can
be enumerated and visually identified through atom and bond coloring of the
structural depictions in MarvinView. The requirement that a feature be common to
every structure can also be relaxed via a threshold.
Figure 4 shows an example of the results of ChemTattoo visualized with
MarvinView.
Figure 4. ChemTattoo results with MarvinView display
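The notion of a modal fingerprint with a relaxable threshold can be illustrated with a short Python sketch. This is not ChemTattoo's implementation: the function name `modal_fingerprint`, the representation of each fingerprint as a set of on-bit positions, and the expression of the threshold as a fraction of the structures are all assumptions made for illustration.

```python
from collections import Counter


def modal_fingerprint(fps, threshold=1.0):
    """Return the sorted bit positions that are set in at least `threshold`
    (a fraction between 0 and 1) of the fingerprints in `fps`."""
    if not fps:
        return []
    # Count, for every bit, how many fingerprints have it set.
    counts = Counter(bit for fp in fps for bit in set(fp))
    needed = threshold * len(fps)
    return sorted(bit for bit, n in counts.items() if n >= needed)


fps = [
    {1, 5, 9, 12},
    {1, 5, 9},
    {1, 5, 12},
    {1, 9, 12, 20},
]
strict = modal_fingerprint(fps)          # bits common to every structure
relaxed = modal_fingerprint(fps, 0.75)   # bits present in at least 3 of 4
print(strict, relaxed)
```

With the strict setting only features shared by every structure survive; lowering the threshold admits features that most, but not all, of the structures contain.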
[2] N. E. Shemetulskis, D. Weininger, C. J. Blankley, J. J. Yang, and C. Humblet,
"Stigmata: An Algorithm To Determine Structural Commonalities in Diverse
Datasets", Journal of Chemical Information and Computer Sciences, 36(4), 1996,
862-871.
Slides and Demo
The following slides summarize the description presented here and provide
screen images of typical results obtained while using this module. During the
presentation, we also show a live demonstration of this module running with the
web-server and web-client on a laptop computer. While we envision providing a
web-server on the internet (or a university intranet), our client-server model
does not require that configuration and can easily be adapted for use on a
single computer.
Acknowledgement
Our research results are based upon work supported by the National Science
Foundation Small Business Innovation Research (SBIR) Program under Grant
No. 0450457. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.