Department of Biomedical Informatics Nanoinformatics: Advancing in silico Cancer Research David E. Jones John D. Morgan Award Research partially supported by NLM Training Grant #T15LM007124 1 Department of Biomedical Informatics What is Nanotechnology? • The study of controlling and manipulating matter at the atomic or molecular level • Focuses on the development of materials, devices, and other structures at the nanoscale • Very diverse field that bridges multiple sciences – – – – Molecular Biology Organic Chemistry Molecular Physics Material Science http://www.nanoinstitute.utah.edu/ 2 Department of Biomedical Informatics Nanomedicine Defined • The medical application of nanotechnology used in the diagnosis, treatment, and prevention of diseases in the clinical setting Department of Biomedical Informatics Science-to-Informatics Clinical Informatics Bioinformatics ? Department of Biomedical Informatics Nanoinformatics • Defined in 2007 by the United States National Science Foundation – Improve research in the field of nanotechnology by using informatics techniques and tools on nanoparticle data and information http://www.nsf.gov/ 5 Department of Biomedical Informatics Background: Nanoinformatics • National nanotechnology initiative – Enhance quality and availability of data • Data acquisition, analysis, and sharing – Expand theory, modeling, and simulation • Structural and predictive models – Informatics infrastructure • Semantic search and sharing of data/models • Web-enabled tools for collaboration http://www.nano.gov/node/681 Department of Biomedical Informatics Nanomedicine Areas of Focus In Vitro Detection Theranostics Nanocarriers http://www.nanotech-now.com http://www.universityofcalifornia.edu http://www.wikipedia.org/ 7 Department of Biomedical Informatics Why are Nanocarriers so Important? • Nanomedicine delivery devices are important to the future of cancer treatment – Promising due to their properties • Suitable size, high solubility, and ability to change design Tanner P, et. al. Polymeric Vesicles: From Drug Carriers to Nanoreactors and Artificial Organelles. 2011. 8 Department of Biomedical Informatics Why are Nanocarriers so Important? • Enhanced permeability and retention (EPR) effect Park K. Polysaccharide-based near-infrared fluorescence nanoprobes for cancer diagnosis. 2012. http://krauthammerlab.med.yale.edu/imagefinder/Figure.external?sp=443431&state:Home=BrO0ABXcTA AAAAQAADHNlYXJjaFN0cmluZ3QAEG1pUiogYnJhaW4gaGVhcnQ%3D 9 Department of Biomedical Informatics Types of Nanocarriers Cho K, et. al. Therapeutic Nanoparticles for Drug Delivery in Cancer. 2008. Department of Biomedical Informatics Poly(amido amine) Dendrimers • PAMAM dendrimers are particularly promising – Have potential for oral delivery – Cancer drugs can bind to the surface and interior of the molecule – Molecules surface can easily be modified http://www.dendritech.com Department of Biomedical Informatics Design Challenges for Nanocarriers http://bioserv.rpbs.univ-paris-diderot.fr/services/FAF-Drugs/admetox.html 12 Department of Biomedical Informatics For Small Molecule Pharmaceutics • Well known in silico approaches exist • Quantitative Structure Activity Relationships (QSAR) – Analyze the structures and functions of pharmaceutical and chemical compounds • Used for many different bioactive molecules in the fields of medicinal chemistry and cheminformatics • This method has seen limited application in the ability to empirically calculate biochemical properties of nanoparticles 13 Department of Biomedical Informatics Nanoinformatics Challenges • These approaches have not been used in nanocarriers for many reasons – Availability of nanoparticle data – Actual atomic size of the nanoparticle structures – Computational capability and algorithms http://www.nanoinstitute.utah.edu/ 14 Department of Biomedical Informatics Ultimate Goal of this Research • Demonstrate that in silico aided design of nanocarriers is possible by developing and adapting advanced informatics techniques • Utilize state of the art data mining and machine learning techniques to develop a model linking PAMAM dendrimer cytotoxicity to molecular descriptors and structure of the nanoparticle 15 Department of Biomedical Informatics Where Do We Start? • Availability of Nanoparticle Data – Databases containing information relevant to biomedical nanoparticles are critical for secondary uses such as data mining and predictive modeling Department of Biomedical Informatics caNanoLab • Database containing information relevant to nanomedicine on nanoparticles and their properties • Developed by the National Cancer Institute for sharing nanoparticle information https://cananolab.nci.nih.gov/caNanoLab/ Department of Biomedical Informatics caNanoLab • Issues – Limited number of nanoparticles (not all inclusive or current) – Incomplete information regarding the chemical and physical properties of nanoparticles – No simple way to download the data to apply machine learning or statistical analyses – There is no ability to query this system and no data model exists to compare the properties of the molecule to its biochemical activity Department of Biomedical Informatics Data Not Easily Accessible • Availability of nanoparticle data – To our knowledge, there is no authoritative, up-todate database – Manual extraction is not feasible Department of Biomedical Informatics Natural Language Processing (NLP) • Information extraction method – Used to automatically extract information from an unstructured (free-text) document – Shown to be successful in extracting information from related biomedical fields http://www.conversational-technologies.com/nldemos/nlDemos.html Department of Biomedical Informatics Nano-NLP • Garcia-Remesal, Maojo, and colleagues – Text classification method – Identified: • • • • Nanoparticle names Routes of exposure Toxic effects Particle targets – Successful, but qualitative not quantitative Department of Biomedical Informatics Our Approach • Two-Step process Text Classification Text Extraction Department of Biomedical Informatics Text Extraction Purpose • Extract numeric values associated with PAMAM dendrimer properties from the cancer nanomedicine literature – NanoSifter • 10 properties taken from the NanoParticle Ontology (NPO) • Hydrodynamic diameter, particle diameter, molecular weight, zeta potential, cytotoxicity, IC50, cell viability, encapsulation efficiency, loading efficiency, and transfection efficiency Jones DE, Igo S, Hurdle J, Facelli JC. Automatic Extraction of Nanoparticle Properties Using Natural Language Processing: NanoSifter an Application to Acquire PAMAM Dendrimer Properties. PloS one. 2014;9(1):e83932. Epub 2014/01/07. 23 Department of Biomedical Informatics Properties to be Extracted VARIABLE Hydrodynamic Diameter Particle Diameter Molecular Weight Zeta Potential Cytotoxicity IC50 Cell Viability Encapsulation Efficiency Loading Efficiency Transfection Efficiency DEFINITION The hydrodynamic size which is the diameter of a particle or molecule (approximated as a sphere) in an aqueous solution. Diameter which inheres in a particle. The sum of the relative atomic masses of the constituent atoms of a molecule. The potential difference between the bulk dispersion medium (liquid) and the stationary layer of liquid near the surface of the dispersed particulate. Toxicity that impairs or damages cells, and it is a desired property for the killing of growing tumor cells. A measure of toxicity which is the concentration of a drug or inhibitor that is required to inhibit a biological process or a participant's activity in that process by half. Viability of a cell to proliferate, grow, divide, or repair damaged cell components. The efficiency inhering in a nanomaterial or supramolecular structure by virtue of its capacity to encapsulate an amount of molecular entity, isotope or nanomaterial. A quality inhering in a material entity by virtue of it having the capacity to carry an amount of another material entity. The efficiency inhering in a bearer's ability to facilitate transfection. 24 Department of Biomedical Informatics NanoSifter Extraction Pipeline 25 Department of Biomedical Informatics NanoSifter Performance Nanoparticle Property Term TP FP FN Recall Precision F-measure Encapsulation Efficiency 1 0 0 1.00 1.00 1.00 Hydrodynamic Diameter 8 0 0 1.00 1.00 1.00 5 41 124 143 211 47 78 19 0 0 18 23 39 8 31 13 0 1 1 2 1 1 0 1 1.00 0.98 0.99 0.99 1.00 0.98 1.00 0.95 1.00 1.00 0.87 0.86 0.84 0.85 0.72 0.59 1.00 0.99 0.93 0.92 0.91 0.91 0.83 0.73 Loading Efficiency Zeta Potential Cytotoxicity Molecular Weight Particle Diameter IC50 Cell Viability Transfection Efficiency 26 Department of Biomedical Informatics NanoSifter Performance Type of Average Recall Precision F-measure Macro 0.99 0.87 0.92 Micro 0.99 0.84 0.91 27 Department of Biomedical Informatics NanoSifter Observations • Recall vs. precision – Desire a higher recall because this means that we are capturing most instances (i.e. missing very few in the literature) – Tradeoff is that the number of false positives increases which in turn reduces the precision 28 Department of Biomedical Informatics NanoSifter Limitations • Data extracted by our method is not always directly associated with a dendrimer nanoparticle • Only pair a nanoparticle property term with a single numeric value annotation before and after itself (co-reference resolution) • Cannot extract data from tables and figures 29 Department of Biomedical Informatics NanoSifter Discussion • Next steps – Continue work on text classification methods to improve the precision of the system – Expand the property terms and numeric values that the system targets – Annotate and extract information from other subclasses of nanoparticles – Implement some sort of negation analysis tool into our system 30 Department of Biomedical Informatics Text Classification Purpose • Identify and annotate entities in the unstructured nanomedicine literature – Augment the text extraction method – Improve the precision of extracted property data Department of Biomedical Informatics Text Classification Pipeline Department of Biomedical Informatics Now Have the Necessary Data… • Data mining and predictive modeling – Previous studies • Liu et al. analyzed a number of attributes of a variety of nanoparticles in order to predict post-fertilization mortality in zebrafish • Horev-Azaria and colleagues used predictive modeling to explore the effect of cobalt-ferrite nanoparticles on the viability of seven different cell lines – This method has not been applied to empirically calculate a prediction of the cytotoxicity of PAMAM dendrimers 33 Department of Biomedical Informatics In Silico Platform Jones DE, Hamidreza Ghandehari, Facelli JC. Data Mining in Nanomedicine: Predicting Toxicity of PAMAM Dendrimers by Molecular Descriptors and Structure. Submitted 2014. 34 Department of Biomedical Informatics PAMAM Dendrimers G4 G3 35 Department of Biomedical Informatics PAMAM Dendrimers G5 36 Department of Biomedical Informatics Molecular Descriptors Sample Name Molecular Weight Aliphatic Atom (g/mol) Count Refractivity G3 PAMAM 6908.8403 484 1847.28 G4 PAMAM 14214.1651 996 3798.47 G5 PAMAM 28824.8147 2020 7700.85 37 Department of Biomedical Informatics Classification Analysis • Initial analysis Classifier J48 Bagging Filtered Classifier LWL SMO Classification via Regression DTNB NBTree Decision Table Naïve Bayes Precision 0.838 0.836 0.789 Recall 0.835 0.835 0.748 F-Measure 0.836 0.835 0.750 Accuracy 83.5% 83.5% 74.8% 0.775 0.738 0.724 0.738 0.738 0.728 0.741 0.725 0.723 73.8% 73.8% 72.8% 0.691 0.681 0.678 0.621 0.670 0.670 0.660 0.602 0.674 0.673 0.664 0.607 67.0% 67.0% 66.0% 60.2% 38 Department of Biomedical Informatics Classification Analysis • Feature selection analysis Classifier Precision Recall F-Measure ROC Area Accuracy J48 0.888 0.883 0.884 0.844 88.3% Filtered 0.736 0.718 0.722 0.800 71.8% 0.819 0.767 0.769 0.834 76.7% Classifier LWL 39 Department of Biomedical Informatics J48 Decision Tree 40 Department of Biomedical Informatics Regression Analysis Prediction of Cell Viability 110 y = 0.4174x + 55.086 RMS Error = 14.21% 100 Predicted 90 80 70 60 50 20 30 40 50 60 70 80 90 100 110 120 Actual 41 Department of Biomedical Informatics Discussion • Greatest prediction accuracies were achieved after supplementing the expert selected features with experimental conditions • The properties presented in the decision tree diagram represent the more general properties of charge, size, and concentration • Experimentally, these properties have been hypothesized to be primary causes of cytotoxicity 42 Department of Biomedical Informatics Conclusion • The results indicate that data mining and machine learning can be used to predict cytotoxicity and cell viability of PAMAM dendrimers on Caco-2 cells with good accuracy • Nanoinformatics methods could be implemented to significantly reduce the search space necessary to create suitable PAMAM dendrimers which exhibit less cytotoxicity 43 Department of Biomedical Informatics References 1. Jain K. The Handbook of Nanomedicine. 1st ed. Totowa, New Jersey: Humana; 2008. 2. Staggers N, McCasky T, Brazelton N, Kennedy R. Nanotechnology: the coming revolution and its implications for consumers, clinicians, and informatics. Nursing outlook. 2008;56(5):268-74. Epub 2008/10/17. 3. de la Iglesia D, Maojo V, Chiesa S, Martin-Sanchez F, Kern J, Potamias G, et al. International efforts in nanoinformatics research applied to nanomedicine. Methods of information in medicine. 2011;50(1):84-95. Epub 2010/11/19. 4. Thomas DG, Pappu RV, Baker NA. NanoParticle Ontology for cancer nanotechnology research. J Biomed Inform. 2011;44(1):59-74. Epub 2010/03/10. 5. National Cancer Institute. caNanoLab. 2011 [cited 2011]; Welcome to the cancer Nanotechnology Laboratory (caNanoLab) portal. caNanoLab is a data sharing portal designed to facilitate information sharing in the biomedical nanotechnology research community to expedite and validate the use of nanotechnology in biomedicine. caNanoLab provides support for the annotation of nanomaterials with characterizations resulting from physico-chemical and in vitro assays and the sharing of these characterizations and associated nanotechnology protocols in a secure fashion.]. Available from: https://cananolab.nci.nih.gov/caNanoLab/. 6. Hunter L, Lu Z, Firby J, Baumgartner WA, Jr., Johnson HL, Ogren PV, et al. OpenDMAP: an open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression. BMC bioinformatics. 2008;9:78. Epub 2008/02/02. 7. Garcia-Remesal M, Garcia-Ruiz A, Perez-Rey D, de la Iglesia D, Maojo V. Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from the literature. BioMed research international. 2013;2013:410294. Epub 2013/03/20. 8. Cunningham H, al. e. Text Processing with GATE: University of Sheffield Department of Computer Science; 2011. 9. Yang Y. An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval. 1999;1(1-2):69-90. 10. Tropsha A, Golbraikh A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Current pharmaceutical design. 2007;13(34):3494-504. Epub 2008/01/29. 11. Liu X, Tang K, Harper S, Harper B, Steevens JA, Xu R. Predictive modeling of nanomaterial exposure effects in biological systems. International journal of nanomedicine. 2013;8 Suppl 1:31-43. Epub 2013/10/08. 12. Horev-Azaria L, Baldi G, Beno D, Bonacchi D, Golla-Schindler U, Kirkpatrick JC, et al. Predictive toxicology of cobalt ferrite nanoparticles: comparative in-vitro study of different cellular models using methods of knowledge discovery from data. Particle and fibre toxicology. 2013;10:32. Epub 2013/07/31. 13. ChemAxon, Berry I, Ruyts B. Future-proofing Cheminformatics Platforms2012 10/31/2013:[1-16 pp.]. Available from: http://www.chemaxon.com/wpcontent/uploads/2012/04/Future_proofing_cheminformatics_platforms.pdf. 14. Ltd. C. Marvin. 2013. 15. Witten I, Frank E, Hall M. Data Mining: Practical Machine Learning Tools and Techniques. 3 ed: Morgan Kaufmann Publishers; 2011. 629 p. 16. Vasumathi V, Maiti PK. Complexation of siRNA with Dendrimer: A Molecular Modeling Approach. Macromolecules. 2010;43:8264-74. 17. Karatasos K, Posocco P, Laurini E, Pricl S. Poly(amidoamine)-based dendrimer/siRNA complexation studied by computer simulations: effects of pH and generation on dendrimer structure and siRNA binding. Macromolecular bioscience. 2012;12(2):225-40. Epub 2011/12/08. 44 Department of Biomedical Informatics Acknowledgements • Morgan Family • National Library of Medicine Training Grant • Department of Biomedical Informatics at the University of Utah • Ph.D. Committee – – – – – Julio C. Facelli, Ph.D. Hamidreza S. Ghandehari, Ph.D. John F. Hurdle, M.D., Ph.D. Karen Eilbeck, Ph.D. Bruce E. Bray, M.D. 45 Department of Biomedical Informatics Questions 46