Rich Caruana
4157 Upson Hall
Cornell University
Ithaca, NY 14853
caruana@cs.cornell.edu
phone: 607-255-1164
fax: 607-255-4428

Research Areas
Machine learning and data mining, medical informatics and decision making, bioinformatics, feature selection, missing values, inductive transfer (e.g., multitask learning), rank learning, artificial neural networks, ensemble learning, memory-based learning.

Education
Ph.D., Computer Science, 1997, Carnegie Mellon University, Pittsburgh, Pennsylvania.
Ph.D. Thesis: "Multitask Learning." Committee: Tom Mitchell, Herb Simon, Tom Dietterich, Dean Pomerleau.
M.S., Computer Science, 1984, Villanova University, Villanova, Pennsylvania.
M.S. Thesis: "BANDIT: A Fast Algorithm for the Resolution of Spectra."
B.S., Mathematics, 1982, Villanova University, Villanova, Pennsylvania. Minors in Physics, Chemistry, and Honors Liberal Arts.

Research Positions
July 2001–Present: Assistant Professor, Department of Computer Science, Cornell University, Ithaca, New York
July 2000–July 2001: Research Faculty, Center for Automated Learning and Discovery, Carnegie Mellon University, Pittsburgh, Pennsylvania
September 1998–July 2000: Assistant Professor, Radiology and Computer Science, University of California, Los Angeles, Los Angeles, California
Visiting Professor, Center for Automated Learning and Discovery, Carnegie Mellon University, Pittsburgh, Pennsylvania
July 1996–March 2000: Research Scientist (machine learning), Justsystem Pittsburgh Research Center (JPRC), Pittsburgh, Pennsylvania
October 1986–August 1989: Research Scientist (machine learning), Philips Labs, Briarcliff Manor, New York
September 1984–October 1986: Knowledge Engineer (expert systems), GTE, Mountain View, California
August 1980–August 1983: Research Scientist (biomedical pattern recognition), Geometric Data Company, King of Prussia, Pennsylvania

Professional Activities
Area Chair for Neural Information Processing Systems (NIPS), 2003
Area Chair for International Conference on Machine Learning (ICML), 2003
Program Committee for International Conference on Data Mining (ICDM), 2003
Program Committee for ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003
Program Committee for American Medical Informatics Association Conference (AMIA), 2003
Thesis Committee Member, Sylvain le Tourneau, University of Ottawa, August 2003
Co-Chair of Workshop on Rank Learning, Neural Information Processing Systems (NIPS), 2002
Instructor for advanced course in artificial neural nets (35 lecture hours) for NeuralWare Corporation, 2000, 2001
Program Committee for the International Conference on Artificial Neural Networks in Medicine and Biology (ANNIMAB), 2000
General Chair of Workshops, Neural Information Processing Systems (NIPS), 2000
General Chair of Workshops, Neural Information Processing Systems (NIPS), 1999
Co-Chair of Workshop on Integrating Supervised and Unsupervised Learning, Neural Information Processing Systems (NIPS), 1998
Co-Chair of Workshop on Tricks of the Trade, Neural Information Processing Systems (NIPS), 1996
Co-Chair of Workshop on Inductive Transfer, Neural Information Processing Systems (NIPS), 1995

Reviewing
American Association for Artificial Intelligence (AAAI)
IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Neural Networks
IEEE Transactions on Systems, Man, and Cybernetics
International Conference on Artificial Neural Networks (ICANN)
International Conference on Artificial Neural Networks in Medicine and Biology (ANNIMAB)
International Conference on Data Mining (ICDM)
International Conference on Machine Learning (ICML)
International Joint Conference on Artificial Intelligence (IJCAI)
International Joint Conference on Neural Networks (IJCNN)
Journal of Analytical Chemistry
Journal of Artificial Intelligence Research (JAIR)
Journal of Machine Learning Research (JMLR)
Knowledge Discovery and Data Mining (KDD)
Machine Learning Journal (MLJ)
Neural Information Processing Systems (NIPS)
Portuguese Conference on Artificial Intelligence

Honors
Nomination for Best Paper: Caruana, Rich, Niculescu, Stefan, Rao, Bharat, and Simms, Cynthia, "Evaluating the C-section Rate of Different Physician Practices: Using Machine Learning to Model Standard Practice," to appear at the American Medical Informatics Association Conference (AMIA), November 2003.
Invited speaker, AAAI Workshop on Learning From Imbalanced Data, July 2000: "Methods for Learning From Imbalanced Data"
Invited speaker, Research Jamboree, University of Edinburgh, Edinburgh, Scotland, May 2000 (3 talks): "Unabridged and Multitask Learning," "Machine Learning, Medical Decision Making, and Explanation," "Semi-Supervised Clustering"
Invited speaker, University of Skövde, Sweden, December 1994: "Multitask Learning"
Participant, Generalization in Neural Nets and Graphical Models Summer School, Newton Institute, Cambridge, England, 1997
Participant, Learning in Graphical Models Summer School, Ettore Majorana, Erice, Italy, 1996
Participant, Connectionist Models Summer School, Boulder, Colorado, 1993
Award for Outstanding Contribution, Philips Labs, 1988

Journal Publications
Caruana, Rich, and de Sa, Virginia R., "Benefitting from the Variables that Variable Selection Discards," Journal of Machine Learning Research, Vol. 3, March 2003, pp. 1245-1264.
Goldenberg, A., Shmueli, G., Caruana, R., Fienberg, S., "Early Statistical Detection of Anthrax Outbreaks by Tracking Over-the-Counter Medication Sales," Proceedings of the National Academy of Sciences, Vol. 99, 2002, pp. 5237-5240.
Simms, Cynthia J., Meyn, Leslie, Caruana, Rich, Rao, R. Bharat, Mitchell, Tom, Krohn, Marijane, "Predicting Cesarean Delivery with Decision Tree Models," The American Journal of Obstetrics and Gynecology, Vol. 183, No. 5, November 2000, pp. 1198-1206.
Cooper, G. F., Aliferis, C. F., Ambrosino, R., Aronis, J., Buchanan, B. G., Caruana, R., Fine, M. J., Glymour, C., Gordon, G., Hanusa, B. H., Janosky, J. E., Meek, C., Mitchell, T., Richardson, T., Spirtes, P., "An Evaluation of Machine Learning Methods for Predicting Pneumonia Mortality." Artificial Intelligence in Medicine, Vol. 9, 1997, pp. 107-138.
Caruana, Rich, "Multitask Learning." Machine Learning, Vol. 28, pp. 41-75, Kluwer Academic Publishers, 1997.
Mitchell, Tom, Caruana, Rich, Freitag, Dayne, McDermott, John, Zabowski, David, "Experience with a Learning Personal Assistant." Communications of the ACM, 1994.
Caruana, Rich, Searle, Roger B., Shupack, Saul I., "Additional Capabilities for a Fast Algorithm for the Resolution of Spectra." Journal of Analytical Chemistry, 1988.
Caruana, Rich, Searle, Roger B., Heller, Thomas, Shupack, Saul I., "Fast Algorithm for the Resolution of Spectra." Journal of Analytical Chemistry, 1986.

Book Chapters
Caruana, Rich, "15 Useful Tricks with Extra Outputs." Neural Networks: Tricks of the Trade, G. B. Orr and K.-R. Müller (Eds.), Springer-Verlag, 1998.
Caruana, Rich, "Multitask Learning." Learning to Learn, S. Thrun and L. Pratt (Eds.), Kluwer Academic Publishers, 1997.
Caruana, Rich, Freitag, Dayne, "How Useful Is Relevance?" Intelligent Relevance: Papers from the 1994 Fall Symposium, AAAI Report FS-94-02, ISBN 0-929280-76-8, 1994.
Caruana, Rich, "The Automatic Training of Rule Bases that Use Numerical Uncertainty Representations." Uncertainty in Artificial Intelligence, Vol. 3, North-Holland, 1988.

Refereed Conference Papers
Caruana, Rich, Niculescu, Stefan, Rao, Bharat, and Simms, Cynthia, "Evaluating the C-section Rate of Different Physician Practices: Using Machine Learning to Model Standard Practice," to appear at the American Medical Informatics Association Conference (AMIA), November 2003.
Langford, John, and Caruana, Rich, "(Not) Bounding the True Error," Neural Information Processing Systems, Vol. 14 (Proceedings of NIPS*2001), MIT Press, 2002.
Caruana, Rich, "A Non-Parametric EM-Style Algorithm for Imputing Missing Values," Artificial Intelligence and Statistics, January 2001.
Caruana, Rich, Lawrence, Steve, and Giles, Lee, "Overfitting in Artificial Neural Nets Trained with Backpropagation, Conjugate Gradient, and Early Stopping," Neural Information Processing Systems, Vol. 13 (Proceedings of NIPS*2000), MIT Press, 2001.
Berger, Adam, Caruana, Rich, Cohn, David, Freitag, Dayne, Mittal, Vibhu, "Bridging the Lexical Chasm: Automatic FAQ Answer Finding." Special Interest Group on Information Retrieval (SIGIR), Athens, Greece, July 2000.
O'Sullivan, Joseph, Langford, John, Caruana, Rich, Blum, Avrim, "Unabridged Learning." International Conference on Machine Learning (ICML), Stanford, California, July 2000.
Caruana, Rich, Cohn, David, McCallum, Andrew, "Semi-Supervised Clustering with User Feedback." Machines that Learn, Snowbird, Utah, April 2000.
Simms, Cynthia, M.D., Caruana, Rich, Krohn, M. J., Meyn, Leslie, Mitchell, Tom, Rao, R. Bharat, and Schmeuking, Ingo, "Predicting Caesarean Section with Decision Trees." Annual Meeting of the Society of Fetal and Maternal Medicine, February 2000.
Caruana, Rich, "Case-Based Explanation of Artificial Neural Nets." Artificial Neural Nets in Medicine and Biology (ANNIMAB), Göteborg, Sweden, May 2000.
Caruana, R., Kangarloo, H., Dionisio, J. D. N., Sinha, U., Johnson, D., "Case-Based Explanation of Non-Case-Based Learning Methods." Proceedings of the 1999 American Medical Informatics Association (AMIA) Symposium, 1999, pp. 212-215.
Caruana, Rich, Dionisio, John, Johnson, David, Taira, Ricky, Kangarloo, Hooshang, "Automatic Imaging Protocol Selection." American Radiology Conference (IRAS), 1999.
Caruana, Rich, "Applying Case-Based Explanation to Non-Case-Based Methods such as Artificial Neural Nets or Decision Trees." American Radiology Conference (IRAS), 1999.
Caruana, Rich, O'Sullivan, Joseph, "Multitask Pattern Recognition for Autonomous Robots." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Victoria, B.C., Canada, October 1998.
Caruana, Rich, de Sa, Virginia, "Using Feature Selection to Find Inputs that Work Better as Extra Outputs." International Conference on Artificial Neural Networks (ICANN), Skövde, Sweden, September 1998.
Caruana, Rich, O'Sullivan, Joseph, "Multitask Pattern Recognition for Vision-Based Autonomous Robots." International Conference on Artificial Neural Networks (ICANN), Skövde, Sweden, September 1998.
Caruana, Rich, de Sa, Virginia, "Promoting Poor Features to Supervisors: Some Inputs Work Better as Outputs." Neural Information Processing Systems, Vol. 9 (Proceedings of NIPS*96), MIT Press, 1997, pp. 389-395.
Caruana, Rich, "Algorithms and Applications for Multitask Learning." Machine Learning, Proceedings of the 13th International Conference on Machine Learning (ICML 1996, Bari, Italy), Morgan Kaufmann, 1996, pp. 87-95.
Caruana, Rich, Baluja, Shumeet, Mitchell, Tom, "Using the Future to 'Sort Out' the Present: Rankprop and Multitask Learning for Medical Risk Evaluation." Advances in Neural Information Processing Systems, Vol. 8 (Proceedings of NIPS*95), MIT Press, 1996, pp. 959-965.
Baluja, Shumeet, Caruana, Rich, "Removing the Genetics from the Standard Genetic Algorithm." Proceedings of the 12th International Conference on Machine Learning, 1995, pp. 38-46.
Caruana, Rich, "Learning Many Related Tasks at the Same Time with Backpropagation." Advances in Neural Information Processing Systems, Vol. 7 (Proceedings of NIPS*94), MIT Press, 1995, pp. 657-664.
Caruana, Rich, Freitag, Dayne, "Greedy Attribute Selection." Machine Learning, Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994, New Brunswick, New Jersey), Morgan Kaufmann, 1994, pp. 28-36.
Caruana, Rich, "Multitask Connectionist Learning." Proceedings of the 1993 Connectionist Models Summer School, 1993, pp. 372-379.
Caruana, Rich, "Multitask Learning: A Knowledge-Based Source of Inductive Bias." Proceedings of the 10th International Conference on Machine Learning, 1993, pp. 41-48.
Caruana, Rich, Schaffer, J. David, "Using Multiple Representations to Control Inductive Bias: Gray and Binary Codes for Genetic Algorithms." Proceedings of the Sixth International Workshop on Machine Learning (ML 1989), Morgan Kaufmann, 1989, pp. 375-378.
Schaffer, J. David, Caruana, Rich, Eshelman, Larry J., "Designing Neural Nets that Generalize Optimally with Genetic Algorithms." Los Alamos Conference on Emergent Computation, May 1989.
Eshelman, Larry J., Caruana, Rich, Schaffer, J. David, "Biases in the Crossover Landscape." The 1989 International Conference on Genetic Algorithms, June 1989.
Schaffer, J. David, Caruana, Rich, Eshelman, Larry J., Das, Raj, "A Study of Control Parameters Affecting Online Performance of Genetic Algorithms for Function Optimization." The 1989 International Conference on Genetic Algorithms, June 1989.
Caruana, Rich, Eshelman, Larry J., Schaffer, J. David, "Representation and Hidden Bias II: Eliminating Defining Length Bias in Genetic Search with Shuffle Crossover." International Joint Conference on Artificial Intelligence (IJCAI), August 1989.
Caruana, Rich, Schaffer, J. David, "Representation and Hidden Bias: Gray vs. Binary Coding for Genetic Algorithms." Fifth International Conference on Machine Learning, June 1988.

Peer-Reviewed Workshops
Caruana, Rich, Hodor, Paul, "A High-Precision Workbench for Extracting Information from the Protein Data Bank (PDB)." Knowledge Discovery and Data Mining (KDD) Workshop on Text and Information Extraction, Boston, Massachusetts, 2000.
Caruana, Rich, Mullin, Matt, "Estimating the Number of Local Minima in Complex Search Spaces." International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Optimization, Stockholm, Sweden, 1999.
Caruana, Rich, "15 Useful Tricks with Extra Outputs." Neural Information Processing Systems (NIPS) Workshop on Tricks of the Trade, 1996.
Caruana, Rich, Freitag, Dayne, "How Useful Is Relevance?" AAAI Fall Symposium on Relevance, New Orleans, Louisiana, 1994.
Caruana, Rich, "Generalization vs. Network Size." Neural Information Processing Systems (NIPS) Workshop on Generalization, 1993.
Chrisman, Lonnie, Caruana, Rich, Carriker, Wayne, "Intelligent Agent Design Issues: Internal Agent State and Incomplete Perception." AAAI Fall Symposium on Sensory Aspects of Robotic Intelligence, Asilomar, California, 1991.
Caruana, Rich, "The Automatic Training of Rule Bases that Use Numerical Uncertainty Representations." AAAI-87 Workshop on Uncertainty in Artificial Intelligence, Seattle, Washington, 1987.

Technical Reports
Cohn, David, Caruana, Rich, and McCallum, Andrew, "Semi-Supervised Clustering with User Feedback." Cornell University Technical Report, TR2003-1892, 2003.
Caruana, Rich, Artigas, Pedro, Goldenberg, Anna, and Likhodedov, Anton, "Meta Clustering." Cornell University Technical Report, TR2002-1884, 2002.
Caruana, Rich, "Multitask Learning." Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, CMU-CS-97-203, 1997.
Buntine, Wray, Caruana, Rich, "Introduction to IND and Recursive Partitioning." NASA Ames Research Center, TR# FIA-91-28, 1991.
Caruana, Rich, "BANDIT: A Fast Algorithm for the Resolution of Spectra." Master's Thesis, Departments of Computer Science and Chemistry, Villanova University, 1988.
Caruana, Rich, "Estimating the Number of Minima in Complex Search Spaces." Philips Labs, TR-88-159, 1988.
Caruana, Rich, Schaffer, J. David, "Optimizing Digital Filters with Simulated Annealing and Genetic Algorithms." Philips Labs, TR-88-123, 1988.
Caruana, Rich, Schaffer, J. David, "An Investigation of Parameter Sets for Genetic Algorithms." Philips Labs, TR-88-045, 1988.
Caruana, Rich, Coffey, Brian J., "Searching for Optimal FIR Multiplierless Digital Filters with Simulated Annealing." Philips Labs, TR-88-031, 1988.
Pelavin, Rich N., Coffey, Brian J., Caruana, Rich, "Research in Diagnosis Using Design Knowledge." Philips Labs, TR-88-001, 1988.
Caruana, Rich, Schaffer, J. David, "Gray vs. Binary Coding for Genetic Algorithm Function Optimizers." Philips Labs, TR-87-080, 1987.
Benjamin, D. Paul, Caruana, Rich, "Partial-Matching as Search." Philips Labs, TR-87-019, 1987.
Caruana, Rich, "Experiments in Rule-Based Learning in Systems Using Numerical Uncertainty Representations." GTE Western Division Technical Report, 1986.

Patents and Invention Disclosures
Sukthankar, Rahul, Caruana, Rich, Hasegawa, Keiko, Mullin, Matt, "Using Active Monitor Illumination for 3-D Active Imaging." Patent disclosure filed April 2000.
Caruana, Rich, "Iterated K-Nearest Neighbor Method and Article of Manufacture for Filling in Missing Values." United States Patent 6,047,287. Assignee: Justsystem Pittsburgh Research Center, Pittsburgh, Pennsylvania. Filed May 5, 1998, granted April 4, 2000.

Work in Progress or Recently Submitted
"Selecting Ensembles from Libraries of Models," with Alex Niculescu, Geoff Crew, and Alex Ksikes.
"An Automatic Method for Finding Interpretations for Dimensions in Multidimensional Scaling," with Marc Fasnacht.
"Temporal Clustering of Microarray Protein Activities," with Paul Hodor.
"Finding High-Quality Protein Structures in the Protein Data Bank (PDB)," with Paul Hodor, Bruce Buchanan, and John Rosenberg.
"Using Machine Learning to Discover Sequence Patterns for Protein Folding Motifs," with Paul Hodor, Bruce Buchanan, and John Rosenberg.
"Backprop Nets Do Unsupervised Clustering of Outputs."
"Improving Kernel Regression with Multitask Learning."
"Soft Ranks: Making Rank Metrics Differentiable for Easier Optimization."
"A Machine Learning Approach to Data Cleansing," with Adam Kalai.
"Combining Multiple Databases in Order to Improve Prediction," with P. Spirtes, V. Abraham, J. Aronis, B. Buchanan, G. Cooper, M. Fine, G. Livingston, and S. Monti.
"Predicting Dire Outcomes of Patients with Community Acquired Pneumonia (CAP) Using All Available and Relevant Data," with G. Cooper, C. Aliferis, J. Aronis, B. Buchanan, M. Fine, J. Janosky, T. Mitchell, and P. Spirtes.
"Predicting Etiology of Patients with Community Acquired Pneumonia (CAP)," with G. Cooper, C. Aliferis, J. Aronis, B. Buchanan, M. Fine, J. Janosky, T. Mitchell, and P. Spirtes.

Teaching (at Cornell)
COM-578: Empirical Methods in Machine Learning and Data Mining, Fall '01, Fall '02, Fall '03.
COM-678: Special Topics in Machine Learning, Spring '02 (with Thorsten Joachims).
COM-778: Advanced Topics in Machine Learning, Spring '03.