BIOGRAPHICAL SKETCH

Provide the following information for the key personnel and other significant contributors. Follow this format for each person. DO NOT EXCEED FOUR PAGES.

NAME: Zhu, Ji
POSITION TITLE: Associate Professor of Statistics
eRA COMMONS USER NAME: jizhu1

EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.)

INSTITUTION AND LOCATION             DEGREE (if applicable)   YEAR(s)   FIELD OF STUDY
Peking University, Beijing, China    B.Sc.                    1996      Physics
Stanford University, Stanford, CA    Ph.D.                    2003      Statistics

A. Personal Statement

I have worked with Dr. Waljee and Dr. Hayward on several IBD clinical research projects since 2006, resulting in 3 publications. I have extensive experience in developing and implementing machine learning algorithms.

I received my B.Sc. in Physics from Peking University in China, and my Ph.D. in Statistics from Stanford University in 2003. I am now an Associate Professor in the Department of Statistics at the University of Michigan. I am recognized as a leading researcher in the areas of statistical machine learning and high-dimensional data analysis. I received a CAREER award from the National Science Foundation (USA) in 2008, and I was elected Chair (2011-2012) of the Statistical Learning and Data Mining Section of the American Statistical Association.

I bring to the project an established research record in statistics and machine learning. I have published 60 research papers (50 journal articles, 5 refereed conference articles and 5 discussion articles). I have devoted my research to developing theory and methodologies in the fields of classification, clustering, kernel methods, variable selection, high-dimensional data analysis and statistical network analysis.

I will directly help Dr. Waljee and Dr. Hayward in the work on developing and implementing machine learning algorithms for IBD studies.
I am excited about and committed to the implementation of this work in clinical care.

B. Positions and Honors

Professional Positions:
2003 - 2008   Assistant Professor, Department of Statistics, University of Michigan, Ann Arbor, MI
2006          Faculty Member, Center for Computational Medicine and Bioinformatics, University of Michigan
2008          Associate Professor, Department of Statistics, University of Michigan, Ann Arbor, MI
2010          Associate Professor (Courtesy), Department of EECS, University of Michigan, Ann Arbor, MI

Honors and Awards:
1998-2001     Kimball Graduate Fellowship, Stanford University
2002          Student Paper Competition Award, Computing Section, American Statistical Association
2008-2013     CAREER Award, National Science Foundation
2010          Elected Member of the International Statistical Institute
2011          Chair, Statistical Learning and Data Mining Section, American Statistical Association

C. Selected peer-reviewed publications (Selected from 55 peer-reviewed publications)

1. Zhu, J. and Hastie, T. (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics 5(3):427-444.
2. Roe, B., Yang, H., Zhu, J., Liu, Y., Stancu, I. and McGregor, G. (2005) Boosted decision trees as an alternative to artificial neural networks for particle identification. Nuclear Instruments and Methods in Physics Research, Section A 543(2-3):577-584.
(This paper consists of applications of tree-based machine learning algorithms in nuclear physics.)
3. Yang, H., Roe, B. and Zhu, J. (2005) Studies of boosted decision trees for MiniBooNE particle identification. Nuclear Instruments and Methods in Physics Research, Section A 555(1-2):370-385.
(This paper consists of applications of tree-based machine learning algorithms in nuclear physics.)
4. Ulintz, P., Zhu, J., Qin, Z. and Andrews, P. (2006) Improved classification of mass spectrometry database search results using newer machine learning approaches. Molecular and Cellular Proteomics 5(3):497-509.
(This paper consists of applications of tree-based machine learning algorithms in biochemistry.)
5. Li, Y. and Zhu, J. (2007) Analysis of array CGH data for cancer studies using the fused quantile regression. Bioinformatics 23(18):2470-2476.
6. Wang, S. and Zhu, J. (2007) Improved centroids estimation for the nearest shrunken centroid classifier. Bioinformatics 23(8):972-979.
(One of four winning papers in the 2007 ASA Student Paper Competition sponsored by the Statistical Computing Section)
7. Yang, H., Roe, B. and Zhu, J. (2007) Studies of stability and robustness for artificial neural networks and boosted decision trees. Nuclear Instruments and Methods in Physics Research, Section A 574(2):342-349.
(This paper consists of applications of tree-based machine learning algorithms in nuclear physics.)
8. Wang, S. and Zhu, J. (2008) Variable selection for model-based high-dimensional clustering and its application to microarray data. Biometrics 64(2):440-448.
9. Zou, H., Zhu, J. and Hastie, T. (2008) New multi-category boosting algorithms based on multi-category Fisher-consistent losses. Annals of Applied Statistics 2(4):1290-1306.
(This paper develops a novel classification framework based on boosting classification trees.)
10. Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009) Partial correlation estimation using joint sparse regression models. Journal of the American Statistical Association 104(486):735-746.
11. Wang, S., Nan, B., Zhou, N. and Zhu, J. (2009) Hierarchically penalized Cox regression for censored data with grouped variables. Biometrika 96(2):307-322.
(Winner of the 2008 ICSA J.P. Hsu Memorial Award)
12. Zhu, J., Zou, H., Rosset, S. and Hastie, T. (2009) Multi-class adaboost. Statistics and Its Interface 2(3):349-360. (Special issue on statistics and machine learning)
(This paper develops a novel multi-class classification method based on boosting classification trees.)
13. Choi, N., Li, W. and Zhu, J. (2010) Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association 105(489):354-364.
(One of the winning papers in the 2007 ENAR Student Paper Competition)
14. Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D., Pollack, J. and Wang, P. (2010) Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Annals of Applied Statistics 4(1):53-77.
15. Waljee, A., Joyce, J., Wang, S., Saxena, A., Hart, M., Zhu, J. and Higgins, P. (2010) Algorithms outperform metabolite tests in predicting response of patients with inflammatory bowel disease to thiopurines. Clinical Gastroenterology and Hepatology 8:143-150.
(This paper consists of applications of tree-based machine learning algorithms in thiopurine studies.)

D. Research Support

Ongoing

NSF DMS-0748389    Zhu (PI)    7/1/2008 - 6/30/2013
CAREER: Statistical Learning from Data with Graph/Network Structure
The research aims to develop new statistical methodologies and associated theory that incorporate the network/graph structure in the data.
Role: PI

NIH R01-AG-036802    Nan (PI)    9/1/2010 - 8/31/2014
High-dimensional Data Issues in Aging Research
The research aims to address several emerging issues in high-dimensional data analysis and close certain gaps between statistical theory and biomedical applications, with a focus on aging-related diseases.
Role: Co-PI

NIH R01-GM-096194    Zhu (PI)    9/1/2010 - 8/31/2014
Sparse Structure Identification from High-dimensional Epigenomic Data
The research aims to develop novel statistical methods for sparse structure estimation from histone modification data, identify various histone modification patterns and link them with functional elements of the genome.
Role: PI

Completed

NSF DMS-0705532    James (PI)    7/1/2007 - 6/30/2010
Generalized Variable Selection with Applications to Functional Data Analysis and Other Problems
The major goal of this project is to study four important applications of generalized variable selection in areas as diverse as functional regression, principal component analysis (both standard and functional), multivariate nonparametric regression, and transcription regulation network problems for microarray experiments.
Role: Co-PI

NSF DMS-0505432    Zhu (PI)    7/1/2005 - 6/30/2008
Flexible Classification and Regression
The research aims to combine statistical and computational considerations in designing new and useful predictive modeling tools and algorithms.
Role: PI