Curriculum Vitae

Xianfeng Jeff Chen, Ph.D.

Executive Summary:

(1) I am a bioinformatician and computational/systems biologist working on human, plant, and microbial systems, with research interests and over 10 years of professional experience in bioinformatics, computational biology, genomics, gene expression profiling, and proteomics, with emphasis on genome annotation, network analysis, functional and comparative genomics, biological databases and data integration, algorithm development, software engineering, high-throughput distributed parallel cluster computing, automation of data processing, and biological data mining and knowledge discovery.

(2) I worked at biotech and pharmaceutical companies for about 8-9 years as a bioinformatician before returning to academia for about 4 years. I have over 10 years of genomics experience in genome-scale genomic, gene-space, and EST sequencing and in microarray transcription profiling. I have assembled genome-scale human and mouse genomes, over 5 plant genomes, and about 170 microbial genomes. I also have over 6 years of professional proteomics experience (mass spectrometry, yeast two-hybrid, and protein array) with human and Category A biodefense pathogens, and I assembled genome-scale human protein-protein interaction networks for colon cancer druggable target mining, target validation, assay development, interactive small-molecule chemical screening, and drug discovery.

Citizenship: The United States of America
Address: 1726 Webland Park, Charlottesville, VA 22901
Email: xianfengchen05@gmail.com
Phone: 434-974-7099

Education:

Ph.D., Major: Genetics, Iowa State University, U.S.A. 1996
B.S./M.Sc. equivalent, Major: Computer Science, Iowa State University. 1998
GPA: 3.98/4.00. Completed 18 computer courses, including all undergraduate courses plus 4 graduate courses. Major training focused on data warehousing and software engineering.

Honors and Awards:

Honored as among the top 2% of Iowa State computer science students. 1998
C. R. Weber Award for Excellence of Graduate Studies, Iowa State University. 1996

Areas of Expertise:

(1) Bioinformatics and computational proteomics, network analysis, data mining and knowledge discovery.
(2) Biological database/data warehouse design, implementation, and management.
(3) Algorithms in computational biology, biological sequence analysis and processing.
(4) Bioinformatics for second-generation sequencing technology and disease genetics association studies.
(5) Programming languages, software engineering, and high-throughput distributed parallel cluster computing.
(6) Chemical compound management, library similarity and diversity analysis, and compound clustering.
(7) Genetics and biochemistry, comparative and functional genomics, and systems biology.

Skills in Computational Biology:

(1) Dynamic programming, greedy algorithms, and divide-and-conquer strategy.
(2) Needleman-Wunsch algorithm.
(3) Smith-Waterman algorithm.
(4) Pairwise sequence alignment.
(5) Multiple/progressive pairwise alignment.
(6) Memory-based reasoning.
(7) Neural network and belief network classifiers.
(8) Decision tree classifier.
(9) Consensus and regular expression pattern matching.
(10) Position weight matrix for motif detection.
(11) Profile or template.
(12) Bayesian network.
(13) Hidden Markov Model.
(14) Phylogenetic tree classification.
(15) Non-homology-based annotation.
(16) Expert in compute farm systems such as Load Sharing Facility, Portable Batch System, and the DeCypher system.
(17) Expert in IDBS’s ActivityBase chemo-informatics software for interactive small-molecule screening.
(18) Expert in proteomics software systems for protein identification and quantitative profiling, such as GenoLogics Proteus, ISB TPP, Thermo BioWorks and SIEVE, Proteome Software's Scaffold, and the Scripps proteomics software systems.

Experience in Bioinformatics and Computational Biology:

(1) Automation of high-throughput genomic sequence analysis and processing using Perl, shell scripts, C++, Java, CGI, GNU make, and other information technologies.
(2) Design and implementation of biological databases using PostgreSQL, MySQL, Oracle 10i, Sybase 11, and Illustra DBMS. Experience in data modeling using ERwin and in database programming using Pro*C/C++, PL/SQL, JDBC, DBI/DBD, and Oracle scripts in Oracle DBMS.
(3) Expert knowledge of biological databases, especially Transfac, TRRD, TFD, COMPEL, EPD, UTR, PLACE, PlantCARE, BIND, and DIP.
(4) Developed graphical linkage analysis software using Visual C++ as a graduate student and postdoctoral researcher.
(5) Hands-on experience with software for gene prediction, EST clustering and assembly, genome annotation, and promoter prediction and annotation, such as Genscan, MZEF, Genefinder, Grail, NetPlantGene, Pangea CAT, CAP, Phrap, PromFinder, Promoter Scan, Signal Scan, MatInspector, Pattern search, SplicePredictor, GeneMark.hmm, and Aragen.
(6) Experience in incorporating proteomics software (X!Tandem, OMSSA, Sequest, Mascot, TPP, Scaffold, Qscore, ProteinProphet, PeptideProphet, AMASS, Rscore, Bioworks, and SIEVE) and genomics software and genome analysis tools (Blast, Fasta, RepeatMasker, cross_match, EMBOSS, and HMMER) into biological data warehouses.
(7) Experience in building databases of promoter sequences, transcription factors, and their binding sites, and in developing promoter prediction software using regulatory sequence databases with machine learning and pattern recognition algorithms.
(8) Experience in building a company intranet using HTML, JavaScript, and CGI.
(9) Experience in comparative genomics among Arabidopsis, maize, soybean, tobacco, cowpea, and rice.
(10) Design and implementation of metabolic and developmental databases for plants.
(11) Experience in inferring protein-protein interaction networks from yeast two-hybrid data, mass spectrometry based pull-down assays, and protein quantification and profiling.

Major Academic Bioinformatics Databases Developed:

(1) http://www.proteomicsresource.org, National Biodefense Proteomics Data Center.
(2) http://geossdev.med.virginia.edu/~xc3m/, Microarray Coexpression Explorer for Cancer Chemotherapy.
(3) http://cowpeagenomics.med.virginia.edu/, Cowpea Genomics Knowledge Base.
(4) http://compsysbio.achs.virginia.edu/tobfac/, Tobacco Transcription Factor Database.
(5) http://xi00.achs.virginia.edu/~xc3m/, UVa Systems Biology Knowledge Warehouse.

Professional Experience:

Bioinformatics Consultant
IFXworks, LLC (http://www.ifxworks.com), Dulles, VA. 2007-2009

Director of Informatics/Adjunct Professor of Bioinformatics
Division of Systems Biology, Zhejiang-California International Nanoscience Institute (ZCNI), Zhejiang University, Hangzhou, P.R. China. 2006-2009

Description of organizations and positions:

(1) IFXworks LLC is a life science informatics consulting company headquartered in the Washington, DC area. I am one of the founding members and provide bioinformatics consulting services in health IT, next-generation sequencing technologies, bioenergy genomics, and network and systems biology.
(2) ZCNI is a joint venture of the Institute of Systems Biology, the California Nanoscience Institute at UCLA, and Zhejiang University. Zhejiang University is located in Hangzhou (my hometown in China) and is one of the top 3 universities in China. My appointment is a courtesy appointment as adjunct faculty to provide consultation and guidance in establishing computational cyberinfrastructure for systems biology research at the institute.

Duty/projects:

(1) Building basic infrastructure for IFXworks for next-generation sequencing data management, genomic sequence assembly and analysis, human disease and trait genotype-to-phenotype association studies, and data analysis and processing for plant and microbial bioenergy genomics.
(2) Building informatics prototypes for genomics, transcriptomics, and proteomics data analysis, high-throughput processing, and management.
(3) Developing grant proposals and contract applications for caBIG, health IT, and the application of next-generation sequence analysis to personalized medicine and data management.

Computational and Systems Biologist
Virginia Bioinformatics Institute (VBI) and University of Virginia (UVa), VA. 2005-2009

Description of organizations and positions:

(1) I was a research investigator at VBI, a systems biology organization with a strong presence in bioinformatics, performing tasks related to network biology, genome annotation, transcription profiling, proteomics, and metabolic profiling data management and analysis.
(2) I also worked at the Center for Academic Computing Health Sciences and at the W.M. Keck Foundation Center for Biomedical Mass Spectrometry (a UVa research support facility), conducting collaborative bioinformatics research with faculty working in systems biology.
(3) I was jointly appointed as a contract-based research assistant professor affiliated with the Department of Microbiology and as a research scientist affiliated with the Department of Biology. The position depended on funded grants and cost-recovery service fees from collaborating faculty.

Duty/projects:

At VBI, I was the project manager for the Administrative Center funded through the NIH/NIAID National Biodefense Proteomics Program, which has 7 Proteomics Research Centers across the nation, including Scripps, Harvard Proteomics Institute, University of Michigan, PNNL, and Myriad Genetics. My duties were:
(1) design and implementation of bioinformatics cyberinfrastructure for a genomics, microarray, and proteomics data processing and management system;
(2) integration of various public and private proteomics and protein-protein interaction network knowledge datasets;
(3) data analysis and network inference on proteomics datasets such as 2D gel, mass spectrometry, Y2H, and NMR.

At UVa, I was a research faculty member collaborating with medical researchers and plant scientists on:
(1) proteomics profiling studies, data management and analysis, search engine comparison, algorithm development, and high-throughput distributed computing for collaborative research with UVa Health System proteomics scientists;
(2) genome-scale genespace sequencing, assembly, annotation, data integration, and management for cowpea, common bean, Striga, and tobacco;
(3) cowpea and tobacco microarray chip design, transcriptional profiling studies, data analysis, and management;
(4) comparative genome analysis among Populus, Medicago, Arabidopsis, rice, tobacco, and cowpea for species-specific gene and pathway discovery;
(5) genome-scale bioinformatics analysis of transcription factors in legume species and construction of a transcription factor knowledge base.

Project Manager of Computational Proteomics/Senior Scientist of Bioinformatics and Chemo-informatics
Myriad Proteomics, Inc. / Prolexys Pharmaceuticals, Inc., Salt Lake City, UT. 2002-2005

Description of the organization:

Prolexys Pharmaceuticals, formerly Myriad Proteomics, is a human proteomics and drug discovery company. The company is the leader in mapping the genome-scale human protein-protein interaction network and was a subsidiary of Myriad Genetics, Inc.

Duty/projects:

(1) Construction of an automated sequence processing pipeline, including raw trace assembly, vector/adaptor clipping, contamination screening, annotation of raw reads, clustering of ESTs from interaction libraries, domain/motif identification, and mapping of the assembled contigs to the human genome, RefSeq, and LocusLink.
(2) Design and implementation of analysis pipelines, databases, and visualization tools for raw sequence data, literature data, curation data, and in-house human yeast two-hybrid (Y2H) and mass spectrometry protein-protein interaction and quantification data.
(3) Inference of protein-protein interaction networks from Y2H data, protein fingerprints, mass spectrometry pull-down assays, and public knowledge bases such as BIND, DIP, and YPD.
(4) Data management for assay data from high-throughput small-molecule screening for drug discovery on validated targets, using IDBS’s ActivityBase for protocol, template, and test set construction. I am a domain expert on ActivityBase, a major database management system for drug discovery and HTS.
(5) Chemical compound management for HTS, chemical compound library diversity and similarity analysis, compound/library selection, and compound clustering and partitioning.

Project Manager of Computational Genomics/Senior Scientist of Computational Biology
Ceres Inc., Los Angeles, CA, and Monsanto Global Headquarters, St. Louis, Missouri. 1997-2002

Description of the organizations:

Ceres, a subdivision partially owned by Monsanto with over 250 employees, is an agricultural functional genomics company. Ceres has one of the best agricultural genomics programs in the world. Monsanto is a multinational agricultural giant with business in agricultural productivity, seeds, and genomics.

Duty/projects:

(1) Construction of metabolic pathway and protein-protein interaction databases for yeast, crop, and microbial sequence data.
(2) Transcriptional profiling and genomics data mining and knowledge discovery for gene lead identification to support development of agriculturally important traits.
(3) Software engineering for promoter prediction and annotation, and construction of a regulatory sequence database.
(4) Maintenance and enhancement of the genomic sequence processing and annotation pipeline, covering raw trace assembly, gene modeling, promoter prediction, comparative genomics, and protein family classification.
(5) Maintenance and enhancement of high-throughput genome analysis for full-length cDNA sequences and of the EST clustering and assembly pipeline for the top 10 business-critical species: human, mouse, dog, pig, soybean, maize, rice, wheat, cotton, and Arabidopsis.
(6) Comparative genomics between human and mouse, and among Arabidopsis, rice, maize, wheat, and other crop genomes.
(7) Processing of over 170 microbial genomes for gene prediction and inter-species clustering for comparative genomics through the Incyte Pathoseq bio-analysis dataflow, and development of non-homology-based annotation methodologies such as phylogenetic profiling, co-expression profiling, Rosetta stone pattern, and gene neighboring.
(8) Member of the Genomics Source Team for strategic planning for Monsanto agricultural genomics and crop biotechnology; also a member of the company-wide Research Hardware Redesign Team for Computer Farm and Networked File Server Optimization.
(9) I was a leading computational scientist over these years and managed groups of 3-20 people in project teams at Monsanto, including Ph.D.-level scientists and senior software engineers, for the curation team, the microbial genomics team, the sequence processing pipeline team, and others. The size of my group changed with the complexity and priority of the projects and business operations.

Selected Latest Publications and Patents (out of over 30 publications and over 20 patents):

1. Michael P. Timko, Paul J. Rushton, Thomas W. Laudeman, Marta T. Bokowiec, Edmond Chipumuro, Foo Cheung, Christopher D. Town, and Xianfeng Chen, 2008. Sequencing and Analysis of the Gene-Rich Space of Cowpea. BMC Genomics. 2008, 9:103.
2. Paul J. Rushton, Marta T. Bokowiec, Thomas W. Laudeman, Jennifer F. Brannock, Xianfeng Chen, and Michael P. Timko, 2008. TOBFAC: The Database of Tobacco Transcription Factors. BMC Bioinformatics. 2008, 9:53.
3. Paul J. Rushton, Marta T. Bokowiec, Shengcheng Han, Hongbo Zhang, Jennifer F. Brannock, Thomas W. Laudeman, Xianfeng Chen, and Michael P. Timko, 2008. Bioinformatics Analysis on Tobacco Transcription Factors: Novel Insights into Transcriptional Regulation in the Solanaceae. Plant Physiology. 2008, 147:280-295.
4. Xianfeng Chen et al. (co-authored with about 10 co-inventors), 2008. Expression of microbial proteins in plants for production of plants with improved properties. United States Patent in Bioinformatics and Biotechnology. United States Patent Publication Number: 10369493 (US 2003/0233675 A1). United States Filing Date: 20.02.2003. United States Publication Date and Number: 1.12.2003 (20030233675). Granted United States Patent Number: 7,314,974. Officially Granted Date: 02.05.2008.
5. Xianfeng Chen et al. (co-authored with about 20 inventors), 2008. Transgenic Plants with Enhanced Agronomic Traits. International Patent in Bioinformatics and Biotechnology. Publication Number: WO/2008/021543. International Application Number: PCT/US2007/018368. Publication Date: 21.02.2008. International Filing Date: 17.08.2007.
6. Xianfeng Chen, Thomas W. Laudeman, Paul J. Rushton, Thomas A. Spraggins, and Michael P. Timko, 2007. CGKB: An Annotation Knowledge Base for Cowpea (Vigna unguiculata L.) Methylation Filtered Genomic Genespace Sequences. BMC Bioinformatics. 2007, 8:129.
7. Guoqing Lu, Liying Jiang, Resa M. Kotalik, Thaine W. Rowley, Luwen Zhang, Xianfeng Chen, and Etsuko N. Moriyama, 2006. GenomeBlast: A Web Tool for Small Genome Comparison. BMC Bioinformatics. 2006, 7(Suppl 4):S18.