Kemin Zhou Ph.D.
14610 Via Bergamo, San Diego, CA 92127
Phone: (858) 366-8260
kmzhou4@yahoo.com

Summary
Bioinformatics scientist with a background in computer science, genomics, biochemistry, information technology, molecular biology, genetics, and drug target discovery. Expert in eukaryotic genome annotation and comparative genomics. Developed the COMBEST algorithm to assemble billions of next-generation ESTs into alternatively spliced gene models with digital expression levels. Recently wrote two manuscripts on intron evolution using a comparative genomics approach. Able to communicate and organize bioinformatics efforts with people from diverse backgrounds.

Skills
Bioinformatics
o Drug Target Identification for Pharmaceutical Companies
o Database schema design for biological and genomic data
o Building bioinformatics infrastructure based on OpenSource products
o Expert experience in using public databases
o Genome annotation and molecular phylogeny
o Custom Affymetrix DNA-Chip (Microarray) Design and analysis
o Knowledge extraction (Web Crawler) from public databases using Perl
o Familiar with the Chemoinformatics tool Pipeline Pilot
o Pipeline for genome annotation and data analysis
Computer Skills
o Expert knowledge in relational databases: MySQL, Postgres, and Oracle
o Expert knowledge in web development technologies: PHP, Mod_perl, JavaScript, CGI
o Fluent in many computer languages: C++, Perl, Java, Shell Script, SQL, PL/SQL, AQL, HTML, DHTML, XML, SVG, JavaScript, Python, and Ruby
o Expert in Linux, Unix operating systems, and SGE
o Network Programming TCP/IP in C/C++, Java, and Perl.
Statistics Skills
o Basic training and knowledge in probability and statistics
o Fluent with R in data analysis
o Application of basic statistics in bioinformatics algorithm development

Bioinformatics Endeavors
Probability model and simulation of intron loss
Exon alignment algorithm to compare exon structure at whole genome level
Proteomics: peptide analysis on genome scale
NREP algorithm: Combine gene models and build a non-redundant set by removing low quality models such as ‘pregnant’, chimera, fragments, etc.
Algorithm for detection of fusion proteins
Wrote a paper on the ancient nature of alternative splicing and the function of introns
Wrote a paper on intron number evolution
GeneWise Pipeline: More accurate and efficient pipeline to derive protein-homology models based on PFOG
CAIWE algorithm: combine predicted and EST gene models
PFOG algorithm: protein footprints on genome
COMBEST algorithm: Assemble billions of next-generation ESTs into alternatively spliced gene models with digital expression levels
Invented Hornagram to analyze digital gene expression
Built a Vertebrate Ortholog/Paralog Database for regulatory element discovery
Implemented a versatile sequence alignment that works on databases
Implemented a single-linkage clustering algorithm using a novel HATREE data structure
Built a G-Protein-Coupled Receptor (GPCR) Database From Public Resources
Wrote a Parser in C++ that Converts GenBank Format into ACEDB
Wrote a Java Driver to ACEDB
Wrote a C++ Driver to ACEDB
Developed libraries for bioinformatics efforts

Professional and Research Experience
Syngenta Jan 2012 - Dec 2012
Bioinformatics Programmer (Consultant through Adecco)
Develop and maintain Java ETL for genetic and genomic databases.
Develop portals for bioinformatics tools.

JGI May 2005 - Dec 2011
Bioinformatics Systems Analyst
Troubleshoot, maintain, and improve JGI genome annotation pipeline and the underlying MySQL-based database schema.
Contribute new algorithms and data analysis procedures.
Develop and implement NREP algorithm for consolidating gene models from diverse methods
Comparative genomic analysis of intron evolution
Contribute genome data analysis for various publications
Develop EST-based gene modeling method based on Illumina short reads
Design and implement automated procedures to do quality control
Convert JGI data into external release formats: GenBank, GFF3, and flat files
Implement homology-based gene modeling module based on a new footprinting algorithm
Develop and implement chimera detection algorithm for both improving the quality of gene predictions and identifying fusion proteins
Implement procedures to map cosmids, fosmids, and proteins to genomes and display the result in the JGI genome browser
Identify problems and bottlenecks in existing pipelines, and implement procedures to improve the quality and efficiency of the pipelines
Develop pipelines based on parallel computing on SGE (Sun Grid Engine)
Develop pipeline to identify secreted and membrane proteins and display them in the genome browser
Develop procedures for mapping proteomics data onto genomes and display the result in the genome browser
Fgenesh training both automatically and manually for ab initio gene prediction

Orpara Nov 2004 - Dec 2005
Bioinformatics Consultant
Provide bioinformatics services to small and medium-sized biotech companies
Provide informational services to small biotech companies
Web text mining to generate targeted email lists for particular biotech products
Web graphics development using SVG
Integrate dynamic sequence alignment algorithms with Postgres DB using C/C++
Develop statistical learning algorithms for identifying true orthologues based on evolutionary relationships among vertebrates

Ferring Research Institute May 2002 - Nov 2004
Bioinformatics Scientist
In charge of all bioinformatics needs for Molecular Biology, in vitro and in vivo Pharmacology.
Set up the bioinformatics hardware and software infrastructure based on OpenSource packages such as Linux, Postgres Database, ACEDB, Perl, Java, Apache, C++ and personal libraries, and freely available software tools such as Blast, Fasta, MEME, Dialign, and Phylip.
Established relational and object databases representing knowledge about all completed vertebrate genomes.
Created and maintained a G-protein-coupled Receptor (GPCR) Database.
Narrowed down prostate cancer drug targets that are GPCRs.
Analyzed gene expression arrays using GeneSpring and a relational database.
Established a web interface for custom and commercial chips, GPCR, Blast search against internal databases, and pair-wise and multiple sequence alignment servers.
Evaluated GPCR coverage of the Affymetrix commercial chip and justified the decision to design a custom chip with the most comprehensive coverage of GPCRs, G-proteins, and other potential drug targets based on the most current version of the human genome.
Computer-related Tasks
Hardware selection for bioinformatics needs
Configure the RAID to boost database performance
Compile and build the Linux OS to optimize performance of the hardware
Compile gcc, gtk, apache, perl, and many other system libraries
Compile, build, configure, and install database tools: Postgres and ACEDB
Compile, build, and configure numerous open source bioinformatics tools
Develop novel algorithms for quick construction of in-house GPCRDB
Managing biological and sequencing data with SQL and PL/SQL
Web application development with Apache, mod_perl, and JavaScript
Web development with LogiXML and Oracle

Affymetrix (formerly Neomorphic), Bioinformatics Division June 2000 - April 2002
Staff Scientist II
Gene identification, characterization and sequence selection for DNA-chip design.
Construct a database of vertebrate orthologues and paralogues.
Transcriptional regulatory site discovery by comparative genomics.
Created a potential new product line by inventing a new method for genome-wide nucleic acid footprinting (details in the patent)
Invented and implemented a novel clustering algorithm: an essential tool for common bioinformatics tasks on large data sets, such as finding gene families, clustering ESTs, alternative splicing, protein domain analysis, etc.
Constructed a novel database of orthologues/paralogues for vertebrates: valuable starting material for data mining, such as finding regulatory elements and comparative genomics
Provided key ideas behind the human U133 sequence selection, which resulted in DNA microarrays that are much better than the previous HG95 design
Saved the company money by using Postgres in place of expensive commercial relational databases such as Oracle for research and development
Wrote a Java Driver to the ACEDB object database involving a novel object representation of bio-objects (equivalent to JDBC for relational databases)
Supported top management by surveying genomes from sequenced organisms in a short time
Implemented distance-based phylogenetic tree construction software in C++ and graphics display software in Java
Gene Modeling with Hidden Markov Model (HMM)
Computer skills
Relational Database Postgres: compile, configure, DBA and schema design to handle large amounts of sequence and annotation data
Java 2D graphics (Swing), JDBC for database connection, and threaded programming
C++ for algorithm development
Perl for various projects
Use Oracle to perform quality control of gene-chip design

The Molecular Sciences Institute Oct 1996 - May 2000
Research Fellow under Sydney Brenner Ph.D.
Founder and President, and 2002 Nobel Prize winner
Comparative genomics of vertebrates
Comparative genomics approach to study cancer; wrote NIH grant based on promising preliminary results
Pioneered the construction and annotation of the Fugu Sequence Database
Characterized the genomic and gene features of Fugu with 1% of genomic sequence data
Demonstrated that comparative genomics between Fugu and other vertebrates is an efficient way to locate regulatory elements
Discovered the subtlety of mRNA splicing in vertebrates by expressing Fugu genes in mammalian cells.
Computer-Related Tasks
Use ACEDB to manage Fugu genomic and annotation data
Write C++/Perl programs to obtain and integrate public bioinformatics resources
Write C++ programs for Macintosh to automate sample handling
Write C++ programs to design oligonucleotides and compute their concentration
Manage lab reagents with ACEDB
Build and configure the Linux OS on Windows platforms
Operate and program robots for automated plasmid extraction

The Scripps Research Institute, Department of Immunology and Cell Biology Jan 1995 - Oct 1996
Research Associate
Signal transduction by PAK protein kinase, PI3-kinases, and the Rho family of small GTPases
Discovered the principle that cell type diversity correlates with the number of guanine nucleotide exchange factors (GEF) in eukaryotes
Discovered that the downstream specificity of small GTPase signaling pathways is dictated by the upstream GEF
Revealed the in vivo function of PI3-kinase in modulating the actin cytoskeleton by double knockout of PI3-kinase genes.

University of California San Diego, Department of Biology Aug 1991 - Jan 1995
Postdoctoral Fellow
Role of heterotrimeric GTP-binding proteins, phosphatidylinositol kinases, and protein kinases in regulating the development of Dictyostelium discoideum.
Discovered the G-alpha-8 gene
Revealed principle of redundancy of G-protein signal transduction pathways
Discovered the PI-kinase family in eukaryotes.

Education
Ph.D. Biochemistry & Molecular Biology, Department of Biochemistry, Purdue University, West Lafayette, Indiana. 1985-1991
Elucidated the molecular switching mechanism of a yeast transcriptional regulator Leu3p
Certificate in English Proficiency, GELC, Zhongshan University, Guangzhou, China. 1984-1985
B.S. Genetics & Breeding, China Agricultural University, Beijing, China. 1980-1984

Honors & Awards
1995-1996 NIH Training Grant T32, HL07195-20
1988-1990 David Ross Fellowship, Purdue University.
May 1986 Passed qualifying exams with outstanding distinction.
1984 Member of CUSBEA (China-US Biochemistry Examination Admission). Each year about 50 top students in genetics, biochemistry, and molecular biology were selected from China under very stringent scrutiny and sent to pursue doctoral studies in the USA.

Patent
Affymetrix patent pending on a novel experimental method to discover genome-wide protein-binding sites on nucleic acids (genome-wide footprint on DNA or RNA). 2002

Publications
K. Zhou, P.R.G. Brisco, A.E. Hinkkanen, G.B. Kohlhaw (1987). Structure of yeast regulatory gene LEU3 and evidence that LEU3 itself is under general amino acid control. Nucl. Acids Res. 15, 5261-5273.
K. Zhou, Y. Bai, and G.B. Kohlhaw (1990). Yeast regulatory protein Leu3: A structure-function analysis. Nucl. Acids Res. 18, 291-298.
K. Zhou and G.B. Kohlhaw (1990). Transcriptional activator LEU3 of yeast: Mapping of the transcriptional activation function and significance of activation domain tryptophans. J. Biol. Chem. 265, 17409-17412.
A.B. Cubitt, F. Carrel, S. Dharmawardhane, C. Gaskins, J. Hadwiger, P. Howard, S.K. Mann, K. Okaichi, K. Zhou, and R.A. Firtel. Molecular genetic analysis of signal transduction pathways controlling multicellular development in Dictyostelium. Cold Spring Harbor Symposia on Quantitative Biology, 1992, 57: 177-92.
L. Wu, C. Gaskins, K. Zhou, R.A. Firtel, and P.N. Devreotes. (1994).
Cloning and targeted mutations of Gα7 and Gα8, two developmentally regulated G proteins in Dictyostelium. Mol. Biol. Cell. 5, 691-702.
K. Zhou, K. Takegawa, S.D. Emr, and R.A. Firtel. (1995). A phosphatidylinositol (PI) kinase gene family in Dictyostelium discoideum: Biological roles of putative mammalian p110 and yeast Vps34 PI 3-kinase homologs during growth and development. Mol. Cell. Biol., 15: 5645-5656.
D. Wang, Y. Hu, F. Zheng, K. Zhou, G.B. Kohlhaw. (1997). Evidence that intramolecular interactions are involved in masking the activation domain of transcriptional activator Leu3p. Journal of Biological Chemistry 272(31): 19383-19392.
K. Zhou, S. Pandol, G. Bokoch, A. E. Traynor-Kaplan. (1998) Disruption of Dictyostelium PI3K genes reduces [32P]phosphatidylinositol 3,4 bisphosphate and [32P]phosphatidylinositol trisphosphate levels, alters F-actin distribution and impairs pinocytosis. Journal of Cell Science. 111 (Pt 2): 283-294.
K. Zhou, Y. Wang, J.L. Gorski, N. Nomura, J. Collard, and G.M. Bokoch (1998). Guanine nucleotide exchange factors regulate specificity of downstream signaling from Rac and Cdc42. Journal of Biological Chemistry 273(27): 16782-16786.
C.Y. Chung, T.B. Reddy, K. Zhou, R.A. Firtel (1998) A novel, putative MEK kinase controls developmental timing and spatial patterning in Dictyostelium and is regulated by ubiquitin-mediated protein degradation. Genes Dev. 12(22):3564-3578.
Palenik B, Grimwood J, Aerts A, Rouze P, Salamov A, Putnam N, Dupont C, Jorgensen R, Derelle E, Rombauts S, Zhou K, Otillar R, Merchant SS, Podell S, Gaasterland T, Napoli C, Gendler K, Manuell A, Tai V, Vallon O, Piganeau G, Jancek S, Heijde M, Jabbari K, Bowler C, Lohr M, Robbens S, Werner G, Dubchak I, Pazour GJ, Ren Q, Paulsen I, Delwiche C, Schmutz J, Rokhsar D, Van de Peer Y, Moreau H, Grigoriev IV. (2007) The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A. 104(18):7705-10.
Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, Terry A, Salamov A, Fritz-Laylin LK, Maréchal-Drouard L, Marshall WF, Qu LH, Nelson DR, Sanderfoot AA, Spalding MH, Kapitonov VV, Ren Q, Ferris P, Lindquist E, Shapiro H, Lucas SM, Grimwood J, Schmutz J, Cardol P, Cerutti H, Chanfreau G, Chen CL, Cognat V, Croft MT, Dent R, Dutcher S, Fernández E, Fukuzawa H, González-Ballester D, González-Halphen D, Hallmann A, Hanikenne M, Hippler M, Inwood W, Jabbari K, Kalanon M, Kuras R, Lefebvre PA, Lemaire SD, Lobanov AV, Lohr M, Manuell A, Meier I, Mets L, Mittag M, Mittelmeier T, Moroney JV, Moseley J, Napoli C, Nedelcu AM, Niyogi K, Novoselov SV, Paulsen IT, Pazour G, Purton S, Ral JP, Riaño-Pachón DM, Riekhof W, Rymarquis L, Schroda M, Stern D, Umen J, Willows R, Wilson N, Zimmer SL, Allmer J, Balk J, Bisova K, Chen CJ, Elias M, Gendler K, Hauser C, Lamb MR, Ledford H, Long JC, Minagawa J, Page MD, Pan J, Pootakham W, Roje S, Rose A, Stahlberg E, Terauchi AM, Yang P, Ball S, Bowler C, Dieckmann CL, Gladyshev VN, Green P, Jorgensen R, Mayfield S, Mueller-Roeber B, Rajamani S, Sayre RT, Brokstein P, Dubchak I, Goodstein D, Hornick L, Huang YW, Jhaveri J, Luo Y, Martínez D, Ngau WC, Otillar B, Poliakov A, Porter A, Szajkowski L, Werner G, Zhou K, Grigoriev IV, Rokhsar DS, Grossman AR. (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science. 2007 Oct 12;318 (5848):245-50. 
Worden AZ, Lee JH, Mock T, Rouzé P, Simmons MP, Aerts AL, Allen AE, Cuvelier ML, Derelle E, Everett MV, Foulon E, Grimwood J, Gundlach H, Henrissat B, Napoli C, McDonald SM, Parker MS, Rombauts S, Salamov A, Von Dassow P, Badger JH, Coutinho PM, Demir E, Dubchak I, Gentemann C, Eikrem W, Gready JE, John U, Lanier W, Lindquist EA, Lucas S, Mayer KF, Moreau H, Not F, Otillar R, Panaud O, Pangilinan J, Paulsen I, Piegu B, Poliakov A, Robbens S, Schmutz J, Toulza E, Wyss T, Zelensky A, Zhou K, Armbrust EV, Bhattacharya D, Goodenough UW, Van de Peer Y, Grigoriev IV. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science. 2009 Apr 10;324(5924):268-72.
Zhou K, Panisko EA, Magnuson JK, Baker SE, Grigoriev IV. Proteomics for validation of automated gene model predictions. Methods Mol Biol. 2009;492:447-52.

Presentations
Cytoskeletal alterations in a Dictyostelium PI-3 Kinase double knockout mutant exhibiting reduced levels of phosphatidylinositol trisphosphate. American Society for Biochemistry and Molecular Biology. New Orleans, Louisiana. June 2-6, 1996. FASEB J., 10: A1277.
Chimera Gene Detection Improves Genome Annotation Quality. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York. May 10-14, 2006. The Biology of Genomes, Page 356.
Kemin Zhou and Igor Grigoriev. Fungal Intron Evolution: why a small genome have many introns. Fungal Genetics. Asilomar. March 17-22, 2009.