Summary - Orpara

Kemin Zhou Ph.D.
14610 Via Bergamo, San Diego, CA 92127
Phone: (858) 366-8260
kmzhou4@yahoo.com
Summary
Bioinformatics scientist with background in computer science, genomics,
biochemistry, information technology, molecular biology, genetics, and drug target
discovery. Expert in eukaryotic genome annotation and comparative genomics.
Developed COMBEST algorithm to assemble billions of next-generation ESTs into
alternatively spliced gene models with digital expression levels. Recently wrote
two manuscripts on intron evolution using a comparative genomics approach. Able
to communicate and organize bioinformatics efforts with people from diverse
backgrounds.
Skills



Bioinformatics
o Drug Target Identification for Pharmaceutical Companies
o Database schema design for biological and genomic data
o Building bioinformatics infrastructure based on OpenSource products
o Expert experience in using public databases
o Genome annotation and molecular phylogeny
o Custom Affymetrix DNA-Chip (Microarray) Design and analysis
o Knowledge extraction (Web Crawler) from public databases using Perl
o Familiar with Chemoinformatics tool Pipeline Pilot
o Pipeline for genome annotation and data analysis
Computer Skills
o Expert knowledge in relational databases: MySQL, Postgres, and Oracle
o Expert Knowledge in web development technologies: PHP, Mod_perl,
JavaScript, CGI
o Fluent in many computer languages: C++, Perl, Java, Shell Script, SQL,
PL/SQL, AQL, HTML, DHTML, XML, SVG, JavaScript, Python, and
Ruby
o Expert in LINUX, Unix operating systems, and SGE
o Network Programming TCP/IP in C/C++, Java, and Perl.
Statistics Skills
o Basic training and knowledge in probability and statistics
o Fluent with R in data analysis
o Application of basic statistics in bioinformatics algorithm developments
Bioinformatics Endeavors

Probability model and simulation of intron loss
Exon alignment algorithm to compare exon structure at whole genome level
Proteomics: peptide analysis on genome scale
NREP algorithm: Combine gene models and build non-redundant set by removing
low quality models such as ‘pregnant’, chimeras, fragments, etc.
Algorithm for detection of fusion proteins
Wrote a paper on the ancient nature of alternative splicing and function of introns
Wrote a paper on intron number evolution
GeneWise Pipeline: More accurate and efficient pipeline to derive protein-homology models based on PFOG
CAIWE algorithm: combine predicted and EST gene models
PFOG algorithm: protein footprints on genome
COMBEST algorithm: Assemble billions of next-generation ESTs into
alternatively spliced gene models with digital expression levels
Invented Hornagram to analyze digital gene expression
Built a Vertebrate Ortholog/Paralog Database for regulatory element discovery
Implemented a versatile sequence alignment that works on databases
Implemented a single-linkage clustering algorithm using a novel HATREE data
structure
Built a G-Protein-Coupled Receptor (GPCR) Database From Public Resources
Wrote a Parser in C++ that Converts GenBank Format into ACEDB
Wrote a Java Driver to ACEDB
Wrote a C++ Driver to ACEDB
Develop libraries for bioinformatics efforts
Professional and Research Experience
Syngenta Jan 2012-Dec 2012
Bioinformatics Programmer (Consultant through Adecco)


Develop and maintain Java ETL for genetic and genomic databases.
Develop portals for bioinformatics tools.
JGI May 2005 – Dec 2011
Bioinformatics Systems Analyst
Troubleshoot, maintain, and improve JGI genome annotation pipeline and the underlying
MySQL-based database schema. Contribute new algorithms and data analysis procedures.





Develop and implement NREP algorithm for consolidating gene models from
diverse methods
Comparative genomic analysis of intron evolution
Contribute genome data analysis for various publications
Develop EST-based gene modeling method based on Illumina short reads
Design and implement automated procedures to do quality control
Convert JGI data into external release formats: GenBank, GFF3, and Flat files
Implement homology-based gene modeling module based on a new footprinting
algorithm
Develop and implement chimera detection algorithm for both improving the
quality of gene predictions and identifying fusion proteins
Implement procedures to map cosmids, fosmids, and proteins to genomes and display
the result in the JGI genome browser
Identify problems and bottlenecks in existing pipelines, and implement
procedures to improve the quality and efficiency of the pipelines
Develop pipelines based on parallel computing on SGE (Sun Grid Engine)
Develop pipeline to identify secreted and membrane proteins and display them in the genome browser
Develop procedures for mapping proteomics data onto genomes and display the
result in the genome browser
Train Fgenesh, both automatically and manually, for ab initio gene prediction
Orpara Nov 2004 - Dec 2005
Bioinformatics Consultant
Provide Bioinformatics Services to Small and Medium-Sized Biotech Companies





Provide informational services to small biotech companies
Web text mining to generate targeted email lists for particular biotech products
Web graphics development using SVG
Integrate Dynamic Sequence Alignment algorithms with Postgres DB using
C/C++
Develop statistical learning algorithms for identifying true orthologues based on
evolutionary relationships among vertebrates
Ferring Research Institute May 2002 - Nov 2004
Bioinformatics Scientist
In charge of all bioinformatics needs for Molecular Biology, in vitro and in vivo
Pharmacology.





Set up the bioinformatics hardware and software infrastructure based on
OpenSource packages such as Linux, Postgres Database, ACEDB, Perl, Java,
Apache, C++ and Personal Libraries, and Freely available Software Tools such as
Blast, Fasta, MEME, Dialign, and Phylip.
Established relational and object databases representing knowledge about all
completed vertebrate genomes.
Created and maintained G-protein-coupled Receptor (GPCR) Database.
Narrowed down prostate cancer drug targets that are GPCRs.
Analyze Gene Expression Arrays using GeneSpring and Relational Database.
Established a Web Interface for Custom and Commercial Chips, GPCR, Blast
Search of Internal Databases, and Pair-wise and Multiple Sequence Alignment
Servers.
Evaluated GPCR Coverage of Affymetrix Commercial Chip and Justified the
Decision to Design a Custom Chip with the Most Comprehensive Coverage of
GPCR, G-Proteins, and other Potential Drug Targets Based on the Most Current
Version of the Human Genome.
Computer-related Tasks










Hardware selection for Bioinformatics Needs
Configure the RAID to boost database performance
Compile and Build the Linux OS to optimize performance of the hardware
Compile gcc, gtk, apache, perl, and many other system libraries
Compile, Build, configure, and install database tools: Postgres and ACEDB
Compile, build and configure numerous open source bioinformatics tools
Develop novel algorithms for quick construction of in-house GPCRDB
Managing biological and sequencing data with SQL and PL/SQL
Web application development with Apache, mod_perl, and JavaScript
Web development with LogiXML and Oracle
Affymetrix (formerly Neomorphic), Bioinformatics Division June 2000 - April 2002
Staff Scientist II
Gene identification, characterization and sequence selection for DNA-chip design.
Construct a database of vertebrate orthologues and paralogues. Transcriptional regulatory
site discovery by comparative genomics.







Created a potential new product line by inventing a new method for genome-wide
nucleic acid footprinting (details in the patent)
Invented and implemented a novel Cluster Algorithm: an essential tool for
common bioinformatics tasks on large data sets, such as finding gene families,
clustering ESTs, alternative splicing, protein domain analysis, etc.
Constructed a novel database of orthologues/paralogues for vertebrates: valuable
starting material for data mining, such as finding regulatory elements and
comparative genomics
Provided key ideas behind the human U133 sequence selection, which resulted in
DNA microarrays that are much better than the previous HG95 design
Saved the company money by using Postgres in place of expensive commercial
relational databases such as Oracle for research and development
Wrote a Java Driver to the ACEDB object database involving a novel object
representation of bio-objects (equivalent to JDBC for relational databases)
Supported top management by surveying genomes from sequenced organisms in
a short time
Implemented a distance-based phylogenetic tree construction software in C++ and
a graphics display software in Java
Gene Modeling with Hidden Markov Model (HMM)
Computer skills





Relational Database Postgres: compile, configure, DBA and schema design to
handle large amounts of sequence and annotation data
Java 2D graphics (swing), JDBC for database connection, and threaded
programming
C++ for algorithm development
Perl for various projects
Use Oracle to perform quality control of gene-chip design
The Molecular Sciences Institute Oct 1996 - May 2000
Research Fellow under Sydney Brenner, Ph.D., Founder and President, and 2002 Nobel
Prize Winner
Comparative genomics of vertebrates





Comparative genomics approach to study cancer, write NIH grant based on
promising preliminary results
Pioneered the construction and Annotation of the Fugu Sequence Database
Characterized the genomic and gene features of Fugu with 1% of genomic
sequence data
Demonstrated that comparative genomics between Fugu and other vertebrates is
an efficient way to locate regulatory elements
Discovered the subtlety of mRNA-splicing in vertebrates by expressing Fugu
genes in mammalian cells.
Computer-Related Tasks







Use ACEDB to manage Fugu genomic and annotation data
Write C++/Perl programs to obtain and integrate public bioinformatics resources
Write C++ Programs for Macintosh to automate sample handling
Write C++ Programs to design oligonucleotides and compute their concentration
Manage lab reagents with ACEDB
Build and configure the Linux OS on Windows platforms
Operate and program robots for automated plasmid extraction
The Scripps Research Institute, Department of Immunology and Cell
Biology Jan 1995 - Oct 1996
Research Associate
Signal transduction by PAK protein kinase, PI3-kinases, and the Rho family of small
GTPases



Discovered the principle that cell type diversity correlates with the number of
guanine nucleotide exchange factors (GEF) in eukaryotes
Discovered that the downstream specificity of small GTPase signaling pathway is
dictated by the upstream GEF
Revealed the in vivo function of PI3-kinase in modulating the actin cytoskeleton
by double knockout of PI3-kinase genes.
University of California San Diego, Department of Biology Aug 1991 - Jan 1995
Postdoctoral Fellow
Role of heterotrimeric GTP-binding proteins and phosphatidylinositol kinases, and
protein kinases in regulating the development of Dictyostelium discoideum.



Discovered the G-alpha-8 gene
Revealed principle of redundancy of G-protein signal transduction pathways
Discovered the PI-kinase family in eukaryotes.
Education

Ph. D. Biochemistry & Molecular Biology, Department of Biochemistry Purdue
University, West Lafayette, Indiana. 1985-1991
Elucidated the molecular switching mechanism of a yeast transcriptional regulator
Leu3p


Certificate in English Proficiency, GELC Zhongshan University, Guangzhou,
China 1984-1985
B.S. Genetics & Breeding, China Agricultural University, Beijing, China. 1980-1984
Honors & Awards




1995-1996 NIH Training Grant T32, HL07195-20
1988-1990 David Ross Fellowship, Purdue University.
May 1986 Passed qualifying exams with outstanding distinction.
1984 Member of CUSBEA (China-US Biochemistry Examination Admission).
Each year about 50 top students from China in the fields of genetics, biochemistry,
and molecular biology were selected through very stringent scrutiny and sent to
pursue doctoral studies in the USA.
Patent
Affymetrix Patent Pending on a Novel Experimental Method to Discover Genome-wide
protein-binding sites on Nucleic Acids (Genome-wide footprint on DNA or RNA). 2002
Publications
K. Zhou, P.R.G. Brisco, A.E. Hinkkanen, G.B. Kohlhaw (1987). Structure of yeast
regulatory gene LEU3 and evidence that LEU3 itself is under general amino acid control.
Nucl. Acids Res. 15, 5261-5273.
K. Zhou, Y. Bai, and G.B. Kohlhaw (1990). Yeast regulatory protein Leu3: A structure-function analysis. Nucl. Acids Res. 18, 291-198.
K. Zhou and G.B. Kohlhaw (1990). Transcriptional activator LEU3 of yeast: Mapping of
the transcriptional activation function and significance of activation domain tryptophans.
J. Biol. Chem. 265, 17409-17412.
A.B Cubitt, F. Carrel, S. Dharmawardhane, C. Gaskins, J. Hadwiger, P. Howard, S.K.
Mann, K. Okaichi, K. Zhou, and R.A. Firtel. Molecular genetic analysis of signal
transduction pathways controlling multicellular development in Dictyostelium. Cold
Spring Harbor Symposia on Quantitative Biology, 1992, 57: 177-92.
L. Wu, C. Gaskins, K. Zhou, R.A. Firtel, and P.N. Devreotes. (1994). Cloning and
targeted mutations of Gα7 and Gα8, two developmentally regulated G proteins in
Dictyostelium. Mol. Biol. Cell. 5, 691-702.
K. Zhou, K. Takegawa, S.D. Emr, and R.A. Firtel. (1995). A phosphatidylinositol (PI)
kinase gene family in Dictyostelium discoideum: Biological roles of putative mammalian
p110 and yeast Vps34 PI 3-kinase homologs during growth and development. Mol. Cell.
Biol., 15: 5645-5656.
D. Wang, Y. Hu, F. Zheng, K. Zhou, G.B. Kohlhaw. (1997). Evidence that
intramolecular interactions are involved in masking the activation domain of
transcriptional activator Leu3p. Journal of Biological Chemistry 272(31): 19383-19392.
K. Zhou, S. Pandol, G. Bokoch, A. E. Traynor-Kaplan. (1998) Disruption of
Dictyostelium PI3K genes reduces [32P]phosphatidylinositol 3,4 bisphosphate and
[32P]phosphatidylinositol trisphosphate levels, alters F-actin distribution and impairs
pinocytosis. Journal of Cell Science. 111 ( Pt 2): 283-294.
K. Zhou, Y. Wang, J.L. Gorski, N. Nomura, J. Collard, and G.M. Bokoch (1998).
Guanine nucleotide exchange factors regulate specificity of downstream signaling from
Rac and Cdc42. Journal of Biological Chemistry 273(27): 16782-16786.
C.Y. Chung, T.B. Reddy, K. Zhou, R.A. Firtel (1998) A novel, putative MEK kinase
controls developmental timing and spatial patterning in Dictyostelium and is regulated by
ubiquitin-mediated protein degradation. Genes Dev. 12(22):3564-3578.
Palenik B, Grimwood J, Aerts A, Rouze P, Salamov A, Putnam N, Dupont C, Jorgensen
R, Derelle E, Rombauts S, Zhou K, Otillar R, Merchant SS, Podell S, Gaasterland T,
Napoli C, Gendler K, Manuell A, Tai V, Vallon O, Piganeau G, Jancek S, Heijde M,
Jabbari K, Bowler C, Lohr M, Robbens S, Werner G, Dubchak I, Pazour GJ, Ren Q,
Paulsen I, Delwiche C, Schmutz J, Rokhsar D, Van de Peer Y, Moreau H, Grigoriev IV.
(2007) The tiny eukaryote Ostreococcus provides genomic insights into the paradox of
plankton speciation. Proc Natl Acad Sci U S A. 104(18):7705-10.
Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, Terry A,
Salamov A, Fritz-Laylin LK, Maréchal-Drouard L, Marshall WF, Qu LH, Nelson DR,
Sanderfoot AA, Spalding MH, Kapitonov VV, Ren Q, Ferris P, Lindquist E, Shapiro H,
Lucas SM, Grimwood J, Schmutz J, Cardol P, Cerutti H, Chanfreau G, Chen CL, Cognat
V, Croft MT, Dent R, Dutcher S, Fernández E, Fukuzawa H, González-Ballester D,
González-Halphen D, Hallmann A, Hanikenne M, Hippler M, Inwood W, Jabbari K,
Kalanon M, Kuras R, Lefebvre PA, Lemaire SD, Lobanov AV, Lohr M, Manuell A,
Meier I, Mets L, Mittag M, Mittelmeier T, Moroney JV, Moseley J, Napoli C, Nedelcu
AM, Niyogi K, Novoselov SV, Paulsen IT, Pazour G, Purton S, Ral JP, Riaño-Pachón
DM, Riekhof W, Rymarquis L, Schroda M, Stern D, Umen J, Willows R, Wilson N,
Zimmer SL, Allmer J, Balk J, Bisova K, Chen CJ, Elias M, Gendler K, Hauser C, Lamb
MR, Ledford H, Long JC, Minagawa J, Page MD, Pan J, Pootakham W, Roje S, Rose A,
Stahlberg E, Terauchi AM, Yang P, Ball S, Bowler C, Dieckmann CL, Gladyshev VN,
Green P, Jorgensen R, Mayfield S, Mueller-Roeber B, Rajamani S, Sayre RT, Brokstein
P, Dubchak I, Goodstein D, Hornick L, Huang YW, Jhaveri J, Luo Y, Martínez D, Ngau
WC, Otillar B, Poliakov A, Porter A, Szajkowski L, Werner G, Zhou K, Grigoriev IV,
Rokhsar DS, Grossman AR. (2007) The Chlamydomonas genome reveals the evolution
of key animal and plant functions. Science. 2007 Oct 12;318 (5848):245-50.
Worden AZ, Lee JH, Mock T, Rouzé P, Simmons MP, Aerts AL, Allen AE, Cuvelier ML,
Derelle E, Everett MV, Foulon E, Grimwood J, Gundlach H, Henrissat B, Napoli C,
McDonald SM, Parker MS, Rombauts S, Salamov A, Von Dassow P, Badger JH,
Coutinho PM, Demir E, Dubchak I, Gentemann C, Eikrem W, Gready JE, John U, Lanier
W, Lindquist EA, Lucas S, Mayer KF, Moreau H, Not F, Otillar R, Panaud O, Pangilinan
J, Paulsen I, Piegu B, Poliakov A, Robbens S, Schmutz J, Toulza E, Wyss T, Zelensky A,
Zhou K, Armbrust EV, Bhattacharya D, Goodenough UW, Van de Peer Y, Grigoriev IV.
Green evolution and dynamic adaptations revealed by genomes of the marine
picoeukaryotes Micromonas. Science. 2009 Apr 10;324(5924):268-72.
Zhou K, Panisko EA, Magnuson JK, Baker SE, Grigoriev IV. Proteomics for validation
of automated gene model predictions. Methods Mol Biol. 2009;492:447-52.
Presentations

Cytoskeletal alterations in a Dictyostelium PI-3 Kinase double knockout mutant
exhibiting reduced levels of phosphatidylinositol trisphosphate. American Society
for Biochemistry and Molecular Biology. New Orleans, Louisiana. June 2-6, 1996.
FASEB J., 10: A1277.


Chimera Gene Detection Improves Genome Annotation Quality. Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York. May 10-14, 2006. The
Biology of Genomes, Page 356. Kemin Zhou and Igor Grigoriev
Fungal Intron Evolution: why a small genome has many introns. Fungal
Genetics. Asilomar. March 17-22, 2009.