Laboratory of Structural Biology, Tsinghua University
Structural Genomics & Proteomics:
A big stage for the e-Science
Structural Genomics & Structural Proteomics,
Disease and Innovative Drugs
GENE
PROTEIN
STRUCTURE
FUNCTION (DRUG)
With access to sequences of entire human genomes plus
those of various model organisms and many important microbial
pathogens, structural biology is on the verge of a dramatic
transformation. Our newfound wealth of sequence information
will serve as the foundation for an important initiative in
structural genomics. We are poised to embark on a systematic
program of high-throughput X-ray crystallography and NMR
spectroscopy aimed at developing a comprehensive view of the
protein structure universe. Structural genomics will yield a large
number of experimental protein structures (tens of thousands) and
an even larger number of calculated comparative protein structure
models (millions). This enormous body of structural data will be
freely available, and promises to accelerate scientific discovery in
all areas of biologic science, including biodiversity and evolution
in natural ecosystems, agricultural plant genetics, breeding of
farm and domestic animals, and human health and disease.
-- Stephen K Burley (Nature Structural Biology Volume 7 S)
Structure Genomics Initiative
in China
—— Another 1% ?
1% Human Genome Sequence
& a Draft of Whole Rice
Genome Sequencing
China is one of the six countries that
have provided facilities for the Human
Genome Project (HGP). Since completion of
the HGP, China-based scientists have
continued to make achievements in
genomic studies, including the recent
sequencing of a rice genome, reported
in “Science”.
Three major Genome Research Centers in
China have been involved in the HGP:
BGI (Beijing Genomics Institute, CAS)
Beijing Genomics Institute (BGI) is the largest non-profit genomics
research institute in China. Founded in July 1999 by a group of overseas
Chinese scientists, BGI has been growing rapidly with support from
the Chinese Academy of Sciences.
Major Sequencing Projects in BGI:
The Human Genome
Thermoanaerobacter tengcongensis Genome
The Porcine Genome
The Super Hybrid Rice Genome Project
Chinese National Human Genome Center,
Beijing
http://www.chgb.org.cn
CHGC (Chinese National Human Genome
Center at Shanghai)
The Chinese National Human Genome Center at Shanghai was established
on March 4th, 1998 and unveiled its nameplate on October 29th, 1998.
The Center is supported by the national MST, the Municipal
Government and the Chinese Academy of Sciences (CAS).
• Human Genome Sequencing
• Pathogenic Microbe Genome Sequencing
• EST Sequencing and Full-length cDNA cloning
• Disease Gene Positional Cloning and Scanning
Structural Biology in China
In 1967, China set up one of the earliest protein
crystallographic research institutions, the Peking Insulin Structure
Group, CAS, which solved the crystal structure of pig insulin in 1972.
In the following years, China has successfully exchanged
expertise with other countries all over the world. At present,
about 10 major research groups based in China are
undertaking structural biology studies.
Institute of Biophysics, CAS
(Former Peking Insulin Structure Group)
Peking Insulin Structure Group (1973).
Crystal structure of insulin (1.8 Å)
UK & China collaboration in structural biology was
established by Professor Hodgkin in the early 1970s
Institute of Biophysics, CAS
Structural Biology Project,
National Biology
Macromolecular Laboratory
Protein Crystallography
Project
Protein Crystallography
Project, National Biology
Macromolecular Laboratory
Copper-containing nitrite reductase mutant (1.65 Å)
Protein Engineering Project
Structure of trichosanthin
Agkistrodotoxin dimer linked by Ca2+
Mammal-specific scorpion neurotoxin
Shanghai Institute of
Organic Chemistry, CAS
HIV-RT
Trypsin-cleaved fragment of recombinant cytochrome b5
Shanghai Institute of
Biochemistry, CAS
(Dr Jianping Ding’s group)
Fujian Institute of
Structural Chemistry,
CAS
Two trichosanthin molecules in the asymmetric unit
Thermostable catechol 2,3-dioxygenase (2.0 Å)
Fudan University
University of Science & Technology of
China, CAS
NMR Project,
Structural Biology
Laboratory,
USTC
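Solution conformation of the hairpin DNA GCGCGAAAC-T-GTTTCGCGC,
from 2D NMR measurements and molecular dynamics simulation. The red
and green lines show the IRMA results obtained with A-DNA and B-DNA,
respectively, as the starting structure.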
Protein Crystallography
Project,
Structural Biology
Laboratory,
USTC
C-type lectin-like protein
(a component of Agkistrodon acutus snake venom)
Protein Structure Project
Peking University
Protein Structure Project,
Physical Chemistry Institute
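Oxyhemoglobin of the bar-headed goose and the greylag goose
NMR Project
Ternary complex of porcine trypsin with mung bean inhibitor (2.7 Å)
EM Project
α-Amylase inhibitor
Professor Ming Lou Group
Rice dwarf virus (RDV) inner shell: backbone structure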
Proposed by Professor Dong-Cai Liang, the former
leader of the Peking Insulin Structure Group.
Structure Genomics Initiative
in China
Supported by the National Biotechnology R&D Program and the
National Frontier Research Program of the Ministry of
Science & Technology, together with the National Natural
Science Foundation, China officially started its "Structural
Genomics Initiative" at the end of last year. Building on our
achievements in genomic and medical studies, we have
initiated pilot studies focused on human disease-related
proteins. The aims of the initiative are to establish a fully
integrated, high-throughput system covering target gene selection;
expression and purification of selected proteins; protein
chemistry and biochemical characterization; and protein
crystallization and X-ray crystallography; and, finally, to lead
to structure-based functional genomics and drug discovery.
Now, we have two “SGI” centers in China:
1. CAS (Chinese Academy of Sciences) (Y Shi): 150 structures?
   Institute of Biophysics, Academia Sinica (D Wang)
   Shanghai Institute of Biochemistry (J Ding)
   University of Science & Technology of China (Y Shi)
2. MOE (The Ministry of Education) (Z Rao): 300 structures?
   Tsinghua University (Zihe Rao)
   Peking University (Min Luo)
Through joint efforts in China, as well as international
collaborations, our ultimate goal is to determine the structures
of more than 1% of the total proteins expressed from the human
genome (say, >1,000 structures), or more than 1% of the protein
folds still to be discovered.
(MOE Center) Structural Genomics Initiative
of Tsinghua University
In December 2001, Tsinghua University began its Structural
Genomics Initiative project for high-throughput structural and
functional studies of proteins, which is funded by 863 (MOST).
The mission of the Tsinghua Structural Genomics Initiative
(TSGI) is to select human disease-related genes of
unknown structure and determine 150(?) structures (new fold,
or sequence homology of less than 30% against the PDB) by
high-throughput methodologies in 5 years. We also focus on
human proteins with potential new folds.
Strategy:
Our main collaborators have offered to provide 840 soluble
proteins:
1. 200 expression vectors (soluble) are expected from the group led by
Prof. Qian Bo-qun, Chinese National Human Genome Center, Beijing.
2. The Cancer Institute, Chinese Academy of Medical Sciences, will supply 500
tumor-related genes or proteins (soluble).
3. The group led by Prof. He Fuchu from AMMS will provide as many as
100 expression vectors (soluble).
4. More than 40 soluble tumor antigens, selected by means of an antibody
library selection platform, will be listed within the coming 5 years by the
group led by Prof. Chen Zhinan from the FMMU.
We also expect to produce soluble proteins in our own lab:
A high-throughput group led by Prof. Pang Hai is to provide 1000(?)
constructs for protein expression (soluble).
Strategy:
Target selection strategy
Target Sources
1. Scientific literature: CNS
2. Partners: CAS, National Genome Center, etc.
3. Public databases: NCBI, etc.
Choosing an appropriate target source is a key step in
effective target selection. We have developed several
strategies to address this seemingly tough issue.
In my personal view, there are two poles among the
strategies for choosing SG (structural genomics) targets:
1) issue-oriented targets, regardless of technical obstacles;
2) low-hanging fruit, regardless of functional relevance.
Of course, a combination of the two strategies is the
best solution to target selection.
Selective Filter
Full length analysis
1. Have structures in PDB? (Similarity <30%)
2. Gene size? (>2.0 kb)
3. Have conserved domain? (New or old gene?)
Domain analysis
Single or multiple domain? (function clues)
Property prediction
1. Have signal peptide?
2. Have transmembrane regions?
3. Secondary structure, pI, Mw, Cys%, Met%, extinction
Fold recognition
3D structure prediction by threading
Crystallizable Targets
1. Soluble and stable
2. No transmembrane region
3. Appropriate protein size
4. No low complexity region
Targets Prioritization
1. Medical relevance or functional significance
2. Protein/protein or protein/nucleic acid complex
3. Property
a. pI
b. Number of methionines (MAD)
c. Solubility and stability
Target Database
Strategy:
TSGI’s Target selection platform modules
Routine modules already contained in the current selection platform:
• Automated modules:
a. Threading: z-score (against PDB)
b. PSI-BLAST: homology, E-value (against PDB)
c. TMHMM: signal peptide, TMHs, topology
d. SeqQ: solubility prediction
• Manual modules:
e. ExPASy - ProtParam: Mw, pI, extinction, Cys%, Met%
   - GOR IV: secondary structure prediction
f. Pfam/SMART: domain boundary determination
and function annotation (>20 kD protein)
g. NCBI (PubMed, OMIM)/Genecard: reference papers
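The composition-based properties listed in the manual modules (Cys%, Met%, extinction) are simple enough to estimate locally before a sequence is sent to a web server. The following is a minimal sketch, not the TSGI code or ExPASy's implementation: the function name `sequence_properties` is hypothetical, and the extinction estimate assumes the commonly used per-residue values at 280 nm (Trp 5500, Tyr 1490 M^-1 cm^-1) with all cysteines reduced.

```python
from collections import Counter

# Assumed molar extinction contributions at 280 nm (M^-1 cm^-1).
# Cystine is ignored, i.e. every cysteine is treated as reduced.
EXT_TRP = 5500
EXT_TYR = 1490


def sequence_properties(seq):
    """Return length, Cys%, Met%, Met count and an estimated extinction value.

    `seq` is a protein sequence in one-letter code. This helper is
    illustrative only and is not part of any published TSGI pipeline.
    """
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    return {
        "length": n,
        "cys_percent": 100.0 * counts["C"] / n,
        "met_percent": 100.0 * counts["M"] / n,
        # The Met count matters for MAD phasing with selenomethionine.
        "met_count": counts["M"],
        "extinction_280": counts["W"] * EXT_TRP + counts["Y"] * EXT_TYR,
    }


if __name__ == "__main__":
    print(sequence_properties("MKWYCCMA"))
```

Mw and pI are left out on purpose: they need residue mass and pKa tables, which are better taken from an established tool than retyped by hand.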
Strategy:
Target selection process
The target selection strategy of our laboratory, which is based
on bioinformatics programs, algorithms and public databases,
is designed to meet the demands of high-throughput structure
determination.
We have been developing an automatic target selection
platform that integrates existing public servers and databases
with local BLAST and threading programs, speeding up the
conversion from bit-by-bit lab work to high-throughput
methodologies.
• Legend:
A: Files or databases
B: Program developed by TSGI
C: Publicly available server/software
D: Manual procedure
Target selection program (1)
Extract
Hs_seq_uniq
Target selection program (2)
Batch file / C shell writer
Target_sorting
PDB_new fold_sorting
Example: Target selection results
We have selected ~2500 tractable genes from several
different databases. All the target candidates meet
the following criteria:
1. E-value > 10^-4;
2. No more than one transmembrane region;
3. Homology < 30%;
4. Number of amino acids < 600.
Nearly 600 genes have ‘no hits found’ against the
PDB database by PSI-BLAST. We expect to find potential
new folds in this candidate pool by threading.
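The four criteria above amount to a simple predicate over per-sequence annotations. The sketch below shows one way to express it, assuming the PSI-BLAST and TMHMM results have already been parsed into a record. The `Candidate` class, its field names and `is_tractable` are hypothetical and are not taken from the TSGI platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record of pre-computed annotations for one sequence."""
    name: str
    length_aa: int
    tm_regions: int                      # predicted transmembrane helices
    best_pdb_evalue: Optional[float]     # None means 'no hits found'
    best_pdb_identity: Optional[float]   # percent identity of the best PDB hit


def is_tractable(c, min_evalue=1e-4, max_tm=1, max_identity=30.0, max_length=600):
    """Apply the four slide criteria; thresholds default to the slide values."""
    if c.length_aa >= max_length:
        return False
    if c.tm_regions > max_tm:
        return False
    if c.best_pdb_evalue is None:
        # No PDB hit at all: passes the homology criteria trivially.
        return True
    if c.best_pdb_evalue <= min_evalue:
        return False
    if c.best_pdb_identity is not None and c.best_pdb_identity >= max_identity:
        return False
    return True


def has_no_pdb_hit(c):
    """Candidates for new-fold searching by threading."""
    return c.best_pdb_evalue is None


if __name__ == "__main__":
    pool = [
        Candidate("novel", 250, 0, None, None),
        Candidate("close_homolog", 300, 0, 1e-20, 80.0),
        Candidate("too_long", 700, 0, None, None),
        Candidate("membrane", 200, 2, None, None),
    ]
    print([c.name for c in pool if is_tractable(c)])
```

Keeping the thresholds as keyword arguments makes it easy to rerun the filter with other cut-offs.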
Strategy:
The power of our target selection strategy
Target source               Total    Contain CDS       Tractable & unknown structure
Disease-related gene program:
1. HBV-related gene         -        -                 7
2. Leukemia-related gene    169      156               29
3. Genecard                 2100     1264              84
New folds searching program:
1. Archaea proteome         2802     -
2. Unigene cDNA/EST         95927    10311 (<600aa)    583*/2303$
Note:
*: No hits found in PDB;
$: E-value: 0.01, TMHs: 1, Homology: 30%, a.a. number: 600;
0-200aa: 3035 (2858 non-transmembrane proteins, 94.2%)
200-400aa: 4279 (3735 non-transmembrane proteins, 87.3%)
400-600aa: 2997 (2567 non-transmembrane proteins, 85.7%)
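The size-bin figures in the note can be cross-checked with a few lines of arithmetic. In this sketch the bin totals are the slide's numbers. The non-transmembrane count for the 200-400aa bin is taken as 3735, which is an assumption: it is the value consistent with the stated 87.3%. The name `summarize` is hypothetical.

```python
# (size range, total sequences, non-transmembrane sequences)
# The 200-400aa non-transmembrane count (3735) is an assumed value chosen
# to agree with the stated 87.3%; the other numbers are as listed.
SIZE_BINS = [
    ("0-200aa", 3035, 2858),
    ("200-400aa", 4279, 3735),
    ("400-600aa", 2997, 2567),
]


def summarize(bins):
    """Return the grand total and the non-transmembrane percentage per bin."""
    total = sum(n for _, n, _ in bins)
    percentages = {
        label: round(100.0 * non_tm / n, 1) for label, n, non_tm in bins
    }
    return total, percentages


if __name__ == "__main__":
    total, percentages = summarize(SIZE_BINS)
    print(total)        # sum of the three bins
    print(percentages)  # non-transmembrane share per bin
```

The three bins sum to 10311, which matches the number of Unigene sequences under 600 aa given in the table.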
Experimental progress I
Pipeline stages: Target Selection, Molecular Cloning, Protein Purification,
Crystallization, Data Collection, Structure Determination, PDB, Publication
Targets: Gl7aca acylase, TTR C2, SAK, C1027, TRXL, Histone, S100P, MGC,
Fabp3, 111, HBV X, AFP, Decorin, HCC1, XIP, AIP, HAF4, HCA56, HCA58, SSPC,
CAB1, TDO, SARG, HPV E6, BCO, SCPT, SH, STM, TFX, 5031
PDB entries: 1GHD, 1GHE, 1C76, 1J48, 1GH2, 1KU5, 1J55
Experimental progress II
Pipeline stages: Target Selection, Molecular Cloning, Protein Purification,
Crystallization, Data Collection, Structure Determination, PDB, Publication
Targets: H3, LKB1, WNK1, LKB1_D, Mad2, NDPP1, DIO1, HCCA2, NDPP1, TUCAN,
CARD8, Y14, MAPKKK3, Mad3, RHAMM, RPL27, CDC20, BUB3, DEK, Pinx, HIRIP3,
Rb1, E2F1AD, HIRAH4, HIRAH2B, NASP, NAP2, H2A, H2B, H4S, NAP1, H1S
Experimental progress III
Pipeline stages: Target Selection, Molecular Cloning, Protein Purification,
Crystallization, Data Collection, Structure Determination, PDB, Publication
Targets: FOP, GAS41, SMARCB1, SIX3, SIM1, SH3GL1, SGSH, SGCG, SEDL, SCA1,
SARDH, SAH, RPS19, RLBP1, RFXAP, RFXANK, REA, RAMP, RAG2, AIP, HCC1, HCA56,
HCA58, XAF4, XIP, SCG3, IKBKG, HMGIC, HNMT, BCL7A, AGT, CLN5
Experimental progress IV
Pipeline stages: Target Selection, Molecular Cloning, Protein Purification,
Crystallization, Data Collection, Structure Determination, PDB, Publication
Targets: MCCC2, MGAT2, LOR, MDS1, TRH, ITPA, SEPN1, FANCE, ACT, APOA2, NEU1,
PDGFRL, TNNT1, BLNK, PEX3, PABPN1, MSF, PDHA1, ING1, P25, ARTEMIS, AMY,
C10ORF2, TCAP, BBS4, NET1, BCS1L, LGS, PRCC, TPT, EPM2A, AAAS
Strategy:
Tsinghua Structural
Genomics Initiative
(TSGI) web page:
xtal.tsinghua.edu.cn
Strategy:
Target selection platform hardware
The target selection and sequence analysis of human
genome-scale cDNA sequences from the NCBI UniGene
database are carried out on the supercomputer at the
Tsinghua High Performance Computing Institute (THPCI),
one of our close collaborators. The computer cluster at
THPCI has 34 four-CPU Pentium III Xeon units, which can
run ~1000 threading jobs in parallel in just one day.
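The quoted cluster figures give a rough planning estimate. The sketch below assumes that ~1000 jobs per day is the throughput of the whole cluster and that each sequence is one threading job. Neither assumption is stated on the slide, and `days_to_thread` is a hypothetical helper.

```python
import math

UNITS = 34             # cluster units, from the slide
CPUS_PER_UNIT = 4      # CPUs per unit, from the slide
JOBS_PER_DAY = 1000    # approximate figure; assumed to be cluster-wide


def total_cpus():
    """Number of CPUs in the cluster."""
    return UNITS * CPUS_PER_UNIT


def days_to_thread(n_sequences, jobs_per_day=JOBS_PER_DAY):
    """Whole days needed, assuming one threading job per sequence."""
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return math.ceil(n_sequences / jobs_per_day)


if __name__ == "__main__":
    print(total_cpus())
    # 583 sequences had no PDB hit; 10311 Unigene sequences are under 600 aa.
    print(days_to_thread(583), days_to_thread(10311))
```

On these assumptions the no-hit pool fits into a single day of cluster time, while the full set of sequences under 600 aa needs about a week and a half.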
Strategy:
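High-throughput cloning, expression
Next-generation high-throughput molecular cloning methods: Gateway Technology,
NEB intein System, T vector….
Next-generation molecular cloning methods
Techniques based on homologous recombination avoid the low-efficiency
bottleneck caused by the restriction endonucleases and ligases used in
traditional molecular cloning. They greatly improve flexibility and
reliability, and shorten the turnaround for cloning genes for expression.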
Strategy:
High-throughput purification, crystallization,
data collection, structure solution
An automated high-throughput protein crystallization system is needed.
Strategy:
Evaluation of high-throughput structure
determination methods
Data processing
• Goal is to derive intensities of diffraction spots (and their standard
deviations) from X-ray images and reduce data to appropriate
crystallographic space group.
• Many fast, user-friendly software suites for data processing.
• Most popular software suite is the HKL (DENZO) package.
• Alternatives include MOSFLM, DLS, D*TREK, DPS, POW
Phasing
• SOLVE program was first to provide fully automated phasing
from a MIR experiment
• Automated versions of other software (e.g. Auto-SHARP
and CHART) soon available.
• Direct methods approaches (SnB, SHELXD) very efficient but
only provide positions of heavy atom sites.
• CNS/CCP4i can proceed from heavy atom sites or molecular
replacement solution to density map - require user intervention
Model building
• Many maps are of average-to-poor quality - not straightforward
to build accurate model.
• Interactive (i.e. manual) model building is very time consuming and
introduces errors into the model.
• Tools exist to build a model from Cα coordinates; accurate recognition
of Cα positions remains a challenge.
• ARP/wARP provides complete automation for building model
structures at resolutions higher than 2.3 Å
Refinement
• Adjustment of parameters of the model through minimization of
residuals between experimental diffraction amplitudes (Fobs) and those
calculated from a model (Fcalc).
• Commonly used programs: CNS and REFMAC (CCP4)
• Other programs: TNT, SHELXL (for high resolution refinement),
BUSTER
• Explicit definition of stereochemical restraints required.
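The residual between observed and calculated amplitudes is usually summarized as the crystallographic R-factor, R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|. The sketch below computes that summary statistic only. It assumes the two amplitude lists are already on a common scale and in the same reflection order, and it is not the target function that CNS or REFMAC actually minimize. The function name `r_factor` is hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor for matched lists of structure factor amplitudes.

    Assumes f_obs and f_calc are on the same scale and listed in the same
    reflection order; no scaling or resolution cut-off is applied here.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of |Fobs| is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


if __name__ == "__main__":
    # Toy amplitudes, invented for illustration only.
    print(r_factor([100.0, 50.0, 25.0], [90.0, 55.0, 25.0]))
```

A perfect model gives R = 0. In practice the same statistic is also computed on a held-out set of reflections to guard against overfitting.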
Software for structure determination
General Packages
• CNS
• CCP4
Data Processing
• D*TREK
• DPS
• HKL2000/DENZO
• MOSFLM (CCP4)
• XDS
• STRATEGY
• PREDICT
Phasing
Molecular replacement:
• AMORE (CCP4)
• CNS
• MOLREP (CCP4)
• EPMR
• ARP/wARP*
Heavy atom sites:
• SHELXD
• SNB
• RANTAN, RSPS (CCP4)
Heavy atom phasing:
• CHART*
• MLPHARE (CCP4)
• PHASES
• SHARP*
• SOLVE/RESOLVE*
Model Building
Pattern Searching
• ESSENS
• FFFEAR
Interactive Graphics
• MAIN
•O
• QUANTA
• TURBO-FRODO
• XTALVIEW
Automated model building
• ARP/wARP*
• RESOLVE*
Refinement
• BUSTER
• CNS
• REFMAC (CCP4)
• SHELXL
• TNT
Validation
• PROCHECK (CCP4)
• SFCHECK (CCP4)
• WHATCHECK
Examples >> tabtoxin resistance protein (TTR)
• Use the TTR structure to compare several manual and automated
structure determination methods
Advantages:
• High resolution data (1.55 Å) collected at APS, Argonne, USA
• Three wavelength MAD data
• Data meets criteria for automated structure determination software
(e.g. ARP/wARP)
Software used        Method           Number of   Figure of      RMSD      Approx.
                                      residues    merit <FOM>              time
CNS & O              manual           170         0.81           -         ~ 1 week
CNS & ARP/wARP       semi-automated   168         0.81           0.34 Å    ~ 3 days
SOLVE & ARP/wARP     automated        168         0.79           0.33 Å    ~ 24 hours
CHART & ARP/wARP     automated        168         0.76           0.26 Å    ~ 6 hours
[All calculations performed using SGI Origin2000 server]
— Another 1% from China?
I don’t know…but…
Collaboration between the UK & China in the field of
structural biology has a long history and has produced a series
of fruitful results. I am confident that under the umbrella
of e-science, with the joint efforts of UK &
Chinese structural biologists, we can achieve our goal.
National Exhibition of “Art & Science”(a-Science),
May 1-14, 2001,
National Art Gallery, Beijing
Life Science Building
Campus
Main building and its
surroundings
Gymnasium
Incubating high-tech enterprises (5)
Tsinghua Science Park
20 hectares, 100,000 m2 built; Major Tsinghua Enterprises
International Corporations; National Eng. Centers
Entrepreneur Park of Tsinghua University
School of Sciences
Thank you