1 - Seoul National University: Biointelligence Lab

Bioinformation Technology:
Case Studies in Bioinformatics and
Biocomputing with DNA Chips
Byoung-Tak Zhang
Center for Bioinformation Technology (CBIT)
Seoul National University
btzhang@bi.snu.ac.kr
http://bi.snu.ac.kr/~btzhang
Outline

Bioinformation Technology
 Bioinformatics
 DNA Chip Data Analysis: IT for BT
 DNA Computing: BT for IT
 DNA Computing with DNA Chips
 Outlook
2
Human Genome Project
A New
Disease
Encyclopedia
Goals
• Identify the approximately 40,000 genes
in human DNA
• Determine the sequences of the 3 billion
bases that make up human DNA
• Store this information in databases
• Develop tools for data analysis
• Address the ethical, legal and social
issues that arise from genome research
Genome
Health
Implications
New Genetic
Fingerprints
New
Diagnostics
New
Treatments
3
Bioinformatics vs. Biocomputing
Bioinformatics
IT
BT
Biocomputing
4
Bioinformatics
5
What is Bioinformatics?
Bio – molecular biology
Informatics – computer science
Bioinformatics – solving problems arising from
biology using methodology from computer
science.


Bioinformatics vs. Computational Biology
Bioinformatik (German): covers biology-based computer
science as well as bioinformatics in the English sense
6
Molecular Biology: Flow of
Information
DNA → RNA → Protein → Function
[Fig.] A DNA double helix (base fragments such as ACTGG and AAGCT) shown next to a protein chain (Phe, Cys, Cys).
7
DNA (Gene)
[Fig.] From gene to protein. DNA (gene): control statements, TATA box, start and stop signals, termination. Transcription (RNA polymerase) produces mRNA with a 5’ UTR, a ribosome binding site and a 3’ UTR. Translation (ribosome) produces the protein.
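The flow DNA → RNA → protein can be sketched in a few lines of Python. This is only an illustration: the gene string is hypothetical, the codon table is a small excerpt of the standard genetic code, and the function names are not from the slides.

```python
# Excerpt of the standard genetic code (RNA codons); only what the example needs.
CODON_TABLE = {
    "AUG": "Met",                 # also the start codon
    "UUU": "Phe", "UUC": "Phe",
    "UGU": "Cys", "UGC": "Cys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def transcribe(coding_strand):
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")  # ??? = not in excerpt
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

gene = "ATGTTTTGTTGCTAA"  # hypothetical coding strand, 5' to 3'
mrna = transcribe(gene)
print(mrna, translate(mrna))
```

The hypothetical gene was chosen so that the protein ends in Phe, Cys, Cys, the residues named on the information-flow slide.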
8
Nucleotide and Protein Sequence
DNA (Nucleotide) Sequence
SQ sequence 1344 BP; 291 A; C; 401 G; 278 T; 0 other
aacctgcgga aggatcatta gcgggcccgc cgcttgtcgg cgcttgtcgg
ccgagtgcgg gtcctttggg ccgccggggg ggcgcctctg ccccccgggc
cccaacctcc catccgtgtc ccccccgggc ccgtgcccgc cggagacccc
tattgtaccc tgttgcttcg aacctgcgga aggatcatta ctgtctgaaa
gcgggcccgc cgcttgtcgg ccgagtgcgg gtcctttggg tgagttgatt
ccgccggggg ggcgcctctg cccaacctcc catccgtgtc agttaaaact
ccccccgggc ccgtgcccgc tattgtaccc tgttgcttcg gatctcttgg
cggagacccc aacacgaaca gcgggcccgc cgcttgtcgg ccgagtgcgg
ctgtctgaaa gcgtgcagtc agttaaaact ttcaacaatg cccaacctcc
tgagttgatt gaatgcaatc gatctcttgg ttccggctgc tattgtaccc
agttaaaact ttcaacaatg tattgtaccc tgttgcttcg gcgggcccgc
gatctcttgg ttccggctgc gcgggcccgc cgcttgtcgg ccgccggggg
tattgtaccc tgttgcttcg ccgccggggg ggcgcctctg agttaaaact
gcgggcccgc cgcttgtcgg ccccccgggc ccgtgcccgc gatctcttgg
ccgccggggg ggcgcctctg cggagacccc tgttgcttcg tattgtaccc
ccccccgggc ccgtgcccgc gcgggcccgc cgcttgtcgg gcgggcccgc
cggagacccc tgttgcttcg ccgccggggg cggagacccc ccgccggggg
gcgggcccgc cgcttgtcgg gcgggcccgc cgcttgtcgg ccccccgggc
ccgccggggg cggagacccc ccgccggggg ggcgcctctg cggagacccc
ccgccggggg
ccgtgcccgc
aacacgaaca
gcgtgcagtc
gaatgcaatc
ttcaacaatg
aacctgcgga
gtcctttggg
catccgtgtc
tgttgcttcg
cgcttgtcgg
ggcgcctctg
ttcaacaatg
ttccggctgc
tgttgcttcg
cgcttgtcgg
ggcgcctctg
ccgtgcccgc
tgttgcttcg
Protein (Amino Acid) Sequence
CG2B_MARGL Length: 388 April 2, 1997 14:55 Type: P Check:
9613 .. 1
MLNGENVDSR
ARNNLQAGAK
EKAKPQSPEP
NPQLCSEFVN
SILIDWLVQV
KLQLVGVTSM
RSMECNILRR
AKYLMELTLP
GTTLVHYSAY
YSSAKFMNVS
IMGKVATRAS
KELVKAKRGM
MDMSEINSAL
DIYQYMRKLE
HLRFHLLQET
LIAAKYEEMY
LDFSLGKPLC
EYAFVPYDPS
SEDHLMPIVQ
TISALTSSTV
SKGVKSTLGT RGALENISNV
TKSKATSSLQ SVMGLNVEPM
EAFSQNLLEG VEDIDKNDFD
REFKVRTDYM TIQEITERMR
LFLTIQILDR YLEVQPVSKN
PPEIGDFVYI TDNAYTKAQI
IHFLRRNSKA GGVDGQKHTM
EIAAAALCLS SKILEPDMEW
KMALVLKNAP TAKFQAVRKK
MDLADQMC
9
Some Facts





10^14 cells in the human body.
3 × 10^9 letters in the DNA code in every cell in
your body.
DNA differs between humans by 0.2% (1 in 500
bases).
Human DNA is 98% identical to that of
chimpanzees.
97% of DNA in the human genome has no known
function.
10
Topics in Bioinformatics
Sequence analysis
 Sequence alignment
 Structure and function prediction
 Gene finding
Structure analysis
 Protein structure comparison
 Protein structure prediction
 RNA structure modeling
Expression analysis
 Gene expression analysis
 Gene clustering
Pathway analysis
 Metabolic pathway
 Regulatory networks
11
Extension of Bioinformatics
Concept

Genomics
 Functional genomics
 Structural genomics



Proteomics: large scale
analysis of the proteins of
an organism
Pharmacogenomics:
developing new drugs that
will target a particular
disease
Microarray: DNA chip,
protein chip
12
Applications of Bioinformatics





Drug design
Identification of genetic risk factors
Gene therapy
Genetic modification of food crops and animals
Biological warfare, crime etc.

Personal Medicine?
 E-Doctor?
13
Bioinformatics as Information
Technology
GenBank
SWISS-PROT
Database
Information
Retrieval
Hardware
Supercomputing
Biomedical text analysis
Bioinformatics
Algorithm
Agent
Information filtering
Monitoring agent
Sequence alignment
Machine
Learning
Clustering
Rule discovery
Pattern recognition
14
Background of Bioinformatics

Biological information infrastructure
 Biological information management systems
 Analysis software tools
 Communication networks for biological research

Massive biological databases
 DNA/RNA sequences
 Protein sequences
 Genetic map linkage data
 Biochemical reactions and pathways

Need to integrate these resources to model biological
reality and exploit the biological knowledge that is being
gathered.
15
Areas and Workflow of
Bioinformatics
AGCTAGTTCAGTACA
TGGATCCATAAGGTA
CTCAGTCATTACTGC
AGGTCACTTACGATA
TCAGTCGATCACTAG
CTGACTTACGAGAGT
Microarray (Biochip)
Structural
Genomics
Functional
Proteomics
Genomics
Pharmacogenomics
Infrastructure of Bioinformatics
16
DNA Chip Data Analysis:
IT for BT
17
cDNA Microarray
Excitation
Laser 2
Scanning
Laser 1
PCR product amplification
purification
cDNA clones
(probes)
mRNA target
Emission
Printing
Overlay images and normalize
0.1nl/spot
Hybridize target
to microarray
Microarray
Analysis
18
The Complete Microarray
Bioinformatics Solution
Databases
Data
Management
Cluster
Analysis
Statistical
Analysis
Data
Mining
Image
Processing
Automation
19
DNA Chip Applications

Gene discovery: gene/mutated gene
Growth, behavior, homeostasis …

Disease diagnosis
Cancer classification

Drug discovery: Pharmacogenomics
 Toxicological research: Toxicogenomics
20
Disease Diagnosis:
Cancer Classification with DNA Microarray
- cDNA microarray data of 6567 gene expression levels [Khan ’01].
- Filter genes that are correlated to the classification of cancer using PCA and ANN learning.
- Hierarchical clustering of the DNA chip samples based on the filtered 96 genes.
- Disease diagnosis based on DNA chip.
[Fig.] Flowchart of the experimental procedure.
21
Disease Diagnosis:
Hierarchical Clustering Based on Gene Expression Levels
- Hierarchical clustering of cancer by 96 gene expression levels.
- The relation between gene expression and cancer category.
- Four cancer diagnostic categories
[Fig.] The dendrogram of four
cancer clusters and gene expression
levels (row: genes, column:
22
AI Methods for DNA Chip Data
Analysis

Classification and prediction
ANNs, support vector machines, etc.
Disease diagnosis

Cluster analysis
Hierarchical clustering, probabilistic clustering, etc.
Functional genomics

Genetic network analysis
Differential models, relevance networks, Bayesian
networks, etc.
Functional genomics, drug design, etc.
23
Cluster Analysis
[Fig.] A DNA microarray dataset partitioned into gene clusters (Gene Cluster 1, 2, 3 and further clusters).
24
Methods for Cluster Analysis






Hierarchical clustering [Eisen ’98]
Self-organizing maps [Tamayo ’99]
Bayesian clustering [Barash ’01]
Probabilistic clustering using latent variables
[Shin ’00]
Non-negative matrix factorization [Shin ’00]
Generative topographic mapping [Shin ’00]
25
Clustering of Cell Cycle-regulated
Genes in S. cerevisiae (the
Yeast)

Identify cell cycle-regulated
genes by cluster analysis.
 104 genes are already known to
be cell-cycle regulated.
 Known genes are clustered into
6 clusters.


Cluster 104 known genes and
other genes together.
The same cluster → similar functional categories.
[Fig.] 104 known gene expression
levels according to the cell cycle
(row: time step, column: gene).
26
Probabilistic Clustering Using
Latent Variables
$g_i$: ith gene
$z_k$: kth cluster
$t_j$: jth time step
$p(g_i \mid z_k)$: generating probability of the ith gene given the kth cluster
$v_k = p(t \mid z_k)$: prototype of the kth cluster
Cluster membership:
$$p(g_i \in z_k) = p(z_k \mid g_i) = \frac{p(g_i \mid z_k)\, p(z_k)}{p(g_i)}$$
Similarity between an expression profile and a prototype:
$$\mathrm{similarity}(x_i, v_k) = \sum_j x_{ij} v_{kj}$$
Objective function (*), maximized by EM:
$$f(g, t, z) = \sum_i \sum_j \sum_k g_{ij} \log\bigl(p(z_k)\, p(g_i \mid z_k)\, p(t_j \mid z_k)\bigr)$$
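A minimal sketch of EM for a latent variable model of this kind is given below. It uses the standard aspect-model (pLSA-style) updates with the mixture log-likelihood, which is an assumption: the slide only states that objective (*) is maximized by EM. The toy matrix `G`, the function names and the requirement that expression values are non-negative are also illustrative.

```python
import math
import random

def normalized(values):
    total = sum(values)
    return [v / total for v in values]

def log_likelihood(G, pz, pg, pt):
    """sum_ij g_ij * log sum_k p(z_k) p(g_i|z_k) p(t_j|z_k)."""
    ll = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            if g > 0:
                mix = sum(pz[k] * pg[k][i] * pt[k][j] for k in range(len(pz)))
                ll += g * math.log(mix)
    return ll

def em_cluster(G, K, iters=30, seed=0):
    """Fit p(z_k), p(g_i|z_k), p(t_j|z_k) to a non-negative matrix G."""
    rng = random.Random(seed)
    n, m = len(G), len(G[0])
    pz = [1.0 / K] * K
    pg = [normalized([rng.random() + 0.1 for _ in range(n)]) for _ in range(K)]
    pt = [normalized([rng.random() + 0.1 for _ in range(m)]) for _ in range(K)]
    for _ in range(iters):
        ng = [[0.0] * n for _ in range(K)]   # expected counts per gene
        nt = [[0.0] * m for _ in range(K)]   # expected counts per time step
        for i in range(n):
            for j in range(m):
                if G[i][j] <= 0:
                    continue
                # E-step: responsibility of each cluster for entry (i, j)
                joint = [pz[k] * pg[k][i] * pt[k][j] for k in range(K)]
                s = sum(joint)
                for k in range(K):
                    r = G[i][j] * joint[k] / s
                    ng[k][i] += r
                    nt[k][j] += r
        # M-step: renormalize the expected counts
        totals = [sum(ng[k]) for k in range(K)]
        pz = normalized(totals)
        pg = [[x / totals[k] for x in ng[k]] for k in range(K)]
        pt = [[x / totals[k] for x in nt[k]] for k in range(K)]
    return pz, pg, pt

def assign(pz, pg, i):
    """Most probable cluster for gene i: argmax_k p(z_k) p(g_i|z_k)."""
    return max(range(len(pz)), key=lambda k: pz[k] * pg[k][i])

# Toy data: rows are genes, columns are time steps.
G = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [0, 1, 4, 5]]
pz, pg, pt = em_cluster(G, K=2)
print([assign(pz, pg, i) for i in range(len(G))])
```

Each `pt[k]` plays the role of the prototype $v_k = p(t \mid z_k)$, and `assign` applies the Bayes rule for cluster membership up to the constant $p(g_i)$.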
27
Experimental Result:
Identify Cell Cycle-Regulated Genes

Clustering result
[Table] Clustering result with α-factor arrest data. Genes with a high probability of being cell cycle-regulated were found in 4 clusters.
28
Experimental Result:
Prototype Expression Levels of Found Clusters
• The genes in the same
cluster show similar
expression patterns during
the cell cycle.
• The genes with similar
expression patterns are
likely to have correlated
functions.
[Fig.] Prototype expression levels of
genes found to be cell cycle-regulated (4 clusters).
29
Clustering Using Non-negative
Matrix Factorization (NMF)

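NMF (non-negative matrix factorization)
$$G \approx WH$$

NMF as a latent variable model
$$(G)_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{r} W_{ia} H_{a\mu}, \qquad G_{i\mu},\ W_{ia},\ H_{a\mu} \ge 0$$
G: gene expression data matrix
W: basis matrix (prototypes)
H: encoding matrix (in low dimension)
[Fig.] Latent variables $h_1, h_2, \ldots, h_r$ generate the observed values $g_1, g_2, \ldots, g_n$ through W: $g \approx Wh$.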
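The factorization G ≈ WH can be sketched with the multiplicative update rules of Lee and Seung for the squared error. This choice of algorithm is an assumption; the slide does not say how W and H were fitted. The toy matrix and all function names are illustrative, and the code uses plain Python lists to stay self-contained.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(G, W, H):
    """Squared Frobenius norm of G - WH."""
    WH = matmul(W, H)
    return sum((g - x) ** 2 for g_row, x_row in zip(G, WH)
               for g, x in zip(g_row, x_row))

def nmf(G, r, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix G into W (n x r) and H (r x m)."""
    rng = random.Random(seed)
    n, m = len(G), len(G[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T G) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, G), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(h_row, n_row, d_row)]
             for h_row, n_row, d_row in zip(H, num, den)]
        # W <- W * (G H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(G, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(w_row, n_row, d_row)]
             for w_row, n_row, d_row in zip(W, num, den)]
    return W, H

# Toy non-negative matrix built from two underlying row patterns.
G = [[1, 0, 2], [0, 1, 1], [2, 0, 4], [0, 3, 3]]
W, H = nmf(G, r=2)
print(round(frob_error(G, W, H), 6))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative as long as they start non-negative, which is what allows their entries to be read as prototypes and encodings.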
30
Experimental Result:
Five Clusters Found by NMF

5 prototype expression levels during the cell cycle.
[Fig.] Prototype expression levels of the five clusters (x-axis: time step in cell cycle, 1 to 18; y-axis: expression level, 0 to 0.18).
31
Clustering Using Generative
Topographic Mapping (GTM)
• GTM: a nonlinear, parametric mapping y(x;W)
from a latent space to a data space.
[Fig.] A grid of points in the latent space (x1, x2) is mapped by y(x;W) into the data space (t1, t2, t3) for generation; the reverse direction is used for visualization.
32
Experimental Result:
Clusters Found by GTM

Three cell cycle-regulated clusters found by GTM
Cluster center
S/G2
No. of train
Data/ no. in
cluster
Correct no. / test
data
Overall mean expression
levels (Cln/b) of known
genes
5 /
1 / 2
(.148 .184 -.367 -.044)
S
(0.111 –0.333)
5 / 5
5 / 5 (100%)
(1.075 1.482 -.233 -.375)
M/G1 c1
c2
c3
(0.111 0.333)
(-0.111 –0.111)
(0.323 0.1)
13 / 7
/ 2
/ 2
1 / 6
0 / 6
0 / 6
(-.171 -.573 .091 .311)
G2/M c1
c2
(0.111 0.333)
(0.111 0.111)
10 / 5
/ 3
0 / 5
3 / 5 (80%)
(-.616 –1.01 1.832 1.596)
G1
(-0.111 0.333)
(-0.111 0.111)
35 / 18
/ 7
10 / 16 (62%)
0 / 16
(.894 .907 -.766 -.479)
c1
c2
33
Experimental Result:
Comparison with other methods

Comparison of prototype expression levels
Phase      No. of selected   Mean expression          No. of selected     Mean expression
           genes (GTM)       levels by GTM            genes by Spellman   levels by Spellman
S/G2       92                (.13 -.06 -.1 .01)       121                 (.13 .05 -.16 .03)
S          25                (.84 .81 -.42 -.33)      71                  (.46 .47 -.43 -.18)
M/G1 c1    120               (.82 .65 -.65 -.38)      113                 (-.21 -.61 -.04 .07)
     c2    34                (-.04 -.37 -.01 -.11)
     c3    10                (.32 .29 -.3 .05)
G2/M c1    33                (-.59 -.96 1.34 1.29)    195                 (-.32 -.62 .49 .54)
     c2    60                (.08 -.30 .51 .57)
G1   c1    122               (.92 .74 -.62 -.33)      300                 (.66 .49 -.55 -.33)
     c2    74                (.79 .82 -.48 -.34)
Total      570                                        800
34
Genetic Network Analysis
Discover the complex regulatory
interaction among genes.
-
Disease diagnosis, pharmacogenomics
and toxicogenomics
-
-
Boolean networks
-
Differential equations
-
Relevance networks [Butte ’97]
Bayesian networks [Friedman ’00]
[Hwang ’00]
-
[Fig.] Basin of attraction of 12-gene
Boolean genetic network model
[Somogyi ’96].
35
Bayesian Networks

Represent the joint probability distribution among
random variables efficiently using the concept of
conditional independence.
[Fig.] Example Bayesian network over the nodes A, B, C, D and E, with edges A → C, B → C, B → D and C → E.
An edge denotes the possibility of the causal relationship between nodes.
• D is independent of A and C given B.
• C asserts dependency between A and B.
• E is independent of A and B given C.
$$P(A, B, C, D, E) = P(A)\, P(B \mid A)\, P(C \mid A, B)\, P(D \mid A, B, C)\, P(E \mid A, B, C, D) \quad \text{(by chain rule)}$$
$$= P(A)\, P(B)\, P(C \mid A, B)\, P(D \mid B)\, P(E \mid C) \quad \text{(by the example Bayes net)}$$
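The saving from conditional independence can be checked numerically. The sketch below evaluates the factorized joint P(A) P(B) P(C|A,B) P(D|B) P(E|C) for binary variables; every probability value is made up for illustration and is not taken from the slides.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
P_A = 0.3                                   # P(A=1)
P_B = 0.6                                   # P(B=1)
P_C = {(0, 0): 0.1, (0, 1): 0.5,
       (1, 0): 0.4, (1, 1): 0.9}            # P(C=1 | A, B)
P_D = {0: 0.2, 1: 0.7}                      # P(D=1 | B)
P_E = {0: 0.05, 1: 0.8}                     # P(E=1 | C)

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value else 1.0 - p_one

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) using the factorization of the example network."""
    return (bern(P_A, a) * bern(P_B, b) * bern(P_C[(a, b)], c)
            * bern(P_D[b], d) * bern(P_E[c], e))

total = sum(joint(*values) for values in product((0, 1), repeat=5))
factored_params = 1 + 1 + 4 + 2 + 2   # one number per parent configuration
full_params = 2 ** 5 - 1              # unrestricted joint over 5 binary variables
print(total, factored_params, full_params)
```

The factorized model needs 10 numbers where an unrestricted joint distribution over five binary variables needs 31, and the gap grows quickly with the number of variables.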
36
Bayesian Networks Learning

Dependence analysis [Margaritis ’00]
Mutual information and χ² test

Score-based search
$$p(D, S) = p(S)\, p(D \mid S) = p(S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$
• D: data, S: Bayesian network structure
NP-hard problem
Greedy search
Heuristics to find good massive network structures
quickly (local to global search algorithm)
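As a rough sketch of score-based search, the function below computes the log of one variable's contribution to p(D | S) in the Bayesian Dirichlet form, working with log-gamma to avoid overflow. The uniform hyperparameter `alpha` and the count layout (`counts[j][k]` = number of cases with parent configuration j and value k) are assumptions made for illustration.

```python
from math import lgamma, log

def log_bd_family(counts, alpha=1.0):
    """Log BD contribution of one variable.

    counts[j][k] is N_ijk: cases where the parents take their j-th
    configuration and the variable takes its k-th value.
    alpha is a uniform hyperparameter alpha_ijk (an assumption here).
    """
    total = 0.0
    for row in counts:
        alpha_ij = alpha * len(row)          # alpha_ij = sum_k alpha_ijk
        n_ij = sum(row)                      # N_ij = sum_k N_ijk
        total += lgamma(alpha_ij) - lgamma(alpha_ij + n_ij)
        for n_ijk in row:
            total += lgamma(alpha + n_ijk) - lgamma(alpha)
    return total

def log_bd_score(families, log_prior=0.0):
    """log p(D, S) = log p(S) + sum over variables of their family scores."""
    return log_prior + sum(log_bd_family(c) for c in families)

# Same 8 cases scored under two candidate structures for one binary variable:
no_parent = [[4, 4]]            # no parent: one configuration
with_parent = [[4, 0], [0, 4]]  # a binary parent that determines the value
print(log_bd_family(no_parent), log_bd_family(with_parent))
```

A greedy search would repeatedly add, remove or reverse the single edge that most improves `log_bd_score`, recomputing only the families whose parent sets changed.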
37
The Small Bayesian Network for
Classification of Cancer
Zyxin
Leukemia
class
LTC4S
C-myb
MB-1
•The Bayesian network was learned by full search
using BD (Bayesian Dirichlet) score with
uninformative prior [Heckerman ’95] from the
DNA microarray data for cancer classification
(http://waldo.wi.mit.edu/MPR/).
[Table] Comparison of the classification performance
with other methods [Hwang ’00].
Training error
Test error
Bayes nets
0/38
2/34
Neural trees
0/38
1/34
RBF networks
0/38
1.3/34
38
Large-Scale Bayesian Network
with 1171 Genes
Genetic networks for
understanding the regulatory
interaction among genes and
their derivatives
-
Pharmacogenomics and
Toxicogenomics
-
[Fig.] The Bayesian network
structure constructed from DNA
microarray data for cancer
classification (partial view).
39
DNA Computing: BT for IT
40
DNA Computing: BioMolecules
as Computer
011001101010001
ATGCTCGAAGCT
41
Why DNA Computing?
6.022 × 10^23 molecules / mole
 Immense, brute force search of all possibilities

Desktop: 10^9 operations / sec
Supercomputer: 10^12 operations / sec
1 mol of DNA: 10^26 reactions

Favorable energetics: Gibbs free energy
ΔG = −8 kcal mol^-1
 1 J for 2 × 10^19 operations
 Storage capacity: 1 bit per cubic nanometer
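The energy figure can be checked with a few lines of arithmetic. The sketch assumes that each operation costs about 8 kcal per mole of reactions and uses the standard conversion 1 kcal = 4184 J; the variable names are illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
JOULES_PER_KCAL = 4184.0     # standard conversion factor

# Energy of one molecular operation at 8 kcal per mole of reactions.
energy_per_operation = 8.0 * JOULES_PER_KCAL / AVOGADRO   # joules
operations_per_joule = 1.0 / energy_per_operation

# Time a 10^9 operations/sec desktop would need for 10^26 operations.
desktop_seconds = 1e26 / 1e9
desktop_years = desktop_seconds / (365.25 * 24 * 3600)

print(f"{operations_per_joule:.2e} operations per joule")
print(f"{desktop_years:.2e} years on a desktop")
```

The first result comes out near 1.8 × 10^19 operations per joule, in line with the rounded 2 × 10^19 quoted on the slide.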
42
Flow of DNA Computing
[Fig.] Solving a Hamiltonian path problem (HPP) on a graph with nodes 0 to 6 using DNA. Stages: Encoding, Ligation, PCR (Polymerase Chain Reaction), Gel Electrophoresis, Affinity Column, Decoding.
Encoding of the nodes:
Node 0: ACG   Node 1: CGA   Node 2: GCA   Node 3: TAA
Node 4: ATG   Node 5: TGC   Node 6: CGT
Ligation produces candidate strands such as ACGCGAGCATAAATGTGCACGCGT, ACGCGAGCATAAATGCGATGCACGCGT and ACGGCATAAATGTGCACGCGT. The later stages filter these down to the solution strand ACGCGAGCATAAATGTGCCGT.
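The decoding step can be mimicked in software using the node codes from this slide. The sketch below splits a strand into 3-base codes and tests whether it visits every node exactly once from node 0 to node 6. Edge validity is not checked, because the edge list of the graph is not recoverable from the slide text; the function names are illustrative.

```python
# Node codes as given on the slide.
NODE_CODES = {"ACG": 0, "CGA": 1, "GCA": 2, "TAA": 3,
              "ATG": 4, "TGC": 5, "CGT": 6}

def decode(strand):
    """Split a strand into 3-base codes; return node list or None if invalid."""
    if len(strand) % 3 != 0:
        return None
    nodes = []
    for i in range(0, len(strand), 3):
        code = strand[i:i + 3]
        if code not in NODE_CODES:
            return None
        nodes.append(NODE_CODES[code])
    return nodes

def visits_each_node_once(nodes, start=0, end=6):
    """Mimic the filters: right endpoints (PCR), right length (gel),
    every node present (affinity column)."""
    return (nodes is not None
            and len(nodes) == len(NODE_CODES)
            and nodes[0] == start and nodes[-1] == end
            and sorted(nodes) == list(range(len(NODE_CODES))))

candidates = [
    "ACGCGAGCATAAATGTGCACGCGT",   # too long: revisits node 0
    "ACGGCATAAATGTGCACGCGT",      # right length, but node 1 is missing
    "ACGCGAGCATAAATGTGCCGT",      # the solution strand on the slide
]
for strand in candidates:
    nodes = decode(strand)
    print(strand, nodes, visits_each_node_once(nodes))
```

Only the last strand passes all three checks, decoding to the path 0, 1, 2, 3, 4, 5, 6.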
43
Biointelligence on a Chip?
Bioinformation
Technology
Biological
Computer
Information
Technology
Biointelligence
Chip
Computing Models:
The limit of conventional
computing models
Computing Devices:
The limit of silicon
semiconductor technology
Biotechnology
Molecular
Electronics
44
Intelligent Biomolecular
Information Processing
[Fig.] Theoretical models for biocomputing. Panel labels: Input, Controller, Reaction Chamber (Calculating), Output, Bio-Memory, Bio-Processor, GFP, Cytochrome c.
45
Evolvable Biomolecular
Hardware

Sequence programmable and evolvable molecular systems have been
constructed as cell-free chemical systems using biomolecules such as
DNA and proteins.
46
DNA Computers vs. Conventional
Computers
DNA-based computers                                          Microchip-based computers
slow at individual operations                                fast at individual operations
can do billions of operations simultaneously                 can do substantially fewer operations simultaneously
can provide huge memory in small space                       smaller memory
setting up a problem may involve considerable preparations   setting up only requires keyboard input
DNA is sensitive to chemical deterioration                   electronic data are vulnerable but can be backed up easily
47
Molecular Operators for DNA
Computing
• Hybridization: complementary pairing of two single-stranded polynucleotides
5’- AGCATCCA -3’  +  3’- TCGTAGGT -5’
→
5’- AGCATCCA -3’
3’- TCGTAGGT -5’
• Ligation: attaching sticky ends to a blunt-ended molecule
ATGCATGC          TGAC        ATGCATGCTGAC
TACG      +   TACGACTG   →    TACGTACGACTG
(the unpaired ATGC and TACG overhangs are the sticky ends)
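Hybridization depends only on Watson-Crick complementarity, which is easy to express in code. The sketch below checks the pair 5’-AGCATCCA-3’ and 3’-TCGTAGGT-5’ from this slide; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Base-by-base Watson-Crick complement, same reading direction."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """Complement written 5' to 3', as sequences are usually stored."""
    return complement(strand)[::-1]

def hybridizes(top_5_to_3, bottom_3_to_5):
    """True if the two strands pair perfectly along their whole length."""
    return complement(top_5_to_3) == bottom_3_to_5

print(complement("AGCATCCA"))              # bottom strand read 3' to 5'
print(reverse_complement("AGCATCCA"))      # same strand read 5' to 3'
print(hybridizes("AGCATCCA", "TCGTAGGT"))
```

Real hybridization also tolerates some mismatches depending on temperature and salt conditions; this check models only perfect pairing.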
48
Research Groups





MIT, Caltech, Princeton University, Bell Labs
EMCC (European Molecular Computing
Consortium) is composed of national groups from
11 European countries
BioMIP Institute (BioMolecular Information
Processing) at the German National Research
Center for Information Technology (GMD)
Molecular Computer Project (MCP) in Japan
Leiden Center for Natural Computation (LCNC)
49
Applications of Biomolecular
Computing








Massively parallel problem solving
Combinatorial optimization
Molecular nano-memory with fast associative search
AI problem solving
Medical diagnosis
Cryptography
Drug discovery
Further impact in biology and medicine:
 Wet biological data bases
 Processing of DNA labeled with digital data
 Sequence comparison
 Fingerprinting
50
NACST
(Nucleic Acid Computing Simulation Toolkit)
DNA Sequence Generator
DNA Sequence Optimizer
Genetic Algorithm
GUI
NACST Engine
Controller
Ligation Unit
PCR Unit
Electrophoresis Unit
Affinity Column Unit
Enzyme Unit
51
NACST
Inputs
Outputs
52
Combinatorial Problem Solver
TSP (Traveling Salesman Problem)
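[Fig.] A weighted graph with cities 0 to 6 and an example tour 0→1→2→3→4→5→6→0.
Representations
Vertex 1: AGCT TAGG (P1A P1B)
Vertex 2: ATGG CATG (P2A P2B)
Vertex 3: CGAT CGAA (P3A P3B)
Edge 1→2: ATCC ATCA TACC (P1B W12 P2A)
Edge 1→3: ATCC GCCT GCTA (P1B W13 P3A)
The outer segments of each edge strand are the Watson-Crick complements of the vertex segments they bind to (for example, ATCC pairs with P1B = TAGG).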
53
Combinatorial Problem Solver

Weight representation
methods
1. Molecules with high G-C content tend to hybridize easily.
2. Molecules with high G-C content tend to be denatured at higher temperature.
3. Molecules with a larger population in the tube are more likely to hybridize.
[Fig.] Experimental steps: Hybridization/Ligation, PCR/Gel electrophoresis, Affinity chromatography, PCR/Gel electrophoresis, Temperature Gradient Gel Electrophoresis, Graduate PCR.
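The link between G-C content and denaturation temperature can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides: Tm ≈ 2 °C per A or T plus 4 °C per G or C. Using this rule is an assumption for illustration, since the slides do not say how melting temperatures were estimated. The two example sequences are the weight segments W12 and W13 from the TSP representation slide.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees Celsius (short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

for name, seq in [("W12", "ATCA"), ("W13", "GCCT")]:
    print(name, seq, gc_content(seq), wallace_tm(seq))
```

Under this rule the segment with more G-C pairs melts at a higher temperature, which is the property that methods 1 and 2 use to encode edge weights.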
54
Experimental Results for 4-TSP
Ligation result
Hybridization (37°C)
Ligation (16 °C 15hr)
PCR (36 cycle)
Gel electrophoresis
(10% polyacrylamide gel)
50 bp marker
Final PCR
result
(140bp)
Oligomer mixture
55
Molecular Theorem Prover

Resolution refutation method

Problem under consideration:
P ∧ Q → R, S ∧ T → Q, S, T, P
Is R true?

Turn A → B into ¬A ∨ B, and add ¬R as a clause:
¬P ∨ ¬Q ∨ R, ¬S ∨ ¬T ∨ Q, S, T, P, ¬R
[Fig.] Resolution tree: ¬P ∨ ¬Q ∨ R and P give ¬Q ∨ R; ¬S ∨ ¬T ∨ Q and S give ¬T ∨ Q; these two give ¬T ∨ R; with T this gives R; R and ¬R give nil.
R is true!
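The refutation for this problem can be reproduced with a small propositional resolution routine. This is a software sketch of the logic only, not of the molecular implementation; clauses are sets of literals, with `~` marking negation, and the function names are illustrative.

```python
from itertools import combinations

def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of literals)."""
    resolvents = []
    for literal in c1:
        if negate(literal) in c2:
            resolvents.append(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return resolvents

def refutes(clauses):
    """True if the empty clause (nil) can be derived by resolution."""
    known = set(clauses)
    while True:
        new = set()
        for a, b in combinations(known, 2):
            for resolvent in resolve(a, b):
                if not resolvent:          # empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new <= known:                   # nothing new: no refutation exists
            return False
        known |= new

# P ∧ Q → R and S ∧ T → Q in clause form, plus the facts S, T, P.
axioms = [frozenset({"~P", "~Q", "R"}), frozenset({"~S", "~T", "Q"}),
          frozenset({"S"}), frozenset({"T"}), frozenset({"P"})]
negated_goal = frozenset({"~R"})

print(refutes(axioms + [negated_goal]))    # R follows if nil is derived
```

Without the negated goal the clause set is satisfiable, so `refutes(axioms)` returns False; this confirms that the contradiction comes from assuming ¬R.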
56
Molecular Theorem Prover
(Abstract Implementation)

Implementation 1
¬S ¬T Q
S
T
P
¬R
¬Q ¬P R
P
¬S ¬T Q
¬Q ¬P R
S
T
Implementation 2
¬Q
¬T ¬R
¬P
¬S
R
Q T

P
S
¬R
¬S ¬T Q P ¬R
S T ¬Q ¬P R
57
Molecular Theorem Prover
(Experiments for Method 1)
Experimental procedure
I. Mixture reaction: mix the molecules (100 pmol each → total 20 µl)
II. Denaturation (95°C, 10 min)
III. Annealing (95°C for 1 min → 15°C, decreasing by 1°C per min)
IV. Polyacrylamide gel electrophoresis (PAGE, 20%)
V. Detection of solution: 75 bp ds DNA
Experimental result
[Fig.] Gel image with lanes 1 to 6 and a 20 bp DNA marker (Talara); the 20 bp and 200 bp positions are labeled.
58
Solving Logic Problems by
Molecular Computing

Satisfiability Problem
Find Boolean values for
variables that make the given
formula true

3-SAT Problem
Every NP problem can be
seen as the search for a
solution that simultaneously
satisfies a number of logical
clauses, each composed of
three variables.
(x1 ∨ x3 ∨ x4) ∧ (x4) ∧ (x2 ∨ x3)
( x1 or x2 or x3 ) AND ( x4 or x5 or x6 )
( x1 or x2 or x3 ) AND ( x1 or x2 or x3 )
59
DNA Computing with DNA Chips
DNA Chips for DNA Computing
I. Make: oligomer synthesis
II. Attach (immobilize):
5’HS-C6-T15-CCTTvvvvvvvvTTCG-3’
III. Mark: hybridization
IV. Destroy: enzyme reaction (e.g., EcoRI)
V. Unmark
* Remove all strands that do not satisfy the problem
VI. Readout:
If a solution remains at the last step of the N cycles, amplify it by PCR and confirm!
61
Variable Sequences and the
Encoding Scheme
62
Three-dimensional Plot and
Histogram of the Fluorescence








S3: w=0, x=0, y=1, z=1
S7: w=0, x=1, y=1, z=1
S8: w=1, x=0, y=0, z=0
S9: w=1, x=0, y=0, z=1
y=1: (w ∨ x ∨ y) satisfied
z=1: (w ∨ ¬y ∨ z) satisfied
x=0 or y=1: (¬x ∨ y) satisfied
w=0: (¬w ∨ ¬y) satisfied

Four spots with high fluorescence
intensity correspond to the four
expected solutions.

DNA sequences identified in the
readout step via addressed array
hybridization.
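The mark, destroy and unmark cycle can be simulated in software: start with all 16 assignments of w, x, y, z as "strands", and for each clause keep only the strands that satisfy it. The negations in the clauses below are an assumption: the negation marks were lost in the extracted text and have been inferred from the satisfying conditions listed on this slide. Strand index i is taken to be the binary number wxyz.

```python
from itertools import product

# Each literal is (variable, required value); a clause is a list of literals.
# Negations are inferred, not read from the slide (see the note above).
CLAUSES = [
    [("w", 1), ("x", 1), ("y", 1)],   # (w or x or y)
    [("w", 1), ("y", 0), ("z", 1)],   # (w or not y or z)
    [("x", 0), ("y", 1)],             # (not x or y)
    [("w", 0), ("y", 0)],             # (not w or not y)
]

def surviving_strands(clauses):
    """Run one mark/destroy/unmark cycle per clause over all 16 strands."""
    strands = {index: dict(zip("wxyz", bits))
               for index, bits in enumerate(product((0, 1), repeat=4))}
    for clause in clauses:
        # Mark: strands that satisfy at least one literal of the clause.
        marked = {index for index, values in strands.items()
                  if any(values[var] == wanted for var, wanted in clause)}
        # Destroy: remove unmarked strands; unmark is implicit.
        strands = {index: values for index, values in strands.items()
                   if index in marked}
    return strands

for index, values in sorted(surviving_strands(CLAUSES).items()):
    print(f"S{index}: {values}")
```

With these clauses the survivors are S3, S7, S8 and S9, matching the four high-fluorescence spots reported on the slide.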
63
Outlook

IT is becoming increasingly important to the advancement
of BT.
Bioinformatics
DNA Microarray Data Mining

IT can benefit much from BT.
Biocomputing and Biochips
DNA Computing (with DNA Chips)

Bioinformation technology (BIT) is essential as a
next-generation information technology.
In Silico Biology vs. In Vivo Computing
64
References






[Barash ’01] Barash, Y. and Friedman, N., Context-specific Bayesian
clustering for gene expression data, Proc. of RECOMB’01, 2001.
[Butte ’97] Butte, A.J. et al., Discovering functional relationships
between RNA expression and chemotherapeutic susceptibility using
relevance networks, Proc. Natl Acad. Sci. USA, 94, 1997.
[Eisen ’98] Eisen, M.B. et al., Cluster analysis and display of genome-wide expression patterns, Proc. Natl Acad. Sci. USA, 95, 1998.
[Friedman ’00] Friedman, N. et al., Using Bayesian networks to
analyze expression data, Proc. of RECOMB’00, 2000.
[Heckerman ’95] Heckerman, D. et al., Learning Bayesian networks:
the combination of knowledge and statistical data, Machine Learning,
20(3), 1995.
[Hwang ’00] Hwang, K.-B. et al., Applying machine learning
techniques to analysis of gene expression data: cancer diagnosis,
CAMDA’00, 2000.
65
[Khan ’01] Khan, J. et al., Classification and diagnostic prediction of
cancers using gene expression profiling and artificial neural networks,
Nature Medicine, 7(6), 2001.
[Margaritis ’00] Margaritis, D. and Thrun, S., Bayesian network
induction via local neighborhoods, Proc. of NIPS’00, 2000.
[Shin ’00] Shin, H.-J. et al., Probabilistic models for clustering cell
cycle-regulated genes in the yeast, CAMDA’00, 2000.
[Somogyi ’96] Somogyi, R. and Sniegoski, C.A., Modeling the
complexity of genetic networks: understanding multigenic and
pleiotropic regulation, Complexity, 1(6), 1996.
[Tamayo ’99] Tamayo, P. et al., Interpreting patterns of gene
expression with self-organizing maps: methods and application to
hematopoietic differentiation, Proc. Natl Acad. Sci. USA, 96, 1999.
66
More information at
http://cbit.snu.ac.kr/
http://bi.snu.ac.kr/
67