Discovery of Structural and Functional Features in RNA Pseudoknots Adviser: Yu-Chiang Li

Qingfeng Chen and Yi-Ping Phoebe Chen, Senior Member, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 21, NO. 7, JULY 2009
Speaker: Shao-Hsiang Hung
Date:2009/12/10
Outline




Introduction
Material and Methods
Results
Conclusion and Discussion
I. Introduction
I. Introduction(1/6)


Accurately predicting the functions of biological
macromolecules is one of the biggest challenges
in functional genomics.
RNA molecules play a central role in a number of
biological functions within cells, from the
transfer of genetic information from DNA to
protein, to enzymatic catalysis.
I. Introduction(2/6)

To fulfill this range of functions, a simple
linear nucleotide string of RNA, composed of

uracil, guanine, cytosine, and adenine,

forms a variety of complex three-dimensional
structures.
Pseudoknot

an RNA structure with
base pairing between a loop
formed by an orthodox secondary structure
and a complementary region outside that loop
I. Introduction(3/6)
I. Introduction(4/6)

PseudoBase is the only online database
containing:


Structural, functional, and sequence data of
RNA pseudoknots
Unfortunately, the analysis of this valuable
data set is underdeveloped


Difficulty in modeling
Complexity in computing structural
I. Introduction(5/6)

Association rule mining has been successfully
used to discover valuable information in large
data sets.

Limitations with multivalued variables

Categorical multivalued variables (such as color {red, blue,
green})
Quantitative multivalued variables (such as weight {[40, 50],
[50, 75]})
The relationships are captured by using
a conditional probability matrix $M_{Y|X}$
I. Introduction(6/6)

We develop a framework to identify potential top-k
covering rule groups in RNA pseudoknots

Relationships






Structure-function
Structure-category
Significant ratios of stems and loops.
Allows users to regulate k and the minsupp threshold and
compare between rules in the same group.
Handling high dimensional data
Enhances the understanding of structure-function relationships
II. Material and Methods
II. Material and Methods (1/20)

Pseudoknot Data.

S1, S2, L1, L2, and L3:
stem 1, stem 2, loop 1, loop 2, and loop 3

A, G, C, and U:
adenine, guanine, cytosine, and uracil

vr, vt, vf, v3, v5, vo, rr, mr, tm, ri, ap, ot, and ar:
viral ribosomal readthrough signals, viral tRNA-like
structures, viral ribosomal frameshifting signals, other viral
3'-UTR, other viral 5'-UTR, viral others, rRNA, mRNA,
tmRNA, ribozymes, aptamers, others, and artificial molecules

ss, tc, and fs:
self-splicing, translation control, and viral frameshifting
II. Material and Methods (2/20)





Let X and Y be multivalued attribute variables
x and y be items
p(X) be the probability of X
p(Y|X) be the conditional probability of Y given X
minsupp be the minimum support in the
context
II. Material and Methods (3/20)

The data here is collected from
PseudoBase



Organism
RNA type
Bracket view of structure

Classified by two stems and three loops


Nucleotide sequence
Size
II. Material and Methods (4/20)

A data set consisting of 225 H-pseudoknots is obtained
II. Material and Methods (5/20)
II. Material and Methods (6/20)

Partition of Attributes


{class, function, stem, loop, base, ratio,
length}
the last one is a quantitative attribute.

Propose a novel partition in conjunction with the
properties of pseudoknot data and top-k rule
groups.
II. Material and Methods (7/20)
II. Material and Methods (8/20)

The domain of a quantitative attribute has
to be partitioned into intervals

1) The number of intervals
2) The size of each interval
For example

(14,15] included in stem 1, stem 2, and loop
1 but not in loop 3
II. Material and Methods (9/20)

Definition 1.



Let a quantitative attribute $y$ be divided into a set of
intervals $\{y_1, \ldots, y_n\}$ using the categorical item $x_i$,
such that each base interval $y_j$ consists of a
single value for $1 \le j \le n$.
The partition using $x_i$ is defined as
$\{(y_{1i}, \max(y_{2i})]; \ldots; (\max(y_{(m-1)i}), \max(y_{mi})]\}$.
Table 2 presents the distribution of sizes of stem 1
and stem 2 of pseudoknots in PseudoBase.
II. Material and Methods (10/20)


Definition 1.
For example

Y1 = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8],
(8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15],
(15, 16], (16, 17], (17, 18], (18, 19], (19, 20], (20, 21], (21, 22]}
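
The base partition listed above can be generated mechanically. The following is a minimal sketch, assuming integer sizes and unit-width base intervals as in the Y1 example; the function names `base_partition` and `interval_of` are illustrative and do not come from the paper.

```python
def base_partition(max_size):
    """Build the base partition {0, (0, 1], (1, 2], ..., (max_size - 1, max_size]}.

    Each base interval holds exactly one integer size, as in the Y1 example.
    """
    labels = ["0"]
    for upper in range(1, max_size + 1):
        labels.append(f"({upper - 1}, {upper}]")
    return labels


def interval_of(size):
    """Map an integer size to the label of the base interval that contains it."""
    return "0" if size == 0 else f"({size - 1}, {size}]"


# Stem 1 sizes run up to 22 in the example above.
y1 = base_partition(22)
print(len(y1))         # number of items in Y1
print(y1[:3], y1[-1])  # first items and the last interval
print(interval_of(4))  # a stem with four nucleotides
```

Under this labeling, a stem with four nucleotides falls into (3, 4], which is the interval used in the worked probability example later in the talk.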
II. Material and Methods (11/20)

Definition 2.


Suppose $Y_i = \{y_{1i}, \ldots, y_{mi}\}$ and $Y_{i+1} =
\{y_{1(i+1)}, \ldots, y_{n(i+1)}\}$ are two adjacent partitions.
Let $Y = \emptyset$.
The integration of them is defined as
II. Material and Methods (12/20)


Definition 2.
For example


stem 1 as Y1 ={0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5],
(5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12],
(12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17,
18], (18, 19], (19, 20], (20, 21], (21, 22]}.
stem 2 as Y2 ={0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5],
(5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12],
(12, 13], (13, 14], (14, 15], . . . , (31, 32], (32, 33]}
II. Material and Methods (13/20)

the integrated partition of Y1 and Y2

{0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6],
(6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11,
12], (12, 13], (13, 14], (14, 15], (15, 16], (16,
17], (17, 18], (18, 19], (19, 22], (22, 33]}.
II. Material and Methods (14/20)

In comparison, the values of ratio
attributes are positive real numbers rather
than integers.



|yi| = 1 in Definition 1 needs to be changed
to |yi| = 1 or |yi| = 0.5.
|x| = 1 and |xc| = 1 in Definition 2 are
changed to |x| = 1 and |xc| = 1, or |x| = 0.5
and |xc| = 0.5.
This avoids missing interesting knowledge.
II. Material and Methods (15/20)

Generation of rule groups.


Work out the conditional probabilities for X
and Y in the probability matrix below.
The conditional probability of

$Y = y_i$ given $X = x_i$ is $p(y_i \mid x_i) = p(x_i \mid y_i)\, p(y_i) / p(x_i)$
II. Material and Methods (16/20)

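For example:
Let x be stem 1 and y be the size interval (3,4] of stem 1.
By Table 2, n = 225, so $p(x = \text{stem1}) = 225/225 = 1$.
Also by Table 2, the number of pseudoknots whose stem 1 has four
nucleotides, i.e., a size in (3,4], is 42.
Hence $p(y = (3,4] \wedge x = \text{stem1}) = 42/225 \approx 0.19$.
So
$$p(y = (3,4] \mid x = \text{stem1}) = \frac{p(y = (3,4] \wedge x = \text{stem1})}{p(x = \text{stem1})} \approx 0.19$$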
II. Material and Methods (17/20)


Compute all the conditional
probabilities of stem 1, namely [p(y1 |
stem1) p(y2 | stem1) . . . p(yn | stem1)]
Those of stem 2, loop 1, and loop 3 can be computed in the same way
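
Computing one row of the matrix can be sketched as follows. This is only an illustration under stated assumptions: the data set size of 225 H-pseudoknots and the count of 42 for the interval (3, 4] of stem 1 come from the talk, while every other count is a made-up placeholder and not a value from Table 2. The function name `conditional_row` is hypothetical.

```python
def conditional_row(joint_counts, x_count, n):
    """Return p(y | x) for every interval y of one structural element x.

    joint_counts: interval label -> number of pseudoknots with x in that interval
    x_count:      number of pseudoknots that contain x at all
    n:            total number of pseudoknots in the data set
    """
    p_x = x_count / n
    return {y: (count / n) / p_x for y, count in joint_counts.items()}


n = 225  # data set size stated in the talk
stem1_counts = {
    "(2, 3]": 60,  # placeholder count, NOT from Table 2
    "(3, 4]": 42,  # count quoted in the worked example
    "(4, 5]": 30,  # placeholder count, NOT from Table 2
}

# Every H-pseudoknot has a stem 1, so x_count equals n and p(x) = 1.
stem1_row = conditional_row(stem1_counts, x_count=n, n=n)
print(round(stem1_row["(3, 4]"], 2))
```

Repeating the call with the counts for stem 2, loop 1, and loop 3 would give the remaining rows of the matrix.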
II. Material and Methods (18/20)

Suppose $M_{Y|X}$ corresponding to an association AS
consists of a set of rows $\{r_1, \ldots, r_n\}$.

Let
$A = \{A_1, \ldots, A_m\}$ be the complete set of antecedent
items of AS
$C = \{C_1, \ldots, C_k\}$ be the complete set of consequent
items of AS
Namely
$PS(x) = \{(x, y_j) \mid y_j \in C,\ p(y_j \mid x) > 0\}$
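
A small sketch of the set PS(x) follows, assuming the comparison in its definition is "greater than zero" (the symbol did not survive in the slide text) and that one row of the matrix is stored as a dictionary from consequent item to conditional probability. The name `support_set` and the sample row are illustrative only.

```python
def support_set(x, row):
    """Collect the pairs (x, y_j) whose conditional probability p(y_j | x) is positive."""
    return {(x, y) for y, p in row.items() if p > 0}


# Illustrative row: 0.19 is the value from the worked example,
# the zero entry stands for an interval that never occurs for stem 1.
row_stem1 = {"(3, 4]": 0.19, "(30, 31]": 0.0}
print(support_set("stem1", row_stem1))
```

Only the pair for (3, 4] is kept, because the other interval has probability zero given stem 1.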
II. Material and Methods (19/20)

Definition 3 (Rule group)


Let $G_x = \{x \to C_j \mid (x, C_j) \in PS(x)\}$
be a rule group with an antecedent item x and
consequent support set C.
Definition 4

Let
$R_i: X \to Y_i$ and $R_j: X \to Y_j$,
with $1 \le k \le k_{max}$.
$R_i$ is ranked higher than $R_j$ if $p(Y_i \mid X) > p(Y_j \mid X)$
II. Material and Methods (20/20)

For example
In Table 2
kmax = 21
top-1 covering rule group = {stem1→(2,3],
stem2→(5,6]}.
top-2 covering rule group = {stem1→(2,3],
stem1→(3,4], stem2→(5,6], stem2→(4,5]}.
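
The selection of the top-k rules per antecedent can be sketched as below. This is a simplified illustration, not the authors' algorithm: it ranks the rules of each antecedent by conditional probability and keeps the first k, and it assumes that minsupp acts as a lower bound on that probability. The probabilities are hypothetical values chosen so that the ranking reproduces the example above; only 0.19 for stem1→(3,4] comes from the talk. The names `top_k_rules` and `matrix` are illustrative.

```python
def top_k_rules(matrix, k, minsupp=0.0):
    """For each antecedent x, keep the k consequents with the highest p(y | x).

    Rules with zero probability or with probability below minsupp are dropped
    (treating minsupp as a probability threshold is an assumption of this sketch).
    """
    rules = []
    for x, row in matrix.items():
        candidates = [(p, y) for y, p in row.items() if p > 0 and p >= minsupp]
        ranked = sorted(candidates, reverse=True)  # highest probability first
        rules.extend((x, y) for _, y in ranked[:k])
    return rules


# Hypothetical conditional probabilities, except 0.19 for stem1 -> (3, 4].
matrix = {
    "stem1": {"(2, 3]": 0.30, "(3, 4]": 0.19, "(4, 5]": 0.10},
    "stem2": {"(5, 6]": 0.35, "(4, 5]": 0.20, "(6, 7]": 0.15},
}

print(top_k_rules(matrix, k=1))
print(top_k_rules(matrix, k=2))
```

Raising k adds the next-ranked rule of every antecedent, which is how the top-2 group in the example extends the top-1 group.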
III. Results
III. Results (1/4)
III. Results (2/4)
III. Results (3/4)
III. Results (4/4)
IV. Conclusion and Discussion
IV. Conclusion and Discussion (1/2)


If more rules are considered together, a
further understanding of pseudoknot
structure and function can be achieved.
This paper aims to analyze increasingly
available RNA pseudoknot data and
identify interesting patterns from
PseudoBase.
IV. Conclusion and Discussion (2/2)


The obtained rule groups reveal the structural
properties of pseudoknots and imply potential
structure-function and structure-class
relationships in RNA molecules.
Moreover, the interpretation of the rules
demonstrates their biological
significance.