TPJ_4586_sm_DataS2


Supplemental Method 2 – Significance of loci

To identify enriched properties of sRNA loci using the χ² test, a proper background set has to be established first. For example, the strand bias of sRNA loci (defined as the relative proportion of sRNAs from the two strands) shows different distributions for loci at the low and high ends of the range of locus abundances. We employed an entropy measure to identify a suitable background set, which was used for all enrichment tests, as described here.

To test for enrichment of properties (e.g. sRNA strand bias), it is crucial to identify a threshold of locus abundance (where abundance is the sum of the expression values in the time series) above which the property is not influenced significantly by random events arising from the low number of discrete measurements (sRNA sequence reads). As an example, a locus with ten sRNA reads all coming from the same strand would show a strand bias of 1 in our analysis, but a conclusion that all sRNAs in this locus derive from the same strand would be unreliable since only a small number of sequences were captured. To overcome this problem, we used the Kullback-Leibler divergence measure to identify an abundance threshold and discarded sRNA loci with abundance below the threshold before analysing enrichment of properties of sRNA loci.

Figure 1. Strand bias (y-axis) vs index of sRNA loci (x-axis) for loci with abundances 1-5, 5-10, and 10-15

To choose a background set containing loci with strand biases approximately uniformly distributed, we selected those loci with abundance above a certain threshold that was determined by comparing, for each abundance $A$, the random uniform distribution with the strand-bias distribution using the Kullback-Leibler divergence measure (Kullback & Leibler, 1951)

$$D_A = \sum_i P(i) \log \frac{P(i)}{Q(i)},$$

where, for $N = 100$, $P(i)$ is the number of loci with abundance $A$ and strand bias in the interval $\left[\frac{i}{N}, \frac{i+1}{N}\right]$ divided by the total number of loci with abundance $A$, and $Q(i) = \frac{1}{N}$ (see Figure 2). This gave an abundance threshold of 65, corresponding to the abundance where the minimum of the curve in Figure 2 occurs.
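The divergence calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study. The function names, the input format (pairs of abundance and strand bias), the use of the natural logarithm, the half-open binning with a strand bias of exactly 1 placed in the last bin, and the convention that empty bins contribute zero are all assumptions made for the example.

```python
import math
from collections import defaultdict

N_BINS = 100  # N in the text


def kl_divergence_from_uniform(strand_biases, n_bins=N_BINS):
    """Return D_A = sum_i P(i) * log(P(i) / Q(i)) with Q(i) = 1 / n_bins."""
    counts = [0] * n_bins
    for bias in strand_biases:
        # Assumption: bin i holds i/N <= bias < (i+1)/N; bias == 1 goes to the last bin.
        i = min(int(bias * n_bins), n_bins - 1)
        counts[i] += 1
    total = len(strand_biases)
    q = 1.0 / n_bins
    divergence = 0.0
    for c in counts:
        if c > 0:  # Assumption: empty bins contribute 0 (0 * log 0 taken as 0).
            p = c / total
            divergence += p * math.log(p / q)  # natural log is an assumption
    return divergence


def divergence_by_abundance(loci):
    """Group (abundance, strand_bias) pairs by abundance and return {abundance: D_A}."""
    grouped = defaultdict(list)
    for abundance, bias in loci:
        grouped[abundance].append(bias)
    return {a: kl_divergence_from_uniform(b) for a, b in grouped.items()}


def abundance_at_minimum(divergences):
    """Return the abundance whose divergence D_A is smallest."""
    return min(divergences, key=divergences.get)
```

With real data, `divergence_by_abundance` would be applied to all loci and `abundance_at_minimum` to the resulting curve. In practice the curve may need smoothing or a restricted abundance range before taking the minimum; the source does not say whether such a step was used.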

Figure 2. The Kullback-Leibler divergence $D_A$ (y-axis) for abundance A (x-axis) between 0 and 250

Kullback, S. and Leibler, R.A. (1951) On Information and Sufficiency. Annals of Mathematical Statistics 22: 79–86.
