F07811A_ST3_Footnote_1
Footnote to Supplementary Table 3: Probability Calculation
To determine the probability of finding 3 to 6 chromosomally consecutive loci
expressed as proteins in any given life cycle stage, the following statistical analysis was
applied. For any number of consecutive genes (m) and any total number of genes on a
chromosome (n), the probability of finding m consecutive genes by chance alone is the
number of favorable cases divided by the binomial coefficient, which gives the number
of ways of picking m unordered outcomes from n possibilities. The
probability of finding m consecutive genes then simplifies to:
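The expression itself does not appear in this text. A form consistent with the description above, and with the tabulated probabilities, can be written under the assumption that the favorable cases are the n - m + 1 runs of m adjacent genes on a chromosome of n genes:

```latex
P_m \;=\; \frac{n-m+1}{\binom{n}{m}}
    \;=\; \frac{m!\,(n-m+1)!}{n!},
\qquad m = 3,\dots,6 .
```

As a check, n = 143 and m = 3 give 141/477191, approximately 3.0E-04, which agrees with the P3 value listed for Chr-1.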
The absolute frequency of events in the experimental analysis is the number of times a
cluster encoding co-expressed proteins appears on a given chromosome. To compare
the probability of finding a cluster by chance with the absolute frequency, the probability of
finding a cluster on a given chromosome is multiplied by the number of protein
identifications detected on that chromosome, giving the absolute probability. Lastly, the χ² value
for each chromosome was calculated from the absolute frequency and the absolute probability using 3
degrees of freedom (d.f.), i.e. the number of life cycle stages independently analyzed
minus one. For 3 d.f., a χ² value must be greater than 16.3 to reach the 99.9%
confidence level. A table of the probability of finding a cluster of 3-6 loci encoding
co-expressed proteins on any given chromosome, the frequency of events on each
chromosome, and the χ² values is shown below. In every case, the χ² values were greater
than the 99.9% confidence threshold, indicating that the clusters detected and
identified in the analysis were non-random events and were true clusters of adjacent
genes encoding co-expressed proteins.
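
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own code: the function names are hypothetical, the probability is assumed to be (n - m + 1) divided by the binomial coefficient, and the χ² term is assumed to be the usual (observed - expected)² / expected. The inputs are the Chr-1 figures from the tables (143 genes, 62 identifications, 2 clusters of three in a row).

```python
from math import comb


def cluster_probability(n: int, m: int) -> float:
    """Chance probability that m genes drawn from n are consecutive.

    Assumes the favorable cases are the n - m + 1 runs of m adjacent genes.
    """
    return (n - m + 1) / comb(n, m)


def expected_clusters(n: int, m: int, identifications: int) -> float:
    """Probability scaled by the number of protein identifications."""
    return cluster_probability(n, m) * identifications


def chi_squared(observed: float, expected: float) -> float:
    """Assumed Pearson term: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected


# Chr-1: 143 genes, 62 identifications, 2 observed clusters of 3 in a row.
p3 = cluster_probability(143, 3)
e3 = expected_clusters(143, 3, 62)
x3 = chi_squared(2, e3)
print(f"P3 = {p3:.1e}, expected = {e3:.3f}, chi-squared = {x3:.0f}")
# prints: P3 = 3.0e-04, expected = 0.018, chi-squared = 214
```

Under these assumptions the sketch returns the P3, frequency, and χ² values tabulated for Chr-1.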
F07811A_ST3_Footnote_2
Probability of finding three to six consecutive genes in a row

            Total Number
            of Genes per           Absolute            Absolute            Absolute            Absolute
Chromosome  Chromosome    P3       Frequency  P4       Frequency  P5       Frequency  P6       Frequency
Chr-1       143           3.0E-04  0.018      8.4E-06  0.00052    3.0E-07  1.9E-05    1.3E-08  8.0E-07
Chr-2       223           1.2E-04  0.010      2.2E-06  0.00019    5.0E-08  4.2E-06    1.4E-09  1.2E-07
Chr-3       239           1.1E-04  0.012      1.8E-06  0.00020    3.8E-08  4.3E-06    9.6E-10  1.1E-07
Chr-4       237           1.1E-04  0.012      1.8E-06  0.00020    3.9E-08  4.2E-06    1.0E-09  1.1E-07
Chr-5       312           6.2E-05  0.0095     8.0E-07  0.00012    1.3E-08  2.0E-06    2.5E-10  3.8E-08
Chr-6       312           6.2E-05  0.0087     8.0E-07  0.00012    1.3E-08  1.8E-06    2.5E-10  3.5E-08
Chr-7       277           7.9E-05  0.010      1.1E-06  0.00015    2.1E-08  2.7E-06    4.6E-10  6.0E-08
Chr-8       295           6.9E-05  0.010      9.4E-07  0.00014    1.6E-08  2.3E-06    3.3E-10  4.8E-08
Chr-9       365           4.5E-05  0.0077     5.0E-07  8.5E-05    6.9E-09  1.2E-06    1.1E-10  1.9E-08
Chr-10      403           3.7E-05  0.0063     3.7E-07  6.2E-05    4.6E-09  7.8E-07    6.9E-11  1.2E-08
Chr-11      503           2.4E-05  0.0054     1.9E-07  4.3E-05    1.9E-09  4.3E-07    2.3E-11  5.2E-09
Chr-12      526           2.2E-05  0.0054     1.7E-07  4.1E-05    1.6E-09  3.9E-07    1.8E-11  4.5E-09
Chr-13      672           1.3E-05  0.0041     7.9E-08  2.5E-05    5.9E-10  1.8E-07    5.3E-12  1.6E-09
Chr-14      769           1.0E-05  0.0036     5.3E-08  1.8E-05    3.5E-10  1.2E-07    2.7E-12  9.6E-10

All values rounded to two significant digits

            Total Number
            of Identifications  Number of Chromosomal Clusters Encoding
            from all 4 Stages   Co-expressed Proteins Identified
Chromosome                      3 in a row  4 in a row  5 in a row  6 in a row
Chr-1       62                  2           1           -           -
Chr-2       85                  2           -           -           -
Chr-3       114                 6           4           -           -
Chr-4       108                 2           1           1           2
Chr-5       153                 3           4           -           -
Chr-6       140                 7           1           -           -
Chr-7       131                 9           1           -           -
Chr-8       145                 5           2           1           -
Chr-9       170                 6           3           -           -
Chr-10      169                 6           3           -           -
Chr-11      226                 10          3           -           1
Chr-12      249                 15          4           1           -
Chr-13      309                 8           4           1           -
Chr-14      353                 17          1           1           -
totals                          98          32          5           3
total number of clusters        138

Chi Squared Calculations

Chromosome  3 in a row  4 in a row  5 in a row  6 in a row
Chr-1       214         1922        -           -
Chr-2       384         -           -           -
Chr-3       2981        78828       -           -
Chr-4       341                     237320      36864104
Chr-5       945         131059      -           -
Chr-6       5646        8950        -           -
Chr-7       7860        6685        -           -
Chr-8       2482        29205       426450      -
Chr-9       4677        106379      -           -
Chr-10      5739        144145      -           -
Chr-11      18601       209903      -           193971997
Chr-12      41558       387415      2532778     -
Chr-13      15549       651795      5450707     -
Chr-14      80551       53466       8191353     -