Supplementary Statistics

Supplementary Statistics 1) Statistical analysis of chromosome distribution in exponentially growing cells. Results in the text are for the 108 interactions that occurred in the untreated sample and were flanked by MspI restriction sites. Tests were also performed for all interactions (123); and interactions flanked by restriction sites but eliminating chromosome 10, where for the treated case a large number of interactions without flanking sites suggested possible registration problems. We note where these different interaction sets lead to different conclusions. Although chromosome 10 does not have any interactions for the untreated case, tests of how the interactions are distributed across the genome are affected by elminating chromosome 10. Chromosomes were concatenated and normalized to length 1; the positions of the interactions were converted to this scale. The Kolomgorov-Smirnov test for unifomity was used to detect any large scale differences from the null hypothesis that the positions of the interactions are uniformly distributed across the genome (ie large regions with an elevated rate of interactions). We fail to reject the null hypothesis (p=0.149). The Kolomgorov Smirnov test for an exponential distribution of inter-event distances, using the observed mean of inter-event distances, was used to test for an excess of long or short inter-event distances. We fail to reject the null hypothesis (p=0.265). This test was significant (p=0.006) when all interactions (i.e. MspI flanked and unflanked) were included, showing an excess of very short and very long distances. A Chi-Squared goodness of fit test was used to detect small scale clumping of the interactions. The Poisson process implies a Poisson distribution for the number of observations in an interval of fixed width. The genome was divided into 50 intervals of equal length. The null hypothesis for the test is that the number of interactions in each interval should be Poisson with mean 108/50=2.16. The test requires that, under the null hypothesis, the number of intervals with 0, 1, 2, 3, or ≥4 interactions exceeds 5, otherwise further combining of the categories is required. The division of the genome into 50 intervals was chosen as a convenient number of intervals that met this requirement. The observed and expected number of intervals, with the given number of interactions, are shown in the table below. Number of hits Expected no. of intervals with this many hits Observed no. of intervals with this many hits 0 1 2 3 >=4 5.77 12.46 13.45 9.69 8.64 12 7 16 5 10 The Chi-squared statistic (sum over cells of [(observed-expected)2/ expected]) is 12.09, to be compared to a Chi-squared distribution with 4 degrees of freedom. The pvalue is 0.0167, so the null hypothesis is rejected. We see there is an excess of intervals with zero hits, and of cells with >=4 hits. Thus, when examined on a fine scale, the genome has excess regions without interactions, and excess regions of dense interactions compared to a Poisson process. This test is not significant (p=0.734) when Chromosome 10 is eliminated, as many of the excess intervals with zero hits are eliminated. Disregarding the 16 interactions that overlap inter and intra genic regions, we consider the null hypothesis that the number of interactions within open reading frames is proportional to the amount of the genome within open reading frames, 0.72 [1]. Again, this would be the expectation if the positions of the interactions followed a Poisson process. This null hypothesis is rejected (p=7.7 x 10-6; z-test for single proportion, with null value 0.72, with continuity correction, as implemented in R function “prop.test”). If the 16 interactions on the gene borders are included as inter genic, ther result is merely suggestive (p=0.097). Confidence intervals for the two cases are (0.858-0.973) and (0.706, 0.865) respectively. In the case where all interactions are included, the result when the overlapping interactions are pooled with the intergenic ones is significant (p=0.028). 2) Statistical analysis of phenanthroline treated samples. The phenanthroline data set of 136 interactions with flanking restriction sites was analysed as for the exponential growth interaction set. In this case, the full set of interactions had 153 events, and the set eliminating chromosome 10 had 132 events, but none of the test conclusions changed. The Kolmogorov-Smirnov test of uniformity failed to reject the null hypothesis (p=0.52). The Kolmogorov-Smirnov test of exponential inter-event distances, using observed mean 80772 bp, was significant (p=0.001). There is an excess of very short distances. The Chi-Squared goodness of fit test (this time testing against a Poisson distribution with parameter as 136/80=1.70), was non significant (Chi-Squared statistic 6.11, p=0.19), indicating small scale clustering is not a major feature of this data set. The table of observed and expected values is given below. Number of hits Expected no. of intervals with this many hits Observed no. of intervals with this many hits 0 1 2 3 >=4 14.61 24.84 21.12 11.97 7.46 17 30 17 6 10 Disregarding the interactions that overlap inter and intra genic regions, a test of the null hypothesis that the proportion of hits that are intra genic = 0.72 gives p= 1.1 x 105 . Including the 13 gene border interactions in the inter-genic group, the p-value is 0.024 (z-test for single proportion, with null value 0.72, with continuity correction, as implemented in R function “prop.test”). The corresponding confidence intervals for the proportion of intragenic interactions are (0.832, 0.946) and (0.732,0.870). Note these intervals have substantial overlap with those obtained for the exponentially growing yeast, indicating the two conditions yield similar proportions of intra-genic interactions. Below are the results of an analysis which included interactions on chromosome 10 and those which were not directly flanked by MspI sites. There were only minor differences between these two analyses. 3) Statistical analysis of chromosome distribution in exponentially growing cells. Chromosomes were concatenated and normalized to length 1; the positions of the 123 interactions were converted to this scale. The Kolomgorov-Smirnov test was used to detect any large scale differences from the null hypothesis that the positions of the interactions are uniformly distributed across the genome (ie large regions with an elevated rate of interactions). We fail to reject the null hypothesis (p=0.19). A Chi-Squared goodness of fit test was used to detect small scale clumping of the interactions. The Poisson process implies a Poisson distribution for the number of observations in an interval of fixed width. The genome was divided into 80 intervals of equal length. The null hypothesis for the test is that the number of interactions in each interval should be Poisson with mean 123/80=1.54. The test requires that, under the null hypothesis, the number of intervals with 0, 1, 2, 3, or ≥4 interactions exceeds 5, otherwise further combining of the categories is required. The division of the genome into 80 intervals was chosen as a convenient number of intervals that met this requirement. The observed and expected number of intervals, with the given number of interactions, are shown in the table below. Number of hits Expected no. of intervals with this many hits Observed no. of intervals with this many hits 0 1 2 3 >=4 17.19 26.43 20.32 10.41 5.63 31 17 14 9 9 The Chi-squared statistic (sum over cells of [(observed-expected)2/ expected]) is 18.62, to be compared to a Chi-squared distribution with 4 degrees of freedom. The pvalue is 0.001, so the null hypothesis is rejected. We see there is an excess of intervals with zero hits, and of cells with >=4 hits. There is actually one cell on chromosome 12 that has 17 hits, which is exceedingly unlikely in a Poisson distribution with mean 1.54. Thus, when examined on a fine scale, the genome has excess regions without interactions, and excess regions of dense interactions compared to a Poisson process. Disregarding the 16 interactions that overlap inter and intra genic regions, we consider the null hypothesis that the number of interactions within open reading frames is proportional to the amount of the genome within open reading frames, 0.72 [6]. Again, this would be the expectation if the positions of the interactions followed a Poisson process. This null hypothesis is rejected (p=1.3 x 10-6; z-test for single proportion, with null value 0.72, with continuity correction, as implemented in R function “prop.test”). It is also rejected if the 16 interactions on the gene borders are included as inter genic (p=0.028). Confidence intervals for the two cases are (0.865, 0.971) and (0.731, 0.875) respectively. 4) Statistical analysis of phenanthroline treated samples. The phenanthroline data set was analysed as for the exponential growth interaction set. The Kolmogorov-Smirnov test gave a p-value of less than 1.0 x10-15, indicating there are differences in the rate of interactions over different (large scale) genome regions. The Chi-Squared goodness of fit test (this time testing against a Poisson distribution with parameter as 153/80=1.91), was non significant (Chi-Squared statistic 6.44, p=0.17), indicating small scale clustering is not a major feature of this data set. The table of observed and expected values is given below. Number of hits Expected no. of intervals with this many hits Observed no. of intervals with this many hits 0 1 2 3 >=4 11.85 22.63 21.61 13.76 10.16 14 29 18 7 12 Disregarding the interactions that overlap inter and intra genic regions, a test of the null hypothesis that the proportion of hits that are intra genic = 0.72 gives p= 1.0 x 10 6 . Including the gene border interactions in the inter-genic group, the p-value is 0.016 (ztest for single proportion, with null value 0.72, with continuity correction, as implemented in R function “prop.test”). The corresponding confidence intervals for the proportion of intragenic interactions are (0.848, 0.952) and (0.737, 0.867). Note these intervals have substantial overlap with those obtained for the exponentially growing yeast, indicating the two conditions yield similar proportions of intra-genic interactions. 1. Dujon B: The yeast genome project: what did we learn? Trends Genet 1996, 12(7):263-270.

Supplementary Statistics

Related documents

Products

Support

Supplementary Statistics

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib