SUPPORTING INFORMATION LEGENDS

SUPPORTING FIGURES

Figure S1. Validation of selected methylcytosine sites by PCR experiments. Each graph shows the results for a particular region of one gene. The upper panel shows the distribution of reads supporting methylcytosines in meristems (M) and late flowers (L), and the barplot in the lower panel shows the real-time PCR results for endonuclease-digested or control DNA from meristems or late flowers. Error bars represent the standard deviation of three replicates. The endonucleases LpnPI and FspEI were used for comparison with MspJI. The scale atop each graph indicates the genomic coordinates (in Mb) of each region on the respective chromosome. Two different regions, AT5G24670-A and AT5G24670-B, are shown for the gene AT5G24670.

Figure S2. Methylation profiles determined by MspJI-seq and BS-seq were consistent for most of the randomly selected genes and genomic regions. In each graph, the top panels display the log2-transformed numbers of reads supporting mC at each nucleotide position of the respective gene or genomic region, and the bottom panels show the methylation level (in %) at each nucleotide position. Subfigures 1 to 44 are for genes and subfigures 45 to 73 for genomic regions. Gene IDs or chromosome coordinates are given at the top. M, meristem; E, early flower (stages 1-9); L, late flower (stages 10-12).

Figure S3. Density of genes and TEs across the Arabidopsis genome. The density of genes or TEs was calculated as the percentage of nucleotides in genes or TEs within each 200 kb window of each chromosome. (a-e) Chromosomes 1-5.

Figure S4. Percentage of genes of each class among all methylated genes.

Figure S5. Methylation of TEs during Arabidopsis floral development. (a) Percentage of TEs and genes differentially methylated between meristem and early flower. (b) Percentage of differentially methylated TEs and genes targeted by siRNAs. (c) Methylated TEs (in meristem) and differentially methylated TEs (between meristem and early flower) of each TE family.

Figure S6. Enrichment of TEs of different families in different mC sequence contexts. The color gradient shows the statistical significance of the enrichment of TEs of each family in each mC context class, as indicated by the color bar atop the heatmap.

Figure S7. Distribution of normalized methylation levels of each mC context for genes of different classes. RKCM values were calculated as reads per kilobase of cytosines in the CNNR context (each site counts as 1 bp) per million mapped reads, separately for each mC context and gene class.

Figure S8. Examples of genes with correlated variation in methylation and expression levels. In each graph, the MspJI-seq and RNA-seq tracks are shown for each gene at each developmental stage separately, covering the transcribed region as well as the 1 kb upstream and downstream regions. Gene structures are shown at the bottom of each graph; blue boxes represent exons, and arrows indicate introns and the direction of transcription. The methylation and expression patterns of VIM1 and VIM2 were similar to those of VIM3 and are not shown. M, meristem; E, early flower stage; L, late flower stage.

Figure S9. Phenotypes of the three Arabidopsis floral stages used in the experiments. (a) Clusters of meristems of the ap1 cal mutant plant. The orange dashed lines encircle the parts collected for this study. (b-c) Individual inflorescences of the Landsberg erecta ecotype. Late flowers (stages 10-12, usually 7-8) are labeled with numbers and surround the early flowers (stages 1-9; smaller and unlabeled). Bars = 200 μm.

Figure S10. MspJI digestion and DNA library recovery. (a) Optimization of MspJI digestion conditions. 5 μg (lanes 2-4), 2.5 μg (lanes 5-7), or 0 μg (lane 8) of MspJI was added to 100 μl of digestion mix containing 5 μg of genomic DNA, in the presence (lanes 3, 4, 6, and 7) or absence (lanes 2 and 5) of a DNA activator. Red arrows indicate the ~32 bp band of MspJI-digested DNA fragments; the blue arrow indicates the DNA activator bands. (b) Purified DNA library visualized by electrophoresis on a 4% agarose gel. The red arrow indicates the 100 bp band that was recovered and used for sequencing.

Figure S11. Identification of mCs based on MspJI-seq. (a-f) The six scenarios in which MspJI recognized a pair of methylcytosines and cleaved the double-stranded DNA into fragments of the proper length, which were collected. In each graph, the methylated cytosines are colored red or purple, with the surrounding letters specifying the sequence pattern recognized by MspJI. ‘R’ denotes A or G; ‘Y’, C or T; ‘N’, any base. The symbol ‘x’ marks the MspJI cut positions, in the same colors as the corresponding mCs recognized by MspJI. Orange lines represent the double-stranded DNA fragments released from the DNA molecules (blue lines) by a pair of MspJI cleavages, and green dashed lines represent the strand synthesized to complement the 5’ overhangs generated by MspJI digestion. The lengths of specific DNA stretches after MspJI digestion are given in nucleotides (nt). (g-h) Examples illustrating the alignment of the 3’ end of a read to the corresponding reference genome in letter space and color space, respectively. The sequence of the SOLiD sequencing adaptor is also shown. Matched nucleotides between the read and the adaptor are colored red. Overhanging adaptor bases are shown in green; these could be matched with the read in other cases. Orange numbers denote nucleotide positions on the read, and blue numbers denote positions on the adaptor. (i) Table of the evidence codes (ECs) used to evaluate the confidence that reads arose from MspJI cleavage. Sequence similarities were obtained by comparing the 3’ end of each read with the reference genome and the adaptor sequence, as shown in (g) and (h). n_a and n_g represent the number of pairs of adjacent sites that were mismatched between the SOLiD color sequences of the read and the adaptor, and of the read and the genome, respectively; n_a’ and n_g’, the number of sites that were mismatched between the same pairs of color sequences. A1, the first base of the adaptor; R1, the first base of the read 3’ end in comparison with the adaptor; G1, the genomic base corresponding to R1. Semicolons separate equally applicable rules.

SUPPORTING TABLES

Table S1. Summary of SOLiD reads sequenced and mapped against the Arabidopsis reference genome, and of reads identified as arising from MspJI digestion for each possible recognition-site pattern.

Table S2. Primers used in PCR experiments for selected gene regions digested by MspJI.

Table S3. DNA methylation and expression levels of genes in Arabidopsis flowers. This table shows the raw counts of reads supporting mC sites, the normalized methylation levels, and the expression levels of genes in floral tissues.

Table S4. Arabidopsis genes differentially methylated and differentially expressed during floral development. The statistical significance (false discovery rate) of differential methylation and the fold change in gene expression are provided for each gene, together with the pair of tissues compared.

Table S5. Statistics of differentially expressed and differentially methylated genes between flower meristems and early flowers.

Table S6. Significantly enriched biological processes for each gene cluster in Figure 6.

Table S7. Number of genes for enriched GO terms in each gene cluster in Figure 6. This table shows the original data used to construct the heatmap in Figure 6c. GO terms were sorted according to annotation.

Table S8. Relative frequencies of the wobbling cut positions of MspJI.

Methods S1. Mapping of SOLiD short sequencing reads.

Methods S2. Identification of methylcytosines (mCs) based on MspJI-seq.
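The RKCM normalization described in the Figure S7 legend is analogous to an RPKM value in which gene length is replaced by the number of CNNR cytosines in the region. The Python sketch below is illustrative only and is not the authors' pipeline. It assumes that CNNR sites are counted on both strands of the region and that overlapping sites are counted individually. The function names `count_cnnr_sites` and `rkcm` are hypothetical.

```python
import re

# Hypothetical helper: a C followed by any two bases and then a purine (R = A or G).
# The lookahead lets overlapping sites each be counted once.
CNNR = re.compile(r"(?=C[ACGT]{2}[AG])")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_cnnr_sites(seq: str, both_strands: bool = True) -> int:
    """Count cytosines in a CNNR context (assumption: both strands by default)."""
    seq = seq.upper()
    n = len(CNNR.findall(seq))
    if both_strands:
        # Scan the reverse complement to pick up sites on the other strand.
        rev_comp = seq.translate(COMPLEMENT)[::-1]
        n += len(CNNR.findall(rev_comp))
    return n


def rkcm(reads: int, n_cnnr_sites: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of CNNR cytosines per million mapped reads."""
    if n_cnnr_sites <= 0 or total_mapped_reads <= 0:
        raise ValueError("site count and mapped-read total must be positive")
    kilobases_of_sites = n_cnnr_sites / 1e3  # each site counts as 1 bp
    millions_of_reads = total_mapped_reads / 1e6
    return reads / kilobases_of_sites / millions_of_reads


if __name__ == "__main__":
    # Hypothetical numbers: 50 reads over 500 CNNR cytosines, 10 million mapped reads.
    print(rkcm(50, 500, 10_000_000))
```

For example, 50 reads over 500 CNNR cytosines with 10 million mapped reads gives 50 / 0.5 / 10 = 10 RKCM. Normalizing by the number of CNNR sites rather than by region length keeps regions with few MspJI-recognizable cytosines from appearing artificially unmethylated.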
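The mismatch counts n_a, n_g, n_a’ and n_g’ in the Figure S11(i) legend are defined on SOLiD color sequences. The sketch below makes two assumptions. First, it uses the standard SOLiD two-base encoding, in which A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes. Second, it treats "pairs of adjacent sites" as overlapping pairs of consecutive mismatched colors. The helper names and this reading of the definitions are illustrative assumptions and do not reproduce the published evidence-code rules.

```python
from __future__ import annotations

# Standard SOLiD di-base encoding: color = code(base_i) XOR code(base_i+1).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq: str) -> list[int]:
    """Encode a base sequence as SOLiD colors 0-3 (one color per adjacent base pair)."""
    s = seq.upper()
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(s, s[1:])]


def count_color_mismatches(read_colors: list[int], other_colors: list[int]) -> tuple[int, int]:
    """Return (adjacent mismatched pairs, mismatched sites) over the overlapping prefix.

    Assumption: the first value plays the role of n_a or n_g, and the second
    the role of n_a' or n_g', depending on whether `other_colors` comes from
    the adaptor or from the genome.
    """
    mismatched = [x != y for x, y in zip(read_colors, other_colors)]
    n_sites = sum(mismatched)
    n_adjacent_pairs = sum(
        1 for i in range(len(mismatched) - 1) if mismatched[i] and mismatched[i + 1]
    )
    return n_adjacent_pairs, n_sites


if __name__ == "__main__":
    read = to_color_space("ACGT")  # [1, 3, 1]
    ref = to_color_space("AAGT")   # [0, 2, 1]
    print(count_color_mismatches(read, ref))
```

In this example, a single base difference (C versus A at the second position) changes two adjacent colors, giving one adjacent mismatched pair and two mismatched sites. In SOLiD data, a true base change alters two consecutive colors, whereas an isolated color mismatch more often reflects a sequencing error. This is why the two kinds of count are kept separate.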