tpj12173-sup-0017-legend

SUPPORTING INFORMATION
Supporting Text. Experimental Methods and URL used
Supporting Tables. Table S1-S22
Table S1. Sequencing data generated for chickpea genotypes. a) Sequence data
generated for C. arietinum ICC4958. i) Sequence generated using 454/Roche
pyrosequencing platform (WGS, whole genome shotgun; MP, mate-pair). ii) Paired-end
sequence data generated using Illumina/Solexa platform. b) Sequence data
generated for other chickpea genotypes (454/Roche pyrosequencing platform)
Table S2. Statistics of draft assembly
Table S3. Anchoring of scaffolds to linkage groups
Table S4. Estimation of chickpea genome length based on read alignment
Table S5. Estimated heterozygosity in ICC4958 draft genome
Table S6. Transcriptome coverage in the assembled chickpea genome
Table S7. Repeat content in the assembled chickpea draft genome
Table S8. Comparative analysis of microsatellite sequences in chickpea draft genome
with those in other legumes
Table S9. Statistics of protein coding gene prediction
Table S10. Assessment of gene prediction using CEGMA pipeline
Table S11. Experimental evidence for the predicted protein-coding genes
Table S12. Statistics of protein coding genes from different plant species
Table S13. Functional annotation of the predicted protein-coding genes
Table S14. Features of lineage-specific genes in chickpea
Table S15. Transcription factor/regulator families in the chickpea draft genome
Table S16. Comparison of R-gene family in chickpea draft genome with other
sequenced plant genomes
Table S17. Comparison of nodulation-associated gene families in chickpea draft
genome with other sequenced plant genomes
Table S18. Comparison of number of genes associated with carotenoid and flavonoid
metabolism in chickpea draft genome with other sequenced plant genomes
Table S19. Non-coding RNA genes in the chickpea draft genome
Table S20. Summary of RNA-seq data generated from different tissues/treatments to
study gene expression
Table S21. Summary of tissue-preferential and stress-responsive gene expression
results based on RNA-seq data
Table S22. GO terms enriched in the chickpea genes expressed in tissue-specific
manner
Supporting Figures. Figures S1-S16
Figure S1. Fragment distribution of the de novo assembly of ICC4958. Number of
fragments covering different percentiles of the de novo assembly plotted against
different length percentiles.
Figure S2. Read depth at assembled bases of chickpea ICC4958 (based on
454/Roche read alignment). Frequency of 454/Roche reads at the assembled bases
(x-axis) plotted against the number of bases (y-axis). The Poisson-shaped distribution
showing a peak at 15 denotes the average 15X throughput of the assembled reads. The
x-axis and y-axis in the figure have been limited to 1001 and 1.0 × 10^8, respectively.
Figure S3. GC content distribution in the genome sequence of chickpea as
compared to other plant species. The x-axis represents GC content percentage and
y-axis represents fraction of bins (bin size of 500 bp in sliding non-overlapping
window).
Figure S4. Top 20 GO terms represented in chickpea geneset. GO terms were
assigned using Blast2Go pipeline.
Figure S5. Top 20 PFAM domains represented in the chickpea geneset. PF00069
Protein kinase domain; PF07714 Protein tyrosine kinase; PF00067 Cytochrome P450;
PF00249 Myb-like DNA-binding domain; PF00076 RNA recognition motif; PF12854
PPR repeat; PF12678 RING-H2 zinc finger; PF00847 AP2 domain; PF00501
AMP-binding enzyme; PF00010 Helix-loop-helix DNA-binding domain; PF03171
2OG-Fe(II) oxygenase superfamily; PF00106 short chain dehydrogenase; PF12697
Alpha/beta hydrolase family; PF00083 Sugar (and other) transporter; PF00072
Response regulator receiver domain; PF00201 UDP-glucoronosyl and UDP-glucosyl
transferase; PF00400 WD domain, G-beta repeat; PF03401 Tripartite tricarboxylate
transporter family receptor; PF00005 ABC transporter; PF00270 DEAD/DEAH box
helicase
Figure S6. Strategy for the identification of lineage-specific genes in chickpea
genome. The genes that showed significant hits with non-Fabaceae plant species are
in dotted boxes. “Yes” represents a significant hit, and “No” represents no significant
hit in BLAST searches as per the given criteria (E ≤1e-5). The genes identified as
candidate chickpea-specific (CS) and legume-specific (LS) are highlighted in gray
boxes.
Figure S7. Top 10 GO terms represented in the genes included in
chickpea-specific gene families. The distribution of the top ten GO terms in the genes included
in gene families unique to chickpea (CS), conserved in legumes (chickpea, soybean,
M. truncatula and pigeonpea; LS) and conserved in the five plant species (chickpea,
soybean, M. truncatula, pigeonpea and grapevine; all) is shown. The p-value
for the enrichment of these GO terms in gene families unique to chickpea was at most
0.001. Asterisks indicate GO terms enriched with a p-value of at most 1E-10.
Figure S8. Gene distribution in different transcription factor families in
chickpea, other sequenced legumes and Arabidopsis.
Figure S9. Phylogenetic analysis of chickpea and Medicago genes belonging to
CC-NBS-LRR (a) and Leghaemoglobin (b) families. Medicago, chickpea and
soybean genes are shown in red, green and blue, respectively. Bootstrap values are
shown next to the branches. Medicago and soybean show a clear expansion in
these families. Chickpea genes form distinct clusters, suggesting diversification.
Figure S10. Ks distribution analysis of paralogous chickpea gene pairs to
determine the genome duplication event. The number of paralog pairs within each Ks
range (bin size of 0.05) is shown. The peak observed at 0.7 corresponds to the
duplication event in legume genomes.
Figure S11. The whole genome dot-plot was generated between chickpea linkage
groups (x-axis) and Medicago truncatula chromosome arms (y-axis). An asterisk
before a chromosome number indicates reverse complement. Order and orientation of
chromosomes are rearranged so that the synteny observed is easier to visualize.
Syntenic blocks are formed by red or blue dots representing best hits across any two
chromosomes in the same or opposite direction, respectively. A total of 12406 hits
were observed, out of which 9673 hits were in syntenic blocks. The syntenic blocks
are shown in green circles.
Figure S12. Microsynteny of chickpea (Ca) LG 5 with M. truncatula (Mt)
chromosome 3. Chickpea gene models are mapped on both the pseudomolecules to
show gene order. The upper panel shows overall synteny with local rearrangements.
The microsynteny presented in the lower panel shows conserved gene order between
two genomes.
Figure S13. The whole genome dot-plot was generated between chickpea linkage
groups (x-axis) and Glycine max chromosome arms (y-axis). An asterisk before a
chromosome number indicates reverse complement. Order and orientation of
chromosomes are rearranged so that the synteny observed is easier to visualize.
Syntenic blocks are formed by red or blue dots representing best hits across any two
chromosomes in the same or opposite direction, respectively. A total of 10387 hits
were observed, out of which 4842 hits were in syntenic blocks. Duplicated syntenic
blocks within green circles refer to recent whole genome duplication in the Glycine
max genome.
Figure S14. Scatter plot showing distribution of Ka/Ks (ω) with respect to Ks
between gene pairs present in the collinear blocks of chickpea and Medicago. The
gene pairs are distributed in four clusters according to their Ks values. The average
Ka/Ks values of the clusters decrease with Ks. Clusters with average Ks ≥ 1.5 are
attributed to pan-eudicot palaeopolyploidization, indicating that genes in the other
cluster with higher ω are under purifying selection.
Figure S15. Ka/Ks distribution analysis of chickpea gene pairs. Distribution of
the ratio of non-synonymous to synonymous substitution rates within the chickpea gene
families of size 2-6. The number of gene pairs within each Ka/Ks range from 0.2 to
2.0 (bin size of 0.1) is shown.
Figure S16. Distribution of various GOSlim categories (level 2) in chickpea gene
pairs with Ka/Ks >1.
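As an illustration of the windowing described in the legend of Figure S3 (GC content per 500 bp non-overlapping bin, plotted as the fraction of bins at each GC percentage), a minimal sketch follows. It is not the authors' pipeline: the function names, the rounding of GC values down to integer percentages, the dropping of a trailing partial window, and the counting of ambiguous bases (such as N) in the window length are all assumptions made for this example.

```python
from collections import Counter


def gc_window_percentages(seq, window=500):
    """Return the GC percentage of each full, non-overlapping window.

    Assumption: a trailing partial window is dropped, and ambiguous
    bases (e.g. N) count toward the window length but not toward GC.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


def gc_histogram(values):
    """Return the fraction of windows at each integer GC percentage.

    Assumption: percentages are truncated to whole numbers for binning.
    """
    counts = Counter(int(v) for v in values)
    total = len(values)
    return {pct: n / total for pct, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy sequence with a 4 bp window so the result is easy to check by hand.
    toy = "GGCCAATTGCAT"
    per_window = gc_window_percentages(toy, window=4)
    print(per_window)
    print(gc_histogram(per_window))
```

With real data, the sequence of each scaffold would be passed in turn with the default 500 bp window, and the resulting histogram keys and values would give the x-axis (GC percentage) and y-axis (fraction of bins) of a plot of this kind.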
Supporting Data 1-13. SNP and SSR marker resources
Supporting Data 1. SSR primer: ICC4958
Supporting Data 2. Polymorphic SSR primer: PI489777 vs. JG62
Supporting Data 3. Polymorphic SSR primer: PI489777 vs. ICCV2
Supporting Data 4. Polymorphic SSR primer: ICC4958 vs. PI489777
Supporting Data 5. Polymorphic SSR primer: ICC4958 vs. ICCV2
Supporting Data 6. Polymorphic SSR primer: ICC4958 vs. JG62
Supporting Data 7. Polymorphic SSR primer: ICCV2 vs. JG62
Supporting Data 8. SNP primer: ICC4958 vs. PI489777
Supporting Data 9. SNP primer: ICC4958 vs. ICCV2
Supporting Data 10. SNP primer: ICC4958 vs. JG62
Supporting Data 11. SNP primer: PI489777 vs. ICCV
Supporting Data 12. SNP primer: PI489777 vs. JG62
Supporting Data 13. SNP primer: ICCV2 vs. JG62