Plant molecular genetics
• Plant genome
• Chromatin and DNA methylation
• RNA interference
• Genome of plastids and mitochondria
• Transposable elements
• Viruses
• Classical genetic mapping
• Transgenosis and reverse genetics
• Genomics, next generation sequencing
• Transcriptomics
• Proteomics

Components of plant genome
• nuclear genome = genome sensu stricto
• plastids - plastome
• mitochondria - chondriome

Plant genome sizes
54 Mbp – Cardamine amara
124 852 Mbp – Fritillaria
149 000 Mbp – Paris japonica – currently the largest known genome (not only among plants)
http://data.kew.org/cvalues/

Plant genome sizes
10 Mb Ostreococcus (single-cell alga)
54 Mb Cardamine amara
64 Mb Genlisea aurea
125 Mb Arabidopsis
500 Mb Oryza
5 000 Mb Hordeum
17 000 Mb Triticum
84 000 Mb Fritillaria (largest diploid)
143 000 Mb Paris (octoploid)
[Figure: ratio of globe volumes differing 3 000 times]
- Angiosperms – size differences up to almost 3 000 times
- Gymnosperms – genome sizes often around 10 000 Mb
- Gene number differences much lower (approx. 20 – 200 fold)

Plant genome sizes – what can we deduce?
- Genomes increase in size during evolution
- The average increase is higher in Monocots

C-value paradox
- there is no strong correlation between the complexity of an organism and the size of its genome
• C-value = size of the genome in a non-replicated gamete
  genome size (bp) = (0.910 × 10^9) × DNA content (pg)
  DNA content (pg) = genome size (bp) / (0.910 × 10^9)
  1 pg = approx. 910 Mbp; MW (1 bp) = approx. 660 Da
• genomes of related organisms often differ strongly in size
  causes:
  - duplications of whole genomes (polyploidization) or of chromosome segments
  - replication of invasive DNA (transposable elements)
  - but reductions are also possible (recombination – diploid cotton sp.)

Sequences in plant genomes
Unique sequences – genes, but also non-coding (!)
Repetitive:
• Duplications of chromosomal regions
• Medium repetitive DNA
  – Tandem repeats of rRNA, tRNA and histone genes
  – Gene families with multiple members
  – Transposable elements – also highly repetitive
• Highly repetitive – low complexity DNA
  – Tandemly arranged simple sequence repeats (SSR)
  – Centromeres (180 bp repeat in Arabidopsis) and telomeres (TTTAGGG)n

Types of sequences in plant genomes
• Unique sequences – coding genes, but also non-coding regulatory sequences (!)
• Medium repetitive DNA
  – Tandem repeats of rRNA, tRNA and histone genes
  – Gene families with multiple members
  – Transposable elements – also highly repetitive
• Low complexity DNA (highly repetitive)
  – Tandemly arranged simple sequence repeats (SSR)
  – Centromeres (180 bp repeat in Arabidopsis) and telomeres (TTTAGGG)n
  - some behave as satellite DNA

Aside – term definition: sequence complexity (~ the amount of information)
repetitive  AAAAAAAAAAAAAAAAAAAAA  complexity 1 (21xA)
            ATCATCATCATCATCATCATC  complexity 3 (7xATC)
            (what is the complexity if it is a coding sequence?)
unique      ATCGTATCGCGATTTTAACGT  complexity 21 (1xAT…)
- unique x repetitive – depends on the size of the evaluated frame (= size of the analyzed DNA fragments)

Sequence complexity of plant genomes
[Figure: highly repetitive, medium repetitive and unique fractions; axis: sequence complexity]

Examples of repetitive DNA representation in soybean and Silene latifolia (clusters of related sequences)
Gypsy, copia = retrotransposon families
clDNA = chloroplast DNA (partially contamination, but also recent insertions)

Measuring genome complexity – reassociation kinetics
• DNA is fragmented to 300 - 500 bp and denatured
• Reassociation is monitored over time - (chromatographic) separation of ss and ds DNA
• Analysis of the kinetics (Cot curves) shows the representation of the various types of repetitive DNA
  – rare sequences reassociate more slowly than repetitive ones

Reassociation kinetics depends on sequence complexity
Eukaryotic genomes usually contain three fractions of sequences with different complexity:
- Low complexity = highly repetitive
- Medium repetitive
- Unique sequences = high complexity

Reassociation kinetics of small and large genomes
[Figure: Cot curves with unique, medium repetitive and highly repetitive fractions]

Repetitive sequences can be easily detected in situ
FISH = fluorescent in situ hybridization (possible even with unique sequences)
180 bp A.th.; 45S rDNA Crocus; copia A.th.
tandem repeats dp5a1, wheat (Heslop-Harrison, Plant Cell 12:617, 2000)
Subtelomeric repeats in rye (Heslop-Harrison, Plant Cell 12:617, 2000)
Telomeres in rye (TTTAGGG)n

Differences in small and large genome arrangements
large genomes: genes are present in “gene-rich islands” separated by long regions of repetitive DNA

Reconstruction of the gradual accumulation of transposable elements in the maize genome
In Panicum there are no transposable elements in the presented region; in maize they make up 60 % of its size

Plant Genome Sequencing
http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes
April 13 – less complete in gray

Large Genome Sequencing
- sequencing per partes (separated chromosomes)
- sequencing of non-methylated DNA (= transcriptionally active)
- sequencing of ESTs

Aside – term definition: Expressed Sequence Tags (ESTs)
- short sequenced regions of cDNA (300-600 nt)
- mostly gene segments (primarily from mRNA)
- alternative source of coding sequences for large genomes (rapid and inexpensive)
Weak points:
- highly redundant, incomplete (!)
- problems: various transcript levels - gene expression is regulated spatially and temporally, developmentally, environmentally
- regulatory sequences are not represented (promoters, introns, ...)
Expressed Sequence Tags (ESTs)
Preparation of an EST library
- mRNA
- reverse transcription with an oligo(dT) primer: cDNA
- cleavage of RNA from the heteroduplex by RNase H
- 2nd strand cDNA synthesis
- cleavage with a restriction endonuclease
- adaptor ligation
- cloning
- sequencing

Aside: Arabidopsis thaliana – the most important model of plant biology
[Figure: plants at 1 week, 3 weeks, 4 weeks and 6 weeks]

Arabidopsis genome: 125 Mbp
[Figure: density of genes, ESTs and TEs along the five chromosomes; colour scale from high density to low density]

Total gene number prediction in time (after whole genome sequencing)

Genome of Arabidopsis – statistics

Feature                        Chr.1       Chr.2       Chr.3       Chr.4       Chr.5       SUM
DNA molecule
  Length (bp)                  29,105,111  19,646,945  23,172,617  17,549,867  25,53,409   115,409,949
  Top arm (bp)                 14,449,213  3,607,091   13,590,268  3,052,108   11,132,192
  Bottom arm (bp)              14,655,898  16,039,854  9,582,349   14,497,759  14,803,217
Base composition (%GC)
  Overall                      33.4        35.5        35.4        35.5        34.5
  Coding                       44.0        44.0        44.3        44.1        44.1
  Non-coding                   32.4        32.9        33.0        32.8        32.5
Number of genes                6,543       4,036       5,220       3,825       5,874       25,498
Gene density (kb per gene)     4.0         4.9         4.5         4.6         4.4
Average gene length (bp)       2,078       1,949       1,925       2,138       1,974
Average peptide length (aa)    446         421         424         448         429
Exons
  Number                       35,482      19,631      26,570      20,073      31,226      132,982
  Total length (bp)            8,772,559   5,100,288   6,654,507   5,150,883   7,571,013   33,249,250
  Average per gene             5.4         4.9         5.1         5.2         5.3
  Average size (bp)            247         259         250         256         242
Number of genes with ESTs (%)  60.8        56.9        59.8        61.4        61.4
Number of ESTs                 30,522      14,989      20,732      16,605      22,885      105,773

+ hundreds of MIR genes - role in regulation of gene expression

Gene function

The majority of plant genes form gene families
[Figure: number of paralogues]
• gene families are often in tandem arrangement, but also spread across the genome
• tandem repeats are composed of close, but also distant paralogues (recombination)
• duplications of long chromosomal regions

Aside – term definitions:
• Homologous genes: genes with similar sequences derived from the same ancestral gene (quantification – sequence identity,
similarity)
• Paralogous genes: genes with similar sequences, derived from the same ancestral gene, present at different loci within the same genome.
• Orthologous genes: genes in different species that are similar to each other because they originated from a common ancestral gene in a common ancestor. (If several paralogues are present, the genes serving the same function are regarded as orthologues.)

Orthologues vs. paralogues
[Figure: Orthologous genes: Gene A of the ancestral species gives Gene A” in species A and Gene A’ in species B. Paralogous genes = genes duplicated within the species: Gene A” and Gene A’” in species A; Gene A’ in species B.]

Mechanisms of gene duplication (increase in paralogue number)
• tandem duplication
• transposition
• segmental duplications
• whole genome duplications

Differences in genes/gene families between genomes
[Figure: overlap of genes and of gene families]
Arabidopsis x Populus – large overlap, about 1.5 times more paralogues in poplar
(Arabidopsis + Populus) x Oryza – many genes specific for Monocots

Arabidopsis is an ancient tetraploid (as are probably the majority of plants)
Duplicated chromosomal regions form about 60 % of the genome (67.9 Mb)

Polyploidization significantly increases genome (and organism) plasticity and has played a very important role in plant (genome) evolution; about 30-80 % of plant species are polyploid

Polyploidization in Angiosperm evolution
Fawcett et al. 2009

Dating of whole genome duplication according to the number of synonymous mutations per synonymous site - Ks

             Phe      Leu      Met      Val
Sequence 1   UUU      CUA      AUG      GUU
Sequence 2   UUC      UUG      AUG      GUU
Syn. sites   0 0 1/3  1/3 0 1  0 0 0    0 0 1

Number of syn. sites = 1/3 + 1/3 + 1 + 1 = 2.66
Synonymous differences = 3 (UUU/UUC: 1; CUA/UUG: 2)
Ks = 3/2.66

[Figure: comparisons of paralogue pairs; x-axis: Ks, y-axis: gene number; peaks indicate genome duplications]
Fawcett et al.
2013

Polyploidization in plant evolution
• 35 % of species are neopolyploids
• most species were repeatedly polyploid during evolution
• viable aneuploid variants (frequently after allopolyploidization – hexaploid wheat): stable wheat lines with a missing chromosomal arm (of a homeologous chromosome)
Blue dots – duplications, asterisk – triplication; K-T (Fawcett et al. 2013)

Polyploidization - fusion of non-reduced gametes or endoreduplication
[Figure: autopolyploidy: n = x = 4 × n = x = 4, spontaneous duplication (endoreduplication), 2n = 4x = 16; allopolyploidy: n = x = 4 × n = x = 7, 2n = 4x = 22]
Autopolyploidy and allopolyploidy occur with similar frequency among polyploid plant species

Chromosome doubling is necessary for meiosis in hybrids
[Figure: species A × species B gives a sterile hybrid; genome duplication makes it fertile]
- preferential pairing of homologous chromosomes
- related chromosomes from different species (homeologous) can also pair

Allopolyploid genomes in the genus Brassica

Species          Karyotype      Genome
Brassica rapa    2n = 2x = 20   A
B. nigra         2n = 2x = 16   B
B. oleracea      2n = 2x = 18   C
B. juncea        2n = 4x = 36   AB
B. napus         2n = 4x = 38   AC
B. carinata      2n = 4x = 34   BC

[Figure: triangle of Brassica genomes: BB Brassica nigra, AA Brassica rapa, CC Brassica oleracea; ancient interspecies hybrids: AABB Brassica juncea, BBCC Brassica carinata, AACC Brassica napus]

Allopolyploid tobacco species – DNA size changes

Fates of duplicated genes differ (gene dosage balance theory)
• genes encoding interacting proteins, “connected genes” (signal pathways, complex subunits, …), are easily preserved in the genome after duplication
  - loss or partial duplication of one component results in gene imbalance, decreasing fitness
  - a whole duplicated complex can specialize for a new function and increase organism complexity
  - the secondary function was probably present already in the ancestral complex (pathway), but only duplication allowed adaptive evolution of both functions without selection constraints - Escape from Adaptive Conflict (EAC) model
• other “single genes” are more easily lost after genome duplication, but can be preserved after individual duplication
  - most duplicated genes are lost after whole genome duplication
  - loss is not even (↑) between the two copies
  - probably frequent epigenetic marks in one copy (methylation)
  - preferential gene loss and mutagenesis of the methylated copy
  - gene conversion and homogenization can occur (!)
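As a quick arithmetic check of the Brassica karyotype table, the chromosome number of each allotetraploid should equal the sum of the 2n values of its two diploid parents. The sketch below only restates the numbers from the table; the dictionary and function names are illustrative assumptions, not part of the original material.

```python
# Chromosome numbers (2n) taken from the Brassica table in the text.
# Names such as DIPLOIDS and expected_2n are hypothetical, for illustration only.
DIPLOIDS = {
    "A": ("Brassica rapa", 20),
    "B": ("B. nigra", 16),
    "C": ("B. oleracea", 18),
}

ALLOTETRAPLOIDS = {
    "AB": ("B. juncea", 36),
    "AC": ("B. napus", 38),
    "BC": ("B. carinata", 34),
}


def expected_2n(genome: str) -> int:
    """Sum the diploid chromosome numbers of the parental genomes."""
    return sum(DIPLOIDS[letter][1] for letter in genome)


if __name__ == "__main__":
    for genome, (species, listed) in ALLOTETRAPLOIDS.items():
        status = "matches" if expected_2n(genome) == listed else "differs from"
        print(f"{species} ({genome}): expected 2n = {expected_2n(genome)}, "
              f"{status} the listed 2n = {listed}")
```

All three hybrids match their parental sums, which is what one would expect if each arose by hybridization followed by chromosome doubling.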
de novo allopolyploids (~ rape seed) – recombination occurs preferentially between homeologous chromosomes (= homologous chromosomes within one genome, but originating from different parents), without preference for either parental genome

Changes in a newly formed allopolyploid genome:
- DNA methylation changes
- losses of parts of chromosomes or of whole chromosomes (aneuploidy – decreased fertility)
- frequent activation of TEs
- expression of homeologous genes is usually not additive
- the transcriptome is usually more reduced than the genome
- different regulation of expression - often organ-specific expression of genes from each parent, new sites of expression, new regulation
- “divergent resolution” - speciation (different gene loss in individuals - lethality in F2; absence of an essential gene = reproductive barrier)

Plants can also survive with a haploid genome!
- reprogramming of male or female gametophyte development in vitro – no gamete formation, but development resembling embryogenesis
- usually from immature microspores = androgenesis
- from the female gametophyte = gynogenesis
- haploid plants are sterile
- through endoreduplication (colchicine or spontaneous) – completely homozygous plants – dihaploids

Androgenesis in rape seed (pollen embryogenesis)

... But genomes are still similar
Colinearity, synteny
Paterson et al., Plant Cell 12: 1523-1539, 2000

“Synteny” is usually misused to describe colinearity

Synteny = orthologous loci in two species on the same chromosome
[Figure: ancestral species: A B C; species B: A’ C’ B’; species A: C” B” A”]

Colinearity = a group of loci in two species on a chromosome in the same order
[Figure: ancestral species: A B C; species B: A’ B’ C’; species A: A” B” C”]

Changes in colinearity caused by chromosomal arm inversion

Colinearity of Poaceae genomes
Colinear regions differ mainly in repetitive DNA

Summary:
• Current plant genomes result from repeated cycles of partial and complete duplications, followed by reduction and modification of the duplicated sequences.
• There are no genomes without redundancy.
• Plant genomes are still very dynamic.
• A large proportion of the genome consists of repetitive DNA.
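The Ks dating example given earlier (codons Phe, Leu, Met, Val; Ks = 3/2.66) can be reproduced with a short sketch. It counts synonymous sites on the first sequence only, as the slide does, and averages synonymous changes over the possible mutation pathways when a codon pair differs at more than one position. This is a simplified counting scheme in the spirit of the Nei-Gojobori approach: there is no correction for multiple hits and stop codons get no special treatment. All function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard genetic code, built in TCAG order (DNA alphabet; U is mapped to T below).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> Fraction:
    """For each position, add the fraction of the 3 possible changes that are synonymous."""
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += Fraction(1, 3)
    return total


def synonymous_differences(c1: str, c2: str) -> Fraction:
    """Average number of synonymous steps over all orders of the differing positions."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return Fraction(0)
    paths = list(permutations(diff))
    total = Fraction(0)
    for order in paths:
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                total += 1
            current = nxt
    return total / len(paths)


def ks(seq1: str, seq2: str):
    """Return (synonymous differences, synonymous sites of seq1, uncorrected Ks)."""
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    codons1 = [s1[i:i + 3] for i in range(0, len(s1), 3)]
    codons2 = [s2[i:i + 3] for i in range(0, len(s2), 3)]
    sites = sum(synonymous_sites(c) for c in codons1)
    diffs = sum(synonymous_differences(a, b) for a, b in zip(codons1, codons2))
    return diffs, sites, diffs / sites


if __name__ == "__main__":
    diffs, sites, value = ks("UUUCUAAUGGUU", "UUCUUGAUGGUU")
    print(f"synonymous differences = {diffs}, synonymous sites = {sites}, Ks = {value}")
```

For the slide's two sequences this gives 3 synonymous differences over 8/3 (about 2.66) synonymous sites. Real analyses typically average the site counts of both sequences and apply a multiple-hit correction, which this sketch omits.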