GENETIC MARKERS IN PLANT BREEDING

Marker
A gene of known function and location, or a mutation within a gene, that allows the inheritance of that gene to be studied.
Genetic information resides in the genome.

Genetic marker
Any phenotypic difference controlled by genes that can be used to study recombination processes or to select a more or less closely associated target gene.

Genetic marker
• Morphological marker
• Molecular marker

Molecular marker
• Readily detectable sequences of protein or DNA that are closely linked to a gene locus and/or to a morphological or other character of a plant
• Readily detectable sequences of protein or DNA whose inheritance can be monitored and associated with the inheritance of a trait, independently of the environment
1. Protein marker
2. DNA marker

Molecular markers
• Sequencing (SNPs)
• Microsatellites (SSRs)
• AFLP (amplified fragment length polymorphism)
• RAPD (random amplified polymorphic DNA)
• Chloroplast DNA PCR-RFLP
• Allozymes (protein electrophoresis)

Morphological marker (phenotypic / naked-eye marker)
• 2-rowed vs. 6-rowed
• hulled vs. naked
• black vs. white
• non-waxy vs. waxy
Karl von Linné (1707-1778)

Molecular markers
Important aspects:
• Polymorphism: the existence of two or more forms that are genetically distinct from one another but contained within the same interbreeding population
• Pattern of inheritance: the pattern of transmission of genetic information from parents to progeny

Polymorphism: co-dominant marker
Gel configuration (lanes P1, P2, O1, O2)
- Parent 1: one band
- Parent 2: a smaller band
- Offspring 1: heterozygote = both bands
- Offspring 2: homozygote, as parent 1

Polymorphism: dominant marker
Gel configuration (lanes P1, P2, O1, O2)
- Parent 1: one band
- Parent 2: no band
- Offspring 1: homozygote, as parent 1
- Offspring 2: ????
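The two gel patterns above can be read as a simple scoring rule. The Python sketch below is illustrative only: the function names and the band sizes (300 and 250 bp) are hypothetical and do not come from the slides. It shows why a co-dominant marker separates all three genotype classes, whereas a dominant marker cannot tell a homozygote from a heterozygote when the band is present.

```python
def score_codominant(bands, allele_1, allele_2):
    """Call a genotype from the set of band sizes seen in one gel lane."""
    has_1 = allele_1 in bands
    has_2 = allele_2 in bands
    if has_1 and has_2:
        return "heterozygote"
    if has_1:
        return "homozygote (parent 1 type)"
    if has_2:
        return "homozygote (parent 2 type)"
    return "no call"


def score_dominant(band_present):
    """Call a genotype when only presence or absence of one band is visible."""
    if band_present:
        # One band looks the same whether one or two copies are carried.
        return "band present: homozygote or heterozygote"
    return "band absent: homozygote for the null allele"


# Hypothetical band sizes (bp) for the two parental alleles.
P1_BAND, P2_BAND = 300, 250
lanes = {"P1": {300}, "P2": {250}, "O1": {300, 250}, "O2": {300}}
calls = {name: score_codominant(bands, P1_BAND, P2_BAND)
         for name, bands in lanes.items()}
for name, call in calls.items():
    print(name, call)
```

With a co-dominant marker the four lanes give four unambiguous calls; with `score_dominant` every lane that shows the band falls into the same undetermined class.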
Dominant versus co-dominant
Dominant
• No distinction between homo- and heterozygotes possible
• No allele frequencies available
• Example: RAPD
Co-dominant
• Homozygotes can be distinguished from heterozygotes
• Allele frequencies can be calculated
• Examples: microsatellites, SNPs, RFLPs

Desirable properties of a good molecular marker
• Highly polymorphic
• Co-dominant inheritance
• Occurs throughout the genome
• Reproducible
• Easy, fast and cheap to detect
• Selectively neutral
• High resolution with a large number of samples
• Non-destructive assay
• Random distribution throughout the genome
• Assay can be automated

Protein markers
Genetic markers based on protein polymorphisms
a. Allozyme: isoenzymes of protein nature whose synthesis is usually controlled by codominant alleles and which are inherited in monogenic ratios. They show a specific banding pattern when separated by electrophoresis.
b. Isozyme: an enzyme that exists in two or more structural forms, which are easily identified by electrophoretic methods.

Protein polymorphisms
• Seed storage proteins
• Isozymes

Isozyme
[Figure: starch gel of the isozyme malate dehydrogenase (MDH).] The numbers indicate first the MDH locus and then the allele present (e.g. 3-18 is locus 3, allele 18). Some bands are heterodimers (intralocus or interlocus).

DNA marker
• Segments of DNA with an identifiable physical location on a chromosome and whose inheritance can be followed
• A marker can be a gene, or it can be some section of DNA with no known function

Types of DNA marker
DNA markers can be differentiated by the molecular technique used to develop them:
1. Restriction enzymes
2. Hybridization
3. PCR
4. Sequencing

DNA structure
[Figure: chromosome to DNA]

Stretch of nitrogen fixation gene in soybean
  1 ccacgcgtcc gtgaggactt gcaagcgccg cggatggtgg gctctgtggc tgggaacatg
 61 ctgctgcgag ccgcttggag gcgggcgtcg ttggcggcta cctccttggc cctgggaagg
121 tcctcggtgc ccacccgggg actgcgcctg cgcgtgtaga tcatggcccc cattcgcctg
181 ttcactcaga ggcagaggca gtgctgcgac ctctctacat ggacgtacag gccaccactc
241 ctctggatcc cagagtgctt gatgccatgc tcccatacct tgtcaactac tatgggaacc
301 ctcattctcg gactcatgca tatggctggg agagcgaggc agccatggaa cgtgctcgcc
361 agcaagtagc atctctgatt ggagctgatc ctcgggagat cattttcact agtggagcta
421 ctgagtccaa caacatagca attaaggtag gaggagggat ggggatgttg tgtggccgac
481 agttgtgagg ggttgtggga agatggaagc cagaagcaaa aaagagggaa cctgacacta
541 tttctggctt cttgggttta gcgattagtg cccctctctc atttgaactc aactacccat
601 gtctccctag ttctttctct gcctttaaaa aaaaatgtgt ggaggacagc tttgtggagt
661 ctgaaatcac catctacctt tacttaggtt ctgagtgcca aacccaaggc accaggcatg
721 cgtccttgac tccggagcca tcaggcaggc tttcctcagc cttttgcagc caagtctttt
781 agcctattgg tctgagttca gtgtggcagt tggttaggaa agaaggtggt tcttcgacca
841 ctaacagttt ggatttttta ggatgctagt cctttaaaa ……….
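Restriction enzymes, the first technique in the list above, cut DNA at short recognition sequences, so a single base change can remove a cut site and change the fragment sizes. The Python sketch below illustrates that idea under stated assumptions: the two allele sequences are invented for the example, the helper names are hypothetical, and the enzyme is EcoRI, which recognizes GAATTC and cuts after the first G.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site, cut_offset):
    """Lengths of the fragments left after cutting a linear sequence at each site."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_OFFSET = 1        # EcoRI cuts between G and AATTC

# Invented alleles: allele_2 carries one base change (A to G) in the second site.
allele_1 = "AAAA" + "GAATTC" + "TTTTTT" + "GAATTC" + "CCCC"
allele_2 = "AAAA" + "GAATTC" + "TTTTTT" + "GAGTTC" + "CCCC"

print(fragment_lengths(allele_1, ECORI_SITE, ECORI_OFFSET))  # [5, 12, 9]
print(fragment_lengths(allele_2, ECORI_SITE, ECORI_OFFSET))  # [5, 21]
```

The lost site in the second allele merges two fragments into one longer fragment; a size difference of this kind is what a fragment-length marker detects on a gel.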
DNA marker
[Figure: the first part of the soybean sequence shown above, with markers M1 and M2 placed alongside Gene A and Gene B on the DNA.]
AACCTGAAAAGTTACCCTTTAAAGGCTTAAGGAAAAAGGGTTTAACCAAGGAATTCCATCGGGAATTCCG
A readily detectable sequence of DNA whose inheritance can be monitored and associated with the inheritance of a trait.
[Figures: gel image from UV light table; gel image from computer screen]

Basis for DNA marker technology
• Restriction endonucleases
• DNA-DNA hybridization
• Polymerase chain reaction (PCR)
• DNA sequencing

RFLP techniques
[Figure: interpretation of RFLP polymorphisms, two gel panels with lanes 1-6]

RFLP-based markers
• Examine differences in the size of specific DNA restriction fragments
• Require pure, high-molecular-weight DNA
• Usually performed on the total cellular genome

Advantages and disadvantages of RFLP
• Advantages
  - Reproducible
  - Co-dominant
  - Simple
• Disadvantages
  - Time consuming
  - Expensive
  - Use of probes

AFLP markers
• Most complex of the marker technologies
• Involves cleavage of DNA with two different enzymes
• Involves ligation of specific linker pairs to the digested DNA
• Subsets of the DNA are then amplified by PCR
• The PCR products are then separated on an acrylamide gel
• 128 linker combinations are readily available, so 128 subsets can be amplified

AFLP markers
• Technically demanding
• Reliable and stable
• Moderate cost
• Need to use different kits adapted to the size of the genome being analyzed
• Like RAPD markers, they need to be converted to quick and easy PCR-based markers

RAPD
• Amplifies anonymous stretches of DNA using arbitrary primers
• Fast and easy method for detecting polymorphisms
• Dominant markers
• Reproducibility problems

[Figure: RAPD polymorphisms among landraces of sorghum]
[Figure: RAPD gel configuration]

Sequences of 10-mer RAPD primers
Name      Sequence
OP A08    5'-GTGACGTAGG-3'
OP A15    5'-TTCCGAACCC-3'
OP A17    5'-GACCGCTTGT-3'
OP A19    5'-CAAACGTCGG-3'
OP D02    5'-GGACCCAACC-3'

RAPD markers
• There are other problems with RAPD markers associated with reliability
• Because small changes in any variable can change the result, they are unstable as markers
• RAPD markers need to be converted to stable PCR markers. How?

RAPD markers
• The polymorphic RAPD marker band is isolated from the gel
• It is used as a template and re-amplified by PCR
• The new PCR product is cloned and sequenced
• Once the sequence is determined, new, longer and specific primers can be designed

VNTR (Variable Number of Tandem Repeats)
Tandem repeats (TR): DNA sequences that occur as repeated copies in the genome
• Satellite DNA
• Minisatellites
• Microsatellites
Variable number (VN): high polymorphism in the number of repeats

VNTR (Variable Number of Tandem Repeats)
• Satellite DNA
  - 2-250 bp repeat unit size
  - Constitutes 1-60% of the genome
  - Some can be separated in CsCl ('satellite band')
• Minisatellites
  - 9-50 bp repeat unit size
  - Repeated 100-1000 times
• Microsatellites
  - 2-6 bp repeat unit size
  - Repeated tens to 100 times

Microsatellites
Short tandem repeats (simple sequence repeats)
• 2: dinucleotides
• 3: trinucleotides
• 4: tetranucleotides
Randomly distributed in the genome
Non-coding
• Some lie within coding sequences, especially trinucleotides
• Some are related to diseases
Nomenclature
• Perfect: GCTAGCCACACACACACACATGCATC
• Interrupted: GCTAGCCACACGTCACACACTGCATC
• Compound: GCTAGCCACACATATATGTGTGCATC

SSR repeats and primers
Repeat: GGT(5)
Sequence:
GCGCCGAGTTCTAGGGTTTCGGAATTTGAACCGTC
ATTGGGCGTCGGTGAAGAAGTCGCTTCCGTCGTTTGATTCC
GGTCGTCAGAATCAGAATCAGAATCGATATGGTGGCAGTGG
TGGTGGTGGTGGTGGTTTTGGTGGTGGTGAATCTAAGGCG
GATGGAGTGGATAATTGGGCGGTTGGTAAGAAACCTCTTCC
TGTTAG
ATTCTGGAATGGAACCAGATCGCTGGTCTAGAGGTTCTGCT
GTGGAACCA…..

SSR polymorphisms
P1  AATCCGGACTAGCTTCTTCTTCTTCTTCTTTAGCGAATTAGG
P2  AAGGTTATTTCTTCTTCTTCTTCTTCTTCTTCTTAGGCTAGGCG
[Figure: gel configuration, lanes P1 and P2]

SNP (Single Nucleotide Polymorphisms)
[Figures: SNPs on a DNA strand; hybridization using fluorescent dyes]
• Any two unrelated individuals differ by about one base pair in every 1,000; these differences are referred to as SNPs.
• Many SNPs have no effect on cell function and can therefore be used as molecular markers.

Genetic marker characteristics
Morphological markers
- Number of loci: limited
- Inheritance: dominant
- Positive features: visible
- Negative features: possibly negative linkage to other characters
Protein markers
- Number of loci: limited
- Inheritance: codominant
- Positive features: easy to detect
- Negative features: possibly tissue specific
RFLP markers
- Number of loci: almost unlimited
- Inheritance: codominant
- Positive features: utilized before the latest technologies were available
- Negative features: radioactivity requirements, rather expensive
RAPD markers
- Number of loci: unlimited
- Inheritance: dominant
- Positive features: quick assays with many markers
- Negative features: high basic investment
SSR markers
- Number of loci: high
- Inheritance: codominant
- Positive features: well distributed within the genome, many polymorphisms
- Negative features: long development of the markers, expensive

Developing a marker
• The best marker is the DNA sequence responsible for the phenotype, i.e. the gene
• If the gene responsible is known and has been isolated, compare the sequences of wild-type and mutant DNA
• Develop specific primers for the gene that will distinguish the two forms

Developing a marker
• If the gene is unknown, screen contrasting populations
• Use populations rather than individuals
• Need to "blend" the genetic differences between individuals other than the trait of interest

Developing markers
• Cross individuals differing in the trait for which you wish to develop a marker
• Collect the progeny and self or polycross them
• Collect the F2 generation and select for the trait you are interested in
• Select 5-10 individuals in the F2 showing each trait
• Extract DNA from the selected F2s
• Pool equal amounts of DNA from each individual into two samples, one for each trait
• Screen the pooled or "bulked" DNA with whichever marker method you wish to use

Types of traits (types of markers)
• Single-gene trait, e.g. seed shape
• Multigenic trait, e.g. plant growth = quantitative trait loci (QTL)

USES OF MOLECULAR MARKERS
• Clonal identity
• Parental analysis
• Family structure
• Population structure
• Gene flow
• Hybridisation
• Phylogeny
• Measuring genetic diversity
• Mapping
• Tagging

Genetic diversity
• Define appropriate geographical scales for monitoring and management (epidemiology)
• Establish gene flow mechanisms
• Identify the origin of individuals (mutation detection)
• Monitor the effect of management practices
• Manage small numbers of individuals in ex situ collections
• Establish the identity of cultivars and clones (fingerprinting)
• Paternity analysis and forensics

Mapping
The determination of the position and relative distances of genes on a chromosome by means of their linkage.

Genetic map
• A linear arrangement of genes or genetic markers obtained on the basis of recombination
• An ordering of genes and markers in a linear arrangement corresponding to their physical order along the chromosome, based on linkage.
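Because a genetic map is built from recombination, distances on it are usually expressed in centimorgans (cM) rather than base pairs. The Python sketch below is a minimal illustration, assuming Haldane's mapping function (a standard choice that the slides do not name) and hypothetical gamete counts. The function names are invented for this example.

```python
import math


def recombination_fraction(recombinants, total):
    """Proportion of recombinant gametes among all scored gametes."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def haldane_cm(r):
    """Map distance in cM from a recombination fraction (Haldane's function)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical counts: 10 recombinant gametes out of 100 scored.
r = recombination_fraction(10, 100)
distance = haldane_cm(r)
print(f"r = {r:.2f}, distance = {distance:.1f} cM")
```

For small recombination fractions the map distance is close to the percentage of recombinants; the two diverge as the markers move further apart, because multiple crossovers go undetected.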
Physical map
• A linear order of genes or DNA fragments
• An ordering of landmarks on DNA, regardless of inheritance, measured in base pairs.

Physical mapping
• It contains ordered, overlapping cloned DNA fragments
• The cloned DNA fragments are usually obtained by restriction enzyme digestion

QTL mapping
• A set of procedures for detecting genes controlling quantitative traits (QTL) and estimating their genetic effects and locations
• Used to assist selection

Fundamental genetics (background for linkage analysis)
Rule of Segregation
• Offspring receive ONE allele from the pair of alleles possessed by EACH parent
Rule of Independent Assortment
• Alleles of one gene can segregate independently of alleles of other genes
• (Linkage analysis relies on the violation of the Independent Assortment rule)

Linkage analysis
Goal: find a marker "linked" to a disease gene.
LOD score = log10 of the likelihood ratio
LR[θ; data] = k P[data; θ]
θ = estimate of the genetic distance (recombination fraction) between the marker and the quantitative trait locus
  = recombinant gametes / total gametes

Linkage analysis
• Genes near each other on a chromosome tend to be inherited together; that is, they are linked.
• Linkage analysis comprises the techniques used to identify such linkages among genes
• Linkage groups that include genetic markers and genes determinative of phenotype allow the identification of determinative alleles (and therefore prediction)

Linkage
• Mendel showed that alleles segregate independently. Then he tested genes
• Sometimes the inheritance of two genes is independent of one another; that is, phenotype ratios are 9:3:3:1
• Sometimes the inheritance of two genes is linked, showing a ratio of 3:0:0:1
• Linkage can vary continuously from perfectly correlated to uncorrelated.

Why genes are linked
• Alleles are arranged linearly
• Each parent passes only one chromosome of each homologous pair to an offspring.
• Recombination periodically switches which chromosome in the parent is passed along
• Alleles near each other are more likely to be passed along together than ones further apart
• Alleles on different chromosomes are always inherited independently.

Marker-assisted selection
• Breeding for specific traits in plants and animals is expensive and time consuming
• The progeny often need to reach maturity before the success of the cross can be determined
• The greater the complexity of the trait, the more time and effort are needed to achieve a desirable result.

MAS
• The goal of MAS is to reduce the time needed to determine whether the progeny have the trait
• The second goal is to reduce the costs associated with screening for traits
• If you can detect the distinguishing trait at the DNA level, you can identify positive selections very early.

Marker-assisted breeding
• MAS allows gene pyramiding: the incorporation of multiple genes for a trait
• Prevents the development of biological resistance to a gene
• Reduces space requirements: unwanted plants and animals can be disposed of early

QTL study
       Trait   M.1   M.2   M.3
P.1    2.5     1     1     1
P.2    8.4     3     3     3
I.1    7.1     3     1     1
I.2    2.5     2     1     1
I.3    4.5     2     3     2
I.4    2.3     1     1     3

Statistical programs used in molecular marker studies
• SAS
• ANOVA
• Mapmaker
• Cartographer

Types of population used in molecular marker studies: F2, RILs, backcrosses (MILs), DH.

QTL mapping
[Figure: recombination]
Crossover is the exchange of alleles between chromatids (a chromatid is half of a chromosome), which generates recombinant chromatids.
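The LOD score from the linkage analysis slides can be worked through numerically. The Python sketch below is a simplified illustration, not the method of any particular mapping program: it assumes phase-known gametes scored as recombinant or non-recombinant, a binomial likelihood, and hypothetical counts (2 recombinants out of 20). The function name and the grid search are choices made for this example.

```python
import math


def lod_score(recombinants, total, theta):
    """log10 of L(theta) / L(0.5) for phase-known gamete counts."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total")
    non_recombinants = total - recombinants
    # Likelihood under linkage at recombination fraction theta.
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    # Likelihood under independent assortment (theta = 0.5).
    log_l_null = total * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinant gametes out of 20.
lod_at_01 = lod_score(2, 20, 0.1)

# Scan theta from 0.01 to 0.50 and keep the value with the highest LOD.
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod_score(2, 20, t))
print(f"LOD at theta = 0.1: {lod_at_01:.2f}; best theta on grid: {best_theta}")
```

The LOD is highest where θ equals the observed proportion of recombinant gametes and falls to zero at θ = 0.5, the value expected for unlinked loci. By a widely used convention, a LOD of 3 or more is taken as evidence for linkage.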