RFLP analysis

RFLP = restriction fragment length polymorphism
Refers to variation in restriction sites between individuals in a population.
These are extremely useful and valuable for geneticists (and lawyers).
On average, two individuals (humans) vary at 1 in 1000 bp.
The human genome is 3 x 10^9 bp.
This means that two individuals will differ at about 3 million bp.
By chance, these changes will create or destroy the recognition sites for restriction enzymes.

Let's generate a restriction map for a region of the human X chromosome.
[Figure: restriction map of the first individual, with 5 kb and 3 kb fragments]
The restriction map of the same region of the X chromosome in a second individual may appear as:
[Figure: restriction map of the second individual, with a single 8 kb fragment]
Normal: GAATTC
Mutant: GAGTTC

The internal EcoRI site is missing in the second individual.
For X1, the sequence at this site is
GAATTC
CTTAAG
This is the sequence recognized by EcoRI.
The equivalent site in the X2 individual is mutated:
GAGTTC
CTCAAG
Now if we examine a large number of humans at this site, we may find that 25% possess the EcoRI site and 75% lack it.
We can say that a restriction fragment length polymorphism exists in this region.
These polymorphisms usually do not have any phenotypic consequences. They are silent mutations: they do not alter the protein sequence because of redundancy in codon usage, they lie in introns or non-genic regions, or they do not affect protein structure/function.

RFLPs are identified by Southern blots.
In this region of the human X chromosome, two forms of the X chromosome are segregating in the population.
X1: [Figure: restriction map of X1 with BamHI (B) and EcoRI (R) sites, fragment sizes of 4, 5, 3, 6, 3.5 and 2 kb, and the position of probe 1]
Digest DNA with EcoRI and probe with probe 1. What do we get?
X2: [Figure: restriction map of X2 with BamHI (B) and EcoRI (R) sites, fragment sizes of 4, 8, 6, 3.5 and 2 kb, and the position of probe 1]

Digesting with BamHI and performing Southern blots with the above probe produces the following results:
[Figure: the X1 and X2 restriction maps shown again, with the BamHI (B) sites marked]
There is no variation with respect to the BamHI sites; all individuals produce the same banding pattern on Southern blots.

RFLP in individuals

If we used probe 1 for Southern blots with a BamHI digest, what would be the results for X1/X1, X1/X2 and X2/X2 individuals?
X1/X1: 18 kb
X1/X2: 18 kb
X2/X2: 18 kb
If we used probe 1 for Southern blots with an EcoRI digest, what would be the results for X1/X1, X1/X2 and X2/X2 individuals?
X1/X1: 5 and 3 kb
X1/X2: 8, 5 and 3 kb
X2/X2: 8 kb

RFLPs are found by trial and error, and they require an appropriate probe AND an appropriate enzyme.
They are very valuable because they can be used just like any other genetic marker to map genes.
They are employed in recombination analysis (mapping) in the same way as conventional morphological allele markers.
The presence of a specific restriction site at a specific locus on one chromosome and its absence at the same locus on another chromosome can be viewed as two allelic forms of a gene.
The phenotype in this case is a Southern blot pattern rather than white eye/red eye.

Using RFLPs to map human disease genes

Which RFLP pattern segregates with the diseased individuals: top or bottom?
Using DNA probes for different RFLPs, you screen individuals for an RFLP pattern that shows co-inheritance with the disease.
Conclusion: the actual mutation resides at or near the RFLP.

Mapping

Let's review standard mapping.
To map any two genes with respect to one another, the individual being tested must be heterozygous at both loci.
Genes W and B are responsible for wing and bristle development.
[Figure: chromosome map, centromere - W - B - telomere]
To find the map distance between these two genes, we need allelic variants at each locus:
W = wings, w = no wings
B = bristles, b = no bristles
To measure the genetic distance between these two genes, the double heterozygote is crossed to the double homozygote (wb/wb).
[Table: progeny classes by female gamete, with male gamete (wb)]
Map distance = # recombinants / total progeny = 7/101 ≈ 0.07, i.e. about 7 M.U. (map units)

Both the normal and mutant alleles of gene B (B and b) are sequenced and we find:
[Figure: chromosome map, centromere - W - B - telomere, with EcoRI (E) sites at the B locus]
B allele: GAATTC (EcoRI site present), EcoRI fragments of 3 kb and 2 kb
b allele: AAATTC (EcoRI site lost), one EcoRI fragment of 5 kb
By chance, this mutation disrupts the amino acid sequence and also an EcoRI site!
If DNA is isolated from B/B, B/b and b/b individuals, cut with EcoRI and probed in a Southern blot, the pattern that we will obtain will be:
[Figure: Southern blot lanes for B/B (bristles), B/b (bristles) and b/b (no bristles)]

Therefore, in the previous cross (WB/wb x wb/wb), the genotype at the B locus can be distinguished either by the presence or absence of bristles or by Southern blots.
Female WB/wb: wings, bristles; Southern blot: 5 kb and 2 kb bands
Male wb/wb: no wings, no bristles; Southern blot: 5 kb band
There are some phenotypes for specific genes that are very painful to measure.
Having an RFLP makes the problem easier.
Just like genes, RFLPs mark specific positions on chromosomes and can be used for mapping.

Male gamete: wb

Female gamete   Genotype   Phenotype   Southern blot   Progeny   Class
WB              WB/wb      Wings       5 kb, 2 kb      51        Parental
wb              wb/wb      No wings    5 kb            43        Parental
Wb              Wb/wb      Wings       5 kb            3         Recombinant
wB              wB/wb      No wings    5 kb, 2 kb      4         Recombinant

Map distance = # recombinants / total progeny = 7/101, about 7 M.U.

The same Southern blot method can be employed for the wing (W) locus with a different restriction enzyme (BamHI) if an RFLP exists at this locus!
You make the DNA, digest half with EcoRI and probe with the bristle probe.
Digest the other half with BamHI and probe with the wing probe.
W allele: GTATCC (no BamHI site), one BamHI fragment of 8 kb
w allele: GGATCC (BamHI site present), two BamHI fragments of 4 kb each

To find the map distance between genes, multiple alleles are required.
We can determine the distance between W and B by the classical method because multiple alleles exist at each locus (W & w, B & b).
[Figure: chromosome map, centromere - W - B - C - R - telomere]
You find a new gene C. There are no variants of this gene that alter the phenotype of the fly in any way that you can observe.
Say we don't even know the function of this gene. You can't even predict its phenotype.
However, the researcher identified an RFLP variant in this gene.

C allele: one EcoRI (E) fragment of 8 kb
c allele: EcoRI fragments of 6 kb and 2 kb
With this RFLP, the C gene can be mapped with respect to other genes.
Genotype/phenotype relationships for the W and C genes:
WW and Ww = red eyes
ww = white eyes
CC = 8 kb band
C/c = 8, 6 and 2 kb bands
cc = 6 and 2 kb bands
To determine the map distance between W and C, the following cross is performed:

W C       w c
-----  x  -----
w c       w c

[Figure: chromosome map (W, B, C, R) and the cross W C(8) / w c(6,2) x w c(6,2) / w c(6,2), with progeny classified by female gamete and male gamete (wc)]

Prior to RFLP analysis, only a few classical markers existed in humans.
Now over 7000 RFLPs have been mapped in the human genome.
Newly identified inherited disorders are now mapped by determining whether they are linked to previously identified RFLPs.

Genetic polymorphism

•Genetic polymorphism: a difference in DNA sequence among individuals, groups, or populations.
•Genetic mutation: a change in the nucleotide sequence of a DNA molecule. Genetic mutations are a subset of genetic polymorphism.
Genetic variation: single nucleotide polymorphism (point mutation) and repeat heterogeneity.

SNP

•A single nucleotide polymorphism is a source of variance in a genome.
•A SNP ("snip") is a single base change in DNA.
•SNPs are the simplest form and the most common source of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).
•There are two types of nucleotide base substitutions resulting in SNPs:
–Transition: substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute two thirds of all SNPs.
–Transversion: substitution between a purine and a pyrimidine.
While a single base can change to any of the other three bases, most SNPs have only one alternative allele (that is, they are biallelic).
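The transition/transversion rule above can be written as a short Python sketch. This is only an illustration: the function name `classify_substitution` and the constants are hypothetical choices made here, not part of any library or of the original slides.

```python
# Purines and pyrimidines as defined on the SNP slide.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Classify a single-base change as a 'transition' or a 'transversion'.

    ref and alt are single DNA bases (A, C, G or T); case is ignored.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    both_purines = ref in PURINES and alt in PURINES
    both_pyrimidines = ref in PYRIMIDINES and alt in PYRIMIDINES
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition;
    # any purine <-> pyrimidine change is a transversion.
    return "transition" if (both_purines or both_pyrimidines) else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(ref, "->", alt, classify_substitution(ref, alt))
```

Of the 12 possible ordered base changes, only 4 are transitions and 8 are transversions, so the observation that transitions make up about two thirds of SNPs means they occur far more often than chance alone would suggest.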
SNPs - Single Nucleotide Polymorphisms

-----------------------ACGGCTAA
-----------------------ATGGCTAA
Instead of using restriction enzymes, these are found by direct sequencing.
They are extremely useful for mapping.

Markers              Number
Classical Mendelian  ~200
RFLPs                7000
SNPs                 1.4 x 10^6

SNPs occur every 300-1000 bp along the 3 billion bp human genome.
Many SNPs have no effect on cell function.

Humans are genetically >99 per cent identical; it is the tiny remaining percentage that differs between individuals.
Much of our genetic variation is caused by single-nucleotide differences in our DNA: these are called single nucleotide polymorphisms, or SNPs. As a result, each of us has a unique genotype that typically differs in about three million nucleotides from every other person.
SNPs occur about once every 300-1000 base pairs in the genome, and the frequency of a particular polymorphism tends to remain stable in the population.
Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences".

How did SNPs arise?

Two lineages descend from the same parental (P) chromosome, and each acquires its own single-base changes:
P    ----ACTGACTGAC----CCTTACGTTG----TACTACGCAT----
Lineage a:
F1   ----ACTGACTGAC----CCTTACGTTG----TACTACGCAT----
F2a  ----ACGGACTGAC----CCTTACGTTG----TACTACGCAT----
Lineage b:
F1   ----ACTGACTGAC----CCTTACGTTG----TACTAGGCAT----
F2b  ----ACTGACTGAC----CCATACGTTG----TACTAGGCAT----
Compare the two F2 progeny:
Haplotype 1 (F2a) = SNP allele 1  ----ACGGACTGAC----CCTTACGTTG----TACTACGCAT----
Haplotype 2 (F2b) = SNP allele 2  ----ACTGACTGAC----CCATACGTTG----TACTAGGCAT----

SNPs, RFLPs, point mutations

[Figure: aligned copies of the EcoRI site GAATTC and its variants GAGTTC and GACTTC, with each variant labelled as an RFLP, a SNP and/or a point mutation]

Coding Region SNPs

•Types of coding region SNPs
–Synonymous: the substitution causes no amino acid change in the protein it produces. This is also called a silent mutation.
–Non-synonymous: the substitution results in an alteration of the encoded amino acid.
A missense mutation changes the protein by changing a codon so that it encodes a different amino acid.
A nonsense mutation results in a misplaced termination (a premature stop codon).
–One half of all coding sequence SNPs result in non-synonymous codon changes.

Intergenic SNPs

Researchers have found that most SNPs are not responsible for a disease state because they are intergenic SNPs.
Instead, they serve as biological markers for pinpointing a disease on the human genome map, because they are usually located near a gene found to be associated with a certain disease.
Scientists have long known that diseases caused by single genes and inherited according to the laws of Mendel are actually rare. Most common diseases, like diabetes, are caused by multiple genes. Finding all of these genes is a difficult task.
Recently, there has been a focus on the idea that all of the genes involved can be traced by using SNPs. By comparing the SNP patterns in affected and non-affected individuals (patients with diabetes and healthy controls, for example), scientists can catalog the specific DNA variations that underlie susceptibility to diabetes.

PCR

If a region of DNA has already been cloned and sequenced, the sequence information can be used to isolate and amplify that sequence from other individuals in a population.
Individuals with mutations in p53 are at risk for colon cancer.
To determine whether an individual had such a mutation before PCR existed, one would have to clone the gene from the individual of interest (construct a genomic library, screen the library, isolate the clone and sequence the gene).
With PCR, the gene can be isolated directly from DNA isolated from that individual.
No lengthy cloning procedure.
Only small amounts of genomic DNA are required.
30 rounds of amplification can give you >10^9 copies of a gene.

PCR and RFLP

WT   ----------CCTGAGGAG----------   MstII site present
     ----------GGACTCCTC----------
Mut  ----------CCTGTGGAG----------   MstII site lost
     ----------GGACACCTC----------
PCR amplify DNA from a normal individual and a sickle cell patient.
Digest with MstII.
[Figure: gel of the MstII-digested PCR products, lanes WT and Mut, size markers 100 to 500 bp]

Genotype and Haplotype

In the most basic sense, a haplotype is a "haploid genotype".
Haplotype: a particular pattern of sequential SNPs (or alleles) found on a single chromosome in a single individual.
The DNA sequence of any two people is more than 99 percent identical.
Sets of nearby SNPs on the same chromosome are inherited in blocks.
Blocks may contain a large number of SNPs, but a few SNPs are enough to uniquely identify the haplotypes in a block.
The specific SNPs that identify the haplotypes are called tag SNPs. The HapMap is a map of these haplotype blocks and their tag SNPs.
This will make genome scan approaches to finding regions with genes that affect diseases much more efficient and comprehensive.
Haplotyping: involves grouping individuals by haplotypes, or particular patterns of sequential SNPs, on a single chromosome.
There are thought to be a small number of haplotype patterns for each chromosome.
Microarrays, PCR and sequencing are used to accomplish haplotyping.
SNP mapping is used to narrow down the known physical location of mutations to a single gene.
The human genome sequence provided us with a list of many of the parts needed to make a human.
The HapMap provides us with indicators that we can focus on when looking for genes involved in common disease.
By using HapMap data to compare the SNP patterns of people affected by a disease with those of unaffected people, researchers can survey the whole genome and identify genetic contributions to common diseases more efficiently than was possible without this genome-wide map of variation. The HapMap Project has simplified the search for gene variants.
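The idea that a few SNPs are enough to identify the haplotypes in a block can be sketched in Python. This is a toy illustration under stated assumptions: the function `find_tag_snps` and the five-SNP block `BLOCK` are hypothetical, the search is a brute-force one that is only practical for small blocks, and it is not the method used by the HapMap Project.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return a smallest tuple of SNP positions that tells all haplotypes apart.

    haplotypes is a list of equal-length strings, one character per SNP.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same SNP positions")
    if len(set(haplotypes)) != len(haplotypes):
        raise ValueError("haplotypes must be distinct")
    # Try subsets of increasing size, so the first hit is a smallest set.
    for size in range(n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # The alleles at the chosen positions form each haplotype's signature.
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    # Not reached: distinct haplotypes always differ when all positions are used.


# Hypothetical block: four haplotypes typed at five SNPs.
# SNPs 0, 1 and 4 always change together, as do SNPs 2 and 3.
BLOCK = ["ACGTA", "ACTCA", "GTGTC", "GTTCC"]

if __name__ == "__main__":
    print(find_tag_snps(BLOCK))  # (0, 2)
```

In this made-up block, typing only SNPs 0 and 2 identifies which of the four haplotypes a chromosome carries, because the remaining three SNPs carry no extra information.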
Oligonucleotide chips contain thousands of short DNA sequences immobilised at different positions.
Such chips can be used to discriminate between alternative bases at the site of a SNP.
Chips allow many SNPs to be analyzed in parallel.
Short DNA sequences on the chip represent all possible variations at a polymorphic site; a labeled DNA will only stick if there is an exact match. The base is identified by the location of the fluorescent signal.

A recessive disease pedigree

[Figure: a recessive disease pedigree]

Mapping recessive disease genes with DNA markers

DNA markers are mapped evenly across the genome.
The markers are polymorphic: they look slightly different in different individuals.
By looking at a particular individual, we can tell which grandparent contributed a certain part of its DNA.
If we knew that this grandparent carried the disease, we could say that this part of the DNA might be responsible for the disease.

Marker positions: 1 2 3 4 5 6 7 8 9
4 different alleles at each locus:
Position 1 can be A or C or G or T
Position 2 can be A or C or G or T
Position 3 ...

Grandparent   Chromosomes
1             A-A-A-A-A-A-A-A-A
              A-A-A-A-A-A-A-A-A
2             C-C-C-C-C-C-C-C-C
              C-C-C-C-C-C-C-C-C
3             G-G-G-G-G-G-G-G-G
              G-G-G-G-G-G-G-G-G
4             T-T-T-T-T-T-T-T-T
              T-T-T-T-T-T-T-T-T

[Figure: the same four grandparental chromosome pairs, shown again with marker positions 1 to 9]

Haplotyping with microarrays

[Figure: allele A and allele B at a SNP]
Design 20-mer oligonucleotide probes complementary to the polymorphisms.
The probes are arrayed on a slide.
Each spot corresponds to a polymorphism.
Isolate DNA.
Label the DNA and hybridize it to the array.
Labeled chromosomal DNA that exactly matches a 20-mer probe gives a hybridization signal; a mismatch gives no signal.
There are ~3 thousand different probes per microarray.

Genetic polymorphism

•Genetic polymorphism: a difference in DNA sequence among individuals, groups, or populations. Genetic mutations are a kind of genetic polymorphism.
Genetic variation: single nucleotide polymorphism (point mutation) and repeat heterogeneity.

Repeats

Variation between people: a small DNA change (a single nucleotide polymorphism [SNP]) in a target site. RFLPs and point mutations are proof of variation at the DNA level.
Satellite sequences: a short sequence of DNA repeated many times.
[Figure: interspersed repeats on Chr1 and tandem repeats on Chr2]

Mini Satellite Repeats and Blots

Mini satellite sequences: a short sequence (20-100 bp long) of DNA repeated many times (alleles vary in length from 0.5 to 20 kb).
[Figure: EcoRI (E) maps of Chr1 and Chr2 carrying tandem repeats, the repeat probe, and the resulting blot]

Repeat expansion

Tandem repeats expand and contract during recombination.
Mistakes in pairing lead to changes in tandem repeat numbers.
These can be detected by Southern blotting.
[Figure: EcoRI (E) fragments spanning the repeat in individual 1 and individual 2, and the Southern blot lanes Ind1 and Ind2]
There are on average between 2 and 10 alleles (repeats) per mini-satellite locus.

Micro-satellite and PCR

DNA fingerprinting

Variation between people: a small DNA change (a single nucleotide polymorphism [SNP]) in a target site. RFLPs and SNPs are proof of variation at the DNA level.
Satellite sequences: a short sequence of DNA repeated many times.
Microsatellites are 2-4 bp repeats, repeated in tandem 15-100 times in a row.
Minisatellites are 20-100 bp repeats in tandem (0.5 to 20 kb long).

Class   Size        No. of loci   Method
SNP     1 bp        100 million   PCR/microarray
Micro   ~200 bp     200,000       PCR
Mini    0.2-20 kb   30,000        Southern blot
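Because microsatellite alleles differ in the number of tandem repeats, they differ in length, which is what a PCR assay across the repeat detects. The Python sketch below illustrates this with made-up sequences. The function `longest_tandem_repeat`, the flanking sequences and the CA repeat counts are all hypothetical and chosen only for illustration.

```python
import re


def longest_tandem_repeat(sequence, motif):
    """Return (start, repeat_count) for the longest uninterrupted run of motif.

    Returns (-1, 0) if the motif does not occur in the sequence.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    sequence, motif = sequence.upper(), motif.upper()
    # The lookahead tests every start position, so overlapping runs are not missed.
    pattern = re.compile("(?=((?:" + re.escape(motif) + ")+))")
    best_start, best_count = -1, 0
    for match in pattern.finditer(sequence):
        count = len(match.group(1)) // len(motif)
        if count > best_count:
            best_start, best_count = match.start(), count
    return best_start, best_count


# Hypothetical flanking sequences (where PCR primers would anneal) and alleles.
LEFT_FLANK = "GGTTCCTTGG"
RIGHT_FLANK = "TTGGAACCTT"
ALLELE_SHORT = LEFT_FLANK + "CA" * 15 + RIGHT_FLANK
ALLELE_LONG = LEFT_FLANK + "CA" * 22 + RIGHT_FLANK

if __name__ == "__main__":
    for name, allele in [("short", ALLELE_SHORT), ("long", ALLELE_LONG)]:
        start, count = longest_tandem_repeat(allele, "CA")
        print(name, "allele:", count, "CA repeats,", len(allele), "bp product")
```

In this example the two alleles differ by 7 repeats of a 2 bp motif, so their PCR products differ by 14 bp.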