The Blackett Family DNA Activity

Introduction

Welcome to the Blackett Family DNA Activity. Bob Blackett is a DNA analyst. As part of his training, he made a DNA profile of his own family using a technique called RFLP analysis. Family studies are a good way to learn about DNA profiling and RFLP analysis because you can follow the inheritance of DNA markers (alleles) from one generation to the next.

In this activity, you will learn the concepts and techniques behind DNA profiling, interpret DNA autoradiograms, and evaluate DNA profiles to determine familial relationships. A sequel, Blackett Family DNA Activity 2, covers the most current method of DNA profiling, which is based on the analysis of short tandem repeat polymorphisms.

[Photo: back row, grandparents Norma and Fred; middle row, parents Robert and Anne; front row, children David and Katie.]

Restriction Fragment Length Polymorphism Analysis

In RFLP analysis, RF stands for Restriction Fragments: the fragments produced when DNA is cut by restriction enzymes. L stands for Length and refers to the length of the restriction fragments. P stands for Polymorphism, a term from Greek that literally means "many shapes". The lengths of some restriction fragments differ greatly between individuals, so many "shapes", or lengths, of DNA are possible. Molecular biologists have identified regions of the human genome where restriction fragment lengths are highly variable between individuals. These regions are called RFLP markers.

Inheritance of RFLP Markers

Humans have 23 pairs of chromosomes. Each pair contains one chromosome from the mother and one from the father. The RFLP markers most commonly used for DNA profile analysis are found on chromosomes 1, 2, 4, 5, 10, and 17, and they are named after their locations on these chromosomes.
For example, the marker on chromosome 2 is called D2S44: D for DNA, 2 for the chromosome, S for a single-copy sequence, and 44 for the segment's sequential number on that chromosome. These chromosomal locations are also referred to as DNA loci (from Latin: locus is singular, loci is plural). The DNA loci used in profile analysis are shown on the karyotype below.

Anatomy of the Autorad

Autorads (autoradiograms) are x-ray films with dark bands representing RFLP markers. The bands are found in lanes, and each lane in this autorad contains DNA fragments from a different source. In the autorad to the left, the tops of the 15 lanes are numbered in red. Bands containing longer fragments of DNA are toward the top of the autorad, and bands containing shorter fragments are toward the bottom. This is where the "Length Polymorphism" of RFLP matters: because different individuals can carry many different lengths of DNA at an RFLP marker, different people have bands at different places.

In lanes 13 and 14, the DNA analyst loaded DNA from the grandparents. Notice that Norma and Fred do not share any bands. Although they are married, they are not "blood relatives", and you would generally not expect unrelated individuals to have the same bands.

In lanes 8 and 9, the DNA analyst loaded DNA fragments from himself (Bob) and his wife (Anne). Notice that Bob and Anne each have 2 bands in their lanes. At any given DNA locus, most people have two bands. Sometimes a person has only one band (for example, when both inherited fragments are the same length), and occasionally a person has three. Also notice that, like Fred and Norma, Bob and Anne do not share any bands at this DNA locus. Although Bob's top band is close to Anne's bottom band, the two are not close enough to be considered a match.

As mentioned earlier, children inherit 23 chromosomes from their mother and 23 from their father.
In this autorad, we are looking at the RFLP marker D1S7, located on chromosome 1. In lanes 10 and 12, the DNA analyst loaded DNA from his children, David and Katie. Each child inherited one copy of the D1S7 marker from their mother and one copy from their father, and these copies show up as bands on the autorad. Most people have two bands because they inherit one band from each parent. Note that David inherited his mother's top band and his father's top band, while his sister Katie inherited her mother's bottom band and her father's bottom band. Siblings sometimes inherit the same bands from their parents, but that is not the case for David and Katie at this locus: they share no bands at D1S7.

In lanes 4, 5, and 6, the DNA analyst loaded DNA from 3 unrelated individuals. Notice that none of their bands match one another. Unrelated individuals will, however, occasionally share bands; in this case, for example, the top band in lane 4 appears to match the bottom band in lane 9 (Bob's DNA).

DNA analysts are careful to always include a control. In lane 3, Bob loaded a DNA sample that should always produce bands in the same place on an autorad. If the control bands do not appear where the analyst expects them, the integrity of the rest of the autorad is called into question. If they do appear where they should, the analyst has confirmation that the autorad contains usable information. In this case, the control bands were good.

Every autorad also has several lanes containing DNA ladders. Each band in a ladder lane contains DNA of a known length, so the ladders are used to determine the lengths of the DNA in the bands of the other lanes.
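Sizing a band against the ladder can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the Blackett autorads: it assumes the common semi-log approximation, in which the logarithm of fragment length falls roughly linearly with migration distance between neighbouring ladder bands. The ladder distances and sizes below are hypothetical values, and the function name `estimate_length` is illustrative.

```python
import math

def estimate_length(distance, ladder):
    """Estimate a band's length (bp) from its migration distance.

    ladder: (migration_distance, known_length_bp) pairs for the ladder bands.
    Assumes log10(length) varies linearly with distance between neighbouring
    ladder bands (semi-log interpolation).
    """
    points = sorted(ladder)  # shortest migration (longest DNA) first
    for (d1, len1), (d2, len2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            fraction = (distance - d1) / (d2 - d1)
            log_len = math.log10(len1) + fraction * (math.log10(len2) - math.log10(len1))
            return 10 ** log_len
    raise ValueError("band lies outside the ladder; its size cannot be interpolated")

# Hypothetical ladder: migration distance in mm -> fragment length in bp.
ladder = [(10.0, 10000), (20.0, 5000), (30.0, 2500), (40.0, 1250)]
print(round(estimate_length(25.0, ladder)))  # prints 3536
```

Interpolating only between neighbouring ladder bands, rather than fitting one line through the whole ladder, tends to keep the estimate reasonable even where the gel's response is not perfectly log-linear. A band outside the ladder's range is rejected rather than extrapolated.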
Evaluating the DNA Profiles

[Autorads for the loci D1S7, D2S44, D4S139, and D10S28]

Here are some questions to further your understanding of the Blackett Family Activity.
1. Are the grandparents maternal or paternal?
2. The autorad contains 8 alleles for the siblings tested. Which alleles, and how many, are shared between the siblings?
3. Are any of the unknowns related to the family? If so, which ones?
4. Are any of the other unknowns tested related to each other? If so, which ones?
5. Are there any 1-locus matches between non-Blackett family members?

Blackett Family DNA Activity 2

Overview

In this activity, you will learn the concepts and techniques behind DNA profiling of the 13 core CODIS (Combined DNA Index System) Short Tandem Repeat loci used for the national DNA databank. You will then have the opportunity to collect and interpret actual STR (Short Tandem Repeat) data and to answer one or more of the following questions:
1. How is STR data used in a DNA paternity test?
2. How can STR data from close relatives be used to create a genetic profile of a missing person?
3. How much genetic diversity exists among siblings?
4. How does one calculate the probability of a specific DNA profile?

The Science of STR DNA Profile Analysis

STR Polymorphisms

Most of our DNA is identical to that of other people. However, some inherited regions of our DNA vary from person to person. Variations in DNA sequence between individuals are termed "polymorphisms" (many different forms; in this case, different lengths). As we will discover in this activity, the sequences with the highest degree of polymorphism are very useful for DNA analysis in forensic cases and paternity testing.

This activity is based on analyzing the inheritance of a class of DNA polymorphisms known as "Short Tandem Repeats", or simply STRs. STRs are short sequences of DNA, normally 2 to 5 base pairs long, that are repeated many times in a head-to-tail manner. For example,
the 16 bp (base pair) sequence "gatagatagatagata" represents 4 head-to-tail copies of the tetramer (group of 4 nucleotides) "gata". The polymorphisms in STRs arise from the different numbers of copies of the repeat element that occur among individuals in a population.

D7S280

D7S280 is one of the 13 core CODIS STR loci. It is located on human chromosome 7. The DNA sequence of a representative allele of this locus, taken from GenBank (a public DNA sequence database), is shown below. The tetrameric repeat sequence of D7S280 is "gata", and different alleles of this locus have from 6 to 15 tandem repeats of it. How many tetrameric repeats are present in the sequence below? Notice that one of the tetramers is "gaca" rather than "gata".

1 aatttttgta ttttttttag agacggggtt tcaccatgtt ggtcaggctg actatggagt
61 tattttaagg ttaatatata taaagggtat gatagaacac ttgtcatagt ttagaacgaa
121 ctaacgatag atagatagat agatagatag atagatagat agatagatag atagacagat
181 tgatagtttt tttttatctc actaaatagt ctatagtaaa catttaatta ccaatatttg
241 gtgcaattct gtcaatgagg ataaatgtgg aatcgttata attcttaaga atatatattc
301 cctctgagtt tttgatacct cagattttaa ggcc

What are the 13 core CODIS loci?

A National DNA Databank

The U.S. Federal Bureau of Investigation (FBI) has been a leader in developing DNA typing technology for identifying the perpetrators of violent crime. In 1997, the FBI announced the selection of 13 STR loci to constitute the core of the United States national database, CODIS. All CODIS STRs are tetrameric repeat sequences. All forensic laboratories that use the CODIS system can contribute to the national database, and DNA analysts like Bob Blackett can also attempt to match the DNA profile of crime scene evidence to DNA profiles already in the database.
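Counting repeats by eye is error-prone, so here is a small Python sketch that counts them in the D7S280 sequence above. It assumes that a repeat run must start with the motif "gata" and that copies must stay in the same reading frame; an optional variant such as "gaca" is allowed to extend a run, mirroring the note about the "gaca" tetramer. The function names are hypothetical.

```python
GENBANK_D7S280 = """
  1 aatttttgta ttttttttag agacggggtt tcaccatgtt ggtcaggctg actatggagt
 61 tattttaagg ttaatatata taaagggtat gatagaacac ttgtcatagt ttagaacgaa
121 ctaacgatag atagatagat agatagatag atagatagat agatagatag atagacagat
181 tgatagtttt tttttatctc actaaatagt ctatagtaaa catttaatta ccaatatttg
241 gtgcaattct gtcaatgagg ataaatgtgg aatcgttata attcttaaga atatatattc
301 cctctgagtt tttgatacct cagattttaa ggcc
"""

def clean_sequence(text):
    """Drop GenBank position numbers and spacing, keeping only the bases."""
    return "".join(ch for ch in text.lower() if ch in "acgt")

def longest_tandem_run(seq, motif, variants=()):
    """Length, in repeat units, of the longest in-frame run of `motif`.

    A run must start with `motif`; copies listed in `variants`
    (for example "gaca") are allowed to continue it.
    """
    allowed = {motif, *variants}
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        if seq[start:start + k] != motif:
            continue
        count, pos = 0, start
        while seq[pos:pos + k] in allowed:  # step forward one repeat unit at a time
            count += 1
            pos += k
        best = max(best, count)
    return best

seq = clean_sequence(GENBANK_D7S280)
print(longest_tandem_run(seq, "gata"))                      # 12
print(longest_tandem_run(seq, "gata", variants=("gaca",)))  # 13
```

Without the variant, the longest run is 12 copies of "gata"; allowing the single "gaca" unit to continue the run gives 13 tetrameric repeats.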
There are many advantages to the CODIS STR system:
- The CODIS system has been widely adopted by forensic DNA analysts.
- STR alleles can be rapidly determined using commercially available kits.
- STR alleles are discrete and behave according to known principles of population genetics.
- The data are digital and therefore ideally suited for computer databases.
- Laboratories worldwide are contributing to the analysis of STR allele frequencies in different human populations.
- STR profiles can be determined from very small amounts of DNA.

A DNA Profile: The 13 CODIS STR Loci

As part of his training and proficiency testing in DNA profile analysis of STR (Short Tandem Repeat) polymorphisms, forensic scientist and DNA analyst Bob Blackett created a DNA profile of his own DNA. Here is Bob's DNA profile for the 13 core genetic loci of the United States national database, CODIS (Combined DNA Index System):

For each genetic locus, Bob has determined his "genotype" and the expected frequency of that genotype in a representative population sample. For example, at the locus D3S1358, Bob has the genotype "15, 18", which is shared by about 8.2% of the population. By combining the frequency information for all 13 CODIS loci, Bob can calculate that his profile would be expected in about 1 in 7.7 quadrillion Caucasians (1 in $7.7 \times 10^{15}$).

In Bob's forensic DNA work, he often compares the DNA profile of biological evidence from a crime scene with a known reference sample from a victim or suspect. If two samples have matching genotypes at all 13 CODIS loci, it is a virtual certainty that they came from the same individual (or from identical twins).

Methods of Analysis of STRs

We assume that you have a basic understanding of the polymerase chain reaction (PCR) and gel electrophoresis, especially as applied to DNA sequence analysis.
We will focus here on the special features of PCR and gel electrophoresis as they are applied to STR characterization. If you are unfamiliar with these techniques, you should still be able to complete this activity.

Methods in Analysis of the 13 CODIS STR Loci

1. DNA extraction: DNA can be extracted from almost any human tissue. Buccal cells from the inside of the cheek are most commonly used for paternity tests. Sources of DNA found at a crime scene might include blood, semen, tissue from a deceased victim, cells in a hair follicle, and even saliva. DNA extracted from items of evidence is compared with DNA extracted from reference samples from known individuals.

2. PCR amplification: DNA primers have been optimized to allow amplification of multiple STR loci in a single reaction mixture. By carefully adjusting the distance of the primers from the tetrameric repeat sequence, the kit designers ensure that products from different loci do not overlap during gel electrophoresis. In the partial results shown below, the three STRs D3S1358, vWA, and FGA are analyzed simultaneously. The lengths of the amplified DNAs are shown by the scale from 100 bp to 280 bp at the top of the figure. The middle panels with multiple peaks are reference standards containing the known alleles for each STR locus; notice that the alleles for the three loci do not overlap. The lower panel shows the alleles of Bob Blackett's mother, Norma, at the D3S1358, vWA, and FGA loci. Norma's alleles have been compared by computer with the reference standards and labeled. Reading this result, Norma's genotype is 15, 15 at D3S1358; 14, 16 at vWA; and 24, 25 at FGA.

3. Detection of DNAs after PCR amplification: The PCR primers in the commercial kits used for STR analysis have fluorescent molecules covalently linked to them. To extend the number of loci that can be analyzed in a single PCR reaction, multiple sets of primers with different "color" fluorescent labels are used.
Following the PCR reaction, internal DNA length standards are added to the reaction mixture, and the DNAs are separated by length in a capillary gel electrophoresis machine. As the DNA peaks elute from the gel, they are detected by laser excitation of their fluorescent labels. The sequencing machines used for allele separation and detection are the same type used in the Human Genome Project, and their digital output is analyzed by special computer software.

In the AmpFLSTR™ Profiler Plus™ PCR Amplification Kit from Applied Biosystems used by Bob Blackett, 9 STRs are analyzed using three sets of primers, each with a different colored fluorescent label. In the figure above, three STRs are represented by blue peaks, three by green peaks, and three by yellow peaks (shown as black). The red peaks are the DNA size standards. Special computer software displays the different colors as separate panels of data and determines the exact length of each DNA. A tenth marker, AMEL (amelogenin), distinguishes male DNA (X, Y) from female DNA (X, X).

A second kit, called Cofiler Plus, is used in a second PCR reaction to amplify 4 additional STR loci and to repeat some of the loci from the Profiler Plus kit. Together, the 2 PCR reactions cover the entire CODIS set of 13 STRs, with some loci analyzed twice, plus a test for the sex chromosomes. The results are obtained as discrete, digital alleles determined from the exact size of the amplified products compared with known standards.

Genetics of STR Inheritance

Because there are no phenotypes associated with the CODIS STR loci, the genetics of STR inheritance is simpler than many other genetic problems: we need only consider the genotypes of the parents and their offspring. The alleles of different STR loci are inherited like any other Mendelian genetic markers. Each diploid parent passes on one of its two alleles at a locus, chosen at random, to each offspring (Mendel's law of segregation).
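A Punnett square is easy to enumerate in code. The Python sketch below, with a hypothetical function name, pairs each allele of one parent with each allele of the other and reports every child genotype with its probability. As input it uses the D21S11 example from Case 1 below: Bob's genotype 29, 31 follows from the text (29 from Fred, 31 from Norma), while Anne's 28, 30 is not stated directly and is inferred here from her children David (28, 31) and Katie (29, 30).

```python
from collections import Counter
from itertools import product

def punnett(parent1, parent2):
    """Possible child genotypes at one locus, with their probabilities.

    Each parent passes on either of its two alleles with probability 1/2,
    so the four cells of the Punnett square are equally likely.
    """
    cells = Counter(
        tuple(sorted((a, b), key=float))  # order alleles numerically, e.g. ("8", "9.3")
        for a, b in product(parent1, parent2)
    )
    return {genotype: n / 4 for genotype, n in cells.items()}

bob = ("29", "31")   # D21S11: 29 from Fred, 31 from Norma
anne = ("28", "30")  # D21S11: inferred from David (28, 31) and Katie (29, 30)
for genotype, probability in sorted(punnett(bob, anne).items()):
    print(", ".join(genotype), probability)
# 28, 29 0.25
# 28, 31 0.25
# 29, 30 0.25
# 30, 31 0.25
```

Alleles are kept as strings so that intermediate alleles such as TH01 "9.3" are represented exactly; when a parent is homozygous, identical cells are merged, so a 29, 29 by 29, 31 cross gives two genotypes at 0.5 each.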
Here is a brief review of the genetic concepts and terms important for understanding STR allele inheritance.

Allele. One of the different forms of a gene or genetic marker. Different STR repeat lengths represent different alleles at a genetic locus; for example, 8 and 9 are different alleles of the TH01 locus.
Locus. The position on a specific chromosome where the different alleles of a genetic marker are located. The plural is loci.
Monohybrid cross. A genetic cross involving parents that differ in only one trait. The inheritance of each of the 13 STR loci can be treated as a separate monohybrid cross.
Genotype. The genetic composition of the alleles at a locus. Because we are diploid, each of us has two alleles at each locus.
Homozygous. Both alleles at a locus are the same; for example, Fred has the genotype 29, 29 at the D21S11 locus.
Heterozygous. The alleles at a locus are different; for example, Norma has the genotype 29, 31 at the D21S11 locus.
Multiple allelic series. Many different alleles at a locus; for example, the known alleles at the vWA locus are 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, and 21.
Punnett square. A diagram used to determine all possible genotypes that can occur in a genetic cross. All of the diagrams on this page are Punnett squares.

Here are some examples of how STR data can be interpreted in a family DNA study. The numbers outside the Punnett squares are the parental alleles that can be present in the egg or sperm of the parents; the numbers inside the squares are the possible genotypes of the resulting children.

Case 1: If the genotypes of both parents are known, we use a Punnett square to predict the possible genotypes of their offspring. Each child inherits one allele of a given locus from each parent. Panel (a): At the D21S11 locus, the children of Bob Blackett and his wife Anne can have four different genotypes. Son David is 28, 31; daughter Katie is 29, 30. Panel (b): Bob Blackett inherited the 31 allele from his mother, Norma, so his 29 allele is paternal.
If Bob's paternal allele were not 29, what would be your conclusion?

Case 2: If the genotypes of a mother and several children are known, it is often possible to deduce the genotype of the father unambiguously. In this case, Karen is the mother, with the genotype 9, 9.3 at the TH01 locus. From the Punnett square we can determine that the paternal alleles of Tiffany, Melissa, and Amanda are 8, 9.3, and 9.3, respectively. Therefore, their father, Steve, must have the genotype 8, 9.3. If the three daughters had three different paternal alleles, what would be your conclusion?

Case 3: Sometimes only one allele of the father can be deduced when the genotypes of a mother and several children are known. In this example, Karen's genotype is 16, 17 at the D18S51 locus, and the daughters' genotypes are either 16, 18 or 17, 18. In each case, Melissa, Tiffany, and Amanda inherited the 18 allele from their father, Steve. We cannot determine whether Steve is homozygous (18, 18) or 18, ?, where ? means any other allele.

Case 4: Is it possible to determine parental genotypes when only the genotypes of the children are known? Consider Bob Blackett's 4 first cousins, Marilyn, Buddy, Dick, and Janet. Bob did not have DNA samples from their parents, Bud and Louise, who are both deceased; in a real forensic case, Bud and Louise might represent "missing persons". In panel (a) we arrange the 3 distinct genotypes found among the 4 children. In panel (b) we deduce the only two parental genotypes that can account for the children. Note that we cannot determine which genotype belongs to which parent.

Case 5: A variation on Case 4 is when only two genotypes are found among the children, and both parental genotypes must be deduced. Panel (a): Marilyn and Janet are 15, 16 at the locus D3S1358, and Buddy and Dick are 18, 18. Panel (b): The only parental genotypes that can produce this result are 15, 18 and 16, 18.
Once again, we cannot tell which parent has which genotype.

Case 6: Sometimes the parental genotypes cannot be deduced unambiguously from the children's genotypes. Marilyn is 16, 17 at the locus vWA, and Buddy, Dick, and Janet are 16, 18. What are the parental genotypes? Panel (a): One interpretation is that the parents are 16, 18 and 16, 17. Panel (b): Another possibility is that one parent is 17, 18 and the other is 16, ?, where ? is any allele.

DNA Profile Frequency Calculations

Genotype Probability at Any STR Locus

Part of the work of forensic DNA analysis is building population databases for the STR loci studied. Probability calculations are based on knowing the allele frequencies at each STR locus for a representative human population (and on showing, by statistical tests, that the population is in Hardy-Weinberg equilibrium). The frequency of an allele is the number of copies of that allele in the population divided by the total number of alleles at that locus in the population.

For a heterozygous individual whose two alleles have population frequencies $p$ and $q$, the probability $P$ of having both alleles at a single locus is

$$P = 2pq$$

For an individual who is homozygous for an allele with frequency $p$, the probability of the genotype is

$$P = p^2$$

We saw earlier that Bob Blackett has the genotype 15, 18 at the locus D3S1358. In a reference database of 200 U.S. Caucasians, the frequencies of alleles 15 and 18 were 0.2825 and 0.1450, respectively. The frequency of the 15, 18 genotype is therefore

$$P = 2(0.2825)(0.1450) \approx 0.0819$$

or about 8.2%.

Probability for a DNA Profile of Multiple Loci

If appropriate statistical tests show that the different loci are inherited independently, the probability of the combined genotype can be determined by multiplication (the product rule). The probability $P$ of a DNA profile is the product of the probabilities $P_1, P_2, \ldots, P_n$ for the individual loci, i.e.
$$P_{\text{profile}} = P_1 P_2 \cdots P_n$$

The profile probability can be extremely low when all 13 CODIS STR markers are included in the DNA profile. As mentioned earlier, Bob Blackett calculated his own profile probability as $1.3 \times 10^{-16}$, that is, no more frequent than 1 in 7.7 quadrillion individuals (7.7 million billion), which is roughly a million times the population of the planet.

Assignments

Create a Blackett Family Pedigree

The Blackett Family DNA Activity is largely a genetic study of the inheritance of alleles in an extended family. Bob Blackett has tested DNA samples from himself and 13 other relatives. The first task of a human geneticist is to create a family tree, or pedigree, to help with the interpretation of genotypes. From the following relationships, construct a pedigree for Bob and his relatives.

Person      Family relationship
Bob         Our propositus
Anne        Wife
David       Son
Katie       Daughter
Fred        Father
Norma       Mother
Karen       Sister
Steve       Husband of Karen
Tiffany     Daughter of Karen and Steve
Melissa     Daughter of Karen and Steve
Amanda      Daughter of Karen and Steve
Louise      Sister of Fred; Bob's aunt
Bud         Husband of Louise
Buddy       Son of Bud and Louise
Dick        Son of Bud and Louise
Marilyn     Daughter of Bud and Louise
Janet       Daughter of Bud and Louise

Collecting STR DNA Profile Data

STR Data for the Blackett Family

These data are from Bob Blackett's actual DNA analysis of the Blackett family members. The tracings below show the genotypes for three of the 13 CODIS STR loci. In this activity, you will record the data for use in the genetic analysis of the Blackett family that follows. Data on the other 10 loci will be provided later. Collect the data for Bob, Anne, David, Katie, Fred, and Norma for the "Paternity Testing with STR Data" activity. Collect the data for Karen, Tiffany, Melissa, and Amanda for the "DNA Profile of a Missing Person" activity. You will not need to collect the results for Buddy, Dick, Marilyn, and Janet.
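The frequency formulas from the DNA Profile Frequency Calculations section can be checked with a few lines of Python. This is a sketch under the same assumptions the text states (Hardy-Weinberg equilibrium within each locus and independent inheritance across loci); the function names are hypothetical, and the only numbers used are those quoted earlier for Bob's D3S1358 genotype, his 13-locus profile probability, and the vWA allele list. The genotype-count helper uses the standard count N(N + 1)/2 of genotypes for N alleles, which matches 2 alleles giving 3 genotypes and 3 alleles giving 6.

```python
import math

def genotype_frequency(p, q=None):
    """Single-locus genotype frequency under Hardy-Weinberg equilibrium.

    Heterozygote with allele frequencies p and q: 2pq.
    Homozygote (q omitted): p**2.
    """
    return p ** 2 if q is None else 2 * p * q

def profile_frequency(locus_frequencies):
    """Product rule: multiply the per-locus frequencies, assuming independent loci."""
    return math.prod(locus_frequencies)

def possible_genotypes(n_alleles):
    """Number of distinct genotypes at a locus with N alleles: N(N + 1) / 2."""
    return n_alleles * (n_alleles + 1) // 2

# Bob at D3S1358 (15, 18), allele frequencies 0.2825 and 0.1450 from the text.
d3 = genotype_frequency(0.2825, 0.1450)
print(f"D3S1358 15, 18: {d3:.4f}")  # D3S1358 15, 18: 0.0819

# Expressing Bob's 13-locus profile probability as "1 in N".
print(f"1 in {1 / 1.3e-16:.1e}")     # 1 in 7.7e+15

# vWA has 11 known alleles (11 through 21), as listed in the glossary.
print(possible_genotypes(11))        # 66
```

To get a full profile probability, pass the list of per-locus values to `profile_frequency`; for the multi-locus population counts, multiply `possible_genotypes` over the loci in the same way.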
The results for Buddy, Dick, Marilyn, and Janet are provided so that you can design your own activity; for example, can you draw any conclusions about Louise and Bud?

Paternity Testing with STR Data

In this activity, you will assume the role of a human geneticist in a DNA paternity testing laboratory. You have just obtained the DNA profiles for Bob, Anne, David, and Katie. You also have information about Bob's parents, Fred and Norma. In this role, you do not need to know all of the laboratory techniques used to obtain the Blackett family genotypes; your work is based on understanding the principles of Mendelian genetics as applied to STR loci.

Here are your options: go immediately to the questions below and interpret the data you have already collected; review the principles of genetics needed for this activity; or use the data that we have collected for you.

Choose from among the following questions to test your understanding of human genetics.

1. Who are the parents of David and Katie? Do all of the data you have collected on the genotypes of Bob, Anne, Katie, and David support the conclusion that Bob and Anne are the biological parents of David and Katie? Justify your answer by reference to the specific genotypes at the STR loci.

2. What is the genetic legacy of Fred and Norma? The alleles that Bob passes on to his children were in turn inherited from Bob's parents, Fred and Norma. Among the 13 CODIS STR loci, identify the alleles in Katie's and David's genotypes that were unambiguously inherited from each of their paternal grandparents. Then identify any additional alleles that might have been inherited from their paternal grandparents.

3. Genetic Diversity and Sexual Reproduction: Human geneticists are often asked why children have not inherited a particular trait from their parents.
As a human geneticist, you know that one mechanism that ensures genetic diversity is the independent assortment of alleles at different loci during gamete (egg and sperm) production, i.e., Mendel's Second Law. To illustrate this principle, calculate how many genotypes are possible among the children of Bob and Anne for the combined DNA profile of D3S1358, vWA, and FGA. If you feel really ambitious, then calculate the number of possible genotypes of the children of Bob and Anne for all 13 CODIS STR loci.

4. How many genotypes are possible in a population for a three-locus DNA profile? If there are two alleles, A and B, at a genetic locus in a population, there are three possible genotypes: AA, BB, and AB. If there are three alleles, A, B, and C, there are six possible genotypes: AA, BB, CC, AB, AC, and BC. For $N$ different alleles, the total number of possible genotypes is

$$\frac{N(N+1)}{2}$$

If we assume that the allele reference ladders from our data collection exercise represent all possible alleles (a conservative estimate), how many genotypes are possible in a population for the combined STR loci D3S1358, vWA, and FGA?

5. How many genotypes are possible in a population for the combined 13 CODIS STR loci? If you feel really ambitious, you may wish to calculate the number of possible genotypes considering all 13 CODIS STR markers. The table below shows the number of alleles for each locus. Beware: the number will be very large.

DNA Profile of a "Missing Person"

In this activity you will assume the role of a forensic DNA analyst. Your task is to determine the DNA profile of a "missing person" from the analysis of close family members. DNA analysts often have to reconstruct the genotypes of people whose DNA is not readily available for analysis. A recent case of great national interest was the identification of the remains of the Vietnam soldier who had been interred in the Tomb of the Unknown Soldier.
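Both missing-person questions, like Cases 2 through 6 earlier, come down to asking which genotypes are consistent with Mendelian inheritance. The Python sketch below makes that reasoning explicit. It is a minimal illustration with hypothetical function names: genotypes are pairs of allele strings, "?" stands for an allele not observed in any child, and mutation is ignored. `infer_father` works from a known mother and her children; `infer_parents` enumerates every pair of parental genotypes that could produce all of the children.

```python
from itertools import combinations_with_replacement

UNSEEN = "?"  # placeholder for an allele not observed in any child

def paternal_options(mother, child):
    """Alleles the father could have passed on, given the mother's genotype."""
    a, b = child
    options = set()
    if a in mother:
        options.add(b)
    if b in mother:
        options.add(a)
    return options

def infer_father(mother, children):
    """Father's alleles that are unambiguous across the children (Cases 2 and 3)."""
    definite = set()
    for child in children:
        options = paternal_options(mother, child)
        if not options:
            raise ValueError(f"child {child} shares no allele with mother {mother}")
        if len(options) == 1:  # only one way to split this child's alleles
            definite |= options
    if len(definite) > 2:
        raise ValueError(f"paternal alleles {sorted(definite)} need more than one father")
    return sorted(definite, key=float)

def can_produce(parent1, parent2, child):
    """True if the child can get one allele from each parent."""
    a, b = child
    return (a in parent1 and b in parent2) or (b in parent1 and a in parent2)

def infer_parents(children):
    """All unordered parental genotype pairs consistent with the children (Cases 4 to 6)."""
    alleles = sorted({x for child in children for x in child}, key=float) + [UNSEEN]
    genotypes = list(combinations_with_replacement(alleles, 2))
    return [
        (p1, p2)
        for p1, p2 in combinations_with_replacement(genotypes, 2)
        if all(can_produce(p1, p2, child) for child in children)
    ]

# Case 3 (D18S51): Karen is 16, 17; each daughter is 16, 18 or 17, 18.
# Which daughter has which genotype is not stated, so this assignment is illustrative.
father = infer_father(("16", "17"), [("16", "18"), ("17", "18"), ("16", "18")])
print(", ".join(father + [UNSEEN] * (2 - len(father))))  # 18, ?

# Case 5 (D3S1358): Marilyn and Janet are 15, 16; Buddy and Dick are 18, 18.
print(infer_parents([("15", "16"), ("15", "16"), ("18", "18"), ("18", "18")]))
# [(('15', '18'), ('16', '18'))]
```

For Case 6 (vWA: Marilyn 16, 17; Buddy, Dick, and Janet 16, 18), `infer_parents` returns five pairs: 16, 17 with 16, 18 (panel a), and 17, 18 paired with 16 and any second allele (panel b). A child whose alleles cannot be split between the parents, or a father who would need more than two alleles, is reported as an error rather than silently ignored.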
Here are your options: go immediately to the questions below and interpret the data you have already collected; review the principles of genetics needed for this activity; or skip the data collection and use the data that we have collected for you for Question 1, and the completed data for Buddy, Dick, Marilyn, and Janet for Question 2.

1. What is Steve's genotype? In our activity, we obtained data for Karen and her three daughters, Tiffany, Melissa, and Amanda. Bob Blackett has not yet had the opportunity to test Steve's DNA, so Steve can play the role of the "missing person". Determine Steve's genotype at the 13 CODIS STR loci. For each locus, indicate whether the genotype is unambiguous (both alleles known) or whether one or both paternal alleles are uncertain.

2. What are the genotypes of Bud and Louise? What happens when there are two missing people? Human geneticists are often asked to determine whether adult children in the same family all have the same biological parents. Demonstrate that all of the genetic information for the children of Bud and Louise is consistent with all 4 having the same two parents.

DNA Profile Frequency Calculations

In this activity, you can calculate the probability of some of the DNA profiles you have been studying. The following data sets are tables of allele frequencies for the three STR loci D3S1358, vWA, and FGA in a combined Caucasian population. The frequency data come from the website of the Royal Canadian Mounted Police (RCMP).

1. What is the probability of a 3-locus DNA profile? Based on a population database of Caucasians developed by Bob Blackett and colleagues in Arizona, Bob calculates the genotype frequency of his combined profile for the three STR loci D3S1358, vWA, and FGA to be $6 \times 10^{-5}$. Compare this frequency with the one you calculate from the Royal Canadian Mounted Police data. For help with this calculation, review the DNA Profile Frequency Calculations section earlier in the packet (page 15).

2.
Check your answers. As an alternative to doing all of the arithmetic yourself, you can calculate a profile's Random Match Probability using the RCMP online calculator at http://www.csfs.ca/pplus/profiler.htm

DATA PAGES: