Contents
Introduction
Module 1
  Part 1. Tomato genome
  Part 2. Gene structure
  Part 3. Gene detection
    PCR
    Southern blotting
    BLAST
Module 2. Transcription and RNA processing in eukaryotes
  Part 1. Transcription and processing
    RNA processing steps
    5’-cap
    Splicing
    Polyadenylation
    Differences between Eukaryotes and Bacteria
    Alternative splicing
  Part 2. RT-PCR
    RT-PCR
    RNA isolation
    Quantitative RT-PCR; qPCR
  Part 3. Sequencing
    Sanger dideoxy-sequencing
Module 3
  Part 1. Western blot
  Part 2. Mutations
Module 4. Recombinant DNA technology
  Part 1. Cloning
  Part 2. Applications

Introduction
You will get a better insight into the genome complexity of eukaryotes: what does the structure of a gene look like, and how are genes organized in the genome? We’ll be doing research on the RbcS1 gene from tomato.

Module 1: Genome organisation and gene structure in eukaryotes
- How to identify, detect and isolate a specific DNA fragment/gene in the genome?
- What does the structure of a eukaryotic gene look like?
- How many homologous genes/DNA sequences are present in the genome?
Module 2: Transcriptional analysis
- Where, when and how strongly is a gene expressed?
Module 3: Protein analysis
- How to detect a specific protein?
- What effects can different mutations in the DNA have on the encoded protein?
Module 4: Cloning
- How to clone a gene; for example to make a fusion with the green fluorescent protein GFP?

In total: You will isolate a specific gene fragment, encoding the RbcS1 protein, from the mRNA that is produced in tomato leaves, and this cDNA fragment will be cloned into E. coli, creating a genetically modified bacterium.

The experiment revolves around a gene that encodes a small subunit of the protein complex RuBisCo: RbcS1 from tomato. RuBisCo, ribulose-1,5-bisphosphate carboxylase/oxygenase, is the best known and most crucial enzyme in photosynthesis and is considered the most abundant protein on Earth. RuBisCo is part of the Calvin cycle, which fixes CO2 in the chloroplast.

Module 1
This module has three parts:
Part 1: Visualize genome complexity by isolating and analysing the (genomic) DNA of tomato and comparing it to the DNA of E. coli bacteria.
Part 2: Examine how a eukaryotic gene, i.e. tomato RbcS1, is built up.
For this, you will use a software program to analyse DNA sequences.
Part 3: How to detect and isolate a specific DNA fragment/gene from the complex genomic DNA. For this, you will specifically isolate the RbcS1 gene from tomato and the TRYP gene from E. coli.

Part 1. Tomato genome
Only a small part of the genome of (higher) eukaryotes actually codes for functional genes or proteins. It was long thought that a higher complexity of an organism would be reflected in its genome size (more protein-coding genes, more complexity). However, we now know that there is no correlation between genome size and organismal complexity.

The colored bars in the figure depict the range of genome sizes found within a particular class of organisms.
Angiosperms: all flowering plants
Protists: relatively simple eukaryotes (e.g. amoeba)

When we study biological processes at the molecular level, we are most often interested in the genes. We want to know what they encode and how they are regulated, to understand how they influence a biological process or cause disease. So we need to identify which parts of the genome represent (active) genes.

The nuclear genome of tomato consists of about 900 million base pairs and is divided over 12 chromosomes. Like humans, tomato is diploid, which means that there are two copies of each chromosome; one from each parent. So for each gene there are also two copies. The copies can vary a bit in sequence, and we call them alleles. An allele represents a variant of a gene at a particular position (locus) on a chromosome. If a diploid organism has two identical alleles, then we call it a homozygote. If the two alleles differ in sequence, then we call it a heterozygote.

There is more DNA in the darkly stained areas than in the lighter stained areas. The lighter stained areas are more accessible to DNA-binding proteins. The darker stained areas are called heterochromatin. These parts of the chromosome are very compact.
Eukaryotic chromosomes are packaged into chromatin, which is composed of DNA and proteins (mostly histones). The basic unit of chromatin is the nucleosome. The nucleosome contains about 150 bp of DNA wrapped ~1.8 times around a core of eight histones. Another histone, H1, associates with the nucleosomes and allows them to be arranged into higher-order structures. In this way the DNA is basically “wound up” to fit in the relatively small nucleus. This packaging makes the DNA not readily accessible to regulatory proteins and, for example, to the RNA polymerases that need to transcribe DNA into RNA. So the genes in heterochromatin are mostly inactive.

The lighter stained areas are called euchromatin. Here the DNA is much less densely packaged and much more accessible to regulatory proteins and the transcription machinery. So active genes are mostly located in the euchromatin.

The modification of the chromatin structure plays a distinctive role in eukaryotic gene regulation. The exact chromatin structure, i.e. the distribution of heterochromatin and euchromatin areas, can differ between cell types in an organism and is controlled by proteins called chromatin remodelling enzymes.

In addition to the nuclear genome, mitochondria and chloroplasts also contain their own DNA that encodes specific genes/proteins. The most accepted theory, the endosymbiotic theory, states that mitochondria and chloroplasts arose when a eukaryotic cell (or its predecessor) took up an aerobic proteobacterium (mitochondria) and later, in the case of plants, a photoautotrophic cyanobacterium (chloroplasts) as an endosymbiont that later evolved into an organelle. The organization of the DNA in mitochondria and chloroplasts is strikingly similar to the organization of DNA in bacteria. During evolution, many of the genes of the original bacterium were lost from the genome of the organelle and were transferred to the nucleus of the host.

Agarose gel electrophoresis
To study DNA, you need to visualize it. With agarose gel electrophoresis you can separate and visualize DNA molecules based on their size. When a suspension of agarose (a polysaccharide polymer from seaweed) is boiled and subsequently cooled down, it forms a gel containing pores. Small DNA fragments migrate more easily through these pores than large DNA fragments, so with agarose gel electrophoresis you can separate DNA fragments of different sizes.

The separating capacity of an agarose gel is, however, limited. Very small fragments (less than 50 bp) or very large fragments (more than 20 kb) can in principle not be separated on a standard agarose gel. Generally, an agarose gel is used to separate DNA fragments between 50 bp and 20 kb. The mobility also depends on the agarose concentration: the higher the concentration, the slower a given fragment migrates. Mostly 1% agarose is used. A higher percentage of agarose results in smaller pores, and vice versa.

To make sure that the DNA stays in the slot of the agarose gel and to visualize where you loaded your sample on the gel, a loading buffer containing 50% glycerol (to raise the density of the DNA solution) and a dye is added to your DNA solution.

To visualize the DNA in the gel, the DNA is stained with a fluorescent dye called ethidium bromide. It is an intercalating compound, meaning that it inserts between the stacked base pairs of the (double-stranded) DNA molecule. Under UV light the stained DNA fluoresces orange (emission around 590 nm).

To be able to determine the size of the DNA fragments, a so-called DNA size marker is loaded next to your samples, so that you can read off the sizes (in bp) of the different bands.

Measuring DNA/RNA concentration
The concentration of a DNA or RNA solution can be measured using a spectrophotometer. You measure the optical density (O.D.), i.e. the absorbance of light, at wavelengths of 260 nm and 280 nm. DNA and RNA absorb light at 260 nm, while proteins absorb light mostly at 280 nm.
The concentration of DNA can be calculated according to the law of Lambert-Beer:
O.D. (A) = ε x c x L
A = the absorbance
L = the length of the light path (cm)
c = the concentration
ε = the extinction coefficient
For nucleic acids the extinction coefficient is usually given per mass concentration rather than per mole: about 0.020 (µg/ml)^-1 cm^-1 for double-stranded DNA and about 0.025 (µg/ml)^-1 cm^-1 for RNA. With these values c is obtained in µg/ml, so with a 1 cm light path an O.D. 260 of 1.0 corresponds to about 50 µg/ml DNA or about 40 µg/ml RNA.

The ratio (O.D. 260/280) should be above ~1.8 for DNA and ~2.0 for RNA. When the ratio is lower, it means that you are measuring a lot of (contaminating) proteins that cause the relatively high absorbance at 280 nm.

Restriction enzymes
A very important and much used method to analyze and manipulate DNA is to digest (cut/cleave) the DNA with so-called restriction enzymes. Restriction enzymes naturally occur in bacteria, where they protect the genome against invading DNA, for example from bacteriophages that try to inject their genome into the bacterium. Restriction enzymes in the bacterium make sure that this invading DNA is cleaved.

Many restriction enzymes are named after the bacterial species from which they were first isolated; for example, EcoRI stands for restriction enzyme I from E. coli (strain R). Many restriction enzymes recognize specific palindromic sequences, and different enzymes recognize different (unique) sequences, which we call restriction sites. Depending on the number of bases that is recognized we speak of, for example, 4-, 6- or 8-cutters.

EcoRI is an example of a 6-cutter that recognizes the 6-bp sequence (restriction site) GAATTC. When EcoRI cleaves the DNA it generates a so-called sticky end. This is called a sticky end because the AATT overhang that is generated can hybridize (anneal) with the TTAA overhang generated in the opposite (complementary) strand, due to base pairing. The restriction enzyme SmaI cleaves the restriction site CCCGGG exactly in the middle, thereby generating so-called blunt-end fragments.

Restriction enzymes cut extremely reproducibly.
This means that when you add sufficient restriction enzyme, it will digest the DNA at every place where there is a recognition (restriction) site for that enzyme. So you get a very reproducible digestion pattern. Different restriction enzymes will generate different patterns.

The concentration of a restriction enzyme is expressed as units/µl, where 1 unit of enzyme can digest 1 µg of DNA in 1 hour at the appropriate temperature. Most restriction enzymes work best at 37 °C. To make sure that each site is digested we usually add ~5-10x more enzyme than strictly required based on the units.

If you add too little enzyme, incubate the reaction at too low a temperature, or incubate for too short a time, the enzyme will not cut at every restriction site. This is called a partial digest. Digesting the DNA with two different restriction enzymes at the same time is called a double digest.

Part 2. Gene structure
In this part we will focus on the structural organization of a typical eukaryotic (protein-coding) gene. Only a very small part of the genome actually codes for a protein. So from the complex genomic DNA, we are often only interested in a small fraction of the DNA.

Intron/exon
1. RbcS1 genomic DNA. This is the nucleotide sequence of a small part of chromosome 3 of tomato. This piece contains the complete RbcS1 gene.
2. RbcS1 cDNA. cDNA stands for complementary DNA (also called copy DNA) and is a DNA copy of the RbcS1 messenger RNA (mRNA). Because it is DNA, it contains the base T instead of the base U.

The difference between these two sequences is that the genomic sequence contains introns and exons, while the cDNA (mRNA) sequence contains only exons. You can determine the position of the exons and introns by comparing the genomic DNA sequence with the cDNA sequence.

Transcription start/stop signals
The distribution of exons and introns is only relevant after transcription has occurred, because splicing occurs on the primary transcript. For transcription to start, a proper start signal is required.
This start signal occurs in a region that we call the promoter. A promoter is a DNA element that determines/regulates the expression of the gene (its transcription into RNA). The promoter is located before the first exon and is therefore not transcribed into RNA. The promoter region contains, among other things, a sequence (the TATA box) where the RNA polymerase enzyme complex binds. In this way the promoter determines the start point for making RNA.

Eukaryotic RNA polymerases are not able to bind to promoter sequences on naked DNA. They require additional DNA-binding proteins to bind to the DNA first. These DNA-binding proteins are called transcription factors, and they are essential for the initiation of RNA synthesis by the RNA polymerase. In eukaryotes, transcriptional regulatory regions, so-called enhancers, can additionally be located far away from the actual transcription start site. The three-dimensional organization of the DNA makes sure that the regulatory proteins are oriented in the proper way to assemble the transcription complex.

The signal that marks the end of the mRNA, the polyadenylation site, is the place on the mRNA where the polyadenylation complex binds. This complex stabilizes the end of the mRNA by adding a poly-A tail.

Polyadenylation
Polyadenylation is one of the processing steps during the production of mRNA in eukaryotes. Most eukaryotic mRNAs have a poly-A tail at their 3’-end. This poly-A tail is not encoded in the DNA (there is no corresponding stretch of T’s in the template strand). The end of the mature mRNA is marked by the polyadenylation site on the primary transcript. During transcription, the RNA polymerase synthesizes RNA far beyond the end of the gene. Therefore, primary transcripts can be hundreds of nucleotides longer at their 3’-end than the processed/mature mRNA. These ends are cleaved off by the so-called polyadenylation complex, which binds at a conserved polyadenylation site, AAUAAA, in the mRNA. After cleavage behind this site, a stretch of 100-250 A’s is added, the so-called poly-A tail.
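
The cleavage-and-tailing step described above can be sketched in a few lines of Python. This is only a toy illustration, not a model of the real enzyme complex. The function name `polyadenylate`, the use of the first AAUAAA found, the cleavage offset of 20 nucleotides behind the signal and the tail length of 150 are all assumptions made for this example; the text only states that cleavage happens behind the AAUAAA site and that 100-250 A's are added.

```python
POLY_A_SIGNAL = "AAUAAA"  # conserved polyadenylation signal named in the text


def polyadenylate(primary_transcript, cleavage_offset=20, tail_length=150):
    """Toy 3'-end processing: cleave behind the signal, then add a poly-A tail.

    cleavage_offset and tail_length are illustrative assumptions, not
    measured values.
    """
    # Accept DNA-style input (T) as well as RNA-style input (U).
    rna = primary_transcript.upper().replace("T", "U")

    # Assumption of this sketch: the first AAUAAA is the functional signal.
    signal_start = rna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA polyadenylation signal found")

    # Cleave a fixed number of nucleotides behind the signal
    # (or at the end of the transcript if it is shorter than that).
    cleavage_site = min(len(rna), signal_start + len(POLY_A_SIGNAL) + cleavage_offset)

    # Everything behind the cleavage site is discarded; the tail is added.
    return rna[:cleavage_site] + "A" * tail_length
```

Calling `polyadenylate(transcript)` on a primary transcript that runs far beyond the signal returns a shorter sequence that ends in a run of A's, which mirrors why the tail cannot be found in the genomic sequence.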

Open reading frame, 5’-UTR and 3’-UTR
mRNA is translated into protein by the ribosomes. The ribosome “reads” the mRNA and couples the appropriate amino acids into a chain. Different combinations of three bases (codons) code for different amino acids. The open reading frame (ORF) is the part of the mRNA that is translated into amino acids. The first codon (AUG) of the ORF is called the start codon, and the stop codon ends the chain. The stop codon itself is not part of the ORF; it is not translated!

There are sequences in the cDNA before the start codon (AUG). The ribosome starts making protein at the start codon, so the region in the mRNA located before the start codon will not be translated into amino acid sequence. We call this region the 5’-UTR, which stands for 5’-UnTranslated Region. Similarly, there is a region after the stop codon called the 3’-UTR.

Part 3. Gene detection
There are different methods to detect a specific DNA sequence. We’ll learn three methods: PCR, Southern blotting and BLAST searches.

PCR
Polymerase Chain Reaction, PCR, is a method to very strongly amplify a specific piece of DNA. Using PCR you can create millions/billions of copies from one specific DNA molecule. There is one prerequisite: you need to know the sequence of the ends of the DNA fragment that you want to amplify, because you need primers that attach there. Primers are single-stranded DNA molecules of 20-30 nucleotides that are complementary to the ends of the fragment that you want to amplify. You need two primers: one primer (the forward primer) complementary to one strand of the DNA (template) and the other (the reverse primer) complementary to the opposite strand of the template DNA.

!! DNA and RNA molecules can only be extended at their 3’-ends; synthesis proceeds from the 5’ end (phosphate) to the 3’ end !!

To make a lot of copies of DNA fragments you need a heat-stable DNA polymerase, Taq polymerase.
This heat stability is crucial for the repeated cycles of DNA strand separation (denaturation), primer annealing and primer elongation (the actual synthesis of new strands).

Summarized, the things necessary for PCR:
- genomic tomato DNA (the template)
- forward primer
- reverse primer
- dNTPs (A, C, G, T nucleotides)
- Taq polymerase
- a PCR buffer

30 cycles of PCR: 2^30 = ~1,000,000,000 copies

The annealing temperature is chosen about 5 °C below the estimated melting temperature of the primer:
Annealing temperature (°C) = 2 x [number of A+T] + 4 x [number of G+C] - 5

5’-TGGCGACCCTGGAAAAGCTG................................CTGTGCAGTGATGACGCAGA-3’
3’-ACCGCTGGGACCTTTTCGAC................................GACACGTCACTACTGCGTCT-5’
Forward: 5’-ACCCTGGAAAAGCTG-3’
Reverse: 5’-GTCATCACTGCACAG-3’

To design primers for this PCR reaction, the forward primer should have the same sequence as the 5' end of the indicated (upper) strand, so that it will bind to the 3' end of the complementary strand. The reverse primer is the reverse complement of the 3' end of the indicated strand. The sequence of the primers will also end up in the final PCR product. You can use this to add extra sequences, such as a restriction site, to the ends of your PCR product.

Q: Why is it important to use a positive control and what would you use?
Use a previously purified and verified gene fragment (corresponding to the gene you want to amplify) to check that the PCR reaction was functional. By using a known/verified DNA fragment as template you can check whether your primers and PCR components all work and can generate the correct PCR product. In this experiment you will get a plasmid containing the RbcS1 gene or the TRYP gene from your supervisor to use as a positive control. When possible, always use a positive control to verify that the PCR could work! Note that in practice a positive control is not available in every experimental setup.

Q: Why is it important to use a negative control and what would you use?
Use all the ingredients except the template (genomic) DNA, to check whether fragments are amplified even without template. By leaving out the template DNA you can check whether there is any contamination (for example genomic DNA or gene fragments in the water, or contamination from the pipette). The negative control should not give any products in the PCR!

Southern blotting
Southern blotting is a technique to detect specific DNA fragments in a complex mix, based on the principle of base pairing (hybridization) between complementary single-stranded DNA molecules, just like the annealing of primers to the template DNA in a PCR reaction: an A hybridizes to a T, and a G to a C, by means of hydrogen bonds.

A Southern blot can be used, for example, to:
- determine whether a DNA fragment is present in the genome
- determine whether a gene isolated from organism X is also present in the genome of organism Y
- determine how many homologous genes are present within the genome of an organism, for example to find out whether a certain gene, such as RbcS1, is part of a gene family

For a Southern blot you first need to make a so-called filter replica of your DNA fragments, for instance after these fragments have been separated by agarose gel electrophoresis. A simple way to transfer DNA from the agarose gel to the membrane filter, for which no special equipment is needed, is via capillary force (setup above).

It is important that the DNA that you transfer is single stranded, because you want to use it for hybridisation!! The gel is therefore incubated in a strongly alkaline solution (the salt solution in the image above), which denatures the DNA. Next, a membrane that will bind the DNA is placed on top of the gel. By stacking a pile of (moisture-absorbing) paper towels on top of this membrane, the solution together with the DNA is drawn into the membrane by capillary force.
To bind the DNA permanently to the membrane, the membrane is often “baked” at 80 °C (or treated with UV light) after transfer.

Next, you incubate the blot (the membrane containing the single-stranded DNA) with a single-stranded labelled DNA probe, representing the DNA fragment that you want to detect. Often a radioactively labelled probe is used, because this allows very sensitive detection. Alternatively, fluorescently labelled or enzyme-labelled probes can be used. A radioactively labelled probe is detected with a photosensitive X-ray film; the resulting image is called an autoradiogram.

The single-stranded labelled probe will now try to "bind/anneal/hybridize" to its complementary DNA sequence on the blot. Probes that cannot bind are washed away. Just like the annealing of primers in a PCR reaction, the stringency of hybridization depends on the temperature and on the salt concentration of the hybridization/wash buffer. The higher the temperature, the more specific the hybridization will be, because the binding between the probe and the DNA on the blot needs to be stronger to hold at a higher temperature. The lower the salt concentration, the more specific the hybridization will be, because with less salt the negatively charged backbones of the two strands repel each other more strongly, so only well-matched duplexes stay together.

So, when your probe is 100% complementary to the DNA on the blot, you can use a high stringency (a high temperature, e.g. 65 °C, and a low salt concentration) when incubating the blot with the probe. If the probe is less specific, for example a homologous gene from a different organism (which will have a slightly different sequence), you need to use a lower stringency to allow sufficient (strong enough) binding of the probe. However, too low a stringency will cause the probe to hybridize to nonspecific places on the DNA. Therefore, optimal hybridization conditions need to be found in practice.

Autoradiogram of a Southern blot performed on genomic DNA. Left: DNA size marker. Right: DNA digested with EcoRI and with HindIII.
The fact that only one band hybridizes in both digests strongly indicates that the probe was generated from a single-copy (unique) gene. If the probe were a gene that is part of a gene family, or a repetitive sequence, you would expect multiple hybridizing bands.

The term Southern blotting is used when you transfer and analyse DNA on a blot. A related technique is the Northern blot, in which you run RNA on an agarose gel and transfer it to a membrane. This can be used to determine whether a gene is transcribed, i.e. where (in which tissue), when and how strongly a gene is active. The Western blot is another related method, in which you separate proteins based on their size and transfer them to a membrane (see Module 3).

A variation on Southern blotting is FISH: Fluorescence In Situ Hybridisation. In a FISH experiment, a fluorescently labelled DNA probe is hybridized to chromosomes spread on a microscope slide.

BLAST
BLAST stands for Basic Local Alignment Search Tool. Instead of comparing all sequences by hand, BLAST uses a mathematical algorithm to compare sequences. This algorithm works in two steps. First, small pieces of the input sequence are compared to sequences in the database. Next, the sequences that "match" such a small piece are compared more thoroughly in the second step.

There are two standard methods to compare sequences: BLASTn and BLASTp. BLASTn (n = nucleic acid) compares a nucleotide sequence with a database of nucleotide sequences. BLASTp (p = protein) compares an amino acid sequence with a database of amino acid sequences.

Bit score
The bit score gives an indication of how similar two sequences are. The higher the bit score, the more two sequences resemble each other. For a nucleotide search, the maximal bit score is roughly twice the length of the sequence that you put in (your query). So, a sequence of 800 bp has a maximal bit score of about 1600.
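
The "twice the length" rule of thumb can be made plausible with a small calculation. The sketch below uses the standard conversion from a raw alignment score S to a bit score, S' = (lambda x S - ln K) / ln 2. The parameter values lambda = 1.374 and K = 0.711 are assumptions: they are the values typically reported for a nucleotide search that scores +1 per match and -3 per mismatch, and other scoring schemes give other values. The function names are hypothetical.

```python
import math

# Assumed statistical parameters for a +1/-3 nucleotide scoring scheme.
# Other scoring schemes use different values.
LAMBDA = 1.374
K = 0.711


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score S into a bit score S'.

    S' = (lam * S - ln k) / ln 2
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def max_bit_score(query_length, match_reward=1):
    """Bit score of a perfect, gap-free match over the whole query."""
    return bit_score(query_length * match_reward)
```

Under these assumptions `max_bit_score(800)` comes out at roughly 1586, close to the 2 x 800 = 1600 quoted in the text, because each matching base contributes a little under 2 bits.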

E-value
The Expect value, or E-value, is a parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It indicates how likely it is that you would retrieve the same bit score with an arbitrary sequence in a BLAST search. The lower the E-value, or the closer it is to zero, the more "significant" the match is. However, keep in mind that virtually identical short alignments have relatively high E-values. This is because the calculation of the E-value takes into account the length of the query sequence. These high E-values make sense because shorter sequences have a higher probability of occurring in the database purely by chance.

Module 2. Transcription and RNA processing in eukaryotes
The production of mRNA is called transcription. The transcription of a gene into RNA is called expression of the gene.

Part 1. Transcription and processing
Transcription summarized in an image:

DNA and RNA contain different nucleotides.
DNA: deoxyribonucleotides (A, C, G, T; lacking an -OH group at the 2’ C-atom of the sugar).
RNA: ribonucleotides (A, C, G, U; containing an -OH group at the 2’ C-atom of the ribose).

Because DNA is double stranded, a gene can be encoded in either the upper strand or the bottom strand. The DNA strand that is used as a template to make the mRNA is called the template strand. The complementary strand is called the non-template strand.

RNA processing steps:
1. Binding of RNA polymerase to the promoter region
2. Transcription
3. Addition of a 5’-cap
4. Cleavage of the primary transcript at the 3’-end
5. Addition of a poly-A tail
6. Removing the introns, i.e. splicing
7. Transport to the cytoplasm
8. Binding by the ribosomes
9. Translation

5’-cap
The first modification that occurs while the primary RNA transcript is being made is the addition of a 5’-cap at the 5’-end of the RNA molecule. This is a modified guanine nucleotide (7-methylguanosine) linked by three phosphate groups to the start of the RNA.
The function of the cap is to stabilize the 5’-end of the mRNA and to aid its transport to the ribosomes.

Splicing
Eukaryotic genes contain exons and introns. Exons are the pieces that end up in the mature mRNA and contain the protein-coding sequence; introns need to be removed before the mRNA transcript can be translated into protein. The removal of introns and the joining of exons is called splicing. The spliceosome is the enzyme complex that does this. It consists of (>100) proteins as well as so-called small nuclear RNAs (snRNAs). These core components of the spliceosome recognize conserved nucleotides in the sequence of the intron. These conserved nucleotides occur in every intron and are called splice sites. They are GU at the 5'-end and AG at the 3'-end of the intron; the so-called GU-AG rule. Another conserved site is an A residue between 15 and 45 nucleotides upstream of the 3'-splice site; the branch point.

The spliceosome makes sure that the RNA is properly folded to allow the removal of the intron. First, the 5’ (donor) end of the intron is cut and joined to the internal branch point, forming a loop. Second, the 3’-end of the intron is cut and the two exons are joined together.

Polyadenylation
Almost all eukaryotic mRNAs contain a poly-A tail of 100-250 A’s at the 3’-end of the mature mRNA. This tail protects the 3’-end of the mRNA against nucleases that try to break down the RNA, and aids the transport of the mRNA to the cytoplasm.

Polyadenylation is performed by an enzyme complex called the polyadenylation complex. This complex recognizes and binds to a specific sequence in the RNA, the so-called polyadenylation signal. Often this polyadenylation signal is AAUAAA (although some variation in this sequence is observed). The RNA is cleaved several nucleotides behind this signal, at the polyadenylation site. Next, a poly-A polymerase enzyme adds a stretch of A's to the end of the mRNA molecule.

Differences between Eukaryotes and Bacteria
Bacteria have a circular genome that is located in the cytoplasm (no nucleus). In general this genome is much smaller than eukaryotic genomes.
This is partly because bacterial genes do not contain introns. Furthermore, the genes are located much closer together and, importantly, many genes are organized in so-called operons. An operon is a functional unit of bacterial DNA containing a cluster of genes under the control of a single promoter. The genes in this cluster/operon are all transcribed as one big mRNA, so that different proteins are encoded by one big mRNA.

In the example operon shown, the expression of these 5 genes is controlled by one transcription regulation region (the promoter region) upstream of the genes. When transcription occurs, all 5 genes are transcribed and become one large mRNA molecule. Ribosomes translate the mRNA immediately, since it is already in the cytoplasm. Bacterial mRNAs have no 5’-cap and no poly-A tail, so they are less stable and break down faster.

With this way of organizing genes, only one transcription regulation/promoter region is required to control the mRNA synthesis of all the genes for proteins that work together. In this way the bacterium can switch the production of all proteins needed for one process on or off at once.

Alternative splicing
Higher eukaryotes, such as plants and mammals, are considered to be more complex organisms than lower eukaryotes; compare for example humans to amoeba. However, as you know from the introductory lecture, the complexity of an organism does not correlate with the genome size or the number of genes in an organism! One of the factors that contribute to a higher complexity is variation in the splicing of the introns; so-called alternative splicing.

Through alternative splicing, multiple different mRNA molecules, consisting of different (combinations of) exons, can be made from one gene. As a result, different proteins can be encoded by that one gene. An extreme example is the DSCAM gene in Drosophila (fruit fly), which controls the growth direction of nerve cells.
This gene has multiple (cassette) exons, which can result in over 38,000 different proteins through alternative splicing. Compared to this, the number of ~18,000 genes in Drosophila is rather small. So the number of different proteins depends strongly on the alternative splicing of the primary RNA transcript. Below you see several examples of possible results of alternative splicing.

Alternative splicing is a highly regulated process, which is often cell- or tissue-specific. In other words, different cell types can show different alternative splicing (different exons are joined) and therefore have different mRNAs from the same gene. In the image below you see the exon-intron distribution in the primary transcript of the rat alpha-tropomyosin gene (a component of the cell cytoskeleton). This gene is alternatively spliced in different cell types. As you can see in the image, different cells contain differently spliced mRNAs. Note that each mRNA needs to have a poly-A tail, and therefore multiple polyadenylation signals have to be present in the gene.

Part 2. RT-PCR

RT-PCR

RT-PCR is a very sensitive method to determine where, when and how strongly a gene is expressed. PCR is used to amplify specific mRNAs to study their abundance in a certain organism or tissue. Every cell has its own transcriptome, i.e. its own collection of mRNAs that occur in that cell type. One of the most used techniques to study the occurrence of a specific mRNA in a tissue is RT-PCR. Before you can perform PCR on mRNA, the mRNA first needs to be converted into so-called copy DNA, or cDNA. This is done by an enzyme called reverse transcriptase. Reverse transcriptase is an RNA-dependent DNA polymerase, which can use RNA as a template to make a DNA strand. Like all DNA polymerases, reverse transcriptase needs a small double-stranded piece to start the synthesis of a new strand.
Therefore a primer is needed that can attach to the mRNA, so that a double-stranded region is created from which the reverse transcriptase can start making the new cDNA strand. A much used primer to make cDNA is an oligo-dT primer: a stretch of ~25 T's that can anneal to the poly-A tail of eukaryotic mRNAs. After this primer has annealed, the reverse transcriptase can, in the presence of dNTPs (dATP, dCTP, dTTP, dGTP), make a new cDNA strand. This cDNA can then serve as a template for the PCR, and the PCR products are analysed on an agarose gel. The amount of PCR product that you see on the gel is in principle proportional to the amount of cDNA that was in the sample, and therefore to the amount of mRNA that was in the sample. The primers determine which gene is amplified, so the amount of PCR product in an RT-PCR reaction is proportional to the amount of mRNA for that specific gene.

RNA isolation

You need isolated RNA from a certain organism or tissue before you can start an RT-PCR experiment. There are different types of RNA:
- mRNA: messenger RNA
- tRNA: transfer RNA (brings the correct amino acid to the mRNA during translation by the ribosomes)
- snRNA: small nuclear RNA (has a catalytic function, for example in the spliceosome)
- rRNA: ribosomal RNA (the largest class, present in every cell)

Ribosomes are built up out of proteins as well as functional RNA molecules, the rRNAs. Prokaryotic and eukaryotic ribosomes are almost the same; only the subunits have different sizes. The rRNAs are encoded in the genome by the rDNA genes. These rDNA genes form a gene family with hundreds of members that are organized as tandem repeats in the genome. So there are many rDNA genes, at a particular place in the genome, that are all transcribed into rRNA. Along each gene many RNA polymerases are transcribing in one direction. The growing RNA transcripts appear as threads extending outward from the DNA backbone. The shorter transcripts are close to the start of transcription, the longer ones are near the end of the gene.
Only mRNAs contain a poly-A tail!

Isolating only the mRNAs

For some experiments you only want mRNA. To isolate mRNAs you make use of the fact that all eukaryotic mRNAs contain a poly-A tail. When the total RNA isolate is applied to a column containing an oligo-dT matrix (see below), the complementary poly-A tails will hybridize (bind) to the oligo-dT matrix. As the other RNA classes do not have a poly-A tail, they do not bind and can be washed away. Next, the mRNAs can be eluted from the column, resulting in a pure mRNA preparation.

Quantitative RT-PCR; qPCR

If you want to know exactly how much more strongly a gene is expressed in one tissue or treatment compared to another, you can use quantitative RT-PCR, or qPCR. This technique is also called real-time PCR, because you follow the amount of double-stranded DNA that is produced in the PCR in real time. A commonly used method to perform a qPCR makes use of a fluorescent dye called SYBR green, which binds to double-stranded DNA. It is only fluorescent when it is bound to double-stranded DNA. To detect the SYBR green fluorescence, a special type of PCR machine is used that can measure the amount of fluorescence after each PCR cycle. First, a certain number of cycles is needed to get sufficient signal (above background). Next, there is an exponential amplification of the specific PCR product (a specific gene). Eventually a plateau is reached because the reaction becomes saturated. The different curves in the plot represent different cDNA samples. To quantify the difference in expression of this gene between the samples, a threshold is set in the exponential amplification phase (marked with a red line in the figure). By comparing the number of PCR cycles needed to reach the threshold (the Ct value), you can compare the expression level of the gene in the different tissues.
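The comparison of cycle numbers can be made concrete with a small calculation. The following is only a minimal sketch: it assumes that the amount of product doubles in every cycle, and the function name and the Ct values are made up for illustration.

```python
def fold_change(ct_sample_a: float, ct_sample_b: float, efficiency: float = 2.0) -> float:
    """Return how many times more template sample A contained than sample B.

    A lower Ct means more starting template, so the exponent is Ct_B - Ct_A.
    `efficiency` is the assumed amplification factor per cycle
    (2.0 = perfect doubling, an idealization).
    """
    return efficiency ** (ct_sample_b - ct_sample_a)


# Hypothetical Ct values for the same gene in two tissues
ct_leaf = 20.0
ct_root = 25.0
print(fold_change(ct_leaf, ct_root))  # 2**5 = 32.0
```

The sample that reaches the threshold first (lowest Ct) is the one with the most starting template, which is why the difference is taken as Ct_B minus Ct_A.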
As PCR amplification is exponential (2^n, where n is the number of cycles), a difference of 5 Ct values (five PCR cycles) between two samples means a difference in expression of 2^5 = 32. So in such a case the gene is expressed 32 times more strongly in one sample than in the other.

Part 3. Sequencing

Sanger dideoxy-sequencing

Sequencing is a technique to determine the nucleotide sequence of a piece of DNA. A much used sequencing method is Sanger dideoxy sequencing. Nowadays, there are also several so-called next-generation sequencing methods on the market. When all cDNAs (and therefore all mRNAs) from a certain tissue are sequenced by next-generation sequencing, we call this RNA-seq. By counting the number of sequence reads that belong to a particular gene, you know how strongly this gene is expressed, and you can directly compare it to the number of reads of all other genes. Two much used methods are 454 sequencing and Illumina sequencing. These methods are much faster and cheaper than Sanger sequencing; however, they generate only relatively short sequence reads (~100 bp, or less than 400 bp) and they often make small mistakes. In this course we will stick to Sanger dideoxy sequencing.

The term dideoxy comes from a special modified nucleotide, called a dideoxynucleotide (abbreviated ddNTP). A dideoxynucleotide lacks the 3'-OH group. This OH group is needed to attach the next nucleotide, so once a dideoxynucleotide has been incorporated the DNA strand can no longer be elongated: DNA synthesis is blocked. In a sequencing reaction, a low concentration of dideoxynucleotide (ddNTP) is added in addition to the normal nucleotides (dNTPs). As a result, at each position there is a chance that either a normal nucleotide is incorporated (allowing the strand to be elongated) or a dideoxynucleotide (elongation stops). For example, the ingredients of a sequencing reaction are mixed with a low concentration of dideoxy-ATP (ddATP). Whenever an A needs to be incorporated, there is a chance that a dideoxy-A is incorporated, blocking further strand elongation.
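The effect of ddATP can be illustrated with a short sketch. It is not a simulation of the real chemistry: it simply lists every length at which the new strand could end in a dideoxy-A. The template sequence and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def new_strand(template_3_to_5: str) -> str:
    """Return the strand the polymerase synthesizes (5'->3') on a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def dda_fragment_lengths(template_3_to_5: str) -> list[int]:
    """Lengths of all products that can end in a dideoxy-A.

    Every position where an A is incorporated is a possible stop, so each
    such position gives one fragment length (counted from the first base
    added after the primer).
    """
    strand = new_strand(template_3_to_5)
    return [i + 1 for i, base in enumerate(strand) if base == "A"]


template = "TGCATTGCA"  # hypothetical template, written 3'->5'
print(new_strand(template))            # ACGTAACGT
print(dda_fragment_lengths(template))  # [1, 5, 6]
```

Each length in the result corresponds to one band in the ddATP reaction: the positions of the A's in the new strand.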
Like any DNA polymerase reaction, you need to add a primer to the single-stranded DNA template, as the polymerase needs a small double-stranded piece to start synthesizing the new strand. Note that you add only 1 primer, which will be elongated using the complementary strand as template. This can be done for all nucleotides, so with ddATP, ddTTP, ddCTP and ddGTP. This will result in many differently sized products, with a dideoxynucleotide at every possible position in the DNA fragment. When you analyse these different products by a very sensitive gel electrophoresis, you can "read" the sequence of the DNA fragment starting from the smallest DNA fragment (from 5' to 3'). By using fluorescently labelled ddNTPs, where each different ddNTP has a different colour, the detection can be automated using a column to separate the differently sized DNA fragments and a laser to detect the colour of the terminal nucleotide.

The template that is sequenced should not be very complex, as this can interfere with the sequencing reaction. For example, you cannot just sequence isolated genomic DNA. Therefore you first need to purify a selected piece of DNA (such as a single gene), either by PCR (sequencing a PCR fragment) or by cloning.

https://www.youtube.com/watch?v=lgASqWbemCc

Module 3

This module focusses on protein detection and analysis. Although not every gene codes for a protein, in many cases proteins are the key executors in a biological process. In PART 1 (Western blot), you will learn a technique to detect a protein of interest by making use of specific antibodies. In PART 2 (Mutations), you will learn what effects different types of mutations in the DNA can have on the protein that is encoded by a gene.

Part 1. Western blot

The amount of mRNA of a protein-coding gene that is present in a certain cell type does not necessarily correlate with the amount of protein in that cell type.
The amount of protein depends, among other things, on the stability of the mRNA, the efficiency of translation, the stability of the protein, and whether the protein (or mRNA) is transported to other cells. To see how much protein is present in a certain tissue you need to detect the protein itself. One of the methods is to use labelled antibodies that specifically bind to that protein. Such antibodies can detect the protein either in the tissue itself or with a method called Western blotting: the proteins are first isolated from a tissue, separated by size on a gel and transferred to a membrane, after which the antibody detects the specific protein. Another method makes use of Green Fluorescent Protein (GFP).

To perform a Western blot, you first need to extract the proteins from a certain tissue, for example the leaves of tomato. Next, you separate the proteins based on their size by polyacrylamide gel electrophoresis, or PAGE. Polyacrylamide gel electrophoresis is somewhat comparable to the separation of DNA fragments based on size by agarose gel electrophoresis. Acrylamide, when polymerized, forms a gel with pores. The size of these pores depends on the percentage of acrylamide in the gel. Smaller proteins can move more easily (run faster) through the pores of the gel than large proteins. A higher percentage of polyacrylamide means smaller pores in the gel, which is better suited to separate small proteins. You therefore choose the percentage according to the size of the protein that you want to detect.

Most often proteins are first denatured before they are loaded onto a protein gel. An easy way to do this is to boil the proteins in a protein sample buffer. Furthermore, the protein sample buffer contains beta-mercaptoethanol (or DTT) and SDS (see also picture above). Beta-mercaptoethanol (or DTT) in a high concentration breaks the disulfide bonds that contribute to the tertiary (and quaternary) structure of proteins.
SDS (a negatively charged soap-like molecule) binds to the (denatured) proteins and makes sure that the proteins all get a uniform negative charge and an elongated shape. Therefore, all proteins will now migrate to the positive pole (anode). After separation of the proteins by size, the gel can be stained using a protein dye such as Coomassie blue to visualize the proteins in the gel.

http://www.youtube.com/watch?v=isr1ZQKWUQU

Antibody detection

In many cases the detection involves a two-step procedure. First a primary antibody is used, which binds to the protein of interest. This antibody is made with the help of an animal, by injecting it with the protein. Next a secondary antibody, which is made in a different animal, is used to detect the primary antibody. The secondary antibody can be chemically labelled with a fluorescent group or with an enzyme to allow detection (often horseradish peroxidase, HRP, is used). To prevent non-specific binding of antibodies to the protein-binding membrane, the areas of the membrane where no protein is bound have to be blocked. Often proteins such as BSA (bovine serum albumin) or non-fat dry milk are used to block the membrane, so that the antibodies will not stick non-specifically to the membrane.

Part 2. Mutations

A mutation is a permanent, structural change in the DNA (sequence). Often such a mutation does not negatively affect the functioning of an organism, but in certain cases it does, for example when it lies in the coding region of a gene. Mutations come in all kinds of shapes and sizes, from the deletion of a single base pair to the relocation (translocation) of entire pieces of chromosomes. Changes in DNA sequence can occur at different scales, for example at the level of a whole chromosome or at the level of a single gene.

- The first type of mutation is called a SNP (pronounced "snip"). SNP stands for Single Nucleotide Polymorphism.
A SNP is a mutation where one nucleotide (for example an A) is replaced by another nucleotide (in this case C, G or T). Such a mutation is also called a base (pair) substitution.
- A different, often occurring type of mutation is a deletion. As the name implies, a deletion is a mutation where part of a DNA sequence is lost. As a result the mutated sequence is shorter than the original one. Deletions can vary in length from one (or a few) lost nucleotides to large pieces of a chromosome missing.
- The third type of mutation that we will cover is an insertion. An insertion is an extra DNA sequence that is inserted somewhere in the DNA. So in the case of an insertion the mutated sequence is always longer than the original (wild-type) sequence. Insertions can vary in length from one to several nucleotides, or even complete pieces of a chromosome.

A mutation can lead to a so-called "frame-shift" in the reading frame. For example, if one base is deleted, a different group of 3 bases forms the new codon. As a result all following codons will also be changed. Both deletions and insertions can lead to a frame-shift (unless the number of nucleotides lost or gained is a multiple of three). As a result the amino acid sequence behind the mutation can be completely different from the wild-type sequence. So, in most cases a frame-shift will have a dramatic effect on the encoded protein.

Mutations can also affect splicing. A mutation can impact the splicing of the primary transcript if it affects the conserved splice sites. A mutation in a conserved splice site will prevent the correct splicing of the intron, so that the intron becomes part of the mature mRNA. The intron is then read as part of one large reading frame, and frame-shifts can occur. This will severely impact the open reading frame of the mRNA, as it will change the codon sequences.

Module 4. Recombinant DNA technology

The development of recombinant DNA technology has triggered a revolution in biology.
This breakthrough allowed any DNA fragment, from any genome, to be isolated and to be fused to any other DNA sequence of your choice: the so-called cloning of DNA. One of the many research applications of cloning is, for example, to fuse a protein to a green fluorescent protein (GFP) in order to study where the protein is located inside a cell.

Part 1. Cloning

Plasmids

Cloning involves the amplification (copying) of a specific DNA fragment in bacteria. To do this, the DNA fragment to be cloned is inserted into a plasmid that can replicate itself inside the bacteria. Plasmids are circular, double-stranded DNA molecules that occur naturally in bacteria. A plasmid can replicate independently of the chromosomal DNA in bacteria, and during each cell division at least one copy of the plasmid is passed on to the daughter cell. Bacteria can exchange plasmids during conjugation. Therefore, resistance to antibiotics can spread rapidly in a natural bacterial population. When a plasmid is used for cloning, it is also called a (cloning) vector. Below, you see an example of a typical (artificial) plasmid (pUC18) used for cloning.
- ori = origin of replication; allows the plasmid to be replicated independently of the bacterial chromosome.
- ampR = resistance gene against the antibiotic ampicillin.
- Polylinker = a region on the plasmid containing a collection of unique restriction sites, where a new DNA fragment can be inserted. Also called the multiple cloning site.
- lacZ = open reading frame (ORF) of a gene encoding a beta-galactosidase.

Note: the polylinker is located inside the open reading frame of the lacZ gene.

Cloning a gene using a plasmid consists of three steps.

Step 1. Digestion and Ligation

DNA fragments are inserted into a plasmid at specific restriction sites in the polylinker. As a first step, the plasmid (vector) is digested with one (or two) restriction enzymes at the place where the DNA fragment needs to be inserted into the vector.
The same enzyme(s) is used to digest the DNA fragment that needs to be cloned; this DNA fragment is called the insert. This creates fragments with the same sticky ends as the linearized (digested) plasmid. Only fragments that have matching sticky ends will anneal efficiently. Next, the enzyme DNA ligase is used to form the phosphodiester bonds between the DNA insert and the plasmid. As a result the DNA fragment (the insert) is ligated into the plasmid, creating a so-called recombinant plasmid.

Step 2. Transformation of competent E. coli cells

After your DNA fragment has been ligated into the plasmid, you need to introduce the recombinant plasmid into a bacterium to multiply it. To introduce DNA molecules into bacteria we make use of so-called competent cells, mostly competent E. coli cells. There are several different ways to make bacteria "competent", so that they can take up a plasmid. In this practical course we will treat E. coli bacteria with calcium chloride (CaCl2) to make them competent. Treatment of actively growing E. coli cells with a solution of CaCl2 changes the cell wall in such a way that the cells become capable of absorbing DNA molecules that are sticking to the outside of the cell. The bacteria are given a heat shock (90 seconds at 42 °C) to trigger the uptake of the adhering DNA molecules. Therefore, these chemically competent cells are also called "heat-shock cells". A different, much used method to introduce DNA molecules into bacteria is electroporation, where a short high-voltage electroshock (5 ms, 2500 V) creates pores in the cell walls of the bacteria and triggers the uptake of plasmid DNA from the solution. After the heat shock or electroshock the bacteria are incubated in a rich growth medium to let them recover from their "shock". The introduction of a recombinant plasmid (or ligation mixture) into bacteria is called transformation. Note that one bacterium will in general take up only one plasmid molecule.
If you apply a mixture of plasmids, different bacteria will take up different plasmids. After a bacterium has taken up a plasmid, the plasmid will replicate independently of the chromosomal DNA, and as a result many identical copies of that plasmid will be present in the bacterium. As the bacterium also divides, you get more and more bacteria, called clones, each containing multiple copies of a specific recombinant plasmid.

Step 3. Selection of transformants

To make sure that the plasmid is maintained inside the bacteria, you need to apply a selection pressure. For this you use an antibiotic resistance gene that is located on the plasmid. The antibiotic differs between plasmids. When the bacteria are grown on a growth medium to which the antibiotic has been added, only those bacteria that contain the (recombinant) plasmid will grow. These bacteria are called transformants, and are now genetically modified organisms (GMOs). After 12-16 hours of growth you get colonies of transformants on the plate. Each colony consists of a collection of identical bacteria, clones, that all contain the same plasmid.

To check whether the plasmid in a bacterial colony contains an insert, an additional reporter system is present in the plasmid. This is lacZ. LacZ is a bacterial gene that codes for a β-galactosidase. This enzyme can convert the chemical substance X-gal into a blue precipitate. When bacteria that contain the pUC18 plasmid are grown on a plate containing ampicillin and X-gal, this will result in blue colonies. The polylinker, where a foreign DNA fragment is inserted, is located inside the open reading frame of the lacZ gene. So when a DNA fragment is ligated into the polylinker region, it will disrupt the reading frame of lacZ, and as a result no functional protein can be made. So when bacteria containing a recombinant plasmid are grown on a medium with ampicillin, IPTG and X-gal, white colonies will appear. Therefore, this method is called blue-white screening.
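The logic of the selection and the blue-white screening can be summarized in a few lines. This is only an illustrative sketch of the reasoning above; the function name is made up, and it assumes an ideal plate with ampicillin, IPTG and X-gal on which every insert fully disrupts lacZ.

```python
def colony_outcome(has_plasmid: bool, has_insert: bool) -> str:
    """Expected result on a plate with ampicillin, IPTG and X-gal (idealized)."""
    if not has_plasmid:
        return "no growth"     # no ampR gene, so ampicillin prevents growth
    if has_insert:
        return "white colony"  # insert disrupts lacZ, X-gal is not converted
    return "blue colony"       # intact lacZ, X-gal becomes a blue precipitate


for plasmid, insert in [(False, False), (True, False), (True, True)]:
    print(plasmid, insert, "->", colony_outcome(plasmid, insert))
```

So the white colonies are the ones you pick when you are looking for a recombinant plasmid.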
Calculating the efficiency of competent cells

For a cloning experiment to work you need good competent cells. If your competent cells are of bad quality, the efficiency of taking up recombinant plasmids from a ligation mixture will be too low, resulting in no transformants (i.e. no colonies). The quality of competent cells is expressed in colony forming units (cfu): the number of colonies that you get when you transform the cells with 1 microgram of (undigested) plasmid DNA. For example, when the competent cells have an efficiency of 10^8 cfu per µg, it means that if you use 1 µg of (pUC18) plasmid you should get 10^8 (100 million) colonies. Because you cannot count 100 million colonies on a plate, usually several plasmid dilutions are used to calculate the competence of the cells. Competent cells with an efficiency of 10^7 cfu per µg (preferably higher) are suited for cloning experiments.

Analyzing plasmids on gel

Plasmids are circular DNA molecules! Therefore, the speed at which they migrate in an agarose gel depends on the three-dimensional structure of the circular DNA, and as a result a plasmid will typically not run at the same speed as a linear (digested) DNA fragment. The following conformations can be seen when analyzing an undigested plasmid on an agarose gel.
- Most of the isolated plasmids will be twisted, forming a so-called supercoiled structure. Such a supercoiled plasmid is much more compact, and as a result it will run much faster in an agarose gel than expected based on the size of the plasmid.
- The plasmid can also be in its open-circular form. In this case the plasmid is fully circular (relaxed), which is often the result of a nick in one of the strands. When this is the case we speak of the nicked open-circular form. Such an open-circular form runs more slowly than expected based on the size of the plasmid.

These two forms will be the most apparent bands when analyzing an undigested plasmid on a gel.
If the plasmid is nicked in both strands, for example due to damage, you can sometimes also see the linear form of the plasmid. The linear form will run faster than the open-circular form, but slower than the supercoiled form.

https://www.youtube.com/watch?v=sjwNtQYLKeU

Part 2. Applications

Genome library

A genomic DNA library is a collection of recombinant plasmids, each containing a different piece of genomic DNA. To make one, the genomic DNA is digested and sub-cloned into plasmids. The aim of a genomic library is to have each part of the chromosomes represented in a plasmid. To reduce the number of plasmids to a more workable number, a much used vector is a so-called BAC (Bacterial Artificial Chromosome) vector. In such a BAC vector you can easily clone fragments of up to 300 kb. For the human genome (about 3 x 10^9 bp) at 10x coverage this would mean 300,000 plasmids, which corresponds to an average insert size of about 100 kb. To select your clone of interest, containing for example a gene that you want to study further, you can use either PCR or Southern blotting. In the case of a Southern blot we speak of a colony blot, where all the colonies in the library are spotted onto membranes. A single-stranded labelled probe for the gene that you want to detect can then be used to identify the colony/plasmid of interest.

cDNA library

A cDNA library is made to sub-clone every mRNA present in an organism or in a specific tissue or cell type of an organism. As you know, you cannot clone RNA directly into a plasmid; you first need to convert the RNA into DNA. This is done by an enzyme called reverse transcriptase. To be able to clone the cDNAs, they need to contain unique restriction sites at their ends. These are generated by ligating so-called adapters (or linkers), small pieces of DNA containing the restriction site, to the cDNA fragments (see image above). Now you can digest the cDNAs, ligate them into a vector and create a collection of clones representing every cDNA/mRNA of an organism.
GFP-fusion

One of the most used fluorescent proteins is the Green Fluorescent Protein (GFP). To make a GFP-fusion protein, the open reading frame (ORF) of GFP can be fused to the ORF of your gene of interest by means of cloning. Depending on the protein, GFP can be attached at the N-terminus of the protein (N-terminal GFP fusion) or at the C-terminus of the protein (C-terminal GFP fusion).

Reporter construct

A reporter gene (often simply called a reporter) is a gene that is expressed under the control of the regulatory sequences (the promoter) of another gene, in order to study if, where and when that gene is expressed. A reporter is easy to visualize (with high sensitivity) or can be used as a selectable marker. Commonly used reporters in plants are, for example, GFP, the beta-glucuronidase enzyme GUS (which converts X-gluc into a blue precipitate inside cells) or the enzyme luciferase (a light-emitting enzyme, for example from fireflies). When a promoter is cloned in front of the open reading frame of a reporter gene, it often includes the 5'-UTR region of the gene to which that promoter belongs. This is because in many cases the exact transcription start site is not (yet) known.

Expression vector

Using GMOs as bioreactors for protein synthesis

Two well-known examples are the production of insulin and the production of chymosin (to make cheese). Insulin is a peptide hormone, produced by the beta cells of the pancreas, and is central to regulating carbohydrate and fat metabolism in the body. It causes cells in the liver, skeletal muscles and fat tissue to absorb glucose from the blood. Before recombinant DNA technology, insulin was isolated from the pancreas of cows, horses or pigs. As an alternative, the part of the human gene that codes for insulin was cloned into an expression vector and expressed in an E. coli host. As a result the bacteria produced synthetic insulin, which closely resembles human insulin.

Chymosin is an important enzyme for making cheese.
It cleaves the casein proteins in milk, which causes them to coagulate and precipitate. The enzyme is collected from the stomachs of young cows (calves). However, in many countries there is a shortage of calf-stomach extracts, and because many vegetarians do not approve of the use of animal extracts, alternative sources for this enzyme were needed.

To express a protein in bacteria, the open reading frame of a gene needs to be cloned into an expression vector, where it comes under the control of a bacterial promoter. Below you see an example of an expression vector for E. coli. In this case the lacZ regulatory region is used to control the expression of the protein. Therefore, the protein-coding sequence of lacZ is replaced by the open reading frame of the protein of interest. Bacteria containing this recombinant plasmid can be specifically induced to produce the protein by adding the lactose analog IPTG (the same compound as used in blue-white screening).