Introduction to Genetics – as relevant to this course
(Ack: Roche Genetics CD-ROM, Mishra’s notes at NYU, …)

Background (1/18)
• Genome, chromosomes, genes: all made up of DNA
• Genetics research (largely over the last 100 yrs, accelerated in the last 30 yrs)
  – Has led to important advances in medical science.
• Nucleus of a cell: contains chromosomes (made up of DNA) and proteins.
• DNA (deoxyribonucleic acid)
  – Is the genetic material that is inherited.
  – Contains the information needed by living cells to specify their structure, function, activity and interaction with other cells and the environment.
  – A DNA molecule can be thought of as a very long sequence of nucleotides or bases.

DNA structure (2/18)
• The Nobel Prize in Physiology or Medicine 1962: Crick, Watson and Wilkins
  – for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material
• Made up of 4 different building blocks (so-called nucleotide bases), each an almost planar nitrogenous organic compound
  – Adenine (A)
  – Thymine (T)
  – Guanine (G)
  – Cytosine (C)
  – Base pairs (A -- T, C -- G)

DNA Structure cont. (3/18)
• The bases (A, T, C, G) are attached to a sugar-phosphate backbone to form each of the 2 strands of a DNA molecule.
  – Phosphate (PO4^3-)
  – Deoxyribose
• The two strands are bonded together by the base pairs (A -- T, C -- G).
• This results in complementary strands; each is twisted (helical), and when bonded they form a double helix.
• Direction of each strand (5’ meaning the beginning and 3’ meaning the end of the strand)
  – 5’ and 3’ refer to the numbered carbon atoms of the sugar molecule (deoxyribose) in the DNA backbone.
  – They are important reference points for navigating the genome.
  – The 2 complementary strands are oriented in opposite directions to each other.

Genome Size
  Species                   Genome size (base pairs)   No. of chromosomes
  E. coli                   4.64 x 10^6                1
  S. cerevisiae (yeast)     1.205 x 10^7               16
  C. elegans (nematode)     10^8                       11/12
  D. melanogaster           1.7 x 10^8                 4
  M. musculus               3 x 10^9                   20
  H. sapiens                3 x 10^9                   23
  A. cepa (onion)           1.5 x 10^10                8
  (H. sapiens DNA: about 6 feet long when completely stretched out)

DNA hybridization (DL 3/18)
• Hybridization between complementary DNA sequences forms a double-stranded DNA molecule.
  – One of the most important DNA technologies
• Applications of hybridization
  – PCR (Polymerase Chain Reaction)
    • Enzymatically generating millions of copies of a tiny amount of a particular nucleic acid sequence.
  – Northern blot analysis
    • Makes it possible to study (in a semi-quantitative manner) the level of transcription of a particular gene.
  – DNA microarrays
    • Can interrogate the level of transcription of several thousand different genes in one sample in one experiment.

PCR (Polymerase Chain Reaction) (DL 3/18)
• PCR allows selective amplification of a DNA sequence.
  – Only a tiny amount of DNA is necessary to obtain a PCR product (a drop of blood or less is enough).
  – Complementary DNA primers need to be designed.
    • For this, the DNA sequence flanking the target sequence needs to be known in advance.
    • Primers are short synthetic DNA sequences of about 20 bases (so-called oligonucleotides) that can specifically hybridize to a unique complementary DNA sequence.
• The approach
  – Genomic DNA (the template), primers (the starters), deoxynucleotides (the building blocks) and a special heat-resistant DNA polymerase (the motor of the reaction) are mixed together in one reaction tube.
  – The reaction takes place in a thermocycler (an apparatus that allows one to precisely heat and cool the reaction).
  – The DNA is heated to almost boiling temperature, which separates the 2 strands (this step is called heat denaturation).
  – Cooling the mixture allows the primers to bind to their complementary sequences in the genomic DNA.
  – Once the primers bind, the DNA polymerase uses them as the start site to generate a copy of each strand of the targeted gene fragment, building 2 new double-stranded molecules.
  – Repeating this cycle (denaturation followed by cooling) 30 times results in 2^30 ≈ 10^9 (about 1 billion) copies.

Northern Blot analysis
• The complete RNA content from a sample is separated according to size by electrophoresis.
• Usually done in a sheet of agarose (similar to gelatine)
  – In response to an electric current, larger molecules move slower and smaller ones move faster, thus separating the different RNA molecules by size.
• The RNAs are then transferred from the gel to a filter membrane (the blot).
• The blot is then exposed to a solution containing a nucleic acid (the probe) complementary to the sequence whose presence in the blot one wants to interrogate.
• The probe may be cRNA or cDNA with a detectable label (a radioactive isotope or a fluorescent tag).
• If the targeted sequence is present in the blot, the probe hybridizes and sticks to the blot at the location of the targeted sequence.
• After the excess probe is washed off, a signal is detectable; its specificity can be checked against the expected size of the RNA, which correlates with how far it has migrated during electrophoresis.
• With this method
  – It is possible to study in a semi-quantitative manner the level of transcription of a particular gene.
  – Comparison of the results from different samples (e.g. different organs) provides information about the transcriptional regulation of the gene.

DNA Structure cont. (4/18)
• The order of nucleotide bases along a DNA strand is known as the sequence.
• The genetic information is encoded in the precise order of the base pairs.
• DL
  – GenBank database http://www.ncbi.nlm.nih.gov/Entrez/
  – Human genome project http://www.genome.gov/page.cfm?pageID=10001694
  – DNA sequencing
    • Is the process designed to precisely determine the sequence of bases in the DNA.

Cells, DNA and genome (5,6,8/18)
• When a cell divides (mitosis), its entire DNA is first copied
  – The 2 strands separate and a complementary strand is generated for each.
  – Two duplicate DNA sequences are produced.
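The copying step above relies only on the base-pairing rules (A -- T, C -- G) and on the two strands running in opposite directions. A minimal sketch of that idea in Python follows; the names `complement_strand` and `replicate` and the example sequence are made up for illustration and are not part of the course material.

```python
# Base-pairing rules from the DNA structure slides: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the complementary strand, written 5' -> 3' like the input.

    Each base is replaced by its partner, then the result is reversed
    because the two strands are oriented in opposite directions.
    """
    paired = [PAIR[base] for base in strand.upper()]
    return "".join(reversed(paired))


def replicate(duplex):
    """Separate the two strands and build a new complement for each one.

    `duplex` is a (top, bottom) tuple, both strands written 5' -> 3'.
    Returns two duplexes; each keeps one original strand as its template.
    """
    top, bottom = duplex
    first = (top, complement_strand(top))
    second = (complement_strand(bottom), bottom)
    return first, second


# Hypothetical example sequence, chosen only for illustration.
duplex = ("ATGCCG", complement_strand("ATGCCG"))
first, second = replicate(duplex)
print(duplex, first, second)
```

Both returned duplexes are identical to the original, which is the sense in which "two duplicate DNA sequences are produced". The sketch ignores everything enzymatic about replication.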
• Genome: an organism’s total DNA content
• Diploid cells: cells that carry 2 copies of the genome
• Haploid cells: cells that carry a single copy of the genome
  – Reproductive germ cells (gametes), i.e., egg & sperm cells
• The human genome consists of
  – 22 autosomal chromosomes (same in males and females)
  – 2 sex chromosomes, X and Y (males XY, females XX)

Structure of Chromosomes (7/18)
• The center is called the centromere.
• The two ends are called telomeres.
• The centromere separates two arms
  – Short arm: p
  – Long arm: q

Structure of genes (9-11/18)
• Genes are those parts of the genome that contain the information necessary for building proteins. (Size: 100 to several million base pairs)
• Exon (coding sequence), intron (non-coding sequence), regulatory regions (at the two ends, regulating how actively protein is synthesized from the gene)
  – Eukaryotes (organisms whose cells have a nucleus) have genes segmented into exons and introns.
  – Introns can occur between individual codons or within a single codon.
• Promoter (a regulatory element at the 5’ end)
  – Consists of several short sequences which are consensus binding sites for a number of proteins called transcription factors.

(DL 10/18)
• Prokaryotes (no nucleus): genes are not segmented into exons and introns.
• Eukaryotes: genes are normally segmented into exons and introns
  – Except mitochondrial genes & a few nuclear genes.
• During gene expression, exons and introns are both transcribed to form a pre-mRNA.
• RNA splicing removes the introns and joins the exons, producing a mature mRNA molecule that codes for a polypeptide.
• Exons: sequences that are represented in the mature mRNA
  – May or may not code for a protein
  – E.g. exons at the 3’ or 5’ end of the mRNA may not be translated into protein

Some Genes (from Mishra’s slides)
  Gene product          Organism      Exon length   #Introns   Intron length
  Adenosine deaminase   Human         1500          11         30,000
  Apolipoprotein B      Human         14,000        28         29,000
  Erythropoietin        Human         582           4          1562
  Thyroglobulin         Human         8500          = 40       100,000
  α-interferon          Human         600           0          0
  Fibroin               Silk worm     18,000        1          970
  Phaseolin             French bean   1263          5          515

Some human gene locations (from Mishra’s slides)
  Genes                          Chromosome
  α-globin cluster               16
  β-globin cluster               11
  Immunoglobulin
    κ (light chain)              2
    λ (light chain)              22
    Heavy chain                  14
  Pseudogenes                    9,32,15,18
  Growth hormone gene cluster    17
  Thymidine kinase               17
  Interferons
    α & β cluster                9
    γ                            12
  Insulin                        11
  Galactokinase                  11
  Viral oncogene homologues
    C-sis                        22
    C-mos                        8
    C-Ha-Ras-1                   11
    C-myb                        6

Gene expression (12/18)
• Gene expression (transcription and translation): the 2-step process from genes to proteins
• Transcription: genetic information in DNA is copied into messenger RNA (mRNA).
• Translation: mRNA is used as a template to synthesize a protein.

Central Dogma
• Due to Francis Crick (1958); states that these information flows are all unidirectional:
  – “The central dogma states that once 'information' has passed into protein it cannot get out again. The transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein, may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.”

Transcription (13/18)
• RNA (ribonucleic acid)
  – Similar to DNA (except for a chemical modification of the sugar in the backbone)
  – Contains U (uracil) instead of T; U binds with A.
  – Is single stranded, not double stranded
  – RNA molecules tend to fold back on themselves to make helical, twisted and rigid segments.
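The pairing rule just stated (RNA uses U where DNA uses T, and U pairs with A) can be sketched as a small Python function that builds the RNA complementary to a DNA template strand. This is only an illustrative sketch: the function names and the example sequence are assumptions made here, and processing steps such as splicing are ignored.

```python
# Pairing of each DNA template base with the RNA base built opposite it.
# U takes the place of T in RNA, so a template A is paired with U.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA complementary to a DNA template strand.

    The template is given 3' -> 5', so the RNA is produced 5' -> 3'
    without any reversal.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def rna_from_coding_strand(coding_5_to_3: str) -> str:
    """Same RNA, obtained from the non-template (coding) strand.

    The coding strand already has the RNA's sequence, with T in place of U.
    """
    return coding_5_to_3.upper().replace("T", "U")


# Hypothetical 6-base example, chosen only for illustration.
template = "TACGGT"  # template strand, written 3' -> 5'
coding = "ATGCCA"    # the opposite strand, written 5' -> 3'
rna = transcribe(template)
print(rna)
```

The two functions give the same RNA, which is a quick way to check that the template and coding strands in an example are truly complementary.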
• RNA is synthesized
  – By unwinding the DNA double helix, separating the 2 strands.
  – Using one of the strands as a template along which to build the RNA molecule
  – Accomplished by the enzyme RNA polymerase (binds to the promoter and copies, or transcribes, the gene in its full length)
  – The resulting molecule is called pre-mRNA.
  – The single-stranded pre-mRNA is then processed.
  – Splicing (mediated by the spliceosome, consisting of RNA and proteins) removes the introns.
  – The ends are modified (capping modifies the 5’ end and polyadenylation adds adenines at the 3’ end) to enhance stability.

Translation (14/18)
• mRNA is used as a template to synthesize a protein.
• Translation takes place outside the nucleus, in the cytoplasm, on ribosomes (free or attached to the endoplasmic reticulum).
• Except for the 5’ & 3’ ends of the mRNA (which are non-coding), the rest of the molecule codes for 1 protein.
• Proteins: made up of amino acids
  – 20 different amino acids are used to build proteins in humans.
  – Each is encoded by one or more sets of 3 nucleotides (called triplets or codons).
  – The initial codon is always AUG (coding for methionine).
  – Translation is terminated by one of 3 'stop' codons.
• The translation process is carried out by ribosomes, which scan the mRNA and build the polypeptide chain from amino acids supplied by transfer RNAs (tRNA).
  – Starts at a particular location of the mRNA called the translation start site (usually AUG)
  – tRNAs are a group of small RNA molecules, each with specificity for a particular amino acid.
  – tRNAs carry the amino acids to the ribosomes, the site of protein synthesis, where they are attached to a growing polypeptide.
  – Translation stops when one of UAA, UAG or UGA is encountered.

Post-translational modification (DL)
• The polypeptide chain that results from mRNA translation is often subject to chemical modifications, e.g.
  – Glycosylation, phosphorylation, hydroxylation
  – Addition of lipid groups (e.g. fatty acyl or prenyl groups)
  – Addition of co-factors (e.g. a heme molecule)
  – Or proteolytic cleavage
• The type of modification a protein undergoes depends on its function and sub-cellular location.

Genetic Code (15/18)
• The combinations of nucleotides that build the different codons represent the genetic code.
• Codon = 3 nucleotides; 4 kinds of nucleotides. So 4 x 4 x 4 = 64 possible codons.
• But there are only 20 amino acids + start & stop.
• So several codons can specify the same amino acid (the genetic code is degenerate).
• Start codon (AUG) and stop codons (UAA, UAG, UGA).
• Open reading frame (ORF): the sequence of nucleotides between and including the start and stop codons.
• The Nobel Prize in Physiology or Medicine 1968: Holley, Khorana and Nirenberg
  – for their interpretation of the genetic code and its function in protein synthesis
  – http://www.nobel.se/medicine/laureates/1968/

Amino Acids with Codes (from Mishra’s slide)
  A   Ala   alanine         GC(U+A+C+G)
  C   Cys   cysteine        UG(U+C)
  D   Asp   aspartic acid   GA(U+C)
  E   Glu   glutamic acid   GA(G+A)
  F   Phe   phenylalanine   UU(U+C)
  G   Gly   glycine         GG(U+A+C+G)
  H   His   histidine       CA(U+C)
  I   Ile   isoleucine      AU(U+A+C)
  K   Lys   lysine          AA(A+G)
  L   Leu   leucine         (C+U)U(A+G) + CU(U+C)
  M   Met   methionine      AUG
  N   Asn   asparagine      AA(U+C)
  P   Pro   proline         CC(U+A+C+G)
  Q   Gln   glutamine       CA(A+G)
  R   Arg   arginine        (A+C)G(A+G) + CG(U+C)
  S   Ser   serine          (AG+UC)(U+C) + UC(A+G)
  T   Thr   threonine       AC(U+A+C+G)
  V   Val   valine          GU(U+A+C+G)
  W   Trp   tryptophan      UGG
  Y   Tyr   tyrosine        UA(U+C)

Biological Function of Proteins
• Enzyme catalysis: DNA polymerases, lactate dehydrogenase, trypsin
• Transport: hemoglobin, membrane transporters, serum albumin
• Storage: ovalbumin (egg-white protein), ferritin
• Motion: myosin, actin, tubulin, flagellar proteins
• Structural and mechanical support: collagen, elastin, keratin, viral coat proteins
• Defense: antibodies, complement factors, blood clotting factors, protease inhibitors
• Signal transduction: receptors, ion channels, rhodopsin, G proteins, signalling cascade proteins
• Control of growth, differentiation and metabolism: repressor proteins, growth factors, cytokines, bone morphogenetic proteins, peptide hormones, cell adhesion proteins
• Toxins: snake venoms, cholera toxin

Differential Gene Expression (17/18)
• All cells in the body (that contain a nucleus) carry the full set of genetic information, but only express about 20% of the genes at any particular time.
• Gene expression is selective
  – Different proteins are expressed in different cells according to the function of the cell.
• Gene expression is tightly controlled and regulated.
  – The differential expression of genes ensures that cells develop correctly and can differentiate into, and function as, specialized cell types, e.g. neurons, muscle cells or fibroblasts.

cDNA and gene expression (DL)
• Goal: identify all possible genes expressed in one tissue or cell line (use cDNA libraries).
• cDNA libraries are prepared from mRNA isolated from the cells or tissue being studied.
• cDNAs are DNA molecules that are complementary to the mRNA sequences in a sample.
• cDNA is synthesized by the enzyme reverse transcriptase (RT), which uses the mRNA as a template.
  – RT is a viral enzyme used by viruses whose genome is made of RNA, not DNA.
• A cDNA library represents the collection of all genes expressed in a particular cell or tissue type.
• DNA sequence → mRNA sequence → cDNA sequence (much smaller, since the introns are eliminated while the mRNA is generated)
  – Hence very useful when trying to isolate a particular gene to study the protein it codes for.

NEXT SEC.

Gene Cloning (1/11)
• The first step in identifying a gene and its function is to isolate it from the rest of the genome and produce a large quantity of it (called cloning a gene).
• Cloning a DNA fragment using bacteria
  – The DNA fragment is isolated from the entire genome using restriction enzymes.
    • These enzymes can cut the DNA (in a staggered fashion or straight through) at specific sites defined by a short sequence.
    • Typically they recognize specific DNA sequences of 4, 6 or 8 bases.
    • These enzymes are found in bacteria, where their role is to protect the bacterium from foreign DNA by digesting it into smaller pieces.
  – The fragment is inserted into a vector (like a mini-chromosome) using DNA ligase, and the recombinant product is introduced into bacteria (this process is called transformation).
    • Cloning vectors are DNA fragments that are able to replicate within a cell and allow the addition of exogenous DNA.
    • They are derived from plasmids, viruses, phages or chromosomes.
    • Vectors are classified according to the type of host cell they can replicate in, or the size of the exogenous DNA they are able to carry.
  – The bacteria now make new copies of the fragment with every cell division.

DNA Sequencing (DL 1/11)
• It is the process designed to precisely determine the sequence of bases in the DNA.
• Involves enzymatically copying the DNA in the presence of compounds that terminate the copying process in a base-specific manner, resulting in a mixture of DNA copies that differ in size by one base.
• Different technologies are used to resolve the mixture and detect the different fragments.

Cloning issues (2-3/11)
• Clones from genomic DNA contain introns (non-coding sequence) and are very large and difficult to analyze for function.
• Alternative: start from mRNA. Convert it to cDNA and clone the cDNA.

Gene function characterization (4/11)
• To characterize the function of a gene it is important to know its sequence and compare it to other sequences in the databases. Identify where and under what conditions it is expressed and what function, if known, it has in other organisms.
• Also do gene expression studies.

Gene expression studies (5/11)
• Allow you to understand how a gene is regulated in a tissue or a cell type.
• The most useful way of studying gene expression is by measuring the levels of mRNA produced from a particular gene in a particular tissue.
• Application: to understand certain biological processes it is useful to study the differences in gene expression which occur during such processes. E.g.
  – It is of interest to know which genes are induced or repressed, say in the liver, after a particular drug is taken.
  – Or which genes are expressed in a tumor but not in the surrounding normal tissue.
• Some techniques for analyzing the mRNA level of a single gene or for quantifying gene expression
  – Northern blots
  – Quantitative reverse transcriptase PCR (QT-RT-PCR)
  – DNA microarrays
  – Proteomics (analysis of the protein synthesis that results from gene expression)

DNA microarrays (6/11)
• Consist of thousands of DNA probes, corresponding to different genes, arranged as an array.
• Each probe (sometimes consisting of a short sequence of synthetic DNA) is complementary to a different mRNA (or cDNA).
• mRNA isolated from a tissue or cell type is converted to fluorescently labeled mRNA or cDNA and is used to hybridize the array.
• Each expressed gene in the sample will bind to its probe on the array and generate a fluorescent signal.
• A DNA microarray can interrogate the level of transcription of several thousand different genes from one sample in one experiment. (One DNA microarray experiment reveals the mRNA levels of 1000s of genes from one tissue or cell type at one time point.)
• Particularly useful when studying the effect of environmental factors on gene expression.
• A fingernail-size chip can interrogate 10,000 different transcripts. For each gene the chip has 30-40 different probes; half of them are designed to perfectly match 20-nucleotide stretches of the gene, and the other half contain a mismatch as a control to test the specificity of the hybridization signal.

Pharmacogenomics (7/11)
• It refers to the study of differential gene expression applied to drug discovery and optimization.
• Applications (differential gene expression studies in special tissues or cell types may)
  – Find new disease mechanisms
  – Discover new drug targets
  – Confirm the expected mechanism of action of a drug
  – Choose the best candidate compound based on the optimal expression profile
  – Figure out a priori who will benefit from a drug and who won’t

Model organisms (9/11)
• An indispensable tool for studying the function of a gene.
• Range from bacteria and yeast to animals amenable to genetic modification.
  – Worms, insect cells, frog eggs, flies, zebrafish, mice, mammalian (human) cell lines.
• In general, the more complex the organism, the more difficult it is to modify genetically, but the more relevant the model becomes to humans.