UMS Spring Semester 2014
Molecular genetics
Ashgan Abougabal

Definition(s)

A reading frame is a sequence of nucleotides in DNA that contains no termination codon and so can potentially be translated into a polypeptide chain.

An ORF (open reading frame) begins with a start codon and contains no stop codon for a distance long enough to encode a protein.

An EST (expressed sequence tag) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts and are instrumental in gene discovery and gene sequence determination.

Comparative genomics: analyzing and comparing genetic material from different species to study evolution, gene function, and inherited disease, and to understand what is unique to each species.

Solitary genes: about 25-50 percent of the protein-coding genes are represented only once in the haploid genome.

Duplicated genes: genes with close but non-identical sequences that are often located within 5-50 kb of one another; such a set is called a "gene family".

Gene family: a set of duplicated genes that encode proteins with similar but not identical amino acid sequences.

Low-complexity regions are often defined as regions of biased composition containing simple sequence repeats.

Genetic linkage is the tendency of genes that are located close to each other on a chromosome to be inherited together during meiosis.

A chiasma (plural: chiasmata), in genetics, is thought to be the point where two homologous non-sister chromatids exchange genetic material during chromosomal crossover in meiosis. (Sister chromatids also form chiasmata between each other, but because their genetic material is identical, this does not cause any change in the resulting daughter cells.)

Single nucleotide polymorphisms (SNPs, pronounced "snips") in the human genome are changes of a single nucleotide at a particular locus, that is, single-base sequence variations between individuals at a particular point in the genome.

Choose the right answer
1. A reading frame contains all the following except
(A) start codon (B) stop codon (C) termination codon
2. An open reading frame starts with
(A) AGT (B) ATG (C) TAG (D) TAG
3. An ORF is defined as a stretch of DNA containing at least
(A) 100 bp (B) 200 bp (C) 210 bp (D) 300 bp
4. By scanning for open reading frames (ORFs), the following share of the genes in bacteria and yeast has been identified
(A) more than 90% (B) 90% (C) 85% (D) 99%
5. ESTs represent partial sequences of cDNA clones with a length of
(A) 300 to 700 bp (B) 300 to 500 bp (C) 400 to 700 bp (D) 500 to 700 bp
6. ESTs represent all the following except
(A) portions of expressed genes (B) mRNA (C) protein portion (D) cDNA
More than 700 mouse genes have counterparts in the human genome.
7. Solitary genes are found only once in the haploid genome and represent about
(a) 25-50% of the protein-coding genes (b) 15-50% of the protein-coding genes (c) 25-70% of the protein-coding genes
8. By comparing the genome compositions between genomes, scientists can better understand
(a) the evolutionary history of a given genome (b) genome composition (c) portions of expressed genes (d) genome sequence
9. Sequences like ATATATACTTATATA are called
(a) low-complexity (b) high-complexity (c) intermediate-complexity
10. The α globin and β globin families came to lie on different chromosomes through
(a) transposition (b) duplication (c) mutation
11. Transposable elements (transposons) contain all the following except
(a) Long interspersed elements (LINEs)
(b) Short interspersed elements (SINEs)
(c) Long terminal repeats (LTRs)
(d) Dead transposons
(e) Simple sequence repeats (SSRs)

Simple sequence repeats (SSRs) contain
(a) one- to six-nucleotide sequences repeated hundreds of times
(b) one- to twenty-nucleotide sequences repeated hundreds of times
(c) one- to six-nucleotide sequences repeated thousands of times
(d) six- to ten-nucleotide sequences repeated hundreds of times

Cutting and splicing of introns is carried out by
(A) snRNPs (B) SNPs (C) hnRNA (D) S mRNA

Each cell in our bodies has about
(A) 7 feet of DNA stuffed into it
(B) 12 feet of DNA stuffed into it
(C) 10 feet of DNA stuffed into it
(D) 6 feet of DNA stuffed into it

Which of the following is correct? When eukaryotic DNA is fragmented and centrifuged to equilibrium in a cesium chloride (CsCl) density gradient, the following are observed
(a) one band only
(b) one main band and plasmid bands
(c) one main band and a satellite band
(d) multiple bands

Microsatellite, also called
A) transposable elements, 1-13 bp
B) interspersed repetitive DNA dispersed throughout the genome
C) highly repetitive DNA
D) coding sequence
(A,B,D), (B,C,D), (C,D,E), (A,B,C)

If there is no crossing over in the region between the two genes
a) 100% non-recombinants
b) 100% recombinants
c) 50% non-recombinants and 50% recombinants

The units of distance are called map units (mu); they are also referred to as centiMorgans (cM). One map unit is equivalent to
a) 1% recombination frequency
b) 10% recombination frequency
c) 100% recombination frequency

The nucleus of a human cell contains between
a) 30 000 and 40 000 genes
b) 30 000 and 50 000 genes
c) 20 000 and 40 000 genes
d) 10 000 and 30 000 genes
The human genome is identical to everyone else's by
a) 99.9% b) 79.9% c) 90.9% d) 95.9%

Human mtDNA is
a) double stranded DNA
b) single stranded circular DNA
c) closed double stranded circular DNA

Complete the following

1. Both very short genes and long genes are missed by the ………… method.
2. ………… is a powerful method for identifying human genes by comparing the human genomic sequence with that of the …………, since human and mouse are sufficiently related to have most genes in common.
3. In short, the human and mouse genomes are remarkably similar not only in the ………… but also at the level of …………
4. The Celera team found that much of mouse chromosome 16 corresponds to human chromosome …………, which contains genes involved in …………
5. Genome composition is used to describe the makeup of the contents of a …………, which should include: …………, proportions of ………… and ………… in detail.
6. Each gene family could contain from a few to 30 members.
7. The genomes of prokaryotes are contained in …………, which are usually …………
8. In contrast, the genomes of eukaryotes are composed of multiple chromosomes, each containing a linear molecule of DNA.
9. Genome composition is used to describe the makeup of the contents of …………, which should include: …………, proportions of nonrepetitive ………… and …………
10. Many genes occur as …………, and can be clustered on the ………… or scattered throughout …………
11. Larger genomes are generated by increasing the number …………
12. Constitutive heterochromatin is localized to ………… and …………
13. Simple sequence repeats (SSRs) contain …………-nucleotide sequences repeated ………… of times.
14. Six major types of noncoding human DNA have been described: …………, …………, …………, …………, ………… and …………
15. Complex genomes have roughly …………
more DNA than is required to encode all the RNAs or proteins in the organism or to serve any apparent regulatory function.
16. Length of tandemly repeated DNA in bp: regular satellite …………, minisatellite ………… and microsatellite …………
17. When eukaryotic DNA is fragmented and centrifuged to equilibrium in a cesium chloride (CsCl) density gradient, two components are observed:
- …………, most of the genomic DNA, with a density of ………… and a G-C content of …………
- Satellite band, ………… band DNA: one or multiple minor bands, with a buoyant density of ………… and a G-C content of …………

Microsatellite, also called …………, 1-13 bp, ………… repetitive DNA ………… throughout the genome.

Most eukaryotic chromosomes have short, species-specific, tandemly repeated sequences called …………

Chromosome lengths are maintained by …………, which adds repeats without using the cell's regular replication machinery.

The ends of eukaryotic chromosomes are formed by an enzyme called ………… Telomerase is an enzyme that adds repeats to the 3´ ends of eukaryotic chromosomes.

The chiasmata become visible during the ………… of prophase I of …………, but the actual "…………" of genetic material is thought to occur during the previous ………… stage.

Eukaryotes can have ………… genomes: …………, ………… and ………… If not specified, "genome" usually refers to …………

Correct the following

1. Scanning for ORFs is a good method to identify eukaryotic genes.
2. Researchers at Celera Genomics in Rockville, Maryland, provide the strongest evidence that many genes in humans are not present in mice.
3. Fourteen genes on mouse chromosome 16 are not found in humans.
4. Duplicated genes are close but non-identical sequences that are often located within 550 kb of one another, called a "gene family".
5. Much of mouse chromosome 20 corresponds to human chromosome 16, which contains genes involved in Down syndrome and similar disorders.
6. Identical genes include rRNA and globin genes; non-identical genes include histones.
7. Heterogeneous nuclear RNA (hnRNA) is a transcript before splicing is complete.
8. Introns are translated intervening sequences in mRNA.
9. Introns contain invariant 5'-GU and 3'-GA sequences at their borders.
10. Simple-sequence DNA repeats are localized near the centromeres and secondary constriction of mouse chromosomes.
11. Most simple-sequence DNAs are concentrated in specific chromosomal locations. Satellite DNAs lie in euchromatin.
12. ~3/4 of the human genome consists of interspersed repetitive sequences.
13. If there is no crossing over, the alleles of all genes located on different chromosomes would be inherited together.
14. The number of linkage groups is equal to the number of genes of the species.
15. Two genes that undergo independent assortment have a recombination frequency of 50% and are located on homologous chromosomes or far apart on the same chromosome (unlinked).
16. Genes with recombination frequencies of more than 50% are on the same chromosome.
17. Eukaryotes can have 2-4 genomes. If not specified, "genome" usually refers to the nuclear genome.
18. Human mtDNA consists of approximately 20.5 kb and 37 genes; it is closed single stranded circular DNA.
19.
Triallelic SNPs are known to occur at a very high frequency within the human genome.
20. Two-strand crossing over is counted as a single crossover.
21. Three-strand crossing over is counted as no crossovers.
22. Four-strand crossing over is the only one counted as a double crossover, resulting in 30% recombinants and 70% non-recombinants.
23. One map unit is equivalent to 10% recombination frequency.
24. Crossing over occurs during interphase of meiosis; non-sister chromatids of non-homologous chromosomes exchange DNA segments.

If there is no crossing over, the alleles of all genes located on the same chromosome would be inherited dependently.

If there is no crossing over in the region between the two genes = 50% non-recombinants.

Give reasons

ORF scanning is not a good method to identify eukaryotic genes.
- Due to the presence of multiple exons and introns.

Why do we need to compare genomes?
- By comparing the genome compositions between genomes, scientists can better understand the evolutionary history of a given genome.

SNPs are being used for linkage studies in the human genome to track genetic diseases.
Individual 1: AGTCAGTCCTAGGA
Individual 2: AGTCAGACCTAGGA

Draw a diagram showing the use of ESTs in identifying genes

What main information must each EST have?
1. A sequence ID (e.g. sequence-run ID)
2. Location with respect to the poly A (3' or 5')
3. The clone ID from which the EST has been generated
4. Organism
5. Tissue and/or conditions
6. The sequence

What are the main fields of comparative genomics?
Gene location, gene structure, gene characteristics
Gene structure (exon number, exon lengths, intron lengths, sequence similarity)
Gene characteristics (splice sites, codon usage, conserved synteny)

What do you know about Alu elements (short interspersed elements)?
Length = ~300 bp
Repetitive: > 1,000,000 times in the human genome
Alu elements are found in primates.
Early in primate evolution, the Alu transposition rate was approximately one new jump in every live birth.
Today, it is about one new jump in every 200 live births.
Constitute >10% of the human genome.
Found mostly in intergenic regions and introns.
Propagate in the genome through retroposition (RNA intermediates).
Alu elements can be sorted into distinct families according to shared patterns of variation.
At any given point in time, only one or several Alu "master copies" are capable of transposing.
All the millions of Alu elements have accumulated in a mere ~65 million years.

GENETIC AND EVOLUTIONARY EFFECTS OF TRANSPOSITION

1. Duplicative transposition increases genome size.
2. Bacterial transposons often carry genes that confer antibiotic or other forms of resistance. Plasmids can carry such transposons from cell to cell, so that resistance can spread throughout a population or an ecosystem.
3. Gene expression may be altered by the presence of a transposable element.
- An insertion may eliminate the reading frame (phenotypic effects).
- A transposable element may contain regulatory elements (effects on transcription of nearby genes).
- Transposable elements may contain splice sites (effects on RNA processing even if the element is in an intron).
4. Transposable elements promote gross genomic rearrangement
- directly (moving a DNA sequence from one genomic location to another).
- indirectly (as a result of transposition, two sequences become similar to one another so that unequal crossing-over between them is possible).
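The first gene-expression effect listed above (an insertion eliminating the reading frame) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy gene, the 4 bp "element", the insertion position, and the helper names orf_length_in_codons and insert_element are all made up and do not come from any real gene or tool.

```python
# Illustrative sketch only: toy sequences, not real genes or real transposons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_in_codons(dna):
    """Count codons from the first ATG up to (not including) the first in-frame stop.

    Returns 0 if no ATG is present. If no stop codon is reached, every
    complete codon after the start is counted.
    """
    start = dna.find("ATG")
    if start == -1:
        return 0
    count = 0
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def insert_element(dna, element, position):
    """Return a copy of dna with element inserted before index position."""
    return dna[:position] + element + dna[position:]


# Hypothetical gene: ATG GCT GAA CCG TTG AAA GGC TAA
gene = "ATGGCTGAACCGTTGAAAGGCTAA"
# Hypothetical 4 bp insert; its length is not a multiple of 3,
# so everything downstream is read in a shifted frame.
element = "TAGC"
disrupted = insert_element(gene, element, 6)

print(orf_length_in_codons(gene))       # codons before the stop in the intact gene
print(orf_length_in_codons(disrupted))  # codons before the first stop after insertion
```

In this toy case the inserted bases themselves create an in-frame stop codon right after the second codon, so the open reading frame is cut short; with a different insert the frameshift alone would change every downstream codon.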
Compare between

Non-repetitive DNA:
- once per genome ("single copy DNA")
- R = 1 or 2
- much information, high complexity
- found in prokaryotic and eukaryotic genomes

Intermediate (moderate) repetitive DNA:
- dispersed throughout the genome in eukaryotes
- 10 < R < 10,000
- little information, moderate complexity

Highly repetitive DNA:
- short repetitive DNA (<100 bp) present up to 1 million times in the eukaryotic genome
- R (repetition frequency) > 100,000
- almost no information, low complexity

Compare between the arrangement of coordinately controlled genes in prokaryotes and eukaryotes

Prokaryotic: prokaryotic genes that are turned on and off together are often clustered into operons, which are transcribed into one mRNA molecule and translated together.

Eukaryotic: eukaryotic genes coding for enzymes of a metabolic pathway are often scattered over different chromosomes and are individually transcribed.

Tandemly repetitive & interspersed repetitive DNA

Tandemly repetitive DNA:
- proportion of mammalian DNA 10-15%
- the tandem repeats are usually identical
- length of each repeated unit 1-10 bp

Interspersed repetitive DNA:
- proportion of mammalian DNA 25-40%
- the interspersed repeats are very similar but not identical
- length of each repeated unit 100-10,000 bp

Transposons & retrotransposons

Transposons: transposable elements that jump and interrupt normal functioning; they may increase or decrease production of one or more proteins, and can carry a gene that can be activated when inserted downstream from an active promoter, and vice versa.

Retrotransposons: transposable elements that move within a genome by means of an RNA intermediate, a transcript of the retrotransposon DNA; to insert, it must be converted back to DNA by reverse transcriptase.

SINEs & LINEs

SINEs = short interspersed repetitive sequences
LINEs = long interspersed repetitive sequences, > 5 kbp
SINEs are retrosequences that range in length from 75 to 500 bp.
SINEs:
• < 500 bp
• > 10^5 copies
• Most abundant
• Non-autonomous transposable elements (lacking the ability to mediate their own transposition) and their degenerate descendants
LINEs:
• < 10^5 copies
• Moderately abundant
• Active or degenerate descendants of transposable elements
SINEs do not possess any reading frame. Thus, their retroposition must be aided by other genetic elements.

Discuss the main advantages of using ESTs in identifying eukaryotic genes
• Fast and cheap (almost all steps are automated)
• They represent the most extensive available survey of the transcribed portion of genomes.
• They are necessary for gene structure prediction, gene discovery and genome mapping:
-> provide experimental evidence for the position of exons
-> provide regions coding for potentially new proteins
-> characterization of splice variants and alternative polyadenylation
• Provide an alternative to library screening
-> a short tag can lead to a cDNA clone
• Provide an alternative to full-length cDNA sequencing
-> sequences of multiple ESTs can reconstitute a full-length cDNA
• Single nucleotide polymorphism (SNP) data mining

Mention and describe the role of three types of regulatory DNA elements in eukaryotes
o Promoters – recognition sequences for binding of RNA polymerase
o Enhancers – increase transcription of a related gene
o Silencers – decrease transcription of a related gene
o Insulators or boundary elements – block undesirable influences on genes

Describe where satellite DNA is found and what role it may play in the cell.
1) Satellite DNA is simple-sequence DNA (6% of the human genome), size 14 to 500 bp.
2) It is highly repetitive DNA characterized by a rapid rate of hybridization, consisting of short unusual nucleotide sequences that are tandemly repeated thousands of times in large clusters.
3) In addition, multicellular eukaryotes have complex satellites with longer repeat units, mainly in heterochromatic regions.
4) It is found at the tips of chromosomes and at the centromere (centromeric heterochromatin, necessary for separation of chromosomes to daughter cells).
5) Its function is not known; perhaps it plays a structural role during chromosome replication and separation.

Describe the effects of transposons and retrotransposons

Transposons jump and interrupt normal functioning; they may increase or decrease production of one or more proteins, and can carry a gene that can be activated when inserted downstream from an active promoter, and vice versa.

Retrotransposons are transposable elements that move within a genome by means of an RNA intermediate, a transcript of the retrotransposon DNA; to insert, it must be converted back to DNA by reverse transcriptase.

What is the main role of the following

a. snRNPs – catalyze the cutting and splicing reactions of introns
b. Promoters – recognition sequences for binding of RNA polymerase
c. Enhancers – increase transcription of a related gene
d. Silencers – decrease transcription of a related gene
e. Insulators or boundary elements – block undesirable influences on genes
f. Enhancer blockers – prevent "communication" between enhancers and unrelated promoters
g. Barrier sequences – prevent spread of heterochromatin (combined)
h. Centromeric heterochromatin – necessary for separation of chromosomes to daughter cells
i. Telomere – prevents chromosomes from shortening with each replication cycle
j. Genetic maps – allow us to estimate the relative distances between linked genes, based on the probability that a crossover will occur between them
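The role of genetic maps in the last item can be made concrete with a short Python sketch. It uses the relation stated earlier in this sheet (one map unit, or centiMorgan, corresponds to 1% recombination frequency). The offspring counts and the function names map_distance_cm and appear_linked are invented for illustration.

```python
def map_distance_cm(recombinants, total_offspring):
    """Return the map distance in map units (cM).

    One map unit corresponds to 1% recombination frequency, so the distance
    is the percentage of offspring that are recombinant.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    return 100.0 * recombinants / total_offspring


def appear_linked(recombination_percent):
    """Genes showing less than 50% recombination behave as linked."""
    return recombination_percent < 50.0


# Hypothetical test cross: 440 + 460 parental-type and 48 + 52 recombinant offspring.
parental = 440 + 460
recombinant = 48 + 52
distance = map_distance_cm(recombinant, parental + recombinant)

print(distance)                 # estimated distance in map units
print(appear_linked(distance))  # True when the frequency is below 50%
```

With these made-up counts, 100 of 1000 offspring are recombinant, which corresponds to 10 map units. A value near 50% would instead indicate genes that assort independently (unlinked), and zero recombinants corresponds to no crossing over between the two genes.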