bi190-2013 Problem Set 3. Due Friday May 17 (5 pm).

Problem 1. You sequence 1 kilobase of genomic DNA and obtain 143 reads of 37 bp each (about five-fold coverage).
a. Assemble the 37 nt reads into one contig. To make it easier, the fragments are all in the same orientation (do NOT reverse complement them!). HINT: You can use a word-processing program and “Find” to match sequences.
b. Given the extent of coverage, are the anomalies more likely sequencing errors or polymorphisms?
c. What organism is this DNA from? [Try BLAST at http://www.ncbi.nlm.nih.gov, the National Center for Biotechnology Information at the National Library of Medicine, NIH]
d. What is the closest C. elegans gene? [Try NCBI or www.WormBase.org]

aaaatacaaatttcgcataacacattgaggtcatgtt aaaccgtatataatcgaaattctttgccccttcagat aaagcaaattcttcctctgggtggtcaagttcaactc aaaggaaggcctcatacctagtaattacatagaaatg aaattggtatcgcgcggagctggatggaaaggaaggc aagacgattcaaattggtatcgcgcggagctggatgg aagaggtgggtgtgccgtccattccattccattttga aagctttctttattagttatttaatcaaaaatatggt aaggaaggcctcatacctagtaattacatagaaatga aatatcatcggacggcgagcgtgtcccggtcgcagga aattaacttacatcccttctttttacacagatattaa aattttatgcccattaagatccttttttccacaaatc acagatattaaatatggaagacgattcaaattggtat acagcgatgccaagagtattagattggattcattatg acatttacaaacccattgatttgtattgcttaaatta acccattgatttgtattgcttaaattaacttacatcc accgtatataatcgaaattctttgccccttcagatgc acctgaagaggtgggtgtgccgtccattccattccat actataaacaaacagcgatgccaagagtattagattg actccctcaacgaactggtcgaatatcatcggacggc acttacatcccttctttttacacagatattaaatatg agaatcacgagtaagtcgagcctgaaatttccaatat agaatcacgagtaagtcgagcctgaaatttccaatat agacccaaataatgatcatttctattgtagctggtat agcctgaaatttccaatatcgagacccaaataatgat agcgaatccagtcccggcgatttctccttatcagtca aggaaggcctcatacctagtaattacatagaaatgaa aggttctgcgcgatgcccaaagcaaattcttcctctg agtaagtcgagcctgaaatttccaatatcgagaccca ataacacattgaggtcatgtttgcgtacgatgccgcc ataatgatcatttctattgtagctggtattacggacg atataatcgaaattctttgccccttcagatgccccga atatcatcggacggcgagcgtgtcccggtcgcaggat
atcaagtcccggcgatttctccttatcagtcaagtga atgagtttgctcagcaaacatgcagattcattttaaa atgcccattaagatccttttttccacaaatcaagctt atgctgagaagctgctgtcaaataagcacgaaggcgc atgtccgaacgattttttttattgcagatgctcgtgc attacggacgcatcacacgcgccgatgctgagaagct attcattatgtccgaacgattttttttattgcagatg atttacaaacccattgatttgtattgcttaaattaac attttttttattgcagatgctcgtgcaggcgctgtac caaacatgcagattcattttaaaaatacaaatttcgc caaacatgcagattcattttaaaaatacaaatttcgc caaataagcacgaaggcgccttcttgatacgcatcag cacagatcgctccgatgagaactggtggaacggcgag cacccatttgaactataaacaaacagcgatgccaaga cagcgaatccagtcccggcgatttctccttatcagtc caggatgtcaaattgcgtgatatgatacctgaagagg catcccttctttttacacagatattaaatatggaaga catgtttgcgtacgatgccgccgccaccgcccaccca catttacaaacccattgatttgtattgcttaaattaa ccaagagtattagattggattcattatgtccgaacga ccacaggaatccggggaattggacttcagacgcggcg ccaccgcccacccatttgaactataaacaaacagcga cccgatggcgttcagcatttcaaggttctgcgcgatg cctcaacgaactggtcgaatatcatcggacggcgagc ccttatcagtcaagtgagttgcgcttgaatggcttca cgaatccagtcccggcgatttctccttatcagtcaag cgagacccaaataatgatcatttctattgtagctggt cgcatcagcgaatccagtcccggcgatttctccttat cgcggcgatgtcatcaccgtcacagatcgctccgatg cgtacgatgccgccgccaccgcccacccatttgaact cgtccattccattccattttgatgtgatgagtttgct cgtccattccattccattttgatgtgatgagtttgct cgtgtcccggtcgcaggatgtcaaattgcgtgatatg ctattgtagctggtattacggacgcatcacacgcgcc ctctgggtggtcaagttcaactccctcaacgaactgg ctgagaagctgctgtcaaataagcacgaaggcgcctt ctgggtggtcaagttcaactccctcaacgaactggtc cttacatcccttctttttacacagatattaaatatgg cttcagacgcggcgatgtcatcaccgtcacagatcgc ctttattagttatttaatcaaaaatatggttcattac ctttcatatttatgattgaaaaccgtatataatcgaa gaaatgaagaatcacgagtaagtcgagcctgaaattt gaactggtcgaatatcatcggacggcgagcgtgtccc gaagacgattcaaattggtatcgcgcggagctggatg gagttgcgcttgaatggcttcagctttcatatttatg gagttgcgcttgaatggcttcagctttcatatttatg gatcatttctattgtagctggtattacggacgcatca gatgccccgatggcgttcagcatttcaaggttctgcg gcaaacatgcagattcattttaaaaatacaaatttcg gcataacacattgaggtcatgtttgcgtacgatgccg gccaagagtattagattggattcattatgtccgaacg gccaccgcccacccatttgaactataaacaaacagcg 
gcccaaagcaaattcttcctctgggtggtcaagttca gcccattaagatccttttttccacaaatcaagctttc gcgagcgtgtcccggtcgcaggatgtcaaattgcgtg gcgatgtcatcaccgtcacagatcgctccgatgagaa gcgctgtacgattttgtgccacaggaatccggggaat gcgctgtacgattttgtgccacaggaatccggggaat gcgcttgaatggcttcagctttcatatttatgattga gcgtacgatgccgccgccaccgcccacccatttgaac gctgctgtcaaataagcacgaaggcgccttcttgata ggaaaggaaggcctcatacctagtaattacatagaaa ggacgcatcacacgcgccgatgctgagaagctgctgt ggcgctgtacgattttgtgccacaggaatccggggaa gggtggtcaagttcaactccctcaacgaactggtcga ggtattacggacgcatcacacgcgccgatgctgagaa gtgcaggcgctgtacgattttgtgccacaggaatccg gtgccacaggaatccggggaattggacttcagacgcg gtgccacaggaatccggggaattggacttcagacgcg gtgtcccggtcgcaggatgtcaaattgcgtgatatga gttcagcatttcaaggttctgcgcgatgcccaaagca taaacaaacagcgatgccaagagtattagattggatt taacacattgaggtcatgtttgcgtacgatgccgccg taattttatgcccattaagatccttttttccacaaat tacaaacccattgatttgtattgcttaaattaactta tacctagtaattacatagaaatgaagaatcacgagta tacctgaagaggtgggtgtgccgtccattccattcca tatataatcgaaattctttgccccttcagatgccccg tatcgagacccaaataatgatcatttctattgtagct tatgcccattaagatccttttttccacaaatcaagct tatggaagacgattcaaattggtatcgcgcggagctg tcaaattgcgtgatatgatacctcaagaggtgggtgt tcaaattgcgtgatatgatacctgaagaggtgggtgt tcattacatttacaaacccattgatttgtattgctta tcccttctttttacacagatattaaatatggaagacg tcttgatacgcatcagcgaatccagtcccggcgattt tctttattagttatttaatcaaaaatatggttcatta tgaaaaccgtatataatcgaaattctttgccccttca tgaagaatcacgagtaagtcgagcctgaaatttccaa tgagaagctgctgtcaaataagcacgaaggcgccttc tgatgtgatgagtttgctcagcaaacatgcagattca tgccgtccattccattccattttgatgtgatgagttt tggtatcgcgcggagctggatggaaaggaaggcctca tgtcaaataagcacgaaggcgccttcttgatacgcat tgtcatcaccgtcacagatcgctccgatgagaactgg tgtgatgagtttgctcagcaaacatgcagattcattt ttatgtccgaacgattttttttattgcagatgctcgt ttcaaggttctgcgcgatgcccaaagcaaattcttcc ttctttattagttatttaatcaaaaatatggttcatt ttctttattagttatttaatcaaaaatatggttcatt ttctttgccccttcagatgccccgatggcgttcagca ttgcgcttgaatggcttcagctttcatatttatgatt ttgcgcttgaatggcttcagctttcatatttatgatt ttggacttcagacgcggcgatgtcatcaccgtcacag 
tttattagttatttaatcaaaaatatggttcattaca tttcaaggttctgcgcgatgcccaaagcaaattcttc ttttaaaaatacaaatttcgcataacacattgaggtc ttttatgcccattaagatccttttttccacaaatcaa ttttattgcagatgctcgtgcaggcgctgtacgattt ttttttattgcagatgctcgtgcaggcgctgtacgat

Problem 2. You are studying a nematode gene that governs the attraction of males to aged females. This gene is named cgar-1 for “constitutive generator of attractive response”, pronounced “cougar” for short. Knowing the rough location of the gene from mapping, you find the following predicted gene model in WormBase, with four exons, three introns, and proposed 5’ and 3’ UTRs. You check RNAseq databases and find that this proposed transcript is indeed expressed in the worm.

With this knowledge in hand, you go to the wonderful “million mutation project” and search for strains that bear frameshifting or nonsense-inducing polymorphisms in your gene. You had mapped this gene with the only mutation you know for it, which appears to be a gain-of-function mutation. You are hoping to find putative loss-of-function mutations to learn more about cougar. You have tried RNAi against cgar-1, and so you have an idea of the phenotype you should get when the gene is knocked out (hermaphrodites no longer attract males).

You are in luck and find a strain that bears a premature stop codon shortly after the start codon. You test this strain and find to your surprise that it still attracts males normally!
a. Propose a hypothesis for how a premature stop codon could fail to eliminate gene activity. Diagram a new gene model that fits this hypothesis.

Well, that was disappointing. Wanting a more sure-fire way to knock out gene activity, you find a strain that has a frameshift mutation near the 3’ end of exon 2. To your shock, this strain also attracts males normally!
b. Propose a hypothesis for how a frameshift mutation in an interior exon could fail to knock out gene activity. Diagram a new gene model that fits this hypothesis.
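To see why both results are surprising, it can help to check what each kind of mutation is expected to do to the predicted protein. The following is a minimal sketch, assuming a made-up 24 nt coding sequence (it is NOT cgar-1) and the standard genetic code; the names `translate`, `wild_type`, `nonsense`, and `frameshift` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate from the first base, stopping at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy coding sequence: ATG GCT AAA TGG GAA CTG TTT TAA
wild_type = "ATGGCTAAATGGGAACTGTTTTAA"

# Nonsense mutation: TGG (Trp) -> TGA (stop) at the fourth codon.
nonsense = wild_type[:11] + "A" + wild_type[12:]

# Frameshift: delete one base from the second codon, shifting every later codon.
frameshift = wild_type[:4] + wild_type[5:]

print(translate(wild_type))   # MAKWELF
print(translate(nonsense))    # MAK
print(translate(frameshift))  # MVNGNCF
```

In this toy case the nonsense allele truncates the protein after three residues, and the frameshift allele changes every residue downstream of the deletion. Your hypotheses for parts a and b need to explain how a functional product could be made despite changes like these.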
Alright, we’re going all out this time. You find a strain that bears a deletion of the entirety of exons 1 and 2, along with a kilobase of upstream DNA. But this strain still has wild-type activity! You even raise an antibody against the CGAR-1 protein, and upon immunostaining, find that the protein is still present in all of your mutant worms.
c. Assuming that there are no other copies of cgar-1 in the genome, explain how a deletion of most of the gene and its promoter could still fail to eliminate its function, and how antibodies could still detect the protein. Diagram a new gene model that fits this hypothesis.
d. Propose experiments you can do on your mutants to provide evidence for or against your hypotheses. You may propose experiments using techniques to detect, measure the length of, or sequence RNA and protein.
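A note on Problem 1a: the “Find” hint amounts to looking for a read whose beginning matches the end of another read, then joining the two. If you would rather script this than do it by hand, here is a minimal sketch of a greedy suffix-prefix merge. It assumes exact matches only and same-orientation reads, and it uses short made-up reads rather than the problem data; the function names and the `min_len` cutoff are illustrative choices, not a prescribed method.

```python
def overlap(a, b, min_len=4):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    reads = list(dict.fromkeys(reads))  # collapse identical reads, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_n:]
        # Drop the two merged reads and anything now fully contained in the result.
        reads = [r for k, r in enumerate(reads)
                 if k not in (best_i, best_j) and r not in merged] + [merged]
    return reads


# Made-up 8 nt reads (one appears twice), all in the same orientation.
toy_reads = ["GCCATGGA", "ACGTTAGC", "GGATCCTA", "TTAGCCAT", "ATGGATCC", "TTAGCCAT"]
print(greedy_assemble(toy_reads))  # ['ACGTTAGCCATGGATCCTA']
```

Because the matching is exact, a read carrying a single mismatch will not merge across the mismatched position. Reads that refuse to join cleanly are therefore worth inspecting by eye, and they are the anomalies that part b asks about.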