bi190-2013 Problem Set 3. Due Friday May 17 (5 pm).

Problem 1.
You sequence 1 kilobase of genomic DNA and obtain 143 reads of 37 bp each (about
five-fold coverage).
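As a quick check of the coverage figure: mean coverage is the number of reads times the read length, divided by the length of the sequenced region. A minimal Python sketch of that arithmetic (the function name mean_coverage is illustrative and not part of the problem set):

```python
def mean_coverage(n_reads, read_len, target_len):
    """Average number of reads covering each base of the target region."""
    return n_reads * read_len / target_len


# 143 reads of 37 bp over a 1 kb (1000 bp) region
coverage = mean_coverage(143, 37, 1000)
print(f"{coverage:.2f}x")  # 5.29x
```

This is an average only; individual positions, especially near the ends of the region, can be covered by more or fewer reads.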
a. Assemble the 37 nt reads into one contig. To make it easier, the fragments
are all in the same orientation (do not reverse complement them!). Hint: You
can use a word-processing program and “Find” to match sequences.
b. Given the extent of coverage, are the anomalies more likely to be sequencing
errors or polymorphisms?
c. What organism is this DNA from? [Try BLAST at http://www.ncbi.nlm.nih.gov, the
National Center for Biotechnology Information at the National Library of Medicine, NIH]
d. What is the closest C. elegans gene? [Try NCBI or www.WormBase.org]
aaaatacaaatttcgcataacacattgaggtcatgtt
aaaccgtatataatcgaaattctttgccccttcagat
aaagcaaattcttcctctgggtggtcaagttcaactc
aaaggaaggcctcatacctagtaattacatagaaatg
aaattggtatcgcgcggagctggatggaaaggaaggc
aagacgattcaaattggtatcgcgcggagctggatgg
aagaggtgggtgtgccgtccattccattccattttga
aagctttctttattagttatttaatcaaaaatatggt
aaggaaggcctcatacctagtaattacatagaaatga
aatatcatcggacggcgagcgtgtcccggtcgcagga
aattaacttacatcccttctttttacacagatattaa
aattttatgcccattaagatccttttttccacaaatc
acagatattaaatatggaagacgattcaaattggtat
acagcgatgccaagagtattagattggattcattatg
acatttacaaacccattgatttgtattgcttaaatta
acccattgatttgtattgcttaaattaacttacatcc
accgtatataatcgaaattctttgccccttcagatgc
acctgaagaggtgggtgtgccgtccattccattccat
actataaacaaacagcgatgccaagagtattagattg
actccctcaacgaactggtcgaatatcatcggacggc
acttacatcccttctttttacacagatattaaatatg
agaatcacgagtaagtcgagcctgaaatttccaatat
agaatcacgagtaagtcgagcctgaaatttccaatat
agacccaaataatgatcatttctattgtagctggtat
agcctgaaatttccaatatcgagacccaaataatgat
agcgaatccagtcccggcgatttctccttatcagtca
aggaaggcctcatacctagtaattacatagaaatgaa
aggttctgcgcgatgcccaaagcaaattcttcctctg
agtaagtcgagcctgaaatttccaatatcgagaccca
ataacacattgaggtcatgtttgcgtacgatgccgcc
ataatgatcatttctattgtagctggtattacggacg
atataatcgaaattctttgccccttcagatgccccga
atatcatcggacggcgagcgtgtcccggtcgcaggat
atcaagtcccggcgatttctccttatcagtcaagtga
atgagtttgctcagcaaacatgcagattcattttaaa
atgcccattaagatccttttttccacaaatcaagctt
atgctgagaagctgctgtcaaataagcacgaaggcgc
atgtccgaacgattttttttattgcagatgctcgtgc
attacggacgcatcacacgcgccgatgctgagaagct
attcattatgtccgaacgattttttttattgcagatg
atttacaaacccattgatttgtattgcttaaattaac
attttttttattgcagatgctcgtgcaggcgctgtac
caaacatgcagattcattttaaaaatacaaatttcgc
caaacatgcagattcattttaaaaatacaaatttcgc
caaataagcacgaaggcgccttcttgatacgcatcag
cacagatcgctccgatgagaactggtggaacggcgag
cacccatttgaactataaacaaacagcgatgccaaga
cagcgaatccagtcccggcgatttctccttatcagtc
caggatgtcaaattgcgtgatatgatacctgaagagg
catcccttctttttacacagatattaaatatggaaga
catgtttgcgtacgatgccgccgccaccgcccaccca
catttacaaacccattgatttgtattgcttaaattaa
ccaagagtattagattggattcattatgtccgaacga
ccacaggaatccggggaattggacttcagacgcggcg
ccaccgcccacccatttgaactataaacaaacagcga
cccgatggcgttcagcatttcaaggttctgcgcgatg
cctcaacgaactggtcgaatatcatcggacggcgagc
ccttatcagtcaagtgagttgcgcttgaatggcttca
cgaatccagtcccggcgatttctccttatcagtcaag
cgagacccaaataatgatcatttctattgtagctggt
cgcatcagcgaatccagtcccggcgatttctccttat
cgcggcgatgtcatcaccgtcacagatcgctccgatg
cgtacgatgccgccgccaccgcccacccatttgaact
cgtccattccattccattttgatgtgatgagtttgct
cgtccattccattccattttgatgtgatgagtttgct
cgtgtcccggtcgcaggatgtcaaattgcgtgatatg
ctattgtagctggtattacggacgcatcacacgcgcc
ctctgggtggtcaagttcaactccctcaacgaactgg
ctgagaagctgctgtcaaataagcacgaaggcgcctt
ctgggtggtcaagttcaactccctcaacgaactggtc
cttacatcccttctttttacacagatattaaatatgg
cttcagacgcggcgatgtcatcaccgtcacagatcgc
ctttattagttatttaatcaaaaatatggttcattac
ctttcatatttatgattgaaaaccgtatataatcgaa
gaaatgaagaatcacgagtaagtcgagcctgaaattt
gaactggtcgaatatcatcggacggcgagcgtgtccc
gaagacgattcaaattggtatcgcgcggagctggatg
gagttgcgcttgaatggcttcagctttcatatttatg
gagttgcgcttgaatggcttcagctttcatatttatg
gatcatttctattgtagctggtattacggacgcatca
gatgccccgatggcgttcagcatttcaaggttctgcg
gcaaacatgcagattcattttaaaaatacaaatttcg
gcataacacattgaggtcatgtttgcgtacgatgccg
gccaagagtattagattggattcattatgtccgaacg
gccaccgcccacccatttgaactataaacaaacagcg
gcccaaagcaaattcttcctctgggtggtcaagttca
gcccattaagatccttttttccacaaatcaagctttc
gcgagcgtgtcccggtcgcaggatgtcaaattgcgtg
gcgatgtcatcaccgtcacagatcgctccgatgagaa
gcgctgtacgattttgtgccacaggaatccggggaat
gcgctgtacgattttgtgccacaggaatccggggaat
gcgcttgaatggcttcagctttcatatttatgattga
gcgtacgatgccgccgccaccgcccacccatttgaac
gctgctgtcaaataagcacgaaggcgccttcttgata
ggaaaggaaggcctcatacctagtaattacatagaaa
ggacgcatcacacgcgccgatgctgagaagctgctgt
ggcgctgtacgattttgtgccacaggaatccggggaa
gggtggtcaagttcaactccctcaacgaactggtcga
ggtattacggacgcatcacacgcgccgatgctgagaa
gtgcaggcgctgtacgattttgtgccacaggaatccg
gtgccacaggaatccggggaattggacttcagacgcg
gtgccacaggaatccggggaattggacttcagacgcg
gtgtcccggtcgcaggatgtcaaattgcgtgatatga
gttcagcatttcaaggttctgcgcgatgcccaaagca
taaacaaacagcgatgccaagagtattagattggatt
taacacattgaggtcatgtttgcgtacgatgccgccg
taattttatgcccattaagatccttttttccacaaat
tacaaacccattgatttgtattgcttaaattaactta
tacctagtaattacatagaaatgaagaatcacgagta
tacctgaagaggtgggtgtgccgtccattccattcca
tatataatcgaaattctttgccccttcagatgccccg
tatcgagacccaaataatgatcatttctattgtagct
tatgcccattaagatccttttttccacaaatcaagct
tatggaagacgattcaaattggtatcgcgcggagctg
tcaaattgcgtgatatgatacctcaagaggtgggtgt
tcaaattgcgtgatatgatacctgaagaggtgggtgt
tcattacatttacaaacccattgatttgtattgctta
tcccttctttttacacagatattaaatatggaagacg
tcttgatacgcatcagcgaatccagtcccggcgattt
tctttattagttatttaatcaaaaatatggttcatta
tgaaaaccgtatataatcgaaattctttgccccttca
tgaagaatcacgagtaagtcgagcctgaaatttccaa
tgagaagctgctgtcaaataagcacgaaggcgccttc
tgatgtgatgagtttgctcagcaaacatgcagattca
tgccgtccattccattccattttgatgtgatgagttt
tggtatcgcgcggagctggatggaaaggaaggcctca
tgtcaaataagcacgaaggcgccttcttgatacgcat
tgtcatcaccgtcacagatcgctccgatgagaactgg
tgtgatgagtttgctcagcaaacatgcagattcattt
ttatgtccgaacgattttttttattgcagatgctcgt
ttcaaggttctgcgcgatgcccaaagcaaattcttcc
ttctttattagttatttaatcaaaaatatggttcatt
ttctttattagttatttaatcaaaaatatggttcatt
ttctttgccccttcagatgccccgatggcgttcagca
ttgcgcttgaatggcttcagctttcatatttatgatt
ttgcgcttgaatggcttcagctttcatatttatgatt
ttggacttcagacgcggcgatgtcatcaccgtcacag
tttattagttatttaatcaaaaatatggttcattaca
tttcaaggttctgcgcgatgcccaaagcaaattcttc
ttttaaaaatacaaatttcgcataacacattgaggtc
ttttatgcccattaagatccttttttccacaaatcaa
ttttattgcagatgctcgtgcaggcgctgtacgattt
ttttttattgcagatgctcgtgcaggcgctgtacgat
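If you would rather script part a than use “Find”, one possible approach is a greedy overlap merge: repeatedly join the two sequences whose suffix and prefix share the longest exact match. The Python sketch below is only an illustration under stated assumptions, not the intended solution. It assumes all reads are in the same orientation (as stated above) and that overlaps match exactly. The minimum overlap of 20 and the 2-mismatch placement limit are arbitrary choices, and all function names are hypothetical. The pileup helper relates to part b: it places every read at its best ungapped position on a contig and tallies the bases seen in each column, so you can count how many reads support each base at a disputed position.

```python
from collections import Counter


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=20):
    """Repeatedly merge the pair of contigs with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        # Keep the other contigs, dropping any now contained in the merged one.
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j) and c not in merged]
        contigs.append(merged)


def best_offset(contig, read):
    """(mismatches, offset) of the best ungapped placement of read on contig.

    Assumes the contig is at least as long as the read.
    """
    return min(
        (sum(x != y for x, y in zip(read, contig[off:off + len(read)])), off)
        for off in range(len(contig) - len(read) + 1)
    )


def pileup(contig, reads, max_mismatch=2):
    """Tally the bases observed at each contig position across all reads."""
    columns = [Counter() for _ in contig]
    for read in reads:
        mismatches, off = best_offset(contig, read)
        if mismatches <= max_mismatch:
            for p, base in enumerate(read):
                columns[off + p][base] += 1
    return columns


def disputed(columns):
    """Positions where the reads disagree, with the count for each base."""
    return {pos: dict(c) for pos, c in enumerate(columns) if len(c) > 1}
```

To try it, load the reads above into a list of strings (one per line), call greedy_assemble on the list, take the longest contig returned, and pass it with the full read list to pileup and disputed. Because the merge step requires exact matches, a read carrying a mismatch may be left over as a separate short contig; leftovers like that are worth inspecting by hand. The nested loops are not optimized, which is acceptable for 143 short reads but would not scale to a real sequencing run.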
Problem 2.
You are studying a nematode gene that governs the attraction of males to aged
females. This gene is named cgar-1 for “constitutive generator of attractive
response”, pronounced “cougar” for short. Knowing the rough location of the gene
from mapping, you find the following predicted gene model in WormBase, with four
exons, three introns, and proposed 5’ and 3’ UTRs. You check RNAseq databases and
find that this proposed transcript is indeed found within the worm.
With this knowledge in hand, you go to the wonderful “million mutation project”
and search for strains that bear frameshifting or nonsense-inducing polymorphisms
in your gene.
You had mapped this gene with the only mutation you know for it, which appears to
be a gain of function mutation. You are hoping to find these putative loss-of-function
mutations to learn more about cougar. You have tried RNAi against cgar-1, and so
you have an idea of the phenotype you should get when the gene is knocked out
(hermaphrodites no longer attract males).
You are in luck and find a strain that bears a premature stop codon shortly after the
start codon. You test this strain and find to your surprise that it still attracts males
normally!
a. Propose a hypothesis for how a premature stop codon could fail to eliminate gene
activity. Diagram a new gene model that fits this hypothesis.
Well, that was disappointing. Wanting to go for a more sure-fire way to knock out
gene activity, you find a strain that has a frame-shifting mutation near the 3’ end of
exon 2. To your shock, this strain also attracts males normally!
b. Propose a hypothesis for how a frameshift mutation in an interior exon could fail to
knock out gene activity. Diagram a new gene model that fits this hypothesis.
Alright, we’re going all out this time. You find a strain that bears a deletion of the
entirety of exons 1 and 2, along with a kilobase of upstream DNA. But this strain still
has wildtype activity! You even raise an antibody against the CGAR-1 protein, and
upon immunostaining, find that the protein is still found within all of your mutant
worms.
c. Assuming that there are no other copies of cgar-1 in the genome, explain how a
deletion of most of the gene and its promoter could still fail to eliminate its function,
and how even antibodies could still detect the protein. Diagram a new gene model
that fits this hypothesis.
d. Propose experiments you can do on your mutants to provide evidence for or against
your hypotheses. You may propose experiments using techniques to detect,
measure the length of, or sequence RNA and protein.