Bio102: Introduction to Cell Biology and Genetics
Key for BioSynthesis on Decoding DNA

Below is a 200 base-pair (bp) region of the chromosome of the bacterium E. coli. This region of the chromosome includes the gene kdpF, which encodes the shortest known protein in E. coli, a subunit of a membrane transporter. The sequence of each DNA strand is written three times to show that there are three possible reading frames (three ways to read potential codons) on each strand. For example, the sequence ATCCGA could be read ATC|CGA, A|TCC|GA, or AT|CCG|A.

How can we identify a gene in a DNA sequence? One way is to look for open reading frames (ORFs): sequences of codons starting with a start codon and ending with a stop codon. An open reading frame of reasonable length could be a gene.

In the sequence below, find all of the possible ORFs by circling each start codon, underlining the codons in the open reading frame, and putting a box around the stop codon at the end of the ORF. There can be more than one ORF in each of the six reading frames.
In each block, the first three rows are the top strand (5′ to 3′) in its three reading frames, and the last three rows are the complementary strand (3′ to 5′) in its three reading frames.

Positions 1-57
5′  TTG CAG CCA GAA TTC TAC CCT TCC GGT ATC ACT TTT AGG CCA CTG GAG GTG CAC TAT  3′
5′  TT GCA GCC AGA ATT CTA CCC TTC CGG TAT CAC TTT TAG GCC ACT GGA GGT GCA CTA T  3′
5′  T TGC AGC CAG AAT TCT ACC CTT CCG GTA TCA CTT TTA GGC CAC TGG AGG TGC ACT AT  3′
3′  AAC GTC GGT CTT AAG ATG GGA AGG CCA TAG TGA AAA TCC GGT GAC CTC CAC GTG ATA  5′
3′  AA CGT CGG TCT TAA GAT GGG AAG GCC ATA GTG AAA ATC CGG TGA CCT CCA CGT GAT A  5′
3′  A ACG TCG GTC TTA AGA TGG GAA GGC CAT AGT GAA AAT CCG GTG ACC TCC ACG TGA TA  5′

Positions 58-114
5′  GAG TGC AGG CGT GAT AAC CGG CGT ATT GCT GGT GTT TTT ATT ACT GGG TTA TCT GGT  3′
5′  GA GTG CAG GCG TGA TAA CCG GCG TAT TGC TGG TGT TTT TAT TAC TGG GTT ATC TGG T  3′
5′  G AGT GCA GGC GTG ATA ACC GGC GTA TTG CTG GTG TTT TTA TTA CTG GGT TAT CTG GT  3′
3′  CTC ACG TCC GCA CTA TTG GCC GCA TAA CGA CCA CAA AAA TAA TGA CCC AAT AGA CCA  5′
3′  CT CAC GTC CGC ACT ATT GGC CGC ATA ACG ACC ACA AAA ATA ATG ACC CAA TAG ACC A  5′
3′  C TCA CGT CCG CAC TAT TGG CCG CAT AAC GAC CAC AAA AAT AAT GAC CCA ATA GAC CA  5′

Positions 115-171
5′  TTA TGC CCT GAT CAA TGC GGA GGC GTT CTG ATG GCT GCG CAA GGG TTC TTA CTG ATC  3′
5′  TT ATG CCC TGA TCA ATG CGG AGG CGT TCT GAT GGC TGC GCA AGG GTT CTT ACT GAT C  3′
5′  T TAT GCC CTG ATC AAT GCG GAG GCG TTC TGA TGG CTG CGC AAG GGT TCT TAC TGA TC  3′
3′  AAT ACG GGA CTA GTT ACG CCT CCG CAA GAC TAC CGA CGC GTT CCC AAG AAT GAC TAG  5′
3′  AA TAC GGG ACT AGT TAC GCC TCC GCA AGA CTA CCG ACG CGT TCC CAA GAA TGA CTA G  5′
3′  A ATA CGG GAC TAG TTA CGC CTC CGC AAG ACT ACC GAC GCG TTC CCA AGA ATG ACT AG  5′

Positions 172 to end
5′  GCC ACG TTT TTA CTG GTG TTA ATG GTG C  3′
5′  GC CAC GTT TTT ACT GGT GTT AAT GGT GC  3′
5′  G CCA CGT TTT TAC TGG TGT TAA TGG TGC  3′
3′  CGG TGC AAA AAT GAC CAC AAT TAC CAC G  5′
3′  CG GTG CAA AAA TGA CCA CAA TTA CCA CG  5′
3′  C GGT GCA AAA ATG ACC ACA ATT ACC ACG  5′

1. Highlight the ORF that you think is most likely to be the kdpF gene with a highlighter pen, or circle it in a different color. Why did you choose this ORF?
All of the possible ORFs are shown above, with start codons in dark blue and stop codons in red; the 5 different ORFs are highlighted in yellow, light blue, teal, green and gray. Only two ORFs have both a start and a stop codon within this sequence. The gray one is far too short to be a gene of any kind, so the most likely choice is the one highlighted in yellow.

2. Name a DNA sequence that we could look for as additional evidence that one or more of these ORFs is actually a gene that can be translated in E. coli.

There would be a promoter sequence somewhere upstream, but it could be fairly far away. The most immediate indication that the ORF is translated would be a Shine-Dalgarno sequence, which the ribosome can bind, just before the start codon. The standard Shine-Dalgarno sequence is AGGAGG, and there is a good match for this sequence just before the start of the yellow ORF (boxed in black).

3. If this were a region of eukaryotic DNA, you’d be very unlikely to find an actual gene by looking for an ORF. Why is this true?

Introns frequently disrupt coding sequences in eukaryotes; in higher eukaryotes, almost every gene contains them. A stop codon within an intron could fool us into thinking we’d found the end of an ORF, when the coding sequence actually continues in the next exon after splicing.
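
The hand search in questions 1 and 2 can be cross-checked with a short script. The following is a minimal Python sketch, not part of the original key, and it rests on several assumptions: ATG is treated as the only start codon (E. coli occasionally uses others, such as GTG), only ORFs that both start and stop inside the region are reported, and the top strand was transcribed by hand from the first reading-frame row of each block. The names find_complete_orfs and best_sd_match are hypothetical helpers introduced for illustration.

```python
# Sketch only: ATG is the sole start codon, and ORFs that run off the end
# of the region are not reported.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus quoted in answer 2

# Top strand (5'->3'), transcribed from the first row of each block (assumption).
TOP_STRAND = (
    "TTGCAGCCAGAATTCTACCCTTCCGGTATCACTTTTAGGCCACTGGAGGTGCACTAT"
    "GAGTGCAGGCGTGATAACCGGCGTATTGCTGGTGTTTTTATTACTGGGTTATCTGGT"
    "TTATGCCCTGATCAATGCGGAGGCGTTCTGATGGCTGCGCAAGGGTTCTTACTGATC"
    "GCCACGTTTTTACTGGTGTTAATGGTGC"
)


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_complete_orfs(seq):
    """Scan all six reading frames for ORFs running from ATG to a stop codon.

    Returns (strand, start, end, orf) tuples; start and end are 1-based
    positions on whichever strand is being read 5'->3'.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None  # index of the first ATG of the ORF currently open
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((strand, start + 1, i + 3, s[start:i + 3]))
                    start = None
    return orfs


def best_sd_match(upstream):
    """Return (matches, window) for the 6-mer closest to the SD consensus."""
    def score(window):
        return sum(a == b for a, b in zip(window, SD_CONSENSUS))

    windows = [upstream[i:i + 6] for i in range(len(upstream) - 5)]
    best = max(windows, key=score)
    return score(best), best


orfs = find_complete_orfs(TOP_STRAND)
for strand, start, end, orf in orfs:
    print(strand, start, end, len(orf), "nt")

# Look for a Shine-Dalgarno match in the 20 nt upstream of the longest ORF.
# This slice is only valid for a "+" strand ORF, which is the case here.
strand, start, end, orf = max(orfs, key=lambda o: len(o[3]))
print(best_sd_match(TOP_STRAND[max(0, start - 21):start - 1]))
```

On the sequence as transcribed here, the scan reports two complete ORFs, both on the top strand: one of 90 nucleotides (positions 56 to 145) and one of only 9 nucleotides (positions 117 to 125). This agrees with the key's statement that only two ORFs have both a start and a stop codon within the region. The best upstream match to AGGAGG before the longer ORF is TGGAGG, which matches at 5 of 6 positions and ends 6 nucleotides before the ATG.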