(transcription) and looking for Start and Stop codons.

12.11.2014
Code Breaking: Reading the Genetic Code with Raspberry Pi, Visit 5
PLAYING WITH PYTHON: DE-CODING DNA
A brief reminder of concepts we have covered before and which we will
extensively use today.
The range function:
>>> print range(5)
[0, 1, 2, 3, 4]
>>>
The range function inside a for loop:
>>> for x in range(5):
...     print "AATT"
...
AATT
AATT
AATT
AATT
AATT
>>>
The range function inside a for loop, also printing the value of x:
>>> for x in range(5):
...     print x,"AATT"
...
0 AATT
1 AATT
2 AATT
3 AATT
4 AATT
>>>
The len function allows us to count how many characters there are in a string:
>>> my_dna="AATT"
>>> print len(my_dna)
4
>>>
London Research Institute, Cancer Research UK
Use the code window to type the programs below. Remember:
Enter the program
Save it
Run it
Output appears in the shell window
[Diagram: DNA -> RNA -> Protein]
The DNA bases (letters):
A = Adenine
T = Thymine
C = Cytosine
G = Guanine
The pairs:
A=T
C=G
[Figure: the bases of the sequence GTATACAAGT are numbered by position from 0 to 9. Input: G, T. Output: Guanine, Thymine.]
Program 1A. Write a program to read a DNA base and list its full name. We
can write what is called a dictionary to make the process faster:
my_dna = raw_input("Enter a string of DNA bases, A,G,C or T: ")
# The len function counts the bases
number_of_bases = len(my_dna)
print "Number of bases:"
print number_of_bases
list_of_positions = range(number_of_bases)
dna_dictionary = {"A":"Adenine", "G":"Guanine", "C":"Cytosine", "T":"Thymine"}
for base in list_of_positions:
    my_base = my_dna[base]
    # Print the base letter
    print my_base
    # Print its full name, looked up in the dictionary
    print dna_dictionary[my_base]
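Program 1A stops with a KeyError if the input contains a letter that is not in the dictionary, for example a lowercase "a" or a typing mistake. The sketch below shows one possible way to guard against that. The function name full_names and the "unknown base" label are illustrative assumptions and not part of the worksheet; the code is written so that it should run under both the Python 2 used here and Python 3.

```python
dna_dictionary = {"A": "Adenine", "G": "Guanine", "C": "Cytosine", "T": "Thymine"}

def full_names(my_dna):
    """Return a list of (base, full name) pairs for a DNA string."""
    names = []
    # upper() turns lowercase letters such as "t" into "T"
    for my_base in my_dna.upper():
        # get() returns the second argument when the letter is not in the dictionary
        names.append((my_base, dna_dictionary.get(my_base, "unknown base")))
    return names

for my_base, name in full_names("AtX"):
    print(my_base + " " + name)
```

Using get() instead of square brackets is a design choice: the program keeps going and labels the unexpected letter rather than stopping at the first one.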
Program 1B. A slight modification of the previous program: write a
sentence with the DNA base letter and its full name right next to it.
This was the output in the shell window after you ran the
previous program:
>>>
Enter a string of DNA bases, A, G, C, or T ATT
A
Adenine
T
Thymine
T
Thymine
>>>
Let's create this output with the new program:
A is Adenine
T is Thymine
T is Thymine
Remember this?
Adding strings together and making new "sentences" (concatenation):
>>> a = "Donald"
>>> b = " Duck"
>>> c = a + b + " is coming!"
>>> print c
Donald Duck is coming!
Can you come up with a way to modify the last part of your previous program to
list the DNA base at a particular position and its full name on one line? Feel free to modify
your previous program, save it under a new name and run it.
my_dna = raw_input("Enter a string of DNA bases, A,G,C or T: ")
number_of_bases = len(my_dna)
print "Number of bases:"
print number_of_bases
list_of_positions = range(number_of_bases)
dna_dictionary = {"A":"Adenine", "G":"Guanine", "C":"Cytosine", "T":"Thymine"}
for base in list_of_positions:
    my_base = my_dna[base]
    # Join the letter, the word " is " and the full name into one sentence
    print my_base + " is " + dna_dictionary[my_base]
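If you also want the position number in each sentence, note that a number cannot be joined to a string with + directly; it has to be converted with str() first. The sketch below illustrates this. The function name describe_bases and the exact wording of the sentence are assumptions made for illustration, and the code is written to run under both Python 2 and Python 3.

```python
dna_dictionary = {"A": "Adenine", "G": "Guanine", "C": "Cytosine", "T": "Thymine"}

def describe_bases(my_dna):
    """Return one sentence per base, including its position (counted from zero)."""
    sentences = []
    for position in range(len(my_dna)):
        my_base = my_dna[position]
        # position is a number, so str() is needed before joining with +
        sentence = "Position " + str(position) + ": " + my_base + " is " + dna_dictionary[my_base]
        sentences.append(sentence)
    return sentences

for sentence in describe_bases("ATT"):
    print(sentence)
```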
Program 2. Remember that DNA consists of two strands. Let's practice
writing a program and designing a new dictionary, which will store the
complementary DNA bases.
[Diagram: DNA -> RNA -> Protein]
The DNA base pairs:
A=T
C=G
Remember to give your program a name and save it before running it.
You can enter any sequence you wish in the my_dna variable. Once you have run the
program, you will see the complementary DNA strand in the shell window (your
input DNA sequence will be on the left and the complementary strand your
program has calculated will be on the right).
# This is our input sequence
my_dna = "CGTATACAAGTATCTGCTCAATTAGTCGACT"
# Get the length of the input sequence
number_of_bases = len(my_dna)
# Generate a list of numbers,
# one for each base, starting from zero
list_of_positions = range(number_of_bases)
# This creates a dictionary of base pairs
dna_dictionary = {"A":"T", "T":"A", "G":"C", "C":"G"}
# We step through the sequence one base at a time
for base in list_of_positions:
    # Store the base in a new variable called my_base
    my_base = my_dna[base]
    # Find the complementary base pair with the dictionary
    dna_complement = dna_dictionary[my_base]
    # We use the + symbol to join sequences together
    print my_base + "-" + dna_complement
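Program 2 prints one base pair per line. As a variation, the same dictionary can be used to build the whole complementary strand as a single string. This is only a sketch: the function name complement_strand is an assumption and not part of the worksheet, and the code is written to run under both Python 2 and Python 3.

```python
dna_dictionary = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(my_dna):
    """Return the complementary strand of a DNA string, base by base."""
    complement = ""
    for my_base in my_dna:
        # Add the partner of each base to the end of the new strand
        complement = complement + dna_dictionary[my_base]
    return complement

my_dna = "CGTATACAAGTATCTGCTCAATTAGTCGACT"
print(my_dna)
print(complement_strand(my_dna))
```

A quick way to check the dictionary is to take the complement twice: the result should be the sequence you started with.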
Program 3. DNA to RNA: building a new dictionary.
Use program 2, which you have already saved and run, as a template to write the
program below. You will need to write the DNA to RNA dictionary, and modify the
end of the program to give the appropriate output.
my_dna = raw_input("Enter a string of DNA bases, A,G,C or T: ")
number_of_bases = len(my_dna)
print "Number of bases:"
print number_of_bases
list_of_positions = range(number_of_bases)
dna_dictionary = {"A":"T", "T":"A", "G":"C", "C":"G"}
# RNA dictionary: fill in the missing code here
rna_dictionary =
for base in list_of_positions:
    my_base = my_dna[base]
    rna_base = rna_dictionary[my_base]
    # This allows you to see the DNA and RNA strands side by side
    print my_base + "-" + rna_base
Practice. Use the DNA sequence files from the "DNA and RNA sequences" folder on
your Raspberry Pi. Do you get the same results?
DNA sequence 1
DNA: CGTATACAAGTATCTGCTCAATTAGTCGACT
RNA: GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA
DNA sequence 2
DNA: TATCTACACGGCGTAAACATTTATTG
RNA: AUAGAUGUGCCGCAUUUGUAAAUAAC
DNA sequence 3
DNA: TTACGGGTATACAACGCGACTGTATAATAACATAACT
RNA: AAUGCCCAUAUGUUGCGCUGACAUAUUAUUGUAUUGA
DNA sequence 4
DNA: TAATACCCCCTTTTGCTCTGATAGACAATCGGT
RNA: AUUAUGGGGGAAAACGAGACUAUCUGUUAGCCA
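Once you have tried Program 3 yourself, a small check like the one below can compare your program's result with the expected RNA for the practice sequences. It is a sketch of one possible solution, not the only one: the rna_dictionary shown is inferred from the DNA and RNA pairs listed in the practice sequences, and the names dna_to_rna and expected_pairs are assumptions. The code is written to run under both Python 2 and Python 3.

```python
# One possible DNA to RNA dictionary, inferred from the practice sequences
rna_dictionary = {"A": "U", "T": "A", "G": "C", "C": "G"}

def dna_to_rna(my_dna):
    """Return the RNA string obtained by looking up each DNA base."""
    rna = ""
    for my_base in my_dna:
        rna = rna + rna_dictionary[my_base]
    return rna

# The first two practice sequences: (DNA, expected RNA)
expected_pairs = [
    ("CGTATACAAGTATCTGCTCAATTAGTCGACT", "GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"),
    ("TATCTACACGGCGTAAACATTTATTG", "AUAGAUGUGCCGCAUUUGUAAAUAAC"),
]

for dna, rna in expected_pairs:
    if dna_to_rna(dna) == rna:
        print("Match: " + dna)
    else:
        print("Mismatch: " + dna)
```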
Program 4. How to read the alphabet one, two, or three letters at a time.
This is very similar to reading the DNA code one, two or three bases at a
time.
Comments are in red and are intended to help you keep track of what the
program below does.
# Input string of letters in the alphabet
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# How many letters? Should be 26, how would you check?
length_of_alphabet = len(alphabet)
# Generate a list of numbers
# One for each letter, starting from zero [0,1,2,.....25]
list_of_positions = range(length_of_alphabet)
# Step through (for loop) one at a time and print one letter
print "One letter at a time"
for letter in list_of_positions:
    print alphabet[letter]
# Step through (for loop) one at a time and print two letters
print "Two letters at a time"
for letter in list_of_positions:
    print alphabet[letter:letter+2]
# Step through (for loop) one at a time and print three letters
# Fill in the missing code
Why is reading three DNA bases at a time important? Discuss!
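When Program 4 steps through every position, the last slices run off the end of the string, so the final "two letters" printed is just "Z". The sketch below shows two ways of grouping letters: overlapping groups that are always complete, and separate groups that jump ahead so no letter is read twice. The function names overlapping_groups and separate_groups are assumptions made for illustration; the code is written to run under both Python 2 and Python 3.

```python
def overlapping_groups(text, size):
    """Step one letter at a time and collect only complete groups."""
    groups = []
    # Stop early enough that every slice has exactly `size` letters
    for start in range(len(text) - size + 1):
        groups.append(text[start:start + size])
    return groups

def separate_groups(text, size):
    """Jump `size` letters at a time, so no letter is read twice.

    The last group is shorter if the length is not a multiple of `size`.
    """
    groups = []
    for start in range(0, len(text), size):
        groups.append(text[start:start + size])
    return groups

print(overlapping_groups("ABCDEF", 3))
print(separate_groups("ABCDEF", 3))
```

The third argument of range is the step size; range(0, 6, 3) gives the positions 0 and 3.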
Program 5A. Write a similar program to step one RNA base at a time and
print three bases.
Use the comments in red and the previous alphabet example to build the program
below. Use one of the RNA sequence files in the "DNA and RNA sequences" folder on
your Raspberry Pi to test your program. Remember to use the code window and save
your program before you run it.
Fill in the missing code below the comments in red.
# Input string of RNA bases (your RNA sequence)
rna_seq =
# How many RNA bases?
length_of_rna =
# Generate a list of numbers
# one for each letter, starting from zero [0,1,2,.....25]
list_of_positions =
# Step through one at a time and print three letters

Program 5B. Adapt your program to search for the START codon in an RNA
sequence. Use one of the RNA sequences in your folder.
[Diagram: DNA -> RNA -> Protein. Searching for the START codon: AUG]
my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
number_of_bases = len(my_rna)
print "Number of bases:"
print number_of_bases
for base in range(number_of_bases):
    # Compare the three bases starting at this position with the START codon
    if my_rna[base:base+3] == "AUG":
        print "Found the Start Codon!"
    else:
        print "This is not the Start Codon... "
[Figure: the RNA sequence GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA with the START codon marked.]
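The program above reports "found" or "not found" for every position. A variation is to report where the START codon is. The sketch below returns the position of the first AUG, counted from zero, or -1 if there is none. The function name find_start is an assumption for illustration; the code is written to run under both Python 2 and Python 3.

```python
def find_start(my_rna):
    """Return the position of the first AUG, or -1 if there is none."""
    # Stop two bases before the end so every slice has three bases
    for base in range(len(my_rna) - 2):
        if my_rna[base:base + 3] == "AUG":
            return base
    return -1

my_rna = "GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"
print("Start codon at position " + str(find_start(my_rna)))
```

Python strings also have a built-in find method, so my_rna.find("AUG") gives the same answer; the loop is written out here to match the style of the worksheet.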
Program 6. Write a similar program to search for the STOP codon in an
RNA sequence. Use one of the RNA sequences saved on your Raspberry Pi.
[Diagram: DNA -> RNA -> Protein. Searching for STOP codons: UAA, UGA, UAG]
Modify the previous program (Program 5B), save it as a new file and run it:
my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
number_of_bases = len(my_rna)
print "Number of bases:"
print number_of_bases
for base in range(number_of_bases):
    if my_rna[base:base+3] == "AUG":
        print "Found the Start Codon!"
    elif my_rna[base:base+3] == "UAA":
        print "Found a Stop Codon!"
    elif my_rna[base:base+3] == "UGA":
        print "Found a Stop Codon!"
    elif my_rna[base:base+3] == "UAG":
        print "Found a Stop Codon!"
    else:
        print "Not a start or stop codon"
[Figure: the RNA sequence GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA with STOP marked.]
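Program 6 checks every position, so it also reports stop codons that overlap other codons. As a possible next step, the sketch below starts at the first AUG and then jumps three bases at a time, stopping at the first stop codon it meets. The names STOP_CODONS and read_codons are assumptions made for illustration, and this is only one way to do it; the code is written to run under both Python 2 and Python 3.

```python
STOP_CODONS = ["UAA", "UGA", "UAG"]

def read_codons(my_rna):
    """Return the codons from the first AUG up to the first stop codon.

    Codons are read three bases at a time starting at the AUG.
    If there is no AUG the list is empty; if no stop codon is met,
    the list runs to the last complete codon.
    """
    codons = []
    start = my_rna.find("AUG")
    if start == -1:
        return codons
    # Jump three bases at a time, stopping before an incomplete codon
    for base in range(start, len(my_rna) - 2, 3):
        codon = my_rna[base:base + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons

my_rna = "GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"
print(read_codons(my_rna))
```

For this sequence the list starts with AUG and ends with UGA, which shows why the starting point matters: the UAG and UAA found by Program 6 are skipped because they do not line up with the groups of three counted from the START codon.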