12.11.2014 Code Breaking: Reading the Genetic Code with Raspberry Pi, Visit 5
London Research Institute, Cancer Research UK

PLAYING WITH PYTHON: DE-CODING DNA

A brief reminder of concepts we have covered before and will use extensively today.

The range function:

    >>> print range(5)
    [0, 1, 2, 3, 4]
    >>>

The range function and the for loop:

    >>> for x in range(5):
    ...     print "AATT"
    ...
    AATT
    AATT
    AATT
    AATT
    AATT
    >>>

The range function and the for loop, this time also printing the value of x:

    >>> for x in range(5):
    ...     print x, "AATT"
    ...
    0 AATT
    1 AATT
    2 AATT
    3 AATT
    4 AATT
    >>>

The len function counts how many characters there are in a string:

    >>> my_dna = "AATT"
    >>> print len(my_dna)
    4
    >>>

Use the code window to type the programs below. Remember: enter the program, save it, then run it. The output appears in the shell window.

The DNA bases (letters): A = Adenine, T = Thymine, C = Cytosine, G = Guanine
The pairs: A=T, C=G

[Figure: DNA → RNA → Protein. Input: a string of DNA bases such as GTATACAAGT (positions 0 to 9). Output: the full name of each base, e.g. G is Guanine, T is Thymine]

Program 1A. Write a program that reads a string of DNA bases and lists the full name of each base. We can use what is called a dictionary to make the process faster: it lets us look up each base's full name directly.

    my_dna = raw_input("Enter a string of DNA bases, A,G,C or T: ")
    # The len function gives the number of bases
    number_of_bases = len(my_dna)
    print "Number of bases:"
    print number_of_bases
    list_of_positions = range(number_of_bases)
    dna_dictionary = {"A":"Adenine", "G":"Guanine", "C":"Cytosine", "T":"Thymine"}
    for base in list_of_positions:
        my_base = my_dna[base]
        # Print the base letter, then look up its full name in the dictionary
        print my_base
        print dna_dictionary[my_base]

Program 1B. A slight modification of the previous program: write a sentence with each DNA base letter and its full name next to it.
This was the output in the shell window after you ran the previous program:

    >>> Enter a string of DNA bases, A,G,C or T: ATT
    Number of bases:
    3
    A
    Adenine
    T
    Thymine
    T
    Thymine
    >>>

Let's create this output with the new program:

    A is Adenine
    T is Thymine
    T is Thymine

Remember this? Adding strings together to make new "sentences" (concatenation):

    >>> a = "Donald"
    >>> b = " Duck"
    >>> c = a + b + " is coming!"
    >>> print c
    Donald Duck is coming!

Can you work out how to modify the last part of your previous program so that it prints the DNA base at each position and its full name on one line? Feel free to modify your previous program, save it under a new name and run it. Here is one way to do it:

    my_dna = raw_input("Enter a string of DNA bases, A,G,C or T: ")
    number_of_bases = len(my_dna)
    print "Number of bases:"
    print number_of_bases
    list_of_positions = range(number_of_bases)
    dna_dictionary = {"A":"Adenine", "G":"Guanine", "C":"Cytosine", "T":"Thymine"}
    for base in list_of_positions:
        my_base = my_dna[base]
        # Join the letter, " is " and the full name into one sentence
        print my_base + " is " + dna_dictionary[my_base]

Program 2. Remember that DNA consists of two strands. Let's practise writing a program and designing a new dictionary that stores the complementary DNA bases.

The DNA base pairs: A=T, C=G

Remember to give your program a name and save it before running it. You can put any sequence you wish in the my_dna variable. When you run the program, the complementary DNA strand appears in the shell window: on each line, the base from your input sequence is on the left and the complementary base your program has calculated is on the right.
    # This is our input sequence
    my_dna = "CGTATACAAGTATCTGCTCAATTAGTCGACT"
    # Get the length of the input sequence
    number_of_bases = len(my_dna)
    # Generate a list of numbers,
    # one for each base, starting from zero
    list_of_positions = range(number_of_bases)
    # This creates a dictionary of base pairs
    dna_dictionary = {"A":"T", "T":"A", "G":"C", "C":"G"}
    # We step through the sequence one base at a time
    for base in list_of_positions:
        # Store the base in a new variable called my_base
        my_base = my_dna[base]
        # Find the complementary base with the dictionary
        dna_complement = dna_dictionary[my_base]
        # We use the + symbol to join strings together
        print my_base + "-" + dna_complement

Program 3. DNA to RNA: building a new dictionary. Use Program 2, which you have already saved and run, as a template to write the program below. You will need to write the DNA to RNA dictionary and modify the end of the program to give the appropriate output.

    my_dna = raw_input("Enter a string of DNA bases, A,G,C or T: ")
    number_of_bases = len(my_dna)
    print "Number of bases:"
    print number_of_bases
    list_of_positions = range(number_of_bases)
    dna_dictionary = {"A":"T",
                      "T":"A",
                      "G":"C",
                      "C":"G"}
    # RNA dictionary: fill in the missing code here
    rna_dictionary =
    for base in list_of_positions:
        my_base = my_dna[base]
        rna_base = rna_dictionary[my_base]
        # Print each DNA base next to its RNA base, so you can see the two strands side by side
        print my_base + "-" + rna_base

Practice. Use the DNA sequence files from the “DNA and RNA sequences” folder on your Raspberry Pi. Do you get the same results?
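If you would like to check your Program 3 results against the practice sequences without comparing them letter by letter, here is a minimal sketch that builds the whole RNA strand as one string and compares it with the listed answer. It assumes the DNA-to-RNA pairing shown by the practice sequences (A pairs with U, T with A, G with C, C with G), so try filling in rna_dictionary yourself first. The names dna_to_rna and transcribe are hypothetical, chosen only for this example, and print(...) is used with a single argument so that the sketch runs on both Python 2 and Python 3.

```python
# A minimal sketch, assuming the DNA-to-RNA pairing shown by the practice
# sequences: A pairs with U, T with A, G with C, C with G.
# dna_to_rna and transcribe are hypothetical names used only for this example.
# print(...) with one argument works in both Python 2 and Python 3.

dna_to_rna = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(dna):
    """Return the RNA strand that pairs with the DNA string dna."""
    rna = ""
    for base in dna:
        # Look up the pairing RNA base and add it to the end of the strand
        rna = rna + dna_to_rna[base]
    return rna


# Practice DNA sequence 1 and the RNA sequence listed for it
my_dna = "CGTATACAAGTATCTGCTCAATTAGTCGACT"
expected_rna = "GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"

my_rna = transcribe(my_dna)
print(my_rna)
print(my_rna == expected_rna)  # prints True
```

Looping over the string itself (for base in dna) hands you the bases one by one, so range and len are not needed here. The worksheet's range(number_of_bases) approach gives the same result and is the better choice when you also need each base's position.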
DNA sequence 1
DNA: CGTATACAAGTATCTGCTCAATTAGTCGACT
RNA: GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA

DNA sequence 2
DNA: TATCTACACGGCGTAAACATTTATTG
RNA: AUAGAUGUGCCGCAUUUGUAAAUAAC

DNA sequence 3
DNA: TTACGGGTATACAACGCGACTGTATAATAACATAACT
RNA: AAUGCCCAUAUGUUGCGCUGACAUAUUAUUGUAUUGA

DNA sequence 4
DNA: TAATACCCCCTTTTGCTCTGATAGACAATCGGT
RNA: AUUAUGGGGGAAAACGAGACUAUCUGUUAGCCA

Program 4. How to read the alphabet one, two or three letters at a time. This is very similar to reading the DNA code one, two or three bases at a time. The comments (the lines starting with #) are there to help you keep track of what the program below does.

    # Input string of letters in the alphabet
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    # How many letters? Should be 26, how would you check?
    length_of_alphabet = len(alphabet)
    # Generate a list of numbers,
    # one for each letter, starting from zero [0, 1, 2, ..., 25]
    list_of_positions = range(length_of_alphabet)
    # Step through (for loop) one at a time and print one letter
    print "One letter at a time"
    for letter in list_of_positions:
        print alphabet[letter]
    # Step through (for loop) one at a time and print two letters
    print "Two letters at a time"
    for letter in list_of_positions:
        print alphabet[letter:letter+2]
    # Step through (for loop) one at a time and print three letters
    # (fill in the missing code)

Why is reading three DNA bases at a time important? Discuss!

Program 5A. Write a similar program that steps through an RNA sequence one base at a time and prints three bases at a time. Use the comments below and the previous alphabet example to build the program. Use one of the RNA sequence files in the “DNA and RNA sequences” folder on your Raspberry Pi to test your program.
Remember to use the code window and save your program before you run it. Fill in the missing code below each comment:

    # Input string of RNA bases (your RNA sequence)
    rna_seq =
    # How many RNA bases?
    length_of_rna =
    # Generate a list of numbers,
    # one for each base, starting from zero [0, 1, 2, ...]
    list_of_positions =
    # Step through one at a time and print three letters

Program 5B. Adapt your program to search for the START codon in an RNA sequence. Use one of the RNA sequences in your folder.

Searching for the START codon: AUG

    my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
    number_of_bases = len(my_rna)
    print "Number of bases:"
    print number_of_bases
    for base in range(number_of_bases):
        # Look at the three bases starting at this position
        if my_rna[base:base+3] == "AUG":
            print "Found the Start Codon!"
        else:
            print "This is not the Start Codon..."

[Figure: RNA sequence GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA with the START codon marked]

Program 6. Write a similar program to search for the STOP codons in an RNA sequence. Use one of the RNA sequences saved on your Raspberry Pi.

Searching for STOP codons: UAA, UGA, UAG

Modify your Program 5B, save it as a new file and run it:

    my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
    number_of_bases = len(my_rna)
    print "Number of bases:"
    print number_of_bases
    for base in range(number_of_bases):
        # Look at the three bases starting at this position
        if my_rna[base:base+3] == "AUG":
            print "Found the Start Codon!"
        elif my_rna[base:base+3] == "UAA":
            print "Found a Stop Codon!"
        elif my_rna[base:base+3] == "UGA":
            print "Found a Stop Codon!"
        elif my_rna[base:base+3] == "UAG":
            print "Found a Stop Codon!"
        else:
            print "Not a start or stop codon"

[Figure: RNA sequence GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA with a STOP codon marked]
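Programs 5B and 6 print a message at every position. If you would rather know where the codons are, the sketch below records the position of each start and stop codon instead. find_codons is a hypothetical helper written for this example, not part of the worksheet. Like the worksheet's programs it moves one base at a time, so it finds codons in all three reading frames. It uses print(...) with a single argument so that it runs on both Python 2 and Python 3.

```python
# A minimal sketch (runs on Python 2 and 3): record where the start and
# stop codons are, instead of printing a message at every position.
# find_codons is a hypothetical helper name, not part of the worksheet.

START_CODON = "AUG"
STOP_CODONS = ["UAA", "UGA", "UAG"]


def find_codons(rna):
    """Return two lists: positions of start codons, positions of stop codons."""
    start_positions = []
    stop_positions = []
    # Stop two bases early so every slice is a full three-letter codon
    for position in range(len(rna) - 2):
        codon = rna[position:position + 3]
        if codon == START_CODON:
            start_positions.append(position)
        elif codon in STOP_CODONS:
            stop_positions.append(position)
    return start_positions, stop_positions


# The RNA sequence from DNA sequence 1
my_rna = "GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"
starts, stops = find_codons(my_rna)
print("Start codon at positions: " + str(starts))  # prints [4]
print("Stop codons at positions: " + str(stops))   # prints [11, 20, 28]
```

A question to try with these positions: since the code is read three bases at a time from the start codon, a stop codon is in the same reading frame as the start codon only if the difference between their positions is a multiple of 3. Here 28 - 4 = 24, so the UGA at position 28 is in frame with the AUG at position 4, while the stop codons at 11 and 20 are not.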