Advanced Python Data Structures
BCHB524 2014
Lecture 7
9/17/2014 BCHB524 - 2014 - Edwards

Outline
- Revision of list data-structures
- Advanced data-structures: dictionaries, sets, files
- Reading and parsing files (codon tables)
- Exercises

Data-structures: Lists
- Compound data-structure: many objects in order, numbered from 0
- [] indicates a list.
- Item access and iteration: same as for strings
  - "l[i]" for item i
  - "for item in l" for each item of the list
- List modification: items can be changed, added, or deleted.
- range() returns a list (in Python 2).
- String ↔ List conversion (e.g. split() and join())

Python Data-structures: Dictionaries
Compound data-structure that stores any number of key-value pairs. Keys must be distinct and immutable (e.g. strings, numbers, tuples); values can be anything.
- Keys and/or values can be different types
- Can be empty
- Values can be accessed by key
- Keys, values, or pairs can be accessed by iteration
- Values can be changed
- Key, value pairs can be added
- Key, value pairs can be deleted

Dictionaries: Syntax and item access

    # Simple dictionary
    d = {'a': 1, 'b': 2, 'acdef': 3}
    print d
    # Access value using its key
    print d['a']
    # Change value associated with a key
    d['acdef'] = 5
    print d
    # Add value by assigning to a dictionary key
    d['newkey'] = 10
    print d

Dictionaries: Iteration

    # Initialize
    d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}
    # keys from d (the order is arbitrary)
    print d.keys()
    # values from d
    print d.values()
    # key-value pairs from d
    print d.items()
    # Iterate through the keys of d
    for k in d.keys():
        print k,
    print
    # Iterate through the key-value pairs of d
    for k,v in d.items():
        print k,"=",v,
    print

Dictionaries: Different from lists?

    # Initialize
    d = {}
    # Add some values, integer keys!
    d[0] = 1
    d[1] = 2
    d[10] = 1000
    # See how the dictionary looks
    print d
    # Test whether a key is in the dictionary
    print "Is key 15 in d?",(15 in d)
    # Access value with key 15 with default -1
    print "Value for key 15, or -1:",d.get(15,-1)
    # Access value with key 15 - error! (raises KeyError)
    print "Value for key 15:",d[15]

Python Data-structures: Sets
Compound data-structure that stores any number of distinct data-items (items must be immutable, like dictionary keys).
- Data-items can be different types
- Can be empty
- Items can be accessed by iteration only (no indexing; sets are unordered).
- Items can be tested for membership.
- Items can be added
- Items can be deleted

Sets: Add and Test Elements

    # Make an empty set
    s = set()
    print s
    # Add an element, and then a list of elements
    s.add('a')
    s.update(['b','c','d'])
    print s
    # Test for membership
    print "e is in s",('e' in s)
    print "e is not in s",('e' not in s)
    print "c is in s",('c' in s)

Python Data-structures: Files
- Read strings from a file, or write strings to a file.
- Get access to strings by iteration.
- Write by printing strings to the file.
- Need to open and close files.
- Need to indicate whether we want to read or write: open(filename) reads by default, open(filename, 'w') opens for writing.

Files: Reading

    # Open a file, store "handle" in f
    f = open('anthrax_sasp.nuc')
    # MAGIC!
    print ''.join(f.read().split())
    # Close the file.
    f.close()

    # Slowly, now...
    f = open('anthrax_sasp.nuc')
    # Store the entire file's contents in s (as string)
    s = f.read()
    print s
    # Split s at whitespace
    sl = s.split()
    print sl
    # Join split s with nothing in between
    jl = ''.join(sl)
    print jl
    # Close the file
    f.close()

Files: Reading

    # Open a file
    f = open('anthrax_sasp.nuc')
    # Iterate line-by-line (each line keeps its newline, so print adds blank lines)
    for line in f:
        print line
    # Close the file
    f.close()

    # Open a file
    f = open('anthrax_sasp.nuc')
    # Iterate line-by-line, and accumulate the sequence
    seq = ""
    for line in f:
        seq += line.strip()
    print "The sequence is",seq
    # Close the file
    f.close()

DNA Translation
First read a codon table from a file. The codon table (standard.code) comes from NCBI's on-line taxonomy resource:

    AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
    Starts = ---M---------------M---------------M----------------------------
    Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
    Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
    Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

Read the file line by line, using the initial word of each line (AAs, Starts, Base1, ...) as the key for the third word. Column i of the five strings describes one codon: its three bases, its amino acid, and whether it can be a start codon (M).

DNA Translation

    f = open('standard.code')
    # Map each line's first word (AAs, Starts, Base1, ...) to its third word
    data = {}
    for l in f:
        sl = l.split()
        key = sl[0]
        value = sl[2]
        data[key] = value
    f.close()

    b1 = data['Base1']
    b2 = data['Base2']
    b3 = data['Base3']
    aa = data['AAs']
    st = data['Starts']

    # codons: codon -> amino acid; init: codon -> True if a start codon
    codons = {}
    init = {}
    n = len(aa)
    for i in range(n):
        codon = b1[i] + b2[i] + b3[i]
        codons[codon] = aa[i]
        init[codon] = (st[i] == 'M')

DNA Translation

    # Read the sequence, dropping all whitespace
    f = open('anthrax_sasp.nuc')
    seq = ''.join(f.read().split())
    f.close()

    # Look up each successive codon (a trailing partial codon would raise KeyError)
    seqlen = len(seq)
    aaseq = []
    for i in range(0,seqlen,3):
        codon = seq[i:i+3]
        aa = codons[codon]
        aaseq.append(aa)
    print ''.join(aaseq)

Exercise 1
Using just the concepts introduced so far, find as many ways as possible to code the DNA reverse complement (at least 3!).
- You may use any built-in function or string or list method.
- You may use only basic data-types and lists and dictionaries.
- Compare and critique each technique for robustness, speed, and correctness.
- Prize for the most original solution
- Prize for the most (different) solutions

Exercise 2
Write a program that takes a codon table file (such as standard.code from the lecture) and a file containing a nucleotide sequence (anthrax_sasp.nuc) as command-line arguments, and outputs the amino-acid sequence.
- Modify your program to indicate whether or not the initial codon is consistent with the codon table's start codons.
- Use NCBI's taxonomy resource to look up and download the correct codon table for the anthrax bacterium.
- Re-run your program using the correct codon table.
- Is the initial codon of the anthrax SASP gene a valid translation start site?

Homework 4
- Due Monday, September 22.
- Submit using Blackboard.
- Use only the techniques introduced so far.
- Make sure you can run the programs demonstrated in lecture(s).
- Exercises 1, 2 from Lecture 7
- Rosalind exercises 8, 9
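The codon-table parsing and translation steps from the DNA Translation slides can be packaged as two reusable functions. This is an illustrative sketch, not the lecture's own code. The names read_codon_table, translate and STANDARD are hypothetical. The table is rebuilt in code from the five strings on the codon-table slide, so the example runs without standard.code. Unlike the slide code, translate skips a trailing partial codon instead of raising KeyError. It uses print() with a single argument, so it runs under Python 2 or 3.

```python
# Sketch: the lecture's codon-table and translation steps as functions.
# read_codon_table, translate and STANDARD are hypothetical names.

# The standard codon table from the DNA Translation slide, rebuilt in code:
# column i of these five 64-character strings describes one codon.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M" + "-" * 15 + "M" + "-" * 15 + "M" + "-" * 28
BASE1 = "T" * 16 + "C" * 16 + "A" * 16 + "G" * 16
BASE2 = ("T" * 4 + "C" * 4 + "A" * 4 + "G" * 4) * 4
BASE3 = "TCAG" * 16

# Same "NAME = VALUE" line layout as the codon table file
STANDARD = [
    "AAs    = " + AAS,
    "Starts = " + STARTS,
    "Base1  = " + BASE1,
    "Base2  = " + BASE2,
    "Base3  = " + BASE3,
]


def read_codon_table(lines):
    """Return (codons, init): codon -> amino acid, codon -> is-start flag."""
    data = {}
    for line in lines:
        fields = line.split()
        # Skip blank or short lines (a safeguard not present in the slides).
        if len(fields) < 3:
            continue
        # First word is the key (AAs, Starts, ...), third word the value.
        data[fields[0]] = fields[2]
    b1, b2, b3 = data['Base1'], data['Base2'], data['Base3']
    aa, st = data['AAs'], data['Starts']
    codons = {}
    init = {}
    for i in range(len(aa)):
        codon = b1[i] + b2[i] + b3[i]
        codons[codon] = aa[i]
        init[codon] = (st[i] == 'M')
    return codons, init


def translate(seq, codons):
    """Translate DNA codon by codon, ignoring a trailing partial codon."""
    aaseq = []
    # Stop before len(seq) - 2 so every slice seq[i:i+3] is a full codon.
    for i in range(0, len(seq) - 2, 3):
        aaseq.append(codons[seq[i:i + 3]])
    return ''.join(aaseq)


codons, init = read_codon_table(STANDARD)
print(translate("ATGGCTTAA", codons))  # prints MA*
```

Iterating over an open file yields its lines, so read_codon_table(open('standard.code')) should also work on the lecture's file. This assumes the file uses the same NAME = VALUE layout as the codon-table slide.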