PYTHON DICTIONARIES
CHAPTER 11 FROM THINK PYTHON: HOW TO THINK LIKE A COMPUTER SCIENTIST

WHAT IS A REAL DICTIONARY?
It's a book or file that contains the definitions of words. In particular, there is a pairing between a word and its definition. You look up the word and get the definition. Can you look up the definition and get the word?

    de·moc·ra·cy : a system of government by the whole population or all the eligible members of a state, typically through elected representatives.

The pairing is word : definition.

In Python we can define a variable to contain a dictionary.

    >>> d1 = {1:'one', 2:'two', 3:'three', 4:'four'}
    >>> print d1[2]
    two

d1[2] asks: what is the value that goes with key 2?

KEY:VALUE PAIRING
Let's look at an English to German translation table.

    >>> eng2Ger = {'one':'eins', 'two':'zwei', 'three':'drei', 'four':'vier'}
    >>> print eng2Ger['three']
    drei

The keys can be almost anything you want, as long as they are immutable values such as numbers or strings. How about

    >>> decToBinary = {0:0, 1:1, 2:10, 3:11, 4:100, 5:101, 6:110}
    >>> print decToBinary[3]
    11

NOTE: Dictionaries are mutable!

THE VALUES() METHOD
To see whether something appears as a value in a dictionary, you can use the method values(), which returns the values as a list, and then use the in operator:

    >>> vals = eng2Ger.values()
    >>> print vals
    ['eins', 'zwei', 'drei', 'vier']

so you can do things like

    if 'sieben' in vals:
        do_something()    # placeholder for your own code

The len function returns the number of key:value pairs in the dictionary:

    >>> print len(eng2Ger)
    4

NOTE: if you use a key that is not there, you get an error! For example, eng2Ger['sieben'] throws an exception (a KeyError).

DICTIONARY ACCESS IS VERY FAST!
Recall that we can use the in operator with lists, sets, and dictionaries. For a dictionary, in checks the keys. If we have a dictionary that contains 1000000 key:value pairs and a list that has 1000000 elements, then

    key in dictionary

is much faster than

    element in list

This is because dictionaries are implemented in a special way under the hood (as a hash table), so a lookup does not have to search through the items one by one.
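The points above about in, values(), and missing keys can be tried out in one short session. This is a minimal sketch that assumes Python 3, where print is a function and values() returns a view rather than a list; the slides themselves use Python 2 syntax. The variable names other than eng2Ger are made up for illustration.

```python
# Python 3 sketch; the slides use Python 2 print statements.
eng2Ger = {'one': 'eins', 'two': 'zwei', 'three': 'drei', 'four': 'vier'}

# `in` applied to a dictionary tests the keys, not the values.
has_key = 'two' in eng2Ger             # True
value_used_as_key = 'zwei' in eng2Ger  # False: 'zwei' is a value, not a key

# To test the values, go through values() explicitly.
has_value = 'zwei' in eng2Ger.values()  # True

# Indexing with a key that is not there raises KeyError ...
try:
    eng2Ger['seven']
    missing_raised = False
except KeyError:
    missing_raised = True

# ... while get() returns a default value instead of raising.
fallback = eng2Ger.get('seven', 'unknown')

print(has_key, value_used_as_key, has_value, missing_raised, fallback)
```

Using get() with a default is one common way to avoid the exception when a missing key is an expected case rather than a mistake.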
See Exercise 10.11.

DICTIONARY AS A SET OF COUNTERS
Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it:

1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.
2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function ord), use the number as an index into the list, and increment the appropriate counter.
3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that, you would increment the value of the existing item.

LET'S USE DICTIONARIES

    def histogram(s):
        d = dict()
        for c in s:
            if c not in d:
                d[c] = 1     # c not found: add the key c and set its value to 1
            else:
                d[c] += 1    # c is already there: just increment the value
        return d

    >>> h = histogram('brontosaurus')
    >>> print h
    {'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}

The histogram indicates that the letters 'a' and 'b' appear once, 'o' appears twice, and so on.

How about doing this for an entire book, or for a DNA string?

LOOPING OVER A DICTIONARY

    def print_hist(h):
        for c in h:
            print c, h[c]

Here's what the output looks like:

    >>> h = histogram('parrot')
    >>> print_hist(h)
    a 1
    p 1
    r 2
    t 1
    o 1

You can format this any way you choose.

Dictionaries have a method called keys that returns the keys of the dictionary, in no particular order, as a list.

How would you print the histogram so that the keys are in alphabetical order? Remember that you can sort a list. Let's do it in class!
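For comparison with the dictionary version, option 2 from the list of counting strategies (a 26-element list indexed through ord) can be sketched as follows. This is only an illustration: it assumes Python 3 and lowercase ASCII letters, and letter_counts is a made-up name that does not appear in the chapter.

```python
def letter_counts(s):
    """Count lowercase ASCII letters using a 26-element list (option 2)."""
    counts = [0] * 26
    for c in s:
        # Assumption: anything outside 'a'..'z' is simply skipped.
        if 'a' <= c <= 'z':
            counts[ord(c) - ord('a')] += 1
    return counts

counts = letter_counts('brontosaurus')

# Turn the list back into the same shape the dictionary histogram produces,
# keeping only the letters that actually occurred.
as_dict = {chr(i + ord('a')): n for i, n in enumerate(counts) if n > 0}

print(counts[ord('o') - ord('a')])  # 2
print(as_dict)
```

The list version always stores 26 counters and only works for the alphabet chosen in advance, while the dictionary version makes room only for the characters that appear.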
ANSWER

    def histogram(s):
        d = dict()
        for c in s:
            if c not in d:
                d[c] = 1
            else:
                d[c] += 1
        return d

    def print_hist(h):
        keylist = h.keys()    # the keys as a list, in no particular order
        keylist.sort()        # sort the list alphabetically
        for c in keylist:
            print c, h[c]

    h = histogram('bothriolepus')
    print_hist(h)

REVERSE LOOKUP
Given a dictionary d and a key k, it is easy to find the corresponding value v = d[k]. This operation is called a lookup.

But what if you have v and you want to find k? You have two problems. First, there might be more than one key that maps to the value v. Depending on the application, you might be able to pick one, or you might have to make a list that contains all of them. Second, there is no simple syntax to do a reverse lookup; you have to search.

SEARCH THE DICT

    def reverse_lookup(d, v):
        for k in d:
            if d[k] == v:
                return k
        raise ValueError    # no k found such that k:v exists

This function is yet another example of the search pattern, but it uses a feature we haven't seen before, raise. The raise statement causes an exception; in this case it causes a ValueError, which generally indicates that there is something wrong with the value of a parameter.

Note: a reverse lookup is much slower than a forward lookup, because it has to search through the keys one by one.

RETURN A LIST OF MATCHING CASES

    # Returns a list of the keys that give v.
    # If no key gives v, return the empty list [].
    def reverse_lookup(d, v):
        r = []    # must be a list; a tuple () has no append method
        for k in d:
            if d[k] == v:
                r.append(k)
        return r

RNA AMINO ACID TRANSLATION TABLE

    DNA_codon = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'
    }

    # A tricky translation for those of you who love this stuff.
    def translate(sequence):
        """Return the translated protein from 'sequence' assuming +1 reading frame"""
        # Codons that are not in the table are translated as 'X'.
        return ''.join([DNA_codon.get(sequence[3*i:3*i+3], 'X')
                        for i in range(len(sequence)//3)])

ANOTHER WAY (MORE UNDERSTANDABLE)

    def translate(sequence):
        s = ''                           # initialize to empty string
        numcodons = len(sequence)//3
        pos = 0
        for i in range(numcodons):
            s = s + DNA_codon[sequence[pos:pos+3]]
            pos += 3                     # goes to every third char
        return s

    sequence = 'ACTGTAAGCCGTACA'         # pos marks the start of the current codon
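To trace the loop version on the example sequence, here is a minimal Python 3 sketch. To stay short it uses only the five table entries that this sequence needs; CODON_EXCERPT and translate_codons are illustrative names, not part of the slides, and the 'X' fallback is borrowed from the one-line version.

```python
# Excerpt of the codon table above: only the five codons this example needs.
# CODON_EXCERPT and translate_codons are illustrative names, not from the slides.
CODON_EXCERPT = {'ACT': 'T', 'GTA': 'V', 'AGC': 'S', 'CGT': 'R', 'ACA': 'T'}

def translate_codons(sequence, table=CODON_EXCERPT):
    """Translate sequence three characters at a time (+1 reading frame)."""
    protein = ''
    # Visit the start position of every complete codon: 0, 3, 6, ...
    # A trailing fragment of one or two characters is ignored.
    for pos in range(0, len(sequence) - len(sequence) % 3, 3):
        codon = sequence[pos:pos + 3]
        # get() falls back to 'X' for codons missing from the table.
        protein += table.get(codon, 'X')
    return protein

result = translate_codons('ACTGTAAGCCGTACA')
print(result)  # TVSRT
```

The codons are read as ACT, GTA, AGC, CGT, ACA. Indexing the table directly, as the loop version in the slides does, would raise a KeyError for an unknown codon instead of producing 'X'; which behavior is preferable depends on whether bad input should be flagged or tolerated.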