Slides - Edwards Lab

Advanced Python
Data Structures
BCHB524
2014
Lecture 7
9/17/2014
BCHB524 - 2014 - Edwards
Outline

Revision of list data-structures

Advanced Data-structures

Dictionaries, Sets, Files

Reading, parsing files (codon tables)

Exercises
Data-structures: Lists

Compound data-structure:
 Many objects in order numbered from 0
 [] indicates list.

Item access and iteration
Same as for string, "l[i]" for item i
 "for item in l" for each item of the list.
List modification
 items can be changed, added, or deleted.
Range is a list
String ↔ List
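
The list operations summarized above can be sketched as follows. This is a minimal illustration, not code from the lecture: the list contents are made up, and print is called with a single parenthesized argument so the sketch runs under both Python 2 (as used in these slides) and Python 3.

```python
# A list holds objects in order, numbered from 0
l = ['A', 'C', 'G', 'T']
first = l[0]              # item access, same syntax as for strings

# Iterate over each item of the list
seen = []
for item in l:
    seen.append(item)

# Lists can be modified: change, add, delete
l[1] = 'U'                # change an item
l.append('N')             # add an item at the end
del l[0]                  # delete the first item

# String <-> List conversions
chars = list("ACGT")      # string -> list of characters
joined = ''.join(chars)   # list of strings -> string

print(l)
print(chars)
print(joined)
```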




Python Data-structures: Dictionaries

Compound data-structure, stores any number
of arbitrary key-value pairs.







Keys and/or values can be of different types
Can be empty
Values can be accessed by key
Keys, values, or pairs can be accessed by
iteration
Values can be changed
Key, value pairs can be added
Key, value pairs can be deleted
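
A short sketch of the properties listed above that the later slides do not demonstrate: empty dictionaries, mixed key and value types, and deletion. The variable names and contents are illustrative only; the code is written to run under both Python 2 and Python 3.

```python
# An empty dictionary
empty = {}

# Keys and values can be of different types
mixed = {1: 'one', 'two': 2}

# Delete a key-value pair with del
d = {'a': 1, 'b': 2, 'acdef': 3}
del d['b']

# pop deletes the pair and hands back its value
value = d.pop('acdef')

print(len(empty))
print(mixed[1])
print(d)
print(value)
```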
Dictionaries: Syntax and item access
# Simple dictionary
d = {'a': 1, 'b': 2, 'acdef': 3}
print d
# Access value using its key
print d['a']
# Change value associated with a key
d['acdef'] = 5
print d
# Add value by assigning to a dictionary key
d['newkey'] = 10
print d
Dictionaries: Iteration
# Initialize
d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}
# keys from d
print d.keys()
# values from d
print d.values()
# key-value pairs from d
print d.items()
# Iterate through the keys of d
for k in d.keys():
    print k,
print
# Iterate through the key-value pairs of d
for k,v in d.items():
    print k,"=",v,
print
Dictionaries: Different from lists?
# Initialize
d = {}
# Add some values, integer keys!
d[0] = 1
d[1] = 2
d[10] = 1000
# See how the dictionary looks
print d
# Test whether a key is in the dictionary
print "Is key 15 in d?",d.has_key(15)
# Access value with key 15 with default -1
print "Value for key 15, or -1:",d.get(15,-1)
# Access value with key 15 - error!
print "Value for key 15:",d[15]
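
The membership test can also be written with the "in" operator, which works in Python 2 and is the only form available in Python 3, where has_key was removed. The sketch below uses it to guard a lookup so that the missing key does not raise an error; the guard itself is an illustration, not part of the lecture code.

```python
d = {}
d[0] = 1
d[1] = 2
d[10] = 1000

# Test whether a key is in the dictionary
has15 = 15 in d
has10 = 10 in d

# Access with a default, no error for a missing key
v15 = d.get(15, -1)

# Guard a direct lookup with a membership test
if 15 in d:
    result = d[15]
else:
    result = None

print(has15)
print(has10)
print(v15)
print(result)
```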
Python Data-structures: Sets

Compound data-structure, stores any number
of arbitrary distinct data-items.






Data-items can be different types
Can be empty
Items can be accessed by iteration only.
Items can be tested for membership.
Items can be added
Items can be deleted
Sets: Add and Test Elements
# Make an empty set
s = set()
print s
# Add an element, and then a list of elements
s.add('a')
s.update(['b','c','d'])
print s
# Test for membership
print "e is in s",('e' in s)
print "e is not in s",('e' not in s)
print "c is in s",('c' in s)
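
Deletion, distinctness, and iteration are listed as set properties but not shown in the slide code. A minimal sketch, with made-up contents, written to run under both Python 2 and Python 3:

```python
s = set(['b', 'c', 'd'])

# Items are distinct: adding an existing item changes nothing
s.add('c')
n = len(s)

# Delete an item; remove raises an error if the item is absent
s.remove('b')

# discard deletes if present and is silent otherwise
s.discard('z')

# Access items by iteration; a set has no defined order,
# so sort the items when a stable order is needed
items = []
for item in sorted(s):
    items.append(item)

print(n)
print(items)
```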
Python Data-structures: Files





Read strings from file, or
Write strings to file.
Get access to strings by iteration.
Write by printing strings to file.
Need to open and close files:

Need to indicate whether we want to read or write.
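
The slides that follow only read files, so here is a hedged sketch of the write side. It uses the file method write rather than a print statement, because write behaves the same in Python 2 and Python 3. The file name example.nuc and its contents are hypothetical, and the file is created in a temporary directory so that nothing existing is overwritten.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.nuc')

# Open for writing: the second argument 'w' indicates write mode
f = open(path, 'w')
f.write('ATGGCT\n')
f.write('AACTAA\n')
f.close()

# Open for reading: read mode is the default
f = open(path)
seq = ''
for line in f:
    seq += line.strip()
f.close()

print(seq)
```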
Files: Reading
# Open a file, store "handle" in f
f = open('anthrax_sasp.nuc')
# MAGIC!
print ''.join(f.read().split())
# Close the file.
f.close()
# Slowly, now...
f = open('anthrax_sasp.nuc')
# Store the entire file's contents in s (as string)
s = f.read()
print s
# Split s at whitespace
sl = s.split()
print sl
# Join split s with nothing in between
jl = ''.join(sl)
print jl
# Close the file
f.close()
Files: Reading
# Open a file
f = open('anthrax_sasp.nuc')
# Iterate line-by-line
for line in f:
    print line
# Close the file
f.close()
# Open a file
f = open('anthrax_sasp.nuc')
# Iterate line-by-line, and accumulate the sequence
seq = ""
for line in f:
    seq += line.strip()
print "The sequence is",seq
# Close the file
f.close()
DNA Translation

First read a codon table from a file
 Codon table from NCBI's on-line taxonomy resource
 Read line by line and use initial word to store 3rd word appropriately.

AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
DNA Translation
f = open('standard.code')
data = {}
for l in f:
    sl = l.split()
    key = sl[0]
    value = sl[2]
    data[key] = value
f.close()
b1 = data['Base1']
b2 = data['Base2']
b3 = data['Base3']
aa = data['AAs']
st = data['Starts']
codons = {}
init = {}
n = len(aa)
for i in range(n):
    codon = b1[i] + b2[i] + b3[i]
    codons[codon] = aa[i]
    init[codon] = (st[i] == 'M')
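
To try the parsing steps without a standard.code file on disk, the same logic can be run over the five table lines held in a list. This is a sketch under that assumption: the list named lines stands in for the open file, and the 64-column rows are the standard-code table from the earlier slide. It runs under both Python 2 and Python 3.

```python
# Stand-in for the lines of the codon table file
lines = [
    "AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    "Starts = ---M---------------M---------------M----------------------------",
    "Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG",
    "Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG",
    "Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG",
]

# Initial word is the key, third word is the value
data = {}
for l in lines:
    sl = l.split()
    data[sl[0]] = sl[2]

b1 = data['Base1']
b2 = data['Base2']
b3 = data['Base3']
aa = data['AAs']
st = data['Starts']

# Column i of the table describes one codon
codons = {}
init = {}
for i in range(len(aa)):
    codon = b1[i] + b2[i] + b3[i]
    codons[codon] = aa[i]
    init[codon] = (st[i] == 'M')

print(len(codons))
print(codons['ATG'])
print(init['ATG'])
print(codons['TAA'])
```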
DNA Translation
f = open('anthrax_sasp.nuc')
seq = ''.join(f.read().split())
f.close()
seqlen = len(seq)
aaseq = []
for i in range(0,seqlen,3):
    codon = seq[i:i+3]
    aa = codons[codon]
    aaseq.append(aa)
print ''.join(aaseq)
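
If the sequence length is not a multiple of 3, the last slice in the loop above is shorter than a codon and the dictionary lookup fails. One possible guard, sketched here, is to translate only whole codons. The 13-base sequence is made up for illustration, and the codon dictionary is rebuilt with nested loops over "TCAG", a shortcut that relies on the column order of the standard table rather than on reading the file. The sketch runs under both Python 2 and Python 3.

```python
# Rebuild the standard codon table; the nested loop order
# matches the Base1/Base2/Base3 column order of the table
bases = "TCAG"
aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codons = {}
i = 0
for x in bases:
    for y in bases:
        for z in bases:
            codons[x + y + z] = aas[i]
            i += 1

# Hypothetical sequence with one trailing base
seq = "ATGGCTAACTAAG"

# Round the length down to a whole number of codons
seqlen = len(seq) - len(seq) % 3

aaseq = []
for i in range(0, seqlen, 3):
    codon = seq[i:i+3]
    aaseq.append(codons[codon])
protein = ''.join(aaseq)

print(protein)
```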
Exercise 1

Using just the concepts introduced so far, find
as many ways as possible to code DNA
reverse complement (at least 3!)





You may use any built-in function or string or list
method.
You may use only basic data-types and lists and
dictionaries.
Compare and critique each technique for
robustness, speed, and correctness.
Prize for the most original solution
Prize for the most (different) solutions
Exercise 2

Write a program that takes a codon table file (such
as standard.code from the lecture) and a file
containing nucleotide sequence (anthrax_sasp.nuc)
as command-line arguments, and outputs the
amino-acid sequence.


Modify your program to indicate whether or not the initial
codon is consistent with the codon table's start codons.
Use NCBI's taxonomy resource to look up and download
the correct codon table for the anthrax bacterium. Re-run
your program using the correct codon table. Is the initial
codon of the anthrax SASP gene a valid translation start
site?
Homework 4






Due Monday, September 22.
Submit using Blackboard
Use only the techniques introduced so far.
Make sure you can run the programs
demonstrated in lecture(s).
Exercises 1, 2 from Lecture 7
Rosalind exercises 8, 9