HaploReg, RegulomeDB and more on Python programming

Lin Liu
Yang Li
• HaploReg retrieves the ENCODE annotation for the
selected SNP, as well as for other SNPs in linkage disequilibrium (LD) with it
• Using the “Set Options” tab, the user can configure
values such as the LD threshold and the 1000 Genomes
population used to calculate LD
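HaploReg does the LD calculation for you. To make the LD threshold option concrete, here is a minimal sketch of the r² statistic between two biallelic SNPs, computed from phased haplotypes coded as 0/1 alleles. The SNP names, haplotypes and the 0.8 cutoff are hypothetical, and it is an assumption that the threshold is applied to r²; HaploReg's own computation over 1000 Genomes data may differ in detail.

```python
# Minimal sketch: LD (r^2) between two biallelic SNPs from phased haplotypes.
# Haplotypes, SNP names and the threshold are hypothetical examples.


def r_squared(hap_a, hap_b):
    """hap_a[k], hap_b[k]: 0/1 allele carried by haplotype k at SNP A and SNP B."""
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # freq of allele 1 at A
    p_b = sum(hap_b) / n                                   # freq of allele 1 at B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n    # freq of haplotype (1, 1)
    d = p_ab - p_a * p_b                                   # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0             # monomorphic SNP: no LD


query = [1, 1, 0, 0, 1, 0, 1, 0]        # hypothetical selected SNP
candidates = {
    'snpX': [1, 1, 0, 0, 1, 0, 1, 0],   # same pattern as the query
    'snpY': [1, 0, 0, 0, 1, 0, 1, 1],
}
threshold = 0.8                          # hypothetical r^2 cutoff

scores = {name: r_squared(query, hap) for name, hap in candidates.items()}
in_ld = {name: r2 for name, r2 in scores.items() if r2 >= threshold}
print(in_ld)  # {'snpX': 1.0}
```

Raising the threshold keeps only SNPs that are more tightly linked to the selected SNP, so fewer neighbouring SNPs (and their annotations) are reported.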
RegulomeDB
Python programming wrap-up
• if / else statements
• for and while loops
• Indexing starts from 0, unlike R (which starts from 1)
• Four important data structures:
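– list: a = [1, 2, 3, 4]; a.append(5)
– tuple (immutable): a = ('cat', 'dog'); a = (a[1], a[0]) builds a new, swapped tuple (a[0] = 'cow' would raise TypeError)
– dictionary: a = {'chr1': {10254: 'G', 13257: 'T'}}; a.keys(); a['chr1'][10254]
– set (built in; the old sets module and its Set class no longer exist in Python 3):
• species = {'hs', 'mm', 'chimp'}
• zoos = {'mm', 'wolf', 'chimp'}
• zoos | species (union)
• zoos & species (intersection)
• zoos - species (difference)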
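A runnable recap of the items above, written for Python 3. The variable names and values are illustrative only. Note that the tuple "swap" has to build a new tuple, because assigning to a tuple element raises TypeError, and that sets use the built-in set type rather than the Python 2 sets module.

```python
# Control flow, 0-based indexing and the four core data structures (Python 3).
# All values are illustrative.

# if / else inside a for loop
labels = []
for x in [1, 2, 3, 4]:
    if x % 2 == 0:
        labels.append('even')
    else:
        labels.append('odd')

# while loop
n = 0
while n < 3:
    n += 1

# list: indexing starts at 0 (R starts at 1)
a = [1, 2, 3, 4]
first = a[0]                    # 1
a.append(5)                     # [1, 2, 3, 4, 5]

# tuple: immutable, so "swap" by building a new tuple
pets = ('cat', 'dog')
pets = (pets[1], pets[0])       # ('dog', 'cat')
tuple_is_immutable = False
try:
    pets[0] = 'cow'
except TypeError:
    tuple_is_immutable = True   # item assignment is not allowed

# dictionary: chromosome -> {position: base}
genome = {'chr1': {10254: 'G', 13257: 'T'}}
base = genome['chr1'][10254]    # 'G'
chroms = list(genome.keys())    # ['chr1']

# set: built-in type
species = {'hs', 'mm', 'chimp'}
zoos = {'mm', 'wolf', 'chimp'}
union = zoos | species          # {'hs', 'mm', 'chimp', 'wolf'}
common = zoos & species         # {'mm', 'chimp'}
only_in_zoos = zoos - species   # {'wolf'}
```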
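• Some tricky facts:
– Assignment, shallow copy and deep copy
• Assignment copies nothing: a = [1, 2, 3]; b = a; b[2] = 4; print(a) prints [1, 2, 4], because a and b name the same list
• Shallow copy: b = a[:] (or copy.copy(a)) copies the outer list only; nested lists are still shared
• Deep copy:
– from copy import deepcopy
– a = [1, 2, 3]; b = deepcopy(a); b[2] = 4; print(a) prints [1, 2, 3]
– List comprehension:
• As in R, explicit loops are slow; a list comprehension is more concise and usually faster than a for loop that calls append
• a = [1, 2, 3]; a = [b + 1 for b in a]; print(a) prints [2, 3, 4]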
• How to read BAM (binary alignment) files in Python?
– import pybedtools (pysam is another widely used option)
• How to perform numerical computation in Python?
– import numpy as np
– Provides fast array and matrix operations that replace slow Python loops; very useful
• How to do shell-style tasks in Python?
– Get all files in a folder:
– import os
– os.listdir('yourdirectory')
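The tips above can be checked in one short Python 3 script. It draws the distinction the copy example relies on: `b = a` copies nothing (both names refer to the same list), a shallow copy duplicates only the outer list, and a deep copy duplicates nested objects too. A temporary folder stands in for 'yourdirectory', and the file names in it are made up.

```python
import copy
import os
import tempfile

import numpy as np

# 1. Assignment does not copy: both names refer to the same list.
a = [1, 2, 3]
b = a
b[2] = 4
print(a)                            # [1, 2, 4]

# 2. Shallow copy: new outer list, but the inner lists are shared.
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)         # same effect as nested[:]
shallow[0][0] = 99
shallow_effect = nested[0][0]       # 99: the change shows through

# 3. Deep copy: everything is duplicated recursively.
nested2 = [[1, 2], [3, 4]]
deep = copy.deepcopy(nested2)
deep[0][0] = 99
deep_effect = nested2[0][0]         # 1: the original is untouched

# 4. List comprehension instead of an explicit loop with append.
c = [x + 1 for x in [1, 2, 3]]      # [2, 3, 4]

# 5. numpy: vectorized arithmetic with no Python-level loop.
v = np.array([1, 2, 3]) + 1         # array([2, 3, 4])
m = np.array([[1, 2], [3, 4]])
mv = m @ np.array([1, 1])           # matrix-vector product: array([3, 7])

# 6. List the files in a folder (a throwaway temporary folder here).
with tempfile.TemporaryDirectory() as folder:
    for name in ['b.bed', 'a.bam']:
        open(os.path.join(folder, name), 'w').close()   # create empty files
    files = sorted(os.listdir(folder))                  # listdir order is arbitrary
```

For a flat list like `[1, 2, 3]`, a shallow copy is already enough; `deepcopy` only makes a difference once the list contains other mutable objects.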
Object-oriented programming
• Classes and objects in Python
import numpy as np  # the probability tables are stored as numpy arrays


class HMM:
    # constructor
    # transition_probs[i, j] is the probability of transitioning to state i from state j
    # emission_probs[i, j] is the probability of emitting emission j while in state i
    def __init__(self, transition_probs, emission_probs):
        self._transition_probs = np.asarray(transition_probs)
        self._emission_probs = np.asarray(emission_probs)

    # accessors
    def emission_dist(self, emission):
        # column of probabilities P(emission | state i), one entry per state i
        return self._emission_probs[:, emission]

    @property
    def num_states(self):
        return self._transition_probs.shape[0]

    @property
    def transition_probs(self):
        return self._transition_probs
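The class only stores the model. As a sketch of why its "to state i from state j" convention is convenient, here is the forward algorithm, which computes the probability of an observation sequence, written as a standalone function over arrays laid out exactly like the class's attributes. The initial state distribution `initial_probs` is an extra assumption (the class does not store one), and the two-state model is hypothetical.

```python
import numpy as np


def forward_probability(transition_probs, emission_probs, initial_probs, observations):
    """Return P(observations) under an HMM using the forward algorithm.

    Array conventions follow the HMM class above:
      transition_probs[i, j] = P(next state i | current state j)  (columns sum to 1)
      emission_probs[i, e]   = P(emission e | state i)            (rows sum to 1)
    initial_probs[i] = P(first state is i) is an assumed extra input.
    """
    # alpha[i] = P(observations seen so far, current state = i)
    alpha = initial_probs * emission_probs[:, observations[0]]
    for obs in observations[1:]:
        # transition_probs @ alpha sums over the previous state j,
        # which is why the "to i from j" convention puts j in the columns
        alpha = emission_probs[:, obs] * (transition_probs @ alpha)
    return float(alpha.sum())


# Hypothetical two-state model with two possible emissions (0 and 1).
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
E = np.array([[0.7, 0.3],
              [0.1, 0.9]])
init = np.array([0.5, 0.5])
p = forward_probability(T, E, init, [0, 1, 0])
```

`emission_probs[:, obs]` is what the class's `emission_dist(obs)` returns, so a method version could call `self.emission_dist(obs)` and `self.transition_probs` directly. For long sequences `alpha` underflows, so practical implementations rescale it at each step or work in log space.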
Interfacing with other programming languages
• Rpy (now rpy2): R and Python interface
• Cython: Python and C interface
• When to use Python?
– Text manipulation
– Simple machine learning implementations (much as you would use MATLAB)
– Some very well-written packages are available: PyStan (Bayesian MCMC sampler), matplotlib, pybedtools, etc.
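• When not to use Python:
– Large-scale simulation: most simulations cannot avoid loops, and pure Python loops are slow
– Statistical analysis: R's statistical packages are more extensive and better curated
– Best strategy: write the speed-critical parts in C and call them from Python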
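Text manipulation is the first use case listed above. A minimal sketch of the kind of parsing Python handles with only the standard library: counting records per chromosome and summing interval lengths in tab-separated, BED-style text. The records are hypothetical.

```python
from collections import Counter

# Hypothetical BED-style records: chrom, start, end, name (tab-separated).
bed_text = """chr1\t100\t200\tpeak1
chr1\t500\t650\tpeak2
chr2\t300\t400\tpeak3
# comment lines are skipped
chrX\t50\t90\tpeak4
"""

counts = Counter()      # records per chromosome
total_length = 0        # summed interval length in bp
for line in bed_text.splitlines():
    line = line.strip()
    if not line or line.startswith('#'):
        continue                        # skip blank and comment lines
    chrom, start, end, name = line.split('\t')
    counts[chrom] += 1
    total_length += int(end) - int(start)
```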
Some good reference code for Python
• Check the MACS14 Python scripts
• MACS14 shows how to turn a Python script into
executable software
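MACS14's code is not reproduced here. As a generic sketch (not taken from MACS14) of the usual pattern for turning a script into a command-line tool: a shebang line, `argparse` for options, a `main()` function, and the `if __name__ == '__main__'` guard. The tool name `peakcount`, its options and its behaviour are hypothetical; packaging it so that it installs as a command is a separate step not shown.

```python
#!/usr/bin/env python3
import argparse


def build_parser():
    """Command-line options for the hypothetical 'peakcount' tool."""
    parser = argparse.ArgumentParser(
        prog='peakcount',
        description='Count records in BED-like files.')
    parser.add_argument('bedfiles', nargs='*', help='input files')
    parser.add_argument('--min-length', type=int, default=0,
                        help='skip records shorter than this many bp')
    return parser


def count_records(lines, min_length=0):
    """Count non-comment records whose end - start >= min_length."""
    n = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith('#'):
            continue
        if int(fields[2]) - int(fields[1]) >= min_length:
            n += 1
    return n


def main(argv=None):
    # argv=None makes argparse read sys.argv; pass a list when testing.
    args = build_parser().parse_args(argv)
    for path in args.bedfiles:
        with open(path) as handle:
            print(path, count_records(handle, args.min_length), sep='\t')
    return 0


if __name__ == '__main__':
    main()
```

After `chmod +x peakcount.py`, it runs as `./peakcount.py peaks.bed --min-length 50`. Keeping the logic in importable functions (`count_records`, `main`) also lets other scripts and tests reuse it without going through the command line.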