Biology intro, Rosalind, DNA and proteins

Bioinformatics
Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
thabangh@gmail.com
Outline
• Introductions
• Biology background
• Rosalind
• Sequence alignment
Figure: map between Washington state and Washington, DC (2,763 miles), with images labeled Scout, Jack and Nan
Introductory survey
___ p-value
___ t test
___ BLAST
___ Python sys.argv
___ support vector machine
___ false discovery rate
___ dynamic programming
___ recursion
___ hierarchical clustering
___ Wilcoxon test
___ Python tuple
___ Smith-Waterman
___ Bonferroni correction
___ Python dictionary
Next to each concept, please enter 1, 2 or 3, as follows:
1. I do not know this concept.
2. I have heard of this concept, but could not give a precise definition.
3. I know this concept well.
Do not sign your name.
On the index card, write
1. Your name
2. Your email address
3. Your home country
4. The year and subject of your bachelor's degree
5. The name of your favorite undergraduate course
6. Your proficiency (introductory, intermediate, or advanced) in Python, linear algebra, statistics, and biology
7. An interesting thing you have done
8. One question for me
• Name: Bill Noble
• Email: thabangh@gmail.com
• Country: United States
• Degree: 1991 Symbolic Systems
• Favorite course: Programming Methodology
• Python: advanced
• Statistics: advanced
• Biology: intermediate
• Interesting thing I have done: unicycling
• Question: …
Course goals
By the end of this course, you
should be able to
• Describe some basic
computational challenges in
bioinformatics.
• Implement and use several
basic algorithms in this
field.
• Describe several advanced
algorithms.
This course will not
• Teach molecular biology
techniques.
• Teach you how to use off-the-shelf bioinformatics
software.
What is life?
• Entropy: the tendency toward disorder.
– Living organisms have low entropy.
– Soil has high entropy.
The cell
• The primary low-entropy
compartment
• Tasks:
– Gather energy
– Maintain the inside/outside
distinction
• Strategies:
– Movement
– Signal transduction
– Energy capture
– Reproduction
Biological macromolecules
• Lipids (fat):
– membranes
– energy storage
• Carbohydrates (sugar):
– energy storage
– structure
– cell-cell communication
• Nucleic acids
– genetic material
• Proteins
– workhorses of the cell
The central dogma of molecular biology
DNA
↓ Transcription
RNA
↓ Translation
Protein
Video
Image credits: lejeuneusa.org, rcsb.org
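A minimal sketch of the transcription step at the level of sequence strings, assuming the DNA is given as the coding (sense) strand read 5' to 3': the RNA transcript is then the same sequence with every thymine (T) replaced by uracil (U). The function name `transcribe` is hypothetical, and splicing and other RNA processing are ignored.

```python
def transcribe(dna: str) -> str:
    """Return the RNA transcript of a DNA coding-strand sequence.

    Assumes the input is the coding (sense) strand written 5' to 3',
    so transcription reduces to replacing T with U.
    """
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("TCAGGTCACAGTT"))  # UCAGGUCACAGUU
```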
4-letter DNA alphabet
• DNA consists of an
alphabet of four
bases
– Adenine
– Cytosine
– Guanine
– Thymine
Rosalind
• Visit http://rosalind.info and create a login.
• Enroll in this class via
http://rosalind.info/classes/enroll/e7948c7e32/
• Solve the problem “Installing Python.”
• Solve the problem “Counting DNA nucleotides.”
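A possible sketch for “Counting DNA nucleotides,” assuming the answer should be the counts of A, C, G and T printed as four space-separated integers. `collections.Counter` returns 0 for any base that does not occur, so every count is defined. The function name `count_nucleotides` is hypothetical.

```python
from collections import Counter


def count_nucleotides(dna: str) -> tuple[int, int, int, int]:
    """Return the counts of A, C, G and T, in that order."""
    # Strip trailing whitespace/newlines and normalize case before counting.
    counts = Counter(dna.strip().upper())
    # Counter yields 0 for missing keys, so absent bases count as zero.
    return counts["A"], counts["C"], counts["G"], counts["T"]


if __name__ == "__main__":
    print(*count_nucleotides("AACTGTGACCTGA"))  # 4 3 3 3
```

In practice the input string would be read from the dataset file Rosalind provides; here it is hard-coded for illustration.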
Reverse complement
One strand: TCAGGTCACAGTT
Write down the rest of the DNA sequence (the complementary strand).
Write down the sequence you get by reading from the blue strand, starting at the bottom.
Answer: AACTGTGACCTGA
Reverse complement
5'-TCAGGTCACAGTT-3'
   |||||||||||||
3'-AGTCCAGTGTCAA-5'
Reading the bottom strand 5' to 3' gives the reverse complement: AACTGTGACCTGA
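The reverse complement can be sketched in Python in two steps: complement each base (A↔T, C↔G), then reverse the string so that it reads 5' to 3'. This assumes the input contains only A, C, G and T in either case; any other character (for example N) passes through unchanged. `reverse_complement` is a hypothetical name, and the same function could serve as a starting point for the Rosalind problem “Complementing a strand of DNA.”

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, read 5' to 3'."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("TCAGGTCACAGTT"))  # AACTGTGACCTGA
```

Applying the function twice returns the original sequence, which is a quick sanity check.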
Rosalind: Complementing a strand of DNA
One-minute response
At the end of each class
• Write for about one minute.
• Provide feedback about the class.
• Was part of the lecture unclear?
• What did you like about the class?
• Do you have unanswered questions?
• Sign your name
I will begin the next class by responding to the one-minute responses.