Class 38: Fixed Points and Biological Computing

advertisement
Class 38:
Fixed Points
and Biological
Computing
CS200: Computer Science
University of Virginia
Computer Science
David Evans
http://www.cs.virginia.edu/~evans
Menu
• Making Recursive Definitions without
define
• Computing with DNA
• How Biology Programs
24 April 2002
CS 200 Spring 2002
2
Lambda Calculus
term ::= variable |term term | (term)|  variable . term
-reduction
(renaming)
y. M  v. (M [y  v])
where v does not occur in M.
-reduction
(substitution)
(x. M)N   M [ x  N ]
24 April 2002
CS 200 Spring 2002
3
Lambda Calculus is a Universal Computer
z z
z
z
z
z
z
z
z
z
z
z
z
z
z
z z
z
z
z
), X, L
), #, R
(, #, L
2:
look
for (
1
Start
(, X, R
#, 1, -
HALT
#, 0, -
Finite State Machine
We have this, but
we cheated using 
to make recursive
definitions!
24 April 2002
• Read/Write Infinite Tape

Mutable Lists
• Finite State Machine

Numbers to keep track of state
• Processing

Way of making decisions (if)

Way to keep going
CS 200 Spring 2002
4
Fixed Point Theorem
• The fixed point of a function f, is
a value x such that f(x) = x
• If we can find the fixed point of
our Turing Machine simulator,
then we have something that
keeps going until it halts!
fixed-point TM input
  result of running TM on input
24 April 2002
CS 200 Spring 2002
5
All Lambda Calculus Terms
have Fixed Points!
• For any Lambda Calculus term F,
there exists a Lambda Calculus
Term X such that FX = X
• Proof:
Let W =  x.F(xx) and X = WW.
X = WW = ( x.F(xx))W
  F (WW) = FX
24 April 2002
CS 200 Spring 2002
We can
make F a
parameter!
6
Why of Y?
• Y is  f. WW:
Y   f. ( x.f (xx)) ( x. f (xx))
• Y calculates a fixed point of any
lambda term!
• Hence: we don’t need define to do
recursion!
• Works in Scheme too - check the
“lecture” from the Adventure Game
24 April 2002
CS 200 Spring 2002
7
Lambda Calculus
is Turing Universal!
• All you need is beta-reduction and you can
compute anything
• This is just one way of representing
numbers, if, etc. – many others are
possible
• Integers, booleans, if, while, +, *, =, <,
classes, define, inheritance, etc. are for
wimps! Real programmers only use .
24 April 2002
CS 200 Spring 2002
8
Models of Computation
• Mechanical: Turing Machine
• Symbolic: Lambda Calculus
• Next: Biological
24 April 2002
CS 200 Spring 2002
9
Computing with DNA
Leonard Adleman
(Mathematical
Consultant for
Sneakers), 1995
24 April 2002
CS 200 Spring 2002
10
DNA
• Sequence of
nucleotides: adenine
(A), guanine (G),
cytosine (C), and
thymine (T)
• Two strands, A must
attach to T and G must
attach to C
24 April 2002
CS 200 Spring 2002
G
C
T
A
11
Hamiltonian Path Problem
• Input: a graph, start vertex and end vertex
• Output: either a path from start to end that
touches each vertex in the graph exactly
once, or false indicating no such path
exists
RIC
start: CHO
end: BWI
BWI
CHO
IAD
24 April 2002
CS 200 Spring 2002
Hamiltonian Path
is NP-Complete
12
Encoding The Graph
• Make up a two random 4-nucleotide
sequences for each city:
CHO:
RIC:
IAD:
BWI:
CHO1 = ACTT
RIC1 = TCGG
IAD1 = GGCT
BWI1 = GATC
CHO2 = gcag
RIC2 = actg
IAD2 = atgt
BWI2 = tcca
• If there is a link between two cities (AB),
create a nucleotide sequence: A2B1
CHORIC
RICCHO
24 April 2002
gcagTCGG
actgACTT
CS 200 Spring 2002
Based on Fred Hapgood’s notes
on Adelman’s talk
http://www.mitre.org/research/nanotech/hapgo
od_on_dna.html
13
Encoding The Problem
• Each city nucleotide sequence binds with
its complement (A  T, G  C) :
CHO: CHO1 = ACTT
CHO2 = gcag
CHO’:
TGAA
cgtc
RIC:
TCGGactg
RIC’: AGCCtgac
IAD:
GGCTatgt IAD’ = CCGAtaca
BWI: GATCtcca BWI’ = CTAGaggt
• Mix up all the link and complement DNA
strands – they will bind to show a path!
24 April 2002
CS 200 Spring 2002
14
Path Binding
BWI’
RIC’
IAD’
CHO’
TGAAcgtcCCGAtacaAGCCtgacCTAGaggt
gcagGGCTatgtTCGG actgGATC
CHOIAD IADRIC RICBWI
TCGGactg
RIC
BWI
GATCtcca
CHO
ACTTgcag
IAD
GGCTatgt
24 April 2002
CS 200 Spring 2002
15
Getting the Solution
• Extract DNA strands starting with CHO
and ending with BWI
– Easy way is to remove all strands that do not
start with CHO, and then remove all strands
that do not end with BWI
• Measure remaining strands to find ones
with the right weight (7 * 8 nucleotides)
• Read the sequence from one of these
strands
24 April 2002
CS 200 Spring 2002
16
Why don’t we solve NPComplete problems this way?
• Speed: shaking up the DNA strands does
1014 operations per second ($400M
supercomputer does 1010)
• Memory: we can store information in DNA
at 1 bit per cubic nanometer
• How much DNA would you need?
– Volume of DNA needed grows exponentially
with input size
– To solve ~45 vertices, you need ~20M gallons
24 April 2002
CS 200 Spring 2002
17
DNA-Enhanced PC
24 April 2002
CS 200 Spring 2002
18
How does Nature program?
24 April 2002
CS 200 Spring 2002
19
How Big is the
Make-a-Human Program?
• 3 Billion Base Pairs
– Each nucleotide is 2 bits (4 possibilities)
– 3 B pairs * 1 byte/4 pairs = 750 MB
1 CD ~ 650 MB
24 April 2002
CS 200 Spring 2002
20
Encoding is Redundant
• DNA encodes proteins
• Every sequence of 3 base pairs one of 20
amino acids (or stop codon)
– 21 possible codons, but 43 = 64 possible
values
– So, really only 750MB * (21/64) ~ 250 MB
24 April 2002
CS 200 Spring 2002
21
People are almost all the Same
• Genetic code for 2 humans differs in only
2.1 million bases
– 4 million bits = 0.5 MB
24 April 2002
CS 200 Spring 2002
22
How big is .5 MB?
• 1/3 of a floppy disk
• <1% of Windows 2000
• ~22 times the size of the PS6
adventure game code
24 April 2002
CS 200 Spring 2002
23
Is DNA Really a
Programming Language?
24 April 2002
CS 200 Spring 2002
24
Nerdy Linguist’s Definition
A description of pairs (S, M),
where S stands for sound, or any
kind of surface forms, and M
stands for meaning. A theory of
language must specify the
properties of S and M, and how
they are related.
24 April 2002
CS 200 Spring 2002
25
Programming Language
(Definition from Lecture 1)
A description of pairs (S, M),
where S stands for sound, or
any kind of surface forms, and
M stands for meaning intended
to be read and written by
humans and processed by
machines.
24 April 2002
CS 200 Spring 2002
26
Stuff Programming Languages
are Made Of
• Primitives
codons (sequence of 3 nucleotides that encodes a protein)
• Means of Combination
?? Morphogenesis? Not well understood (by anyone).
This is where most of the expressiveness comes from!
• Means of Abstraction
DNA itself – separate proteins from their encoding
Genes – group DNA by function (sort of)
Chromosomes – package Genes together
Organisms – packages for reproducing Genes
24 April 2002
CS 200 Spring 2002
27
Biology is (becoming) a
subfield of Computer Science
• Biological mechanisms are mostly
understood (proteomics still has a way
to go)
• What is not understood is how those are
combined to create meaning
24 April 2002
CS 200 Spring 2002
28
Charge
• Noon (now): President Casteen’s State of
the University in Old Cabal Hall
– Extra credit question: “Given that Computer
Science is the most liberal art, how come UVa
College students are not able to major in
Computer Science?”
• Friday: review
– Chance to ask questions about anything you
want
24 April 2002
CS 200 Spring 2002
29
Download