Curing Cancer with Your Cell Phone: Why all Sciences are Becoming Computing Sciences David Evans http://www.cs.virginia.edu/evans Computer Science = Doing Cool Stuff with Computers? College Science Scholars 2 Toaster Science = Doing Cool Stuff with Toasters? College Science Scholars 3 Computer Science • Mathematics is about declarative (“what is”) knowledge; Computer Science is about imperative (“how to”) knowledge • The Study of Information Processes Language – How to describe them – How to predict their properties Logic – How to implement them Engineering quickly, cheaply, and reliably College Science Scholars 4 Most Science is About Information Processes Which came first, the chicken or the egg? How can a (relatively) simple, single cell turn into a chicken? College Science Scholars 5 Agenda • Three Big Ideas: – All Computers are Equally Powerful – Programs are Data, Data are Programs – Many Surprisingly Different Problems are Equally Difficult • One Open Question – Is a machine that can always guess correctly able to solve problems a normal machine can’t? College Science Scholars 6 “Computers” before WWII College Science Scholars 7 Mechanical Computing College Science Scholars 8 Modeling Pencil and Paper ... # C S S A 7 2 3 ... How long should the tape be? “Computing is normally done by writing certain symbols on paper. We may suppose this paper is divided into squares like a child’s arithmetic book.” Alan Turing, On computable numbers, with an application to the Entscheidungsproblem, 1936 College Science Scholars 9 Modeling Brains •Rules for steps •Remember a little “For the present I shall only say that the justification lies in the fact that the human memory is necessarily limited.” Alan Turing College Science Scholars 10 Turing’s Model ... # 1 0 1 1 0 1 1 1 Input: # Write: # Move: Start 1 Input: 1 Write: 1 Move: College Science Scholars 1 1 0 1 1 1 # ... Input: 1 Write: 0 Move: 2 Input: 0 Write: 0 Move: 0 Input: 0 Write: # Move: 3 11 Universal Machine Input Description Result tape of a of Turing running Machine M on M Input Universal Machine A Universal Turing Machine can simulate any Turing Machine running on any Input! College Science Scholars 12 Church-Turing Thesis • All mechanical computers are equally powerful* *Except for practical limits like memory size, time, energy, etc. • There exists a Turing machine that can simulate any mechanical computer • Any computer that is powerful enough to simulate a Turing machine, can simulate any mechanical computer College Science Scholars 13 What This Means • Your cell phone, watch, iPod, etc. has a processor powerful enough to simulate a Turing machine • A Turing machine can simulate the world’s most powerful supercomputer • Thus, your cell phone can simulate the world’s most powerful supercomputer (it’ll just take a lot longer and will run out of memory) College Science Scholars 14 Recap • All Computers are Equally Powerful • Programs are Data, Data are Programs • Many Problems are Equally Difficult – But no one knows how difficult! College Science Scholars 15 A “Hard” Problem? College Science Scholars 16 Generalized Pegboard Puzzle • Input: a configuration of n pegs on a cracker barrel style pegboard (of any size) • Output: if there is a sequence of jumps that leaves a single peg, output that sequence of jumps. Otherwise, output false. Is this a “hard” problem? College Science Scholars 17 Solving Problems • A solution to a problem instance: given a pegboard configuration, here’s the sequence of jumps • A solution to a problem: a procedure that (1) always finds the correct answer, and (2) always finishes. College Science Scholars 18 “Brute Force” Solvers • Enumerate all possible answers – Every possible sequence of jumps • Try them all until you find one that works – Simulate the jumps • This works for almost all problems! • Problem: how long does it take? College Science Scholars 19 Problem Solving Time 1200 1000 Time 800 ~ 2n 600 ~ n3 400 200 ~ n 0 1 2 College Science Scholars 3 4 5 6 7 8 Problem Input Size 9 10 20 Increasing Problem Size 70000 ~ 2n 60000 50000 40000 30000 20000 10000 ~ n3 0 1 2 3 College Science Scholars 4 5 6 7 8 9 10 11 12 13 14 15 16 21 Tractable and Intractable Problems 1200000 1000000 800000 “intractable” 600000 400000 “tractable” 200000 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 I do nothing that a man of unlimited funds, superb physical endurance, and maximum scientific knowledge could not do. – Batman (may be able to solve intractable problems, but computer scientists can only solve tractable ones for large n) College Science Scholars 22 This makes a huge difference! time since “Big Bang” ~ 2n 1E+30 1E+28 1E+26 1E+24 1E+22 1E+20 1E+18 1E+16 1E+14 1E+12 2032 today 1E+10 1E+08 100000 0 10000 ~ n3 100 1 2 4 8 16 32 64 128 log-log scale College Science Scholars 23 Back to the Pegboard... • A brute force solution is easy...but on the pink line • Is there a tractable solution? 1E+30 1E+28 1E+26 1E+24 1E+22 1E+20 1E+18 1E+16 1E+14 1E+12 1E+10 1E+08 100000 0 10000 100 1 2 College Science Scholars 4 8 16 32 64 128 24 Deciding a Problem Is Hard • “I tried really hard and still couldn’t solve it.” – Maybe the speaker isn’t smart enough – Maybe a few days more effort will find it • “Lots of really smart people tried really hard and no one could solve it.” • “It seems sort of like this other problem that we think is hard...” College Science Scholars 25 Output Input Reduction Input Transformer Pegboard Solver Output Transformer jumps Hard Problem Solver College Science Scholars 26 Reading the Genome Whitehead Institute, MIT College Science Scholars 27 Gene Reading Machines • One read: about 700 base pairs • But…don’t know where they are on the chromosome Read 3 TACCCGTGATCCA Read 2 Read 1 Actual Genome College Science Scholars TCCAGAATAA ACCAGAATACC ACCAGAATACCCGTGATCCAGAATAA 28 Genome Assembly Read 1 ACCAGAATACC Read 2 TCCAGAATAA Read 3 TACCCGTGATCCA Input: Genome fragments (but without knowing where they are from) Ouput: The full genome College Science Scholars 29 Genome Assembly Read 1 ACCAGAATACC Read 2 TCCAGAATAA Read 3 TACCCGTGATCCA Input: Genome fragments (but without knowing where they are from) Ouput: The smallest genome sequence such that all the fragments are substrings. College Science Scholars 30 Genome Assembly Solver ACCAGAATACC TCCAGAATAA TACCCGTGATCCA ACCAGAATACCCGTGATCCAGAATAA (~30M reads, ~900 bp) College Science Scholars Input Transformer Pegboard Solver Output Transformer jumps Genome Assembly Solver 31 What This Means • We already know the shortest common superstring (genome assembly) problem is “hard” • The pegboard problem must also be hard, since we could use a solver for it to solve the genome assembly problem – Requires: we can build fast transformers that don’t increase the problem size exponentially College Science Scholars 32 Non-Deterministic Machines ... # 1 0 1 1 0 1 1 1 0 Input: # Write: # Move: Start 1 Input: 1 Input: 0 Write: 1 Write: 0 Move: Move: College Science Scholars 1 1 0 1 1 1 # ... Input: 1 Write: 0 Move: Input: # Write: 0 Move: 4 2 3 33 Non-Deterministic Machine • Everytime there is a choice, it can guess the correct choice without looking ahead • If we had such a machine, solving Pegboard (or Genome Assembly, etc.) problem would be easy: – It can guess the solution one step (alignment) at a time College Science Scholars 34 Big Open Question Is a nondeterministic machine able to solve problems that are intractable on a deterministic machine? 1E+30 1E+28 1E+26 1E+24 1E+22 1E+20 1E+18 1E+16 1E+14 1E+12 1E+10 1E+08 100000 0 10000 100 1 2 4 8 16 32 64 128 Seems obvious that the magic guess correctly ability should be useful...but no one knows for sure! College Science Scholars 35 Recap • “P vs NP” problem (one of the millennium prize problems) • Solving the pegboard puzzle is equivalent to solving genome assembly • With a non-deterministic machine, we could solve both • With a mechanical computer, we don’t know if a tractable solution exists (but can’t prove it doesn’t): We don’t know if checking a solution is really easier than finding it College Science Scholars 36 Summary • Computer Science is the study of information processes: all about problem solving • Many seemingly paradoxical results: – All Computers are Equally Powerful! – Many Surprisingly Different Problems are Equivalent! • And seemingly obvious open problems: – Is checking a solution is really easier than finding it? College Science Scholars 37 Computer Science at UVa • New Interdisciplinary Major in Computer Science for A&S students (approved last year) • Take CS150 this Spring – Every scientist needs to understand computing, not just as a tool but as a way of thinking • Lots of opportunities to get involved in research groups College Science Scholars 38 My Research Group • Computer Security: computing in the presence of adversaries • Recent student projects: – Proof that the Pegboard puzzle is hard (Mike Peck and Chris Frost) – Disk-level virus detection (Adrienne Felt) – Web Application Security (Sam Guarnieri) – N-Variant Systems: run variants of a program simultaneously (Sean Talts) College Science Scholars 39 Questions http://www.cs.virginia.edu/evans evans@cs.virginia.edu College Science Scholars 40