What is Computer Science About?
Part 1: Computational Thinking

Main Points
• There is more to computer science than just programming.
• Computer science is about computational thinking and algorithms.
• We try to formalize activities into repeatable procedures and concrete decisions.
• Generalizing a procedure into an abstract algorithm helps us recognize whether there are known solutions, and how complex the problem is.
• Programming is just translating an algorithm into a specific syntax.

Some definitions...
• Computational thinking
  – translating processes/procedures into step-by-step activities with well-defined choice points and decision criteria
• Design and analysis of algorithms
  – expression of a procedure in terms of operations on abstract data structures like graphs, lists, strings, and trees
  – finite number of steps (clear termination conditions; it has to halt)
  – is the algorithm correct?
  – are all cases handled, or might it fail on certain inputs?
  – how much time will it take? how much space (memory)?
• Programming
  – translating algorithms into a specific language
• Software engineering
  – managing the development and life-cycle of a system, including design/specification, documentation, testing, use of components/libraries, release of new/updated versions
  – usually a team effort

Computational Thinking
• CT has spread into all kinds of fields, from cooking and sports to transportation, medical diagnosis, and particle physics
• many intelligent activities are ill-defined, and CT is about formalizing them into concrete decisions and repeatable procedures
  – think about how to find a good place to eat in a new town
    • ask a friend? desired type of food? consult Zagat’s? look for a restaurant with many cars in the parking lot?
  – think about how to choose a book to read
    • interest? availability? recommendations? reviews?
  – “finding Waldo” (how do you search for shapes in images?)
Google’s ideas on Computational Thinking
http://www.google.com/edu/computational-thinking/what-is-ct.html

Four components (each with an example from baking a cake, and its computational counterpart):
• DECOMPOSITION: breaking a problem into (decoupled) sub-problems
  – example: mixing dry ingredients, then wet ingredients
  – computationally: divide-and-conquer
• PATTERN RECOGNITION: identifying repeatable operations
  – example: crack egg1, crack egg2, crack egg3...
  – computationally: for/while-loops, sub-routines
• GENERALIZATION and ABSTRACTION
  – example: baking chocolate cake or carrot cake or pound cake is similar, except add/substitute a few different ingredients
  – computationally: adding parameters to code; also, can we apply the same procedure to other data like vectors, arrays, lists, trees, graphs?
• ALGORITHM DESIGN
  – example: formalize the procedure into a recipe others can use; define things like how you know when it is done (bake 30 min at 350 or until crust is “golden”...)
  – computationally: step-by-step procedure with clear initialization, decision and termination conditions

• mechanical analogies
  – think about how a thermostat does temperature control
    • what are the actions that can be taken?
    • what are the conditions under which these actions are triggered?
    • what parameters affect these decisions?

      let T be the desired or “control” temperature
      if temp>T, turn on AirConditioner
      if temp<T, turn on Heater

      let D be the acceptable range of temperature
      variation
      if temp>T+D, turn on AirConditioner
      if temp<T-D, turn on Heater

      if temp>T, turn on AirConditioner
      if temp<T, turn on Heater
      if AC on and temp<T, turn AC off
      if HE on and temp>T, turn HE off

  – think about how a soda machine works
    • keep accepting coins till enough for item
    • dispense item (if avail.), then make change
  – think about the decision policy for an elevator
  – think about the pattern of traffic signals at an intersection (with sensors)
    • how do lights depend on cars waiting? pedestrians?
• ultimately, we formalize these things into:
  – flowcharts and pseudocode

      procedure bubbleSort( list A )
        repeat
          swapped = false
          for i = 1 to length(A)-1 do:
            if A[i] > A[i+1] then
              swap( A[i], A[i+1] )
              swapped = true
        until not swapped

      [Figure: one bubble-sort swap: 1 3 7 9 2 11 13 17 becomes 1 3 7 2 9 11 13 17]

  – abstractions like finite-state machines

      [Figure: finite-state machine representing a turnstile]

    these play a big role in compilers, network protocols, etc.

The “earliest” known algorithm
• Euclid’s algorithm for determining the GCD (greatest common divisor)
  – not to be confused with the Chinese Remainder Theorem, although the extended form of Euclid’s algorithm is commonly used to compute its solutions
• problem: given two positive integers m and n, find the largest integer d that divides each of them
• example: 4 divides 112 and 40; is it the GCD? (no, 8 is)
• Euclid’s algorithm: repeatedly divide the smaller number into the larger, and replace the larger with the remainder

      GCD(a,b):
        if a<b, swap a and b
        while b>0:
          let r be the remainder of a/b
          a←b, b←r
        return a

      1. a=112, b=40, a/b=2 with rem. 32
      2. a=40, b=32, a/b=1 with rem. 8
      3. a=32, b=8, a/b=4 with rem. 0
      4. a=8, b=0, return 8

• questions a Computer Scientist would ask:
  – Does it halt? (note how b strictly shrinks with each pass, since the remainder r is always less than b)
  – Is it correct?
  – Is there a more efficient way to do it (that uses fewer steps)?
  – How does it relate to factoring and testing for prime numbers?

...UCLA 92 Stanford 80 – OklaSt 55 Iowa 61 – Indiana 83 MichSt 82 ...
• while monitoring a stream of basketball scores, keep track of the 3 highest scores
  – impractical to just save them all and sort
  – how would you do it?
• algorithm design often starts with representation
  – imagine keeping 3 slots, A B C, for the highest scores seen so far
  – define the “semantics” or an “invariant” to maintain:
    • A > B > C > all other scores
  – with each new game score (p,q) (e.g. Aggies 118, Longhorns 90)

      if p>A then C=B, B=A, A=p
      else if p>B, then C=B, B=p
      else if p>C, then C=p
      repeat this “shifting” with q

      resulting slots in the three cases: (p A B), (A p B), (A B p)

  – questions to consider:
    • what happens on the first pass, before A, B, and C are defined?
    • what happens with ties?
    • should A, B, and C represent distinct games, or could 2 of them come from the same game?
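The three-slot update above can be sketched in Python. This is a minimal sketch rather than the lecture's own code: the function name `top_three`, the use of `None` for empty slots, and the decisions to keep ties as separate entries and to let both scores of one game occupy slots are assumptions made for illustration. Each is just one possible answer to the questions listed above.

```python
def top_three(games):
    """Return the three highest scores seen in a stream of (p, q) game scores.

    Hypothetical helper: empty slots are None, ties are kept as separate
    entries, and both scores of one game may appear in the result.
    """
    a = b = c = None  # invariant: a >= b >= c >= every other score seen so far

    def beats(score, slot):
        # An empty slot is beaten by any score.
        return slot is None or score > slot

    for p, q in games:
        for score in (p, q):  # apply the same "shifting" step to p, then q
            if beats(score, a):
                a, b, c = score, a, b
            elif beats(score, b):
                b, c = score, b
            elif beats(score, c):
                c = score
    return a, b, c


# UCLA 92 Stanford 80, OklaSt 55 Iowa 61, Indiana 83 MichSt 82
games = [(92, 80), (55, 61), (83, 82)]
print(top_three(games))  # (92, 83, 82)
```

Only three values are stored no matter how long the stream is, which is the point of choosing this representation over saving and sorting every score.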
Spell-checking
• given a document as a list of words, w_i, identify misspelled words and suggest corrections
  – simple approach: use a dictionary
    • for each w_i, scan the dictionary in sorted order (cost: doc size N x dict size D)
  – can you do it faster?
    • suppose we sort both lists
    • sorting algs usually take N log2 N time
      – example: if doc has ~10,000 words, sort in ~133,000 steps (10,000 x log2 10,000)
      – assume you can call a sort sub-routine (reuse of code)
      – note: you will learn about different sorting algorithms (and related data structures like trees and hash tables) and analyze their computational efficiency in CSCE 211
    • can scan both lists in parallel (takes about N + D steps, roughly D when the dictionary is much larger than the document)
      – D + N log N < ND

Words in a document like the US Declaration of Independence:
    abdicated, abolish, abolishing, absolute, absolved, abuses, accommodation, accordingly, accustomed, acquiesce, act, acts, administration, affected, after, against, ages, all, allegiance, alliances, alone, ...

Words in the English Dictionary (note that this list is “denser”):
    ..., achieve, achromatic, acid, acidic, acknowledge, acorn, acoustic, acquaint, acquaintance, acquiesce, acquiescent, acquire, acquisition, acquisitive, acquit, acquittal, acquitting, acre, acreage, acrid, acrimonious, ...

• a harder problem: suggesting spelling corrections
  – requires defining “closest match”
    • fewest different letters? same length?
  – occupashun → occupation
  – occurence → occurrence

      occupashun        occupashun        occupashun
      ||||||***|  d=3   ||||******  d=6   |||***|**|  d=5
      occupation        occurrence        occl-usion

• “edit distance”:
  – formal defn: # diff letters + # gap spaces
  – minimal dist over all possible gap placements, calculated by Dynamic Programming
• how to efficiently find all words in the dictionary with minimal distance?
• does context matter?
  – used as noun or verb (affect vs. effect)
  – compliment vs. complement
  – principle vs. principal

Summary about Computational Thinking
• CT is about transforming (often ill-defined) activities into concrete, well-defined procedures.
  – a finite sequence of steps anybody could follow, with well-defined decision criteria and termination conditions
• take-home message: the following components are important to computational thinking:
  1. decomposition
  2. identifying patterns and repetition
  3. abstraction and generalization
  4. choosing a representation for the data
  5. defining all decision criteria
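As a worked illustration of choosing a representation and defining every decision, the “edit distance” from the spell-checking slides can be sketched in Python. This is a sketch under stated assumptions, not the course's implementation: it charges 1 for each differing letter and 1 for each gap space (the definition given on the slide), and the names `edit_distance` and `suggest` are hypothetical. The `suggest` helper is a brute-force scan of the dictionary, so it does not answer the slide's question about doing this efficiently.

```python
def edit_distance(s, t):
    """Minimum of (# differing letters + # gap spaces) over all alignments of s and t."""
    prev = list(range(len(t) + 1))  # aligning "" with t[:j] needs j gaps
    for i, cs in enumerate(s, start=1):
        curr = [i]  # aligning s[:i] with "" needs i gaps
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # cs is aligned with a gap
                curr[j - 1] + 1,           # ct is aligned with a gap
                prev[j - 1] + (cs != ct),  # cs aligned with ct: cost 1 only if they differ
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary):
    """Return the words in a non-empty dictionary closest to word (hypothetical, brute force)."""
    best = min(edit_distance(word, w) for w in dictionary)
    return [w for w in dictionary if edit_distance(word, w) == best]


print(edit_distance("occupashun", "occupation"))  # 3
print(edit_distance("occurence", "occurrence"))   # 1
print(suggest("occurence", ["occupation", "occurrence", "occlusion"]))  # ['occurrence']
```

The table is filled one row at a time, so each cell reuses answers to smaller sub-problems, which is what makes this a dynamic-programming solution.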