notes - Computer Science

CS 101 – Jan. 9
• What is this class about?
  – Using the computer to solve problems
  – How to make computers useful for people
  – Understanding what goes on inside (and outside) the machine
• Commitment
  – Find 5 software companies. What are they best known for? Why is their product useful or popular?

What is a computer?
[Diagram: Memory (holding files) and CPU (+ * – / % √ Σ)]
• Machine that stores and processes information
• With memory, machine can be programmed. Memory stores software and other files.
• Information inside memory is all in binary!
• CPU’s job is to obey instructions from a program.

What is CS?
• The study of how we…
  – Solve problems
  – Classify problems; insight from solutions
  – Represent information, languages
  – Design better computing systems
  – What is knowledge/information? Comes in various types

Software functions
What does software need to do?
• Input or capture information
• Store information for future use
• Process: manipulate information and present it in a meaningful form
• Output, i.e. allow retrieval of info
• Communicate / transmit info to another location
• Observe that information can be input to and output from a computer program.

What we need
• Computer: an electronic device that can process and store information
• Communication network
• Know-how: the skills needed to make best use of the computer system
• In problem solving, we’d like:
  – Solutions / results useful for people
  – Find systematic ways of solving problem, so that the method or solution can be adapted to new problems
  – Speed
  – Consistency and reliability
  – Precision

Human history
• Agricultural age (up to ~ 1800)
• Industrial age (1800 – 1950)
• Information age (1950 – present)
  – We live in an information-based society. In other words, many tasks are defined in terms of doing something with information.
  – People and businesses depend on IT to get things done.
  – Not all countries follow this timeline.

Computer History
• Analog machines
  – Abacus
  – Mechanical calculators, adding machines, cash registers
  – Babbage suggested a programmable machine
  – Hollerith adapted Jacquard’s punch cards
• Digital machines
  – ENIAC, ABC, Mark I, Colossus
  – Became commercially successful in 1950s
  – Became increasingly affordable by 1980s
• Innovations often respond to needs

CS 101 – Jan. 11
• Continue overview
  – Computer origins
  – Algorithms
  – Abstraction

Before 1940s
• Only analog machines, moving parts
• Specific purpose
  – Office machines: typewriter, cash register, adding machine
  – Herman Hollerith’s tabulator
  – Sunrise/sunset, celestial navigation
• General computing only theoretical interest
  – Alan Turing

1940s
• Code-breaking machines in WW 2
• First general purpose electronic computers
  – ENIAC, U. of Pennsylvania
  – ABC, Iowa State
  – Z3, Konrad Zuse in Germany
• Transistor (1947) to have impact later
• von Neumann concept forms basis of computer organization
[Photo: US Army]

1950s & 1960s
• Commercially produced computers
  – UNIVAC
  – IBM
• Programming languages developed to facilitate commands to the machine
• Colleges begin to teach computing
• Large and expensive, gradually adopted in industry
• Moore’s Law

1970s & 1980s
• Microprocessor (1971) puts a whole CPU on one integrated circuit, allowing computers to become much smaller
  – Intel chips 4004, 8008, 8086, 80286, etc.
• Personal (home) computing
  – Applications for non-specialists: word processing, personal finance, games
• Intense competition in HW and SW
• Internet initially only used in large companies, universities: low bandwidth

1990s & 2000s
• Computer for communication and mass medium
• Internet as a virtual library & soapbox
• Tech companies (Apple, Microsoft, Intel, Nokia,…) mature and gain clout
• Growing need to manage information

HW & SW
• Hardware – physical computer components
  – CPU, memory, I/O devices, network
• Software – programs that run on machine
  – Allows computer to do useful work (or play)
  – Tell the computer exactly what to do
  – Behind any program is its algorithm
    • The secret to how it really works
    • Clearly defined list of steps to solve a problem
    • Needs to be precise, and spell out details
• Analogy:
  – a restaurant building, versus the actual restaurant

Algorithm example
• Euclidean algorithm: Given two numbers, find their greatest common divisor….
  1. Let m = larger, and n = smaller number
  2. Let r = remainder after dividing m by n
  3. If r = 0, then our answer is n, and we’re done. But if r ≠ 0, let m = n, and let n = r, and go to step 2.
• Try it out… Do you understand the steps? Does the procedure work?
• BTW, an algorithm should also clearly specify its input and output.

CS 101 – Jan. 13
• Most essential skill: problem solving using the computer
  – Telling the machine exactly what we want it to do.
  – Also: making sure result of software is packaged in a way that ordinary people will understand.
• Problem solving procedure
• What does a solution look like?
• Learn by example, and generalize.

Review: Algorithm example
• Euclidean algorithm: Given two numbers, find their greatest common divisor….
  1. Let m = larger, and n = smaller number
  2. Let r = remainder after dividing m by n
  3. If r = 0, then our answer is n, and we’re done. But if r ≠ 0, let m = n, and let n = r, and go to step 2.
• Try it out… Do you understand the steps? Does the procedure work?
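
The three steps of the Euclidean algorithm can be traced in a short Python sketch. The function name gcd_steps is chosen here for illustration and is not from the lecture; the sketch assumes its input is two positive whole numbers, and its output is their greatest common divisor.

```python
def gcd_steps(a, b):
    """Greatest common divisor of two positive integers (illustrative name)."""
    # Step 1: let m be the larger number and n the smaller.
    m, n = max(a, b), min(a, b)
    while True:
        # Step 2: r is the remainder after dividing m by n.
        r = m % n
        # Step 3: if r is 0, n is the answer; otherwise shift and repeat step 2.
        if r == 0:
            return n
        m, n = n, r


# Trace: 48 % 18 = 12, then 18 % 12 = 6, then 12 % 6 = 0, so the answer is 6.
print(gcd_steps(48, 18))
```

Trying a pair with no common factor, such as 17 and 5, should give 1.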
• BTW, an algorithm should also clearly specify its input and output.

Abstraction
• Our way to manage complex problems
• Big picture first, then the details
  – Details omitted until they become important
  – “top-down” design
  – Ex. Road map
• We can study a machine without knowing how to build one

Why software?
• Want computational power
  – To have direct control of machine
  – Sometimes, existing software is not sufficient, doesn’t give what you want
  – Programs can be useful or fun for people to use (e.g. game, converting data to image, …)
• Need to use a computer language
  – E.g. Javascript, PHP, Python, C++, etc.
  – Machine independent
  – Many common calculations are pre-defined, such as sorting, opening files, surfing the Web, creating a form button, etc.

Program
• One specimen of software is called a computer program
• Small or large, purpose is to solve 1 problem.
• Works like a recipe
  – List of necessary ingredients
  – List of instructions for CPU to obey.
• A simple program normally has 3 phases.
  – Input
  – Calculations
  – Output

Recipes
• Cooking may be a good analogy, because it solves the problem “I’m hungry”
• What do we see in recipes? Here’s one:
  – Brown the beef 15 min. Drain grease.
  – Dice carrot, celery, onion (aka mirepoix)
  – Cut up and boil 6 potatoes until soft.
  – Mash potatoes
  – Add flour, spices, sauce, mirepoix to beef.
  – Put meat mixture into casserole; top with potatoes.
  – Bake in oven at 400 for 30 minutes.

Recipes (2)
• A computer program has some of the same elements as a recipe.
• In recipes, we see:
  – Ingredients (the “nouns” of the problem)
  – Steps to perform (the “verbs”)
  – In some steps, we continue/wait for something
  – Sometimes we check things
    • Are potatoes fully mashed?
    • Should I add more _____ to the mixture?

Recipes inside recipes
• But we don’t eat the same stuff every day. Once we know a few recipes, we can put together a menu for choices.
    if (have all ingredients), make Shepherd’s pie.
    if (no potatoes), just make soup instead.
    if (no veggies), make hamburger.
    if (no beef), make pasta.
• When you view a whole menu as a program, then “making soup” becomes a sub-program.
  – A large program is composed of several parts.
  – In industry, sometimes each part is implemented by different people, like a kitchen having many chefs.

Problem-solving
  1. Understand problem; inputs and outputs
  2. Write solution in English “pseudo-code”
  3. Write code in a programming language
  4. Compile
  5. Run and test
• When program works, can refine or generalize.

CS 101 – Jan. 18
• What is a program? √
• Problem solving procedure √
• Step #2 is most important: write solution (algorithm) in English
• Structure of solution:
  – Sequence of steps (1, 2, 3, …)
  – Sometimes make a choice
  – Sometimes need to repeat
• Examples
• Commitment:
  – Look over practice problems

Problems
• The earliest problems given to a computer were mathematical.
• Sometimes there is no clean formula
  – Many equations can’t be solved analytically. For example: try cos(x) = x. Need to solve numerically.
  – Ex. Heat equation is a partial differential equation (PDE). Most PDEs have to be solved numerically.
  – Ex. Calculating a square root.
• Even if there is a clean formula, a computer can help automate the calculations.

Problems (2)
• What kinds of problems do we solve?
  – Finding directions
  – Predicting trends (weather, finance)
  – Games
  – Record keeping in a business
  – Networking and communication
  – Multimedia (e.g. image processing, animation and graphics)
  – Compressing and encrypting data
  – Searching for something (spell check)
  – …

Algorithm
• A clear sequence of steps to arrive at a solution to a problem. Must specify:
  – Ingredients
  – Input, output, variables and operations used
  – The order in which steps are taken
  – Means that anybody should be able to follow your directions. There is no ambiguity.
• Ideally, each step should perform 1 calculation:
  – Input or output of 1 value
  – 1 calculation, or 1 decision to make
  – Calculations usually limited to basic math
  – In your algorithm, tedious details can be put off until later. Explain big picture first.

Examples
• Discuss in general how you would solve these problems:
  – Print the numbers from 1 to 100.
  – In this list (3, 2, 7, 5, 4) where is the number 5?
  – Which room contains my umbrella?
  – Have 2 people play Tic-Tac-Toe.
• Let’s practice fully with these problems:
  – Ask the user to enter 2 numbers, and output their sum.
  – Determine a person’s weekly wage, given the number of hours worked and hourly rate. Don’t forget to figure in overtime.
  – Add the numbers from 1 to 5.

Solutions
• Algorithm to add 2 numbers
  – Ask the user to enter 2 values.
  – Obtain the input, and call the values a and b.
  – Create a new variable sum and set it to: sum = a + b.
  – Output sum.
• Weekly wage
  – Get hours and rate from the user.
  – Set the wage as follows:
    • If (hours > 40), use overtime formula
    • Otherwise, use regular formula
  – Output wage

CS 101 – Jan. 20
• Example problems requiring a loop
  – “How many iterations?”
• Practice problems
  – See handout
  – Work in pairs to devise solution to 1 problem
  – We’ll discuss results
• Commitment:
  – Prepare more solutions to practice problems

Solutions (2)
• Add up the numbers from 1 to 5
  – No input!
  – Need 2 variables: sum and count
    • The count variable will go from 1 to 5, one at a time.
    • The sum will start at 0, and we continually add to the sum.
  – Sum = 0
  – Count = 1
  – For each value of count from 1 to 5:
    • Sum = sum + count
  – Output sum
• If we can add 1-5, we can just as easily add 1-1000!

Mystery
• What does this algorithm do?
  – No input.
  – Create two variables: sum and count.
  – Sum = 0
  – Count = 1
  – For each value of count from 1 to 20:
    • Introduce new variable called temp
    • Temp = count * count
    • Sum = sum + temp
  – Output sum

CS 101 – Jan. 23
• Continue working practice problems
• Commitment: Consider these problems
  – How would you find the largest/smallest number in a list?
  – How would you count the number of values that are positive?
  – Counting vowels in a word.

CS 101 – Jan. 25
• Consider these problems
  – Review searching
  – Finding largest/smallest
  – Counting things in a list
  – More practice problems

Search error
• What is wrong with this technique for searching for the value 3 in a list?
    list = [ 8, 5, 2, 8, 3, 6, 1, 9, 4 ]
    for each number in the list
        if that number is 3
            found = true
        else
            found = false
• How should we fix the mistake?

Largest / smallest
• How would you find the largest number in a list?
  – Assume first number is the largest.
  – For each of the other values, ask if it is larger than what we think the largest value is. If so, update the largest value.
  – Also keep track of the location of the largest value, in case that is also desired.
• Finding the smallest number is analogous
  – What should we change?

Pseudocode
    list = { 5, 7, 4, 2, 3, 8, 1 }
    largest = list[0]
    location = 0
    for count = 1 to 6 inclusive:
        if list[count] > largest:
            largest = list[count]
            location = count
    Output the largest and location in a sentence, e.g. “I found the value 8 at position 5.”

Counting
• To determine how many values in a list are positive…
• First, need a separate variable, numPositive, initialized to zero.
• Ask each number if it’s positive. If so, add 1 to numPositive. No “else” clause is needed.
    list = [ -6, 0, 8, 4, 1, -2 ]
    numPositive = 0
    for count = 0 to 5
        if list[count] > 0
            add 1 to numPositive
    Output the value numPositive in a sentence.

Vowels
• A word can be thought of as a list/array of characters.
• Need a numVowels variable, initialized to zero.
• For each character in the word, see if it’s a vowel (a, e, i, o or u). If so, add 1 to numVowels.
    word = "serendipity"
    length = the length of word
    numVowels = 0
    for each letter c in the word
        if c is a, e, i, o or u
            add 1 to numVowels
    Output numVowels in a sentence.

Programming note
• In Python and other languages, if you need to ask several questions inside an if-condition, you need to separate each one with and/or as appropriate.
• Each individual condition must be a complete question.
• Wrong:
    if (c == 'a' or 'e' or 'i' or 'o' or 'u'):      (Python accepts this, but the condition is always true)
    if (x > 10 and < 20):                           (syntax error)
• Correct:
    if (c == 'a' or c == 'e' or c == 'i' or c == 'o' or c == 'u'):
    if (x > 10 and x < 20):

Practice problems
• Given a list of numbers, how many are multiples of 5?
• Given a year, is it a leap year or not?
• Given a number, is it a prime number?
• Remaining practice problems from handout.

CS 101 – Jan. 27
• Finish practice problems: designing solutions in English
• Various ways of sorting
• Handout:
  – Getting to know Python

Examples
• Given a year, is it a leap year or not?
  – Julian definition
  – Gregorian definition
• Given a number, is it a prime number?
• Remaining practice problems from handout.
  – Double letter (repeated value in a list)
  – Recognizing a 4 digit number
• We know how to find the largest number in a list
  – What about finding the 2nd largest or 3rd largest…?

Sorting
• Much studied problem in computing – many ways to do it.
• Given a list, need to arrange it “in order”. Either ascending or descending.
• Some methods do better based on type of data or how the values are distributed.
• Enjoy this Web site demo of sorting methods: http://cg.scs.carleton.ca/~morin/misc/sortalg

Some Methods
• Selection sort: Find the largest value and swap it into first, find 2nd largest value and put it 2nd, etc.
• Bubble sort: Scan the list and see which consecutive values are out of order and swap them.
• Insertion sort: Place the next element in the correct place by shifting other ones over to make room.
• Merge sort: Split list in half until just 1-2 elements. Merge adjacent lists by collating them.
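
Two of the methods listed above are short enough to sketch in Python. The function names are made up for illustration; the selection sort follows the slide's wording (largest value first, so the result is descending), while the insertion sort is written in ascending order as an assumption.

```python
def selection_sort_descending(values):
    """Swap the largest value into first place, the 2nd largest into 2nd, etc."""
    items = list(values)  # work on a copy so the caller's list is unchanged
    for position in range(len(items) - 1):
        largest = position
        for i in range(position + 1, len(items)):
            if items[i] > items[largest]:
                largest = i
        items[position], items[largest] = items[largest], items[position]
    return items


def insertion_sort(values):
    """Place each next element correctly by shifting larger ones over."""
    items = list(values)
    for next_index in range(1, len(items)):
        current = items[next_index]
        i = next_index - 1
        while i >= 0 and items[i] > current:
            items[i + 1] = items[i]  # shift right to make room
            i -= 1
        items[i + 1] = current
    return items


numbers = [5, 7, 4, 2, 3, 8, 1]
print(selection_sort_descending(numbers))  # [8, 7, 5, 4, 3, 2, 1]
print(insertion_sort(numbers))             # [1, 2, 3, 4, 5, 7, 8]
```

Both use two nested loops, so the amount of work grows roughly with the square of the list length; merge sort avoids that by splitting and collating.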

Homework
• Due Monday, at the start of the lab period
• Write a short report. Identify 5 software companies. For each one:
  – What is the company best known for?
  – Why is their software popular? In other words, why do people find their programs useful?
  – Quantify how well the company is doing financially. For example, what is the company’s revenue averaged per employee?
  – Describe employment opportunities at the company (e.g. according to their Web site).
• Use highly reputable sources of information, and include all citations/references. Don’t plagiarize.
• Turn in a typed document; aim for about 1000 words total.

CS 101 – Jan. 30
• Sorting
  – Selection and Bubble √
  – Insertion: place next element in correct place by shifting over other ones to make space
  – Merge: Repeatedly split list in half until 1-2 elements each. Then merge adjacent lists by collating them.
• Computer languages
• 3 kinds of software errors

Language evolution
• Machine language
• Assembly language
  – Like machine language, also unique to each manufacturer
• High-level language
  – FORTRAN, COBOL
  – Pascal, Algol, Ada
  – C, C++, C#
  – Java, Javascript, Python
  – many more

Example
• How would we calculate: 1^2 + 2^2 + 3^2 + … + 20^2 ?
• Let’s create our own solution, and see what the “code” looks like in different types of languages:
  – Machine language
  – Assembly language
  – High-level language

Machine language
    00003000: 00000014
    00004000: 200c0001
    00004004: 20080000
    00004008: 3c0a0000
    0000400c: 354a3000
    00004010: 8d4a0000
    00004014: 018a4822
    00004018: 1d200005
    0000401c: 018c0018
    00004020: 00005812
    00004024: 010b4020
    00004028: 218c0001
    0000402c: 08001005
    00004030: 2008000a
    00004034: 0000000c
(help me!)

Assembly language
    numValue: .word 20
    __start:  addi $12, $0, 1
              addi $8, $0, 0
              lui  $10, 0
              ori  $10, $10, 0x3000
              lw   $10, 0($10)
    while:    sub  $9, $12, $10
              bgtz $9, end
              mult $12, $12
              mflo $11
              add  $8, $8, $11
              addi $12, $12, 1
              j    while
    end:      addi $8, $0, 10
              syscall

HLL (Pascal)
    var
      sum : integer;
      count : integer;
    begin
      sum := 0;
      for count := 1 to 20 do
        sum := sum + count * count;
      writeln(sum);
    end.

Bugs
• Any mistake made in a computer program
• Term ‘bug’ popularized by Grace Hopper, late 1940s
• 3 kinds of bugs
  – Syntax errors
  – Run-time errors
  – Logical errors
• Beyond bugs, program can be just slow!
  – Ex. Inefficient ways of finding the divisors of some number.

CS 101 – Feb. 1
• Practice the problem-solving procedure
• Let’s write these programs:
  – Example: time zone program
  – Example: a guessing game
• Handouts:
  – Source code for the examples
  – Review for test 1

CS 101 – Feb. 6
• Secret communication
  – Steganography
  – Cryptography
• Goal
  – Sender has message intended only for recipient. How to protect from detection or eavesdropping?
• Definitions
• Examples
• No lab but please read:
  – Chapter 1 in The Code Book

Steganography
• Secret message is hidden inside larger document
• How?
• Microdotting – requires special equipment
  – Having this equipment makes you target of suspicion
• Sprinkle the message into much larger area
  – Ex. Add a letter to every 7th page of a book
  – Slightly modify pixels of an image

Cryptography
• Modify the message so that if it’s discovered, only the recipient can understand it.
• A “cipher system” consists of the following:
• Plaintext = the message you want to send/conceal
• Ciphertext = what actually gets sent
• Encryption and decryption functions
  – Each function requires a key
  – To encrypt a message, sender uses the encryption function: inputs are the encryption key and plaintext message.
  – Receiver decrypts the message using the decryption function: inputs are the decryption key & ciphertext.
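
To make the vocabulary concrete, here is a toy cipher system in Python. The choice of encryption function (shift each capital letter by the key, wrapping around the alphabet) is an assumption made purely for illustration, as are the names encrypt and decrypt; the slide does not prescribe any particular function.

```python
def encrypt(key, plaintext):
    """Encryption function: inputs are the encryption key and the plaintext."""
    ciphertext = ""
    for ch in plaintext:
        if "A" <= ch <= "Z":
            # Assumed toy rule: shift the letter by `key`, wrapping past Z.
            ciphertext += chr((ord(ch) - ord("A") + key) % 26 + ord("A"))
        else:
            ciphertext += ch  # leave spaces and punctuation alone
    return ciphertext


def decrypt(key, ciphertext):
    """Decryption function: with this toy rule it undoes the same shift."""
    return encrypt(-key, ciphertext)


secret = encrypt(3, "ATTACK AT DAWN")
print(secret)              # DWWDFN DW GDZQ
print(decrypt(3, secret))  # ATTACK AT DAWN
```

In this sketch the encryption and decryption keys are the same number; whether they should be is one of the design questions a cipher system has to answer.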

Cryptography (2)
• Cryptography is the science of developing good cipher systems
  – What encryption and decryption functions should we use?
  – What are good key values?
  – Should the keys be the same or different?
• Cryptanalysis = trying to figure out plaintext from the ciphertext, when you don’t have the key
  – Who would want to do that?

ASCII code
• To manipulate text, we need to treat each letter in a message like a number.
• “American Standard Code for Information Interchange”
• The letters A – Z are assigned the consecutive values 65 thru 90. Examples:
  – ‘A’ + 10 = ‘K’
  – ‘K’ – 10 = ‘A’
• ASCII codes are also assigned for lowercase letters and all other symbols you can type

Caesar cipher
• The key is a secret number from 1-25. This number gets added to each letter.
• Julius Caesar liked the number 3
• How do you decrypt the message?
• What happens if we add 3 to X, Y or Z?
• Cryptanalysis: How does one try to “break” the code?
    OAAW FTQ DAMEF UZ HUZQSMD

CS 101 – Feb. 8
• Simple methods of encryption
• Transposition
• Substitution
• Better substitution methods
• Important fact: the method you use might not be a secret, but the key must be!
• Please read chapter 2 in Code Book

Transposition
• Change the order of the letters in our plaintext message
• Easy approach: collect the odd # letters (1st, 3rd, 5th, 7th, …), then the even # letters (2nd, 4th, 6th, 8th, …).
  – In this case, we say that the period or key is 2.
• Can try a larger key, which means a longer period.
  – Ex. Key = 4 means collect the 1st, 5th, 9th, 13th, … followed by the 2nd, 6th, 10th, 14th, etc.
  – In other words: Write your message in rows 4 letters long. Add Z’s to end to make last row complete. The ciphertext is obtained by reading columns down!

Substitution
• Most cipher systems use substitution: instead of moving letters around, change each letter into a different symbol. We can get very creative!
• Caesar cipher
  – Easy to use, but only 26 possible keys (including a bad one)
• The general substitution cipher
  – Not restricted to merely shifting the alphabet
  – We have 26! keys instead of 26

Substitution (continued)
• Instead of adding a constant to each letter, scramble the assignment more randomly
• One problem is how to “remember” the key. The key is not a number, but the sequence of 26 letters.
  – Ex. Identify a key by the letters of some word or phrase. Cryptography, page 26: “we hope you enjoy this book” gives the key: wehopyunjtisbkacdfglmqrvxz
• Cryptanalysis?
  – Newspapers have “cryptogram” puzzles
  – Uncover a weakness of substitution method
  – Even worse if the message is long

Improvements
  1. Nulls: cipher includes symbols/numbers that mean nothing
  2. Misspell words: use unusual letters more often
  3. Homophonic cipher: common letters can be represented by 1 of several possible values
  4. Code words
     – Pure encoding means we encipher by words, not letters
     – Nomenclator: just encode certain words
  5. Polyalphabetic cipher: use multiple ciphers and alternate. Useful to hide double letters

CS 101 – Feb. 10
• Chapter 2
  – Pinprick method
  – Great Cipher of Louis XIV
  – Vigenère cipher
  – Book (Beale) cipher
• Some implementation details
  – Caesar cipher
  – List of lists for homophonic ciphers
• Homework due Monday
• Please read chapter 3 of Code Book

Early examples
• Pinprick method
  – Long history from Ancient Greece to Victorian England
  – Is this steganography or cryptography?
• Great Cipher of Louis XIV
  – Encode syllables
  – To thwart the cryptanalyst, also add cipher values for individual letters (even silent ones)
  – Backspace symbol to tell decrypter to delete previous symbol
• Homophonic cipher
  – E.g. Table on page 53

Vigenère cipher
• Example of polyalphabetic cipher
• Has the effect of using many Caesar ciphers simultaneously.
• The key is a word or phrase
• Cycle through the letters of the key, to tell you what should be added to the next plaintext letter
  – Repeat the key as often as needed.
  – Singh says subtract 1 from your answers. Piper does not. It doesn’t matter as long as you are consistent.
• Considered unbreakable until mid 19th century.

Breaking Vigenère
• Charles Babbage performed the cryptanalysis, to show this cipher is not perfectly secure
• Look for repetitions of short letter sequences
• How far apart are they?
• Determine length of the key
• Since Vigenère is several Caesar ciphers, do each pattern separately (1,6,11,16…) ; (2,7,12,17…); etc.
• Compare statistical distribution of letters. Shift it over until it matches A-Z: shift amount is the key.

Book cipher
• Reminiscent of homophonic cipher: each letter can be represented by 1 of several possible numbers
• The key is a long document, e.g. hundreds of words
• Number the words 1,2,3…
• Take note of first letter of each word
  – E.g. (1)take (2)note (3)of (4)first (5)letter (6)of (7)each (8)word. In this case, ‘o’ may be enciphered by 3 or 6.

Python coding
• See handout for Caesar cipher example
• In Python, we need to be able to:
  – Open files for reading and writing
  – Read entire file into a string variable
  – For each letter in the file, we need to perform arithmetic on it, before writing it to the output file
  – Obstacle: Need to convert string’s letter to a number so we can add/subtract; convert back to letter.
• To do homophonic encoding…
  – Key is a list of lists! For each letter A-Z, specify what numbers could represent it.

Homework for 2/13
• Select a sentence in the Code Book at least 10 words long. Encrypt it in 2 ways:
  1. Use a transposition cipher with a key of 5.
  2. Use a Vigenère cipher, with your first name as the key.
• Treat capital & lowercase letters the same.
• Carefully show how each letter of the ciphertext is derived.
• For each answer, show how you would decrypt it.
• Please type your homework, not handwritten.

CS 101 – Feb. 13
• Review Python example
  – File I/O, loop on characters, change letters
  – Running the Caesar cipher
• Messages sent by radio
• Review polyalphabetic ciphers (Vigenère)
  – One time pad improvement
• Please read Code Book, pp. 127-160 – covering Enigma

Radio technology
• No longer necessary to send message by
  – Line of sight
  – Messenger, mail
  – Physical wire
• Enemy can just as easily hear your message
  – Increased demand for cryptography by WW1
• 1 more piece of information: traffic analysis
  – Can identify radio operators by how they tap Morse code
  – Take note of direction and strength of signal
  – Over time, deduce movement of battalions…

Polyalphabetic
• Vigenère cipher was good
  – How does it work?
• Babbage’s cryptanalysis
  – We know that the key needs to repeat while enciphering.
  – Look for repeated substrings in ciphertext → key length
  – Once you know key length, you have n Caesar ciphers. Check the letter frequency of each to see how much the alphabet “shifted”
• But also possible to break even if key is very long & no substrings repeat
  – Example pp. 117-119

Polyalphabetic (continued)
• Observations
  – Let’s assume that the key is a real word or phrase.
  – Trial & error: find where “the” could be. See which locations yield possible words in key. “can” and “ypt” are plausible, but “bsj” is not.
  – When you find letters in the key that form part of a word, fill in the remaining letters of the word, to see if more of the plaintext can be deciphered. We know the plaintext has to contain real words. e.g. Egypt seems to work
  – Once you have a word of the key, see if it’s part of a logical pattern, such as a list of countries.

Lesson
• To use a Vigenère cipher effectively, the key:
  – Should be long, so that repeated letters in ciphertext are not obvious or do not occur
  – Should consist of random letters, because real words make it breakable, as we just saw. E.g. inserting “the” at various places in the plaintext should not give clues about the key
• The result is called a one-time pad
  – To get a really long key, you distribute a “code book” to your agents.
  – Japanese version: encode entire words. Instead of replacing with a random number, add to a random number

Random key helps
• Cryptanalyst is forced to try all possible keys to see which one generates meaningful plaintext
• But many different plaintexts are possible.
  – E.g. if message length is 21, then all possible messages of size 21 will be found. Impossible to tell which one is “correct” because key does not have any clues.
• 3 problems with one-time pad
  – How to distribute it to the field, maintain security
  – How to create the random key
  – Too difficult to use under extreme conditions. In some cases, on a negative acknowledgement, people resend message in plaintext in frustration or desperation.

CS 101 – Feb. 15
• Lab recap
• Improvements to Vigenère cipher
  – We want a long, random key
  – Doing it for Japanese
  – Automating the process: do it by machine (Enigma, Typex, SIGABA)
• Please finish chapter 4 in Code Book

String loop
• Let’s understand the loop we used in steganography
    for i in range(0, len(text)):
        if i % 100 == 0 and i // 100 < len(secretMessage):
            outFile.write(secretMessage[i // 100])
        else:
            outFile.write(text[i])
  (Whole-number division is written // so that the result can be used as an index.)
• Let’s interpret this in English
• For every 100 characters of “text” we write 1 letter of the secret message. Ex. What happens when i equals 700?

Japan’s code
• One-time pad (e.g. JN-25)
  – Dictionary table: convert each word to a 5-digit number
  – Additive table: add the next random number to each word
  – Preface the message by indicating where in additive table you are starting the encoding
  – Tables may be periodically changed.
  – Example: encryption code book.xlsx

Enigma
• Arthur Scherbius, 1918
• Used by Germany through WW2
• Commercial and military versions
• Served as basis for other cipher machines
• How to use
  – Set machine to today’s starting position
  – Has usual keyboard
  – As you type, display lights up the ciphertext letter, which you need to write down
  – As you type, scramblers rotate to next position, i.e. to next letter of the Vigenère “key”

Scrambler
• Disk containing wires that connect each letter on keyboard with different ciphertext letter for display board.
  – You type “A”, may be connected to “D”
  – But then the scrambler rotates, so the next “A” may be connected to a different letter (p. 129)
  – Effect: Vigenère cipher with random key of length 26
• Two scramblers
  – Effectively means we have two keys.
  – Plaintext + key1 + key2 = ciphertext
  – Like odometer, second scrambler rotates less often

Strength of cipher
• Enigma used 3 scramblers out of a possible 5
  – Length of key = 26 * 26 * 26
  – Number of scrambler choices = 5 * 4 * 3 = 60
  – (Similar American device “SIGABA” used 15 scramblers)
• Plugboard
  – 6 pairs of letters were swapped before entering scrambler. Ex. Change “A” to “B” before adding the cipher keys
  – This increases # of possible keys, making cryptanalysis more confusing
• “Code book” tells operator what daily machine settings are

CS 101 – Feb. 17
• Enigma: mechanized Vigenère
• Breaking the code
  – If you can mechanize enciphering, why not mechanize cryptanalysis?
  – Intelligence on how Germans used Enigma
  – The work of Marian Rejewski and Alan Turing
• Please read chapter 6 in Code Book thru p. 279

Breaking the code
• What skills are needed to be a cryptanalyst?
  – Prepare for the worst: The Germans might make your work in the future more difficult.
• Day key and message key
• Four messages today might begin like this
    L O K R G M
    M V T X Z E
    J K T M P E
    D V Y P Z X
• L and R are encryptions of the same letter, etc.
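
Because the message key was typed twice, the 1st and 4th ciphertext letters of each opening encrypt the same plaintext letter, and likewise the 2nd and 5th, and the 3rd and 6th. A small Python sketch, using the four sample openings above, collects those pairs; the variable names are illustrative only.

```python
openings = ["LOKRGM", "MVTXZE", "JKTMPE", "DVYPZX"]

# links[0] pairs 1st with 4th letters, links[1] 2nd with 5th, links[2] 3rd with 6th.
links = [dict(), dict(), dict()]
for opening in openings:
    for offset in range(3):
        links[offset][opening[offset]] = opening[offset + 3]

print(links[0])  # {'L': 'R', 'M': 'X', 'J': 'M', 'D': 'P'}
```

With enough intercepted messages from the same day, each of the three tables fills in for all 26 letters.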

Letter chains
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    F Q H P L W O G B M V R X U Y C Z I T N J E A S D K
• Look for patterns! Example: A → F → W → A
• All letters belong to some chain
• The length of the chain is a fingerprint of the scrambler settings for today.
• We separate out the problem of the letter identity
• Bombe: automatically try all 26^3 settings.
• Plugboard? Look for words that are almost spelled right.

German mistakes
• Repeating the message key
• Not allowing a scrambler to be in the same position (left, middle, right) two days in a row
• Plugboard cannot link 2 letters next to each other on keyboard
• Predictable structure of message, time of message

Alan Turing
• Bletchley Park contributions
  – Extending the work of Rejewski
  – Relied on fact that weather reports start with word “wetter” and used this to deduce initial scrambler setting
  – But what if you are off by a couple of letters? Fortunately, more help from German constraints.
• Turing Machine
  – He was interested in theoretical boundaries of what a computer could accomplish
  – Machine has: instructions, internal memory, tape
  – Turing proved that not all problems can be solved

CS 101 – Feb. 20
• New flavor of cryptography: numbers
  – Transpose bits
  – “add” bits
  – Data encryption standard
• Problem of sharing keys… solved!
  – Diffie-Hellman protocol
• Please finish reading chapter 6 in Code Book

Binary
• Computers represent all data, including text, in the form of binary numbers
• p. 246: ASCII codes for capital letters
• Cryptography can now be done at a minute level: manipulate individual bits
• Transposition cipher
  – E.g. Every 3rd bit swaps with bit to its right
  – It’s possible the last bit of a letter swaps with first bit of the next letter

Vigenère in binary
• If we adapt the Vigenère cipher to the binary representation of text, we get XOR cipher
• Example p. 247
• Line up the bits for the addition:
    0 + 0 = 0
    1 + 1 = 0
    0 + 1 = 1
    1 + 0 = 1
  – In other words, if the bits are the same, result is 0. If bits differ, result is 1. There is no carry.
• Data Encryption Standard
  – Performs many successive steps of transposition and XOR
  – Key length 56 bits, later increased.

Key problem
• One lingering problem since antiquity: how do 2 people share the key privately?
  – Need to coordinate in person periodically, or
  – Hire a trusted 3rd party to relay your keys
  – Expensive
• Whitfield Diffie’s vision
  – People will need privacy online
  – Handling keys in person only is impractical
  – Is there a technological solution?
  – Yes! Collaborated with Martin Hellman and Ralph Merkle at Stanford

Math idea
• Some functions are 2-way (easy to undo)
• We need a 1-way function (hard to undo) so that an eavesdropper cannot discover the key
  – The 1-way function they used was “mod” or remainder calculations.
• Example, f(n) = 3^n mod 19?
  – Given a value of n, calculating f(n) isn’t too hard.
  – But given a value of f(n), there is no obvious way to determine n except trial and error!
  – In practice, the “19” should be replaced by a much larger number.

Diffie - Hellman
• Method for 2 people to establish a private key
• Choose values p (prime) and q
• Sender
  – Chooses secret value a, and computes A = q^a mod p
  – Sends A, p, q
  – Eavesdropper cannot easily determine a
• Receiver
  – Chooses secret value b
  – Computes B = q^b mod p and K = A^b mod p
  – Sends B back to sender, who can compute K = B^a mod p
• Both methods of computing secret K are equivalent
  – A^b mod p = (q^a)^b mod p
  – B^a mod p = (q^b)^a mod p

CS 101 – Feb. 22
• Recap Diffie-Hellman protocol – what is it?
• How to calculate something like q^a mod p?
• Asymmetric cipher system: RSA
• Please read Piper: pp. 53–56, 70–71, all of Chapter 6

“mod” properties
• (In Python, the symbol for mod is %)
• x mod y is always less than y
• If x < y, the answer is x.
• If x is a multiple of y, the answer is 0.
• The 10th day of the year is in which month?
  – What about the 20th day, the 40th, the 100th ?
These are essentially mod calculations, except that the month lengths are not quite the same!
• What time is it, 75 hours after midnight?

Asymmetric ciphers
• Perennial problem in cryptography: key security
• One solution is Diffie-Hellman protocol
• Diffie thought of another approach
  – Use separate keys for encrypting and decrypting (this is why it’s called asymmetric)
  – Each person has a public encryption key, and a private decryption key
  – These keys don’t need to change often
  – No messages need to be passed to compute value of keys. It can be arbitrary & predetermined.

Challenge
• What kind of math formula can support asymmetric ciphers?
• Goal: we want secure decryption key, but publicly known encryption key.
• RSA system
  – The key value N is the product of two large prime numbers: N = pq
  – Given N, there is no known efficient algorithm for determining the values of p and q; in effect we are left with trial and error.
  – Ideally, N should be very large, e.g. hundreds of digits.
  – See Appendix J.

RSA
• Choose secret & distinct 512-bit random primes p and q
• Let N = pq, and let M = (p – 1)(q – 1)
• Choose (another) public encryption key e: a value less than and relatively prime to M.
  – Message is x. Sender transmits: y = x^e mod N
• Choose private decryption key d: where ed mod M = 1
  – e and N are public; outsider should have a tough time factoring N to obtain p and q to determine d
  – Recipient converts: z = y^d mod N which should equal x.

Example
Step                                                   Values
Choose secret primes p, q                              p = 31, q = 41
N = pq; M = (p – 1)(q – 1)                             N = 1271, M = 1200
Choose e < M and relatively prime to M                 e = 7
Message is x. Compute and send y = x^e mod N           x = 12, y = 12^7 mod 1271 = 1047
Pick private decryption key d where ed mod M = 1       d = 343
z = y^d mod N, which should equal x                    z = 1047^343 mod 1271 = 12
It works!

CS 101 – Feb.
24

Topics in cryptography
• “Perfect secrecy”
• Hash functions
• Practical considerations
• For Monday, please read Piper chapters 7 and 9

Perfect secrecy
• Purpose of cipher system is to protect you in case an eavesdropper finds your ciphertext
• But, even presence of the message itself may be enough information for the enemy
  – Eve will try to guess plaintext even if no ciphertext
  – Don’t be too predictable!

Advice
• Very short message (e.g. 1 char) can’t be broken with any certainty
• Try to have as many possible keys as possible messages. (key length >= message length)
• Cryptography is just 1 facet of overall security
  – If you notice someone is trying to guess your PIN or password, how can you tell? What should happen?

Hash functions
• Many applications in CS
  – An efficient way to store lots of data, for easy retrieval later
  – Quickly determine whether data has become corrupted
  – Provide password security
• It’s a 1-way function: used to encrypt passwords
  – When you enter your password, p, the system computes value of e(p)
  – Your correct password, c, is already stored in encrypted form e(c)
  – If e(p) == e(c), then you are “authenticated”

Hash functions (2)
• Why do it this way?
  – If password file is compromised, thief really has no information
  – If you lose your password, no one can retrieve it → very secure. In this case, a new password is generated
• One way to do it:
  – Multiply first letter value by 3
  – Add 2nd letter value. Multiply answer by 3.
  – Add 3rd letter value. Multiply answer by 3.
  – Continue… At each point, take mod by a large prime ~ 1 billion

Hash functions (3)
• Another purpose: provide digital fingerprint to a file
• Associate with each file a single numerical value, like a serial number
  – For example, the file size
  – There is a slim chance that 2 files will map to the same number.
• Quick way to determine if your files have been altered or damaged. Saves space.

Collisions
• When using a hash function, it’s possible for two items (e.g.
message/file) to have same hash value.
  – Two passwords might be treated the same, e(p1) versus e(p2).
• Issues:
  – Do you want to design a hash function that will handle all collisions? Or, is it acceptable to have a tiny probability of collision?
  – Techniques to handle collision, e.g. add 1

Practical considerations
• One-time pad & RSA are really good cipher systems
  – But only if used properly. 3 is not a good choice of a prime number.
• Better security requires higher cost
  – More computational time, more memory may be needed
• Questions to consider
  – Do you really need to encrypt it?
  – Is the information confidential forever, or might it become obsolete soon?
  – What is the cost of compromised data?

Breakability
• Cryptanalysis often relies on “exhaustive key search.” What does this mean?
• How does Moore’s Law relate?
• What can an attacker do to speed up a search?
• RSA & Diffie-Hellman rely on the presumed difficulty of certain number theory problems
  – (we have faith in them)
  – But will someone discover short cuts?
• We should be seeking out better 1-way functions just in case! Learn from history.

CS 101 – Feb.
27
• Uses of cryptography
  – It’s not just to thwart eavesdroppers
  – Data integrity
  – Authenticating people or data
  – Non-repudiation
  – Examples in chapter 9
• For Wednesday, please read last chapter (8) in Singh
• Lab today on RSA

Data integrity
• A hacker might not care what your message says
• Wants to modify / corrupt it
• Just encrypting your data won’t help
• Need to
  – Recognize genuine data, unchanged from the source
  – Detect unauthorized access or change (security)
• Preventing the attack might not be necessary
  – We have backups of the data

Attacker strategy
• Look for a long file/message to decrypt
• Cryptanalysis
• Once you discover the key, insert your own message in the original’s place
• How to make attacker’s job harder:
  – During the message, modify the key so that it depends on an earlier part of the message
  – In effect, you are sending several messages, each with different keys

Authentication
• “How do I know it’s from you?”
• 2 types: for “entities” and data
• Ways to authenticate users:
  – Something known (password)
  – Something possessed (physical key/token/file)
  – Intrinsic property (retina, static IP address)
• 1-way & 2-way authentication
  – When you set up your account, you can ask bank to always display some message when it asks for password
  – Fake login scams

Data integrity
• A sends message to B. (p. 95)
  – B wants to be assured data hasn’t been altered.
  – B wants to verify it really came from A, not someone else.
• Some encryption algorithm is used as the “authentication function,” with a key
• Both A & B share a secret key – very common in cryptography
  – Can be set up ahead of time or with Diffie-Hellman
• A computes message’s authentication value, and appends it to message
• B performs same computation for verification

Digital signature
• Data integrity with extra feature: proof of origin
  – Sender can’t later deny sending message!
  – In ordinary authentication, an imposter could have used Diffie-Hellman to pretend to be A.
• DS relies on sender’s public and private keys
  – The two keys are mathematically related
  – Sender must use private key to compute the message’s digital signature (analogous to authentication value)
  – Receiver uses sender’s public key to verify origin
  – See p. 98.
  – Why is the message “hashed”?

Attacker strategy
• Impersonating you: “Hey, I just changed my public key.”
  – Computes new public and private key, and claims they belong to you, so he really looks like you online
  – Can now perform digital signatures just like they came from you
• Response: 3rd party trust
  – Your public key needs to be verified by a Certification Authority, known to your Web browser and the receiver of your message.
  – E.g. Verisign

Case Study: ATM
• Features both privacy and data integrity
• Attacker’s plans
  – Impersonate you
  – Intercept communication to bank
  – Alter amount of money without bank knowing
  – Duplicate transaction without bank knowing

CS 101 – Feb. 29

Imagine the future of cryptography…
• What if there were no such thing?
  – i.e. encryption futile because every computer has capability to break the code
  – Quantum computer: test every possible key simultaneously
• Other extreme: What if there existed an unbreakable cipher system?
  – Quantum cryptography: securely establish a random key
• (This view of future depends on our current understanding of physics)

Intro notes
• Some research is done in secret
  – We may be unaware of state of the art
• Cat/mouse game
  – Cryptographers want better cipher system
  – Cryptanalysts want better computer
• There is more to security than just finding/breaking a secure message
  – Knowing message exists may be enough information
  – Knowing source or destination may be enough
  – Much traffic is poorly or not encrypted anyway
  – Cipher system software may be hacked (e.g. backdoor)

Quantum computer
• Similar to idea of parallel computing
• No one has built one yet, still a theory (or Sci Fi)
  – Charles Babbage had the same problem
• What does “quantum” mean?
  – When you are not looking at something, it may exist in different states simultaneously.
  – Sounds bizarre, but idea works well in physics already.
• HW trend: chips keep getting smaller
  – So, let subatomic particles represent basic units of information, the 0 and 1, depending on its spin.

continued
• 0 and 1 are different spins of a particle
• Hit a particle with burst of energy to change it
• Hit it with less energy: enters indeterminate state → enters both (i.e. 2) states!
• Hit n particles with low energy → can represent any number from 1 to 2^n.
• Ex. Which number from 1 to 128 has a certain property?
  – Create a number x consisting of 7 particles.
  – Bombard x with low energy. Meanwhile, tell x to test itself.
  – The correct x will print itself as the solution.

Challenges
• HW: We don’t even know if we can create a quantum computer
• SW: How to write a program for such a machine
  – Need to maintain indeterminate state for long enough for the computation to finish
  – Need to conceive of a new way to write programs
  – Some success (p. 330)
  – But we still can’t test it, since no machine exists
• If we overcome these problems, what do we have?
  – The means to try all keys simultaneously
  – Can quickly break anything

Quantum cryptography
• More realistic possibility
• Based on vibration or “polarization” of photons
• When we transmit light particles, we can set orientation of each, e.g. the 4 directions: | – \ /
  – a means of encoding messages
• Receiver detects orientation of photon using a Polaroid filter
  – We’d like a perfect filter: e.g. a vertical one accepting | photons and rejecting all other types.
  – No such luck! 50% of / and \ get through due to physical nature of light. These will look like | too: p. 333.

continued
• Suppose you set up a | filter.
  – If a photon passes through, you only know it is not – .
  – If a photon is blocked, you only know it is not | .
  – Same kind of problem with any other filter.
• In order to be sure, you have to know the orientation of the filter to use.
  – If you know a photon is either | or –, you can use a | filter.
  – If you know it’s either \ or /, then you can use a / filter.
  – Sender may send | – \ or /: choice of orientation may change on each photon.
  – List of orientations (rectilinear or diagonal) becomes key!

Message
• To send a message:
  – Convert it to binary, e.g. 101100110011 …
  – For each bit, also select photon orientation (|– , or \/)
  – So, each bit of your message is one of 4 possible values
  – In effect, the message key is as long as the message.
  – See pp. 340, 342, 343.
• Eavesdropper (Eve) has to guess at orientation of each photon.
  – Often wrong, e.g. passing thru | could mean | or \ or /.
• Alice & Bob need to establish the key securely
  – Follow example, and try card analogy (pp. 344-345)

Making the key
Alice wants to send                                   101100110011
Arbitrarily chooses directions                        ++X+XXX+X++X
Therefore she transmits                               |-/|\\/|\-|/
Bob guesses directions                                +X++XX++X+XX
Bob observes thru filter                              100100110001
(When was Bob lucky?)                                 _yn___y___n_
After sharing orientations, which bits do we keep?    1--100-100-1
• Eve doesn’t know which bits to discard.
• Key will be random, so can use it in a 1-time pad
• Method has been shown to work for short distances.

CS 101 – March 12
Start of unit on representing information
• Bits and bytes
• Numbers:
  – interpreting: binary → decimal
  – encoding: decimal → binary
• If time permits, shorthand ways of writing binary

Bits
• All information in computer is in binary form (0/1)
• Smallest unit of information is the bit: a single 0 or 1
• When we have lots of bits, usually grouped in set of 8 called a byte
• Basic building block of CPU is the logic gate, which manipulates bit values.
  – Very fast
  – Logic gates combine to perform math operations

Numbers in binary
• Place value system just like decimal
  – We understand 278 = (2 * 100) + (7 * 10) + (8 * 1)
• In a binary number:
  – Each digit is either a 0 or 1
  – Digits are multiplied by powers of 2, not powers of 10.
• For example, 001110 and 100011:

  32*  16*  8*  4*  2*  1*   Value
   0    0   1   1   1   0     14
   1    0   0   0   1   1     35

Powers of 2
• 2^0 = 1
• 2^1 = 2
• 2^2 = 4
• 2^3 = 8
• 2^4 = 16 …
• 2^10 ~ 1 thousand
• 2^20 ~ 1 million
• 2^30 ~ 1 billion
• Let’s say we have 4 bits.
  – What is the lowest # ?
  – What is the highest # ?
• What if we had 5 bits?
• Is there a pattern?

Decimal → binary
• One thing to note is that binary numbers are “longer” than decimal.
  – A 5-digit decimal number may turn out to be 15 bits long.
• My technique is the “binary store”
  – All merchandise is priced $1, $2, $4, $8, $16, …
  – You enter store with some money, say $45.
  – Goal is to always buy most expensive gift possible.
  – So, 45 = 32 + 8 + 4 + 1

  32*  16*  8*  4*  2*  1*
   1    0   1   1   0   1

Another example
• Convert 61 to binary:
• Go to binary store with $61…
  61 = 32 + 16 + 8 + 4 + 1
  Another way to write this is: 61 = 2^5 + 2^4 + 2^3 + 2^2 + 2^0
• Our binary answer is 111101.

CS 101 – March 14
Data representation
• Shorthand of writing binary: hexadecimal notation
• Text – ASCII and Unicode
• Sound
• Images

Numbers in a byte
• A byte is 8 bits
• So, how big can an 8-bit binary number be?
• Hexadecimal shorthand
  – 8/4 = 2 hexadecimal digits per byte
  – What do the letters ‘a’ – ‘f’ mean? a = 10, b = 11, c = 12, d = 13, e = 14, f = 15
  – Example: 01011110 (base 2) = 5e in hex.
  – Try this one: 111000 (base 2) = ______ in hex.
  – Try this one: a4c in hex = ________ in binary.

Text
• Fundamental unit is the character.
• Each character of a text document is given a numerical code.
• ASCII code
  – Contiguous (make it easy to alphabetize)
  – Case sensitive
  – One byte per character
• ASCII table
  – ‘A’ = 65   ‘a’ = 97   ‘0’ = 48
  – Try encoding the word: “Dog”

Unicode
• To support foreign alphabets and misc. symbols.
• Extension of ASCII
• 16 bits per character, rather than 8
• unicode.org has code charts
• Codes are given in hex.

CS 101 – March 16
• Sampling
  – Frame rate in a video
  – Sound representation
• Images
  – General concepts
  – Grayscale (black & white)
  – Color

Sampling
• “Real” sound and visual data are continuous, constantly changing
• Sampling means to take rapid snapshots
• Video: 30 images a second is good enough for our eyes
• Real sound is in the form of a wave
• Sampling sound means finding points along the curve.
  – Music CD: take a reading 44,100 times a second, and store as a 16-bit number… How much data is captured in 1 hour?
  – MIDI (= Musical Instrument Digital Interface) uses far less space, though does not sound like an actual recording.

Images
• Fundamental unit is the pixel
• Usually 8 bits (1 byte) per pixel
  – This means each pixel is assigned a value from 0 to 255
  – What do these numbers mean? Depends on color system
  – Grayscale = system for B/W images
• Image dimensions are (horiz x vert)
  – Ex. 400 x 300 → 120,000 pixels
• Aspect ratio
  – When changing size, this should not change.

Resolution
• Resolution: total number of pixels in image
  – “hi res” takes up more space
  – “lo res” means pixels become more obvious, pixelated

Dynamic range
• Dynamic range: how many colors / how many shades of gray
  – High dynamic range: more bits per pixel
  – Low dynamic range: may obscure features

B/W vs. color
• B/W: usually 1 byte per pixel
  – Each pixel = grayscale number 0-255
  – Ex. 180 is a brighter shade of gray
• Color: usually 3 bytes (24 bits) per pixel
  – Each pixel has 3 values, each 0-255
  – Ex. (200, 50, 128) = ?
  – Most common scheme is RGB, where each pixel has a red #, green #, and blue #.
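
The storage arithmetic raised on the sampling and image slides can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function names `audio_bytes` and `image_bytes` are made up, and the audio figure covers a single channel unless `channels` is set (an actual music CD records two channels, stereo).

```python
def audio_bytes(seconds, samples_per_sec=44_100, bits_per_sample=16, channels=1):
    """Uncompressed size, in bytes, of sampled sound."""
    return seconds * samples_per_sec * (bits_per_sample // 8) * channels


def image_bytes(width, height, bytes_per_pixel):
    """Uncompressed size, in bytes, of a bitmap image."""
    return width * height * bytes_per_pixel


one_hour = 60 * 60                          # seconds
mono = audio_bytes(one_hour)                # one channel of CD-rate samples
stereo = audio_bytes(one_hour, channels=2)  # two channels, as on a music CD
gray = image_bytes(400, 300, 1)             # grayscale: 1 byte per pixel
rgb = image_bytes(400, 300, 3)              # color: 3 bytes per pixel

print(f"1 hour of sound, mono:   {mono:,} bytes")
print(f"1 hour of sound, stereo: {stereo:,} bytes")
print(f"400 x 300 grayscale:     {gray:,} bytes")
print(f"400 x 300 color:         {rgb:,} bytes")
```

Changing the arguments (for example, a lower sampling rate or a larger image) shows how quickly raw sound and image data grow, which is the motivation for compression.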
RGB system
• Based on primary colors for light
• (red, green, blue)
• Examples
  – Black = (0, 0, 0)
  – Purple = (75, 0, 100)
  – White = (255, 255, 255)
• How about (x, x, x) or (0, 0, x) ?

RGB examples
Color      R     G     B
Black      0     0     0
White      255   255   255
Red        255   0     0
Green      0     255   0
Blue       0     0     255
Cyan       0     255   255
Magenta    255   0     255
Yellow     255   255   0
Q: How do we get “other” shades of blue?

Indexed color
• Do we really need 24 bits to represent color of one pixel?
  – This means we allocate 16,777,216 colors!
  – About 200 would be more practical
• Indexed color is a “compressed” RGB
  – 6 values of each primary color, not 256
  – Use hex values 00, 33, 66, 99, cc, ff
  – This is the color system used on the Web.
• 1 byte per pixel instead of 3
• Use “dithering” to simulate in-between colors.

CS 101 – March 19
• Review indexed color
  – Restricted values of each primary color
  – Dithering
• Adding binary numbers
• Integer rep’n in general
  – Unsigned
  – Signed
  – Biased (the most important of the 3)

Binary addition
• Analogous to decimal addition you know
• Only a few cases to consider – just watch out for carry.
  – 0 + 0 = 0 (no carry)
  – 0 + 1 = 1 (no carry)
  – 1 + 1 = 10 (sum = 0, carry = 1)
  – 1 + 1 + 1 = 11 (sum = 1, carry = 1)
• Example, 6-bit addition: 001110 + 001100
  – Can check our answer in base 10
• Overflow: correct answer is beyond possible range

Integer rep’n
• How do we represent integers inside the computer?
  – Scheme I: unsigned
  – Scheme II: signed (a.k.a. Two’s complement)
  – Scheme III: biased (a.k.a. Excess notation)
• Scheme I: Unsigned
  – This is the scheme you already know.
  – Cannot handle negative numbers.
  – For n bits, possible range is 0 to 2^n – 1.
• Scheme II: Signed
  – Basic idea: half of the representations should be negative.
  – Ex. For 5 bits, 16 of the 32 values are negative, so the range goes from –16 to +15.
  – For n bits, possible range is –2^(n–1) to 2^(n–1) – 1.

Signed rep’n continued
• How do we represent a number in signed?
• If positive, same as unsigned.
  – Ex.
6-bit signed rep’n of 13 is 001101.
  – Ex. 6-bit signed rep’n of 31 is 011111. (the largest #)
• If negative: 3 steps to represent –x:
  1. Find rep’n of +x.
  2. Invert the bits.
  3. Add 1.
• Try some examples of negative numbers, and check answers.

CS 101 – March 21
• Bits have no inherent meaning
• To represent information, need to know:
  – Representation scheme + number of bits
• Integer representation
  – Unsigned √
  – Signed (review)
  – Biased

Closer look…
• In 5-bit unsigned…
  – Smallest number is 00000 (= 0)
  – Largest number is 11111 (= 31)
• In 5-bit signed…
  – Smallest number is 10000 (= –16)
  – Largest number is 01111 (= 15)
• Given a bit pattern, its signed and unsigned values differ by how much?
  – Try some examples.
• In signed:
  – Leftmost bit is the sign bit.
  – Positive #’s have same rep’n as unsigned.
  – Technique for –x doesn’t work for lowest number. Special case.

Signed + and –
• Signed + is like unsigned.
• Watch out for overflow.
  – The correct mathematical result can’t be represented.
  – (Pos) + (Pos) = (Neg)
  – (Neg) + (Neg) = (Pos)
  – Example: 01111 + 00001.
• To subtract, add the opposite. Example: 10111 – 00111
  – First, –(00111) = 11001
  – Turn into addition problem: 10111 + 11001 = __________
  – Is there overflow?

Scheme III: biased
• Another way to represent integers that allows for negatives.
• (It will soon help us see how real numbers are stored.)
• The “bias” is the number we subtract from unsigned range.
  – If B is the bias, the lowest number is –B.
• When working with a biased rep’n, you have to be given the bias.
  – Ex. For 6-bits, bias is typically 31 or 32.
  – Ex. For 8-bits, bias is typically 127 or 128.
• So, a “6 bit biased-31 rep’n” is based on 6-bit unsigned, except that 000000 is now –31 instead of 0.

How to convert
• How do we represent a number n in biased (B)?
  1. Add the bias: n + B.
  2. Determine the unsigned rep’n of this number.
  – Example: What is the 6-bit biased-31 rep’n of –9?
    • It’s the same as the unsigned rep’n of –9 + 31 = 22.
    • 22 in 6 bit unsigned is 010110.
• How do we convert a biased number back into base 10?
  1. Interpret the number as unsigned.
  2. Subtract the bias.
  – Example: 101010 is the 6-bit biased-31 rep’n of what number?
    • If unsigned, 101010 = 32+8+2 = 42.
    • 42 – 31 = 11.

CS 101 – March 23
• Real numbers:
  – How to express in binary
  – Binary scientific notation
  – Floating-point representation
  Need to learn how to convert both ways.
• Handout on floating-point representation

Integer vs. Real
• Integer arithmetic on computer is quick & exact, but has limited range.
• Real arithmetic needs wide range, and a reasonable degree of precision.
  – Scientific / numerical computation
  – 14 significant digits is usually enough!
• Skills:
  – Converting a base-10 real number into binary
  – Actual representation relies on “scientific notation”

Examples
• Consider this sequence:
  111 = 7
  1110 = 14
  11100 = 28
  111000 = 56
  1110000 = 112
• Going the other way…
  111. = 7
  11.1 = 3.5 or 7/2
  1.11 = 1.75 or 7/4
  .111 = 7/8
  .0111 = 7/16
See the pattern? Each digit corresponds to (+/–) power of 2.

Convert to binary
• Separate real number (e.g. 5.7) into integer and fractional parts
• Integer part → use binary store.
• Fractional part:
  – Keep multiplying fractional part by 2 until it becomes zero, or until you have a repeating pattern.
• Example: 9.625
  – Integer part 9 becomes “1001”
  – Fractional part is 0.625:
    .625 * 2 = 1.25
    .25 * 2 = 0.5
    .5 * 2 = 1.0
    Fractional part reaches 0.
  So our answer is 1001.101

Repeating pattern
• Let’s try converting 0.7 to binary:
  .7 * 2 = 1.4
  .4 * 2 = 0.8
  .8 * 2 = 1.6
  .6 * 2 = 1.2
  .2 * 2 = 0.4
  .4 * 2 = 0.8
• “Aha!” The pattern tells you which digits repeat.
• Answer is 0.1 0110 0110 0110 … or .1 followed by 0110 repeating (written with a bar over the 0110)

Real number rep’n
• Also called “floating point”
• Size is 32 or 64 bits: “single” vs.
“double” precision
• Based on binary scientific notation
• Let’s look at single precision:
  – 1 bit for sign (0 = positive, 1 = negative)
  – 8 bits for exponent (expressed in biased-127)
  – 23 bits for mantissa
• Big mantissa → precision is most important feature

CS 101 – March 26
• Floating-point representation
  – See handout
• Single vs. double precision
• Consequences of the representation

Example
• We saw earlier that 9.625 = 1001.101 in binary. Let’s continue with the true representation.
• Sign: 1 bit. Since it’s a positive number, we have 0
• Exponent: 8 bits
  – If we write 1001.101 in binary scientific notation, we get 1.001101 * 2^3.
  – The exponent 3 is expressed in 8-bit biased-127. In other words 3+127=130 in unsigned: 10000010
• Mantissa (23-bits): We only store the fractional part of the mantissa: 001101. Remaining bits are zero.
• Final answer: 0 10000010 (1) 001101 followed by 17 zeros

Decoding
• Let’s see if we can decode a real number: 1 10000001 (1) 011 followed by 20 zeros
• Sign: “1” means a negative number
• Exponent: “10000001” looks like 129 in unsigned. But in biased-127 it is 129 – 127 = 2. So our number is something multiplied by 2^2.
• Mantissa: 1.011 in binary = 1 + 1/4 + 1/8 = 1.375
• Combine all 3 parts: –1.375 * 2^2 = –5.5

Some thoughts
• In single precision…
  – 8 bit exponent → 256 possible exponents
    Highest number ~ 10^38
    Lowest number ~ 10^–38
  – 23 bit mantissa → ~ 8 million exact real numbers per power of 2
    We have about 7 significant digits
• Comparing with double precision
  – What can we conclude?

Precision   Sign   Exponent   Mantissa
Single      1      8          23
Double      1      11         52

CS 101 – March 28
• Finish real number rep’n
  – Properties, consequences
  – Numbers that cannot be represented
• Data compression
  – Compress what?
Text, images, sounds
  – Huffman code

Text compression
• Goal is for a document to take up less space
• Techniques
  – Keyword encoding: replace common words by special symbols
  – Run-length encoding: replace repetitions with a number: “pppppppppppppp” → [14p]
  – Huffman code: common letters should take up fewer bits

Huffman code example
• Suppose you want to send a message, and you know the only letters you need are A, D, E, L, N, P, S.
• A Huffman code might look like this table:

  Letter   A     D     E    L     N      P      S
  Code     001   100   01   101   0001   0000   11

• How would you decode this message? 01110000101001000100110001

How to create code
• We’re given the set of letters used for the message, and their frequencies.
  – Ex. A = 5, B = 10, C = 20, D = 25, E = 30
  – Ex. P = 5, N = 10, D = 10, L = 15, A = 20, S = 20, E = 30
• It’s convenient to arrange the frequencies in order.
• Group the letters in pairs, always looking for the smallest sum of frequencies.
• The resulting structure is a “tree”. Each left arm = “0” in the code; each right arm is a “1”.

CS 101 – March 29
• How to create Huffman code
• Dictionary encoding
• How to compress
  – Image
  – Video

Dictionary encoding
• Convert each word to a number
• Represent this number in binary
• If 50,000 words in dictionary, we can represent each with 16 bits (2 bytes) since 2^16 > 50,000
• A lot shorter than the average word
• Hmmm, sometimes a cipher system can be used for compression.
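
Two of the text-compression techniques above can be tried out with a short Python sketch. It is an illustration under stated assumptions, not the method from the lecture handout: the function names are made up, the code table is the one from the Huffman example slide, and the rule of bracketing only runs of 4 or more characters (`min_run`) is an arbitrary choice made here.

```python
from itertools import groupby

# Code table copied from the Huffman code example slide.
HUFFMAN = {"A": "001", "D": "100", "E": "01", "L": "101",
           "N": "0001", "P": "0000", "S": "11"}


def huffman_encode(text, table=HUFFMAN):
    """Concatenate the code word of each letter."""
    return "".join(table[ch] for ch in text)


def huffman_decode(bits, table=HUFFMAN):
    """Read bits one at a time until they match a code word.

    This works because no code word is the beginning of another one.
    """
    lookup = {code: letter for letter, code in table.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            letters.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a code word: " + current)
    return "".join(letters)


def run_length_encode(text, min_run=4):
    """Replace each run of min_run or more repeats with [count + character]."""
    pieces = []
    for ch, group in groupby(text):
        n = len(list(group))
        pieces.append(f"[{n}{ch}]" if n >= min_run else ch * n)
    return "".join(pieces)


message = "01110000101001000100110001"
decoded = huffman_decode(message)
print(decoded)
print(len(message), "bits, versus", 8 * len(decoded), "bits at one byte per letter")
print(run_length_encode("p" * 14))
```

Note that this simple run-length format would be ambiguous if the text itself contained square brackets; a real scheme needs some way to escape them.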
Image compression
• RGB 24-bit color represented as (huge) bitmap file *.bmp
• Most of the time, compressing an image is “lossy”, meaning that uncompressing won’t restore original .bmp information
• GIF compression uses indexed color
• JPG entails several steps
  – Make tiny modifications to the image, so that neighboring pixels will have more uniform values. For example, (10, 11, 12, 90) → (11, 11, 11, 90)
  – Use text/numerical compression techniques like run-length encoding.

MPEG
• Moving Picture Experts Group
• Industry standard for compressing multimedia
• Note that much information in consecutive frames is the same
  – Effectively doing run-length encoding in 3-D
• Delete sound information that humans can’t detect
• Goal: to make streaming video possible: 30 frames per second at a minimal DSL connection
  – Ex. 5-minute 300x200 video ~ 12 MB

CS 101 – April 2
• The last word on representing information:
• Error detection
• Error correction

Transmission errors
• When you send data over a network, there could be rare random flipping of bits.
• Error Detection
• Error Correction
• One method of detection is using a parity bit
  – Add 9th bit to each byte during transmission
  – Goal is that each byte has even # of 1’s
  – Receiver checks each byte. …
  – Catches many but not all errors.

2-d parity
1 0 1 1 0 1 1 1 0
0 0 0 1 1 1 0 1 0
1 1 0 1 0 1 0 1 1
0 0 0 1 1 1 0 0 1
1 0 1 0 1 0 0 1 0
1 1 0 0 1 0 1 1 1
0 1 0 1 0 0 1 1 0
1 0 0 1 0 1 1 0 0
1 1 0 0 0 1 0 0 0
The 9th byte is called a check byte.

Error correction
• Useful if you think there may be a lot of potential errors, such as a noisy transmission medium.
• Devise a “code” so that each symbol’s bit pattern is quite distinct from all the others.
  – In practice, this means longer codes. In other words, the 8-bit ASCII code would not be enough.
• One technique: Hamming code
  – Idea for assigning code is Hamming distance: comparing codes, count how many bits differ.
  – When you receive an erroneous code, see which letter it’s closest to. Then you can make a correction.
  – Example …

Example
• Here is a simple error-correcting code.

  Letter   Code
  A        000000
  B        001111
  C        010011
  D        011100
  E        100110
  F        101001
  G        110101
  H        111010

• How would we interpret the bit string 010100 ?

CS 101 – April 11
• Building blocks of a computer system:
• Using bits
  – Binary data and operations
  – Logic gates
• Units of measuring amount of data
• CPU vs. memory

Binary
• All information inside computer is in binary
• Smallest unit of data is the bit
• Only the values 0 and 1 are used
  0 means “false” or “off” or the number 0
  1 means “true” or “on” or the number 1
• Individual bit values can be manipulated with Boolean operations: “and”, “or”, “not”, etc.
  – In hardware, we implement these operations with logic gates.

Boolean examples
• AND
  – To graduate, you must have 128 credits and 2.0 GPA.
• OR
  – Classics scholarship requires 3 years of Latin or 3 years of Greek.
• XOR (“exclusive” or)
  – To go to Cincinnati, you can fly or drive. In other words, it doesn’t make sense to do both.
  – Do you want a 2-door or a 4-door car?
• NOT
  – If a statement is true, its negation is false, and vice versa.

Gates
• Basic building blocks of CPU’s circuitry.
• Usually 2 inputs.
• X and Y could be 0 or 1.
• Combining gates into a circuit:
  – The output of one gate becomes input to another.
  – This is how more useful operations are performed.

‘AND’ and ‘OR’

  AND              OR
  X  Y  ans        X  Y  ans
  1  1  1          1  1  1
  1  0  0          1  0  1
  0  1  0          0  1  1
  0  0  0          0  0  0

Note: 0 AND (anything) = 0
      1 OR (anything) = 1

XOR
• XOR basically says, “either but not both”
• The output is 1 if both inputs are different.

  XOR
  X  Y  Ans
  1  1  0
  1  0  1
  0  1  1
  0  0  0

NOR, NAND
• NOR gate
  – Negation of the OR
  – Same as feeding output of OR into a NOT gate.
  – Symbol for NOR gate is same as OR but with a loop on the end.
• NAND gate
  – Negation of the AND…. analogous to NOR.
• Interesting property:
  – NOR and NAND are universal gates.
Any other boolean operation can be implemented by using several NAND’s or several NOR’s.

Units of data size
• Bit = a single 0 or 1 value
• Nibble = 4 bits = 1 hexadecimal digit
• Byte = 8 bits
• Kilobyte (KB) = 2^10 bytes
• Megabyte (MB) = 2^20 bytes
• Gigabyte (GB) = 2^30 bytes
• Terabyte (TB) = 2^40 bytes
• 2^10 = 1024, though 1000 is a close approx.

CPU and memory
• CPU’s job is to obey instructions and do calculations
• Memory system stores information for current and future use
  – CPU has tiny number of “registers” for calculations
  – Main memory (RAM) stores all files currently open
  – Secondary memory (e.g. hard drive) is for long-term storage of files
  – Backup system: tape, external hard drive
• Other types of memory:
  – Cache, between CPU and RAM
  – Removable drive, e.g. USB or DVD

RAM
• Runs on electricity: volatile but fast
• Each byte is numbered and addressable
  – Capable of holding a single character or small #

  Address   Contents
  0         “c”
  1         “a”
  2         “t”
  3         9
  4         25
  5         100
  …         …

CS 101 – April 13
• Contrast between levels of memory
  – Tradeoff between cost / size / speed
• Manipulating data by performing instructions
• “What is going on in the CPU?”
• Handout – A simple machine language

Memory comparison

  Type            Size        Access time   Cost per MB
  CPU registers   256 bytes   1 ns          N/A
  Cache           64 KB       2 ns          $ 20
  RAM             512 MB      50 ns         $ 0.20
  Disk            200 GB      100,000 ns    $ 0.0002

Numbers are approximate.
“ns” means nanosecond = 1 billionth of a second

Why are disks slow?
Analogous to a record player:
• Seek time: read-write head moves to find the correct track (up to ~ 8ms)
• Latency: wait for disk to rotate to beginning of file (up to ~ 4ms)
• Transfer: grab info from disk (e.g. 1 MB/sec read or 0.1 MB/sec write)

Basic computer anatomy
• Inside a computer are 2 parts
  – CPU
  – Memory
  – These are connected by a data bus: an “HOV lane” where traffic can go either way.
• CPU contains:
  – ALU: arithmetic and logic unit
  – Control unit: figures out what to do next
  – Registers to hold values needed for calculation
• Memory (RAM) contains:
  – Software: list of instructions the CPU needs to perform
  – Data: Input and output values need to be stored while program runs

Stored program idea
• Program = software = list of instructions for CPU to do
• Programs reside in memory
• CPU will do 1 instruction at a time
• For each instruction, we do the following:
  – Fetch it from memory
  – Decode – figure out what it means
  – Execute – do it
  – And then we continue with the next instruction… until the program is finished.

Simple example
• A program to add two numbers.
• This program may reside at bytes 100-116 in RAM.
• The two numbers we wish to add are located at bytes 200 and 204 in RAM.
• We want the result to go into memory at byte 208.
• Program may go something like this:
  – Load the value at Memory[200] into register 1.
  – Load the value at Memory[204] into register 2.
  – Add registers 1 and 2, and put result in register 3.
  – Store the value from register 3 into Memory[208].
• Note that the bus is communicating instructions (RAM to CPU) as well as data (both ways).

Machine language
• Unfortunately, instructions for CPU can’t be in English, French, etc.
• Machine language = binary (or hex) representation of our instructions.
  – Each type of computer has its own machine language.
• This is the original form of “computer programming”.
• Verbs: Instruction set. e.g. Add, subtract, load, store…
• Nouns: Operands such as: registers, memory locations, constants, other instructions

Verbs
3 kinds of instructions (instruction set)
• Data transfer, using the bus
  – Load a value from memory into a CPU register. Very similar to fetching an instruction!
– Store a value from a CPU register into memory
• ALU
– Bit manipulation: AND, OR, XOR, NOT, shift left, shift right, …
– Arithmetic: add, sub, mul, div, remainder, =, <, >, ≤, ≥, ≠, …
• Control
– “Go to” another instruction in program. In other words, interrupt the normal sequence of instructions.
– Can be conditional or unconditional

Example language
• Let’s consider very simple hardware.
• 256 bytes of RAM: addressable by 8 bits
• CPU contains
– Instruction register (to store contents of instruction)
– Program counter (to indicate instruction’s address)
– 16 general purpose registers: addressable by 4 bits
• Each register is 1 byte
• Each instruction is 2 bytes = 16 bits = 4 hex digits long
• Instruction format:
– First 4 bits are the opcode = specify which instruction type
– Other 12 bits are operand(s)
• What do instructions mean?

CS 101 – April 16
• Machine language examples
– Don’t memorize…
• Instruction execution
• Operations in instruction set
– Performing arithmetic sometimes requires load / store instructions in addition to the arithmetic instruction
– Instructions to manipulate bits directly

Example instructions
• Note: 16 possible opcodes: 4-bit opcode
• Note: 16 possible registers: register number also 4 bits
• Opcode 5 is used for adding
– Expects 3 register operands
– 5RST means R = S + T, where R, S and T are register numbers
– Ex. 5123 means add registers 2 and 3 and put the result in register 1.
• Opcode 2 is for putting a constant in a register
– Expects a register operand, and an 8-bit constant operand
– 2RXX means R = XX, where XX is some 8-bit pattern
– Ex. 27c9 means put the hexadecimal value “c9” into register 7.
• Try an example using both types of instructions.

More instructions
• Opcode 1 is for loading a memory value into a register
– Expects a register operand (4 bits), and a memory address from which to load (8 bits).
– Ex. 1820 means to go out to memory at address [20], grab the contents and load it into register 8.
(It does not mean put the number 20 in register 8.)
• Opcode 3 is a store = opposite of load
– Ex. 3921 means to take the value in register 9, and put it into memory at location [21]. (It does not mean put the number 9 into memory location 21.)
• Opcode C (hex code for 12) is for telling CPU it’s done.
– Expects operand to be 12 zero-bits.

Some practice
Refer to handout…
• How would we put the number 64 into memory at address 12?
• How would we add the numbers 6 and 8 and put the result in register 1?
• How would we add register 7 to register 5 and put the answer in memory at address 32?

Execution
• In our example, each instruction is 2 bytes long.
• Program counter (PC) begins at address of first instruction.
• For each instruction:
– Fetch (and increment PC by 2)
– Decode
– Execute
• Note that RAM contains both instructions and data, separated from each other. For example, addresses 0-99 could be reserved for code.

Logic operations
• Work just like gates, but we do several bits in parallel.
• Examples

    10101110          01101011
AND 11110000      AND 00011111

• Try the same examples with “OR” and “XOR”
• Observations:
– What happens when you AND with a 1? With a 0?
– What about OR’ing with a 1 versus a 0?
– What about XOR?
• ASCII code: how do you capitalize a letter?

CS 101 – April 18
• Operations that manipulate bits directly
– Logical (review)
– Shift
• The last topic = operating systems
– Appetizer today: data transfer speeds

Shift operations
• Given a bit pattern like 00011100, we can shift the bits left to obtain 00111000.
• If we shift to the right instead, 00011100 becomes 00001110.
• We can even shift by more than one position.
– Shifting 01010000 by 3 bits right → 00001010.
• Sometimes when we shift, 1’s fall off the edge.
– Shifting 01010000 by 2 bits left → 01000000.
• When we shift, the “vacated” bits are usually 0.

Why shift?
• One application of a shift operation is to:
– Multiply by 2: left shift
– Divide by 2: right shift
– Try some examples; they should look familiar from our earlier work on binary numbers.
• One funny exception: dividing a (signed) negative number by 2. We need a different operation: arithmetic right shift.
– In this case, we want the vacated bit to be 1
– Example: –12 in signed is 11110100. If we do an ordinary (logical) shift right by 1, we get 01111010, but the answer should be 11111010, which is –6.

Rotate
• Rotate operations work the same as shift… except that the vacated bits come from the other end of the number.
• So, instead of 1’s falling off the edge, they rotate.
• For example, 01010000 rotated left by 2 becomes 01000001.
• Also: 00001111 rotated right by 3 becomes 11100001.

Summary
• Here is a list of bitwise operators:
• Logical
– and, or, xor, not
• Shift
– sll (Shift left logical)
– srl (Shift right logical)
– sra (Shift right arithmetic)
– rol (Rotate left)
– ror (Rotate right)

Communication rate
• Used for both internal & network communication.
• Units
– “baud”: loosely, bits per second (strictly, symbols per second, which is the same thing only when each symbol carries 1 bit)
– Kbps, Mbps, Gbps
– If there’s overhead/noise, figure on an average of 10 bits per byte, so 1 Mbps = 100 KB per second.
– I can read a 20 MB file from a USB drive in 2 seconds. What is the bit rate?
• Voice telephone line: a dial-up modem is limited to about 56 Kbps
• DSL, cable modem: “broadband” 50-100+ times faster
– Uses more of the frequency spectrum, plus data compression

CS 101 – April 20
• Operating Systems
– definition
– origin
– responsibilities
– relationship with other software
– booting
– processes

Original purpose
• Streamline process of doing jobs on the computer.
• Reduce overhead spent between jobs.
• Batch processing:
– Plan day’s jobs in advance
– Give jobs to a computer operator
– Job Control Language: enter commands to the OS
• Computer operator not practical
– Confidentiality
– Some applications like a game are inherently interactive; require fast turnaround

Utilization
Historically…
• CPU time used to be very expensive
• 2 ways to run jobs
– Interactively
– In batch mode. This is useful if jobs must complete by some deadline in a real-time system.
• Time sharing
– Many users, but just 1 expensive machine.
– Do not let any one job monopolize the machine.
– Have several jobs running at the same time; give each a turn at the CPU: multi-programming
– Today: 1 user has many jobs per day, so we use the same technique

Kinds of software
• Application: fun & useful stuff for people
– Hearts, Excel, Firefox, e-mail, PPT, …
• System utilities: programs that make the computer more useful, added features
– Formatting a disk, compressing data, playing a DVD
• OS shell: accept commands from user; display error messages
• OS kernel: major responsibilities; manage resources
• Note: the distinctions among these categories are not exact

Responsibilities (1)
• Security
– Require many password combinations
– Penalties for mistake
– Super user & diagnostic tools to detect abnormal activity
• CPU
– Be aware of all currently running programs
– Synchronization: make sure 2 executing programs don’t interfere with each other
– Scheduling: deciding which program to execute now
• Memory
– Decide which files (or portions of files) should be in RAM
– Virtual memory: shuffle pages in and out of RAM to give the illusion that RAM is bigger than it really is

Responsibilities (2)
• Files
– Maintain folders
– Keep information about each file: name, size, owner, type, permissions
– User quotas
• I/O and network
– Device drivers
• User interface (shell or GUI)

Boot cycle
• When you turn the machine on, the OS is on disk.
• CPU must begin running pre-arranged code in non-volatile ROM.
• The ROM program tells the CPU to load OS from disk to RAM.
– As well as BIOS: basic I/O system
• Finally, begin running the OS.
• ROM designed to be fast/efficient, therefore small. The entire OS cannot fit in ROM.

Process
• A program that has started, but hasn’t yet completed: pending work.
• OS must keep track of all current work, in case of interruption or hibernation.
• Possible states for a process
– Ready (could execute, but doesn’t have CPU)
– Running (in CPU)
– Waiting (doing I/O operation)
• During its lifetime, a process may experience many state transitions.

CS 101 – April 23
• Operating Systems
– Scheduling
– File permissions

Scheduling
• OS may need to decide the order in which to do jobs
• Many ways to create a schedule. We’ll look at 2.
• First-come, first-served
– Do the jobs in the order in which they are requested
• Shortest Job Next
– Give priority to short/easy tasks.
• Evaluating schedules
– People are interested in how long their requested jobs take to complete.
– Compute the average turnaround time.
– Turnaround time of a job = (time @ finish – time @ request)

Example 1
Process number   Time of request   Execution time needed
1                0                 20
2                5                 30
3                10                40
4                20                10
• First-come, first-served
– Process 1 can execute from t=0 to t=20
– Process 2 can execute from t=20 to t=50
– Process 3 can execute from t=50 to t=90
– Process 4 can execute from t=90 to t=100
• We can enter this info as extra columns in the table.
• What is the average turnaround time?
• What if we tried Shortest Job Next?

Example 2
Process number   Time of request   Execution time needed
1                0                 10
2                30                30
3                40                20
4                50                5
Note that it’s possible to have idle time.

System load
• A measure of how “busy” the CPU is
• At an instant: how many tasks are currently running or ready.
– If load > 1, the system is “overloaded”, and work is backing up.
• Typically reported as an average of the last 1, 5, or 15 minutes.
• Based on the schedule, can calculate average load as well as maximum load.
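The turnaround questions in Example 1 can be checked with a short simulation. The Python below is only a sketch: it assumes jobs are never interrupted once they start (the notes do not say either way), and the names simulate and average_turnaround are made up for this illustration.

```python
def simulate(jobs, shortest_first=False):
    """Build a non-preemptive schedule.

    jobs: list of (process_number, request_time, execution_time).
    Returns {process_number: (start_time, finish_time)}.
    shortest_first=False gives first-come, first-served;
    shortest_first=True gives Shortest Job Next.
    """
    pending = sorted(jobs, key=lambda job: job[1])  # order by request time
    clock = 0
    schedule = {}
    while pending:
        # Jobs that have already been requested and are waiting for the CPU.
        ready = [job for job in pending if job[1] <= clock]
        if not ready:
            # Idle time: nothing to do until the next request arrives.
            clock = pending[0][1]
            continue
        if shortest_first:
            job = min(ready, key=lambda j: j[2])
        else:
            job = ready[0]
        pending.remove(job)
        number, _, needed = job
        schedule[number] = (clock, clock + needed)
        clock += needed
    return schedule


def average_turnaround(jobs, schedule):
    """Average of (finish time - request time) over all jobs."""
    total = sum(schedule[number][1] - requested for number, requested, _ in jobs)
    return total / len(jobs)


# The tables from Example 1 and Example 2.
EXAMPLE_1 = [(1, 0, 20), (2, 5, 30), (3, 10, 40), (4, 20, 10)]
EXAMPLE_2 = [(1, 0, 10), (2, 30, 30), (3, 40, 20), (4, 50, 5)]

if __name__ == "__main__":
    for name, jobs in (("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)):
        fcfs = average_turnaround(jobs, simulate(jobs))
        sjn = average_turnaround(jobs, simulate(jobs, shortest_first=True))
        print(name, "FCFS:", fcfs, "SJN:", sjn)
```

Under these assumptions, Example 1 works out to an average turnaround of 56.25 for first-come, first-served and 43.75 for Shortest Job Next, because process 4 gets to run as soon as process 1 finishes.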
File permissions
• 3 levels: owner, group, rest of world
• For each level:
– ‘r’ = Can I read the file?
– ‘w’ = Can I write to (or delete) the file?
– ‘x’ = Can I execute the file?
• Examples
– rw-rw-r-- (664)
– rwxr-xr-- (754)
– rw-r----- (640)

Common permissions
• On many systems, there are no groups, so the group permission is the same as “everybody else”.
• Examples: 644, 600, 755, 700
• Only a file/folder’s owner or the administrator may change permissions.
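The three-digit codes are octal: within each level, r counts 4, w counts 2, and x counts 1, and the digit is their sum. A minimal Python sketch of the conversion in both directions follows; the function names to_octal and to_string are hypothetical, chosen just for this example.

```python
def to_octal(perm):
    """Convert a 9-character string such as 'rw-r--r--' into '644'."""
    if len(perm) != 9:
        raise ValueError("expected 9 characters: owner, group, world")
    digits = []
    for start in range(0, 9, 3):          # one 3-character group per level
        value = 0
        for ch, letter, weight in zip(perm[start:start + 3], "rwx", (4, 2, 1)):
            if ch == letter:
                value += weight           # permission granted
            elif ch != "-":
                raise ValueError("unexpected character: " + ch)
        digits.append(str(value))
    return "".join(digits)


def to_string(octal):
    """Convert a 3-digit code such as '755' into 'rwxr-xr-x'."""
    if len(octal) != 3 or any(d not in "01234567" for d in octal):
        raise ValueError("expected 3 octal digits")
    parts = []
    for d in octal:
        for letter, weight in zip("rwx", (4, 2, 1)):
            # Test each bit of the digit: 4 = read, 2 = write, 1 = execute.
            parts.append(letter if int(d) & weight else "-")
    return "".join(parts)


if __name__ == "__main__":
    for perm in ("rw-rw-r--", "rwxr-xr--", "rw-r-----"):
        print(perm, to_octal(perm))
    for code in ("644", "600", "755", "700"):
        print(code, to_string(code))
```

Running it on the common codes shows, for instance, that 644 lets the owner read and write while everyone else can only read, and 700 shuts out everyone but the owner.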
