Overview of Computer Science
CSC 101, Summer 2011
Analog, Binary and Digital Concepts
Digitization
Lecture 4, July 11, 2011

Announcements
• Writing Assignment #1 due today
  – Hand it to me after class if you haven’t already
  – Make sure you have the electronic copy with you for lab tomorrow
• Lab #1 is tomorrow (8 am)
  – Be sure to read the prelab tonight

Objectives
• Analog vs. digital information
• Binary encoding of information
  – Bits and bytes
• Digitization

Processing Data
• For a device to process data, what three steps are required?
  – Input some data
  – Process the data (perform some planned operations on the data)
  – Output the results
• A computer is any device that processes data
  – Not necessarily only digital data

Analog Information
• Analog information is what we experience directly
  – Sights, sounds, textures, smells, tastes, etc.
• Analog info is continuous and infinitely variable
• Example: monitoring the outside temp through the day using an analog thermometer
[Figure: continuous temperature curve; vertical axis 50° to 80°, time axis labeled midnight and noon]

An Analog Computer
• A very simple analog computer is a mechanical thermostat
  – “Inputs”:
    • Measured temperature
    • Desired temperature (“setpoint”)
  – Executes a simple program: If temp > setpoint then AC.on
  – “Output” is the action of turning the AC on or off
  – temp and setpoint are both analog values
    • Temperature causes a spring to stretch or shrink
    • Setpoint is set by turning a dial
    • Both of these are continuous, infinitely variable values

Digital Information
• Digital information is discrete
  – Definite, distinct, precise
  – Enumerable (countable)
  – Finite
• Example: measuring temperature with a digital thermometer (display reads 56.5 °F)

  Time        Temperature
  12:00 AM    56.5°
  12:30 AM    54.9°
  1:00 AM     54.0°
  1:30 AM     53.5°
  2:00 AM     53.3°
  2:30 AM     53.1°
  3:00 AM     53.0°
  …           …

Analog vs. Digital Information
• Advantages of digital information:
  – Efficient storage and transfer
  – Unlimited absolute replication
  – Can be compressed
  – Easily manipulated
    • Editing, combining, etc.
• We don’t use many analog computers today
  – Digital computers give us all the advantages of being able to process digital information

Bits and Bytes
• Computers contain lots of on/off switches
  – A relay, vacuum tube, or transistor acts like a switch: either on or off
  – Let’s say a switch that is on represents the digit 1 and a switch that is off represents 0
• Digital computers represent all data using only 1s and 0s
  – Each of the billions of transistors in a computer is either on or off
• A single digit (1 or 0) is called a bit (binary digit)
• A bit is the smallest possible amount of information
  – Like an ‘atom’ of data
• One bit provides only a minimum amount of data:
  – 1 or 0; Yes or No; On or Off; Up or Down; Stop or Go … any two-state value
  – Anything beyond a simple two-state value requires more than one bit

Bits and Bytes
• A single light bulb is one bit of information
  – On or off; yes or no
• The light gives the answer (yes or no), but you need to know the question
  – “One if by land, two if by sea…”
• But a whole bunch of light bulbs, arranged in a proper pattern, can give lots of information (such as a scoreboard), even though each light is only on or off

Bits and Bytes
• A bit is the smallest possible amount of information
  – Yes/no, on/off, 0/1, etc.
• One bit doesn’t give us much information, but many bits together can give much more
  – An image (maybe on a scoreboard)
  – Words
  – Sounds
  – Numbers other than 0 or 1
• How can we represent numbers using bits?
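One way to explore that question is a short Python sketch that lists every on/off pattern for a given number of bits and works out the number each pattern stands for. The helper names bit_patterns and to_decimal are made up for this illustration; they are not part of the lecture or of any library.

```python
def bit_patterns(n_bits):
    """Return every pattern of n_bits bits, in counting order, as strings."""
    return [format(value, f"0{n_bits}b") for value in range(2 ** n_bits)]


def to_decimal(pattern):
    """Convert a string of 0s and 1s to the number it represents."""
    total = 0
    for bit in pattern:
        # Each new bit doubles what we have so far, then adds 0 or 1.
        total = total * 2 + int(bit)
    return total


# Print each 2-bit pattern next to its decimal value.
for pattern in bit_patterns(2):
    print(pattern, to_decimal(pattern))
```

Calling bit_patterns(8) in the same way gives all the patterns that fit in eight bits; each added bit doubles the number of patterns.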
Bits and Bytes
• One bit can represent only 2 things
  – On or off, yes or no, 0 or 1
• Two bits can represent 4 things
  – There are 4 different patterns: 00, 01, 10, 11
• Eight bits can represent 256 things
  – There are 256 different patterns possible with eight bits
• A group of 8 consecutive bits is called a byte

  1-bit Binary    Decimal
  0 (off)         0
  1 (on)          1

  2-bit Binary    Decimal
  00              0
  01              1
  10              2
  11              3

  8-bit Binary    Decimal
  00000000        0
  00000001        1
  00000010        2
  00000011        3
  00000100        4
  00000101        5
  …               …
  11111110        254
  11111111        255

Bits and Bytes
• Bytes are usually grouped for convenience
  – 1 typed character is (usually) 1 byte
  – 1 KB (kilobyte) is about 1,000 bytes (actually 1024 = 2^10)
    • A single typed manuscript page is about 1,500 characters, or about 1.5 KB
  – 1 MB (megabyte) is about 1,000 KB, or a million bytes
  – 1 GB (gigabyte) is about 1,000 MB, or a billion bytes
    • The WFU T60 ThinkPad has 1 GB of RAM memory and a 100-GB hard disk
    • 100 GB is about 100,000,000 typed pages
  – 1 TB (terabyte) is about 1,000 GB, or a trillion bytes
    • 1 TB of data, if on typed pages of paper, would be a stack of paper 50 miles high
    • The print collection of the Library of Congress is about 10 TB
  – 1 PB (petabyte) is about 1,000 TB (1,000,000,000,000,000 bytes)
    • A stack of paper more than 6 times the diameter of the Earth… 1/5th the distance to the Moon!
    • All material ever printed on paper is estimated to be about 200 petabytes
    • Google processes many petabytes of data each day (http://portal.acm.org/citation.cfm?doid=1327452.1327492)
  – 1 EB (exabyte) is about 1,000 PB (1,000,000,000,000,000,000 bytes)
    • All the words ever spoken by any human, ever, would be about 5 EB of text
  – Next come zettabyte, yottabyte, etc.…
    • Check out “How Much Data is That”
    • http://www.jamesshuggins.com/h/tek1/how_big.htm

Origin of the Term Byte
• “…The term byte was coined by Werner Buchholz, a researcher at IBM, in 1956 during the early design phase for the IBM Stretch computer (the company’s first supercomputer). It was a modification of the word bite that was intended to avoid accidentally misspelling it as bit. …
“The movement toward an eight-bit byte began in late 1956. A major reason that eight was considered the optimal number was that seven bits can define 128 characters (as against only 64 characters for six bits), which is sufficient for the approximately 100 unique codes needed for the upper and lower case letters of the English alphabet as well as punctuation marks and special characters, and the eighth bit could be used as a parity check (i.e., to confirm the accuracy of the other bits).
“This size was later adopted by IBM's highly popular System/360 series of mainframe systems [1964] and this was a key factor in its eventually becoming the industry-wide standard. …”
From http://www.linfo.org/byte.html
• “Half of an eight-bit byte (four bits) is sometimes called (playfully) a nibble (sometimes spelled nybble) or more formally a hex digit. The nibble is often called a semioctet in a networking or telecommunication context and also by some standards organisations.
“The eight-bit byte is often called an octet in formal contexts such as industry standards, as well as in networking and telecommunication.
This is also the word used for the eight-bit quantity in many non-English languages, where the pun on bite does not translate. …”
From http://www.wordiq.com/definition/Byte

Etymology of Unit Prefixes

  1. Kilo    10^3    from Greek khilioi = 1000
  2. Mega    10^6    from Greek megas = great, e.g., Alexandros Megos (Alexander the Great)
  3. Giga    10^9    from Latin gigas = giant
  4. Tera    10^12   from Greek teras = monster
  5. Peta    10^15   from Greek pente = five, because it’s the fifth prefix… peNta – ‘N’ = peta
  6. Exa     10^18   from Greek hex = six, because it’s the sixth prefix… Hexa – ‘H’ = exa
  7. Zetta   10^21   the last letter of the Latin alphabet (similar to the Greek letter Zeta)
  8. Yotta   10^24   the penultimate letter of the Latin alphabet (similar to the Greek Iota)

The first prefix is number-derived; the second, third, and fourth are based on mythology. The fifth and sixth are just that: fifth and sixth. With the seventh, another fork has been taken. The General Conference of Weights and Measures (Conférence Générale des Poids et Mesures, CGPM) has now decided to name the prefixes, starting with the seventh, with the letters of the Latin alphabet, but starting from the end. Thus, going backwards through the Latin alphabet, the next prefixes will be:

   9. Xona    10^27
  10. Weka    10^30
  11. Vunda   10^33
  12. Uda     10^36
  13. Treda   10^39
  14. Sorta   10^42
  15. Rinta   10^45
  16. Quexa   10^48
  17. Pepta   10^51
  18. Ocha    10^54
  19. Nena    10^57
  20. Minga   10^60
  21. Luna    10^63

Digital Information
• Digital computers process digital information
• Digital information is discrete; however, natural forms of information are analog and continuous
• The process of converting information to a digital form is called digitization
• Both discrete and analog information may be digitized
  – Information that is already discrete (numbers, text characters, etc.)
    is easily represented in a digital form
  – Analog information must be converted in some way

Digitizing Analog Information
• Text and numbers are discrete information
  – Digitization is simply a matter of conversion from one discrete form to another
• Analog information is continuous (non-discrete)
  – Must be transformed into a discrete form for digitizing
• Analog information is digitized in two steps:
  1. Sampling: Discrete samples are chosen to represent the continuous data
  2. Quantizing: Each sample is assigned a particular number

Digitizing Analog Information
• An example using an image
  1. Sampling
     – Choose discrete pixels, or “picture elements”
  2. Quantizing
     – Assign a number to each pixel

Digitizing Analog Information
• Sample: break up the data into pixels
• Average the contents of each pixel
• Quantize: assign a number to represent the gray level of each pixel
  – (e.g., from 0 to 15, where 0 = “black” and 15 = “white”)

Digitizing Analog Information
• The quality of the digitized image depends on
  – Number/size of pixels
  – Number of different levels used in quantization
• The size of the data file depends on the same factors
• Tradeoff between image quality and file size

Digitizing Analog Data
• Another example: temperature data
• Step 1: sampling
  – How many samples do we need?
  – Is once a day sufficient?
[Figure: temperature curve with one sample marked (73.2°); vertical axis 50° to 80°, time axis labeled midnight and noon]

Digitizing Analog Data
• How about twice a day?
[Figure: temperature curve with two samples marked (66.3° and 72.5°); vertical axis 50° to 80°, time axis labeled midnight and noon]

Digitizing Analog Data
• How about every two hours?
  – More accurate representation
  – But, still not complete
[Figure: temperature curve sampled every two hours; vertical axis 50° to 80°, time axis labeled midnight and noon]

Digitizing Analog Data
• Adding more samples increases the fidelity (accuracy) of the representation
  – But, still not exactly identical to the analog data
  – Still have the tradeoff between data quality and file size
[Figure: sampled temperature curve; vertical axis 50° to 80°, time axis labeled midnight and noon]
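The two digitization steps can be tried out on the temperature example with a minimal Python sketch. Everything specific here is an assumption made for illustration, not measured data: the made-up curve analog_temperature, the 50° to 80° range taken from the chart axis, and the 16 quantization levels (echoing the 0 to 15 gray levels used for the image).

```python
import math


def analog_temperature(hour):
    """Hypothetical continuous temperature curve (degrees F) over one day."""
    return 65.0 + 10.0 * math.sin((hour - 9.0) * math.pi / 12.0)


def sample(signal, interval_hours):
    """Sampling: read the signal at evenly spaced times, starting at midnight."""
    count = int(24 / interval_hours)
    return [(i * interval_hours, signal(i * interval_hours)) for i in range(count)]


def quantize(value, low=50.0, high=80.0, levels=16):
    """Quantizing: map a value in [low, high] to an integer code 0..levels-1."""
    clamped = min(max(value, low), high)   # values off the scale are clipped
    step = (high - low) / (levels - 1)     # size of one quantization step
    return round((clamped - low) / step)


# Compare twice-a-day sampling with sampling every two hours.
for interval in (12, 2):
    codes = [quantize(temp) for _, temp in sample(analog_temperature, interval)]
    print(f"every {interval} h: {len(codes)} samples -> {codes}")
```

Shrinking the interval or raising levels brings the stored numbers closer to the original curve, but it also means more numbers (or more bits per number) to store, which is the same quality versus file size tradeoff noted above.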