COMP 40: Machine Structure and Assembly Language Programming (Spring 2014)

What’s A Bit?

Noah Mendelsohn
Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah

Topics
– What is information? What’s a bit?
– How do computer memories store information?

© 2010 Noah Mendelsohn

The History of Information Theory

Claude Shannon
1948: Claude Shannon publishes “A Mathematical Theory of Communication”*
* http://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-3-379.pdf
Photo by Tekniska Museet

Questions
We can weigh things that have mass. We can determine the volume of a solid object or a liquid. We can measure the height of the walls in this room.
– Can we measure information?
– Can we distinguish more information from less?
– What units could we use?

Intuition
There is more information in the Library of Congress than there is in a single word of text…
…but how can we prove that rigorously?

Crucial insight
Whenever two parties communicate:
– We can view the communication as answering one or more questions.
– Example: you and I are deciding whether to have dinner. We agree in advance that I am going to phone you and give you the shortest possible message to convey the answer. I will say “yes” if we’re having dinner and “no” if not.
– Harder example: we also need to decide whether we’re going to the movies. This time, we agree in advance that I will say “yes, yes” for movie and dinner, “yes, no” for dinner only, “no, yes” for movie only, and “no, no” for staying home.
This is profound…
…communication is answering questions!

Crucial insight
Whenever two parties communicate:
– We can view the communication as answering one or more questions.
– Example: you and I are deciding whether to have dinner. We agree in advance that I am going to phone you and give you the shortest possible message to convey the answer.
I will say “yes” if we’re having dinner and “no” if not.
– Harder example: we also need to decide whether we’re going to the movies. This time, we agree in advance that I will say “yes, yes” for movie and dinner, “yes, no” for dinner only, “no, yes” for movie only, and “no, no” for staying home.
This is profound…
…communication is choosing among possibilities!

Measuring information
Just saying “yes” or “no” isn’t enough…
…we have to agree on what the choices are.
The more choices we have to make, the more “yes” or “no” answers we’ll have to communicate.
Shannon’s paper introduced the term bit! (attributing it to his co-worker John Tukey)
We define the ability to convey a single yes/no answer as a bit.
We define the amount of information as the number of yes/no questions to be answered.

How many bits for Beethoven’s 9th Symphony?
If you and I agree in advance that we are choosing between only two recordings that we both have, then:
– We can choose between them with 1 bit!
If we agree in advance only that it is some digital sound recording, then:
– We need enough bits so that you can choose the intended sound wave from all possible such 70+ minute recordings*: approximately 5,872,025,600 bits.
* By the way, the compact disc format was chosen to have enough bits to encode Beethoven’s 9th, at 44kHz × 16 bits/sample × 2 channels, estimated at 74 minutes.

Things to notice
We always have to agree in advance what the possible choices are:
– Whether we’re having dinner or not
– Which of N sound wave forms I want you to reproduce
We always have to agree on which answers (bit values) correspond to which choices.
We can use any labels we like for the bit values, e.g.:
– [yes] will mean Beethoven
– [yes, yes] will mean dinner and movie
Or…
– [true] will mean Beethoven
– [true, false] will mean dinner and no movie

What if we want to encode numbers?
We always have to agree in advance what the possible choices are:
– Whether we’re having dinner or not
– Which of N sound wave forms I want you to reproduce
– Which of N numbers I’ve stored in a computer’s memory
Question: What are good labels for encoding numbers?

Let’s try some labels for encoding numbers
Encoding → Number encoded
[no, no, no] → 0
[no, no, yes] → 1
[no, yes, no] → 2
[no, yes, yes] → 3
[yes, no, no] → 4
[yes, no, yes] → 5
[yes, yes, no] → 6
[yes, yes, yes] → 7

Let’s try some labels for encoding numbers
Encoding → Number encoded
[false, false, false] → 0
[false, false, true] → 1
[false, true, false] → 2
[false, true, true] → 3
[true, false, false] → 4
[true, false, true] → 5
[true, true, false] → 6
[true, true, true] → 7
Any two labels will do…
…but do you notice a pattern?

Let’s try some labels for encoding numbers
Encoding → Number encoded
[0,0,0] → 0
[0,0,1] → 1
[0,1,0] → 2
[0,1,1] → 3
[1,0,0] → 4
[1,0,1] → 5
[1,1,0] → 6
[1,1,1] → 7
Hey, that’s the binary representation of the number!

Encoding numbers in a computer memory
How many bits do I need if we need to encode which of 8 values are in the memory?
– 1 bit: 0 or 1 [two choices]
– 2 bits: 00, 01, 10, 11 [four choices]
– 3 bits: 000, 001, 010, 011, 100, 101, 110, 111 [eight choices]
Number_of_choices = 2^N_bits
N_bits = log₂(Number_of_choices)
As we said, those are binary numbers!
For example, 101 encodes 1 × 2² + 0 × 2¹ + 1 × 2⁰ = 5.
So… that’s why we label the states zero and one: because we can play this game to assign bit patterns to binary encodings of numbers.
Our hardware has instructions to do very efficient arithmetic on these binary representations of numbers.

Note: we will discuss negative numbers, numbers with fractions, very large and very small numbers, and arithmetic on all of these, at a later time.

Software structures model real world objects and concepts
– Numbers
– Students
– Bank statements
– Photographic images
– Sound recordings
– Etc.
These things aren’t bits!! They don’t live in computers, but…
…we build data structures that model them, and we agree which bit patterns represent them.

What we’ve learned so far…
– Bits encode yes/no choices.
– To communicate, we agree in advance on which bit patterns represent which choices.
– More information means more choices… which means more bits!
– We can store any information in a computer memory as long as we agree on which bit patterns represent which choice.
– If we label the bit states 0 and 1, then binary numbers are an obvious representation for the integers.
– We choose other encodings for characters (e.g. ASCII), photos (pixel on/off), music (digitized wave amplitude).

How Do We Build Bits into Computers?
Building a bit in hardware
We need hardware that can be in a choice of two states.
Computer main memory history:
– 1940s: spots on a TV tube; sound pressure waves in a mercury delay line; vacuum tubes
– 1950s: rotating magnetic drum; vacuum tubes
– 1950s–1970s: tiny magnetizable iron donuts (core memory)
– 1970s–present: charges on a capacitor driving a transistor
Computer bulk storage:
– Magnetizable tape
– Magnetizable disk
– Transistors holding charge or solid state magnetic devices
These vary in cost/size/speed – all encode bits.

Technology for Storing Bits
[Photos: relay; thyratrons & vacuum tubes; magnetic tape; transistors; core memory; punch cards; limited integration; integrated circuit; USB key]

Binary Numbers

Learn your binary numbers
N → 2^N
0 → 1
1 → 2
2 → 4
3 → 8
4 → 16
5 → 32
6 → 64
7 → 128
8 → 256
9 → 512
10 → 1024
11 → 2048
12 → 4096
13 → 8192
14 → 16384
15 → 32768
16 → 65536
2^20 ≈ 1M, 2^30 ≈ 1B, 2^32 ≈ 4B, 2^64 = HUGE

Another way to think about binary numbers
1011 = 11 (decimal)
[Figure: a binary search over the range 0 to 16, with midpoint 8 – each bit of 1011 answers whether the number lies in the upper or lower half of the remaining range, narrowing in on 11]
The binary representation encodes a binary search for the number!

01100110 11100110 01100110 11100110 … (32 bits)

The logical structure of computer memory
Can we get a C pointer to a bit in memory? NO!
Bytes: 01100110 11100110 01100110 11100110 01100110 11100110 01100110 11100110
Addr:      0        1        2        3        4        5        6        7
Pointers (on most modern machines) are to whole bytes.

Why byte addressing?
– Can address more memory with smaller pointers: not too big, not too small.
– 256 values: about right for holding the characters Western cultures need (ASCII) – one character fits in one byte.
– 8 is a power of 2 … we’ll see advantages later.
– Unfortunately, we need multiple-byte representations for non-alphabetic languages (Chinese, Japanese Kanji, etc.) – we deal with that in software.
What’s the largest integer we can store in a byte?

Computers can work efficiently with bigger words
Sizes vary with machine architecture; these are for AMD64:
– BYTE: 8 bits (1 byte)
– SHORT: 16 bits (2 bytes)
– INT: 32 bits (4 bytes)
– LONG: 64 bits (8 bytes)
– POINTER: 64 bits (8 bytes)
• C has types for these.
• The hardware has instructions to directly manipulate these.
• The memory system moves these efficiently (in parallel).

Review

Review
– Bits encode choices.
– We can thus choose a representation for information in bits.
– We can interpret the same bit values in different ways (e.g. the number 67 or the ASCII letter C).
– If we call the bit states 0 and 1, then we easily get binary numbers.
– We know how to implement bit stores in hardware and to compute with them.
– We generally address bytes, not bits.
– We often use words of types like integer… these are useful, and the machine handles them efficiently.

Abstractions – again

Abstractions Are Layered at Every Level in our Systems
• “Real world” concepts modeled in Hanson ADTs & C types
• Hanson ADTs implemented in C types
• Soon: bits & bytes used to encode machine instructions
• Words, bytes, and bits used to implement C types
• Bytes grouped in hardware to make words (int, long, etc.)
• True/false bits grouped to make bytes
• Information modeled as true/false bits
• True/false bits encoded in charges on transistors

An Aside on Information Theory

We’ve over-simplified the story a little
What we’ve said about bits and choices is true. However:
– Many encodings are wasteful, i.e. the values of the bits are somewhat predictable.
– Example: for each day of the year, send [1 = there was a hurricane, 0 = no hurricane]… we know most bits will be zero.
– Can you find a better encoding?
To really measure information, we use the smallest possible encoding.
Also: Shannon didn’t just measure information… he predicted how reliably you could send it through a noisy transmission line.
Still, what we’ve studied here is a great start on thinking about bits and information, which are the foundations for modern digital computing.