Data storage Charles McAnany What are the ones and zeroes? "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris. Computer 1011010 Hard drive Definitions A bit is a single one or zero. A byte is eight bits. Numbers stored as ones and zeroes are stored in binary. Binary numbers • A list of ones and zeroes is a number. • It’s just like a number in base ten. 1576 10110 Converting from binary to decimal 10110 Place Power 1 2^0 2 2^1 3 2^2 4 2^3 5 2^4 Value 1 2 4 8 16 Present? No Yes Yes No Yes Then, add up all the present values. 16 + 4 + 2 = 22 From decimal to binary • Harder! • Find the largest power of two that fits in the decimal number. • Subtract that number, and mark it as present in the binary. • Repeat until the decimal number is zero. 1577 1577 Largest power of two that fits: 1024 (not 2048, because 2048 > 1577.) 1577 -1024 = 553 Place Power 1 2^0 2 2^1 3 2^2 4 2^3 5 2^4 6 2^5 7 2^6 8 2^7 9 2^8 10 2^9 11 2^10 12 2^11 13 2^12 14 2^13 15 2^14 Mark the 1024 spot, and continue with 553. Value 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 8,192 16,384 Present? 553 Largest power of two that fits: 512 553 - 512 = 41 Mark the 512 spot, and continue with 41. Place Power 1 2^0 2 2^1 3 2^2 4 2^3 5 2^4 6 2^5 7 2^6 8 2^7 9 2^8 10 2^9 11 2^10 12 2^11 13 2^12 14 2^13 15 2^14 Value 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 8,192 16,384 Present? yes 41 Largest power of two that fits: 32 41 - 32 = 9 Mark the 32 spot, and continue with 9. Place Power 1 2^0 2 2^1 3 2^2 4 2^3 5 2^4 6 2^5 7 2^6 8 2^7 9 2^8 10 2^9 11 2^10 12 2^11 13 2^12 14 2^13 15 2^14 Value 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 8,192 16,384 Present? yes yes 9 Largest power of two that fits: 8 9 - 8 = 1 Mark the 8 spot, and continue with 1. Place Power 1 2^0 2 2^1 3 2^2 4 2^3 5 2^4 6 2^5 7 2^6 8 2^7 9 2^8 10 2^9 11 2^10 12 2^11 13 2^12 14 2^13 15 2^14 Value 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 8,192 16,384 Present? yes yes yes 1 Largest power of two that fits: 1 1 - 1 = 0 Place Power 1 2^0 2 2^1 3 2^2 4 2^3 5 2^4 6 2^5 7 2^6 8 2^7 9 2^8 10 2^9 11 2^10 12 2^11 13 2^12 14 2^13 15 2^14 Mark the 1 spot, the number is zero, so you’re done. Value 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 8,192 16,384 Present? yes yes yes Place Power 1 2^0 2 2^1 3 2^2 Wherever you marked present, that’s a 1. 4 2^3 If there’s no mark, that number’s a 0. 2^4 Starting at the bottom, fill in the binary number. 5 The number is 000011000100001. 6 2^5 7 2^6 8 2^7 9 2^8 10 2^9 11 2^10 12 2^11 13 2^12 14 2^13 15 2^14 Value 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 8,192 16,384 Present? yes yes yes yes Playing for money. • The first person to convert the following number to binary will receive a cash prize. • The number is: 200,000,000. Text • Each byte is a character. So, each character has a number. The capital letters are in the handout. • The following string (bytes separated by commas) is: • 01001000, 01000101, 01001100, 01001100, 01001111 • H E L L O • Please take a moment to write your name in binary. Glue the rows together. (remembering how long a row is elsewhere.) Images: Images are broken into pixels. 00 01 00 01 10 10 10 00 00 00 00 11 11 11 00 01 00 01 10 10 10 11 11 11 11 11 11 11 10 10 10 10 10 10 10 Then, store the numbers, and any other info needed to make the image. The image format might specify, for instance, the first eight bits is the row length. So, our file would be Each color is given a number. Blue = 0, Light blue = 1, red = 2, white = 3 00 01 00 01 10 10 10 00 00 00 00 11 11 11 00 01 00 01 10 10 10 11 11 11 11 11 11 11 10 10 10 10 10 10 10 000001110001000110101000000000 111111000100011010101111111111 111110101010101010 Recreating an image. • The first eight bits are the row length. Use the coloring scheme in the handout. • Here’s the file as it appears on the disk. Recreate the image. (I’ve broken it into bytes for ease of reading.) • 00000101 • 00010001 • 00000010 • 00001100 • 00001100 • 11111100 Compression • If a particular pattern occurs often in a file, it may be possible to compress the contents. We’ll use Huffman coding to deflate a text file. • The original text is “this is an example of a huffman tree”. • We start by analyzing letter frequency. The most common letters should have the shortest codes. Huffman coding Char To encode, we replace each character with its Huffman code. The word “tree” is originally 01010100 01010010 01000101 01000101 Replacing it with the codes, we get: 011011000000000 Freq Code space 7 111 a 4 010 e 4 000 f 3 1101 h 2 1010 i 2 1000 m 2 0111 n 2 0010 s 2 1011 t 2 0110 l 1 11001 o 1 00110 p 1 10011 r 1 11000 u 1 00111 x 1 10010 Image compression using Huffman coding • Our image example used two bits for every pixel. That’s great for images with four colors. But most images are stored using 32 or even 64 bits per pixel. • If most of the image is one color, we can give that color a code of very few bits, converting all of those 64-bit pixels into 1 bit pixels. Other compression methods • Huffman coding is very widely used in lossless compression. When you view the data that was encoded, you get the EXACT same data back. • But some things (music and images in particular) may not need to be stored with perfect accuracy. • For these, we use lossy compression. Lossy compression Credits • Maru the cat http://catsnco.wordpress.com/2011/10/19/iam-maru/ • Huffman tree http://en.wikipedia.org/wiki/Huffman_coding • Lossy compression NMR http://nmr.cemhti.cnrsorleans.fr/Dmfit/Howto/Top/Default.aspx