dataStorage - Rose

Data storage
Charles McAnany
What are the ones and zeroes?
[Figure: a passage of text (“Lorem ipsum dolor sit amet, …”) goes into the computer and is stored on the hard drive as ones and zeroes, e.g. 1011010.]
Definitions
A bit is a single one or zero.
A byte is eight bits.
Numbers written using only ones and zeroes are said to be in binary.
Binary numbers
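• A list of ones and zeroes is a number.
• It works just like a number in base ten, except that each place is worth a power of two instead of a power of ten.
1576 (base ten) = 1×1000 + 5×100 + 7×10 + 6×1
10110 (base two) = 1×16 + 0×8 + 1×4 + 1×2 + 0×1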
Converting from binary to decimal
Example: 10110 (places are counted from the right)
Place  Power  Value  Present?
1      2^0    1      No
2      2^1    2      Yes
3      2^2    4      Yes
4      2^3    8      No
5      2^4    16     Yes
Then, add up all the present values.
16 + 4 + 2 = 22
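The table method can be sketched in a few lines of Python. This is only an illustration; the function name binary_to_decimal is our own choice. It walks the digits from the right, gives place n the value 2^(n-1), and adds up the values whose digit is 1.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to a decimal number using the place-value table."""
    total = 0
    # Read the digits from the right: place 1 is worth 2**0, place 2 is worth 2**1, ...
    for place, digit in enumerate(reversed(bits), start=1):
        value = 2 ** (place - 1)
        if digit == "1":          # "Present?" is Yes: add this place's value
            total += value
        elif digit != "0":
            raise ValueError(f"not a binary digit: {digit!r}")
    return total


print(binary_to_decimal("10110"))  # 22
```

Python's built-in int("10110", 2) gives the same answer; the sketch just spells out the table.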
From decimal to binary
• Harder!
• Find the largest power of two that fits in the decimal number (the largest one that is less than or equal to it).
• Subtract that power, and mark it as present in the binary.
• Repeat with what is left until the decimal number is zero.
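The steps above can be sketched in Python as follows. The name decimal_to_binary is hypothetical, and the sketch only handles non-negative whole numbers. It first finds the largest power of two that fits, then walks down through every smaller power, writing 1 where the power fits (and subtracting it) and 0 where it does not, so the unmarked places come out as zeroes.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeatedly
    subtracting the largest power of two that fits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if n == 0:
        return "0"
    # Find the largest power of two that fits in n.
    power = 1
    while power * 2 <= n:
        power *= 2
    bits = []
    # Walk down through each power of two: 1 if it fits (mark it present), else 0.
    while power >= 1:
        if power <= n:
            bits.append("1")
            n -= power            # subtract it and continue with what is left
        else:
            bits.append("0")
        power //= 2
    return "".join(bits)


print(decimal_to_binary(1577))  # 11000101001
```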
Example: 1577
Largest power of two that fits: 1024
(not 2048, because 2048 > 1577.)
1577 - 1024 = 553
Place  Power  Value   Present?
1      2^0    1
2      2^1    2
3      2^2    4
4      2^3    8
5      2^4    16
6      2^5    32
7      2^6    64
8      2^7    128
9      2^8    256
10     2^9    512
11     2^10   1,024   yes
12     2^11   2,048
13     2^12   4,096
14     2^13   8,192
15     2^14   16,384
Mark the 1024 spot, and continue with 553.
Next, 553. Largest power of two that fits: 512
553 - 512 = 41
Mark the 512 spot, and continue with 41. (Marked so far: 1,024 and 512.)
Next, 41. Largest power of two that fits: 32
41 - 32 = 9
Mark the 32 spot, and continue with 9. (Marked so far: 1,024, 512 and 32.)
Next, 9. Largest power of two that fits: 8
9 - 8 = 1
Mark the 8 spot, and continue with 1. (Marked so far: 1,024, 512, 32 and 8.)
Finally, 1. Largest power of two that fits: 1
1 - 1 = 0
Mark the 1 spot. The number is now zero, so you’re done. (Marked: 1,024, 512, 32, 8 and 1.)
Wherever you marked present, that’s a 1. If there’s no mark, that place is a 0.
Place  Power  Value   Present?  Bit
1      2^0    1       yes       1
2      2^1    2                 0
3      2^2    4                 0
4      2^3    8       yes       1
5      2^4    16                0
6      2^5    32      yes       1
7      2^6    64                0
8      2^7    128               0
9      2^8    256               0
10     2^9    512     yes       1
11     2^10   1,024   yes       1
12     2^11   2,048             0
13     2^12   4,096             0
14     2^13   8,192             0
15     2^14   16,384            0
Starting at the bottom (place 15) and reading up, fill in the binary number.
The number is 000011000101001.
(Check: 1,024 + 512 + 32 + 8 + 1 = 1577.)
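To check a hand conversion like this one, Python’s built-in format() (with the "b" binary format type, zero-padded to a width of 15 to match the 15-place table) and int() with base 2 can be used. This is just a quick check, not part of the method itself.

```python
# Decimal to binary, padded with leading zeros to 15 places.
print(format(1577, "015b"))          # 000011000101001
# Binary back to decimal: int() with base 2 reads a string of 0s and 1s.
print(int("000011000101001", 2))     # 1577
```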
Playing for money.
• The first person to convert the following
number to binary will receive a cash prize.
• The number is: 200,000,000.
Text
• Each character is stored in one byte, so each character has a number. The numbers for the capital letters are in the handout.
• The following string (bytes separated by commas):
• 01001000, 01000101, 01001100, 01001100, 01001111
• is: H, E, L, L, O
• Please take a moment to write your name in
binary.
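The five bytes on the slide match the standard ASCII codes for H, E, L and O, so a minimal Python sketch can decode them with int(b, 2) and chr(). This assumes the handout uses ASCII; to_binary_bytes is a hypothetical helper for writing a name in binary, and "ADA" is just a placeholder name.

```python
# The five bytes from the slide.
hello_bytes = ["01001000", "01000101", "01001100", "01001100", "01001111"]

# int(b, 2) reads each byte as a number; chr() turns that number into a character.
decoded = "".join(chr(int(b, 2)) for b in hello_bytes)
print(decoded)  # HELLO


def to_binary_bytes(text):
    """Write each character as one 8-bit byte (ASCII characters only in this sketch)."""
    result = []
    for ch in text:
        if ord(ch) > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        result.append(format(ord(ch), "08b"))
    return result


print(to_binary_bytes("ADA"))  # placeholder name: try your own
```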
Images:
Images are broken into pixels, and each color is given a number.
Blue = 0, Light blue = 1, red = 2, white = 3
With four colors, each pixel takes two bits (00, 01, 10 or 11). Our image, one row of pixels per line:
00 01 00 01 10 10 10
00 00 00 00 11 11 11
00 01 00 01 10 10 10
11 11 11 11 11 11 11
10 10 10 10 10 10 10
Glue the rows together (remembering elsewhere how long a row is):
00 01 00 01 10 10 10 00 00 00 00 11 11 11 00 01 00 01 10 10 10 11 11 11 11 11 11 11 10 10 10 10 10 10 10
Then, store the numbers, and any other information needed to make the image. The image format might specify, for instance, that the first eight bits are the row length (here 7, which is 00000111). So, our file would be:
000001110001000110101000000000
111111000100011010101111111111
111110101010101010
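Here is a minimal Python sketch of the encoding just described, assuming the slide’s example format (an 8-bit row length, then two bits per pixel with the rows glued together). The function name encode_image is hypothetical, and real image formats store much more than this.

```python
# The example image from the slide, one list per row.
# Color numbers: blue = 0, light blue = 1, red = 2, white = 3.
image = [
    [0, 1, 0, 1, 2, 2, 2],
    [0, 0, 0, 0, 3, 3, 3],
    [0, 1, 0, 1, 2, 2, 2],
    [3, 3, 3, 3, 3, 3, 3],
    [2, 2, 2, 2, 2, 2, 2],
]


def encode_image(rows):
    """Write an image in the slide's example format: an 8-bit row length,
    followed by every pixel as a 2-bit color number, rows glued together."""
    row_length = len(rows[0])
    if not 0 < row_length < 256:
        raise ValueError("row length must fit in eight bits")
    if any(len(row) != row_length for row in rows):
        raise ValueError("all rows must be the same length")
    header = format(row_length, "08b")
    pixels = ""
    for row in rows:
        for color in row:
            if not 0 <= color <= 3:
                raise ValueError("only four colors fit in two bits")
            pixels += format(color, "02b")
    return header + pixels


bits = encode_image(image)
print(bits)
```

The printed string is the three lines of the file above joined into one.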
Recreating an image.
• The first eight bits are the row length. Use the coloring
scheme in the handout.
• Here’s the file as it appears on the disk. Recreate the
image. (I’ve broken it into bytes for ease of reading.)
• 00000101
• 00010001
• 00000010
• 00001100
• 00001100
• 11111100
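Once you have drawn your answer, you can check it with a sketch of the reverse process: read the first eight bits as the row length, split the rest into 2-bit color numbers, and cut them into rows. The name decode_image is hypothetical; look the printed numbers up in the handout’s coloring scheme.

```python
def decode_image(bits):
    """Rebuild rows of color numbers from: 8-bit row length, then 2 bits per pixel."""
    bits = bits.replace(" ", "")            # allow the file to be written as spaced-out bytes
    row_length = int(bits[:8], 2)           # first eight bits: how many pixels per row
    pixel_bits = bits[8:]
    if row_length == 0 or len(pixel_bits) % 2 != 0:
        raise ValueError("malformed image file")
    # Every two bits is one pixel's color number (0-3).
    colors = [int(pixel_bits[i:i + 2], 2) for i in range(0, len(pixel_bits), 2)]
    # Cut the glued-together pixels back into rows of row_length.
    return [colors[i:i + row_length] for i in range(0, len(colors), row_length)]


exercise = "00000101 00010001 00000010 00001100 00001100 11111100"
for row in decode_image(exercise):
    print(row)
```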
Compression
• If a particular pattern occurs often in a file, it
may be possible to compress the contents.
• We’ll use Huffman coding to compress a text file.
• The original text is “this is an example of a
huffman tree”.
• We start by analyzing letter frequency. The
most common letters should have the shortest
codes.
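As a sketch of how a code table can be built from the letter counts, here is the usual priority-queue construction of a Huffman code, using Python’s heapq and collections.Counter. The function name huffman_codes is our own. Ties between equal counts can be broken in different ways, so the individual codes may not match the slide’s table letter for letter, but every Huffman code for these counts encodes the sentence in the same minimum number of bits (135).

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a Huffman code (character -> string of bits) from letter frequencies."""
    freq = Counter(text)
    if len(freq) == 1:
        # Only one distinct character: give it a one-bit code.
        return {ch: "0" for ch in freq}
    # Each heap entry: (frequency, tie-breaker, {character: code so far}).
    # The tie-breaker keeps heapq from ever comparing two dicts.
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; one side gets a leading 0, the other a 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "this is an example of a huffman tree"
codes = huffman_codes(text)
for ch, f in Counter(text).most_common():
    print(repr(ch), f, codes[ch])
print(sum(len(codes[ch]) for ch in text))  # 135
```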
Huffman coding
Char   Freq  Code
space  7     111
a      4     010
e      4     000
f      3     1101
h      2     1010
i      2     1000
m      2     0111
n      2     0010
s      2     1011
t      2     0110
l      1     11001
o      1     00110
p      1     10011
r      1     11000
u      1     00111
x      1     10010
To encode, we replace each character with its Huffman code. The word “tree” is originally (one byte per character):
01110100 01110010 01100101 01100101
Replacing each character with its code (t = 0110, r = 11000, e = 000), we get:
011011000000000
That’s 15 bits instead of 32.
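A minimal Python sketch of encoding and decoding with the code table above (copied from the slide, with "space" written as " "). The names CODES, encode and decode are hypothetical. Decoding reads bits left to right; because no code is the beginning of another code, the first match is always the right character, and you get back exactly the text you started with.

```python
# Code table copied from the slide ("space" written as " ").
CODES = {
    " ": "111", "a": "010", "e": "000", "f": "1101", "h": "1010", "i": "1000",
    "m": "0111", "n": "0010", "s": "1011", "t": "0110", "l": "11001",
    "o": "00110", "p": "10011", "r": "11000", "u": "00111", "x": "10010",
}


def encode(text):
    """Replace each character with its Huffman code."""
    return "".join(CODES[ch] for ch in text)


def decode(bits):
    """Read bits left to right, emitting a character whenever a full code matches."""
    lookup = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)


print(encode("tree"))                    # 011011000000000
sentence = "this is an example of a huffman tree"
packed = encode(sentence)
print(len(sentence) * 8, len(packed))    # 288 135
print(decode(packed) == sentence)        # True
```

At one byte per character the sentence takes 288 bits; with this table it takes 135.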
Image compression using Huffman coding
• Our image example used two bits for every pixel. That’s great for images with four colors, but most images are stored using 32 or even 64 bits per pixel.
• If most of the image is one color, we can give that color a code of very few bits, turning each of those 64-bit pixels into a 1-bit code.
Other compression methods
• Huffman coding is very widely used in lossless compression: when you decode the data that was encoded, you get the EXACT same data back.
• But some things (music and images in
particular) may not need to be stored with
perfect accuracy.
• For these, we use lossy compression.
Lossy compression
Credits
• Maru the cat
http://catsnco.wordpress.com/2011/10/19/iam-maru/
• Huffman tree
http://en.wikipedia.org/wiki/Huffman_coding
• Lossy compression NMR
http://nmr.cemhti.cnrsorleans.fr/Dmfit/Howto/Top/Default.aspx