lect03.ppt

Today’s topics
Data Representation
 Memory organization
 Compression
Upcoming
 Systems
 Connections
 Algorithms
Reading
 Computer Science (Brookshear), Chapters 1-3
Main Memory Cells

Cell: A unit of main memory (typically 8 bits, which is one byte)
 Most significant bit: the bit at the left (high-order) end of the conceptual row of bits in a memory cell
 Least significant bit: the bit at the right (low-order) end of the conceptual row of bits in a memory cell
Figure 1.7 The organization of a byte-size memory cell
Main Memory Addresses

Address: A “name” that uniquely identifies one cell in the computer’s main memory
 The names are actually numbers.
 These numbers are assigned consecutively starting at zero.
 Numbering the cells in this manner associates an order with the memory cells.
Figure 1.8 Memory cells arranged by address
Memory Terminology


Random Access Memory (RAM): Memory in which individual cells can be easily accessed in any order
Dynamic Memory (DRAM): RAM composed of volatile memory
Measuring Memory Capacity

Kilobyte: 2^10 bytes = 1,024 bytes
 Example: 3 KB = 3 × 1,024 bytes
 Sometimes “kibi” rather than “kilo”
Megabyte: 2^20 bytes = 1,048,576 bytes
 Example: 3 MB = 3 × 1,048,576 bytes
 Sometimes “mebi” rather than “mega”
Gigabyte: 2^30 bytes = 1,073,741,824 bytes
 Example: 3 GB = 3 × 1,073,741,824 bytes
 Sometimes “gibi” rather than “giga”
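
Since these are just powers of two, the arithmetic is easy to check; a trivial Python sketch (the constant names KB, MB, GB are mine, not from the slides):

```python
# Binary (power-of-two) units from this slide.
KB = 2 ** 10   # 1,024 bytes
MB = 2 ** 20   # 1,048,576 bytes
GB = 2 ** 30   # 1,073,741,824 bytes

print(3 * KB)  # 3072
print(3 * MB)  # 3145728
print(3 * GB)  # 3221225472
```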
Mass Storage




On-line versus off-line
Typically larger than main memory
Typically less volatile than main memory
Typically slower than main memory
Mass Storage Systems



Magnetic Systems
 Disk
 Tape
Optical Systems
 CD
 DVD
Flash Drives
Figure 1.9 A magnetic disk storage system
Figure 1.10 Magnetic tape storage
Figure 1.11 CD storage
Files



File: A unit of data stored in a mass storage system
 Fields and key fields
Physical record versus logical record
Buffer: A memory area used for the temporary storage of data (usually as a step in transferring the data)
Figure 1.12 Logical records versus physical records on a disk
Data Compression

Compression is a high-profile application
 .zip, .mp3, .jpg, .gif, .gz, …
 What property of MP3 was a significant factor in making Napster work (and why did Napster ultimately fail)?

Why do we care?
 Secondary storage capacity doubles every year
 Disk space fills up quickly on every computer system
 More data to compress than ever before
More on Compression




What’s the difference between compression techniques?
 .mp3 files and .zip files?
 .gif and .jpg?
 Lossless and lossy
Is it possible to compress every file losslessly? Why or why not?
Lossy methods
 Good for pictures, video, and audio (JPEG, MPEG, etc.)
Lossless methods
 Run-length encoding, Huffman, LZW, … (a run-length sketch follows below)
 Example run lengths: 11 3 5 3 2 6 2 6 5 3 5 3 5 3 10
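
Run-length encoding is the simplest lossless method to sketch. A minimal illustrative version in Python (not code from the course):

```python
def run_length_encode(bits):
    """Turn a bit sequence into a list of run lengths.

    The first run is assumed to be of 0s, so a sequence starting
    with 1 gets a leading run of length 0.
    """
    runs = []
    current, count = 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs

print(run_length_encode([0, 0, 0, 1, 1, 0, 0, 0, 0]))  # [3, 2, 4]
```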
Text Compression


Input: String S
Output: String S'
 Shorter than S
 S can be reconstructed from S'
Figure 1.13 The message “Hello.” in ASCII
Text Compression: Examples
“abcde” in the different formats
 ASCII: 01100001 01100010 01100011 01100100 … (8 bits/character, 40 bits total)
 Fixed: 000 001 010 011 100 (15 bits)
 Var: 000 11 01 001 10 (12 bits)

Encodings
 ASCII: 8 bits/character
 Unicode: 16 bits/character

Symbol   ASCII      Fixed length   Var. length
a        01100001   000            000
b        01100010   001            11
c        01100011   010            01
d        01100100   011            001
e        01100101   100            10

[Figure: code trees for the fixed-length and variable-length encodings, edges labeled 0 (left) and 1 (right), with the letters a–e at the leaves]
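
A small Python sketch that reproduces the bit counts above; the code tables are transcribed from the slide:

```python
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100"}
variable = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}

msg = "abcde"
print(len(msg) * 8)                          # 40 bits in 8-bit ASCII
print(sum(len(fixed[c]) for c in msg))       # 15 bits with the fixed-length code
print(sum(len(variable[c]) for c in msg))    # 12 bits with the variable-length code
```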
Huffman coding: go go gophers
Symbol   ASCII          3 bits   Huffman
g        103  1100111   000      00
o        111  1101111   001      01
p        112  1110000   010      1100
h        104  1101000   011      1101
e        101  1100101   100      1110
r        114  1110010   101      1111
s        115  1110011   110      100
sp.       32  0100000   111      101

Encoding uses the tree:
 0 = left, 1 = right
 How many bits? 37!! (checked in the sketch below)
 Savings? Worth it?

[Figure: Huffman trees for “go go gophers” (13 characters in total); g and o (count 3 each) sit just below the root, s (1) and space (2) share a small subtree, and p, h, e, r (1 each) are deepest]
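
A quick check of the 37-bit figure in Python, using the code table above (the Huffman column is reconstructed from the slide, so treat it as illustrative):

```python
huffman = {"g": "00", "o": "01", "p": "1100", "h": "1101",
           "e": "1110", "r": "1111", "s": "100", " ": "101"}

message = "go go gophers"
encoded = "".join(huffman[ch] for ch in message)
print(len(encoded))        # 37 bits with the Huffman code
print(len(message) * 8)    # 104 bits in 8-bit ASCII
print(len(message) * 3)    # 39 bits with the fixed 3-bit code
```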
Huffman Coding





D. A. Huffman, early 1950s
Before compressing data, analyze the input stream
Represent data using variable-length codes
Variable-length codes through prefix codes (a prefix check is sketched below)
 Each letter is assigned a codeword
 The codeword for a given letter is produced by traversing the Huffman tree
 Property: no codeword produced is the prefix of another
 Letters appearing frequently have short codewords, while those that appear rarely have longer ones
Huffman coding is an optimal per-character coding method
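
The prefix property is easy to check mechanically. A small illustrative Python sketch (the codeword sets are chosen for the example, not taken from a library):

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["000", "11", "01", "001", "10"]))  # True
print(is_prefix_free(["0", "01", "11"]))                 # False: "0" prefixes "01"
```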
Building a Huffman tree

Begin with a forest of single-node trees (leaves)
 Each node/tree/leaf is weighted with its character count
 A node stores two values: character and count
 There are n nodes in the forest, where n is the size of the alphabet

Repeat until there is only one tree left: the root of the Huffman tree
 Remove the two minimally weighted trees from the forest
 Create a new tree with the two minimal trees as children
 • The new tree root's weight is the sum of its children's weights (character ignored)

Does this process terminate? How do we get the minimal trees?
 Remove minimal trees, hmmm… (a heap-based sketch follows below)
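
The loop above maps naturally onto a priority queue (heap). A compact Python sketch, meant as a minimal illustration rather than the course's reference code (the function and variable names are mine):

```python
import heapq
import itertools
from collections import Counter

def build_huffman_tree(text):
    """Return (total_weight, tree), where a tree is ['leaf', ch] or ['node', left, right]."""
    tiebreak = itertools.count()   # keeps heap comparisons away from the trees themselves
    # Forest of single-node trees, each weighted with its character count.
    forest = [(count, next(tiebreak), ['leaf', ch])
              for ch, count in Counter(text).items()]
    heapq.heapify(forest)

    # Repeat until one tree remains: remove the two minimum-weight trees,
    # merge them under a new root whose weight is the sum of the children.
    while len(forest) > 1:
        w1, _, left = heapq.heappop(forest)
        w2, _, right = heapq.heappop(forest)
        heapq.heappush(forest, (w1 + w2, next(tiebreak), ['node', left, right]))

    weight, _, tree = forest[0]
    return weight, tree

def assign_codes(tree, prefix="", codes=None):
    """Traverse the tree: 0 = go left, 1 = go right; each leaf gets a codeword."""
    if codes is None:
        codes = {}
    if tree[0] == 'leaf':
        codes[tree[1]] = prefix or "0"   # degenerate one-symbol alphabet
    else:
        assign_codes(tree[1], prefix + "0", codes)
        assign_codes(tree[2], prefix + "1", codes)
    return codes

weight, tree = build_huffman_tree("go go gophers")
print(weight)               # 13 characters in total
print(assign_codes(tree))   # e.g. {'g': '00', 'o': '01', ...} (ties may break differently)
```

Whatever order ties break in, the result is a valid Huffman tree, so the total number of encoded bits is the same.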
Building a tree
“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
Character counts (60 characters in all):
 space 11, I 6, E 5, N 5, S 4, M 4, A 3, B 3, O 3, T 3, G 2, D 2, L 2, R 2, U 2, P 1, F 1, C 1

[Figures: a sequence of slides builds the Huffman tree step by step. At each step the two minimum-weight trees are removed from the forest and merged under a new root whose weight is the sum of its children (the first merge combines F 1 and C 1 into a tree of weight 2), until a single tree of weight 60 remains. The total bit count is computed in the sketch below.]
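
The total encoded length can be computed without drawing the tree: each merge adds the combined weight of the two trees to the total, and a character's count is added once for every level it sits below the root. A small self-contained Python sketch of that idea (assuming the character counts above):

```python
import heapq
from collections import Counter

def huffman_bits(text):
    """Total encoded length = sum of the weights of all merged (internal) nodes."""
    heap = list(Counter(text).values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

s = "A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS"
print(huffman_bits(s))   # 236 bits, versus 60 * 8 = 480 bits of 8-bit ASCII
```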
Properties of Huffman coding


Want to minimize the weighted path length L(T) of the tree T (computed in the sketch below):

    L(T) = Σ_{i ∈ Leaf(T)} d_i · w_i

 w_i is the weight (count) of codeword i
 d_i is the depth of the leaf corresponding to codeword i

How do we calculate character (codeword) frequencies?
Huffman coding creates pretty full, bushy trees
 When would it produce a “bad” tree?
How do we produce coded, compressed data from the input efficiently?
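
A short, self-contained Python sketch of L(T), using the “go go gophers” code table from earlier (the leaf depth d_i is just the codeword length):

```python
from collections import Counter

def weighted_path_length(text, codes):
    """L(T): sum over leaves i of d_i * w_i; equals the encoded length in bits."""
    counts = Counter(text)
    return sum(counts[ch] * len(codes[ch]) for ch in counts)

codes = {"g": "00", "o": "01", "p": "1100", "h": "1101",
         "e": "1110", "r": "1111", "s": "100", " ": "101"}
print(weighted_path_length("go go gophers", codes))   # 37
```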
Decoding a message
01100000100001001101
[Figures: a sequence of slides traces the bits through the Huffman tree built above. Start at the root and read the bits left to right, going left on 0 and right on 1; each time a leaf is reached, output its character and return to the root. The output grows G, GO, GOO, and finally GOOD: the message decodes to “GOOD”. A decoding sketch follows.]
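
A minimal table-driven decoder sketch in Python; it mirrors the tree walk by extending the current codeword bit by bit until it matches a leaf. The example reuses the “go go gophers” codes, since the full codebook for the 60-character tree isn't listed on the slides:

```python
def huffman_decode(bits, codes):
    """Decode a bit string using a prefix-free code table."""
    inverse = {codeword: ch for ch, codeword in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b                 # walk one more edge down the tree
        if current in inverse:       # reached a leaf
            out.append(inverse[current])
            current = ""             # restart at the root
    return "".join(out)

codes = {"g": "00", "o": "01", "p": "1100", "h": "1101",
         "e": "1110", "r": "1111", "s": "100", " ": "101"}
print(huffman_decode("00011100110111101111", codes))   # "gopher"
```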
Huffman Tree 2

“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
 E.g., “ A SIMPLE” → “10101101001000101001110011100000”

[Figures: repeated views of the Huffman tree, stepping through the encoding of “ A SIMPLE” into the bit string above.]
Data Compression
Year  Scheme                                    Bits/char
1967  ASCII                                     7.00
1950  Huffman                                   4.70
1977  Lempel-Ziv (LZ77)                         3.94
1984  Lempel-Ziv-Welch (LZW) – Unix compress    3.32
1987  LZH (used by zip and unzip)               3.30
1987  Move-to-front                             3.24
1987  gzip                                      2.71
1995  Burrows-Wheeler                           2.29
1997  BOA (statistical data compression)        1.99

Why is data compression important?
How well can you compress files losslessly?
Is there a limit? How to compare?
How do you measure how much information?
Other methods



Adaptive Huffman coding
Lempel-Ziv algorithms (a tiny LZW sketch follows below)
 Build the coding table on the fly while reading the document
 The coding table changes dynamically
 Protocol between encoder and decoder so that everyone is always using the right coding scheme
 Works well in practice (compress, gzip, etc.)
More complicated methods
 Burrows-Wheeler (bzip2)
 PPM statistical methods
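
A toy LZW compressor in Python, just to show the “build the table on the fly” idea (a sketch, not the actual Unix compress implementation):

```python
def lzw_compress(text):
    """Return a list of integer codes; the table is built as we read."""
    # Start with single-character entries (codes 0-255 for byte values).
    table = {chr(i): i for i in range(256)}
    next_code = 256
    out, current = [], ""
    for ch in text:
        if current + ch in table:
            current += ch                      # keep extending the match
        else:
            out.append(table[current])         # emit code for longest match
            table[current + ch] = next_code    # add the new string to the table
            next_code += 1
            current = ch
    if current:
        out.append(table[current])
    return out

print(lzw_compress("go go go gophers"))
```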