Uploaded by 藍允澤

course slides - 2023

計算機概論
Introduction to Computer Science
游家牧
Chia-Mu Yu
交通大學資訊管理與財務金融學系
Department of Information Management and Finance
chiamuyu@gmail.com
Textbook and Reference Books
2
Grading Policy
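College of Management rule: the midterm counts for 30%; the remaining weights are set by the instructor, so this course uses the following
• Midterm 30%, Final Exam 30%, Quizzes and Assignments 40%
• The final scores will be shifted so that the class average is 78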
5
Schedule for This Semester (2023 Fall)
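• Quizzes will be held on the following dates
• 10/3, 10/23, 11/28, 12/12
On quiz days, the first hour is a regular lecture and the last two hours are a written quiz
• We follow the official NYCU schedule
• 16 weeks of lectures in total
• Midterm: 10/31
• Final Exam: 12/19
On the midterm and final exam days, all three hours are used for the written exam
On 10/3 the instructor is on leave for 1 hour; this will be made up by extending each class by 10 minutes, which should reduce the need to schedule a separate make-up session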
6
Alert
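College of Management rule: the midterm covers the first three chapters, and the final exam coverage is decided by the instructor; our lectures will run somewhat ahead of this
• Midterm coverage will be Chapters 1~3, but lectures follow our own schedule
• Not all chapters of the textbook will be covered this semester
• This semester the midterm is on Oct 31 (Tuesday), 2023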
7
• Chapter 0: Introduction
8
Introduction
• Computer science (CS)
In Taiwan the term is often translated as 資訊工程 (information engineering), but 電腦科學 (computer science) is the more accurate translation
• Deals with computer design, computer programming, and information processing
• Some important terminology
• Algorithm: a finite sequence of instructions that solves a problem
• (Computer) program: an algorithm expressed in a form that a machine can understand
• Programming (coding): the act of writing a computer program
• Software: algorithms and programs
• Hardware: machines (e.g., PCs and laptops)
9
Pseudocode (Algorithm) and Code (Program)
Pseudocode
Code
10
Algorithm
• Algorithm design is usually a math problem
• However, algorithms are not only for math problems
• Algorithms have limits; whether certain statements are true or false
cannot be verified by any algorithm
GCD
Kurt Gödel
Incompleteness theorem
Primality test
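As a concrete instance of "a finite sequence of instructions to solve a problem", the GCD named on this slide can be computed with Euclid's method. This is only a minimal sketch; the function name `gcd` is chosen here for illustration and is not taken from the slides.

```python
def gcd(a, b):
    """Greatest common divisor of two non-negative integers (Euclid's method)."""
    while b != 0:
        # Replace (a, b) with (b, a mod b); the GCD of the pair is unchanged.
        a, b = b, a % b
    return a


print(gcd(48, 18))  # 6
```

The loop always terminates because `b` strictly decreases, which is what makes this a finite sequence of steps.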
11
Algorithm
• Research on algorithms involves the following perspectives
• What kinds of problems can be solved by algorithms (theory of computation)
• How to design more efficient algorithms (algorithms)
• How to compare different algorithms (algorithms)
• How to use algorithms to simulate our brain (applications of algorithms)
12
Abstraction
• Abstraction
• The process of removing physical, spatial, or temporal details of objects or systems
to focus on details of greater importance
• The creation of abstract concept-objects by mirroring common features or
attributes of various non-abstract objects or systems
Focus on functionality and temporarily ignore how the system or component is implemented
• For example, you might use a cellphone every day without knowing anything about
cellular communication technology
• People can focus on one particular component at a time
• The process of abstraction is sometimes called modeling
13
Data
• Computers represent, store, and process discrete, numeric data
• In recent years, data has started to play a central role in CS
• Terms such as data-driven and data scientist have become increasingly popular
• Some issues related to big data
• How computers store numeric, text, image, sound, and video data
• How computers turn real-world continuous data into digital data
• How to prevent and correct falsified data
14
9 Key Computer Science Topics
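Required by the College of Management
Required for the Information Management track of the Department of Information Management and Finance
Required by the College of Management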
15
• Chapter 1: Data Storage
16
Binary World
• Bit: binary digit (0/1)
• Depending on your interpretation, bits could be numbers or some other
things such as images and videos
• Boolean operations are operations on true/false values
• They deal with true/false values rather than numeric values
17
Binary World
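Real circuits do not look like this (for example, power and ground are not drawn); these are only symbols, called gates
Thanks to abstraction, we do not need to know whether the lower level is actually implemented with 5V voltages
• Bit: binary digit (0/1)
• Simple, logical,
and unambiguous
Real voltages are unstable; a 5V level tends to fluctuate
Truth table
• Boolean operations & gates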
18
Flip-Flop
Stores one bit; the basic building block of computer memory
• Purpose: to hold the output state until the next excitation
• Flip-Flop (FF)
R
S
• Has two input lines: set and reset
• One input sets the stored value to 1
• The other input sets the stored value to 0
• While both inputs are 0, the most recently stored value is preserved
When both inputs are 1 the output is undefined; the flip-flop does not use this combination
19
Flip-Flop
S
FlipFlop
R
20
Flip-Flop
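[Figure: flip-flop circuit annotated with signal values]
Start with S=0, R=1 (starting with 0 0 is not meaningful)
The 0 entering the AND gate forces the output to 0; both OR inputs are then 0, so the OR gate outputs 0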
21
Flip-Flop
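[Figure: flip-flop circuit annotated with signal values]
Now set S=0, R=0 (the value stored in the previous step, on the previous slide, is 0)
The output stays 0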
22
Flip-Flop
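[Figure: flip-flop circuit annotated with signal values]
Now set S=1, R=0 (the value stored in the previous step, on the previous slide, is 0)
Because one input of the OR gate is 1, the OR gate must output 1
The output becomes 1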
23
Flip-Flop
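[Figure: flip-flop circuit annotated with signal values]
Now set S=0, R=0 (the value stored in the previous step, on the previous slide, is 1)
Because the lower input of the OR gate is 1, the OR gate must output 1
The output stays 1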
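The four steps of this walkthrough can be replayed in a few lines of Python. This is a sketch under an assumption: the gate arrangement is taken to be an OR gate fed by S and the fed-back output, followed by an AND gate whose other input is the inverted R, which is what the annotations on these slides suggest. The name `sr_step` is invented for illustration.

```python
def sr_step(s, r, stored):
    """One settling step of the assumed circuit: (S OR stored) AND (NOT R)."""
    return (s | stored) & (1 - r)


stored = 0
# Same input sequence as the slides: reset, hold, set, hold.
for s, r in [(0, 1), (0, 0), (1, 0), (0, 0)]:
    stored = sr_step(s, r, stored)
    print(f"S={s} R={r} -> output {stored}")
```

The last step shows the memory effect: with both inputs at 0 the output depends only on what was stored before.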
24
Flip-Flop (Another Type of Implementation)
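[Figure: alternative flip-flop circuit annotated with signal values; the crossing wire is called a jumper]
Start with S=0, R=1 (starting with 0 0 is not meaningful)
The output is 0, and it is clearly stable as long as the inputs do not change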
25
Flip-Flop (Another Type of Implementation)
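[Figure: alternative flip-flop circuit annotated with signal values]
Set S=0, R=0 (the value stored in the previous step, on the previous slide, is 0)
The output is 0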
26
Flip-Flop (Another Type of Implementation)
[Figure: alternative flip-flop circuit annotated with signal values]
Set S=1, R=0
Because S=1, this point is 0
The output is 1
27
Flip-Flop (Another Type of Implementation)
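[Figure: alternative flip-flop circuit annotated with signal values]
Set S=0, R=0 and see what happens (the value stored in the previous step, on the previous slide, is 1)
The output is 1
At this point we have seen that a computer really can "remember" one bit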
28
Flip-Flop
• Key observation: the output depends on an internal state
• The output is not just a direct mapping from the input
• We mention the flip-flop for three reasons
• It demonstrates that a device can be built from gates
• Abstraction helps; a flip-flop can have
different implementations
• A flip-flop can memorize a bit
Collectively, this area is called digital circuit design
29
Hexadecimal Coding (Hex)
The prefix 0x indicates that the digits that follow are hexadecimal, e.g., 0xB5
• Binary is usually too long for
humans to remember
• Converting binary to hex is straightforward
• (0010111010110101)_2 = (2EB5)_16
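The conversion is straightforward because every group of 4 bits maps to exactly one hex digit. A minimal sketch using the slide's own bit string (the variable names are illustrative):

```python
bits = "0010111010110101"

# Split into 4-bit groups; each group becomes one hex digit.
groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
hex_digits = "".join(format(int(g, 2), "X") for g in groups)

print(groups)      # ['0010', '1110', '1011', '0101']
print(hex_digits)  # 2EB5
```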
30
Main Memory
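• Cell: the basic unit of main memory (typically 8 bits, i.e., one byte)
Just as the leftmost digit 3 carries the most weight in an ordinary number such as 329, we assume without loss of generality that the leftmost bit is the most significant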
Higher-order end
Lower-order end
1 1 0 1 1 1 0 0 1 0 0
Most Significant Bit (MSB)
Least Significant Bit (LSB)
31
Main Memory and Address
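• One dimensional: memory is like a row of drawers, each drawer holding one cell
• Randomly accessible (random access, as opposed to sequential access such as a cassette tape)
• Access the content by its address
Its logical
structure is as follows
• In practice, addresses are also binary
• cf. pointers in C/C++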
32
Memory Techniques
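• Random Access Memory (RAM): memory in which individual cells can be
easily accessed in any order
• Static Memory (SRAM): like a flip-flop
SRAM needs many flip-flops, and each flip-flop needs many gates, so it takes up a lot of area
• Dynamic Memory (DRAM): tiny capacitors replenished regularly by a refresh circuit
Conceptually DRAM just stores charge in capacitors, but the charge leaks away, so it must be recharged periodically
• Synchronous DRAM (SDRAM)
In SDRAM the refresh circuit can recharge all DRAM cells at once (synchronously); the benefit is that memory can be synchronized with the computer's clock (the clock is covered later)
• Double Data Rate (DDR)
Normally data can be accessed once per clock cycle, but DDR allows two accesses.
DDR2~DDR4 denote successive DDR generations; the number does not mean the per-clock data rate doubles again, mainly that the voltage is lower (e.g., logic that was originally 5V, versus 1.8V for DDR2 and 1.5V for DDR3).
Lower voltage means lower power consumption and slightly higher speed (because charging takes less time)
• Dual/Triple channel
Dual channel means that if two memory modules are installed and you write, say, 2 bytes, one byte can go to one module and the other byte to the other module, so in theory the speed doubles; the same idea applies to triple channel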
33
Memory Techniques
• Capacity
• Kilobyte: 2^10 bytes = 1,024 bytes ≈ 10^3 bytes
• Megabyte: 2^20 bytes = 1,048,576 bytes ≈ 10^6 bytes
• Gigabyte: 2^30 bytes = 1,073,741,824 bytes ≈ 10^9 bytes
34
Mass Storage
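• Properties (compared with main memory)
• Larger capacity
• Less volatile (data is not forgotten)
• Slower
• On-line or off-line (can be used without being attached to a powered-on computer)
• Types
• Magnetic systems (hard disk, tape)
• Optical systems (CD, DVD)
• Flash drives
Magnetic devices use north/south magnetic poles to represent 0 and 1; optical devices use a laser to heat a crystalline layer into different shapes so that the reflected laser light differs slightly; flash devices use the tunneling effect to push electrons into an insulator, where they are retained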
35
Magnetic Disk Storage System
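• Head, track, sector, cylinder: one full circle is a track, and a track is divided into segments called sectors
• Access time = seek time + rotation delay (latency time)
• Transfer rate (SATA 1.5/3/6 Gbit/s, etc.)
The time the head takes to move to the track holding the data is the seek time; after reaching that track, it must still wait for the target sector to rotate under the head, which is the rotation delay. The rotation delay is inversely proportional to the rotation speed: the faster the disk spins, the shorter the rotation delay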
36
Optical Storage
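Whereas a hard disk stores data in concentric circles, a CD stores data (especially music) along a spiral: the track is a single spiral, which is then divided into sectors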
37
Physical vs. Logical Records
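• Files and file systems: decide where the data is actually placed, especially when it cannot be stored in contiguous sectors
• Fragmentation problem: a file is scattered across different locations
• We will talk about this later in the OS chapter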
38
Buffer
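The buffer concept is very general; it usually coordinates devices of different speeds, for example by collecting enough data before starting a transfer
• Purpose: to synchronize (or make compatible) different R/W
mechanisms and rates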
• A memory area used for the temporary storage of data (usually as a step
in transferring the data)
• Blocks of data compatible with physical records can be transferred
between buffers and the mass storage system
• Data in buffer can be referenced in terms of logical records.
39
Representing Text
• ASCII (American Standard Code for Information Interchange, by ANSI): 7
bits (or 8 bits with a leading 0). Only 7 bits are used; when stored in a byte, the MSB is 0
• Unicode: 16 bits. ASCII essentially works only for English, so Unicode was designed to represent the characters and symbols of all languages, such as Chinese and Japanese
• ISO standard (International Organization for Standardization): 32 bits
40
Unicode and UTF-8
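Some characters have multiple glyph variants, e.g., 「ɑ/a」, 「強/强」, 「戶/户/戸」
• Unicode is still being revised; as of March 2020, version 13.0.0 contains
over 140k characters
• Depending on platform and storage requirements, a Unicode
Transformation Format (UTF) defines how Unicode is encoded
Most Unicode code points fit in 2 bytes (e.g., common Chinese characters), rare Chinese characters need 3 bytes, and characters already in ASCII need less than 1 byte; the requirement differs by language. Using 3 bytes for every character would be wasteful.
UTF-8 defines a variable-length scheme that lets us tell characters of different lengths apart
PCs and Macs have differed in byte order, so little endian (LE)
and big endian (BE) variants are also needed to tell them apart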
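The variable-length idea and the byte-order issue can both be observed with Python's built-in `str.encode`. This is only an illustrative sketch; the sample characters are chosen here and are not from the slides. Note that the byte counts printed are those of the UTF-8 encoding itself, which can be larger than the size of the raw code point (for example, common Chinese characters take 3 bytes in UTF-8).

```python
samples = ["A", "é", "強", "𠮷"]  # ASCII, accented Latin, common CJK, rare CJK

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) in UTF-8")

# Byte order matters only for multi-byte code units such as UTF-16:
# the same character yields the same two bytes in opposite order.
little = "強".encode("utf-16-le")
big = "強".encode("utf-16-be")
print(little.hex(), big.hex())
```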
41
Representing Numeric Values
42
From Binary to Decimal
43
From Decimal to Binary
44
Representing Images
• Bit map techniques
• Pixel: picture element
• Colors: RGB, HSV, etc.
• LCD, scanner, digital cameras, etc.
• Vector techniques
• Scalable
• TrueType, Postscript, SVG (scalable vector graphics), etc
• CAD, printers
45
Representing Sounds
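Bit resolution can also be read as the number of levels available on the vertical axis
• Sampling
• Sampling rate: how often a sample is taken
• Bit resolution: how many bits represent each sampled value
• Bit rate (sampling rate ✕ bit resolution), measured in bps
• MIDI (synthesis)
It is like recording the musical score directly and letting the sound card play its pre-recorded Do, Re, Mi according to the score, which also means playback may differ from computer to computer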
46
Binary System Revisited
Only addition is discussed: subtraction is just adding a negative number, but that means we must define negative numbers
47
Two's Complement Notation
• Range: -2^(n-1) ~ 2^(n-1)-1
Why not simply sacrifice the MSB as a sign bit? Because there would then be both a positive 0 and a negative 0, which is awkward
48
Two's Complement Notation
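  0 1 0 (2)        0 1 1 (3)        0 1 1 (3)
+ 1 1 1 (-1)     + 1 1 0 (-2)     + 1 0 0 (-4)
= 1 0 0 1        = 1 0 0 1        = 0 1 1 1
One benefit of two's complement is that addition and subtraction can be done directly. These are 3-bit additions; because there are only 3 bits, discarding the carry gives exactly the right result
-1 is 111 in two's complement, and 7 in plain binary is also 111, so the first sum can be viewed as 2+7=9, and 9 = 1 (mod 8), which is why we get 1001 above. The whole 3-bit two's complement design forms a mod 8 system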
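The "mod 8" view can be checked directly. The sketch below assumes 3-bit patterns as on the slide; the helper names `encode`, `decode`, and `add` are invented for illustration.

```python
N_BITS = 3
MOD = 1 << N_BITS  # 8 for 3 bits


def encode(x):
    """Two's complement pattern of x: x mod 2^n written with n bits."""
    return format(x % MOD, f"0{N_BITS}b")


def decode(bits):
    """Signed value of an n-bit pattern; patterns whose MSB is 1 are negative."""
    value = int(bits, 2)
    return value - MOD if value >= MOD // 2 else value


def add(a, b):
    """Add two patterns and discard the carry out of the top bit."""
    return format((int(a, 2) + int(b, 2)) % MOD, f"0{N_BITS}b")


for x, y in [(2, -1), (3, -2), (3, -4)]:
    result = add(encode(x), encode(y))
    print(f"{encode(x)} + {encode(y)} = {result} ({decode(result)})")
```

Discarding the carry is exactly the reduction mod 8, which is why the same adder works for positive and negative operands.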
49
Two's Complement Encoding
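• Textbook's way
Given a binary number, scan from right to left: copy everything up to and including the first 1 unchanged, then flip every bit after it (the number of bits must be fixed first)
• Alternative way, for positive x:
• x: binary encoding of x
• -x: binary encoding of (2^n - x)
The alternative comes from observing that the values cycle mod 2^n, so adding 2^n is enough
Reading clockwise around the circle, the values always go from small to large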
50
Subtraction in 2's Complement
51
Excess Notation
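Another way to represent negative numbers
Binary Pattern   Decimal   Excess
000              0         -4
001              1         -3
010              2         -2
011              3         -1
100              4         0
101              5         1
110              6         2
111              7         3
The whole range is shifted to make room for negative numbers
The benefit of excess notation is that the comparison circuit needs no change, because the ordering is the same and only the assigned meaning differs. In two's complement the ordering is altered,
e.g., (111)_2 is actually smaller than (000)_2; on the other hand, two's complement has its own unique benefit that the adder needs no change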
52
Excess Notation
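Another way to represent negative numbers
• In excess notation, the MSB serves as the sign bit: 1 represents a nonnegative (+) number and 0 indicates a negative (-) number
• Conversion
• x → (2^(n-1) + x) mod 2^n
For 3 bits, remember that everything has been shifted: 2 moves to position 6 and -3 moves to position 1.
This conversion is hard for humans, but for a computer it is just a matter of changing the MSB
• Addition
We keep adding 2^(n-1)
• x+y → (2^(n-1) + (2^(n-1) + x) + (2^(n-1) + y)) mod 2^n
= (2^(n-1) + x + y) mod 2^n
2+(-1) = (110)_2 + (011)_2 = (1001)_2; then add 4 and drop the MSB to get (101)_2
The reason: both conversions added 4, so together they produce the leading 1, which can be dropped; but the remaining
bits must still follow the table on the previous slide, so 4 has to be added once more
Two's complement
Excess notation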
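The conversion and addition rules above can be sketched as follows for the 3-bit case. The helper names are illustrative assumptions, not part of the slides.

```python
N_BITS = 3
BIAS = 1 << (N_BITS - 1)  # 4 for 3 bits
MOD = 1 << N_BITS         # 8 for 3 bits


def to_excess(x):
    """Excess pattern of x: (2^(n-1) + x) mod 2^n written with n bits."""
    return format((x + BIAS) % MOD, f"0{N_BITS}b")


def from_excess(bits):
    """Value represented by an excess pattern."""
    return int(bits, 2) - BIAS


def excess_add(a, b):
    """Add two excess patterns.

    The raw sum carries the bias twice (x + 4 + y + 4); adding the bias
    once more and reducing mod 2^n leaves x + y + 4, a valid excess pattern.
    """
    total = int(a, 2) + int(b, 2) + BIAS
    return format(total % MOD, f"0{N_BITS}b")


result = excess_add(to_excess(2), to_excess(-1))
print(to_excess(2), to_excess(-1), result, from_excess(result))
```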
53
Comparison in the Case of 3-bit Representation
Two's complement
x + (-y)
= x + (8-y)
= x-y (mod 8)
Excess notation
x+y
= (x+4) + (y+4)
= x+y+8
However, what we actually want is x+y+4,
which explains why we have to add 4 at the end
54
Overflow
• Overflow occurs when the arithmetic result is outside the range of the
representation
• Addition of two positive numbers
• 2 + 3 = 5 → -3 (mod 8)
Adding two negative numbers or two positive numbers can overflow, but can adding one positive and one negative number overflow?
• Addition of two negative numbers
• (-2) + (-3) = -5 → 3 (mod 8)
Adding two negative numbers gives a positive number
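A small sketch makes the two overflow cases concrete and also answers the question about mixed signs: a positive operand plus a negative operand always lands between the two, so it cannot leave the representable range. The function name and the 3-bit default are assumptions for illustration.

```python
def add_with_overflow(x, y, n_bits=3):
    """Return (wrapped result, overflow flag) for n-bit two's complement addition."""
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    true_sum = x + y
    # Fold the true sum back into [lo, hi], as the hardware does mod 2^n.
    wrapped = (true_sum - lo) % (1 << n_bits) + lo
    return wrapped, not (lo <= true_sum <= hi)


print(add_with_overflow(2, 3))    # (-3, True)
print(add_with_overflow(-2, -3))  # (3, True)
print(add_with_overflow(2, -3))   # (-1, False)
```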
55
Real Example of Overflow
We can observe that overflow really does happen
56
Fraction in Binary (Fixed-Point)
57
Float-Point Notation
+1 × 2^(-10)
• Why? (How to represent 0.000000000000001?)
Mantissa
Not two's complement
• On most current 64-bit computers, the exponent takes 11 bits and the
mantissa takes 52 bits (IEEE 754 standard); the format has to be an international standard to be useful
58
Float-Point Notation
1 1 0 1
0 1 0 1
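The radix point is fixed between the exponent and the mantissa; the pattern above means (-), (+1), (.0101), which together give -(1/4+1/16) × 2^(+1)
Smallest positive number: +.0001 × 2^(-4) (which is also the precision limit); largest positive number: +.1111 × 2^3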
59
Decoding Floating-Point
• 01101011
→ (0)(110)(1011)
→ (+)(+2)(1011)
0.1011 → 10.11 → 2 + 1/2+ 1/4 = 11/4
• 10010011
→ (1)(001)(0011)
→ (-)(-3)(0011)
-0.0011 → -0.0000011 → -(1/64 + 1/128) = -3/128
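The two decodings above follow one mechanical rule, sketched here for the slides' 8-bit format (1 sign bit, 3 exponent bits in excess-4, 4 mantissa bits read as 0.mmmm). The function name `decode_float8` is invented for illustration, and `Fraction` is used so the results stay exact.

```python
from fractions import Fraction


def decode_float8(bits):
    """Decode sign(1) | exponent(3, excess-4) | mantissa(4) as +/- 0.mmmm x 2^e."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 4           # excess-4 exponent
    mantissa = Fraction(int(bits[4:], 2), 16)  # 0.mmmm as a fraction
    return sign * mantissa * Fraction(2) ** exponent


print(decode_float8("01101011"))  # 11/4
print(decode_float8("10010011"))  # -3/128
```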
60
Truncation Errors
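The mantissa does not have enough bits
• The required precision is beyond the capacity of the mantissa
[Figure: encoding steps. Convert the value to binary, move the radix point to the far left, then fill in the bit pattern. At this point the problem is already visible: the mantissa holds only 4 bits, but 5 bits are needed here]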
61
Normalized Form
A rule is needed to keep the radix point from floating arbitrarily, which would give one number several representations
• Rule: the most significant bit of the mantissa is 1
• The floating-point representation of 0 is all zeros
Normalization
• 01100011 → (0)(110)(0011) → 0.0011 ✕ 2^2
→ 0.1100 ✕ 2^0 → (0)(100)(1100) → 01001100
IEEE standard: the rule above means the mantissa always starts with 1, so there is no need to store that 1 at all
• The left-most bit of the mantissa is always 1 → omit it
• An IEEE standard normalized form is (s)(eee)(mmmm)
→ (-1)^s ✕ 1.mmmm ✕ 2^(eee-4)
• 01100011 → (0)(110)(0011) → 1.0011 ✕ 2^(6-4)
62
Loss of Digits
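How are floating-point numbers added? Recall how you compute 4×10^8 + 2×10^7: first make the exponents the same, then add
Because the value must be shifted to exponent = 111,
the mantissa must be shifted by four places, and the 1
is shifted out, causing information loss
Different orders of evaluation give different results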
63
Floating Number Has Limitation
(0.1)_10 cannot be represented exactly in binary
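Both effects, the inexact 0.1 and the order-dependent loss of digits from the previous slide, can be observed with ordinary Python floats (IEEE 754 doubles on common platforms). This is a small illustrative check, not part of the original slides.

```python
from fractions import Fraction

# The double nearest to 0.1 is not exactly one tenth.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(0.1 + 0.2 == 0.3)                  # False

# Loss of digits: the order of the additions changes the result.
big, small = 1e16, 1.0
print((big + small) + small == big)      # True: each small addend is lost
print(big + (small + small) == big)      # False: 2.0 is large enough to register
```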
64
Data Compression
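• Lossy vs. lossless
• Run-length encoding: if the next 100 symbols are all 1s, send only "the next 100 symbols are 1s" instead of the 100 ones themselves
• Frequency-dependent encoding: ASCII is a fixed-length code, but what if symbol frequencies are known in advance?
• Huffman encoding
With a fixed-length code it is easy to tell which bits belong together, but how do we decode a variable-length code?
• Relative encoding / difference encoding: often used for video; encodes the difference between one frame and the next
• Dictionary encoding: if a string is very common, give it a short code; the codebook must be transmitted as well
• Adaptive dictionary encoding
• LZW encoding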
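Run-length encoding, the first technique in the list above, is short enough to sketch in full. The representation chosen here (a list of `(symbol, count)` pairs) and the function names are assumptions for illustration.

```python
from itertools import groupby


def rle_encode(bits):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(runs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in runs)


message = "000" + "1" * 100
runs = rle_encode(message)
print(runs)  # [('0', 3), ('1', 100)]
```

The scheme is lossless: decoding the runs gives back the original string exactly.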
65
Huffman Coding
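Traditional encodings give every character exactly the same length,
whereas Huffman coding encodes characters with different lengths
On the right is the Huffman tree built for the example A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED
It is a kind of prefix code; see communication theory if interested
The tree must also be sent to the receiver so that the receiver can decode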
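A compact way to build such a code is to merge the two least frequent subtrees repeatedly. The sketch below uses Python's `heapq`; the exact 0/1 labels depend on tie-breaking and may differ from the tree drawn on the slide, but the total encoded length is the same for any Huffman code. The function name `huffman_code` is invented for illustration.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a Huffman code table (symbol -> bit string) for the symbols in text."""
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED"
table = huffman_code(text)
encoded = "".join(table[ch] for ch in text)
print(table)
print(len(encoded), "bits versus", 3 * len(text), "bits with a 3-bit fixed-length code")
```

Because no code word is a prefix of another, the receiver can decode the bit stream without separators, which answers the decoding question raised for variable-length codes.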
66
LZW Coding
A dictionary compression method in which the dictionary does not need to be sent to the receiver for decoding
The dictionary grows while encoding proceeds; in practice the initial codes are just ASCII (the example below is simplified)
Initial dictionary (chars → code): x → 1, y → 2, space → 3
Encoding
Message: xyx xyx xyx xyx
Step   Input read   Output so far     Dictionary change
1      x            1
2      y            12
3      x            121
4      space        1213              add xyx → 4
5      xyx          1213 4
6      space        1213 4 3
7      xyx          1213 4 3 4
8      space        1213 4 3 4 3
9      xyx          1213 4 3 4 3 4
On reaching a space, the preceding string is added to the current dictionary
Decoding
Received: 1213 4 3 4 3 4
Step   Code read    Decoded so far    Dictionary change
1      1            x
2      2            xy
3      1            xyx
4      3            xyx               add xyx → 4
5      4            xyx xyx
6      3, 4         xyx xyx xyx
7      3, 4         xyx xyx xyx xyx
On decoding a space, the preceding string is likewise added to the current dictionary
82
LZW Coding
• A dictionary encoding which does not
need to store the dictionary
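The walkthrough on the preceding slides can be replayed in code. Note the assumption: this sketch implements the simplified word-level scheme of the slides (a whole word is learned when a space is reached), not the full LZW algorithm, and the function names are invented for illustration.

```python
def encode(message, dictionary):
    """Encode with the slides' simplified scheme; the dictionary grows as we go."""
    table = dict(dictionary)  # work on a copy
    codes = []
    for i, word in enumerate(message.split(" ")):
        if i > 0:
            codes.append(table[" "])  # the space that separated the words
        if not word:
            continue
        if word in table:
            codes.append(table[word])
        else:
            codes.extend(table[ch] for ch in word)
            table[word] = len(table) + 1  # learn the new word
    return codes


def decode(codes, dictionary):
    """Rebuild the message, learning new words exactly where the encoder did."""
    table = {code: text for text, code in dictionary.items()}
    known = set(dictionary)
    out, word = [], ""
    for code in codes:
        text = table[code]
        out.append(text)
        if text == " ":
            if word and word not in known:
                table[len(table) + 1] = word
                known.add(word)
            word = ""
        else:
            word += text
    return "".join(out)


initial = {"x": 1, "y": 2, " ": 3}
codes = encode("xyx xyx xyx xyx", initial)
print(codes)                   # [1, 2, 1, 3, 4, 3, 4, 3, 4]
print(decode(codes, initial))  # xyx xyx xyx xyx
```

Only the initial three-entry dictionary is shared; the entry for xyx is reconstructed by the decoder on its own.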
83
Images, Audios, and Videos
Usually lossy encoding
• GIF: 256 colors, dictionary encoding
• JPEG
• Lossy or lossless
• Discrete cosine transform
• Discard high-frequency information insensitive to human eyes
• MP3
• Temporal masking
• Frequency masking
• MPEG
• Relative encoding & other techniques
Records the differences between consecutive data blocks; e.g., 88, 90, 95, 98, 92 (2×5 = 10 bytes) becomes 88, +2, +5, +3, -6 (2+1×4 = 6 bytes)
84
Communication Errors
• Compression
• Remove redundancy
Pure noise cannot be compressed
• Error detection & correction
After compression, a single transmission error can corrupt everything
• Add redundancy to guard against errors
• Error detection: check code
• Cannot correct errors, but
can check whether errors occurred
• ID numbers
• ISBN
• Parity code
• Error correction
• Corrects errors (to some degree)
85
Taiwan ID
The digits embed a mechanism for checking whether an ID number is well-formed
86
Taiwan ID
The digits embed a mechanism for checking whether an ID number is well-formed
87
ISBN-10
A code that identifies books
• Given an ISBN-10 d1d2…d10
• It must satisfy 10d1 + 9d2 + … + 1d10 ≡ 0 (mod 11)
Swapping two digits can also be detected
• For example, the ISBN-10 of the textbook is 0-273-75139-5
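The check rule can be applied to the textbook's own ISBN. This is a minimal sketch; the function name is illustrative, and the handling of a final `X` (standing for 10) follows the usual ISBN-10 convention rather than anything stated on the slide.

```python
def isbn10_is_valid(isbn):
    """Check 10*d1 + 9*d2 + ... + 1*d10 = 0 (mod 11), ignoring hyphens and spaces."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for weight, c in zip(range(10, 0, -1), chars):
        if c in "xX" and weight == 1:
            value = 10  # 'X' stands for 10 in the last position
        elif c.isdigit():
            value = int(c)
        else:
            return False
        total += weight * value
    return total % 11 == 0


print(isbn10_is_valid("0-273-75139-5"))  # True
print(isbn10_is_valid("0-273-75193-5"))  # False: two adjacent digits swapped
```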
88
Parity Bits
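• Add one extra bit so that the whole pattern has an
odd number of 1s; if two bits are wrong, the error goes undetected
• Communication
• RAID (redundant array of independent disks)
techniques: disk arrays, aimed at improving performance or providing data redundancy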
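A short sketch shows both properties of the odd-parity scheme: a single flipped bit is caught, while two flipped bits slip through. The helper names are invented for illustration.

```python
def add_odd_parity(bits):
    """Append one bit so that the total number of 1s is odd."""
    parity = "0" if bits.count("1") % 2 == 1 else "1"
    return bits + parity


def check_odd_parity(bits):
    """True when the pattern still has an odd number of 1s."""
    return bits.count("1") % 2 == 1


def flip(bits, index):
    """Return a copy of bits with the bit at index inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


sent = add_odd_parity("1011010")
print(sent, check_odd_parity(sent))            # 10110101 True
print(check_odd_parity(flip(sent, 2)))         # False: one error is detected
print(check_odd_parity(flip(flip(sent, 2), 5)))  # True: two errors go unnoticed
```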
89
RAID
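Data is striped across different disks, which speeds things up in the same way as dual channel
Data is duplicated onto another disk
RAID 1 first and then RAID 0; requires at least four disks
Can tolerate the loss of both disks in the same group and still operate normally
Mirrors first and then stripes the data
Cannot tolerate the loss of both disks in the same group
Data and the corresponding parity information are spread evenly across all disks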
90
An Error-Correcting Code (ECC)
• (3,1)-repetition code (can correct 1-bit errors)
Server memory has ECC; desktop memory does not
Not used in practice; its cost-performance ratio is poor
91
Another Error-Correcting Code (ECC)
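Hamming distance is the number of bit positions in which two bit strings differ
• Maximize the Hamming distances among symbols (at least 3)
• If 010100 is received, decode it as D
You can check that the Hamming distance between any two symbols is at least 3
From a spatial point of view, if only one bit is corrupted, the received string is still not far from its original position
If 010100 is received, simply compute its Hamming distance to every symbol in order to decode it
Compared with the repetition code, the cost-performance ratio is somewhat better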
92
Hamming (7, 4) Code
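Suppose we want to send 4 bits, 1101 in order; first draw a Venn diagram and number the intersection regions in the middle as follows
Fill 1101 into the corresponding positions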
93
Hamming (7, 4) Code
The non-intersection regions are numbered as follows, and even parity is computed by hand (odd parity would work too)
94
Hamming (7, 4) Code
Check that the even parity is correct; if so, actually send the encoded bit string 1101 100
95
Hamming (7, 4) Code
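Suppose m2 is corrupted; how do we determine that it is m2? Here the parity checks for p1 and p3 both fail, so we can conclude that m2 is corrupted. Similarly, if m4 is corrupted, all 3 parity checks fail; if a parity bit itself is corrupted, only that parity check fails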
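The Venn-diagram procedure can be expressed with three XOR equations. The circle assignment used below (p1 covers m1, m2, m4; p2 covers m1, m3, m4; p3 covers m2, m3, m4) is an assumption inferred from the slides' example: it reproduces 1101 → 1101 100 and the stated failure patterns for m2 and m4. The function names are invented for illustration.

```python
def hamming74_encode(m):
    """Append three even-parity bits to the 4 message bits m1..m4."""
    m1, m2, m3, m4 = (int(b) for b in m)
    p1 = m1 ^ m2 ^ m4
    p2 = m1 ^ m3 ^ m4
    p3 = m2 ^ m3 ^ m4
    return m + f"{p1}{p2}{p3}"


def hamming74_correct(word):
    """Fix at most one flipped bit and return the 4 message bits."""
    bits = [int(b) for b in word]
    m1, m2, m3, m4, p1, p2, p3 = bits
    failed = (
        p1 != (m1 ^ m2 ^ m4),
        p2 != (m1 ^ m3 ^ m4),
        p3 != (m2 ^ m3 ^ m4),
    )
    # Each single-bit error makes a unique set of parity checks fail.
    position = {
        (True, True, False): 0,   # m1
        (True, False, True): 1,   # m2
        (False, True, True): 2,   # m3
        (True, True, True): 3,    # m4
        (True, False, False): 4,  # p1
        (False, True, False): 5,  # p2
        (False, False, True): 6,  # p3
    }.get(failed)
    if position is not None:
        bits[position] ^= 1
    return "".join(str(b) for b in bits[:4])


sent = hamming74_encode("1101")
print(sent)                          # 1101100
print(hamming74_correct("1001100"))  # 1101 (m2 was flipped in transit)
```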
96
• Chapter 2: Data Manipulation
97
Motherboard
主機板
98
Computer Architecture
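The CPU has its own memory,
namely the registers
The motherboard shown earlier handles the communication between the CPU and memory, though the bus itself is not visible
Bus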
99
Adding Values Stored in Memory
How does the CPU add two numbers stored in memory?
This seemingly simple operation is actually somewhat involved, because the numbers live in memory while the addition must be carried out by the CPU
1. Get one of the values to be added from memory and place it in a
register through the bus
2. Get the other value to be added from memory and place it in another
register through the bus
3. Activate the addition circuitry with the registers used in Steps 1 and 2 as
inputs and another register designated to hold the result
4. Store the result in memory through the bus
100
Machine Instructions
• Three categories of instructions
They are called machine instructions because they instruct the CPU/machine to carry out some task
They are highly machine specific: different machines have different instruction sets,
which, thinking back to the CPU pins, correspond to different voltage patterns
• Data transfer
• LOAD, STORE, I/O
• Arithmetic/logic
• AND, OR, ADD, SUB, etc.
• SHIFT, ROTATE
• Control
• JUMP, HALT
• How to implement machine instructions
• RISC (Reduced Instruction Set Computing) (PowerPC, SPARC)
• CISC (Complex Instruction Set Computing) (x86, x86-64)
101
Architecture of a Simple Machine
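Registers are all built with SRAM technology
Program counter: where in memory the program is
Instruction register: holds the instruction currently being executed
The hypothetical machine in this textbook has only 16 registers (0~F)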
103
Fetch
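The instruction to be executed is at address A0
The IR of the textbook's hypothetical machine is 2 bytes wide,
so 15 6C will be fetched into the IR shortly
What 15 6C means will be explained later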
104
Example of a Machine Instructions
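Indicates what this instruction does
Indicates the data this instruction operates on
Once machine instructions are defined this way, they reveal a lot. For example, memory is at most 256 bytes, because each cell stores 1 byte and a memory address can use at most 1 byte
0011 means STORE (this depends on the CPU)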
105
Adding Two Values Revisited
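Writing machine instructions directly is too tedious;
a slightly higher-level assembly language can be used instead, which makes programs easier for humans to write
This requires looking up the table at the back of the textbook
Assembly corresponds one-to-one with machine language
A compiler does not simply translate statement by statement; there are many special cases. For example, with c=a+b followed by x=x+c, c need not be stored to memory and loaded again when handling x=x+c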
106
An Example of Add Operation between 1 and 2
[Figure sequence: RAM on the left; the CPU on the right with registers R0, R1, R2, the control unit (program counter and instruction register), and the ALU]
RAM contents
0000: load the number at 0006 into R0
0001: load the number at 0007 into R1
0002: add the numbers in R0 and R1 and put the sum in R2
0003: store the number in R2 at 0008
0004: end of program
0006: 1
0007: 2
Initially only memory holds data and instructions; assume the program counter points to 0000
Machine cycle 1 (machine cycles are explained later)
• Fetch the instruction at 0000 into the instruction register
• Add 1 to the program counter (now 0001)
• Decode the instruction in the instruction register and execute it: the number at 0006 is placed in R0 (R0 = 1)
Machine cycle 2 (the program counter points to 0001)
• Fetch the instruction at 0001 into the instruction register
• Add 1 to the program counter (now 0002)
• Decode and execute: the number at 0007 is placed in R1 (R1 = 2)
Machine cycle 3 (the program counter points to 0002)
• Fetch the instruction at 0002 into the instruction register
• Add 1 to the program counter (now 0003)
• Decode and execute: R0+R1 is placed in R2 (R2 = 3)
Machine cycle 4 (the program counter points to 0003)
• Fetch the instruction at 0003 into the instruction register
• Add 1 to the program counter (now 0004)
• Decode and execute: R2 is stored at 0008 (0008 now holds 3)
Machine cycle 5 (the program counter points to 0004)
• Fetch the instruction at 0004 into the instruction register
• Add 1 to the program counter (now 0005)
• Decode and execute: the whole program ends
127
Program Execution
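Not every CPU uses this three-stage design
• Instruction register (IR),
program counter (PC)
• Machine cycle: one pass around the loop on the right
• Clock
How many such cycles can run per second; the machine below can do 3G per second
Decoding means figuring out what is to be done,
similar to a human looking something up in a table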
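The fetch, decode, execute loop can be simulated in a few lines using the earlier example of adding 1 and 2. The instruction encoding below (tuples such as `("LOAD", 0, 6)`) is a hypothetical one invented for this sketch; it is not the textbook's machine language.

```python
# Program and data share one memory, as in the slides' example.
memory = {
    0: ("LOAD", 0, 6),    # R0 <- memory[6]
    1: ("LOAD", 1, 7),    # R1 <- memory[7]
    2: ("ADD", 2, 0, 1),  # R2 <- R0 + R1
    3: ("STORE", 2, 8),   # memory[8] <- R2
    4: ("HALT",),
    6: 1,
    7: 2,
    8: 0,
}
registers = [0, 0, 0]
pc = 0      # program counter
cycles = 0

while True:
    ir = memory[pc]  # fetch the instruction into the instruction register
    pc += 1          # the program counter now points to the next instruction
    cycles += 1
    op = ir[0]       # decode
    if op == "LOAD":  # execute
        registers[ir[1]] = memory[ir[2]]
    elif op == "ADD":
        registers[ir[1]] = registers[ir[2]] + registers[ir[3]]
    elif op == "STORE":
        memory[ir[2]] = registers[ir[1]]
    elif op == "HALT":
        break

print(registers, memory[8], pc, cycles)  # [1, 2, 3] 3 5 5
```

Five machine cycles run in total, and the program counter ends at 5, one past the final instruction.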
128
Logic/Bit Operations
• Masking
Use 0s or 1s to clear or keep selected bits
129
Shift/Rotation
Preserves the sign in the MSB
130
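Set the mask to 1 and shift it left by 7 places
From a programming point of view, the code on the left is a
simple application of shift + mask
Apply the mask to one bit at a time: if the masked result is 0, output 0; otherwise output 1
Remember to shift the mask right by one place each time
[Figure: 01011 AND 10000 = 00000 → 0; 01011 AND 01000 = 01000 → nonzero; 01011 AND 00100 = 00000 → 0]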
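The shift + mask loop described on this slide can be sketched as follows. The function name and the `width` parameter are illustrative assumptions; with the default width of 8 the mask starts as 1 shifted left by 7 places, as on the slide.

```python
def to_bits(value, width=8):
    """Print-style binary conversion using one moving mask bit."""
    mask = 1 << (width - 1)  # start at the most significant bit
    digits = []
    for _ in range(width):
        # A zero result means the bit under the mask is 0, otherwise it is 1.
        digits.append("0" if (value & mask) == 0 else "1")
        mask >>= 1           # move the mask one place to the right
    return "".join(digits)


print(to_bits(0b01011, width=5))  # 01011
print(to_bits(0xB5))              # 10110101
```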
131
Controller
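Conceptually, peripheral devices also communicate with the CPU through the bus
• Specialized: e.g., SATA, which is only for disks
• General: USB, FireWire (formerly known as 1394)
The bus is a limited resource: the more peripherals communicate with the CPU, the more easily CPU-memory traffic is interrupted, so controllers use buffers to reduce the time peripherals occupy the bus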
132
Memory-mapped I/O
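If each peripheral had its own very different access method, many more machine instructions would be needed. One approach is to pretend that only memory accesses exist, and to arrange that accessing certain specific addresses is equivalent to accessing a specific peripheral device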
133
Communication with Other Devices
• DMA: direct memory access
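Devices other than the CPU also want to access memory, but they must still interrupt the CPU to obtain permission
• Once authorized, controllers can access data directly from main memory without
notifying the CPU; that is, after notifying the CPU once they can keep using memory. Widely used in, e.g., optical drives
• Handshaking
• 2-way communication
• Coordinating activities
• Parallel/Serial: conceptually, sending over many wires at once versus over a single wire, though serial is not necessarily slower
• Transfer rate: bits per second (bps, Kbps, Mbps, etc.)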
134
Pipelining
The fetch, decode, and execute circuits are in fact different, so strictly speaking they can operate at the same time
• Throughput increased
• Total amount of work accomplished in a given amount of time
• Example: pre-fetching
• Issue: conditional jump
Branch prediction can overcome this; e.g., a 1000-iteration for-loop jumps back 999 times
Ideally, performance (throughput) improves threefold
135
Parallel/distributed Computing
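• Parallel
a=b+c, d=e+f can become (a, d)=(b, e)+(c, f)
CPU clock rates are already very high, so the trend is now toward parallel computing
• Multiprocessor
• MIMD, SISD, SIMD (Single Instruction Multiple Data), MISD
SIMD is also called a vectorized instruction set: a single instruction is applied to many pieces of data, which is naturally suited to multimedia data. For example, adjusting the brightness of a single pixel, applied to all pixels, amounts to adjusting the brightness of the image
With multiple cores, SIMD naturally becomes MIMD
A pipeline can be thought of as a kind of MISD, because there are multiple instructions but a single data source
• Distributed
• Linking several computers via network
• Separate processors, separate memory
The difference between parallel and distributed is that the former has shared memory
• Issues:
• Data dependency
• Load balancing
• Synchronization
• Reliability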
136
To Parallelize XOR Not to Parallelize
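The following all rely on compiler
optimization, which is quite hard in practice
There is a dependency, so
it seems this cannot be parallelized
There is an even deeper dependency,
so it seems this cannot be parallelized
But it can be rewritten so that each step of two positions multiplies by 2
The computation on CPU 1 does not depend on A[i], so it can be parallelized
137
Speedup & Scaling
Used to estimate how much speedup parallelization can give you
P is the fraction of the work that can be parallelized,
M is the number of cores, and S is (1-P); the speedup can then be roughly computed
The numerator is the amount of work originally to be done and the denominator is the amount of work with parallelization,
so the ratio can be understood as the speedup gain
Plotted with P=0.8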
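The formula itself is not reproduced in the text, so the sketch below assumes the usual form matching the description: original work S + P = 1 in the numerator and S + P/M in the denominator (commonly known as Amdahl's law). The function name is illustrative.

```python
def speedup(p, m):
    """Estimated speedup when a fraction p of the work runs on m cores."""
    s = 1.0 - p               # serial fraction that cannot be parallelized
    return 1.0 / (s + p / m)  # original work (1) over work with parallelization


for cores in (1, 2, 4, 8, 1000):
    print(cores, round(speedup(0.8, cores), 3))
```

With P = 0.8 the speedup can never exceed 1/S = 5, no matter how many cores are added, because the serial 20% always remains.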
138
• Chapter 3: Operating Systems (OS)
An OS is the system program that lets users make full use of the hardware resources they have purchased
139
Batch Processing
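A scenario from a very long time ago
Besides programmers, there were dedicated computer operators whose job was to keep the computer fully utilized
• Computer operators
• First-in, first-out (FIFO)
The computer operator can be viewed as the most primitive OS, i.e., someone who tries to keep the computer busy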
140
Interactive Processing
• OS with remote terminals
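Programmers spend a lot of idle time thinking, so one computer can serve many people
The situation on the previous slide improved somewhat after keyboards and screens were invented: a central computer was wired to many terminal devices (naturally called terminals), letting terminal users control the computer directly via keyboard and screen (a terminal has no CPU)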
141
Different Types of OS
• Batch
• Interactive
• Real-time
• Response time is critical
Medical, military, and financial uses
• Time-sharing (multitasking)
• Dividing time into intervals
• Only one task is being performed at any given time
• Multiprocessor
• Load balancing and scaling
Do more cores really improve the user experience?
142
Definition of OS
• An OS is system software that manages computer hardware, software
resources, and provides common services for computer programs – wiki
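The definition of an operating system is somewhat vague, but basically it is software that coordinates hardware and software resources
Story role    Computer term
Government    Operating system
Land          Memory
People        Threads
Priest        CPU
Analogy: in ancient societies there were no rules on land use, so people established a government to manage daily life; the government could then manage and use people, land, and other resources effectively. In ancient governments, priests directed people's work
What a thread is will be explained later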
143
How OS Gets Started
• OS is still a program and so it needs to be placed in the memory so that
CPU can fetch and execute OS
• Who places the OS in the memory and initially where can it find the OS?
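The boot sequence is BIOS → Boot loader → Bootstrap. First, the BIOS stored in ROM on the motherboard runs; it is the program responsible for much of the communication with and initialization of the hardware. Almost every motherboard has a specially designed BIOS that sets the CPU frequency and RAM speed, detects the disks, and so on. Then, on x86 machines, the MBR (master boot record) on the disk can load the OS. The series of steps the OS performs after being loaded, until the user can use the machine, is called the bootstrap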
144
Booting
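The OS is also a program and must be in memory to run, so who runs it, and how? The answer is the BIOS
The boot loader in the computer's ROM loads the OS from the disk into memory and then hands control to the OS; the part of the boot loader that we can interact with is usually called the BIOS
• Bootstrapping (booting): doing it yourself without outside help
• You may change the booting sequence in BIOS (basic input/output system)
How is control handed over? By setting the PC to the OS entry point
EEPROM: the P stands for programmable, but once a chip was programmed incorrectly it had to be replaced entirely; so the E for erasable was developed, where ultraviolet light erases an EPROM. That was still too cumbersome, so electrically erasable parts were developed, which are much more convenient
How does the BIOS know where the OS is? The OS puts itself at a fixed location, the MBR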
145
Analogy of OS
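Analogy: Alice and Bob decide to get married, so they ask a wedding planner to write a marriage application and submit it to the New York City government. The city government places the application in a basket; a government clerk takes applications from the basket and reads them line by line, doing one corresponding action for each line read. For example, right after reading line 4 the clerk sends someone to tell the church to prepare.
From this we can see that the New York City government is the main body coordinating the wedding; with it, the wedding can proceed smoothly
Story role                                        Computer term
New York City government                          Operating system
Wedding planner                                   Programmer
Marriage application                              Executable file
Marriage application format                       Executable file format
Body text (written by the planner)                Code (written by the programmer)
Basket                                            Memory
Government clerk                                  CPU
Sending a courier to notify the distant church    I/O (input/output)
Marriage application
New York City government, Mayor: Xavier, Application date: 2020/2/14
Body
Line 1: Bob is 25 years old (groom's name)
Line 2: Alice is 23 years old (bride's name)
Line 3: Bob and Alice are getting married (purpose)
Line 4: The wedding will be held at a church in the suburbs (venue)
Line 4 in action: immediately send a courier to tell the distant church that a couple is getting married and ask them to get ready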
146
Analogy of Interrupt
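Suppose the government clerk has already read the first few lines and dispatched work to the distant church, and is therefore idle. At that moment the groom's ex-girlfriend objects at the distant church and the wedding is halted. When the clerk receives the halt message sent back from the church, the clerk goes off to handle the halt and returns to city hall only after it has been dealt with. A wedding can be halted for many reasons. Here the halt is number 5, and halt number 5 is handled with solution number 1.
Story role | Computer term
Halt at the distant wedding | Interrupt
Wedding halt table | Interrupt vector table
Solution | Interrupt service routine
Wedding halt table (halt number | reason | solution number)
0 | Groom's father objects | 0000 0000
1 | Groom's mother objects | 0000 0000
2 | Bride's father objects | 0000 0000
3 | Bride's mother objects | 0000 0000
4 | Groom's ex-boyfriend objects | 0000 0001
5 | Groom's ex-girlfriend objects | 0000 0001
6 | Bride's ex-boyfriend objects | 0000 0001
7 | Bride's ex-girlfriend objects | 0000 0001
8 | Sudden storm during the wedding | 0000 0002
9 | Someone tries to snatch the bride or groom | 0000 0003
Solution table (solution number | solution)
0000 0000 | Persuade the objector
0000 0001 | Ignore the objector
0000 0002 | Change the venue
0000 0003 | Take the person back
When something happens on the I/O side of a computer, an interrupt occurs. The CPU saves its current state (line 4 on the previous page is pushed onto the stack), looks up the interrupt vector table (IVT), jumps to the starting address of the interrupt service routine, and executes it. When the routine finishes, the CPU returns to the point where it was interrupted and continues with the remaining work.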
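The lookup described above can be sketched in a few lines of Python. This is only an illustration of the analogy, not how a real CPU is programmed: the function names (`handle_interrupt`, `persuade`, and so on) are made up for this sketch, while the table contents mirror the wedding halt table and the solution table.

```python
# Hypothetical "interrupt service routines" taken from the wedding analogy.
def persuade():
    return "persuade the objector"

def ignore():
    return "ignore the objector"

def change_venue():
    return "change the venue"

def take_back():
    return "take the person back"

# Solution table: address -> routine (addresses mirror the slide).
ISR_TABLE = {0x00000000: persuade, 0x00000001: ignore,
             0x00000002: change_venue, 0x00000003: take_back}

# Interrupt vector table: interrupt number -> address of its routine.
IVT = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 3}

def handle_interrupt(number, program_counter, stack):
    """Save state, look up the IVT, run the routine, then resume."""
    stack.append(program_counter)   # save where execution was interrupted
    isr = ISR_TABLE[IVT[number]]    # the IVT entry points at the routine
    result = isr()                  # run the interrupt service routine
    resumed_at = stack.pop()        # go back to the interrupted point
    return result, resumed_at

print(handle_interrupt(5, program_counter=4, stack=[]))
```

Interrupt 5 (the groom's ex-girlfriend objects) maps to address 0000 0001, so the sketch answers with "ignore the objector" and resumes at line 4.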
147
Stack and Pointer
• Stack is a special memory with first-in-last-out property
• Pointer: usually a memory cell contains data, but we can see a memory
cell containing a memory address as a pointer
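On the previous page, the stack is what lets execution return to where it was before the interrupt. In addition, each entry for an interrupt service routine (ISR) is really a pointer to the program that actually handles the interrupt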
148
Functionalities of OS
Different textbooks may take slightly different views on this
• Main functionalities
• Process management
• Memory management
• I/O management
• Minor functionalities
• Instruction explanation management
• Network management (will be discussed in more detail in next chapter)
149
Process Management
• Code (or program) is static; it is placed in the disk or memory
• Process is kind of dynamic, or can be seen as a state; it is under execution
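[Figure: the marriage application from the OS analogy (New York City government; mayor: Xavier; application date: 2020/2/14; body lines 1 to 4)]
The marriage application only describes the wedding, so it is static; at this point the wedding has not actually taken place. Once the government clerk starts reading the application and acting on it, the wedding is being carried out, so it is dynamic.
The static part corresponds to the code. Only when the program starts executing does the static description turn into something dynamic, and that dynamic entity is called a process.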
150
Process Management
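Compared with the earlier description, we now add a lead coordinator whose job is to hand each line of code to the government clerk for execution. This kind of dynamic execution is a process, and the wedding's lead coordinator is called the main thread.
[Figure: the marriage application, with the lead coordinator moving down the body line by line (direction of travel) and the government clerk shown beside line 4]
A computer usually has more than one process, just as a country has more than one wedding under way. However many weddings there are, there is only one government clerk, so the weddings have to compete for the clerk in order to proceed. Where there is competition there must be management, which is why processes need to be managed.
Back to the computer: many processes exist at the same time. For example, your antivirus software keeps running while you write a report in Word and listen to music on Spotify. All of these programs have to go through the CPU to run at the same time, so they compete with one another and need to be managed.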
151
Memory Management
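• Protection
• Protect a program's data from access by other programs
• Protect the OS from user programs
Base and limit registers record where a process starts in memory (base register) and how much memory it occupies (limit register).
• Swapping
While a program runs, its process sometimes has to leave memory temporarily and is brought back later to continue; this is called swapping. The moves out and in are called roll out and roll in, and the disk used for this is called the backing store.
Swapping leaves memory fragmented (fragmentation), which lowers memory utilization. Whether a process can be swapped in to the same location it was swapped out from is another issue. Memory management has to handle both.
152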
Memory Management
• Paging: to deal with insufficient memory, each process is partitioned into fixed-size pages (usually a few KB) that are moved back and forth between memory and disk
• Two confusing terminologies:
• Page: logical memory is divided into blocks of equal size
• Frame: physical memory is divided into blocks of fixed size
• Virtual memory is a storage allocation
scheme in which secondary memory
can be addressed as though it were part
of the main memory
• Swapping occurs when whole process
is transferred to the disk, while Paging
occurs when some part of the process is transferred to the disk
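To make the page/frame distinction concrete, here is a minimal sketch of address translation. The page size of 4 KB and the contents of `page_table` are assumptions chosen for illustration; a real OS keeps this table in hardware-assisted structures.

```python
PAGE_SIZE = 4096  # assumed page size: 4 KB

# Hypothetical page table: page number -> frame number, or None if the
# page currently lives on disk instead of in physical memory.
page_table = {0: 5, 1: 2, 2: None, 3: 7}

def translate(logical_address):
    """Map a logical address to a physical address, or report a page fault."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise LookupError(f"page fault: page {page} is not in memory")
    return frame * PAGE_SIZE + offset

print(translate(4100))  # page 1, offset 4 -> frame 2 -> 8196
```

A page fault (page 2 in this table) is the moment when the OS would bring the page back from disk before retrying the access.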
153
I/O Management
• In the extreme case of no I/O devices for computers, you never know
whether the computation results are correct or not
• However, if I/O events occur, are we always required to handle them
through CPU?
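• Not necessarily, we can resort to DMA (direct memory access) for the not-so-important I/O events, where I/O devices directly access the memory
Devices with DMA capability can read and write system memory at any time without involving the system processor. A "drive-by DMA" attack is one that happens while the system owner is away and usually takes less than 10 minutes, using simple to moderate attack tools (affordable, off-the-shelf hardware and software that do not require disassembling the PC). A simple example: the owner leaves the PC for a quick coffee break, and during the break an attacker plugs in a USB-like device and walks away with all the secrets on the machine, or injects malware that gives the attacker full remote control over the PC.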
Kernel DMA Protection in MS Windows 10/11
154
Instruction Explanation Management
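Instruction explanation management means that you can talk to the OS directly through commands, as with cmd on Windows. There you can use a command such as ipconfig to look up the computer's IP address (covered in the next chapter)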
155
System Calls
• Memory is divided into user space and kernel space
• Ordinary programs run in user space, while the kernel and drivers run in kernel space
• When a program running in user space needs a higher-privilege service from the OS kernel, it makes a system call
• System calls are provided by the OS kernel and executed in kernel space
• Function calls are provided by libraries and executed in user space
156
System Calls
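Parameter passing matters a great deal in system calls. Three methods are common: 1. the simplest is to pass the parameters directly in registers; 2. the method more common on Linux is to store the parameters in a block of memory and pass the address of the block; 3. push the parameters onto the stack, which allows more parameters to be passed.
What kinds of system calls are there? System calls cover a lot of ground: process control, file management, device management, information maintenance, communication, and protection. What is the difference between a system call and a system program? Put simply, a system call is how a programmer's code talks to the system, whereas a system program lets a user talk to the system without writing any code.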
157
Process States
• New: the process is being created
• Ready: the process has been created and is waiting to be dispatched
• Running: the process's instructions are being executed
• Waiting: the process is waiting for an event such as I/O completion
• Terminated: the process has finished and releases its resources
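The five states can be modeled as a small state machine. The slide only lists the states; the allowed transitions in `ALLOWED` below follow the usual textbook diagram and are an assumption of this sketch, as are the names `State` and `move`.

```python
from enum import Enum

class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"

# Assumed transitions (the usual textbook diagram, not taken from the slide).
ALLOWED = {
    State.NEW: {State.READY},
    State.READY: {State.RUNNING},                                   # dispatched
    State.RUNNING: {State.READY, State.WAITING, State.TERMINATED},  # preempted, I/O, exit
    State.WAITING: {State.READY},                                   # I/O completed
    State.TERMINATED: set(),
}

def move(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target

state = State.NEW
for nxt in (State.READY, State.RUNNING, State.WAITING, State.READY):
    state = move(state, nxt)
print(state)  # State.READY
```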
158
Thread: Basic Unit of Process
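[Figure: the marriage application (New York City government; mayor: Xavier; application date: 2020/2/14) with six body lines, which now include "go and toast the guests" and "go and give thanks to the gods"; the lead coordinator moves down the lines and hands them to the government clerk]
Toasting the guests and giving thanks both take a long time, so the lead coordinator is stuck at these two steps for a long while before moving on.
Back to the computer: the main thread is stuck for a long time before it can finish the process.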
159
Thread: Basic Unit of Process
[Figure: the same six-line marriage application, with the lead coordinator handing the two slow steps to employee 1 and employee 2]
The lead coordinator hires employees to take care of things, so two or more people can work at the same time.
Back to the computer: the main thread no longer has to be stuck for a long time before the process can finish.
160
Process vs. Thread
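• Process
• Every application is at least one process
• To the operating system, a process is the smallest unit of resource allocation
• Thread
• To the operating system, a thread is the smallest unit of operation and the smallest unit the CPU executes; it is contained in a process
• A thread is what actually executes a piece of code; it can access the memory provided by the process, OS resources, and so on
• While a program runs, a thread stores its variables in the stack part of memory. The stack is in use at runtime, but each thread's stack can be used only by that thread and is not shared with other threads
• The heap is another part of a process. Any thread in the process can use it; in other words, the heap is a shared memory space
• The OS can assign the CPU directly to a thread to do work, and the threads in the same process all share that process's memory space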
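A short sketch of the private-stack versus shared-heap idea, using Python's standard `threading` module. The names `worker`, `shared_results`, and `results_lock` are made up for illustration; the local variable plays the role of per-thread stack data and the list plays the role of heap data shared by every thread in the process.

```python
import threading

shared_results = []              # shared by every thread in the process
results_lock = threading.Lock()  # guards the shared list

def worker(task_id):
    # local_total is a local variable: each thread has its own copy.
    local_total = sum(range(task_id + 1))
    with results_lock:
        shared_results.append((task_id, local_total))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_results))  # [(0, 0), (1, 1), (2, 3), (3, 6)]
```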
161
Process Control Block (PCB)
• PCB: the information about process
will be stored in memory
• The information includes:
• Process id
• Process state
• Program counter
• CPU registers
• CPU scheduling priority
• Memory management information
• I/O status information
162
Different OS in Real World
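Operating systems, which integrate hardware and software resources, are found not only on computers but also on phones, smart watches, and even the smart vehicles that are now becoming common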
163
Software Classification
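In practice it is hard to tell application software from utility software, so software can also be divided roughly into just application and system software.
The shell is the only part that users interact with directly.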
164
Shells
• Communication with users
• Text based
• GUI (graphics user interface), such as window manager
165
Kernel
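At a minimum, the kernel contains the following components:
• File manager
• Directory/folder, path
• Device drivers (Windows is so widespread that device manufacturers have generally obtained Microsoft certification, so people rarely notice how important drivers are)
• Memory manager (mainly manages main memory and virtual memory, i.e., using the disk as memory)
• Allocating main memory
• Paging, virtual memory
• Scheduler
• Dispatcher
When a large program runs, virtual memory comes to the rescue. Virtual memory lets a program believe it has a contiguous memory space to work with, while in fact some of that space is stored on disk and swapped in when needed. This works because not all of a program's code is used while it runs, so part of it can be kept in virtual memory.
Said to be the "kernel of the kernel"; described later.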
166
Linux World
• Originally made by Linus Torvalds in 1991
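• Freeware & open-source (the two are not the same: the former requires that the software can be used at no cost and redistributed)
• Many distros (Linux distributions, http://distrowatch.com/), all sharing the same kernel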
• For beginner: Linux Mint
• In fact, Linux means only the kernel
• Servers, PCs, embedded systems (Android’s kernel is based on Linux)
167
Process
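• Process: the activity of executing a program
All the activity and state of a program while it is executing make up its process. Processes are handled by the scheduler and the dispatcher.
• Process state
What does the state of a process include? At least the following:
• Program counter
• General purpose registers
• Associated memory cells
For example, the memory contents must be saved before the process is swapped out to disk through virtual memory.
• Process table
This is the idea behind Task Manager.
• Memory area assigned to the process
• Ready/waiting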
168
Process Administration
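• Scheduler
Maintains and manages the process table. In practice this is complicated, because it includes deciding how memory is allocated.
• Maintains the process table
• Introduces new processes
• Removes completed processes
• Decides whether a process is ready or waiting
• Dispatcher
Every OS does this differently. The dispatcher is what actually carries out execution; the hard part is the context switch, i.e., saving one process state and loading another.
• Actually executes the programs
• Controls the allocation of time slices to the processes in the process table
• Switches processes (context switch) by means of an interrupt
Suppose time is being divided between two processes. How does the OS cut in between them to save and load the process states? It uses an interrupt. An interrupt can be thought of as a CPU feature: a handler is registered under an interrupt number, and when that interrupt is raised the CPU jumps to the handler, which in this case performs the context switch.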
169
Different Types of Schedulers
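• FCFS (first come first serve) scheduler
• SJF (shortest job first) scheduler
• RR (round robin) scheduler
• Schedulers can also be divided into preemptive and non-preemptive, i.e., whether a running process can have the CPU taken away from it
Non-preemptive SJF: once a process gets the CPU, it is not preempted until it completes
Preemptive SJF: when a new process arrives whose CPU burst is shorter than what remains of the running process, preemption occurs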
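The effect of the scheduling order can be seen with a small calculation. The sketch below compares FCFS with non-preemptive SJF under simplifying assumptions that are not from the slide: all jobs arrive at time 0, context-switch cost is ignored, and the burst lengths are made-up numbers.

```python
def average_waiting_time(bursts):
    """Average waiting time when jobs run back to back in the given order."""
    waited, clock = 0, 0
    for burst in bursts:
        waited += clock   # this job waited until the clock reached its turn
        clock += burst    # then it occupies the CPU for its whole burst
    return waited / len(bursts)

def fcfs(bursts):
    # First come first serve: run in arrival order.
    return average_waiting_time(bursts)

def sjf(bursts):
    # Non-preemptive shortest job first: run the shortest bursts first.
    return average_waiting_time(sorted(bursts))

bursts = [24, 3, 3]               # hypothetical CPU bursts
print(fcfs(bursts), sjf(bursts))  # 17.0 3.0
```

Running the long job first makes the two short jobs wait behind it, which is why the FCFS average is so much larger here.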
170
Multiprogramming (Time-sharing) Between 2 Processes
The more context switches there are, the less time is spent on real work
The shorter the time slice, the more time is wasted on context switches
171
Semaphores
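What if programs depend on one another, for example two programs writing to the same file, one writing A and the other writing B?
When several programs compete for a resource, we must make sure that only one of them holds it at any time. This is called mutual exclusion.
• A semaphore is a visual signaling apparatus with flags, lights, or mechanically moving arms, as one used on a railroad
• Atomic Test-and-Set: no interrupt may occur while it runs
Test-and-set is like checking for a flag and then planting your own to claim the right to run. If an interrupt arrives between seeing that there is no flag and planting yours, the whole mechanism becomes useless.
• Critical region: only one process may be inside this section (resource) at a time
• Mutual exclusion
Mutual exclusion is achieved with a semaphore (the lock in the figure on the right).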
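As a rough illustration of a critical region guarded by a semaphore, the sketch below uses `threading.Semaphore` from Python's standard library with an initial value of 1, so at most one thread is inside the region at a time. The counter, the thread count, and the name `add_many` are assumptions made for this example.

```python
import threading

counter = 0
mutex = threading.Semaphore(1)   # binary semaphore: one holder at a time

def add_many(times):
    global counter
    for _ in range(times):
        with mutex:              # acquire on entry, release on exit
            counter += 1         # critical region: read, add, write back

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

The semaphore makes each read-add-write step exclusive, so no update is lost even though four threads share the same counter.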
172
Prerequisites for Deadlock
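A process may need many resources; until a resource it needs is released, it can only wait
• Deadlock may occur only if all three of the following (necessary but not sufficient) conditions are satisfied:
• Competition for non-shareable resources
• Resources are requested on a partial basis; i.e., having received some resources, a process will return later to request more (for example, it does not ask for 100 KB of memory up front but requests it dynamically partway through)
• Once a resource has been allocated, it cannot be forcibly retrieved
To resolve deadlock, make at least one of the three conditions fail. For example, a preemptive OS can take resources back at any time, but the side effects are more context switches and possibly incorrect process execution. Another option is to require that all of a process's resources be allocated at once before it may proceed.
173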
Deadlock vs. Starvation
Avoiding deadlock may lead to starvation
• Starvation: process cannot get the resources needed for a long time because the resources are being allocated to other processes
• Aging: adding an aging factor to the priority of each request
Example: A holds resources 1 and 2 and, partway through execution, suddenly asks for 4. If 4 is not given to A, the result is deadlock; if 4 is given to A, the result may be starvation.
174
• Chapter 5: Algorithms
175
Definition of Algorithm
• [D. Knuth] A finite, definite, effective procedure, with some output
• [Cormen et al. Introduction to Algorithms] A well-defined procedure for
transforming some input to a desired output
• Input: may have
• Output: must have
• Definiteness: must be clear and unambiguous
• Finiteness: terminate after a finite number of steps
• Effectiveness: must be basic and feasible with pencil and paper (i.e., realizable)
• Procedure: the sequence of specific steps in a logical order
In plain terms, an "algorithm" is a clearly described sequence of steps for solving a problem
176
Interval Scheduling
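There is only one machine, and we want it to handle as many jobs as possible
• Given: set of jobs with start times and finish times
• Goal: find maximum cardinality subset of mutually compatible jobs (jobs must not overlap)
[Figure: eight jobs, a to h, drawn as intervals on a time axis from 0 to 11]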
177
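Weighted Interval Scheduling
There is only one machine, and we want it to handle as much "valuable" work as possible
• Given: set of jobs with start times, finish times, and weights
• Goal: find maximum weight subset of mutually compatible jobs
[Figure: eight weighted jobs (weights 12, 23, 20, 26, 13, 20, 11, 16) drawn as intervals on a time axis from 0 to 11]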
178
Bipartite Matching
Example: match as many salespeople to tasks as possible
• Given: bipartite graph
• Goal: find maximum cardinality matching
179
Dominating Set
Example: telecom backbone network
• Given: graph
• Goal: find minimum cardinality dominating set
Subset of nodes s.t. all nodes are "covered"
180
Competitive Facility Location
Example: game AI
• Given: graph with weight on each node
• Rules:
• Two competing players alternate in selecting nodes
• A node may not be selected if any of its neighbors has been selected
• Goal: select a maximum weight subset
Second player can guarantee 20, but not 25
181
Five Representative Problems
• Efficiently solvable
• Interval scheduling: O(n log n) greedy algorithm
• Weighted interval scheduling: O(n log n) dynamic programming algorithm
• Bipartite matching: O(n^k) max-flow based algorithm
• Hard
• Independent set: NP-complete
• Competitive facility location: PSPACE-complete (even harder!)
182
Intrinsic Computation Tractability
• Intrinsic computational tractability: An algorithm's worst-case running time on inputs of size n grows at a rate that is at most proportional to some function f(n). What we care about now is the approximate growth rate
• f(n): an upper bound of the running time of the algorithm
• Q: What's wrong with 1.62n^2 + 3.5n + 65 steps?
• A: We'd like to say it grows like n^2 up to constant factors
• Too detailed
• Meaningless
• Hard to classify its efficiency
Insensitive to constant factors and low-order terms
• Our ultimate goal is to identify broad classes of algorithms that have similar behavior
• We’d actually like to classify running times at a coarser level of granularity so that similarities
among different algorithms, and among different problems, show up more clearly
183
Asymptotic Notations O, Ω, θ
• Let T(n) be a function to describe the worst-case running time of a certain
algorithm on an input of size n
• Asymptotic upper bound: T(n) = O(f(n)) if there exist constants c > 0 and
N0 ≥ 0 such that for all n ≥ N0 we have T(n) ≤ cf(n)
• Asymptotic lower bound: T(n) = Ω(f(n)) if there exist constants c > 0 and
N0 ≥ 0 such that for all n ≥ N0 we have T(n) ≥ cf(n)
• Asymptotic tight bound: T(n) = θ(f(n)) if T(n) is both O(f(n)) and Ω(f(n))
184
Example: O, Ω, θ
• Q: T(n) = 1.62n^2 + 3.5n + 8. Which of the following are true?
1. T(n) = O(n)
2. T(n) = O(n^2)
3. T(n) = O(n^3)
4. T(n) = Ω(n)
5. T(n) = Ω(n^2)
6. T(n) = Ω(n^3)
7. T(n) = θ(n)
8. T(n) = θ(n^2)
9. T(n) = θ(n^3)
• A: 2, 3, 4, 5, 8
Easier way to infer O and θ
Given two functions f(n) and g(n), f(n) = O(g(n)) if lim n→∞ f(n)/g(n) exists (is finite); if the limit is also nonzero, then f(n) = θ(g(n))
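As a worked check of answer 8, the constants in the definitions can be written out explicitly. The particular constants below are just one convenient choice.

```latex
\begin{align*}
T(n) &= 1.62n^2 + 3.5n + 8 \\
     &\le 1.62n^2 + 3.5n^2 + 8n^2 = 13.12\,n^2
       && \text{for all } n \ge 1, \text{ so } T(n) = O(n^2) \text{ with } c = 13.12,\ N_0 = 1 \\
T(n) &\ge 1.62\,n^2
       && \text{for all } n \ge 1, \text{ so } T(n) = \Omega(n^2) \text{ with } c = 1.62,\ N_0 = 1 \\
\lim_{n \to \infty} \frac{T(n)}{n^2}
     &= \lim_{n \to \infty} \left( 1.62 + \frac{3.5}{n} + \frac{8}{n^2} \right) = 1.62
\end{align*}
```

Both bounds hold, and the limit is a positive constant, so T(n) = θ(n^2).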
185
Abuse of Notation
• Q: Why use equality in T(n) = O(f(n))?
• Asymmetric:
• f(n) = 5n^3; g(n) = 3n^2
• f(n) = O(n^3) = g(n), which produces a strange chain of equalities
• but f(n) ≠ g(n)
• Better notation: T(n) ∈ O(f(n))
• O(f(n)) forms a set
Big-O notation is really a set concept, but in algorithm analysis we are used to writing it with the equals sign
• Cf. "is" in English (here "is" really means "belongs to")
• Aristotle is a man, but a man isn't necessarily Aristotle
186
The Interval Scheduling
• Given: Set of requests {1, 2, …, n}, where the ith request corresponds to an interval with start time s(i) and finish time f(i)
• Interval i: [s(i), f(i)]
• Goal: Find a compatible subset of the requests with maximum size
• Execute as many tasks as possible
[Figure: eight requests, a to h, drawn as intervals on a time axis from 0 to 11]
187
Greedy Rule
A template of sorts for greedy algorithms:
• Repeat
• Use a simple rule to select a first request i1
• Once i1 is selected, reject all requests not compatible with i1
• Until run out of requests
• Q: How to decide a greedy rule for a good algorithm?
• A: How should the simple rule above be designed? There are many options beyond the four below, and the choice can affect whether the final result is optimal
• Earliest start time: min s(i)
• Shortest interval: min {f(i) - s(i)}
• Fewest conflicts: min over i of |{j : j is not compatible with i}| (this one feels promising)
• Earliest finish time: min f(i) (this is the one that achieves optimality, because the request that finishes earliest also frees the resource earliest)
So the greedy rule that achieves optimality is not necessarily the intuitive one
188
Counterexamples (Awful Cases)
• Earliest start time: min s(i)
[Figure: counterexample intervals]
• Shortest interval: min {f(i) - s(i)}
[Figure: counterexample intervals]
• Fewest conflicts: min over i of |{j : j is not compatible with i}|
[Figure: counterexample intervals]
189
The Greedy Algorithm
• The 4th greedy rule on the previous page leads to the optimal solution
• We first accept the request that finishes first
• Natural idea: Our resource becomes free ASAP
• The greedy algorithm:
Interval-Scheduling(R)
// R: undetermined requests; A: accepted requests
1. A = ∅ // empty set
2. While (R is not empty)
3. choose a request i ∈ R with minimum f(i) // greedy rule
4. A = A + {i}
5. R = R - {i} - X, where X = {j : j ∈ R and j is not compatible with i}
6. Return A
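The pseudocode can be turned into a short Python sketch. Sorting by finish time first replaces the repeated "choose the minimum f(i)" step, which gives the O(n log n) bound. The sample requests are made up, and the sketch assumes that a request starting exactly when another finishes counts as compatible.

```python
def interval_scheduling(requests):
    """Earliest-finish-time greedy rule.

    requests: list of (start, finish) pairs.
    Returns a maximum-size list of mutually compatible requests.
    """
    accepted = []
    last_finish = float("-inf")
    # Visiting requests in order of finish time is the greedy rule.
    for start, finish in sorted(requests, key=lambda r: r[1]):
        if start >= last_finish:          # compatible with everything accepted
            accepted.append((start, finish))
            last_finish = finish
    return accepted

# Hypothetical requests as (start, finish) pairs.
requests = [(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)]
print(interval_scheduling(requests))  # [(1, 4), (4, 7), (8, 11)]
```

Skipping every request that starts before `last_finish` has the same effect as removing the set X in line 5 of the pseudocode.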
190
Structure of Pseudocode
• Pseudocode is a notational system in which ideas can be expressed
informally during the algorithm development process
Assignment statement
Conditional statement
Iterative statement
191
More Complicated Structure of Pseudocode
The structures above can be combined into larger structures
192
The Interval Scheduling Problem
[Figure, shown over five animation frames: nine requests (1 to 9) drawn as intervals on a time axis from 0 to 16; successive frames mark requests 1, 2, 5, and 8 as accepted]
197
Warm-Up: Searching
• Problem: Searching
• Given
• A sorted list of n distinct integers
• Integer x
• Find
• j if x equals some integer of index j
• Solution:
• Naïve idea: compare one by one
• Correct but slow: O(n)
• Better idea?
• Hint: the input is sorted (a condition we have not used yet)
198
Divide-and-conquer paradigm
• Divide: Break the input into several
parts of the same type
• Conquer: Solve the problem in each
part recursively
• Combine: Combine the solutions to
sub-problems into an overall solution
Binary search on a sorted array
• Divide: check the middle element
• Conquer: search the subarray
recursively
• Combine: trivial
Each step discards half of the part that cannot contain the answer, which is a very powerful advantage
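A minimal sketch of binary search on a sorted list, written iteratively rather than recursively for brevity. The function name and the choice to return `None` when x is absent are assumptions of this sketch.

```python
def binary_search(items, x):
    """Return an index j with items[j] == x, or None if x is absent.

    items must be sorted in ascending order.
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2       # divide: look at the middle element
        if items[mid] == x:
            return mid
        if items[mid] < x:
            lo = mid + 1           # x can only be in the right half
        else:
            hi = mid - 1           # x can only be in the left half
    return None

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```

Each iteration halves the remaining range, so the loop runs O(log n) times instead of the O(n) comparisons of the naive scan.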
199
Non-Divide and Conquer Approach: Insertion Sort
Takes roughly n^2 operations
200
Merge Sort
• Sorting problem
• Given a set of n numbers
• Find Sorted list in ascending order
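• Mergesort fits the divide-and-conquer template
• Divide the input into two halves
• Sort each half recursively
• Merge two halves into one
[Figure: merge sort on the letters A L G O R I T H M S. The input is divided into two halves, each half is sorted recursively (A G L O R and H I M S T), and the two halves are merged into A G H I L M O R S T]
201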
Merge Sort
Well suited to sorting a linked list
• The base case: single element (trivially sorted)
Mergesort (A, p, r)
//A[p..r]: initially unsorted
1. If (p < r)
2. q = ⌊(p+r)/2⌋
3. Mergesort (A, p, q)
4. Mergesort (A, q+1, r)
5. Merge(A, p, q, r)
• Divide: line 1-2, D(n)
• Conquer: line 3-4, 2T(n/2)
• Combine: line 5, C(n)
• T(n) = D(n) + 2T(n/2) + C(n) = O(1) + 2T(n/2) + O(n) = O(n log n)
With an array, the divide step takes O(1)
You can try substituting the bound into the recurrence yourself
If a separate array is prepared to hold the result, C(n) = n is easy to achieve
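Here is a compact Python sketch of the same idea. Unlike the pseudocode, which sorts A[p..r] in place, this version returns a new list, which matches the note about using a separate array for the merged result. The function names are chosen for this sketch.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

def merge_sort(items):
    """Return a new sorted list containing the elements of items."""
    if len(items) <= 1:        # base case: trivially sorted
        return list(items)
    mid = len(items) // 2      # divide
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))

print("".join(merge_sort("ALGORITHMS")))  # AGHILMORST
```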
Recursion: Another Way to Compose Program
203
Recurrence Relation
• Running time: T(n)
• Base case: for n ≤ 2, T(n) ≤ c
• T(n) = 2T(n/2) + D(n) + C(n)
• T(n) = 2T(n/2) + O(1) + O(n)
• T(n) ≤ 2T(n/2) + cn
Mergesort (A, p, r)
//A[p..r]: initially unsorted
1. If (p < r)
2. q = ⌊(p+r)/2⌋
3. Mergesort (A, p, q)
4. Mergesort (A, q+1, r)
5. Merge(A, p, q, r)
• Q: Why not T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + cn?
• A: The asymptotic bounds are not affected by ignoring floor and ceiling
204
Solving Recurrences
• Two basic ways to solve a recurrence
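• Unrolling the recurrence (recursion tree): simply expand it by brute force
• Substituting a guess: guess an answer first and then prove it by mathematical induction; in practice this is used less often, because a good guess is hard to come by without experience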
• Initially, we assume n is a power of 2 and replace ≤ with =
• T(n) = 2T(n/2) + cn
• Simplify the problem by omitting floor and ceilings
• Solve the worst case
205
Solving Recurrences – Unrolling Recurrence
• Merge sort has running time T(n) = 2T(n/2) + cn
Each level costs O(n) time and there are O(log n) levels in total, so the overall running time is O(n log n)
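The unrolling can also be written out algebraically. This sketch assumes n is a power of 2 and takes T(1) ≤ c as the base case for simplicity.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn \\
     &= 2\bigl(2\,T(n/4) + c\,n/2\bigr) + cn = 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{k}\,T\!\left(n/2^{k}\right) + k\,cn \\
     &= n\,T(1) + cn\log_2 n && \text{taking } k = \log_2 n \\
     &\le cn + cn\log_2 n = O(n \log n)
\end{align*}
```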
206
Master Theorem for Recurrence
• A good theorem for deriving asymptotic analysis
• The proof can be found below
https://www.cs.cornell.edu/courses/cs3110/2012sp/lectures/lec20master/mm-proof.pdf
• Involves drawing the recurrence tree and some approximations using the
geometric series progression
207
Quick Sort
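The time-complexity analysis follows the same idea as for merge sort but is a bit trickier, so we omit it here
• An alternative to merge sort that also follows divide-and-conquer
Quicksort(A[1, …, n])
1. If (n > 1)
2. Choose a pivot element A[p]
3. r = Partition(A, p)
4. Quicksort(A[1, …, r-1])
5. Quicksort(A[r+1, …, n])
So all we need is a Partition function that uses no extra space and runs in linear time
208
pi denotes the position where the pivot will be placed, while i moves to the right one step at a time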
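One way to get an in-place, linear-time Partition is the Lomuto scheme sketched below. It is not necessarily the scheme drawn on the slides; the choice of the last element as pivot is an assumption, and the variables `pi` and `i` are named after the note above.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] using a[hi] as the pivot.

    Returns the final index of the pivot. Uses O(1) extra space.
    """
    pivot = a[hi]
    pi = lo                      # where the pivot will eventually be placed
    for i in range(lo, hi):      # i moves right one step at a time
        if a[i] < pivot:
            a[i], a[pi] = a[pi], a[i]
            pi += 1
    a[pi], a[hi] = a[hi], a[pi]  # put the pivot into its final position
    return pi

def quicksort(a, lo=0, hi=None):
    """Sort list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        r = partition(a, lo, hi)
        quicksort(a, lo, r - 1)
        quicksort(a, r + 1, hi)
    return a

print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```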
209
210
Fibonacci Sequence
• Recurrence relation: F(n) = F(n-1) + F(n-2), F(0) = 0, F(1) = 1
• e.g., 0, 1, 1, 2, 3, 5, 8, …
• Direct implementation:
Fib(n)
1. If n ≤ 1 return n
2. Return Fib(n-1) + Fib(n-2)
211
What’s Wrong?
Fib(n)
1. If n ≤ 1 return n
2. Return Fib(n-1) + Fib(n-2)
• What if we call Fib(5)?
• Fib(5)
• Fib(4)+Fib(3)
• Fib(3)+Fib(2)+Fib(2)+Fib(1)
• Fib(2)+Fib(1)+Fib(1)+Fib(0)+Fib(1)+Fib(0)+Fib(1)
• Fib(1)+Fib(0)+Fib(1)+Fib(1)+Fib(0)+Fib(1)+Fib(0)+Fib(1)
Too many values are computed over and over again
212
Dynamic Programming – Memoization
• Store the values in a table
• Create a table before a recursive call
• Top-down!
• The control flow is almost the same as the original one
Fib(n)
1. Initialize f[0..n] with -1 // -1: unfilled
2. f[0] = 0; f[1] = 1
3. Return Fibonacci(n, f)
Fibonacci(n, f)
1. If f[n] = -1
2. f[n] = Fibonacci(n-1, f) + Fibonacci(n-2, f)
3. Return f[n] // if f[n] already exists, return it directly
[Figure: chain of recursive calls 5 → 4 → 3 → 2 → 1]
213
Dynamic Programming – Bottom-Up
• Store the values in a table
• Bottom-up
• Compute the values for small problems first
Fib(n)
1. Initialize f[0..n] with -1 // -1: unfilled
2. f[0] = 0; f[1] = 1
3. For i = 2 to n do
4. f[i] = f[i-1]+f[i-2]
5. Return f[n]
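Computing forward from f[0] and f[1] guarantees that every earlier result needed for a later entry is already available
[Figure: dependency chain 5 → 4 → 3 → 2 → 1]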
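Both dynamic-programming variants are short in Python. The sketch below uses a dictionary instead of an array filled with -1 for the memoized version; that substitution, and the function names, are choices made for this example.

```python
def fib_memo(n, table=None):
    """Top-down: recurse, but store each value the first time it is computed."""
    if table is None:
        table = {0: 0, 1: 1}
    if n not in table:                         # not filled in yet
        table[n] = fib_memo(n - 1, table) + fib_memo(n - 2, table)
    return table[n]

def fib_bottom_up(n):
    """Bottom-up: fill the table from the small problems to the large ones."""
    if n <= 1:
        return n
    f = [0] * (n + 1)
    f[1] = 1
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]

print(fib_memo(30), fib_bottom_up(30))  # 832040 832040
```

Each value is computed once in both versions, so the running time is linear in n rather than exponential as in the direct implementation.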
214
• Chapter 4: Networking and the Internet
224
Starting with Your Own Experience
• Usually, users (clients) connect to a remote server (e.g., Google, Instagram)
to browse the webpage
• Browsing is like downloading content from a remote server and displaying it
• To connect to a server, you need to know server’s address (location)
• Layered structure of computer communications is important
225
Protocols
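Protocol: a standard defined so that signals can be exchanged efficiently
• Protocol is composed of data format and communication procedure
• Data format indicates what kind of format will be used for data
• Communication procedure sets up the rules for the processing order and content
• A large number of errors might occur during communications, so we handle this by setting up a proper data format and communication procedure
• Standardizing a protocol requires much effort from different stakeholders
Internet Engineering Task Force (IETF): mainly responsible for standardizing Internet-related technologies
International Telecommunication Union (ITU): mainly responsible for standardizing a broad range of electronic and wireless communication technologies
3rd Generation Partnership Project (3GPP): mainly responsible for standardizing technologies related to third-generation mobile telephony
226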
Network Architecture (OSI Model)
Open System Interconnection
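[Figure: the OSI layers, with the higher layers at the top and the lower layers at the bottom]
Common functions are placed in the lower layers as far as possible, while more specialized functions go in the upper layers
Once the boundaries of each layer are fixed, the role of each protocol becomes clearer and protocols are easier to recombine
Don't worry if this is not clear yet; the details come later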
227
Network Architecture (TCP/IP Model)
228
LAN and WAN
• Network can be divided into LAN and WAN
• LAN (local area network) is a smaller network in a physical location
• A LAN is like connecting a few computers together to form a network in an office
• LAN is commonly achieved by Ethernet
• Ethernet defines control at the MAC and PHY layers
• Different Ethernet standards have different performance
• 1000BASE-T (1Gbps) and 10GBASE-T (10Gbps)
• WAN (wide area network) is a larger network
usually across different physical locations
• WAN usually connected by ISPs (Internet Service
Provider) such as Chunghwa Telecom
• WAN achieves interconnection among different sites
• WAN ≠ Internet, though they both rely on internetworking
229
LAN Topology
[Figure labels] All signals are broadcast
Ethernet: transmission is actually in one direction
230
LAN Protocols
• Token ring
Only the machine holding the token may transmit
• Popular in ring topology
• Token and messages are passed in one direction
• Only the machine that gets the token can transmit its own message
• CSMA/CD (carrier sense, multiple access with collision detection)
• Popular in bus topology; widely used in wired Ethernet
• Broadcasting
• When a collision occurs, both machines wait for a brief random time before trying again
• CSMA/CA (carrier sense, multiple access with collision avoidance)
Avoids collisions
• Popular in wireless LANs (Wi-Fi)
• Broadcasting
• Detect if a channel is idle; if so, wait for a brief random time and then detect again. If the channel is still idle, start sending
Signals arrive with a time lag, so transmitting immediately could cause a collision
231
Wireless & Access Point
• Wi-Fi (wireless fidelity)
• IEEE 802.11 (b, g, i, n, ac, ...)
• Wi-Fi 6 (802.11ax)
232
Internetworking
• Internetworking is a mechanism that enables multiple networks to be connected to each other
• Usually, we rely on internetworking to build up an even larger network
• The advantages of internetworking can be as follows
• Avoid unnecessary communications over the entire network
• Limit the range and impact of network fault
• Have a better network management according to the stakeholders
• To have internetworking, we are required to have a protocol for
internetworking
• IP (Internet Protocol) is the internetworking protocol commonly used in Internet
• Specifically, because each network has different network address, we transmit data
to the target network according to the network address through routing
mechanism
• Internet refers to the Internet, while internet means networks
interconnected to each other
233
Circuit Switching and Packet Switching
• Switching: transmit data to the target during the communication
• Circuit switching: building a (virtual) cable/line between source and target
• Packet switching: divide data into packets, and then transmit packets to
target with shared cables/lines differently
• Packet means partitioning data according to predefined way, and then
adding a header to each packet
• Frames are used in Ethernet but it has the same idea with packet
• Both packet and frame can be called PDU (protocol data unit)
234
TCP/IP Model
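In 1982, ISO and ITU jointly created the OSI Model. Around the same time the TCP/IP Model was proposed by the research community. In recent years the TCP/IP Model has become the more popular of the two because the OSI Model is complex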
235
TCP/IP Model
• Network interface layer
• Establish a connection to a direct-connected peer
• Also encompass the physical devices
• The functionality is achieved through the Ethernet-standardized NIC (network interface controller), i.e., the network card
• Internet layer
• Most important is forwarding (or routing), which transmits data to an indirectly connected peer
• Transport layer
• Depending on different objectives, this layer selectively chooses high-reliability communications or real-time-but-not-reliable communications
• Application layer
An application combines protocols from each layer. For example, to access a web server, the choices for (application layer, transport layer, Internet layer, network interface layer) can be (HTTP, TCP, IP, Ethernet), whereas for watching video or making Internet phone calls they can be (RTP, UDP, IP, Ethernet)
236
Network Interface Layer in TCP/IP Model
• Establish a connection to a direct-connected peer
• This layer does NOT have internetworking functionality
• The functionality is achieved through the Ethernet-standardized NIC (network
interface controller)
• A representative protocol in this layer is Ethernet
• Commodity computers mostly use Ethernet
• Wi-Fi establishes LAN in a wireless manner
• PPPoE (PPP over Ethernet) establishes a point-to-point connection (it gives you broadband access after dialing in; we will not go into it here)
• Ethernet in essence is a broadcast mechanism
• The hardware in this layer has unique address: MAC address
• ARP (address resolution protocol) maps between IP and MAC addresses (how ARP works is covered later)
237
MAC Address
• MAC address (medium access control address) is also called physical
address and, in principle, is unique over the world
• Some software tool can be used to modify MAC address manually
• Ethernet transmits frames to the target by specifying MAC address
• Frame with MAC is broadcast on the bus in old Ethernet (10Base-2, 10Base-5)
• Frame with MAC is sent to hub/switch that memorize the mapping between MAC
and port in MAC address table first, and then directed to particular port in new
Ethernet (100Base-TX, 1000Base-T)
port
Manufacturer ID
Model number and product serial number
238
Internet Layer in TCP/IP Model
• Main purpose is to connect multiple networks
• Most important is forwarding (or routing), which transmits data to a
indirect-connected peer
• In the network interface layer, everyone can see its direct peers only
• A representative protocol in this layer is IP (Internet protocol)
• IPv4 vs. IPv6
• Another representative is ICMP (Internet control message protocol)
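ping: this command reports the minimum, maximum, and average round-trip time of ping packets to the destination device, and can be used to check how reliable the network path to that device is
traceroute: this works by increasing the time to live (TTL) step by step. Each time a packet passes through a router its TTL is decremented by 1; when the TTL reaches 0, the packet is discarded and an ICMP Time Exceeded message is sent back to the packet's original sender
240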
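The TTL mechanism can be simulated without touching a real network. In the sketch below the path is a made-up list of hop names and `send` and `trace` are hypothetical helpers; no actual ICMP packets are sent.

```python
def send(path, ttl):
    """Simulate sending one packet along path with the given TTL.

    Returns (hop, message): the router that dropped the packet,
    or the destination if the packet arrived.
    """
    for hop in path[:-1]:        # every router on the way
        ttl -= 1                 # each router decrements the TTL by 1
        if ttl == 0:
            return hop, "time exceeded"
    return path[-1], "reply"

def trace(path):
    """Discover the hops one by one by sending packets with TTL 1, 2, 3, ..."""
    found = []
    for ttl in range(1, len(path) + 1):
        hop, _ = send(path, ttl)
        found.append(hop)
        if hop == path[-1]:      # reached the destination
            break
    return found

path = ["router-a", "router-b", "router-c", "server"]  # hypothetical route
print(trace(path))  # ['router-a', 'router-b', 'router-c', 'server']
```

Each increase of the TTL lets the packet travel one router further, so the "time exceeded" answers reveal the route hop by hop.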
Transport Layer in TCP/IP Model
• Depending on different objectives, this layer selectively chooses high-reliability communications or real-time-but-not-reliable communications
• Two representative protocols in this layer: TCP and UDP
• TCP (transmission control protocol): reliable data transfer
• ACK after receiving the packets
• Used by HTTP (web page transfer) and SMTP (email transfer)
• UDP (user datagram protocol): real-time but unreliable
• No ACK when receiving the packets
• Used by DNS (domain name resolution) and NTP (time synchronization)
241
How TCP Establish Reliable Communications
• Important: the underlying communication channel is indeed unreliable
• Six techniques/steps in TCP to ensure reliable communications
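1. Number the data to ensure it is delivered in order (sequence numbers)
2. Check whether the received data contains errors (error detection)
3. Confirm that the other side has received the correct data (acknowledgment)
4. Request retransmission of data that was not delivered (sliding window)
5. Match the pace of the communication partner when sending (flow control)
6. Adjust the transmission speed according to how congested the network is (congestion control)
• Sliding window is the main tool for flow control, which keeps a fast sender from overwhelming a slow receiver
• Congestion control dynamically adjusts the sending speed, which keeps fast senders from overwhelming the network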
242
TCP Handshake
• TCP starts with a three-way handshake and terminates with another
handshake
243
Application Layer in TCP/IP Model
• Many application-layer protocols exist
• For now, ignore the port numbers in the table below
244
Packets in Different Layers
• Headers will be added to the data from the upper layer
• Eventually, the physical signal will be transmitted
• After receiving the signal, receiver decodes the signal and read headers
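Header stacking can be illustrated with plain strings. The header texts ("TCP|", "IP|", "ETH|") are placeholders invented for this sketch; real headers are binary structures carrying addresses, ports, checksums, and so on.

```python
# Placeholder headers, outermost first; real headers are binary, not text.
HEADERS = ("ETH|", "IP|", "TCP|")

def encapsulate(data):
    """Sender side: each layer prepends its own header to what it receives."""
    segment = "TCP|" + data      # transport layer
    packet = "IP|" + segment     # Internet layer
    frame = "ETH|" + packet      # network interface layer
    return frame

def decapsulate(frame):
    """Receiver side: read and strip the headers from the outside in."""
    for header in HEADERS:
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame

frame = encapsulate("GET /index.html")
print(frame)               # ETH|IP|TCP|GET /index.html
print(decapsulate(frame))  # GET /index.html
```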
245
IP Address: 4 Byte Address
• General rule: each computer (or online device) has a unique IP address
• Thus, we need to dispatch IP addresses to devices according to some regulation
• ICANN (Internet Corporation for Assigned Names and Numbers) is an organization for managing IP addresses
• ICANN dispatches a specific range of IP addresses to regional Internet registry (RIR
for short, but APNIC in Asia), which then dispatches IP addresses to either national
Internet registry (NIR, and TWNIC in Taiwan) or local Internet registry (LIR)
• Chunghwa Telecom is an LIR in Taiwan
246
Port Number
Every socket is given a unique number (IP address + TCP port). As long as two users know each other's socket number, they can communicate directly
• Ports are used to indicate the functionality of the target computer
• Standardized across all network-connected devices, with each port assigned a
number from [0, 65535]
• Port number is functionality provided by TCP/UDP
• While IP addresses enable messages to go to and from specific devices, port
numbers allow targeting of specific services or applications within those devices
247
Public and Private IP
• Public IP refers to IP addresses unique over Internet
• Private IP refers to addresses that do not satisfy the above requirement (they need not be unique across the Internet)
• Private IP cannot be used on Internet
• IPv4 provides 2^32 IP addresses in total (about 4.3 billion)
• However, in the Internet of Things era, most devices can connect to the Internet, so the demand for IP addresses is high
• IPv4 address exhaustion: the depletion of the pool of unallocated IPv4 addresses
• People resort to private IP and NAPT to solve the IPv4 address exhaustion
248
IP Address Class and Subnet Mask
• IP address has 32 bits
The subnet mask is used to find the network part of an IP address
• Lefthand side refers to network, while righthand side refers to host
• IP address has classes from A to E (D and E are for special purpose)
• The difference among A, B, C is their max number of supported devices
• Class A supports 16,777,214 devices but C supports 254 devices only
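The network part and the device counts can be checked with Python's standard `ipaddress` module. The helper names are made up for this sketch, and the sample address 192.168.10.10 with mask 255.255.255.0 is only an example.

```python
import ipaddress

def network_part(address, mask):
    """AND the address with the subnet mask to obtain the network address."""
    ip = int(ipaddress.IPv4Address(address))
    m = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip & m))

def usable_hosts(prefix_length):
    """Devices supported when prefix_length bits belong to the network part.

    Two addresses are reserved: all host bits 0 (the network itself)
    and all host bits 1 (broadcast).
    """
    return 2 ** (32 - prefix_length) - 2

print(network_part("192.168.10.10", "255.255.255.0"))  # 192.168.10.0
print(usable_hosts(8))    # 16777214 (class A: 8 network bits)
print(usable_hosts(24))   # 254 (class C: 24 network bits)
```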
249
IP Subnetting
Subnetting is the process of reassigning some bits of the host part to the network part
• When you have a class A network, you are able to deploy 16,777,214 devices
• Nonetheless, you only have 10 devices, causing waste of IP addresses
• Your company has different departments, each of which plans to have its own network
• IP subnetting: because of the above requirements, one can partition the class A network into smaller ones by moving some bits of the host address into the network address
250
Broadcast and Multicast
• Unicast: one-to-one communication
• Broadcast: send data to all of the computers on the same Ethernet
• In fact, two approaches are used
• Send packet to 255.255.255.255 (limited broadcast destination address): the packet
will be forwarded to all of the computers on the same Ethernet
• The router will become the boundary
• Send packet to the IP address whose bits in host address are all 1 (directed
broadcast address): depending on the requirement, the router will forward the
packet to another network, whose router will broadcast to all of the computers in
the target network
• Multicast: send data to some of the computers in a specific group
251
Connecting Compatible Networks
The following devices work only within networks that use the same protocol
• Hub: a junction with broadcast functionality
• Switch: a junction with unicast functionality
• They all work on L1 (PHY) and L2 (Link)
252
Connecting Compatible Networks
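A hub is a physical-layer device: it treats all data as an electrical signal and simply floods it. A hub also amplifies the signal, so each port can be regarded as a repeater. As the figure shows, a hub floods data to every port, which congests the bandwidth, so hubs are only suitable for temporary connections
A switch is a data-link-layer device. The biggest difference for an L2 device is that it learns and records the MAC address of the device behind each port, and then uses the entries in its MAC address table to choose which port to forward a frame to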
253
Connecting Incompatible Networks
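The following device connects networks that use different protocols
• Router
• Main functionality is to forward IP packets at the network layer
• From the user's perspective, a router forwards packets among independent Ethernets
The physical and data link layers define the coverage of one Ethernet; through the network layer, the router relays data among the independent networks that each Ethernet covers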
254
ARP (Address Resolution Protocol)
• Fact
• Devices on Ethernet reach the destination through its MAC address
• Devices on TCP/IP reach the destination through its IP address
• So, one needs another protocol to bridge the two
• ARP (address resolution protocol)
• Main purpose is to convert an IP address to a MAC address through broadcast
• The packet is routed to the destination according to the network address
First, A (163.15.2.1) wants to send a message over Ethernet to IP = 163.15.2.4, so it broadcasts an ARP Request (asking for 163.15.2.4) to its network segment. Every host receives the ARP Request and checks whether it is the one being asked; if not, it ignores and discards the request. C (163.15.2.4) receives the ARP Request, finds that it is the one being asked, and sends an ARP Reply (its Ethernet address) back to A (163.15.2.1)
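The exchange above can be mimicked with a toy simulation. It is only a sketch of the request/reply logic, not real ARP traffic: the IP addresses follow the slide's example, while the MAC addresses, the extra host B, and the function name `arp_request` are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical segment: IP -> MAC (the MAC values are invented).
SEGMENT: Dict[str, str] = {
    "163.15.2.1": "00:00:00:00:00:0a",  # host A
    "163.15.2.2": "00:00:00:00:00:0b",  # host B
    "163.15.2.4": "00:00:00:00:00:0c",  # host C
}

def arp_request(sender_ip: str, target_ip: str, segment: Dict[str, str]) -> Optional[str]:
    """Broadcast 'who has target_ip?' and return the MAC carried by the ARP reply."""
    for host_ip, host_mac in segment.items():
        if host_ip == sender_ip:
            continue  # the sender does not answer its own request
        if host_ip == target_ip:
            return host_mac  # only the owner of the IP sends an ARP reply
        # every other host sees the request and silently discards it
    return None  # nobody on this segment owns the address

print(arp_request("163.15.2.1", "163.15.2.4", SEGMENT))
```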
255
Router Again
The default gateway can be thought of as the default recipient when the destination is unknown
• On Ethernet, many protocols (e.g., ARP) broadcast to exchange data
• With many devices, this causes poor performance
• Hence, routers are used as the boundary of a broadcast domain
• Partitioning into network segments (or broadcast domains) mitigates broadcast storms
• The default gateway is the router of your network segment
When PC0 (192.168.10.10/24) pings PC3 (192.168.20.10/24), PC0 first determines whether the destination IP is in the same network segment. In this example the destination is in another segment, so ARP resolves the MAC address of the default gateway instead
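The same-segment decision in the PC0/PC3 example can be sketched as follows. The gateway address 192.168.10.1 and the function name `next_hop` are assumptions; the slide does not state the gateway's IP.

```python
import ipaddress

def next_hop(src_cidr: str, dst_ip: str, default_gateway: str) -> str:
    """Return the IP whose MAC address the sender has to resolve with ARP."""
    src = ipaddress.ip_interface(src_cidr)
    if ipaddress.ip_address(dst_ip) in src.network:
        return dst_ip           # same segment: ARP for the destination itself
    return default_gateway      # other segment: ARP for the default gateway

# PC0 pings PC3, which sits in a different /24 segment.
print(next_hop("192.168.10.10/24", "192.168.20.10", "192.168.10.1"))
```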
256
Domain Name
• IP addresses are difficult to memorize, though IP is used for routing
• Routing: the procedure by which routers forward packets
• A domain name is the name of a computer on the network
• Routers still use IP to do routing; later we will mention how to map between domain names and IP addresses
• Domain names are unique and managed by ICANN
• ICANN is in charge of TLDs (top-level domains) and ccTLDs (country code TLDs, for countries and regions)
• ICANN does not manage them directly; instead, different registries do so
• For example, Verisign manages .com and .net, while TWNIC manages .tw
257
URL (Uniform Resource Locator)
• The illustration below shows the relation among ICANN, Verisign, and user
• URL: complete web address used to find a particular web page
• While a domain is a website's name, a URL leads to a page within the website
• Host names are sometimes called domain names
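To make the parts of a URL concrete, here is a short sketch using the standard `urllib.parse` module. The URL itself is a made-up example on the reserved example.com domain.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "https://www.example.com:8080/docs/index.html?lang=en#top"
parts = urlsplit(url)

print("scheme  :", parts.scheme)    # protocol used to fetch the page
print("host    :", parts.hostname)  # the domain (host) name of the website
print("port    :", parts.port)
print("path    :", parts.path)      # the page within the website
print("query   :", parts.query)
print("fragment:", parts.fragment)
```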
258
Routing Protocols
• Many routers sit between the source and destination
• Routing: how to choose the best path and forward packets along it
• In particular, individual routers are owned by different organizations, but each maintains a routing table to determine the best path from source to destination
• Static routing: the routing table is fixed and must be modified manually
• Dynamic routing: routers exchange routing tables through pre-defined protocols to update the best paths in their routing tables
• IGP (interior gateway protocol) includes RIP/RIP2 (routing information protocol) and OSPF (open shortest path first)
• EGP (exterior gateway protocol) includes BGP (border gateway protocol)
• IGP handles routing within an AS (autonomous system) and EGP handles routing between ASes
• An AS is an ISP-scale network under a single routing policy
Keywords not highlighted in color on this page need not be memorized
259
DHCP
A device obtains its IP address and network information by broadcast; once it has an IP address, it no longer needs to run DHCP
• How to set up the default gateway
• You can manually configure the corresponding IP address or rely on DHCP
• DHCP (dynamic host configuration protocol) automatically assigns IP addresses and other configurations to devices connected to the network using a client-server architecture
• So, for example, a device using DHCP will receive its IP address and default gateway
260
NAT and NAPT
• If you connect multiple computers together, they can communicate with each other using private IP addresses
• Unfortunately, a private IP address cannot reach the Internet directly
• NAT (network address translation)
• NAPT (network address port translation), also called PAT (port address translation)
With NAT, a router that has been assigned n public IPs lets n private IPs on the internal network reach the Internet; because the NAT table is a plain public/private IP mapping, at most n private IPs can be online at once. A NAPT router, even with only one public IP, records the port as well, so it can serve up to 65,536 private IPs (the number of 16-bit port values)
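A toy model of a NAPT table may help here. It is a sketch under stated assumptions: the class `NaptRouter`, the public address 203.0.113.5 (a documentation range), and the starting port 49152 are all illustrative, and port exhaustion and timeouts are ignored.

```python
import itertools

class NaptRouter:
    """Maps (private IP, private port) pairs onto ports of one public IP."""

    def __init__(self, public_ip: str, first_port: int = 49152):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            public_port = next(self._ports)
            self.outbound[key] = public_port
            self.inbound[public_port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int):
        """Find which private host a reply arriving on public_port belongs to."""
        return self.inbound.get(public_port)

router = NaptRouter("203.0.113.5")
print(router.translate_out("192.168.0.2", 5000))
print(router.translate_out("192.168.0.3", 5000))
print(router.translate_in(49153))
```

Two private hosts share the single public IP and are told apart only by the public port, which is why the port count bounds the number of simultaneous mappings.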
261
L2 Switch
• L2 Switch is the switch we mentioned previously
• The term L2 puts emphasis on working on layer 2 (MAC layer)
• Switch has a MAC table
• When receiving an Ethernet frame, switch checks the MAC address of destination
and forwards the frame to the destination
• MAC table is a mapping between MAC and switch port
• How to update MAC table
• When connecting to a port for the first time, a computer sends an Ethernet frame, which updates the MAC table
• Sometimes a computer X connects to a port but does not send an Ethernet frame, so the MAC table is not updated
• During this period, when a frame needs to be sent to X, the switch floods it to all computers connected to the switch
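The learning and flooding behavior above can be sketched as a tiny simulation. The class `L2Switch`, the four ports, and the short MAC strings are hypothetical; a real switch also ages out entries, which is omitted here.

```python
class L2Switch:
    """Toy learning switch: MAC table maps a MAC address to a switch port."""

    def __init__(self, num_ports: int):
        self.ports = list(range(num_ports))
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str):
        """Learn the sender's port, then return the ports the frame goes out of."""
        self.mac_table[src_mac] = in_port  # update the MAC table from the source
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]             # known: forward to one port
        return [p for p in self.ports if p != in_port]   # unknown: flood

sw = L2Switch(4)
print(sw.receive(0, "aa", "bb"))  # "bb" not learned yet, so the frame is flooded
print(sw.receive(1, "bb", "aa"))  # "aa" was learned on port 0
print(sw.receive(0, "aa", "bb"))  # "bb" is now known on port 1
```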
262
VLAN (Virtual LAN)
• VLAN: logically partitioning a network into several networks
• A LAN is physically formed, as all devices connected to the same switch form a LAN
• VLAN has the following advantages
• Performance: reduces the size of the broadcast domain
• Security: acts like a firewall (covered later)
• There are several kinds of VLAN, such as port-based VLAN and tag-based VLAN
Port-based VLAN
Port-based VLAN with 2 switches
Tag-based VLAN with 2 switches
263
L3 Switch
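• An L3 switch is a switch with a routing mechanism
• Routing aims to support communication among VLANs
• Also called IP switch or switch router
Assume each VLAN has its own network segment
A router is usually positioned as the boundary device connecting across WANs, but it rarely offers many ports (since dozens of WAN links seldom come in at once). Because it targets slower cross-WAN use, its packet-forwarding performance is lower than that of an L3 switch used on the internal network. An L3 switch targets the enterprise LAN environment and lacks what large WANs require, such as BGP and high-capacity memory (for storing hundreds of thousands of routes)
264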
Interprocess Communication
• Server-client
• One server, several clients
• Clients initiate communications by sending requests
• Server serves
• P2P (peer-to-peer)
• Two processes communicating as equals
• The most popular distribution mode nowadays
265
The Internet
Note the capital I here
• The most notable example of an internet is the Internet
• The original goal was to prevent disruptions caused by local disasters
• Derived from the Advanced Research Projects Agency Network (ARPANET) in the late 1960s
• 4 nodes: UCLA, SRI, UCSB, Utah
• Now it's a commercial undertaking
Robert Kahn
Vint Cerf
266
Internet Applications
• VoIP (voice over Internet protocol)
• email (electronic mail)
• FTP (file transfer protocol)
• telnet & ssh (secure shell)
• P2P: BitTorrent, eDonkey, eMule...
267
WWW (World Wide Web)
• www, w3, web
• hypertext, hyperlink, hypermedia (hypertext means the text is more than just text: clicking it jumps to somewhere else)
• Web page: hypertext document
• Website: a collection of closely related web pages
Tim Berners-Lee
268
Browsers
Tools for viewing hypertext
• Present the web pages downloaded from the Internet
• HTTP (hypertext transfer protocol)
• URL (uniform resource locator)
The default page is index.html
269
Hyper-Text Markup Language (HTML)
270
HTTP (Hypertext Transfer Protocol)
• HTTP is an application layer protocol for WWW
• HTML creates hypertexts, and then HTTP transmits hypertext
• HTTP usually runs over TCP
• The web server handling HTTP communications uses port 80 by default
271
HTTP Message Format
• HTTP client and server communicate by sending text messages
• The client sends a request message to the server, and the server, in turn, returns a response message
272
HTTP Request
• The request line has the following syntax: request-method request-URI HTTP-version
• The request headers are in the form of name:value pairs
273
HTTP Response
• The first line is the status line, followed by optional response header(s)
• The status line has the following syntax: HTTP-version status-code reason-phrase
• The response headers are in the form of name:value pairs
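Since HTTP messages are plain text, both directions can be sketched with ordinary string handling. This is a simplified illustration, not a full HTTP implementation: the helper names `build_request` and `parse_response`, the host www.example.com, and the sample response are assumptions.

```python
def build_request(method: str, path: str, headers: dict) -> str:
    """Build a request: request line, name: value headers, then a blank line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_response(text: str):
    """Split a response into (version, status code, reason, headers, body)."""
    head, _, body = text.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers, body

request = build_request("GET", "/index.html", {"Host": "www.example.com"})
print(request)

# Hypothetical response text, as a server might send it back.
sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>hi</h1>"
print(parse_response(sample))
```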
274
HTTPS (HTTP-Secure)
• HTTPS is an extension of HTTP that is used for secure communication
• HTTPS is also referred to as HTTP over TLS or HTTP over SSL (SSL/TLS is covered later)
275
PHP and SQL
PHP outputs different HTML depending on the result
276
eXtensible Markup Language (XML)
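Modern browsers can all display XML
• Standard style to represent data as text; used less for web pages than as a configuration format for programs
• Strictly matches each opening tag to a closing tag: every tag has a start and an end
• <x property="yyy"> ...... </x>
Because it is data, swapping the order of elements does not affect how it is interpreted
• XHTML
• HTML that follows XML format
If the tags are all taken from HTML and every one has an ending tag, it is called XHTML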
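As a short sketch of XML used as a configuration format, the standard `xml.etree.ElementTree` module can read attributes and reject a document whose tags are not properly closed. The `<config>` document and the helper `is_well_formed` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document.
doc = """<config>
  <server name="web" port="80"/>
  <server name="mail" port="25"/>
</config>"""

root = ET.fromstring(doc)
servers = {s.get("name"): int(s.get("port")) for s in root.findall("server")}
print(servers)

def is_well_formed(text: str) -> bool:
    """True if every opening tag is matched by a closing tag (and the rest parses)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed('<x property="yyy">data</x>'))  # opening and closing tag match
print(is_well_formed('<x property="yyy">data'))      # missing closing tag
```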
277
Client-side & Server-side
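• Client-side
The program is downloaded to the local machine before it runs, which reduces the server's load
• Java applets (a subset of Java)
• JavaScript (actually has little to do with Java)
• Flash
• Server-side
Some things should not be revealed to the client
• CGI
• Servlets (JSP, ASP)
• PHP (Personal Home Page; now PHP: Hypertext Preprocessor)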
278
SMTP (Simple Mail Transfer Protocol)
• SMTP is a protocol for forwarding emails (NOT retrieving emails)
• The default port number is 25
• Compared to HTTP, SMTP keeps executing instructions and replying during the connection
279
POP3 and IMAP4
• Both POP3 (Post Office Protocol version 3) and IMAP4 (Internet Message Access Protocol version 4) are used to retrieve emails
• POP3 downloads emails and then deletes them from the remote server
• Users can access the emails offline because they are kept locally
• Default port number is 110
• IMAP4 lets users read emails without deleting them from the server
• Users need to be online to access emails
• Default port number is 143
280
DNS (Domain Name System)
• DNS resolves domain names to IP addresses, as routers only recognize IP addresses
• The default port is 53, typically over UDP
282
DNS (Domain Name System)
• DNS is characterized by its distributed coordination service
• Multiple name servers work together to resolve domain names
• Public DNS providers include OpenDNS (208.67.222.222), Comodo DNS
(8.26.56.26), Google (8.8.8.8), and Cloudflare (1.1.1.1)
283
DNS (Domain Name System)
• Resolver (or recursive name server, DNS resolver): a server designed to receive recursive DNS queries from web browsers
There are 13 root servers in total under ICANN
• Authoritative name server: provides the actual answer to your DNS queries
• It only responds to iterative queries
• It only returns answers to queries about domain names that are installed in its configuration system
• The authoritative DNS servers can be where the website is hosted or where the DNS provider is
• Resolvers have a cache, but authoritative name servers do NOT have a cache
284
DNS Packet Structure
• DNS queries and replies are each transmitted in a single UDP packet
• A standard UDP DNS query consists of
• Header: 16-bit query identifier, selected by the querying client and replicated in the response from the server
• Query part: domain name
• Answer part:
• NAME field of variable length
• Contain full domain name
• 2-byte TYPE field
• The type of DNS record
• A for standard domain-to-address resolution
• NS for information about name server
• 2-byte CLASS field
• Usually only IN for Internet domains
• 4-byte TTL
• Specify how long a record will remain valid (in seconds)
• 2-byte RDLENGTH
• The length of the data segment (in bytes)
• RDATA of variable length
• The actual record data
• For example, RDATA segment of an A record is a 32-bit IP address
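The answer-part layout listed above can be sketched with the standard `struct` module. This is a simplified illustration: the helper names, the domain www.example.com, and the address 192.0.2.1 (a documentation range) are assumptions, and DNS name compression is deliberately not handled.

```python
import socket
import struct

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_a_record(name: str, ttl: int, ipv4: str) -> bytes:
    """NAME + TYPE (2 bytes) + CLASS (2) + TTL (4) + RDLENGTH (2) + RDATA."""
    rdata = socket.inet_aton(ipv4)                       # 32-bit IPv4 address
    fixed = struct.pack("!HHIH", 1, 1, ttl, len(rdata))  # TYPE=A (1), CLASS=IN (1)
    return encode_name(name) + fixed + rdata

def parse_record(buf: bytes):
    """Inverse of build_a_record for an uncompressed A record."""
    pos, labels = 0, []
    while buf[pos] != 0:
        n = buf[pos]
        labels.append(buf[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    pos += 1  # skip the terminating zero byte of NAME
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", buf[pos:pos + 10])
    rdata = buf[pos + 10:pos + 10 + rdlength]
    return ".".join(labels), rtype, rclass, ttl, socket.inet_ntoa(rdata)

record = build_a_record("www.example.com", 300, "192.0.2.1")
print(len(record), parse_record(record))
```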
285
How DNS Queries Work
Request the A record
Recursive name server
Connect to one of the 13 root servers; here we pick b
286
How DNS Queries Work
287
How DNS Queries Work
288
REST API (RESTful API)
• REST API (RESTful API) is an application programming interface (API) that
conforms to the constraints of REST architectural style and allows for
interaction with RESTful web services
289
Security
• Attacks
• Malware (malicious software)
• Virus, worm, Trojan horse, spyware, phishing
• Ransomware
• Denial of service (DoS)
• Spam
• Protections
• Firewall
• Spam filter
• Proxy
• Antivirus, antispyware
290
Symmetric Cipher and Asymmetric Cipher
• AES is a representative of symmetric cipher (symmetric cryptography)
• Both sender and receiver are assumed to share a common key
• Block size is 16 bytes (128 bits) and the key length may be 128, 192, or 256 bits
• RSA is a representative of asymmetric cipher (asymmetric cryptography)
• Public key is for encryption only and private key is for decryption only
• Completely different from our intuitive idea about the cryptography
Symmetric cipher
Asymmetric cipher
291
AES (Advanced Encryption Standard)
• As the block size is 128 bits, how does AES encrypt 1280 bits of data?
• We resort to a mode of operation
The AES key size is used as the measure of the security level
• ECB (Electronic Codebook), CBC (Cipher Block Chaining), GCM (Galois/Counter Mode)
• AES-128-CBC means AES with 128-bit key and CBC
• AES-256-GCM means AES with 256-bit key and GCM
ECB (exists only in textbooks)
CBC (decreasingly used)
GCM (used by HTTPS)
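The difference between ECB and CBC can be sketched without a real AES library. The block cipher below is NOT AES: it is a toy 4-round Feistel construction built on SHA-256, assumed here only so that the example runs with the standard library. What matters is how the 10 blocks of a 1280-bit message are chained.

```python
import hashlib

BLOCK = 16  # 128-bit blocks, as in AES

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a toy Feistel permutation on one 16-byte block."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def split_blocks(data: bytes):
    assert len(data) % BLOCK == 0, "padding is omitted in this sketch"
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: every block is encrypted independently."""
    return b"".join(toy_block_encrypt(key, b) for b in split_blocks(data))

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """CBC: each block is XORed with the previous ciphertext block first."""
    prev, out = iv, []
    for block in split_blocks(data):
        mixed = bytes(a ^ b for a, b in zip(block, prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return b"".join(out)

key, iv = bytes(16), bytes(range(16))  # fixed demo values, never do this in practice
message = b"A" * 160                   # 1280 bits = 10 identical blocks
print(len(set(split_blocks(ecb_encrypt(key, message)))), "distinct ECB blocks")
print(len(set(split_blocks(cbc_encrypt(key, iv, message)))), "distinct CBC blocks")
```

With identical plaintext blocks, ECB repeats the same ciphertext block, which leaks the pattern of the data; chaining hides it.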
292
RSA
• Most famous one
• Developed by Rivest, Shamir, and Adleman
• Have both academic and monetary values
293
RSA
Unlike a symmetric cipher, RSA also has a so-called key generation algorithm
1. Randomly select two primes $p$ and $q$
2. Compute $N = pq$
3. Select $e$ such that $\gcd(e, (p-1)(q-1)) = 1$
4. Compute $d$ such that $ed \equiv 1 \pmod{(p-1)(q-1)}$
• Public (encryption) key = $(e, N)$
• Private (decryption) key = $(d, p, q)$
All computations use very large numbers (at least 1024 bits)
294
RSA
• How to encrypt a message $m$?
• Ciphertext $c = m^e \bmod N$
• How to decrypt a ciphertext $c$?
• Plaintext $m = c^d \bmod N$
• Why is the above correct?
• Since $ed \equiv 1 \pmod{(p-1)(q-1)}$, we can write $ed = 1 + s(p-1)(q-1)$ for some integer $s$
• $c^d = (m^e)^d = m^{ed} = m^{1+s(p-1)(q-1)} = m \cdot m^{s(p-1)(q-1)} \equiv m \pmod{N}$, because $m^{(p-1)(q-1)} \equiv 1 \pmod{N}$ by Euler's theorem when $\gcd(m, N) = 1$
• This shows that $c^d$ recovers $m$
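Key generation, encryption, and decryption can be tried out with tiny numbers. This is a textbook sketch only: the primes 61 and 53 and the exponent 17 are illustrative choices, far too small for real use, and the function names are assumptions. It relies on `pow(e, -1, phi)`, available in Python 3.8 and later.

```python
from math import gcd

def generate_keys(p: int, q: int, e: int):
    """Return ((e, N), (d, p, q)) following the four key generation steps."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse: e*d = 1 mod (p-1)(q-1)
    return (e, n), (d, p, q)

def encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)  # c = m^e mod N

def decrypt(c: int, private_key) -> int:
    d, p, q = private_key
    return pow(c, d, p * q)  # m = c^d mod N

public, private = generate_keys(61, 53, 17)
ciphertext = encrypt(65, public)
print(public, private, ciphertext, decrypt(ciphertext, private))
```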
295
296
Hash Function and Message Authentication Code
• A hash function has the four characteristics below
Often used as a fingerprint of the data
• Arbitrary-size input, fixed-size output, hard to reverse, and hard to find a collision
• MAC (message authentication code)
Not to be confused with the MAC of the MAC layer mentioned earlier; the two merely share a name
• Can be seen as a keyed hash function
• Non-trivial to construct, because naively hashing with a key is easily attacked
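Both ideas are available in the Python standard library. The sketch below uses SHA-256 as the hash function and HMAC as one standard keyed construction; the slides do not name a specific algorithm, so both choices, the key, and the helper names are assumptions.

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """Fixed-size output (64 hex characters) for input of any size."""
    return hashlib.sha256(data).hexdigest()

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a MAC with HMAC-SHA256 rather than a naive hash(key + message)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"shared secret key"  # hypothetical key known to sender and receiver
tag = make_tag(key, b"transfer 100 dollars")
print(fingerprint(b"short"), fingerprint(b"x" * 10000))
print(verify_tag(key, b"transfer 100 dollars", tag))  # unmodified message
print(verify_tag(key, b"transfer 900 dollars", tag))  # altered message
```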
297
SSL/TLS
• SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are cryptographic protocols that provide secure communications
298
HTTPS (HTTP Secure)
• HTTPS is the integration of HTTP and SSL/TLS
299
Why do we need TLS?
• TLS gives us three guarantees
• Authentication: the website we reach is really the one we intend to visit (no phishing)
• Verifies the identity of the communicating parties (both clients and servers)
• With an asymmetric cipher, TLS ensures we go to the authentic website
• Confidentiality: transmitted data cannot be read by an eavesdropper
• Protects data from unauthorized access by encrypting it with symmetric encryption
• Integrity: transmitted data cannot be tampered with in transit
• Detects any alteration of data during transmission by checking the MAC
300
https://www.deviantart.com/noneofus/art/Treasure486382200
319