Introduction to Computer Science
游家牧 Chia-Mu Yu
Department of Information Management and Finance (交通大學資訊管理與財務金融學系)
chiamuyu@gmail.com

Textbook, and Reference Books

Grading Policy
The College of Management requires the midterm to count for 30%; the rest is set by the instructor, so we use the following policy.
• Midterm 30%, Final Exam 30%, Quiz and Assignment 40%
• Basically, the average of the final scores will be shifted to 78

Schedule for This Semester (2023 Fall)
• We will have quizzes on the following dates
  • 10/3, 10/23, 11/28, 12/12
  • On quiz days, the first hour is still a lecture and the last two hours are a written quiz
• We follow the official schedule from NYCU
  • 16 weeks of lectures in total
  • Midterm: 10/31
  • Final Exam: 12/19
  • On the midterm and final exam days, all three hours are used for the written exam
• On 10/3 the instructor is away for one hour; this is made up by extending each class by 10 minutes, which should reduce the need to schedule a separate make-up session

Alert
The College of Management requires the midterm to cover the first three chapters; the final exam coverage is decided by the instructor, and we will run somewhat ahead.
• Midterm coverage will be Chapters 1~3, but we have our own schedule
• Not all of the chapters in the textbook will be covered in this semester
• This semester we will have midterm on Nov 1st (Tuesday), 2022

Chapter 0: Introduction

Introduction
• Computer science (CS)
  • In Taiwan it is often translated as 資訊工程 (information engineering), but 電腦科學 (computer science) is the proper rendering
  • Deals with computer design, computer programming, and information processing
• Some important terminology
  • Algorithm: a finite sequence of instructions to solve a problem
  • (Computer) program: an algorithm expressed in a way that a machine can understand
  • Programming (coding): the act of composing a computer program
  • Software: algorithms and programs
  • Hardware: the machine (e.g., PC and laptop)

Pseudocode (Algorithm) and Code (Program)
[Figure: pseudocode shown next to the corresponding code]

Algorithm
• Algorithm design is usually a math problem
• However, it is not for math problems only
• Algorithms have limits; the truth or falsity of certain statements cannot be verified
[Figure: GCD; Kurt Gödel, incompleteness theorem; primality test]
• Research on algorithms involves the following perspectives
  • What kinds of problems can be solved by algorithms (theory of computation)
  • How to design more efficient algorithms (algorithms)
  • How to compare different algorithms (algorithms)
  • How to use algorithms to simulate our brain (applications of algorithms)

Abstraction
• Abstraction (抽象化)
  • The process of removing physical, spatial, or temporal details in objects or systems to focus on details of greater importance
  • The creation of abstract concept-objects by mirroring common features or attributes of various non-abstract objects or systems
  • In short: focus on functionality and set aside, for the moment, how the system or component is implemented
• For example, you might use a cellphone every day but have no idea about cellular communication technology
• People can pay special attention to one particular component only
• The process of abstraction is sometimes called modeling

Data
• Computers represent, store, and process discrete, numeric data
• In recent years, data has started to play a central role in CS
  • Terms such as data-driven and data scientist have become increasingly popular
• Some issues related to big data
  • How computers store numeric, text, image, sound, and video data
  • How computers turn real-world continuous data into digital data
  • How to prevent and correct falsified data

9 Key Computer Science Topics
[Figure: the nine topics; three are marked as required courses: required by the College of Management, required for the Information Management track of the department, required by the College of Management]

Chapter 1: Data Storage

Binary World
• Bit: binary digit (0/1)
  • Simple, logical, and unambiguous; real voltages are unstable, and a 5 V level easily fluctuates
• Depending on your interpretation, bits could be numbers or other things such as images and videos
• Boolean operations deal with true/false values rather than numeric values
• Boolean operations & gates
  • The gate symbols are not what real circuits look like (for example, power and ground are never drawn); they are only symbols, called gates (閘)
  • Thanks to abstraction, we do not need to know whether the lower level is actually implemented with 5 V
  • [Figure: gate symbols with their truth tables]

Flip-Flop
A flip-flop stores one bit and is the basic building block of computer memory.
• Purpose: to keep the state of the output until the next excitation
• Flip-flop (FF)
  • Has two input lines: set (S) and reset (R)
  • One input sets its stored value to 1
  • The other input sets its stored value to 0
  • While both inputs are 0, the most recently stored value is preserved
  • When both inputs are 1 the output is undefined; the flip-flop does not use this case
[Figure: a flip-flop drawn as a box with inputs S and R]

Flip-Flop
• Start with S=0, R=1 (starting from S=0, R=0 would be meaningless): the AND gate must output 0, both OR inputs are 0, and the output is 0
• Change to S=0, R=0: because the value stored in the previous step is 0, the output stays 0
• Change to S=1, R=0: the OR gate now has an input of 1, so it must output 1, and the output becomes 1
• Change to S=0, R=0: because the value stored in the previous step is 1, the lower input of the OR gate is 1, so it must output 1, and the output stays 1

Flip-Flop (Another Type of Implementation)
• Start with S=0, R=1 (starting from S=0, R=0 would be meaningless): the output is 0, and it is clearly stable as long as the inputs do not change. The crossing wire in the figure is called a jumper (跳線)
• Set S=0, R=0: because the value stored in the previous step is 0, the output is 0
• Set S=1, R=0: because S=1, the gate output marked in the figure is 0, and the output is 1
• Set S=0, R=0 and see what happens: because the value stored in the previous step is 1, the output is 1
At this point we know that a computer really can remember one bit.

Flip-Flop
• Key observation: the output depends on an internal state
  • The output is not just a direct mapping from the input
• We mention the flip-flop for three reasons
  • It demonstrates that a device can be composed of gates
  • Abstraction helps; a flip-flop can have different implementations
  • A flip-flop can memorize a bit
• Collectively, this subject is called digital circuit design

Hexadecimal Coding (Hex)
• Binary is usually too long for humans to remember
• Binary to hex is straightforward
  • $(0010111010110101)_2 = (2EB5)_{16}$
• The prefix 0x indicates that a hexadecimal number follows, e.g., 0xB5

Main Memory
• Cell: a basic unit of main memory (typically 8 bits, which is one byte)
• The left end is the higher-order end and holds the most significant bit (MSB); the right end is the lower-order end and holds the least significant bit (LSB)
• Just as the 3 in 329 carries the most weight in everyday writing, we assume without loss of generality that the leftmost bit is the most significant

Main Memory and Address
• One dimensional: memory is like a row of drawers, each holding one cell
• Random accessible (隨機存取), as opposed to sequential access such as a cassette tape
• Access the content by the address
  • Practically, addresses are also in binary
  • cf. the pointer in C/C++
[Figure: the logical structure of memory]

Memory Techniques
• Random Access Memory (RAM): memory in which individual cells can be easily accessed in any order
  • Static memory (SRAM): like a flip-flop. SRAM needs many flip-flops and each flip-flop needs many gates, which makes it physically large
  • Dynamic memory (DRAM): tiny capacitors replenished regularly by a refresh circuit. Conceptually a capacitor stores the bit, but the charge leaks away, so it must be recharged periodically
  • Synchronous DRAM (SDRAM): the refresh circuit can recharge all DRAM cells at once (synchronously), which has the advantage of staying in step with the computer clock (the clock is covered later if time permits)
  • Double Data Rate (DDR): normally data can be accessed once per clock cycle, but DDR allows twice. DDR2 to DDR4 denote successive generations of DDR; the number does not mean more transfers per clock cycle. A main difference is lower supply voltage (for example, 1.8 V for DDR2 and 1.5 V for DDR3, compared with older 5 V logic), which reduces power consumption and allows somewhat higher speed because charging takes less time
  • Dual/triple channel: with two memory modules installed, writing 2 bytes can send one byte to each module, which in theory doubles the speed; the same idea applies to triple channel
• Capacity
  • Kilobyte: $2^{10}$ bytes = 1,024 bytes ≈ $10^3$ bytes
  • Megabyte: $2^{20}$ bytes = 1,048,576 bytes ≈ $10^6$ bytes
  • Gigabyte: $2^{30}$ bytes = 1,073,741,824 bytes ≈ $10^9$ bytes

Mass Storage
• Properties (compared with main memory)
  • Larger capacity
  • Less volatility (揮發性): it does not forget
  • Slower
  • On-line or off-line: the data is kept even when not attached to a powered-on computer
• Types
  • Magnetic systems (hard disk, tape)
  • Optical systems (CD, DVD)
  • Flash drives
• Magnetic devices use north and south poles to represent 0 and 1; optical devices use a laser to heat a crystal into different shapes so that the reflected laser differs slightly; flash devices use the tunneling effect to push electrons into an insulator, where they are kept

Magnetic Disk Storage System
• Head, track, sector, cylinder: one full circle is a track (磁軌), and a segment of a circle is a sector (磁區)
• Access time = seek time + rotation delay (latency time)
• Transfer rate (SATA 1.5/3/6, etc.)
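As a rough illustration of the access-time formula on the disk slide, the sketch below estimates access time from a seek time and a rotation speed. It assumes the common simplification that the average rotation delay is half a revolution; the function names and the sample figures (9 ms seek, 7200 rpm) are illustrative and not taken from the slides.

```python
def rotation_delay_ms(rpm: float) -> float:
    """Average rotation delay in milliseconds.

    Assumption: on average the target sector is half a revolution away.
    """
    ms_per_revolution = 60_000 / rpm  # one minute is 60,000 ms
    return ms_per_revolution / 2


def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotation delay, both in milliseconds."""
    return seek_ms + rotation_delay_ms(rpm)


if __name__ == "__main__":
    # Hypothetical drive: 9 ms average seek, 7200 rpm.
    print(f"{access_time_ms(9.0, 7200):.2f} ms")
```

Doubling the rotation speed halves only the rotation term, so a drive with a long seek time gains relatively little from spinning faster.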
The time for the head to move to the track holding the data is the seek time. Once on that track, the head must still wait for the target sector to rotate under it; this is the rotation delay, which is inversely proportional to the rotation speed, so a faster-spinning disk has a smaller rotation delay.

Optical Storage
Unlike a hard disk, which stores data in concentric circles, a CD stores data (music in particular) along a spiral: the track is one spiral, which is then divided into sectors.

Physical vs. Logical Records
• Files and file systems: decide where the data is actually placed, especially when it cannot be stored in contiguous sectors
• Fragmentation problem: a file is scattered across different places
• We talk about this later in OS

Buffer
The buffer is a very general concept, usually used to coordinate devices of different speeds; for example, data is transmitted only after enough of it has been collected.
• Purpose: to synchronize (or to make compatible) different R/W mechanisms and rates
• A memory area used for the temporary storage of data (usually as a step in transferring the data)
• Blocks of data compatible with physical records can be transferred between buffers and the mass storage system
• Data in a buffer can be referenced in terms of logical records

Representing Text
• ASCII (American Standard Code for Information Interchange, by ANSI): 7 bits (or 8 bits with a leading 0)
  • Only 7 bits are used; when stored in a byte, the MSB is 0
• Unicode: 16 bits
  • ASCII basically suits English only, so Unicode was designed to represent the scripts of all languages, such as Chinese and Japanese
• ISO standard (International Organization for Standardization): 32 bits

Unicode and UTF-8
• Unicode is still being revised; as of March 2020, version 13.0.0 contains about 143k characters
• One character may have several glyph variants, e.g., 「ɑ/a」, 「強/强」, 「戶/户/戸」
• Depending on platforms and storage requirements, Unicode Transformation Format (UTF) defines the implementation of Unicode
• Unicode basically uses 2 bytes; common Chinese characters fit in 2 bytes, but rare ones need 3, while characters already in ASCII need no more than 1 byte, and every language differs. Using 3 bytes for everything would be wasteful, so UTF-8 defines a way to tell apart Unicode characters of different lengths
• PCs and Macs disagree on byte order, so little endian (LE) and big endian (BE) must be distinguished explicitly

Representing Numeric Values

From Binary to Decimal

From Decimal to Binary

Representing Images
• Bit map techniques
  • Pixel: picture element (像素)
  • Colors: RGB, HSV, etc.
  • LCD, scanners, digital cameras, etc.
• Vector techniques
  • Scalable
  • TrueType, PostScript, SVG (Scalable Vector Graphics), etc.
  • CAD, printers

Representing Sounds
• Sampling
  • Sampling rate: how often a sample is taken
  • Bit resolution: how many bits represent each sampled value; equivalently, how many levels the vertical axis has
  • Bit rate (sampling rate ✕ bit resolution), measured in bps
• MIDI (synthesis)
  • Like writing down the score and letting the sound card play its prerecorded Do, Re, Mi accordingly; as a result, playback may differ from computer to computer

Binary System Revisited
We only talk about addition; subtraction is adding a negative number, so we must define what a negative number is.

Two's Complement Notation
• Range: $-2^{n-1} \sim 2^{n-1}-1$
• Why not simply sacrifice the MSB as a sign bit? Because there would then be both a positive 0 and a negative 0, which is odd
• 3-bit addition examples
  • 010 (2) + 111 (-1) = 1001
  • 011 (3) + 110 (-2) = 1001
  • 011 (3) + 100 (-4) = 0111
• One advantage of two's complement is that addition and subtraction work directly. These are 3-bit additions, so discarding the carry out of the 3 bits gives exactly the result
• -1 is 111 in two's complement, but 111 is also the binary for 7, so 2 + (-1) can be viewed as 2 + 7 = 9, and 9 ≡ 1 (mod 8); this is where the 1001 above comes from. The whole 3-bit two's complement design forms a mod 8 system

Two's Complement Encoding
• Textbook's way: scan the binary number from right to left; copy every bit up to and including the first 1, then invert all the remaining bits (the number of bits must be fixed first)
• Alternative way, for positive x
  • x: binary encoding of x
  • -x: binary encoding of $(2^n - x)$
  • This comes from observing that the values cycle mod $2^n$, so adding $2^n$ is enough; going clockwise around the circle, the values run from small to large

Subtraction in 2's Complement

Excess Notation
Another way to represent negative numbers: the whole range is shifted to make room for the negative numbers.
  Binary pattern   Decimal   Excess
  000              0         -4
  001              1         -3
  010              2         -2
  011              3         -1
  100              4          0
  101              5          1
  110              6          2
  111              7          3
• The advantage of excess notation is that the comparison circuit need not change: the ordering is the same and only the assigned meaning differs. In two's complement the ordering is disturbed; for example, $(111)_2$ is smaller than $(000)_2$. On the other hand, two's complement has its own advantage: the adder need not change
• In excess notation the MSB serves as the sign bit: 1 represents a nonnegative (+) number and 0 a negative (-) number
• Conversion
  • $x \to (2^{n-1} + x) \bmod 2^n$
  • With 3 bits, remember that everything is shifted: 2 moves to position 6 and -3 moves to position 1. This conversion is hard for humans, but for a computer it amounts to setting the MSB to 1
• Addition: keep adding $2^{n-1}$
  • $x + y \to (2^{n-1} + (2^{n-1} + x) + (2^{n-1} + y)) \bmod 2^n = (2^{n-1} + x + y) \bmod 2^n$
  • 2 + (-1) = $(110)_2 + (011)_2 = (1001)_2$; adding 4 and discarding the MSB gives $(101)_2$
  • The reason: both conversions added 4, so the resulting 1 in the MSB can be discarded, but the remaining bits must still obey the table above, so 4 has to be added once more

Comparison in the Case of 3-bit Representation
• Two's complement: $x + (-y) = x + (8 - y) = x - y \pmod 8$
• Excess notation: $x + y \to (x + 4) + (y + 4) = x + y + 8$. However, what we want is $x + y + 4$, which explains why we eventually have to add 4

Overflow
• Overflow occurs when the arithmetic result is out of the range of the representation
• Addition of two positive numbers
  • 2 + 3 = 5 → -3 (mod 8): two positive numbers add up to a negative number
• Addition of two negative numbers
  • (-2) + (-3) = -5 → 3 (mod 8): two negative numbers add up to a positive number
• Adding two negative or two positive numbers can overflow, but can adding one positive and one negative number?

Real Example of Overflow
[Figure: program output in which overflow can be seen to actually occur]

Fraction in Binary (Fixed-Point)

Floating-Point Notation
• Why? (How to represent 0.000000000000001?)
• On most current 64-bit computers, the exponent takes 11 bits and the mantissa takes 52 bits (IEEE 754 standard); the format has to be an international standard to be useful
[Figure: the fields of $+1 \times 2^{-10}$, annotated: mantissa (尾數, also 假數); not two's complement]
• Example bit pattern: 1 101 0101
  • The radix point is fixed between the exponent and the mantissa; the pattern means (-), (+1), (.0101), which together give $-(1/4 + 1/16) \times 2^{+1}$
  • Smallest positive number: $+.0001 \times 2^{-4}$ (this is also the limit of precision); largest positive number: $+.1111 \times 2^{3}$

Decoding Floating-Point
• 01101011 → (0)(110)(1011) → (+)(+2)(1011): 0.1011 → 10.11 → 2 + 1/2 + 1/4 = 11/4
• 10010011 → (1)(001)(0011) → (-)(-3)(0011): -0.0011 → -0.0000011 → -(1/64 + 1/128) = -3/128

Truncation Errors
• The mantissa does not have enough capacity: the required precision is beyond the limitation of the mantissa
[Figure: a worked encoding: simply convert the value to binary, then move the radix point to the far left. At that point we already know there is trouble, because the mantissa has only 4 bits but 5 bits are needed]

Normalized Form
To keep the radix point from floating arbitrarily, which would give one number several representations, a rule is imposed.
• Rule: the most significant bit of the mantissa is 1
• 0's floating-point representation is all zeros
• Normalization
  • 01100011 → (0)(110)(0011) → $0.0011 \times 2^{2}$ → $0.1100 \times 2^{0}$ → (0)(100)(1100) → 01001100
• IEEE standard: the rule above means the mantissa always starts with 1, so the 1 might as well not be stored
  • The left-most bit in the mantissa is always 1 → omit it
  • An IEEE standard normalized form is (s)(eee)(mmmm) → $(-1)^{s} \times 1.mmmm \times 2^{(eee-4)}$
  • 01100011 → (0)(110)(0011) → $1.0011 \times 2^{(6-4)}$

Loss of Digits
• How are floating-point numbers added? Recall how you would add $4 \times 10^{8} + 2 \times 10^{7}$: first make the exponents equal, then add
• Because the value must be shifted to exponent = 111, the mantissa is shifted by four places, so the 1 is shifted out, which causes information loss
• Different orders of execution therefore give different results

Floating Number Has Limitation
• $(0.1)_{10}$ cannot be represented exactly in binary

Data Compression
• Lossy vs. lossless
• Run-length encoding: if the next 100 bits are all 1s, send only "the next 100 bits are 1" rather than the 100 ones themselves
• Frequency-dependent encoding: ASCII is a fixed-length code, but what if the frequencies are known in advance?
  • Huffman encoding: with a fixed-length code it is easy to tell which bits belong together, but how is a variable-length code decoded?
• Relative encoding / difference encoding: common in video; encode the difference between one frame and the next
• Dictionary encoding: if a string is very common, simply give it a short code; the codebook (碼書) must be transmitted as well
• Adaptive dictionary encoding
  • LZW encoding

Huffman Coding
• Traditional encoding gives every character a code of exactly the same length, whereas Huffman coding uses codes of different lengths
• Example string for building a Huffman tree: A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED
• It is in fact a prefix code; those interested may refer to communication theory
• The tree must also be sent to the receiver so that the receiver can decode
[Figure: the resulting Huffman tree]

LZW Coding
• A dictionary compression method in which the dictionary need not be sent to the receiver for decoding
• Entries are added to the dictionary while encoding; the initial codes are just ASCII (the example below is simplified)
• Encoding xyx xyx xyx xyx with the initial dictionary x = 1, y = 2, space = 3
  • x, y, x, space are emitted as 1, 2, 1, 3
  • On reaching the space, the preceding string is added to the current dictionary: xyx = 4
  • The rest is emitted as 4, 3, 4, 3, 4, so the output is 1213 4 3 4 3 4
• Decoding 1213 4 3 4 3 4 with the same initial dictionary
  • 1, 2, 1 decode to xyx
  • On reaching the space (3), the preceding string is again added to the current dictionary: xyx = 4
  • The remaining codes decode to xyx xyx xyx, which recovers xyx xyx xyx xyx
• A dictionary encoding which does not need to store the dictionary

Images, Audios, and Videos
These usually use lossy encoding.
• GIF: 256 colors, dictionary encoding
• JPEG
  • Lossy or lossless
  • Discrete cosine transform
  • Discard high-frequency information that human eyes are insensitive to
• MP3
  • Temporal masking
  • Frequency masking
• MPEG
  • Relative encoding & other techniques
  • Record the differences between consecutive data blocks; e.g., 88, 90, 95, 98, 92 (2 ✕ 5 = 10 B) becomes 88, +2, +5, +3, -6 (2 + 1 ✕ 4 = 6 B)

Communication Errors
• Compression
  • Remove redundancy; pure noise cannot be compressed
• Error detection & correction
  • After compression, a single transmission error corrupts everything
  • Add redundancy to prevent errors
• Error detection: check code
  • Cannot correct errors, but can check whether errors occurred
  • ID numbers
  • ISBN
  • Parity code
• Error correcting
  • Correct errors (to some degree)

Taiwan ID
A mechanism hidden among the digits checks whether an ID number is valid.

ISBN-10
A code that identifies books.
• Given ISBN-10 $d_1 d_2 \dots d_{10}$
• It must satisfy $10d_1 + 9d_2 + \dots + 1d_{10} \equiv 0 \pmod{11}$; this also catches two digits written in swapped order
• For example, the ISBN-10 of the textbook is 0-273-75139-5

Parity Bits
• Add an additional bit so that the total number of 1s is odd; errors in two bits go undetected
• Communication
• RAID (redundant array of independent disks) techniques: disk arrays, whose purpose is to improve performance or to provide data redundancy

RAID
[Figure: RAID configurations, annotated as follows]
• Data is split and stored across different disks, which is faster, as with dual channel
• Data is copied to another disk
• RAID 1 first and then RAID 0; this needs at least four disks
• Tolerates the failure of both disks in the same group and still works
• Mirror first, then split the data
• Does not tolerate the failure of both disks in the same group
• Data and the corresponding parity information are stored evenly across all disks

An Error-Correcting Code (ECC)
• (3,1)-repetition code (can correct 1-bit errors)
• Server memory has ECC; desktop memory does not
• The repetition code is not used in practice; it is poor value for money

Another Error-Correcting Code (ECC)
• Hamming distance: the number of differing bits between two bit strings
• Maximized Hamming distances among symbols (at least 3); one can check that any two symbols are at Hamming distance at least 3
• From a spatial point of view, if only one bit is corrupted, the received word is not far from its original position
• If 010100 is received, decode it as D: simply compute the Hamming distance to every symbol
• Better value for money than the repetition code

Hamming (7, 4) Code
• Suppose 4 bits, 1101 in order, are to be sent. First draw a Venn diagram and number the intersection regions, then fill 1101 into the corresponding positions
• Number the non-intersecting regions as well and compute even parity by hand (odd parity would also work)
• Check that the even parity is correct; if so, actually send the encoded bit string 1101 100
• Suppose m2 is corrupted; how do we confirm that it is m2? Here p1 and p3 both fail to match, so m2 must be the corrupted bit. Likewise, if m4 is corrupted all three parities fail, and if a parity bit is corrupted only that parity fails

Chapter 2: Data Manipulation

Motherboard (主機板)

Computer Architecture
• The CPU has its own memory, namely the registers
• The motherboard shown earlier handles communication between the CPU and memory, although the bus (匯流排) itself cannot be seen

Adding Values Stored in Memory
How does the CPU add two numbers stored in memory?
This seemingly simple operation is actually somewhat involved, because the numbers are stored in memory but the addition must be carried out by the CPU.
• Step 1: Get one of the values to be added from memory and place it in a register through the bus
• Step 2: Get the other value to be added from memory and place it in another register through the bus
• Step 3: Activate the addition circuitry with the registers used in Steps 1 and 2 as inputs and another register designated to hold the result
• Step 4: Store the result in memory through the bus

Machine Instructions
They are called machine instructions because they instruct the CPU (the machine) to carry out some task. They are very machine specific: different machines have different instruction sets; recall the CPU pins, where instructions correspond to different voltages.
• Three categories of instructions
  • Data transfer: LOAD, STORE, I/O
  • Arithmetic/logic: AND, OR, ADD, SUB, etc.; SHIFT, ROTATE
  • Control: JUMP, HALT
• How to implement machine instructions
  • RISC (Reduced Instruction Set Computing) (PowerPC, SPARC)
  • CISC (Complex Instruction Set Computing) (x86, x86-64)

Architecture of a Simple Machine
• Registers are all built with SRAM technology
• Program counter: where in memory the program is
• Instruction register: stores the instruction that is about to be executed
• The machine imagined by the textbook has only 16 registers (0~F)

Fetch (擷取)
• The instruction to be executed is at address A0
• The instruction register (IR) imagined by the textbook is 2 bytes, so 15 6C will be fetched into the IR
• What 15 6C means is explained shortly

Example of a Machine Instruction
• Op-code: what the instruction is supposed to do
• Operand: the data the instruction works on
• Once a machine instruction is defined this way, it reveals a lot; for example, memory is at most 256 B, because each cell stores 1 B and each memory address can use at most 1 B
• 0011 means STORE (this varies by CPU)

Adding Two Values Revisited
• Writing machine instructions directly is too tiring; a slightly higher-level assembly language can be used instead, which makes programs easier for humans to write
• The op-codes have to be looked up in the table at the back of the textbook
• Assembly language corresponds one-to-one with machine language
• A compiler does not merely translate statement by statement; there are many special cases. For example, given c=a+b followed by x=x+c, c need not be stored to memory and loaded again when x=x+c is handled

An Example of Add Operation between 1 and 2
Initially only memory holds data and instructions. The CPU contains the registers R0, R1, R2, the control unit with the program counter (PC) and instruction register (IR), and the ALU.
  RAM address   Content
  0000          Load the number at 0006 into R0
  0001          Load the number at 0007 into R1
  0002          Add the numbers in R0 and R1 and put the sum in R2
  0003          Store the number in R2 at 0008
  0004          End of program
  0006          1
  0007          2
  0008          (result)
• Assume the PC points to 0000
• Machine cycle 1: the instruction at 0000 is fetched into the IR and held there; the PC is incremented to 0001; the instruction in the IR is decoded and the number at 0006 is put into R0 (R0 = 1)
• Machine cycle 2: the PC points to 0001; the instruction is fetched into the IR; the PC is incremented to 0002; the number at 0007 is put into R1 (R1 = 2)
• Machine cycle 3: the PC points to 0002; the instruction is fetched into the IR; the PC is incremented to 0003; R0 + R1 is put into R2 (R2 = 3)
• Machine cycle 4: the PC points to 0003; the instruction is fetched into the IR; the PC is incremented to 0004; R2 is stored at 0008 (memory cell 0008 = 3)
• Machine cycle 5: the PC points to 0004; the instruction is fetched into the IR; the PC is incremented to 0005; the instruction is decoded and the program ends

Program Execution
• Instruction register (IR), program counter (PC)
• Machine cycle: one pass around the fetch, decode, execute loop in the figure
  • Decode: work out what exactly has to be done, similar to a human looking something up in a table
  • Not every CPU adopts this three-stage design
• Clock: how many such cycles can be executed per second; the example shown can do 3G per second

Logic/Bit Operations
• Masking: use 0s or 1s to remove or keep certain information

Shift/Rotation
• Preserve the sign bit in the MSB
• From a programming point of view, a simple application of shift + mask is the following
  • Set the mask to 1 and shift it left by 7 positions
  • Apply the mask one position at a time: if the masked result is 0 output 0, otherwise output 1
  • Remember to shift the mask right by one position each time
  • e.g., 01011 AND 10000 = 00000 (0); 01011 AND 01000 = 01000 (nonzero); 01011 AND 00100 = 00000 (0)

Controller
Conceptually, peripheral devices also communicate with the CPU through the bus.
• Specialized, e.g., SATA, which serves hard disks only
• General: USB, FireWire (formerly known as 1394)
• The bus is a limited resource: the more peripherals communicate with the CPU, the more easily CPU-memory communication is interrupted, so controllers use buffers to reduce the time peripherals occupy the bus

Memory-mapped I/O
• If peripheral I/O were accessed in a very different way, many more machine instructions would be needed. One approach is to pretend there are only memory accesses, and to arrange that accessing certain specific addresses is the same as accessing a specific peripheral device

Communication with Other Devices
• DMA: direct memory access
  • Devices other than the CPU also want to access memory, but they must still interrupt the CPU to obtain permission
  • Once authorized, controllers can access data directly from main memory without notifying the CPU
  • In effect, after notifying the CPU once, the device can keep using memory; this is heavily used in, e.g., optical drives
• Handshaking
  • 2-way communication
  • Coordinating activities
• Parallel/serial: conceptually, sending over many wires at once versus over a single wire, but serial is not necessarily slower
• Transfer rate: bits per second (bps, Kbps, Mbps, etc.)

Pipelining
• The circuits for fetch, decode, and execute are in fact different, so strictly speaking they can operate at the same time
• Throughput increased
  • Throughput: total amount of work accomplished in a given amount of time
  • Ideally, performance (throughput) improves threefold
• Example: pre-fetching
  • Issue: conditional jump
  • Branch prediction can overcome this; for example, a 1000-iteration for-loop jumps back 999 times

Parallel/distributed Computing
• Parallel
  • a=b+c, d=e+f can become (a, d)=(b, e)+(c, f)
  • CPU clock rates are already very high, so the trend is now toward parallel computing
  • Multiprocessor
  • MIMD, SISD, SIMD (Single Instruction Multiple Data), MISD
  • SIMD is also called a vectorized instruction set: a single instruction is applied to many pieces of data. This naturally suits multimedia data; adjusting the brightness of a single pixel, applied to all pixels, adjusts the brightness of the image
  • With multiple cores, SIMD naturally becomes MIMD
  • A pipeline can be thought of as a kind of MISD: multiple instructions, but the same data source
• Distributed
  • Linking several computers via network
  • Separate processors, separate memory
  • The difference between parallel and distributed is that the former has shared memory
• Issues:
  • Data dependency
  • Load balancing
  • Synchronization
  • Reliability

To Parallelize XOR Not to Parallelize
All of the following rely on compiler optimization, which is quite difficult in practice.
[Figure: loop examples, annotated as follows]
• There is a dependency, so it seems the loop cannot be parallelized
• There is a deeper dependency, so it seems the loop cannot be parallelized, but it can be turned into multiplying by 2 every two steps
• The computation on CPU 1 does not depend on A[i], so it can be parallelized

Speedup & Scaling
• Used to estimate how much speedup parallelization can give
• P is the fraction of the work that can be parallelized, M is the number of cores, and S = 1 - P; the speedup is then roughly $\frac{S + P}{S + P/M}$
• The numerator is the amount of work originally to be executed and the denominator is the amount of work with parallelization, so the ratio can be read as the speedup gain
[Figure: speedup curve plotted for P = 0.8]

Chapter 3: Operating Systems (OS)
An OS is the system program that lets users make full use of the hardware resources they purchased.

Batch Processing
A scenario from very long ago.
• Computer operators: besides programmers, there were dedicated computer operators who kept the computer fully utilized
• First-in, first-out (FIFO)
• The computer operator can be seen as the most primitive OS, i.e., someone finding ways to keep the computer busy

Interactive Processing
• OS with remote terminals
• Programmers have many idle gaps while thinking, so one computer can serve many people
• The situation on the previous slide improved somewhat once keyboards and screens were invented: the central computer was wired to many terminal devices (naturally called terminals), letting terminal users control the computer directly with keyboard and screen (a terminal has no CPU)

Different Types of OS
• Batch
• Interactive
• Real-time
  • Response time is critical (medical, military, and financial uses)
• Time-sharing (multitasking)
  • Dividing time into intervals
  • Only one task is being performed at any given time
• Multiprocessor
  • Load balancing and scaling
  • Do more cores really improve the user experience?

Definition of OS
• An OS is system software that manages computer hardware and software resources and provides common services for computer programs (Wikipedia)
• The definition is somewhat loose, but basically an OS is software that coordinates hardware and software resources
• Analogy: ancient society had no rules on land use, so people established a government to manage their lives; the government could then manage and use people, land, and all kinds of resources effectively. In ancient governments, priests directed the people's work
  Story role    Computer term
  Government    Operating system
  Land          Memory
  People        Threads
  Priest        CPU
• What a thread is will be explained later

How OS Gets Started
• The OS is still a program, so it needs to be placed in memory so that the CPU can fetch and execute it
• Who places the OS in memory, and where can it initially find the OS?
The boot sequence is BIOS → boot loader → bootstrap. First, the BIOS stored in ROM on the motherboard runs; it is the program responsible for many tasks of communicating with and initializing the hardware. Almost every motherboard has a specially designed BIOS that sets the CPU frequency and the RAM speed, detects hard disks, and so on. Then the MBR (master boot record, 主要開機磁區) on the hard disk of an x86 machine can load the OS.
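The sequence of steps that runs after the OS is loaded, until the user can use the machine, is called bootstrap.

Booting
• The OS is also a program and must be in memory before it can run; so who runs it, and how? The answer is the BIOS
• The boot loader in the computer's ROM loads the OS from the hard disk into memory and then hands control to the OS; the part of the boot loader that we can interact with is usually called the BIOS
• Bootstrapping (booting): the name means doing it yourself without asking anyone for help
• You may change the booting sequence in the BIOS (basic input/output system)
• How is control handed over? By changing the PC to the starting point of the OS
• How does the BIOS know where the OS is? The OS always places itself at the fixed MBR location
• EEPROM: P stands for programmable, but once a PROM is burned, any fault means replacing the whole chip. So E for erasable was developed: an EPROM can be erased by exposure to ultraviolet light. That was too cumbersome, so electrically erasable memory followed, which makes erasing much more convenient

Analogy of OS
Analogy: Alice and Bob decide to get married, so they ask a wedding planner to write a marriage application and submit it to the New York City government. The city government first puts the application in a basket; a government clerk then takes applications from the basket in order, reads each one line by line, and does one corresponding thing for every line read. For example, right after reading line 4 the clerk sends someone to tell the church to prepare. The city government is thus the main body coordinating the wedding, and only with it can the wedding proceed smoothly.
  Story role                                   Computer term
  New York City government                     Operating system
  Wedding planner                              Programmer
  Marriage application                         Executable file
  Marriage application format                  Executable file format
  Body text (written by the planner)           Code (written by the programmer)
  Basket                                       Memory
  Government clerk                             CPU
  Sending a courier at once to tell the        I/O, i.e., input/output
  distant church that a couple is getting
  married and to ask it to prepare

  Marriage application
  New York City government    Mayor: Xavier    Application date: 2020/2/14
  Body text
  Line 1   Bob is 25 years old this year                          (groom's name)
  Line 2   Alice is 23 years old this year                        (bride's name)
  Line 3   Bob and Alice are getting married                      (purpose)
  Line 4   The wedding will be held at the church in the suburbs  (venue)

Analogy of Interrupt
Suppose the government clerk has read the first few lines and dispatched the work to the distant church, and is therefore idle. Then the groom's ex-girlfriend objects at the church and the wedding is halted. On receiving the halt message from the church, the clerk goes to deal with it and returns to the city government only once it has been handled. A wedding can be halted for many reasons. Here it is case 5, and case 5 is handled with solution 1.
  Story role                     Computer term
  Halt of the distant wedding    Interrupt
  Wedding halt table             Interrupt vector table (IVT)
  Solution                       Interrupt service routine (ISR)

  Wedding halt table
  Halt no.   Reason                              Solution no.
  0          Groom's father objects              0000 0000
  1          Groom's mother objects              0000 0000
  2          Bride's father objects              0000 0000
  3          Bride's mother objects              0000 0000
  4          Groom's ex-boyfriend objects        0000 0001
  5          Groom's ex-girlfriend objects       0000 0001
  6          Bride's ex-boyfriend objects        0000 0001
  7          Bride's ex-girlfriend objects       0000 0001
  8          A sudden storm hits the wedding     0000 0002
  9          Someone snatches the bride or groom 0000 0003

  Solution table
  Solution no.   Solution
  0000 0000      Persuade the objector
  0000 0001      Ignore the objector
  0000 0002      Change the venue
  0000 0003      Bring the person back

When a computer's I/O runs into a problem, an interrupt occurs. The CPU saves its current state (line 4 on the previous slide is pushed onto the stack), looks up the interrupt vector table (IVT), jumps to the starting address of the interrupt service routine, and executes it. When the CPU has finished that work, it returns to the interrupted point and carries on with the remaining tasks.

Stack and Pointer
• A stack is a special memory with the first-in-last-out property
• Pointer: a memory cell usually contains data, but a memory cell that contains a memory address can be seen as a pointer
On the previous slide, the stack is what lets execution return to where it was before the interrupt and continue.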
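The interrupt flow described above (save the current position on the stack, look up the interrupt vector table, run the service routine, return) can be sketched in a few lines. This is a toy model under stated assumptions, not how a real CPU or OS is programmed: the table contents follow the wedding analogy, and the names `IVT`, `ISR`, and `handle_interrupt` are hypothetical.

```python
# Interrupt vector table (IVT): interrupt number -> address of its service routine.
# Numbers and addresses follow the wedding analogy and are illustrative only.
IVT = {0: 0x00, 1: 0x00, 2: 0x00, 3: 0x00,
       4: 0x01, 5: 0x01, 6: 0x01, 7: 0x01,
       8: 0x02, 9: 0x03}

# Interrupt service routines (ISRs), keyed by their address.
ISR = {
    0x00: lambda: "persuade the objector",
    0x01: lambda: "ignore the objector",
    0x02: lambda: "change the venue",
    0x03: lambda: "bring the person back",
}


def handle_interrupt(number, program_counter, stack):
    """Save the current position, run the matching routine, then resume."""
    stack.append(program_counter)      # push: remember where we were
    routine_address = IVT[number]      # the IVT entry is a pointer to the routine
    action = ISR[routine_address]()    # jump to the ISR and execute it
    resumed_at = stack.pop()           # pop: return to the interrupted point
    return action, resumed_at


if __name__ == "__main__":
    # Interrupt 5 arrives while the program counter is at line 4.
    print(handle_interrupt(5, program_counter=4, stack=[]))
```

Because the saved position is pushed and popped, an interrupt arriving while the stack already holds other saved positions leaves those untouched, which is the first-in-last-out behavior the slide refers to.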
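In addition, the interrupt service routine (ISR) entry is really a pointer to the code that does the actual handling.

Functionalities of OS
Each textbook may have a slightly different view.
• Main functionalities
  • Process management
  • Memory management
  • I/O management
• Minor functionalities
  • Instruction explanation management
  • Network management (will be discussed in more detail in the next chapter)

Process Management
• Code (or a program) is static; it resides on disk or in memory
• A process is dynamic, or can be seen as a state; it is a program under execution
[Figure: the marriage application from the Analogy of OS slide, lines 1 to 4]
The marriage application only describes the wedding, so it is static; at this point the wedding has not yet taken place. But once the government clerk starts reading the application and acting on it, the wedding is being executed, so it is dynamic.
The static part corresponds to the code. Only when the program starts executing does the static become dynamic, and this dynamic entity is called a process.

Process Management
Compared with the earlier description, we add a chief organizer whose job is to hand each line of code to the government clerk for execution. This kind of dynamic execution is a process, and the chief organizer is called the main thread.
[Figure: the marriage application, with the chief organizer moving down the lines and handing each one to the government clerk]
A computer usually has more than one process, just as a country has more than one wedding to hold. But no matter how many weddings there are, there is only one government clerk, so the weddings must compete for the clerk in order to proceed. This is competition, and competition needs management, which is why processes need to be managed.
Back to the computer: many processes exist at the same time. For example, your antivirus software runs continuously while you write a report in Word and listen to music on Spotify. Programs that run at the same time all have to go through the CPU, so they compete with each other and therefore need management.

Memory Management
• Protection
  • Protect a program from accessing another program's data
  • Protect the OS from user programs
• Swapping
The base and limit registers record, respectively, the starting memory address of a process (base register) and the size of the memory the process occupies (limit register).
A consequence of swapping is that memory becomes fragmented, which lowers memory utilization. Whether a process can be swapped in to the same location it was swapped out from is also an issue. Memory management has to handle both.
While a program is running, its process sometimes needs to leave memory temporarily and be brought back later to continue. This is called swapping, and the moves out and in are called roll out and roll in. As for the disk used here, we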
而在這裡的硬碟 (disk) 我們會 152 將它稱作 backing store Memory Management • Paging: to deal with insufficiency memory, each part of process is partitioned into fixed-sized pages (usually a few KBs), store back and forth between memory and disk • Two confusing terminologies: • Page 將邏輯記憶體(logical memory)分成大小相同的block • Frame 將實體記憶體(physical memory)切割成固定大小的block • Virtual memory is a storage allocation scheme in which secondary memory can be addressed as though it were part of the main memory • Swapping occurs when whole process is transferred to the disk, while Paging occurs when some part of the process is transferred to the disk 153 I/O Management • In the extreme case of no I/O devices for computers, you never know whether the computation results are correct or not • However, if I/O events occur, are we always required to handle them through CPU? • Not necessarily, we can resort to DMA (direct memory access) for the not-soimportant I/O events, where I/O devices directly access the memory 裝置具備 DMA 功能, 可讓它們隨時讀取和寫入系統記憶 體, 而不需要在這些作業中與系統處理器互動. 「由 DMA 驅動」攻擊是當系統擁有者不存在且通常花費不到 10 分 鐘時所發生的攻擊, 使用簡單到中等的攻擊工具 (不需要電 腦反組解碼的經濟實惠、現成的硬體和軟體) . 簡單的範 例是電腦擁有者離開電腦進行快速咖啡休息, 而在中斷時, 攻擊者會插入類似 USB 的裝置, 並離開電腦上的所有秘密, 或插入惡意程式碼, 讓他們能夠從遠端完全控制電腦. Kernel DMA Protection in MS Windows 10/11 154 Instruction Explanation Management 指令解釋管理其實就是你可以用一些指令直接 來跟 OS 溝通, 像是 Windows 的 cmd. 在裡 面可以使用類似 ipconfig 的指令查詢電腦的 IP 位址 (下一章節會談到) 155 System Calls 系統呼叫 • Memory is divided into user space and kernel space • General program runs on user space, but kernels and drivers run on kernel space • When the program running on user space wants to ask for higher privilege from the OS kernel, it resorts to system calls • System calls are provided by OS kernel and executed on kernel space • Function calls are provided by library and executed on user space 156 System Calls 系統呼叫 系統呼叫時, 參數的傳遞是非常重要的. 有三種較為常見的傳遞方法, 1. 簡單的就直接傳到暫存器內, 2. 
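On Linux, a more common method is to store the parameters in a block of memory and pass the address of the block, 3. Push the parameters onto the stack, which allows more parameters to be passed.
What system calls are there? System calls can do a great many things, such as process control, file management, device management, information maintenance, communication, and protection. What is the difference between a system call and a system program? Simply put, a system call is how a programmer's code communicates with the system, while a system program is what users run to communicate with the system without writing any code.

Process States
• New: the process is being created
• Ready: the process has been created and is waiting to be dispatched
• Running: the process's instructions are being executed
• Waiting: the process is waiting for an event such as I/O completion
• Terminated: the process has finished and releases its resources

Thread: Basic Unit of Process
[Figure: marriage application with six lines, processed one by one by the chief organizer and the government clerk:
Line 1: Bob is 25 years old
Line 2: Alice is 23 years old
Line 3: Bob and Alice are getting married
Line 4: Go and toast the guests
Line 5: Go and thank the gods
Line 6: The wedding will be held at the suburban church]
Because toasting the guests and thanking the gods both take a long time, the chief organizer is stuck on these two steps for a long time before moving on.
Back to the computer: the main thread is stuck for a long time before the process can finish.

Thread: Basic Unit of Process
[Figure: the same six-line application, now with Employee 1 and Employee 2 handling lines 4 and 5]
The chief organizer hires employees to handle tasks, so two or more people can work at the same time.
Back to the computer: the main thread no longer has to be stuck for a long time before the process finishes.

Process vs. Thread
• Process
  • Every application is at least one process
  • To the OS, it is the smallest unit of resource allocation
• Thread
  • To the OS, it is the smallest unit of operation, that is, the smallest unit of CPU execution; it is contained in a process
  • A thread is what actually executes a piece of code; it can access the memory provided by the process, OS resources, and so on
  • While a program runs, a thread stores its variables in the stack part of memory.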
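The stack is used at program runtime, but the stack of a thread can be used only by that thread and cannot be shared with other threads
  • The heap is another part of the process; it can be used by any thread in that process, so the heap is a shared memory space
  • The OS can assign the CPU directly to a thread to do work, and the threads in the same process all share the process's memory space

Process Control Block (PCB)
• PCB: the information about a process, stored in memory
• The information includes:
  • Process id
  • Process state
  • Program counter
  • CPU registers
  • CPU scheduling priority
  • Memory management information
  • I/O status information

Different OS in Real World
OSes that integrate hardware and software resources exist in computers, in mobile phones, in smart watches, and even in the smart vehicles that are gradually becoming common.

Software Classification
In practice it is hard to tell applications from utilities, so software can also be divided roughly into application and system software only.
The shell is the only part users can touch.

Shells
• Communication with users
  • Text based
  • GUI (graphical user interface), such as a window manager

Kernel
The kernel contains at least the following components.
• File manager
  • Directory/folder, path
• Device drivers (because Windows is so widespread, device manufacturers have already obtained Microsoft certification, so people do not notice how important drivers are)
• Memory manager (its main job is to manage main memory and virtual memory, that is, using the disk as memory)
  • Allocating main memory
  • Paging, virtual memory
• Scheduler
• Dispatcher (said to be the kernel of the kernel; described later)
If the running program is large, virtual memory is needed. Virtual memory lets a program believe it has a contiguous memory space, while in fact part of it is stored on disk and swapped in when needed. Because not all of the code is used while a program runs, part of it can be placed in virtual memory.

Linux World
• Originally made by Linus Torvalds in 1991
• Freeware & open-source (the two are not the same; the former requires that the software can be used at no cost and redistributed)
• Many distros (Linux distributions, http://distrowatch.com/), but they share the same kernel
  • For beginners: Linux Mint
• In fact, Linux refers only to the kernel
• Servers, PCs, embedded systems (Android's kernel is based on Linux)

Process
• Process (handled by the scheduler and dispatcher): all of the activity and state of a program while it is executing
  • The activity of executing a program
• Process state: what does the state of a process include?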
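At least the following:
  • Program counter
  • General purpose registers
  • Associated memory cells (for example, memory contents must be saved before being swapped out to disk through virtual memory)
• Process table (the same idea as the Task Manager)
  • Memory area assigned to the process
  • Ready/waiting

Process Administration
• Scheduler (maintains the process table, which is actually complicated because it includes deciding how to allocate memory)
  • Maintains the process table
  • Introduces new processes
  • Removes completed processes
  • Decides whether a process is ready or waiting
• Dispatcher (every OS does this differently; it carries out the actual execution, and the hard part is the context switch, that is, saving and loading process state)
  • Actually executes the program
  • Controls the allocation of time slices to the processes in the process table
  • Process switch (context switch) by calling an interrupt
Suppose time is being divided between two processes. How does the OS cut in between them to save and load the process state? It uses an interrupt, which can be thought of as a CPU feature: a handler is registered under an interrupt number, and when that interrupt number is invoked the CPU jumps to the handler. Here the handler performs the context switch.

Different Types of Schedulers
• FCFS (first come, first served) scheduler
• SJF (shortest job first) scheduler
• RR (round robin) scheduler
• Schedulers can also be divided into preemptive and non-preemptive (whether a running process can have the CPU taken away)
Non-preemptive SJF: once a process gets the CPU, it is not preempted until it completes.
Preemptive SJF: when a new process arrives whose CPU burst is shorter than the remaining time of the running process, preemption occurs.

Multiprogramming (Time-sharing) Between 2 Processes
The more context switches there are, the less time is spent on real work.
The shorter the time slice, the more time is wasted on context switches.

Semaphores
What if programs are related to each other, for example two programs that write to the same file, one writing A and the other writing B?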
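When several programs compete for a resource, we must make sure that only one of them holds it at any given time. This is called mutual exclusion.
• Semaphore: a visual signaling apparatus with flags, lights, or mechanically moving arms, such as one used on a railroad
• Atomic test-and-set (no interrupt may occur during the test-and-set)
• Critical region (only one program may be inside this region, or hold this resource, at a time)
• Mutual exclusion
Test-and-set is like checking for a flag and then planting your own flag to claim the right to execute. If an interrupt arrives between finding no flag and planting the flag, the whole mechanism becomes useless.
Mutual exclusion is achieved with a semaphore (the lock in the figure on the right).

Prerequisites for Deadlock
A process may need many resources, and it can only wait until a resource held by someone else is released.
• Deadlock may occur only if all three of the following (necessary but not sufficient) conditions are satisfied:
  • Competition for non-shareable resources
  • Resources are requested on a partial basis; i.e., having received some resources, a process will return later to request more (for example, it does not ask for 100 KB of memory at the start but requests it dynamically along the way)
  • Once a resource has been allocated, it cannot be forcibly retrieved
To resolve deadlock, make at least one of the three conditions fail. For example, a preemptive OS can take resources away at any time, but the side effects are more context switches and possibly incorrect process execution. Another option is to require that all resources be allocated at once before a process may start.

Deadlock vs. Starvation
Avoiding deadlock may lead to starvation.
• Starvation: a process cannot get the resources it needs for a long time because the resources keep being allocated to other processes
• Aging: adding an aging factor to the priority of each request
Process A takes resources 1 and 2 and, halfway through execution, suddenly asks for resource 4 as well. If 4 is not given to A, deadlock results; if 4 is given to A, starvation may result.

• Chapter 5: Algorithms

Definition of Algorithm
• [D. Knuth] A finite, definite, effective procedure, with some output
• [Cormen et al.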
Introduction to Algorithms] A well-defined procedure for transforming some input to a desired output
  • Input: may have
  • Output: must have
  • Definiteness: must be clear and unambiguous
  • Finiteness: terminates after a finite number of steps
  • Effectiveness: must be basic and feasible with pencil and paper (realizable)
  • Procedure: the sequence of specific steps in a logical order
In plain words, an algorithm is a clearly described sequence of steps that solves a problem.

Interval Scheduling
There is only one machine, and we want to process as many jobs as possible.
• Given: set of jobs with start times and finish times
• Goal: find a maximum cardinality subset of mutually compatible jobs (jobs must not overlap)
[Figure: jobs a to h drawn as intervals on a time axis from 0 to 11]

Interval Scheduling
There is only one machine, and we want to process as much valuable work as possible.
• Given: set of jobs with start times, finish times, and weights
• Goal: find a maximum weight subset of mutually compatible jobs
[Figure: the same jobs on a time axis from 0 to 11, labeled with weights 12, 23, 20, 26, 13, 20, 11, 16]

Bipartite Matching
Match as many salespeople to tasks as possible.
• Given: bipartite graph
• Goal: find a maximum cardinality matching

Dominating Set
Application: telecom backbone networks.
• Given: graph
• Goal: find a minimum cardinality dominating set (a subset of nodes such that all nodes are "covered")

Competitive Facility Location
Application: game AI.
• Given: graph with a weight on each node
• Rules:
  • Two competing players alternate in selecting nodes
  • A node may not be selected if any of its neighbors has been selected
• Goal: select a maximum weight subset
[Figure caption: the second player can guarantee 20, but not 25]

Five Representative Problems
• Efficiently solvable
  • Interval scheduling: n log n greedy algorithm
  • Weighted interval scheduling: n log n dynamic programming algorithm
  • Bipartite matching: n^k max-flow based algorithm
• Hard
  • Independent set: NP-complete
  • Competitive facility location: PSPACE-complete (even harder!)
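As a concrete illustration of the n log n greedy algorithm for (unweighted) interval scheduling listed above, here is a minimal Python sketch. It assumes the rule "always accept the job that finishes earliest", and it assumes that two jobs are compatible when one starts no earlier than the other finishes. The function name `interval_scheduling` and the sample jobs are made up for this example.

```python
def interval_scheduling(jobs):
    """Return a maximum-size list of mutually compatible (start, finish) jobs.

    Greedy rule (assumed here): scan jobs in order of finish time and accept
    a job whenever it starts no earlier than the last accepted job finishes.
    Sorting dominates the cost, so the running time is O(n log n).
    """
    accepted = []
    last_finish = float("-inf")
    for start, finish in sorted(jobs, key=lambda job: job[1]):
        if start >= last_finish:          # compatible with everything accepted
            accepted.append((start, finish))
            last_finish = finish
    return accepted

# Hypothetical sample input: eight jobs on a time axis from 0 to 11.
jobs = [(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)]
print(interval_scheduling(jobs))
```

Tracking only the finish time of the last accepted job is enough, because jobs are visited in order of finish time and any job that overlaps an earlier accepted job also overlaps the most recent one.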
Intrinsic Computational Tractability
• Intrinsic computational tractability: an algorithm's worst-case running time on inputs of size n grows at a rate that is at most proportional to some function f(n) (we now care about its rough growth rate)
  • f(n): an upper bound on the running time of the algorithm
• Q: What's wrong with 1.62n^2 + 3.5n + 65 steps?
  • Too detailed
  • Meaningless
  • Hard to classify its efficiency
• A: We'd like to say it grows like n^2 up to constant factors (insensitive to constant factors and low-order terms)
• Our ultimate goal is to identify broad classes of algorithms that have similar behavior
• We'd actually like to classify running times at a coarser level of granularity so that similarities among different algorithms, and among different problems, show up more clearly

Asymptotic Notations O, Ω, Θ
• Let T(n) be a function that describes the worst-case running time of a certain algorithm on an input of size n
• Asymptotic upper bound: T(n) = O(f(n)) if there exist constants c > 0 and N0 ≥ 0 such that for all n ≥ N0 we have T(n) ≤ c·f(n)
• Asymptotic lower bound: T(n) = Ω(f(n)) if there exist constants c > 0 and N0 ≥ 0 such that for all n ≥ N0 we have T(n) ≥ c·f(n)
• Asymptotic tight bound: T(n) = Θ(f(n)) if T(n) is both O(f(n)) and Ω(f(n))

Example: O, Ω, Θ
• Q: T(n) = 1.62n^2 + 3.5n + 8, true or false?
  1. T(n) = O(n)
  2. T(n) = O(n^2)
  3. T(n) = O(n^3)
  4. T(n) = Ω(n)
  5. T(n) = Ω(n^2)
  6. T(n) = Ω(n^3)
  7. T(n) = Θ(n)
  8. T(n) = Θ(n^2)
  9. T(n) = Θ(n^3)
• A: 2, 3, 4, 5, 8 are true
An easier way to infer the bound: given two functions f(n) and g(n), f(n) = Θ(g(n)) if the limit of f(n)/g(n) as n → ∞ exists and is a positive finite constant.

Abuse of Notation
• Q: Why use equality in T(n) = O(f(n))?
  • Asymmetric:
  • f(n) = 5n^3; g(n) = 3n^2
  • f(n) = O(n^3) = g(n) (this produces a strange-looking equation)
  • but f(n) ≠ g(n)
• Better notation: T(n) ∈ O(f(n))
  • O(f(n)) forms a set
• Cf.
"is" in English (here "is" really means "belongs to")
  • Aristotle is a man, but a man isn't necessarily Aristotle
Big-O notation is properly a set concept, but in algorithm analysis the equals sign has become customary.

The Interval Scheduling
• Given: set of requests {1, 2, …, n}, where the ith request corresponds to an interval with start time s(i) and finish time f(i)
  • Interval i: [s(i), f(i)]
• Goal: find a compatible subset of the requests with maximum size
  • Execute as many tasks as possible
[Figure: requests a to h drawn as intervals on a time axis from 0 to 11]

Greedy Rule
This is a kind of template for greedy algorithms.
• Repeat
  • Use a simple rule to select a first request i1
  • Once i1 is selected, reject all requests not compatible with i1
• Until we run out of requests
• Q: How to decide on a greedy rule for a good algorithm?
• A: How should the simple rule above be designed? There are many possibilities, not only the four below, and the choice may affect whether the result is optimal
  • Earliest start time: min s(i)
  • Shortest interval: min {f(i) - s(i)}
  • Fewest conflicts: min over i = 1…n of |{j : j is not compatible with i}| (this one feels like a good choice)
  • Earliest finish time: min f(i) (but this is the one that achieves the optimum, because the request that finishes earliest also releases the resource earliest)
This shows that the design of a greedy algorithm that reaches the optimum is not necessarily the intuitive one.

Counterexamples (Awful Cases)
• Earliest start time: min s(i)
• Shortest interval: min {f(i) - s(i)}
• Fewest conflicts: min over i = 1…n of |{j : j is not compatible with i}|

The Greedy Algorithm
• The 4th greedy rule on the earlier slide leads to the optimal solution
  • We first accept the request that finishes first
  • Natural idea: our resource becomes free ASAP
• The greedy algorithm:
Interval-Scheduling(R) // R: undetermined requests; A: accepted requests
1. A = ∅ // empty set
2. While (R is not empty)
3.   choose a request i ∈ R with minimum f(i) // greedy rule
4.   A = A + {i}
5.   R = R - {i} - X, where X = {j : j ∈ R and j is not compatible with i}
6.
Return A

Structure of Pseudocode
• Pseudocode is a notational system in which ideas can be expressed informally during the algorithm development process
[Figure: pseudocode forms for the assignment statement, conditional statement, and iterative statement]

More Complicated Structure of Pseudocode
Several of the structures above can be combined into a larger structure.

The Interval Scheduling Problem
[Figure sequence: nine requests, numbered 1 to 9, on a time axis from 0 to 16; the greedy algorithm accepts requests 1, 2, 5, and 8 in turn]

Warm-Up: Searching
• Problem: Searching
• Given
  • A sorted list of n distinct integers
  • Integer x
• Find
  • j if x equals the integer at index j
• Solution:
  • Naïve idea: compare one by one
  • Correct but slow: O(n)
  • Better idea?
  • Hint: the input is sorted (one of the given conditions has not been used yet)

Divide-and-Conquer Paradigm
• Divide: break the input into several parts of the same type
• Conquer: solve the problem in each part recursively
• Combine: combine the solutions to the sub-problems into an overall solution
Binary search on a sorted array
• Divide: check the middle element
• Conquer: search the subarray recursively
• Combine: trivial
Each step discards half of the part that cannot contain the answer, which is a very strong advantage.

Non-Divide and Conquer Approach: Insertion Sort
It takes roughly n^2 operations.

Merge Sort
• Sorting problem
  • Given a set of n numbers
  • Find the sorted list in ascending order
• Mergesort fits the divide-and-conquer template
  • Divide the input into two halves
  • Sort each half recursively
  • Merge the two halves into one
[Figure:
Input: A L G O R I T H M S
Divide: A L G O R | I T H M S
Sort each half: A G L O R | H I M S T
Merge: A G H I L M O R S T]

Merge Sort
Well suited to sorting a linked list.
• The base case: single element (trivially sorted)
Mergesort(A, p, r) // A[p..r]: initially unsorted
1. If (p < r)
2.   q = ⌊(p+r)/2⌋
3.   Mergesort(A, p, q)
4.   Mergesort(A, q+1, r)
5.   Merge(A, p, q, r)
• Divide: lines 1-2, D(n) (for an array, divide takes O(1))
• Conquer: lines 3-4, 2T(n/2)
• Combine: line 5, C(n) (with a separate array to hold the output, C(n) = O(n) is easy to achieve)
• T(n) = D(n) + 2T(n/2) + C(n) = O(1) + 2T(n/2) + O(n) = O(n log n) (try substituting it in yourself)

Recursion: Another Way to Compose Program

Recurrence Relation
• Running time: T(n)
  • Base case: for n ≤ 2, T(n) ≤ c
  • T(n) = 2T(n/2) + D(n) + C(n)
  • T(n) = 2T(n/2) + O(1) + O(n)
  • T(n) ≤ 2T(n/2) + cn
(Here D(n) and C(n) refer to the divide and combine steps of the Mergesort pseudocode above.)
• Q: Why not T(n) ≤ T(⌊n/2⌋) + T(⌈n/2⌉) + cn?
• A: The asymptotic bounds are not affected by ignoring the floor and ceiling

Solving Recurrences
• Two basic ways to solve a recurrence
  • Unrolling the recurrence (recursion tree): simply expand it by brute force
  • Substituting a guess: guess an answer first and then prove it by mathematical induction; this is not used much in practice (without experience it is hard to guess correctly)
• Initially, we assume n is a power of 2 and replace ≤ with =
  • T(n) = 2T(n/2) + cn
  • Simplify the problem by omitting floors and ceilings
  • Solve the worst case

Solving Recurrences: Unrolling Recurrence
• Merge sort has running time T(n) = 2T(n/2) + cn
Each level costs O(n) time and there are O(log n) levels in total, so the total time is O(n log n).

Master Theorem for Recurrence
• A good theorem for deriving asymptotic bounds
• The proof can be found at https://www.cs.cornell.edu/courses/cs3110/2012sp/lectures/lec20master/mm-proof.pdf
• It involves drawing the recursion tree and some approximations using geometric series

Quick Sort
The time complexity analysis follows much the same idea as merge sort but is a bit trickier, so we omit it here.
• An alternative to merge sort that also follows divide-and-conquer
Quicksort(A[1, …, n])
1. If (n > 1)
2.   Choose a pivot element A[p]
3.   r = Partition(A, p)
4.   Quicksort(A[1, …, r-1])
5.   Quicksort(A[r+1, …, n])
So all we need is a Partition function that uses no extra space and runs in linear time.
[Figure: partition step; pi marks the position where the pivot will be placed, while i moves to the right one step at a time]

Fibonacci Sequence
• Recurrence relation: F(n) = F(n-1) + F(n-2), F(0) = 0, F(1) = 1
  • e.g., 0, 1, 1, 2, 3, 5, 8, …
• Direct implementation:
Fib(n)
1. If n ≤ 1 return n
2. Return Fib(n-1) + Fib(n-2)

What's Wrong?
Fib(n)
1. If n ≤ 1 return n
2. Return Fib(n-1) + Fib(n-2)
• What if we call Fib(5)?
  • Fib(5)
  • Fib(4) + Fib(3)
  • Fib(3) + Fib(2) + Fib(2) + Fib(1)
  • Fib(2) + Fib(1) + Fib(1) + Fib(0) + Fib(1) + Fib(0) + Fib(1)
  • Fib(1) + Fib(0) + Fib(1) + Fib(1) + Fib(0) + Fib(1) + Fib(0) + Fib(1)
Too many values are computed repeatedly.

Dynamic Programming: Memoization
• Store the values in a table
  • Create a table before the recursive call
• Top-down!
  • The control flow is almost the same as the original one
Fib(n)
1. Initialize f[0..n] with -1 // -1: unfilled
2. f[0] = 0; f[1] = 1
3. Return Fibonacci(n, f)
Fibonacci(n, f)
1. If f[n] = -1
2.
  f[n] = Fibonacci(n-1, f) + Fibonacci(n-2, f)
3. Return f[n] // if f[n] already exists, return it directly
[Figure: the chain of calls 5, 4, 3, 2, 1]

Dynamic Programming: Bottom-Up
• Store the values in a table
• Bottom-up
  • Compute the values for small problems first
Fib(n)
1. Initialize f[0..n] with -1 // -1: unfilled
2. f[0] = 0; f[1] = 1
3. For i = 2 to n do
4.   f[i] = f[i-1] + f[i-2]
5. Return f[n]
Computing from f[0], f[1] onward guarantees that every earlier result needed for a later value is already available.

• Chapter 4: Networking and the Internet

Starting with Your Own Experience
• Usually, users (clients) connect to a remote server (e.g., Google, Instagram) to browse web pages
• Browsing is like downloading and displaying content from a remote server
• To connect to a server, you need to know the server's address (location)
• The layered structure of computer communications is important

Protocols
Protocol: a standard defined so that signals can be communicated efficiently.
• A protocol is composed of a data format and a communication procedure
  • The data format specifies what format the data will use
  • The communication procedure sets up the rules for the order and content of processing
• Many errors may occur during communication, so we handle them by setting up a proper data format and communication procedure
• Standardizing a protocol requires much effort from different stakeholders
  • IETF (Internet Engineering Task Force): mainly responsible for standardizing Internet-related technologies
  • ITU (International Telecommunication Union): mainly responsible for standardizing a broad range of electronic and wireless communication technologies
  • 3GPP (3rd Generation Partnership Project): mainly responsible for standardizing technologies for third-generation mobile phones and related systems

Network Architecture (OSI Model)
Open System Interconnection
[Figure: the OSI layers, from higher layers at the top to lower layers at the bottom]
Common functions are placed in the lower layers as far as possible, and more specific functions are placed in the upper layers.
Once the boundary of each layer is fixed, the role of each protocol becomes clearer and recombining protocols becomes easier.
It is fine if this is not clear yet; the details come later.

Network Architecture (TCP/IP Model)

LAN and WAN
• Networks can be divided into LAN and WAN
• A LAN (local area network) is a smaller network in one physical location
  • A LAN is like connecting a few computers together to form a network in an office
  • A LAN is commonly built with Ethernet
  • Ethernet covers the MAC and PHY layers
  • Different Ethernet standards have different performance
    • 1000BASE-T (1 Gbps) and 10GBASE-T (10 Gbps)
• WAN (wide area
network) is a larger network, usually spanning different physical locations
  • A WAN is usually connected by ISPs (Internet service providers) such as Chunghwa Telecom
  • A WAN achieves interconnection among different sites
  • WAN ≠ Internet, though both rely on internetworking

LAN Topology
[Figure: LAN topologies. Bus (Ethernet): all signals are broadcast. Ring: transmission actually goes in one direction only]

LAN Protocols
• Token ring (only the machine holding the token may speak)
  • Popular in ring topology
  • Token and messages are passed in one direction
  • Only the machine that gets the token can transmit its own message
• CSMA/CD (carrier sense multiple access with collision detection) (to avoid collisions)
  • Popular in bus topology (widely used in wired Ethernet)
  • Broadcasting
  • On collision, both machines wait for a brief random time before trying again
• CSMA/CA (carrier sense multiple access with collision avoidance)
  • Popular in wireless LANs
  • Broadcasting (signals arrive with a time lag, so speaking immediately may cause a collision)
  • Detect whether the channel is idle; if so, wait for a brief random time and then detect again. If the channel is still idle, start sending

Wireless & Access Point
• Wi-Fi (wireless fidelity)
• IEEE 802.11 (b, g, i, n, ac, ...)
• Wi-Fi 6 (802.11ax)

Internetworking
• Internetworking is a mechanism that enables multiple networks to be connected to each other
• Usually, we rely on internetworking to build an even larger network
• The advantages of internetworking are as follows
  • Avoid unnecessary communication over the entire network
  • Limit the range and impact of network faults
  • Allow better network management by each stakeholder
• Internetworking requires an internetworking protocol
  • IP (Internet Protocol) is the internetworking protocol commonly used on the Internet
  • Specifically, because each network has a different network address, data is transmitted to the target network according to its network address through a routing mechanism
• Capitalized "Internet" refers to the global Internet, while lowercase "internet" means any set of networks interconnected with each other

Circuit Switching and Packet Switching
• Switching: transmitting data to the target during communication
• Circuit switching: building a (virtual) cable/line between source and target
• Packet switching: dividing data into packets and then transmitting the packets to the target over shared cables/lines, possibly along different paths
  • A packet is formed by partitioning data in a predefined way and then adding a header to each piece
  • Ethernet uses frames, which follow the same idea as packets
  • Both packets and frames can be called PDUs (protocol data units)

TCP/IP Model
In 1982, ISO and ITU jointly created the OSI model. Around the same time, the TCP/IP model was proposed by the research community. In recent years the TCP/IP model has become the more popular one because the OSI model is complicated.

TCP/IP Model
• Network interface layer
  • Establishes a connection to a directly connected peer
  • Also encompasses the physical devices
  • The functionality is achieved through an Ethernet-standardized NIC (network interface controller), that is, a network card
• Internet layer
  • Most important is forwarding (or routing), which transmits data to an indirectly connected peer
• Transport layer
  • Depending on the objective, this layer chooses either high-reliability communications or
real-time-but-not-reliable communications
• Application layer
An application combines protocols from each layer. For example, to access a web server, the choice for (application layer, transport layer, internet layer, network interface layer) can be (HTTP, TCP, IP, Ethernet), while for streaming video or making an Internet phone call the choice can be (RTP, UDP, IP, Ethernet).

Network Interface Layer in TCP/IP Model
• Establishes a connection to a directly connected peer
  • This layer does NOT have internetworking functionality
  • The functionality is achieved through an Ethernet-standardized NIC (network interface controller)
• A representative protocol in this layer is Ethernet
  • Commodity computers mostly use Ethernet
  • Wi-Fi establishes a LAN wirelessly
  • PPPoE (PPP over Ethernet) establishes a point-to-point connection (it gives you broadband access after dialing in; we will not say more about it here)
  • Ethernet is in essence a broadcast mechanism
• The hardware in this layer has a unique address: the MAC address
• ARP (address resolution protocol) maps between IP and MAC addresses (how ARP works is covered later)

MAC Address
• The MAC address (media access control address) is also called the physical address and is, in principle, unique worldwide
  • Some software tools can be used to modify the MAC address manually
• Ethernet transmits frames to the target by specifying the MAC address
  • In old Ethernet (10Base-2, 10Base-5), a frame with its MAC address is broadcast on the bus
  • In new Ethernet (100Base-TX, 1000Base-T), a frame with its MAC address is sent to a hub/switch; a switch first records the mapping between MAC addresses and ports in its MAC address table and then directs the frame to the particular port
[Figure: switch ports, and the structure of a MAC address: manufacturer ID, followed by model number and product serial number]

Internet Layer in TCP/IP Model
• Main purpose is to connect multiple networks
  • Most important is forwarding (or routing), which transmits data to an indirectly connected peer
  • In the network interface layer, a device can see only its direct peers
• A representative protocol in this layer is IP (Internet Protocol)
  • IPv4 vs. IPv6
• Another representative is ICMP (Internet Control Message Protocol)
This command (ping) reports the minimum, maximum, and average round-trip time of ping packets to the destination device, and can be used to check how reliable the network path to a given device is.
This tool works by increasing the time to live (TTL), as traceroute does. Each time a packet passes through a router, its TTL decreases by 1.
When the TTL reaches 0, the host discards the packet and sends an ICMP TTL (time exceeded) message back to the original sender.

Transport Layer in TCP/IP Model
• Depending on the objective, this layer chooses either high-reliability communications or real-time-but-not-reliable communications
• Two representative protocols in this layer: TCP and UDP
• TCP (transmission control protocol): reliable data transfer
  • ACK after receiving packets
  • Used by HTTP and SMTP (HTTP carries web pages; SMTP sends and receives email)
• UDP (user datagram protocol): real-time but unreliable
  • No ACK when receiving packets
  • Used by DNS and NTP (DNS is domain name resolution; NTP is the time synchronization protocol)

How TCP Establishes Reliable Communications
• Important: the underlying communication channel is in fact unreliable
• Six techniques/steps in TCP to ensure reliable communications
  1. Number the data to guarantee the transmission order (sequence numbers)
  2. Check whether the received data contains errors (error detection)
  3. Confirm that the peer has received the correct data (acknowledgment)
  4. Request retransmission of data that was not delivered (sliding window)
  5. Send data at a pace that matches the peer (flow control)
  6. Adjust the transmission speed according to network congestion (congestion control)
• The sliding window is the main tool for flow control (it prevents a fast device from overwhelming a slow receiver)
• The sending speed is adjusted dynamically in congestion control (it prevents fast devices from overwhelming the network)

TCP Handshake
• TCP starts with a three-way handshake and terminates with another handshake

Application Layer in TCP/IP Model
• Many application layer protocols can be found
• Temporarily ignore the port numbers in the table below

Packets in Different Layers
• Headers are added to the data from the upper layer
• Eventually, the physical signal is transmitted
• After receiving the signal, the receiver decodes it and reads the headers

IP Address: 4 Byte Address
• General rule: each computer (or online device) has a unique IP address
• Thus, IP addresses need to be assigned to devices according to some regulation
• ICANN (Internet Corporation for Assigned Names and Numbers) is an organization that manages IP addresses
• ICANN assigns a specific range of IP addresses to a regional Internet registry (RIR; APNIC in the Asia-Pacific region), which then assigns IP addresses to either a national Internet registry (NIR, and TWNIC in
Taiwan) or a local Internet registry (LIR)
• Chunghwa Telecom is an LIR in Taiwan

Port Number
Every socket is given a special number (IP address + TCP port). As long as users remember each other's socket number, they can communicate directly.
• Ports are used to indicate the functionality of the target computer
• Standardized across all network-connected devices, with each port assigned a number in [0, 65535]
• The port number is a functionality provided by TCP/UDP
• While IP addresses enable messages to go to and from specific devices, port numbers allow targeting of specific services or applications within those devices

Public and Private IP
• Public IP refers to IP addresses that are unique over the Internet
• Private IP refers to addresses that do not satisfy this uniqueness requirement
  • Private IP addresses cannot be used on the Internet
• IPv4 provides 2^32 IP addresses in total (about 4.3 billion)
• However, in the Internet of Things era most devices can connect to the Internet, which creates a significant demand for IP addresses
  • IPv4 address exhaustion: the depletion of the pool of unallocated IPv4 addresses
• People resort to private IP and NAPT (network address port translation) to cope with IPv4 address exhaustion

IP Address Class and Subnet Mask
The subnet mask is used to find the network part of an IP address.
• An IP address has 32 bits
  • The left-hand side refers to the network, while the right-hand side refers to the host
• IP addresses have classes from A to E (D and E are for special purposes)
  • The difference among A, B, and C is the maximum number of supported devices
  • Class A supports 16,777,214 devices but class C supports only 254 devices

IP Subnetting
Subnetting is the process of reassigning some bits of the host part to the network part.
• When you have a class A network, you can deploy up to 16,777,214 devices
  • Nonetheless, you have only 10 devices, which wastes IP addresses
  • Your company has different departments, each of which plans to have its own network
• IP subnetting: because of the above requirements, one can partition the class A network into smaller ones by moving some host bits into the network part

Broadcast and Multicast
• Unicast: one-to-one communication
• Broadcast: send
data to all of the computers on the same Ethernet
  • In fact, two approaches are used
  • Send the packet to 255.255.255.255 (limited broadcast destination address): the packet is forwarded to all of the computers on the same Ethernet
    • The router becomes the boundary
  • Send the packet to the IP address whose host bits are all 1 (directed broadcast address): depending on the requirement, the router forwards the packet to another network, whose router broadcasts it to all of the computers in the target network
• Multicast: send data to some of the computers, those in a specific group

Connecting Compatible Networks
The devices below only connect networks that use the same protocol.
• Hub: a junction with broadcast functionality
• Switch: a junction with unicast functionality
• A hub works at L1 (PHY), while a switch works at L2 (data link)

Connecting Compatible Networks
A hub is a physical layer device. It treats all data as a stream of electrical signals and simply floods it. A hub also amplifies the signal, so each port can be regarded as a repeater. In the figure we can see that a hub floods data to every port and congests the bandwidth, so hubs are suitable only for temporary connections.
A switch is a data link layer device. The biggest difference of an L2 device is that it learns and records the MAC address of each device behind every port, and then uses the MAC address table to decide which port a frame should be forwarded to.

Connecting Incompatible Networks
The device below connects networks that use different protocols.
• Router
  • Main functionality is to forward IP packets at the network layer
  • From the user's perspective, a router forwards packets among independent Ethernets
The physical layer and data link layer define the coverage of an Ethernet. The network layer relays data among the independent networks, each within the working range of its own Ethernet.

ARP (Address Resolution Protocol)
• Fact
  • Devices on Ethernet reach the destination through the MAC address
  • Devices in TCP/IP reach the destination through the IP address
  • So another protocol is needed to bridge the two
• ARP (address resolution protocol)
  • Main purpose is to convert an IP address to a MAC address through broadcast
  • The packet is routed to the destination according to the network address
First, A (163.15.2.1) wants to send a message over Ethernet to IP = 163.15.2.4, so it broadcasts an ARP Request (asking about 163.15.2.4) to its own network segment. Every host receives the ARP Request and checks whether it is the one being asked; if not, it ignores and discards the request.
C (163.15.2.4) receives the ARP Request, finds that it is the one being asked, and returns an ARP Reply (containing its Ethernet address) to A (163.15.2.1).

Router Again
Note: The default gateway can be thought of as the default recipient when the destination is unknown.
• On Ethernet, many protocols (e.g., ARP) broadcast to exchange data
• With many devices, this would cause awful performance
• Hence, use the router as the boundary of the broadcast domain
• Partitioning into network segments (or broadcast domains) mitigates broadcast storms
• The default gateway is the router of your network segment
Note: When PC0 (192.168.10.10/24) pings PC3 (192.168.20.10/24), ARP determines whether the target IP is in the same network segment. In this example the target is in a different segment, so ARP returns the MAC address of the default gateway.

Domain Name
• IP addresses are difficult to memorize, though IP is used for routing
  • Routing: the procedure by which a router forwards packets
• A domain name is the name of a computer on the network
  • Routers still use IP to do routing; later we will mention how to map between domain names and IPs
• Domain names are unique and managed by ICANN
  • ICANN is in charge of TLDs (top-level domains) and ccTLDs (country code TLDs)
  • ICANN does not manage them directly; instead, different registries do so
  • For example, Verisign manages .com and .net, while TWNIC manages .tw

URL (Uniform Resource Locator)
• The illustration below shows the relation among ICANN, Verisign, and user
• URL: the complete web address used to find a particular web page
• While a domain is a website's name, a URL leads to a page within the website
• Host names are sometimes called domain names

Routing Protocols
• Many routers sit between the source and the destination
• Routing: how to choose the best path and forward packets along it
  • In particular, individual routers are owned by different organizations, but each maintains a routing table to determine the best path from source to destination
• Static routing: the routing table is fixed and needs to be modified manually
• Dynamic routing: routers exchange routing tables through pre-defined protocols to update the best paths in their routing tables
• IGP (interior
gateway protocol) includes RIP/RIPv2 (routing information protocol) and OSPF (open shortest path first)
• EGP (exterior gateway protocol) includes BGP (border gateway protocol)
• IGP handles routing within an AS (autonomous system) and EGP handles routing between ASes
• An AS is an ISP-scale network governed by a single routing policy
Note: Keywords that are not highlighted in color on this slide do not need to be memorized.

DHCP
Note: DHCP uses broadcast to obtain an IP address and network information; once the IP address has been obtained, DHCP is no longer needed.
• How to set up the default gateway
  • You can manually configure the corresponding IP address or rely on DHCP
• DHCP (dynamic host configuration protocol) automatically assigns IP addresses and other configurations to devices connected to the network using a client-server architecture
• So, for example, a device using DHCP will receive its IP address and default gateway

NAT and NAPT
• If you connect multiple computers together, they can communicate with one another using private IPs
• Unfortunately, private IPs cannot connect to the Internet
• NAT (network address translation)
• NAPT (network address port translation), also called PAT (port address translation)
Note: With NAT, a router that has been allocated n public IPs can let n private IPs on the internal network go online. Because the NAT table is a plain mapping between public and private IPs, at most n private IPs can be online.
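The difference between a NAT table and a NAPT table can be sketched in Python. This is a minimal illustration under stated assumptions, not how a real router is implemented: the class names `Nat` and `Napt`, the starting port 49152, and the 203.0.113.x and 192.168.0.x addresses are all hypothetical choices made for the example.

```python
from itertools import count


class Nat:
    """Basic NAT sketch: each private IP is paired with one whole public IP."""

    def __init__(self, public_ips):
        self.free = list(public_ips)   # public IPs not yet handed out
        self.table = {}                # private IP -> public IP

    def translate(self, private_ip):
        if private_ip not in self.table:
            if not self.free:
                raise RuntimeError("NAT table full: no public IP left")
            self.table[private_ip] = self.free.pop(0)
        return self.table[private_ip]


class Napt:
    """NAPT sketch: many private (IP, port) pairs share one public IP."""

    def __init__(self, public_ip, first_port=49152):
        self.public_ip = public_ip
        self.ports = count(first_port)  # hypothetical allocator of public ports
        self.table = {}                 # (private IP, private port) -> public port

    def translate(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self.table:
            port = next(self.ports)
            if port > 65535:            # port numbers lie in [0, 65535]
                raise RuntimeError("NAPT table full: no public port left")
            self.table[key] = port
        return self.public_ip, self.table[key]


hosts = ["192.168.0.10", "192.168.0.11", "192.168.0.12"]

# NAT with two public IPs: the third private host cannot get online.
nat = Nat(["203.0.113.1", "203.0.113.2"])
nat_results = []
for host in hosts:
    try:
        nat_results.append(nat.translate(host))
    except RuntimeError:
        nat_results.append(None)

# NAPT with one public IP: every host gets its own public port.
napt = Napt("203.0.113.1")
napt_results = [napt.translate(host, 5000) for host in hosts]

print(nat_results)
print(napt_results)
```

With two public IPs the NAT sketch serves only two hosts, while the NAPT sketch serves all three through a single public IP by giving each connection a distinct public port.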
Note: A NAPT router, by contrast, uses ports to record the mapping, so even with only one public IP it can let up to 65,536 private IPs go online.

L2 Switch
• The L2 switch is the switch we mentioned previously
  • The term L2 emphasizes that it works at layer 2 (MAC layer)
• A switch has a MAC table
  • When receiving an Ethernet frame, the switch checks the destination MAC address and forwards the frame to the destination
  • The MAC table is a mapping between MAC addresses and switch ports
• How to update the MAC table
  • When connecting to a port for the first time, a computer sends an Ethernet frame, which updates the MAC table
  • Sometimes, even though a computer X is connected to a port, it does not send an Ethernet frame to update the MAC table
  • During this period, when a frame needs to be sent to X, the switch floods it to all computers connected to the switch

VLAN (Virtual LAN)
• VLAN: logically partitioning a network into several networks
  • A LAN is physically formed, as all devices connected to the same switch form a LAN
• VLAN has the following advantages
  • Performance: reduces the size of the broadcast domain
  • Security: acts as a firewall (firewalls are covered later)
• VLANs come in several kinds, such as port-based VLAN and tag-based VLAN
Figures: port-based VLAN; port-based VLAN with 2 switches; tag-based VLAN with 2 switches

L3 Switch
• An L3 switch is a switch with a routing mechanism
  • Routing aims to support communication among VLANs
• Also called IP switch or switch router
Note: Consider that each VLAN has a different network segment.
Note: A router is usually positioned at the boundary for connecting across WANs, but a router seldom provides a large number of ports (since it is rare for dozens of WAN links to come in at once). Because it focuses on slower cross-WAN use, its packet-forwarding performance is lower than that of an L3 switch used inside the internal network.
Note: An L3 switch focuses more on the LAN environment of an enterprise intranet and lacks the capabilities needed across large WANs, such as BGP and high-capacity memory (used to store hundreds of thousands of routes).

Interprocess Communication
• Server-client
  • One server, several clients
  • Clients initiate communications by sending requests
  • Server serves
• P2P (peer-to-peer)
  • Two processes communicating as equals
  • The most popular distribution mode nowadays

The Internet
Note: The "I" here is capitalized.
• The most notable example of an internet is the Internet
• The original goal was to prevent disruptions caused by local disasters
• Derived from the Advanced Research Projects Agency Network (ARPANET) in the late 1960s
  • 4 nodes: UCLA, SRI, UCSB, Utah
• Now it's a commercial undertaking
Figures: Robert Kahn; Vint Cerf

Internet Applications
• VoIP (voice over Internet protocol)
• email (electronic mail)
• FTP (file transfer protocol)
• telnet & ssh (secure shell)
• P2P: bittorrent, edonkey, emule...

WWW (World Wide Web)
• www, w3, web
• hypertext, hyperlink, hypermedia
Note: Hypertext means the text is more than text; clicking it jumps to another place.
• Web page: hypertext document
• Website: a collection of closely related web pages
Figure: Tim Berners-Lee

Browsers
Note: Browsers are tools for viewing hypertext.
• Present the web pages downloaded from the Internet
• HTTP (hypertext transfer protocol)
• URL (uniform resource locator)
Note: The default page is index.html.

Hyper-Text Markup Language (HTML)

HTTP (Hypertext Transfer Protocol)
• HTTP is an application layer protocol for the WWW
• HTML creates hypertext, and HTTP then transmits it
• HTTP usually runs over TCP
• A web server handling HTTP communications uses port 80 by default

HTTP Message Format
• HTTP client and server communicate by sending text messages
• The client sends a request message to the server and the server, in turn, returns a response message

HTTP Request
• The request line has the following syntax
• The request headers are in the form of name:value pairs

HTTP Response
• The first line is the status line, followed by optional response header(s)
• The status line has the following syntax
• The response headers are in the form
of name:value pairs

HTTPS (HTTP-Secure)
• HTTPS is an extension of HTTP that is used for secure communication
• HTTPS is also referred to as HTTP over TLS or HTTP over SSL
Note: SSL/TLS will be covered later.

PHP and SQL
Note: PHP outputs different HTML depending on the result.

eXtensible Markup Language (XML)
Note: Modern browsers can all display XML.
• A standard style to represent data as text
Note: XML is used less for web pages than as a configuration format for programs.
• Strict matching of each opening tag to a closing tag (every tag has a head and a tail)
  • <x property="yyy"> ...... </x>
Note: Because XML is data, swapping the order does not affect its interpretation.
• XHTML
  • HTML that follows the XML format
Note: If all tags are taken from HTML and every tag has an ending tag, the document is called XHTML.

Client-side & Server-side
• Client-side (the program is downloaded to the local machine before it runs, which reduces the server's load)
  • Java applets (a subset of Java)
  • JavaScript (actually has little to do with Java)
  • Flash
• Server-side (some things should not be exposed to the client)
  • CGI
  • Servlets (JSP, ASP)
  • PHP (Personal Home Page; PHP: Hypertext Preprocessor)

SMTP (Simple Mail Transfer Protocol)
• SMTP is a protocol for forwarding emails (NOT retrieving emails)
• The default port number is 25
• Compared to HTTP, SMTP keeps executing instructions and replying during the connection

POP3 and IMAP4
• Both POP3 (post office protocol version 3) and IMAP4 (Internet message access protocol version 4) are used to retrieve emails
• POP3 downloads emails and then deletes them from the remote server
  • Users can access the emails offline because they are stored locally
  • Default port number is 110
• IMAP4 lets users read emails without deleting them from the server
  • Users need to be online to access emails
  • Default port number is 143

DNS (Domain Name System)
• DNS resolves domain names to IP addresses, as routers only recognize IP addresses
• The default port is 53, used together with UDP

DNS (Domain Name System)
• DNS is characterized by its distributed coordination service
  • Multiple name servers work together to resolve domain names
• Public DNS providers include OpenDNS (208.67.222.222), Comodo DNS (8.26.56.26), Google (8.8.8.8), and Cloudflare (1.1.1.1)

DNS (Domain Name System)
• Resolver (or recursive name server, DNS resolver): a server designed
to receive DNS recursive queries from web browsers
Note: There are 13 root servers in total (coordinated by ICANN).
• Authoritative name server: provides the actual answer to your DNS queries
  • It only responds to iterative queries
  • It only returns answers to queries about domain names that are installed in its configuration system
  • The authoritative DNS servers can be where the website is hosted or where the DNS provider is
• Resolvers have a cache, but authoritative name servers do NOT have a cache

DNS Packet Structure
• DNS queries and replies are each transmitted in a single UDP packet
• A standard UDP DNS query consists of
  • Header: 16-bit query identifier, selected by the querying client and replicated in the response from the server
  • Query part: domain name
  • Answer part:
    • NAME field of variable length
      • Contains the full domain name
    • 2-byte TYPE field
      • The type of DNS record
      • A for standard domain-to-address resolution
      • NS for information about name servers
    • 2-byte CLASS field
      • Usually only IN, for Internet domains
    • 4-byte TTL
      • Specifies how long a record remains valid (in seconds)
    • 2-byte RDLENGTH
      • The length of the data segment (in bytes)
    • RDATA of variable length
      • The actual record data
      • For example, the RDATA segment of an A record is a 32-bit IP address

How DNS Queries Work
Note: The client requests the A record. The recursive name server contacts one of the 13 root servers; here we choose b.

REST API (RESTful API)
• A REST API (RESTful API) is an application programming interface (API) that conforms to the constraints of the REST architectural style and allows for interaction with RESTful web services

Security
• Attacks
  • Malware (malicious software)
    • Virus, worm, Trojan horse, spyware, phishing
    • Ransomware
  • Denial of service (DoS)
  • Spam
• Protections
  • Firewall
  • Spam filter
  • Proxy
  • Antivirus, antispyware

Symmetric Cipher and Asymmetric Cipher
• AES is a representative symmetric cipher (symmetric cryptography)
  • Both sender and receiver are assumed to share a common key
  • Block size is 16 bytes (128 bits) and the key length may be 128, 192, or 256 bits
• RSA is a representative asymmetric cipher (asymmetric cryptography)
  • The public key is for encryption only and the private key is for decryption only
  • This is completely different from our intuitive idea of cryptography
Figures: symmetric cipher; asymmetric cipher

AES (Advanced Encryption Standard)
• As the block size is 128 bits, how does AES encrypt data of 1280 bits?
• We resort to a mode of operation
  • ECB (Electronic Codebook), CBC (Cipher Block Chaining), GCM (Galois/Counter Mode)
  • AES-128-CBC means AES with a 128-bit key and CBC
  • AES-256-GCM means AES with a 256-bit key and GCM
Note: The AES key size is used as the measure of security level.
Figures: ECB (exists only in textbooks); CBC (decreasingly used); GCM (used by HTTPS)

RSA
• The most famous asymmetric cipher
• Developed by Rivest, Shamir, and Adleman
• Has both academic and monetary value

RSA
Note: Unlike a symmetric cipher, RSA also has a key generation algorithm.
1. Randomly select two primes $p$ and $q$
2. Compute $N = pq$
3. Select $e$ such that $\gcd(e, (p-1)(q-1)) = 1$
4. Compute $d$ such that $ed \equiv 1 \pmod{(p-1)(q-1)}$
• Public (encryption) key = $(e, N)$
• Private (decryption) key = $(d, p, q)$
Note: All computations are on very large numbers (at least 1024 bits).

RSA
• How to encrypt a message $m$?
  • Ciphertext $c = m^e \bmod N$
• How to decrypt a ciphertext $c$?
  • Plaintext $m = c^d \bmod N$
• Why is the above correct?
  • $c^d = (m^e)^d = m^{ed} = m^{1+s(p-1)(q-1)} = m \cdot m^{s(p-1)(q-1)} \equiv m \pmod{N}$
  • Here $ed = 1 + s(p-1)(q-1)$ for some integer $s$ because $ed \equiv 1 \pmod{(p-1)(q-1)}$, and $m^{(p-1)(q-1)} \equiv 1 \pmod{N}$ by Euler's theorem when $\gcd(m, N) = 1$
  • This shows that $c^d$ recovers $m$ successfully

Hash Function and Message Authentication Code
• A hash function has the four characteristics below
  • Arbitrary-size input, fixed-size output, hard to reverse, and hard to find a collision
Note: A hash is often used as a fingerprint.
• MAC (message authentication code)
  • Can be seen as a keyed hash function
  • Its construction is non-trivial because naively hashing the key together with the message is easy to attack
Note: Do not confuse this MAC with the MAC of the MAC layer mentioned earlier; they merely share the same name.

SSL/TLS
• SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are cryptographic protocols that provide secure communications

HTTPS (HTTP Secure)
• HTTPS is the integration of HTTP and SSL/TLS

Why do we need TLS?
• TLS gives us three guarantees
  • Authentication (the website you reach really is the one you intended, so you are not phished)
    • Verifies the identity of the communicating parties (both clients and servers)
    • With an asymmetric cipher, TLS ensures we reach the authentic website
  • Confidentiality (transmitted data cannot be read by an attacker)
    • Protects data from unauthorized access by encrypting it with symmetric encryption
  • Integrity (transmitted data cannot be tampered with in transit)
    • Recognizes any alteration of data during transmission by checking the MAC

Image credit: https://www.deviantart.com/noneofus/art/Treasure486382200
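TLS authentication relies on an asymmetric cipher such as RSA, so it may help to run the RSA key generation, encryption, and decryption steps from the RSA slides on toy numbers. The following is only a sketch: the primes 61 and 53, the exponent 17, the message 65, and the helper names `encrypt` and `decrypt` are assumptions chosen for illustration, and numbers this small offer no security at all.

```python
from math import gcd

# Step 1: toy primes, assumed purely for illustration.
# Real RSA uses numbers of at least 1024 bits, as the slides note.
p, q = 61, 53

# Step 2: the modulus N = pq.
N = p * q
phi = (p - 1) * (q - 1)      # the quantity (p-1)(q-1) from the slides

# Step 3: pick e with gcd(e, (p-1)(q-1)) = 1 (17 is an assumed choice).
e = 17
assert gcd(e, phi) == 1

# Step 4: d is the modular inverse of e, so that ed = 1 mod (p-1)(q-1).
# pow with exponent -1 and a modulus needs Python 3.8 or later.
d = pow(e, -1, phi)
assert (e * d) % phi == 1


def encrypt(m, e, N):
    """Ciphertext c = m^e mod N, using the public key (e, N)."""
    return pow(m, e, N)


def decrypt(c, d, N):
    """Plaintext m = c^d mod N, using the private exponent d."""
    return pow(c, d, N)


m = 65                        # toy message, must be smaller than N
c = encrypt(m, e, N)
recovered = decrypt(c, d, N)

# The integer s from the correctness argument: ed = 1 + s(p-1)(q-1).
s = (e * d - 1) // phi

print("N =", N, "e =", e, "d =", d)
print("ciphertext =", c, "recovered =", recovered, "s =", s)
```

Decryption returns the original message because $ed = 1 + s(p-1)(q-1)$, which is exactly the identity used in the correctness argument.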