16.36: Introduction to Communication Systems Engineering
Eytan Modiano

Timeline of modern communication
Analog Comm Systems → Digital Comm Systems → Networked Comm Systems (packets)
• 1876 - Bell Telephone
• 1920 - Radio Broadcast
• 1936 - TV Broadcast
• 1960's - Digital communications
• 1965 - First commercial satellite
• 1970 - First Internet node (DARPA-net, Aloha-net)
• 1980 - Development of TCP/IP
• 1993 - Invention of the Web

Why communications in AA?
• AA Information Initiative
  – Communications
  – Software and computers
  – Autonomous systems
• Computers are a vital part of an aerospace system
  – Control of system, human interface
  – Involves computers, software, communications, etc.
  – E.g., complex communication networks within spacecraft or aircraft
• Space communications is a booming industry
  – Satellite TV, Internet access
• Information technology is a critical engineering discipline
  – These skills are as fundamental today as the knowledge of basic math or physics

Communication Applications
• Broadcast TV/Radio
  – Little new here
• Digital telephony
  – Wired and wireless
• Computer communications/networks
  – Resource sharing: computing (mainframe computers in the old days), printers and peripherals, information/DB access and update
  – Internet services: email, FTP, Telnet, Web access
• Today, the majority of network traffic is for internet applications

16.36 topics
• Elements of digital communications
  – Analog to digital conversion: sampling, quantization
  – Source coding and compression
  – Information transmission: modulation/detection, channel coding, link design
• Introduction to computer networking
  – 7-layer network architecture: packet communications, multiple access and LANs
  – Introduction to queueing theory
  – Packet routing in a network
  – TCP/IP and the Internet

7 Layer Network Model
[Diagram: two external sites communicate through subnet nodes. Peer layers exchange data over virtual services — Application ↔ Application (virtual network service), Presentation ↔ Presentation (virtual session), Session ↔ Session (virtual link for end-to-end messages), Transport ↔ Transport (virtual link for end-to-end packets), Network ↔ Network (virtual link for reliable packets), Data Link Control (DLC) ↔ DLC (virtual bit pipe), physical interface ↔ physical interface (physical link).]

Basic elements of a digital communication system
[Block diagram: Digital data in → Source encoder → Channel encoder → Modulator → Channel → Demodulator → Channel decoder → Source decoder → Digital data out]
• A/D conversion
  – Sampling
  – Quantization
• Source coding
  – Used to compress the data
• Channel coding
  – Introduce "redundancy" to protect against errors due to "noise"
• Modulation
  – Represent bits using continuous valued signals suited for transmission

Entropy: Information content of a random variable
• Random variable X
  – Outcome of a random experiment
  – Discrete R.V. takes on values from a finite set of possible outcomes
    PMF: P(X = y) = P_X(y)
• How much information is contained in the event X = y?
  – Will the sun rise today? Revealing the outcome of this experiment provides no information
  – Will the Celtics win the NBA championship? Since this is unlikely, revealing yes provides more information than revealing no
• Events that are less likely contain more information than likely events

Measure of Information
• I(xi) = amount of information revealed by the outcome X = xi
• Desirable properties of I(x):
  1. If P(x) = 1 or P(x) = 0, then I(x) = 0
  2. If 0 < P(x) < 1, then I(x) > 0
  3. If P(x) < P(y), then I(x) > I(y)
  4. If x and y are independent events, then I(x,y) = I(x) + I(y)
• The above is satisfied by: I(x) = log2(1/P(x))
• The base of the log is not critical
  – Base 2 ⇒ information measured in bits
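The following short Python sketch (mine, not from the lecture; the function name self_information is made up for illustration) evaluates I(x) = log2(1/P(x)) for a few outcome probabilities and checks the additivity property for independent events.

```python
import math

def self_information(p):
    """Information revealed by an outcome of probability p: I = log2(1/p) bits."""
    return math.log2(1.0 / p)

# A certain event reveals nothing; unlikely events reveal more.
print(self_information(1.0))   # 0.0 bits  (e.g., "the sun rose today")
print(self_information(0.5))   # 1.0 bit   (a fair coin flip)
print(self_information(0.1))   # ~3.32 bits (an unlikely outcome)

# Independent events: P(x, y) = P(x)P(y), so I(x, y) = I(x) + I(y).
px, py = 0.5, 0.1
assert abs(self_information(px * py) - (self_information(px) + self_information(py))) < 1e-9
```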
Entropy
• A measure of the information content of a random variable
• X ∈ {x1, …, xM}
• H(X) = E[I(X)] = Σ P(xi) log2(1/P(xi))
• Example: binary experiment
  – X = x1 with probability p
  – X = x2 with probability (1 - p)
  – H(X) = p log2(1/p) + (1-p) log2(1/(1-p)) = Hb(p)
  – H(X) is maximized at p = 1/2, with Hb(1/2) = 1
    Not surprising that the result of a binary experiment can be conveyed using one bit

Bound on entropy
• Theorem: Given a random variable with M possible values, 0 ≤ H(X) ≤ log2(M)
  A) H(X) = 0 if and only if P(xi) = 1 for some i
  B) H(X) = log2(M) if and only if P(xi) = 1/M for all i
• Suppose we conduct an experiment with M possible outcomes; how many bits are needed to represent the outcome of the experiment?
  – Theorem: The average number of bits n̄ required to represent the outcome of an experiment satisfies H(X) ≤ n̄ ≤ H(X) + 1

Source coding
[Diagram: Source alphabet {a1..aN} → Encode → Channel alphabet {c1..cN}]
• Source symbols
  – Letters of an alphabet, ASCII symbols, English dictionary, etc.
  – Quantized voice
• Channel symbols
  – In general can have an arbitrary number of channel symbols
  – Typically {0,1} for a binary channel
• Objectives of source coding
  – Unique decodability
  – Compression: encode the alphabet using the smallest average number of channel symbols

Huffman codes
• Huffman codes are special prefix codes that can be shown to be optimal (they minimize the average codeword length)
[Figure: number line from H(X) to H(X)+1 — Huffman codes achieve an average length near H(X), Shannon/Fano codes near H(X)+1]
• Given a source with alphabet {a1..ak} and probabilities {p1..pk}, the Huffman algorithm is:
  1) Arrange the source letters in decreasing order of probability (p1 ≥ p2 ≥ … ≥ pk)
  2) Assign '0' to the last digit of ak and '1' to the last digit of ak-1
  3) Combine pk and pk-1 to form a new set of probabilities {p1, p2, …, pk-2, (pk-1 + pk)}
  4) If left with just one letter, then done; otherwise go to step 1 and repeat

Huffman code example
A = {a1, a2, a3, a4, a5} and p = {0.3, 0.25, 0.25, 0.1, 0.1}

Letter   Probability   Codeword
a1       0.3           11
a2       0.25          10
a3       0.25          01
a4       0.1           001
a5       0.1           000

[Merging tree: a4 and a5 combine into 0.2; that node combines with a3 into 0.45; a1 and a2 combine into 0.55; the final merge of 0.55 and 0.45 gives 1.0. The '0'/'1' labels assigned at each merge produce the codewords above.]

n̄ = 2 × 0.8 + 3 × 0.2 = 2.2 bits/symbol
H(X) = Σ pi log2(1/pi) = 2.1855
Shannon-Fano codes ⇒ ni = ⌈log2(1/pi)⌉
n1 = n2 = n3 = 2, n4 = n5 = 4 ⇒ n̄ = 2.4 bits/symbol < H(X) + 1
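As a rough sanity check on the example above, here is a small Python sketch (mine, not part of the notes; huffman_code is a made-up name). It implements the same merge-the-two-least-likely idea using a heap rather than the re-sorting loop described on the previous slide, so individual codewords may differ from the table by a relabeling of 0s and 1s, but the codeword lengths and the 2.2 bits/symbol average come out the same.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least likely entries.
    probs: dict mapping symbol -> probability. Returns dict symbol -> codeword."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least likely group
        p1, _, code1 = heapq.heappop(heap)   # next least likely group
        # The bit assigned at this merge ends up as the last digit of the codeword,
        # because later merges prepend their bits in front of it.
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

p = {"a1": 0.3, "a2": 0.25, "a3": 0.25, "a4": 0.1, "a5": 0.1}
code = huffman_code(p)
avg_len = sum(p[s] * len(code[s]) for s in p)
print(code)      # codeword lengths: a1, a2, a3 -> 2 bits; a4, a5 -> 3 bits
print(avg_len)   # 2.2 bits/symbol, between H(X) = 2.1855 and H(X) + 1
```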
Lempel-Ziv Source coding
• Source statistics are often not known
• Most sources are not independent
  – Letters of the alphabet are highly correlated
    E.g., E often follows I, H often follows G, etc.
• One can code "blocks" of letters, but that would require a very large and complex code
• Lempel-Ziv algorithm
  – "Universal code" - works without knowledge of source statistics
  – Parse the input file into unique phrases
  – Encode phrases using fixed length codewords
    Variable to fixed length encoding

Lempel-Ziv Algorithm
• Parse the input file into phrases that have not yet appeared
  – Enter the phrases into a dictionary
  – Number their locations
• Notice that each new phrase must be an older phrase followed by a '0' or a '1'
  – Can encode the new phrase using the dictionary location of the previous phrase followed by the '0' or '1'

Lempel-Ziv Example
Input: 0010110111000101011110
Parsed phrases: 0, 01, 011, 0111, 00, 010, 1, 01111

Dictionary:
Loc   Binary rep   Phrase   Codeword   Comment
0     0000         null
1     0001         0        0000 0     loc-0 + '0'
2     0010         01       0001 1     loc-1 + '1'
3     0011         011      0010 1     loc-2 + '1'
4     0100         0111     0011 1     loc-3 + '1'
5     0101         00       0001 0     loc-1 + '0'
6     0110         010      0010 0     loc-2 + '0'
7     0111         1        0000 1     loc-0 + '1'
8     1000         01111    0100 1     loc-4 + '1'

Sent sequence: 00000 00011 00101 00111 00010 00100 00001 01001

Notes about Lempel-Ziv
• The decoder can uniquely decode the sent sequence
• The algorithm is clearly inefficient for short sequences (input data)
• The code rate approaches the source entropy for large sequences
• The dictionary size must be chosen in advance so that the length of the codeword can be established
• Lempel-Ziv is widely used for encoding binary/text files
  – compress/uncompress under Unix
  – Similar compression software for PCs and Macs
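A minimal Python sketch of the parsing and encoding steps (mine, not from the notes; lz_encode is a made-up name), assuming a fixed 4-bit dictionary location as in the example table. Running it on the example input reproduces the parsed phrases and the sent sequence shown above.

```python
def lz_encode(bits, loc_bits=4):
    """Lempel-Ziv encoder sketch for a binary string.
    Parses the input into phrases not seen before; each new phrase is sent as the
    dictionary location of its longest previously seen prefix (loc_bits bits)
    followed by the single new bit."""
    dictionary = {"": 0}          # location 0 holds the null phrase
    codewords = []
    phrase = ""
    for bit in bits:
        phrase += bit
        if phrase not in dictionary:
            prefix = phrase[:-1]                                 # an older phrase
            codewords.append(format(dictionary[prefix], f"0{loc_bits}b") + bit)
            dictionary[phrase] = len(dictionary)                 # new dictionary entry
            phrase = ""
    # Any incomplete phrase left at the end of the input (here a single '0') is not emitted.
    return codewords, dictionary

codewords, dictionary = lz_encode("0010110111000101011110")
print(list(dictionary))   # ['', '0', '01', '011', '0111', '00', '010', '1', '01111']
print(codewords)          # ['00000', '00011', '00101', '00111', '00010', '00100', '00001', '01001']
```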