Physics of information

'Communication in the presence of noise', C. E. Shannon, Proc. Inst. Radio Eng. (1949)
'Some informational aspects of visual perception', F. Attneave, Psych. Rev. (1954)

Ori Katz, ori.katz@weizmann.ac.il

Talk overview
• Information capacity of a physical channel
• Redundancy, entropy and compression
• Connection to biological systems
Emphasis: concepts, intuitions, and examples.

A little background
• An extension of "A Mathematical Theory of Communication" (1948).
• The basis for the field of information theory (and the first use in print of 'bit').
• Shannon worked for Bell Labs at the time.
• His Ph.D. thesis, "An Algebra for Theoretical Genetics", was never published.
• He built the first juggling machine ('W. C. Fields') and a mechanical mouse with learning capabilities ('Theseus').

A general communication system
[Block diagram: information source → encoder → transmitter; the 'message' becomes a continuous function s(t), e.g. a pressure amplitude; it passes through a physical channel of bandwidth W where noise is added; the receiver gets s(t)+n(t) → decoder → information destination.]

Shannon's route through this abstract problem:
1) Encoder: each message is coded as a continuous waveform s(t).
2) Sampling theorem: s(t) is represented by a finite number of samples.
3) Geometric representation: the samples become a point in a Euclidean space.
4) Analyzing the addition of noise (the physical channel) yields a limit on the reliable transmission rate.

The (Nyquist/Shannon) sampling theorem
• The transmitted waveform is a continuous function of time s(t) whose bandwidth is limited by the physical channel: S(f > W) = 0, where S(f) = ∫ s(t) e^(−i2πft) dt is the Fourier (frequency-domain) transform.
• Sample its values at discrete times Δt = 1/fs (fs = sampling frequency): Vn = [s(Δt), s(2Δt), …].
• s(t) can be represented exactly by the discrete samples Vn as long as fs ≥ 2W (the Nyquist sampling rate).
• Result: a waveform of duration T is represented by 2WT numbers, i.e. a vector in a 2WT-dimensional space:
  V = [s(1/2W), s(2/2W), …, s(2WT/2W)]

An example of the Nyquist rate: the music CD
• Audible human-ear frequency range: 20 Hz - 20 kHz.
• The Nyquist rate is therefore 2 × 20 kHz = 40 kHz.
• CD sampling rate = 44.1 kHz, fulfilling the Nyquist rate.
Anecdotes:
• The exact rate was inherited from late-1970s magnetic-tape storage conversion devices.
• There was a long debate between Philips (44,056 samples/sec) and Sony (44,100 samples/sec)...

The geometric representation
• Each continuous signal s(t) of duration T and bandwidth W is mapped to a point in a 2WT-dimensional space whose coordinates are the sampled amplitudes:
  V = [x1, x2, …, x2WT] = [s(1/2W), …, s(2WT/2W)]
• In our example: a 1-hour CD recording is a single point in a space with 44,100 × 60 sec × 60 min = 158.8×10^6 dimensions (!!)
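Not part of the original slides: a minimal Python sketch of the sampling theorem at work. The bandwidth W, the duration T and the two test tones are illustrative assumptions; the point is only that the 2WT Nyquist-rate samples determine the whole band-limited waveform.

```python
import numpy as np

# Sampling-theorem sketch: a band-limited signal of duration T and bandwidth W
# is fully determined by 2WT samples taken at the Nyquist rate fs = 2W, and can
# be rebuilt from them by Whittaker-Shannon (sinc) interpolation.
W = 100.0              # bandwidth limit in Hz, i.e. S(f > W) = 0  (assumed value)
T = 1.0                # record duration in seconds                (assumed value)
fs = 2 * W             # Nyquist sampling rate
n = int(fs * T)        # 2WT numbers describe the whole waveform

def s(t):
    """Band-limited test signal: two tones well below W (assumed frequencies)."""
    return np.sin(2 * np.pi * 30.0 * t) + 0.5 * np.cos(2 * np.pi * 70.0 * t)

t_n = np.arange(n) / fs        # sampling instants, spaced Delta-t = 1/fs
V = s(t_n)                     # the 2WT-dimensional vector of samples

# Reconstruct s(t) on a fine grid; stay away from the record edges, where
# truncating the (formally infinite) sinc sum causes small errors.
t_fine = np.linspace(0.25 * T, 0.75 * T, 1000)
s_rec = np.array([np.sum(V * np.sinc(fs * (t - t_n))) for t in t_fine])

print(n, "samples; max interpolation error:", np.max(np.abs(s_rec - s(t_fine))))
```

Lowering fs below 2W aliases the 70 Hz tone and the reconstruction fails, which is the quickest way to see why the Nyquist threshold matters.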
• The norm (squared distance) in this Euclidean space measures the signal's total energy / average power:
  d² = Σ xn²  (n = 1 … 2TW)  = 2W ∫ s²(t) dt = 2W·E = 2WT·P
  (E = total signal energy, P = average signal power).

Addition of noise in the channel
• Example in a 3-dimensional space (the first 3 samples of the CD):
  V = [x1, x2, …, x2WT] = [s(Δt), s(2Δt), …, s(T)]
[Figure: the signal point in the (x1, x2, x3) space, smeared by noise into a spherical cloud of uncertainty.]
• Adding white Gaussian (thermal) noise of average power N smears each point into a spherical cloud of radius √(2WTN):
  V(S+N) = [s(Δt)+n(Δt), s(2Δt)+n(2Δt), …, s(T)+n(T)]
• For large T the noise power converges to its statistical average N, so the received point lies on the shell of that sphere, at distance √(2WTN) from the transmitted point: the "clouded" sphere of uncertainty becomes rigid.

The number of distinguishable messages
• Reliable transmission: the receiver must distinguish between any two different messages under the given noise conditions.
• The maximum number of distinguishable messages M is set by the 'sphere-packing' problem in 2TW dimensions:
  M ≤ accessible volume / noise-sphere volume
    = Volume{sphere of radius √(2WT(P+N))} / Volume{sphere of radius √(2WTN)}
    = [(P+N)/N]^(TW)
• The longer the mapped message, the more 'rigid' the spheres, and the error probability can be made as small as one wants (reliable transmission).

The channel capacity
• Number of distinguishable messages (coded as signals of length T):
  M ≤ [(P+N)/N]^(TW)
• Number of distinguishable bits:
  #bits = log2 M ≤ TW·log2((P+N)/N)
• The reliably transmittable bit-rate (bits per unit time):
  C = #bits / T = W·log2((P+N)/N) = W·log2(1 + P/N)   (in bits/second)
  with W the channel bandwidth and P/N the signal-to-noise ratio (SNR).
• This is the celebrated 'channel capacity theorem' by Shannon, who also proved that C can actually be reached.

Gaussian white noise = thermal noise?
• With no signal, the receiver measures a fluctuating noise. In our example: pressure fluctuations of the air molecules impinging on the microphone (thermal energy).
• The statistics of thermal noise are Gaussian: P{s(t) = v} ∝ exp(−m·v²/2kT).
• Its power spectral density is constant, |S(f)|² = const ("white" noise), in contrast to "pink/brown" noise whose spectrum falls with frequency.
[Figure: a noisy pressure trace and its Gaussian amplitude histogram; flat ("white") versus decaying ("pink/brown") power spectra.]

Some examples of physical channels
Channel capacity limit: C = W·log2(1 + P/N) (in bits/second)
1) Speech (e.g. this lecture): W = 20 kHz, P/N ≈ 1 - 100, so C ≈ 20,000 - 130,000 bps.
   Actual bit-rate ≈ (2 words/sec) × (5 letters/word) × (5 bits/letter) = 50 bps.
2) Visual sensory channel: bandwidth W ≈ (images/sec) × (receptors/image) × (two eyes) ≈ 25 × 50×10^6 × 2 ≈ 2.5×10^9 Hz, and P/N > 256, so
   C ≈ 2.5×10^9 × log2(256) ≈ 20×10^9 bps.
   A two-hour movie: 2 hours × 60 min × 60 sec × 20 Gbps ≈ 1.4×10^14 bits ≈ 18,000 Gbyte (a DVD holds 4.7 Gbyte).
• We are not using the channel capacity: the information is redundant.
• Simplify processing by compressing the signal, extracting only the essential information (what is essential…?!).

Redundant-information demonstration (using Matlab; a rough Python equivalent is sketched after this list)
• Original sample: 44.1 kSamples/s × 16 bits/sample = 705.6 kbps (CD quality).
• With only 4 bits per sample: 44.1 kS/s × 4 bits/sample = 176.4 kbps.
• With only 3 bits per sample: 44.1 kS/s × 3 bits/sample = 132.3 kbps.
• With only 2 bits per sample: 44.1 kS/s × 2 bits/sample = 88.2 kbps.
• With only 1 bit per sample (!): 44.1 kS/s × 1 bit/sample = 44.1 kbps.
[Plots: the same waveform, amplitude vs. sample number, re-quantized at each bit depth.]
• The 1-bit version sounds not too good, but the essence is there…
• Main reason: not all of 'phase space' is accessible to the mouth/ear.
• Another example: the (smart) high-compression mp3 algorithm at 16 kbps.
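The bit-depth demonstration above was done in Matlab; below is a rough Python sketch of the same re-quantization. The input file name, the use of SciPy's WAV reader and the assumption of a mono 16-bit recording are illustrative choices, not part of the slides.

```python
import numpy as np
from scipy.io import wavfile

def requantize(x, bits):
    """Re-quantize samples x (floats in [-1, 1]) to 2**bits uniform levels."""
    levels = 2 ** bits
    q = np.round((x + 1.0) / 2.0 * (levels - 1))   # map [-1, 1] onto the levels
    return q / (levels - 1) * 2.0 - 1.0            # and back to [-1, 1]

# "speech_sample.wav" is a placeholder: any mono 16-bit, 44.1 kS/s file will do.
rate, raw = wavfile.read("speech_sample.wav")
x = raw.astype(np.float64) / np.iinfo(raw.dtype).max

for bits in (16, 4, 3, 2, 1):
    y = requantize(x, bits)
    wavfile.write(f"speech_{bits}bit.wav", rate, (y * 32767).astype(np.int16))
    print(f"{bits} bit/sample -> {rate * bits / 1000:.1f} kbps")
```

Listening to the 1-bit output makes the slide's point: intelligibility survives because speech occupies only a small part of the available 'phase space'.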
Visual redundancy / compression
• Images: the redundancies catalogued in Attneave's paper (1954) are essentially the ones exploited by image-compression formats (2008): edges, short-range similarities, patterns, repetitions, symmetries, etc.
• Attneave's example of a highly redundant scene: "a bottle" on "a table".
[Figure: the same 400×600 image saved as a 704 Kbyte .bmp and as .jpg files of 30.6, 10.9, 8, 6.3, 5 and 4 Kbyte, plus an 80×50-pixel version.]
• What information is essential?? (evolution…?)
• Movies: the same redundancies, plus consecutive images are similar…
• Text: a future 'language' lesson (Lilach & David).

How much can we compress?
How many bits are needed to code a message?
• Intuitively: #bits = log2 M (M = number of possible messages).
• Regularities/lawfulness mean a smaller M.
• If some messages are more probable than others, we can do better than log2 M.
• A message can be coded (without loss of information) with
  H = −Σ p(Mi)·log2 p(Mi)   bits/message
  summed over the messages Mi; this is the source 'entropy'.
• Intuition: shorter bit-strings can be used for the more probable messages.

Lossless-compression example (entropy code)
Example: M = 4 possible messages (e.g. tones): 'A' (94%), 'B' (2%), 'C' (2%), 'D' (2%).
1) Without compression, 2 bits/message: 'A'→00, 'B'→01, 'C'→10, 'D'→11.
2) A better code: 'A'→0, 'B'→10, 'C'→110, 'D'→111:
   <bits/message> = 0.94×1 + 0.02×2 + 2×(0.02×3) = 1.1 bits/message.
The source entropy is −Σ p(Mi)·log2 p(Mi) = −0.94·log2(0.94) − 3×0.02·log2(0.02) ≈ 0.42 bits/message.
(A short numerical check of this example is given after the Summary.)

Why entropy?
H = −Σ p(Mi)·log2 p(Mi)   bits/message
• It is the only measure that fulfills four 'physical' requirements:
  1. H = 0 if P(Mi) = 1.
  2. A message with P(Mi) = 0 does not contribute.
  3. The entropy is maximal for equally distributed messages.
  4. For the addition of two independent message spaces: Hx+y = Hx + Hy.
• Any regularity implies probable patterns and therefore a lower entropy (redundant information).

The speech vocoder (VOice-CODer)
• Models the vocal tract with a small number of parameters.
• The lawfulness of speech confines it to a small subspace; the approach only fails for musical input.
• Used by Skype / Google Talk / GSM (~8-15 kbps).
• The ancestor of modern speech CODECs (COder-DECoders): the 'human organ'.

Link to biological systems
• Information is conveyed via a physical channel: cell to cell, DNA to cell, a cell to its descendant, neurons / the nervous system.
• The physical channel: concentrations of molecules (mRNA, ions, …) as a function of space and time.
• Bandwidth limit: the parameters cannot change at an infinite rate (diffusion and chemical-reaction timescales, …).
• Signal to noise: thermal fluctuations, the environment.
• Major difference: transmission is not 100% reliable. Model: an overlap of non-rigid uncertainty clouds.
• Use the channel-capacity theorem at your own risk...

Summary
• Physical channel → capacity theorem
• SNR, bandwidth
• Geometric representation
• Entropy as a measure of redundancy
• Link to biological systems
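Closing worked example (not on the slides): a few lines of Python that reproduce the numbers of the lossless-compression example, i.e. the source entropy (~0.42 bits/message) and the average length of the variable-length prefix code (1.1 bits/message) versus the 2-bit fixed-length code.

```python
import math

# Four-tone source from the entropy-coding example: 'A' 94%, 'B'/'C'/'D' 2% each.
p = {"A": 0.94, "B": 0.02, "C": 0.02, "D": 0.02}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}   # the prefix code from the slide

entropy = -sum(pi * math.log2(pi) for pi in p.values())   # ~0.42 bits/message
avg_len = sum(p[m] * len(code[m]) for m in p)             # 1.10 bits/message
fixed_len = math.log2(len(p))                             # 2 bits/message

print(f"source entropy     : {entropy:.2f} bits/message")
print(f"prefix-code average: {avg_len:.2f} bits/message")
print(f"fixed-length code  : {fixed_len:.0f} bits/message")
```

An entropy code that groups several messages into one code word (e.g. Huffman or arithmetic coding over blocks) can approach the 0.42-bit entropy bound, which is why entropy is the natural measure of how much redundancy is left to remove.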