
Physics of information
‘Communication in the presence of noise’, C.E. Shannon, Proc. Inst. Radio Eng. (1949)
‘Some informational aspects of visual perception’, F. Attneave, Psych. Rev. (1954)
Ori Katz
ori.katz@weizmann.ac.il
Talk overview
• Information capacity of a physical channel
• Redundancy, entropy and compression
• Connection to biological systems
Emphasis on concepts, intuitions, and examples
A little background
• An extension of “A Mathematical Theory of Communication” (1948).
• The basis for the field of information theory (first use in print of ‘bit’).
• Shannon worked for Bell Labs at the time.
• His Ph.D. thesis, “An Algebra for Theoretical Genetics”, was never published.
• Built the first juggling machine (‘W.C. Fields’) and a mechanical mouse with learning capabilities (‘Theseus’).
[Photos: ‘Theseus’ and ‘W.C. Fields’]
A general communication system
[Block diagram]
Information source → ‘message’ → Encoder → continuous function s(t) → Transmitter
(in our example, s(t) = the pressure amplitude) → Physical channel (bandwidth W, added noise)
→ continuous function s(t)+n(t) → Receiver → Decoder → ‘message’ → Information destination
Shannon’s route for this abstract problem:
1) Encoder codes each message → a continuous waveform s(t)
2) Sampling theorem: s(t) represented by a finite number of samples
3) Geometric representation: samples → a point in Euclidean space
4) Analyze the addition of noise (physical channel)
→ a limit on the reliable transmission rate
The (Nyquist/Shannon) sampling theorem
• Transmitted waveform = a continuous function of time s(t), with bandwidth (W)
  limited by the physical channel: S(f>W)=0, where (Fourier / frequency domain)
  S(f) = ∫ s(t) e^(-i2πft) dt
• Sample its values at discrete times Δt = 1/fs (fs = sampling frequency):
  Vn = [s(Δt), s(2Δt), …]
• s(t) can be represented exactly by the discrete samples Vn as long as:
  fs ≥ 2W   (the Nyquist sampling rate)
• Result: a waveform of duration T is represented by 2WT numbers
  = a vector in a 2WT-dimensional space:
  V = [s(1/2W), s(2/2W), …, s(2WT/2W)]
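A hedged numerical illustration (not part of the original slides; it assumes Python with numpy) of the statement above: a signal band-limited to W, sampled at the Nyquist rate fs = 2W, is recovered from its 2WT samples by Whittaker-Shannon (sinc) interpolation.

```python
# Minimal sketch of the sampling theorem (illustration only; assumes numpy).
import numpy as np

W, T = 100.0, 1.0                     # bandwidth [Hz] and duration [s] (arbitrary choices)
fs = 2 * W                            # Nyquist rate
n = np.arange(int(fs * T))            # 2WT = 200 sample indices
t_n = n / fs                          # sampling instants, spacing 1/(2W)

def s(t):                             # a test signal with all frequencies below W
    return np.sin(2 * np.pi * 30 * t) + 0.5 * np.cos(2 * np.pi * 77 * t)

V = s(t_n)                            # the vector V = [s(1/2W), s(2/2W), ...]

# Reconstruction: s(t) = sum_n V_n * sinc(fs*t - n). Exact for an infinite
# sample train; with only the 2WT samples of a finite record there is a small
# truncation error, so we compare on interior points.
t = np.linspace(0.1 * T, 0.9 * T, 2000)
recon = V @ np.sinc(fs * t - n[:, None])
print("max interior reconstruction error:", np.max(np.abs(recon - s(t))))
```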
An example of the Nyquist rate – a music CD
• Audible human-ear frequency range: 20Hz - 20KHz
• The Nyquist rate is therefore: 2 x 20KHz = 40KHz
• CD sampling rate = 44.1KHz, fulfilling the Nyquist criterion.
Anecdotes:
• The exact rate was inherited from late-70’s magnetic-tape storage/conversion devices.
• A long debate between Philips (44,056 samples/sec) and Sony (44,100 samples/sec)...
The geometric representation
• Each continuous signal s(t) of duration T and bandwidth W is mapped to
  a point in a 2WT-dimensional space (coordinates = sampled amplitudes):
  V = [x1, x2, …, x2WT] = [s(1/2W), …, s(2WT/2W)]
In our example:
a 1-hour CD recording → a single point in a space having
44,100 x 60 sec x 60 min = 158.8x10^6 dimensions (!!)
• The norm (distance^2) in this space measures the signal's total energy / power
  → a Euclidean space metric:
  d^2 = Σ_{n=1..2TW} x_n^2 = 2W ∫_0^T s^2(t) dt = 2W·E = 2WT·P
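A quick numerical sanity check (my own illustration, assuming numpy; the test signal is arbitrary) that the squared norm of the sample vector equals 2W times the signal energy, as in the metric above:

```python
# Check: d^2 = sum_n x_n^2 = 2W * integral of s^2 over [0, T] = 2W*E
import numpy as np

W, T = 100.0, 1.0
fs = 2 * W
t_n = np.arange(int(fs * T)) / fs                   # the 2WT sampling instants
s = lambda t: np.sin(2*np.pi*30*t) + 0.5*np.cos(2*np.pi*77*t)

d2 = np.sum(s(t_n) ** 2)                            # squared norm of the point V

t_fine = np.linspace(0.0, T, 400_000, endpoint=False)
E = np.mean(s(t_fine) ** 2) * T                     # energy: integral of s^2 over [0, T]

print("sum of squared samples:", d2)                # ~125 for this test signal
print("2W * energy           :", 2 * W * E)         # ~125 as well
```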
Addition of noise in the channel
• Example in a 3-dimensional space (the first 3 samples of the CD):
  V = [x1, x2, …, x2WT] = [s(Δt), s(2Δt), …, s(T)]
  [Figure: the point V in (x1, x2, x3) space, a signal of power P smeared by a noise cloud of radius ∝ √N]
• Addition of white Gaussian (thermal) noise with an average power N smears each
  point into a sphere-like cloud of radius ∝ √N:
  V_{S+N} = [s(Δt)+n(Δt), s(2Δt)+n(2Δt), …, s(T)+n(T)]
• For large T, the noise power → N (statistical average)
  → the received point lies on a sphere shell: distance from the sent point = noise ∝ √N
  → the “clouded” sphere of uncertainty becomes rigid
The number of distinguishable messages
• Reliable transmission: the receiver must distinguish between any two
  different messages under the given noise conditions
  [Figure: non-overlapping noise spheres (radius ∝ √N) packed inside the accessible sphere (radius ∝ √(P+N))]
• The max number of distinguishable messages (M) ↔ the ‘sphere-packing’
  problem in 2TW dimensions:
  M ≈ accessible volume / sphere volume
    = Volume{sphere with radius √(P+N)} / Volume{sphere with radius √N}
    = (√((P+N)/N))^(2TW)
• The longer the mapped message, the more ‘rigid’ the spheres
  → the probability to err can be made as small as one wants (reliable transmission)
The channel capacity
• Number of distinguishable messages (coded as signals of length T):
  M ≈ (√((P+N)/N))^(2TW) = ((P+N)/N)^(TW)
• Number of distinguishable bits:
  # bits = log2 M = TW·log2((P+N)/N)
• The reliably transmittable bit-rate (bits per unit time):
  C = # bits / T = W·log2((P+N)/N) = W·log2(1 + P/N)   (in bits/second)
  where W is the channel bandwidth and P/N the signal-to-noise ratio (SNR).
The celebrated ‘channel capacity theorem’ by Shannon.
- Shannon also proved that the rate C can actually be reached.
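A small helper (not from the lecture; the function name is my own) that simply evaluates the capacity formula, used here with the speech-channel numbers that appear on a later slide:

```python
# Evaluate C = W * log2(1 + P/N) for an additive-Gaussian-noise channel.
import math

def channel_capacity(W_hz: float, snr: float) -> float:
    """Reliable bit-rate limit in bits per second."""
    return W_hz * math.log2(1.0 + snr)

# Speech-like channel: W = 20 kHz, P/N between ~1 and ~100
print(channel_capacity(20e3, 1))      # ~20,000 bps
print(channel_capacity(20e3, 100))    # ~133,000 bps (the slides round to ~130,000)
```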
Gaussian white noise = Thermal noise?
• With no signal, the receiver measures a fluctuating noise: <amplitude^2> ∝ kT
• In our example: pressure fluctuations of the air molecules impinging on the
  microphone (thermal energy)
• The statistics of thermal noise is Gaussian: P{s(t)=v} ∝ exp(-(m/2kT)·v^2)
• The power spectral density is constant (“white”): |S(f)|^2 = const
  [Figure: amplitude (pressure) vs. time; the histogram P{s=v}; |S(f)|^2 vs. frequency,
   flat (“white”) compared with “pink/brown” spectra]
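An illustrative check (assuming numpy) that Gaussian noise samples indeed have a flat, "white" power spectrum once many blocks are averaged:

```python
# Generate white Gaussian noise, then average |FFT|^2 over blocks.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=256 * 1024)        # zero-mean Gaussian samples

print("amplitude variance:", np.var(noise))          # ~1 = sigma^2

blocks = noise.reshape(256, 1024)                    # 256 blocks of 1024 samples
psd = np.mean(np.abs(np.fft.rfft(blocks, axis=1)) ** 2, axis=0)
print("PSD variation across frequency (std/mean):", np.std(psd[1:]) / np.mean(psd[1:]))
# -> a few percent: the averaged spectrum is flat up to residual averaging noise
```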
Some examples for physical channels
Channel capacity limit: C = W·log2(1 + P/N)   (in bits/second)
1) Speech (e.g. this lecture):
   W = 20KHz, P/N ≈ 1 - 100
   → C ≈ 20,000 bps - 130,000 bps
   Actual bit-rate ≈ (2 words/sec) x (5 letters/word) x (5 bits/letter) = 50 bps
2) Visual sensory channel:
   Bandwidth (W) = (images/sec) x (receptors/image) x (two eyes)
                 ≈ 25 x 50x10^6 x 2 ≈ 2.5x10^9 Hz
   P/N ≥ 256
   → C ≈ 2.5x10^9 x log2(256) ≈ 20x10^9 bps
   A two-hour movie:
   → 2 hours x 60 min x 60 sec x 20 Gbps ≈ 1.4x10^14 bits ≈ ~15,000 Gbytes (DVD = 4.7 Gbyte)
• We’re not using the channel capacity → redundant information
• Simplify processing by compressing the signal
• Extract only the essential information (what is essential…?!)
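The visual-channel arithmetic spelled out as a short script (the numbers are the lecture's order-of-magnitude estimates, not measurements; the slide rounds the final figure to ~15,000 Gbytes):

```python
# Back-of-the-envelope visual channel capacity and uncompressed movie size.
images_per_sec = 25
receptors_per_image = 50e6
eyes = 2

W = images_per_sec * receptors_per_image * eyes      # ~2.5e9 "samples"/second
C = W * 8                                            # log2(256) = 8  ->  ~2e10 bps

movie_bits = 2 * 60 * 60 * C                         # a two-hour movie at full capacity
print(f"C ~ {C:.1e} bps, movie ~ {movie_bits:.1e} bits "
      f"~ {movie_bits / 8 / 1e9:,.0f} GByte (DVD = 4.7 GByte)")
```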
Redundant information demonstration (using Matlab)
Original sample: 44.1Ks/s x 16 bits/sample = 705 Kbps (CD quality)
[Figure: waveform amplitude vs. sample number, 16-bit quantization]
With only 4 bits per sample: 44.1Ks/s x 4 bits/sample = 176.4 Kbps
[Figure: the same waveform with 4-bit quantization]
With only 3 bits per sample: 44.1Ks/s x 3 bits/sample = 132.3 Kbps
[Figure: the same waveform with 3-bit quantization]
With only 2 bits per sample: 44.1Ks/s x 2 bits/sample = 88.2 Kbps
[Figure: the same waveform with 2-bit quantization]
With only 1 bit per sample (!): 44.1Ks/s x 1 bit/sample = 44.1 Kbps
[Figure: the same waveform with 1-bit quantization]
Sounds not too good, but the essence is there…
Main reason: not all of ‘phase-space’ is accessible by mouth/ear
Another example: a (smart) high-compression mp3 algorithm @ 16Kbps
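The listening demo above was done in Matlab; the following is a rough Python equivalent (the synthetic 440 Hz tone stands in for the original speech recording, which is not available here) that requantizes "CD-quality" samples to fewer bits:

```python
# Requantize samples in [-1, 1] to a given number of bits per sample.
import numpy as np

def requantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniformly quantize samples in [-1, 1] to 2**bits levels."""
    levels = 2 ** bits
    q = np.round((x + 1.0) / 2.0 * (levels - 1))     # map to integer levels
    return q / (levels - 1) * 2.0 - 1.0              # and back to [-1, 1]

fs = 44_100
t = np.arange(fs) / fs
x = 0.8 * np.sin(2 * np.pi * 440 * t)                # stand-in for the speech sample

for bits in (16, 4, 3, 2, 1):
    rms_err = np.sqrt(np.mean((requantize(x, bits) - x) ** 2))
    print(f"{bits:2d} bit/sample -> {fs * bits / 1000:6.1f} kbit/s, RMS error {rms_err:.4f}")
```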
Visual redundancy / compression
• Images: the redundancies in Attneave’s paper → today's image compression formats:
  - edges
  - short-range similarities
  - patterns
  - repetitions
  - symmetries
  - etc., etc.…
  [Figures: Attneave’s sketch of “a bottle” on “a table” (1954); a 400x600-pixel photo (2008) saved as a
   704Kbyte .bmp and as .jpg files of 30.6, 10.9, 8, 6.3, 5 and 4 Kbyte, down to 80x50 pixels]
  What information is essential?? (evolution…?)
• Movies: the same + consecutive images are similar…
• Text: future ‘language’ lesson (Lilach & David)
How much can we compress?
How many bits are needed to code a message?
• Intuitively: #bits = log2 M   (M = the number of possible messages)
• Regularities / lawfulness → a smaller M
• Some messages are more probable → can do better than log2 M
• A message can be coded (without loss of information) with
  Source ‘entropy’ = - Σ_Mi p(Mi)·log2 p(Mi)   [bits/message]
• Intuition: use shorter bit-strings for the more probable messages.
Lossless-compression example (entropy code)
Example: M = 4 possible messages (e.g. tones):
‘A’ (94%), ‘B’ (2%), ‘C’ (2%), ‘D’ (2%)
1) Without compression: 2 bits/message:
   ‘A’→00, ‘B’→01, ‘C’→10, ‘D’→11
2) A better code:
   ‘A’→0, ‘B’→10, ‘C’→110, ‘D’→111
   <bits/message> = 0.94x1 + 0.02x2 + 2x(0.02x3) = 1.1 bits/msg
Source entropy = - Σ_Mi p(Mi)·log2 p(Mi) = -0.94·log2(0.94) - 3x0.02·log2(0.02) ≈ 0.42
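A tiny script (illustration only) reproducing the two numbers on this slide, the source entropy and the average length of the prefix code:

```python
# Source entropy vs. average length of the code A->0, B->10, C->110, D->111.
import math

p = {"A": 0.94, "B": 0.02, "C": 0.02, "D": 0.02}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

H = -sum(pi * math.log2(pi) for pi in p.values())
avg_len = sum(p[m] * len(code[m]) for m in p)

print(f"source entropy     : {H:.2f} bits/message")        # ~0.42
print(f"average code length: {avg_len:.2f} bits/message")   # 1.10, vs. 2 without coding
```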
Why entropy?
H = - Σ_Mi p(Mi)·log2 p(Mi)   [bits/message]
• The only measure that fulfills 4 ‘physical’ requirements:
  1. H = 0 if P(Mi) = 1
  2. A message with P(Mi) = 0 does not contribute
  3. Maximum entropy for equally distributed messages
  4. Addition of two independent message spaces: H_{x+y} = H_x + H_y
Any regularity → probable patterns → lower entropy (redundant information)
The speech Vocoder (VOice-CODer)
Models the vocal tract with a small number of parameters.
Exploits the lawfulness of the speech subspace only → fails for musical input
Used by Skype / Google-talk / GSM (~8-15 KBps)
The ancestor of modern speech CODECs (COder-DECoders): [Image: the ‘Human organ’]
Link to biological systems
• Information is conveyed via a physical channel:
  cell to cell, DNA to cell, cell to its descendant, neurons / the nervous system
• The physical channel: concentrations of molecules (mRNA, ions, …) as a function of
  space and time
• Bandwidth limit: parameters cannot change at an infinite rate (diffusion, chemical
  reaction timescales, …)
• Signal to noise: thermal fluctuations, the environment
• Major difference: transmission is not 100% reliable
  → model: an overlap of non-rigid uncertainty clouds
• Use the channel-capacity theorem at your own risk...
Summary
• Physical channel capacity theorem
• SNR, bandwidth
• Geometrical representation
• Entropy as a measure of redundancy
• Link to biological systems