Estimation of Sub-Micrometer Translations of a
Rigid Body Using Light Microscopy
by
Charles Quentin Davis
S.B. Electrical Engineering, MIT (1991)
Submitted to the Department of Electrical Engineering and
Computer Science
in partial fulfillment of the requirements for the Degrees of
Electrical Engineer
and
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 1994
© Massachusetts Institute of Technology 1994. All rights reserved.
Author: Department of Electrical Engineering and Computer Science, May 18, 1994

Certified by: Dennis M. Freeman, Research Scientist, Thesis Supervisor

Accepted by: Frederic R. Morgenthaler, Chairman, Committee on Graduate Students
Estimation of Sub-Micrometer Translations of a Rigid
Body Using Light Microscopy
by
Charles Quentin Davis
Submitted to the Department of Electrical Engineering and Computer Science
on May 18, 1994, in partial fulfillment of the
requirements for the Degrees of
Electrical Engineer
and
Master of Science in Electrical Engineering and Computer Science
Abstract
An algorithm based on optical flow (Horn and Schunck, 1981; Horn and Weldon,
Jr., 1988) is developed. The new algorithm eliminates the linear component of the
bias in optical flow's motion estimate. A confidence statistic for the new algorithm
is also developed. Statistical performance parameters (bias and standard deviation)
are determined from simulations for both the original optical flow algorithm and the
bias-compensated algorithm as a function of the distance the target moved and the
pictures' signal-to-noise ratio, frequency content, and size. Results of the simulations
are verified by using video images of a scene whose motion is independently known.
Our target was a 4-5 pixel diameter bead which was dark on a light, uniform
background. The image of the bead had non-zero spectral content at all spatial
frequencies not aliased by the camera. Under typical conditions, the bias in both
algorithms is greater than the standard deviation, except for very small motions
(< 0.035 pixels). For motions less than 0.1 pixels, both algorithms had very similar
performance. In contrast, for motions greater than 0.1 pixels the bias-compensated
algorithm was superior. The bias in the standard optical flow algorithm is unbounded,
becoming large (> 0.2 pixels) for displacements greater than 2 pixels. For the
bias-compensated algorithm, the bias of estimates that satisfy the confidence statistic is always less
than 0.02 pixels.
A system to measure sound-induced motions of inner ear structures has been developed. The system consists of a scientific grade CCD camera, a light microscope,
and a stroboscopic illumination system. Performance of the system was verified using
a moving target whose motion is independently known. Results suggest that estimations from one set of measurements of sinusoidal motion have a standard deviation
of about 5 nm and a bias of about 9 nm. Both errors can be reduced by averaging
the individual displacement estimates, resulting in a total error of 14 nm/√n, where
n is the number of averages.
An in vitro preparation for measuring the motion of inner ear structures of the
alligator lizard was developed. The excised cochlear duct is held in place between two
fluid-filled reservoirs by a stainless-steel ring that provides a hydrodynamic seal between the reservoirs. A hydrodynamic stimulator generates pressures across the basilar membrane that mimic those generated in situ by the middle ear. The stimulator
can generate pressures of at least 100 dB SPL over the frequency range 0.02-20 kHz.
Results for one preliminary experiment are shown. Motions for six locations are
analyzed in six planes: through the sensory epithelium, the middle and tips of the
hair bundles, and three through the tectorial membrane. Results indicate that the
tectorial membrane does not move as a rigid body.
Thesis Supervisor: Dennis M. Freeman
Title: Research Scientist
Acknowledgments
First of all, I would like to thank my advisor Denny for all his help in both the
research and the writing, as well as for just being a friend. I would also like to thank the Zeiss
corporation for showing me just how painful it can be to deal with an incompetent
company. Finally, I appreciate my wife's futile attempt to get all the we's out of my
thesis, and her last-minute proofreading was definitely needed.
And to my parents, thanks for your support. I'm done, but is Mr. Hester direct
yet?
First answer my questions, then drink however much you will.
Ask.
What is quicker than the wind?
Thought.
What can cover the earth?
Darkness.
Who are more numerous, the living or the dead?
The living, because the dead are no longer.
Give me an example of space.
My two hands as one.
An example of grief.
Ignorance.
Of poison.
Desire.
An example of defeat.
Victory.
Which came first, day or night?
Day, but it was only a day ahead.
What is the cause of the world?
Love.
What is your opposite?
Myself.
What is madness?
A forgotten way.
And revolt, why do men revolt?
To find beauty, either in life or in death.
And what for each of us is inevitable?
Happiness.
And what is the greatest wonder?
Each day death strikes and we live as though we were immortal.
You answer well, I am your father Dharma. I came to test your merit,
and I have found it true, and I return your brothers to life.
- Mahabharata
Contents

1 Introduction                                                             8

2 Compensation for Systematic Errors in Motion Estimates Based on
  Optical Flow                                                             9
  2.1 Abstract                                                             9
  2.2 Introduction                                                        10
  2.3 Methods                                                             11
      2.3.1 Notation                                                      11
      2.3.2 Optical flow                                                  11
      2.3.3 Recursive optical flow                                        14
      2.3.4 Simulation                                                    19
      2.3.5 Measurements                                                  20
  2.4 Results                                                             22
      2.4.1 Performance of optical flow versus recursive optical flow     22
      2.4.2 Parameters affecting ROF's performance                        29
      2.4.3 Simulation versus measurements                                30
  2.5 Discussion                                                          30
      2.5.1 Abnormal terminations to recursive optical flow               30
      2.5.2 Effect of signal-to-noise ratio                               31
      2.5.3 Effect of low pass filtering                                  32
      2.5.4 Effect of cropping                                            32
      2.5.5 Simulation versus measurements                                32

3 Direct observations of inner-ear micromechanics: sensory epithelium,
  hair bundles, and tectorial membrane                                    34
  3.1 Abstract                                                            34
  3.2 Introduction                                                        35
  3.3 Methods                                                             36
      3.3.1 Hardware                                                      36
      3.3.2 Motion detection algorithm                                    37
      3.3.3 Verification of the motion detection system                   39
      3.3.4 Preparation                                                   39
  3.4 Results                                                             42
      3.4.1 System verification                                           42
      3.4.2 Preliminary results of lizard ear motions                     44
  3.5 Discussion                                                          52
      3.5.1 Performance characteristics of motion measurement system      52
      3.5.2 Comparison with other motion detection methods                53
      3.5.3 Cochlear mechanics of lizard                                  55
      3.5.4 Implications for micromechanics                               57
Chapter 1
Introduction
In order to expedite the publishing of this work in peer-reviewed journals, the results are presented in the form of two papers.
Thus the remaining chapters are
self-contained. The reader's assumed background is different for the two chapters,
owing to the different audiences reached by the different journals. The first paper,
"Compensation for systematic errors in motion estimates based on optical flow," is
being prepared for submission to the IEEE Transactions on Pattern Analysis and
Machine Intelligence. The second paper, "Direct observations of inner-ear micromechanics: sensory epithelium, hair bundles, and tectorial membrane," is being prepared
for submission to Hearing Research.
Chapter 2
Compensation for Systematic
Errors in Motion Estimates Based
on Optical Flow
2.1 Abstract
An algorithm based on optical flow (Horn and Schunck, 1981; Horn and Weldon,
Jr., 1988) is developed. The new algorithm eliminates the linear component of the
bias in optical flow's motion estimate. A confidence statistic for the new algorithm
is also developed. Statistical performance parameters (bias and standard deviation)
are determined from simulations for both the original optical flow algorithm and the
bias-compensated algorithm as a function of the distance the target moved and the
pictures' signal-to-noise ratio, frequency content, and size. Results of the simulations
are verified by using video images of a scene whose motion is independently known.
Our target was a 4-5 pixel diameter bead which was dark on a light, uniform
background.
The image of the bead had non-zero spectral content at all spatial
frequencies not aliased by the camera. Under typical conditions, the bias in both
algorithms is greater than the standard deviation, except for very small motions
(< 0.035 pixels). For motions less than 0.1 pixels, both algorithms had very similar
performance. In contrast, for motions greater than 0.1 pixels the bias-compensated
algorithm was superior. The bias in the standard optical flow algorithm is unbounded,
becoming large (> 0.2 pixels) for displacements greater than 2 pixels. For the
bias-compensated algorithm, the bias of estimates that satisfy the confidence statistic is always less
than 0.02 pixels.
2.2 Introduction
For a project in hearing research, we wish to measure the very small motions of inner-ear
structures. Even after magnification with a microscope, the resulting displacements
are sub-pixel. In order to measure these motions, several different algorithms
were examined. An algorithm based on optical flow (Horn and Schunck, 1981) seemed
attractive because of its computational efficiency and its ability to use information
from all the pixels. In addition, some of the inner-ear structures whose motions are
desired are very low-contrast, preventing the use of feature-based detection systems.
Nevertheless, optical flow algorithms have some serious limitations. Aggarwal
and Nandhakumar (1988) state as one of their conclusions that the optic-flow-based
approach to motion estimation has three major drawbacks: "1) it is highly noise sensitive
due to its dependence on spatio-temporal gradients, 2) it requires that motion
be smooth and small thus requiring a high rate of image acquisition, and 3) it requires
that motion vary continuously over the image."

We present a solution to the second problem stated by Aggarwal and Nandhakumar.
By compensating for the systematic errors found in optical flow, the algorithm
presented extends optical flow's dynamic range by a factor of 10. With this compensation,
larger displacements can be estimated, so the rate of image acquisition can
be reduced by a factor of 10.
2.3 Methods

2.3.1 Notation
Before proceeding, we first define the notation used. Define $A$, $B$, and $C$ to be sets
whose elements are arbitrary. The statement $f : A \subset C \to B$ defines a function $f$
whose domain is $A$, which is a subset of $C$, and whose range is $B$. If $a$ represents
an element of $A$ ($a \in A$), then $f(a)$ is an element of $B$ ($f(a) \in B$). Two functions,
$f : A \to B$ and $g : A \to B$, are said to be equal ($f = g$) if $f(a) = g(a)$ for all $a \in A$.
The set $A \times B$ is the set of all ordered pairs $(a, b)$ for which $a \in A$ and $b \in B$. We
use $\mathbb{R}$ to denote the set of all real numbers and $\mathbb{Z}$ to denote the set of all integers.
2.3.2 Optical flow
The optical flow equations are easiest to derive for continuous space and time, where
derivatives and velocities can be calculated. Following Horn and Schunck (1981),
we define a differentiable function $E$ which represents the brightness pattern on a
camera: $E : S \times T \subset \mathbb{R}^3 \to \mathbb{R}$, which maps space $(x, y)$ and time $(t)$ into scalar
brightness values. The set $S \subset \mathbb{R}^2$ represents all of the points where information on
the brightness function in space is available; the set $T \subset \mathbb{R}$, in time. The shape of
the brightness function $E$ is assumed not to change with time; it can only translate
in space. This assumption, coined "the constant brightness assumption" by Horn
and Schunck (1981), can be expressed mathematically as follows: there exists a
function $(\chi, \psi) : T \times T \to \mathbb{R}^2$, mapping two time variables into two position variables,
such that for every $(t_0, t_1) \in T \times T$ the brightness at $(x_1 + \chi(t_0, t_1),\, y_1 + \psi(t_0, t_1),\, t_1 + t_0)$
is the same as the brightness at $(x_1, y_1, t_1)$ for all $(x_1, y_1) \in S$:

\[E(x_1 + \chi(t_0, t_1),\ y_1 + \psi(t_0, t_1),\ t_1 + t_0) = E(x_1, y_1, t_1). \tag{2.1}\]

By the definition of the derivative,

\[E(x_1 + \chi(t_0, t_1),\ y_1 + \psi(t_0, t_1),\ t_1 + t_0) - E(x_1, y_1, t_1) - E_x(x_1, y_1, t_1)\,\chi(t_0, t_1) - E_y(x_1, y_1, t_1)\,\psi(t_0, t_1) - E_t(x_1, y_1, t_1)\,t_0 \to 0 \quad \text{as } t_0 \to 0,\]

where $E_x$, $E_y$, $E_t$ are the partial derivatives of $E$. Since $E$ is continuous, $(\chi, \psi)$ tends
to $(0, 0)$ as $t_0$ tends to $0$. After inserting Equation 2.1 into the above expression, we
obtain

\[E_x(x_1, y_1, t_1)\,\chi(t_0, t_1) + E_y(x_1, y_1, t_1)\,\psi(t_0, t_1) + E_t(x_1, y_1, t_1)\,t_0 \to 0 \quad \text{as } t_0 \to 0. \tag{2.2}\]

We now define the velocity in the $x$ direction $u : T \to \mathbb{R}$ as $\lim_{t_0 \to 0} \chi(t_0, t_1)/t_0$ and
the velocity in the $y$ direction $v : T \to \mathbb{R}$ as $\lim_{t_0 \to 0} \psi(t_0, t_1)/t_0$. Equation 2.2 can
then be simplified to give

\[E_x u + E_y v + E_t = 0.\]

At a particular time $t_1 \in T$ and position $(x_1, y_1) \in S$, this equation provides one
constraint on the two unknowns $(u(t_1), v(t_1))$. Assuming there is information at more
than two points, the above equation can be solved for $(u(t_1), v(t_1))$. Because data
taken with a camera are corrupted by noise, the constraint equations from more than
two points are usually inconsistent; the equations are solved using least squares by
minimizing the sum of the squares of errors $\epsilon : S \times T \to \mathbb{R}$ (Horn and Weldon, Jr.,
1988)

\[E_x u + E_y v + E_t = \epsilon \tag{2.3}\]

to give

\[\begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \iint E_x E_x \,dx\,dy & \iint E_x E_y \,dx\,dy \\ \iint E_x E_y \,dx\,dy & \iint E_y E_y \,dx\,dy \end{bmatrix}^{-1} \begin{bmatrix} \iint E_x E_t \,dx\,dy \\ \iint E_y E_t \,dx\,dy \end{bmatrix} \tag{2.4}\]

where the integrals are over $S$ and both sides are functions only of time.
Also following Horn and Schunck (1981), Equation 2.4 is then made discrete by
changing the continuous functions to discrete functions, the integrals to summations,
and the partial derivatives of brightness to first differences.
Let $G : \mathcal{C} \times \mathcal{N} \subset \mathbb{Z}^3 \to \mathbb{Z}$ represent the gray (brightness) value from the camera,
where $\mathcal{C}$ is the set of two-dimensional pixel coordinates $(i, j)$, and $\mathcal{N}$ is the set of
picture numbers $(k)$. Janesick et al. (1987) provide a detailed discussion of the
transformation between $E$ and $G$. Roughly, the spatial variables of $E$ are made
discrete into pixel coordinates in $G$; the temporal variable of $E$ is made discrete into
the picture number in $G$; and the value of $E$ is quantized to become the gray value
in $G$.
The partial derivatives of $E$ are changed to first differences of $G$. In order to
estimate the three partial derivatives at the same point, these derivatives are the
average of four first differences:

\[E_x \approx G_i = \frac{1}{4} \sum_{j'=j}^{j+1} \sum_{k'=k}^{k+1} \left[ G(i+1, j', k') - G(i, j', k') \right]\]
\[E_y \approx G_j = \frac{1}{4} \sum_{i'=i}^{i+1} \sum_{k'=k}^{k+1} \left[ G(i', j+1, k') - G(i', j, k') \right]\]
\[E_t \approx G_k = \frac{1}{4} \sum_{i'=i}^{i+1} \sum_{j'=j}^{j+1} \left[ G(i', j', k+1) - G(i', j', k) \right]\]

The two desired velocities $(u, v)$ become displacements $(\hat{x}, \hat{y}) \in \mathbb{R}^2$ by approximating
$(u, v)$ as $(\hat{x}/\delta t, \hat{y}/\delta t)$ where $\delta t = 1$ picture. Finally, the discrete form of Equation 2.4
can be written as

\[\begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} = -\begin{bmatrix} \sum G_i G_i & \sum G_i G_j \\ \sum G_i G_j & \sum G_j G_j \end{bmatrix}^{-1} \begin{bmatrix} \sum G_i G_k \\ \sum G_j G_k \end{bmatrix}\]
where the summations are over C. Hereafter, we refer to this equation as "the optical
flow algorithm" or just "optical flow."
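For concreteness, the discrete algorithm can be sketched in Python with NumPy. This is a sketch under our own conventions, not the thesis's code: array axis 0 is taken as the $i$ index, axis 1 as the $j$ index, and the function name is ours.

```python
import numpy as np

def optical_flow(G1, G2):
    """Estimate the global translation between pictures G1 and G2 using the
    discrete optical flow equation: first differences averaged over the
    2x2x2 cube of neighboring samples, then a 2x2 least-squares solve."""
    di1, di2 = np.diff(G1, axis=0), np.diff(G2, axis=0)
    dj1, dj2 = np.diff(G1, axis=1), np.diff(G2, axis=1)
    dt = G2 - G1
    # Average each first difference over the remaining index and both pictures
    Gi = 0.25 * (di1[:, :-1] + di1[:, 1:] + di2[:, :-1] + di2[:, 1:])
    Gj = 0.25 * (dj1[:-1, :] + dj1[1:, :] + dj2[:-1, :] + dj2[1:, :])
    Gk = 0.25 * (dt[:-1, :-1] + dt[:-1, 1:] + dt[1:, :-1] + dt[1:, 1:])
    # Least-squares solution of Gi*x + Gj*y + Gk = eps over all pixels
    A = np.array([[np.sum(Gi * Gi), np.sum(Gi * Gj)],
                  [np.sum(Gi * Gj), np.sum(Gj * Gj)]])
    b = np.array([np.sum(Gi * Gk), np.sum(Gj * Gk)])
    x, y = -np.linalg.solve(A, b)
    return x, y
```

For a smooth pattern translated by a small sub-pixel amount, this estimate lies close to the true shift; the bias discussed below grows with the displacement.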
The optical flow algorithm estimates displacements between two pictures of image
data; the estimated displacements are in terms of pixels. Nevertheless, the desired
displacements are those in the scene. In our particular case, video microscopy, translations in the scene are very simply related to translations in the image they produce
on the camera. Specimen translations in the plane perpendicular to the microscope's
optic axis (the xy plane), equal the translations in the image divided by the microscope's magnification M. In the more typical case of a video camera looking through a
lens at the macroscopic world, the relationship between scene translations and image
translations is more complex (Horn, 1986).
2.3.3 Recursive optical flow
As will be shown in the Results, optical flow systematically underestimates the magnitude
of large displacements. We refer to this systematic error as bias. In principle,
one could use optical flow to generate more accurate estimates of large displacements
by pre-shifting the pictures to decrease the displacement estimated by the algorithm.
Recursive optical flow uses a search algorithm to implement such a shifting procedure.
The search procedure locates shifts of the images for which the estimated
displacement is smaller than one pixel.
Estimates based on optical flow are biased even when the displacements are sub-pixel. However, for sub-pixel displacements, the bias function has a predictable structure. Recursive optical flow exploits that structure. Optical flow estimates are generated for three different shifts of the images, all chosen to minimize the magnitudes
of the resulting estimates. These three estimates are then combined to eliminate the
linear component of the bias.
Lastly, recursive optical flow computes a confidence statistic.
Each optical flow
estimate is based on a least-mean-square solution to equations that are inconsistent
because of noise. The confidence statistic is based on the extent to which the estimated motion reduces the mean square error. The computed confidence statistic can
be used to make later signal processing stages more robust against occasional errors
in motion estimation.
Shifting the images
To determine how to shift the images to reduce the estimated displacement, we take
advantage of the fact that when optical flow works at all, the estimate's bias is
less than the true displacement. The important implication of this fact is that the
14
direction of the displacement estimate based on optical flow is correct. Therefore, we
can use optical flow to determine the best shifts. In other words, optical flow just
has to estimate the direction of motion correctly in order for recursive optical flow to
obtain a good estimate of both the motion's magnitude and direction.
First, the optical flow algorithm is used to obtain estimates $\hat{x}_1$ and $\hat{y}_1$ of the
displacement between the scenes in picture one, $G(i, j, k_1)$, and picture two, $G(i, j, k_2)$.
If the estimate indicates motion in the positive/negative $x$ direction, we shift the
second image one pixel in the negative/positive $x$ direction. Similarly, if the estimate
indicates motion in the positive/negative $y$ direction, we shift the second image one
pixel in the negative/positive $y$ direction. We denote these shifts as $x_{\text{off}2} = \hat{x}_1/|\hat{x}_1|$
and $y_{\text{off}2} = \hat{y}_1/|\hat{y}_1|$.

Optical flow is then used to estimate the displacement between picture one,
$G(i, j, k_1)$, and the shifted picture two, $G(i - x_{\text{off}2},\, j - y_{\text{off}2},\, k_2)$. We refer to the
result $(\hat{x}_2, \hat{y}_2)$ as the "local estimate" of displacement at $(x_{\text{off}2}, y_{\text{off}2})$. It can be
used to compute a new estimate of the displacement between the original pictures,
$X_2 = \hat{x}_2 + x_{\text{off}2}$ and $Y_2 = \hat{y}_2 + y_{\text{off}2}$. However, the whole process can be repeated to
generate possibly better estimates. New offsets are computed, $x_{\text{off}3} = x_{\text{off}2} + \hat{x}_2/|\hat{x}_2|$
and $y_{\text{off}3} = y_{\text{off}2} + \hat{y}_2/|\hat{y}_2|$, along with new local estimates $(\hat{x}_3, \hat{y}_3)$. The process is
repeated until both local estimates $\hat{x}_n$ and $\hat{y}_n$ change signs in the last two iterations.
We think about the search procedure described above as searching over a $(x_{\text{off}}, y_{\text{off}})$
space, associating local estimates $(\hat{x}, \hat{y})$ with each of these points. As the search
proceeds, the magnitude of the local displacement estimate, $\sqrt{\hat{x}^2 + \hat{y}^2}$, tends to decrease,
and the bias in the optical flow estimate similarly decreases.
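The search loop above can be sketched as follows. This is a simplified sketch, not the thesis's implementation: `np.roll` wraps at the image borders where a real implementation would crop, and the stopping test compares only the last two iterations, as in the text.

```python
import numpy as np

def optical_flow(G1, G2):
    # Discrete optical flow for a single global shift (as derived above).
    di1, di2 = np.diff(G1, axis=0), np.diff(G2, axis=0)
    dj1, dj2 = np.diff(G1, axis=1), np.diff(G2, axis=1)
    dt = G2 - G1
    Gi = 0.25 * (di1[:, :-1] + di1[:, 1:] + di2[:, :-1] + di2[:, 1:])
    Gj = 0.25 * (dj1[:-1, :] + dj1[1:, :] + dj2[:-1, :] + dj2[1:, :])
    Gk = 0.25 * (dt[:-1, :-1] + dt[:-1, 1:] + dt[1:, :-1] + dt[1:, 1:])
    A = [[np.sum(Gi * Gi), np.sum(Gi * Gj)], [np.sum(Gi * Gj), np.sum(Gj * Gj)]]
    return tuple(-np.linalg.solve(A, [np.sum(Gi * Gk), np.sum(Gj * Gk)]))

def search_offsets(G1, G2, max_iter=16):
    """Step integer offsets (x_off, y_off) in the direction of each local
    optical flow estimate until both local estimates change sign between
    the last two iterations."""
    xoff = yoff = 0
    prev = None
    for _ in range(max_iter):
        G2s = np.roll(G2, (-xoff, -yoff), axis=(0, 1))  # undo the offset
        ex, ey = optical_flow(G1, G2s)
        if prev is not None and ex * prev[0] < 0 and ey * prev[1] < 0:
            return xoff, yoff, ex, ey   # local estimates bracket zero
        prev = (ex, ey)
        xoff += int(np.sign(ex))
        yoff += int(np.sign(ey))
    raise RuntimeError("abnormal termination: no sign change within max_iter")
```

The sum of the returned offset and local estimate is the uncorrected total displacement estimate; the three smallest local estimates then feed the bias-elimination step described next.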
Elimination of the linear component of bias
In this section we show how optical flow estimates from differently shifted images can
be used to eliminate the linear component of bias. Our method exploits structure
in the bias found in optical flow estimates of small displacements. Therefore, we
apply this method to estimates based on the optimum shift found in the previous
section as follows. Define the estimates and offsets of the last two iterations to
be $(\hat{x}_\alpha, \hat{y}_\alpha)$, $(\hat{x}_\beta, \hat{y}_\beta)$ and $(x_{\text{off}\alpha}, y_{\text{off}\alpha})$, $(x_{\text{off}\beta}, y_{\text{off}\beta})$, respectively. These offsets are
oriented diagonally in $(x_{\text{off}}, y_{\text{off}})$ space because $x_{\text{off}\beta} = x_{\text{off}\alpha} \pm 1$ and $y_{\text{off}\beta} = y_{\text{off}\alpha} \pm 1$.
Compute estimates from the other diagonal such that the four offsets form a square:
i.e., calculate the estimates $(\hat{x}_\gamma, \hat{y}_\gamma)$ from the offsets $(x_{\text{off}\gamma}, y_{\text{off}\gamma}) = (x_{\text{off}\alpha}, y_{\text{off}\beta})$ and the
estimates $(\hat{x}_\delta, \hat{y}_\delta)$ from the offsets $(x_{\text{off}\delta}, y_{\text{off}\delta}) = (x_{\text{off}\beta}, y_{\text{off}\alpha})$. Then choose the three
estimates whose distances $\sqrt{\hat{x}^2 + \hat{y}^2}$ are smallest and use them in the estimation
algorithm. Since optical flow is unbiased when there is no motion, the three smallest
are chosen under the assumption that their biases are the smallest.
The estimation algorithm uses the three estimates to eliminate the linear component of the bias as follows¹. The optical flow estimates can be considered to be a
differentiable function $(\hat{x}, \hat{y}) : M \subset \mathbb{R}^2 \to \mathbb{R}^2$, mapping a two-dimensional space of
possible motions $(m_x, m_y) \in M$ to a two-dimensional space of corresponding estimates.
As will be shown in the Results, optical flow gives an unbiased estimate when there
is no motion: $(\hat{x}, \hat{y})(0, 0) = (0, 0)$. In contrast, for nonzero motions $(m_x, m_y)$, optical
flow does not give the correct answer: $(\hat{x}, \hat{y})(m_x, m_y) - (m_x, m_y) \neq 0$. Since the bias is
a non-constant function of the motion, $(\hat{x}, \hat{y})(m_x - x_{\text{off}}, m_y - y_{\text{off}}) - (m_x - x_{\text{off}}, m_y - y_{\text{off}})$
is a non-constant function of the offsets $(x_{\text{off}}, y_{\text{off}})$. The bias is generally closest to
zero when $(m_x - x_{\text{off}}, m_y - y_{\text{off}})$ is closest to $(0, 0)$. By exploiting additional structure
in the bias, we can combine information from multiple sets of optical flow estimates
to develop an estimate that is superior to that obtained from any shift of an integer
number of pixels.
Five properties of optical flow are used in the formulation of recursive optical flow's
motion estimate:

\[(\hat{x}, \hat{y})(0, 0) = (0, 0)\]
\[\hat{x}(m_x - x', 0) \neq 0 \quad \forall\, x' \text{ s.t. } 0 < |m_x - x'| < 1\]
\[\hat{y}(0, m_y - y') \neq 0 \quad \forall\, y' \text{ s.t. } 0 < |m_y - y'| < 1\]
\[\left| \frac{\partial \hat{x}}{\partial m_y}(m_x - x', m_y - y') \right| < \left| \frac{\partial \hat{x}}{\partial m_x}(m_x - x', m_y - y') \right| \quad \forall\, x', y' \text{ s.t. } |m_x - x'| < 1,\ |m_y - y'| < 1\]
\[\left| \frac{\partial \hat{y}}{\partial m_x}(m_x - x', m_y - y') \right| < \left| \frac{\partial \hat{y}}{\partial m_y}(m_x - x', m_y - y') \right| \quad \forall\, x', y' \text{ s.t. } |m_x - x'| < 1,\ |m_y - y'| < 1\]

where $\forall$ stands for "for all", and s.t. stands for "such that".

¹For simplicity, we ignore noise in the optical flow estimates $(\hat{x}, \hat{y})$. The argument for the noisy
case is similar except that the expected values of the now stochastic functions must be used.
The first and second
properties state that when the $y$ displacement is 0 and the $x$ displacement is less than
1, the $x$ estimate is nonzero except at $x' = m_x$. The first and third properties make
the analogous statement for the $y$ estimate. The fourth property states that optical
flow's $x$ estimate is more sensitive to motions in the $x$ direction than to motions
in the $y$ direction when the motion is less than 1 pixel. The fifth is the analogous
property of the $y$ estimate. The first property defines our goal: if a unique point
$(c, d) \in M$ exists such that $(\hat{x}, \hat{y})(m_x - c, m_y - d) = (0, 0)$, then $(m_x, m_y) = (c, d)$.
The other four properties ensure that a unique point $(c, d)$ does exist. Unfortunately,
the ordered pairs $(x', y')$ used in optical flow are constrained to be integer offsets of the
pictures. In order to estimate the desired values $(c, d)$, the optical flow estimates at
the three given offsets are linearly interpolated to give $(\bar{x}, \bar{y}) \approx (c, d)$. In other words,
$(\hat{x}, \hat{y})(m_x - x', m_y - y')$ is approximated by two planes, defined by the optical flow
estimates at the three given offsets, and the point where those two planes intersect
the plane $z = 0$ is the desired estimate $(\bar{x}, \bar{y}) \approx (c, d)$. By the fourth and fifth
properties from above, the plane approximations to $(\hat{x}, \hat{y})(m_x - x', m_y - y')$ intersect
in a line. The stopping criterion constrains this line of intersection to cross the plane
$z = 0$ by making the optical flow estimates change sign for the different offsets.
Consequently, the final estimate $(\bar{x}, \bar{y})$, which equals this point of intersection, is
unique. The second and third properties are required to show that $(c, d)$ is also
unique, but this statement will not be proven here.
In summary, if the final three sets of estimates and offsets are $(\hat{x}_i, \hat{y}_i, x_{\text{off}i}, y_{\text{off}i})$
for $i \in \{q, r, s\}$, then the equation of the $x$ bias plane is determined from

\[\begin{vmatrix} x - x_{\text{off}q} & y - y_{\text{off}q} & z - \hat{x}_q \\ x_{\text{off}r} - x_{\text{off}q} & y_{\text{off}r} - y_{\text{off}q} & \hat{x}_r - \hat{x}_q \\ x_{\text{off}s} - x_{\text{off}q} & y_{\text{off}s} - y_{\text{off}q} & \hat{x}_s - \hat{x}_q \end{vmatrix} = 0.\]

The above equation states that the volume (triple scalar product) of the parallelepiped
formed by the three row vectors is 0; i.e., the vectors lie in the same plane. A similar
equation is found for the $y$ bias plane. The system of three linear equations in three
unknowns comprised of the two bias-plane equations and the equation $z = 0$ is then solved
using Cramer's rule to give $(\bar{x}, \bar{y}, 0)$. The total motion estimates thus equal
$\bar{x} + x_{\text{off}}$ and $\bar{y} + y_{\text{off}}$, respectively.
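Numerically, the same result can be obtained without expanding the determinants by hand: fit each bias plane through its three points by solving a small linear system, then intersect both planes with $z = 0$. The sketch below uses hypothetical data and expresses the planes in absolute offset coordinates, so the zero crossing it returns is the total motion estimate directly.

```python
import numpy as np

def plane_zero_crossing(points):
    """points: three (x_off, y_off, ex, ey) tuples from differently shifted
    images.  Fit a plane to ex over (x_off, y_off) and another to ey, then
    solve for the point where both planes cross z = 0."""
    P = np.asarray(points, dtype=float)
    A = np.c_[P[:, 0], P[:, 1], np.ones(3)]    # rows: [x_off, y_off, 1]
    ax, bx, cx = np.linalg.solve(A, P[:, 2])   # ex ~ ax*x + bx*y + cx
    ay, by, cy = np.linalg.solve(A, P[:, 3])   # ey ~ ay*x + by*y + cy
    # Intersect both planes with z = 0: a 2x2 linear system in (x, y)
    M = np.array([[ax, bx], [ay, by]])
    x, y = np.linalg.solve(M, [-cx, -cy])
    return x, y
```

When the bias really is linear over the square of offsets, this interpolation recovers the motion exactly; the residual error of the full algorithm comes from the higher-order terms.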
Confidence statistic
The recursive optical flow (ROF) algorithm computes a confidence statistic along with
the motion estimates. The statistic uses the ratio of the mean square error per pixel
($\epsilon^2$ from Equation 2.3) given the motion estimates $(\hat{x}, \hat{y})$ to the mean square error
per pixel assuming $(\hat{x}, \hat{y})$ equal $(0, 0)$. This ratio's value is near 1 when the motion
estimates do not significantly reduce the mean square error. In contrast, the ratio's
value can be very close to 0 when the motion estimates reduce the mean square error.
This ratio is computed at the offset not used from the four offsets calculated at the
end of the search algorithm. This offset had the largest estimates of the four, yet it
is still a reasonable estimate of the motion. One of the other three offsets is not used
because if the motion were very close to an integer number of pixels, the ratio of the
error assuming the estimated motion to the error assuming no motion would be very
close to 1 for the estimate whose offsets equaled the motion. Consequently, the use
of the fourth offset should provide the best "contrast" between correct estimates and
poor estimates. A threshold of 0.65 has been empirically determined from a variety
of simulations and measurements. Estimates whose confidence statistic is greater than
or equal to 0.65 are deemed to be poor and are not used. Unlike the results for the
optical flow algorithm, where the data from every trial are presented, the only results
presented for the ROF algorithm are those which had a confidence statistic less than
0.65.
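The ratio itself is simple to compute from the first-difference images; a minimal sketch (the function name and argument order are ours):

```python
import numpy as np

def confidence_ratio(Gi, Gj, Gk, x, y):
    """Mean-square error per pixel given the motion estimate (x, y), divided
    by the mean-square error assuming no motion (Equation 2.3).  Near 1 when
    the estimate explains nothing; near 0 when it explains the data well."""
    e_est = Gi * x + Gj * y + Gk   # residual with the estimated motion
    e_zero = Gk                    # residual assuming (x, y) = (0, 0)
    return np.mean(e_est ** 2) / np.mean(e_zero ** 2)
```

Estimates with a ratio at or above the empirical 0.65 threshold would be discarded.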
The recursive optical flow (ROF) algorithm has two abnormal termination situations.
First, it allows a maximum of 16 search iterations. If the 16th iteration
is reached, the search stops and reports an error message. Nevertheless, the algorithm
reports the last optical flow estimate $(\hat{x}_n + x_{\text{off}n}, \hat{y}_n + y_{\text{off}n})$ and adds 1 to the confidence
statistic. Second, ROF terminates if the number of pixels left in the least squares
minimization is less than 100. As has been determined empirically, below 100 pixels
the confidence statistic becomes less reliable². This reduction in reliability is probably
due to the breakdown of assumptions about the noise processes present, such as the
law of large numbers. In this case, ROF reports an error message along with the
optical flow estimate $(\hat{x}_n + x_{\text{off}n}, \hat{y}_n + y_{\text{off}n})$ and adds 2 to the confidence statistic. The
linear bias correction is not performed in either case because the stopping criterion
was not met. Because optical flow's estimates are nonlinearly related to the motion,
the estimate planes may find the wrong zero crossing.
2.3.4 Simulation

Most of the data in this paper are from simulations. The simulations allow parametric
studies to be done with the confidence that only the desired parameters change.

The simulation involved first finding suitable simulated images with which to
test the algorithm. The simulated data attempted to match the space and frequency
content in an image of a 0.3 μm diameter TiO₂ microsphere (hereafter called a "bead")
seen through a microscope (Zeiss Axioplan, New York) with a 40×, 0.75 NA objective
and a total magnification of 100× (see Figure 2-1). A radially symmetric Hanning
window of proper darkness and diameter was selected to represent the bead on a
uniform, bright background. After shot (Poisson) noise, simulating photon noise, is
added to both the original and shifted pictures, the data are processed by the motion
detection algorithm³. Each data point represents the average of 100 simulations.
²Typically, at least 1000 pixels are used in the estimates.
³At typical light levels, photon noise is by far the dominant noise source.
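A bead picture of this kind can be sketched as follows. The sizes and gray levels below are illustrative guesses, not the thesis's exact settings.

```python
import numpy as np

def simulated_bead(n=32, diameter=5.0, background=2200.0, depth=1200.0,
                   shift=(0.0, 0.0), seed=0):
    """Dark, radially symmetric Hanning-window bead on a uniform bright
    background, translated by `shift` pixels, with Poisson (shot) noise."""
    i = np.arange(n, dtype=float)
    X, Y = np.meshgrid(i, i, indexing="ij")
    r = np.hypot(X - n / 2 - shift[0], Y - n / 2 - shift[1])
    # Hanning window: 1 at the bead center, falling to 0 at radius diameter/2
    bead = np.where(r < diameter / 2,
                    0.5 * (1.0 + np.cos(2.0 * np.pi * r / diameter)), 0.0)
    clean = background - depth * bead
    return np.random.default_rng(seed).poisson(clean).astype(float)
```

Generating the picture pair with two different sub-pixel `shift` values and feeding both through the motion estimator reproduces the structure of one simulation trial.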
2.3.5 Measurements

In order to verify that the simulations well represent pictures taken with our camera
(Photometrics 200 series, Tucson, AZ), measurements were also performed.

The specimen used in the measurements is a 0.3 μm diameter TiO₂ bead mounted
between a pair of cover slips using the mounting adhesive Mount Quick (Electron Microscopy
Sciences) (see Figure 2-1). A piezoelectric stack (11 grams) is attached on
one side to an aluminum block (188 grams) and on the other side to a copper disk to
which the specimen (0.18 grams) is glued perpendicularly (Patil, 1989). Because of
the relative masses, motion of the piezo moves only the specimen. The distance the
specimen moved was measured using a detector (Angstrom Resolver, Opto Acoustics
Sensors, Raleigh, NC) that senses the distance between the tip of its fiber-optic probe
and a reflecting surface using an optic lever technique (Cook and Hamm, 1979). The
fiber-optic probe is mounted to the aluminum block and looks through the piezoelectric
stack to the copper disk. The specimen-piezo system is positioned so that
displacements of the stack cause motions of the image in the x direction. The fiber-optic
probe detects motions in the x direction, but gives no estimate of motions in
the orthogonal directions.
The output voltage of the fiber-optic detector is linearly related to motions of the
target bead over the small range of displacements generated by the stack, but its
sensitivity (m/volt)
was not known. The sensitivity was therefore calibrated using
the video microscope system. Images of the bead for zero motion and for maximal
motion were compared.
In order to better see the displacement of the bead, the
pictures were upsampled by a factor of 10 by inserting zeros between the data points
and low pass filtering the resulting picture. The pictures were then upsampled by
an additional factor of 7 using bilinear interpolation.
The pictures were displayed
in rapid succession to make the motion more apparent.
In order to estimate the
displacement, one of the pictures was shifted by an integral number of pixels. When
the apparent motion between the pictures was minimized, the amount of shift applied
to the picture was interpreted as the displacement. This process was repeated for 9
sets of data, and the average displacement was 2.26 pixels, with a standard error of
0.004 pixels (0.2% of the average).
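The zero-insertion upsampling step can be sketched with an ideal frequency-domain low-pass filter standing in for whatever filter was actually used. This is a sketch, not the thesis's implementation, and the additional bilinear ×7 stage is omitted.

```python
import numpy as np

def upsample(img, factor):
    """Upsample `img` by an integer factor: insert zeros between samples,
    then apply an ideal low-pass filter in the frequency domain to
    interpolate.  The gain of factor**2 restores the original amplitude."""
    n0, n1 = img.shape
    up = np.zeros((n0 * factor, n1 * factor))
    up[::factor, ::factor] = img           # zero insertion
    F = np.fft.fftshift(np.fft.fft2(up))
    mask = np.zeros_like(F)
    c0, c1 = n0 * factor // 2, n1 * factor // 2
    h0, h1 = n0 // 2, n1 // 2              # keep only the original band
    mask[c0 - h0:c0 + h0, c1 - h1:c1 + h1] = factor ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

For a band-limited input, the upsampled picture passes exactly through the original samples; for real camera pictures the interpolation is approximate near the Nyquist frequency.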
[Figure 2-1 appears here: left panel, gray value (roughly 1000-2500) versus x pixel number (0-32); right panel, DFT magnitude versus x spatial frequency.]
Figure 2-1: Comparison of simulated (solid lines) and measured (dotted lines) TiO 2
bead. The left panel displays the gray values along a line which runs through the
center of the bead in the 32x32 pixel pictures. Since the beads are rotationally
symmetric, the other lines through the center of the bead look similar, with the
exception that the noise is different. The right panel displays data from the 2D
DFT of the pictures used in the left panel. The panel plots the magnitude of the
x spatial frequency while the y spatial frequency is held at 0 (DC). The measured
bead picture has been given the same DC value as the simulated picture for ease of
comparison. The dashed line represents the typical noise level from the shot noise
in the simulations. On the magnitude plot, the point off the plot's scale is the DC value of the pictures, which is 32 × 32 × 2200 ≈ 2 × 10⁶. The simulated and
measured beads have very similar spatial and frequency content.
Quasi-static tests are done as follows. A first picture is taken with the specimen
in one fixed position, and the voltage from the optic-lever probe is recorded. Then
the specimen is moved by the piezoelectric stack. A second picture is taken and the
probe's new voltage is recorded. The two pictures are normalized to have the same
average gray value, and then a background picture previously taken is subtracted from
them. The background picture is taken with the bead not visible, and it is subtracted from the other two pictures with the intention of obtaining pictures of beads on a
uniform background. These two background-subtracted pictures are processed by the
motion detection algorithms and compared to the simulations.
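A minimal sketch of this preprocessing step, assuming the normalization equalizes the mean gray value (the thesis does not say exactly how the averages are matched); the function name is illustrative.

```python
import numpy as np

def preprocess_pair(pic1, pic2, background):
    """Normalize two pictures to the same average gray value, then
    subtract a previously taken, bead-free background picture from each,
    approximating a bead on a uniform background."""
    pic1 = pic1.astype(float)
    pic2 = pic2.astype(float)
    # Scale the second picture so both have the same mean brightness;
    # this suppresses flash-to-flash intensity variations.
    pic2 = pic2 * (pic1.mean() / pic2.mean())
    return pic1 - background, pic2 - background
```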
2.4 Results

2.4.1 Performance of optical flow versus recursive optical flow
Figure 2-2 shows typical results for both the optical flow and recursive optical flow
algorithms. For motions less than 0.1 pixels (2% of the bead diameter), the algorithms have nearly the same performance. In contrast, for motions greater than 0.1 pixels, the ROF algorithm has much less bias than the optical flow algorithm.
The bias in optical flow becomes poor for motions greater than 1.5 pixels. For motions greater than about 4.4 pixels, optical flow does not consistently identify the correct direction of the motion.
In contrast, ROF's bias remains small even out to 8 pixels (two bead diameters).
For motions less than 4.4 pixels, ROF always found the bead; i.e., the confidence
statistic was not used. For motions greater than 4.4 pixels, where even the direction
of the optical flow estimate was often incorrect, ROF only sometimes found the bead.
Nevertheless, the performance statistics remain good past 4.4 pixels because the confidence statistic eliminated the trials in which ROF did not find the bead. The bias
in ROF is periodic with a period of 1 pixel.
2.4.2 Parameters affecting ROF's performance
Factors such as signal-to-noise ratio, low pass filtering, and cropping affect the
performance of both the optical flow algorithm and the ROF algorithm. Even though
ROF is based on optical flow, these factors do not necessarily affect the algorithms
equally. For example, if a factor linearly changes (for better or worse) the bias in
optical flow, the bias in ROF should not change because the linear component of the
bias is eliminated.
Figure 2-2: Optical flow (dashed line) and recursive optical flow's (solid line) performance statistics versus the distance the image moved: simulation. A simulated bead
image was generated and shifted in the x direction, and the resulting pictures were
analyzed by both algorithms. The bias (left panel) and standard deviation (right
panel) are for the estimate in the x direction. Bias is calculated by the sample mean
of the estimated motion minus the distance moved. Standard deviation (s.d.) is the
square root of the sample variance (See e.g., (Drake, 1988)). Both the bias and the
standard deviation have to be small for a good estimator. The simulated light level
in the pictures represented the typical light level in pictures taken with our camera,
resulting in a 50 dB signal-to-noise ratio (SNR). There was no low pass filtering
performed on the pictures, which were 32 pixels square.
Signal-to-noise ratio
The dominant noise source at the light levels being used is shot noise from the quantum nature of light. Consequently, we assume that all of the noise in the pictures
is Poisson (shot), whose variance equals its mean.
Signal-to-noise ratio (SNR) is measured in dB and is defined to be

SNR = 20 × log₁₀(√(# of photons striking the CCD's pixel)).
The number of photons striking the CCD is related to the number of hole-electron
pairs created in the pixel by a Bernoulli process. The Bernoulli process's probability of
hole-electron generation is called the quantum efficiency of the CCD (Janesick et al.,
1987). A Bernoulli selection of a Poisson process is Poisson (Drake, 1988); therefore,
the signal-to-noise ratio can also be stated in terms of the number of electrons in the
CCD's pixel. After factoring in our CCD camera's conversion factor of 46.1 electrons
per gray value (Photometrics, 1993), the signal-to-noise ratio can be restated as
SNR = 10 × log₁₀(gray value × 46.1).
A typical picture's brightness has a SNR of 50 dB (average gray value of 2200 out of
4095).
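The shot-noise relation above is easy to check numerically; the conversion factor of 46.1 electrons per gray value is from the text, and the function name is illustrative.

```python
import math

def snr_db_from_gray_value(gray_value, electrons_per_gray=46.1):
    """Shot-noise-limited SNR in dB for a CCD pixel.

    With Poisson statistics the noise standard deviation is the square
    root of the mean electron count N, so SNR = N / sqrt(N) = sqrt(N),
    i.e., 20*log10(sqrt(N)) = 10*log10(N) in dB."""
    electrons = gray_value * electrons_per_gray
    return 10.0 * math.log10(electrons)
```

For the typical average gray value of 2200, this gives about 50 dB, matching the figure quoted in the text.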
As shown in Figure 2-3, the standard deviations of the motion estimates from optical flow and ROF are nearly the same when the optical flow estimates are restricted to motions smaller than 1 pixel.⁴ The standard deviations decrease with increasing SNR, with a slope of about -10 dB/decade. Since both algorithms are unbiased with no motion (Figure 2-2), increasing the SNR greatly improves both algorithms' performance for very small motions (< 0.035 pixels, or 0.8% of the simulated bead's diameter, under the conditions stated in Figure 2-2).
Even though ROF's standard deviation is sensitive to the SNR, the bias is not. The maximum bias in ROF, as a function of the SNR, is bounded by 0.01 pixels from below and by 0.02 pixels from above. For signal-to-noise ratios above 45 dB the bias is the largest source of error in ROF. Accordingly, the combined errors for ROF do not appreciably change for signal-to-noise ratios above 45 dB, except for very small motions (< 0.035 pixels).

⁴As shown in Figure 2-2, optical flow's standard deviation increases with larger motions.
In contrast to ROF's well-behaved bias characteristics, the bias in optical flow is
complex. First, the bias is unbounded: for large motions (> 4.4 pixels) optical flow
estimates the motion to be near 0. Accordingly, a family of bias curves is plotted
where the maximum motion considered is varied.
For motions between 0 and 1
pixel, optical flow's bias decreases monotonically with increasing SNR. The amount of improvement is large at low SNR, but it becomes small at high SNR. The situation
is more complex for motions between 0 and 2 pixels. The bias curve has a minimum
at 44 dB SNR. This odd behavior is caused by the switching of the maximum bias
point from the positive bias peak to the initial negative bias peak (See Figure 2-2).
Decreasing the signal-to-noise ratio generally causes optical flow to decrease the magnitude
of its motion estimate. Reducing the SNR down to 44 dB reduces the height of the
positive bias peak. Below 44 dB, the first negative bias peak overtakes the positive
peak and becomes the maximum bias point. For motions between 0-3, 0-4, and 0-8 pixels,
the bias is poor. The bias curves do not change much with SNR because the estimates
even at the highest SNR are close to 0.
In conclusion, without low pass filtering, optical flow's bias is the dominant error
source, regardless of the signal-to-noise ratio. Even with the bias-compensated ROF
algorithm, the bias is the dominant error source until 6 dB below the typical lighting
level. Consequently, averaging many estimates will not significantly improve either
algorithm's performance, except for very small motions (< 0.035 pixels).
Low pass filtering
Errors caused by the spatial derivative approximations can lead to biased motion estimations. Nevertheless, the errors in the first difference approximation of the spatial
derivatives approach 0 as the maximum spatial frequency in the picture approaches
0.

Figure 2-3: Optical flow's and recursive optical flow's performance statistics versus the images' signal-to-noise ratio: simulation. The curve labeled R is the magnitude of the maximum bias for motions between 0 and 8 pixels for recursive optical flow. Since optical flow's bias is unbounded, five bias curves are shown with differing maximum motions (the top five curves). These curves show the maximum bias for motions between 0 and the number labelling the curve. The bottom two curves, labelled s.d., are the standard deviations of the recursive optical flow algorithm and the optical flow algorithm for motions between 0 and 1 pixel. As shown in Figure 2-2, optical flow's standard deviation increases for larger motions. The pictures were 32 pixels square and had no low-pass filtering.

Thus, low-pass filtering the pictures before using optical flow or ROF may reduce the algorithm's bias. On the other hand, this low-pass filtering reduces information
content in the pictures which may increase the standard deviation.
This trade-off
between reducing bias and increasing the standard deviation is shown in Figure 2-4.
The standard deviations of the two algorithms are nearly the same when the optical flow estimates are restricted to motions smaller than 1 pixel. For cutoff frequencies less than or equal to 0.5π, the standard deviations decrease with an increase in the low pass filter's cutoff frequency, with a slope of about -20 dB/decade. The standard deviation decreases slightly from the 0.5π cutoff frequency point to the no-low-pass-filtering point. Thus, low pass filtering with a cutoff frequency less than 0.5π reduces both algorithms' very-small-motion (< 0.035 pixels) performance, where the standard deviation is the dominant error source.
The bias in ROF drops by a factor of two, from 0.016 pixels to 0.008 pixels, when changing from no low pass filtering to low pass filtering with a cutoff frequency of 0.5π. For greater low pass filtering, ROF's bias slowly gets worse. Nevertheless, the standard deviation is the dominant error source in this region; the bias is insignificant. The optimal low pass filtering for ROF has a cutoff frequency near π/2.
For motions between 0 and 1 pixel, optical flow's bias decreases monotonically with decreasing cutoff frequency and is comparable to ROF's bias at a cutoff frequency of π/16. The bias and standard deviation are equal near a cutoff frequency of π/4. For motions between 0 and 2 pixels, the low pass filtering has to be severe (0.09π) before the bias and standard deviation are equal. For motions greater than 2 pixels, the bias decreases with decreasing cutoff frequency, but not enough to cross the standard deviation curve over the range of low pass filters tested.
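The low pass filter used in these simulations (Figure 2-4) is described as a radially symmetric Hanning window applied by direct convolution, with the result cropped by the filter radius. The following is a sketch under those assumptions; normalizing the kernel to unit sum is my addition, so that filtering preserves the average gray value.

```python
import numpy as np

def radial_hanning_kernel(diameter):
    """Radially symmetric Hanning (raised-cosine) window of the given
    diameter, normalized to unit sum."""
    r = diameter / 2.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    rad = np.sqrt(x * x + y * y)
    k = np.where(rad <= r, 0.5 * (1 + np.cos(np.pi * rad / r)), 0.0)
    return k / k.sum()

def lowpass(picture, diameter=16):
    """Direct 2-D convolution with the Hanning kernel, followed by
    cropping the output by the filter radius on every side (to avoid
    edge artifacts, as described for Figure 2-4)."""
    k = radial_hanning_kernel(diameter)
    r = diameter // 2
    h, w = picture.shape[0] - 2 * r, picture.shape[1] - 2 * r
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(picture[i:i + 2 * r + 1, j:j + 2 * r + 1] * k)
    return out
```

The explicit loops keep the convolution direct (no FFT), at the cost of speed; a constant input picture passes through unchanged because the kernel sums to one.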
Figure 2-4: Recursive optical flow's performance statistics versus the amount of low pass filtering applied to pictures: simulation. The curves are the same as those defined in Figure 2-3. The horizontal axis plots the 3 dB point of the digital low pass filter. The low pass filter is a radially symmetric Hanning window with a 16 pixel diameter, except for the π/16 data point, where a 24 pixel diameter was used. The data point at π represents no low-pass filtering, instead of a low pass filter with its 3 dB point at π. To prevent ringing artifacts, the filtering was performed using direct convolutions and the resulting pictures were cropped by the radius of the low pass filter. The final picture size was 32 pixels square, and had an average signal-to-noise ratio of 50 dB.

Cropping

In some cases, one would like to estimate the motion of an object which is moving with respect to a stationary background. For example, the target may be an airplane traveling through the sky or a hair bundle in the inner ear when no other structures are in focus. In these cases, the performance of a motion detection algorithm may be affected by how much background is in the pictures. For example, if the pictures are
cropped close to the moving target, the algorithm's performance may improve. On the other hand, this cropping requires user intervention to determine the cropping size.
As shown in Figure 2-5, the optical flow algorithm is sensitive to cropping. The
bias is dramatically reduced as the picture size is reduced from 128x128 pixels to
16x16. In contrast, Figure 2-5 also shows that ROF's performance actually improves
slightly in the larger pictures.
Figure 2-5: Effect of cropping on the bias in recursive optical flow (left panel) and
optical flow (right panel): simulation. The four curves represent four picture sizes
(effectively the amount of background) from 128x128 pixels to 16x16 pixels. The
pictures' signal-to-noise ratio is 50 dB, and no low pass filtering was performed. Note
the different bias scales for the two panels.
2.4.3 Simulation versus measurements
Figure 2-6 compares the simulations to measurements of a TiO₂ bead. Both algorithms performed worse in the measurements than in the simulations. Nevertheless, ROF still had vastly better performance than optical flow. The bias in the measurements for ROF is bounded by 0.07 pixels, a factor of 5 worse than in the simulations.
The maximum bias in the measurements for optical flow is 0.7 pixels. Optical flow's
bias appears to be unbounded in the measurements, as it was in the simulations. The
maximum standard deviations for the algorithms were around 0.015 pixels, a factor of 2 larger than in the simulations.
Figure 2-6: Comparison of the bias in optical flow (left) and ROF (right) in simulations and measurements. The box plots represent the measurements. The box represents the interquartile range, the horizontal line represents the median value, and the vertical line represents the range of estimates from the 100 trials. The solid curve represents the bias in the simulations. The pictures were 32x32 pixels, had an average signal-to-noise ratio of 50 dB, and had no low-pass filtering. Note the different vertical scales on the optical flow and the ROF plots.
2.5 Discussion

2.5.1 Abnormal terminations to recursive optical flow
The recursive optical flow (ROF) algorithm has two abnormal termination situations.
First, it only allows a maximum of 16 search iterations. This termination situation is
needed when the displacement is so large that optical flow does not correctly identify
the direction of the motion. In these cases, the motion estimates are random, sometimes in the proper direction and sometimes not. Those first random guesses which
are "lucky" cause the beads to overlap enough that the algorithm gets very close to
the correct answer. The first guesses which are "unlucky" cause the algorithm to
wander aimlessly, depending on the noise. The stopping criterion is that both the x
and y estimates must change sign in the last two estimates. If the pictures contain
only noise on a uniform background, this stopping criterion is met with a probability
of 1/4 for every estimate after the first. Since the probability of stopping on or before
the nth estimate of this Bernoulli process is 1 - (3/4)^(n-1), there is a non-zero probability that the stopping criterion will not be met before there is no overlap between the pictures!
To prevent this anomaly, no more than 16 iterations are allowed in
the search process. Regardless of what causes the ROF algorithm to stop when its
first guesses are "unlucky", those estimates are thrown out because their confidence
statistic reflects the poor match. Consequently, even though for very large motions
the algorithm only finds the bead by having a "lucky" first guess, the performance
parameters are still very good because of the confidence statistic. For motions smaller
than 4.4 pixels, ROF always found the bead; i.e., no data were eliminated because of
the confidence statistic.
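The geometric stopping probability quoted above can be worked out in one line; the function name is illustrative.

```python
def p_stopped_by(n, p_stop=0.25):
    """Probability that the sign-change stopping criterion has been met
    on or before the nth estimate, for pure-noise pictures in which each
    estimate after the first satisfies the criterion independently with
    probability 1/4 (a Bernoulli process)."""
    return 1.0 - (1.0 - p_stop) ** (n - 1)
```

With the 16-iteration cap, the probability that pure-noise input never triggers the stopping criterion is 1 - p_stopped_by(16) = (3/4)^15, roughly 1.3% of trials.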
Large shifts of the images reduce the number of pixels used in the least-squares
minimization. A second abnormal termination situation occurs when this number of
pixels is less than 100. Photon noise, which causes random fluctuations in the pixel
values, alters the spatial and temporal derivatives at each pixel. When a large number
of pixels are used in the least squares minimization, the noise-created errors tend to
average to zero. In contrast, when the number of pixels is small, these fluctuations
may not average to zero motion. In this case, the motion estimates are wrong and
the confidence statistic may be abnormally favorable. As determined empirically, the
confidence statistic becomes less reliable below 100 pixels.⁵
2.5.2 Effect of signal-to-noise ratio
Increasing the amount of noise increases the importance of the noise-created errors
in the least-squares minimization. Since these errors tend to average to zero, they act
as "stationary" votes in the least-squares minimization. Thus optical flow estimates
smaller motions in low SNR situations than in high SNR situations. In contrast, the
bias in ROF is unaffected by SNR. The reason for this difference is that ROF uses
shifts that are both smaller and larger than the motion. The stationary "votes" for
these shifts cause errors in opposite directions and tend to cancel.
⁵Typically, at least 1000 pixels are used in the estimates.
2.5.3 Effect of low pass filtering
Low pass filtering reduces the bias in both estimates. Nevertheless, even with significant low pass filtering, the bias in optical flow is at least comparable to the standard
deviation. For motions greater than 1 pixel, the bias is much larger than the standard
deviation for reasonable low pass filters. In contrast, the standard deviation for ROF
is typically larger than the bias when the data is low pass filtered.
2.5.4 Effect of cropping
If a moving target is imaged on a uniform background, the amount of background
affects the motion estimates. The background pixels only contribute to the noise in
the picture and consequently tend to make optical flow estimate a smaller motion.
Oddly, increasing the number of background pixels improves ROF. The reason can
be seen in Figure 2-5. Optical flow's bias for the larger picture sizes is more linear for
displacements between 0 and 1 than closely cropped pictures. Since ROF eliminates
the linear component of the bias, the bias is smaller for the larger picture sizes.
Nevertheless, as the fraction of moving pixels to background pixels decreases, the motion reduces the mean square error less, reducing the effectiveness of the confidence statistic.
2.5.5 Simulation versus measurements
Both algorithms performed worse in the measurements than the simulations. Nevertheless, ROF still had vastly better performance than optical flow. These facts
indicate that 1) the recursive optical flow algorithm does compensate for a large fraction of optical flow's bias, and 2) the simulations did not capture all of the important
features in the measurements.
Upon examination of the background-subtracted pictures used in the measurements, a significant amount of structure (background) which did not move remained in the pictures. In contrast, the simulations had a perfectly uniform background before they were corrupted by Poisson (shot) noise.
The non-uniform background in the measurements could be caused by, for example, the CCD camera's fixed pattern noise and dirt on the imaging optics, which produce the same pattern on each of the pictures taken. The non-uniformities in the background form a stationary "target" that competes with the desired moving target in the least-squares solution of the optical flow algorithm. The consequence is an underestimation of the motion of the desired target, which is what happened in Figure 2-6.
The effect on ROF is the same as optical flow, because the stationary "votes" do not
change with the picture offsets.
Chapter 3
Direct observations of inner-ear micromechanics: sensory epithelium, hair bundles, and tectorial membrane
3.1 Abstract
A system to measure sound-induced motions of inner ear structures has been developed. The system consists of a scientific grade CCD camera, a light microscope, and
a stroboscopic illumination system. Performance of the system was verified using a
moving target whose motion is independently known. Results suggest that estimations from one set of measurements of sinusoidal motion have a standard deviation
of about 5 nm and a bias of about 9 nm. Both errors can be reduced by averaging
the individual displacement estimates, resulting in a total error of 14 nm/√n, where n is the number of averages.
An in vitro preparation for measuring the motion of inner ear structures of the
alligator lizard was developed. The excised cochlear duct is held in place between two
fluid-filled reservoirs by a stainless-steel ring that provides a hydrodynamic seal between the reservoirs. A hydrodynamic stimulator generates pressures across the basilar membrane that mimic those generated in situ by the middle ear. The stimulator
can generate pressures of at least 100 dB SPL over the frequency range 0.02-20 kHz.
Results for one preliminary experiment are shown. Motions for six locations are
analyzed in six planes: through the sensory epithelium, the middle and tips of the
hair bundles, and three through the tectorial membrane. Results indicate that the
tectorial membrane does not move as a rigid body.
3.2 Introduction
In normal hearing of vertebrates, the transformation between mechanical and electrochemical energy occurs at sensory cells called hair cells (Hudspeth and Corey, 1977).
These hair cells project tufts of stereocilia or hair bundles into the surrounding fluids
in the inner ear. Depending on the species, a gelatinous membrane called the tectorial membrane may surround the hair bundles, be in close proximity to the hair
bundles, or may be absent. Regardless of the presence or absence of a tectorial membrane, sound-induced pressure modulations in the inner ear fluids result in bending
of the hair bundles about pivots located near their base, thereby initiating a sequence
of electrochemical transformations. These transformations ultimately lead to neural
activity-i.e., action potentials on the auditory nerve-that the brain interprets as
sound.
Although the motions of hair bundles play a key role in modern conceptions of
inner ear function (Davis, 1958; Dallos et al., 1972; Weiss and Leong, 1985; Freeman
and Weiss, 1990), only two studies report direct observations of audio frequency hair
bundle motions in an intact inner ear (Holton and Hudspeth, 1983; Frishkopf and
DeRosier, 1983). Furthermore, direct observations of audio frequency motions of the
tectorial membrane have not been previously reported.
The scarcity of direct measurements is due in part to the difficulty of measuring
sound-induced motions. The motions that result from sounds of everyday intensities are less than 1 µm (Holton and Hudspeth, 1983; Frishkopf and DeRosier, 1983).
Furthermore, the tectorial membrane is almost transparent under brightfield illumination, making it extremely difficult to study.
This paper describes a new system to measure these sound-induced motions. The
system combines recent advances in video imaging with the ability of compound microscopes to "optically section" a target. While a strobe light stops the apparent
motion of inner ear structures that are moving at audio frequencies, a scientific grade
CCD camera takes successive images of the inner ear at varying depths in the specimen. By processing these pictures, we can, for example, estimate the motion of a
hair bundle's base and tips as well as the overlying tectorial membrane.
3.3 Methods

3.3.1 Hardware
The small displacements that we wish to measure place special constraints on the
video-imaging system. Even after magnification by a light microscope, the displacements are smaller than the camera's pixel spacing and generate only small intensity differences in the individual pixels; thus, sub-pixel motion detection algorithms
are acutely sensitive to imaging degradations such as those caused by shot noise,
dark noise, read noise, charge transfer efficiency, and inter-pixel gain variations (See
Janesick et al. (1987) for details on CCD cameras.) Accordingly, we use a scientific-grade CCD camera (Photometrics 200 series, 12 bit dynamic range, Photometrics, AZ) to maximize the motion detection system's spatial resolution.
Since all scientific CCD cameras are very slow compared to audio frequencies, the
inner ear must be stroboscopically illuminated to slow its apparent motion. Strobe
systems exhibit three types of problems: temporal, intensity, and spatial instabilities. Temporal instabilities (variations in the time between the trigger and the flash)
cause the specimen to be illuminated at the wrong time, leading to incorrect motion measurements.
Compensating for temporal instabilities is extremely difficult;
accordingly, an arcing strobe light (Chadwick-Helmuth, CA) with small temporal instability is used. The intensity instabilities (variations in the brightness of individual flashes) for the strobe light are typically less than 4%. Problems resulting from intensity instabilities are largely eliminated by taking advantage of the fact that the
small motions do not change the entire picture's average brightness. Thus, intensity
instabilities typically do not result in large errors in the motion detection algorithm
used. Spatial instabilities, caused by wandering of the strobe light's arc, result in a nonuniformly-illuminated specimen, and the structure of the nonuniformity changes with every picture. Since sub-pixel motion detection algorithms interpret changes in intensity as motion, spatial instabilities are unacceptable. The spatial instabilities
are largely eliminated through the use of a fiber optic scrambler (Inoue, 1986).
To prevent spatial aliasing by the CCD camera, the image of the specimen must be
bandlimited by a low pass filter. Since all microscopes have finite-sized apertures, they
all bandlimit the images they transmit. Accordingly, we use microscope optics (Zeiss
Axioplan with a 40x, 0.75 NA water immersion objective) that (1) have the greatest
resolution (highest spatial cutoff frequency) and (2) have enough magnification to
prevent spatial aliasing.
The signals needed for both verifying the system and performing animal experiments are generated from a two-channel, 12 bit D/A converter (DT 1497, Data
Translations).
One channel, called the stimulus, is low-pass filtered with an 18 kHz reconstruction filter (TTE, CA) and then amplified by a 40 dB amplifier. The other
channel is used to trigger the strobe light at known phases of the stimulus frequency,
so that stop-action pictures can be taken.
With the above hardware, stroboscopic pictures can be taken with an average
period of one picture every three seconds.
3.3.2 Motion detection algorithm
Our goal is to measure translations of the sensory epithelium, translations of the
tectorial membrane, and rotations of hair bundles in response to sinusoidal stimulation
at audio frequencies. In the in vitro preparation of the alligator lizard, the epithelium,
hair bundles, and tectorial membrane are located on top of one another; the hairs
in the hair bundle are parallel to the microscope's optic axis. Due to the optical
sectioning property of microscopes (i.e., the ability to separately image sections of the specimen perpendicular to the optic axis), we can independently estimate the
displacements of the epithelium, multiple points along the length of the hair bundles,
and multiple points through the thickness of the tectorial membrane.
We characterize the magnitude and phase of the displacement of each target from
sequences of video images taken at different phases of the stimulus (Figure 3-1).
Displacements between the imaged objects in successive pictures are estimated using recursive optical flow (Davis and Freeman, in preparation), a motion detection
algorithm that is based on optical flow (Horn and Schunck, 1981).
Figure 3-1: Illustration of the method for estimating motion from video images. Data are taken with the sound on and the strobe synchronized to n different phases of the stimulus. The phases are evenly spaced in 360 degrees. If the estimated displacement between the first two phases is d1,2, between the second and third is d2,3, etc., then the data points plotted are (1, 0), (2, d1,2), (3, d1,2 + d2,3), etc. Thus the symbols map out one period of the motion, with the exception that the DC component is incorrect because of the arbitrary assignment of 0 to the location of the specimen in the first strobe phase. For clarity in the plots, the first point (1, 0) is repeated as the last point (n+1, 0) in order to more easily recognize the sine wave.
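The construction described in the caption (cumulative sums of pairwise displacement estimates mapping out one period, from which magnitude and phase are read off) can be sketched as follows. The least-squares sinusoid fit is my assumption for how magnitude and phase are extracted, and the function names are illustrative.

```python
import numpy as np

def cumulative_motion(pairwise):
    """Map pairwise displacement estimates d_{1,2}, d_{2,3}, ... to the
    cumulative displacements plotted in Figure 3-1:
    0, d_{1,2}, d_{1,2} + d_{2,3}, ..."""
    return np.concatenate(([0.0], np.cumsum(pairwise)))

def fit_sinusoid(displacements):
    """Least-squares fit of one period of a sinusoid (plus an arbitrary
    DC offset, since the starting position is arbitrary) to the n
    cumulative displacements; returns (amplitude, phase in radians)."""
    n = len(displacements)
    t = 2 * np.pi * np.arange(n) / n
    A = np.column_stack([np.cos(t), np.sin(t), np.ones(n)])
    (a, b, _), *_ = np.linalg.lstsq(A, displacements, rcond=None)
    # Model: M*cos(t + phi) = a*cos(t) + b*sin(t) with a = M*cos(phi),
    # b = -M*sin(phi).
    return np.hypot(a, b), np.arctan2(-b, a)
```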
To characterize the physiologically relevant rotation of the hair bundle about its
base, we estimate the translations of the cilia in optical sections through the tip and
base of the hair bundle. Because the cilia are stiff (Flock et al., 1977) and the rotations
are small, we can estimate the rotation as the difference between the displacements
of the hair bundle's tips and the base divided by the distance between the planes of
section.
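The small-angle rotation estimate just described is a one-line formula; the name and units here are illustrative.

```python
def bundle_rotation(tip_disp_um, base_disp_um, section_separation_um):
    """Small-angle estimate of hair-bundle rotation (radians): the
    difference between the tip and base displacements divided by the
    distance between the two planes of section."""
    return (tip_disp_um - base_disp_um) / section_separation_um
```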
3.3.3 Verification of the motion detection system
The motion detection system was tested using a "calibrated target": a microscopic specimen that could be moved at audio frequencies and whose motions could be calibrated using an independent method.
The calibrated target is described elsewhere (Davis and Freeman, in preparation).
Briefly, it consists of a TiO₂ microsphere with a 0.3 µm diameter (Polysciences, Warrington, PA) mounted so that 1) the microsphere can be displaced by a piezoelectric
stack and 2) the resulting motions can be detected using a fiber optic position detector
(Angstrom Resolver, Opto Acoustics Sensors, Raleigh, NC).
The calibrated target can generate displacements with amplitudes up to 0.6 µm at
frequencies from DC to 1 kHz. The response of the fiber-optic detector is linear over
the range of amplitudes and frequencies that the piezoelectric stack can generate.
Consequently, we can verify the motion detection system by comparing the total
harmonic distortion (THD) in the fiber-optic detector's response to the THD in the
motion detection algorithm.
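A THD computation of the kind used for this comparison might look like the following sketch. It assumes the record spans exactly one stimulus period, as with the evenly spaced strobe phases, and the function name is illustrative.

```python
import numpy as np

def total_harmonic_distortion(samples):
    """THD of one period of a uniformly sampled waveform: the RMS of the
    harmonics (bins 2..N/2 of the DFT) relative to the fundamental
    (bin 1)."""
    spectrum = np.abs(np.fft.rfft(samples))
    fundamental = spectrum[1]
    harmonics = spectrum[2:]
    return np.sqrt(np.sum(harmonics ** 2)) / fundamental
```

A matching THD from the fiber-optic detector and from the motion estimates indicates that the video system is faithfully tracking the target.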
3.3.4 Preparation
Our goal is to measure inner-ear micromechanics in the alligator lizard (Plate 1). The
basic method (surgical approach, solutions) is described in detail elsewhere (Freeman
et al., 1993). Briefly, adult alligator lizards, 22-28 g body weight, were anesthetized
by cooling to 4-8°C for 60 minutes and then sacrificed by decapitation. A dorsal approach was used (first removing the skin and muscle, then bone, and finally the sacculus) to expose the cochlear duct. The eighth nerve was cut, and the duct removed. The vestibular membrane was opened and the duct was positioned over a hole separating two fluid-filled regions (Figure 3-2).
Plate 1: Cross section of the alligator lizard (Gerrhonotus multicarinatus) cochlear duct. The auditory receptor organ, the basilar papilla (BP), rests on the thin basilar membrane (BM). The mechanically sensitive hair bundles are found on top of the basilar papilla; in this section they are just below part of the tectorial membrane (TM). The rest of the tectorial membrane spans the gap between the basilar papilla and the limbic bulge (LB). The limbic bulge and the triangular limbus (TL) are connective tissues that are contiguous with the basilar membrane. The x and z axes define the coordinate system used. The y axis points directly into the page. In the in vitro preparation, the vestibular membrane (VM) is removed so that the view of the basilar papilla x-y plane will be unobstructed. This 20 μm thick, stained cross section was prepared by Rindy Northrup and Diane Jones, under the direction of Michael Mulroy.
Plate 1: Alligator lizard anatomy
In the experimental chamber, the cochlear duct is held over a hole by a stainless-steel ring. Because 1) there are no perfusion capabilities in the lower fluid region and 2) the metal ring has not yet been shown to provide a long-term chemical seal between the perilymphatic and endolymphatic sides of the cochlear duct, an artificial perilymph (171 mM NaCl, 2 mM KCl, 2 mM CaCl2, 3 mM D-glucose, 5 mM HEPES buffer, and approximately 2 mM NaOH to adjust the pH to 7.3 ± 0.05) is used on both sides of the duct (Freeman et al., 1993).
The chamber's hydrodynamic stimulator can produce pressures of at least 100 dB SPL over the frequency range 0.02-20 kHz. The stimulator's power spectrum is not constant across frequency; consequently, the drive voltage to the piezo must change when collecting data for a constant-amplitude, swept-frequency experiment. On the other hand, halving the drive voltage at a fixed frequency does halve the sound pressure in the chamber.
3.4 Results

3.4.1 System verification
Figure 3-3 shows results from the sinusoidally moving test specimen. Stroboscopic pictures of a 0.3 μm TiO2 bead, shaken at 500 Hz by a piezoelectric crystal, were taken at eight evenly-spaced phases of the piezo's stimulus. The imaged bead was a 4-5 pixel dark area on a light background. The pictures were cropped to 32x32 pixels and had an average brightness of 1/2 the maximum brightness for our camera¹. Two strobe flashes were used for each picture.
If the motion detection system were perfect, the estimates shown in Figure 3-3 would be fit exactly by an offset sine wave (the offset is required because the estimate at 0° is arbitrarily taken as zero). To test the motion detection system, the DFT of the median estimates was computed. The magnitude and phase of the fundamental and the DC value were used to generate the curve in Figure 3-3. The ratio of the sum of the powers in the harmonics to the power in the fundamental (total harmonic distortion) is taken as a measure of the quality of the fit. For the medians shown in Figure 3-3, the total harmonic distortion (THD) is -84 dB. The THD of the optic-lever system which was measuring the motion of the target was -55 dB.

¹The performance of the ROF algorithm weakly depends on cropping size and light level (Davis and Freeman, in preparation).

Figure 3-2: Experimental chamber for measuring motions of inner ear structures. After removal of the vestibular membrane, the dissected cochlear duct is placed over the 0.74 mm hole in the plexiglass chamber. A clamp, made from a 250 μm long piece of 19-gauge, thin-wall, stainless-steel tubing (inside diameter 0.8 mm, outside diameter 1.1 mm) glued to a bent stainless-steel tube, makes a hydrodynamic seal between the top and bottom fluid-filled regions. The crank modulates the force on the cochlear duct by pulling the string which stretches the spring. The spring's force is transferred to the cochlear duct by rotating the clamp about the pivot. A piezoelectric disk morph, obtained from a Panasonic EFR series 40 kHz ultrasonic transducer and waterproofed with High Q Gloss Enamel, seals the left-most opening in the bottom fluid-filled region and provides the hydrodynamic stimulation to the cochlear duct. A pressure transducer (not shown) (Entran EPX) monitors the fluid pressure under the duct. The bottom fluid-filled region is sealed from below by a glass slide (not shown), instead of plexiglass, because of glass's superior optical properties. Our analogue to the ear's helicotrema is a stainless-steel tube (not shown) (150 μm inside diameter, 25 mm long) which prevents DC pressures from forming across the cochlear duct by venting the bottom fluid-filled region to the atmosphere. The bottom region is filled through a hole (not shown) located on the top surface. During the experiment, this hole is sealed with a plexiglass plate (coated with vacuum grease) that is held in place by four screws (not shown).

Figure 3-3: Box plots of estimated motion versus the phase of the stimulus in which the strobe was fired. At each phase, the box represents the interquartile range, the horizontal line represents the median value, and the vertical line represents the range of estimates from the 50 trials. The solid line is an offset sine wave whose offset, magnitude, and phase are determined from the discrete Fourier transform (DFT) of the median data values. The horizontal axis is the stimulus phase in degrees.
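The sine fit and THD measure used to verify the system can be sketched in a few lines. This is an illustration of the computation just described, not the thesis code, and the sample estimates are made-up values:

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def fit_and_thd(estimates):
    """Fit an offset sine wave (DC plus the fundamental DFT bin) to
    motion estimates sampled at evenly spaced stimulus phases, and
    return (peak-to-peak displacement, phase in degrees, THD in dB)."""
    N = len(estimates)
    X = dft(estimates)
    amp = 2 * abs(X[1]) / N                 # fundamental amplitude
    phase_deg = math.degrees(cmath.phase(X[1]))
    # THD: ratio of the summed harmonic power to the fundamental power.
    p_fund = abs(X[1]) ** 2
    p_harm = sum(abs(X[k]) ** 2 for k in range(2, N // 2 + 1))
    thd_db = 10 * math.log10(p_harm / p_fund)
    return 2 * amp, phase_deg, thd_db

# Eight made-up median estimates (um), one per strobe phase: a 0.4 um
# peak-to-peak sinusoid plus a small second harmonic.
phases = [2 * math.pi * n / 8 for n in range(8)]
est = [0.2 * math.sin(p) + 0.002 * math.sin(2 * p) for p in phases]
p2p, phase_deg, thd_db = fit_and_thd(est)
```

With these sample values the fit recovers a 0.4 μm peak-to-peak displacement and a THD of -40 dB; the thesis applies the same ratio-of-powers measure to the medians of Figure 3-3 (-84 dB) and to the optic-lever reference (-55 dB).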
The above data represent the time-average of 50 trials. However, a description of the system's performance from one trial is also useful. Without time averaging (Figure 3-4), 50% of the peak-to-peak estimates of the motion fall within 8 nm of one another. The phase estimates similarly fall within 1.3 degrees; the THD estimates, within 8 dB. Note that the median THD is -65 dB, almost 20 dB worse than the THD from the time-averaged calculation. This decrease in THD implies that there is some non-deterministic noise in the individual data points, which gets averaged out in the time-averaged case. This noise also affects the peak-to-peak estimates: the estimate from the median of the DFT is 9 nm greater than the DFT of the medians. In contrast, the noise does not seem to affect the phase of the DFT fundamental.
Figure 3-4: Box plots of the peak-to-peak displacement (left), phase (center), and total harmonic distortion (THD) (right) of the fitted sine wave. For each of the 50 trials, a DFT was computed and the relevant statistics are tabulated in the box plots. The crosses represent the data from Figure 3-3, where the values were obtained by taking the median of the time (or stimulus phase) waveform before computing the DFT.
3.4.2 Preliminary results of lizard ear motions
In this section we show measurements of micromechanics in the tectorial region of
the alligator lizard cochlea. All of the results are from a single preparation, and for
that reason must be considered preliminary.
Gross motions of the sensory receptor organ
In order to observe audio frequency motions of the in vitro preparation, a strobe light was used to slow the apparent motion. When the preparation is stimulated, the entire receptor organ moves roughly as a rigid body, pivoting about an axis parallel to its length. Motions were clearly visible for sound pressures on the order of 100 dB SPL and for frequencies from 20 Hz to 5 kHz.
At low frequencies (less than 100 Hz), the tectorial membrane moved in phase with, and almost equal in magnitude to, the papilla. As the frequency increased, the magnitude of the tectorial membrane motion decreased.
Motions of tectorial membrane and hair bundles
Plate 2 shows data from the tectorial region of the papilla. The displacements shown are in the x direction of the camera, which has been aligned to the physiologically relevant polarization of the hair bundles (Mulroy, 1974).
The magnitude of motion is largest near the base of the hair bundles, and gradually diminishes towards the top of the tectorial membrane. The rate of motion decrease is greatest through the planes containing the hair bundles, and is smaller between the top two planes, where the tectorial membrane is most prominent.
The motions of the six hair bundle bases are similar in both magnitude and phase. The peak-to-peak motions were on the order of 0.5 μm. The phases of the hair bundle bases were within 5° of one another. The magnitude of the motions varied by 87 nm, with the smaller motions tending to occur towards the basal (free-standing) end.
The motions near the tips of the hair bundles were about a factor of two smaller than those at the base. The phase of the motions varied by about 30°, and averaged 20° less than that of the base.
The greatest variation in motion is in the tectorial membrane. The average peak-to-peak motion was about 0.2 μm, with a range of about 0.2 μm. The average phase lag was -40° relative to the base, with a range of 40°.
By subtracting the motions of the hair bundle tips from the hair bundle base,
the physiologically relevant rotation can be estimated (Hudspeth and Corey, 1977).
For the data shown, the peak-to-peak angular rotation varied between 0.034 and 0.055 radians, averaging 0.044 radians. No correlation between the hair bundle location and the magnitude of the rotation was found.
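The rotation estimate described above is a small-angle approximation: the difference between the base and tip displacements divided by the distance between the planes of section. A minimal sketch, with illustrative values chosen to be on the order of those reported (the 6 μm plane separation is an assumed figure, not taken from the text):

```python
def bundle_rotation(base_p2p_um, tip_p2p_um, height_um):
    """Small-angle estimate of hair-bundle rotation in radians: the
    difference between the base and tip peak-to-peak displacements
    divided by the vertical distance between the planes of section."""
    return (base_p2p_um - tip_p2p_um) / height_um

# Illustrative values only: base and tip displacements on the order
# of those in Plate 4; the plane separation is an assumption.
theta = bundle_rotation(0.49, 0.28, 6.0)
```

With these numbers the rotation comes out near 0.035 radians, within the 0.034-0.055 radian range reported above.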
Plate 2: Motions near six hair cells in the tectorial membrane region, at six different planes of focus: x direction. The six pictures are photomicrographs of the papilla's tectorial region through six different planes of focus. The plane of focus is indicated by z above each picture. The basal (free-standing) region is off the top of each picture; the neural side is to the left. The line plots are estimates of the x-direction displacements in the region bounded by the white boxes. The scale bars are 0.5 μm for all line plots. The coordinates of the white boxes relative to the picture's coordinates are the same in all six planes of focus. The horizontal scale bar indicates 25 μm. The cochlear duct was stimulated at 513 Hz with a fluid sound pressure of 104 dB SPL.
Plate 2: Displacements in the x direction
Plate 3 shows y-directed motions in the tectorial region. These motions do not
have direct physiological relevance, since the motions are perpendicular to the morphological polarization of the hair bundles. Nevertheless, they may have some type
of indirect effect.
The motions in this direction are smaller than the motions in the x direction. The largest motion shown is 0.09 μm. Unlike the x-directed motions, the largest motions are not found at the hair bundle bases. In fact, the motion tends to increase to a maximum in the tectorial membrane (z = 12 μm). Depending on location, the motion at the top of the tectorial membrane may be smaller, larger, or about the same as the motion 3 μm below the top.
Plate 3: Motions near six hair cells in the tectorial membrane region, at six different
planes of focus: y direction. This plate is identical to Plate 2 with the exception that
the line plots are for the y direction of motion.
Plate 3: Displacements in the y direction
To illustrate the observed sound-induced motions, the area around one hair bundle in Plate 2 (middle left) has been enlarged in Plate 4. The motion is most easily seen when the pictures taken at the different phases are displayed in rapid succession. Due to the limitations of the medium, this method of observing the motion is not applicable here.
From the eight different stimulus phases in which data was collected, Plate 4 shows the two in which the displacement between the structures was greatest. The two phases used are different for each structure. With the aid of the crosses, the base can easily be seen to move approximately one hair width between the two pictures. The recursive optical flow algorithm estimates the displacement between the two pictures to be 0.49 μm; Mulroy and Williams (1987) state that the typical width of a hair in this region is 0.45 μm. In contrast to the "large" displacement between the bases, the tips and tectorial membrane moved only half as much: 0.28 μm and 0.25 μm, respectively.
Plate 4: Motion of the base and tip of a hair bundle and the overlying tectorial membrane. The top row shows an enlarged view of one of the hair bundles (base right, tips center) and its overlying tectorial membrane (left) shown in Plate 2. The second row repeats the first row, but crosses have been placed on most of the salient features in the pictures. These pictures were taken with the strobe light synchronized to a particular phase with respect to the stimulus: 0° for the tectorial membrane, 45° for the tips, and 90° for the base. The third row shows the same views as the second row, but these pictures were taken with the strobe light synchronized to 180° later in their respective phases. The pictures in the bottom two rows are identical to those in the second and third rows, but the crosses have been reversed.
Plate 4: Displacements of TM, tips, and base
Limbic bulge
The motion of the limbic bulge (Figure 3-5) was also examined with the same stimulus
conditions. The motion is about 26 dB smaller than the motion of the papilla.
Figure 3-5: Motion of the limbic bulge. The left panel shows the estimated motion across the width of the limbic bulge (x direction) versus the stimulus phase. The right panel shows the motion in the y direction. Note that the vertical scales are 10 times smaller than those in Plate 2.
3.5 Discussion

3.5.1 Performance characteristics of the motion measurement system
The accuracy and precision of our motion estimates were tested using a target whose motions were calibrated using an independent method. The results in Figure 3-4 have several important implications.
The interquartile range of the fundamental component's peak-to-peak displacement is 8 nm. This 8 nm represents the typical confidence of the estimate: if one trial results in an estimate of a₁ μm peak-to-peak, another trial is likely to give an estimate a₂ between a₁ - 0.004 and a₁ + 0.004. Unfortunately, a₁ is not the correct answer. The time-averaged value of the peak-to-peak displacement from Figure 3-3 (the cross in the left panel of Figure 3-4) is 9 nm below the median value of the frequency-averaged data. Thus averaging a₁, a₂, ... will not result in the correct answer. This difference between time and frequency averaging arises because noise in the individual measurements is averaged out by time averaging but not by frequency averaging. In frequency averaging, half the noise power in the fundamental component is in phase with the signal; this power averages to zero. The other half of the noise power is out of phase with the signal; this power always increases the out-of-phase component from zero to a positive value, and therefore adds to the power of the signal in the total power of the fundamental. Thus, the power in the fundamental is greater than the power in the signal.
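The asymmetry just described, where out-of-phase noise always adds power so that frequency averaging is biased upward while time averaging is not, can be illustrated with a small simulation. The signal amplitude, noise level, and trial count below are made-up values:

```python
import cmath
import math
import random

def fundamental_amplitude(x):
    """Amplitude of the k=1 DFT component of a real, evenly sampled signal."""
    N = len(x)
    X1 = sum(x[n] * cmath.exp(-2j * math.pi * n / N) for n in range(N))
    return 2 * abs(X1) / N

random.seed(0)                    # deterministic illustration
N, trials = 8, 2000
true_amp, noise_sd = 0.2, 0.2     # made-up signal amplitude and noise level
phases = [2 * math.pi * n / N for n in range(N)]

# Noisy repetitions of the same sinusoid (one "trial" per row).
runs = [[true_amp * math.sin(p) + random.gauss(0, noise_sd) for p in phases]
        for _ in range(trials)]

# Frequency averaging: estimate the amplitude of each trial, then average.
# The out-of-phase noise always adds power, so this estimate is biased high.
freq_avg = sum(fundamental_amplitude(r) for r in runs) / trials

# Time averaging: average the waveforms first, then estimate once.
# The noise largely cancels before the magnitude is taken, removing the bias.
mean_wave = [sum(r[n] for r in runs) / trials for n in range(N)]
time_avg = fundamental_amplitude(mean_wave)
```

With these numbers, freq_avg noticeably overshoots true_amp while time_avg stays close to it, mirroring the 9 nm discrepancy between the median of the DFTs and the DFT of the medians reported above.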
The phase estimates typically fall within 1.3° of one another. In contrast to the magnitude estimates above, the time and frequency averaging techniques average to the same phase estimate. This result implies that the phase of the measurements' noise is uniformly spread over 360°, and thus averages to zero.
These results indicate that, from one trial, the total error in the estimate is on the order of 14 nm. If repeated trials are performed, averaging the time waveforms reduces both the bias and the standard deviation by the square root of the number of trials. Thus, for n trials, the expected error is (14/√n) nm.

3.5.2 Comparison with other motion detection methods
Non-contact observations.
The measurement system is remote from the structures measured. The closest part of the measuring system to the cochlear duct is the 1.9 mm working distance objective. In contrast, other techniques, most notably Mössbauer (e.g., Peake and Ling, 1980) and some laser interferometric methods (e.g., Ruggero and Rich, 1990), require the placement of some artificial substance on the structure whose motion is desired. Any technique which places a foreign substance in the ear has the problem of determining how that substance affects the motion of the structures.
Non-contact stimulation.
Our technique of applying pressure in the surrounding fluids and allowing the fluids to interact with the cochlear duct has the advantage that this is what normally happens in vivo. Many researchers (e.g., Crawford and Fettiplace, 1985; Howard and Hudspeth, 1988; Zwislocki and Cefaratti, 1989) have used glass fibers to probe inner ear structures. Again, any technique which uses a foreign substance in the ear has the problem of differentiating between the effects of the foreign substance and the effects of the structure of interest.
Stimulation nearly the same as in situ.
Not only is our stimulating technique non-contact, it is also very similar to the in situ stimulation of the cochlear duct. The positioning of the cochlea over the hole in our chamber (at least to first order) matches the bony and other supports in situ (Freeman, 1990). This technique is completely different from some other non-contact stimulating techniques, most notably water jets. With water jets, a tiny stream of fluid is squirted directly at a hair bundle, which is not something that normally happens in the ear.
Easy to determine exactly what is being measured.
Because images of a large region of the cochlear duct are taken, the task of determining exactly what was measured is trivial. For example, if the motion of the base of a hair bundle is desired, pictures are taken of a much larger region (for example, see Plates 2 and 4) and later cropped down to the desired region.
With other motion detection techniques, the task of determining what was measured is difficult. For example, laser interferometric techniques (e.g., Denk et al., 1989; Ruggero and Rich, 1990) typically attempt to bounce a laser beam off a known target (a hair bundle, basilar membrane, glass microbeads) to measure either the position or velocity of the target. The problem lies in determining exactly what the laser is reflecting off. The same problem exists when using a photodiode pair (e.g., Crawford and Fettiplace, 1985) to detect motion. Just like the interferometric methods, the output from the photodiode pair is one time waveform. Thus, assumptions have to be made about all the possible targets in order to decide which target is presently being observed.
Can measure motions of several structures in the same preparation.
By imaging large regions of the cochlear duct, the motions of many structures can be measured in the same preparation. In contrast, measuring just two points in the same preparation is difficult for techniques which place a target on the duct, such as Mössbauer (e.g., Peake and Ling, 1980) and some laser interferometric methods (e.g., Ruggero and Rich, 1990). Furthermore, with our video microscopy technique, the resolution between measured points is on the order of microns. Thus, these measurements will provide the first data on how the various micromechanical structures are coupled together. Data from this type of measurement can be used to substantiate the point impedance model of the organ of Corti and also to measure the coupling of the inner and outer hair cells through the tectorial membrane.
3.5.3 Cochlear mechanics of the lizard
All of the results are from a single preparation, and for that reason must be considered
preliminary.
Motions of the sensory receptor organ
Our basic observation, that hydrodynamic stimulation causes a rocking motion of the sensory receptor organ about an axis parallel to its length, is consistent with previous reports (Holton and Hudspeth, 1983; Frishkopf and DeRosier, 1983). This observation is easy to understand in terms of the anatomy (Plate 1). The basilar papilla is asymmetrically located on the thin basilar membrane. With the assumptions that the basilar membrane has spatially uniform compliance and that it is the dominant compliance in this region, the basilar membrane on the abneural side of the papilla will stretch the most when a pressure difference exists across the membrane. This stretching would cause the papilla to rock back and forth, as observed.
We found displacements of the receptor organ on the order of 0.5 μm in response to sound pressures on the order of 100 dB SPL in the fluid. This finding is consistent with those reported by Frishkopf and DeRosier (1983). Holton and Hudspeth (1983) did not record the sound pressure in their chamber, so a direct comparison cannot be made with their results.
Because we measure sound pressure in the fluid, the effect of the middle ear must be taken into account to estimate an equivalent sound pressure at the tympanic membrane. The middle-ear sound pressure gain for the alligator lizard can be estimated from a previously reported model (Rosowski et al., 1985) to be approximately 26 dB at 500 Hz. Therefore, sound pressures of 104 dB SPL in the fluid correspond to sound pressures on the order of 78 dB SPL at the eardrum. To put this number into perspective, a loud shout from five feet away is approximately 80 dB SPL, while normal conversations are held at 60 dB SPL (Green, 1976, page 16). Thus the sound pressures used in this study are at the loud end of typical sounds, but probably not traumatic. Our results therefore demonstrate the possibility of measuring micromechanical responses to moderate intensity sounds.
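The dB bookkeeping in this paragraph can be checked with two lines of arithmetic; the numbers are those quoted above:

```python
# Equivalent eardrum pressure: fluid sound pressure (dB SPL) minus the
# modeled middle-ear pressure gain (dB). Values are those quoted in the text.
fluid_spl_db = 104.0
middle_ear_gain_db = 26.0
eardrum_spl_db = fluid_spl_db - middle_ear_gain_db   # 78 dB SPL

# A 26 dB pressure gain corresponds to a linear pressure ratio of
# 10**(26/20), i.e. about a factor of 20.
pressure_ratio = 10 ** (middle_ear_gain_db / 20)
```

The same subtraction applies at any frequency where the model's gain is known; only the 500 Hz figure is used here.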
Motions of the tectorial membrane and hair bundles
We found peak-to-peak angular rotation between 0.034 and 0.055 radians. Holton and
Hudspeth (1983) measured rotations between 0.05 and 0.08 radians resulting from
sound-induced motions of the free-standing hair bundles in the alligator lizard. In
contrast, static rotations of only about 0.02 radians are needed to maximally change
the hair cell's membrane voltage in the bullfrog sacculus (Howard and Hudspeth,
1988). Thus our measurements are on the same order as previous sound-induced
motions and static measurements.
The motion of the tectorial membrane varied in space. Thus the tectorial membrane does not behave as a rigid body. The classical model for inner ear function (Davis, 1958), which is still widely held today, assumes the tectorial membrane acts like a rigid lever. These data directly contradict this lever model.
Motions of the limbic bulge
The motion of the limbic bulge (Figure 3-5) was about 26 dB smaller than the motion of the papilla under the same stimulus conditions. This finding is consistent with Peake and Ling (1980), who measured motions of the neural limbus under the limbic bulge; at 600 Hz, a 14 dB increase in the sound pressure at the eardrum was needed to produce the same velocity as the basilar membrane under the papilla. These numbers cannot be directly compared with the present results, however, because a significant component of the motion measured by Peake and Ling was in the direction orthogonal to the present results. In addition, we compared the motions of the papilla and limbic bulge in an iso-stimulus protocol, while Peake and Ling used an iso-response protocol. Nevertheless, the fact that both groups measured non-zero motions of the limbic bulge is significant. The veil portion of the tectorial membrane connects the top of the limbic bulge to the portion of the tectorial membrane on the papilla. The physiological significance of the 26 dB smaller movement of the limbic bulge relative to the papilla is unknown.
3.5.4 Implications for micromechanics
Our results demonstrate the possibility of measuring micromechanical responses at
moderate intensity levels. These measurements are of critical importance since the
micromechanical stage (with a tectorial membrane) in the transformation of sound
to neural messages has previously been addressed only theoretically.
We investigate the micromechanical properties of the alligator lizard to understand how inner ear mechanisms determine the neural code that relates sounds to nerve messages in mammals as well as lizards. The lizard ear is more easily understood than the mammalian ear, and yet it shares many of the same basic mechanisms with the more complex mammalian ears. Furthermore, the lizard ear shares non-linear properties such as two-tone suppression and a compressive non-linearity with mammalian ears. In mammals, the mechanism behind these properties has been attributed to the outer hair cells. However, lizards do not have outer hair cells. Our measurements will provide direct observations of all the key components in the micromechanical transformation, and thus should be able to resolve the apparent conflict between measurements in the lizard and theories in mammals.
Bibliography
Aggarwal, J. K. and Nandhakumar, N. (1988). On the computation of motion from sequences of images: a review. Proc. IEEE, 76(8):917-934.
Cook, R. O. and Hamm, C. W. (1979). Fiber optic lever transducer. Appl. Optics, 18:3230-3241.
Crawford, A. C. and Fettiplace, R. (1985). The mechanical properties of ciliary
bundles of turtle cochlear hair cells. J. Physiol., 364:359-379.
Dallos, P., Billone, M. C., Durrant, J. D., Wang, C. Y., and Raynor, S. (1972).
Cochlear inner and outer hair cells: Functional differences. Science, 177:356-358.
Davis, H. (1958). A mechano-electrical theory of cochlear action. Ann. Otol. Rhinol. Laryngol., 67:789-801.
Denk, W., Webb, W. W., and Hudspeth, A. J. (1989). Mechanical properties of
sensory hair bundles are reflected in their brownian motion measured with a laser
differential interferometer. Proc. Natl. Acad. Sci., 86:5371-5375.
Drake, A. W. (1988). Fundamentals of Applied Probability Theory. McGraw-Hill.
Flock, A., Flock, B., and Murray, E. (1977). Studies on the sensory hairs of receptor
cells in the inner ear. Acta. Otolaryngol., 83:85-91.
Freeman, D., Hendrix, D., Shah, D., Fan, L., and Weiss, T. (1993). Effect of lymph
composition on an in vitro preparation of the alligator lizard cochlea. Hearing Res.,
65:83-98.
Freeman, D. M. (1990). Anatomical model of the cochlea of the alligator lizard.
Hearing Res., 49:29-38.
Freeman, D. M. and Weiss, T. F. (1990). Hydrodynamic analysis of a two-dimensional model for micromechanical resonance of free-standing hair bundles. Hearing Res., 48:37-68.
Frishkopf, L. S. and DeRosier, D. J. (1983). Mechanical tuning of free-standing
stereociliary bundles and frequency analysis in the alligator lizard cochlea. Hearing
Res., 12:393-404.
Green, D. M. (1976). An Introduction to Hearing. Lawrence Erlbaum Associates,
Hillsdale, New Jersey.
Holton, T. and Hudspeth, A. J. (1983). A micromechanical contribution to cochlear
tuning and tonotopic organization. Science, 222:508-510.
Horn, B. K. and Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17:185-203.
Horn, B. K. and Weldon, Jr., E. (1988). Direct methods for recovering motion.
Internatl. J. of Computer Vision, 2:51-76.
Horn, B. K. P. (1986). Robot Vision. MIT Press.
Howard, J. and Hudspeth, A. J. (1988). Compliance of the hair bundle associated
with gating of mechanoelectrical transduction channels in the bullfrog's saccular
hair cell. Neuron, 1:189-199.
Hudspeth, A. J. and Corey, D. P. (1977). Sensitivity, polarity, and conductance
change in the response of vertebrate hair cells to controlled mechanical stimuli.
Proc. Natl. Acad. Sci. U.S.A., 74:2407-2411.
Inoué, S. (1986). Video Microscopy. Plenum Press.
Janesick, J. R., Elliot, T., Collins, S., Blouke, M. M., and Freeman, J. (1987).
Scientific charge-coupled devices. Optical Engineering, 26(8):692-714.
Mulroy, M. J. (1974). Cochlear anatomy of the alligator lizard. Brain, Behavior,
and Evol., 10:69-87.
Mulroy, M. J. and Williams, R. S. (1987). Auditory stereocilia in the alligator lizard. Hearing Res., 25:11-21.
Patil, P. (1989). A calibrated sensor for measuring microscopic motion. Bachelor's
thesis, Massachusetts Institute of Technology, Cambridge, MA.
Peake, W. T. and Ling, A. L. (1980). Basilar-membrane motion in the alligator
lizard: Its relation to tonotopic organization and frequency selectivity. J. Acoust.
Soc. Am., 67:1736-1745.
Photometrics (1993). Final test report, series 200 camera system. Photometrics,
Ltd., Tucson, Arizona.
Rosowski, J. J., Peake, W. T., Lynch, T. J., Leong, R., and Weiss, T. F. (1985). A
model for signal transmission in an ear having hair cells with free-standing stereocilia: II. macromechanical stage. Hearing Res., 20:139-155.
Ruggero, M. A. and Rich, N. C. (1990). Application of a commercially-manufactured Doppler-shift laser velocimeter to the measurement of basilar-membrane vibration. Hearing Res., 51:215-230.
Weiss, T. F. and Leong, R. (1985). A model for signal transmission in an ear having
hair cells with free-standing stereocilia: III. micromechanical stage. Hearing Res.,
20:157-174.
Zwislocki, J. J. and Cefaratti, L. K. (1989). Tectorial membrane II: stiffness measurements in vivo. Hearing Res., 42:211.