>> Hannes Gamper: Okay, good morning, everyone. Welcome to this talk. It's my pleasure to
introduce Supreeth Krishna Rao. He's been with us as a summer intern over the past 12 weeks,
working on ultrasound. He's a master's student in Robotics Engineering at Worcester Polytechnic Institute, and without further ado, the floor is yours.
>> Supreeth Krishna Rao: Thank you, Hannes. Very well. Good morning, everybody, the
audio team, Dr. David Heckerman, Mark Heath, Dr. Ivan and Hannes and everyone. So,
basically, today, I will be presenting to you the work that I did with my mentor, Hannes Gamper,
at Microsoft Research, Redmond, Washington, in the last 12 weeks. So we attempted to
basically develop hardware and algorithms for ultrasound Doppler radar, and the primary
objective was basically imaging range and velocity profiles of objects of interest in the field of
view. So let's see in a more formal way what the objective was. So, given a system, we wanted to image and estimate all around the system -- that is, basically a 360-degree horizontal field of view. In that field of view, we wanted to measure and estimate targets' position and velocity relative to the user. And why would anyone do that? Probably for situations wherein, with increasing technology or with a growing number of people with disabilities, we need more intelligent systems to be actively sensing the ever-changing environment for us. And this is one
typical case, wherein a user is happily listening to some music while walking on a busy road.
That's definitely going to come in the very near future, and he need not really worry about the
vehicles approaching him, because such a system aims to warn him about these targets. So the
underlying principle for this system was basically Doppler effect and time of flight, and just to
give you a quick illustration of what Doppler effect really captures is basically when you actually
transmit a signal, there is a relative -- there is actually a change in frequency when there is
relative motion between the source and the receiver. So, in this case, the source would be the
user. The object would be probably the car or the target, and again, the receiver is present with
the user. So let's take a look at why we chose ultrasound to achieve this task, whys and why
nots. And, firstly, ultrasound is very low power consuming, and since it's not very high
frequency -- it's about 40 kilohertz -- high-frequency components are not pumped into the circuit,
so this is basically why the electronics is really cheap when you're designing an ultrasound
system. And small form factor is a great advantage, because it offers a great choice for mobile devices: as you can see in the later slides, an ultrasound transducer is only millimeters across. It's really small, so even after using probably an array of transducers, the form factor will not really blow up. And the most important thing, again, is it's outside
human perceptible range, so if the device is being used by the user, it won't really interfere with
his normal day-to-day activities. Well, it works very well indoors and outdoors, well lit or not -- you don't need light, as against, for example, cameras or something. So it doesn't need illumination. It's active sensing, and it works equally well indoors and outdoors. And array signal
processing, basically, can be leveraged to get a 360-degree field of view, which we have in this
system. And the responses of these transducers and receivers are not super directional, so we
need beamforming to really improve the spatial angular resolution of our sensing, and we can
leverage active time of flight sensing. That is basically how the transducers work, to get a range
or depth estimate. And again, this is not really available in passive sensing devices, like
cameras. And to cap it all, basically, we attempt to use Doppler effect to get even velocity
profiles of targets in the field of view, and this is every frame, so this is not across several
frames. So every frame, we have an estimate of the velocity and the distance. That was the
attempt. And so the immediate question to ask, again, is why not use cameras? Well, we don't get
depth information, and also we need illumination, because it's not active sensing, and okay, we
get depth information using stereo cameras, but why not that? So they require global shutter, and
this often shoots up the price of the cameras and the entire setup, and stereo vision cameras
require precise alignment and calibration. Firstly, this is really time consuming, and even after
sufficient calibration, the depth resolution or the depth accuracy is not really fantastic. So one
might say to avoid these calibration issues, why not use a commercial stereo camera setup, like
Bumblebee or Asus? They actually require very computationally intensive processing
algorithms, like sum of absolute differences or sum of squared differences, and these often
necessitate a GPU, and that again shoots up the power requirements and the cost. And again, no
360-degree field of view. Well, to get 360-degree field of view, we can use LIDARs, but they're
expensive and power hungry. How about Kinect? Kinect manages to solve many of these
issues, apart from 360-degree field of view, but 36 watts power consumption, and it requires a
fan to keep it from overheating. Keeping all of these in mind, we decided to use ultrasound, and
we'll see more about that. So we actually have to be objective and take a look at the other side of
the coin, also, so limitations and challenges of ultrasound. I guess the list is pretty long
compared to the previous slide, so challenges are more than -- a lot more. Firstly, calibration is
still required, and it often demands anechoic conditions. An off-the-shelf researcher who wants
to use such an array or such transducers can't really get access to costly anechoic chambers.
That's still there. And sensor responses are basically very wide angle, and this kind of affects the
spatial angular resolution, so that necessitates a lot of signal processing. And frequency responses are, again, frequency and temperature dependent, because these are transducers.
Reflections are one more challenge, because you have multipath, multiple and specular
reflections. Again, that necessitates a lot of signal processing. So we are moving closer and
closer to some major issues that we experienced, so basically maintaining a good signal-to-noise ratio, achieving practical frame rates and making sure that the device operates over a wide range,
so some of the constraints that control these parameters are basically power, pulse width and
attenuation of ultrasound in air. So, for power, basically, increasing the power would increase the distance and the range, but that would again result in overheating, so one can't really crank up the power beyond a point. And pulse width improves SNR,
but again, kills both frame rate and range, and of course, we have more than 3 dB per meter of
attenuation of ultrasound signals in the air above 100 kilohertz. So, basically, it calls for some
sort of an optimization between all of these. So these are basically the questions at hand: how to increase SNR, achieve good directivity and a high frame rate, and prevent overheating. And again,
estimate velocity in a single shot. That has not really been done.
>>: So [indiscernible] concept flying around, down in like [indiscernible] bats, basically, so
what's their range?
>>: Bats, they are in the same range, at 40, 45 kilohertz, and can go five-ish, 10 meters max.
>>: Ten meters. That's pretty good.
>>: We went further than that.
>>: I see. Okay.
>> Supreeth Krishna Rao: So I'll just walk you through the organization of this talk very quickly. We present a literature review, then take a look at the problem formulation, then give some more background about the Doppler effect and how we can use it for
signal design and estimation, and then we will conclude the talk with presenting the current
approach, progress to date and results. So one can actually notice that, right from the early 1950s to 2012, 2013 and now 2015, researchers have invested a lot of time and resources in trying to make use of ultrasound. It basically tells a story: ultrasound has not yet been rejected, because for all these years continuous progress has been happening and more and more applications have been coming up. More recently, you can see that ultrasound imaging
was actually considered for HCI tasks like gesture recognition, and activity, speaker, gender, age and gait recognition -- all of these in smart-home environments. So ultrasound has stayed with us for a
long time, and micro-Doppler signatures, these are particularly interesting applications wherein
you basically -- so every small part of a moving object results in a Doppler shift, and that shift is quite unique. That is what the researchers have reported, and probably feeding the signatures into a deep neural network can help us learn the signatures and do recognition. And
from a range detection perspective, again, work has been happening since 1997, until 2014. And
in fact, the last one that you see here, that was the prototype developed by the previous intern,
and we made use of it sufficiently for our preliminary evaluation and testing, and this was again
carried out under the same group under Dr. Ivan Tashev. So let us take a look at the problem
formulation that is the Doppler radar. Like I mentioned before, this illustrates the concept of
Doppler radar. To get more technical, to be more formal, received signal is basically stretched
and compressed when the target is actually approaching or receding, and ultimately, this is what
we aim to measure. So typically, a tone or a pulse or a chirp is emitted to ensonify the
surroundings, and we make use of a chirp signal that goes from 38 kilohertz to 42 kilohertz, and
the reflection off a target moving at velocity v is basically represented by s_r(t), and you can see that the signal is actually stretched and delayed. The stretch gives us an estimate of the velocity of the object, and the delay gives us the distance, basically via the time of flight. The stretch factor is calculated as the ratio (c + v)/(c - v), where the velocity's direction determines the sign, and the time delay is basically the time required by the ultrasound to reach the target range and get back. So, essentially, we are trying to estimate these two factors -- the stretch factor and the time delay -- to determine the range and velocity.
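As a quick numeric illustration of those two relationships, here is a minimal MATLAB sketch with assumed example values (not code from the actual system):

    % Illustration only: stretch factor and range from an assumed velocity and delay
    c   = 343;                   % speed of sound in air, m/s
    v   = 10;                    % assumed target velocity, m/s (approaching)
    tau = 0.02;                  % assumed measured round-trip delay, s
    eta   = (c + v) / (c - v);   % stretch factor, roughly 1.06 here
    range = c * tau / 2;         % one-way range, roughly 3.4 m here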
So, particularly from the literature review, this one particular paper gave us some insights about
signal design and estimation: they proposed something called the wideband cross-ambiguity function, which is basically a 2D representation that couples the delay and stretch factors. So, effectively, as you can see there, it's basically a cross-correlation between the received signal and a time-delayed and stretched version of the transmitted signal. And one point to be noted is that, for all practical purposes, the integral is over minus T/2 to plus T/2, basically, because our system is band limited, so we don't need to do the full integral. And, like I mentioned, it's basically a 2D representation of the correlation, and the ultimate task is to estimate these two coordinates in that 2D representation here.
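To make that concrete, here is a minimal brute-force sketch of such a wideband cross-ambiguity search in MATLAB (Signal Processing Toolbox assumed; rx, tx, the interpolation-based stretching and the sign convention are illustrative, not the project's actual implementation):

    % rx: received signal (column), tx: transmitted chirp (column), fs: sample rate
    fs = 192e3;  c = 343;
    vGrid   = linspace(-20, 20, 31);          % candidate velocities, m/s (31 stretches, as in the talk)
    etaGrid = (c + vGrid) ./ (c - vGrid);     % corresponding stretch factors
    t     = (0:numel(tx)-1)' / fs;
    nMax  = max(numel(rx), numel(tx));
    wbcaf = zeros(numel(etaGrid), 2*nMax - 1);
    for k = 1:numel(etaGrid)
        % compress/stretch the template according to the candidate stretch factor
        txStretched = interp1(t, tx, etaGrid(k) * t, 'linear', 0);
        wbcaf(k, :) = xcorr(rx, txStretched).';   % correlate over all delays
    end
    [~, idx] = max(abs(wbcaf(:)));
    [kHat, lagHat] = ind2sub(size(wbcaf), idx);
    vHat     = vGrid(kHat);              % estimated radial velocity (positive = approaching here)
    tauHat   = (lagHat - nMax) / fs;     % estimated time delay, s
    rangeHat = c * tauHat / 2;           % estimated one-way range, m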
So, some insights from the same paper about signal design: we see that LFM stands out, as
it gives better resolution for estimating the two parameters. This is the Doppler stretch domain, and that is the time delay domain, and you can see that the Gaussian signal would be very, very
ambiguous, as against LFM. Some more results in favor of LFM from the literature, we see that
we get better resolution using the LFM. So we implemented the signal that you see here -- the time-domain version and the frequency-domain version. Basically, the frequency sweeps from 38
kilohertz to 42 kilohertz, and the pulse width is about five milliseconds. Some of the parameters
that we considered while designing the signal to be transmitted is basically we are sampling at
192 kilohertz. Our center frequency is about 40 kilohertz, because our transducer's resonant
frequency is basically around that range, around 40 kilohertz. Max range we are considering is
about nine meters, currently. This can be changed, and pulse width is about five milliseconds.
So one interesting thing to note here is the sequence width. It's 10,076 samples -- basically, these are the samples that represent the duration the sound takes to do a round trip to the max range and back. So the sequence width basically decides the max range, and all of this depends on the speed of sound.
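For concreteness, a minimal MATLAB sketch of this pulse design, using the parameters just listed (Signal Processing Toolbox assumed; variable names are illustrative):

    % Illustration of the LFM pulse and the per-pulse sequence length (not the actual code)
    fs = 192e3;                           % sampling rate, Hz
    f0 = 38e3;  f1 = 42e3;                % sweep band around the 40 kHz resonance
    Tp = 5e-3;                            % pulse width, s
    maxRange = 9;  c = 343;               % max range, m; speed of sound, m/s
    t      = (0:1/fs:Tp - 1/fs)';
    pulse  = chirp(t, f0, Tp, f1);        % linear FM sweep, 38 kHz -> 42 kHz
    seqLen = ceil(2 * maxRange / c * fs);              % round-trip samples, about 10,076
    frame  = [pulse; zeros(seqLen - numel(pulse), 1)]; % pulse followed by a listening gap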
So, taking a look at the implemented pulse train that we are pulsing out from the -- thank you.
>>: Does it work?
>> Supreeth Krishna Rao: It kind of does. You'll see the results. Thank you.
>>: I can't wait for the answer.
>> Supreeth Krishna Rao: So looking at the pulse being transmitted, so this is basically -- this
represents one block. We have eight channels, and basically eight channels of transmitters and
receivers, and 20 frames, so one, two. That goes up to 20. And we make use of PlayGraph to
basically interface with the soundcard. We generate the signal in MATLAB. We interface with
the device using PlayGraph to play and record the signals. So going into some more detail into
the estimation framework that is the wideband cross-ambiguity function, I just want to remind
you that is basically a 2D representation of the transmitted and the received signals. So if you
can see here, so this is actually the stretch factor, like I mentioned before. We decided to go for
the following stretch factors based on these constraints. We decided to basically measure
velocities of objects from plus or minus 20 milliseconds -- meters per second. Extremely sorry
for that. That is about 40 miles per hour, and directly deriving from that formula, we get the
stretch factors, and with regards to the number of stretches, that basically decides the resolution
of this ambiguity function that you saw. So, for example, for a number of stretch factors of about 31, this is an illustration of the stretched LFM. You can see that, across the 31 stretches, it is stretched in time, and implementing the ambiguity function and testing it for the LFM for the ideal case, that is, zero Doppler shift and zero delay, this is the response that we got, and
as you can see, it's pretty high resolution in this domain. That is in the stretch factor domain.
And we tested out initial algorithms on the previous prototype that was available and built in this
group, and this is how it looks. Basically, you have an array of transducers, microphones, and
testing on this setup gave us our first slide. That is some results. As you can see, the plot to the
left is basically the cross-correlation. That gives us the measurement of the distance, so you can
see that there is a strong peak at around 0.6 meters, and you can exactly see that this cross here,
which represents the maxima of this function, is again at around that range. That is 0.6 meters,
and one more thing to be noted is the speed. Since this is both a single-shot estimation of both
the velocity and the delay, the range, so the speed is more to the negative side, because the object
was really approaching the setup, in this case. So, basically, what I mean is, let us assume that -- well, this is actually the setup that we were considering, so there was a Kinect facing the setup,
and this plane surface was moved backward and forward in one dimension, across this
dimension, and this represents the depth map of the Kinect. So we just measured the -- we
monitored the depth value at the center of an ROI that represented our object, and that's how we
got this plot. So if you can see, the overlaid data of the Kinect depth estimate and our system's
depth report, reported by our system, you can see that the slope, which represents the velocity, is
quite consistent. Of course, there is an offset, because of the arrangement, by us, between the Kinect and our setup. That can be corrected for. So, diving into the hardware
design, basically, these are the electric transducers that we use, and this is how the
prototype looks. As you can see, it's basically a low-form-factor device of about 50 millimeter
diameter and height of about 100 millimeters, 110 millimeters, and we have the receiver array,
the transducer array here. So, taking a look at the system architecture -- this basically is the system architecture. We have our microphones. We have preamplifiers. We make use of
RME to basically mix the signals that are sent and received, and it's interfaced to the system, to
the computer, through a Firewire, and that is going into, again, a preamplifier, and there is our
speaker array. So this looks more intuitive than the previous one, than the previous slide, so
basically this is our test and calibration setup. We have our device, and for calibration, we made
use of a B&K microphone, and as you can see, this is the previous prototype that was used. So
taking a quick look at the IRGUI that we made use of, which was developed by Dr. Mark Thomas.
So we make use of certain regions of this GUI to actually send a test signal at around 40
kilohertz, and then we measure the responses. We update the gains, and this is basically the
impulse response recorded and the magnitude response to the right. So how was this actually
done? So to calibrate our speakers, we actually kept this microphone at one-meter distance, and
we pinged through all of our speakers, and the responses recorded here were basically what you
saw for one channel, that is, and to calibrate the receiver setup, we used a transducer from this
setup. We again pinged, and then we measured the responses using our receivers, and we
recorded the impulse responses. So these are some of the directivity patterns that we observed
from the measurements for the transmitter. They don't look really directional, do they?
>>: They are directional, but not uniform.
>> Supreeth Krishna Rao: Yes, so although they are directional, they are not really directional within the roughly 15 degrees, plus or minus 7.5 degrees, that we are looking at. And you can see the
transmitter frequency responses. One more challenge that we face -- by the way, this is after
calibration. You can see that they're not very well matched, so this is something that kind of
created some delay in our progress, but of course you can see that the resonant frequency is somewhere close to 40 kilohertz.
>>: What about the area between 45 and 50 kilohertz? This is where they match in their plot, out of the array analysis.
>> Supreeth Krishna Rao: Yes. So the receiver directivity pattern looks something like this.
Again, they're quite directional -- better than our transmitters, and more uniform on the one side.
>>: This is due to the separating from the [indiscernible], right?
>>: Rather than the natural directivity of the microphones themselves -- they're omni at all your frequencies -- even just the can that it's made from is going to have consequences.
>> Supreeth Krishna Rao: So one can note that the receiver -- so this is actually the transmitter
that we used from the previous configuration developed by the previous intern, Ivan Dokmanic,
and these are the receiver responses. So they are quite well matched. That is pretty good, better
than this. So all this necessitates beamforming. That's the long story short. All this necessitates
beamforming, because we need better spatial angular resolution, and we made use of the BFGUI
tool, developed here, again, by Dr. Mark Thomas, at the Audio and Acoustics Research Group at
MSR. And this is basically used for generating the beamforming weights. For example, this
represents the method we used for beamforming, MVDR closed form, and this basically shows
the array setup for our speakers, and this is basically the measured data, basically the dimensions.
So here we see the directivity pattern, and this is exported and stored for later use, which we
will see, and the same goes with receiver beamforming weights. We used the MVDR closed
form. We evaluated MVDR closed form method, and this is what it looks like.
>>: You used omnidirectional directivity pattern on this?
>> Supreeth Krishna Rao: Yes, we used omnidirectional, so that made us go for some other
method of beamforming that we'll see in the near future, because firstly, our setup was -- our
transducers were not really omnidirectional. So these are the results from the previous two GUI
slides that you saw. This is for the receiver. It looks pretty good, but it still has a lot of side
lobes that really catch many reflections, and with respect to transmitter beamforming, I don't
need to say much. It's not really -- it's not very great. So all this actually made us choose a different beamforming weights estimation framework that we derived through numerical estimation. We had our acoustic channel impulse responses, and this was actually the desired beam pattern that we were targeting for the receiver and for the speaker. And we actually did numerical estimation. We basically fit the curves to get this, so we are forcing --
>>: This is beam pattern synthesis, right?
>>: Based on the measured data, rather than --
>> Supreeth Krishna Rao: Yes.
>>: Okay.
>> Supreeth Krishna Rao: So, as you can see, the MVDR is represented by green. The
desired is blue, and curve fitted response is in the red, so it is quite evident that the red ones
perform much better than the green ones. That is MVDR classical estimation of the weights.
Same goes with the receiver responses. Pretty good. The red ones perform pretty well, and these
are all the 24 beams, by the way. So for beamforming, we are making use of 24 beams rotating
at step size of 15 degrees, from 0 to 345 degrees, and these are all the 24 beams that you see. So
taking a look at the flow chart of this entire system, this is basically a recap of what we saw. We
have the IRGUI tool. We generated the acoustic channel responses. We design and generate the desired pattern -- I should probably mention a little bit about that. So this was basically the signal, like I mentioned, that generates this pattern. So we generate that signal, that pattern, and then we take the pseudo-inverse -- it's basically a multiplication of the pseudo-inverse of the acoustic channel matrix, that is, H, which is basically the acoustic channel response, with the desired pattern. So this gives us a curve fit, and ultimately we arrive at the beamforming weights for this particular recording. So if we do another recording -- another measurement of the acoustic channel responses -- we need to regenerate the weights.
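In other words, the weights come from a least-squares fit of the desired pattern. A minimal MATLAB sketch of that step (illustrative variable names; in practice this would be solved per frequency bin):

    % H: [nDirections x nElements] measured acoustic channel responses at one frequency bin
    % d: [nDirections x 1] desired beam pattern at that bin
    w      = pinv(H) * d;    % beamforming weights minimizing ||H*w - d|| in the least-squares sense
    fitted = H * w;          % resulting beam pattern, for comparison against the desired and MVDR ones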
So, once we get the beamforming weights, let's again traverse from here. We generate the eight-channel 20-block
signals in MATLAB. We play them through PlayGraph, and then we segment the received signal into a 4D matrix. We convert it to the frequency domain, and so the dimensions are NFFT; number of mics, that is eight; number of speakers, that is eight; and number of blocks, that is 20 in our case -- so it's basically a 4D matrix.
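As a rough MATLAB sketch of this segmentation, plus the band-pass step mentioned next (the recording rec and the helper getSegment are hypothetical, since the exact indexing depends on the pulse-train layout):

    % Build the 4D frequency-domain tensor described above (illustration only)
    nFFT = 2^nextpow2(seqLen);  nMics = 8;  nSpk = 8;  nBlocks = 20;
    X = zeros(nFFT, nMics, nSpk, nBlocks);
    for b = 1:nBlocks
        for s = 1:nSpk
            seg = getSegment(rec, s, b, seqLen);   % [seqLen x nMics]; hypothetical helper
            X(:, :, s, b) = fft(seg, nFFT, 1);     % FFT along time for all mics at once
        end
    end
    % band-pass: keep only bins inside (and mirroring) the 38-42 kHz band of interest
    f = (0:nFFT-1)' * fs / nFFT;
    keep = (f >= 38e3 & f <= 42e3) | (f >= fs - 42e3 & f <= fs - 38e3);
    X(~keep, :, :, :) = 0;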
And we perform band-pass filtering on the received signal to basically reject all the noise and signals beyond our frequency range of
interest, and after that, we do beamforming for the transmitters, so we have the weights from
here, the received signal, and then we basically do offline beamforming. This was something
that was observed in the previous internship: leveraging the LTI (linear time invariance) nature of the system, the previous intern suggested that transmitter beamforming can also be performed offline, which significantly improves the frame rate.
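A rough sketch of what that offline combination could look like on the 4D tensor X from above (the weight arrays wTx and wRx and their shapes are assumptions, not the project's actual code):

    % wTx: [nFFT x nSpk x nBeams] transmit weights, wRx: [nFFT x nMics x nBeams] receive weights
    nBeams = 24;
    Y = zeros(nFFT, nBlocks, nBeams);            % beamformed spectra per block and beam
    for beam = 1:nBeams
        % combine the per-speaker recordings with the transmit weights; doing this
        % after the fact is valid because the propagation is linear and time invariant
        txW   = reshape(wTx(:, :, beam), nFFT, 1, nSpk);
        txSum = squeeze(sum(X .* txW, 3));       % [nFFT x nMics x nBlocks]
        % then apply the microphone (receive) weights
        Y(:, :, beam) = squeeze(sum(txSum .* wRx(:, :, beam), 2));
    end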
And then we perform microphone beamforming, as you saw before, and then we perform matched filtering to basically, again, get rid of unnecessary reflections and noise. We perform a background subtraction, which you will be seeing in the next slide. So some
preliminary results from these new devices look something like this: the raw map of 360 degrees, across 24 beams, and we do postprocessing to remove the noise by using a naive approach, which is basically background subtraction. So you can see that -- I'll play a
a quick video here. So this was walking detection test to see if the system really detects a person
approaching and slowly walking past the device. Exactly. Thank you, Hannes. Among so much
clutter, so basically, you're -- our environment was quite cluttered. Ideally, this should have been
done in the anechoic chamber, but we received the device towards pretty much the end of the
internship, when about three to four weeks were left, and there were heat sink issues and we had
to install a heat sink, so the device was actually going back and forth from the hardware lab to
us. So with all this -- so that kind of didn't let us move the entire setup to an anechoic chamber
and make better measurements. Well, amidst all this clutter, there is some happy news still. So
you can see that, for the testing, the walking testing, this was the response. This was the
response for the walking detection. You can see. Of course, some of the postprocessing needs to
be done. We are currently just using live background subtraction. We can make use of particle
filtering [indiscernible] filtering or any confidence-based tracking approach to basically fix the
object of interest.
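For reference, the naive background subtraction being described could be sketched in MATLAB roughly as follows (it relies on the sensor being fixed, as discussed below; the threshold value is an arbitrary illustration):

    % maps: [nRangeBins x nBeams x nFrames] matched-filter magnitudes, one slice per frame
    background = median(maps, 3);                % estimate of the static clutter
    fg = max(maps - background, 0);              % keep only energy above the static background
    fg(fg < 0.1 * max(fg(:))) = 0;               % crude fixed threshold (arbitrary value)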
>>: Actually, I think it would be interesting to see the video before background subtraction.
>> Supreeth Krishna Rao: Oh, right, right. Do you want me to play the video again?
>>: Before background subtraction, the raw video, as well.
>> Supreeth Krishna Rao: Right.
>>: It's right there.
>> Supreeth Krishna Rao: So this is actually how it looks. Before background subtraction, you
can notice that there are responses all over. Play that again. So there are responses pretty much
everywhere, so finding a major source of reflection is pretty hard, especially given our cluttered
environments, like as you saw here, pretty much several objects right around the device are at the
same distance, so that is why we get -- that's why we get --
>>: So this, you're calling it a naive form of background subtraction. It's only working because
the sensor is fixed, correct?
>> Supreeth Krishna Rao: Absolutely, absolutely.
>>: So if the sensor were moving, you would need much more sophisticated.
>> Supreeth Krishna Rao: Yes, we would need tracking, probably optical flow, tracking or particle filtering or something like that.
>>: So you show reflections here and the positions in meters. What about the speed? Can you detect the speed of the moving object and use this to help?
>> Supreeth Krishna Rao: Speed? Could you please come again?
>>: The speed of the moving object, so from the videos we saw, it's just reflections, which is
distance, the time delay.
>> Supreeth Krishna Rao: Yes.
>>: How well does speed detection work?
>> Supreeth Krishna Rao: So we were battling issues with the device until yesterday, so probably in a couple of days we might be able to really find out how the ambiguity function
behaves on this new device.
>>: Because, technically, another good criterion for filtering is, okay: subtract everything that doesn't move and leave only the moving objects which have a speed above a
certain threshold, and that will provide a much cleaner image, eventually.
>> Supreeth Krishna Rao: Yes, that's exactly what radars make use of, because they have threshold velocity ranges, and any object that is not moving in that range of velocities is rejected as noise, so that's what even radars do. Yes, so a lot of postprocessing remains. It is yet to be done, and we'll figure out how the ambiguity function works in probably a couple of days.
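Once per-frame velocity estimates from the ambiguity function are available, the velocity gating discussed above could look roughly like this (the threshold and array names are assumptions):

    % vMap:   [nRangeBins x nBeams] per-frame velocity estimates from the WBCAF peaks
    % magMap: [nRangeBins x nBeams] matching reflection magnitudes
    vMin = 0.5;                        % m/s; reject near-static clutter (arbitrary threshold)
    magMap(abs(vMap) < vMin) = 0;      % keep only reflections from moving targets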
>>: You have one more day.
>> Supreeth Krishna Rao: Yes, not a couple. Yes. So it says here thanks to my beloved mentor,
Hannes Gamper. He helped me a lot, and none of this would have really happened without his
guidance and support, and Dr. Mark Thomas, he gave us several insights during the project to
debug several issues with the hardware, for the algorithms. The GUI tools that we saw, the
calibration, the impulse response recording, the beamforming tools, all of them were developed
by Dr. Mark. And thank you, Dr. Ivan, for your continuous encouragement and ever-smiling
approach, and sincere thanks to Dr. David Heckerman. Basically, this project came into being
because of him, so he wanted this device to be prototyped and tested and the algorithms to be
built, so thank you so much. And especially the hardware lab members, Alex Ching and Jason
Goldstein, they really helped us through the ins and outs of the hardware issues, right from development to testing, all of those. That's about it. Thank you so much. Any questions?
>>: So, back to the beamformer synthesis for the loudspeaker array and the microphone array.
Did you guys use the phase information? Did you get any phase information in order to mask
complete noise?
>>: Yes, so the beam matching actually uses both phase and magnitude. But the question is
whether we should perhaps ignore the phase in some regions rather than use it for the whole 360
degrees.
>>: I mean the phase response of the transducers.
>>: Yes.
>>: Another note here is that technically, what is interesting to see is the multiplication of those
two. This is the joint directivity pattern of the transmitters and the receivers, the receiver beam,
which actually brings another interesting idea. Can we do a joint synthesis of the transmitting
and receiving beamformers in a way to maximize the directivity? So that, if we have a big sidelobe from the transmitter, we don't also have a lot there from the receiver.
>>: There was a time when we were deciding what the geometry would be that we would maybe
even interlace, have transmitter, receiver, transmitter, receiver, and arrange them in a way so that
we could try to control all these aliasing artifacts. It seemed easier to create two independent arrays, because the analysis of those is very much more straightforward. And the other
advantage is that by keeping them separate, the microphones could be crammed into 50 millimeters diameter, which --
>>: You've got a substantially lower aliasing from the microphone array than on the loudspeaker
array.
>>: Another thing here is, of course, so we actually measured the impulse responses in the setup
he was showing with truncation of the impulse response, obviously, going to that. Anechoic
chamber might help for calibration purposes, and if we were to move away from the resonance, then perhaps we would also get --
>>: So, pretty much, from what I saw, the calibration in your work ranges slightly above the resonance frequency, 44 to 50 kilohertz. You equalize the transmitters, you [indiscernible]. The response is far here. This is your work area. It's not from that. It's here.
>>: Yes.
>> Supreeth Krishna Rao: Yes, we'll definitely take that into account and retest, recalibrate.
Any more questions?
>>: So I guess I didn't quite understand what is the spatial resolution of objects that you can
detect? Are we talking mostly about cars and people and walls or anything smaller than that?
>> Supreeth Krishna Rao: Can you increase the volume?
>>: Sure. How is the spatial resolution?
>> Supreeth Krishna Rao: Oh, you mean --
>>: What size objects can you detect? Do you have to look at something the size of a human
and larger, or can you look at smaller objects?
>> Supreeth Krishna Rao: So several factors come into play, like the range at which the objects
are. That then decides the resolution, and a direct straightforward answer, a frank answer, is we
still don't know, because we got the device just about 2.5, three weeks back, and to get the device
set up, calibrated and get some initial results itself is quite a challenge. But I'm assuming -- so at
least the map resolution is 15 degrees, as you saw. And with regard to how many bins of those
maps a real-world object occupies, we still need to figure that out. We still need to test that.
>>: So at 192 kilohertz, one sample times two [indiscernible] is 1.8. Do you regard this as your resolution in this particular [indiscernible]?
>>: Forty-five kilohertz is about eight [indiscernible].
>>: But bats do it with 45 kilohertz chirps. They can capture insects, so it's possible. By the
way, the brain of a bat is not that much more computationally powerful than a modern computer.
>>: The advantage, I guess, the bat has is that both the bat and the insect are suspended in air.
>>: No reflections, yeah.
>>: And we want to do the same thing.
>>: Well, if you attach the device to the drone.
>> Supreeth Krishna Rao: Well, the drone itself might induce some vibrations.
>>: True. The [indiscernible] to the drone. We could certainly take it outside. This was a setup
that was necessitated by mostly time constraints, but if we were to take this outside and repeat
the experiments there, I think we would get rid of a lot of reflections and then would get perhaps a
better idea of how this might perform when there's less.
>> Supreeth Krishna Rao: Yes, and if it's an open space, we would definitely not get such
responses, completely -- there's pretty much an object right around the system at very close
ranges, and the problem is the human being is just one of these at the same range. And also, the
signal-to-noise ratio is not very great in such a cluttered environment. You don't know whether
the human being is reflecting or it's hitting the human being, hitting a closer object and then
coming back. By then, the human being might have moved past.
>>: You don't even need open space. You might choose the atrium, which is large enough basically to be equivalent to open space, except for the floor.
>> Supreeth Krishna Rao: Right.
>>: Okay, if there are no more questions --
>>: Very good work. You made progress in that direction.
>> Supreeth Krishna Rao: Thank you.
>>: Thanks for building on top of Ivan's work, and hopefully next year another intern will start
to push this further.
>> Supreeth Krishna Rao: Right, sir. Okay.
>>: Thank you.