Lai_POV_Visual_Audio_hw4

Section 1
The research was divided into four equally important parts of the project: the input and output of video and audio. Each member was in charge of researching and writing up his own portion of the project. These disciplines will remain each member's focus for the duration of the project, though there will be times when the disciplines overlap. Phong was in charge of input audio, specifically binaural audio techniques, and the write-up of Section 1 as well as the overview. Ajay was in charge of audio processing and output. Eric C was in charge of video output, specifically 3D image viewing techniques. Finally, Eric P was in charge of 3D video capture techniques and considerations.
Percentage of Effort Towards This Assignment
Phong Lai        30%
Eric Chae        23.33%
Ajay Reddy       23.33%
Eric Pantaleon   23.33%
Overview
The objective of this project is to create an immersive environment through auditory and visual techniques; to develop a device that incorporates all of these techniques into a single recording device; and to create a device that allows post-processing and playback of the synced audio and visual files. The project is built around the idea of replicating a person's point of view so that the audience can experience an event the way the user did.

There will be two devices. One records 3D video with optical specifications close to those of average human sight, and records 3D audio with a flat frequency response over the range of 20 Hz to 20 kHz (the average frequency range of human hearing). The other device will be the playback device used to see the 3D video and hear the 3D audio. This device will have specifications chosen to reproduce the colors and audio as faithfully as possible. Some post-processing is involved to balance out the imperfections of the screen and speakers.

Applications for the product include any situation that requires a live feed from the perspective of the person using it. Combat, firefighting, and police training were the initial applications, but the model for those uses would need higher ruggedness and higher optical and auditory specifications. The base model that we are working on can be used for consumer, professional, and enthusiast filmmaking, and for entertainment purposes.
Section 2
Audio In
In this project we will attempt to create a fully immersive environment. Most standard recording methods are very directional and reject sounds from the sides of and behind the microphone. Other microphones have a restricted frequency response, which compresses the range of what the human ear can hear. For this project, we will use a technique referred to as binaural recording to create an accurate audio representation of what the user is hearing. Binaural recording techniques are based on the pursuit of capturing audio the way our ears hear it. The technique uses the properties of our own ears to localize sounds coming from different directions; when the recording is played back through headphones, our brains make a connection with this audio and can tell where each source is coming from. This technique creates a sense of space in the audio track, producing 3D audio playback through ordinary stereo headphones.
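As a rough illustration of how those interaural differences create a sense of direction, below is a minimal Python/NumPy sketch that pans a mono signal using only an interaural time delay and a head-shadow level difference. The head radius, the Woodworth-style delay formula, and the 6 dB shadow depth are simplified stand-ins; real binaural recordings capture far richer cues from the shape of each ear.

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound at room temperature
    HEAD_RADIUS = 0.0875    # m, rough half-spacing between an average pair of ears

    def binaural_pan(mono, sample_rate, azimuth_deg):
        """Crudely pan a mono signal using interaural time and level
        differences (ITD/ILD). azimuth_deg: 0 = straight ahead, +90 = right."""
        az = np.radians(azimuth_deg)
        # Woodworth-style ITD approximation for a spherical head.
        itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (abs(az) + abs(np.sin(az)))
        delay = int(round(itd * sample_rate))
        # Simple level difference: the far ear is shadowed by the head.
        far_gain = 10 ** (-6.0 * abs(np.sin(az)) / 20.0)  # up to ~6 dB quieter
        near = np.concatenate([mono, np.zeros(delay)])
        far = far_gain * np.concatenate([np.zeros(delay), mono])
        # Source on the right: the left ear is the far (late, quiet) ear.
        left, right = (far, near) if azimuth_deg >= 0 else (near, far)
        return np.stack([left, right], axis=1)  # (samples, 2) stereo array

Played over headphones, even this crude delay-plus-attenuation pair is enough for the brain to place the source to one side, which is the effect the in-ear microphones capture for free.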
Our project will use two omnidirectional elements located in or near the ears to reproduce the spacing between our ears for accurate panning effects. In-ear microphones can also use the ear itself to better localize sound sources behind and in front of the user. Each user will color the track in his or her own unique way, since no two people's ears, or ear spacings, are exactly identical. When the track is played back, it is as if the listener were hearing the audio very close to the way the user heard it. One benefit of recording this way is that it reproduces nuances a directional microphone would miss, such as the user's breathing or quiet footsteps. Another benefit is that these microphones have a very flat frequency response, whereas directional microphones color the audio, which helps explain why one's own voice sounds strange in a recording. The final benefit of these microphones is that they are less prone to picking up floor vibrations, because the body damps much of that energy out of the recording. Standard microphone techniques pick up floor vibrations, and reducing them requires installing the microphone in a suspended contraption that absorbs vibrations.
The disadvantages of this microphone stem directly from its advantages. An omnidirectional microphone is extremely sensitive to everything, so nuances like heavy breathing may not be desired in the recording. The binaural technique works extremely well when the microphone is static and not rotating with the environment, so the user will have to avoid moving his or her head too often while the device is on. The video capture aspect of this project will help solve the localization issue, as the viewer will have the ability to change audio perspective relative to the position of the camera. The final obstacle the group will have to overcome with regard to audio recording is developing a way to keep wind from hitting the element, which creates a massive amount of clipping in the digital track. This must be done in such a way that it does not affect the frequency response or the sensitivity of the microphones.
Another fundamental piece of audio recording is developing a preamp with sufficient gain that does not introduce electronic noise into the recording. We may go about this by creating a preamp with clean gain and lowering the gain of the recording device's own preamp. The final feature of audio recording is a multiband compressor acting on the extremely high and extremely low frequencies, so that the listener is not pained or irritated by the audio. Percussive noises tend to peak very high and clip the audio track. I believe implementing a compressor before the signal reaches the recording device will make the track more pleasing without losing the details we work hard to capture.
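As a sketch of the kind of compressor just described, the Python snippet below follows the signal envelope with a fast attack and slow release, and reduces gain only above a threshold. The threshold, ratio, and time constants are placeholder values, not tuned settings.

    import numpy as np

    def compress(signal, sample_rate, threshold_db=-12.0, ratio=4.0,
                 attack_ms=5.0, release_ms=50.0):
        """Feed-forward compressor for a mono float array: track the signal
        envelope, then reduce gain above the threshold so percussive peaks
        do not clip."""
        attack = np.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
        release = np.exp(-1.0 / (sample_rate * release_ms / 1000.0))
        env = 0.0
        out = np.empty_like(signal)
        for i, x in enumerate(signal):
            level = abs(x)
            # Fast attack when the level rises, slow release when it falls.
            coeff = attack if level > env else release
            env = coeff * env + (1.0 - coeff) * level
            env_db = 20.0 * np.log10(max(env, 1e-9))
            over = max(env_db - threshold_db, 0.0)   # dB above threshold
            gain_db = -over * (1.0 - 1.0 / ratio)    # 4:1 keeps 1/4 of the overshoot
            out[i] = x * 10 ** (gain_db / 20.0)
        return out

A real implementation would vectorize this loop or split the signal into bands first for the multiband version, but the attack/release structure is the same.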
Audio Out
One of the things we will have to do in the course of our project is process recorded audio before it is played back to the end consumer at a later date. This means that we will have to deal with the issue of dynamic range compression, which essentially means making loud sounds quieter and quiet sounds louder. I was concerned about this at first because I thought it would be a complicated procedure, or something that would have to be done manually for each recording, but I was pleased to find out that we can simply use a few LTI filters to do this process [1]. That is very good for us, because it means compression can be automated and should be fairly straightforward to implement. I therefore do not foresee amplitude compression of audio being a huge problem for us, since it is not really revolutionary technology and there are probably many audio processing programs that we can use to do this sort of thing for us.
One thing that may be an issue for us, however, is surround sound playback. The traditional means of achieving surround sound effects is to place several speakers all around the end user, which is not a practical approach for us, as we will have to simulate the same effect with a pair of headphones [8]. Furthermore, we will have to simulate sounds coming from close by as well as sounds coming from far away. However, even this problem does not seem insurmountable, as there are things we can do to the raw audio itself (adjusting the volume, putting it through a low-pass filter, et cetera) to simulate distance. Better yet, since we intend to record audio with condenser microphones, what the microphone hears should be fairly close to what an actual person would hear in the same position.
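Below is a minimal sketch of those two distance tricks: inverse-distance attenuation plus a one-pole low-pass filter (a simple LTI filter) whose cutoff falls with distance. The mapping from distance to cutoff frequency is an arbitrary placeholder, not a measured model of air absorption.

    import numpy as np

    def simulate_distance(signal, sample_rate, distance_m):
        """Make a close-miked mono recording sound farther away: attenuate
        by 1/distance and low-pass it, since highs die off over distance."""
        gain = 1.0 / max(distance_m, 1.0)  # inverse-distance level rolloff
        # Placeholder mapping: cutoff falls from ~16 kHz at 1 m as distance grows.
        cutoff_hz = max(16000.0 / max(distance_m, 1.0), 500.0)
        a = np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)  # one-pole LPF coefficient
        out = np.empty_like(signal)
        y = 0.0
        for i, x in enumerate(signal):
            y = (1.0 - a) * x + a * y
            out[i] = y
        return gain * out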
Unfortunately, I was unable to find a way around the issue of surround sound with headphones other than what the industry already uses: several small speakers in each padded earcup, placed at various locations around the user's ear. By cleverly using this setup and mixing the sound properly when it is played, one can indeed achieve a fairly good surround sound effect. I believe that if this solution is good enough for the industry, it should work for us as well.
Another problem I think has been solved for us is the issue of audio processing for 3D,
immersive surround sound. Sound engineering has gotten pretty good in recent years, and
although our goal is a bit different from the normal surround sound setup a gamer or
moviegoer might enjoy, it should not be too hard for a good sound engineer to figure out how
to work with virtual reality headsets. Problems like acoustics, simulating distance, and figuring
out how to make audio recordings as realistic as possible should not be overwhelming if we can
get the right software for the job.
Video In
Humans have stereoscopic vision. This means that we have two viewpoints (our eyes) in the same plane observing simultaneously. Because of this, humans can determine very quickly how far or close something is; this ability is depth perception. It is very important in viewing and recording 3D images and videos. There are several depth cues that help humans see three-dimensionally, and understanding them will help in understanding how to record 3D video.
Convergence, binocular parallax, retinal image size, linear perspective, and overlapping are all depth cues that can help make a video feel more 3D. Convergence is very effective at short distances: as an object gets closer and closer to the viewpoints, your eyes start to point inward toward each other. Binocular parallax is the difference between the images seen by each eye. You can observe it by closing one eye at a time and comparing what you see with each eye shut; when both are open, the two images are fused together seamlessly. Retinal image size amounts to comparing the size of a car you see against the size you know a real car to be; if the car appears smaller than normal, you know it is some distance away. Linear perspective is like looking toward the horizon and seeing the parallel edges of a road converge in the distance. Overlapping is when you see one object in front of another: you can tell the first object is closer because it hides parts of the object behind it. This can also pertain to shadowing. When an object casts a shadow, you can tell whether it is close or far, behind or in front of you, depending on where the light source is. These cues can help a video feel more 3D.
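As one quantitative handle on the binocular parallax cue (and, later, on how far apart to space the camera lenses), the standard stereo triangulation relation ties the depth Z of a point to the disparity d between its positions in the two images, where f is the focal length and B is the baseline between the two viewpoints:

    Z = (f * B) / d

Nearby objects produce large disparities and distant objects produce almost none, which is why parallax and convergence are only effective at short range.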
Now that we know how our eyes achieve depth perception, we can determine how to record in 3D. Camera lenses are already somewhat modeled after the human eye, in that they capture still images and videos much the way humans see them, but only from a 2D perspective. If the camera lens is modeled after a human eye, then why do we not have 3D images and videos by default? The answer is very simple: one camera lens is equivalent to one eye, so two cameras are needed to simulate stereoscopic vision! Though the answer is simple, the implementation may prove to be difficult.
3D images and videos can be achieved by having two cameras set on the same plane recording simultaneously. This can be done with any camera as long as both cameras are the same. Some people have taken two iPod Nanos with built-in video cameras, mounted them to a metal bar, and made a 3D video by syncing the two recordings together. This makes the solution seem really easy, but in actuality a true 3D image is not achieved. The rig did produce a three-dimensional video, but it did not capture the scene the way human eyes would: the lenses were spaced too far apart, so the 3D effect was very poor in quality. To get a true 3D video recording, we must model the rig after the human eyes, with the two lenses spaced the same distance apart as the average spacing of human eyes. The video quality must also be top-notch in order to give a great user experience. Then the two video files will need to be synced together according to how the video will be viewed (see the Video Out section); a minimal sketch of this pairing appears below.
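The snippet below uses OpenCV to read two separately recorded clips frame by frame and write them out as a single side-by-side stereo stream. The file names are placeholders, and it assumes both clips were already trimmed to a common start point (e.g. a clap visible in both) and recorded at the same resolution and frame rate.

    import cv2  # OpenCV (pip install opencv-python)

    # Placeholder file names for the two synced recordings.
    left = cv2.VideoCapture("left.mp4")
    right = cv2.VideoCapture("right.mp4")
    writer = None

    while True:
        ok_l, frame_l = left.read()
        ok_r, frame_r = right.read()
        if not (ok_l and ok_r):
            break                                   # stop when either clip ends
        pair = cv2.hconcat([frame_l, frame_r])      # one side-by-side stereo frame
        if writer is None:
            fps = left.get(cv2.CAP_PROP_FPS) or 30.0
            h, w = pair.shape[:2]
            writer = cv2.VideoWriter("side_by_side.mp4",
                                     cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        writer.write(pair)

    left.release()
    right.release()
    if writer is not None:
        writer.release()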
Video Out
There are several ways to display 3D video and images. Since two images (one for each
eye) are required to create the illusion of 3D, there must be some way to display them
simultaneously to each eye. This can be thought of as analogous to stereo audio, but with eyes
instead. In fact, using two images to create the illusion of 3D is called stereoscopy. The basic
requirement is that each eye gets its own independent visual feed.
The most obvious way to do this is by actually using two physical displays side by side.
This was the method of choice in the early days of stereoscopy. The two images would be
printed side by side and the viewer would simply focus each eye on their respective images.
However, it is actually quite difficult to do this. Both eyes must be focused parallel to each
other, which is rather unnatural and disorienting. The natural way of viewing objects is for both
eyes’ focus to converge at a single point. An alternative method is to switch the left and right
images and require the viewer to view them cross-eyed. Unfortunately, this does not help
much and still requires a degree of concentration.
One way of making viewing such images more seamless is to use special glasses, called
stereoscopes. These glasses use lenses to converge the two images into one in much the
same way binoculars work. Since each image is different, the illusion of 3D is preserved. With
modern day technology, the glasses and display can be integrated into one by simply having
two head-mounted LCD screens. This method has the benefit of being very immersive as it
tends to block out any external visual stimuli and is the method of choice for virtual reality
machines. Due to this fact, it is one of two possible choices for implementing our project.
The other is a more widely used method where two images are multiplexed into one
display. Most people are familiar with the red/cyan tinted glasses that are often given out in 3D
movie viewings. Each image is tinted cyan or red and each respective eye is given a
corresponding tinted filter so that only one image is presented to each eye. Alternatively, each
image can be polarized differently and the viewer must use polarized glasses to separate the
image into two. Besides multiplexing visually, stereoscopic videos can be time-multiplexed, where each frame alternates between the left and right videos; special active shutter glasses then block each eye in synchronization with the display. The main advantage of using this type of
multiplexing is that there is no color tint, no darkening, and the images can be truly
independent from each other with little interference. However, this system is significantly
more complex as it requires a high degree of synchronization between the display and the
glasses. Additionally, the display must be able to render at twice the frame rate (or the video
must sacrifice half the frame rate).
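As a small illustration of the red/cyan multiplexing described above, the sketch below builds an anaglyph by taking the red channel from the left-eye image and the green and blue channels from the right-eye image. The file names are placeholders, and the two images are assumed to be the same size.

    import numpy as np
    from PIL import Image  # Pillow (pip install Pillow)

    def make_anaglyph(left_path, right_path, out_path):
        """Red/cyan anaglyph: the red lens passes only the left-eye image and
        the cyan lens only the right, so each eye receives its own view."""
        left = np.asarray(Image.open(left_path).convert("RGB"))
        right = np.asarray(Image.open(right_path).convert("RGB"))
        anaglyph = np.empty_like(left)
        anaglyph[..., 0] = left[..., 0]      # red channel from the left eye
        anaglyph[..., 1:] = right[..., 1:]   # green and blue from the right eye
        Image.fromarray(anaglyph).save(out_path)

    # Placeholder file names for illustration:
    # make_anaglyph("left.png", "right.png", "anaglyph.png")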
Multiplexing is not as immersive as head-mounted displays, but it is, with the exception of time-multiplexing, much easier and cheaper to do, since it only requires a television or monitor, which almost everyone has in some form, and very simple glasses. For this project, it may be beneficial to use multiplexing as a cheap and easy way to test our recording method and move on to a head-mounted display later to provide the total immersion we desire.
References
http://asadl.org/jasa/resource/1/jasman/v22/i6/p801_s1?isAuthorized=no
http://www.maijala.net/panu/papers/in97/
http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=1285854&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1285854
http://www.aaai.org/Papers/AAAI/1999/AAAI99-109.pdf
http://www.soundprofessionals.com/cgi-bin/gold/category/110/mics
https://ccrma.stanford.edu/~jos/fp/Nonlinear_Filter_Example_Dynamic.html
http://electronics.howstuffworks.com/home-theater3.htm
http://www.howtogeek.com/57903/htg-explains-how-does-dynamic-range-compression-work/
http://160.78.24.2/Public/AES-114/00075.pdf
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=290868
http://decoy.iki.fi/dsound/ambisonic/motherlode/source/poletti_3D%20surround%20system.pdf
http://socialsounddesign.com/questions/10395/how-to-give-a-feeling-of-distance-position
http://www.tomshardware.com/news/Headset-7.1-surround-sound-PC-Gaming-TiamatFacebook,13245.html
http://www.aes.org/e-lib/browse.cfm?elib=7082
http://www.vision3d.com/3views.html
https://pro.sony.com/bbsc/ssr/mkt-digitalcinema/resource.demos.bbsccms-assets-mktdigicinema-demos-digitalcinema3d.shtml
http://www.oculusvr.com/
http://stcroixstudios.com/wilder/anaglyph/whatsanaglyph.html
http://www.dlp.com/projector/dlp-innovations/dlp-link.aspx
http://phys.org/news173082582.html
http://www.vetmed.vt.edu/education/curriculum/vm8054/eye/binocs.htm
http://www.zurb.com/article/394/make-your-own-3d-video-in-three-easy-step
http://www.hitl.washington.edu/scivw/EVE/III.A.1.c.DepthCues.html