Section 1

The research was divided into four equally important parts of the project: video input, video output, audio input, and audio output. Each member was in charge of researching and writing up his own portion of the project. These disciplines will remain each member's focus for the duration of the project, though there will be times when the disciplines overlap. Phong was in charge of audio input, specifically binaural audio techniques, and wrote Section 1 as well as the overview. Ajay was in charge of audio processing and output. Eric C was in charge of video output, specifically 3D image viewing techniques. Finally, Eric P was in charge of 3D video capture techniques and considerations.

Percentage of Effort Towards This Assignment
  Phong Lai       30%
  Eric Chae       23.33%
  Ajay Reddy      23.33%
  Eric Pantaleon  23.33%

Overview

The objective of this project is to create an immersive environment through auditory and visual techniques; to develop a device that incorporates all of these techniques into a single recording device; and to create a device that allows post-processing and playback of the synced audio and video files. The project is built around the idea of replicating a person's point of view so that the audience can experience an event as the user did. There will be two devices. One records 3D video with optical specifications close to those of average human sight, and records 3D audio with a flat frequency response over the range of 20 Hz to 20 kHz (the average range of human hearing). The other device will be the playback device used to view the 3D video and hear the 3D audio. This device will have specifications chosen to reproduce the colors and audio as faithfully as possible. Some post-processing is involved to balance out the imperfections of the screen and speakers. The product applies to any situation that requires a live feed from the perspective of the person using it. Combat, firefighting, and police training were the initial applications, but that model of the product would require greater ruggedness and higher optical and auditory specifications. The base model that we are working on can be used for consumer, professional, and enthusiast filmmaking, and for entertainment purposes.

Section 2

Audio In

In this project we will attempt to create a fully immersive environment. Most standard recording methods are highly directional and reject sounds from the sides of and behind the microphone. Other microphones have a restricted frequency response, which compresses the range of what the human ear can hear. For this project, we will use a technique referred to as binaural recording to create an accurate audio representation of what the user is hearing. Binaural recording techniques are based on the pursuit of capturing audio the way our ears hear it. The technique uses the properties of our own ears to localize sounds coming from different directions; when the result is played back through headphones, our brains interpret these cues and tell us where each source is located. This creates a sense of space in the audio track, producing 3D audio playback through ordinary stereo headphones. Our project will use two omnidirectional elements located in or near the ears to reproduce the spacing between our ears for accurate panning effects. Because the in-ear microphones sit in the ear, they can use the ear itself to better localize sound sources behind and in front of the user. Each user will color the track in a unique way, since no two people's ears, or the spacing between them, are exactly identical. When the track is played back, it is as if the listener is hearing the audio very nearly the way the user heard it. (A brief sketch of the localization cues involved appears below.)
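To make the localization idea concrete, the following sketch simulates the two dominant binaural cues, interaural time difference (ITD) and interaural level difference (ILD), by delaying and attenuating a mono signal differently for each ear. This is a minimal illustration of the principle, not our recording chain: the head radius and speed of sound are standard textbook approximations, the ITD uses the Woodworth spherical-head formula, and the sine-law level difference is a crude stand-in for real pinna (HRTF) effects.

    import numpy as np

    def binaural_pan(mono, fs, azimuth_deg, head_radius=0.0875, c=343.0):
        """Place a mono signal at an azimuth using simple ITD/ILD cues.

        azimuth_deg: 0 = straight ahead, +90 = hard right.
        Uses a spherical-head approximation for the time delay and a
        crude sine-law level difference; real binaural audio also
        depends on the pinna shape, which this sketch ignores.
        """
        az = np.radians(azimuth_deg)
        itd = (head_radius / c) * (abs(az) + abs(np.sin(az)))  # Woodworth ITD model
        delay = int(round(itd * fs))                           # delay in samples
        ild = 0.5 * (1.0 + np.sin(az))                         # 0..1, right-ear weight

        delayed = np.concatenate([np.zeros(delay), mono])      # far ear hears it later
        padded = np.concatenate([mono, np.zeros(delay)])       # near ear, same length
        if azimuth_deg >= 0:                                   # source on the right
            left, right = delayed * (1.0 - ild), padded * ild
        else:                                                  # source on the left
            left, right = padded * (1.0 - ild), delayed * ild
        return np.stack([left, right], axis=1)

    # Example: a 440 Hz tone placed 60 degrees to the right
    fs = 44100
    t = np.arange(fs) / fs
    stereo = binaural_pan(np.sin(2 * np.pi * 440 * t), fs, 60)

Recording with two in-ear elements captures these cues physically rather than synthesizing them, which is exactly the appeal of the binaural technique described above.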
One benefit of recording this way is that it reproduces nuances a directional microphone would miss: the user's breathing and quiet footsteps will be captured. Another benefit is that these microphones have a very flat frequency response, whereas directional microphones color the audio, which is part of why one's own voice sounds strange on a recording. The final benefit is that these microphones are far less prone to picking up floor vibrations, because the body damps much of that energy before it reaches the recording. Standard microphone setups pick up floor vibrations readily; to reduce them, the microphone must be installed in a suspended mount that absorbs vibrations. The disadvantages of this microphone follow directly from its advantages. An omnidirectional microphone is extremely sensitive to everything, and nuances like heavy breathing may not be desired in the recording. The binaural technique works extremely well when the microphone is static and not rotating within the environment, so the user will have to avoid moving his or her head too often while the device is on. The video capture aspect of this project will help solve the localization issue, since the viewer can change the audio perspective relative to the position of the camera. The final obstacle the group must overcome with regard to audio recording is developing a way to keep wind from hitting the element, which otherwise creates a massive amount of clipping in the digital track. This must be done in a way that does not affect the frequency response or the sensitivity of the microphones. Another fundamental piece of audio recording is developing a preamp with sufficient gain that does not introduce electronic noise into the recording. We may go about this by building a preamp with clean gain and lowering the gain of the recording device's own preamp. The final feature of audio recording is a multiband compressor acting on the extreme high and extreme low frequencies so that the listener is not pained or irritated by the audio. Percussive noises tend to peak very high and clip the audio track; I believe implementing a compressor before the signal reaches the recording device will make the track more pleasing without losing the details we work hard to capture.

Audio Out

One of the things we will have to do in the course of our project is process recorded audio before it is played back to the end consumer at a later date. This means we will have to deal with the issue of dynamic range compression, which essentially means making loud sounds quieter and quiet sounds louder. I was concerned about this at first because I thought it would be a complicated procedure, or something that would have to be done manually for each recording, but I was pleased to find that we can simply use a few LTI filters to do this process.1 That is very good for us, because it means compression can be automated and should be fairly straightforward to implement. I therefore do not foresee amplitude compression of audio being a huge problem for us, since it is not revolutionary technology and there are probably many audio processing programs we can use to do this sort of thing for us. (A sketch of the basic compressor idea appears below.)
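As a concrete illustration of dynamic range compression, the following sketch implements the common feed-forward form of a compressor: an envelope follower tracks the signal level, and gain is reduced above a threshold. Note that this form is a nonlinear operation rather than the LTI-filter formulation the citation above describes; it is shown here only because it is the standard textbook construction. The threshold, ratio, and attack/release values are placeholder settings for illustration, not parameters from our design.

    import numpy as np

    def compress(x, fs, threshold_db=-20.0, ratio=4.0,
                 attack_ms=5.0, release_ms=50.0):
        """Basic feed-forward dynamic range compressor (mono float signal).

        Levels above threshold_db are reduced by the given ratio;
        attack/release smooth the gain changes to avoid artifacts.
        """
        a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
        a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))

        env = 0.0
        out = np.empty_like(x)
        for n, sample in enumerate(x):
            level = abs(sample)
            # Envelope follower: fast attack, slow release
            coeff = a_att if level > env else a_rel
            env = coeff * env + (1.0 - coeff) * level

            level_db = 20.0 * np.log10(max(env, 1e-9))
            over = level_db - threshold_db
            gain_db = -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0
            out[n] = sample * 10.0 ** (gain_db / 20.0)
        return out

    # Example: tame a percussive spike in a test signal
    fs = 44100
    x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs) * 0.1
    x[fs // 2] = 0.9      # simulated transient peak
    y = compress(x, fs)

A multiband version like the one proposed in the Audio In section would first split the signal into low, mid, and high bands with crossover filters, apply a compressor like this to each band, and then sum the results.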
One of the things that may be an issue for us, however, is surround sound playback. The traditional means of achieving surround sound is to place several speakers all around the end user, which is not a practical approach for us, as we will have to simulate the same effect with a pair of headphones.8 Furthermore, we will have to simulate sounds coming from close by as well as sounds coming from far away. Even this problem does not seem insurmountable, however, as there are things we can do with the raw audio itself (adjusting the volume, putting it through a low-pass filter, et cetera) to simulate distance. Better yet, since we intend to record audio with condenser microphones, what the microphone hears should be fairly close to what an actual person would hear in the same position. Unfortunately, I was unable to find a way around the issue of surround sound with headphones. What the industry uses these days is several speakers in each padded earcup, placed at various locations around the user's ear. By cleverly using this setup and mixing the sound properly at playback, one can achieve a fairly good surround sound effect. I believe that if this solution is good enough for the industry, it should work for us as well. Another problem I think has already been solved for us is audio processing for 3D, immersive surround sound. Sound engineering has become quite good in recent years, and although our goal differs a bit from the normal surround sound setup a gamer or moviegoer might enjoy, it should not be too hard for a good sound engineer to figure out how to work with virtual reality headsets. Problems like acoustics, simulating distance, and making audio recordings as realistic as possible should not be overwhelming if we can get the right software for the job.

Video In

Humans have stereoscopic vision. This means that we have two viewpoints (our eyes) in the same plane observing simultaneously. Because of this, humans can determine very quickly how far away or close something is; this is depth perception. It is very important in viewing and recording 3D images and videos. There are several depth cues that help humans see three-dimensionally, and understanding them will help in understanding how to record 3D video. Convergence, binocular parallax, retinal image size, linear perspective, and overlapping are all depth cues that can help make a video appear more 3D. Convergence is most effective at short distances: as an object gets closer and closer to the viewpoints, the eyes point increasingly inward toward each other. Binocular parallax is the difference between the images seen by the two eyes. You can observe it by closing one eye at a time and comparing what each eye sees; when both are open, the two images are processed together seamlessly. Retinal image size can be described as comparing the apparent size of a car you see with the size you know a real car to be: if the car appears smaller than normal, you know it is some distance away. Linear perspective is what you see when looking toward the horizon as the parallel edges of a road converge in the distance. Overlapping is when one object appears in front of another; you can tell the first object is closer because you cannot see the parts of the other object behind it. Shadowing works similarly, depending on where the light source is: when an object casts a shadow, you can tell whether it is near or far, behind or in front of you, from the position of that shadow. These cues can help make a video feel more 3D. (Binocular parallax in particular can be made quantitative, as the sketch below shows.)
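Binocular parallax is the cue a stereo camera pair can measure directly: the farther apart a point lands in the left and right images (its disparity), the nearer that point is. Under a simple pinhole model with two parallel cameras, depth follows from similar triangles. The focal length in the example is a hypothetical value used only to show the arithmetic; the 63 mm baseline is a commonly cited average interpupillary distance.

    def depth_from_disparity(disparity_px, focal_px, baseline_m):
        """Depth of a point from its stereo disparity (pinhole camera model).

        disparity_px: horizontal shift of the point between the left
                      and right images, in pixels.
        focal_px:     focal length expressed in pixels.
        baseline_m:   distance between the two camera centers; for a
                      human-modeled rig this is the interpupillary
                      distance, roughly 0.063 m on average.
        """
        if disparity_px <= 0:
            raise ValueError("point at infinity or behind the cameras")
        return focal_px * baseline_m / disparity_px   # Z = f * B / d

    # Example: hypothetical 1000 px focal length, eye-like 63 mm baseline;
    # a 21 px disparity puts the point at 1000 * 0.063 / 21 = 3 meters.
    print(depth_from_disparity(21, 1000, 0.063))

This relationship also explains why the lens spacing of a stereo rig matters so much: changing the baseline rescales every perceived depth in the scene, which is the problem discussed next.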
Now that we know how our eyes work with depth perception, we can determine how to record in 3D. Camera lenses are already somewhat modeled after the human eye, in that they capture still images and videos much the way humans see them, but only from a 2D perspective. If the camera lens is modeled after a human eye, then why can we not have 3D images or videos? The answer is very simple: one camera lens is equivalent to one eye, so two cameras are needed to simulate stereoscopic vision! Though the answer is simple, the implementation may prove difficult. 3D images and videos can be achieved by having two cameras set on the same plane recording simultaneously. This can be done with any camera, as long as both cameras are the same. Some people have taken two iPod Nanos with video cameras, mounted them to a metal bar, and made a 3D video by syncing the two recordings together. This makes the solution seem easy, but in actuality a true 3D image is not achieved. The rig did produce a three-dimensional video, but it did not capture the scene the way human eyes would: the lenses were spaced too far apart, making the 3D effect very poor in quality. To get a true 3D video recording, we must model the rig after the human eyes, with the lenses spaced at the average distance between human eyes. The video quality must also be top notch in order to provide a great user experience. The two video files will then need to be synced together according to how the video will be viewed (see the Video Out section).

Video Out

There are several ways to display 3D video and images. Since two images (one for each eye) are required to create the illusion of 3D, there must be some way to display them simultaneously, one to each eye. This can be thought of as analogous to stereo audio, but for the eyes; in fact, using two images to create the illusion of 3D is called stereoscopy. The basic requirement is that each eye gets its own independent visual feed. The most obvious way to do this is to use two physical displays side by side. This was the method of choice in the early days of stereoscopy: the two images would be printed side by side, and the viewer would simply focus each eye on its respective image. However, this is actually quite difficult to do. Both eyes must be focused parallel to each other, which is unnatural and disorienting; the natural way of viewing objects is for both eyes' focus to converge at a single point. An alternative method is to swap the left and right images and require the viewer to view them cross-eyed. Unfortunately, this does not help much and still requires a degree of concentration. One way of making such images more seamless to view is to use special glasses called stereoscopes. These glasses use lenses to converge the two images into one in much the same way binoculars work; since each image is different, the illusion of 3D is preserved. With modern technology, the glasses and display can be integrated into one unit by simply using two head-mounted LCD screens. This method has the benefit of being very immersive, as it tends to block out any external visual stimuli, and it is the method of choice for virtual reality machines. For this reason, it is one of the two candidate approaches for implementing our project. The other is a more widely used method in which the two images are multiplexed into one display. Most people are familiar with the red/cyan tinted glasses often given out at 3D movie screenings: each image is tinted cyan or red, and each eye is given a correspondingly tinted filter so that only one image reaches it. Alternatively, each image can be polarized differently, and the viewer wears polarized glasses to separate the combined image back into two. (A sketch of the red/cyan approach appears below.)
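To illustrate the anaglyph form of multiplexing, the following sketch combines a left and a right view into a single red/cyan image: the red channel comes from the left view and the green and blue channels from the right, so the tinted glasses route one view to each eye. This is a generic illustration using NumPy arrays as stand-in frames, not our actual pipeline.

    import numpy as np

    def make_anaglyph(left_rgb, right_rgb):
        """Multiplex a stereo pair into one red/cyan anaglyph frame.

        left_rgb, right_rgb: uint8 arrays of shape (height, width, 3).
        The left view supplies the red channel (passed by the red
        filter); the right view supplies green and blue (passed by
        the cyan filter), so each eye receives only its own image.
        """
        anaglyph = np.empty_like(left_rgb)
        anaglyph[..., 0] = left_rgb[..., 0]    # red   <- left view
        anaglyph[..., 1] = right_rgb[..., 1]   # green <- right view
        anaglyph[..., 2] = right_rgb[..., 2]   # blue  <- right view
        return anaglyph

    # Example with dummy frames; in practice these would be the two
    # synced camera images described in the Video In section.
    left = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    right = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    frame = make_anaglyph(left, right)

Because each eye sees only part of the color spectrum, anaglyph images suffer from color distortion, which is one of the drawbacks that time multiplexing, discussed next, avoids.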
Besides multiplexing visually, stereoscopic video can be time-multiplexed, with each frame alternating between the left and right videos. Special active-shutter glasses then block each eye in synchronization with the display. The main advantage of this type of multiplexing is that there is no color tint, no darkening, and the two images can be truly independent of each other with little interference. However, the system is significantly more complex, as it requires a high degree of synchronization between the display and the glasses. Additionally, the display must be able to render at twice the frame rate (or the video must sacrifice half its frame rate). Multiplexing is not as immersive as a head-mounted display, but, with the exception of time multiplexing, it is much easier and cheaper to do, since it only requires a television or monitor, which almost everyone has in some form, and very simple glasses. For this project, it may be beneficial to use multiplexing as a cheap and easy way to test our recording method, and to move to a head-mounted display later to provide the total immersion we desire.

References

http://asadl.org/jasa/resource/1/jasman/v22/i6/p801_s1?isAuthorized=no
http://www.maijala.net/panu/papers/in97/
http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=1285854&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1285854
http://www.aaai.org/Papers/AAAI/1999/AAAI99-109.pdf
http://www.soundprofessionals.com/cgi-bin/gold/category/110/mics
https://ccrma.stanford.edu/~jos/fp/Nonlinear_Filter_Example_Dynamic.html
http://electronics.howstuffworks.com/home-theater3.htm
http://www.howtogeek.com/57903/htg-explains-how-does-dynamic-range-compression-work/
http://160.78.24.2/Public/AES-114/00075.pdf
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=290868
http://decoy.iki.fi/dsound/ambisonic/motherlode/source/poletti_3D%20surround%20system.pdf
http://socialsounddesign.com/questions/10395/how-to-give-a-feeling-of-distance-position
http://www.tomshardware.com/news/Headset-7.1-surround-sound-PC-Gaming-TiamatFacebook,13245.html
http://www.aes.org/e-lib/browse.cfm?elib=7082
http://www.vision3d.com/3views.html
https://pro.sony.com/bbsc/ssr/mkt-digitalcinema/resource.demos.bbsccms-assets-mktdigicinema-demos-digitalcinema3d.shtml
http://www.oculusvr.com/
http://stcroixstudios.com/wilder/anaglyph/whatsanaglyph.html
http://www.dlp.com/projector/dlp-innovations/dlp-link.aspx
http://phys.org/news173082582.html
http://www.vetmed.vt.edu/education/curriculum/vm8054/eye/binocs.htm
http://www.zurb.com/article/394/make-your-own-3d-video-in-three-easy-step
http://www.hitl.washington.edu/scivw/EVE/III.A.1.c.DepthCues.html