SUPER DEPTH-MAP RENDERING BY CONVERTING HOLOSCOPIC VIEWPOINT TO PERSPECTIVE PROJECTION

E. Alazawi, M. Abbod, A. Aggoun, M. R. Swash and O. Abdul Fatah
Electronic and Computer Engineering, School of Engineering and Design, Brunel University, London, UK (Eman.Alazawi@brunel.ac.uk)

ABSTRACT

The expansion of 3D technology will enable observers to perceive 3D without any eye-wear devices. Holoscopic 3D imaging technology offers a natural 3D representation of real 3D scenes that can be viewed by multiple viewers independently of their position. However, the creation of a super depth-map and the reconstruction of 3D objects from a holoscopic 3D image are still in their infancy. The aim of this work is to build a high-quality depth map of a real 3D scene from a holoscopic 3D image by extracting multi-view high-resolution Viewpoint Images (VPIs) to compensate for the poor features of the low-resolution VPIs. To achieve this, we propose a reconstruction method based on the perspective formula, which converts sets of directional orthographic low-resolution VPIs into perspective projection geometry. Following that, we implement an auto-feature-point algorithm that partitions the synthesized VPIs into distinctive Feature-Edge (FE) blocks, providing a localized feature detector responsible for the integration of 3D information. Detailed experiments prove the reliability and efficiency of the proposed method, which outperforms state-of-the-art methods for depth-map creation.

Index Terms — depth-map, feature descriptors, holoscopic 3D image, orthographic and perspective projection, viewpoint images

1. INTRODUCTION

In recent years, holoscopic 3D imaging (H3DI), also known as integral imaging (InIm), has attracted a great deal of interest due to its ability to provide a 3D volume in real true color [1]. Its application areas are very wide, including 3DTV, 3D cinema, medicine, robotic vision, biometrics, military applications, design, and video games [2]. The following characteristics make this promising candidate technology an ideal system when compared to other existing 3D technologies [1-7]:
1. It allows viewing of 3D images without any special eye-wear.
2. It allows natural 3D imaging, as the object is reconstructed in space in an optically constructed environment (i.e. fatigue-free viewing with less visual discomfort), using the "fly's eye" principle.
3. It records the 3D information in 2D form and displays it in full 3D with optical components.
4. It offers a unique feature, useful for post-production: the ability to produce images at different focal planes.
5. It offers full-parallax real-time recording without complicated and expensive camera calibration.

Holoscopic imaging was first proposed in March 1908 by the physicist Gabriel Lippmann [3], and progress has since been made by many researchers. Owing to recent developments in optical manufacturing technology, H3DI has become a practical, prospective 3D display technology. Recently, single-aperture light-field 3D capture and display systems have been developed and extensively disseminated to numerous professional users [4, 5]. Establishing the technology demands a number of image processing steps to reconstruct 3D objects through depth estimation before it is ready for mass commercialization. It is therefore crucial to obtain precise depth-information maps to enable content-based image coding and the transmission of holoscopic images through rectified 3D reconstruction and spatial resolution analysis.
Recently, depth-through-disparity analysis approaches based on feature matching [6, 7] between different extracted VPIs were adopted to achieve accurate depth estimation by taking advantage of the information repetition between multiple pairs of VPIs. Experimental results showed that the 3D objects contain large non-informative homogeneous regions, and these approaches failed to produce smooth depth contours. Very recently, the authors [8, 9] adopted an auto-thresholding descriptor technique to exploit the high-value information in the central VPI by extracting reliable sets of features from synthetic and real images, for both unidirectional and omnidirectional holoscopic 3D images. A trade-off between depth accuracy and computation speed was shown to exist; however, a foreground mask is still required to calculate depth in bulky non-informative and homogeneous regions. To this end, balancing execution time against quality in depth estimation remains a difficult task and has occupied the attention of many researchers.

The aim of this approach is to achieve both depth accuracy and fast execution from a 3D holoscopic system. It is a novel method for computing the disparity map based on transforming the captured orthographic-projection VPIs into perspective-projection VPIs. The method combines three techniques: 1) generation of high-resolution VPIs by converting the extracted VPIs from orthographic (i.e. low-resolution) to perspective (i.e. high-resolution) projection geometry, which plays a crucial role in improving the feature-matching algorithm by providing reliable feature-information blocks; 2) a search for the optimal threshold value that guides the setting and extraction of a reliable set of 3D information features, which is the key to realizing reliable features on the high-resolution VPIs for the next stage; and 3) an adaptive hybrid multi-baseline algorithm [8, 9] using a novel, automatically modified aggregation cost window to improve the performance of the depth estimation while maintaining a low computation time.

2. HOLOSCOPIC 3D IMAGING SYSTEM

The principle of the 3D holoscopic imaging system involves two processes: capture and display (see Fig. 1). The capture process records the distribution of light rays from the object via spherical or lenticular micro-lenses, closely packed in an array in contact with a recording device [1]. The planar detector surface records the holoscopic image as a 2D distribution of intensities, sampled in the form of an Elemental Image (EI) array.

Fig. 1 3D holoscopic system processes: recording of the 3D object through the lens array (focal length f) onto a flat recording medium as elemental images, and the reverse display process.

Each distinct 2D image, named a viewpoint image (VPI), is projected at a slightly different angle from its neighbor in orthographic projection geometry, as shown in Fig. 2. The 2D VPIs therefore contain the intensity and directional information of 3D depth, and the 3D resolution is related to the total number of pixels behind each micro-lens. The display process, the "replay" of the H3DI, is the reverse of the recording process. The micro-lens array is placed in front of the planar display surface; white light from the rear is projected through each of the micro-lenses, and the intersecting rays reconstruct the object in space, as shown in Fig. 1. The reconstructed object image is inverted in depth (pseudoscopic). Over the last two decades, Aggoun [1] and Okui [10] have converted the pseudoscopic image into an orthoscopic projection through optical and digital techniques.
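As an illustration of this extraction principle, the following minimal Python sketch picks the pixel at the same position under every micro-lens to form one orthographic VPI. The function name and the NumPy-array representation are our own assumptions for illustration, not part of the paper:

```python
import numpy as np

def extract_viewpoint_image(h3di, lens_pitch, u, v):
    """Extract one viewpoint image (VPI) from a holoscopic 3D image.

    h3di       : 2D grayscale array holding the recorded EI array.
    lens_pitch : number of pixels under each micro-lens (e.g. 29).
    u, v       : fixed pixel position under every micro-lens
                 (0 <= u, v < lens_pitch).

    Taking the pixel at the same (u, v) position under every micro-lens
    and tiling the picks in micro-lens order yields one orthographic VPI
    whose resolution equals the number of micro-lenses.
    """
    return h3di[u::lens_pitch, v::lens_pitch]

# With a 193 x 129 micro-lens array and 29 x 29 pixels per lens (the
# geometry used in the experiments), each VPI is 193 x 129 pixels.
h3di = np.zeros((193 * 29, 129 * 29))             # placeholder image
vpi = extract_viewpoint_image(h3di, 29, 14, 14)   # central viewpoint
assert vpi.shape == (193, 129)
```

The strided slice is exactly the "periodic pixel extraction" of Fig. 2; a color image would carry an extra channel axis but follows the same pattern.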
3. PROPOSED METHOD

The proposed 3D depth estimation framework is shown in Fig. 6 and comprises three phases: generation of a set of high-resolution VPIs (Section 3.1), auto-feature block setting (Section 3.2), and adaptive multi-baseline depth estimation (Sections 3.3 and 3.4).

Fig. 2 Principle of the extraction of VPIs from an ODHI by periodically extracting pixels from the captured EIs (for simplicity, assume there are only 3×3 pixels under each micro-lens): one pixel is extracted from the same position (n, m) under different micro-lenses, and the extracted pixels are placed in an orderly fashion to form one VPI (i, j).

3.1 From Orthographic to Perspective Projection

The VPIs are each a collection of the pixels at the same position in every EI and thus have orthographic projection geometry: the object space is sampled on a parallel grid without any vanishing point. The field of view (FOV) of each EI is limited to $2\tan^{-1}(\varphi/2f)$, where $f$ and $\varphi$ are the focal length and the pitch of the elemental lens, respectively, and the resolution of the VPIs cannot exceed the number of EIs, so the VPIs are small and coarse. These images therefore have low resolution due to the limited size of each EI. Since the set of EIs represents the ray space of the 3D object, the 3D information of the object is embedded in the EIs. In other words, the quality of the generated VPIs has a direct effect on the accuracy of the depth estimation [11]. Because the VPIs' resolution is low, few details (features) are visible for the correspondence process. To extract more reliable feature correspondences, it is vital to improve the quality of these images by producing high-resolution VPIs via the transformation of orthographic-projection VPIs into perspective projection geometry (a code sketch of the full pipeline is given at the end of this subsection):

1. First, a set of N low-resolution VPIs (LRVPIs) is selected (here a 5×5 EI grid) to generate one high-resolution VPI (HRVPI). Each LRVPI is identified as $VPI_{i,j,n,m}$, where $i, j$ are the VPI coordinates and $n, m$ are the coordinates of the parallel light rays (the Omnidirectional Holoscopic Image, ODHI, coordinates). Fig. 2 shows the principle of transforming EIs into VPIs in the ODHI system.
2. Up-sample each LRVPI by N steps in the horizontal and vertical directions. The up-sampled VPIs are stacked side by side horizontally and top to bottom vertically to form a 4D stack of $VPI_{i,j,n,m}$ images, where $n$ and $m$ index the VPIs from 1 to N (see Fig. 3(a, b)). The goal of this step is to enable sub-pixel integration of the same spatial point across different views, enhancing the resolution of the VPI by allowing more pixels to represent the same point.
3. Shift by one pixel and integrate all the selected sets of LRVPIs, which brings the chosen plane into focus and produces a single "in-focus" image plane. Fig. 3(c) shows an example of the process, in which the depth plane z1 is seen from different EIs by setting the shift value to 1; pixels under EI n×shift then pick up the point z1 from different EIs, where n = 1, 2, …, N is the number of EIs. Up-sampling, shifting, and integration with a one-pixel shift thus focus on one depth plane (z1). The enhancement at depth plane z1 from the rays of neighbouring VPIs also increases the FOV, as shown in Fig. 3(c).
4. Post-processing of the HRVPI: fixed shifting of the neighbors in super-resolution reconstruction often results in blurring due to over- or under-fitting. Therefore, a simplified de-blurring model based on a point-spread function is employed. The first step of the filtering process convolves the blurred high-resolution image (HRI) with a 2D Gaussian kernel of standard deviation 2 and kernel size 15×15 (rounded to an odd integer) in each direction, where the 2D distribution is split into a pair of 1D distributions in the horizontal and vertical directions. The second step suppresses the low frequencies and amplifies the high frequencies: the non-filtered HRI is multiplied by two and the filtered HRI is subtracted from it. This simple and effective step removes noise with an inverse smoothing effect along the horizontal and vertical directions, without affecting the detail of the HR image, providing gentle smoothing while preserving edges. The resulting HRVPI is shown in Fig. 4(e, f).

Fig. 3 Example of the generation process of HR perspective projection geometry: (a) the sets of VPIs (N=5), (b) up-sampling by the number of sets of VPIs (N=5) using bi-cubic interpolation, with a one-pixel shift in the horizontal and vertical directions, and (c) the resulting resolution at depth plane z1 in perspective projection geometry.

Fig. 4 Experimental results: (a) 5 sets of LRVPIs, each of size 193×129 pixels; (b) the VPI resized using the bi-cubic interpolation method; (c, d) the resulting HRVPI of size 653×973 pixels and a magnified blurred section; and (e, f) the de-blurred HRVPI successfully generated by the proposed method and a magnified de-blurred section.
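The four steps above can be condensed into a short sketch. This is a minimal illustration under our own assumptions (grayscale images, bicubic up-sampling via scipy.ndimage.zoom, wrap-around ignored at the borders, and a truncated separable Gaussian approximating the 15×15 kernel); it is not the authors' exact implementation:

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def generate_hrvpi(lrvpis, shift=1):
    """Fuse an N x N grid of low-resolution VPIs into one high-resolution VPI.

    lrvpis : list of N*N 2D float arrays, the selected LRVPIs in grid order.
    shift  : per-view displacement in up-sampled pixels; a shift of 1
             focuses the integration on one depth plane (z1 in Fig. 3(c)).
    """
    n = int(np.sqrt(len(lrvpis)))                 # grid side, e.g. N = 5
    acc = None
    for idx, lr in enumerate(lrvpis):
        hi = zoom(lr, n, order=3)                 # step 2: bicubic up-sample by N
        dy, dx = (idx // n) * shift, (idx % n) * shift
        hi = np.roll(hi, (dy, dx), axis=(0, 1))   # step 3: shift per view
        acc = hi if acc is None else acc + hi     # step 3: integrate views
    hr = acc / len(lrvpis)                        # normalize the integration

    # Step 4 (post-processing): unsharp-mask de-blurring with a separable
    # Gaussian of sigma = 2; truncate=3.5 gives a radius-7 (15-tap) kernel.
    blurred = gaussian_filter(hr, sigma=2, truncate=3.5)
    return 2.0 * hr - blurred                     # amplify highs, suppress lows
```

The final line is the "multiply by two and subtract the filtered image" operation of step 4, i.e. a standard unsharp mask.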
3.2 Auto-Optimal Thresholding and Feature Descriptors

The high-resolution VPIs contain abundant information that should be exploited for depth estimation. This information can be used for object detection and recovery, and then employed to extract robust correspondences, leading to a reliable estimation of 3D object depth. The principle of the method is to search for the optimal threshold value, which guides the setting, and then to extract a reliable set of features. The authors' previous work details this process [8, 9] using low-resolution VPIs; in the proposed approach, the same algorithm is applied to the high-resolution VPIs to extract reliable feature-information blocks. The feature-match descriptor is an efficient and informative procedure implemented by assessing the intensity variance of image blocks for disparity analysis. The optimal threshold gives the highest local contrast by comparing small patches extracted from each region in the image with their immediate neighborhood. The spatial intensity distribution of the image is used as the feature representing the image in the feature-match selection algorithm.
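To make the block-variance idea concrete, here is a hedged sketch: it flags high-variance blocks as Feature-Edge (FE) candidates, with a simple between-class separation search standing in for the auto-optimal threshold of [8, 9], whose exact rule is not reproduced in this paper:

```python
import numpy as np

def feature_edge_blocks(vpi, block=8):
    """Flag distinctive Feature-Edge (FE) blocks in a grayscale VPI.

    Blocks whose intensity variance exceeds an automatically chosen
    threshold are kept as feature blocks for disparity matching; the
    threshold search below is only a stand-in for the authors'
    auto-optimal descriptor.
    """
    h, w = (np.asarray(vpi.shape) // block) * block
    tiles = vpi[:h, :w].reshape(h // block, block, w // block, block)
    variances = tiles.var(axis=(1, 3))            # local contrast per block

    # Pick the threshold with the largest between-class separation,
    # splitting flat (low-variance) from textured (high-variance) blocks.
    candidates = np.linspace(variances.min(), variances.max(), 64)[1:-1]
    def separation(t):
        lo = variances[variances <= t]
        hi_ = variances[variances > t]
        if lo.size == 0 or hi_.size == 0:
            return 0.0
        return lo.size * hi_.size * (lo.mean() - hi_.mean()) ** 2
    threshold = max(candidates, key=separation)
    return variances > threshold                  # boolean FE-block mask
```

The returned mask marks which blocks of the reference HRVPI enter the correspondence stage; homogeneous regions fall below the threshold and are skipped.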
3.3 Generating Multi-view HRIs from a Holoscopic 3D Image

Producing HRVPIs in perspective projection leads to the second novel method: generating multi-view HRVPIs from the H3DI at different perception views. Due to the small image-sensor size in the H3DI technique, which limits the use of larger micro-lenses with wide viewing angles, the baseline is limited at this stage. This new generation of multi-views does not require the views to have long baselines, unlike the auto-stereoscopic multi-view imaging technique. The different perception views are formed from the same scene as recorded from different perspectives through the use of the H3DI pixel format. In other words, the H3DI is converted into a multi-view 3D image pixel format with the correct slanting using the new interpolation technique of Section 3.1. The rendering process is as follows: seven views are obtained from each EI of resolution 29×29 pixels. The process of generating the HRVPIs from the VPIs is shown in Fig. 5, in which the different colors (1-29) correspond to the EIs' positions; as also shown by the colors of the multi-view HRVPIs (HR1-HR7), each HRVPI view is maintained at a constant distance from the others in the rendering. A set of seven HRVPIs is generated to estimate the super-resolution disparity map, starting from positions 5×5 to 5×9 of the LRVPI array to generate HR1, then shifting in the horizontal direction by three VPIs (starting from VPI position 5×8) to generate HR2, and so on to the end of the VPI position array, which is [29 29]. A sketch of this view-selection pattern follows below.

Fig. 5 VPI selection process to generate seven HRVPIs.
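The view-selection pattern described above (a 5-VPI-wide window stepped by three positions, HR1 starting at column 5) can be written down directly. The helper below is purely illustrative; its handling of a single row of VPI positions mirrors what would be repeated vertically:

```python
# Window geometry of Section 3.3: seven HRVPIs, each fused from a run of
# five VPI columns, with consecutive runs offset by three positions.
N_GRID, STRIDE, N_VIEWS, FIRST = 5, 3, 7, 5

def view_windows(first=FIRST, stride=STRIDE, n_views=N_VIEWS, width=N_GRID):
    """Return the VPI column range feeding each of HR1..HR7 (1-based, inclusive)."""
    return [(first + k * stride, first + k * stride + width - 1)
            for k in range(n_views)]

for k, (lo, hi) in enumerate(view_windows(), start=1):
    print(f"HR{k}: VPI columns {lo}-{hi} of the 29 x 29 VPI array")
# HR1: columns 5-9, HR2: columns 8-12, ..., HR7: columns 23-27; each
# window of LRVPIs would then be fused by a step such as the
# generate_hrvpi sketch in Section 3.1.
```

With the last window ending at column 27, all seven views fit inside the 29×29 VPI position array while keeping a constant three-position offset between views.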
3.4 Depth-Map Estimation Algorithm

In Fig. 6, the encompassing framework of the proposed algorithm shows how the adaptive multi-baseline algorithm of [8, 9] is integrated for estimation of the super depth map. The principal steps of the comprehensive algorithm are:

1. Select sets of EIs to process.
2. Convert the EIs into VPIs using the strong interval correlations between pixels displaced by one micro-lens.
3. Transform the orthographic-projection LRVPIs into perspective-projection HRVPIs using the new interpolation algorithm of Section 3.1.
4. Use the auto-optimal threshold of Section 3.2, selecting the reference VPI to guide the setting and extraction of a set of reliable features.
5. Employ the adaptive multi-baseline disparity algorithm presented in the authors' previous work [8, 9] using an adaptive window shape (AWS). A robust and precise smooth cost-aggregation filter is obtained by Summing the Sub Square Differences (SSSD) functions of the windows in the neighborhood of the disparity score function. The filtered output disparity cost $C(f_b, d)_{out}$ at feature block $f_b$ of the disparity map is given by:

$$C(f_b, d)_{out} = C(f_b, d)_{initial} + \sum_{n_b \in N(f_b)} w(f_b, n_b) \times \min\{C(n_b, d + \delta)\}$$

where $C(f_b, d)_{initial}$ is the initial match-cost function of feature block $f_b$, the sum $\sum_{n_b \in N(f_b)} w(f_b, n_b) \times \min\{C(n_b, d + \delta)\}$ is the weighted match cost of the nearest-neighbor blocks $n_b$ within the neighborhood $N(f_b)$, and $\delta$ is a small value that allows a reduction of the variance within each block.
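A sketch of step 5's filtered aggregation follows, assuming the SSSD cost volume C and the neighbour weights w are precomputed as arrays, and reading min{C(n_b, d+δ)} as a minimum over a small disparity slack ±δ. That reading is one plausible interpretation of the equation; the authors' exact neighbourhood handling is in [8, 9]:

```python
import numpy as np

def aggregate_cost(cost, weights, delta=1):
    """Filter a block-wise SSSD cost volume per the equation above.

    cost    : float array (n_blocks, n_disparities); cost[b, d] is the
              initial SSSD match cost C(f_b, d) of feature block f_b.
    weights : (n_blocks, n_blocks) array; weights[b, n] = w(f_b, n_b) for
              blocks n_b in the neighbourhood N(f_b), and 0 elsewhere.
    delta   : small disparity slack; the min over d-delta..d+delta lets
              each neighbour block reduce its contributed variance.
    """
    n_blocks, n_disp = cost.shape
    # neigh_min[b, d] = min over |s| <= delta of cost[b, d + s],
    # with out-of-range disparities treated as +inf.
    shifted = [np.pad(cost, ((0, 0), (max(-s, 0), max(s, 0))),
                      constant_values=np.inf)[:, max(s, 0):max(s, 0) + n_disp]
               for s in range(-delta, delta + 1)]
    neigh_min = np.minimum.reduce(shifted)
    # C_out(f_b, d) = C_initial(f_b, d) + sum_nb w(f_b, n_b) * min C(n_b, d+s)
    return cost + weights @ neigh_min
```

Since the s = 0 term is always included, neigh_min stays finite everywhere, and the single matrix product distributes the weighted neighbour costs across all feature blocks at once.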
4. DISCUSSION AND COMPARISON OF RESULTS

To demonstrate the approach described above, experiments were carried out on the real-data ODHIs "Box-Tags" and "Airplane-Man", with comparisons against the state-of-the-art method of [9]. The ODHI resolution was 5616×3744 pixels with 193×129 micro-lenses. This gave an EI resolution of 29×29 pixels, while the number of micro-lenses used in the recording determined the VP resolution; thus the VP resolution was 193×129 pixels, the same as the number of micro-lenses. The resulting algorithm proved more distinctive, robust, and precise in depth estimation under camera-viewpoint changes than the state-of-the-art algorithms of [9]. Fig. 6 illustrates the depth-map results of the proposed algorithm and verifies its competence in extracting accurate 3D depth. The approach is a simple and efficient way of generating super-resolution depth maps using small sets of VPIs.

Fig. 6 Overall representation of the proposed 3D depth estimation procedure on the real-data "Box" ODHI. First phase: extract a set of LRVPIs from the input holoscopic 3D image and generate a set of HRVPIs. Second phase: select the central HRVPI and perform the auto-feature block setting on multi-pairs of corresponding HRVPIs. Third phase: apply the adaptive multi-baseline algorithm to produce the final 3D depth map.

Experimentally, sets of seven multi-view HRVPIs with a long baseline achieved good results in increasing the accuracy of the depth maps. Experiments identifying the precise depth map and the objects present in a scene (see Fig. 7) show that the proposed method outperformed the state-of-the-art algorithm [9] in two respects: accuracy and speed. Sets of 49 LRVPIs were used to generate the resultant depth map with the authors' previous algorithm [9]. Due to space limitations, only two comparisons of depth maps obtained from real-world ODHIs, "Box-Tags" and "Airplane-Man", are shown.

Fig. 7 Comparison of depth-map estimation results on (a) "Box-Tags" and (b) "Airplane-Man". The first row shows the HRVPIs, the middle row shows the depth-map results of the algorithm in [9], and the last row shows the results of the proposed algorithm.

5. CONCLUSION AND FUTURE WORK

A novel approach was presented in this paper to create a super-resolution depth map with the omnidirectional holoscopic 3D imaging technique. The novelty of the approach is that it converts sets of orthographic-projection (low-resolution) viewpoint images into a form of perspective projection (high-resolution) geometry. The high-resolution viewpoint image is obtained using a modified Gaussian filter on the new form of viewpoint image to reduce blurring effects. In addition, the approach successfully generates multi-view high-resolution viewpoint images from a holoscopic 3D imaging system, which are used to derive 3D object depth. It is worth mentioning that subjective quality criteria (visual quality) were used to evaluate the performance of the proposed process, because no original (reference) images exist: the sets of LRVPIs are sampled from the pixels of each Elemental Image (EI). The results confirmed the efficiency, robustness, and speed of the approach via the enhancement of depth-map accuracy and the reduction of computational complexity. The experimental results verified that the algorithm was superior to a current state-of-the-art algorithm [9] and achieved comparable performance. However, the results also show that visual features of an object, such as its shape (contour) in the depth map, still require the incorporation of a surface-integration process.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the support of the European Commission under the Seventh Framework Programme (FP7) project 3D VIVANT (Live Immerse Video-Audio Interactive Multimedia).

6. REFERENCES

[1] A. Aggoun, "3D Holoscopic Imaging Technology for Real-Time Volume Processing and Display," High-Quality Visual Experience, Signals and Communication Technology, Vol. IV, pp. 411-428, 2010.
[2] L. Onural, "Television in 3-D: What are the Prospects?," Proc. IEEE, Vol. 95, No. 6, 2007.
[3] G. M. Lippmann, Compt. Rend. Acad. Sci., Vol. 146, p. 446, 1908.
[4] J.-Y. Son, B. Javidi, S. Yano, and K.-H. Choi, "Recent Developments in 3-D Imaging Technologies," Journal of Display Technology, Vol. 6, No. 10, pp. 394-403, Oct. 2010.
[5] Y. Kim, K. Hong, and B. Lee, "Recent researches based on integral imaging display method," 3D Research, Vol. 1, No. 1, pp. 17-27, Aug. 2011.
[6] C. H. Wu, M. McCormick, A. Aggoun, and S.-Y. Kung, "Depth Mapping of Integral Images through Viewpoint Image Extraction with a Hybrid Disparity Analysis Algorithm," Journal of Display Technology, Vol. 4, pp. 101-108, 2008.
[7] O. Abdul Fatah, A. Aggoun, M. Nawaz, J. Cosmas, E. Tsekleves, M. Swash, and E. Alazawi, "Depth Mapping of Integral Images Using Hybrid Disparity Analysis Algorithm," IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-4, South Korea, 2012.
[8] E. Alazawi, A. Aggoun, O. Abdul Fatah, M. Abbod, and M. R. Swash, "Adaptive Depth Map Estimation from 3D Integral Image," IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-6, London, UK, 2013.
[9] E. Alazawi, A. Aggoun, M. Abbod, M. R. Swash, and O. Abdul Fatah, "Scene Depth Extraction from Holoscopic Imaging Technology," IEEE 3DTV-CON: Vision Beyond Depth, AECC, Aberdeen, Scotland, 7-8 October 2013.
[10] M. Okui and F. Okano, "3D Display Research at NHK," Workshop on 3D Media, Applications and Devices, Berlin, Germany, 2009.
[11] O. Abdul Fatah, P. M. P. Lanigan, A. Aggoun, M. Swash, and E. Alazawi, "Three-Dimensional Integral Image Reconstruction based on Viewpoint Interpolation," IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-4, London, UK, 2013.