SUPER DEPTH-MAP RENDERING BY CONVERTING HOLOSCOPIC VIEWPOINT TO PERSPECTIVE PROJECTION
E. Alazawi, M. Abbod, A. Aggoun, M. R. Swash and O. Abdul Fatah
Electronic and Computer Engineering, School of Engineering and Design,
Brunel University, London, UK
(Eman.Alazawi@brunel.ac.uk)
ABSTRACT
The expansion of 3D technology will enable observers to perceive 3D without any eye-wear devices. Holoscopic 3D imaging technology offers a natural 3D representation of real scenes that can be viewed by multiple viewers independently of their position. However, the creation of a super depth-map and the reconstruction of 3D objects from a holoscopic 3D image are still in their infancy. The aim of this work is to build a high-quality depth map of a real 3D scene from a holoscopic 3D image by extracting multi-view high-resolution Viewpoint Images (VPIs), thereby compensating for the poor features of individual VPIs. To achieve this, we propose a reconstruction method based on the perspective formula that converts sets of directional, orthographic, low-resolution VPIs into perspective projection geometry. We then apply an auto-feature-point algorithm that partitions the synthesized VPIs into distinctive Feature-Edge (FE) blocks, localizing an individual feature detector responsible for integrating the 3D information. Detailed experiments demonstrate the reliability and efficiency of the proposed method, which outperforms state-of-the-art methods for depth-map creation.
Index Terms — depth map, feature descriptors, holoscopic 3D image, orthographic and perspective projection, viewpoint images
1. INTRODUCTION
In recent years, holoscopic 3D imaging (H3DI), also known as
Integral Imaging (InIm), has attracted a great deal of interest
due to its ability to provide a 3D volume in real true color [1].
Its application areas are very wide, such as in 3DTV, 3D cinema,
medicine, robotic vision, biometrics, military, design, and video
games [2]. The following characteristics make this promising
candidate technology an ideal system when compared to other
existing 3D technologies [1-7]:
1. Allows viewing of 3D images without any special eyewear.
2. Allows natural 3D imaging, as the object is reconstructed in space in an optically constructed environment (i.e. fatigue-free viewing with less visual discomfort), using the "fly's eye" principle.
3. Records the 3D information in 2D form and displays it in full 3D with optical components.
4. Offers a unique feature, useful for post-production: the ability to produce images at different focal planes.
5. Offers full parallax in real-time recording without complicated and expensive camera calibration.
Holoscopic imaging was first proposed in March 1908 by the physicist Gabriel Lippmann [3], and progress has been made by many researchers since. Owing to recent developments in optical manufacturing technology, H3DI has become a practical and promising 3D display technology. Recently, single-aperture light-field 3D capture and display systems have been developed and extensively disseminated to numerous professional
users [4, 5]. Establishing the technology demands a number of image processing steps to reconstruct 3D objects through depth estimation before it is ready for mass commercialization. It is therefore crucial to obtain precise depth-information maps to enable content-based image coding and transmission of holoscopic images through rectified 3D reconstruction and spatial-resolution analysis.
Recently, depth-through-disparity analysis approaches based on feature matching [6, 7] between different extracted VPIs were adopted to achieve accurate depth estimation by exploiting the information repetition between multiple pairs of VPIs. Experimental results showed that the 3D objects contain large non-informative homogeneous regions, and these approaches failed to produce smooth depth contours. Very recently, the authors [8, 9] adopted an auto-thresholding descriptor technique to exploit the high-value information in the central VPI, extracting reliable sets of features from synthetic and real images for both unidirectional and omnidirectional holoscopic 3D images. A trade-off between depth accuracy and computation speed was shown to exist; however, a foreground mask is still required to compute depth over bulky, non-informative homogeneous regions. The trade-off between execution time and quality thus remains a difficult problem for most depth-estimation techniques and algorithms, and has occupied the attention of many researchers.
The aim of this work is to achieve both depth accuracy and fast execution from a 3D holoscopic system. It is a novel method for computing the disparity map by transforming the captured orthographic-projection VPIs into perspective-projection geometry.
The method combines three techniques: 1) generation of high-resolution VPIs by converting the extracted VPIs from orthographic (low-resolution) to perspective (high-resolution) projection geometry; this novel step plays a crucial role in improving the feature-matching algorithm by providing reliable feature-information blocks. 2) A search for the optimal threshold value, which guides the setting and extraction of a reliable set of 3D information features; this is the key to realizing reliable features on the high-resolution VPIs for the next stage. 3) An adaptive hybrid multi-baseline algorithm [8, 9] using a novel, automatically modified aggregation-cost window to improve the performance of the depth estimation while maintaining a low computation time.
2. HOLOSCOPIC 3D IMAGING SYSTEM
The principle of the 3D holoscopic imaging system involves two processes: capture and display (see Fig. 1). The capture process records the distribution of light rays from the object via spherical or lenticular micro-lenses closely packed in an array that is in contact with a recording device [1].

Fig. 1 3D holoscopic system processes: a micro-lens array of focal length f records elemental images of the 3D object on a flat recording medium (recording process), and the same arrangement replays the 3D image on a display (display process).
The planar detector surface records the holoscopic image as a 2D distribution of intensities, sampled as an array of Elemental Images (EIs). Each distinct 2D image, called a viewpoint image (VPI), is projected at a slightly different angle than its neighbor in orthographic projection geometry, as shown in Fig. 2. The 2D VPIs therefore contain the intensity and directional information of the 3D depth, and the 3D resolution is related to the total number of pixels behind each micro-lens. The display process replays the H3DI and is the reverse of the recording process: the micro-lens array is placed in front of the planar display surface, and white light projected from the rear emerges through each micro-lens, reconstructing the object in space as shown in Fig. 1. The reconstructed object image is inverted in depth (pseudoscopic). Over the last two decades, Aggoun [1] and Okui [10] have converted the pseudoscopic image into an orthoscopic projection through optical and digital techniques.
3. PROPOSED METHOD
The proposed 3D depth-estimation framework is shown in Fig. 6 and comprises three phases, described in the following subsections.
Fig. 2 Principle of extraction of VPIs from an ODHI by periodically extracting pixels from the captured EIs (for simplicity, assume only 3×3 pixels under each micro-lens): extract one pixel from the same position under different micro-lenses and place them in an orderly fashion to form one VPI.
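To make the extraction in Fig. 2 concrete, the following is a minimal NumPy sketch (not the authors' code) of how all VPIs can be sliced out of a recorded holoscopic image at once; the array layout and function name are illustrative assumptions.

```python
import numpy as np

def extract_viewpoint_images(h3d_image, lens_px):
    """Collect the pixel at position (i, j) under every micro-lens into VPI (i, j).

    h3d_image: array of shape (H*lens_px, W*lens_px[, C]) holding an
    H x W grid of elemental images, each lens_px x lens_px pixels.
    Returns an array indexed as vpis[i, j] -> one H x W orthographic VPI.
    """
    h = h3d_image.shape[0] // lens_px  # micro-lens count, vertical
    w = h3d_image.shape[1] // lens_px  # micro-lens count, horizontal
    img = h3d_image[:h * lens_px, :w * lens_px]  # crop to whole EIs
    if img.ndim == 2:
        return img.reshape(h, lens_px, w, lens_px).transpose(1, 3, 0, 2)
    return img.reshape(h, lens_px, w, lens_px, -1).transpose(1, 3, 0, 2, 4)

# With 29 x 29 pixels per micro-lens, as in the experiments:
# vpis = extract_viewpoint_images(odhi, lens_px=29)
# central_vpi = vpis[14, 14]   # 193 x 129 for the ODHIs used in the paper
```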
3.1 From Orthographic to Perspective Projection
The VPIs are collections of pixels at the same position in every EI and have orthographic projection geometry; the object space is therefore sampled on a parallel grid without any vanishing point. The Field of View (FOV) of an EI is limited to $2\tan^{-1}(\varphi/2f)$, where $f$ and $\varphi$ are the focal length and the lens pitch of the elemental lens respectively, and the resolution of a VPI cannot exceed the number of EIs, which makes the VPIs small and coarse. These images therefore have low resolution due to the limited size of each EI. Since the set of EIs represents the ray space of the 3D object, the 3D information of the object is embedded in the EIs; in other words, the quality of the generated VPIs has a direct effect on the accuracy of the depth estimation [11]. Because the VPIs' resolution is low, few details (features) are visible for the correspondence process. To extract more reliable feature correspondences, it is vital to improve the quality of these images by producing high-resolution VPIs via the transformation of orthographic-projection VPIs to perspective projection geometry.
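As a quick numeric check of the FOV expression above, a short Python snippet (with illustrative lens parameters, not values from the paper):

```python
import math

def elemental_fov(lens_pitch, focal_length):
    """FOV of one elemental lens: 2 * atan(pitch / (2 * focal length))."""
    return 2.0 * math.atan(lens_pitch / (2.0 * focal_length))

# Hypothetical example: 1 mm pitch, 3 mm focal length -> about 18.9 degrees.
print(math.degrees(elemental_fov(1.0, 3.0)))
```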
1. First, a set of N low-resolution VPIs (LRVPIs) is selected (here a 5×5 grid) to generate one high-resolution VPI (HRVPI). Identify each LRVPI as $VPI_{i,j,n,m}$, where $i, j$ are the VPI coordinates and $n, m$ are the coordinates of the parallel light rays (Omni-Directional Holoscopic Image, ODHI, coordinates). Fig. 2 shows the principle of transforming EIs into VPIs in the ODHI system.
2. Up-sample each LRVPI by N steps in the horizontal and vertical directions. The up-sampled VPIs are stacked adjacently, left-to-right and top-to-bottom, to form a 4D stack of $VPI_{i,j,n,m}$ images, where $i$ and $j$ are the VPI coordinates and $n$ and $m$ index the VPIs from 1 to N (see Fig. 3(a, b)). The goal of this step is to enable sub-pixel integration of the same spatial point across different views, enhancing the resolution of the VPI by allowing more pixels to represent the same point.
3. Shift by one pixel and integrate all the selected sets of LRVPIs to return the window size to where the plane is focused, producing a single "in-focus" image plane. Fig. 3(c) shows an example of the process, in which the depth plane z1 can be seen from different EIs by setting the shift value to 1; pixels under EI n·shift will pick up the position of point z1 from different EIs, where n = 1, 2, …, N is the number of EIs. Thus, up-sampling, shifting by one pixel and integrating focuses on one depth plane (z1). The enhancement at depth plane z1 uses rays from neighbouring VPIs that present the points directly, increasing the FOV as shown in Fig. 3(c).
Fig. 3 Example of the generation process of HR perspective projection geometry: (a) the set of VPIs (N = 5); (b) up-sampling by the number of VPIs in the set (N = 5) using bi-cubic interpolation and shifting by one pixel in the horizontal and vertical directions; (c) the resulting resolution depth plane z1 in perspective projection geometry.
4. Post-processing of the HRVPI: fixed shifting of the neighbours when reconstructing SR images often causes blurring due to over- or under-fitting. A simplified model of a typical de-blurring process is therefore employed as a point-spread function. The first step of the filtering process convolves the blurred high-resolution image (HRI) with a 2D Gaussian kernel of standard deviation 2 and kernel size [15 15] in each direction (rounded to an odd integer), where the 2D distribution is split into a pair of 1D distributions in the horizontal and vertical directions. The second step suppresses the low frequencies and amplifies the high frequencies: the unfiltered HRI is multiplied by 2 and the filtered HRI is subtracted from it. This simple, effective step removes noise in the signal, with an inverse-smoothing effect along the horizontal and vertical directions that does not affect the detail of the HR image, providing gentle smoothing while preserving the edges. The resulting HRVPI is shown in Fig. 4(e, f). A code sketch of this generation pipeline follows this list.
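The sketch below, under stated assumptions, strings steps 2-4 together: SciPy's cubic-spline zoom stands in for bi-cubic interpolation, the one-pixel shifts use wrap-around (a real implementation would crop the borders), and the de-blurring is read as standard unsharp masking (2 × HRI minus its Gaussian blur). Function names are illustrative, not the authors' code.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def generate_hrvpi(lr_vpis, n=5, sigma=2.0):
    """Up-sample, shift-by-one-pixel and integrate an n x n set of LRVPIs,
    then de-blur the result (steps 2-4 of section 3.1, as a sketch)."""
    acc = None
    for idx, vpi in enumerate(lr_vpis):       # lr_vpis: n*n 2D float arrays
        i, j = divmod(idx, n)                 # position in the viewpoint grid
        up = zoom(vpi.astype(float), n, order=3)  # cubic up-sampling by n
        shifted = np.roll(np.roll(up, i, axis=0), j, axis=1)  # 1-px shift per view
        acc = shifted if acc is None else acc + shifted
    hr = acc / len(lr_vpis)                   # integrate: focus one depth plane

    # Step 4: unsharp masking with a Gaussian of standard deviation 2,
    # amplifying high frequencies without destroying edges.
    blurred = gaussian_filter(hr, sigma=sigma)
    return 2.0 * hr - blurred
```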
Fig. 4 Experimental results: (a) five sets of LRVPIs, each 193 × 129 pixels; (b) VPI resized using bi-cubic interpolation; (c, d) resulting HRVPI of 653 × 973 pixels and a magnified blurred section; (e, f) de-blurred HRVPI successfully generated by the proposed method and a magnified de-blurred section.
3.2 Auto-Optimal Thresholding & Feature-descriptors
The high-resolution VPIs contain abundant information that should be exploited for depth estimation. This information can be used for object detection and recovery, and then employed to extract robust correspondences, leading to a reliable estimate of 3D object depth. The principle of the method is to search for the optimal threshold value, which guides the setting, and then to extract a reliable set of features. The authors' previous work details this process [8, 9] for low-resolution VPIs; in the proposed approach the same algorithm is applied to the high-resolution VPIs to extract reliable feature-information blocks. The feature-match descriptor is an efficient and informative procedure implemented by assessing the intensity variance of image blocks for disparity analysis. The optimal threshold gives the highest local contrast, obtained by comparing small patches extracted from each region of the image with their immediate neighborhood. The spatial intensity distribution of the image is used as the feature representing it in the feature-match selection algorithm; a sketch of this block-selection step is given below.
3.3 Generate Multi-view HRIs from Holoscopic 3D Image
Producing HRVPIs in perspective projection leads to the second novel method: generating multi-view HRVPIs from the H3DI at different perception views. Because the small image sensor of the H3DI technique limits the use of larger micro-lenses with wide viewing angles, the baseline is limited at this stage. This new multi-view generation does not require the views to have long baselines as in the auto-stereoscopic multi-view imaging technique. The different perception views are formed from the same scene, recorded from different perspectives, through the use of the H3DI pixel format; in other words, it is converted into a multi-view 3D image pixel format with the correct slanting using the new interpolation technique of section 3.1. The rendering process obtains seven views from EIs of resolution 29 × 29 pixels. The process of generating the HRVPIs from the VPIs is shown in Fig. 5, in which different colors (1-29) correspond to the EI positions; as also shown by the colors of the multi-view HRVPIs (HR1-HR7), each HRVPI view is kept at a constant distance from the others in the rendering. A set of seven HRVPIs is generated to estimate the super-resolution disparity map, starting from positions 5×5 to 5×9 of the LRVPI array to generate HR1, then shifting horizontally by three VPIs (starting from VPI position 5×8) to generate HR2, and so on to the end of the VPI position array, which is [29 29]. A sketch of this window selection follows the figure caption.

Fig. 5 VPI selection process to generate seven HRVPIs.
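A small sketch of the viewpoint-window selection of Fig. 5, assuming 1-based viewpoint indices, a 5-wide window and a horizontal step of three VPIs as described above (the row range and helper name are illustrative):

```python
def multiview_hrvpi_windows(n_views=7, start=5, size=5, step=3):
    """Return (row0, col0, size) for each HRVPI's block of LR-VPIs.

    HR1 uses viewpoint columns 5-9, HR2 columns 8-12, and so on: each
    view shifts the 5-wide window by 3 viewpoints horizontally, keeping
    a constant baseline between rendered views."""
    return [(start, start + k * step, size) for k in range(n_views)]

# For the 29 x 29 viewpoint array: column starts 5, 8, ..., 23, so the
# last window (columns 23-27) still fits inside the array.
print(multiview_hrvpi_windows())
```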
3.4 Depth Map Estimation Algorithm

Fig. 6 shows the encompassing framework of the proposed algorithm, integrating the adaptive multi-baseline algorithm of [8, 9] for estimation of the super depth map. The principal steps of the comprehensive algorithm are:
1. Select sets of EIs for processing.
2. Convert EIs into VPIs using the strong interval correlations
between pixels displaced by one micro-lens.
3. Transform the orthographic-projection LRVPIs into perspective-projection HRVPIs using the new interpolation algorithm of section 3.1.
4. Use the auto-optimal threshold from section 3.2 by selecting
the reference VPI for the guide setting and extraction of a set
of reliable features.
5. Employ the adaptive multi-baseline disparity algorithm presented in the authors' previous work [8, 9] using an adaptive window shape (AWS). A robust and precise smoothing cost-aggregation function is built by summing the Sum of Squared Differences (SSD) functions of the windows in the neighborhood of the disparity score function (SSSD). The filtered output disparity cost $C(b_p, d)_{out}$ for feature block $b_p$ of the disparity map is given by:

$$C(b_p, d)_{out} = C(b_p, d)_{initial} + \sum_{b_n \in N(b_p)} W(b_n, b_p) \times \min\{C(b_n, d + \delta)\}$$

where $C(b_p, d)_{initial}$ denotes the cost-match function of feature block $b_p$, the summation term is the match-cost contribution of the nearest-neighbor blocks $b_n$ within the neighborhood $N(b_p)$, and $\delta$ is a small value that allows a reduction of the variance within each block.
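A sketch of this aggregation, reading $\min\{C(b_n, d + \delta)\}$ as the neighbor's minimum cost within ±δ disparities of d; the data layout (a blocks × disparities cost table plus explicit neighbor lists) is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def aggregate_cost(initial_cost, neighbors, weights, delta=1):
    """Filtered disparity cost: C_out(bp, d) = C_init(bp, d)
    + sum over bn in N(bp) of W(bn, bp) * min C(bn, d +/- delta).

    initial_cost: (num_blocks, num_disparities) SSSD costs.
    neighbors[p]: indices of the neighbor blocks of block p.
    weights[p]:   the weights W(bn, bp) matching neighbors[p].
    """
    num_blocks, num_disp = initial_cost.shape
    out = initial_cost.copy()
    for p in range(num_blocks):
        for q, w in zip(neighbors[p], weights[p]):
            for d in range(num_disp):
                lo, hi = max(0, d - delta), min(num_disp, d + delta + 1)
                out[p, d] += w * initial_cost[q, lo:hi].min()
    # The block's disparity estimate is argmin over d of out[p, :].
    return out
```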
4. DISCUSSION AND COMPARISON OF RESULTS
To demonstrate the described approach, experiments were carried out on the real-data ODHIs "Box-Tags" and "Airplane-Man", with comparisons against the state-of-the-art method in [9]. The ODHI resolution was 5616 × 3744 pixels with 193 × 129 micro-lenses, giving an EI resolution of 29 × 29 pixels, while the number of micro-lenses used in the recording determines the VPI resolution; thus the VPI resolution was 193 × 129 pixels, the same as the number of micro-lenses. The resulting algorithm proved more distinctive, robust, and precise in depth estimation under camera-viewpoint changes than the state-of-the-art algorithms in [9]. Fig. 6 illustrates the depth-map results of the proposed algorithm, verifying its ability to extract accurate 3D depth. The approach is a simple and efficient way of generating super-resolution depth maps using small sets of VPIs.
Fig. 6 Overall representation of the proposed 3D depth-estimation procedure on the real-data "Box" ODHI: the input holoscopic 3D image yields a set of LRVPIs (first phase: generation of the set of HRVPIs), the central HRVPI is selected for the auto-feature block setting process (second phase), and multi-pair HRVPI correspondences feed the adaptive multi-baseline algorithm (third phase) to produce the final 3D depth map.
Experimentally, sets of seven multi-view HRVPIs with a long baseline achieved good results in increasing depth-map accuracy. Experiments identifying the precise depth map and the objects present in a scene (see Fig. 7) show that the proposed method outperformed the state-of-the-art algorithm [9] in two respects: accuracy and speed. Sets of 49 LRVPIs were used to generate the resulting depth map with the authors' previous algorithm [9]. Due to space limitations, only two comparisons of depth maps obtained from real-world ODHIs, "Box-Tags" and "Airplane-Man", are shown.
Fig. 7 Comparison of depth-map estimation results on (a) "Box-Tags" and (b) "Airplane-Man". The first row shows the HRVPIs, the middle row the depth-map results using the algorithm of [9], and the last row the results of the proposed algorithm.
5. CONCLUSION AND FUTURE WORK
A novel approach was presented in this paper to create a super-resolution depth map with the omni-directional holoscopic 3D imaging technique. The novelty of the approach is that it converts sets of orthographic-projection (low-resolution) viewpoint images into a form of perspective-projection (high-resolution) geometry. The high-resolution viewpoint image is achieved using a modified Gaussian filter on the new form of viewpoint image to reduce blurring effects. In addition, the approach successfully generates multi-view high-resolution viewpoint images from the holoscopic 3D imaging system, which are used to generate 3D object depth. It is worth mentioning that subjective quality criteria (visual quality) were used to evaluate the performance of the proposed process, because there are no original (reference) images: the sets of LRVPIs are sampled from the pixels of each Elemental Image (EI). The results confirmed the efficiency, robustness, and speed of the approach through enhanced depth-map accuracy and reduced computational complexity. The experimental results verified that the algorithm was superior to a current state-of-the-art algorithm [9] and achieved a comparable performance. However, the results show that visual features of an object, such as its shape (contour) in the depth map, still require the incorporation of a surface-integration process.

ACKNOWLEDGEMENTS
The authors gratefully acknowledge the support of the European Commission under the Seventh Framework Programme (FP7) project 3D VIVANT (Live Immerse Video-Audio Interactive Multimedia).

6. REFERENCES
[1] A. Aggoun, "3D Holoscopic Imaging Technology for Real-Time Volume Processing and Display," High-Quality Visual Experience, Signals and Communication Technology, Vol. IV, pp. 411-428, 2010.
[2] L. Onural, "Television in 3-D: What are the Prospects?," Proc. IEEE, Vol. 95, No. 6, 2007.
[3] G. M. Lippmann, Compt. Rend. Acad. Sci., Vol. 146, p. 446, 1908.
[4] J.-Y. Son, B. Javidi, S. Yano, and K.-H. Choi, "Recent Developments in 3-D Imaging Technologies," Journal of Display Technology, Vol. 6, No. 10, pp. 394-403, Oct. 2010.
[5] Y. Kim, K. Hong, and B. Lee, "Recent researches based on integral imaging display method," 3D Research, Vol. 1, No. 1, pp. 17-27, Aug. 2011.
[6] C. H. Wu, M. McCormick, A. Aggoun, and S.-Y. Kung, "Depth Mapping of Integral Images through Viewpoint Image Extraction with a Hybrid Disparity Analysis Algorithm," Journal of Display Technology, Vol. 4, pp. 101-108, 2008.
[7] O. Abdul Fatah, A. Aggoun, M. Nawaz, J. Cosmas, E. Tsekleves, M. Swash, and E. Alazawi, "Depth Mapping of Integral Images Using Hybrid Disparity Analysis Algorithm," IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-4, South Korea, 2012.
[8] E. Alazawi, A. Aggoun, O. Abdul Fatah, M. Abbod, and M. R. Swash, "Adaptive Depth Map Estimation from 3D Integral Image," IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-6, London, UK, 2013.
[9] E. Alazawi, A. Aggoun, M. Abbod, M. R. Swash, and O. Abdul Fatah, "Scene Depth Extraction from Holoscopic Imaging Technology," IEEE 3DTV-CON: Vision Beyond Depth, AECC, Aberdeen, Scotland, 7-8 October 2013.
[10] M. Okui and F. Okano, "3D Display Research at NHK," Workshop on 3D Media, Applications and Devices, Berlin, Germany, 2009.
[11] O. Abdul Fatah, P. M. P. Lanigan, A. Aggoun, M. Swash, and E. Alazawi, "Three-Dimensional Integral Image Reconstruction based on Viewpoint Interpolation," IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-4, London, UK, 2013.