R. Stouffs, P. Janssen, S. Roudavski, B. Tunçer (eds.), Open Systems: Proceedings of the 18th International
Conference on Computer-Aided Architectural Design Research in Asia (CAADRIA 2013), 541–550. © 2013,
The Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), Hong Kong, and
Center for Advanced Studies in Architecture (CASA), Department of Architecture-NUS, Singapore.
OPEN TO RIDICULE
Deploying plaything technology for 3D modelling of urban informal
settlements in Asia
Joerg REKITTKE
National University of Singapore, Singapore
rekittke@nus.edu.sg
Yazid NINSALAM
Singapore-ETH Centre, Singapore
ninsalam@arch.ethz.ch
and
Philip PAAR
Laubwerk, Berlin, Germany
paar@laubwerk.com
Abstract. As technology-affine urban landscape researchers working in Asian megacities, we roam through crowded, narrow, widely informal city layouts, where we apply digital fieldwork equipment and conduct design work. We use low-cost cameras and camera drones – tools that were developed as gadgets for outdoor freaks or playthings for nerds. In this paper, we describe recent advances in the development of a method of on-site data and image gathering that allows the processing of concrete 3D models of informal city spaces. The visual quality of these models is still moderate, but the resulting three-dimensional spatial puzzle makes a widely inaccessible and undocumented piece of city terrain visible, understandable and designable. The software used is free.
Keywords. Fieldwork tools; mapping; 3D modelling.
1. Investigation Area
Kampung Melayu-Bukit Duri is an urban village in East Jakarta, situated on both sides of the Ciliwung River. The river is prone to floods of cataclysmic proportions and is heavily polluted (Figure 1). We are investigating this area in the
course of the research module Landscape Ecology in the ‘Future Cities Laboratory’ (Rekittke and Girot, 2012), hosted and managed by the Singapore ETH Centre for Global Environmental Sustainability (Cairns et al., 2012).

Figure 1. In the foreground, the specific riverbed profile, consisting of a metre-thick mix of waste and sediment, can be seen (Photo: Rekittke, 2011).
2. White Spots on the Digital World Map
In order to build complete and detailed models of complex terrain and urban territory, direct contact with ground and detail remains indispensable – notwithstanding all available remote sensing technology. Those who depend on direct, real-time contact with site and substance, we refer to as ‘foot soldiers’. Those who work from a distance, we refer to as ‘aviators’ (Rekittke et al., 2012).
Being a foot soldier [...] does not mean that the chosen equipment is not
related to the mother technology of the aviators – satellites, radio, and data
networks – rather, it has to be light, portable and inconspicuous. [...] Their
mission is to provide for terrain data and details that cannot be gathered by
the aviators [...]. (Rekittke et al., 2012, p. 199)
Foot soldier craft is appropriate for sorting out the remaining white spots on the digital world map, situated under vegetation and urban canopy – roofs, architectural overhangs, superstructures – and in urban canyons, where remote sensing technology proves to be blind (Figure 2). Where we work, no one hands us any expedient data suitable for copy-and-paste transfer to our computers. Furthermore, the speed of change within megacities accelerates the expiry date of the little available material. Design research needs up-to-date data for spatial analysis, and academic design studios often operate on a low-cost and low-key level. We subsume such an approach under the term ‘Grassroots GIS’ (Rekittke and Paar, 2010).
Figure 2. Under the bridge (bottom of image) 200 people are living; the Ciliwung River edges are submerged and canopy camouflages the terrain (Image: Google Earth, 2012).
3. Direct Landscape Exploration with Unorthodox Equipment
The fieldwork at hand took six days, focusing on the river and its edges. Initially we walked along the banks, but our feet sank too deep into the sludge and the banks became too steep for walking. In order to model 2.86 km of the river stretch, we had to get onto the river. A Challenger Inflatable, the mother of all rubber boats, was purchased (Figure 3) – ideal for rafting between the floating toilets and trash of the Ciliwung River. It carries four people and a load of 400 kg. We carried a Garmin GPS tracking device; for digital video recording we used three combined sets of the GoPro HD Hero2 Outdoor Edition – off-the-shelf action video cameras fitted in waterproof housings and attached to 3-way pivots on a telescopic mast (Figure 4).
Figure 3. Garmin GPS track loaded and visualised in Google Earth, with accompanying elevation data expressed as a graph. Measurements displayed – the Garmin is imprecise – include relative height, speed and duration of the boat trips (Images: Ninsalam/Paar, 2012).
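For readers who wish to reproduce this step programmatically, the following minimal sketch converts a recorded Garmin track into a KML line string that Google Earth can display. It assumes the third-party Python packages gpxpy and simplekml and uses placeholder file names; it illustrates the principle rather than our exact toolchain.

    import gpxpy
    import simplekml

    # Parse the Garmin track log (file name is a placeholder).
    with open("ciliwung_track.gpx") as f:
        gpx = gpxpy.parse(f)

    # Collect (longitude, latitude, elevation) triples from the first track segment.
    points = [(p.longitude, p.latitude, p.elevation)
              for p in gpx.tracks[0].segments[0].points]

    # Write the track as a KML line string for display in Google Earth.
    kml = simplekml.Kml()
    line = kml.newlinestring(name="Ciliwung boat trip", coords=points)
    line.altitudemode = simplekml.AltitudeMode.clamptoground
    kml.save("ciliwung_track.kml")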
Figure 4. Requisite handicraft work: three GoPro HD Hero2 cameras, mounted on a telescopic
mast, balanced in the middle of the flabby boat (Photos: Rekittke/Heng, 2012).
Figure 5. Drone takeoff from the boat. The video quality of the inbuilt camera is moderate
(Photos: Paar, 2012).
The GoPro camera captures high-resolution videos and photos through a 2.8 mm fixed lens. For the recording of the river trips we chose the 127° medium field of view (FOV) setting, with a resolution of 1920 x 1080 pixels and a rate of 30 frames per second. A tropics-specific problem was the occasional overheating of the cameras, caused by prolonged usage in the waterproof housing. This contributed to a partial loss of initial recordings before the issue was detected.

In order to visually illuminate completely inaccessible or hidden areas of the site, we carried an iPad-app-controlled Parrot AR.Drone 2.0 (Figure 5) – an inexpensive plaything quadrocopter with an effective range of about 50 metres, equipped with an in-built video camera allowing wireless video recording and transmission at a picture resolution of 1280 x 720 pixels, through a 92° diagonal wide-angle lens, at a recording rate of 30 frames per second.
The drone is remote-controlled by the free AR.FreeFlight app, which records and transmits data to control three sensory functions of the aircraft: 1) yaw, pitch and roll movement; 2) ground altitude; 3) x/y-axis ground movement.
When taking off from a steep and uneven riverbank, the drone initially worked well and delivered precious video material, but later displayed an Ultrasound Emergency, presumably due to the dusty, uneven terrain, and finally crashed. When launched from the boat, things went unexpectedly well. The drone flew smoothly and captured acceptable video recordings – quod erat demonstrandum. During the final attempt the drone shaved some bamboo leaves, struck an open toilet pipe of a slum shack and crashed miserably into a feculent garbage heap.
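Under the hood, the app steers the aircraft with plain-text AT commands sent over UDP, as documented in Parrot’s AR.Drone SDK guide. The following Python sketch of that protocol is illustrative and was not run in the field; the port number and the take-off/landing constants are taken from the SDK documentation, and the helper names are our own.

    import socket
    import struct

    DRONE_IP = "192.168.1.1"   # default address of the drone's own Wi-Fi access point
    AT_PORT = 5556             # UDP port for AT commands (per the SDK guide)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq = 0

    def send(command: str) -> None:
        """Send one AT command with an increasing sequence number."""
        global seq
        seq += 1
        sock.sendto(command.format(seq=seq).encode("ascii"), (DRONE_IP, AT_PORT))

    def float_arg(value: float) -> int:
        """AT*PCMD expects floats reinterpreted as signed 32-bit integers."""
        return struct.unpack("<i", struct.pack("<f", value))[0]

    def move(roll: float, pitch: float, gaz: float, yaw: float) -> None:
        """Progressive movement: tilt (roll/pitch), vertical speed (gaz) and yaw, each in -1..1."""
        send("AT*PCMD={seq},1,%d,%d,%d,%d\r"
             % (float_arg(roll), float_arg(pitch), float_arg(gaz), float_arg(yaw)))

    send("AT*REF={seq},290718208\r")   # take off (bit 9 set on the base control word)
    move(0.0, -0.1, 0.0, 0.0)          # slight nose-down pitch, i.e. gentle forward motion
    send("AT*REF={seq},290717696\r")   # land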
4. Post-processing with Free Software Components
The hardware we use for post-processing represents common standards in computer gamer circles – Grassroots GIS asks for affordable off-the-shelf articles. All software applied for the described work is free or open source. For the post-processing of former fieldwork material we used the web-based software Autodesk 123D Catch, which allows an upload of fewer than one hundred photos. This time, we experimented with uploads of up to three thousand photos simultaneously, via the programmes VisualSfM version 0.5.20 and CMPMVS version 4.0. VisualSfM is a graphical user interface (GUI) application for Structure from Motion (SfM) (Wu, 2011). The system is based upon the measured correspondence of image features, such as points, edges and lines, inferring camera poses from a selection of photos (Wang, 2011). Subsequently, the measured camera poses are processed by CMPMVS, a multi-view reconstruction software, which generates a textured mesh and reconstructs the surface of the final 3D model (Havlena et al., 2010). The resulting product of VisualSfM – a point cloud – can then be processed with the help of CMPMVS. The standard workflow (Figure 6) for VisualSfM and CMPMVS can be abstracted as follows (a command-line sketch follows the list):
• Upload of the images into the SfM workspace as image files in jpg format.
• Feature detection and full pairwise image matching; this generates a set of measured image features and correspondences (Wu, 2007).
• Incremental reconstruction; this starts the multicore bundle adjustment, which calculates the 3D point positions and camera parameters from the set of measured image features and correspondences (Wu et al., 2011).
• Dense reconstruction using a multi-view stereo (MVS) application – CMPMVS – which performs the textured surface reconstruction from the point clouds (Havlena et al., 2010).

Figure 6. VisualSfM workflow: Feature Detection and Bundle Adjustment (top left and right); Dense and Surface Reconstruction in CMPMVS and video output (bottom left and right).
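As a rough illustration of how this workflow can be batched, the following sketch drives both programmes from Python, assuming VisualSfM’s documented command-line mode; the directory and file names are placeholders, and the CMPMVS configuration file named here is hypothetical rather than taken from our project.

    import subprocess
    from pathlib import Path

    IMAGE_DIR = Path("frames")    # folder of jpg images (placeholder name)
    OUTPUT = Path("model.nvm")    # n-view match file: cameras plus sparse point cloud

    # One batch call covers feature detection, pairwise matching and incremental
    # reconstruction; appending "+pmvs" additionally exports undistorted images
    # and camera files for a dense multi-view stereo step.
    subprocess.run(["VisualSFM", "sfm+pmvs", str(IMAGE_DIR), str(OUTPUT)], check=True)

    # CMPMVS is then pointed at the exported cameras for dense reconstruction
    # and textured surface generation (the ini file name is hypothetical).
    subprocess.run(["CMPMVS", "mvs_settings.ini"], check=True)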
5. Choice of Optimal Software
The Autodesk software 123D Catch offers limited functionality concerning a detailed determination of the resulting model, but it is the best choice for basic users. Features that hitherto outplay VisualSfM include the implementation of a reference scale allowing scaling of the entire 3D model, the taking of reference measurements, the animation of the 3D model directly in the software, and a complete, cloud-based 3D surface reconstruction pipeline. VisualSfM stores and processes the selected image material in a local folder. This bypasses any time-consuming and often impractical web-based upload and enables the user to check feature matching and sparse reconstruction processing in the field. The main goal of our research is to achieve the best model quality possible. VisualSfM therefore currently turns out to be the optimal option (Figure 7).

The pipeline allows for a flexible addition of selected photos to integrate into and improve existing models. It also allows manually performed initialisation of the automatic feature detection process by the determination of a proper pair of photographs. ‘Pause’ and ‘Run’ buttons facilitate the integration of more photos during the feature-matching phase.
Figure 7. Screenshot of a model generated from Autodesk 123D Catch (left); screenshot of the
same model generated from VisualSfM (right).
6. Choice of Optimal Picture Capturing Method
The quality of the used image material is closely related to the choice of equipment and capturing methods. In combination, these define the quality of the final model. We experimented with both photo taking and video filming.

In order to produce still photos of the site we use common standard DSLR cameras, which deliver imagery of very high resolution. Our video material was generated by GoPro HD Hero2 action cameras and Parrot AR.Drones 2.0. The action cameras were initially made for sports freaks like free fallers, bungee jumpers and other contemporaries tired of life and keen to document their adrenalin-warped faces or bird’s-eye-view environments during flight. The inexpensive drones are considered to be gadgets. The technology of these devices is developing fast and will achieve impressive functionality and quality soon. The use of the chosen tools is practical in harsh environments and small-budget contexts. When using site photos, all available images are fed into the respective black box – 123D Catch or VisualSfM. Video recordings cannot be fed into the system in an unprocessed form; a certain number of frames has to be extracted. In order to find out the adequate number of extracted frames, we had to carry out tests. For this purpose we defined a one-hundred-second sequence of the river, approximately covering a leg of eighty metres. Our initial hypothesis read that the bigger the number of extracted frames, the better the models would be. This assumption turned out to be unfounded. The most successful model quality results from a feed-in rate between one and five frames per second (Figure 8). The camera records thirty frames per second; accordingly, a five-frames-per-second selection implies the use of every sixth frame.
Figure 8. Less is more: CMPMVS screenshot of a landscape model at a one-frame-per-second feed-in rate, representing the actual quality we are able to generate with our low-cost tools.
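Frame extraction at a chosen rate can be scripted with standard tools. As a minimal sketch, assuming ffmpeg is installed and using placeholder file names, the following resamples the 30 fps footage down to one frame per second; raising FPS towards five covers the upper end of the tested feed-in range.

    import subprocess
    from pathlib import Path

    VIDEO = "gopro_river_leg.mp4"    # placeholder name for one recorded river leg
    OUT_DIR = Path("frames")
    OUT_DIR.mkdir(exist_ok=True)

    FPS = 1  # frames per second fed into the reconstruction; 1-5 worked best in our tests

    # ffmpeg's fps filter drops frames to the chosen rate and writes the
    # survivors as numbered JPEGs, ready for upload into the SfM workspace.
    subprocess.run(
        ["ffmpeg", "-i", VIDEO, "-vf", f"fps={FPS}", str(OUT_DIR / "frame_%04d.jpg")],
        check=True,
    )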
7. State of the Art
While we worked on the post-processing methods in our project, the Street View producer Google announced a field campaign that definitely marks the current high-end reference and benchmark for our moderate foot soldier approach. The company mounted its expensive camera technology on a backpack – showcasing the Grand Canyon’s most popular hiking trails.
The so-called trekker captures images every 2.5 seconds with 15 cameras that are 5 megapixels each [...]. The GPS data is limited, so Google must compensate with sensors that record temperature, vibrations and the orientation of the device [...], before it stitches the images together and makes them available to users [...]. (Fonseca, 2012)
The rosette of cameras used (Figure 9) is the latest evolution in mapping technology made by the company from Mountain View, California.
Figure 9. The ‘trekker’ during a demonstration at Grand Canyon National Park. The device weighs about 18 kilograms and is about 1.2 metres in height (Photos: AP/Rick Bowmer).
8. Benefits and Limits of Plaything Technology
The rubber boat represents an openness to ridicule, as does the plaything drone that we had aboard. One of the main benefits of these playthings is unquestionably the fact that we would never have dared to apply the described technology and methods in the given environment if they had been subject to high expenditure and material value. The price we pay for the low-cost solution is the limited model quality we can currently achieve.
The front camera objective features such a large distortion error value that it has to be auto-rectified by the software VisualSfM. The drone camera is too low-grade to be calibrated for radial distortion with the help of existing software. There are sundry software-based possibilities to bypass such demerits, but in our work we deliberately do not aim for perfection. In the context of our academic design research studio work with students, even a deficient model makes sense and is better than nothing (Figure 10). However, the better the available and chosen camera technology becomes, the better the quality of the producible models will be.

Figure 10. Results from an AR.Drone 2.0 flight: (left) faulty model after auto-rectification of barrel distortion in VisualSfM; (right) original image, shaded surface and textured surface series, without radial distortion (l.) and with the distortion option turned on (r.).
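For those who nevertheless want to correct the barrel distortion before reconstruction, a generic route is the radial distortion model in OpenCV. The sketch below is purely illustrative: the camera matrix and distortion coefficients are invented placeholder values, since – as noted above – the drone camera is too low-grade to be calibrated reliably.

    import cv2
    import numpy as np

    # Placeholder intrinsics for a 1280 x 720 wide-angle camera; real values
    # would come from a chessboard calibration (cv2.calibrateCamera).
    camera_matrix = np.array([[700.0, 0.0, 640.0],
                              [0.0, 700.0, 360.0],
                              [0.0, 0.0, 1.0]])
    # A negative k1 models barrel distortion; the remaining coefficients are zeroed.
    dist_coeffs = np.array([-0.30, 0.08, 0.0, 0.0, 0.0])

    frame = cv2.imread("drone_frame.jpg")  # hypothetical extracted video frame
    undistorted = cv2.undistort(frame, camera_matrix, dist_coeffs)
    cv2.imwrite("drone_frame_undistorted.jpg", undistorted)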
9. Foot Soldier Future
The craft of the fieldwork-conducting foot soldier will remain shirtsleeve; what will rapidly change are the precision, accuracy and dependability of the available equipment. The complexity of operations will have to be increased, from the current image-based modelling approach towards a conflation of GIS data sets and their corresponding image-based geometries. Conflation – in GIS – is defined as the process of combining geographic information from overlapping sources in order to retain accurate data, minimize redundancy and reconcile data conflicts (Longley et al., 2001). We have already ordered the next generation of outdoor action cameras – insignificantly more expensive but considerably more powerful. An improved aircraft will also be in our baggage. One day, when the price drops, a portable 3D scanner system with real-time 3D reconstruction capability might be part of our equipment. These scanners come with software development kits, which enable developers to create various new applications (Andersen et al., 2012). And certainly one must experiment with Light Detection And Ranging (LIDAR) technology, which is being integrated into standard GIS software
(Jordan, 2012). With the help of these advanced tools, many of the remaining white spots on the digital world map will be successfully filled with three-dimensional information.
Acknowledgements
Special thanks go to Professor Christophe Girot, in his capacity as head of the research module “Landscape Ecology” of the Future Cities Laboratory, Singapore ETH Centre for Global Environmental Sustainability. Our work is embedded in the Future Cities Laboratory, in the form of a Design Research Studio as well as PhD research. In the same way we would like to thank Professor Wong Yunn Chii, head of the NUS Department of Architecture, for his unorthodox support in the face of our out-of-the-ordinary travel preferences.
References
Andersen, M.R., Jensen, T., Lisouski, P., Mortensen, A.K., Hansen, M.K., Gregersen, T., and
Ahrendt, P.: 2012, Kinect Depth Sensor Evaluation for Computer Vision Applications (Technical
Report ECE-TR-6), Department of Engineering, Aarhus University, 1–39.
Cairns, S., Gurusamy, S., Prescott, M., and Theodoratos, N. (eds.): 2012, Rivers in Future Cities: The
Case of Ciliwung, Jakarta, Indonesia. Module VII: Landscape Ecology, FCL – Future Cities
Laboratory Gazette, 12, Singapore ETH Centre for Global Environmental Sustainability.
Fonseca, F.: 2012, Google cameras map popular Grand Canyon trails, Associated Press, Oct 24, 2012. Available from: <http://bigstory.ap.org/article/google-cameras-map-popular-grand-canyon-trails-2> (accessed 23 November 2012).
Havlena, M., Torii, A., and Pajdla, T.: 2010, Efficient Structure from Motion by Graph Optimization, in Daniilidis, K., Maragos, P. and Paragios, N. (eds.), ECCV 2010, Part II, Lecture Notes in Computer Science (LNCS) Volume 6312, Springer-Verlag, Berlin, Heidelberg, 100–113.
Jordan, L.: 2012, “Integrating LiDAR and 3D into GIS”. Available from: <http://www.youtube.com/
watch?v=0USiCcsiFqE> (accessed 25 November 2012).
Longley, P.A., Goodchild, M.F., Maguire, D.J., and Rhind, D.W.: 2001, Geographic Information
Systems and Science, John Wiley & Sons, New York.
Rekittke, J. and Girot, C.: 2012, Cali Ciliwung Experiment, S.L.U.M. Lab. Sustainable Living Urban
Model, Edition 7, 68–69.
Rekittke, J., Paar, P., and Ninsalam, Y.: 2012, Foot Soldiers of GeoDesign, in Buhmann, Ervin and Pietsch (eds.), Proceedings of Digital Landscape Architecture 2012, Anhalt University of Applied Sciences, Wichmann, Heidelberg, 199–210.
Rekittke, J. and Paar, P.: 2010, Grassroots GIS – Digital Outdoor Designing Where the Streets Have No Name, in Buhmann, E., Pietsch, M., and Kretzler, E. (eds.), Proceedings of Digital Landscape Architecture 2010, Anhalt University of Applied Sciences, Wichmann, Heidelberg, 69–78.
Wang, Y.-F.: 2011, A Comparison Study of Five 3D Modeling Systems Based on the SfM Principles (Technical Report No. Visualsize Inc. TR 2011-01), Goleta, USA, 1–30.
Wu, C.: 2011, “VisualSFM: A Visual Structure from Motion System”. Available from: <http://homes.cs.washington.edu/~ccwu/vsfm/> (accessed 3 November 2012).
Wu, C.: 2007, “SiftGPU: A GPU implementation of Scale Invariant Feature Transform (SIFT)”. Available from: <http://cs.unc.edu/~ccwu/siftgpu> (accessed 1 November 2012).
Wu, C., Agarwal, S., Curless, B., and Seitz, S.M.: 2011, Multicore Bundle Adjustment, Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition 2011 (CVPR), 3057–3064.