OPEN TO RIDICULE
Deploying plaything technology for 3D modelling of urban informal settlements in Asia

Joerg REKITTKE
National University of Singapore, Singapore
rekittke@nus.edu.sg

Yazid NINSALAM
Singapore-ETH Centre, Singapore
ninsalam@arch.ethz.ch

Philip PAAR
Laubwerk, Berlin, Germany
paar@laubwerk.com

Abstract. As technology-affine urban landscape researchers working in Asian megacities, we roam through crowded, narrow and largely informal city layouts, where we apply digital fieldwork equipment and conduct design work. We use low-cost cameras and camera drones, tools that were developed as gadgets for outdoor enthusiasts or playthings for nerds. In this paper, we describe recent advances in the development of a method of on-site data and image gathering that allows concrete 3D models of informal city spaces to be processed. The visual quality of these models is still moderate, but the resulting three-dimensional spatial puzzle makes a widely inaccessible and undocumented piece of city terrain visible, understandable and designable. The software used is free.

Keywords. Fieldwork tools; mapping; 3D modelling.

1. Investigation Area

Kampung Melayu-Bukit Duri is an urban village in East Jakarta, situated on both sides of the Ciliwung River. The river is prone to floods of cataclysmic proportions and is heavily polluted (Figure 1). We are investigating this area in the course of the research module Landscape Ecology in the 'Future Cities Laboratory' (Rekittke and Girot, 2012), hosted and managed by the Singapore ETH Centre for Global Environmental Sustainability (Cairns et al., 2012).

Figure 1. In the foreground, the specific riverbed profile, consisting of a metre-thick mix of waste and sediment, can be seen (Photo: Rekittke, 2011).

2. White Spots on the Digital World Map

In order to build complete and detailed models of complex terrain and urban territory, direct contact with ground and detail remains indispensable, for all the available remote sensing technology. Those who depend on direct, real-time contact with site and substance we refer to as 'foot soldiers'. Those who work from a distance we refer to as 'aviators' (Rekittke et al., 2012).

Being a foot soldier [...] does not mean that the chosen equipment is not related to the mother technology of the aviators – satellites, radio, and data networks – rather, it has to be light, portable and inconspicuous. [...] Their mission is to provide for terrain data and details that cannot be gathered by the aviators [...]. (Rekittke et al., 2012, p. 199)

Foot soldier craft is well suited to sorting out the remaining white spots on the digital world map: areas situated under vegetation and urban canopy such as roofs, architectural overhangs and superstructures, and in urban canyons, where remote sensing technology proves to be blind (Figure 2). Where we work, no one hands us expedient data suitable for copy-and-paste transfer to our computers. Furthermore, the speed of change in megacities accelerates the expiry date of the little material that is available. Design research needs up-to-date data for spatial analysis, and academic design studios often operate at a low-cost and low-key level. We subsume such an approach under the term 'Grassroots GIS' (Rekittke and Paar, 2010).

Figure 2. Under the bridge (bottom of image) 200 people are living; the Ciliwung River edges are submerged, and canopy camouflages the terrain (Image: Google Earth, 2012).

3. Direct Landscape Exploration with Unorthodox Equipment

The fieldwork at hand took six days, focussing on the river and its edges. Initially we walked along the banks, but our feet sank too deep into the sludge and the banks became too steep for walking. In order to model a 2.86 km stretch of the river, we had to get onto the water. A Challenger Inflatable, the mother of all rubber boats, was purchased (Figure 3), ideal for rafting between the floating toilets and trash of the Ciliwung River. It carries four people and a load of 400 kg. We carried a Garmin GPS tracking device; for digital video recording we used a combined set of three GoPro HD Hero2 Outdoor Edition cameras – off-the-shelf action video cameras fitted in waterproof housings and attached to 3-way pivots on a telescopic mast (Figure 4).

Figure 3. Garmin GPS track loaded and visualised in Google Earth, with accompanying elevation data expressed as a graph. Measurements displayed – the Garmin is imprecise – include relative height, speed and duration of the boat trips (Images: Ninsalam/Paar, 2012).
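The track statistics shown in Figure 3 can be reproduced with a few lines of script. The following is a minimal sketch in Python, assuming a GPX export from the device and the open-source gpxpy library; the file name is hypothetical and the library is not part of the toolset described above.

    # Minimal sketch: summarising a Garmin GPX track.
    # Assumes the open-source gpxpy library (pip install gpxpy);
    # the file name is hypothetical.
    import gpxpy

    with open('ciliwung_track.gpx') as f:
        gpx = gpxpy.parse(f)

    # Flatten all tracks and segments into one list of points.
    points = [p for trk in gpx.tracks for seg in trk.segments for p in seg.points]

    length_m = gpx.length_3d()  # track length in metres, elevation included
    duration_s = (points[-1].time - points[0].time).total_seconds()
    elevations = [p.elevation for p in points if p.elevation is not None]

    print(f'Length:          {length_m / 1000.0:.2f} km')
    print(f'Duration:        {duration_s / 60.0:.1f} min')
    print(f'Mean speed:      {length_m / duration_s:.2f} m/s')
    print(f'Relative height: {max(elevations) - min(elevations):.1f} m')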
Figure 4. Requisite handicraft work: three GoPro HD Hero2 cameras, mounted on a telescopic mast, balanced in the middle of the flabby boat (Photos: Rekittke/Heng, 2012).

Figure 5. Drone takeoff from the boat. The video quality of the built-in camera is moderate (Photos: Paar, 2012).

The GoPro camera captures high-resolution videos and photos through a 2.8 mm fixed lens. For the recording of the river trips we chose the 127° medium field of view (FOV) setting, with a resolution of 1920 x 1080 pixels and a rate of 30 frames per second. A tropics-specific problem was the occasional overheating of the cameras, caused by prolonged use in the waterproof housing; this led to a partial loss of the initial recordings before the issue was detected.

In order to visually illuminate completely inaccessible or hidden areas of the site, we carried an iPad-app-controlled Parrot AR.Drone 2.0 (Figure 5), an inexpensive plaything quadrocopter with an effective range of about 50 metres, equipped with a built-in video camera that allows wireless video recording and transmission at a resolution of 1280 x 720 pixels, through a 92° diagonal wide-angle lens, at a recording rate of 30 frames per second. The drone is remote-controlled by the gratis AR.FreeFlight app, which records and transmits data to control three sensory functions of the aircraft: 1) yaw, pitch and roll movement; 2) ground altitude; 3) x/y-axis ground movement. When taking off from a steep and uneven riverbank, the drone initially worked well and delivered precious video material, but it later displayed an Ultrasound Emergency, presumably due to the dusty, uneven terrain, and finally crashed. When launched from the boat, things went unexpectedly well. The drone flew smoothly and captured acceptable video recordings – quod erat demonstrandum. During the final attempt, the drone shaved some bamboo leaves, struck an open toilet pipe of a slum shack and crashed miserably into a feculent garbage heap.

4. Post-processing with Free Software Components

The hardware we use for post-processing represents common standards in computer gaming circles – Grassroots GIS asks for affordable off-the-shelf articles. All software applied for the described work is free or open source. For the post-processing of earlier fieldwork material we used the web-based software Autodesk 123D Catch, which allows an upload of fewer than one hundred photos. This time, we experimented with uploads of up to three thousand photos simultaneously, via the programmes VisualSfM version 0.5.20 and CMPMVS version 4.0. VisualSfM is a graphical user interface (GUI) application for Structure from Motion (SfM) (Wu, 2011). The system is based upon the measured correspondence of image features, such as points, edges and lines, and infers camera poses from a selection of photos (Wang, 2011). Subsequently, the resulting product of VisualSfM – a point cloud with measured camera poses – is processed by CMPMVS, a multi-view reconstruction software that generates a textured mesh and reconstructs the surface of the final 3D model (Havlena et al., 2010). The standard workflow (Figure 6) for VisualSfM and CMPMVS can be abstracted as follows; a scripted sketch of the same pipeline follows the list:

Figure 6. VisualSfM workflow: feature detection and bundle adjustment (top left and right); dense and surface reconstruction in CMPMVS and video output (bottom left and right).

• Upload of the images into the SfM workspace as image files in JPEG format.
• Feature detection and full pairwise image matching; this generates a set of measured image features and correspondences (Wu, 2007).
• Incremental reconstruction; this starts the multicore bundle adjustment, which calculates the 3D point positions and camera parameters from the set of measured image features and correspondences (Wu et al., 2011).
• Dense reconstruction using a multi-view stereo (MVS) application, CMPMVS, which performs the textured surface reconstruction from the point clouds (Havlena et al., 2010).
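The sparse stage of this workflow can also be driven without the GUI. The following sketch uses the batch mode documented for VisualSfM; the exact flags, the output format and the hand-off to CMPMVS depend on the installed versions and platform, so all paths and arguments here are assumptions to be verified against one's own setup.

    # Sketch: batch-driving the sparse reconstruction stage described above.
    # The 'sfm' batch mode is documented for VisualSfM; verify flags and
    # paths against the installed version before relying on this.
    import subprocess
    from pathlib import Path

    frames_dir = Path('frames')          # input images (JPEG)
    nvm_file = Path('model/river.nvm')   # sparse reconstruction output
    nvm_file.parent.mkdir(parents=True, exist_ok=True)

    # Feature detection, pairwise matching and incremental reconstruction.
    subprocess.run(['VisualSFM', 'sfm', str(frames_dir), str(nvm_file)],
                   check=True)

    # The dense, textured surface reconstruction is then performed by CMPMVS,
    # which reads a scene configuration exported from VisualSfM; the exact
    # invocation is version-specific, e.g. (placeholder):
    # subprocess.run(['CMPMVS', 'mvs_scene.ini'], check=True)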
5. Choice of Optimal Software

The Autodesk software 123D Catch offers limited functionality for the detailed control of the resulting model, but it is the best choice for basic users. Features that hitherto outplay VisualSfM include the implementation of a reference scale that allows the entire 3D model to be scaled, the taking of reference measurements, the animation of the 3D model directly in the software, and a complete, cloud-based 3D surface reconstruction pipeline. VisualSfM, in contrast, stores and processes the selected image material in a local folder. This bypasses any time-consuming and often impractical web-based upload and enables the user to check feature matching and sparse reconstruction processing in the field. The main goal of our research is to achieve the best model quality possible; for this, VisualSfM currently turns out to be the optimal option (Figure 7). The pipeline allows selected photos to be added flexibly in order to integrate and improve existing models. It also allows the automatic feature detection process to be initialised manually by determining a proper pair of photographs. A 'Pause' and 'Run' button facilitates the integration of more photos during the feature-matching phase.

Figure 7. Screenshot of a model generated with Autodesk 123D Catch (left); screenshot of the same model generated with VisualSfM (right).

6. Choice of Optimal Picture Capturing Method

The quality of the image material used is closely related to the choice of equipment and capturing methods; in combination, these define the quality of the final model. We experimented with both photo taking and video filming. To produce still photos of the site, we use common standard DSLR cameras, which deliver imagery of very high resolution. Our video material was generated by the GoPro HD Hero2 action cameras and the Parrot AR.Drone 2.0. The action cameras were initially made for sports freaks – free fallers, bungee jumpers and other contemporaries tired of life and keen to document their adrenalin-warped faces or bird's-eye views of the environment during flight. The inexpensive drones are considered gadgets. The technology of these devices is developing fast and will soon achieve impressive functionality and quality. The chosen tools are practical in harsh environments and small-budget contexts.

When using site photos, all available images are fed into the respective black box, 123D Catch or VisualSfM. Video recordings cannot be fed into the system in unprocessed form; a certain number of frames has to be extracted. In order to find the adequate number of extracted frames, we carried out tests. For this purpose we defined a one-hundred-second sequence of the river, covering a leg of approximately eighty metres. Our initial hypothesis read that the bigger the number of extracted frames, the better the models would be. This assumption turned out to be unfounded. The most successful model quality results from a feed-in range of one to five frames per second (Figure 8). The camera records thirty frames per second; accordingly, a five-frames-per-second selection implies the use of every sixth frame.

Figure 8. Less is more: CMPMVS screenshot of a landscape model at a one-frame-per-second feed-in rate, representing the actual quality we are able to generate with our low-cost tools.
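The frame extraction itself is a small scripting task. The following sketch thins the thirty-frames-per-second GoPro footage to the five-frames-per-second upper end of the range identified above; it assumes the open-source OpenCV library (opencv-python), and the file names are illustrative.

    # Sketch: keeping every sixth frame of 30 fps footage, i.e. a 5 fps
    # feed-in rate. File names are illustrative; assumes opencv-python.
    import os
    import cv2

    SOURCE_FPS = 30                   # GoPro HD Hero2 recording rate
    TARGET_FPS = 5                    # upper end of the useful range
    STEP = SOURCE_FPS // TARGET_FPS   # 6: keep every sixth frame

    os.makedirs('frames', exist_ok=True)
    cap = cv2.VideoCapture('river_leg.mp4')

    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:                    # end of the video reached
            break
        if index % STEP == 0:
            cv2.imwrite(f'frames/frame_{saved:05d}.jpg', frame)
            saved += 1
        index += 1
    cap.release()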
7. State of the Art

While we worked on the post-processing methods in our project, the Street View producer Google announced a field campaign that definitely marks the current high-end reference and benchmark for our moderate foot soldier approach. The company mounted its expensive camera technology on a backpack, showcasing the Grand Canyon's most popular hiking trails.

The so-called trekker captures images every 2.5 seconds with 15 cameras that are 5 megapixels each [...]. The GPS data is limited, so Google must compensate with sensors that record temperature, vibrations and the orientation of the device [...], before it stitches the images together and makes them available to users [...]. (Fonseca, 2012)

This rosette of cameras (Figure 9) is the latest evolution in the mapping technology made by the company from Mountain View, California.

Figure 9. The 'trekker' during a demonstration at Grand Canyon National Park. The device weighs about 18 kilograms and is about 1.2 metres in height (Photos: AP/Rick Bowmer).

8. Benefits and Limits of Plaything Technology

The rubber boat represents an openness to ridicule, as does the plaything drone that we had aboard. One of the main benefits of these playthings is unquestionably the fact that we would never have dared to apply the described technology and methods in the given environment if they had been subject to high expenditure and material value. The price we pay for the low-cost solution is the limited model quality we can currently achieve. The front camera lens of the drone exhibits such a large distortion error that it has to be auto-rectified by the software VisualSfM, and the camera is too low-grade to be calibrated for radial distortion with the help of existing software. There are sundry software-based possibilities to bypass such demerits, but in our work we deliberately do not aim for perfection. In the context of our academic design research studio work with students, even a deficient model makes sense and is better than nothing (Figure 10). However, the better the available and chosen camera technology becomes, the better the quality of the producible models will be.

Figure 10. Results from an AR.Drone 2.0 flight: (left) faulty model after auto-rectification of barrel distortion in VisualSfM; (right) original image, shaded surface and textured surface series, without radial distortion (l.) and with the distortion option turned on (r.).
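Where a camera of better grade can be calibrated, radial distortion could in principle be corrected before the frames are fed to the pipeline, instead of relying on auto-rectification. The following sketch shows the standard checkerboard calibration and undistortion offered by the open-source OpenCV library; it is noted for completeness, as the AR.Drone camera itself proved too low-grade for this, and the file names and checkerboard size are illustrative.

    # Sketch: one-off radial-distortion correction with OpenCV, assuming
    # a printed checkerboard can be photographed with the camera at hand.
    # File names and checkerboard size are illustrative.
    import glob
    import cv2
    import numpy as np

    PATTERN = (9, 6)  # inner corners of the checkerboard
    objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

    obj_points, img_points, size = [], [], None
    for path in glob.glob('calib/*.jpg'):
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, PATTERN)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
            size = gray.shape[::-1]

    # K is the camera matrix; dist holds the distortion coefficients.
    _, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points,
                                           size, None, None)

    frame = cv2.imread('drone_frame.jpg')
    cv2.imwrite('undistorted.jpg', cv2.undistort(frame, K, dist))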
9. Foot Soldier Future

The craft of the fieldwork-conducting foot soldier will remain hands-on; what will rapidly change are the precision, accuracy and dependability of the available equipment. The complexity of operations will have to increase, from the current image-based modelling approach towards a conflation of GIS data sets and their corresponding image-based geometries. Conflation, in GIS, is defined as the process of combining geographic information from overlapping sources in order to retain accurate data, minimise redundancy and reconcile data conflicts (Longley et al., 2001); a minimal sketch of this direction closes this section. We have already ordered the next generation of outdoor action cameras, insignificantly more expensive but considerably more powerful. An improved aircraft will also be in our baggage. One day, when the price drops, a portable 3D scanner system with real-time 3D reconstruction capability might be part of our equipment. These scanners come with software development kits that enable developers to create various new applications (Andersen et al., 2012). And certainly one must experiment with Light Detection and Ranging (LIDAR) technology, which is being integrated into standard GIS software (Jordan, 2012). With the help of these advanced tools, many of the remaining white spots on the digital world map will be successfully filled with three-dimensional information.
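As a deliberately simple illustration of the conflation direction named above (not part of our current pipeline), the following sketch estimates a similarity transform, using Umeyama's closed-form method, that maps model coordinates from the image-based reconstruction onto GPS-derived ground coordinates; the inputs are assumed to be matched positions given as Nx3 NumPy arrays.

    # Sketch: aligning image-based geometry with GIS coordinates through a
    # similarity transform (scale, rotation, translation) estimated from
    # matched point pairs with Umeyama's closed-form method.
    import numpy as np

    def similarity_transform(src, dst):
        """Return (c, R, t) such that dst ~ c * R @ src + t, point-wise."""
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        src_c, dst_c = src - mu_s, dst - mu_d
        cov = dst_c.T @ src_c / len(src)
        U, D, Vt = np.linalg.svd(cov)
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1.0  # guard against a reflection
        R = U @ S @ Vt
        c = np.trace(np.diag(D) @ S) / (src_c ** 2).sum(axis=1).mean()
        t = mu_d - c * R @ mu_s
        return c, R, t

    # Usage: with matched camera positions in model and GPS coordinates,
    # every model vertex V (Nx3) can then be mapped into map coordinates:
    # c, R, t = similarity_transform(cams_model, cams_gps)
    # V_gis = c * V @ R.T + t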
Acknowledgements

Special thanks go to Professor Christophe Girot, in his capacity as head of the research module "Landscape Ecology" of the Future Cities Laboratory, Singapore ETH Centre for Global Environmental Sustainability. Our work is embedded in the Future Cities Laboratory, in the form of a Design Research Studio as well as of PhD research. In the same way, we would like to thank Professor Wong Yunn Chii, head of the NUS Department of Architecture, for his unorthodox support in the face of our out-of-the-ordinary travel preferences.

References

Andersen, M.R., Jensen, T., Lisouski, P., Mortensen, A.K., Hansen, M.K., Gregersen, T. and Ahrendt, P.: 2012, Kinect Depth Sensor Evaluation for Computer Vision Applications (Technical Report ECE-TR-6), Department of Engineering, Aarhus University, 1–39.

Cairns, S., Gurusamy, S., Prescott, M. and Theodoratos, N. (eds.): 2012, Rivers in Future Cities: The Case of Ciliwung, Jakarta, Indonesia. Module VII: Landscape Ecology, FCL – Future Cities Laboratory Gazette, 12, Singapore ETH Centre for Global Environmental Sustainability.

Fonseca, F.: 2012, Google cameras map popular Grand Canyon trails, Associated Press, 24 October 2012. Available from: <http://bigstory.ap.org/article/google-cameras-map-popular-grandcanyon-trails-2> (accessed 23 November 2012).

Havlena, M., Torii, A. and Pajdla, T.: 2010, Efficient Structure from Motion by Graph Optimization, in Daniilidis, K., Maragos, P. and Paragios, N. (eds.), ECCV 2010, Part II, Lecture Notes in Computer Science (LNCS) Volume 6312, Springer-Verlag, Berlin, Heidelberg, 100–113.

Jordan, L.: 2012, "Integrating LiDAR and 3D into GIS". Available from: <http://www.youtube.com/watch?v=0USiCcsiFqE> (accessed 25 November 2012).

Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W.: 2001, Geographic Information Systems and Science, John Wiley & Sons, New York.

Rekittke, J. and Girot, C.: 2012, Cali Ciliwung Experiment, S.L.U.M. Lab. Sustainable Living Urban Model, Edition 7, 68–69.

Rekittke, J., Paar, P. and Ninsalam, Y.: 2012, Foot Soldiers of GeoDesign, in Buhmann, E., Ervin, S. and Pietsch, M. (eds.), Proceedings of Digital Landscape Architecture 2012, Anhalt University of Applied Sciences, Wichmann, Heidelberg, 199–210.

Rekittke, J. and Paar, P.: 2010, Grassroots GIS – Digital Outdoor Designing Where the Streets Have No Name, in Buhmann, E., Pietsch, M. and Kretzler, E. (eds.), Proceedings of Digital Landscape Architecture 2010, Anhalt University of Applied Sciences, Wichmann, Heidelberg, 69–78.

Wang, Y.-F.: 2011, A Comparison Study of Five 3D Modeling Systems Based on the SfM Principles (Technical Report No. Visualsize Inc. TR 2011-01), Goleta, USA, 1–30.

Wu, C.: 2007, "SiftGPU: A GPU implementation of Scale Invariant Feature Transform (SIFT)". Available from: <http://cs.unc.edu/~ccwu/siftgpu> (accessed 1 November 2012).

Wu, C.: 2011, "VisualSFM: A Visual Structure from Motion System". Available from: <http://homes.cs.washington.edu/~ccwu/vsfm/> (accessed 3 November 2012).

Wu, C., Agarwal, S., Curless, B. and Seitz, S.M.: 2011, Multicore Bundle Adjustment, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2011 (CVPR), 3057–3064.