Geoprocessing with GDAL and Numpy in Python Delong Zhao 11-03-2011 Geoprocessing with GDAL and Numpy in Python • GDAL - Geospatial Data Abstraction Library • Numpy - the N-dimensional array package for scientific computing with Python. • Both of them are open source software Read raster dataset using GDAL Do some calculation using Numpy Output to geospatial dataset using GDAL GDAL • Supports about 100 raster formats – ArcInfo grids, ArcSDE raster, Imagine, Idrisi, – ENVI, GRASS, GeoTIFF – HDF4, HDF5 – USGS DOQ, USGS DEM – ECW, MrSID – TIFF, JPEG, JPEG2000, PNG, GIF, BMP – See http://www.gdal.org/formats_list.html Numpy • the fundamental package needed for scientific computing with Python. • a powerful N-dimensional array object • sophisticated (broadcasting) functions • tools for integrating C/C++ and Fortran code • useful linear algebra, Fourier transform, and random number capabilities. • http://numpy.scipy.org/ Installation • 1. Enthought python scientific computing package, includes numpy – http://www.enthought.com/ • 2. GDAL - Geospatial Data Abstraction Library – http://www.lfd.uci.edu/~gohlke/pythonlibs/ • Or all of these has been installed on EOMF and Cybercommons servers Tutorial • http://itmetr.net/itmetr.cgi/PyIntro • http://www.gis.usu.edu/~chrisg/python/2009 / • http://www.scipy.org/NumPy_for_Matlab_Us ers • https://www.cfa.harvard.edu/~jbattat/comput er/python/science/idl-numpy.html Sample 1 • Read two tif files (red band and nir band) • Calculate • Output NDVI in same projection and georeference as the input file. • Numpy example Algorithm development for global cropping intensity from 2000-2011 1-crop per year 2-crop per year 3-crop per year Dynamics of winter wheat and paddy rice fields in Nanjing, Jiangsu, China 1 .0 V e g e ta tio n in d ice s 0 .8 NDVI LSW I EVI F ie ld site in Jia n g su (c) 7/3/99 2-weeks after rice transplanting 0 .6 0 .4 0 .2 winter wheat paddy rice 0 .0 1 /1 /0 2 3 /1 /0 2 5 /1 /0 2 7 /1 /0 2 9 /1 /0 2 1 1 /1 /0 2 1 /1 /0 3 T im e (8 -d a y in te rva l) (b) 6/11/99 rice field preparation (d) 9/6/99 rice plant heading MODIS 8-day composites of surface reflectance product (MOD09A1) NDSI NDVI, EVI, Snow mask Cloud mask LSWI Evergreen vegetation mask Permanent water mask Temporal profile analysis of individual pixels Cropping intensity ( # of crops per year) Crop calendar (planting & harvesting dates) Inundation and paddy rice Global Mapping of Croplands 1 .0 o n e M O D IS p ixe l in B a n g k o k a re a NDVI LSW I EVI V e g e ta tio n In d ice s 0 .8 0 .6 0 .4 0 .2 0 .0 S ta rtin g d a te S ta rtin g d a te -0 .2 1 2 3 4 5 6 7 8 9 10 11 12 M o n th in 2 0 0 4 1 .0 O n e M O D IS p ixe l in M e k o n g b a s in NDVI LSW I EVI V e g e ta tio n In d ice s 0 .8 0 .6 0 .4 0 .2 0 .0 flo o d in g & tra n s p la n tin g -0 .2 Cropping Intensity map in 2004 1 2 3 4 5 flo o d in g & tra n s p la n tin g flo o d in g & tra n s p la n tin g 6 7 8 M o n th in 2 0 0 4 9 10 11 12 Python multiprocessing • http://docs.python.org/library/multiprocessin g.html • • • import multiprocessing pool = multiprocessing.Pool(processes=multiprocessing.cpu_count()) pool.map(doprocess,findfiles(root_dir)) Benchmark • I did some benchmark. By using all 8 cpu and 16G memory on one eomf server • can finish 1 MODIS tile NDVI, EVI, CLOUD,SNOW, LANDWATER, FLOOD, Drought products in 15 minutes. • This means we can finish global 296 tiles 20002011 MODIS data in 786 hours (32 days) with one server. • And we have 6 computation servers, we can improve it to 6 days if all the servers can do the job.