EA poster: Parallel Places
Introducing Parallel Programming in Geographical Information Science

Shannon Kobs Nawotniak
Department of Geosciences
Idaho State University
Pocatello, USA
kobsshan@isu.edu

Abstract—Two Geographic Information Science (GIS) courses at Idaho State University participated in the 2014-2015 NSF/TCPP Curriculum Guidelines Early Adopter program for Parallel and Distributed Computing (PDC). Students with little or no background in programming learned to use PDC to solve spatial problems in science. Student anxiety and programming language restrictions created challenges that we were largely able to overcome. Initial evaluations from students and a reviewer are overwhelmingly positive.

Keywords: GIS; parallel; Python; Fortran 90; MPI; OpenMP; early adoption; education

I. INTRODUCTION

Parallel and Distributed Computing (PDC) has become increasingly fundamental to the large-scale data processing required in Geographic Information Science (GIS), including LiDAR point cloud and hyperspectral raster processing. However, PDC courses have remained largely beyond the reach of GIS students, who often come from backgrounds in Geography, Geology, History, Anthropology, etc., and typically have no programming experience. In the Idaho State University GIS minor and graduate programs, we have designed a 2-course sequence to introduce PDC as part of the NSF/TCPP Curriculum Guidelines Early Adopter program. Classes relied heavily on in-class live coding and on individualized assistance outside of class from the instructor and TA (funded by the Early Adopter program).

II. THE COURSES

A. Programming for GIS (GEOL 4428/5528)

The first course in the sequence, Programming for GIS, serves as a first introduction to code development and parallelism. Students learn Python and how to use the arcpy library for ArcGIS, industry-standard software that they are already familiar with and expect to use professionally. The students use the multiprocessing module and its Pool class to parallelize functions operating on a list of inputs, measure their speedup and efficiency, and discuss the differences between shared and distributed memory. Parallel tasks center on spatial data manipulation, such as exploring geodatabases of road data and processing volcanic eruption model outputs for time-slice visualization in ArcGIS (Fig. 1). Instead of an assigned textbook, students use online resources.

Programming for GIS included 10 students: 3 anthropologists, 3 geologists, 2 engineers, 2 environmental scientists/ecologists; 6 women, 4 men; 9 graduate, 1 undergraduate. Only one entering student reported confidence and experience programming in any language.

Fig. 1. Example of a time-slice of ash concentration in a volcanic eruption plume, displayed as part of a sequence in ArcGIS. This was processed as part of a parallelism exercise in GEOL 4428/5528 to demonstrate the use of Python/arcpy parallelism for processing and visualizing a series of files.

B. Advanced GIS Programming (GEOL 6628)

The second course, Advanced GIS Programming, takes a more traditional approach to PDC and uses Fortran 90 with MPI and OpenMP. The students use shared and distributed memory systems to solve spatial problems beyond what they can accomplish using ArcGIS tools and libraries. These include geothermal anomaly detection (Fig. 2) [1] and simple landscape evolution using topographic diffusion. This course uses Pacheco [2] as a text despite its introductory level, recognizing the hybridized nature of a graduate-level course aimed at novice programmers.

GEOL 6628 started at 9 students but dropped to 7 by midterms due to personal conflicts and overscheduling (no withdrawals were related to failing grades). The class comprised 3 geologists, 2 anthropologists, 2 environmental scientists/ecologists, 1 engineer, and 1 mathematician, with 5 women and 4 men. All of the students reported experience in Python, though
only the 6 who continued from GEOL 5528 the previous semester reported experience with parallelism.

III. CHALLENGES

A. Fear/Anxiety

The first challenge we encountered was the high level of discomfort many students brought with them to programming. We tried to offset this by emphasizing individual assistance outside of class and by holding coffee/cookie breaks during class to promote a relaxed atmosphere while we discussed conceptual problems. Students worked hands-on with a LittleFe cluster [3] to demystify computer hardware. We suspect that technology fears may have been exacerbated by grade-related anxiety; participating students were operating outside of their comfort zones and were expected to maintain grades consistent with graduate-level study (B or above). We do not suggest adopting “soft” grading, but we are considering ways to extend deadlines or let students earn points back without creating unreasonable additional work for the professor and TA.

B. Languages

The second major challenge is the switch in languages that occurs midway through our sequence. Because employers expect GIS programmers to know Python, we are compelled to continue teaching Python for ArcGIS in the course sequence. This presents two challenges:

• Parallel Python is not well suited to integration with ArcGIS via arcpy. File locking mechanisms embedded within standard arcpy library tools hindered parallelization. As a result, students often were unable to achieve real speedup on ArcGIS-integrated problems.

• The language transition from Python to Fortran 90 resulted in a slow start in the spring semester. Several weeks at the beginning of the second semester were needed to orient students to syntax differences, arrays, and various conceptual differences in language style.

We found an unexpected benefit to these challenges, however. The slowness of arcpy and its resistance to parallelization left students frustrated with the limitations of standard software.
This strongly motivated them to enroll in the advanced course; enrollment in GEOL 6628 reached record levels this term. The enforced delay during the language transition also coincided with a necessary transition from Windows (required for ArcGIS) to the Linux command line, and it highlighted lingering misunderstandings regarding programming logic.

Fig. 2. Geothermal anomaly detection in the East African Rift, done using Fortran 90 with OpenMP. ArcGIS was unable to handle the input data.

IV. EVALUATION AND CONCLUSIONS

External evaluation of GEOL 4428/5528 by the ISU GIS coordinator noted instructor accessibility to students during and after class time; extensive incorporation of local GIS datasets to increase the students’ sense of familiarity with the problems despite the new approaches; and frequent use of analogies to illustrate complex programming concepts. Students anonymously evaluated the course as very good to excellent, citing in-class coding and a supportive atmosphere as course strengths. More tellingly, enrollment rates have already increased in the GIS programming sequence.

GEOL 6628, the advanced course, is being taught in the Spring 2015 semester; evaluations are not yet available. Intended term projects are ambitious and include collaboration with Idaho National Lab researchers.

Based on available feedback, we consider our early adoption of the NSF/TCPP Curriculum Guidelines for PDC education a success. We plan to continue developing and refining content that will enhance GIS education through PDC.

ACKNOWLEDGMENT

We thank the NSF/TCPP Curriculum Guidelines Early Adopter Program for their support; the ISU Department of Geosciences for allowing us to pursue this work outside of traditional geology/GIS education; and Meghan Fisher for her work as the TA in these courses.

REFERENCES

[1] S. Karki, S.K. Nawotniak, H.C. Bottenberg, M. McCurry, and J. Welhan, “Determination of Geothermal Anomalies through Multivariate Regression of Background Variables at Yellowstone National Park using Landsat 5 TM Thermal Band Data,” Geothermal Resources Council Transactions, vol. 38, pp. 503-510, 2014.
[2] P.S. Pacheco, An Introduction to Parallel Programming, Morgan Kaufmann: Burlington, MA, 2011.
[3] Littlefe.net
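
The Programming for GIS exercise in Section II.A has students parallelize a function over a list of inputs with the multiprocessing Pool and then measure speedup and efficiency. A minimal sketch of that kind of exercise might look like the following. It is not the course's actual material: `summarize_tile` is a hypothetical CPU-bound stand-in for a per-file GIS task, it makes no arcpy calls, and the worker count and input sizes are arbitrary assumptions. Speedup is taken as serial time divided by parallel time, and efficiency as speedup divided by the number of workers.

```python
import time
from multiprocessing import Pool


def summarize_tile(tile_id):
    """Hypothetical stand-in for a per-file GIS task (no arcpy involved)."""
    total = 0
    for i in range(100_000):
        total += (i * tile_id) % 7
    return tile_id, total


def run_serial(inputs):
    """Apply the task to every input, one after another."""
    return [summarize_tile(t) for t in inputs]


def run_parallel(inputs, workers):
    """Apply the same task across a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(summarize_tile, inputs)


def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Efficiency E = S / p, where p is the number of workers."""
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    inputs = list(range(1, 17))  # assumed: 16 independent inputs
    workers = 4                  # assumed worker count

    start = time.perf_counter()
    serial_results = run_serial(inputs)
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    parallel_results = run_parallel(inputs, workers)
    t_parallel = time.perf_counter() - start

    # pool.map returns results in input order, so the lists should match.
    assert serial_results == parallel_results

    print(f"serial:     {t_serial:.3f} s")
    print(f"parallel:   {t_parallel:.3f} s on {workers} workers")
    print(f"speedup:    {speedup(t_serial, t_parallel):.2f}")
    print(f"efficiency: {efficiency(t_serial, t_parallel, workers):.2f}")
```

Measured speedup depends on the machine and on process start-up overhead, so small workloads can show efficiency well below 1. The `if __name__ == "__main__":` guard matters on Windows, where worker processes re-import the script.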