Esri Press, 380 New York Street, Redlands, California 92373-8100 Copyright © 2020 Esri All rights reserved. Printed in the United States of America 24 23 22 21 20 1 2 3 4 5 6 7 8 9 10 e-ISBN: 9781589486195 The Library of Congress has cataloged the print edition as follows: Library of Congress Control Number: 2020936496 The information contained in this document is the exclusive property of Esri unless otherwise noted. This work is protected under United States copyright law and the copyright laws of the given countries of origin and applicable international laws, treaties, and/or conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying or recording, or by any information storage or retrieval system, except as expressly permitted in writing by Esri. All requests should be sent to Attention: Contracts and Legal Services Manager, Esri, 380 New York Street, Redlands, California 92373-8100, USA. The information contained in this document is subject to change without notice. US Government Restricted/Limited Rights: Any software, documentation, and/or data delivered hereunder is subject to the terms of the License Agreement. The commercial license rights in the License Agreement strictly govern Licensee’s use, reproduction, or disclosure of the software, data, and documentation. In no event shall the US Government acquire greater than RESTRICTED/LIMITED RIGHTS. At a minimum, use, duplication, or disclosure by the US Government is subject to restrictions as set forth in FAR §52.227-14 Alternates I, II, and III (DEC 2007); FAR §52.227-19(b) (DEC 2007) and/or FAR §12.211/12.212 (Commercial Technical Data/Computer Software); and DFARS §252.227-7015 (DEC 2011) (Technical Data – Commercial Items) and/or DFARS §227.7202 (Commercial Computer Software and Commercial Computer Software Documentation), as applicable. 
Contractor/Manufacturer is Esri, 380 New York Street, Redlands, CA 92373-8100, USA. @esri.com, 3D Analyst, ACORN, Address Coder, ADF, AML, ArcAtlas, ArcCAD, ArcCatalog, ArcCOGO, ArcData, ArcDoc, ArcEdit, ArcEditor, ArcEurope, ArcExplorer, ArcExpress, ArcGIS, arcgis.com, ArcGlobe, ArcGrid, ArcIMS, ARC/INFO, ArcInfo, ArcInfo Librarian, ArcLessons, ArcLocation, ArcLogistics, ArcMap, ArcNetwork, ArcNews, ArcObjects, ArcOpen, ArcPad, ArcPlot, ArcPress, ArcPy, ArcReader, ArcScan, ArcScene, ArcSchool, ArcScripts, ArcSDE, ArcSdl, ArcSketch, ArcStorm, ArcSurvey, ArcTIN, ArcToolbox, ArcTools, ArcUSA, ArcUser, ArcView, ArcVoyager, ArcWatch, ArcWeb, ArcWorld, ArcXML, Atlas GIS, AtlasWare, Avenue, BAO, Business Analyst, Business Analyst Online, BusinessMAP, CityEngine, CommunityInfo, Database Integrator, DBI Kit, EDN, Esri, Esri CityEngine, esri.com, Esri — Team GIS, Esri — The GIS Company, Esri — The GIS People, Esri — The GIS Software Leader, FormEdit, GeoCollector, Geographic Design System, Geography Matters, Geography Network, geographynetwork.com, Geoloqi, Geotrigger, GIS by Esri, gis.com, GISData Server, GIS Day, gisday.com, GIS for Everyone, JTX, MapIt, Maplex, MapObjects, MapStudio, ModelBuilder, MOLE, MPS —Atlas, PLTS, Rent-aTech, SDE, See What Others Can’t, SML, Sourcebook·America, SpatiaLABS, Spatial Database Engine, StreetMap, Tapestry, the ARC/INFO logo, the ArcGIS Explorer logo, the ArcGIS logo, the ArcPad logo, the Esri globe logo, the Esri Press logo, The Geographic Advantage, The Geographic Approach, the GIS Day logo, the MapIt logo, The World’s Leading Desktop GIS, Water Writes, and Your Personal Geographic Information System are trademarks, service marks, or registered marks of Esri in the United States, the European Community, or certain other jurisdictions. Other companies and products or services mentioned herein may be trademarks, service marks, or registered marks of their respective mark owners. 
Contents

Preface vii
Acknowledgments xi
Chapter 1 Introducing advanced Python scripting 1
Chapter 2 Creating Python functions and classes 19
Chapter 3 Creating Python script tools 49
Chapter 4 Python toolboxes 93
Chapter 5 Sharing tools 115
Chapter 6 Managing Python packages and environments 145
Chapter 7 Essential Python modules and packages for geoprocessing 163
Chapter 8 Migrating scripts from Python 2 to 3 207
Chapter 9 ArcGIS API for Python 225
Index 273

Preface

Programming has become an increasingly important aspect of the skillset of GIS professionals in many fields. Most GIS jobs require at least some experience in programming, and Python is often at the top of the list. Python scripting allows you to automate tasks in ArcGIS® Pro that would be cumbersome using the regular menu-driven interface. Python Scripting for ArcGIS Pro, also published by Esri Press (2020), covers the fundamentals of learning Python to write scripts but does not get into the more advanced skills to develop tools to be shared with others. This is where the current book, Advanced Python Scripting for ArcGIS Pro, comes in. If you are looking to take your GIS programming skills to the next level, this book is for you.

Before getting further into the contents of the book, a bit of history is in order. In 2013, Esri Press published Python Scripting for ArcGIS. I wrote the book to serve as an easy-to-understand introduction to Python for creating scripts for ArcGIS Desktop using Python 2. The book quickly became popular among students and professionals, but several years later the book was no longer current. ArcGIS Pro was released in 2015 and further established Python as the preferred scripting language within the ArcGIS platform. ArcGIS Pro uses Python version 3, which is significantly different from version 2. As the industry started to shift from ArcGIS Desktop to ArcGIS Pro, interest grew in an updated version of the book.
Both the changes in ArcGIS and the differences in Python versions necessitated a completely new book—not just a second edition of the existing book with minor code updates. That new book is Python Scripting for ArcGIS Pro. In addition, the interest in using Python in the geospatial community continues to grow. This has led to an increasing interest in developing Python tools to share with others, using third-party packages created by the open-source geospatial community, and applying Python to new areas such as web GIS. The current book, Advanced Python Scripting for ArcGIS Pro, covers these topics while at the same time teaching best practices in Python coding. The current book is written for ArcGIS Pro 2.5, which uses Python 3.6.9. As new functionality is added to future releases of ArcGIS Pro, the code in this book will continue to work for the foreseeable future. However, much of the code will not work in ArcGIS Desktop 10.x, although sometimes only minor changes are needed. One chapter in this book is specifically dedicated to explaining the differences between the versions of Python and ArcGIS and how to migrate existing scripts and tools from ArcGIS Desktop 10.x to ArcGIS Pro. In addition, some of the code in this book uses the ArcGIS API for Python 1.7.0, which installs with ArcGIS Pro 2.5. This book is designed to enhance the skills of those who already have a good foundation in Python to write scripts for ArcGIS. The book covers how to take those scripts and develop them into tools and notebooks to share with others, as well as several other more advanced tasks. A good familiarity with ArcGIS Pro is assumed, including managing data, creating cartographic output, and running tools. You should be familiar with the basic concepts of GIS, including coordinate systems, data formats, table operations, and basic spatial analysis methods. 
You also need a good foundation in Python and the use of ArcPy for basic tasks, including all the topics covered in Python Scripting for ArcGIS Pro. The primary audience for this book is experienced ArcGIS Pro users who have already been using Python for some time to write scripts to automate their workflows. If you are already familiar with writing scripts in Python for ArcGIS Desktop 10.x, you may still want to consider the Python Scripting for ArcGIS Pro book, which contains several chapters on topics that have changed significantly between working with Python in ArcGIS Desktop 10.x and ArcGIS Pro. This includes setting up your Python editor, working with rasters, and map scripting. This book also is intended for upper-division undergraduate and graduate courses in GIS. Many colleges and universities teach courses in GIS programming, which has become one of the core skills in GIS degrees and specializations. Students who are just starting out with learning Python should first use the Python Scripting for ArcGIS Pro book. By the end of that book, students should be able to write Python scripts to automate tasks for ArcGIS Pro. The topics in the current book follow logically from the topics in Python Scripting for ArcGIS Pro. Therefore, Advanced Python Scripting for ArcGIS Pro could be used as a second textbook in a first course on GIS programming, or as the main textbook for the second course in a two-course sequence. This book contains nine chapters. Following the introductory chapter, chapter 2 covers creating Python functions and classes, which is an essential part of developing more organized and reusable code. The next three chapters cover the development of Python script tools and Python toolboxes, which make it easier to share the functionality of Python scripts with others. The next two chapters cover how to manage Python packages and environments and illustrate how to work with some of the most widely used third-party packages.
The next chapter shows how to migrate scripts and tools from ArcGIS Desktop 10.x to ArcGIS Pro, which includes migrating from Python 2 to 3. And the final chapter covers the ArcGIS API for Python, which expands the use of Python scripting to web GIS using Jupyter Notebook. This book does not cover more introductory topics, including Python fundamentals, setting up a Python editor, using ArcPy to write scripts to work with spatial and tabular data, working with geometries, raster analysis, and map scripting. Those topics are covered in Python Scripting for ArcGIS Pro and are not repeated in this book. The chapters in this book are accompanied by exercises that reinforce the concepts covered in the chapters. These exercises and data are located in the Learn organization’s ArcGIS Online group Advanced Python Scripting for ArcGIS Pro (Esri Press), at https://go.esri.com/PythonAdvData. For general book information, go to https://go.esri.com/PythonProAdv. You should first read each chapter and then complete the accompanying exercise before moving on to the next chapter. Depending on your learning style and familiarity with coding, you can try out some of the code in the chapters as you read them, or you can read the entire chapter first and then start the exercise. To complete the exercises, you must have ArcGIS Pro 2.5 or later installed on your computer. This book will teach you how to develop tools and notebooks in Python for ArcGIS Pro. My hope is that the book will increase your confidence in writing more advanced scripts and in developing those scripts into tools and notebooks to share with others. I look forward to learning about your contributions to the Python and GIS community. I sincerely hope this book will allow you to experience the versatility and power of Python coding.

Paul A. Zandbergen
Vancouver, BC, Canada

Acknowledgments

A book of this scope materializes only with the support of many individuals.
First, I would like to recognize the numerous students in my courses over the years at several institutions. You learn something best by teaching it to others, and I’m fortunate to have worked with many aspiring GIS professionals interested in learning Python. Much of what I know about what needs to go in a book like this, I have learned from them. The contributions of the staff at Esri Press cannot be overstated. Their ongoing feedback throughout the writing and editing of the manuscript has been invaluable. Other Esri staff members also have left their mark on the book, especially David Wynne and Atma Mani. Their insider perspectives have made the book more accurate and more complete. Since the publication of the first book, Python Scripting for ArcGIS, I have received a lot of feedback from numerous students, instructors, GIS professionals, and anonymous reviewers. I’ve done my best to incorporate all that I’ve learned from them into this new book. I also would like to thank my parents, who always encouraged me to seek a career path that would allow me to fulfill my curiosity about the world while at the same time trying to make it a better place. Most importantly, this book would not be possible without the continued support of my family. Marcia, Daniel, and Sofia, thank you for believing in me and allowing me to pursue my passions.

Paul A. Zandbergen
Vancouver, BC, Canada

Chapter 1 Introducing advanced Python scripting

1.1 Introduction

Python has become one of the most widely used programming languages, and its growth extends to geospatial applications. Python is employed for many different tasks, from automating data processing using desktop software, to web scraping for downloading structured data, to developing machine-learning algorithms for classifying imagery hosted in the cloud. Python is a versatile, open-source programming language supported on different platforms. These features contribute to its growing popularity in the geospatial community.
Python is also the preferred scripting language for working with ArcGIS Pro. This book represents the logical follow-up to Python Scripting for ArcGIS Pro, also published by Esri Press (2020), which introduces the fundamentals of Python and teaches you how to write basic scripts to automate workflows. Advanced Python Scripting for ArcGIS Pro picks up where Python Scripting for ArcGIS Pro left off by focusing on more advanced scripting techniques and the development of tools and notebooks to be shared with others. This book also includes working with third-party packages and the ArcGIS API for Python, which opens new and exciting possibilities to use Python for geospatial applications. This book is written for ArcGIS Pro and Python 3. The topics covered in this book require substantial previous experience in writing Python scripts for ArcGIS. The fundamentals of Python and ArcPy, including setting up a Python editor and writing basic scripts for data processing using ArcPy, are covered in Python Scripting for ArcGIS Pro.

1.2 Python scripting in ArcGIS Pro using ArcPy

ArcGIS Pro provides support for the use of Python as a scripting language, including the ArcPy package installed as part of ArcGIS Pro. ArcPy provides access to all the tools available in ArcGIS Pro, including those that are part of ArcGIS Pro extensions. This feature makes Python scripting an attractive and efficient method for automating tasks in ArcGIS Pro. Python scripting has become a fundamental tool for GIS professionals to extend the functionality of ArcGIS Pro and automate workflows. Python is the scripting language of choice to work with ArcGIS Pro and is included in every ArcGIS Pro installation. Python is also directly embedded in many tools in ArcGIS Pro. For example, Python is one of the standard expression types for field calculations.
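A field calculation of this kind is ordinary Python. As a hedged illustration (the function and the field name below are invented for this example, not taken from a specific exercise), the kind of code block used in the Calculate Field tool is simply a function applied to each row:

```python
# A Python code block of the kind used in the Calculate Field tool:
# a function applied to each row. The function and field name are
# made up for this illustration.
def classify_area(area):
    """Label a parcel based on its area in square meters."""
    if area >= 10000:
        return "large"
    elif area >= 1000:
        return "medium"
    return "small"

# In the tool dialog box, the expression would be something like
# classify_area(!Shape_Area!); here the function is called directly.
print(classify_area(25000))  # large
print(classify_area(2500))   # medium
```

The same function could be pasted into the Code Block box of the tool dialog, with the expression calling it on a field value.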
As another example, several geoprocessing tools in ArcGIS Pro consist of Python scripts, even though the casual user does not necessarily notice it (or need to). One of the goals for using the current book is to learn how to develop new geoprocessing tools that expand the functionality of ArcGIS Pro but that look and feel like regular tools that are part of the software. This is accomplished using Python script tools and Python toolboxes. A secondary goal is to become familiar with the ArcGIS API for Python to expand the use of Python to working with web GIS. This is accomplished using notebooks.

1.3 Python versions and ArcGIS

Compared with other programming languages, Python has gone through a limited number of versions, reflecting a philosophy of incremental change and backward compatibility. Python 3 was released in 2008 as a major overhaul, with the primary goal of cleaning up the code base and removing redundancy. The most recent version, at the time of writing, is 3.8, with 3.9 under development. Some of the changes in Python 3 are fundamental and break with Python’s backward-compatibility philosophy. As a result, not all code written in Python 2 works in Python 3. Some of the new functionality added in Python 3 was also added to Python 2, a process known as backporting. With careful attention to detail, it is therefore possible to write code that works in both versions. The two versions of Python will continue to coexist for some time, but officially Python 2 will no longer be maintained past 2020. This means that any existing code will continue to work, but there will be no further improvements to version 2. ArcGIS Desktop 10.x uses Python 2, whereas ArcGIS Pro uses Python 3, which has several implications. If you are going to write scripts for both versions of ArcGIS or are planning to migrate scripts and tools from ArcGIS Desktop 10.x to ArcGIS Pro, you must learn some of the differences between the two versions of Python.
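A few of the most commonly cited differences can be shown in a handful of lines. This is a generic sketch, run under Python 3:

```python
# Three commonly cited differences between Python 2 and Python 3,
# shown from the Python 3 side.

# 1. print is a function in Python 3 (it was a statement in Python 2).
print("Hello")

# 2. The / operator always performs true division in Python 3;
#    in Python 2, 7 / 2 evaluated to 3. Use // for floor division.
print(7 / 2)   # 3.5
print(7 // 2)  # 3

# 3. Strings are Unicode by default in Python 3; no u"..." prefix needed.
text = "café"
print(len(text))  # 4
```

Differences like these, and the utilities that help detect them, are the subject of chapter 8.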
Resources and utilities exist to assist with this conversion, which are covered in chapter 8. The purpose of this book is to focus on writing scripts and developing tools for ArcGIS Pro using Python 3. Although Python code is not 100 percent compatible between versions 2 and 3, it is, in principle, possible to write Python code that works for both versions. However, because of fundamental differences between ArcGIS Desktop 10.x and ArcGIS Pro, many scripts and tools written for one version are unlikely to work in the other. Nonetheless, sometimes the differences are small, and strategies to identify and correct for these differences are covered in chapter 8.

Note: Many GIS users will continue to use both ArcGIS Desktop 10.x and ArcGIS Pro for some time. At the time of writing, the most current versions are ArcMap 10.7.1 and ArcGIS Pro 2.5. The installation of ArcMap 10.7.1 includes the installation of Python 2.7.16, and the installation of ArcGIS Pro 2.5 includes the installation of Python 3.6.9. These two versions can run on the same computer. When working with ArcGIS Pro 2.5, you should use only Python 3.6.9.

1.4 ArcGIS API for Python and Jupyter Notebook

Python and ArcPy make it possible to extend the functionality of ArcGIS Pro using scripts and tools. ArcGIS Pro is a software application that runs on desktop computers and is primarily designed to work with local datasets. Increasingly, however, geospatial data and their applications reside on the web, referred to as web GIS. Web GIS is a type of distributed information system that allows you to store, manage, visualize, and analyze geographic data. ArcPy has limited functionality to work directly with web GIS. The ArcGIS API for Python is a different Python package from Esri to work directly with web GIS. This API complements the use of ArcPy for desktop GIS.
Code that uses the ArcGIS API for Python is typically written in Jupyter Notebook, an open-source web application that works like a Python editor and provides built-in visualization capabilities. Notebooks can also be used directly within ArcGIS Pro. Details on using the ArcGIS API for Python are covered in chapter 9.

1.5 The structure of this book

Advanced Python Scripting for ArcGIS Pro consists of nine chapters that focus on developing tools for ArcGIS Pro and writing more advanced scripts. Sample code is provided throughout the text. Chapter 1 introduces Python scripting for ArcGIS Pro and illustrates several example scripts, tools, and notebooks that were developed using Python. Chapter 2 demonstrates how to create custom functions and classes in Python. Custom functions and classes make it easier to organize more complex code and use parts of your code in multiple scripts. Custom functions and classes are widely used in script tools and Python toolboxes. Chapter 3 explains how to create custom script tools, which make Python scripts available as regular geoprocessing tools with a familiar tool dialog box. Script tools are one of the preferred methods for sharing Python scripts with other users and make it easier to add a Python script as a tool to a larger sequence of operations. Chapter 4 covers how to create Python toolboxes as an alternative to Python script tools. In a Python toolbox, the tool dialog box is written in Python itself, which is often more robust. Chapter 5 outlines strategies for sharing tools with others, including how to organize your files, work with paths, and provide documentation for tools. Chapter 6 covers managing Python packages and environments using conda. Packages allow you to add functionality to Python, and conda is a convenient way to install and manage these packages as well as Python environments, which control which packages are available.
Chapter 7 describes the use of selected built-in modules and third-party packages other than ArcPy, which can greatly enhance the functionality of your scripts. The modules and packages include ftplib, urllib, openpyxl, json, NumPy, Pandas, and Matplotlib. Chapter 8 explains the key steps in migrating scripts and tools from ArcGIS Desktop 10.x to ArcGIS Pro, including the use of several utilities to facilitate this process. Chapter 9 introduces the ArcGIS API for Python, which makes it possible to use Python to work with web GIS. This chapter also introduces Jupyter Notebook as the preferred way to write and document Python code using this API. The resulting notebooks can also be shared with others.

1.6 A note about code in this book

Most of the code in this book is written for ArcGIS Pro 2.5, which uses Python 3.6.9. Most of the code will work in earlier versions of ArcGIS Pro, except for the most recently added functionality. As new functionality is added to future releases of ArcGIS Pro, the code in this book will continue to work for the foreseeable future. However, much of the code will not work in ArcGIS Desktop 10.x. Some of the code in this book also uses the ArcGIS API for Python version 1.7.0. This is the version that is installed with ArcGIS Pro 2.5, but the ArcGIS API for Python can also be installed separately. If installed separately, Python 3.5 is required to use the ArcGIS API for Python.

Note: The update cycle of the ArcGIS API for Python does not follow the same schedule as ArcGIS Pro. For example, at the time of writing, version 1.8.0 of the ArcGIS API for Python has been released, whereas ArcGIS Pro 2.5 installs with version 1.7.0. The version installed with ArcGIS Pro will be updated with future releases of the software. The differences in these versions are typically small.

The code in this book employs the coding conventions of the official Style Guide for Python Code, also referred to as PEP 8.
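To give a flavor of these conventions, here is a small, made-up function written to PEP 8: lowercase names with underscores, four-space indentation, spaces around operators, and a docstring:

```python
# A short function following PEP 8 conventions. The function itself is
# invented for this illustration, not taken from the book's exercises.
def mean_center(points):
    """Return the mean center (x, y) of a sequence of (x, y) tuples."""
    count = len(points)
    mean_x = sum(x for x, y in points) / count
    mean_y = sum(y for x, y in points) / count
    return mean_x, mean_y


print(mean_center([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```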
The complete style guide can be found at http://www.python.org/dev/peps/pep-0008/. Although not required, following coding guidelines improves the consistency and readability of your code.

1.7 Working with Python editors

Writing scripts and developing tools requires a Python editor. You are expected to be already familiar with using a Python editor and configuring it to use the correct environment. Details on working with Python editors are covered in Python Scripting for ArcGIS Pro. The code in this book is not specific to any one Python editor. IDLE is installed by default with every Python installation, and therefore most code illustrations in this book use IDLE as the Python editor of choice. Other recommended editors include PyCharm and Spyder, and some code illustrations use these editors as well. You are free to use the Python editor of your choice. Regardless of which editor is used for the code illustrations, the Python code itself is the same for any Python editor. To use a Python editor with the code in this book, however, it must be configured to work with the default environment arcgispro-py3 or a cloned environment. Chapter 6 provides details on using conda to manage environments, but the configuration of Python editors is covered in Python Scripting for ArcGIS Pro. You can also use the Python window in ArcGIS Pro to write and test Python code. However, the Python window is best suited to running short snippets of code for testing purposes. The more complicated and longer scripts developed in this book require a dedicated Python editor, such as IDLE, Spyder, or PyCharm.

1.8 Exploring example scripts, tools, and notebooks

This section uses several examples to illustrate how Python is used to create scripts, tools, and notebooks. The examples were obtained from Esri and the ArcGIS user community. One of the reasons for presenting these examples is for you to become more familiar with looking at Python code and tools developed by others.
One of the best ways to learn how to write code and develop tools is to work with existing examples. You are not expected to fully understand all the code at this point, but the examples will give you a flavor of what is to come in this book.

Example 1: Terrain Tools

The Terrain Tools were developed by Esri and extend what is available in ArcGIS Pro by providing capabilities for creating alternative terrain representations. These representations include different types of hillshade surfaces and contours, which can greatly enhance the cartographic display of terrain data. The tools are made available as a collection in a toolbox. Each tool consists of a tool dialog box and has a corresponding Python script. Although these scripts are written in Python, their functionality can be accessed in the same way as that of any other geoprocessing tool. The figure illustrates what the toolbox looks like in ArcGIS Pro. The “scroll” icon indicates that these tools are written in Python, also referred to as Python script tools. The tool dialog boxes look like those of regular geoprocessing tools in ArcGIS Pro. As an example, consider the Illuminated Contours tool. The tool provides an analytical version of the hand-drawn Tanaka method of symbolizing contours, which includes coloring and varying the thickness of contour lines. Assuming a certain lighting direction, contours are drawn lighter on parts of the terrain that are illuminated and darker on parts of the terrain that are not illuminated. The tool dialog box looks much like the regular Contour tool available in ArcGIS Pro. The Illuminated Contours tool has five parameters, two of which are optional. The required parameters include the input raster, which is a digital elevation model or DEM, as well as the contour interval to be used and the output contour feature class. The optional parameters include the base contour and z-factor to be used.
The result of the tool is a new polyline feature class, in which each contour is broken up into segments with new attributes for the color (grayscale, from white to black) and the appropriate thickness. An example of the resulting illuminated contours is shown, with the contours overlaid on top of the regular DEM shown in grayscale from dark (low elevation) to light (high elevation). The assumed lighting direction is from the northwest, as revealed in the different levels of illumination of the contours. The Illuminated Contours tool effectively carries out a series of steps, which can be accomplished by running regular geoprocessing tools and applying symbology. Some of these steps include creating contours from a DEM, creating a default hillshade, converting hillshade brightness values, reclassifying this grid of values into five-degree intervals, converting the reclassified grid to polygons, intersecting the contour polylines with these polygons, and assigning symbology on the basis of the new attributes of the polylines. The purpose of the script tool is to automate these steps and provide a user-friendly interface. A single Python script is used in this tool, and the script can be opened to get an inside look at what the tool does. When you open the script in a Python editor, it looks like the figure. When scrolling through the script, you will find the equivalent of the tasks you would need to carry out in ArcGIS Pro using existing tools. For example, making sure you have a license for the Spatial Analyst extension; running geoprocessing tools such as Hillshade, Contour, Reclassify, and Intersect; and applying symbology using a layer file. You could complete these steps using existing tools and a few manual manipulations, but the Python script tool includes all of them in a single easy-to-use tool dialog box. One of the nice things about working with Python script tools is that you can view the underlying code. 
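One of the steps in this sequence, reclassifying a grid of values into five-degree intervals, has a simple core logic that can be sketched in plain Python. The actual tool performs this step with the Reclassify geoprocessing tool on a raster; this sketch only shows the per-value idea:

```python
# Core idea of the reclassification step: group continuous values into
# fixed-width classes. The interval width and sample values are invented
# for this illustration.
def classify(value, interval=5):
    """Return the lower bound of the interval containing value."""
    return (value // interval) * interval

angles = [0.0, 3.2, 7.9, 14.5, 22.1]
print([classify(a) for a in angles])  # [0.0, 0.0, 5.0, 10.0, 20.0]
```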
Not only can you learn from the code of others, you can also copy it and make a modified version of it for your own work. Detailed documentation, all the source code, and example datasets to experiment with these tools can be found in ArcGIS Online at www.arcgis.com by searching for Terrain Tools Sample.

Example 2: Random Sample

The Random Sample tool was developed by the author and is discussed in more detail in chapter 3. This tool creates a random sample based on an input feature class and a user-defined number of features. The output is saved as a new feature class. The tool is created as a Python script tool. The tool dialog box is shown in the figure. The tool provides functionality not available in ArcGIS Pro. Several online resources such as Stack Exchange (http://stackexchange.com) and Esri Support (http://support.esri.com) provide various code solutions to select features at random from a feature class, but employing these solutions requires substantial coding skills. By developing a Python script tool, the script becomes more user-friendly. As a Python developer, you can write this type of script, develop and test the Python script tool, and then make the tool available to other users, who can use the tool without having to learn Python. The Python script for this tool is shown in the figure. The code for this tool is explained in detail in chapter 3, including the steps to develop and test the tool dialog box. The strategy for sharing this type of tool is covered in chapter 5. By the end of this book, you will be able to develop tools like this Python script tool.

Example 3: 3D Fences toolbox

The 3D Fences toolbox was developed by Esri’s Applications Prototype Lab. This toolbox makes it possible to create 3D fence diagrams on the basis of point data with a z-dimension field and at least one value field. An example application of this tool uses sampling points with measurements of oil in seawater after an oil spill.
Not only does each sampling point have x,y coordinates, it also has a z-dimension (depth) and a measurement (oil concentration). The tool creates a vertical subset of the 3D data—i.e., a slice—and transforms this subset onto a 2D plane. The value of interest, for example, oil concentration, is interpolated using Empirical Bayesian Kriging (EBK). The results are transformed as points into the original coordinate space as a “fence.” This transformation allows for a closer examination of 3D data, which is more difficult to do using the original point cloud of measurements. This tool is relatively sophisticated, but it is also written entirely in Python. The tool is made available in two different versions in a Python toolbox, as shown in the figure. Details on Python script tools and Python toolboxes are covered in chapters 3 and 4, respectively. At this stage, it is enough to know that, in both cases, all the code is written in Python, and the tool dialog boxes look just like regular geoprocessing tools in ArcGIS Pro. The tool dialog box has many options for inputs, outputs, and analysis settings, reflecting the relatively complex nature of the interpolation. Like regular geoprocessing tools, some of the parameters have suggested defaults. Some of the key parameters of the tool are the input point features with a z-dimension and a field for the measurement of interest, the interpolation settings, and the preexisting 2D linear features along which the fence will be created. The output is a 3D point feature class. The following example (courtesy of the tool’s author) shows the results from the Feature-based Fences tool as a scene in ArcGIS Pro, with the original points used in the interpolation shown in red, and the resulting 3D fence as a color ramp. The Python code associated with these tools is relatively long and complex, as could be expected for a sophisticated tool.
The entire code is more than 1,000 lines, although the script also includes notes, comments, and blank lines to facilitate reading. Even though the code appears complex at first, you probably recognize some existing tools, such as Copy Features, Add Field, and the Empirical Bayesian Kriging tool. The entire workflow is elaborate and would be cumbersome to complete step by step in ArcGIS Pro. Developing a tool of this complexity requires advanced coding skills and a significant time investment. Once created, however, the tool can be used many times, and it can be shared with other users. Documentation and all the source code for this tool can be found in ArcGIS Online at www.arcgis.com by searching for 3D Fences toolbox.

Example 4: Notebook for wildfire analysis

Python and ArcPy make it possible to extend the functionality of ArcGIS Pro, as illustrated in the previous examples. The following example uses the ArcGIS API for Python to work with web GIS. The example is one of the sample notebooks provided with the documentation of the ArcGIS API for Python. The specific notebook illustrates an analysis of the Thomas Fire in 2017 in California. The figure shows the top portion of the notebook as part of the online documentation. A notebook shows Python code combined with text, graphics, and other elements. As you run part of the code, the results update interactively. A notebook provides a different interface to working with Python code compared with more traditional Python editors, and it does not produce a tool for use in ArcGIS Pro with the familiar interface of a tool dialog box. Instead, users interact directly with the code and display the results within the notebook. Notebooks can be opened directly in ArcGIS Pro. The figure shows the same notebook opened in ArcGIS Pro. A user can inspect the code, update datasets or analysis parameters, and run the code to view the updated results within the notebook.
The following example shows a side-by-side comparison of imagery before and after the wildfire. A user can enter a new address to be geocoded, run the code, and view the updated imagery. You can also use the ArcGIS API for Python to perform many different types of analysis, similar to the geoprocessing tools in ArcGIS Pro. The figure shows an example of the use of map algebra to calculate a normalized burn ratio to determine the impacts of the fire by using the before-and-after imagery. The results are symbolized and added to a map display in the notebook. The maps show the burned areas (in red) on top of a background satellite image. This notebook can be found under the sample notebooks at https://developers.arcgis.com/python/sample-notebooks by searching for Thomas Fire. Although a notebook does not have the familiar interface of a geoprocessing tool, it provides a more interactive approach to working with code and the results. Notebooks can be shared with others and hosted in an ArcGIS Enterprise portal. The use of notebooks makes it easier to document and share workflows. Creating notebooks using the ArcGIS API for Python is covered in detail in chapter 9. You can benefit from these examples because they provide insight into why you would develop these tools and notebooks in the first place. Perhaps you have been using ArcGIS Pro for a while and have wondered why there is not a tool for a certain task. Or you have established a workflow that requires many repetitive tasks, and you are looking for a way to automate the steps. Or you want to document your workflow and share not only the code, but also the data and the results. Having a strong motivation to develop a certain script, tool, or notebook will make it easier as you embark on strengthening your Python skills. Once you learn how to use Python for developing scripts, tools, and notebooks, you will find that one of the best ways to keep learning Python scripting is to work with existing code written by others.
Using example code can also speed up the process of creating your own scripts, tools, and notebooks.

Points to remember

ArcGIS Pro supports the use of scripting to automate workflows. Python is the preferred scripting language for working with ArcGIS Pro. There is a large user community, and a growing set of third-party packages for use in Python that provide additional functionality.

This book focuses on more advanced scripting techniques, and the development of scripts, tools, and notebooks to be shared. The topics covered require substantial previous experience in writing Python scripts for ArcGIS. The fundamentals of Python and ArcPy, including setting up a Python editor and writing basic scripts for data processing using ArcPy, are covered in Python Scripting for ArcGIS Pro.

ArcGIS Pro works with Python 3, whereas ArcGIS Desktop 10.x works with Python 2. This book focuses on the use of ArcGIS Pro and Python 3, but migrating scripts and tools from ArcGIS Desktop 10.x to ArcGIS Pro is covered in chapter 8.

In addition to using ArcPy to write scripts and develop tools for ArcGIS Pro, the book also covers the use of the ArcGIS API for Python to work with web GIS. This includes the use of notebooks, which provide an interactive approach to working with Python code, geospatial datasets, and analysis results. The use of notebooks makes it easier to document and share workflows.

One of the best ways to continue learning Python scripting is to examine the work published by others. There are many published examples of Python scripts and tools developed for ArcGIS Pro using ArcPy, as well as sample notebooks developed for web GIS using the ArcGIS API for Python.

Key terms

backporting
backward compatibility
conda environment
Jupyter Notebook
notebook
Python editor
Python script tool
Python toolbox
script
scripting language
web GIS

Review questions

Which version of Python is used with ArcGIS Pro?
What is the main goal of developing scripts and tools for ArcGIS Pro using Python?

Which Python editors are recommended to write scripts and develop tools for ArcGIS Pro?

What are some of the similarities and differences between Python tools for ArcGIS Pro and notebooks?

What are some of the similarities and differences between ArcPy and the ArcGIS API for Python?

Discuss one of the examples presented in this chapter and explain how it adds functionality not available in ArcGIS Pro.

Chapter 2 Creating Python functions and classes

2.1 Introduction

This chapter describes how to create custom functions in Python that can be called from elsewhere in the same script or from another script. Functions are organized into modules, and modules can be organized into a Python package. ArcPy itself is a collection of modules organized into a package. By creating custom functions, you can organize your code into logical parts and reuse frequently needed procedures. This chapter also describes how to create custom classes in Python, which makes it easier to group together functions and variables. Custom functions and classes are important for writing longer and more complex scripts. They allow you to better organize your code as well as reuse important elements of your code. A good understanding of functions and classes is also important because they are used frequently in other chapters in the book. Many example scripts and tools published by others also contain custom functions and classes.

2.2 Functions and modules

Before getting into creating custom functions, a quick review of functions is in order. Functions are blocks of code that perform a specific task. Python includes many built-in functions, such as help(), int(), print(), and str(). Most functions require one or more arguments, which serve as the input for the function. Using a function is referred to as calling the function. When you call a function, you supply it with arguments.
Consider the print() function:

name = "Paul"
print(name)

The result is

Paul

In this example, the argument of the print() function is a variable, and this variable has a value. The print() function outputs the value to the console. The general syntax of a function is:

<function>(<arguments>)

In this syntax, <function> stands for the name of the function, followed by the arguments of the function in parentheses. Function arguments are also called parameters, and these terms are often used interchangeably. Python has several dozen built-in functions. For a complete list of built-in functions, see https://docs.python.org/3/library/functions.html. You will use several built-in functions in a typical Python script. You can also import additional functionality from other modules. A module is like an extension that can be imported into Python to extend its capabilities. Typically, a module consists of several specialized functions. Modules are imported using a special statement called import. The general syntax for the import statement is

import <module>

Once you import a module in a script, all functions in that module are available to use in that script. Consider the random module, for example. You can import this module to access several different functions. The following code generates a random number from 1 to 99 using the randrange() function of the random module.

import random
random_number = random.randrange(1, 100)
print(random_number)

The code to generate a random number has already been written and is shared with the Python user community. This code can be used freely by anyone who needs it. The random module contains several different functions, and many of them are closely related. Whenever your script needs a random number, you don’t have to write the code yourself. You can import the random module and use any of its functions. One of the most widely used modules is the os module, which includes several functions related to the operating system.
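Before moving on to the os module, note one detail of randrange() that is easy to miss: the start value is included, but the stop value is not. The following sketch (plain Python, nothing ArcGIS-specific) confirms that randrange(1, 100) always returns a value from 1 to 99:

```python
import random

# randrange(1, 100) includes the start (1) but excludes the stop (100),
# so every result falls in the range 1 through 99.
samples = [random.randrange(1, 100) for _ in range(1000)]
print(min(samples) >= 1 and max(samples) <= 99)  # True
```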
For example, the os.mkdir() function creates a new folder in the current working directory, as follows:

import os
os.mkdir("test")

The general syntax to use a function from a module that is not one of the built-in functions is as follows:

import <module>
<module>.<function>(<arguments>)

In other words, you first must import the module using import <module>, and then reference the module when calling the function using <module>.<function>(<arguments>). When writing scripts for ArcGIS Pro, you can use the ArcPy package to access the functionality of ArcGIS Pro within a script. ArcPy is referred to as a package because it consists of several modules, functions, and classes, but to work with ArcPy, you import it just like a module. That is why most geoprocessing scripts start off as follows:

import arcpy

Once you import ArcPy, you can use one of its many functions. For example, the arcpy.Exists() function determines whether a dataset exists and returns a Boolean value of True or False. The following code determines whether a shapefile exists:

import arcpy
print(arcpy.Exists("C:/Data/streams.shp"))

This code follows the regular Python syntax <module>.<function>(<arguments>), where arcpy is the module, and Exists() is the function, even though ArcPy is technically considered a package. ArcPy includes several modules, including the data access module arcpy.da. This module is used for describing data, performing editing tasks, and following database workflows. The da.Describe() function determines the type of dataset, as well as several properties of the dataset. For example, the following code determines the geometry shape type of a shapefile:

import arcpy
desc = arcpy.da.Describe("C:/Data/streams.shp")
print(desc["shapeType"])

For a polyline shapefile, the result is Polyline. The general syntax for using a function of an ArcPy module is

arcpy.<module>.<function>(<arguments>)

In the preceding example code, Describe() is a function of the arcpy.da module.
When referring to a function, it is important to refer to the module that it is part of. For example, ArcPy also includes a Describe() function. So both arcpy.Describe() and arcpy.da.Describe() are valid functions, but they work in different ways. Now that you’ve reviewed the use of functions and modules, the next section introduces creating your own custom functions.

2.3 Creating functions

In addition to using existing functions, you can create your own custom functions that can be called from within the same script or from other scripts. Once you write your own functions, you can reuse them whenever needed. This capability makes code more efficient because there is no need to write code for the same task over and over. Python functions are defined using the def keyword. The def statement contains the name of the function, followed by any arguments in parentheses. The syntax of the def statement is

def <functionname>(<arguments>):

There is a colon at the end of the statement, and the code following a def statement is indented the same as any block of code. This indented block of code is the function definition. For example, consider the script helloworld.py as follows:

def printmessage():
    print("Hello world")

In this example, the function printmessage() has no arguments, but many functions use parameters to pass values. Elsewhere in the same script, you can call this function directly, as follows:

printmessage()

The complete script is as follows:

def printmessage():
    print("Hello world")
printmessage()

When the script runs, the function definition is not executed. In other words, the line of code starting with def and the block of code that follows don’t do anything. In the third line of code, the function is called, and then it is executed. The result of the script is

Hello world

This is a simple example, but it illustrates the basic structure of a custom function. Typically, functions are more elaborate.
Consider the following example: you want to create a list of the names of all the fields in a table or feature class. There is no function in ArcPy that does this. However, the ListFields() function allows you to create a list of the fields in a table, and you can then use a for loop to iterate over the items in the list to get the names of the fields. The list of names can be stored in a list object. The code is as follows:

import arcpy
arcpy.env.workspace = "C:/Data"
fields = arcpy.ListFields("streams.shp")
namelist = []
for field in fields:
    namelist.append(field.name)

Now, say you anticipate that you will be using these lines of code often—in the same script or other scripts. You can simply copy the lines of code, paste them where they are needed, and make any necessary changes. For example, you will need to replace the argument "streams.shp" with the feature class or table of interest. Instead of copying and pasting the entire code, you can define a custom function to carry out the same steps. First, you must give the function a name—for example, listfieldnames(). The following code defines the function:

def listfieldnames():

You can now call the function from elsewhere in the script by name. In this example, when calling the function, you want to pass a value to the function—that is, the name of a table or feature class. To make this possible, the function must include an argument to receive these values. The argument must be included in the definition of the function, as follows:

def listfieldnames(table):

Following the def statement is an indented block of code that contains what the function does. This block of code is identical to the previous lines of code, but now the hard-coded value of the feature class is replaced by the argument of the function, as follows:

def listfieldnames(table):
    fields = arcpy.ListFields(table)
    namelist = []
    for field in fields:
        namelist.append(field.name)

Notice how there are no hard-coded values left in the function.
The lack of hard coding is typical for custom functions because you want a function to be reusable in other scripts. The last thing needed is a way for the function to pass values back, also referred to as returning values. Returning values ensures that the function not only creates the list of names, but also returns the list so it can be used by any code that calls the function. This is accomplished using a return statement. The completed description of the function is as follows:

def listfieldnames(table):
    fields = arcpy.ListFields(table)
    namelist = []
    for field in fields:
        namelist.append(field.name)
    return namelist

Once a function is defined, it can be called directly from within the same script, as follows:

fieldnames = listfieldnames("C:/Data/hospitals.shp")

Running the code returns a list of the field names in a table or feature class using the function previously defined. Notice that the new function listfieldnames() can be called directly because it is defined in the same script. One important aspect is the placement of the function definition relative to the code that calls the function. The custom function can be called only after it is defined. The correct organization of the code is as follows:

import arcpy
arcpy.env.workspace = "C:/Data"

def listfieldnames(table):
    fields = arcpy.ListFields(table)
    namelist = []
    for field in fields:
        namelist.append(field.name)
    return namelist

fieldnames = listfieldnames("hospitals.shp")
print(fieldnames)

If the function is called before the function definition, the following error is returned:

NameError: name 'listfieldnames' is not defined

Complex scripts with several functions therefore often start with defining several functions (and classes), followed by the code that calls these functions later in the script. In addition, it is common to add empty lines around the blocks of code that define functions to improve readability, as shown in the figure.
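The same pattern can be tried without ArcGIS installed. In the sketch below, the Field class is a hypothetical stand-in for the field objects that arcpy.ListFields() returns (only the name property is mimicked), and the function receives the list of fields directly rather than a table name:

```python
class Field:
    """Hypothetical stand-in for an arcpy field object."""
    def __init__(self, name):
        self.name = name

def listfieldnames(fields):
    # Same loop as in the text: collect the name of each field in a list.
    namelist = []
    for field in fields:
        namelist.append(field.name)
    return namelist

# Simulate the fields of a feature class and list their names.
fields = [Field("FID"), Field("Shape"), Field("NAME")]
print(listfieldnames(fields))  # ['FID', 'Shape', 'NAME']
```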
The example function uses an argument, called table, which makes it possible to pass a value to the function. A function can use more than one argument, and arguments can be made optional. The arguments should be ordered so that the required ones are listed first, followed by the optional ones. Arguments are made optional by specifying default values. Custom functions can be used for many other tasks, including working with geometry objects. Next, an example script is explained, and then it will be converted to a custom function. The example script calculates the sinuosity index for each polyline feature representing a river segment. Sinuosity, in this context, is defined as the length of the polyline representing the river segment divided by the straight-line distance between the first and last vertex of the polyline. Segments that are relatively straight have a sinuosity index of close to 1, whereas meandering segments have higher values, up to 1.5 or 2. The calculation can be accomplished by using properties of a Polyline object—i.e., length, firstPoint, and lastPoint. The script to print the sinuosity index for every polyline feature in a feature class is as follows:

import arcpy
import math
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        channel = shape.length
        deltaX = shape.firstPoint.X - shape.lastPoint.X
        deltaY = shape.firstPoint.Y - shape.lastPoint.Y
        valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
        si = round(channel / valley, 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")

A brief explanation of how the script works is in order. A search cursor is used to obtain the unique ID and the geometry of each polyline. The length property of the geometry represents the length of the polyline.
The straight-line distance between the first and last vertex of the polyline is calculated using the firstPoint and lastPoint properties of the geometry, which return a Point object. The x,y coordinates of these vertices are used to calculate the distance on the basis of the Pythagorean theorem. The two distances are divided to obtain the sinuosity index, and for display purposes, the values are rounded to three decimal places. Consider the stream network that’s shown in the figure. The result of the script is a printout of the sinuosity index of each segment. The calculation of the sinuosity index requires several lines of code that may be useful in other places, and this code lends itself to a custom function. This custom function receives a geometry object and returns the sinuosity index. The script using a custom function is as follows:

import arcpy
import math
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"

def sinuosity(shape):
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley

with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        si = round(sinuosity(shape), 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")

The custom function is called sinuosity(), and the only argument is a geometry object referred to as shape. When calling the function, the geometry object is passed to the function, and the index is returned as a value. The script uses the round() function to return a floating-point number rounded to the specified number of decimal places. The only issue with rounding in this manner is that any trailing zeros are dropped—e.g., 1.300 is printed as 1.3. An alternative is to use format codes to customize the print formatting.
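The contrast between round() and a format code can be checked in plain Python before applying it in the script:

```python
value = 1.300

# round() drops trailing zeros when the result is printed.
print(round(value, 3))    # 1.3

# A format code keeps the requested number of decimal places and also rounds.
print(f"{value:.3f}")     # 1.300
print(f"{1.4567:.3f}")    # 1.457
```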
The final two lines of the script using a format code are as follows:

si = sinuosity(shape)
print(f"Stream ID {oid} has a sinuosity index of {si:.3f}")

The format code .3f means the output is formatted using a floating-point number with three decimal places. This type of formatting also applies rounding—e.g., the number 1.4567 is formatted as 1.457. Again, it is common to add empty lines around the blocks of code that define a function to improve readability. Creating functions can be beneficial in several ways:

If a task is to be used many times, creating a function can reduce the amount of code you must write and manage. The actual code that carries out the task is written only once as a function; from that point on, you can call this custom function as needed.

Creating functions can reduce the clutter caused by multiple iterations. For example, if you wanted to create lists of the field names for all the feature classes in all the geodatabases in a list of workspaces, it would quickly create a relatively complicated set of nested for loops. Using a function for creating a list of field names removes one of these for loops and places it in a separate block of code.

Complex tasks can be broken into smaller steps. By defining each step as a function, the complex task does not appear so complex anymore. Well-defined functions are a good way to organize longer scripts.

Custom functions can be called not only directly from the same script but also from other scripts, which the next section addresses.

2.4 Calling functions from other scripts

Once functions are created in a script, they can be called from another script by importing the script that contains the function. For relatively complex functions, it is worthwhile to consider making them into separate scripts, especially if they are needed on a regular basis. Rather than defining a function within a script, the function becomes part of a separate script that can be called from other scripts.
Consider the earlier example of the helloworld.py script:

def printmessage():
    print("Hello world")

The printmessage() function can be called from another script by importing the helloworld.py script. For example, the script print.py imports this script as follows:

import helloworld
helloworld.printmessage()

The script print.py imports the helloworld.py script as a module—helloworld. A module name is equal to the name of the script minus the .py extension. The function is called using the regular syntax to call a function—that is, <module>.<function>. In the example script, the helloworld module is imported into the print.py script. The import statement causes Python to look for a file named helloworld.py. No paths can be used in the import statement, and thus it is important to recognize where Python looks for modules. The first place Python looks for modules is the current folder, which is the folder in which the print.py script is located. The current folder can be obtained using the following code, in which sys.path is a list of system paths:

import sys
print(sys.path[0])

The current folder also can be obtained using the os module, as follows:

import os
print(os.getcwd())

Next, Python looks at all the other system paths that have been set during the installation or subsequent configuration of Python itself. These paths are contained in an environment variable called PYTHONPATH. Note that this is not a geoprocessing environment setting, but a variable of the Python environment. To view a complete list of these paths, use the following code:

import sys
for path in sys.path:
    print(path)

sys.path is a list of paths, and the iteration makes the printout easier to read. In a typical scenario, the list will include the paths as shown in the figure. The first element in the list is the path of the current script (i.e., C:\Testing), which is returned using sys.path[0].
The rest of the paths will vary depending on how ArcGIS is installed and on the environment being used. In the list of paths, you will notice two types (beyond the current folder). First, there are paths tied to the core installation of ArcGIS Pro—i.e., \ArcGIS\Pro\bin, \ArcGIS\Pro\Resources\ArcPy, and \ArcGIS\Pro\Resources\ArcToolbox\Scripts. These paths make it possible to do things such as import arcpy. Second, there are paths tied to the specific Python environment being used—in this case, the default environment arcgispro-py3. This is the location where Python itself is installed, including any additional packages. If you are using a different environment, the list of paths will have the same structure, but arcgispro-py3 would be replaced by the name and location of that environment. What if the module (i.e., script) you want to import is in a different folder—that is, not in the current folder of the script or in any of the folders in sys.path? You have two options, as follows:

Option 1: Append the path using code. You can temporarily add a path to your script. For example, if the scripts you want to call are in the folder C:\Myscripts, you can use the following code before calling the function:

import sys
sys.path.append("C:/Myscripts")

The sys.path.append() statement is a temporary solution so a script can call a function in another script in the current session.

Option 2: Use a path configuration (.pth) file. You can access a module in a different folder by adding a path configuration file to a folder that is already part of sys.path. It is common to use the site-packages folder—for example, C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages. A path configuration file has a .pth extension and contains the path(s) that will be appended to sys.path. This file can be created using a basic text editor. As part of the ArcGIS Pro installation, a path configuration file called ArcGISPro.pth is placed in the site-packages folder of Python.
The file itself looks like the example in the figure. The path configuration file makes all the modules located in the specific folders available. You should not be making any changes to this default .pth file. You can create a .pth file yourself if you commonly work with scripts that are in different folders. For example, if the modules you want to import are in the folder C:\Myscripts, you can create a .pth file and place it in the Python site-packages folder. One complication is that the default environment arcgispro-py3 cannot be modified, so any additional .pth files are not recognized. This option is available only when working with a cloned environment. Therefore, for many typical users, adding the path within the script itself is more convenient.

Note: A third alternative is to modify the PYTHONPATH variable directly from within the operating system. However, this option is cumbersome and error-prone, and therefore not recommended.

The earlier example for calculating the sinuosity index for polylines is revisited to illustrate how custom functions can be called from other scripts. The script that contains the custom function is called rivers.py and is as follows:

import math

def sinuosity(shape):
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley

This script must import the math module because it is being used in the script. There is no need to import ArcPy because that is done in the other script in which the geometry objects are created using a search cursor. Notice that the rivers.py script does not include any hard coding. This lack of hard coding is typical for custom functions because you want the code to be reusable without modification.
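Because the sinuosity() function is pure geometry, its logic can be verified without ArcGIS. The sketch below is a hypothetical test harness, not part of the book’s scripts: it feeds the function a minimal stand-in object that mimics only the length, firstPoint, and lastPoint properties of a Polyline:

```python
import math

def sinuosity(shape):
    # Same function as in rivers.py: channel length divided by the
    # straight-line (valley) distance between first and last vertex.
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley

class Point:
    """Stand-in for an arcpy Point: only X and Y are mimicked."""
    def __init__(self, x, y):
        self.X, self.Y = x, y

class FakePolyline:
    """Stand-in geometry: a right-angle bend from (0, 0) to (3, 0) to (3, 4)."""
    length = 7.0                  # 3 units along x, then 4 units along y
    firstPoint = Point(0.0, 0.0)
    lastPoint = Point(3.0, 4.0)

# Channel = 7, straight-line distance = 5, so the index is 7 / 5 = 1.4.
print(sinuosity(FakePolyline()))  # 1.4
```

Substituting a lightweight stand-in for the real geometry object is a common way to check this kind of function quickly before running it against actual data.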
The script that calls the custom function is called river_calculations.py and is as follows:

import arcpy
import rivers
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        si = round(rivers.sinuosity(shape), 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")

The script must import ArcPy to create the geometry objects using a search cursor. It also must import the rivers.py script as a module using import rivers. Importing the module makes the custom function available to the current script. When calling the custom function, the module must be included—i.e., rivers.sinuosity() instead of just sinuosity(). By writing the custom function in a separate script, you have created a module that can be used by other scripts, which significantly increases the usability of your code.

2.5 Organizing code into modules

By creating a script that defines a custom function, you are using the script as a module. All Python script files are, in fact, modules. That’s why you can call the function by first importing the script (module), and then using a statement such as <module>.<function>. Recall the example:

import random
random_number = random.randrange(1, 100)
print(random_number)

The random module consists of the random.py file and is in one of the folders that Python automatically recognizes—i.e., C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib. The random.py script (module) contains several functions, including randrange(). The rivers.py script in the previous section represents another example. After importing the module using import rivers, the custom function can be called using rivers.sinuosity(). Although the example code employs only a single function, it can easily be expanded to include other relevant functions pertaining to rivers. This approach makes it easy to create new functions in a script and call them from another script.
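The import mechanics themselves can be exercised end to end in plain Python: write a small module to a temporary folder, append that folder to sys.path, and call the module’s function. The module name mymodule and its function double() are invented for the illustration:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temporary folder.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "mymodule.py"), "w") as f:
    f.write("def double(n):\n    return n * 2\n")

# Make the folder visible to the import system, then import and call the
# function using the regular <module>.<function> syntax.
sys.path.append(folder)
mymodule = importlib.import_module("mymodule")
print(mymodule.double(21))  # 42
```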
However, it also introduces a complication: How do you distinguish between running a script by itself and calling it from another script? What is needed is a structure that provides control of the execution of the script. If the script is run by itself, the function is executed. If the module is imported into another script, the function is not executed until it is specifically called. Consider the example hello.py script, which contains a function as well as some test code to make sure the function works:

def printmessage():
    print("Hello world")
printmessage()

This type of testing is reasonable, because when you run the script by itself, it confirms that the function works. Without the last print message, you would not be able to see that the function works correctly. However, when you import this module to use the function as follows:

import hello

the test code runs immediately and prints the message:

"Hello world"

When you import the script file as a module, you don’t want the test code to run automatically, but only when you call the specific function. You want to be able to differentiate between running the script by itself and importing it as a module into another script. This is where the variable __name__ comes in (there are two underscores on each side). For a script that is run by itself, this variable has the value of "__main__". For an imported module, the variable is set to the name of the module. Using an if statement in the script that contains the function makes it possible to distinguish between a script and a module, as follows:

def printmessage():
    print("Hello world")

if __name__ == "__main__":
    printmessage()

In this case, the test of the module (i.e., printmessage()) will run only if the script is run by itself. If you import the module into another script, no code will run until you call the function. This structure is not limited to testing.
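The effect of the __name__ check can be demonstrated with the standard library’s runpy module: run_path() with run_name="__main__" simulates running the file directly, whereas any other run name mimics an import. In this sketch, the file records messages in a list instead of printing, so the two cases are easy to compare:

```python
import os
import runpy
import tempfile

# A guarded script, written to disk for the demonstration.
code = (
    "messages = []\n"
    "def printmessage():\n"
    "    messages.append('Hello world')\n"
    "if __name__ == '__main__':\n"
    "    printmessage()\n"
)
path = os.path.join(tempfile.mkdtemp(), "hello.py")
with open(path, "w") as f:
    f.write(code)

# Run the file as if it were executed directly: the guarded call fires.
direct = runpy.run_path(path, run_name="__main__")
print(direct["messages"])    # ['Hello world']

# Run it under a module-like name: the guarded call does not fire.
imported = runpy.run_path(path, run_name="hello")
print(imported["messages"])  # []
```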
In some geoprocessing scripts, almost the entire script consists of one or more functions, and only the very last lines of code call the function if, indeed, the script is run by itself. The structure is as follows:

import arcpy
<import other modules as needed>

def mycooltool(<arguments>):
    <lines of code>
    ...

if __name__ == "__main__":
    mycooltool(<arguments>)

This structure provides control over running the script and makes it possible to use the same script in two different ways—running it by itself or calling it from another script. Consider the script associated with the Illuminated Contours tool discussed in chapter 1. The script starts by importing several modules, followed by a single function called illuminatedContours(). The entire script is, in fact, written as a function, whereas the last few lines of the script call the function, as shown in the image.

The block of code following if __name__ == "__main__": is executed only if the script is run by itself. The script normally runs when the Illuminated Contours tool is run in ArcGIS Pro. Running the script in ArcGIS Pro is not considered a call from another script because the script is not imported as a module, but called by running a tool in ArcGIS Pro. As a result, the block of code executes. The code uses the arcpy.GetParameterAsText() function to receive the parameters from the tool, which chapter 3 explains in more detail. The final line of code is a call to the function illuminatedContours() to carry out the task at hand. The benefit of this structure is that the function can be used in other ways. For example, you could write a script that imports the IllumContours.py script as a module, and then you can call the illuminatedContours() function without having to make any changes to the script or using the tool dialog box. Many scripts that contain custom functions play only a supporting role and are not normally run by themselves.
Consider again the rivers.py script that calculates the sinuosity index for polylines:

import math

def sinuosity(shape):
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley

Without polylines as inputs, there is nothing to calculate. Running the script does not produce an error, but it also does not do anything because the function is never called. To make it clear that the script requires specific inputs that are not part of the script, the following code could be added to the script:

if __name__ == "__main__":
    print("This script requires geometry objects as inputs.")

In this case, when the script is run on its own, the message is printed, and it is clear the script does not perform any calculations without specific inputs.

2.6 Creating classes

In the previous sections, you saw how to create your own custom functions and organize your code into modules. This approach substantially increases code reusability because you can write a section of code and use it many times by calling it from within the same script or from another script. However, these functions and modules have their limitations. The principal limitation is that a function does not store information the way a variable does. Every time a function runs, it starts from scratch. In some cases, functions and variables are closely related. For example, consider a land parcel with several attributes, such as the land-use type, total assessed value, and total area. The parcel also may have procedures associated with it, such as how to estimate the property taxes on the basis of land-use type and total assessed value. These functions require the values of the attributes. These values can be passed to a function as variables. What if a function must change the variables? The values could be returned by the function.
However, the passing and returning of variables can become cumbersome. A better solution is to use a class. A class provides a way to group together functions and variables that are closely related so they can interact with each other. A class also makes it possible to work with multiple objects of the same type. For example, each land parcel is likely to have the same attributes. The concept of grouping together functions and variables related to a type of data is an essential aspect of object-oriented programming (OOP). Classes are the container for these related functions and variables. Classes make it possible to create objects as defined by these functions and variables. Functions that are part of a class are called methods, and variables that are part of a class are called properties. Examples that follow will more clearly explain the use of methods and properties. ArcPy includes many classes, such as the env class, which can access and set environment settings, and the Result class, which defines the properties and methods of Result objects that are returned by geoprocessing tools. Creating your own custom classes in Python, however, opens many new possibilities. Python classes are defined using the class keyword as follows:

class <classname>(object):

The reference to object in parentheses means that the custom class being created is based on a general type of class in Python. Since the reference is implicit, it can be left out, as follows:

class <classname>():

Note: Although the preceding code is correct, it is better to include object to ensure compatibility with Python 2. There is a colon at the end of the statement, which means the code following a class statement is indented, just like any other block of code. This indented block of code is the class definition.
A class typically consists of one or more functions, which means the general structure of a class is as follows:

class <class>(object):
    def <function1>(<arguments>):
        <code>
    def <function2>(<arguments>):
        <code>

Consider a simple example:

class Person(object):
    def setname(self, name):
        self.name = name

    def greeting(self):
        print("My name is {0}.".format(self.name))

The class keyword is used to create a Python class called Person. The class contains two method definitions—these are essentially function definitions, except that they are written inside a class statement and are therefore referred to as “methods.” The self argument refers to the object itself. You can call it whatever you like, but it is almost always called “self” by convention. Note: The Style Guide for Python Code recommends using the CapitalizedWords, or CapWords, convention for class names—for example, MyClass. By contrast, the recommended style for variables, functions, and scripts is all lowercase. This style is not required, however, and many developers follow different conventions. The names of functions in ArcPy, for example, do not follow the Style Guide. A class can be thought of as a blueprint. It describes how to make something, and you can create many instances from this blueprint. Each object created from a class is called an instance of the class. Creating an instance of a class is sometimes referred to as instantiating the class. Next, you will see how this class can be used. You start by creating an object:

me = Person()

Using an assignment statement creates an instance of the Person class. Creating this instance looks like calling a function, but you are creating an object of type Person. Once an instance is created, you can use the properties and methods of the class, as follows:

me.setname("Abraham Lincoln")
me.greeting()

Running this code prints the following:

My name is Abraham Lincoln.

This example is relatively simple, but it illustrates some key concepts.
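One aspect of the blueprint idea is worth seeing in code: the same class can produce any number of independent objects, each holding its own data. In this sketch, greeting() returns its message rather than printing it so the results are easy to check; the second name is a hypothetical example value:

```python
class Person(object):
    def setname(self, name):
        self.name = name

    def greeting(self):
        # Returns the message instead of printing, so it can be tested.
        return "My name is {0}.".format(self.name)

# Two instances created from the same blueprint hold separate data.
me = Person()
me.setname("Abraham Lincoln")
you = Person()
you.setname("Mary Todd")

print(me.greeting())   # My name is Abraham Lincoln.
print(you.greeting())  # My name is Mary Todd.
```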
First, a class is created using the class keyword. Second, variables of the class are called properties, which can store values as all variables can. When you create an instance of the class, you can pass the values for these properties. Third, functions of the class are called methods, which can carry out tasks as all functions can. When you create an instance of the class, you can call the function as a method of the class. A single class can contain many properties and methods. Now return to the example of a parcel of land. You want to create a class called Parcel that has two properties (land-use type and total assessed value) and a method (calculating tax) associated with the parcel. For the purpose of this example, assume the property tax is calculated as follows: for single-family residential, tax = 0.05 * value; for multifamily residential, tax = 0.04 * value; and for all other land uses, tax = 0.02 * value. In other words, the tax calculation is based on the land-use type. Creating the Parcel class is coded as follows:

class Parcel(object):
    def __init__(self, landuse, value):
        self.landuse = landuse
        self.value = value

    def assessment(self):
        if self.landuse == "SFR":
            rate = 0.05
        elif self.landuse == "MFR":
            rate = 0.04
        else:
            rate = 0.02
        assessment = self.value * rate
        return assessment

The class called Parcel is created using the class keyword. The class contains two methods: __init__() and assessment(). The __init__() method is a special method reserved for initializing objects. This method must have at least one argument in addition to self. In the example, this method has three arguments: self, landuse, and value. When the class is used to create objects, however, the first argument (self) is not used because it represents the object itself and is provided implicitly by using the class. The __init__() method is used to initialize (or specify) an object with its initial properties by giving the properties a value.
The class can now be used to create Parcel objects, which have properties called landuse and value. Next, look at how to use this class. The following code creates an instance of the class:

myparcel = Parcel("SFR", 200000)

This code creates a single Parcel object, and the two properties are assigned values. You can now use these properties. For example, the following code prints the values of both properties:

print(myparcel.landuse)
print(myparcel.value)

The result is as follows:

SFR
200000

This part of the code serves only to confirm that the Parcel object is created and that the properties have values. You also can check the type of the object, as follows:

print(type(myparcel))

The result is

<class '__main__.Parcel'>

This check confirms that the type of object is Parcel. The __main__ portion means that the class definition resides in the current script. So far, the code has served only to create the instance of the Parcel class and confirm the properties of the object. The assessment() method is where the actual calculation is done. With the Parcel object created, you can use the properties and methods, as follows:

print("Land use: ", myparcel.landuse)
mytax = myparcel.assessment()
print("Tax: ", mytax)

Running this code results in the following:

Land use: SFR
Tax: 10000.0

The assessment() method is used to calculate the tax for this one parcel, for which the land use is single-family residential (SFR) and the value is $200,000. In the example, the values for the properties of the object are hard-coded into the script, but in a real-world scenario, the values would reside in a database. You could run the property tax calculation for every parcel in the database, creating a new instance for each parcel to carry out the calculation. In many cases, you may want to use the class in more than one script.
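Running the calculation for many parcels can be sketched with a loop over a list of records standing in for rows read from a database table (the land-use codes and values below are hypothetical):

```python
class Parcel(object):
    def __init__(self, landuse, value):
        self.landuse = landuse
        self.value = value

    def assessment(self):
        # Tax rate depends on the land-use type.
        if self.landuse == "SFR":
            rate = 0.05
        elif self.landuse == "MFR":
            rate = 0.04
        else:
            rate = 0.02
        return self.value * rate

# Hypothetical records standing in for rows in a parcel database table.
records = [("SFR", 200000), ("MFR", 350000), ("COM", 500000)]
taxes = [Parcel(landuse, value).assessment() for landuse, value in records]
print(taxes)  # [10000.0, 14000.0, 10000.0]
```

A new instance is created for each record, and each instance carries its own property values. In a real workflow, you may also want to use such a class in more than one script.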
This can be accomplished by putting the class in a module—that is, creating a separate script with the definition of the class, which can then be called from another script. This approach is analogous to creating a separate script for a function, which can be called from other scripts, as described earlier in this chapter. In this example, the existing code for the Parcel class is copied to a separate script called parcelclass.py and is coded as follows:

class Parcel(object):
    def __init__(self, landuse, value):
        self.landuse = landuse
        self.value = value

    def assessment(self):
        if self.landuse == "SFR":
            rate = 0.05
        elif self.landuse == "MFR":
            rate = 0.04
        else:
            rate = 0.02
        assessment = self.value * rate
        return assessment

In this example, the script that uses the class is called parceltax.py and is coded as follows:

import parcelclass
myparcel = parcelclass.Parcel("SFR", 200000)
print("Land use: ", myparcel.landuse)
mytax = myparcel.assessment()
print(mytax)

A few notes are in order. The module called parcelclass is imported to make it possible to use the Parcel class. This approach works only if the parcelclass.py script resides in the same folder as the parceltax.py script, or in one of the well-known locations in which Python looks for modules. A Parcel object is created using the <module>.<class> structure—i.e., parcelclass.Parcel. The example creates only a single Parcel object, but this process could be repeated for any number of parcels in a database table. Creating a class in a separate script allows you to organize your code better and makes it possible to reuse the class in many different scripts. Classes also can be used to work with geometry objects. Recall the custom function called sinuosity(), which used geometry objects, from an earlier section. Instead of using a custom function, you can use a custom class to create objects, and the function becomes a method of this class.
Creating the River class is coded as follows:

import math

class River(object):
    def __init__(self, shape):
        self.shape = shape

    def sinuosity(self):
        channel = self.shape.length
        deltaX = self.shape.firstPoint.X - self.shape.lastPoint.X
        deltaY = self.shape.firstPoint.Y - self.shape.lastPoint.Y
        valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
        return channel / valley

The only property of the River class is the geometry object, although additional properties can be used. The sinuosity() method is identical to the custom function sinuosity() used in an earlier section, but now the River object is referenced instead of the geometry object. The argument of the method is the object itself (i.e., self), whereas the argument of the custom function is the geometry object being passed to the function (i.e., shape), and the block of code for the method uses self.shape instead of shape. Empty lines are typically added around the blocks of code to improve readability, as shown in the figure.

Using the class to calculate the sinuosity index is coded as follows:

import arcpy
import rivers
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        segment = rivers.River(row[1])
        si = round(segment.sinuosity(), 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")

Because the class is created in a separate script (i.e., rivers.py), this script must be imported as a module. In the for loop, a new River object is created for every row in the input feature class using rivers.River(), where rivers is the module and River is the class. The object is assigned to a variable (i.e., segment), and then the function to calculate the sinuosity index can be called as a method of this object using segment.sinuosity(). The River class example uses only a single method, in which case the class does not provide additional functionality relative to a custom function.
However, the class could be expanded with additional properties and methods, which effectively would group multiple functions together in a meaningful manner. For example, you could have methods related to the slope of a stream segment by using elevation data, or methods related to the channel properties, flow direction, or other hydrologically relevant information.

2.7 Working with packages

When you have several different functions and classes, it often makes sense to put them in separate modules (scripts). As your collection of modules grows, you can consider grouping them into packages. A package is essentially another type of module, but it contains multiple modules that are closely related to (and may depend on) each other. A regular module is stored as a .py file, but a package is stored as a folder (or directory). Technically speaking, a package is a folder with a file called __init__.py in it. This file defines the general attributes of the package. This script does not need to define anything; it can be just an empty file, but it must exist. If __init__.py does not exist, the directory is just a directory, not a package, and it can’t be imported. The __init__.py file makes it possible to import a package as a module. For example, to import ArcPy, you use the import arcpy statement, but you are not referring to a script file called arcpy.py. Instead, you are referring to a folder called arcpy containing a file called __init__.py. ArcPy is an example of a Python package because it resides in a folder called arcpy and contains a file called __init__.py, in addition to many other files. To make the functionality of ArcPy available in your script, you use the statement import arcpy. The same syntax is used to import a module that consists of only a single Python file (.py). From the practical standpoint of writing your own scripts, therefore, ArcPy looks and feels like a module, but from a code organization perspective, ArcPy is a package.
As a result, you sometimes will see ArcPy referred to as a module. Although this characterization is not correct in terms of how the code is organized, the terms “module” and “package” are often used interchangeably in the Python community. Consider how this applies to creating your own package. For example, if you have a package you want to call mytools, you must have a folder called mytools, and inside this folder must be a file called __init__.py. The structure of a package called mytools with two modules (analysis and model) would be as follows:

C:\Myfolder                        a system path
C:\Myfolder\mytools                directory for the mytools package
C:\Myfolder\mytools\__init__.py    package code
C:\Myfolder\mytools\analysis.py    analysis module
C:\Myfolder\mytools\model.py       model module

To use the package, your code would be as follows:

import sys
sys.path.append("C:/Myfolder")
import mytools
output = mytools.analysis.<function>(<arguments>)

This is a simplified example, but it illustrates how modules and packages are organized, including ArcPy. In addition to the term “package,” you also may have come across the term site package. A site package is a locally installed package that is available to all users of that computer. The “site” is the local computer. What makes a package a site package has to do with how it is installed, not its actual contents, so the distinction between packages and site packages is not important from the practical standpoint of writing code. The default Python environment includes many packages, which can be found in the Lib\site-packages folder. For example, a commonly used package is NumPy, which manipulates large arrays of data. The path for NumPy as part of the default environment is C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages\numpy.
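The mytools layout can be simulated end to end in plain Python: create the package folder with an empty __init__.py and one module, append the parent folder to sys.path, and import it. All names here (mytools, analysis, double) are hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: a folder with __init__.py plus one module.
parent = Path(tempfile.mkdtemp())
pkg = parent / "mytools"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # an empty file is enough
(pkg / "analysis.py").write_text("def double(x):\n    return x * 2\n")

# Make the parent folder visible to Python, then import the module.
sys.path.append(str(parent))
analysis = importlib.import_module("mytools.analysis")
print(analysis.double(21))  # 42
```

The same layout, at a much larger scale, is what you find in the installed NumPy package folder.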
The contents of this folder consist of several folders and files, including an __init__.py file. The __init__.py file of NumPy includes a basic description of the package and its subpackages, as well as several testing routines. Typically, there is no need to look at the contents of these files, but examining the structure of packages will give you a better idea of how Python is organized as a programming language. The installation of ArcGIS Pro includes both ArcPy and Python, and the folder in which ArcPy is located is automatically recognized by Python. Where exactly is ArcPy installed? Typically, the location is

C:\Program Files\ArcGIS\Pro\Resources\ArcPy

Note that ArcPy is not installed in Python’s Lib\site-packages folder, as are all the other packages, including, for example, NumPy. One of the reasons is that for every new Python environment, all the packages are copied, and each environment can have different packages. Because ArcPy must be part of every environment, it is kept separately in the Pro\Resources folder instead of the Pro\bin\Python folder. Another reason is that ArcPy is proprietary—i.e., not open source, as are the packages in the Lib\site-packages folder. When you explore the contents of the folder in which ArcPy resides, you will notice a subfolder called arcpy (which gives ArcPy its name as a Python package you can import). This folder contains a file called __init__.py, which makes it a Python package, instead of a module referencing a file called arcpy.py. You also will see many other files whose names sound familiar, including analysis.py, cartography.py, geocoding.py, and several others. The arcpy folder also includes several subfolders, including the sa folder, which contains scripts that are part of the arcpy.sa module. For example, the Neighborhood.py script contains the implementation of the Neighborhood classes of arcpy.sa.
Normally, there is no need to work with these files directly, but for educational purposes, it is okay to examine them. Just don’t make any changes!

Points to remember

- Custom functions and classes allow you to organize your code into meaningful elements and to reuse these elements whenever needed. There are many benefits to using custom functions and classes, including limiting repetitive code, reducing clutter in your code, and being able to break complex tasks into smaller steps.
- Custom functions can be created using the def keyword. The block of code that follows the def statement defines what the function does. Custom functions can contain arguments, although they are not required.
- Custom functions can be called from within the same script or from another script. When creating a custom function in the same script, the function definition must come before the code that calls the function. When calling a function from another script, you first need to import the script that contains the function as a module. A module is therefore a regular .py file that contains at least one function (or class).
- To distinguish between running a script by itself and importing it as a module into another script, you can use the if __name__ == "__main__": statement.
- When importing modules, you cannot use paths, and therefore modules (i.e., scripts) must be in the same folder as the script importing the module(s) or in one of the system paths. You can also add a path to your script using sys.path.append().
- Custom classes can be created to make it easier to group together related functions and variables. Custom classes can be created using the class keyword. Functions that are part of a class are called “methods,” and variables that are part of a class are called “properties.” Classes can be called from within the same script or from another script.
- As your collection of custom functions and classes grows, you can consider making it a package, like the ArcPy package.
Key terms

argument, call (a function), class, class definition, custom class, custom function, format code, function, function definition, hard-coded, initializing, instance, instantiating, method, module, object, object-oriented programming (OOP), package, parameter, pass (a value), property, returning (a value), site package

Review questions

What are some of the benefits of creating custom functions and classes in your scripts?
Explain where the code that defines a custom function or custom class can be located relative to the code where the function or class is used.
Describe the steps to create a custom function.
Describe the steps to create a custom class.
How do you import a module (with your custom function or class) that is not located in the same folder as the script you are currently running?
What is a Python package, and how is it different from a Python module?
Explain how the code of the ArcPy package is organized and where it is installed.

Chapter 3 Creating Python script tools

3.1 Introduction

This chapter describes the process of turning a Python script into a script tool. Script tools make it possible to integrate your scripts into workflows and extend the functionality of ArcGIS Pro. Script tools can be run as stand-alone tools using their tool dialog box, but they also can be used within a model or called by other scripts. Script tools have a tool dialog box, which contains the parameters that are passed to the script. Developing script tools is relatively easy and greatly enhances the experience of using a script. Tool dialog boxes reduce user error because parameters can be specified using drop-down lists, check boxes, combo boxes, and other mechanisms. This use of a tool dialog box provides substantial control over user input, greatly reducing the need to write a lot of code for error-checking. Creating script tools also makes it easier to share scripts with others.
3.2 Script tools versus Python toolboxes

Before getting into how script tools are created, it is important to distinguish between two types of tools that can be developed for use in ArcGIS Pro using Python. The focus of this chapter is on how to create a script tool, sometimes referred to as a “Python script tool.” The code for these tools is written as a Python script, and this script is called when the tool is run. The tool dialog box for the script tool is created from within ArcGIS Pro. Tool properties and parameters are created manually using the interface options of the ArcGIS Pro application. This approach provides an intuitive and easy-to-learn way of creating and testing a script tool. Although most script tools use Python, it is possible to use other scripting languages that accept arguments. For example, you could use a .com, .bat, .exe, or .r file instead of a .py file. A script tool calls a single script file, although other scripts can be called from the main script when the tool is run. The second approach is to create a tool using a Python toolbox. In this approach, the entire tool dialog box is written in Python, and the script is saved as a .pyt file that is recognized as a Python toolbox in ArcGIS Pro. Creating a Python toolbox does not use any of the interface options in ArcGIS Pro, and the toolbox is created entirely in a Python editor. Python toolboxes can be written only in Python, and a single Python toolbox can contain multiple tools, all written in the same script file. Chapter 4 covers creating a Python toolbox in detail. When you are first learning how to create a tool for ArcGIS Pro using Python, it is recommended that you start with a script tool because the process is more intuitive. Once you gain some experience in creating script tools, you can start using Python toolboxes as well. The same task can be accomplished using both approaches, and the choice is largely a matter of preference and experience.
The end of chapter 4 revisits some of the pros and cons of script tools and Python toolboxes.

3.3 Why create your own tools?

Many ArcGIS Pro workflows consist of a sequence of operations in which the output of one tool becomes the input of another tool. ModelBuilder and scripting can be used to automatically run these tools in a sequence. Any model created and saved using ModelBuilder is a tool because it is in a toolbox (.tbx file) or a geodatabase. A model, therefore, is typically run from within ArcGIS Pro. A Python script (.py file), however, can be run in two ways:

Option 1: As a stand-alone script. The script is run from the operating system or from within a Python editor. For a script to use ArcPy, ArcGIS Pro must be installed and licensed, but ArcGIS Pro does not need to be open for the script to run. For example, you can schedule a script to run at a prescribed time directly from the operating system.

Option 2: As a tool within ArcGIS Pro. The script is turned into a tool to be run from within ArcGIS Pro. Such a tool is like any other tool: it is in a toolbox, can be run from a tool dialog box, and can be called from other scripts, models, and tools.

There are several advantages to using tools instead of stand-alone scripts. These benefits apply to both script tools and Python toolboxes.

- A tool includes a tool dialog box, which makes it easier for users to enter the parameters using built-in validation and error checking.
- A tool becomes an integral part of geoprocessing. Therefore, it is possible to access the tool from within ArcGIS Pro. It is also possible to use the tool in ModelBuilder and in the Python window, and to call it from another script.
- A tool is fully integrated with the application it was called from. So, any environment settings are passed from ArcGIS Pro to the tool.
- The use of tools makes it possible to write tool messages.
- Documentation can be provided for tools, which can be accessed like the documentation for system tools.
Sharing a tool makes it easier to share the functionality of a script with others. A well-designed tool means a user requires no knowledge of Python to use the tool. Despite the many benefits, developing a robust tool requires effort. If the primary purpose of the script is to automate tasks that are carried out only by the script’s author, the extra effort to develop a script tool may not be warranted. On the other hand, scripts that are going to be shared with others typically benefit from being made available as a tool.

3.4 Steps to creating a script tool

A script tool is created using the following steps:

1. Create a Python script that carries out the intended tasks, and save it as a .py file.
2. Create a custom toolbox (.tbx file) where the script tool can be stored.
3. Add a script tool to the custom toolbox.
4. Configure the tool properties and parameters.
5. Modify the script so that it can receive the tool parameters.
6. Test that your script tool works as intended. Modify the script and/or the tool’s parameters as needed for the script tool to work correctly.

You can create a new custom toolbox on the Project tab of the Catalog pane in ArcGIS Pro. Navigate to Toolboxes, right-click it, and click New Toolbox. Select the folder where you want to save the toolbox, and give the toolbox a name. You also can create a new toolbox directly inside a folder or a geodatabase. In that case, you right-click the folder or geodatabase, and click New > Toolbox. Although you can create a toolbox inside a geodatabase, the only way to share the toolbox is to share the entire geodatabase. A stand-alone toolbox is saved as a separate .tbx file and can more easily be shared. A stand-alone toolbox can be located anywhere on your computer.
For a given project, it makes sense to locate the toolbox in the same folder in which datasets and other files for a project are organized, but you also can have a separate folder for your custom toolboxes, especially if they are used in multiple projects. Stand-alone toolboxes have a file extension—e.g., C:\Demo\MyCoolTools.tbx. A toolbox inside a geodatabase, like other geodatabase elements, does not have a file extension—e.g., C:\Demo\Study.gdb\MyCoolTools. To create a script tool, right-click a custom toolbox, and click New > Script. Write access to the toolbox is needed to add a new script tool. As a result, you cannot add tools to any of the system toolboxes in ArcGIS Pro. The New Script dialog box has three panels: General, Parameters, and Validation. The General panel is used to specify the tool name, label, script file (.py), and several options. The Parameters panel is used to specify the tool parameters, which will control how a user interacts with the tool. The Validation panel provides further options to control the tool’s behavior and appearance. Not all this information must be completed in one step. You can enter some of the basic information, save it, and then return to edit the tool properties later. All the information needed to create a script tool is reviewed in detail in this chapter. First, however, it is important to consider the example script used for illustration. That way, the information has a context and is more meaningful. The example script has been created as a stand-alone script. The script creates a random sample of features from an input feature class on the basis of a user-specified count and saves the resulting sample as a new feature class. The complete code is shown next, followed by a figure of the same script. The syntax highlighting in the figure assists with reading the script. 
# Python script: random_sample.py
# Author: Paul Zandbergen
# This script creates a random sample of input features based on
# a specified count and saves the results as a new feature class.

# Import modules.
import arcpy
import random

# Set inputs and outputs. Inputfc can be a shapefile or geodatabase
# feature class. Outcount cannot exceed the feature count of inputfc.
inputfc = "C:/Random/Data.gdb/points"
outputfc = "C:/Random/Data.gdb/random"
outcount = 5

# Create a list of all the IDs of the input features.
inlist = []
with arcpy.da.SearchCursor(inputfc, "OID@") as cursor:
    for row in cursor:
        id = row[0]
        inlist.append(id)

# Create a random sample of IDs from the list of all IDs.
randomlist = random.sample(inlist, outcount)

# Use the random sample of IDs to create a new feature class.
desc = arcpy.da.Describe(inputfc)
fldname = desc["OIDFieldName"]
sqlfield = arcpy.AddFieldDelimiters(inputfc, fldname)
sqlexp = f"{sqlfield} IN {tuple(randomlist)}"
arcpy.Select_analysis(inputfc, outputfc, sqlexp)

A few points of explanation about the script are in order. First, it is important to understand the general logic of the script. The script creates a list of all the unique IDs of the input features and uses the sample() function of the random module to create a new list with the unique IDs of the random sample. This new list is used to select features from the input features, and the result is saved as a new feature class. Second, the input feature class, the output feature class, and the number of features to be selected are hard-coded in the script. Third, the script works for both shapefiles and geodatabase feature classes. This is accomplished by using OID@ when setting the search cursor, by using the OIDFieldName property to read the name of the field that stores the unique IDs, and by using the AddFieldDelimiters() function to ensure correct SQL syntax regardless of the type of feature class. The SQL syntax also warrants a bit of discussion.
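Before turning to the SQL, the sampling step at the heart of the script can be tried on its own, outside ArcGIS. In this sketch, a plain Python list stands in for the IDs a search cursor would return (the ID values are invented for illustration):

```python
import random

# Stand-in for the list of object IDs collected by the search cursor.
inlist = list(range(1, 101))  # pretend IDs 1 through 100
outcount = 5

# random.sample() draws unique elements without replacement, which is
# why outcount cannot exceed the number of input features: a larger
# count raises a ValueError.
randomlist = random.sample(inlist, outcount)

print(len(randomlist))                        # 5
print(len(set(randomlist)))                   # 5 -- no duplicates
print(all(i in inlist for i in randomlist))   # True
```

Because sampling is without replacement, the result is guaranteed to contain distinct IDs, which is exactly what the SQL selection that follows relies on.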
The WHERE clause uses the IN operator to compare the unique ID of the features to the list of randomly selected IDs, as follows:

sqlexp = f"{sqlfield} IN {tuple(randomlist)}"

In SQL, this list must be in a pair of parentheses, which is equivalent to a tuple in Python. To ensure proper string formatting, f-strings are used, but this formatting also can be accomplished using .format(). When testing the script, the following can be added to check what the WHERE clause looks like:

print(sqlexp)

For a geodatabase feature class, the WHERE clause looks something like this:

OBJECTID IN (1302, 236, 951, 913, 837)

For a shapefile, the WHERE clause looks something like this:

"FID" IN (820, 1095, 7, 409, 145)

The actual ID values to be selected change with every run of the script because the sample() function creates a new random selection regardless of any previous result. The WHERE clause is used as the third parameter of the Select tool, which creates the output feature class. The script does not work on stand-alone tables because the Select tool works only on feature classes. To work with stand-alone tables, the Select Layer By Attribute tool is used instead.

Although the script works correctly, making changes to the inputs requires opening the script in a Python editor, typing the inputs, and running the script. A tool dialog box makes it a lot easier to share the functionality of this script. The goal of the script tool is for a user to be able to specify the input feature class to be used for sampling, the output feature class to save the result, and the count of features to be included in the random sample. In other words, the goal is to have a script tool in which the tool dialog box incorporates these features, as shown in the figure. It is important to have an expectation or vision as to what the final tool dialog box should look like because it facilitates preparing your script, and it guides decisions during the creation of the script tool.
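Returning to the WHERE clause for a moment, its string formatting can be verified without ArcGIS. In this sketch, the delimited field name is hard-coded ("OBJECTID" stands in for what arcpy.AddFieldDelimiters() would return for a geodatabase feature class):

```python
randomlist = [1302, 236, 951, 913, 837]
sqlfield = "OBJECTID"  # stand-in for the arcpy.AddFieldDelimiters() result

# Converting the list to a tuple produces the parenthesized,
# comma-separated form that SQL's IN operator expects.
sqlexp = f"{sqlfield} IN {tuple(randomlist)}"
print(sqlexp)  # OBJECTID IN (1302, 236, 951, 913, 837)

# The same result using .format() instead of an f-string:
sqlexp2 = "{0} IN {1}".format(sqlfield, tuple(randomlist))

# One caveat: a single-element tuple formats with a trailing comma,
# as in "OBJECTID IN (7,)", which may not be accepted as valid SQL,
# so a sample count of 1 may need special handling.
```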
Simply drawing out on a piece of paper what you expect the final tool dialog box to look like can be a great help. Certain details may change along the way as you develop and test your script tool, but it helps to have a goal.

Returning to the code example, there are a few things to notice about the script. First, the script is broken into sections, each preceded with comments, so the logic of the script is easier to follow. Comments are not required for a script tool to work, but if users of the script tool are likely to view the underlying code, comments will make it easier to understand. Second, the script uses several hard-coded values, including an input feature class, an output feature class, and a count of features. This type of hard coding is typical for stand-alone scripts. To facilitate using the script as a script tool, hard coding is limited to variables that will become tool parameters. No hard coding is used anywhere else in the script.

To prepare your script for use as a script tool, follow these guidelines:

1. Make sure your script works correctly as a stand-alone script first. Working correctly will require the use of hard-coded values.
2. Identify which values will become parameters in your script tool.
3. Create variables for these values near the top of your script, and make sure that the hard-coded values are used only once to assign values to the variables. The rest of your script should not contain any hard-coded values and should use only variables.

Although the stand-alone script will run correctly provided the input feature class exists, the script will require some changes to be used as part of a script tool. These changes are facilitated by limiting the hard coding of values to the variables that are going to be used as tool parameters.

Returning to the New Script dialog box, it’s time to complete the basic information about the script tool in the General panel.
The following information should be included:

The name of a tool is used when you want to run the tool from Python. The name cannot contain any spaces and follows most of the same rules that apply to a Python function name.

The label of the tool is the display name of the tool in the toolbox. The label can have spaces. Consider the example of the Get Count tool. In ArcGIS Pro, the tool appears with its label, Get Count (with a space), but for the tool to be called from Python, its name, GetCount (without a space), is used.

Note: Get Count is not a script tool but a system tool, and therefore there is no script. You can view the properties of any of the built-in geoprocessing tools, including system tools and script tools. Viewing the properties of existing tools is a good way to learn about tool design and get ideas for your own tools.

The script file is the complete path of the Python script file to be run when the tool is executed. You can browse to an existing file or type the path of a file. The script file does not need to exist at this point and can be created later.

There are three different options to decide on. The first option is whether you want to import the script. You should not import the script while you are still creating the script tool, but you can import it before sharing the tool with others. The option to set a password becomes active only if you check the option to import the script. The default settings—i.e., don’t import the script (and consequently don’t use a password)—are used for most script tools. The third option is to “store tool with relative path.” This option typically should be checked and is checked by default. When this option is checked, relative paths are used instead of absolute paths to reference the location of the script file in relation to the location of the custom toolbox file. Absolute paths start with a drive letter, followed by a colon, and then the folder and file name—for example, C:\Scripts\streams.py.
Relative paths refer to a location that is relative to a current folder and do not use a full path. Only the path to the script file can be stored as a relative path; paths within the script itself will not be converted. Typically, your Python script and toolbox are in the same folder, and working with relative paths ensures the Python script can still be located if the folder is moved or renamed. If you are going to share the tool with others, which is often the goal of developing a script tool, make sure this option is checked. Chapter 5 revisits working with paths in script tools in more depth. For the example script tool, the dialog box is shown in the figure.

Although you can continue with the tool parameters and validation, at this point you also can save the information and return to it later. Click OK to save the script tool properties. The script tool now appears in the custom toolbox. Although it looks like a finished script tool, the tool is far from completed. When you double-click on the script tool, the tool dialog box opens, but there are no parameters. When you click Run, the tool executes and runs the Python script associated with the script tool. At this stage, the script still uses the hard-coded values. If the Python script does not produce any errors, the script tool produces the desired outputs, which in this case consist of a new feature class with a sample of five features. Because the script tool executed the same as any regular geoprocessing tool, it resulted in geoprocessing messages and a new entry in the geoprocessing history.

Even though the script tool ran correctly and produced the desired output, running the tool in this manner is not very meaningful. You are running the stand-alone script with the hard-coded values by calling it from within ArcGIS Pro, but the script tool is not user friendly. A user would have to edit the script to change the hard-coded values.
Still missing are the tool parameters so a user can set those values using the tool dialog box instead of editing the hard-coded values in the script. Time to return to the tool parameters. Right-click on the script tool, and click Properties. This step brings up the script tool properties with the same three panels as the New Script dialog box, but this time the window reads Tool Properties: Random Sample. In the General panel, you can edit the properties set in an earlier step. Click on the Parameters tab to view the parameters for the script tool. By default, no parameters are listed, but most tools need at least one input parameter and one output parameter. The script tool parameters are organized as a table. Each row in the table is a parameter, and each column in the table is a property of the parameter. The next sections examine tool parameters in detail.

3.5 Exploring tool parameters

All geoprocessing tools have parameters. Users enter parameter values in the tool dialog box. In contrast, for a stand-alone script, these values are often hard-coded in the script. To create a script tool, therefore, parameters must be set up so that users can provide their values through the tool dialog box. When a script tool runs, the parameter values from the tool dialog box are passed to the script. The script reads these values and uses them in the code. Creating and exposing parameters requires the following steps:

1. Including code in the script to receive the parameter values
2. Setting up the parameters in the script tool properties

These steps do not need to be completed in order, but they both must be completed for the script tool to work as intended. Next, you will examine how these steps are implemented by using one of the built-in tools, the Multiple Ring Buffer tool. This tool is a good example because it contains several different types of parameters, and because it is a script tool whose code can be viewed. The tool dialog box is shown in the figure.
The Multiple Ring Buffer tool has seven parameters total, the first three of which are required. You can view details about these parameters under the tool properties. To bring up the tool properties, in the Geoprocessing pane, navigate to the Multiple Ring Buffer tool under Toolboxes > Analysis Tools > Proximity. Right-click on the Multiple Ring Buffer tool, and click Properties.

Note: You cannot open the tool properties from the tool dialog box itself. The properties can be opened only by right-clicking on the tool in a list of tools—for example, by browsing to the tool using the Toolboxes tab or by searching for the tool by name.

In the tool properties window, review the properties in the General panel. Notice the location of the Python script: C:\Program Files\ArcGIS\Pro\Resources\ArcToolBox\Scripts\MultiRingBuffer.py. The options are dimmed because you cannot make changes to this script tool.

Click on the Parameters panel to view the properties of the seven parameters. The order in which the parameters are listed controls the order in which they are shown in the tool dialog box. Also notice the parameters are numbered starting with the number zero. When the tool is executed, the values for the parameters are passed to the script and can be accessed using an index starting at zero.

The information also shows details about each parameter, including the label, name, data type, whether the parameter is required or optional, the direction (input or output), and several others. For example, the Input Features parameter consists of a feature layer, which is a required input parameter. Reviewing the parameters of an existing script tool is helpful, especially when you compare the properties with the tool dialog box. For example, the Outside Polygons Only parameter is a Boolean parameter, which means it shows up as a check box in the tool dialog box.
The default value is false, which means it is not checked by default when the tool dialog box opens.

Note: Because the Multiple Ring Buffer tool is a built-in tool, you can see the list of parameters and their properties, but you cannot make any changes. You can view the parameters for any geoprocessing tool (e.g., Clip, Buffer, and so on), but because standard tools are not written in Python, you cannot view the code. The Multiple Ring Buffer tool was chosen as an example because it is a script tool (as are several other built-in tools), and you can view the code.

Now that you have a better understanding of the parameters of this script tool, it is time to examine the Python script to see how the parameters are handled. When a user specifies the parameter values in the Multiple Ring Buffer tool dialog box, the tool can be run. Once the tool is run, the user-specified parameter values are passed to the script. Review the script to see how these parameter values are received by the script. To open the script, right-click on the Multiple Ring Buffer tool, and click Edit.

The script opens in your default Python editor, which is often IDLE. You can change the default editor to be used from within ArcGIS Pro under Geoprocessing Options. In ArcGIS Pro, click Project > Options > Geoprocessing. You also can click on the Analysis tab and then on the Geoprocessing Options icon (right below Ready To Use Tools). You can select your script editor by navigating to the application. For example, the path to IDLE for the default environment is C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts\idle.exe. The path will be different when running a different environment. The path for Spyder is typically something like this: C:\Users\<YourName>\AppData\Local\ESRI\conda\envs\<YourEnvironment>\Scripts\spyder.exe. The path for PyCharm is typically something like this: C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.4\bin\pycharm64.exe.
The path for PyCharm is the same regardless of which environment you are using, but to work with a script, you must configure the environment from within PyCharm. Because, in this case, you are only viewing the script and not making any changes to it or running it from the Python editor, IDLE will suffice. As an alternative, you can use your Python editor and open the script directly by navigating to its location: C:\Program Files\ArcGIS\Pro\Resources\ArcToolbox\Scripts\MultiRingBuffer.py.

The example MultiRingBuffer.py script includes introductory comments, the import of several modules, and a dictionary for unit conversions. The section of interest here is a bit farther down, starting at line 42. The tool parameters are received by the script using the GetParameterAsText() and GetParameter() functions of ArcPy. Notice how there are seven parameters total with index values 0 through 6. These parameters in the script match exactly with the parameters in the tool dialog box. When a user specifies tool parameters using the tool dialog box and then clicks Run, the values of the parameters are passed to the script as a list. The values are read on the basis of their index value. A combination of the GetParameterAsText() and GetParameter() functions is used, depending on the nature of each parameter. These two functions are examined in this section in a bit more detail.

The syntax of the GetParameterAsText() function is

<variable> = arcpy.GetParameterAsText(<index>)

The only argument of this function is an index number, which indicates the numeric position of the parameter on the tool dialog box. The parameters set on the tool dialog box are sent to the script as a list, and the GetParameterAsText() function assigns these parameter values to variables in the script. The GetParameterAsText() function receives parameters as a text string, even if the parameter on the tool dialog box is a different data type.
Numerical values, Boolean values, and other data types are all converted to strings, and additional code is included to correctly interpret these strings to their correct type. For example, the code for the Outside Polygons Only parameter is as follows:

outsidePolygons = arcpy.GetParameterAsText(6)
if outsidePolygons.lower() == 'true':
    sideType = 'OUTSIDE_ONLY'
else:
    sideType = ' '

The Outside Polygons Only parameter on the tool dialog box is a Boolean value of true or false. These values are converted to strings, and as a result, the conditional statement uses the string value “true” instead of the Boolean value True.

Another example of the formatting of the parameter values is illustrated by the Buffer Unit parameter, called “unit” in the script. In the tool dialog box, a user selects the unit from a drop-down list (e.g., Feet or Kilometers). In the script, the value is received as follows:

unit = arcpy.GetParameterAsText(3).upper()

The string method upper() is used to convert the input string to uppercase, resulting in FEET or KILOMETERS.

The Distances parameter on the tool dialog box is received by the script using the GetParameter() function. This function is used because the parameter consists of multiple values (doubles, in this case) instead of a single value. The GetParameter() function reads this parameter as a list of floats.

An alternative to using the GetParameterAsText() and GetParameter() functions is to use sys.argv, or system arguments. The index number for sys.argv starts at 1, so sys.argv[1] is equivalent to GetParameterAsText(0). The use of sys.argv has its limitations, including the fact that it accepts a limited number of characters, so using the GetParameterAsText() and GetParameter() functions is preferred. The example script MultiRingBuffer.py uses a custom function get_parameter_values() to receive the parameter values, and this function is called later in the script.
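Stepping back to the string conversions for a moment, the pattern can be mimicked in plain Python, with hard-coded strings standing in for the values a tool dialog box would pass (the variable names here are illustrative, not taken from MultiRingBuffer.py):

```python
# Stand-ins for values arriving from a tool dialog box as text.
outside_text = "true"    # a Boolean parameter arrives as "true" or "false"
unit_text = "Kilometers" # a string chosen from a drop-down list
count_text = "5"         # numeric parameters also arrive as text

# Boolean parameters are compared as lowercase strings.
side_type = "OUTSIDE_ONLY" if outside_text.lower() == "true" else ""

# String parameters may be normalized, as MultiRingBuffer.py does.
unit = unit_text.upper()

# Numeric parameters must be converted explicitly before use.
count = int(count_text)

print(side_type, unit, count)  # OUTSIDE_ONLY KILOMETERS 5
```

The point to remember is that nothing arrives as a Python bool, int, or float from GetParameterAsText(): every conversion is the script's responsibility.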
The use of a custom function is not required to receive parameter values, and many simpler scripts do not apply this practice.

Every tool parameter has an associated data type. One of the benefits of data types is that the tool dialog box will not send values to the script unless they are the correct data type. User entries for parameters are checked against the parameter data types before they are sent to the script. This is one advantage of using tools over stand-alone scripts, because the script does not have to check for invalid parameters. The data types of the parameters of the Multiple Ring Buffer tool include a feature layer, a feature class, a list of doubles, three strings, and a Boolean. Many other data types are possible for the parameters of a script tool, from address locator to z-domain. Data types for parameters should be selected carefully because they control the interaction between the tool dialog box and the script.

After parameters are assigned a data type, the tool dialog box uses this data type to check the parameter value. For example, if you enter a path to an element of a different data type, the tool dialog box will generate an error. In the example, the Input Features parameter is a feature layer, so typing the path for a raster, such as C:\Raster\elevation, generates an error and prevents the tool from running.

This built-in error-checking mechanism prevents users from using incorrect parameters to run a tool. When the tool runs, the dialog box has already validated the parameter Input Features as a feature layer, and no additional code is needed in the script to verify this is the case. The data type property also comes into play when using the drop-down menu in the tool dialog box to select datasets and when browsing through folders for data. Only data that matches the parameter’s data type is shown. This prevents entering incorrect paths to data.
In the previous example, if the raster dataset C:\Raster\elevation is added as a layer to the current map, this raster layer will not be shown in the drop-down options for the Input Features parameter.

3.6 Setting tool parameters

Tool parameters for a script tool can be set when creating a new script tool. They can also be edited after the script tool is created by accessing the script tool properties. Setting parameters works the same regardless of the stage at which they are set. The example used for illustration here is the Random Sample script tool created earlier, for which no parameters were set. Right-click on the script tool, and click Properties to bring up the script tool properties.

Note: Recall that double-clicking on the script tool brings up the tool dialog box itself, which does not allow you to set the parameters. This is the same as right-clicking on the script tool and clicking Open. Right-click > Edit brings up the script associated with the script tool, but that is for a later step.

You can complete the properties for each parameter by clicking on the cells in the table. Some properties must be typed (such as the label), whereas others must be selected from a list of options (such as the data type). The first parameter of the tool is the input features. For the Label property, enter Input Features. The label is the display name that shows up in the tool dialog box, so the label should be meaningful. Spaces are allowed for the label. To move to the next cell, click on that cell, or press Tab. The Name property is filled out with a default name based on the Label property without spaces. This default name typically is sufficient.

Also, notice that a new row is added for the next parameter as soon as the minimally required properties for the first parameter are specified. First, however, you must complete the properties for the first parameter. Next, click on the cell for the Data Type property.
This step brings up the Parameter Data type dialog box. The default is set to String, but you can select the appropriate type from the drop-down list of options. Select Feature Layer as the type, and click OK to close the Parameter Data type dialog box. A feature layer means that, on the tool dialog box, a user can use a drop-down list to select from the feature layers in the current map but also can browse to a feature class on disk. One of the other options for data type is Feature Class, which sounds similar to Feature Layer but allows only for the use of feature classes on disk and does not allow for the use of feature layers in the current map.

The Parameter Data type dialog box has several other options. For example, you can select multiple data types (e.g., Feature Layer and Table View), which means all those types become valid entries for this parameter in the tool dialog box. There is also an option to use Multiple values, which makes it possible to enter multiple values of the same data type as a single parameter. These values are passed to the script as a list of values. The option to use a table of values makes it possible to enter multiple values but in a table format. These options allow for more complicated tool parameters. Familiarity with other geoprocessing tools helps explain the possibilities these options provide. Tools such as Intersect and Union use this table of values format, also referred to as a value table. For example, the first parameter of the Union tool is Input Features, but it allows you to select multiple feature layers and their associated ranks. Combined, the feature layers and ranks represent only a single tool parameter.

Parameters with multiple values are passed to the script as a string, with the individual list elements separated by semicolons. The Python split() method can create a list of the elements from the string.
The syntax is as follows:

import arcpy
input = arcpy.GetParameterAsText(0)
input_list = input.split(";")

As an alternative, parameters with multiple values also can be handled using arcpy.GetParameter(). In that case, the result is a list of values, and the individual values can be obtained using an index or by iterating over the list. For parameters that consist of a table of values, you can use GetParameter() to obtain a ValueTable object instead of a string or list. In a ValueTable object, the values are stored in a virtual table of rows and columns. ValueTable is an ArcPy class that is specifically designed for this type of parameter. Because of these various options, it is important when writing the script to be aware of the data type of the parameters being passed to the script from the tool dialog box.

Returning to the Random Sample script tool example, the parameter properties so far are shown in the figure.

For the remaining parameter properties, the default values are enough. These default values include Required for Type and Input for Direction. There are three choices for Type: Required, Optional, and Derived. Required means that a parameter value must be specified for a tool to run. Optional means that a value is not required for a script to run. Typically, when setting the Type property to Optional, a default value for the parameter is specified in the tool properties or in the script. Derived parameters are used for output parameters only and do not appear on the tool dialog box. Derived parameters are used in several cases, including the following:

When a tool outputs a single value instead of a dataset. Such a single value is often referred to as a scalar.
When a tool creates outputs using information from other parameters.
When a tool modifies an input without creating a new output.

All tools should have outputs so that the tool can be used in a model and be called from a script.
Sometimes the only way to ensure that a tool has an output is by using a derived parameter. Examples of tools with derived parameters include the Get Count and Add Field tools. The input parameter of the Get Count tool is a feature class, table, or raster, and the output is a count of the number of rows. This count is a scalar value and is returned as a Result object. The count constitutes an output parameter, and it is a derived parameter that does not appear on the tool dialog box. The tool properties confirm that the Get Count tool has two parameters, but because the Row Count variable is a derived output parameter, it does not appear in the tool dialog box.

Note: Running the Get Count tool as a single tool is not common. Although the count is printed to the Results window, this tool typically is used within a model or script in which the output is used as the input to another step. The Get Count tool is also commonly used in conditional statements. For example, a procedure can be stopped if the count of rows is zero.

The Direction property defines whether the parameter is an input of the tool or an output. There are only two choices for Direction: Input and Output. For derived parameters, the parameter direction is automatically set to Output. Every tool should have at least one output parameter. Having an output parameter makes it possible to use the tool in ModelBuilder. Although technically a script can run without output parameters, for ModelBuilder to work, every tool needs an output so it can be used as the input to another tool in the model.

Returning to the Random Sample script tool, the tool dialog box so far consists of a single parameter. The second parameter consists of the output features to be created by the tool. The parameter properties are as follows:

Label: Output Features
Name: Output_Features
Data Type: Feature Layer
Type: Required
Direction: Output

The third parameter consists of the number of features to be selected at random.
The parameter properties are as follows:

Label: Number of Features
Name: Number_of_Features
Data Type: Long (for long integer)
Type: Required
Direction: Input

The use of the Long data type ensures that only integer values are entered. The tool will not run if text or decimal numbers are entered. There is one additional parameter property to consider, which is the Filter property. Logically, the number of features should be a positive number, not a negative number or zero. The Filter property can be used to set the range of allowable values. The options for this property are None, Range, and Value List. Clicking on the cell and selecting the Range option brings up the Range Filter dialog box. Enter the value of 1 for the minimum and a very large value for the maximum—e.g., 1000000000. Leaving the maximum blank is not an option, which means you must pick a somewhat arbitrary maximum number in this case. Click OK to close the Range Filter dialog box. The parameter properties are now as shown in the figure.

These settings complete the tool parameters for the Random Sample script tool. If the order of the parameters must be modified, right-click on a row, and click Move Up or Move Down. If you must remove a parameter, right-click on a row, and click Delete. For the Random Sample script tool, the order and number of parameters is correct, and no further changes are necessary. Click OK to save the tool properties. The tool dialog box is now as shown in the figure.

Of the parameter properties reviewed so far, the Filter property requires a bit more discussion. The Filter property allows you to limit the values to be entered for a parameter. There are several filter types, and the type depends on the data type of the parameter. For example, for the Long and Double data types, the filter types are Range and Value List. The Range filter allows you to specify minimum and maximum values, and the Value List filter allows you to specify a list of specific values.
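In plain Python terms, the Range filter set for the Number of Features parameter behaves roughly like the check sketched below, except that the tool dialog box performs it before the script ever runs (the function name is hypothetical):

```python
# The Range filter for Number of Features: minimum 1, with a somewhat
# arbitrary large maximum because the maximum cannot be left blank.
MINIMUM = 1
MAXIMUM = 1000000000

def in_range(value):
    """Mimic of the dialog-side check: accept only values in the range."""
    return MINIMUM <= value <= MAXIMUM

print(in_range(5))    # True
print(in_range(0))    # False -- a sample of zero features makes no sense
print(in_range(-3))   # False
```

Because the dialog box enforces this range, the script itself needs no code to reject zero or negative counts.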
These filters are like the range and coded value domains for a geodatabase. When the data type is a feature layer or feature class, there is only one filter, called Feature Type. This filter allows you to limit the valid entries for the parameter on the basis of geometry type, including point, polyline, and polygon features. The available filter types consist of the following:

Areal units—acres, hectares, and so on
Feature type—point, polyline, polygon, and so on
Field—short, long, float, double, text, and so on
File—custom file extensions (e.g., .csv, .txt, and so on)
Linear units—inches, feet, meters, and so on
Range—values between a specified minimum and maximum value
Time units—seconds, minutes, hours, and so on
Value list—a set of specific custom values
Workspace—file system, local database, or remote database

For most data types, there is only one filter type. For example, if the data type of a parameter is set to Feature Class, the only possible filter type is Feature Type. Many data types have no filter type at all. The different filter types exert specific control over which values are valid inputs. Carefully setting the filter type improves the robustness of the tool.

There are several parameter properties that have not been covered yet, including Category, Dependency, Default, Environment, and Symbology. These properties were not used in the Random Sample example script tool, but they can be important for certain tools. Each property is reviewed briefly in this section.

The Category property allows you to organize the tool parameters in the tool dialog box. You create your own categories by typing a name. After you use a category once, the name appears as a drop-down option for the other parameters. Parameters within the same category are organized in an expandable group in the script tool dialog box. This grouping is sometimes used for tools with many parameters.
Consider the example of the Empirical Bayesian Kriging tool, which has no fewer than 15 tool parameters. Several of the optional parameters are grouped into categories to make the tool dialog box easier to read. The parameters in a category can be collapsed or expanded in the tool dialog box as needed.

The Dependency property can be used for input and derived output parameters. In many cases, a tool parameter is closely related to another one. For example, consider the Delete Field tool. The first parameter is an input table, and the second parameter, Drop Field, is a list of fields. The list of fields is populated only when the input table is selected. This dependency of one parameter on another parameter in the same tool is controlled using the Dependency property. In the example of the Delete Field tool, the Dependency property of the Drop Field parameter is set to the input table.

A second reason to use the Dependency property is to work with derived output parameters. For example, when an input parameter is modified by a tool, the Dependency property of the derived output parameter is set to the input parameter. In the case of the Delete Field tool, the Dependency property of the output parameter is set to the input table.

Note: Remember that the derived output parameter is not visible in the tool dialog box.

The Default property allows you to specify the value of the parameter when the script tool dialog box is opened. If no default value is specified, the parameter value will be blank on the tool dialog box. Default values are commonly used for Boolean parameters.

The Environments property provides another option to set default values. This property provides a drop-down list with environment settings. When this property is set to a specific setting, the default value for the parameter is obtained from the environments of the geoprocessing framework.
The Symbology property allows you to specify the path to a layer file. By default, the output of a geoprocessing tool is added to the current map. This behavior can be set as part of Geoprocessing Options by checking the box for "Add output datasets to an open map." The symbology of a layer added in this way follows the regular rules for adding data to a map in ArcGIS Pro; in other words, there is no customized symbology. The Symbology property can be set to a custom layer file (.lyrx). This option is available only for outputs for which layer files make sense, such as feature layers, rasters, TINs, and so on. Setting the Symbology property does not control whether the output is added to an open map, because that behavior is controlled by Geoprocessing Options.

3.7 Editing tool code to receive parameters

Although the tool dialog box for the Random Sample script tool is created, the tool is not ready to be used. Still missing are changes to the Python script to receive the tool parameters when the tool is executed. When testing a script tool, you will alternate between running the tool and editing the script until the tool works as desired. You can leave the Python editor open while you carry out this testing.

You can open a script from within the Python editor, but there is a shortcut in ArcGIS Pro. Right-click the script tool in the toolbox, and click Edit, which opens the Python script associated with the script tool in a Python editor. You can configure which editor is used under Geoprocessing Options, as discussed earlier in this chapter. Editing a script in a Python editor does not prevent the script from being used by a script tool. Therefore, you can leave your script open in a Python editor during the testing of the script tool. Logically, you must save your changes to the script for them to take effect. When executing a script tool, the tool calls the associated script. This call is independent from using a Python editor.
In other words, whether your Python editor is open or not and whether you have the script open or not has no effect on the execution of the script tool. Also, when a script tool calls a script, no messages are printed to the interactive interpreter of your Python editor.

Time to consider the changes necessary to the script. The following code shows the portion of the original script that must be modified:

inputfc = "C:/Random/Data.gdb/points"
outputfc = "C:/Random/Data.gdb/random"
outcount = 5

The three variables that are hard-coded into the script must be changed to parameters using the GetParameterAsText() or GetParameter() function. The first two parameters are both strings, and the GetParameterAsText() function will suffice for these values. The third parameter is a number, which means the GetParameter() function can be used. The modified code is as follows:

inputfc = arcpy.GetParameterAsText(0)
outputfc = arcpy.GetParameterAsText(1)
outcount = arcpy.GetParameter(2)

The GetParameterAsText() function also can be used for parameters that are not strings, but the value must be cast to the appropriate type. For example, the third parameter could also be received as follows:

outcount = int(arcpy.GetParameterAsText(2))

It is important to recognize that GetParameterAsText() returns the values as strings, whereas GetParameter() returns the values as objects. Thus, a decision between using GetParameter() and GetParameterAsText() must be based on a good understanding of the values being passed to the script.

Once you make these changes to the script and save the script file, the tool is ready to run, and it can be used like any regular geoprocessing tool. The tool creates a random selection of the input features on the basis of the specified number and saves the result to a new feature class. The new feature class is added as a feature layer to the active map. The execution of the tool results in messages and a new entry in the geoprocessing history.
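Because GetParameterAsText() always returns a string, using its result directly in a numeric comparison fails. A minimal stand-alone sketch of why the cast matters, using the literal "5" as a stand-in for the text a script tool would pass in:

```python
value = "5"  # stand-in for what GetParameterAsText() would return

# In Python 3, a str cannot be compared to an int.
try:
    value > 3
except TypeError as err:
    print(type(err).__name__)  # prints: TypeError

# After casting, the value behaves as a number.
outcount = int(value)
print(outcount > 3)  # prints: True
```

This is why the script casts the text value with int() before comparing it to the feature count.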
Typically, a tool does not work perfectly the first time it is tested, and you may find yourself tweaking both the Python script and the tool parameters in an iterative manner until the tool performs as desired. You also may test your tool for robustness by trying to enter incorrect or invalid parameter values. For example, what happens if you enter a negative number for the number of features? Or what happens if the number you enter is larger than the number of features in the input feature class?

Note: The answer to the first question is that an error appears in the tool dialog box because a filter was used, and the tool will run only if the number of features is at least 1. The answer to the second question is that the tool executes but produces an error generated by the line randomlist = random.sample(inlist, outcount). The error is ValueError: Sample larger than population or is negative.

At this stage, therefore, the Random Sample tool works but is not very robust.

3.8 Customizing tool behavior

Once the parameters of a script tool are specified, you can add custom behavior. Examples of custom behavior include the following:

Certain parameters may need to be enabled or disabled on the basis of the values contained in other parameters.
Some parameters may benefit from having a default value specified on the basis of the values in other parameters.
Warning and error messages may need to be customized.

Tool behavior is set on the Validation tab of the Tool Properties dialog box. In the Validation panel, you write Python code that uses a Python class called ToolValidator. The ToolValidator class controls how the tool dialog box changes on the basis of user input. It also is used to describe the output data the tool produces, which is important for using tools in ModelBuilder. The ToolValidator class makes it possible to create more robust tools. A detailed description of customizing tool behavior is not provided here.
Details on the ToolValidator class can be found in ArcGIS Pro help, under the topics "Customizing Script Tool Behavior" and "Programming a ToolValidator Class." The ToolValidator class is used only on the Validation panel of the script tool properties. The code for validation is written in Python and can be edited using a Python editor, but the code is embedded in the toolbox instead of stored as a separate script file.

3.9 Working with messages

One of the advantages of running a script as a script tool is being able to write messages that appear in the tool dialog box and in the geoprocessing history. Tools and scripts that call a tool also have access to these messages. When scripts are run as stand-alone scripts, messages are printed only to the interactive interpreter—there is no tool dialog box and no geoprocessing history from which messages can be retrieved later. Also, there is no sharing of messages between stand-alone scripts. However, because script tools work the same as any other geoprocessing tool in ArcGIS Pro, they automatically create messages. For example, when the Random Sample tool is run, it prints simple messages that indicate when the script started running and when it was completed.

Several ArcPy functions are available for writing additional messages. These functions include the following:

AddMessage()—for general information messages (severity = 0)
AddWarning()—for warning messages (severity = 1)
AddError()—for error messages (severity = 2)
AddIDMessage()—for both warning and error messages
AddReturnMessage()—for all messages, independent of severity

The AddReturnMessage() function can be used to retrieve all messages returned from a previously run tool, regardless of severity. The original severity of the geoprocessing messages is preserved—for example, an error message is printed as an error message. Some of the other message functions create a custom message.
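As a sketch of how these functions are typically called together, the helper below prints each message and, when arcpy is importable, also passes it to the geoprocessing messaging system. The helper name report() and the try/except guard are illustrative assumptions (the guard only exists so the snippet also runs outside ArcGIS Pro, where arcpy is unavailable):

```python
try:
    import arcpy  # available when the script runs inside ArcGIS Pro
except ImportError:
    arcpy = None  # stand-alone run on a machine without ArcGIS

def report(msg, severity=0):
    """Hypothetical helper: print msg and, when arcpy is available,
    also send it to the geoprocessing messages (0=info, 1=warning, 2=error)."""
    print(msg)
    if arcpy is None:
        return
    if severity == 2:
        arcpy.AddError(msg)
    elif severity == 1:
        arcpy.AddWarning(msg)
    else:
        arcpy.AddMessage(msg)

report("Copy completed.")          # prints: Copy completed.
report("Output may be empty.", 1)  # prints: Output may be empty.
```

This dual print/AddMessage pattern anticipates the standard practice for scripts that must run both stand-alone and as tools, discussed in section 3.10.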
The use of these message functions is illustrated in this section using the Random Sample script tool. When a user of the Random Sample tool enters a value that exceeds the number of input features, the script fails, and an error is reported, as shown in the figure. The error results from the random.sample() function when the value for outcount is greater than the number of elements in inlist. The error message is somewhat informative ("Sample larger than population or is negative") but not user friendly. Especially for a user who is not familiar with Python code, the message is not very helpful. The message is potentially misleading because it appears to refer to a potential issue with the random.py script in the default environment (arcgispro-py3). For a typical ArcGIS Pro user without scripting experience, such messages are confusing and frustrating.

To make the tool more robust, a check can be added to the script to compare the number of features to be selected with the number of input features. The number of input features is determined using the Get Count tool. This additional code should follow the block of code in which the tool parameters are received, as follows:

fcount = arcpy.GetCount_management(inputfc)[0]
if outcount > int(fcount):

When the number of features to be selected exceeds the number of input features, the AddError() function is used to return a custom message:

    arcpy.AddError("The number of features to be selected "
                   "exceeds the number of input features.")

When this error happens, the script should end. The script can be ended using the following code:

    sys.exit(1)

Exit code 1 means there was a problem, and that is why the script ended. Exit code 0 is used when the script ends without any problems. This line of code also requires adding import sys at the top of the script.
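A detail worth noting in the AddError() call: Python joins adjacent string literals with nothing in between, so every fragment except the last needs a trailing space, or the words run together in the final message. A quick stand-alone illustration with the same two fragments:

```python
# Adjacent string literals are concatenated exactly as written,
# so a missing trailing space runs the words together.
without_space = ("The number of features to be selected"
                 "exceeds the number of input features.")
with_space = ("The number of features to be selected "
              "exceeds the number of input features.")

print("selectedexceeds" in without_space)  # prints: True
print(with_space)  # prints the correctly spaced message
```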
When the number of features to be selected does not exceed the number of input features, the script should continue as usual, as follows:

else:
    <rest of script>

The updated script is now as shown in the figure. When these changes are made to the script, the tool fails when the number of features to be selected exceeds the number of input features. The error message provides specific feedback to the user about why the tool failed. Even though the tool fails, the feedback to the user is more specific and does not bring up potentially misleading messages related to the Python code.

Different scenarios may not result in an error but warrant a warning or another type of message. For example, in the case of the Random Sample tool, what if the number of features to be selected is the same as the number of input features? This would not be a reason for the tool to fail, but the tool would have effectively copied the input features without making any changes. The AddWarning() function can be used to report a warning message:

if outcount == int(fcount):
    arcpy.AddWarning("The number of features to be selected "
                     "is the same as the number of input features. "
                     "This means the tool created a copy of the "
                     "input features without creating a sample.")

This code is added to the end of the script, but inside the else block. A warning message does not prevent the tool from finishing, but the tool dialog box reports that the tool completed with warnings, as shown in the figure. The View Details link brings up the custom message, as shown in the figure.

Another level of control can be accomplished using the AddIDMessage() function. This function makes it possible to use system messages within a script tool. The syntax of the function is as follows:

AddIDMessage(message_type, message_ID, {add_argument1}, {add_argument2})

The message type can be set to ERROR, INFORMATIVE, or WARNING. The message ID number indicates the specific Esri system message.
Depending on the message, additional arguments may be necessary. In the following example code, an error message with the message ID number 12 is produced if the output feature class already exists:

import arcpy
infc = arcpy.GetParameterAsText(0)
outfc = arcpy.GetParameterAsText(1)
if arcpy.Exists(outfc):
    arcpy.AddIDMessage("ERROR", 12, outfc)
else:
    arcpy.CopyFeatures_management(infc, outfc)

The syntax of error message 12 is

000012: <value> already exists

This message has one argument, which in this case is the name of a feature class. There are more than one thousand message codes, and there is no single list of all of them in the ArcGIS Pro help pages. You can enter a message code in the help pages to view its description, but you cannot browse through and search the message descriptions. In Python, you can use the arcpy.GetIDMessage() function to get the description associated with a specific message ID, as follows:

import arcpy
m_id = 12
print(arcpy.GetIDMessage(m_id))

The result is

%s already exists

The %s symbol is a placeholder for the name of the file. To make this information more useful, you can build a dictionary of all the message IDs and their associated string values, as follows:

import arcpy
dict_id = {}
for k in range(1000000):
    v = arcpy.GetIDMessage(k)
    if v:
        dict_id[k] = v

The error codes consist of six digits, from 1 through 999999, which is why the argument of the range() function is set to 1000000. Not all possible numbers are valid error codes, which is why a message is added to the dictionary only if it has a value. Once the dictionary is created, you can search through it for system messages of interest. For example, the following code prints all system messages that have JSON as part of the string:

for k, v in dict_id.items():
    if "JSON" in v:
        print(k, v)

The result prints as follows:

1303 Invalid JSON in WebMap.
1451 Unable to parse service configuration JSON.
2092 Failed to export diagram layer definition to JSON.
...
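The %s placeholder comes from Python's printf-style string formatting: applying the % operator to the template fills in the name. A quick stand-alone check, with the template string copied from the GetIDMessage() result above:

```python
template = "%s already exists"  # as returned by arcpy.GetIDMessage(12)
message = template % "random"   # substitute a feature class name
print(message)  # prints: random already exists
```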
Working with system messages in your script tool provides additional integration of your tool with the geoprocessing framework of ArcGIS Pro, although it can be cumbersome to sift through the specific message codes.

3.10 Handling messages for stand-alone scripts and tools

Python scripts can be run as stand-alone scripts or as tools. Messaging works a bit differently for each one, but a script can be designed to handle both scenarios. For a stand-alone script, there is no tool dialog box or geoprocessing history in which to view messages, so messages must be printed to the interactive interpreter. For a tool, functions such as AddError() are used instead of printing messages, to ensure that messages appear in the geoprocessing environment, including the geoprocessing history. Standard practice is to write a message-handling routine that writes messages to both the interactive interpreter and the geoprocessing environment, using the print() function for the former and ArcPy functions such as AddError(), AddWarning(), and AddMessage() for the latter.

3.11 Customizing tool progress information

When a tool runs, information on its progress can take several forms. The appearance of the progress dialog box can be controlled using the ArcPy progressor, or progress indicator, functions. The ArcPy progressor functions include the following:

SetProgressor()—sets the type of progressor
SetProgressorLabel()—changes the label of the progressor
SetProgressorPosition()—moves the step progressor by an increment
ResetProgressor()—resets the progressor

There are two types of progressors: default and step. With the default type, the progressor moves back and forth continuously but doesn't provide a clear indication of how much progress is being made. The label above the progressor provides information on the current geoprocessing operation. With the step progressor, the percentage completed is shown. This information can be useful when processing large datasets. The type of progressor is set using the SetProgressor() function.
This function establishes a progressor object, which allows progress information to be passed to the tool dialog box. The appearance of the tool dialog box can be controlled using either the default progressor or the step progressor. The syntax of this function is as follows:

SetProgressor(type, {message}, {min_range}, {max_range}, {step_value})

The progressor type is either default or step. The message is the progressor label that appears at the beginning of the tool execution. The three remaining parameters are for step progressors only and indicate the start value, end value, and step interval. In a typical step progressor, the start value is set to 0, the end value to however many steps must be completed in the geoprocessing operations, and the step interval to 1.

The SetProgressorLabel() function is used to update the label of the progressor, which is typically a unique string specific to each step. The SetProgressorPosition() function is used to move the step progressor by an increment on the basis of the percentage of steps completed. These functions are commonly used in combination so that the label is updated at every increment. Once tool execution is completed, the progressor can be reset to its original position using the ResetProgressor() function. This function is needed only if there is a second series of steps to complete for which the progress should be shown separately; there is no need to reset the position of the progressor when a tool is completed.

The following example script illustrates the use of a step progressor. The script is associated with a script tool that copies all the shapefiles from one workspace to a geodatabase. A step progressor is used, and the number of steps is derived from the number of feature classes in the list. In the for loop, the label is changed to the name of the shapefile being copied, and after the shapefile is copied, the step progressor is moved by an increment.
The script is as follows:

import arcpy
import os
arcpy.env.workspace = arcpy.GetParameterAsText(0)
outworkspace = arcpy.GetParameterAsText(1)
fclist = arcpy.ListFeatureClasses()
fcount = len(fclist)
arcpy.SetProgressor("step", "Copying shapefiles to geodatabase...",
                    0, fcount, 1)
for fc in fclist:
    arcpy.SetProgressorLabel("Copying " + fc + "...")
    fcdesc = arcpy.Describe(fc)
    outfc = os.path.join(outworkspace, fcdesc.baseName)
    arcpy.CopyFeatures_management(fc, outfc)
    arcpy.SetProgressorPosition()

Running the script brings up a step progressor that shows the percentage completed. This percentage is calculated from the step progressor parameters—that is, the steps are automatically converted to a percentage as they are completed.

An important consideration is the number of steps being used in a step progressor. In many scripts, it is not known in advance how many features, feature classes, fields, or records must be processed. A script that uses a search cursor, for example, may iterate over millions of records. If each iteration were one step, the progress dialog box would need to be updated millions of times, which could severely reduce performance. It may therefore be necessary to include a section in the script that determines the number of iterations (features, feature classes, rows, or whatever the case may be) and then determines an appropriate number of steps on the basis of that number. Code examples for determining the number of steps are provided in the ArcGIS Pro help topic "Controlling a Script Tool's Progressor."

Points to remember

Although Python scripts can be run as stand-alone scripts outside ArcGIS Pro, there are many benefits to creating custom tools within ArcGIS Pro. Tools allow a closer integration of scripts in the ArcGIS Pro geoprocessing framework. Tools also make it easier to share the workflows with others who may not have experience using Python.
There are two ways to develop tools for use in ArcGIS Pro using Python: script tools and Python toolboxes. Script tools are created using elements of the ArcGIS Pro user interface, whereas Python toolboxes are created entirely in Python.

A script tool can be created in a toolbox (.tbx) and references a single Python script file (.py) that is called when the tool is run.

For tools to be usable and effective, script tool parameters must be created. Creating tool parameters includes setting parameters in the script tool properties, as well as including code in the script to receive the parameter values. Script tool parameters define what the tool dialog box looks like. Effective tools have carefully designed parameters.

Each parameter has several properties, including a data type, such as feature class, table, value, field, or other. The parameter properties provide detailed control of the allowable inputs for each parameter. This control ensures that the parameters passed from the script tool dialog box to the script are as expected.

All script tools should have outputs so that the tool can be used in ModelBuilder and other workflows. Sometimes the only way to achieve outputs is to use derived parameters, which do not appear on the tool dialog box.

Tool behavior can be further customized using a ToolValidator class.

Various message functions can be used to write messages that appear in the tool dialog box and in the geoprocessing history.

The appearance of the progressor also can be modified. This progress indicator is particularly relevant if the tool is likely to carry out many iterations.

Key terms

absolute path
comments
custom behavior
dependency
derived parameter
hard-coded value
progressor
Python toolbox
relative path
scalar
script tool
stand-alone script
tool dialog box

Review questions

What are some of the benefits of using custom tools compared with using stand-alone Python scripts?
Describe the steps to create a script tool.
What are some of the critical tool parameters to be aware of during the creation of a script tool?
What changes must be made to the code of a stand-alone script for it to be used as part of a script tool?

Chapter 4
Python toolboxes

4.1 Introduction

This chapter describes how to create a Python toolbox. Python toolboxes provide an alternative to creating a Python script tool. A Python toolbox can contain one or more tools. From a user perspective, tools inside a Python toolbox work just like regular geoprocessing tools. Many of the benefits of script tools also apply to tools inside a Python toolbox. The biggest difference from a developer perspective is that a Python toolbox is written entirely in Python. This characteristic makes Python toolboxes a preferred approach for those with more experience in developing tools and writing Python scripts. This chapter describes the steps to create tools inside a Python toolbox, including how to define the parameters.

4.2 Creating and editing a Python toolbox

A Python toolbox is a Python file with a .pyt extension. The use of the .pyt file extension means that ArcGIS Pro automatically recognizes the file as a Python toolbox. One Python toolbox defines one or more tools. This means that if your Python toolbox contains multiple tools, they are all part of the same Python code in a single .pyt file.

To create a new Python toolbox in ArcGIS Pro, right-click on the folder in Catalog where you want to create it, and click New > Python Toolbox. This step creates a new Python toolbox with a default name. The symbol for a Python toolbox is like that for a regular toolbox in ArcGIS Pro but has a small script icon showing in front. A new tool has also been added with a default name. You also can create a new Python toolbox by right-clicking on the Toolboxes folder and clicking New Python Toolbox. You can rename the Python toolbox in the Catalog pane.
A default tool named Tool has been added, but this default tool is just a temporary placeholder. Double-clicking on the tool brings up the tool dialog box, but the tool has no parameters at this point.

When a new Python toolbox is created, a basic template is generated by ArcGIS Pro. This is the reason why you typically should create a new Python toolbox from within ArcGIS Pro instead of starting with a new empty script file in your Python editor. Alternatively, you can copy the code from the template to your own script file. To view the template, and to start modifying the code, right-click on the Python toolbox, and click Edit. This step brings up the contents of the .pyt file in the default script editor configured in ArcGIS Pro. Note that you are editing the Python toolbox itself and all the tools that are part of the toolbox—not the code for an individual tool. Recall that when creating Python script tools, each script tool has its own associated script, and therefore you edit the code for each tool separately, not for the custom toolbox. In a Python toolbox, the code for all the tools resides in a single .pyt file.

If no script editor is specified under Geoprocessing Options, the default application for .py files in your operating system is used. This default depends on how you have configured your applications and file associations. Therefore, it could be Notepad, IDLE, PyCharm, or something else. Your operating system typically does not have a default application configured for .pyt files, but when you edit a Python toolbox from within ArcGIS Pro, it uses the default application associated with .py files. To specify a script editor to be used from within ArcGIS Pro, enter the path for the script editor under Geoprocessing Options. The typical path for IDLE for the default environment is C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts\idle.exe.

The template for a new Python toolbox is shown in the figure.
The details for working with this template are covered later in this chapter. At this point, it is important to recognize that the code resides in a .pyt file, not a regular script file with a .py extension. A .pyt file is a text file, just like a .py file, and therefore it can be opened in a regular text editor. In a Python editor, however, files with a .pyt extension are not always recognized as a Python file type. Although IDLE recognizes a .pyt file as Python code, many other IDEs do not, including Spyder and PyCharm. When an IDE does not recognize the contents of a .pyt file as regular Python code, there is no syntax highlighting or other functionality to work with the code. Notice the lack of syntax highlighting of a .pyt file open in Spyder, as shown in the figure.

You could temporarily save your .pyt file as a .py file to overcome this limitation, but to test the Python toolbox in ArcGIS you would have to change it back to .pyt, which is cumbersome. Some Python editors, however, allow you to associate .pyt files with the Python file type. In PyCharm, with the .pyt file open, click File > Associate with File Types. In the Register New File Types Association dialog box, make sure the file pattern is set to .pyt, and choose Python for the option to open matching files in PyCharm. Once you click OK, the .pyt file is recognized as Python code, and you can use some of the functionality of PyCharm, including syntax highlighting and error checking. In PyCharm, you also can review and modify the associated file types by clicking File > Settings > Editor > File Types.

Note: The typical file extension for Python scripts is .py. This extension is associated with the python.exe program, which opens a terminal window when run. In Windows, the extension can also be .pyw. This extension is associated with the pythonw.exe program, which suppresses the terminal window. When scripts are run from an application with a GUI, you don't want a separate terminal window to open.
Regardless of the IDE, keep in mind that you cannot run a .pyt file from a Python editor. You can test the code only by using the tools inside the Python toolbox from within ArcGIS Pro.

Time for a closer look at the template code. The Python toolbox code includes a Python class named Toolbox that defines the characteristics of the toolbox. This part of the code is as follows:

class Toolbox(object):
    def __init__(self):
        """Define the toolbox (the name of the toolbox is the name of
        the .pyt file)."""
        self.label = "Toolbox"
        self.alias = ""

        # List of tool classes associated with this toolbox
        self.tools = [Tool]

Note: This class always should be called Toolbox. If you change this name, the Python toolbox will not be recognized correctly.

The name of the Python toolbox is the name of the .pyt file. There is a comment to this effect in the code to remind you of this rule, and you cannot change the name of the Python toolbox in the code itself. The Toolbox class contains one method, called __init__(), which defines the properties of the Python toolbox. These properties include an alias and a label. An alias can consist of only letters and numbers, with no spaces or other characters.

For the purpose of this chapter, the same Random Sample tool example from chapter 3 is used for illustration. Typically, you would not develop both a script tool and a Python toolbox for the same tool, but using the same example script facilitates a comparison. The first part of the Toolbox class is as follows:

class Toolbox(object):
    def __init__(self):
        self.label = "Random Sampling Tools"
        self.alias = "randomsampling"

Note: Comments and empty lines in the template are removed for the most part in the code examples here for legibility purposes.

As you modify the .pyt file, you can check in ArcGIS Pro that the changes are taking effect. Make sure to save your .pyt file in your Python editor, and refresh your Catalog view.
For example, after the previous changes are made, right-click on the Python toolbox, and click Properties. You cannot make changes to these properties directly in ArcGIS Pro because changes are made only by editing the .pyt file. There is no need to close the .pyt file in your Python editor, but be sure to save your edits before you check the results in ArcGIS Pro.

Note: To see the effects of your code changes, you must refresh both the folder where the Python toolbox resides and the Python toolbox itself. You refresh in Catalog view by right-clicking on the folder or Python toolbox and clicking Refresh.

If your code contains an error—for example, an invalid character in the alias—you will see the results in the toolbox properties, as shown in the figure. Many other errors are not as easily identified.

The next property of the Toolbox class is critical. The template code is

self.tools = [Tool]

The self.tools property consists of a list containing all the tools defined in the toolbox. The template contains only one tool, which is called Tool by default. Each tool is created as a class in the .pyt file, the first portion of which reads as follows:

class Tool(object):
    def __init__(self):
        """Define the tool (tool name is the name of the class)."""
        self.label = "Tool"
        self.description = ""
        self.canRunInBackground = False

The name of the class must correspond to the name of the tool used in the self.tools property of the Toolbox class. Naming of these classes follows the regular naming conventions for classes in Python—i.e., CapWords, with no spaces. The tool class contains a method called __init__(), which defines the properties of the tool. These properties include self.label and self.description. The name of the tool is established by the name of the class itself. The label is what appears in ArcGIS Pro as the display name of the tool, whereas the name of the tool is what allows you to call the tool from another script or tool.
The __init__() method has several other properties. The self.canRunInBackground property is a Boolean property to specify background processing of the tool in ArcGIS Desktop 10.x and has no effect in ArcGIS Pro. The self.category property makes it possible to organize tools into toolsets within a Python toolbox. The self.stylesheet property allows you to change the stylesheet, but the default suffices in most cases. When setting the properties for a tool, you include only those properties that are given a value. The others can be left out. For the Random Sample tool, the code so far is as follows: class Toolbox(object): def __init__(self): self.label = "Random Sampling Tools" self.alias = "randomsampling" self.tools = [RandomSample] class RandomSample(object): def __init__(self): self.label = "Random Sample" This code updates the display of the tool in ArcGIS Pro, as shown in the figure. The tool class contains several other methods in addition to the __init__() method. Even though they are all listed in the template, not all of them are required. The following briefly describes each of these methods and states whether they are required or not. Later sections in this chapter examine these methods in more detail. getParameterInfo()—optional. Defines the tool parameters, similar to the Parameters panel for a script tool. isLicensed()—optional. Allows you to set whether the tool is licensed to execute. updateParameters()—optional. Used for internal validation of tool parameters. updateMessages()—optional. Used for messages created by validation of tool parameters. execute()—required. This is the source code of the tool in which the actual task is being carried out, like the script file in a script tool. To create a functional tool inside a Python toolbox requires the following three methods: (1) __init__(), to define the properties of the tool; (2) getParameterInfo(), to define the tool parameters; and (3) execute(), to carry out the actual task of the tool. 
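Because a tool in a .pyt file is an ordinary Python class, the three required methods can be sketched and even instantiated outside ArcGIS. The skeleton below is an illustration only; the method bodies are placeholders, not working tool code:

```python
class RandomSample(object):
    """Skeleton of a Python toolbox tool class (illustration only)."""

    def __init__(self):
        # Properties of the tool
        self.label = "Random Sample"
        self.description = ""

    def getParameterInfo(self):
        # Tool parameters would be defined here and returned as a list
        return []

    def execute(self, parameters, messages):
        # The source code of the tool goes here
        return

tool = RandomSample()
print(tool.label)  # Random Sample
```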
Technically speaking, a tool can run without the getParameterInfo() method (i.e., it is optional), but without this method the tool dialog box has no parameters, which is not very meaningful. The next sections provide details on how to set up the getParameterInfo() and execute() methods. The other methods provide additional functionality but are not required for a tool to work. The example so far uses only one tool. A single Python toolbox can contain multiple tools. To create multiple tools, the self.tools property is assigned a list of tools, and the tool class is repeated for each tool. The basic structure is as follows: class Toolbox(object): def __init__(self): self.label = "My Cool Tools" self.alias = "mycooltools" self.tools = [CoolTool1, CoolTool2] class CoolTool1(object): def __init__(self): self.label = "Cool Tool 1" def getParameterInfo(self): # Parameter definition def execute(self, parameters, messages): # Source code class CoolTool2(object): def __init__(self): self.label = "Cool Tool 2" def getParameterInfo(self): # Parameter definition def execute(self, parameters, messages): # Source code This structure illustrates another key difference from script tools. In a Python toolbox, the code for all tools resides inside the same .pyt file, whereas in a custom toolbox with multiple script tools, each script tool is associated with its own Python script file. 4.3 Defining a tool and tool parameters Tools in a Python toolbox must have parameters to be useful, as with script tools. In a Python toolbox, tool parameters are defined using the getParameterInfo() method. Each parameter is created as a Parameter object. The syntax for the Parameter class is as follows: Parameter({name}, {displayName}, {direction}, {datatype}, {parameterType}, {enabled}, {category}, {symbology}, {multiValue}) Each parameter of this class corresponds to a property of the tool parameter. 
Notice that none of the class parameters is required, but logically several of them are needed to create a meaningful tool parameter. All these class parameters are strings except for enabled and multiValue, which are Booleans. Because a typical tool parameter may require several of these class parameters and their values (often strings) may be long, an alternative notation typically is used in which each class parameter is on a separate line. A generalized example for a single-input tool parameter looks something like the following: def getParameterInfo(self): param0 = arcpy.Parameter( displayName="Input Features", name="in_features", datatype="GPFeatureLayer", parameterType="Required", direction="Input") Because all class parameters are named, there is no need to use the same order as in the syntax, and there is no need to include class parameters if they are not being used. The preceding notation relies on implicit line continuation. Because the class parameters are surrounded by a pair of parentheses, all the lines following the opening parenthesis are read as a single line of code until the closing parenthesis. Effectively, the preceding code is the same as the following: def getParameterInfo(self): param0 = arcpy.Parameter(displayName="Input Features", name="in_features", datatype="GPFeatureLayer", parameterType="Required", direction="Input") Breaking down the syntax using a new line for each class parameter makes the code easier to read and edit, but this style is not required. Almost all published examples of Python toolboxes, however, use this style. The Parameter class is used to create a Parameter object, and this object is assigned to a variable. Variables can be called anything (within the rules for variable names), but many examples use the names param0, param1, param2, and so on. The numbering starts at zero because this system facilitates the use of index numbers, as becomes clear later in this section. 
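Implicit line continuation is a general Python feature, not something specific to ArcPy. A minimal illustration with an ordinary function call, where make_param is a hypothetical stand-in for arcpy.Parameter used only to show the call style:

```python
def make_param(displayName, name, datatype, parameterType, direction):
    # Hypothetical stand-in for arcpy.Parameter (illustration only)
    return dict(displayName=displayName, name=name, datatype=datatype,
                parameterType=parameterType, direction=direction)

# Inside parentheses, Python keeps reading until the closing
# parenthesis, so this multiline call is a single statement.
param0 = make_param(
    displayName="Input Features",
    name="in_features",
    datatype="GPFeatureLayer",
    parameterType="Required",
    direction="Input")

# Identical to writing the whole call on one line:
same = make_param(displayName="Input Features", name="in_features",
                  datatype="GPFeatureLayer", parameterType="Required",
                  direction="Input")
print(param0 == same)  # True
```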
A commonly used alternative is to use a variable that is similar to or the same as the name property of the parameter. The variable naming style you use is a matter of preference. Once the parameter objects are created, additional properties are defined. For example, recall the use of the Filter property when creating a script tool. When applied to a feature layer, the Filter property makes it possible to set the allowable feature types—e.g., point, polyline, or polygon. A similar approach can be used when defining properties of tool parameters in a Python toolbox. Filters are set using the filter property of the Parameter class. This property uses the Filter class in ArcPy, which allows you to use the same filters used when creating a script tool. The two options for filters are list and type. For example, the following code sets a filter on the tool parameter so that only polylines can be chosen: param0.filter.value = ["Polyline"] In summary, a Parameter object is created by providing values for some of the class parameters that are part of the syntax for using the class. Additional properties can be assigned a value once the object is created using properties of the object. Those properties are not assigned a value when the object is first created. In addition to filter, some of these properties include displayOrder (the order in which to display the parameter in the tool dialog box), parameterDependencies (to indicate dependencies between parameters), and value (the value of the parameter). In addition to properties, Parameter objects also have several methods, which are mostly used for messages. 
Once tool parameters are defined, the final step in the getParameterInfo() method is to return the parameters as a list, as follows: parameters = [param0, param1, …] return parameters Returning to the example of the Random Sample tool, here is what the first parameter looks like: def getParameterInfo(self): input_features = arcpy.Parameter( name="input_features", displayName="Input Features", datatype="GPFeatureLayer", parameterType="Required", direction="Input") parameters = [input_features] return parameters Note: Defining the parameter properties is not enough for the parameter to show in the tool dialog box. The parameters also must be returned, so be sure to include the last two lines of code to test the parameter definitions. This is just the first parameter, but it is useful to confirm that the code so far is working. Return to ArcGIS Pro, and double-click on the Random Sample tool to bring up the tool dialog box. You also can check the parameter properties by opening the tool properties. Right-click on the Random Sample tool in the Python toolbox, and click Properties. Click on the Parameters panel to bring up the parameter properties. The results look identical to those obtained when creating a script tool. The difference is that all the properties are created using Python code instead of using the Tool Properties dialog box. You cannot make any changes to these properties from within the dialog box because the Python toolbox file is read-only from within ArcGIS Pro. The second parameter of the Random Sample tool is the output feature layer or feature class. The parameter definition is as follows: output_features = arcpy.Parameter( name="output_features", displayName="Output Features", datatype="GPFeatureLayer", parameterType="Required", direction="Output") The third parameter of the Random Sample tool is the number of features to be chosen. 
The parameter definition is as follows: no_of_features = arcpy.Parameter( name="number_of_features", displayName="Number of Features", datatype="GPLong", parameterType="Required", direction="Input") For the tool parameters to be recognized, their values must be returned. Update the last two lines of code as follows: parameters = [input_features, output_features, no_of_features] return parameters The tool dialog box now starts to look like a finished tool, as shown in the figure. You may recall from chapter 3 that one parameter property is still missing. The number of features must be a positive integer, which requires the use of a filter. Filters are accessed using the filter property of the Parameter object. The data type of the parameter constrains which filters are possible. For a long integer, both Range and ValueList are valid filter types, so you must first set the type, followed by the values to be used for this type. The code is as follows: no_of_features.filter.type = "Range" no_of_features.filter.list = [1, 1000000000] You can check the properties of the tool parameters to confirm that a range filter is applied, although you won’t be able to review the values of the filter. The complete code for the Python toolbox so far follows. Notice how the code for the filter property comes after the line of code in which the parameter for the number of features is created and has the same indentation. 
import arcpy class Toolbox(object): def __init__(self): self.label = "Random Sampling Tools" self.alias = "randomsampling" self.tools = [RandomSample] class RandomSample(object): def __init__(self): self.label = "Random Sample" def getParameterInfo(self): input_features = arcpy.Parameter( name="input_features", displayName="Input Features", datatype="GPFeatureLayer", parameterType="Required", direction="Input") output_features = arcpy.Parameter( name="output_features", displayName="Output Features", datatype="GPFeatureLayer", parameterType="Required", direction="Output") no_of_features = arcpy.Parameter( name="number_of_features", displayName="Number of Features", datatype="GPLong", parameterType="Required", direction="Input") no_of_features.filter.type = "Range" no_of_features.filter.list = [1, 1000000000] parameters = [input_features, output_features, no_of_features] return parameters The code also is provided as a figure to serve as a reference for the proper code organization and indentation. The code in the figure includes empty lines to improve legibility. One critical aspect of creating tool parameters has been overlooked so far. Notice how the datatype properties are set to GPFeatureLayer and GPLong. Although it is intuitive what these data types mean (i.e., a feature layer and a long integer, respectively), their names are different from those used in the list of options for the Data Type property when creating a script tool. This disparity can lead to some confusion. When creating a script tool, you can scroll through the list until you find the data type of interest without having to worry about the exact name of the type. When defining the data type property for a tool parameter in a Python toolbox, you must type the correct value for the data type. There are more than 150 different data types. 
Table 4.1 shows a small sample of some of the most commonly used data types. The first column shows the values that appear in the drop-down list for the data type when creating a script tool, whereas the second column shows the values that should be used when specifying the datatype property for a tool parameter in a Python toolbox.

Table 4.1. Parameter data types in a Python toolbox

Data type | Keyword for datatype property | Description
Address Locator | DEAddressLocator | A dataset used for geocoding that stores the address attributes, associated indexes, and rules that define the process for translating nonspatial descriptions of places to spatial data.
Areal Unit | GPArealUnit | An areal unit type and value, such as square meter or acre.
Boolean | GPBoolean | A Boolean value.
Coordinate System | GPCoordinateSystem | A reference framework, such as the UTM system, consisting of a set of points, lines, and/or surfaces, and a set of rules used to define the positions of points in two- and three-dimensional space.
Dataset | DEDatasetType | A collection of related data, usually grouped or stored together.
Date | GPDate | A date value.
Double | GPDouble | Any floating-point number stored as a double-precision, 64-bit value.
Feature Class | DEFeatureClass | A collection of spatial data with the same shape type: point, multipoint, polyline, and polygon.
Feature Dataset | DEFeatureDataset | A collection of feature classes that share a common geographic area and the same spatial reference system.
Feature Layer | GPFeatureLayer | A reference to a feature class, including symbology and rendering properties.
Field | Field | A column in a table that stores the values for a single attribute.
File | DEFile | A file on disk.
Folder | DEFolder | Specifies a location on disk where data is stored.
Layer | GPLayer | A reference to a data source, such as a shapefile, coverage, geodatabase feature class, or raster, including symbology and rendering properties.
Layer File | DELayer | A layer file stores a layer definition, including symbology and rendering properties.
Linear Unit | GPLinearUnit | A linear unit type and value, such as meters or feet.
Long | GPLong | An integer number value.
Map | GPMap | An ArcGIS Pro map.
Raster Band | DERasterBand | A layer in a raster dataset.
Raster Data Layer | GPRasterDataLayer | A raster data layer.
Raster Dataset | DERasterDataset | A single dataset built from one or more rasters.
Raster Layer | GPRasterLayer | A reference to a raster, including symbology and rendering properties.
Shapefile | DEShapefile | Spatial data in a shapefile format.
Spatial Reference | GPSpatialReference | The coordinate system used to store a spatial dataset, including the spatial domain.
SQL Expression | GPSQLExpression | A syntax for defining and manipulating data from a relational database.
String | GPString | A text value.
Table | DETable | Tabular data.
Table View | GPTableView | A representation of tabular data for viewing and editing purposes, stored in memory or on disk.
Text File | DETextfile | A text file.
Value Table | GPValueTable | A collection of columns of values.
Workspace | DEWorkspace | A container, such as a geodatabase or folder.

Most of the datatype properties are similar to the terms in the drop-down list for the data type when creating a script tool, but without the spaces and preceded by the prefix DE (data element) or GP (geoprocessing). The complete list can be found on the ArcGIS Pro help page “Defining Parameter Data Types in a Python Toolbox.” 4.4 Working with source code The steps completed so far created the tool dialog box, but the tool is not ready to run. The code to carry out the task must still be added. 
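Because the correct keyword must be typed exactly, it can help to keep a small lookup close at hand. The mapping below uses entries taken from table 4.1; it is an illustration, not a complete list of the 150+ data types:

```python
# A few display-name -> keyword pairs from table 4.1 (not a complete list)
DATATYPE_KEYWORDS = {
    "Feature Layer": "GPFeatureLayer",
    "Feature Class": "DEFeatureClass",
    "Long": "GPLong",
    "String": "GPString",
    "Raster Dataset": "DERasterDataset",
    "Workspace": "DEWorkspace",
}

def keyword_for(display_name):
    # Look up the datatype keyword for a script tool display name
    return DATATYPE_KEYWORDS[display_name]

print(keyword_for("Feature Layer"))  # GPFeatureLayer
print(keyword_for("Long"))           # GPLong
```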
This code is referred to as the source code or the main body of the tool—the rest of the code is used to define the tool and its parameters, and to customize tool behavior. Recall that in the case of a Python script tool, the script must be modified using the GetParameterAsText() and GetParameter() functions to receive the values of the parameters entered in the tool dialog box by a user. The source code for a tool inside a Python toolbox is found in the execute() method. This method has arguments to work with parameters and messages: def execute(self, parameters, messages): The argument parameters refers to the list of parameters defined in the getParameterInfo() method. The value of each parameter is obtained from this list using the valueAsText or value property of the Parameter object. The following code illustrates a generic example with two parameters: def execute(self, parameters, messages): in_fc = parameters[0].valueAsText out_fc = parameters[1].valueAsText The variable names do not have to be the same as the names assigned to the parameters in the getParameterInfo() method. Consider how this works for the example Random Sample tool. Recall the original stand-alone script discussed in chapter 3 that carries out the tasks of interest, as shown in the figure. This entire script represents the source code and is copied into the execute() method. The hard-coded values for the variables must be modified. Specifically, the following lines of code must be modified so the values are obtained using the valueAsText and value properties: inputfc = "C:/Random/Data.gdb/points" outputfc = "C:/Random/Data.gdb/random" outcount = 5 The modified code is as follows: inputfc = parameters[0].valueAsText outputfc = parameters[1].valueAsText outcount = parameters[2].value The first two parameter values are received as strings using valueAsText, and the third parameter is received as an integer using value. 
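The pattern of reading values by index position can be mimicked without ArcGIS by using a small stand-in class. FakeParameter below is hypothetical and exists only to illustrate how value and valueAsText differ:

```python
class FakeParameter:
    # Hypothetical stand-in for arcpy's Parameter object at run time
    def __init__(self, value):
        self.value = value
        self.valueAsText = str(value)

# Index positions match the order of the list returned by getParameterInfo()
parameters = [FakeParameter("C:/Random/Data.gdb/points"),
              FakeParameter("C:/Random/Data.gdb/random"),
              FakeParameter(5)]

inputfc = parameters[0].valueAsText   # a string
outcount = parameters[2].value        # an integer
print(inputfc)   # C:/Random/Data.gdb/points
print(outcount)  # 5
```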
A few elements from the original script can be placed at the top of the Python toolbox file, including importing modules and setting environment properties. The top of the .pyt file now is as follows: import arcpy import random The rest of the original script, including the modifications already discussed, becomes the source code for the tool in the execute() method. After removing comments and empty lines for display purposes, this part of the code is as follows: def execute(self, parameters, messages): inputfc = parameters[0].valueAsText outputfc = parameters[1].valueAsText outcount = parameters[2].value inlist = [] with arcpy.da.SearchCursor(inputfc, "OID@") as cursor: for row in cursor: id = row[0] inlist.append(id) randomlist = random.sample(inlist, outcount) desc = arcpy.da.Describe(inputfc) fldname = desc["OIDFieldName"] sqlfield = arcpy.AddFieldDelimiters(inputfc, fldname) sqlexp = f"{sqlfield} IN {tuple(randomlist)}" arcpy.Select_analysis(inputfc, outputfc, sqlexp) There is no need for a return at the end of the method because the result is implicit in the output parameter—i.e., the tool saves the results to a new feature class and returns it as a feature layer in the active map. Some tools may need to specify a value to be returned if they do not save anything to disk or return an object for use in the active map. At this point, the tool is complete and ready to use. You can test the tool in ArcGIS Pro to confirm the tool dialog box works as expected and that the outputs are correct. The tool dialog box looks like any other geoprocessing tool. Tool execution creates an entry in the geoprocessing history with the associated messages. The message refers to running a script, which is identical to the messages reported when running a script tool. Keep in mind, however, that the “script” being run is not a separate .py file, but the code inside the execute() method of this tool in the Python toolbox file. 
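Because random.sample and the f-string are plain Python, the heart of this execute() method can be exercised outside ArcGIS. In the sketch below, the ID list and the field name OBJECTID are stand-ins for the values produced by the SearchCursor and Describe:

```python
import random

inlist = list(range(1, 101))  # stand-in for the OIDs collected by the cursor
outcount = 5

# random.sample draws unique values, so no feature is selected twice
randomlist = random.sample(inlist, outcount)

# Same construction as in the tool; "OBJECTID" stands in for desc["OIDFieldName"]
sqlexp = f"OBJECTID IN {tuple(randomlist)}"
print(sqlexp)
```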
From a user perspective, however, the experience of using the tool is the same. The Tool class includes several other methods that have not been used yet, including isLicensed(), updateParameters(), and updateMessages(). All these methods are optional, which means you can leave them in the template unmodified, or you can remove them from the code entirely. Of these three, the updateParameters() method is the most important because it provides additional control over the behavior of the tool parameters and how the parameters interact with each other. Details on how to use this method are explained in more depth in the ArcGIS Pro help topic “Customizing Tool Behavior in a Python Toolbox.” 4.5 Comparing script tools and Python toolboxes Both script tools and Python toolboxes can be used to create custom tools using Python. Which approach you use is largely a matter of preference. It is important, however, to be aware of some of the similarities and differences. First, both approaches result in tools that are integrated into the geoprocessing framework. The tool dialog boxes work just like standard tools, and the tools can be used in scripts and models. Second, in terms of organization, a script tool is part of a custom toolbox (.tbx), and each script tool has an associated Python script file (.py). The design of the tool dialog boxes is accomplished using interface elements of ArcGIS Pro, and this information is stored in the .tbx file. In contrast, a single Python toolbox can contain several tools, but all the code is stored in a single .pyt file. In addition, the design of the tool dialog boxes is coded entirely in Python. Third, both script files and Python toolboxes can be edited using a Python editor, but .pyt files are not recognized by default as Python files, and therefore require custom configuration of your IDE. This difference also impacts debugging procedures because IDEs can only debug .py files. 
Fourth, the code associated with both script tools and Python toolboxes can be secured using a password. Fifth, both script tools and Python toolboxes are documented in a similar manner. Chapter 5 discusses this topic. Points to remember Python toolboxes provide an alternative to creating a Python script tool. From a user perspective, tools inside a Python toolbox work just like regular geoprocessing tools. The biggest difference from a developer perspective is that a Python toolbox is written entirely in Python. This feature makes Python toolboxes a preferred approach for those with more experience in developing tools. A Python toolbox is a Python file with a .pyt extension, which is recognized automatically by ArcGIS Pro. A single Python toolbox can contain one or more tools. The name of the Python toolbox in ArcGIS Pro is the name of the .pyt file. The basic template provided with a new Python toolbox is a helpful way to start writing your code. In a Python toolbox, the code for all the tools resides in a single .pyt file. Some Python IDEs, including PyCharm, can be configured to recognize a .pyt file as Python code, which assists with writing proper syntax. However, you cannot run a .pyt file from an IDE, and you can test the code only by using the tools inside the Python toolbox from within ArcGIS Pro. The code for a Python toolbox includes a Toolbox class, which defines the characteristics of the toolbox, including a list of tools. The code also contains a Tool class for each tool, with the name of the class corresponding to the specific tool. Each Tool class includes several methods, including __init__() to define the properties of the tool, getParameterInfo() to set up the tool parameters, and execute() to carry out the actual task of the tool. Setting up tool parameters requires careful consideration, especially of the data types for each parameter. Both script tools and Python toolboxes can be used to create custom tools using Python. 
The approach you use is largely a matter of preference. It is important, however, to be aware of some of the similarities and differences. Key terms implicit line continuation Python toolbox source code Review questions What is a Python toolbox? Describe the steps to create a custom tool using a Python toolbox. Which classes are part of the code of a Python toolbox, and what purpose do they serve? Where in the code of a Python toolbox is the code to carry out the actual task of each tool located? What are some of the similarities and differences between script tools and Python toolboxes? Chapter 5 Sharing tools 5.1 Introduction The ArcGIS Pro geoprocessing framework is designed to facilitate the sharing of tools. Custom toolboxes and Python toolboxes can be added to projects and integrated into regular workflows. Custom toolboxes can contain any number of tools, consisting of both model tools and script tools. Script tools can be shared by distributing a toolbox file (.tbx) and the associated Python scripts (.py). However, there are several obstacles to sharing script tools. One of the principal obstacles is that the resources available to the creator of the script likely will be different from those available to the user of the shared script tools. These resources include projects, datasets, scripts, layer files, and any other files used by the tools. Another obstacle is the organization of these resources on a local computer or network. Paths present a persistent problem when sharing tools. Although sharing Python toolboxes is facilitated by the fact that all code can reside in a single .pyt file, some of these same obstacles apply to Python toolboxes as well. This chapter provides guidelines on how to distribute tools, including how to structure the files that are commonly distributed with shared tools. Alternative approaches to sharing tools are also discussed, including the use of geoprocessing packages and web tools. 
Geoprocessing packages make it relatively easy to share a custom tool by automatically consolidating data, tools, and supporting files into a single file for sharing. 5.2 Choosing a method for distributing tools Tools that are developed to share with others can vary from the simple to the complex. The simplest case is a single custom toolbox file with one or more script tools, or a single Python toolbox file, and no additional files. In a more typical example, a shared tool can consist of a toolbox file with several scripts, or a Python toolbox file, and some documentation. A more complex example contains a toolbox file, several scripts, documentation, layer files for symbology, sample data, and other resources. A recommended folder structure for these files is presented in section 5.4. One of the most common ways to share tools is simply to make all the files available in their original folder structure. This typically involves the use of a file compression utility to create a single ZIP file of the folders and their contents. This ZIP file can then be posted online or emailed. The recipient can download the file and extract the contents to access the individual folders and files. The toolbox is then added to a project to access the tools. There are several other ways to share tools. If users have access to the same local network, the folder containing the tools can be copied to a folder that is accessible to all users. A toolbox can be added directly from the network, and no files need to be copied to the user’s computer. Another option is to publish a tool as a geoprocessing package or web tool, which can then be shared through a local network or ArcGIS Online portal. The method depends largely on the relationship between the creator of the tool and the intended users, as well as the software and the skills of the user. 
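The ZIP file method itself can be scripted with the Python standard library. The sketch below builds a throwaway folder in a temporary directory to stand in for a real Tools folder; shutil.make_archive then consolidates it into a single ZIP file for posting or emailing:

```python
import os
import shutil
import tempfile
import zipfile

# Build a tiny stand-in for a real Tools folder
root = tempfile.mkdtemp()
tools = os.path.join(root, "Tools")
os.makedirs(os.path.join(tools, "Scripts"))
with open(os.path.join(tools, "readme.txt"), "w") as f:
    f.write("How to install and use the tools.\n")

# Consolidate the folder and its contents into Tools.zip for sharing
archive = shutil.make_archive(os.path.join(root, "Tools"),
                              "zip", root_dir=root, base_dir="Tools")
names = zipfile.ZipFile(archive).namelist()
print(os.path.basename(archive))  # Tools.zip
```

The recipient extracts the ZIP file and adds the toolbox to a project, as described above.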
For example, if tools are developed primarily for use by others within the same organization, making tools available on a local network may be the most efficient method. To make tools available to a broad community of users, the use of a ZIP file is likely the most convenient. Several other considerations influence how to share tools, including where the input and output data are located and what licensed products and extensions the tools require. In the ZIP file method, for example, any tool data must be packaged with the tool because a typical user will not have access to the data on the local network. 5.3 Handling licensing issues Tools distributed using the ZIP file method will run on a user’s computer, which may not have the necessary products or licenses to run the tools. Scripts therefore should include logic to check for the necessary product levels (ArcGIS Pro Basic, Standard, or Advanced) and extension licenses (3D Analyst, Spatial Analyst, and others). To facilitate the use of shared tools, the necessary product level and extensions must be described in the tool’s documentation. 5.4 Using a standard folder structure for sharing tools A standard folder structure, such as the example in the figure, is recommended for easy sharing of custom tools. There is no requirement to use this specific structure, but it provides a good starting point. Note: The folder structure is shown using the Catalog view in ArcGIS Pro, which does not show all file types. The root folder (i.e., Tools, in this example) contains one or more custom toolboxes (.tbx files), which may include model tools and script tools, or one or more Python toolboxes. Custom toolboxes also can reside inside a geodatabase, but a .tbx file directly under the Tools folder is easier to find. Script tools should have the “Store tool with relative path” option checked. The next section covers working with paths. 
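One way to express such a check uses ArcPy's CheckExtension() function; the sketch below adds an import guard so the function degrades gracefully on a machine with no ArcGIS installation at all. This is an illustration of the idea, not a complete licensing strategy (a real tool would also check out the extension before use and check it back in afterward):

```python
def extension_available(extension):
    """Return True if the named extension (e.g., "Spatial", "3D")
    reports itself as available. A sketch, not a complete check."""
    try:
        import arcpy
    except ImportError:
        # No ArcGIS installation at all
        return False
    return arcpy.CheckExtension(extension) == "Available"

# A tool script might fail early with a clear message:
if not extension_available("Spatial"):
    print("This tool requires the Spatial Analyst extension.")
```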
The Data folder contains sample datasets that a user can work with to learn about the functionality of tools before trying the tools out on their own data. The tools also may require certain data as part of tool execution, such as lookup tables, and these are also included in this folder. Many model tools and script tools use a workspace, and a default file geodatabase for scratch data (scratch.gdb) can be provided in the Data folder or in a separate Scratch folder. Distributing an ArcGIS Pro document (.aprx file) is optional but may be helpful if example datasets are part of the shared tool. The Doc folder is used for tool documentation, which should clearly state which product level and extensions are required for the tools to run. A README file (readme.txt) often is included in the root folder to explain how the tool works, typically including special instructions on how the tool must be installed, contact information for the tool’s creators, version number, and the like. A more detailed user manual can be provided in the Doc folder as a Microsoft Word or PDF file. Experienced Python coders are likely to open the actual scripts and learn from both the comments and the code in the scripts. Many other users, however, may never look at the scripts and instead use only the tool dialog boxes. Good documentation ensures that users get the most out of a tool and understand what it will accomplish, as well as its limitations, without having to open the actual scripts. The Layers folder contains separate layer files (.lyr or .lyrx) to assign symbology to outputs for use in a project in ArcGIS Pro. Users can be instructed to apply this symbology themselves, or its use can be coded into the scripts. The Scripts folder contains the Python scripts used in the script tools. For relatively simple tools with only one or more Python scripts, a separate Scripts folder may not be necessary, and the .py files are placed directly in the root folder. 
Scripts also can be embedded in a toolbox, in which case there are no separate script files. This is not common, because often the purpose of sharing the tools is for users to use and learn from the scripts and contribute to their continued improvement. Other related files may include script libraries, dynamic-link libraries (DLL), text files, XML files, images, and executable files, such as .exe and .bat (batch) files. The structure discussed here is only one of many possible structures. A few examples of published tools are used in this section to show some of the typical structures used by tool authors. All the examples are downloaded from www.arcgis.com. You can obtain the actual tools by searching for the tools by name. The Create Points on Lines tool by Ian Broad, a GIS analyst who runs the Thinking Spatially blog (http://ianbroad.com/), provides a basic file structure. All files are provided in the root folder and consist of a toolbox file (.tbx), a Python script file (.py), and a readme.txt file with the tool documentation. The Distributive Flow Lines tool by Esri’s Applications Prototype Lab consists of a Python toolbox with a single tool, a file geodatabase with sample data in the root folder, a readme.txt file with a basic description of the tool, and links to online resources to explain what the tool does. Several support files are provided in a separate Index folder. The Terrain Mapping tools, discussed in chapter 1, represent a more complex file structure because of the many different files used by these tools. A single custom toolbox file (.tbx) in the root folder consists of 14 different script tools. A readme.txt file in the root folder provides a basic description of the tools, including a reference to the documentation and sample datasets. A comprehensive user manual is provided as a PDF file in the Doc folder. The Samples folder includes datasets to practice the use of the tools, including map documents (.mxd) to get started with the samples. 
These .mxd files can be imported into ArcGIS Pro, and the tools are designed to work with ArcGIS Desktop 10.x and ArcGIS Pro. The Python scripts associated with each of the tools reside in the Scripts folder. To assist in creating the symbology for the outputs, a set of custom color ramps and layer files are also provided in separate folders. Additional support files reside in the SkyLuminance folder. These examples illustrate that published tools employ a wide range of folder and file structures. There is no single required structure, and the most appropriate file structure depends largely on the nature of the files that must accompany a specific tool. As a rule, the necessary files should be easy to locate, which can be accomplished using meaningful names for files and folders.

5.5 Working with paths

Paths are an integral part of working with data and tools. When tools are shared, paths become particularly important, because without proper documentation of where files are located, the tools will not run. If you have worked with ArcGIS Pro to create projects or tools, you are probably familiar with absolute and relative paths. Absolute paths are also referred to as “full paths.” They start with a drive letter, followed by a colon, and then the folder and file name—for example, C:\Data\streams.shp. Relative paths refer to a location that is relative to a current folder. Consider the following example with two shapefiles located in the C:\AllData\Shapefiles\Final folder: boundary.shp and locations.shp. To reference one of these files relative to the other, only the file names are needed. Now consider an example in which you want to run a tool that uses the shapefiles locations.shp and floodzone.shp. These files are in two different folders, and therefore their relative paths are Final\locations.shp and Project\floodzone.shp. The higher-level folders—that is, AllData\Shapefiles—are not needed to locate one file relative to the other. 
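The relationship between absolute and relative paths can be illustrated with Python's standard library. This is a sketch using the folders from the example above; the ntpath module applies Windows path rules on any platform, which makes it convenient for experimenting with drive-letter paths:

```python
import ntpath  # Windows path semantics, usable on any platform

# An absolute (full) path starts with a drive letter and a colon.
abs_path = r"C:\Data\streams.shp"
print(ntpath.isabs(abs_path))  # True

# The two folders from the example in the text.
final = r"C:\AllData\Shapefiles\Final"
flood = r"C:\AllData\Shapefiles\Project\floodzone.shp"

# floodzone.shp relative to the Final folder: the shared higher-level
# folders (C:\AllData\Shapefiles) drop out of the result.
print(ntpath.relpath(flood, start=final))  # ..\Project\floodzone.shp
```

In scripts meant to run locally, os.path behaves the same way using the conventions of the current operating system.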
The use of relative paths makes it possible to move or rename folders. For example, if the AllData folder was renamed Data, all relative paths remain intact. Similarly, if the drive was changed from C:\ to E:\, all relative paths also remain intact. One limitation of relative paths is that they cannot span multiple disk drives. If some files are located on the C drive and some on the E drive, only absolute paths preserve the correct locations of all files. Both absolute paths and relative paths can be used in model tools and script tools, but in general, shared tools should rely on relative paths. Relative paths for models are enabled on the model properties dialog box. For script tools, relative paths are enabled on the script tool properties dialog box. Relative paths for model tools and script tools are relative to the current folder in which the toolbox file is located. When relative paths are enabled, the setting applies to the script files, datasets used for default values for parameters, files referenced in the tool documentation, layer files used for the symbology properties, and style sheets. It is important to recognize that paths within the script are not converted because ArcGIS Pro does not examine and modify the code. Therefore, if a script uses absolute paths, they are not converted to relative paths when relative paths are enabled for the script tool using the settings in the script tool properties. Note: In general, Python code must be written so that files can be found relative to a known location, which typically is the location of the script itself. After this review of working with paths, it is worthwhile to revisit relative paths in the context of sharing tools. For the purpose of this discussion, the same example folder structure discussed earlier in this chapter, as shown in the figure, is used. To share tools, relative paths must be enabled in the script tool properties. 
In this example, the script tool will reference a script in the Scripts folder. It also may reference tool documentation in the Doc folder. The script itself may reference data in the Data folder. These references will continue to be valid when the script tool is shared with another user if the standard folder structure is maintained. If the toolbox file (Toolbox.tbx) containing the script tool were moved to a different location, separate from the other folders and files, the script files called by the script tool would not be found, and the tool would not work. The tool dialog box will open, but upon tool execution, the following error message will appear:

ERROR 000576: Script associated with this tool does not exist.
Failed to execute (<toolname>).

Therefore, for a script tool to work correctly, the folder structure must be maintained.

5.6 Finding data and workspaces

In general, it is best to avoid hard-coded paths in your script if it is going to be shared with others as a script tool or Python toolbox. Instead, the paths are derived from the parameters on the tool dialog box, and these paths are passed to the script. The script reads these parameters using the GetParameterAsText() and GetParameter() functions in the case of a script tool. Sometimes, however, it is necessary to use hard-coded paths to the location of a file. For example, an existing layer file may be necessary to set the symbology for an output parameter. Or a tool may require the use of a lookup table. Depending on the nature of the information, it already may be incorporated into the script (for example, a lookup table can be coded as a Python dictionary), but this may not always be possible. Therefore, some files may be necessary for the tool to run, even though they are not provided as parameters for a user to specify. Instead, these files are provided by the author of the script and distributed as part of the shared tool. 
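The parameter-reading pattern mentioned above can be sketched as follows. This is a hedged sketch: the parameter names and indices are hypothetical, and arcpy is available only inside an ArcGIS Pro Python environment:

```python
def read_tool_parameters():
    """Read the values a user entered on the tool dialog box.

    GetParameterAsText() always returns a string; GetParameter() returns
    the parameter as a Python object (useful for numbers and booleans).
    The indices follow the parameter order defined in the tool properties.
    """
    import arcpy  # available only with ArcGIS Pro installed
    in_features = arcpy.GetParameterAsText(0)  # e.g., a feature class path
    sample_size = arcpy.GetParameter(1)        # e.g., a long integer
    return in_features, sample_size
```

Inside a script tool, ArcGIS Pro supplies these values from the dialog box, so a function like this would simply be called at the top of the script.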
Following the suggested folder structure presented earlier, these files can be placed in the Data folder, making it possible for the data files to be found relative to the location of the script. If the necessary files consist of layer files for the purpose of symbology, a separate Layers folder can be used as well. The exact location of the files is not important—what is important is that they can be located from within the script. The path of the script can be found using the following code:

scriptpath = sys.path[0]

or:

scriptpath = os.getcwd()

Running this code results in a string with the complete path of the script but without the name of the script itself. If the files necessary for the script to run are in the Data folder, per the suggested folder structure, the Python module os.path can be used to create a path to the data. The folder structure used thus far can serve as an example. The Tools folder contains the shared tool, including the toolbox in the root folder, the script in the Scripts folder, and the data files in the Data folder. Relative paths are enabled for the script tool, so the Tools folder can be moved, or even renamed, and the script tool will still work. In order to run, the script needs a geodatabase table called “lookup,” located in a file geodatabase called TestData.gdb in the Data folder. The name of the table and the geodatabase can be hard-coded into the script because the author of the script is also the author of the table and the creator of the Data folder. However, the absolute path should not be hard-coded into the script, but the relative path should be used instead: Data\TestData.gdb\lookup. This will make it possible for the Tools folder to be moved to any location without the user of the script tool being limited to the absolute path originally used by the author of the script. 
The code that references the lookup table in the script is as follows:

import arcpy
import os
scriptpath = os.getcwd()
toolpath = os.path.dirname(scriptpath)
tooldatapath = os.path.join(toolpath, "Data")
datapath = os.path.join(tooldatapath, "TestData.gdb", "lookup")

Notice that three elements are hard-coded into the script: the actual file name of the geodatabase table, the file geodatabase, and the folder in which the data is located. These elements are created by the author of the tool and therefore can be hard-coded into the script because they do not depend on user input. Some tools may require the use of a scratch workspace to write intermediate data. Although it is possible to set a specific scratch workspace in a script, such a workspace is unreliable because there is no guarantee it exists on the user’s computer, and a shared script tool should not have to include its own geodatabase for writing results. A robust solution is to use the scratch GDB environment setting, which points to the location of a file geodatabase. This location can be accessed using the scratchGDB property of the arcpy.env class. This property is read-only, and the primary purpose of this environment setting is for use in scripts and models. The use of the scratch GDB is reliable because this geodatabase is guaranteed to exist when a script tool is run. A user can specify a scratch workspace in ArcGIS Pro, but if no scratch workspace is set, the scratch GDB defaults to the current user’s folder. You can check the location of the scratch GDB by running the following code in the interactive window of your Python IDE:

>>> import arcpy
>>> print(arcpy.env.scratchGDB)

The result looks as follows:

C:\Users\<username>\AppData\Local\Temp\scratch.gdb

The same code in the Python window brings up the default scratch GDB associated with the current project and looks as follows:

C:\Users\<username>\AppData\Local\Temp\ArcGISProTemp………\scratch.gdb

Regardless of the location, this file geodatabase is guaranteed to exist. 
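A hedged sketch of how a script might build a path for intermediate output inside the scratch GDB follows. The dataset name is hypothetical, and a temp-folder fallback is included only so the sketch also runs outside an ArcGIS Pro environment:

```python
import os

try:
    import arcpy
    # Guaranteed to exist when a script tool is run in ArcGIS Pro.
    scratch = arcpy.env.scratchGDB
except ImportError:
    # Outside ArcGIS: mimic the location with a temp folder (illustration only).
    import tempfile
    scratch = os.path.join(tempfile.gettempdir(), "scratch.gdb")

# Build a full path for a hypothetical intermediate dataset.
intermediate = os.path.join(scratch, "temp_result")
print(intermediate)
```

Writing intermediate data this way avoids hard-coding any workspace path into the script.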
The location and name of this geodatabase should not be hard-coded in a script but can be obtained using arcpy.env.scratchGDB if necessary. Writing output to the scratch GDB makes your script portable because you don’t need to validate whether it exists at runtime. Another scenario in which hard-coded paths are commonly employed is the use of layer files to symbolize output. Consider the Terrain Tools referenced earlier in this chapter. The file structure is as shown in the figure. The original tool includes 14 script tools and several dozen layer files, but only one script tool and one layer file are shown here for illustration. The script IllumContours.py resides in the Scripts folder, while the layer file referenced in the script resides in the LayerFiles folder. In the script, the reference to the layer file is set as follows:

# set the symbology
scriptPath = sys.path[0]
one_folder_up = os.path.dirname(scriptPath)
toolLayerPath = os.path.join(one_folder_up, "LayerFiles")
lyrFile = os.path.join(toolLayerPath, "Illuminated Contours.lyr")

Layer files are one of the most common reasons for using hard-coded folder and file names in a script. This approach works well if the author of the script tool has carefully considered the use of relative paths, and the user of the script tool does not change the folder structure or the names of folders and files. Although the previous examples used script tools, the same approach can be employed in Python toolboxes.

5.7 Embedding scripts, password-protecting tools

The most common way to share script tools is to reference the Python script file in the script tool properties and provide the script file separately, typically in the root folder or in a separate Scripts subfolder. Providing the script files allows users to clearly see which scripts are being used, and the scripts can be opened to view the code. Similarly, by sharing a Python toolbox, a user can open the .pyt file to view the code. 
Scripts also can be embedded in a custom toolbox. The code is then contained within the toolbox, and a separate script file is no longer needed. This approach can make it easier to manage and share tools. To import a script to be embedded, right-click the script tool, and click Properties. In the tool properties dialog box, click on the General tab, and locate the Options section in the lower part of the panel. When you check the Import script option, the .py file becomes embedded in the toolbox. Once a script is imported into a tool, the toolbox can be shared without including the script file. In other words, just sharing the .tbx file is enough, and no separate .py files need to be provided for the script tool to run. When a script is imported, however, the original script file is not deleted—it is simply copied and embedded in the toolbox. Embedding scripts does not mean they can no longer be viewed or edited. Say, for example, you imported a script and shared a toolbox with another user. The recipient can go into the tool properties and uncheck the Import script option to obtain a copy of the original script. The recipient also can right-click on the script tool and click Edit, and a temporary script file opens in the default editor. Both these options make it possible to view and edit the script. Although embedding scripts is a useful way to reduce the number of files to manage and share, it can lead to some confusion. For example, some script tools use multiple scripts—e.g., a script that is referenced by the script tool and additional scripts that are called by the first script. Embedding multiple scripts can be confusing to users because it becomes less transparent how the scripts work. Regular script files cannot be password protected. If you share your tools including individual .py files, any user can open these scripts with a Python editor or a text editor. Users can modify the code or copy it for use in their own scripts. 
This openness is, in fact, one of the reasons why working with Python is so appealing. Sometimes, however, there may be a need to hide the contents of a script, such as login credentials and other sensitive information. If you need password protection for your script files, you can embed the script first, and then check the Set password option. Setting a password does not affect execution of the script tool, but any attempt to uncheck the Import script option prompts for the password. Because a Python toolbox consists of a single .pyt file, there is no need to embed separate script files to reduce the number of files. To prevent a user from viewing the contents of a Python toolbox, you can encrypt the file. From within the Catalog pane of ArcGIS Pro, right-click on the Python toolbox, and click Encrypt. Clicking Encrypt brings up the Set Password dialog box. As per the warning message, setting a password overwrites the existing unencrypted file, so you should make a backup copy of the Python toolbox first. Because a Python toolbox consists of a single file, you cannot set a password for individual tools as you can for script tools; you can encrypt only the Python toolbox as a whole. Conversely, you cannot encrypt a custom toolbox, only individual script tools inside the toolbox. To decrypt an encrypted Python toolbox, right-click on the Python toolbox, and click Decrypt. Encrypting and decrypting also can be accomplished using the EncryptPYT() and DecryptPYT() functions of ArcPy, respectively.

5.8 Documenting tools

Good documentation is important when sharing tools. Documentation includes background information on how the tool was developed as well as specifics on how the tool works. Documentation also can explain specific concepts, which may be new to other users. Many coders provide detailed comments inside the script itself about how a Python script works. 
Although this is good practice, keep in mind that users of shared tools may not have experience with Python. One of the benefits of developing tools in Python is that a finished tool looks and feels the same as any other geoprocessing tool. Therefore, you should not rely on only comments inside your script to explain the use of the tool. Tool documentation is created using the same metadata creation tools used for datasets and other items. You can create documentation for both the toolbox and individual tools, which applies to both custom tools with script tools and Python toolboxes. To edit the metadata for a toolbox or tool, right-click it, and click Edit Metadata, which brings up the metadata for the toolbox or tool. You can upload a thumbnail image, enter tags, provide a summary, and enter several other pieces of descriptive information. Click the Save button at the top of the screen to save your edits. The options for documenting toolboxes and tools are similar, with several important distinctions. The metadata for a tool includes sections for syntax and code samples. The syntax portion is particularly relevant, because it provides an opportunity to make the tool dialog box more informative. Consider the example of the Random Sample tool. A dialog explanation can be entered as shown in the figure. When the metadata is saved, it becomes part of the tool’s documentation and is used as part of the tool dialog box. When the tool dialog box is open, hovering over the blue info icon to the left of a tool parameter brings up the dialog box explanation. Many users may not take the time to review detailed documentation in separate files or in the source code, but they may appreciate getting an explanation directly from the tool dialog box. For the most part, documentation for script tools and Python toolboxes works the same. There is one important difference, related to where the metadata is stored. 
For a custom toolbox with one or more script tools, the metadata is stored as part of the .tbx file. Storing metadata there means no additional files are created when you document the toolbox and/or tool by editing the metadata. A Python toolbox, on the other hand, is a plain-text file with the .pyt extension, and this format does not allow for saving metadata. Instead, metadata is stored in separate XML files with the same names as the Python toolbox and/or the individual tools. Consider the following example of a Python toolbox with a single tool. Editing the documentation as part of the metadata results in one XML file for the Python toolbox called <ToolboxName>.pyt.xml and one XML file for each tool called <ToolboxName>.<ToolName>.pyt.xml. These files are created automatically when you start editing the metadata. Note: XML files do not show up in the Catalog pane in ArcGIS Pro, but you can check the file names using File Explorer. This approach to storing metadata can lead to long file names. It also can lead to some confusion. For example, in File Explorer in the Windows operating system, you can choose to show file extensions or not. When file extensions are not shown, the XML files look like .pyt files, even though the default file associations recognize the file type correctly. Therefore, it is recommended to show the file extensions. In File Explorer, you can check the option for File name extensions under the View tab. When sharing a Python toolbox with documentation, the XML files must be shared as well. If the XML files are left out, the Python toolbox will continue to work as before, but there will be no documentation as part of the metadata. There are other ways to provide documentation as well, within the script itself or on disk, as follows:

By commenting code. Good scripts contain detailed comments, which explain how a script works. Not all users of a script tool may look at the code, but for those who do, comments can be informative. 
Comments are located inside the actual script files.

Through separate documentation located on disk—for example, in the Doc folder. Documentation files can be provided as Microsoft Word, PDF, or other file types. This documentation typically includes a more detailed explanation of the tools and any relevant background concepts.

Creating tool documentation takes extra effort but contributes to a user-friendly tool that others will benefit from.

5.9 Example tool: Terrain Tools

This section looks at an example to review the organization of the files that are part of shared tools, as well as the documentation. The example, Terrain Tools, was introduced in chapter 1 and referenced earlier in this chapter. The Terrain Tools are a collection of script tools that provide improved methods for representing terrain and surfaces in ArcGIS. The tools are distributed as a custom toolbox file. They can be used in ArcGIS Desktop 10.3 and higher and imported into ArcGIS Pro 1.0 and higher. The tools can be downloaded as a ZIP file from www.arcgis.com by searching for Terrain Tools Sample. Once extracted, the organization of the files closely follows the suggested folder structure. There is a single .tbx file, which contains 14 script tools. The associated Python script files reside in a Scripts folder. The Samples folder contains datasets for testing the tools, as well as examples of the outputs of each tool, and map document (.mxd) files to facilitate viewing these examples. Map documents can be imported into ArcGIS Pro. Most of the tools add the outputs to the current map, and the LayerFiles folder contains layer files (.lyr) to give these feature layers or raster layers symbology. Additional color ramps are provided in a separate folder. A detailed manual is provided as a PDF file in the Doc folder. The manual includes background on the tools, an explanation of the files provided with the tools, and a detailed explanation of each tool. 
Tool documentation also is provided with the metadata for each tool. For example, the Illuminated Contours tool includes several tool parameters, and clicking on the info icon for each parameter brings up a short description. Clicking on the help icon in the upper-right corner of the tool dialog box brings up a detailed description of the tool, which replicates the information in the PDF document. Finally, the script itself contains documentation in the form of comments. The Terrain Tools are relatively sophisticated. Each of the 14 tools has an associated script, typically around 100 lines of code. Some of the underlying algorithms are also advanced and rely on techniques published by a community of researchers and cartographers. The user manual is 66 pages long. Despite this level of sophistication, the consistent documentation and examples make the tools easy to use. In addition, because all the original source code is provided, you can learn from the code and modify it for your own purposes.

5.10 Creating a geoprocessing package

The approach for distributing shared tools as described so far is robust but also cumbersome. It typically requires that you manually consolidate data, tools, and supporting files into a single folder. As an alternative, ArcGIS Pro uses geoprocessing packages, which are a more convenient way to distribute all the tools and files related to geoprocessing workflows. This section describes what a geoprocessing package is and how to create one. A geoprocessing package is a single compressed file with a .gpkx extension. This single file contains all the files necessary to run a geoprocessing workflow, including custom tools, input datasets, and other supporting files. This file can be posted online, emailed, or shared through a local network. Although this may sound identical to the use of a ZIP file, as described earlier in this chapter, geoprocessing packages are created differently and have additional functionality. 
A geoprocessing package is created from one or more entries in the geoprocessing history, which have been created by successfully running one or more tools, including custom tools. A basic workflow to create and share a geoprocessing package is as follows:

1. Add data and custom tools to a project in ArcGIS Pro.
2. Create a geoprocessing workflow by running one or more tools.
3. In the History pane, select one or more entries, right-click the selection, and click Share As > Geoprocessing Package.
4. Create a .gpkx file by completing the entries in the Geoprocessing Package pane, which includes several options to configure how the package is created and shared.
5. Share the resulting .gpkx file.

An alternative to step 3 is to use the Share tab in ArcGIS Pro. Click Share > Geoprocessing, and choose the tool(s) of interest. This step also brings up the Geoprocessing Package pane. Note: The Geoprocessing Package interface element is called a “pane,” just like the Catalog pane. It is not a geoprocessing tool. When creating a geoprocessing package, you not only are sharing the tool(s), you also are sharing how each tool was run. These settings include the following:

The parameter settings of each tool
The input and output data used by each tool
The environment settings in effect when running the tool
Any additional files you choose to add to the package

The Geoprocessing Package pane includes several details, which require some careful consideration. For example, you can create a local .gpkx file, or you can upload the package directly to your ArcGIS Online portal. You also must provide a basic description of your package and tags when sharing the package. Checks also are performed to ensure individual tools have a minimum level of documentation, including a description. In the Geoprocessing Package pane, you can click the Analyze button to identify potential issues with the package before it is created. 
The analysis of the package brings up a new tab called Messages. This tab includes any warning or error messages. The analysis checks, among other things, whether tool parameters have a description. Missing documentation is reported as an error message. Consider the result when using the same script tool without any documentation, as shown in the figure. Each tool parameter requires a minimum description. You cannot create a geoprocessing package unless all parameters in all tools that are part of the package have an item description. This requirement applies only to model tools and script tools because all standard tools have this description already. Once descriptions are filled in and any other errors are addressed, you can create the geoprocessing package. The Tools tab on the Geoprocessing Package pane allows you to add additional tools from the geoprocessing history, whereas the Attachments tab allows you to add other files, such as documentation in PDF format or additional datasets that were not part of the tool execution. The Geoprocessing Package pane has the look and feel of a tool, but it is not a tool, and you will not find it when you search for it in the Geoprocessing pane. As an alternative to using the Geoprocessing Package pane to create a geoprocessing package, you can use the tools in the Package Toolset in the Data Management Tools toolbox. The tools there give you a finer degree of control over how geoprocessing packages are created and how tools are shared. The Package Result tool allows you to create a geoprocessing package on the basis of entries in the geoprocessing history and includes several advanced options not available in the Geoprocessing Package pane. On the other hand, the Geoprocessing Package pane includes capabilities to analyze the tool(s) before sharing them as a geoprocessing package. The Package Result tool creates only a local .gpkx file. To share the package to ArcGIS Online, you can use the Share Package tool. 
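Packaging can also be scripted with the Package Result tool. This is a hedged sketch (the function and output names are hypothetical, and arcpy is available only inside an ArcGIS Pro Python environment):

```python
def package_history_result(result, gpkx_path):
    """Package a geoprocessing result from the current session into a
    .gpkx file using the Package Result tool.

    `result` is the arcpy Result object returned by running a tool;
    `gpkx_path` is the output package file (a hypothetical name here).
    """
    import arcpy  # available only with ArcGIS Pro installed
    arcpy.management.PackageResult(result, gpkx_path)
    return gpkx_path
```

In a real session, `result` would come from a tool call such as a buffer or clip operation; the resulting .gpkx file can then be uploaded to ArcGIS Online with the Share Package tool.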
Note that both the Geoprocessing Package pane and the Share Package tool automatically recognize your ArcGIS Online credentials if you are using a Named User license for ArcGIS Pro. A .gpkx file also can be shared using email, FTP, a USB drive, or other file-sharing mechanisms. A recipient of the geoprocessing package can open the contents in ArcGIS Pro to examine the datasets and workflows used. Once copied to a local folder, a .gpkx file shows up on the Project tab of the Catalog pane. A single .gpkx file contains all the resources needed to run the geoprocessing workflow again, including tools, parameter settings, datasets, and other files. Tools can include system tools as well as custom tools. Therefore, if a geoprocessing result was created using a script tool, the toolbox in which the script tool resides and the underlying .py files necessary for the tool to run are all included in the geoprocessing package. Other supporting files also can be part of a .gpkx file, including documentation, text files, images, and so on. To use the tools and data inside a geoprocessing package, right-click on the .gpkx file in the Catalog pane, and click Add To Project. Data layers are added to the open map, and the executed workflow is added to the geoprocessing history. Adding the geoprocessing package makes it look as if the workflow ran on your computer in the current ArcGIS Pro session, even though the geoprocessing package was created on someone else’s computer. When a geoprocessing package is added to a project, the actual files are extracted to a local folder: C:\Users\<Your Name>\Documents\ArcGIS\Packages. A new folder is created on the basis of the name of the package, and this folder contains all the original files, plus several additional ones created as part of the package. If you download a geoprocessing package and are looking for the underlying source code (in the .py or .pyt files), you can locate the files in this folder. 
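If you prefer to extract a package to a folder of your choice and locate its source files programmatically, the Extract Package tool can be scripted. This is a hedged sketch (the function name and paths are hypothetical; arcpy is available only inside an ArcGIS Pro Python environment):

```python
import os

def extract_and_list_scripts(gpkx_file, target_folder):
    """Extract a geoprocessing package to a folder and return the Python
    source files (.py/.pyt) found inside it."""
    import arcpy  # available only with ArcGIS Pro installed
    arcpy.management.ExtractPackage(gpkx_file, target_folder)
    scripts = []
    for root, _dirs, files in os.walk(target_folder):
        scripts += [os.path.join(root, f)
                    for f in files if f.endswith((".py", ".pyt"))]
    return scripts
```

This avoids hunting through the default Packages folder when all you want is the underlying source code.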
An alternative to adding a geoprocessing package to a project is to use the Extract Package geoprocessing tool, which allows you to extract the contents of a geoprocessing package to a folder of your choice without adding it to a project. The single greatest benefit of using geoprocessing packages is that all the necessary resources are automatically combined in a single file, no matter where they are located. There is no need to manually consolidate all the resources into a single folder as required by the more traditional approach using ZIP files. Although the examples in this section used a custom toolbox with a script tool, the same steps can be used to create a geoprocessing package for a tool in a Python toolbox. In addition, if a custom toolbox or Python toolbox contains more than one tool, the entire toolbox becomes part of the geoprocessing package, even if only one of the tools was executed. This is because tools do not exist separately from their toolbox. When adding a geoprocessing package to a project, however, only the tools that were executed before creating the geoprocessing package are added to the geoprocessing history of the project.

5.11 Creating a web tool

An alternative to creating a geoprocessing package is to share a tool as a web tool. This approach allows you to share a tool that can be accessed in several different applications through your organization’s ArcGIS Enterprise portal. Creating a web tool is similar to creating a geoprocessing package because it also relies on entries in the geoprocessing history. In other words, you run one or more tools in ArcGIS Pro, and then you create a web tool that replicates this workflow. After you successfully run one or more tools, navigate to the History pane, right-click a tool, and click Share As > Web Tool. You can also use the Share tab. Click Share > Web Tool, and choose the tool(s) of interest. The Share As Web Tool pane opens. 
This pane is not a regular geoprocessing tool, but a pane, like the Geoprocessing Package pane. There are many details to consider here, and they require careful attention. These details include several options under the General tab and the Configuration tab. The Content tab allows you to add additional tools. Once you provide the minimally required information, you can click the Analyze button to perform several checks, similar to the checks performed when creating a geoprocessing package. Sharing web tools through your organization’s ArcGIS Enterprise portal requires administrative or web tool publisher permissions. Typical users may not be assigned these permissions. Because the tools recognize login credentials for Named User licenses, users without permission may see the error message as shown in the figure. This issue cannot be addressed from within ArcGIS Pro but requires a change to your permissions through an administrator of your ArcGIS Online account. Once a tool is shared as a web tool, it can be used by any user connected to the ArcGIS Enterprise portal. A web tool runs on an ArcGIS server as a geoprocessing service. Web tools can be used in a variety of ways, including custom web apps built using the ArcGIS API for JavaScript or Web AppBuilder in ArcGIS Enterprise. Web tools can be used in Python scripts by referencing the URL of the web tool using ArcPy’s ImportToolbox() function. Web tools also can be used within ArcGIS Pro. In the Catalog pane, click on the Portal tab to search for the web tool of interest. You also can connect directly to the URL of the web tool by clicking Insert > Connections > New ArcGIS Server. A detailed discussion of web tools is beyond the scope of this book. You can find details on how to author, publish, and use web tools on the ArcGIS Pro help page “Share Analysis with Web Tools.”

Points to remember

The ArcGIS Pro geoprocessing framework is designed to facilitate the sharing of tools. 
Script tools and Python toolboxes can be added to a project and integrated into regular workflows. Toolboxes can contain any number of tools, consisting of both model tools and script tools. Script tools can be shared by distributing a toolbox file (.tbx) and the accompanying Python scripts (.py), together with any other resources needed to run the tools. Python toolboxes consist of a single .pyt file and can be shared by distributing this .pyt file, plus any other resources. To ensure custom tools work properly, the resources needed to run the tools should be made available in a standard folder structure. This structure includes folders for scripts, data, and documentation. There is no single required structure, but the organization of files is strongly influenced by the complexity of the tool(s), as well as the number and type of files being distributed with the tools. Absolute paths work only when files are not moved and when folders are not renamed. To share tools, relative paths should be enabled for each script tool. Relative paths are relative to the current folder, which, for script tools, is where the toolbox is located. Relative paths cannot span multiple disk drives. Custom tools can be documented by editing the metadata of the toolbox and/or tool(s). Metadata includes a basic description, a summary of what each tool does, and specific explanations of tool parameters. Separate documentation also can be provided in the form of a manual or user guide in Microsoft Word or PDF format. Geoprocessing packages and web tools provide an alternative way to distribute custom tools. A geoprocessing package is a single, compressed file with a .gpkx extension that contains all the files necessary to run a geoprocessing workflow, including custom tools, input datasets, and other supporting files. A web tool is like a geoprocessing package, but the tool is being shared to a portal as a geoprocessing service. 
Both geoprocessing packages and web tools are created by first running a custom tool in ArcGIS Pro, and then sharing the entries in the geoprocessing history. Key terms absolute path embedded encrypt geoprocessing package relative path root folder scratch workspace web tool Review questions What are some of the differences between script tools and Python toolboxes in terms of how they are shared? Describe a typical folder structure used for sharing custom tools and the resources needed to run the tools. Describe some of the ways in which tool documentation can be created and where this information is stored. What is a geoprocessing package, and what are the benefits of sharing a geoprocessing package compared with sharing a custom tool? What are the steps to create a web tool from a custom tool? Chapter 6 Managing Python packages and environments 6.1 Introduction One of the strengths of Python is that, in addition to the standard library of built-in modules, a large collection of third-party packages exists to expand its functionality. A large user community develops and supports these packages. ArcGIS Pro is installed with many of the most important packages that are used in GIS and spatial data analysis workflows. To install, maintain, and keep track of these packages, ArcGIS Pro uses a package manager called "conda." In addition to managing Python packages, conda also manages different Python environments, which allows you to maintain different collections of packages for different projects. This chapter explains the use of packages and how conda is used in ArcGIS Pro to manage environments and packages. 6.2 Modules, packages, and libraries The core Python installation comes with many modules that are referred to as "built-in modules." There are about 200 of these built-in modules, and they are available to you regardless of how Python is installed. Earlier chapters used several of these modules, including math, os, random, and time.
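As a quick refresher, the short sketch below uses only those built-in modules; nothing needs to be installed for it to run:

```python
# A few built-in modules in action; available in any Python installation.
import math
import os
import random
import time

random.seed(42)                  # seed so the "random" draw is repeatable
radius = random.uniform(1, 10)   # a random radius between 1 and 10
area = math.pi * radius ** 2     # area of a circle with that radius

print(f"Working folder: {os.getcwd()}")
print(f"Radius {radius:.2f} -> area {area:.2f}")
print(f"Run at timestamp {time.time():.0f}")
```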
A complete list of all the built-in modules can be found in the Python documentation in the section called "Python Module Index." These modules significantly add to the functionality of Python, reflecting the "batteries included" philosophy of Python. In addition to the built-in modules, Python functionality can be expanded by using third-party libraries, more correctly referred to as packages. A bit of clarification of the terminology is in order. A module in Python consists of a single file with a .py extension. The name of the module is the same as the name of the file without the .py extension. A package in Python is a collection of modules under a common folder. The folder is created by placing all the modules in a directory with a special __init__.py file. When you import a module, you use the syntax import <module>. The same syntax is used when importing a package—for example, import arcpy. ArcPy is a package and consists of many modules and other elements, but when you want to bring the functionality of ArcPy into a script, you treat it like a module. Therefore, when you are importing modules into your script, you are referring to both modules and packages using the import statement. So, the terms module and package are often used interchangeably, but they are not the same in terms of how their code is organized. In addition to the terms module and package, you also will see the term library. In some programming languages, the term library has a specific meaning, but this is not the case in Python. When used in the context of Python, a library loosely refers to a collection of modules. The term standard library in Python refers to the collection of modules that comes bundled with the core Python installation, including the built-in modules. The term third-party library is used to refer to components that can be added to Python beyond what is available in the standard library.
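To make the module-versus-package distinction concrete, the following sketch (the package and module names are made up for illustration) builds a tiny package on disk—a folder holding an __init__.py file and one module—and then imports it with the same import syntax used for a module:

```python
# Build a minimal package ("demopkg") in a temporary folder and import it.
# The package and module names here are hypothetical, for illustration only.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demopkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")   # this file marks the folder as a package
(pkg / "helpers.py").write_text("def double(x):\n    return x * 2\n")

sys.path.insert(0, str(root))          # make the folder importable
import demopkg.helpers                 # same import syntax as for a module

print(demopkg.helpers.double(21))      # prints 42
```

Real packages such as ArcPy are organized the same way, only with many more modules inside the folder.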
These components are typically in the form of packages, and this term will be used here instead of the term libraries, which is much looser in its meaning. 6.3 Python distributions When you go to the Python home page to download and install the software, you are getting Python, including the standard library. This installation, however, is only one of many different Python distributions. A Python distribution refers not only to the version of the software (e.g., 3.6.9), but also to the operating system that it is intended for (e.g., Windows, Linux, Mac) and the packages that may have been added. Different distributions target different audiences and usage scenarios. ArcGIS Pro includes a custom distribution that installs the Python version that works with ArcGIS Pro (e.g., version 3.6.9 for ArcGIS Pro 2.5) as well as the relevant packages, including ArcPy. This is referred to as the "ArcGIS Pro Python distribution." This distribution includes a package manager called conda, which the next section explains in more detail. Conda is tightly coupled with the Anaconda distribution of Python. Anaconda is a private company (formerly called Continuum Analytics) that specializes in developing distributions for Python and R with a focus on data science applications. This is why, when you launch your Python IDE, the first line of the interpreter banner reads: Python 3.6.9 |Anaconda, Inc.| The Anaconda distribution is a large distribution with over 1,500 packages for Python and R. The ArcGIS Pro Python distribution does not use the Anaconda distribution, but because it uses conda, the Anaconda name appears. Note: Although Anaconda is a private company, its focus is on the development of open-source software. The Anaconda distribution is free, and conda is open source. Anaconda provides additional services, support, and training for a fee, but there is no requirement to use any of these services to use conda or the Anaconda distribution.
6.4 Python package manager Considering the many different packages that are available to Python, it is important to be able to manage them effectively. Managing means being able to add a package, update a package, remove a package, and check which packages are installed. These tasks are carried out by a Python package manager. Python has a built-in package manager called PIP, which is available as a module called pip. PIP is part of the core Python distribution and can be used to perform package management tasks from command line. For example, to install a package, you use the following command: pip install <name of package> PIP is widely used by Python developers, but it can be cumbersome. One of the challenges is the sheer volume of packages available. The Python Package Index, or PyPI, is an online resource that contains tens of thousands of different packages. PIP is designed to find the packages you are looking for in PyPI and install them. However, sorting through which packages you need, managing different versions of these packages, and keeping track of which packages are installed can become challenging. In addition, PIP primarily is intended to handle pure Python packages. A pure Python package contains only Python code and does not include compiled extensions or code in other languages. Many Python packages include compiled extensions, which PIP does not handle well. As an alternative, the Anaconda distribution comes with its own package manager called conda, which is the preferred way to manage packages for ArcGIS Pro. Not only is conda used as a package manager, it also is used to manage Python environments. The next section explains what Python environments are, followed by more details on the use of conda. Although PIP is an excellent package manager, conda is the recommended package manager for ArcGIS Pro. This recommendation comes, in part, because conda can be used to manage both environments and packages. 
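Independently of which package manager you use, the standard library itself can report what is installed in the current environment through the importlib.metadata module. One caveat: this module was added in Python 3.8, so it is not yet built into the Python 3.6.9 that ships with ArcGIS Pro 2.5 (an equivalent third-party backport, importlib_metadata, exists for older versions):

```python
# Inspect installed distributions from within Python (requires Python 3.8+).
# This is the same information a package manager keeps track of.
from importlib import metadata

installed = {}
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    if name:                          # skip entries with malformed metadata
        installed[name] = dist.version

print(f"{len(installed)} packages installed")
for name in sorted(installed, key=str.lower)[:5]:   # show the first few
    print(f"{name}=={installed[name]}")
```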
However, PIP is still valuable because some packages are available through PIP but not through conda. Conda has a few benefits, including the fact that it is part of the Anaconda distribution, which is widely used in the Python community. It is also the preferred package manager for ArcGIS Pro. Esri has created a user-friendly interface to conda that is integrated in ArcGIS Pro software. 6.5 Python environments The ability to add packages to the core Python installation makes it possible to accomplish more sophisticated tasks with your scripts. It also introduces additional complexity because different projects or tasks may require different packages. Many projects may require no additional packages at all, whereas others may require a substantial number. Different projects may require different versions of the same package. One specific package may require another package for it to run. In other words, each of your scripts may require a different set of packages to run successfully, and these requirements can include packages beyond the ones you import directly into your scripts. The package requirements for a specific project are referred to as dependencies. To manage these dependencies, Python uses so-called virtual environments. A virtual environment, in this context, is a unique installation of Python and any packages that have been added. Instead of installing a different version of Python with a specific set of packages on a different computer, you can create many virtual environments on the same computer. The environments are called "virtual" because they replicate what a different installation on a different computer would look like but, in fact, reside on a single computer. Instead of virtual environments, these isolated configurations often are referred to as Python environments or simply environments.
Because ArcGIS Pro uses conda as the package manager, you also will see the term “conda environments.” Note that Python environments are not the same as geoprocessing environments in ArcGIS Pro. These two types are completely unrelated, even though they use the same term. Note: Just to revisit the difference between a Python distribution and a virtual environment, a distribution is the version of Python that is installed on a computer and all the packages that it comes with. A typical user often needs only a single distribution. A virtual environment controls which packages are available at runtime, which is usually a small subset of all the packages that are installed. A typical user must have at least one virtual environment but often will use several different ones, and these can be switched relatively easily, using the same distribution. ArcGIS Pro has a default environment. When you first install ArcGIS Pro and start writing scripts in Python, you are using the default environment. The default environment is called arcgispro-py3. This default environment logically includes the Python standard library, but it also includes many other commonly used packages. You can view the default environment by going into the Python Package Manager. In ArcGIS Pro, click on the Project tab, and then click Python, which brings up the Python Package Manager, also referred to as “Python backstage.” The Python Package Manager provides a user interface to conda, even though there is no direct reference to conda here. This interface was developed as part of ArcGIS Pro to make it easier to use conda. Section 6.8 explores the use of command line to use conda as an alternative to the Python Package Manager. The project environment chosen by default is called arcgispro-py3. This name is a reference to the folder in which the files for this environment are installed, and it is typically located here: C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3. 
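A script can confirm which environment it is running in by inspecting the sys module; under the default ArcGIS Pro environment, sys.prefix should end in arcgispro-py3:

```python
# Report which Python environment the current script is running in.
import sys

print(f"Environment root: {sys.prefix}")
print(f"Interpreter:      {sys.executable}")
print(f"Python version:   {sys.version.split()[0]}")
```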
As an aside, the files for ArcPy are in a slightly different location: C:\Program Files\ArcGIS\Pro\Resources\ArcPy. The Python Package Manager shows which packages are installed with the default environment. The default environment includes all the packages needed to support Python-related functionality in ArcGIS Pro, as well as some other packages to support typical GIS workflows. You will not see ArcPy on this list. Because you are using ArcGIS Pro, ArcPy is installed by default, and this installation cannot be modified. You will also not see any of the modules from the standard library, such as math or os. What you do see on the list are widely used third-party packages such as NumPy and SciPy. You can scroll through the list of packages, and click on an entry to read a description. The side panel includes the version number of the package, as well as a link to the home page of the package. Notice that the Uninstall button is dimmed because you cannot remove the package from the default environment. The list of packages can be broken down into several major categories, including the following: Jupyter Notebooks and tools necessary to support them (IPython, Jupyter console, JupyterLab, nbconvert for notebook conversion) data analysis (Pandas, openpyxl to work with Excel files) handling dates (pytz, python-dateutil) visualization (Matplotlib) handling web data (urllib3, Requests) scientific data (h5py to work with HDF5 data, netCDF4 to work with netCDF data, NumPy to work with arrays) scientific routines and statistics (SciPy) general Python utilities (future, pip) Note: Package names in Python code are always lowercase, but the informal names of the packages sometimes use uppercase characters. For example, the numpy package is commonly referred to as NumPy, and the requests package is referred to as Requests. When a package is installed as part of the default environment, you can use the package immediately in your scripts.
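A script can also test whether a given package is present in the active environment before committing to an import, using importlib.util.find_spec() from the standard library—a small defensive sketch:

```python
# Test for a package's presence in the active environment without importing it.
from importlib.util import find_spec

def is_available(package_name):
    """Return True if the package can be imported in this environment."""
    return find_spec(package_name) is not None

for name in ("math", "numpy", "sklearn"):
    print(f"{name}: {'available' if is_available(name) else 'NOT available'}")
```

In the default arcgispro-py3 environment, numpy would report as available and sklearn would not.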
For example, you can run import numpy, and it will not result in an error. The 100 or so packages that are installed with the default environment are only a small subset of all the potentially available packages in the ArcGIS Pro Python distribution. To view the rest of the packages, click Add Packages in the Python Package Manager. You can scroll through the list of packages, or search for a specific one by name. As an example, one of these packages is named scikit-learn. The scikit-learn package includes many different algorithms for machine learning. This package is not included in the default environment. Notice that the Install button is dimmed because it cannot be added to the default environment. Therefore, when you run import sklearn, it results in an error. Even though the package is referred to as “scikit-learn,” it is called sklearn when importing it. Notice how the error is No module named 'sklearn'. To use this package, it first must be added to the environment. Adding a package requires a new environment separate from the default environment. 6.6 Manage environments using conda The default environment arcgispro-py3 cannot be modified. The environment is kept in a pristine state. By keeping the default environment pristine, you can switch back to the default if a certain environment no longer works. As a result, if you want to add a package for use in your scripts, you first must create a new environment. You can create one using the Python Package Manager. Click the Manage Environments button. In the Manage Environments dialog box, you can add previously created environments, clone the default environment, and remove environments you no longer need. If you have not previously worked with environments, the only environment listed is the default arcgispro-py3 environment. You can make a copy of the default environment by clicking the Clone Default button at the top or the Clone icon to the right of the default environment. 
When you click Clone Default, the name and path of the environment are chosen for you. When you click the Clone icon to the right of the default environment, the Clone Environment dialog box appears. The name and path are filled in with their default names, but you can change these names if you want. The default location for new and cloned environments is as follows: %LocalAppData%\ESRI\conda\envs The notation %LocalAppData% is used to indicate a general location in the user profile. For a typical user, this location corresponds to the following folder on the local computer: C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs After you confirm the name and path of the new environment, click Clone to proceed. The new environment is added to the list of environments in the Manage Environments dialog box. The installation may take several minutes as the packages are being copied. Once you create a new environment, it must be activated before it can be used. In the Manage Environments dialog box, choose the environment you want, and click OK. When you choose a different environment, you must restart ArcGIS Pro before the new environment changes take effect. There is a warning message at the bottom of the Manage Environments dialog box to remind you to restart. You also can remove an environment in the Manage Environments dialog box by clicking the Remove button to the right of a specific environment. You cannot remove the default environment. You also cannot remove the active environment of the current project, so to remove an environment, you must first activate a different environment. The Manage Environments dialog box shows all the environments that reside in two well-known locations. 
The first location is where the default arcgispro-py3 environment is located: C:\Program Files\ArcGIS\Pro\bin\Python\envs The second location is where new environments are typically located: C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs If you have created an environment in a different location, it can be added using the Add button in the Manage Environments dialog box. Cloning an environment using the Python Package Manager may not always work. If cloning fails, an error message appears below the environment where normally the location is shown. Hovering with your cursor over the exclamation symbol brings up the specific conda error. When these errors persist, it is recommended to use conda with command line as explained later in this chapter. 6.7 Manage packages using conda The Python Package Manager in ArcGIS Pro makes it easy to add a package to a specific environment. To add a package to the active environment, click the Add Packages button in the Python Package Manager. You can scroll to the package of interest, or search for it by name. Consider the example of the scikit-learn package used earlier. When the active environment can be modified, the Install button can be clicked. Once you click Install, you are prompted with the Install Package dialog box to confirm the installation. In the case of scikit-learn, the package depends on several other packages, which are listed. This list of packages is an example of the dependencies discussed earlier. Note: The specific packages that must be installed or updated because of dependencies will vary over time as new versions are released. Confirm that you agree to the terms and conditions, and then click Install. For some packages, the installation can take a considerable amount of time (i.e., several minutes or more), especially if there are many dependencies. Not all packages have dependencies, and as a result, the Install Package dialog box may show only the license agreement. 
Once the new package is installed, it can be used in the Python window. In the case of scikit-learn, following the installation of the package, you can run import sklearn in the Python window. By default, the most recent version of a package is shown under Add Packages. Certain projects, however, may require a specific version of a package, so there is a drop-down option under the Versions column to choose an older version. Only those versions compatible with the current version of Python are shown. Packages may become out of date over time. Click the Update Packages button in the Python Package Manager to see a list of packages installed with the active environment, and review which updates are available. You can choose a specific package to update, or you can click the Update All button to update all the packages for the active environment. You cannot update packages for the default environment. Packages can be removed from the active environment by clicking the Installed Packages button and viewing the list of installed packages, choosing a package from the list, and clicking the Uninstall button. It is possible to remove packages, including those required by ArcGIS Pro, so you must be careful when removing them. For example, it is possible to remove the Python package, which would essentially make your environment useless. If this happened, you would see an error message in the Python window such as that shown in the figure. Typically, it is difficult, if not impossible, to fix these issues. Instead of trying to fix the active environment, switch back to the default arcgispro-py3 environment and proceed with removing the environment that went bad. Then start over by cloning the default environment. This is one reason why you cannot remove packages from the default environment. Keeping the default environment pristine prevents having to reinstall ArcGIS Pro software because of a bad environment.
6.8 Using conda with command line The Python Package Manager provides a user-friendly way to manage environments and packages. It is important to realize, however, that this is simply a user interface developed as part of ArcGIS Pro to run the conda package manager. Sometimes you may want to use conda directly using command line. Some Python developers prefer using command line, and it is good to know how to use it. The full command reference can be found in the online conda documentation. What follows are some of the key commands that are used to manage environments and packages for ArcGIS Pro. To start the command prompt in Windows, search for the application called Python Command Prompt. This application resides in the ArcGIS program group and brings up the command prompt using the default environment. A few notes about what the command prompt shows are in order. First, the portion (arcgispro-py3) shows the active environment, which controls which version of Python is running and which packages are available. There is no reference to the current version of Python because it is implicit in the environment being used. Second, the portion C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3 shows where the current environment is located as a starting point, but this path can easily be changed. For example, the cd\ command jumps back to the root, and cd <dir> jumps to an existing directory. Note: This interface uses the regular Windows command prompt initialized with the arcgispro-py3 environment. When you enter commands here, you are not using Python, but you are using DOS commands. Because you are working with a virtual environment using arcgispro-py3, other commands can be used that are not part of the DOS commands, including those for conda. To work with conda, you don’t need to worry about the folder because the conda commands work regardless of which folder is being used. This flexibility is one of the benefits of working with a conda environment. 
You can now type your conda commands. It is helpful to start with listing all the environments that are available using the info command in conda. Type the following at the prompt, and press Enter: conda info --envs In this example, info is the conda command, and --envs is a named argument. This command is similar to using a function with an argument, although the syntax is a little different. Unless you have already created new environments, this command will show only the default environment arcgispro-py3 located in C:\Program Files\ArcGIS\Pro\bin\Python\envs. You can use the create command in conda to clone the default environment, as follows: conda create --clone arcgispro-py3 --name testing-env In this example, create is the conda command, and --clone and --name are the named arguments. The --clone argument indicates which environment must be cloned, and the --name argument gives the new environment a name. There is no need to specify the location of the default environment. If you don’t specify the location of the new environment to be created, it will be created in the default location C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs. The name of the new environment can be anything of your choosing, but it should not contain any spaces. When the command is executed, you will receive a series of messages, including a possible warning message that your path contains spaces. Once the execution is finished, you can navigate to C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs to confirm that a new folder with the name of your environment has been created. When you start ArcGIS Pro and navigate to the Python Package Manager, you will see a new entry in the Manage Environments dialog box. 
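When such commands are scripted, it is common to assemble the argument list in Python first and pass it to subprocess.run(). The helper below is a sketch of that pattern (the function name is my own, not part of conda), and it only builds the command without executing it:

```python
# Build conda commands as argument lists; inspect or log them before running.
import subprocess  # used only when you choose to execute the command

def conda_clone_cmd(source_env, new_name):
    """Argument list for: conda create --clone <source_env> --name <new_name>"""
    return ["conda", "create", "--clone", source_env, "--name", new_name]

cmd = conda_clone_cmd("arcgispro-py3", "testing-env")
print(" ".join(cmd))   # conda create --clone arcgispro-py3 --name testing-env

# To actually run it (requires conda on the PATH):
# subprocess.run(cmd, check=True)
```

Building the list first makes the command easy to review before anything is changed on disk.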
You also can clone an existing environment other than the default by specifying the name, as follows: conda create --clone testing-env --name testing2-env Again, there is no need to provide the path for the environments when you are using the default location C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs. The conda remove command can be used to remove an existing environment: conda remove --name testing2-env --all The conda command remove also can remove specific packages or features, but the --all argument indicates that the entire environment is to be removed—i.e., all packages. You cannot remove the default arcgispro-py3 environment. Once an environment is created, it must be activated. To activate the environment and make it the default for future sessions of ArcGIS Pro, you can use the proswap command: proswap testing-env This command returns a message that the active ArcGIS Pro environment is changed to your new environment. The proswap command is not part of the standard conda commands but is unique to the ArcGIS Pro installation. Notice how the proswap command is not preceded by conda. You can test the result by launching ArcGIS Pro and checking the active environment in the Python Package Manager. You can use the same command to swap back to arcgispro-py3: proswap arcgispro-py3 Using proswap does not change the environment of the command prompt because it changes only the default for future sessions of the ArcGIS Pro application. You can activate an environment for the current command prompt session using the activate command: activate testing-env Note that the activate command is not preceded by conda. The result is that the next line in the command prompt starts with the name of the new active environment. The new active environment is for only the current command prompt session, and it has no effect on ArcGIS Pro. 
Managing environments (and their packages) using conda in the command prompt can be carried out while ArcGIS Pro is running because it does not impact the current session of ArcGIS Pro and the environment being used. For any changes to take effect, you must restart ArcGIS Pro. It is recommended that you don't make changes in the command prompt using conda to an environment that is being used in the current session of ArcGIS Pro. In addition to managing environments, you can use conda in the command prompt to manage packages. First, you must make sure that the proper environment is activated for the current command prompt session: activate testing-env Then you can use the install command in conda to add a specific package to the active environment. For example, here is the code to add the scikit-learn package: conda install scikit-learn This command prompts you with a confirmation message about what will be installed and updated. Note that the name of the package to be installed is scikit-learn, which is not the same as how the package is referenced when importing it into a script (i.e., sklearn). Trying to use conda install sklearn results in a message that no package by that name can be located. For many packages, however, these names are identical. Type y (for yes) to proceed with the installation. Once the installation is completed, you can confirm the addition of the package by using the list command in conda: conda list This command produces the same list you see in the Python Package Manager for the active environment. When installing a new package, the version number is not required, and by default, the most recent, compatible version is installed. The version number is needed only when you are installing an earlier version of the package. For example, the most recent version of scikit-learn at the time of writing is version 0.21.3. To install an earlier version, you can specify the version number, as follows: conda install scikit-learn=0.20.3 Note that there are no spaces around the equal sign.
This is not Python code, and the use of spaces would result in an error. Note: The ArcGIS Pro Python distribution includes the most recent versions of the packages that are compatible with the Python environment. Newer versions of the packages may already be released but have not been tested for compatibility. For example, the most recent version of the scikit-learn package for ArcGIS Pro 2.5 is version 0.21.3, but the most recent release of scikit-learn is version 0.22.2 at the time of writing. You should not try to install more recent versions of the package than those that are part of the ArcGIS Pro Python distribution to avoid potential conflicts. You can remove a package in a similar manner using the uninstall command in conda: conda uninstall scikit-learn There is no need to include a version number here, regardless of which version was installed, because there can be only one version of a package in an environment. It is not uncommon for Python packages to depend on each other. For example, the scikit-learn package works correctly only if several other packages are also installed in the same environment. In addition, some packages may need to be updated to work with the newly added packages. These dependencies of packages on each other are one of the main reasons to use a package manager such as conda because it keeps track of the dependencies and manages all the packages at the same time for a given environment. 6.9 Environments and IDEs When you create and activate a new environment for ArcGIS Pro, the current session uses this new environment. As a result, the packages you added are available immediately in the Python window. When using an IDE such as PyCharm or Spyder, you must configure your IDE to use the specific environment. This is similar to setting up your IDE to use the default environment, but you must point to the location of the new environment.
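Pointing an IDE at an environment comes down to supplying the path of that environment's interpreter. The sketch below derives that path from an environment folder; the environment name is hypothetical, and the Windows conda layout (python.exe directly in the environment root) is assumed:

```python
# Derive the interpreter path you would enter when pointing an IDE at an
# environment. The environment name below is hypothetical; a Windows conda
# layout (python.exe in the environment root) is assumed.
from pathlib import Path

def interpreter_path(env_folder):
    """Expected location of python.exe inside a conda environment folder."""
    return Path(env_folder) / "python.exe"

env = Path(r"C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs\testing-env")
print(interpreter_path(env))
```

This is the path you would enter when adding a new interpreter in PyCharm, for example.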
Both IDLE and Spyder require a separate application for each environment, whereas a single installation of PyCharm can be used for any number of environments.

Points to remember
- The core Python installation comes with about 200 built-in modules. You can import these modules into a script to make their functionality available.
- In addition to the built-in modules, the functionality of Python can be expanded by using third-party packages. These packages are organized in the online Python Package Index, or PyPI, which contains numerous packages.
- Python is installed as part of the standard ArcGIS Pro installation. ArcGIS Pro uses a specific Python distribution, which includes some of the most important packages used in GIS and data analysis workflows.
- To install, maintain, and keep track of packages, ArcGIS Pro uses a package manager called conda. You can work with conda using the command line, but the Python Package Manager in ArcGIS Pro also provides a user interface to some of the most important functionality of conda.
- In addition to managing Python packages, conda is used to manage different Python environments, which allows you to create and use different collections of packages for different projects.
- The default environment in ArcGIS Pro is called arcgispro-py3, which includes more than 100 packages, including many that are required for Python-related tasks in ArcGIS Pro. This default environment cannot be modified. To install additional packages, you can use conda to clone the default environment, and then make changes to the cloned environment.
- Because ArcGIS Pro can use more than one environment, it is important to configure your Python IDE to use a specific environment that includes the necessary packages for a given task.

Key terms: Anaconda distribution, conda, dependency, distribution, environment, library, module, package, package manager, Python Package Manager, standard library, third-party library, virtual environment

Review questions
What is a Python package?
What is the name of the default Python environment when running ArcGIS Pro?
What are dependencies in Python?
Why are Python environments referred to as “virtual environments”?
Describe the process to create a new Python environment for ArcGIS Pro and install packages using the Python Package Manager or the conda command line.
What steps are necessary to use a different Python environment for ArcGIS Pro in IDLE, Spyder, and PyCharm?

Chapter 7 Essential Python modules and packages for geoprocessing

7.1 Introduction

This chapter looks at some of the many modules and packages that commonly are used to support GIS workflows using Python. There are many thousands of different Python modules and packages, reflecting the popularity and versatility of Python. The focus here is on a select few that complement the functionality of ArcPy to support geoprocessing workflows. They include a handful of Python’s built-in modules, which have not been covered in earlier chapters, as well as several third-party packages. The Python standard library includes around 200 built-in modules. A complete list of all the modules can be found in the Python Module Index of the online Python documentation; examples include fileinput, math, os, random, sys, and time. These modules are part of any Python installation. You can use the functionality of these modules directly by importing them (e.g., import os). This chapter looks at a few additional built-in modules to carry out specialized tasks. When reviewing the modules in the Python Module Index, be aware that some modules are deprecated, which means they remain in the software but have been replaced with better alternatives. They are maintained for backward compatibility but should not be used for new projects. In addition, the Python Package Index (PyPI) is an online repository of more than 100,000 packages that can be added to your Python installation. The best way to manage these packages is through conda.
ArcGIS Pro installs with the ArcGIS Pro Python distribution, which includes a small subset of all the packages available in PyPI. Only a subset of those packages is part of the default environment arcgispro-py3 when ArcGIS Pro is installed. Additional packages can be added to a cloned environment by using conda. If you are interested in using a specific Python package, first check whether it is part of the default environment. If it is, you can proceed using the default environment. If not, review the steps in chapter 6 on creating a new conda environment and adding a package. All the packages in this chapter are part of the default environment. Table 7.1 summarizes the modules and packages covered in this chapter. The type is indicated as “standard library,” which means it is part of the standard Python installation, or “arcgispro-py3,” which means the package has been added to the default environment in ArcGIS Pro. The distinction is important because when you review the installed packages using the Python Package Manager in ArcGIS Pro, you will not see the standard modules listed, but you can still import them into your script.

Table 7.1. Standard modules and third-party packages covered in this chapter

Task                      Module or package    Type
Working with FTP          ftplib               standard library
ZIP files                 zipfile              standard library
XML files                 xml.dom              standard library
Working with web pages    urllib.request       standard library
CSV files                 csv                  standard library
Excel files               openpyxl             arcgispro-py3
JSON files                json                 standard library
NumPy arrays              numpy                arcgispro-py3
Pandas data frames        pandas               arcgispro-py3
Plotting 2D graphs        matplotlib           arcgispro-py3

There are many other packages of potential interest, but the table covers some of the most widely used ones. 7.2 Working with FTP using ftplib One of the earliest protocols to transfer files between computers is file transfer protocol (FTP). FTP is designed for transfers between a client and a server.
FTP was in widespread use before other protocols, such as HTTP, became popular. FTP has several security weaknesses, but it continues to be used by many organizations to share their data publicly. One of the advantages of FTP is that it allows you to transfer many files and folders, and it maintains the folder structure. Many GIS portals no longer use FTP, but being able to work with FTP continues to be an important skill. Most FTP sites can be accessed using a regular web browser. For example, the figure illustrates the FTP site ftp.hillsboroughcounty.org. The URL starts with ftp:// instead of http://, used for websites. A typical FTP site contains various folders or directories. Using a web browser, you can click on these folders to navigate the directory structure and locate the files of interest. For the example FTP site, many of the GIS datasets of interest are located in the directory ftp://ftp.hillsboroughcounty.org/gis/pub/corporate_data/. Using a web browser, you can click on these files and download them, one by one, to a local computer. In addition to using a web browser, you also can use FTP client software to transfer files to and from an FTP site. Popular applications include FileZilla and SmartFTP. FTP client software makes it easier to transfer entire folders between a local computer and a server. Working with FTP sites using Python is accomplished using the ftplib module, which is part of the standard library. A common scenario is to use a script to download one or more specific files, or all the files in a folder. FTP sites often are used to post frequent updates, and you can run the same script repeatedly to obtain those updates. Downloading files using FTP in Python requires a few steps, as follows: (1) establishing a connection to an FTP site, (2) logging into the FTP site, (3) navigating to a specific folder, and (4) retrieving the file(s) of interest. 
The first two steps can be accomplished using the following lines of code: import ftplib server = "ftp.hillsboroughcounty.org" ftp = ftplib.FTP(server) ftp.login() After importing the ftplib module, an FTP object is created by specifying the FTP address. Note that the address does not include ftp:// because it is implicit in working with FTP sites. The code example uses an anonymous login. If a user name and password are needed, they are provided as the second and third parameters of the FTP object in the form of strings, as follows: ftp = ftplib.FTP(server, "username", "password") Once you log into an FTP server, you can start exploring its contents and navigating through the folders. For example, you can use the dir() method to examine the contents of the current directory: ftp.dir() This code prints a list of the folders and files in the root of the FTP site. The root typically does not contain the files of interest. The next step, therefore, is to navigate to the folder of interest by using the cwd() method and providing the subfolder as a string: ftp.cwd("gis/pub/corporate_data") Once inside the correct folder, you can list all the files inside a directory using the nlst() method: ftp.nlst() The files are returned as a list: ['Acq_elapp_1000_Buffer.zip', 'Airports.zip', ...] As an alternative, the mlsd() method can be used to list all the files. This method provides more control (including being able to better separate folders from files), but not all FTP sites support it. Next, the specific file of interest must be specified. The retrbinary() method of the FTP object obtains the file, but to save the file, a local copy of the file first must be opened, and then written as the file is retrieved, as follows: filename = "Airports.zip" localfile = open(filename, "wb") ftp.retrbinary("RETR " + filename, localfile.write) The argument "wb" means you are writing the file contents in binary mode.
The first parameter of the retrbinary() method consists of a string starting with RETR followed by a space and the name of the file of interest. The second parameter writes the file locally. The retrbinary() method is used for binary file transfer, which is appropriate for most files. To work with plain text files, use retrlines() instead. To finish the script, the local copy of the file must be closed, and you must disconnect from the server, as follows: localfile.close() ftp.quit() Finally, you must be aware of where the downloaded files end up. By default, they are saved to the current working directory of the script, which is where the script itself is located. Depending on your IDE, however, they may end up in a different location. To control where the files are saved, change the working directory in the script, as follows: import os os.chdir(<yourworkspace>) The complete script now is as follows: import ftplib import os os.chdir("C:/Demo/Downloads") server = "ftp.hillsboroughcounty.org" ftp = ftplib.FTP(server) ftp.login() ftp.cwd("gis/pub/corporate_data") filename = "Airports.zip" localfile = open(filename, "wb") ftp.retrbinary("RETR " + filename, localfile.write) localfile.close() ftp.quit() This script downloads only a single file. To download more files, create a list of the files of interest, and then iterate over this list in a for loop. In the example script, only lines 8–11 must be changed. It also is possible to download all the files to a folder using the nlst() method to generate a list of all the files. Before doing so, you may need to check the contents of a folder in terms of the total number of files and their file size because FTP sites often are used to host large datasets. For those with administrative privileges, the ftplib module provides additional functionality to create a new folder using mkd(), delete a file using delete(), and rename a file using rename(), among other tasks. 
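The multiple-file scenario described above can be condensed into a small function. This is a sketch, not the book’s script: the server and folder match the chapter’s example, the function name and its parameters are made up for illustration, and the call is commented out because it requires network access.

```python
import ftplib
import os

def download_files(server, folder, filenames, dest):
    """Download a list of files from one FTP folder to a local directory."""
    ftp = ftplib.FTP(server)
    ftp.login()  # anonymous login, as in the chapter's example
    ftp.cwd(folder)
    for filename in filenames:
        # open a local copy of each file and write it as it is retrieved
        with open(os.path.join(dest, filename), "wb") as localfile:
            ftp.retrbinary("RETR " + filename, localfile.write)
    ftp.quit()

# Example call (requires network access, so it is commented out):
# download_files("ftp.hillsboroughcounty.org", "gis/pub/corporate_data",
#                ["Airports.zip"], "C:/Demo/Downloads")
```

Using with statements for the local files means each file is closed automatically, even if the transfer fails partway.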
You also can upload files using the storbinary() method for binary files and storlines() for plain text files. The file type used in the example consists of ZIP files, which are common in GIS. You can use ftplib to transfer any file type, not only ZIP files. The next section looks at how to work with ZIP files. 7.3 Working with ZIP files using zipfile Many GIS datasets are large and consist of many individual files. Consider a shapefile, which includes several files with the same name and different file extensions (e.g., .shp, .shx, .dbf, .prj, and so on). Or consider a file geodatabase, which consists of a separate folder with numerous files. Transferring individual files is cumbersome and may corrupt the folder structure. A widely used approach to facilitate file transfer is to use ZIP files. ZIP files were originally developed as a format to support lossless data compression, which reduces the size of a file without sacrificing the quality of the data. In addition, ZIP files make it possible to combine many files, including the underlying folder structure, into a single file. This ability makes ZIP files a preferred format for packaging GIS datasets and supporting files into a single file for transfer. Transferring files is not limited to FTP, and ZIP files can be shared by email, HTTP, and other mechanisms. A ZIP file generally uses the .zip file extension. Operating systems recognize this file type and have built-in tools to create and extract ZIP files. Many utilities also work with ZIP files, including WinZip and 7-Zip on the Windows platform, which provide additional control over the process of creating and extracting ZIP files. Several other formats have similar compression and archiving abilities, including .7z, .dmg, .gz, and .tar. This section focuses on ZIP files only, but similar steps can be accomplished using these other formats.
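Before walking through the individual zipfile calls, here is a minimal, self-contained round trip with the module: create an archive, inspect its contents, and extract it. The file name and contents are made up, and a temporary folder is used so the sketch runs anywhere without touching existing data.

```python
import os
import tempfile
import zipfile

# Work in a temporary folder so the sketch does not depend on any real paths
workdir = tempfile.mkdtemp()
datafile = os.path.join(workdir, "readme.txt")
with open(datafile, "w") as f:
    f.write("sample contents")

# Create an archive containing one file, using DEFLATE compression
zippath = os.path.join(workdir, "demo.zip")
with zipfile.ZipFile(zippath, "w", zipfile.ZIP_DEFLATED) as zfile:
    # arcname stores the file without its full folder path
    zfile.write(datafile, arcname="readme.txt")

# Reopen the archive, list its contents, and extract everything
extractdir = os.path.join(workdir, "out")
with zipfile.ZipFile(zippath) as myzip:
    print(myzip.namelist())  # ['readme.txt']
    myzip.extractall(extractdir)
```

The same three steps (write, namelist, extractall) are examined one at a time in the rest of this section.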
Working with ZIP files using Python is accomplished using the zipfile module, which is part of the standard library. Note: Python also has a built-in zip() function, which works with iterators and has nothing to do with ZIP files. Make sure to use the zipfile module when working with ZIP files. The main class of the zipfile module is ZipFile. In a typical script, you create a ZipFile object by pointing to an existing ZIP file or by creating a new one, and then you carry out specific tasks using the methods of this object. Creating a ZipFile object by pointing to an existing ZIP file works as follows: import zipfile zip = open("C:/Demo/test.zip", "rb") myzip = zipfile.ZipFile(zip) The argument "rb" means you are reading the file contents in binary mode. You can check the contents of the ZIP file using the namelist() method, as follows: for file in myzip.namelist(): print(file) You can use the same method to iterate over all the files in the ZIP archive and extract them one by one: for file in myzip.namelist(): out = open(file, "wb") out.write(myzip.read(file)) out.close() This code extracts each file to the current working directory, which by default is the location of the script. You can change this directory using os.chdir(). As an alternative, you can specify a path when saving each local file: out = open("C:/Temp/" + file, "wb") In most cases, however, you don’t need to iterate over the files in a ZIP file one by one because you simply want to extract all files, which can be done more easily using the extractall() method. The entire script is shown for clarity, as follows: import zipfile zip = open("C:/Demo/test.zip", "rb") myzip = zipfile.ZipFile(zip) myzip.extractall() You can specify a folder to extract the files to, using the following: myzip.extractall("C:/Temp") You can use the zipfile module to create a new ZIP archive for one or more files. 
You create a ZipFile object by specifying a new .zip file that does not exist yet and using the "w" argument to have write access. You also can specify the compression type. If the compression type is left out, the default ZIP_STORED is used. The code to add a single file to the ZIP archive is as follows: import zipfile zfile = zipfile.ZipFile("mytiff.zip", "w", zipfile.ZIP_DEFLATED) zfile.write("landcover.tif") zfile.close() The same approach can be used to iterate over a list of files in a directory. The script iterates over the list and adds each file to the same ZIP archive using the write() method. For example, the following script creates a list of all the files in a folder using os.listdir(), and then adds each file that uses a particular file extension: import os import zipfile zfile = zipfile.ZipFile("shapefiles.zip", "w") files = os.listdir("C:/Project") for f in files: if f.endswith(("shp", "dbf", "shx")): zfile.write("C:/Project/" + f) zfile.close() Note that most shapefiles include additional file extensions. A more robust approach is to use an ArcPy function such as ListFeatureClasses() to create a list of all shapefiles in a folder, and then add all the files with the same base name, regardless of their file extension. It also is possible to work with entire folders and their contents. To do so, use os.walk() to create a list of the paths of all the files inside a folder. This approach preserves the folder structure as well.
The following script creates one ZIP archive of the entire contents of one specific folder, including all subfolders: import os import zipfile mydir = "C:/Demo/Project" zfile = zipfile.ZipFile("newzip.zip", "w") for root, dirs, files in os.walk(mydir): for file in files: filepath = os.path.join(root, file) zfile.write(filepath) zfile.close() This approach can be used to create a ZIP archive for a folder that contains one or more file geodatabases because, from a file management perspective, a file geodatabase is a folder with many files. It is impractical to work with those individual files, so you can add the entire folder to a ZIP archive instead. 7.4 Working with XML files using xml.dom Extensible Markup Language (XML) is a common file format used for a variety of applications. XML is used to structure, store, and transfer data between different systems, including GIS. XML is comparable to HTML, but HTML is used mostly for display purposes, whereas XML is used mostly to store data. In its simplest form, XML consists of text with carefully placed tags. These tags make it possible to identify specific elements within the XML file. Formats such as KML (used in Google Earth) and GPX (used by GPS devices) are specialized versions of XML. Microsoft Office documents (e.g., .docx, .xlsx) are also XML-based. XML files use tags, but they are organized in a hierarchical structure referred to as a tree structure. This structure consists of a root element, child elements, and element attributes. The root element is the parent to all other elements. Elements in XML also are called nodes. The basic structure of XML is as follows: <root> <child> <subchild> … </subchild> </child></root> Notice how the tags come in pairs (e.g., <child> and </child>). The first of these tags is the starting tag, and the second one, which uses a forward slash (/), is the ending tag.
The example tags illustrate the tree structure, but when opened as text, the XML looks as follows: <root> <child> <subchild> … </subchild> </child> </root> Because of the tree structure, tags appear “nested,” which makes it cumbersome to use simple string manipulation to work with XML files. It is possible to read an XML file as text and use the tags to find the information you are looking for. However, this approach is prone to errors, and therefore specialized Python modules are used to facilitate the process of reading XML files. To read the contents of the XML file, the document first must be parsed. Parsing an XML file breaks down the file into its tree structure. Once the tree structure is created, you can navigate up and down this structure to locate the elements or nodes of interest. Python has several built-in modules to work with XML files, including xml.dom, xml.sax, and xml.etree (called ElementTree). Third-party packages for working with XML files are available, including Beautiful Soup. Each of these modules and packages has its own strengths and weaknesses. Consider the following example of a simple KML file, shown as plain text with the newline characters (\n) visible: <Placemark>\n <TimeStamp>\n <when>2020-01-14T21:05:02Z</when>\n </TimeStamp>\n <styleUrl>#paddlea</styleUrl>\n <Point>\n <coordinates>-122.536226,37.86047,0</coordinates>\n </Point>\n </Placemark> This example is essentially a point location with a time stamp and a pair of coordinates. The values of interest in this case are the coordinates, and the processed result should look something like this: x,y,z (-122.536226, 37.86047, 0.0) It is easy enough to manually copy and paste these values from a text file, but the process must be automated to work with XML files containing thousands of point locations, as well as more complex data. The following code illustrates the use of the xml.dom.minidom module to obtain the coordinates.
The script parses the XML file, and then reads the values of the element of interest, as follows: from xml.dom import minidom kml = minidom.parse("example.kml") placemarks = kml.getElementsByTagName("Placemark") coords = placemarks[0].getElementsByTagName("coordinates") point = coords[0].firstChild.data x,y,z = [float(c) for c in point.split(",")] This script is broken down to explain how to use the xml.dom.minidom module. The minidom module is a simplified implementation of the Document Object Model (DOM). DOM is an API used by many different programming languages to work with XML files. A typical script starts by parsing an XML file (or KML file, in the example) into a Document object, which is essentially the tree structure of elements. For parsing, use the parse() function. Once the tree structure of elements inside the XML file is obtained, various methods of the Document object can work with these elements. The getElementsByTagName() method obtains a list of Element objects on the basis of a specific tag. In the example, the elements of interest are called Placemark. Within a given element, the same getElementsByTagName() method is used to move on to the next tag, which represents the next level in the tree structure. Once the element of interest is located, the firstChild.data property returns the values as a string. Finally, to obtain the correct output, the string is split, and the values are cast as floats. Several alternatives are possible. For example, the childNodes property can obtain a list of the child nodes of an element, and the getAttribute() method can obtain the value of an attribute of an element. As the example illustrates, working with XML files is a bit like smart text processing: working through the hierarchy of tags, searching for specific tags, and breaking up strings into pieces of information for further use.
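The same parsing logic can be run without a .kml file on disk by feeding the sample Placemark to parseString(), a sibling of the parse() function that accepts a string instead of a file. The KML text below is the sample from this section, embedded as a string so the sketch is self-contained.

```python
from xml.dom import minidom

# The sample Placemark from the text, embedded as a string so the sketch
# runs without a file on disk
kml_text = """<Placemark>
  <TimeStamp>
    <when>2020-01-14T21:05:02Z</when>
  </TimeStamp>
  <styleUrl>#paddlea</styleUrl>
  <Point>
    <coordinates>-122.536226,37.86047,0</coordinates>
  </Point>
</Placemark>"""

# parseString() builds the same Document tree that parse() would
kml = minidom.parseString(kml_text)
coords = kml.getElementsByTagName("coordinates")
point = coords[0].firstChild.data           # "-122.536226,37.86047,0"
x, y, z = [float(c) for c in point.split(",")]
print(x, y, z)  # -122.536226 37.86047 0.0
```

Because the sample has only one Placemark, the script can go straight to the coordinates tag; with many placemarks, you would loop over getElementsByTagName("Placemark") as shown in the main script.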
There are several challenges to working with XML files, including the existence of several different “flavors” of XML, and incomplete or missing tags. These characteristics of XML partly explain why there are several modules and packages to work with XML files, each with their own strengths and weaknesses relative to a given task. HTML files and XML files have a lot in common, including the use of tags. Therefore, some of the techniques used to process XML files also can be used to work with HTML files. For example, the popular package Beautiful Soup is used to work with XML files and also for web scraping of pages in HTML format. 7.5 Working with web pages using urllib Web pages often are used as a source of information in GIS workflows. A web address is a uniform resource locator, or URL. Python’s standard library includes the urllib package, which consists of several modules to work with URLs. This section focuses on the use of the urllib.request module for opening and reading URLs. Note: Python 2 includes the modules urllib, urllib2, and urlparse to work with web pages. These modules are replaced in Python 3 by a single package called urllib, which is different from the urllib module in Python 2. Any code written for Python 2 that employs the urllib, urllib2, or urlparse modules should be carefully reviewed and updated. To open a web page, start by importing the urllib.request module, and then use the urlopen() function. Once you open a web page, you can start reading its contents using the read() method, as follows: import urllib.request url = urllib.request.urlopen("https://www.esri.com/") html = url.read() Reading an entire web page is not common. A more typical scenario is to download one or more files from a web page. To do so, use the urlretrieve() function. The arguments of this function are a URL and a local file name.
The following example downloads a ZIP file and saves it as a local file: import urllib.request url = "http://opendata.toronto.ca/gcc/bikeways_wgs84.zip" file = "bikeways.zip" urllib.request.urlretrieve(url, file) The local file name can be identical to the name of the file being downloaded, but it also can have a different name as long as it keeps the same file extension. The local file is saved to the current working directory, which is typically the location of the script. Similar to downloading files using FTP, this directory can be changed using os.chdir(). Some files, such as TXT or CSV, can be read directly using the urlopen() function, as follows: import urllib.request url = "https://wordpress.org/plugins/readme.txt" content = urllib.request.urlopen(url) for line in content: print(line) The urllib.request module includes functionality for related tasks, including authentication, working with proxies, and handling cookies. Additional modules of the urllib package include functionality for error handling and parsing URLs. Although urllib provides solid functionality to open and read web pages, a widely recommended alternative is the requests package. This package is not part of the standard library and must be installed as a package, but it is part of the arcgispro-py3 default environment. The requests package is one of the most popular Python packages and has become the de facto standard for working with web pages. The following example illustrates how to use the requests package to download a ZIP file: import requests url = "http://opendata.toronto.ca/gcc/bikeways_wgs84.zip" file = "bikeways.zip" response = requests.get(url) open(file, "wb").write(response.content) The get() function of the requests module creates a Response object. This object gives you access to the contents of the web page using the content property. To save the contents of the web page, a local file is opened, and the contents are written.
This process may appear slightly more elaborate relative to using urllib, but the Response object provides great versatility. For example, it can work with JSON objects, which have become a popular way to share tabular and spatial data. 7.6 Working with CSV files using csv Plain text files are widely used to store and transfer data, but their lack of formatting can present difficulties. Hence the usefulness of comma-separated values (CSV). A CSV file is a plain text file in which the values are separated by commas. CSV files are more robust than other forms of plain text because the separator between values is predictable (i.e., a comma). Before looking at the use of CSV files, a brief review of working with text files is in order to show the similarities and differences. One common approach to working with text files is to use the open() function, as follows: f = open("C:/Data/mytext.txt") for line in f: <some task> f.close() This function commonly is used for small files but becomes an issue for larger files because the entire file is read into memory. A good alternative is the fileinput module, which creates an object to iterate over the lines in a for loop, as follows: import fileinput infile = "C:/Data/mytext.txt" for line in fileinput.input(infile): <some task> A more complete example of working with a text file follows. The example is relevant because the same script is used to illustrate how to work with CSV files (in this section) and Excel files (in the next section). The example reads coordinates from a text file to create point features. Each line starts with an ID number, followed by an x-coordinate and a y-coordinate. The three values are separated by a space. These coordinates are stored in a plain text file named points.txt, which is shown in the figure. These coordinates are used to create new point features.
The script reads the contents of the text file and uses the split() method to parse each line of text into separate values for the point ID number, the x-coordinate, and the y-coordinate. The script iterates over the lines of the input text file and creates a Point object for every line. The final line of code creates the point features. The use of with statements ensures proper closure and avoids any data locks. The script is as follows: import arcpy fgdb = "C:/Demo/Data.gdb" infile = "C:/Data/points.txt" fc = "trees" sr = arcpy.SpatialReference(26910) arcpy.env.workspace = fgdb arcpy.CreateFeatureclass_management(fgdb, fc, "Point", "", "", "", sr) with arcpy.da.InsertCursor(fc, ["SHAPE@"]) as cursor: point = arcpy.Point() with open(infile) as f: for line in f: point.ID, point.X, point.Y = line.split() cursor.insertRow([point]) The result of the script is a new feature class called trees with several point features. The same coordinates stored in a CSV file are used in a different script that follows the same general steps. The CSV version of the same coordinates is shown in the figure. As expected, the separator between values is a comma, and there are no spaces. The CSV file is shown in Notepad, which does not apply any formatting. Keep in mind, however, that if you have Microsoft Office installed, the default application to open a CSV file is Microsoft Excel, even though a CSV file is not a spreadsheet format. Working with CSV files in Python is accomplished using the csv module, which is part of the standard library. You start by importing the csv module, opening the CSV file, and then using the csv.reader() function to read the contents of the file, as follows: import csv f = open("C:/Data/test.csv") for row in csv.reader(f): <some task> f.close() This code looks similar to working with regular text files but uses the csv module. One key difference is how the contents of each line (or row) are read.
For a text file, you must know what the separator is between the values in a single line of text and use the split() method accordingly. In contrast, for a CSV file, the delimiter is expected to be a comma, so the separator does not need to be specified. Instead, when using the csv.reader() function, the values are returned as a list. Following is the complete script, in which the changes from the text-file version are the csv import, the input file, and the body of the for loop: import arcpy import csv fgdb = "C:/Data/Demo.gdb" infile = "C:/Data/points.csv" fc = "trees" sr = arcpy.SpatialReference(26910) arcpy.env.workspace = fgdb arcpy.CreateFeatureclass_management(fgdb, fc, "Point", "", "", "", sr) with arcpy.da.InsertCursor(fc, ["SHAPE@"]) as cursor: point = arcpy.Point() with open(infile) as f: for line in csv.reader(f): point.ID, point.X, point.Y = line[0], line[1], line[2] cursor.insertRow([point]) The assignment in the for loop can be written as follows for clarity: point.ID = line[0] point.X = line[1] point.Y = line[2] As the code example illustrates, the differences between the two scripts are subtle. Generally, working with CSV files is more robust compared with text files because their formatting is more predictable. ArcGIS Pro recognizes CSV files as a data format. When viewed in the Catalog pane, a CSV file appears as an entry with an icon that indicates text, as shown in the figure. When added to a map, a CSV file appears as a table, as indicated in the figure. CSV files can be used directly in many geoprocessing tools that use tables as input, including XY Table To Point and Table To Table. When a CSV file does not include a header row with field names, default field names are added (i.e., Field1, Field2, and so on). The figure shows an example of the use of a CSV file as the source for the input rows of a table. Note: Properly formatted text files (i.e., using spaces or tabs as separators) are recognized by ArcGIS Pro and also can be used in geoprocessing tools, but they are less robust.
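Stripped of the arcpy calls, the CSV reading pattern can be tested on its own. In this sketch, the coordinate values are made up, and io.StringIO stands in for an opened points.csv file so no file on disk is needed.

```python
import csv
import io

# io.StringIO mimics an open CSV file: each line is an ID, an
# x-coordinate, and a y-coordinate separated by commas
data = io.StringIO("1,-122.41,37.77\n2,-122.45,37.81\n")

points = []
for line in csv.reader(data):
    # csv.reader returns each row as a list of strings;
    # cast the coordinates to floats, as the cursor example would
    pid, x, y = line[0], float(line[1]), float(line[2])
    points.append((pid, x, y))

print(points[0])  # ('1', -122.41, 37.77)
```

Note that csv.reader() returns every value as a string, including the coordinates, which is why the explicit float() casts are needed before any numeric use.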
Although CSV files can be used directly in ArcGIS Pro, using them in Python gives you more flexibility.

7.7 Working with Excel files using openpyxl

Another widely used format for data manipulation is Excel spreadsheets. In addition to tabular information, Excel files can contain other elements, including formulas, graphs, and images. Although Excel files often are used to manipulate tabular information, they are not a database format. Typical rules that apply to database tables do not apply to spreadsheets. For example, spreadsheets can contain empty rows and empty columns, which is not supported in database tables. Even though Excel spreadsheets do not represent a database format, they are so widely used for data entry and manipulation that being able to work with them in Python is a good skill. Many GIS workflows also use Excel files, either as data input or as an output for use in other applications. Several modules work with Excel files in Python, but the most widely used one is openpyxl. This package is not part of the standard library in Python, but it is installed as part of the arcgispro-py3 default environment in ArcGIS Pro. The openpyxl package works with Excel files in the .xlsx format. To work with files in the older .xls format, you can use the xlrd package as an alternative. You also can use Pandas (discussed in section 7.10) to work with either Excel format. A typical Excel file is a bit different from tabular data in plain text or CSV format. First, an Excel file can contain more than one worksheet, with each worksheet representing a separate table. In addition to opening an Excel file, you also must point to a specific worksheet. Second, data in Excel worksheets is entered in cells, which are organized in rows and columns. You must reference a specific cell, and then obtain its value. The openpyxl module has functions and classes for these tasks.
The first step is to open an Excel file, also referred to as a “workbook.” You can use the openpyxl.load_workbook() function, as follows:

import openpyxl
book = openpyxl.load_workbook("C:/Data/Example.xlsx")

This function returns a workbook object. Next, properties of the workbook object can be used to obtain the worksheets in the Excel file. You can get a list of all the worksheet names using the sheetnames property:

sheets = book.sheetnames

You can obtain a specific worksheet using the worksheets property and specifying the index number:

sheet = book.worksheets[0]

Once you have a single worksheet, you can start working with the cells and their values by referencing the rows and columns. You must first reference a cell, and then you can use the cell’s value. For example, a single cell can be referenced as follows:

b3 = sheet["B3"]
print(b3.value)

The first line of code returns a Cell object, and the value property returns the cell value. An alternative way to obtain a cell is to write out the column and row number:

b3 = sheet.cell(column=2, row=3)
print(b3.value)

Note that the first row or column integer is 1, not zero. In other words, the number is not an index number but an argument of the column and row keywords. A more typical scenario is to read through all the cells instead of one or more specific cells. You can read through all the cells by iterating over the columns using the iter_cols() method or by iterating over the rows using the iter_rows() method. The following example reads all cell values in a worksheet using iter_cols():

for col in sheet.iter_cols():
    for cell in col:
        print(cell.value)

Iterating over the rows using the iter_rows() method works as follows:

for row in sheet.iter_rows():
    for cell in row:
        print(cell.value)

Both approaches print all the values of all the cells but in a different order. The second approach (i.e., reading the values row by row) is more typical because it is similar to reading the lines in a text or CSV file.
By default, the iter_cols() and iter_rows() methods continue until there are no more columns or rows left, respectively, with valid cell values. The methods do not stop reading when there is an empty column but instead continue reading rows and columns until there are no more cells left with values. An alternative to using iter_cols() and iter_rows() is to iterate over the columns or rows from their starting value (number 1) until the maximum number of columns or rows with valid cell values are read. This maximum number can be obtained using the max_column or max_row properties of a worksheet. Because cell values can be empty, you may need to check whether a cell contains a value or not. The code for checking whether a cell contains a value is

if cell.value is not None:

This code can be rewritten more simply as

if cell.value:

Keep in mind that the shorter version also treats a cell containing 0 or an empty string as empty, because these values evaluate to False. The same script used earlier to work with a CSV file is adapted as follows to work with an Excel file. As a reminder, the script reads coordinates from a file to create point features. The Excel version of this file is shown in the figure. The script to work with the Excel file in .xlsx format using the openpyxl module follows. As before, changes relative to working with text files are highlighted.

import arcpy
import openpyxl
fgdb = "C:/Data/Demo.gdb"
infile = "C:/Data/points.xlsx"
fc = "points"
sr = arcpy.SpatialReference(26910)
arcpy.env.workspace = fgdb
arcpy.CreateFeatureclass_management(fgdb, fc, "Point", "", "", "", sr)
with arcpy.da.InsertCursor(fc, ["SHAPE@"]) as cursor:
    point = arcpy.Point()
    book = openpyxl.load_workbook(infile)
    sheet = book.worksheets[0]
    for i in range(1, sheet.max_row + 1):
        point.ID = sheet.cell(row=i, column=1).value
        point.X = sheet.cell(row=i, column=2).value
        point.Y = sheet.cell(row=i, column=3).value
        cursor.insertRow([point])

Note that the range() call runs to sheet.max_row + 1, because max_row is the number of the last row with values and the stop value of range() is exclusive. In this script solution, the load_workbook() function opens the Excel file. The sheet of interest is selected by using worksheets[0], which selects the first (and only) worksheet.
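The difference between the strict None check and the shorter truthiness check can be shown without openpyxl. The following sketch uses hypothetical cell values in a plain Python list:

```python
# Values a spreadsheet cell might hold (hypothetical examples):
# an empty cell (None), a zero, an empty string, and two real values.
values = [None, 0, "", 3.7, "oak"]

# Strict check: only truly empty (None) cells are excluded
nonempty_strict = [v for v in values if v is not None]

# Truthiness check: 0 and "" are also excluded, because they are falsy
nonempty_truthy = [v for v in values if v]

print(nonempty_strict)  # [0, '', 3.7, 'oak']
print(nonempty_truthy)  # [3.7, 'oak']
```

If a worksheet can legitimately contain zeros or empty strings, the explicit `is not None` check is the safer choice.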
Next, the script iterates over the rows of the worksheet from the first row (row 1) until there are no more rows left (max_row). An alternative iteration is to use sheet.iter_rows(). Because there are only three columns total, the column numbers are specified in the script. For many columns, an alternative solution is to iterate over the rows and columns using for row in sheet.iter_rows(): and for cell in row:. Finally, the cell values are obtained using the value property, and these values are assigned to the ID, X, and Y properties of the Point object. The openpyxl module can be used for many other tasks, including working with cell formatting and styles; modifying colors, fonts, patterns, and borders; working with formulas; and working with charts. In short, most tasks carried out in Excel can be automated to some degree using Python. In addition, it should be noted that ArcGIS Pro has two standard tools to work with Excel files. These tools are Excel To Table, which converts a single worksheet in an Excel file to an ArcGIS Pro–compatible tabular format, and Table To Excel, which converts a table in ArcGIS Pro to an Excel file with a single worksheet. Both tools support .xls and .xlsx formats. These tools make assumptions about how tabular data is organized in Excel, and they may not work on all tabular datasets. Modules such as openpyxl give you more control over how to manage the data in an Excel file (including the ability to work with multiple worksheets), but for well-formatted Excel files using a single worksheet, the standard tools in ArcGIS Pro may suffice.

7.8 Working with JSON using json

JavaScript Object Notation (JSON) is a text-based data format used to share data between applications. JSON has its origins in the JavaScript programming language. However, JSON has become its own standard and is considered language agnostic, which means it is independent of a specific programming language.
As a result, it is widely used in many different programming languages on different platforms, and it has become a de facto standard for information sharing. This usage includes spatial datasets. Consider a simple example of what a JSON file looks like. The following example describes a person by using a name, hobbies, age, and children. Each child also has a name and age.

{
    "firstName": "Jennifer",
    "lastName": "Smith",
    "hobbies": ["dancing", "tattoos", "geocaching"],
    "age": 42,
    "children": [
        {
            "firstName": "Mark",
            "age": 7
        },
        {
            "firstName": "Ashley",
            "age": 11
        }
    ]
}

Note that the indentation and use of brackets are different from Python because they are based on JavaScript. The example illustrates that JSON supports data types such as numbers and strings, as well as lists and objects. Also note that the structure looks a bit like a Python dictionary. JSON is built on two types of structures: (1) a collection of name/value pairs, and (2) an ordered list of values. These types are universal data structures in programming, which makes JSON interchangeable with many programming languages. Working directly with JSON objects is facilitated using the json module, which is part of the standard library. This module can be used to convert between JSON and Python. A JSON object in Python is created by entering the entire object as a string. The examples that follow use this simplified JSON:

{
    "name": "Joe",
    "languages": ["Python", "Java"]
}

Because JSON is a text-based format, JSON objects are created as a string, as follows:

import json
person = '{"name": "Joe", "languages": ["Python", "Java"]}'

The JSON object can be converted to a Python dictionary using the loads() function of the json module, as follows:

py_person = json.loads(person)
print(py_person["languages"])

The result prints as follows:

['Python', 'Java']

JSON objects also can be stored as text files with the .json file extension.
The load() function of the json module can read this file and convert it to a Python dictionary. In the following example, the person.json file contains the same text as the JSON object referenced earlier and reads as follows:

import json
person = open("person.json")
py_person = json.load(person)
print(py_person["languages"])

A Python dictionary can be converted to a JSON object using the dumps() function of the json module, as follows:

import json
person = {"name": "Joe", "languages": ["Python", "Java"]}
json_person = json.dumps(person)
print(json_person)

The result prints the entire JSON object as a string. The dump() function can be used to write a JSON object to a file, as follows:

import json
person = {"name": "Al", "languages": ["Python", "C"]}
json_file = open("newperson.json", "w")
json.dump(person, json_file)
json_file.close()

Note that dump() is given the dictionary itself; passing a string that already contains JSON would write that string as a single quoted JSON value instead of as an object. To improve readability of JSON files, it is useful to use pretty print JSON, also referred to as Pretty JSON or PJSON. For example, the earlier example prints a JSON object as a simple string, as follows:

import json
person = {"name": "Joe", "languages": ["Python", "Java"]}
json_person = json.dumps(person)
print(json_person)

The result is a regular string:

{"name": "Joe", "languages": ["Python", "Java"]}

The formatting can be modified by using additional parameters for the dumps() function, including indentation:

json_person = json.dumps(person, indent=4)

The result is a format that illustrates the organization of the JSON object more clearly, as follows:

{
    "name": "Joe",
    "languages": [
        "Python",
        "Java"
    ]
}

Additional sorting can be accomplished by adding the argument sort_keys=True. The use of PJSON has no impact on the actual data, and when saving to a file, the file extension is the same. JSON is widely used to share data and has become a popular format in the geospatial community. As one illustration of this wide acceptance, ArcGIS Pro has standard tools to convert to and from JSON—i.e., Features To JSON and JSON To Features.
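The conversions described above can be combined into a short round trip that uses only the json module; this sketch works entirely in memory, so no file paths are involved:

```python
import json

person = {"name": "Joe", "languages": ["Python", "Java"]}

# Dictionary -> compact JSON string, then back to a dictionary
compact = json.dumps(person)
restored = json.loads(compact)
print(restored == person)  # True: the round trip preserves the data

# Pretty-printed (PJSON-style) version with indentation and sorted keys
pretty = json.dumps(person, indent=4, sort_keys=True)
print(pretty)
```

Because PJSON affects only the formatting, loading the pretty-printed string back with loads() produces the same dictionary as loading the compact one.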
JSON is also used in services created using the ArcGIS REST API. In addition, the GeoJSON format has been developed as a file format to represent geographic data as JSON. Both formats are in widespread use, and many applications can work with both formats. The file extension for JSON is .json, whereas the file extension for GeoJSON is .geojson. Note: A detailed review of the differences between JSON and GeoJSON is beyond the scope of this book. Both the JSON and GeoJSON formats can be used to store spatial data, but the internal organization is slightly different between the two. The remainder of this section focuses on JSON only. There are several ways to work with JSON objects in Python scripts. First, JSON is widely used as an alternative file format when downloading data from online resources—for example, using urllib or requests. Second, you can convert existing spatial data using a standard tool in ArcGIS Pro such as Features To JSON. Third, you can work with JSON objects directly in a script—for example, by using a cursor of the arcpy.da module or by creating JSON objects in the script itself (as illustrated in earlier examples). A few additional examples illustrate some of these scenarios. The earlier examples of a JSON file did not include geographic data, so it is helpful to continue with an example that does. Consider a feature class of parcels with a single polygon feature and a handful of attributes. When this polygon feature is converted to JSON, the spatial data is represented as text only, and it can be viewed using a simple text editor, such as Notepad. The first portion of the JSON file includes information about the fields in the attribute table (FID and PARCEL_ID), the geometry type (polygon), and the spatial reference (factory code 2277). Note that the Shape field is not included because this information is captured through the geometry type, the coordinate system, and the actual coordinates.
The second portion of the JSON file includes the information on the features—in this case, only a single polygon. This information consists of the values of the two attribute fields, as well as the coordinates of the vertices. There are five vertices total, but the first and last ones have identical coordinate values and are coincident. The “rings” reference indicates that JSON supports the use of exterior and interior rings to represent polygons with holes, but only a single ring is needed in this example. It is not common to have to review the contents of a JSON file in detail, but it illustrates how the entire spatial dataset, including the coordinate system, the attribute structure, the attribute values, and the features, is represented as text only. The formatting employed here uses PJSON to improve legibility. When using no additional formatting, the entire JSON file is one long line of text, which is much more difficult to interpret. Following is an illustration of what the unformatted JSON file looks like.

{"displayFieldName":"","fieldAliases":{"FID":"FID","PARCEL_ID":"PARCEL_ID"},"geometryType":"esriGeometryPolygon","spatialReference":{"wkid":102739,"latestWkid":2277},"fields":[{"name":"FID","type":"esriFieldTypeOID","alias":"FID"},{"name":"PARCEL_ID","type":"esriFieldTypeString","alias":"PARCEL_ID","length":15}],"features":[{"attributes":{"FID":0,"PARCEL_ID":"0206042001"},"geometry":{"rings":[[[3116036.110156253,10071403.570008084],[3115768.3600355834,10071482.069851086],[3115847.3598775864,10071747.569976255],[3116114.2300787568,10071667.570136249],[3116036.110156253,10071403.570008084]]]}}]}

As discussed earlier in this section, Python’s json module can convert between JSON and Python objects, and geoprocessing tools in ArcGIS Pro can convert between JSON objects stored as files and feature classes. In addition, the ArcPy function AsShape() can convert between JSON objects and ArcPy geometry objects.
This capability makes it possible to work with JSON objects to store and create spatial data without having to save the data to a file. The syntax of the arcpy.AsShape() function is as follows:

AsShape(geojson_struct, {esri_json})

The first parameter is a JSON object represented as a Python dictionary. The second parameter specifies whether the object is an Esri JSON (True) or GeoJSON (False) object. The following example creates a JSON object for a single point feature with a coordinate system and converts it to an ArcPy Point object:

import arcpy
geo = {"x": -124.7548, "y": 46.5783, "spatialReference": {"wkid": 4326}}
point = arcpy.AsShape(geo, True)

The AsShape() function returns a geometry object on the basis of the input JSON object. Points are created by using "x" and "y". Polylines are created by using "paths", and polygons are created by using "rings". For example, the following example creates a single Polyline object on the basis of a list of coordinates:

import arcpy
geo = {
    "paths": [
        [[166.4359, 19.5043], [166.4699, 19.5098],
         [166.5086, 19.4887], [166.5097, 19.4668],
         [166.4933, 19.4504], [166.4617, 19.4410]]],
    "spatialReference": {"wkid": 4326}}
polyline = arcpy.AsShape(geo, True)

Note that PJSON formatting is not entirely maintained in this last example to reduce the number of lines for display purposes. The JSON objects used so far have been relatively simple because they consist of only a single feature. An example of a JSON object using multiple point features is as follows:

{"features": [{"geometry": {"x": 3116036, "y": 10071403}},
              {"geometry": {"x": 3115768, "y": 10071482}},
              {"geometry": {"x": 3115847, "y": 10071747}}]}

This JSON object can also be converted to a Python dictionary using the load() or loads() function, and converting each point to a geometry object requires an iteration over the “features” key. The examples in this section illustrate how to create geometry objects by writing out the JSON object as a Python dictionary.
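The dictionary that AsShape() expects can be built programmatically rather than written out by hand. The following sketch assembles a polyline dictionary from a list of coordinate pairs, following the "paths" and "spatialReference" structure shown above; the coordinates and wkid are illustrative values, and ArcPy itself is not called here:

```python
# Build an Esri-JSON-style dictionary for a polyline from a list of
# coordinate pairs (illustrative values).
coords = [[166.4359, 19.5043], [166.4699, 19.5098], [166.5086, 19.4887]]

geo = {
    "paths": [coords],                    # one path = one connected line
    "spatialReference": {"wkid": 4326},   # WGS84
}

# In a script with ArcPy available, this dictionary could then be
# passed to arcpy.AsShape(geo, True) to create a Polyline object.
print(sorted(geo.keys()))  # ['paths', 'spatialReference']
```

Building the dictionary this way is convenient when the coordinates come from another source, such as a parsed text file or a web request.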
More complex JSON objects can be read from a text file and converted to a Python dictionary using the load() function of the json module.

7.9 Working with NumPy arrays

NumPy is a Python package to facilitate fast processing of large datasets. NumPy is short for “numerical Python,” and the package is designed to support scientific computing. It is typically pronounced as NUM-pie, but sometimes as NUM-pee. It is part of a larger collection of Python packages called the “SciPy (pronounced Sigh Pie) stack,” which also includes Matplotlib, IPython, and Pandas. NumPy uses a data structure called NumPy arrays, also referred to as multidimensional arrays or n-dimensional arrays. Why use NumPy? First, it is often fast relative to other modules or packages for similar tasks. Second, it includes many different algorithms for processing and analysis not found in other packages. And third, the NumPy array data structure is employed by many other packages, and NumPy therefore is used to exchange data. In the context of GIS, NumPy is often used for processing large raster datasets as part of remote sensing or spatial analysis workflows. Python uses several different data structures, which generally fall into two categories. First, there are numbers, including integers and floating points. Second, there are sequences, including strings, lists, and tuples. NumPy uses a different data structure called an array. By design, this structure is closer to how computer hardware works, which is one reason why it is faster. Arrays are designed for scientific computing. They are like a Python list, but n-dimensional, whereas a list has only one dimension. Note: Recall that ArcPy includes a class called Array. Although this class shares some common elements with NumPy arrays, it is used only for a specific purpose in the context of working with geometry objects when using ArcPy. Therefore, Array objects in ArcPy and NumPy arrays are not interchangeable.
NumPy arrays often consist of numbers, and these values are indexed by a tuple of positive integers. Arrays are multidimensional, and each dimension of an array is called an axis. The number of axes in an array is also called rank. Each axis has a length, which is the number of elements in that axis, just like a list. The numbers in a single dimension are all the same type. Arrays support the use of slicing and indexing, like lists. An example of a simple array is [0, 1, 2, 3]. The dimension or rank of this array is one because there is only one axis. The length of the first (and only) axis is four. A one-dimensional (or 1D) NumPy array is like a list. Consider how this array is created, as follows:

import numpy as np
a = np.array([0, 1, 2, 3])

The NumPy package is imported, and by convention, it is imported as np, but importing it in this manner is not required. The NumPy array object is created using the array() function, and the argument of the function is a Python list. This example effectively converts a list to an array. Once an array is created, you can check some of its properties. The ndim property returns the dimension of the array, as follows:

print(a.ndim)

This property returns a value of 1. You can determine the length (or size) of the array using the len() function:

print(len(a))

This function returns the length of the first dimension—in this case, 4. Next, consider a two-dimensional, or 2D, array:

b = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
print(b.ndim)
print(len(b))

The argument for the array() function in the example is a list of lists. The code returns the value of 2 for the dimension of the array and a value of 3 for the size of the first dimension. The shape property returns the size of all the dimensions:

print(b.shape)

This property returns a value of (3, 4). The example array is also referred to as a 3 × 4 (three by four) array. A 2D NumPy array is like a table or matrix.
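The slicing and indexing mentioned above can be illustrated on the same 3 × 4 array; a minimal sketch:

```python
import numpy as np

b = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

# Indexing uses one integer per axis: row 1, column 2
print(b[1, 2])   # 6

# Slicing works per axis, like lists
print(b[0])      # first row: [0 1 2 3]
print(b[:, 1])   # second column of every row: [1 5 9]
```

Note that a slice such as b[:, 1] spans one axis of the array, which has no direct equivalent for a plain Python list of lists.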
In this example, the table has three rows and four columns, although the concepts of rows and columns don’t apply in the same way to arrays. A three-dimensional, or 3D, array can be created as follows:

import numpy as np
c = np.array([[[1], [2], [3]], [[4], [5], [6]]])

This code creates an array with three dimensions, and the shape property is (2, 3, 1), or a 2 × 3 × 1 array. A 3D array does not have a simple tabular equivalent, although it is sometimes referred to as a “three-dimensional matrix.” The same approach can be continued to create 4D, 5D, and higher-dimensional arrays. This feature is why NumPy arrays are referred to as n-dimensional, where n is a positive integer. It is important to recognize the meaning of dimensions here. When creating a NumPy array, dimensions are data dimensions, not dimensions of coordinate space. In GIS, it is common to think of 2D as horizontal space (x,y coordinates) and 3D as horizontal plus vertical space (x,y,z coordinates). In a NumPy array, location (x,y or x,y,z) is only one dimension. Coordinates are typically stored as a tuple in this one dimension. For example, the coordinates of a point in 2D space (e.g., (1512768, 3201482)) by itself is an array of rank 1, because it has one axis. Therefore, dimension in NumPy is not what is commonly thought of when considering coordinates in GIS. Basically, the coordinates represent one dimension, and other dimensions can be such things as attribute values and time. The following example represents two points as a NumPy array, each point with an ID and a pair of x,y coordinates:

import numpy as np
newarray = np.array([(1, (471316, 5130448)),
                     (2, (470402, 5130249))])

This code represents a 2D array, with the first dimension representing the attribute values (ID, in this case) and the second dimension representing the coordinates (tuples of x,y values in this case, but this could also be x,y,z). NumPy arrays are created in a variety of ways.
Values can be entered directly into the code, as illustrated in the previous examples. Python sequences such as lists and tuples can be converted to arrays. Existing data sources also can be converted, including tables, feature classes, and raster datasets. And finally, there are several NumPy functions to create arrays from scratch. These include the zeros() function to create an array with only values of zero, the ones() function for values of one, and arange() for a numeric sequence. The shape argument of the zeros() and ones() functions sets the dimensions of the array and the size of each dimension. For example, the following code creates a two-dimensional 3 × 5 array with values of zero:

import numpy as np
zeroarray = np.zeros((3, 5))

This code creates an array as follows:

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

The arange() function makes it possible to convert a numeric sequence into an array, like Python’s built-in range() function. The following example creates a one-dimensional array:

import numpy as np
newarray = np.arange(1, 10)

This code creates an array as follows:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

The reshape() method of the array object can modify the numeric sequence into the desired array:

import numpy as np
array3x3 = np.arange(1, 10).reshape((3, 3))

This code creates a 3 × 3 array on the basis of the values in the original array, as follows:

[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]

The examples here use only one- and two-dimensional arrays because they are easy to visualize, but the same functions and methods can be used for any multidimensional array. As is common in Python, the data type is derived from the value. Consider the following array:

import numpy as np
a = np.array([2, 3, 4])
print(a.dtype)

This code returns int32 as the data type—i.e., a 32-bit integer.
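The creation functions described above can be combined in a few lines; a quick sketch:

```python
import numpy as np

zeroarray = np.zeros((3, 5))      # 3 x 5 array filled with zeros
seq = np.arange(1, 10)            # one-dimensional sequence 1 through 9
array3x3 = seq.reshape((3, 3))    # the same nine values as a 3 x 3 array

print(zeroarray.shape)  # (3, 5)
print(array3x3[2, 0])   # 7 -> third row, first column
```

Note that reshape() returns a new array and leaves the original one-dimensional sequence unchanged, and that the total number of elements must match the new shape (nine values cannot be reshaped into, say, a 2 × 4 array).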
Instead of relying on dynamic assignment, you can specify the data type explicitly when the array is created, as follows:

import numpy as np
b = np.array([2, 3, 4], dtype="float32")

This code specifies the data type as a 32-bit float, even though the values are integers. Creating NumPy arrays from scratch is not so common, although they do allow you to practice array manipulation with easy-to-visualize values. A more common scenario in GIS workflows is to convert an existing dataset to a NumPy array for processing. ArcPy includes several functions for this conversion. To convert between raster data and NumPy arrays, ArcPy has two functions. They are regular functions of ArcPy, not functions of the arcpy.sa module:

NumPyArrayToRaster()
RasterToNumPyArray()

To convert between feature classes and tables, and NumPy arrays, the arcpy.da module has four functions:

FeatureClassToNumPyArray()
NumPyArrayToFeatureClass()
TableToNumPyArray()
NumPyArrayToTable()

There is one additional function, called ExtendTable(), which joins the contents of the NumPy array to an existing table on the basis of a common attribute field. This function is the equivalent of a table join, but between a table and a NumPy array. The use of these functions is illustrated with an example of the FeatureClassToNumPyArray() function of the arcpy.da module. The syntax of this function is as follows:

FeatureClassToNumPyArray(in_table, field_names, {where_clause},
                         {spatial_reference}, {explode_to_points},
                         {skip_nulls}, {null_value})

The first required parameter of this function is an input feature class, layer, table, or table view. The second required parameter is a list or tuple of field names. A string can be used for a single field. You can use an asterisk (*) to access all fields, but this usage is typically not recommended. For faster performance, you should narrow the fields to the ones that are needed for the task at hand.
Geometry tokens can be used (e.g., SHAPE@XY), but not full geometry objects (i.e., SHAPE@). The parameters here are like those used in the cursor classes of the arcpy.da module. When creating a cursor object, the first parameter is a feature class, layer, table, or table view, and the second parameter is a list or tuple of field names, including geometry tokens. The following example script converts a field of a feature class to a NumPy array and determines a simple statistic:

import arcpy
import numpy
input = "C:/Data/Usa.gdb/Counties"
c_array = arcpy.da.FeatureClassToNumPyArray(input, ("POP2010"))
print(c_array["POP2010"].sum())

This calculation is easy enough to do with a tool in ArcGIS Pro or a simple Python script without NumPy, but NumPy is often much faster and contains other functionality not available in ArcPy. The following example uses the corrcoef() function to determine the bivariate correlation coefficients between two variables in a database table:

import arcpy
import numpy
input = "C:/Project/health.dbf"
field1 = "VAR3"
field2 = "VAR4"
h_array = arcpy.da.TableToNumPyArray(input, (field1, field2))
print(numpy.corrcoef((h_array[field1], h_array[field2]))[0][1])

In this script, the TableToNumPyArray() function of ArcPy is used to create a NumPy array on the basis of the two fields of interest, and the corrcoef() function of NumPy is used to create a bivariate correlation matrix between the two fields of the array. Because the function returns a 2 × 2 matrix (in the form of an array), the two indices at the end are used to obtain just the one value of interest, which is a single correlation coefficient. To work with geographic data, we often need a structured array, which includes fields, or structs, to map the data to fields in tables.
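The corrcoef() call can be tried without ArcPy by substituting two small synthetic arrays for the table fields; the values below are made up purely for illustration:

```python
import numpy as np

# Two made-up variables standing in for the table fields
var3 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
var4 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # a perfect linear function of var3

# corrcoef() returns a 2 x 2 correlation matrix; the [0][1] entry is
# the coefficient between the two inputs
r = np.corrcoef((var3, var4))[0][1]
print(r)  # 1.0 for perfectly correlated inputs
```

The diagonal entries of the matrix are always 1.0 (each variable correlated with itself), which is why the off-diagonal index [0][1] is the value of interest.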
Following is a simple example of a structured array:

import numpy
a = numpy.array([(1, 2.4, "Hello"), (2, 3.7, "World")],
                dtype=[("field0", "int32"), ("field1", "float32"),
                       ("field2", (str, 10))])

To break this down, each element in this array is a record that contains three elements. These elements have been given the field names field0, field1, and field2. The data types are a 32-bit integer, a 32-bit float, and a string with 10 characters or fewer. Notice how the values of each record are entered as a tuple, so each record occupies only a single dimension. The result is a one-dimensional array with a length of two. Note: There are numerous data types in NumPy, sometimes with confusing notation. For example, int32 is often written as i4, float64 is often written as f8, and so on. You should now be able to follow along with the example from the ArcGIS Pro help pages on “Working with NumPy in ArcGIS.” The example illustrates the use of a structured array to create geographic data as a NumPy array first, which is then converted to a feature class.

import arcpy
import numpy
outfc = "C:/Data/Texas.gdb/fd/pointlocations"
pt_array = numpy.array([(1, (471316.38, 5000448.78)),
                        (2, (470402.49, 5000049.21))],
                       numpy.dtype([("idfield", numpy.int32),
                                    ("XY", "<f8", 2)]))
sr = arcpy.Describe("C:/Data/Texas.gdb/fd").spatialReference
arcpy.da.NumPyArrayToFeatureClass(pt_array, outfc, ["XY"], sr)

The NumPy array is a structured array containing values for the ID field and pairs of x,y coordinates. The NumPyArrayToFeatureClass() function is used to create the feature class from the array. Although converting an array to a feature class is not nearly as common as vice versa, the example syntax is helpful to understand how to create a structured array for geographic data.
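Field names are what make structured arrays behave like tables: a field name pulls out an entire column of values. Continuing with the same structured array from the text, a minimal sketch:

```python
import numpy as np

# The structured array from the text: each record has three named fields
a = np.array([(1, 2.4, "Hello"), (2, 3.7, "World")],
             dtype=[("field0", "int32"), ("field1", "float32"),
                    ("field2", (str, 10))])

# A field name returns all values in that field, like a table column
print(a["field0"])     # [1 2]
print(a["field2"][1])  # World

# The array itself remains one-dimensional with two records
print(a.ndim, len(a))  # 1 2
```

This column-style access is the same mechanism used earlier with c_array["POP2010"].sum(): the conversion functions in arcpy.da return structured arrays whose field names match the requested table fields.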
In addition to using NumPy on its own, one of the main reasons to work with NumPy is that NumPy arrays are used by other Python packages, including GDAL (raster and vector data processing), Matplotlib (creating graphs), Pandas (data processing and analysis), and SciPy (scientific computing). All these packages are part of the arcgispro-py3 default environment in ArcGIS Pro.

7.10 Using Pandas for data analysis

Tabular data is widely used in GIS workflows, including the use of a variety of formats, such as text files, CSV files, Excel files, geodatabase tables, and so on. Python can work with all these formats, but it often requires the use of a specific module or package for a specific format. Pandas has become one of the most widely used packages to work with tabular data in Python, and one of its strengths is that it can work with many different formats. Pandas is installed as part of the arcgispro-py3 default environment. The name Pandas is derived from the term panel data, which is used in statistics to describe datasets with observations over multiple time periods. As with any other package, Pandas must be imported in a script. It typically is imported as follows:

import pandas as pd

There is no requirement to use import-as, but many scripts that employ Pandas use this notation. One of the most important data structures in Pandas is the DataFrame. A Pandas DataFrame is a two-dimensional structure to store values. It basically is a table with rows and columns. The columns have names (like fields), and the rows have index values. You can create a DataFrame from scratch, or you can create a DataFrame by converting from another format, such as a NumPy array or CSV file. The following example uses a CSV file. Note: The term “DataFrame” in Pandas is not related to the term “data frame” in ArcGIS Desktop 10.x (which is like a map in ArcGIS Pro). Pandas was developed separately from ArcGIS, and the similarity of the terms is coincidental.
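Before turning to the CSV example, the creating-from-scratch option is worth a quick sketch, because it shows the row-and-column structure with nothing but a dictionary; the column names and values here are made up for illustration:

```python
import pandas as pd

# Hypothetical health data, one row per county
data = {
    "FIPS": ["12001", "12003", "12005"],
    "Diabetes": [8.4, 11.2, 9.7],
}
df = pd.DataFrame(data)

print(df.shape)          # (3, 2) -> three rows, two columns
print(list(df.columns))  # ['FIPS', 'Diabetes']
print(df)
```

Each dictionary key becomes a column name, each list of values becomes a column, and the rows automatically receive index values starting at zero.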
Also, some writings refer to a DataFrame in Pandas as “Data Frame,” “data frame,” or simply “frame,” but the only correct term is DataFrame. Reading a CSV file using Pandas can be accomplished using the read_csv() function. This function returns a DataFrame object, as follows:

import pandas as pd
df = pd.read_csv("health.csv")

You can check the contents of the DataFrame by printing it. A common approach is to print only the first few lines using the head() method of the DataFrame object. By default, this method prints the first five rows, but you can specify a different integer value, as follows:

print(df.head(10))

The result is the first 10 rows of a Pandas DataFrame, as shown in the figure. You also can print the last few lines using the tail() method or a random sample of lines using the sample() method. The example illustrates the basic structure of a DataFrame. A DataFrame consists of rows and columns. Column names are obtained from the header row of the input file. Rows have an index number starting with the value zero. This basic structure is similar to how other applications organize tabular data, including ArcGIS Pro and Excel. What makes Pandas so powerful is how easy it is to load data into a usable structure and how easy it is to manipulate the data. Choosing specific columns to work with can be accomplished by specifying the names of the columns as a list, as follows:

df[["FIPS", "Diabetes"]]

Double brackets are needed here: the outer brackets indicate you want to select columns, and the inner brackets specify a list of column names. To work with the results, you can assign the result to a new DataFrame.
You can print the result to confirm the contents, as follows:

small_df = df[["FIPS", "Diabetes"]]
print(small_df.head())

The order of the columns can be changed by changing the order of the list of column names, as follows:

new_df = df[["FIPS", "Median_hh_income", "Diabetes"]]
print(new_df.head())

The result prints with the columns in the newly specified order, as shown in the figure. A DataFrame is a two-dimensional data structure, but if the number of columns is reduced to one, it effectively becomes a one-dimensional data structure. In Pandas, this structure is known as a Series. You can create a Pandas Series from scratch, by reading data from another source, or by choosing a single column from an existing DataFrame. The following example illustrates the latter:

import pandas as pd
df = pd.read_csv("health.csv")
s = df["Diabetes"]
print(s.head())

The result prints with a single column, as shown in the figure. A Pandas Series is a one-dimensional data structure to store values, like a list or a one-dimensional NumPy array. As the printout illustrates, the rows have an index and the column has a name, but because there is only a single column, the name is not shown as a header. Returning to the manipulation of DataFrames, selecting and reordering columns is only one of many tasks that can be accomplished using Pandas. Another common task is to filter for specific values. For example, the original data contains a record for every US county. To obtain a DataFrame with the rows for only one state, use an expression like the following:

df.STATE_NAME == "Florida"

This expression returns True or False for every row, much like the where clause of a SQL statement. The following code uses this expression to filter the data and stores the result as a new DataFrame object.
The entire code is shown for clarity, as follows:

import pandas as pd
df = pd.read_csv("health.csv")
fl_df = df[df.STATE_NAME == "Florida"]
print(fl_df.head())

The result is a new DataFrame object with only the rows corresponding to the state of interest. This last example is a good illustration of how effective Pandas is at organizing data in a usable structure and manipulating the data. Only a few lines of code are needed to read a CSV file and filter the data on the basis of an attribute value. Many other data manipulations are possible, including counting, descriptive statistics, and aggregation. The following examples represent only a few of the possibilities. The following script filters for the record with the maximum value for a specific column:

import pandas as pd
df = pd.read_csv("health.csv")
print(df.loc[df["Obese"].idxmax()])

In this example, df["Obese"].idxmax() returns the index of the row in which the column named Obese has the maximum value, and df.loc[] returns the row at that index. The result prints the row with the maximum obesity rate, as shown in the figure. The following script determines the median value of the column Obese by state:

import pandas as pd
df = pd.read_csv("health.csv")
new_df = df.groupby("STATE_NAME").median()["Obese"]
print(new_df.head(10))

In this example, groupby("STATE_NAME") aggregates the data on the basis of the column named STATE_NAME, the median() method determines the median value for each column on the basis of the aggregation, and ["Obese"] selects only the column of interest for printing purposes. The result prints the median obesity value for each state, as shown in the figure. These examples illustrate the versatility of Pandas to manipulate data using Python.
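Because the health.csv file is not needed to see how these methods behave, the following sketch applies the same idxmax() and groupby() patterns to a small, made-up DataFrame. The column names STATE_NAME and Obese match the examples above, but the values are invented for illustration:

```python
import pandas as pd

# Small, invented stand-in for the health.csv data.
df = pd.DataFrame({
    "STATE_NAME": ["Florida", "Florida", "Texas", "Texas", "Texas"],
    "Obese": [30.0, 28.0, 33.0, 29.0, 31.0],
})

# Row with the maximum value in the Obese column.
top = df.loc[df["Obese"].idxmax()]
print(top["STATE_NAME"])  # Texas

# Median value of Obese for each state.
medians = df.groupby("STATE_NAME")["Obese"].median()
print(medians["Florida"])  # 29.0
```

Note that the sketch selects the column before aggregating — groupby(...)["Obese"].median() — which avoids computing the median of non-numeric columns; newer versions of Pandas raise an error when median() is applied to text columns such as STATE_NAME.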
Pandas can accomplish many other tasks, including handling missing data, reshaping data (pivoting, appending rows or columns, sorting), creating subsets (removing duplicates, filtering), summarizing data, aggregating data, combining datasets (table associations), and basic plotting. In addition, Pandas can read data in many different formats, including NumPy arrays, text files, CSV files, Excel files (both .xls and .xlsx), JSON strings, HTML files, SQL tables, and several others. This versatility, combined with the effective data manipulation techniques using DataFrames, has made Pandas one of the most popular Python packages in the data science community. It does not make other packages obsolete, but programmers who have learned Pandas tend to make much less use of separate modules and packages such as csv and openpyxl, among others. The functionality of Pandas is expanded by the GeoPandas package. This package extends the data types used by Pandas to allow for geometric operations. The two main data structures of GeoPandas are the GeoSeries and the GeoDataFrame, which store the geometry and attributes of vector objects, respectively. The result is a powerful set of tools for vector data manipulation. Under the hood, GeoPandas uses several other Python packages to work with geospatial data, including Shapely for geometric operations, Fiona for file access, and Matplotlib for plotting. Once you start using open-source Python packages, you realize how frequently they are used together.
7.11 Using Matplotlib for data visualization
Python’s standard library includes limited capabilities for data visualization, but many packages are available to support the creation of 2D and 3D graphics. The most widely used of these packages is Matplotlib. Matplotlib makes it possible to create a variety of different graphics, which complement those of ArcGIS Pro.
The functionality of Matplotlib is somewhat comparable to the plotting capabilities of MATLAB, and users of that software will find it relatively easy to learn Matplotlib in Python. MATLAB is a proprietary programming language and numerical computing environment. The Matplotlib package includes the pyplot module for relatively simple tasks, but many other modules exist for more sophisticated tasks, such as control of line styles, colors, fonts, and others. Matplotlib is part of the arcgispro-py3 default environment. This section illustrates the use of matplotlib.pyplot to create basic graphs. The first step in using the pyplot module is to import it as follows:

import matplotlib.pyplot as plt

There is no requirement to use import-as, but using it is common practice to shorten code. Similarly, the use of the name plt is not required, but many scripts use this notation. Matplotlib relies heavily on NumPy arrays, and NumPy is imported as follows:

import numpy as np

Basic plotting, however, does not require NumPy, so not all scripts need this code. Creating a basic graph can be accomplished using the plot() function of the pyplot module. The basic syntax of a 2D plot is

plot(x, y, <format>)

Values for x and y can be obtained from existing data, but they can also be entered directly in the function as lists. The format argument is a string that uses codes for color, marker style, and line style. The format argument is optional, and the default is "b-", which means a solid blue line. The following example creates a scatter plot of five points using green ("g") circles ("o"):

plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")

You can use Python lists to enter values for a figure, but internally all sequences are converted to NumPy arrays. There is no need to import NumPy, however, unless you are specifically working with NumPy arrays as inputs.
The final step is to make the graph appear using show():

plt.show()

The entire script so far is as follows:

import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")
plt.show()

Running the script brings up the figure in a simple viewer, as shown. The nature of the viewer varies somewhat among Python IDEs, but the plot itself is the same. Instead of displaying the figure, you can also save it to a local file, as follows:

plt.savefig("demoplot.png")

Many details can be added to control the display of the figure. The axis() function controls the values of the x- and y-axes as a list in the form [xmin, xmax, ymin, ymax]:

plt.axis([0, 6, 0, 6])

These and other lines of code are used after the figure is created using plot(), but before the figure is displayed or saved. Labels for the axes can be added using the xlabel() and ylabel() functions:

plt.xlabel("variable x")
plt.ylabel("variable y")

The complete script now is as follows:

import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")
plt.axis([0, 6, 0, 6])
plt.xlabel("variable x")
plt.ylabel("variable y")
plt.savefig("demoplot.png")

And the resulting figure is a scatter plot with axes and labels, as shown. Although data values can be entered directly into a script as a list or a NumPy array, it is more typical to read the values from existing data sources. Consider an example CSV file with a time series of global mean sea level rise (Source: CSIRO, 2015). The data includes values for the year and the sea level in inches. In the following code example, Pandas is used to read the contents of the CSV file. Each variable of interest is created as a Pandas Series, which is a sequence of values like a list. Matplotlib is used to create a scatter plot of the two variables, including labels for the axes and a title. The markersize parameter of the plot() function sets an appropriate size considering the number of observations.
The code is as follows:

import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("sealevel.csv")
year = df["Year"]
sea_levels = df["CSIRO Adjusted Sea Level"]
plt.plot(year, sea_levels, "ro", markersize=2.0)
plt.xlabel("Year")
plt.ylabel("Sea Level (inches)")
plt.title("Rise in Sea Level")
plt.savefig("sealevel.png")

The result is a professional-quality graph, generated using only a few lines of code. Matplotlib includes functionality to create many other types of graphs. One of the best ways to become familiar with the possibilities is to review the gallery of examples on the Matplotlib home page. Each example links to the complete code to create the specific example, and the source code can be downloaded as a Python script file (.py) or Jupyter Notebook file (.ipynb).
Points to remember
The Python standard library includes several additional modules that are widely used in geoprocessing scripts. There are many other third-party packages that can be added to support GIS workflows in Python.
Transferring files using FTP can be automated in Python using the ftplib module. Common tasks include navigating to the correct directory, reading the files in a directory, and downloading files to a local computer.
ZIP files are widely used to compress, organize, and transfer GIS datasets. The zipfile module in Python can examine the contents of a ZIP file, extract files from a ZIP file, and create new ZIP files.
XML files are widely used to store datasets. Several different modules and packages can parse the contents of an XML file and read the contents of the elements of interest.
Accessing the contents of web pages can be accomplished using the urllib module. Typical tasks include reading the contents of a specific page or downloading a file using a URL. A commonly used alternative is the requests package.
Working with tabular data in CSV format can be accomplished using the csv module, whereas the openpyxl package is widely used to work with Excel files.
JSON has become a widely used format to share data, and the json module can convert between JSON objects and Python. Several tools in ArcGIS and functions in ArcPy are also available to work with JSON objects.
Fast processing of large datasets is facilitated using the NumPy array data structure, which is supported by the NumPy package. NumPy arrays also are widely used in other packages for data analysis and visualization. Several tools are available to convert between spatial datasets in ArcGIS and NumPy arrays.
Pandas has become a widely used package to work with tabular data in Python. The DataFrame data structure in Pandas is effective for reading data from different formats and manipulating data for further analysis.
Matplotlib provides a complete set of tools to create professional-quality graphics using Python.
Many other packages can expand the functionality of your scripts. Being able to utilize some of these packages will make you a more effective coder.
Key terms
array, axis (of NumPy array), built-in module, child element (XML), comma-separated values (CSV), deprecated, directory, Extensible Markup Language (XML), extract, file transfer protocol (FTP), GeoJSON, JavaScript Object Notation (JSON), lossless data compression, node (XML), NumPy, NumPy array, parsing, Pandas, Pandas DataFrame, Pandas Series, Pretty JSON (PJSON), Python Module Index, Python Package Index (PyPI), rank (of NumPy array), root element (XML), spreadsheet, struct, structured array, tag (XML), tree structure (XML), uniform resource locator (URL), ZIP file
Review questions
What is the typical workflow to obtain datasets from an FTP site using a Python script?
Explain the structure of an XML file and the approach used to read data stored in XML using Python.
What are some of the challenges of working with Excel files in Python?
Describe the structure of a JSON file, and explain how this structure influences the use of JSON objects in Python.
What makes Pandas so effective in reading and manipulating datasets?
Describe some of the functionality of the Matplotlib package.
Chapter 8 Migrating scripts from Python 2 to 3
8.1 Introduction
This chapter examines how to migrate scripts developed for ArcGIS Desktop 10.x to ArcGIS Pro. Migration requires addressing changes in the Python language, changes in the ArcGIS software, and specific changes in ArcPy. Each of these changes is discussed in detail, including strategies to facilitate the migration.
8.2 Overview of migrating scripts
The focus of this book is on developing scripts and tools using Python for ArcGIS Pro. However, sometimes you may be tasked to take an older script or tool and migrate it to ArcGIS Pro. In some cases, the older script or tool may run perfectly in ArcGIS Pro without any changes. In other cases, you may encounter several errors and other issues. To effectively migrate scripts and tools developed with Python for ArcGIS Desktop 10.x to ArcGIS Pro, you must be aware of several changes. There are three types of changes to consider. First, ArcGIS Desktop 10.x uses Python 2, whereas ArcGIS Pro uses Python 3. Although much of the Python language remains the same between Python 2 and 3, there are several important differences to be aware of. For typical scripts used in GIS workflows, these differences often are limited to a few types, which are reviewed in this chapter. There are several utilities to facilitate the Python elements associated with migrating scripts. Second, ArcGIS Pro is different from ArcGIS Desktop 10.x in several ways. The look and feel of the software interface are different, and many of the typical workflows are different, too. In addition, not all tools in Desktop 10.x are available in ArcGIS Pro, and the supported data formats have also changed. Third, changes have been made to the ArcPy package. Most notable is that the arcpy.mapping module in ArcGIS Desktop 10.x is replaced with the arcpy.mp module, but there are several other, more subtle changes as well.
The following sections discuss each of these types of changes in more detail.
8.3 Changes between Python 2 and 3
Python as a programming language was conceived in the late 1980s. The first version was released in 1991. Since then, the language has evolved tremendously, although the structural changes have been gradual. The relative stability of Python is one of its strengths. Python 2 was released in 2000, and its most recent update, 2.7.17, was released in October 2019. Over time, a lot of functionality was added, but the many gradual changes resulted in several issues, including inconsistencies, ambiguities, and errors. Python 3 was created, in part, to address these issues and “clean up” the language. Python 3 was released in 2008, and the most recent version is Python 3.8, with 3.9 under development. Python 2 and 3 have coexisted for quite some time, but in recent years, the preference has shifted strongly toward Python 3. The consensus is that any new projects should be created using Python 3. One of the fundamental tenets of Python is backward compatibility. Few features are ever removed, which is one of the reasons for the issues in Python 2. Backward compatibility is generally a good thing because scripts written for an older version (e.g., 2.5) will continue to work when running a newer version of Python (e.g., 2.7). However, the nature of the changes necessary to fix some of the issues in Python 2 means that Python 3 is not compatible with Python 2. In other words, not all code written in Python 2 will run correctly in Python 3, and vice versa. Python 3 itself will remain backward compatible, meaning, for example, that scripts written in Python 3.4 will run correctly in a newer version of Python 3 (e.g., 3.8). Nonetheless, some aspects of Python 3 have been backported to Python 2.6 and 2.7. In backporting, some functionality that is new or different in Python 3 also is added to those versions.
This measure is not quite enough to ensure full backward compatibility, but with some careful planning, it is possible to write code that runs correctly in both Python 2 and Python 3. The backporting of functionality also facilitates the migration of scripts from Python 2 to Python 3. Backporting applies to the Python language itself, but not to all its packages, including ArcPy. Maintenance of Python 2 ended on January 1, 2020. This date is considered the official “end of life,” or sunset date. There will be no further additions or improvements to Python 2.7 from the Python Software Foundation, and there will be no version 2.8. Python 2.7 will continue to exist, and code will continue to work, but no new functionality will be added. For some applications, Python 2.7 will remain the version for many years to come. The larger Python community, however, has moved toward Python 3, including the geospatial community. If you are relatively new to Python, you should focus your efforts on learning Python 3, and learn about Python 2 only when the need arises to migrate scripts from Python 2 to 3, or when certain projects require support for both versions. That said, the two versions are much more similar than they are different. The rest of this section reviews some of the key differences between Python 2 and 3 that are most relevant for writing scripts for ArcGIS Pro. Numerous other resources document the differences in much greater detail, including the official Python documentation. For example, search for “Porting Python 2 Code to Python 3” under the Python how-tos at https://docs.python.org/3/howto/.
Printing
The print statement in Python 2 has become a built-in function in Python 3, as follows:

Python 2: print "Hello World"
Python 3: print("Hello World")

Although the change consists of only a pair of parentheses, it is significant because of how often printing is used, especially when testing scripts.
Because this code is so widely used in scripts, most IDEs have a built-in check. For example, running the Python 2 code in Python 3 in IDLE produces a syntax error message, as shown in the figure. The print() function is a good example of functionality that has been backported. In other words, the Python 3 code runs fine in Python 2.7.
Unicode strings and encoding
All strings (type str) in Python 3 are Unicode strings, whereas Python 2 uses both ASCII strings (by default) and Unicode strings (created by adding a u in front of the string—e.g., u"text"). Unicode is more versatile than ASCII, so there is less confusion over different types of strings in Python 3, especially when working with character sets in different languages. Python 3, by default, uses a character encoding system referred to as UTF-8. To ensure your scripts use this encoding system, especially scripts that also may be used in older Python versions, you can add the following line at the top of your script:

# coding: utf-8

This comment ensures that Python is aware of the encoding system of the script, regardless of the version being used to run the script. You will see this line of code at the top of scripts that are exported from the Python window in ArcGIS Pro.
Integer types
All integers in Python 3 are long, and their type is referred to as int. Python 2 used both short integers (type int) and long integers (type long).
True integer division
Integer division is somewhat confusing in Python 2. Consider the following example:

Python 2: 7/2 # returns 3

Integer division returns an integer, and any fractional part is ignored. Integer division is also referred to as floor division. To obtain the fractional part, also referred to as “floating point” or true division, at least one of the inputs must be a floating-point number.

Python 2: 7/2.0 # returns 3.5

In Python 3, the division operator (/) always performs true division and returns a floating-point number (type float), even when both operands are integers and the result is a whole number.
Python 3: 7/2 # returns 3.5
Python 3: 8/2 # returns 4.0

Floor division can be accomplished in Python 3 using the floor division operator (//):

Python 3: 7//2 # returns 3

Input
The raw_input() function in Python 2, to obtain user input, is equivalent to the input() function in Python 3.

Python 2: newvar = raw_input()
Python 3: newvar = input()

Opening files
The file() function in Python 2 is replaced by the open() function in Python 3.

Python 2: myfile = file("newtext.txt")
Python 3: myfile = open("newtext.txt")

The open() function has been backported to Python 2.7.
Working with ranges
Python 2 includes the functions range() and xrange(). Python 3 includes only the range() function, which works the same as the xrange() function in Python 2. The range() function in Python 2 returns a list, whereas the range() function in Python 3 returns a range object instead of a list. For a large range of values, the range object has the advantage that not all values are stored in memory simultaneously. To obtain the values as a list, you can use the list() function in Python 3.

Python 2: mylist = range(3) # mylist is [0, 1, 2]
Python 3: mylist = list(range(3)) # mylist is [0, 1, 2]

Iteration using next()
The next() method in Python 2 is replaced with the next() function in Python 3.

Python 2: row = rows.next()
Python 3: row = next(rows)

The next() function returns the next item in an iterator. This change is significant because iterations are widely used for geoprocessing tasks, including the use of cursor objects.
Working with dictionaries
To iterate over the items in a dictionary, Python 2 uses both the iteritems() and items() methods, but the former is removed from Python 3.

Python 2: mydictionary.iteritems()
Python 3: mydictionary.items()

Similarly, the methods iterkeys() and itervalues() also are removed, and Python 3 uses only keys() and values() instead.
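The Python 3 behaviors described in this section are easy to verify; the following sketch collects them in a single runnable script:

```python
# Python 3 behavior for the differences described above.

# True division always returns a float; floor division uses //.
print(7 / 2)    # 3.5
print(8 / 2)    # 4.0
print(7 // 2)   # 3

# range() returns a range object; wrap it in list() to see the values.
mylist = list(range(3))
print(mylist)   # [0, 1, 2]

# next() is a built-in function that advances any iterator,
# including the cursor objects used in geoprocessing.
rows = iter(["row1", "row2"])
print(next(rows))  # row1

# Dictionaries use items(), keys(), and values() only.
mydictionary = {"a": 1, "b": 2}
print(list(mydictionary.items()))  # [('a', 1), ('b', 2)]
```

Running this script in a Python 2 interpreter fails at 7 / 2 (which returns 3) and at the missing iteritems() behavior, which is exactly the kind of silent or explicit breakage the 2to3 utility described next is designed to catch.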
8.4 Python 2to3 program
The Python 2to3 program greatly facilitates the migration of scripts. This utility reads code in Python 2 and applies a series of fixers to transform the code to Python 3. The 2to3 program is installed as part of the standard library and can be run from the command line. For example, consider the following script called myscript.py:

def greet(name):
    print "Hello, {0}!".format(name)
print "What's your name?"
name = raw_input()
greet(name)

This code is a standard testing script from the Python documentation. To migrate this script to Python 3, run the following from the command line:

2to3 myscript.py

The implicit assumption is that the script is in the current directory. Otherwise, you must navigate to the correct directory or provide the full path. For example:

2to3 C:\Testing\myscript.py

The result prints to the screen, as shown in the figure. The results are organized as a series of differences, referred to as diffs. A diff consists of all the lines that must be removed, as indicated by a minus sign (-), and all the lines that must be added, as indicated by a plus sign (+). As the result illustrates, only lines 1 (def greet(name):) and 5 (greet(name)) remain unchanged. The necessary changes include changing the print statement to the print() function and changing the raw_input() function to the input() function. You also can have the changes written directly to the source file by adding the -w flag, as follows:

2to3 -w myscript.py

The new script is as follows:

def greet(name):
    print("Hello, {0}!".format(name))
print("What's your name?")
name = input()
greet(name)

Note: When changes are written to the source file, a backup is made automatically unless this option is declined using the -n flag. The Python 2to3 program also is used in the Analyze Tools For Pro geoprocessing tool, which section 8.7 discusses. There are many other resources to assist with learning Python 3 and with migrating your code.
These resources include several online books and tutorials, including the well-regarded online book Supporting Python 3: An In-depth Guide by Lennart Regebro (http://python3porting.com/).
8.5 Changes in ArcGIS Pro
ArcGIS Pro was completely redesigned to modernize desktop GIS. The user interface and typical workflows are different from those of ArcGIS Desktop 10.x. Some of the changes that are most relevant to writing Python scripts, however, are not related to the user interface. Specific changes to the functionality of ArcPy are discussed in a later section of this chapter, whereas this section focuses on two other important changes: supported data formats and availability of geoprocessing tools. ArcGIS Pro supports many different data formats for both spatial and nonspatial data. These formats include local datasets as well as web services. Some of these formats are native to Esri (e.g., shapefiles and file geodatabases), whereas others are developed by different companies (e.g., Microsoft Excel) or are open-source efforts (e.g., geopackages by the Open Geospatial Consortium). Several data formats, however, are no longer supported. The most relevant are personal geodatabases (.mdb files). This format was widely used in ArcGIS Desktop 10.x, but its use has been discouraged in recent years, and you can no longer read data in this format in ArcGIS Pro. Similarly, coverages are no longer supported in ArcGIS Pro. This format dates to much older versions of Esri software (i.e., ArcInfo), and its use has been in steady decline. If some of your data is still in these formats, or if you encounter these formats when locating data from other sources, you will need to use ArcGIS Desktop 10.x software to process it. Several other types of datasets are read-only in ArcGIS Pro. These datasets include geometric networks, which are replaced by utility networks. Raster catalogs are also read-only and are replaced by raster mosaics.
You can still read these types of datasets, which allows you to copy or convert them to their newer equivalents within ArcGIS Pro. Several geoprocessing tools from ArcGIS Desktop 10.x are no longer available in ArcGIS Pro. These include all the tools that were specifically designed to work with data formats that are no longer supported in ArcGIS Pro, including personal geodatabases and coverages. For example, the entire Coverage toolbox is removed for this reason. In addition, some tools are replaced with other tools that have similar functionality. For example, geometric networks are a read-only dataset in ArcGIS Pro because this functionality is replaced by utility networks. This change includes a new set of tools. As one example, the approximate equivalent of the Create Geometric Network tool in ArcGIS Desktop 10.x is the Create Utility Network tool in ArcGIS Pro. Finally, some tools from ArcGIS Desktop 10.x have not been implemented yet in ArcGIS Pro but are scheduled. You can expect many of these tools to appear in future versions of ArcGIS Pro, although their functionality may be somewhat modified.
8.6 ArcPy changes
Much of ArcPy has remained the same between ArcGIS Desktop 10.x and ArcGIS Pro. Most of the modules, classes, and functions have not changed. Nevertheless, there are several significant changes to ArcPy. What follows is a detailed look at the most important changes. The most obvious, and perhaps most significant, change is that the arcpy.mapping module is replaced by the arcpy.mp module. Thus, any reference to arcpy.mapping will produce an error. Consider the following example script:

import arcpy
mxd = "C:/Project/Demo.mxd"
mapdoc = arcpy.mapping.MapDocument(mxd)

This script creates a MapDocument object on the basis of an existing .mxd file.
Running the script using a Python environment for ArcGIS Pro produces the following error:

AttributeError: module 'arcpy' has no attribute 'mapping'

It is important to recognize that the version of ArcPy to be used cannot be set in the script, because it is controlled by the environment. The default environment in ArcGIS Pro is arcgispro-py3, which includes the current version of ArcPy that works with ArcGIS Pro. The example script may run fine in IDLE or PyCharm using the Python environment for ArcGIS Desktop 10.x, but it will produce errors in the same editor using an ArcGIS Pro environment. The differences between arcpy.mapping and arcpy.mp are substantial, and simply doing a search and replace for the name of the module is not enough. First, there are changes to terminology. For example, projects and maps in ArcGIS Pro were called “map documents” and “data frames,” respectively, in ArcGIS Desktop 10.x. However, many other elements are identical, or at least very similar, in terms of terminology. For example, a layer is still a layer. Second, there are changes in general functionality, such as the ability to support multiple layouts in ArcGIS Pro, whereas only a single layout was supported in ArcGIS Desktop 10.x. Combined, these differences make it relatively cumbersome to update older scripts that employ the arcpy.mapping module. There are no utilities to facilitate the automated migration of mapping scripts. What is required to migrate a script is a good understanding of the arcpy.mp module, and then selectively identifying the (approximate) equivalents of the arcpy.mapping functionality that a script requires. Despite the many differences, however, the general logic for most scripts to carry out mapping-related tasks remains the same. Consider the following example of a script using the arcpy.mapping module. The script points to a map document and lists all the layers in a data frame called City.
If the layer name is Parks, the visibility property is set to True, and the transparency is set to 50 percent. The script is as follows:

import arcpy
mxd = "C:/Project/Demo.mxd"
mapdoc = arcpy.mapping.MapDocument(mxd)
df = arcpy.mapping.ListDataFrames(mapdoc, "City")[0]
lyrlist = arcpy.mapping.ListLayers(mapdoc, "", df)
for lyr in lyrlist:
    if lyr.name == "Parks":
        lyr.visible = True
        lyr.transparency = 50
mapdoc.save()
del mapdoc

Now consider the same script using the arcpy.mp module:

import arcpy
aprx = "C:/Project/NewDemo.aprx"
project = arcpy.mp.ArcGISProject(aprx)
m = project.listMaps("City")[0]
lyrlist = m.listLayers()
for lyr in lyrlist:
    if lyr.name == "Parks":
        lyr.visible = True
        lyr.transparency = 50
project.save()
del project

Some of the most obvious differences are that ArcGIS Pro uses projects and maps instead of map documents and data frames. A more subtle but important difference is that the list functions are replaced with methods on the appropriate objects. The basic structure and logic of the script, however, remain the same. The logic consists of pointing to a project (map document), identifying a single map (data frame) by using its name, listing all the layers in the map (data frame) of interest, iterating over the list of layers, and updating the properties of a layer on the basis of its name. As a result, migrating scripts requires many small changes, but the basic structure of the script mostly stays the same. There are several other changes to ArcPy beyond the arcpy.mp module. ArcGIS Pro has introduced four new modules: arcpy.ia, arcpy.metadata, arcpy.sharing, and arcpy.wmx. The new Image Analyst module arcpy.ia gives access to the functionality of the ArcGIS Image Analyst extension for the management and analysis of raster data. Some of these tools are duplicates of those found in the ArcGIS Spatial Analyst extension, but there are some unique tools for imagery, such as deep learning tools for feature detection and tools for motion imagery analysis.
The new metadata module arcpy.metadata accesses an item's metadata and exports it to a standard metadata format. The new sharing module arcpy.sharing makes it possible to create a sharing draft from a map in ArcGIS Pro as a web layer. This sharing draft can be converted to a service definition file. The module facilitates configuring web layers on the basis of maps created in ArcGIS Pro. The workflow manager module arcpy.wmx includes the geoprocessing tools of the Workflow Manager toolbox. These tools provide an integration framework for multiuser geodatabase environments. They help streamline and standardize business processes in large organizations. All these modules provide specialized functionality that many users will not need. There have been other, more subtle changes. The new arcpy.da.Describe() function provides several benefits over the existing arcpy.Describe() function (which continues to exist in ArcGIS Pro as well). More generally, several ArcPy functions are removed, and a few new ones are added. Table 8.1 lists all the changes to the general ArcPy functions but does not include functions of the ArcPy modules.

Table 8.1 Changes in ArcPy functions

ArcPy functions no longer available in ArcGIS Pro: GetImageEXIFProperties, GetPackageInfo, GetUTMFromLocation, LoadSettings, RefreshActiveView, RefreshCatalog, RefreshTOC, SaveSettings

New ArcPy functions in ArcGIS Pro: ClearCredentials, FromGeohash, GetPortalDescription, GetPortalInfo, ImportCredentials, SignInToPortal

Considering that there are more than 100 ArcPy functions, the changes are modest and mostly refer to specialized tasks. A total of eight ArcPy functions available in ArcGIS Desktop 10.x are no longer available in ArcGIS Pro. Three of these functions are related to refreshing the display of views in map documents open in ArcMap, which no longer applies. Several new functions are added, some of which are related to ArcGIS Online, reflecting the increased importance of web services in ArcGIS Pro.
There have been fewer changes to the classes of ArcPy. Two classes are no longer available: Graph and Graph Template. One class is added: Chart. Although conceptually similar, the charting functionality of ArcGIS Pro is different from the graphs in ArcGIS Desktop 10.x. This change requires a new class with support for the different types of charts introduced in ArcGIS Pro. Finally, as discussed in the previous section, some geoprocessing tools are no longer available in ArcGIS Pro. The following toolboxes and all their tools are not available in ArcGIS Pro: Coverage (arcpy.arc), Schematics (arcpy.schematics), and Tracking Analyst (arcpy.ta). In addition, the Parcel Fabric toolbox (arcpy.fabric) is replaced by the Parcel toolbox (arcpy.parcel). A complete list is provided in the ArcGIS Pro help topic "Tools That Are Not Available in ArcGIS Pro" under Tool References > Appendices. Because of these changes, ArcPy is not backward compatible between ArcGIS Pro and Desktop 10.x. A script written for ArcGIS Desktop 10.x may not run correctly in ArcGIS Pro, and vice versa. General Python code can be written so that it runs correctly in both Python 2 and 3. However, if your code involves any of the changes in ArcPy discussed in this section, the script you wrote for one application will not run correctly in the other. On the other hand, if your script does not use the mapping module and does not use any of the handful of ArcPy functions and classes listed in this section, then with some careful attention to detail, it is possible to write a single script or tool that works correctly in both versions of Python. Certain projects may require support for both ArcGIS Desktop 10.x and ArcGIS Pro. Depending on the specific requirements of the script or tool, a project may require two different versions with slightly modified code.

8.7 Analyze Tools For Pro

ArcGIS Pro includes a useful geoprocessing tool to assist with migrating scripts and tools, Analyze Tools For Pro.
This tool uses the Python 2to3 utility to report potential issues in migrating a script or tool from Python 2 to 3, and it evaluates the code for any differences in the use of ArcPy between ArcGIS Desktop 10.x and ArcGIS Pro. Effectively, it provides a GUI (or "wrapper") for the 2to3 utility, conveniently designed as a geoprocessing tool, as well as checks for the use of ArcPy. You also can use the 2to3 utility at the command prompt, which provides functionality not available through the Analyze Tools For Pro tool, including the ability to apply a set of predefined fixes automatically to the code. In contrast, the Analyze Tools For Pro tool does not change any code but produces a report with suggested changes for review. In addition to using the Python 2to3 utility, the Analyze Tools For Pro tool looks for several other changes, including geoprocessing tools and data formats that are no longer supported in ArcGIS Pro. The Analyze Tools For Pro tool is also available as a geoprocessing tool in ArcGIS Desktop 10.x. Because Analyze Tools For Pro is a regular geoprocessing tool, it can be run from Python. The general syntax of the tool is

AnalyzeToolsForPro_management(input, {report})

And for comparison, the Analyze Tools For Pro tool dialog box is shown in the figure. The only required parameter is the input, which can consist of a Python file (.py), a Python toolbox (.pyt), a toolbox (.tbx), or a tool name. Because a tool name is not an actual file, the toolbox that the tool is part of must first be loaded using arcpy.ImportToolbox(). You can only import a toolbox when calling the tool from Python, not when using the tool dialog box. The most common way to use the Analyze Tools For Pro tool is to test an existing .py, .pyt, or .tbx file. When using a toolbox as input, what is being analyzed is not the tool dialog box(es), but the underlying Python script(s).
When using a Python toolbox or regular toolbox, all tools and scripts are analyzed at the same time. The second parameter is an optional output text file that records the issues identified. When running the tool using the tool dialog box, the issues are also printed as part of the messages that result from executing the tool. Consider an example tool from the Python Scripting for ArcGIS book (Esri Press, 2013) that was written for ArcGIS Desktop 10.x. The updated version of this tool for use in ArcGIS Pro is explained in chapter 3. The toolbox is specified as the input, and the output is left blank, as shown in the figure. When the tool runs, the result is a warning, as shown in the figure. The View Details link brings up the specific issues identified in the script as part of the messages. Notice that the messages reference the Python file (i.e., random_percent.py), not the toolbox file. The issues are identified only for the script, not for the design of the tool dialog box or its validation. The formatting of the messages can make them cumbersome to read, but the specific issues are listed with their line numbers, as follows: Line 15: row = rows.next() -> row = next(rows) Line 19: row = rows.next() -> row = next(rows) As discussed in section 8.3, Python’s next() method is replaced with the next() function. This fix is relatively easy. Note: It is important to recognize what happens here. Because rows.next() is no longer supported, the script does not iterate correctly. The result is that the random_percent.py script produces an output feature class with only a single new feature, regardless of how many features should be created on the basis of the tool parameters. When the Random Features tool runs, no errors are reported, and the tool appears to be working correctly. However, it does not produce the correct output because the iteration does not work. Simply running older tools in ArcGIS Pro to see if they work therefore can be a bit misleading. 
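The reported fix is easy to verify in plain Python 3. The sketch below uses an ordinary list iterator (not an arcpy cursor) to show the change:

```python
# In Python 3, the built-in next() function replaces the .next() method
# that iterators exposed in Python 2.
rows = iter(["row1", "row2", "row3"])

first = next(rows)    # the Python 3 spelling
second = next(rows)

print(first, second)  # row1 row2

# The old Python 2 spelling, rows.next(), raises AttributeError in Python 3.
```

This is the one-line substitution that the messages above suggest for each flagged line.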
The strength of the 2to3 utility is that it finds most of the issues associated with a script, including some that don't produce errors at runtime. You can copy text directly from the messages. Alternatively, you can specify an output text file to obtain the same information, which makes for easier editing. For this same toolbox, the raw text file is shown in the figure. A few important points about running the Analyze Tools For Pro tool: When using the tool dialog box to run the tool, the output file is optional because the results are also printed to the geoprocessing history as part of the tool's messages for viewing. When running the tool from Python, however, you must specify an output text file to view the results, or print the messages using print(arcpy.GetMessages()) following tool execution. Although a toolbox file can be used as input, the tool examines only the associated script file(s). In other words, a tool dialog box may contain errors (e.g., have no parameters at all), but these issues are not identified. If the script cannot be found, the tool produces a Failed to execute error. Therefore, use the tool to test your code, not to test the robustness of your tool dialog box design. The tool does not make any changes to the underlying script file(s) and only reports issues for review. Next are some typical lines of code that present problems and how they are addressed by the Analyze Tools For Pro tool. Consider the following script that uses the mapping module:

import arcpy
mxd = "C:/Project/Demo.mxd"
mapdoc = arcpy.mapping.MapDocument(mxd)

This script produces the following error:

Found REMOVED Python method mapping.MapDocument

The tool correctly identifies an issue with the mapping module because it no longer exists (even though MapDocument was a class of the mapping module, not a method).
Consider the following script that lists and then prints the feature classes in a workspace:

import arcpy
arcpy.env.workspace = "C:/Project"
fcs = arcpy.ListFeatureClasses()
for fc in fcs:
    print fc

This script produces the following error:

Line 5: print fc -> print(fc)

This error is easy to address because the print statement is replaced by the print() function. Consider the following script that uses a personal geodatabase, which is no longer supported in ArcGIS Pro:

import arcpy
arcpy.env.workspace = "C:/Testing/Study.mdb"
fcs = arcpy.ListFeatureClasses()

The Analyze Tools For Pro tool identifies an issue with this script and produces the following warning:

WARNING 001682: Found NOTYETIMPLEMENTED Personal GeoDatabase C:/Testing/Study.mdb within script C:\Scripts\test.py@2

If you ran the script, it would not produce any errors, but the list would be empty because the personal geodatabase is not supported as a format, and therefore it is not a valid workspace. This issue does not prevent the script from running, but the result is not as intended. Finally, consider a script with a tool that no longer exists in ArcGIS Pro:

import arcpy
arcpy.env.workspace = "C:/Project"
infc = "streets"
outfc = "centerlines"
width = 50
arcpy.CollapseDualLinesToCenterline_arc(infc, outfc, width)

This script produces the following error:

Found REMOVED tool CollapseDualLinesToCenterline_arc

The Collapse Dual Lines To Centerline tool was designed to work with coverages, and this format is no longer supported. As a result, all the tools that work with coverages are removed, and the Analyze Tools For Pro tool correctly reports the missing tool. These examples illustrate that the Analyze Tools For Pro tool identifies many of the issues associated with migrating scripts. It provides a good starting point for migrating scripts, but it may not correctly identify all issues. Importantly, it does not identify any logical issues with your script, or with the design of your tool dialog box.
When running scripts and models, there is a built-in option to check for compatibility with ArcGIS Pro. It is under Project > Options > Geoprocessing. By default, this option is unchecked. When checked, any geoprocessing tool or model that is run is checked for issues. However, the checks are limited to the use of unsupported geoprocessing tools or data formats. It does not run the Python 2to3 utility to check the contents of scripts. As a result, this option is much less informative. When migrating a specific script or tool, it therefore is recommended that you manually run the Analyze Tools For Pro tool for a more comprehensive set of checks. Points to remember Python scripts, script tools, and Python toolboxes developed for ArcGIS Desktop 10.x may not work correctly in ArcGIS Pro. Migrating scripts requires addressing changes in the Python language, changes in the ArcGIS software, and specific changes in ArcPy. ArcGIS Desktop 10.x uses Python 2, whereas ArcGIS Pro uses Python 3. Although much of the Python language remains the same between Python 2 and 3, there are several important differences to be aware of. Some of the most relevant differences include: (1) the print statement in Python 2 is replaced with the print() function in Python 3; (2) all strings in Python 3 are unicode strings; (3) all integers in Python 3 are long integers, and integer division returns a floating-point number; (4) the file() function is replaced by the open() function; and (5) the next() method is replaced by the next() function. The Python 2to3 program facilitates the Python-specific elements associated with migrating scripts by reading code in Python 2 and applying a series of fixers to transform the code to Python 3. The Analyze Tools For Pro geoprocessing tool in ArcGIS Pro provides a GUI to this utility. ArcGIS Pro is different from ArcGIS Desktop 10.x in terms of the look and feel of the software interface, which in turn impacts how certain elements are referred to in ArcPy.
In addition, not all tools in Desktop 10.x are available in ArcGIS Pro, and the supported data formats also have changed. The most relevant of the formats no longer supported is the personal geodatabase (.mdb files). Many specific changes have been made to the ArcPy package. Most notable is that the arcpy.mapping module in ArcGIS Desktop 10.x is replaced by the arcpy.mp module, but there are several other, more subtle changes as well. ArcGIS Pro has introduced four new modules: arcpy.ia (for imagery processing and analysis), arcpy.metadata (for managing metadata content), arcpy.sharing (for creating a sharing draft from a map in ArcGIS Pro as a web layer), and arcpy.wmx (for workflow management). ArcGIS Pro has introduced the arcpy.da.Describe() function, which provides certain benefits over arcpy.Describe(). ArcGIS Pro also has introduced several new functions related to ArcGIS Online. In terms of classes, the Chart class replaces the Graph and Graph Template classes, reflecting the enhancements in charting functionality in ArcGIS Pro. Python 3 code is not backward compatible with Python 2, but with some careful planning, it is possible to write code that runs correctly in Python 2 and Python 3. However, ArcPy is not backward compatible between ArcGIS Pro and Desktop 10.x. Some scripts you write for ArcGIS Pro may work correctly in Desktop 10.x, but this compatibility may be challenging, if not impossible, to achieve because of the changes in ArcPy. Some scripts may require a different version for each application, depending on the nature of the workflow and the data formats and tools being used. The Analyze Tools For Pro geoprocessing tool in ArcGIS Desktop 10.x and ArcGIS Pro identifies many of the issues associated with migrating scripts. Importantly, it does not identify any logical issues with your script or with the design of your tool dialog box. 
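The language-level differences recapped in the points above can be demonstrated in a few lines of plain Python 3. This is a minimal sketch; the __future__ imports are no-ops in Python 3 and are only needed when the same file must also run under Python 2:

```python
from __future__ import print_function, division  # no-ops in Python 3

# (1) print is a function, not a statement
print("hello")

# (3) true division returns a float; floor division (//) returns an int
true_div = 7 / 2     # 3.5 (Python 2 returned 3)
floor_div = 7 // 2   # 3 in both versions

# (2) all Python 3 strings are unicode strings
s = "café"
length = len(s)      # 4 characters, regardless of byte encoding

# (5) next(iterator) replaces iterator.next()
numbers = iter([1, 2, 3])
first = next(numbers)
```

Difference (4) follows the same pattern: open() is used everywhere the removed file() built-in appeared.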
Key terms

backporting
backward compatibility
diff
fixer
floor division
true division
unicode string

Review questions

What are some of the specific reasons that not all code written in Python 3 is compatible with version 2?
What is "backporting," and how does it impact writing scripts for Python 2 and 3?
What are some of the key differences between Python 2 and 3 that impact the migration of geoprocessing scripts for ArcGIS Pro?
What are some of the most relevant changes in ArcPy between ArcGIS Desktop 10.x and ArcGIS Pro?
What typical issues in scripts are identified by the Analyze Tools For Pro tool?

Chapter 9 ArcGIS API for Python

9.1 Introduction

Python and ArcPy make it possible to extend the functionality of ArcGIS Pro. ArcGIS Pro is a software application that runs on desktop computers and primarily is designed to work with local datasets. Increasingly, however, geospatial data and its applications reside on the web, referred to as web GIS. Web GIS is a type of distributed information system that allows you to store, manage, visualize, and analyze geographic data. Examples of web GIS that employ Esri technology are ArcGIS Online and ArcGIS Enterprise. Typically, web GIS includes datasets hosted in ArcGIS Online or ArcGIS Enterprise, as well as other online resources. You can work with these datasets in ArcGIS Pro by bringing hosted datasets into a map, and then overlaying local datasets. However, ArcPy has limited functionality to work directly with such web layers. This is where the ArcGIS API for Python comes in. Note: When the API was in the early stages of development, it was referred to as the ArcGIS Python API, but it was renamed to the ArcGIS API for Python upon release. The ArcGIS API for Python is a Python package for working directly with web GIS independent of ArcGIS Pro. It provides tools for tasks such as creating maps, geocoding, vector and raster analysis, and managing data.
These tasks are comparable to the functionality in ArcPy but are specifically designed for web GIS. In addition, the ArcGIS API for Python provides tools to manage the organization of web GIS, such as managing users, groups, and items. These tasks have no equivalent in ArcPy. When writing scripts and creating tools with ArcPy, you use a Python editor such as IDLE, Spyder, or PyCharm. For example, you write a script in your Python editor and run it as a stand-alone script or as a script tool in ArcGIS Pro. Even though you are running Python, ArcGIS Pro provides the user interface to run the script tool and examine the results. The ArcGIS API for Python is designed to work with web GIS independent of ArcGIS Pro. Although a more traditional Python editor such as PyCharm might be adequate for certain tasks, it does not provide data visualization to the level of desktop software such as ArcGIS Pro. To work effectively with web GIS, you need an IDE that has built-in tools for visualization. This is where Jupyter Notebook comes in. Jupyter Notebook has its roots in IPython, which stands for “interactive Python” and provides useful features over the default Python interpreter. Several IDEs, including Spyder and PyCharm, employ IPython as their interpreter. In addition, Jupyter Notebook has tools for visualization and other interactive components, which makes it an interactive coding environment. This chapter describes the functionality of the ArcGIS API for Python and how to get started writing code using Jupyter Notebook. 9.2 What is the ArcGIS API for Python? The ArcGIS API for Python is a Python package for carrying out visualization, spatial data management, spatial analysis, and system administration of web GIS. It employs Python’s best practices in its design and in how it uses data structures. This package makes it easier for Python programmers to start using ArcGIS without necessarily becoming skilled in a desktop application such as ArcGIS Pro. 
The ArcGIS API for Python is implemented using the ArcGIS REST API powered by ArcGIS Online and ArcGIS Enterprise. This implementation means you typically are working with datasets that are hosted in your organization or that are available publicly. You also can use the ArcGIS API for Python to add new content, and you can combine local and online datasets for visualization or analysis. Like any typical Python package, the ArcGIS API for Python consists of modules, classes, and functions. The general organization of this functionality is somewhat comparable to ArcPy, but the specific modules, classes, and functions have different names and carry out their tasks slightly differently. This organization will become easier to understand with some examples later in this chapter. As the name implies, the ArcGIS API for Python is not only a Python package but also an application programming interface (API). An API is a collection of tools that allows two applications to interact with each other. Another real-world example of an API is the ArcGIS REST API, which consists of tools that allow applications to make requests of ArcGIS Online or ArcGIS Enterprise. Representational state transfer (REST) is an architectural style that organizes a site's resources as readable URLs. ArcGIS uses the REST architectural style to create sites that can be navigated similar to the way you navigate through computer folders. The ArcGIS API for Python interacts with the ArcGIS REST API. The ArcGIS API for Python can be considered a pythonic wrapper around the ArcGIS REST API, and both APIs work together as the interface between Python code and the web GIS portal. The ArcGIS API for Python wraps the construction of ArcGIS REST API URLs in pythonic functions, so instead of constructing a URL and authenticating it manually against the server, you can call on prebuilt functions to carry out these tasks. In summary, the ArcGIS API for Python is both an API and a Python package.
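To make the idea of "wrapping URL construction" concrete, the following pure-Python sketch assembles the kind of raw REST query URL the API builds for you. The service path is hypothetical, invented for illustration; the ArcGIS API for Python constructs, authenticates, and sends such requests internally so you never have to do this by hand:

```python
# Sketch of a raw ArcGIS REST API query URL (the service path is made up).
from urllib.parse import urlencode

base = ("https://services.arcgis.com/example/arcgis/rest/services/"
        "Parks/FeatureServer/0/query")

# Typical query parameters: a where clause, the fields to return,
# and the response format.
params = {"where": "1=1", "outFields": "*", "f": "json"}

url = base + "?" + urlencode(params)
print(url)
```

Sending this URL to a live server would return a JSON feature set; the API's prebuilt classes hide exactly this plumbing.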
The ArcGIS API for Python includes tools that make it possible for a Python script to use the ArcGIS REST API, which in turn sends requests to ArcGIS Online or ArcGIS Enterprise services.

9.3 Installation of ArcGIS API for Python

The ArcGIS API for Python is distributed as a Python package called arcgis. The arcgis package is installed as part of the arcgispro-py3 default environment of ArcGIS Pro, which makes it easy to get started using the API. Note: In older versions of ArcGIS Pro, you were required to install the arcgis package using either the Python Package Manager or conda at the command prompt. Starting with ArcGIS Pro 2.1, the arcgis package is installed as part of the default environment. You can confirm the installed version of the arcgis package with the Python Package Manager. The version of the arcgis package that installs with ArcGIS Pro 2.5 is 1.7.0, but this version is likely to be updated frequently. It is important to realize that you don't need to use the ArcGIS Pro application to work with the ArcGIS API for Python, but installation can be accomplished through ArcGIS Pro. You can install the ArcGIS API for Python in a stand-alone conda environment that is different from the environment used by ArcGIS Pro. The API is not open source, but it is a free library that you can install on a computer, with or without ArcGIS Pro. The API is platform agnostic, which means you can install it on Windows, Linux, or macOS operating systems. To take full advantage of the API, however, it is beneficial to have Esri user credentials. Without these credentials, your use will be limited to public data sources.

Installation on a computer without ArcGIS Pro

In some cases, you may want to install the ArcGIS API for Python on a computer that does not have ArcGIS Pro installed. This includes any computer running Linux or macOS, which are not supported by ArcGIS Pro. To install the API without ArcGIS Pro, you must have Python 3.5 or higher installed, including a package manager.
The Anaconda distribution is recommended when working on a computer that does not have ArcGIS Pro installed. This distribution includes conda as a package manager, but you also can use PIP or Pipenv. When using conda, the ArcGIS API for Python can be installed by entering the following conda command in the Python command prompt:

conda install -c esri arcgis

This confirms that the arcgis package will be installed, and you will be prompted to enter y (yes) to proceed. Note that the -c flag is shorthand for --channel. Channels are locations where conda looks for packages. This flag ensures the package to be installed is obtained from the correct source. Esri maintains its own channel on the Anaconda Cloud to share its public packages with the broader user community. To install the API using PIP, enter the following command:

pip install arcgis

Finally, you can use Pipenv to install the API. Pipenv is used to install packages and manage environments, like conda. It combines the functionality of PIP (to install packages) and virtualenv (to manage environments). To install the API using Pipenv, enter the following command:

pipenv install arcgis

Experienced Python developers can use any of these three options to install the ArcGIS API for Python on a computer that does not have ArcGIS Pro installed, but those with less experience in managing environments and installing packages are encouraged to use the Anaconda distribution of Python and conda as the package manager. A few important notes are in order when installing and running the ArcGIS API for Python on a computer that does not have ArcGIS Pro: The Anaconda distribution of Python is strongly recommended because it includes many useful packages as well as utilities such as conda. The Anaconda distribution does not include the arcgis package. Even though the ArcGIS API for Python is free, it is not open source. As a result, you must always install the arcgis package as a separate step.
The Anaconda distribution is available for Windows, macOS, and Linux. The recommended package manager is conda, which is part of the Anaconda distribution. Without ArcGIS Pro, however, you cannot use conda through the GUI specifically developed for ArcGIS Pro. Instead, you must use conda through the command prompt, or use Anaconda Navigator, the graphical interface included with the Anaconda distribution. It is not recommended to mix the use of conda and PIP/Pipenv. If you use the Anaconda distribution, conda is your best option. If you are using a different distribution, you can use PIP or Pipenv instead. The Anaconda distribution includes Jupyter Notebook and Spyder by default, so it is relatively easy to start writing code without a lot of additional configuration. PyCharm is available for Windows, macOS, and Linux but requires some additional configuration to use the correct environment. Finally, a typical installation of the arcgis package includes all the dependencies. These are documented in detail in the system requirements of the ArcGIS API for Python and include packages such as Pandas, NumPy, Matplotlib, Jupyter Notebook, and PyShp. If the current environment does not already include these packages, they will be installed automatically when the arcgis package is installed. If ArcPy is available in the current environment, it may be used for certain tasks; if ArcPy is not available, the PyShp package will be used instead. The flag --no-deps can be added to install the API with no dependencies, as follows:

conda install -c esri arcgis --no-deps

Any dependencies for a task can then be added manually. Consult the documentation of the ArcGIS API for Python to determine which dependencies apply to a feature of the API.

9.4 Basics of Jupyter Notebook

Before describing the ArcGIS API for Python in more detail and looking at some code examples, you need a place to write your code. You can use the ArcGIS API for Python in any regular Python IDE, including IDLE, Spyder, or PyCharm.
However, since most IDEs do not have strong built-in visualization capabilities, it makes sense to use a specialized IDE that has this functionality. Time to introduce Jupyter Notebook, which is the easiest IDE to use with the ArcGIS API for Python. Jupyter Notebook is an open-source web application to create documents that contain code, text, images, and other elements. The fact that it is a web application means that you create your documents in a web browser, such as Chrome or Firefox. Your web browser is, in effect, your IDE, but you are writing your code in a notebook, not in a regular HTML page. Jupyter Notebook supports many programming languages. The most important ones are Python, R, Julia, and Scala. In case you are wondering, the name Jupyter is derived from the first names of three of those languages: Ju(lia) + Pyt(hon) + (e)R = Jupyter. Even though there is support for several languages, Python is required for Jupyter Notebook to run. A notebook is stored as a file with extension .ipynb. This extension reflects the fact that Jupyter Notebook has its origins in IPython. The .ipynb format is an open-source format based on JSON (JavaScript Object Notation). A single notebook document contains all the code, text, images, external links, and other elements created by the user. This characteristic of notebooks makes it relatively easy to share your work, because all you must do is send someone your .ipynb file. More details about working with notebooks are covered after you see how to create a notebook.

9.5 Creating and opening a notebook

Because Jupyter Notebook is the recommended IDE to work with the ArcGIS API for Python, it is installed as part of the arcgispro-py3 default environment of ArcGIS Pro. In the Python Package Manager, you can view several packages related to the classic Jupyter Notebook and the next-generation interface JupyterLab.
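As noted in the previous section, a notebook file is simply JSON. The following minimal sketch builds a schematic one-cell notebook with the standard json module; the structure is a simplified illustration of the nbformat 4 layout, not the complete specification:

```python
import json

# Schematic, minimal notebook in the nbformat 4 layout: one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello from a notebook cell')"],
        }
    ],
}

# Because the format is plain JSON, it round-trips with the json module,
# which is also why .ipynb files are easy to share and version.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
print(loaded["cells"][0]["cell_type"])  # code
```

Writing `text` to a file with the .ipynb extension would give Jupyter Notebook (or ArcGIS Pro) something it can open directly.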
There are several ways to work with notebooks: (1) open a notebook in the ArcGIS Pro application; (2) open a notebook in stand-alone Jupyter Notebook or JupyterLab; or (3) open a notebook hosted by ArcGIS Enterprise, referred to as ArcGIS Notebooks. This section describes the first two of these approaches, whereas section 9.15 explains ArcGIS Notebooks. The easiest and most convenient approach is to open a notebook directly in ArcGIS Pro. You can create, edit, and run a notebook directly within the ArcGIS Pro application. Note: The ability to work with notebooks directly within ArcGIS Pro was introduced in version 2.5. When running a previous version of ArcGIS Pro, or when ArcGIS Pro is not installed on your computer, you can use the classic Jupyter Notebook approach discussed later in this section. You can create a new empty notebook using the Catalog pane in ArcGIS Pro. Right-click on the folder where you want to create a new notebook, and click Notebook. Enter the name of the notebook file, and press Enter. A new file with the .ipynb file extension is created in the folder. This format is recognized by ArcGIS Pro so you will see a new entry in the folder in the Catalog pane. You also can create a new notebook from the Insert tab by clicking New Notebook. This allows you to save a notebook file in a folder of your choice, and the new notebook is added under the Notebooks node in the Catalog pane. To open a notebook, double-click on the file in the Catalog pane, or right-click the file, and click Open Notebook. The notebook opens in the main viewer window of ArcGIS Pro. Opening a notebook brings up the Notebook tab with buttons for creating a new notebook, saving edits to the current notebook, and interrupting the kernel. A kernel is a program that runs and reviews the code in the notebook. Jupyter Notebook has a kernel for Python, but there are kernels for other programming languages as well.
When working with Python code in a notebook, you are not running lines of code as in an interactive interpreter or running a script file (.py), but instead you run code snippets. The kernel is like a program running in the background so that the code in the notebook can be executed. The kernel is specific to the environment, which in this case is the active environment of the ArcGIS Pro application. The Notebook view in ArcGIS Pro is like other views and can be moved and resized. However, the Notebook view does not interact with the rest of the ArcGIS Pro application. For example, you cannot drag datasets from the Catalog pane into the code of the notebook as you can with the Python window. Similarly, running code in the notebook does not generate entries in the History pane. The Notebook view includes several menus and tools, which will be discussed shortly. First, however, it is important to review how to run a notebook outside ArcGIS Pro. Although using a notebook directly inside ArcGIS Pro is convenient, ArcGIS Pro is not required to use Jupyter Notebook or the ArcGIS API for Python. The following steps show you how to run a notebook outside ArcGIS Pro. In Windows, search for the application called Python Command Prompt. This step brings up the command prompt window. Notice how the command prompt uses the arcgispro-py3 environment or a clone. This is important, because this environment ensures that both the arcgis package and Jupyter Notebook are available. Note: Although you can use the ArcGIS API for Python within the interactive Python shell of ArcGIS Pro, to experience the Jupyter Notebook environment, you must run a conda environment that has the correct packages installed. If you have installed ArcGIS Desktop 10.x, you will have another Python command prompt, typically called Python (command line). This command prompt cannot be used to work with the ArcGIS API for Python or Jupyter Notebook.
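Because the kernel is tied to a specific environment, a quick sanity check from any notebook or interpreter is to ask Python which interpreter is running; in an arcgispro-py3 environment (or a clone), sys.executable points into the ArcGIS Pro conda installation:

```python
import sys

# Which interpreter is backing this kernel or session? In an arcgispro-py3
# environment this path leads into the ArcGIS Pro conda installation.
print(sys.executable)

# Jupyter Notebook and the ArcGIS API for Python both require Python 3.
major = sys.version_info.major
print(major)  # 3
```

If the printed path does not point at the environment you expect, the wrong command prompt or kernel is active.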
Next, you must navigate to an existing folder on your hard drive where you want to store your notebook. If you have existing notebooks you want to use, navigate to the folder where those are located. If you are not familiar with the command prompt, here are a few useful commands, in which you type the command, and press Enter to run it. The commands are as follows:

To go to the root folder: cd\
To go down one folder: cd <name of folder>
To go up one folder: cd..
To change drives: <drive letter>:
To go to a specific folder: cd <drive letter>:\path\<name of folder>

As you may have guessed, cd stands for "change directory." In this example, it is assumed that the folder of interest is C:\Demo. Therefore, you first must run the cd\ command and then the cd demo command. Alternatively, you can run cd c:\demo in one step. At the command line, drive letters and directory names are not case sensitive. In this example, it is assumed that the folder C:\Demo already exists. You also can create a new folder using the mkdir command, which stands for "make directory." These commands bring you to the folder of interest, also referred to as the "workspace," and now you can start Jupyter Notebook with the command jupyter notebook. This command results in several messages being printed, including a URL for your notebook. The URL starts with http://localhost:8888/ and is followed by a token. This token is necessary because it includes the information on the specific Python environment being used (i.e., arcgispro-py3 or a clone) as well as the location of the notebooks (i.e., C:\Demo). A token looks something like the following:

http://localhost:8888/?token=e2d3d028255a303a2df06cddcfc5fcd2114ee7af95b8b32c

Also note that the current session of the command prompt is now labeled "jupyter notebook." Although the command prompt window remains open, your default web browser application opens automatically.
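For comparison, the same folder navigation can be scripted with Python's os module; the temporary directory below is a stand-in for C:\Demo, purely for illustration:

```python
import os
import tempfile

# Sketch: os mirrors the command-prompt navigation commands.
# A temporary directory stands in for C:\Demo, which may not exist here.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Demo"), exist_ok=True)  # mkdir Demo
os.chdir(os.path.join(root, "Demo"))                    # cd <path to Demo>
in_demo = os.getcwd().endswith("Demo")
print(in_demo)  # True
os.chdir("..")                                          # cd..
```

The directory you chdir into before launching jupyter notebook becomes the workspace whose contents appear on the Files tab.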
This URL is typically http://localhost:8888/tree, which means it is pointing to your desktop computer as the local server. The “tree” portion of the URL means that it is showing the folder and files inside your working folder—in this case, C:\Demo. The location of the working folder itself does not appear in the interface, at this point. If you have existing notebooks in this folder, they show up as a list. If you have not created any notebooks yet, the page shows the message, “The notebook list is empty.” Note: Jupyter Notebook opens in the default web browser on your computer. You can change it in your Windows operating system under Settings > Apps > Default apps > Web browser. It is important to keep the Python command prompt running in the background because it is running the kernel necessary to run the code in your notebook. As you carry out certain tasks, messages will be printed here. You can view them to understand what is going on, but generally you do not need to look at these messages. You can minimize the command prompt window with the kernel running, but if you close the window, it will end your Jupyter Notebook session. Now you are ready to create your notebook file. In the upper-right corner on the Files tab, click New > Python 3. A new tab opens in the browser window. Click on the Untitled tag next to the Jupyter logo, which brings up the Rename Notebook dialog box. You can enter a meaningful name. Spaces are allowed, and there is no need to specify a file extension. Click the Rename button to apply the change. The name appears at the top of the page. Changes to notebooks are saved automatically, as indicated by the (autosaved) tag next to the name. You also can click the Save button. The name of the web browser tab has changed, and the URL includes the name—for example, http://localhost:8888/notebooks/demo_notebook.ipynb. Click the browser tab labeled Home, and you will see a new entry for the notebook file that was just created. 
The file extension .ipynb is shown. If you close your notebook tab, you can open it again by double-clicking on the entry for the notebook on the Home tab. If you navigate using File Explorer to the folder of interest on your computer, you will see the new .ipynb file created. However, you cannot open the .ipynb file by double-clicking it because the notebook can be opened only from within an application that has the appropriate Python kernel running. You have now seen two different ways to work with notebooks: (1) directly from within ArcGIS Pro and (2) using the command prompt to launch Jupyter Notebook in a web browser. The latter is referred to as the “classic” Jupyter Notebook. Both approaches can be used to create a new notebook, edit an existing notebook, and run code in a notebook. Generally, the two approaches provide the same functionality, and they can be used interchangeably. For example, you can start working on a new notebook in ArcGIS Pro, and then open it later in a web browser, or vice versa. However, there are some differences to be aware of. First, as the previous explanation of steps shows, working with notebooks directly in ArcGIS Pro includes fewer steps and is more convenient. Second, both approaches require a conda environment that includes the necessary packages. When working directly within ArcGIS Pro, the environment being used is the active environment in the current session of the application. When launching Jupyter Notebook from the command prompt, the environment is set from the command prompt. Third, the menus and tools are similar, but not identical. When working with a notebook directly in ArcGIS Pro, some elements are controlled by the application and therefore are removed from the notebook interface. These elements include options to save the notebook and interrupt the kernel, which are part of the Notebook tab in ArcGIS Pro but are regular menu and tool options in the classic Jupyter Notebook. 
Finally, and perhaps most importantly, you can launch Jupyter Notebook using a web browser on a computer that does not have ArcGIS Pro installed, regardless of how the notebook was created. Despite some of these differences, once you have created a new notebook and are writing code, both approaches feel the same and provide the same functionality. Note: The remainder of this chapter employs the classic Jupyter Notebook approach, which means some functionality will be slightly different when using a notebook directly in ArcGIS Pro. The code, however, works the same regardless of the approach. Recall that the third way to work with notebooks is to host them using ArcGIS Enterprise. The interface of these hosted notebooks is nearly identical to the Notebook view in ArcGIS Pro. 9.6 Writing code in a notebook Now that you have created a new notebook, you can start writing code. Python code in a notebook is organized in cells. Cells are like small blocks of code. You enter code in a cell line by line, and then run the code for a cell. A cell can consist of only a single line of code, but it also can contain many lines. A single notebook typically contains more than one cell. You type your line of code into a cell. To run the code, press Ctrl + Enter on your keyboard, or click on the Run icon in front of the cell. The result is printed immediately below the cell. Note: You also can click the Run button in the top menu of the notebook. Clicking Run runs the current cell and shows the result, but it also adds a new empty cell below. Which option you use to run a cell is a matter of preference. This simple example illustrates a few key points about Jupyter Notebook. First, code is organized into cells, and code is run cell by cell. Second, the results are printed below the cell for immediate inspection. As later examples illustrate, results are not limited to printing text, but can include graphs, maps, and other visualizations. 
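A first cell can be as small as one line; a minimal sketch of what you would type and run with Ctrl + Enter:

```python
# A one-line cell: type the code, press Ctrl + Enter, and the result is
# printed immediately below the cell.
message = "Hello from a notebook cell"
print(message)
```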
You can add multiple lines of code for a single cell by using the Enter key after each line of code. Recall that in a typical interactive window in a Python IDE, the Enter key results in the line of code being run. In a Jupyter Notebook, when you click Run or Ctrl + Enter, all the lines of code for a single cell are run. The following example shows a cell with four lines of code, and the resulting string is printed when the cell is run. Results are returned not only when printing messages. Consider the following example code, which counts the number of occurrences of a specific character in a string. When the cell is run, the result is shown as output. Therefore it is not necessary in a Jupyter Notebook to print messages using the print() function. As you may have noticed, Jupyter Notebook uses elements of the IPython interface, as illustrated by the use of the input prompt In[n], where n is a positive integer. This number starts at 1 and increases for additional cells. However, the number also increases every time you run the same cell again. For example, you can make a change to the code of a cell and run the cell again. The numbers for the input prompt and the output prompt are updated as a result. The numbers for the input and output have no influence on code execution. They simply keep track of the order in which the code was run. The numbers can be reset by restarting the kernel by clicking Kernel > Restart. As previously discussed, the kernel is the execution backend for Jupyter Notebook, which you can see running in the command prompt window. Examples in this chapter often start every new example with the number one, but starting at number one is not necessary for the code to run. New cells can be added by clicking on the “insert cell below” button (a plus sign) on the top menu of the notebook. When you click the Run button on the top menu, the current cell is executed, and a new cell is added below automatically. 
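The character-counting example described above can be sketched as a single cell (the variable names are illustrative):

```python
# Several lines entered in one cell and run together with Ctrl + Enter.
quote = "Jupyter Notebook"
count = quote.count("o")  # number of occurrences of "o" in the string
count  # as the last expression in the cell, this is echoed as Out[n]: 3
```

Because the value of the last expression is echoed as output, no print() call is needed to see the result.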
The same can be triggered when you execute a cell using the Shift + Enter keyboard shortcut, instead of Ctrl + Enter. You also can add a cell by pressing the b key on your keyboard, but it requires that your cursor is not inside a cell—otherwise, you are simply entering the character b. The following example illustrates the use of two cells, each with its own output. Even though code in a notebook is entered cell by cell, any previously used variables are stored in memory and can be used. In that sense, a notebook is like a regular Python script, but it is organized into blocks of code (called “cells”), and these can be run separately from each other. The organization of code into cells presents some issues. For example, if you updated a variable, the cell in which that variable is assigned a value must be run before that variable can be used in a different cell. Consider the following example in which the string in the first cell is modified (and a typo is introduced by changing “Notebook” to “Notebok”), but the cell is not run. Running the second cell in which the variable is used results in an incorrect answer because it is still using the earlier value of the variable. In the example code, the result prints True even though the string “book” does not appear in the string “Jupyter Notebok.” The solution is to first run the cell in which the change to the variable has been made, and then run the cell in which the variable is being used. This is a bit counterintuitive when you are used to running Python scripts because the entire script is run by default. An alternative to running each cell manually is to use the Run All option from the Cell menu. This option is useful when you have many cells in a notebook and made a change at the top. The Run All command runs all the cells in the sequence in which they are listed in the notebook and updates the results. There are many tools to manage cells in a notebook. 
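Returning to the stale-variable pitfall described above, the two cells can be simulated in plain Python:

```python
# Cell 1 (run once):
text = "Jupyter Notebook"

# Cell 2: uses whatever value of `text` the kernel currently holds.
stale_result = "book" in text
print(stale_result)  # True

# Cell 1 is now edited to introduce the typo -- but in a notebook the edit
# takes effect only once that cell is RE-RUN:
text = "Jupyter Notebok"

# Re-running Cell 2 after re-running Cell 1 gives the corrected answer:
fresh_result = "book" in text
print(fresh_result)  # False
```

Until the edited cell is re-run, the kernel keeps the old value in memory, which is exactly why Run All is a safe way to refresh a notebook after edits near the top.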
You can select a cell and use one of the tools from the toolbar, including inserting a cell (plus sign), deleting a cell (scissors), copying a cell, pasting a cell, moving a cell up (Up Arrow), and moving a cell down (Down Arrow). There are several additional options under the Edit menu, including merging cells and splitting cells. Navigating around the cells in a notebook, you may have noticed how they change color between green and blue. When a cell is green, your cursor is inside the cell, and you can type your code or text. When a cell is blue, the cell is selected, but your cursor is not inside the cell to write code. This color-coding of cells is particularly helpful when using keyboard shortcuts because they may not work when your cursor is inside a cell. Working with cells in a notebook takes a bit of getting used to if you have been writing regular Python scripts in a more traditional IDE. However, there are many advantages to organizing your code in cells. You can fix errors and run the code again without having to use multiple windows. Consider the following example, which uses the print statement from Python 2 instead of the print() function as shown in the figure. You can update the line of code and run the cell again. There is no need to check the print results in a different window (typical when running a standalone script) or to copy and paste the line of code to a new line (typical when using an interactive interpreter). There also is no need to run all the cells in the notebook, because you can make the change, and then run only the cell(s) of interest. 9.7 Using Markdown in Jupyter Notebook One of the advantages of working with Jupyter Notebook is that you can add elements other than code to your notebook. These elements include formatted text, URLs, graphics, mathematical notations, HTML pages, and various types of multimedia. These elements can be added using a special type of cell called a Markdown cell. 
Markdown is a lightweight markup language that is widely used in the data science community. It is a text-to-HTML conversion tool that allows you to write in plain text, and then apply some formatting to HTML so the results can be viewed in a web browser. In addition to its own special formatting symbols, Markdown also accepts HTML-style markup, using the same opening and closing tags as HTML—i.e., <tagname> </tagname>. When you add a new cell to a notebook, by default the cell type is set to Code, but you can change it by selecting the cell (i.e., the color of the cell is blue or green) and doing one of the following: (1) changing the type of cell from Code to Markdown using the drop-down options on the toolbar; (2) on the top menu, clicking Cell > Cell Type > Markdown; or (3) using the m keyboard shortcut. Once the cell type is changed to Markdown, you can enter text and apply formatting. Formatting includes the use of headings, block quotes, numbered or bulleted lists, and italic or bold text. In addition to text, you can add the following types of elements:

Line breaks and horizontal lines
Python code used for illustration instead of execution
URLs and other external links
Mathematical symbols and equations
Tables
Images, videos, and animations

Headings are created using the hash mark symbol (#), also known as a number sign, followed by a space. Additional hash mark symbols can be added for other heading levels. Regular text does not use any special symbols. When you enter the headings, some formatting is immediately visible, as shown in the figure. The final rendering of the formatting, however, is applied only when you run the cell, as shown. To return to editing the contents of the Markdown cell, double-click on the cell. The same formatting can also be accomplished using markup tags. For example, instead of using a single hash mark symbol for heading 1, you can use the <h1> tag—i.e., <h1>Heading</h1>.
When you use these tags, no formatting is immediately visible, but the tags appear as a different color. Again, the rendering is applied when you run the cell, as shown in the figure. Note: When using special symbols (such as # and several others), a space must follow for the symbol to be recognized. Without the space, the special symbol would be considered part of regular text. When using tags (such as <h1>), no space is used. Bold text uses a double asterisk (**), double underscore (__), or the <b> tag. Italic text uses a single asterisk (*), a single underscore (_), or the <i> tag. Again, you can render the text by running the cells, as shown in the figure. A bulleted list can be created by using dashes (-), plus signs (+), or asterisks (*), followed by a space. You can create nested lists by using tabs or two spaces. The rendered result is shown in the figure. A numbered list can be created by starting with the number 1, followed by a dot and a space, as shown in the figure. The rendered result is shown in the figure. Regular text formatting can be a bit tricky. For example, simply pressing Enter following a line of text does not produce a line break. Using Enter twice, however, starts a new paragraph. You can use two spaces or the <br> tag for a line break. The <br> tag is preferred because the two spaces are not clearly visible. The rendered result is shown in the figure. A block quote is accomplished using the right-arrow bracket (>), also referred to as the “greater than” symbol, followed by an optional space. The symbol must be used at the start of every line of the block quote. The rendered result is shown in the figure. Horizontal lines can be added using either three hyphens (---), three asterisks (***), or three underscores (___). The rendered result is shown in the figure. An external link in Markdown uses a set of square brackets for the link text, followed by the URL in parentheses, as shown in the figure. The rendered result is shown in the figure. 
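Pulling the preceding rules together, the source of a single Markdown cell might look like the following sketch (the heading text and link target are placeholders):

```markdown
# Heading 1
## Heading 2

Regular text with **bold**, __also bold__, *italic*, and _also italic_.

- A bulleted item
- Another item
  - A nested item

1. First numbered item
2. Second numbered item

> A block quote
> spanning two lines.

First line.<br>
Second line, after a line break.

---

[Esri](https://www.esri.com)
```

Running the cell renders the formatting; double-clicking the rendered cell returns you to the editable source.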
Local images can be inserted by clicking Edit > Insert Image in the top menu and browsing to a locally stored file. After the image is inserted, the file name is preceded by an exclamation point (!). The rendered result is shown in the figure. You can insert external images using the same syntax used for external links such as a URL, except prefixed with an exclamation point. The rendered result is shown in the figure. You also can insert images using the <img> tag. Additional properties can be specified, such as width and height. The rendered result is shown in the figure. Sometimes you may want to include example code for illustration, but the code itself should not be executed. Example code can be included using Markdown by enclosing a block of code in three back ticks (```), which supports multiple lines. You can add syntax highlighting by adding the programming language after the opening back ticks. For inline code, you can use a single back tick or the <code> tag. The rendered result is shown in the figure. Note that using code in Markdown is just a way to illustrate a concept or explain something about Python code. The code does not actually run, and there is no syntax checking. Mathematical symbols and expressions can be created by surrounding text with a dollar sign ($) on each side, as follows: $mathematical symbol or expression$. The syntax for mathematical symbols and expressions uses LaTeX, which is a typesetting language for producing technical and scientific documentation. Jupyter Notebook recognizes LaTeX code written in Markdown cells and renders the symbols using the MathJax JavaScript library. The ability to use LaTeX inside Jupyter Notebook is one of the reasons that notebooks have become popular in fields such as mathematics and physics. For example, the following expression creates a simple inline formula, as shown in the figure. The rendered result is shown in the figure. 
Mathematical expressions on their own line are surrounded on either side by a double dollar sign ($$), as shown in the figure. The rendered result is shown in the figure. Because Markdown uses many special symbols for formatting, what do you do when you need those characters? You can use a backslash to generate literal characters. For example, the following code creates the asterisk and dollar symbols, as shown in the figure. The rendered result is shown in the figure. The examples shown cover only some of the key elements of using Markdown cells. There are many online resources that cover the use of Markdown for Jupyter Notebook in greater detail. Markdown provides a great way to enhance your notebooks because you can add explanations to your code, provide background information, and include visualizations, in addition to the Python code itself. There are several different “flavors” of Markdown, which vary slightly in terms of syntax and functionality. For example, the GitHub platform employs a dialect of Markdown called GitHub Flavored Markdown (GFM), which is different from the “standard” Markdown. Jupyter largely follows GFM but with minor differences. The following images illustrate one of the sample notebooks published on the help pages of the ArcGIS API for Python called “Chennai Floods 2015 —A Geographic Analysis.” The notebook starts off with some background information on the flood event that took place in 2015, followed by various visualizations and analysis of the rainfall and flooding. The example illustrates the use of Markdown cells and code cells in a single notebook. One of the benefits of using Jupyter Notebook is that you can provide the code to run a workflow and the documentation side by side in a single document. Users can read the documentation, written using Markdown, while interacting with the code at the same time. 
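As a combined illustration of the LaTeX and escape rules described above, a Markdown cell might contain:

```markdown
The area of a circle is $A = \pi r^2$.

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$

Literal symbols such as \* and \$ are produced by escaping them with a backslash.
```

When the cell is run, MathJax renders the inline expression within the sentence and centers the display expression on its own line.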
9.8 Starting the ArcGIS API for Python Now that you have seen how to start Jupyter Notebook, it is time to start using the ArcGIS API for Python. First, you must import the arcgis package, as shown in the figure. Note that for the remaining examples, the input prompt (In[ ]) and output prompt (Out[ ]) of the interface are omitted from the figures. You can perform a quick check to confirm the current versions of Python and the ArcGIS API for Python, as shown in the figure. The ArcGIS API for Python includes several modules, the most important being the gis module. The gis module allows you to manage the contents of your GIS, as well as users and their roles. The main class of the gis module is GIS. A GIS object represents the GIS you are working with through ArcGIS Online or through an instance of ArcGIS Enterprise. This object becomes your entry point for using the ArcGIS API for Python. To get started, import the GIS class as shown in the figure. When the GIS class is imported using from-import, there is no need to use import arcgis first. Next, a GIS object is created, as shown in the figure. The GIS class has several optional parameters, including a URL, a user name, and a password. The URL can be a web address to ArcGIS Online or a local portal in the form: https://gis.example.com/portal. If these parameters are left blank, you are using an anonymous login to ArcGIS Online. Example code for providing user credentials is shown in the figure. To create a Jupyter Notebook and work with the ArcGIS API for Python, you do not need to provide user credentials, but your use will be limited to public datasets. To work with datasets hosted within an organization, you must authenticate with your user credentials. When working with the ArcGIS Online portal, some functionality of the API uses credits, which is another reason why authentication may be necessary. 
The complete syntax for the GIS class of the arcgis.gis module can be found in the online API reference and is as follows:

class arcgis.gis.GIS(url=None, username=None, password=None, key_file=None, cert_file=None, verify_cert=True, set_active=True, client_id=None, profile=None, **kwargs)

The notation of the syntax in the documentation of the ArcGIS API for Python is slightly different from the notation used in the documentation of ArcPy. Recall that optional parameters for classes and functions of ArcPy are enclosed in braces { }, whereas in the preceding example, any optional parameters are given a default value or initialized to None. Any required parameters are listed without a default, but none of the parameters of the GIS class are required. Also notice that the syntax in the documentation starts with the keyword class, but it is not used in actual code. Note: The **kwargs parameter at the end stands for "keyworded," or named, arguments, which are separate from the preceding explicitly named parameters. The use of **kwargs makes it possible to pass one or more additional parameters. In the case of the GIS class, these arguments consist of proxy server and token settings. Alternative ways to authenticate your user credentials are described in detail in the help pages under the topic "Working with Different Authentication Schemes." One useful alternative is to connect using the active portal in the ArcGIS Pro application, known as the pro authentication scheme, as follows:

from arcgis.gis import GIS
mygis = GIS("pro")

This authentication works only when ArcGIS Pro is installed locally and running concurrently. The credentials in ArcGIS Pro are used for authentication without specifying those credentials in the code. The next step is to create a basic map display to visualize your spatial data. The GIS object includes a map widget for this purpose. A widget is like a mini-application that runs inside a notebook.
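Returning briefly to the Note on **kwargs: the mechanism is plain Python and can be sketched without the arcgis package. The connect() function below is a hypothetical stand-in, not the real GIS signature:

```python
def connect(url=None, username=None, password=None, **kwargs):
    # Explicit parameters are matched first; any remaining named arguments
    # (e.g., proxy or token settings) are collected into the kwargs dict.
    return {"url": url, "username": username, "extras": kwargs}

# Extra named arguments beyond the explicit parameters land in `extras`.
conn = connect("https://gis.example.com/portal", username="editor",
               proxy_host="proxy.local", proxy_port=8080)
print(conn["extras"])  # {'proxy_host': 'proxy.local', 'proxy_port': 8080}
```

This is why the GIS class can accept proxy and token settings that are not spelled out as explicit parameters in its signature.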
Jupyter Notebook includes several widgets for information display and user interaction, but the map widget is added as part of the arcgis package. The widget creates a map centered on the location provided. You also can provide a tuple of coordinates for latitude and longitude. When providing a general location description, an address, or a landmark, these locations are geocoded using a default geocoder. If you do not specify a location, the map widget returns a map of the world. You can bring up a display by typing the name of the map widget. The display, in this case, uses the default ArcGIS Online basemap. The zoom level is based on the nature of the geocoded location. One of the key benefits of using Jupyter Notebook is that you can display a map of your GIS directly within the notebook, and the results update interactively with your code. For example, you can change the basemap using the basemap property of the map widget. In the preceding example, several lines of code are merged into a single cell. Using one cell is not required but allows you to make several changes at once. It also makes sure the entire block of code is run instead of running several cells individually. Although Jupyter Notebook is relatively intuitive to use, it is common to experience issues getting it up and running at first. One common issue when using a map widget is getting no response at all. The code appears to have run, but no map display comes up, and there is no error message. This scenario typically means there is an issue with your browser. You can change the default browser on your operating system and restart Jupyter Notebook, or you can copy the URL from the command prompt window into a different browser. Just copying http://localhost:8888/tree is not enough—you must copy the token as well. A complete URL looks something like the following:

http://localhost:8888/?token=e2d3d028255a303a2df06cddcfc5fcd2114ee7af95b8b32c

Other common issues are difficulties with authenticating user credentials because there are several different authentication schemes. If you have ArcGIS Pro installed locally, the pro authentication is a convenient workaround if you keep ArcGIS Pro running concurrently. 9.9 Adding content Now that you have a Jupyter Notebook up and running and can use the arcgis package to bring up a simple map display, it is time to add new content. The example uses a new notebook and starts with a map for a different study area, as shown in the figure. You can use the content property of the GIS object to search for content. The content property returns an instance of the ContentManager class. The search() method of this class locates items of interest. So far, the line of code is as follows:

search_result = mygis.content.search()

The search() method has arguments for the query (as a string), the type of item, the maximum number of items, and several others. In the following example, the search is for feature layers related to NYC taxi data and returns a maximum of five items. The result is a list of Item objects. An anonymous login was used, so the result consists of items in ArcGIS Online that meet the search criteria, and which are publicly available. The items published in ArcGIS Online are dynamic, and contents can change quickly. Therefore, the same search may produce different results a few months later. Items can be many different things, including web maps, feature services, image services, and so on. Each item has a unique identifier and a well-known URL. By restricting the search to a specific item type, only those types are returned as part of the list. Because the items are returned as a Python list, you can use an index to obtain a specific item.
When you query a single item, a rich representation of the item is displayed with a thumbnail image, brief description, type, modification date, and a hyperlink to the item page on the portal. The item, in this case, is a feature layer collection, which can contain one or more feature layers. This specific feature layer collection includes four feature layers. When adding this item to a map widget, all these layers will be added. You can also choose a specific layer to work with by using an index, as shown in the figure. The result is the URL of a single feature layer. Note that the second line of code, referencing the feature layer variable, is not necessary and is used in this example only to confirm the results. The layer can now be added to the map display using the add_layer() method, as shown in the figure. You can customize the symbology of the layer that you add to the map widget using the functionality of the arcgis.mapping module. A detailed description of this functionality can be found in the documentation of the ArcGIS API for Python. 9.10 Creating new content In addition to working with data that is already available in your web GIS, you can create new content by publishing the data to your GIS. The following example, shown in the figure, uses a new notebook and starts with a map for a different study area, Vancouver, British Columbia, Canada. Because the task involves publishing a new item to ArcGIS Online, an anonymous login is not enough. For the code example to work, you must use ArcGIS Online credentials with publisher permissions or employ the pro authentication scheme. The URL, user name, and password in the second line of code must be replaced with your credentials. Note: For the next example, you must be logged into ArcGIS Online or a portal to publish items and because geocoding the results requires credits. The data to be added in this example resides in a CSV file on the local computer. 
The data consists of the locations of more than 100,000 street trees in the City of Vancouver. The purpose of the notebook is to read this dataset using Pandas, and then to publish this data as a new item to ArcGIS Online. After importing Pandas, a DataFrame object is created by reading the local CSV file. To facilitate the process of publishing the data, a random sample of 100 records is created for illustration purposes. The sample() method is used to create this sample as a new DataFrame object. To examine the data inside the notebook, the first five rows are displayed. The dataset includes fields for unique ID, street address, latitude, and longitude, as well as descriptive details for each tree. Because there are too many columns to show, the display of the first five rows includes a sliding bar to scroll through the columns. This example also illustrates the versatility of Pandas when used in Jupyter Notebook. Not only is it easy to read datasets from local and online resources, but a sample of the data can also be displayed directly inside the notebook. This feature is different from other IDEs, in which the results are typically viewed in a separate window with less convenient formatting. To make the attributes more manageable, the most pertinent fields are selected and organized. The results are returned as a new DataFrame object. Once the desired dataset is obtained in the form of a DataFrame object, the next step is to import the data as a feature collection. Importing can be accomplished using the import_data() method of the ContentManager class. An instance of this class called content is available as a property of the GIS object. The DataFrame object is the argument of the import_data() method. The second line of code here is not required; it is added to confirm that the result is a feature collection.

Note: The import_data() method works with a Pandas DataFrame or a Spatially Enabled DataFrame (SEDF).
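The Pandas steps described above can be sketched as follows. The small DataFrame built inline is a stand-in for reading the real CSV file with pd.read_csv("C:/Demo/trees.csv"), and the column names are hypothetical (the Vancouver dataset uses its own field names):

```python
import pandas as pd

# Stand-in for pd.read_csv("C:/Demo/trees.csv") on the real dataset
df = pd.DataFrame({
    "tree_id": range(1, 501),
    "std_street": ["W 10TH AV"] * 500,
    "genus_name": ["ACER"] * 500,
    "latitude": [49.26] * 500,
    "longitude": [-123.14] * 500,
})

# Random sample of 100 records, returned as a new DataFrame
sample = df.sample(n=100)

# Display the first five rows to examine the data
print(sample.head())

# Select the most pertinent fields into a new DataFrame
subset = sample[["tree_id", "std_street", "latitude", "longitude"]]

# In the real workflow, the DataFrame is then imported as a
# feature collection (requires a GIS connection):
# fc = mygis.content.import_data(subset)
```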
SEDF is a class of the arcgis.features module that gives the Pandas DataFrame spatial abilities. When using import_data(), there is a limit of 1,000 rows when importing a Pandas DataFrame as a feature collection item. No such limit applies when using an SEDF. The arcgis.features module also includes functionality to convert a Pandas DataFrame to an SEDF.

To publish the feature collection as an item in ArcGIS Online, the feature collection must be converted to a JSON object. To make this possible, the properties of the feature collection are converted to a Python dictionary, and this dictionary is used in the json.dumps() function to create the JSON object. The final step is to publish the JSON object as an item using the add() method of the ContentManager class. To make the published item more usable, several item properties are provided as a dictionary, including a title, a description, several tags, and a type. None of these properties is required, but they represent good practice. The only required key, in this case, is the text key to specify the JSON object. The final line of code in this cell brings up a snapshot of the published item. Once the item is published, it can be added to the map display as a layer.

An alternative solution is to first add the CSV file as an item in ArcGIS Online, and then publish it as a feature layer using the publish() method of the Item class. This solution does not require the use of Pandas or converting the data to a feature collection using a JSON object. The complete code solution is shown as a single cell in the figure. This alternative solution publishes the entire CSV file, including all the records and all the fields. The use of Pandas provides more flexibility because it allows for data cleaning and filtering before publishing the data. In both cases, the result is a hosted feature layer, which can be added to the display.
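The conversion-and-publish step might look like the following sketch. The properties dictionary here is a simplified stand-in for the properties of a real feature collection (in the real workflow, dict(fc.properties)), and the final add() call is shown but not run because it requires a login:

```python
import json

# Simplified stand-in for the properties of a feature collection
# (in the real workflow: fc_dict = dict(fc.properties))
fc_dict = {
    "layerDefinition": {"geometryType": "esriGeometryPoint"},
    "featureSet": {"features": [], "geometryType": "esriGeometryPoint"},
}

# Convert the dictionary to a JSON string
fc_json = json.dumps(fc_dict)

# Item properties for the add() method; only the text key is required
item_props = {
    "title": "Vancouver trees",
    "description": "Random sample of Vancouver street trees",
    "tags": "trees, sample",
    "type": "Feature Collection",
    "text": fc_json,
}

# In the real workflow, the item is then published with:
# trees_item = mygis.content.add(item_properties=item_props)
```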
Many additional methods are available in the ContentManager class of the arcgis.gis module. In addition to adding items, there are methods for bulk updates, cloning, creating services, deleting items, and sharing. 9.11 Performing analysis Many other tasks can be performed using the ArcGIS API for Python, including geocoding, working with imagery, and performing network analysis. Many of the workflows that employ ArcGIS Pro and ArcPy on local data can be replicated for web GIS using the ArcGIS API for Python without having to perform these tasks manually using the interface of ArcGIS Online or Portal for ArcGIS. This section focuses on performing spatial analysis tasks, which are comparable to using geoprocessing tools in ArcGIS Pro. The ArcGIS API for Python includes several modules to carry out specialized analysis tasks. These modules include the arcgis.raster module for raster and imagery analysis, the arcgis.network module for network analysis, and the arcgis.geoanalytics module for the distributed analysis of large datasets. Some of the more “basic” analysis tools are part of the arcgis.features module to work with vector datasets. Note: The arcgis.geoprocessing module is primarily used to import web tools published to your web GIS as Python modules. It does not include the geoprocessing tools that are available in ArcGIS Pro as the name might suggest. The arcgis.features module allows you to perform spatial analysis tasks on feature layers. These tasks are organized in several submodules. This organization is similar to how geoprocessing tools in ArcGIS Pro are organized into toolboxes and toolsets. However, there is no direct correspondence between the functions in the ArcGIS API for Python and the geoprocessing tools in ArcGIS Pro. Recall that every geoprocessing tool in ArcGIS is a function in ArcPy, but this is not the case for the ArcGIS API for Python. 
On the other hand, many geoprocessing tools in ArcGIS Pro have a similar function in the ArcGIS API for Python, but they are organized into different modules, may have slightly different names, and their syntax is often somewhat different, too. These similarities and differences are illustrated here with an example on buffering. Buffering is one of the most widely used examples of spatial analysis and is used to illustrate the nature of spatial analysis functions in the ArcGIS API for Python. In ArcGIS Pro, the Buffer tool uses an input feature class to create an output polygon feature class on the basis of a buffer distance. The same procedure can be accomplished using the ArcGIS API for Python with the create_buffers() function of the arcgis.features.use_proximity submodule. The general syntax of this function is as follows:

use_proximity.create_buffers(input_layer, distances=[], field=None,
                             units='Meters', dissolve_type='None',
                             ring_type='Disks', side_type='Full',
                             end_type='Round', output_name=None,
                             context=None, gis=None, estimate=False,
                             future=False)

Note that the syntax in the ArcGIS API for Python is different from the syntax employed in ArcPy. In the syntax example for create_buffers(), the input_layer argument is required because no default value is specified. On the other hand, the field parameter is optional because a default value is shown in the syntax. Sometimes the default value is None. Recall that the syntax notation for functions in ArcPy uses braces to indicate optional parameters, and no default values are shown in the syntax itself. These differences are mostly a result of choices made in the creation of the documentation and don't reflect differences in what the functions do. If you carefully compare the syntax of the create_buffers() function with that of the Buffer() function of ArcPy, you will notice several additional arguments for the create_buffers() function.
These arguments include the gis argument to specify the GIS on which the function is run and the estimate parameter to return the number of credits required to run the operation. The differences in function name, syntax notation, and several arguments aside, both functions accomplish the same task: creating buffer polygons around input features.

Note: For the next example, even though the result is created in memory and not published, you must be logged into ArcGIS Online or a portal because running the analysis requires credits.

The example notebook uses the create_buffers() function to create polygon features around point features. The input is a feature layer hosted in ArcGIS Online. The code starts with creating a GIS object. A feature layer collection is obtained using the get() method of the ContentManager class. The argument of the get() method is a unique item ID. The same item can also be obtained by searching for “USA Airports” and filtering the results. This specific item consists of multiple layers, so an index is used to obtain the first layer, representing only the major airports. The URL is printed for confirmation. Next, the use_proximity submodule is imported from the arcgis.features module. The create_buffers() function creates the buffer polygons. The function uses three arguments: the input layer, the distance value as a list, and the units. The output of the function is assigned to a variable, and the type of this variable is printed for confirmation. The result is an in-memory feature collection and is not published as an item. If you prefer to store the output as a feature layer, specify the output_name parameter of the function. The final step is to create a map display and to add the airports and the buffer polygons. The map display is set to a single state, Florida, to show a close-up of the buffers instead of showing the full extent of the data.
The use_proximity submodule includes several additional functions to carry out proximity analysis, including create_drive_time_areas() and find_nearest(). The arcgis.features module includes several other submodules for analysis: analyze_patterns, elevation, enrich_data, find_locations, hydrology, manage_data, and summarize_data. All the functions in these submodules are also organized under the analysis submodule for convenience. Identifying a specific function of interest is somewhat complicated by the fact that the function names don't match exactly with the names of standard tools in ArcGIS Pro. In addition, the organization of functions into modules and submodules does not match the organization of geoprocessing tools in toolboxes and toolsets. The easiest way to view all the available functions is to scroll through the documentation of the ArcGIS API for Python hosted on GitHub. Although the number of analysis functions in the ArcGIS API for Python is substantial, there are many geoprocessing tools in ArcGIS Pro that do not have an equivalent function in the API. On the other hand, some functions in the ArcGIS API for Python do not have an equivalent in ArcGIS Pro. For example, the arcgis.learn module includes several tools to support artificial intelligence (AI)-based deep learning, including computer vision tools for object identification and pixel classification. The ArcGIS API for Python is relatively new, and it is anticipated that additional modules and functions will be added in future releases.

9.12 ArcPy versus ArcGIS API for Python

The examples so far have illustrated how the ArcGIS API for Python allows you to automate tasks for web GIS, similar to how ArcPy automates workflows in ArcGIS Pro.
Jupyter Notebook is a natural fit for writing code using the ArcGIS API for Python because of its ability to interact with both local and online resources, and to visualize tabular data, maps, graphs, and other elements without having to use a separate application for display purposes. On the other hand, Python scripts that employ ArcPy are often written in a more traditional IDE such as Spyder or PyCharm, and ArcGIS Pro is used to visualize the results or to obtain user input through a tool dialog box. It is important to recognize that both arcpy and arcgis are Python packages created by Esri, and both are installed as part of the default arcgispro-py3 environment. Both can be used in any IDE that is configured to use this environment (or another conda environment with the same packages). For example, you can import the arcgis package in the Python window in ArcGIS Pro or in IDLE, Spyder, or PyCharm. All the code in the earlier examples in this chapter runs correctly in the Python window or a Python IDE, but visualization is different. Consider the example code that imports the arcgis package, creates a GIS object, and then creates a map display. When the map display is called, the result is a reference to the MapView object instead of a graphical display of the map. This result is less informative, but the code works, and the object reference confirms that the MapView object was created. Consider the earlier example of adding a CSV file as an item in ArcGIS Online and publishing it as a feature layer.
When the code is stripped of the interactive map display elements, it is as follows:

from arcgis.gis import GIS

mygis = GIS(URL, username, password)
csv = "C:/Demo/trees.csv"
data_prop = {"title": "Vancouver trees",
             "description": "CSV file of street trees in the City",
             "tags": "trees, csv"}
trees_csv_item = mygis.content.add(item_properties=data_prop, data=csv)
tree_feature_layer = trees_csv_item.publish()

This script can be run using a regular Python IDE and carries out the same task with the same results. Being able to display intermediate steps and results inside a notebook is helpful, especially when troubleshooting code, but the code works regardless of whether it is run as a stand-alone script or inside a notebook. Similarly, you can use ArcPy in a Jupyter Notebook without using the ArcGIS API for Python. Consider the example of a geoprocessing script to run the Clip tool. ArcPy is imported, a local workspace is set, and the Clip() function is used to carry out the task. Running this code in a Jupyter Notebook produces the same result as running the script in another IDE. The last line of code is not typically part of a stand-alone script but is added to the notebook to confirm that an output file was created, similar to printing a message to the interactive window of an IDE.

In summary, both the arcpy and arcgis packages can be used in any Python IDE that runs a conda environment with both packages installed, including the arcgispro-py3 default environment. A more traditional IDE is a more natural fit for scripts using ArcPy, whereas Jupyter Notebook is a good match for the functionality of the ArcGIS API for Python. Some tasks may require both packages in the same script or notebook, and then the choice of which IDE to use is largely a matter of preference.

9.13 Working with JupyterLab

Jupyter Notebook is still relatively new but has quickly become popular because of its versatility and functionality.
Being able to write code, inspect the results, and get rich output is appealing to the data science community, as well as to educators and application developers. Developments in this field take place rapidly. Project Jupyter is a nonprofit open-source project that started in 2014. This project developed Jupyter Notebook by building on the earlier IPython project. There are many aspects to Project Jupyter beyond the user interface, but the Jupyter Notebook interface is the most visible result. The next version of the interface is called JupyterLab. Development of JupyterLab started in 2017, and this new interface will eventually replace the “classic” Jupyter Notebook. Both versions of the interface support the same notebook document format.

JupyterLab is a next-generation web-based user interface for working with documents, writing code, and developing workflows for interactive computing. The new interface maintains much of the functionality of Jupyter Notebook but adds many other features found in a typical IDE. To start JupyterLab, enter the following command at the Python command prompt while running a conda environment. Note that there is a space in the command, even though the interface name does not have a space:

jupyter lab

This command launches a browser window, just like the classic Jupyter Notebook. The URL is typically http://localhost:8888/lab. The interface consists of a main work area, a collapsible left sidebar, and a menu bar at the top. The main work area is used to arrange documents and perform other activities. When the interface first opens, the left sidebar shows the File Browser tab, which allows you to explore the files inside the current workspace. In the example in the figure, several example .ipynb files are shown, as well as a Python script and a CSV file. A geodatabase is recognized as a folder. The Launcher panel allows you to start a new activity, for example, a notebook or a Python script.
You also can open an existing notebook by double-clicking the file in the File Browser window. Once a notebook is open, many of the controls and the display are like the classic Jupyter Notebook, and you can write and interact with your code in much the same manner. JupyterLab represents a modern IDE to work with Python code in a notebook format. You also can run stand-alone Python scripts. Although JupyterLab will eventually replace Jupyter Notebook, both interfaces will coexist for some time, and the experience of creating and using notebooks is similar between the two.

Note: Although JupyterLab is launched by simply running jupyter lab, the map widget used in the latest version of the ArcGIS API for Python (1.7.0 at the time of writing) requires some additional configuration. These steps are included on the documentation page of the ArcGIS API for Python, under Guide > Get Started > Using the JupyterLab environment > Installation. The steps are not included here because they are likely to change in upcoming releases.

9.14 Documentation and help files

Extensive online resources are available to assist with learning more about the ArcGIS API for Python. Resources include the official guide, the API reference, and the sample notebooks. The help pages are referred to as the “guide” and are located at the following URL: https://developers.arcgis.com/python/guide/. The guide follows the same organization as the help pages of other elements of the ArcGIS platform. It provides an overview of the API and a detailed look at the various modules. The guide includes explanations of the functionality of each module with detailed code examples. However, there is no complete inventory of all the modules, classes, and functions of the ArcGIS API for Python, and no syntax is provided. The complete documentation is referred to as the “API reference” and is hosted on GitHub at the following URL: https://developers.arcgis.com/python/api-reference/.
Here you will find a complete listing of all the modules, classes, and functions, with their syntax. The organization of the documentation is like the style employed by other Python packages on GitHub, which is different from the style employed by typical ArcGIS help pages, including those for ArcPy. When working with the ArcGIS API for Python, you therefore will typically need to consult two sets of resources: the guide for general explanations and the API reference for the complete functionality and syntax. When you are just starting to use the ArcGIS API for Python, you likely will rely more on the guide for ideas about what is possible. Once you gain some experience, the API reference will become more important when you need to look up the syntax for specific classes and functions. Finally, there is a growing library of sample notebooks at the following URL: https://developers.arcgis.com/python/sample-notebooks. You can preview the notebooks online or download them to use on a local computer. As with any Python code, you can reuse some of the code for your own scripts and notebooks.

9.15 ArcGIS Notebooks

A recent addition to the functionality of ArcGIS is the use of ArcGIS Notebooks. ArcGIS Notebooks use the same approach as Jupyter Notebook, but the notebook files are hosted by ArcGIS Enterprise. ArcGIS Notebooks are hosted just like other items in a portal, such as maps, tools, and feature layers, and users can be assigned roles to create and edit notebooks. ArcGIS Notebooks are hosted in an ArcGIS Enterprise portal using ArcGIS Notebook Server. Hosting notebooks is implemented using Docker containers, which provide a virtualized operating system to run the notebook. All the resources necessary to run the notebook are made available without installing anything locally.
For example, when using ArcGIS Notebooks, you do not need to have ArcGIS Pro or Python installed locally, but you can still use all the functionality of Python in a notebook. This includes both ArcPy and the ArcGIS API for Python. Note: Docker is a software company that has developed an industry standard to deliver software in packages called containers. Docker containers are widely used to distribute applications with complex dependencies to many users in an organization. Docker software is not created by Esri, but ArcGIS Notebook Server uses Docker software to create and provide a separate container for each user. The use of ArcGIS Notebooks takes away some of the cumbersome installation and configuration of software by individual users in an organization. An individual user does not need to set up a specific conda environment because it is already part of the Docker container. In addition to ArcPy and the ArcGIS API for Python, several hundred Python packages are available, which substantially overlap with those that are part of the Python distribution for ArcGIS Pro. Additional packages can be installed during a notebook session. Installation and configuration of ArcGIS Notebooks builds upon a base deployment of ArcGIS Enterprise. The detailed steps are on the help pages of ArcGIS Enterprise at the following URL: https://enterprise.arcgis.com/en/notebook/. Once ArcGIS Notebooks is up and running, creating and editing notebooks is similar to working with a notebook inside ArcGIS Pro or using a stand-alone Jupyter Notebook, and the Python code is identical between the various approaches. The ArcGIS Notebooks interface includes the same elements as the classic Jupyter Notebook. Existing notebooks can be added, notebooks can be shared, and hosted notebooks can be saved locally as .ipynb files. These features provide many new possibilities to share and collaborate on workflows using the notebook format.
Points to remember

The ArcGIS API for Python is a Python package to work with web GIS without using ArcGIS Pro. It provides tools for tasks such as creating maps, geocoding, vector and raster analysis, and managing data, which are comparable to the tools in ArcPy for desktop GIS but are specifically designed for web GIS.

The ArcGIS API for Python is not only a Python package but also an application programming interface: it includes the tools for a Python script to use the ArcGIS REST API, which in turn makes requests of ArcGIS Enterprise services.

The recommended IDE for the ArcGIS API for Python is Jupyter Notebook, which is an open-source web application. Jupyter Notebook is a natural fit to write code using the ArcGIS API for Python because of its ability to interact with both local and online resources, and to visualize tabular data, maps, graphs, and other elements.

Python code in a notebook is organized into cells, which can contain one or more lines of code. Results are printed directly below each cell. The actual Python code is identical to the code you would use in a different IDE.

In addition to Python code, a notebook can include many other elements by using Markdown cells. You can add headings, formatted text, block quotes, example code, external links, images, and multimedia files, among others. Using Markdown greatly enhances the ability of notebooks to document and share workflows.

You can start using the ArcGIS API for Python by importing the arcgis package, which is installed as part of the arcgispro-py3 environment. The arcgis package includes several modules, the most important being the gis module, which allows you to manage the contents of your GIS, as well as users and their roles. The main class of the gis module is the GIS class, and this object represents the GIS you are working with through ArcGIS Online or an instance of ArcGIS Enterprise. The GIS object includes a map widget, which allows you to visualize your GIS.
You can add contents to a notebook by searching for items in ArcGIS Online. You can also create new content by publishing items.

Many of the workflows that employ ArcGIS Pro and ArcPy on local data can be replicated for web GIS using the ArcGIS API for Python. The ArcGIS API for Python includes several modules to carry out specialized analysis tasks, including raster and imagery analysis, network analysis, and distributed analysis of large datasets. Some of the more “basic” analysis tools are part of the arcgis.features module.

The Jupyter Notebook interface is well suited to employ other Python packages, including Pandas for data manipulation and analysis.

The JupyterLab interface provides additional functionality and will eventually replace the classic Jupyter Notebook interface. The same notebook format is supported in both versions.

ArcGIS Notebooks use the same approach as Jupyter Notebook, but the notebook files are hosted by ArcGIS Enterprise. ArcGIS Notebooks provide many new possibilities to share and collaborate on workflows using the notebook format within the ArcGIS platform.

Key terms

application programming interface (API)
cell (in a notebook)
container (Docker)
IPython
JupyterLab
Jupyter Notebook
kernel
map widget
Markdown
notebook
platform agnostic
pythonic
representational state transfer (REST)
token
web GIS

Review questions

What are some of the key similarities and differences between ArcPy and the ArcGIS API for Python?

What are some of the features of Jupyter Notebook that make it well suited to write Python code using the ArcGIS API for Python?

What types of content can you add to a notebook using Markdown?

Describe the process of publishing a local dataset as an item in ArcGIS Online.

When can you use an anonymous login to use the ArcGIS API for Python, and when do you need to provide user credentials?

What are some of the advantages of writing Python code as a Jupyter Notebook instead of a stand-alone Python script or script tool?
What are some of the limitations?

Index

A
absolute paths, 57, 119, 120–21
Add Packages button, 154–55
Anaconda distribution, 146, 147, 228–29
Analyze Tools For Pro, 217–22
application programming interface (API), 226
ArcGIS API, 3, 4
  basics of Jupyter Notebook and, 229
  creating and opening a notebook for, 230–36
  creating new content with, 256–59
  documentation and help files for, 266–68
  features of, 226
  installation of, 227–29
  introduction to, 225–26
  JupyterLab and, 264–66
  performing analysis in, 259–62
  starting, for Python, 250–53
  using Markdown in Jupyter Notebook and, 240–50
  web GIS and, 14–17
  writing code in notebook in, 237–40
  versus ArcPy, 263–64
ArcGIS Desktop 10.x, 2, 4
  migrating scripts from (See migration of scripts)
ArcGIS Notebooks, 230–36, 268–69
ArcGIS Pro
  2.5 code, 4
  Analyze Tools For Pro in, 217–22
  changes to, 213
  geoprocessing packages in, 135–40
  migrating scripts from (See migration of scripts)
  Python versions and, 2–3
  sharing of tools in (See sharing of tools)
  using Python scripting in, 1–2
ArcPy, 14
  changes to, 214–17
  creating functions in, 22–28
  JSON objects in, 189–90
  messages in, 82–88
  migrating scripts and (See migration of scripts)
  modules, functions, and classes in, 21–22
  package tools, 42–46, 145–46
  progressor functions in, 88–90
  tool parameters in, 66–67, 71–72
  zip files in, 170–71
  versus ArcGIS API for Python, 263–64
arguments, 20
arrays, NumPy, 190–96

B
backporting, 2
backward compatibility, 2, 208
Booleans, 101
built-in modules, 163

C
Category property, 76
cells, 237
class definition, 37
classes
  creating, 36–42
  working with packages and, 42–46
Clone Environments dialog box, 152–53
command line, 156–60
comma-separated value (CSV) files, 175–79, 256–57, 258–59
comments, 55
conda
  installation of ArcGIS API and, 228–29
  managing environments using, 151–53
  managing packages using, 154–55
  used with command line, 156–60
content creation in ArcGIS API, 256–59
Create Points on Lines tool, 118
CSV files, 175–79, 256–57, 258–59
custom classes, 36
custom functions, Python
  creating, 22–28
  introduction to, 19
  overview of functions and modules and, 19–22
customized tool progress information, 88–90

D
data analysis using Pandas, 196–200
data and workspaces in sharing tools, 122–25
Data Type property, 70, 75–76
data visualization using Matplotlib, 200–205
Default property, 76, 78
Delete Field tool, 77–78
dependencies, 148
Dependency property, 76, 77–78
deprecated modules, 163
derived parameter, 72
dictionaries in Python 2 versus 3, 211
digital elevation model (DEM), 6–7
Direction property, 73–74
directories, FTP, 165
distributions, Python, 146
Distributive Flow Lines tool, 118
documentation
  ArcGIS API, 266–68
  tools, 127–30
Document Object Model (DOM), 173
Double data type, 75
Drop Field parameter, 77

E
editors, Python, 4–5
embedded scripts, 125–27
Empirical Bayesian Kriging (EBK) tool, 11, 76
Environment property, 76, 78
environments, Python, 147–51
  managed using conda, 151–53
Esri Support, 9
Excel To Table, 183
Extensible Markup Language (XML), 129–30, 171–73
extraction, ZIP file, 169

F
Feature Class, 70–71, 76, 80–81
Feature Layer, 70, 73–74, 75, 80
Feature Type parameter, 75–76
Fences toolbox, 10–13
file transfer protocol (FTP) sites, 164–68
FileZilla, 165
Filter property, 74, 75, 102–6
Filter type, 75–76
FTP files, 164–68
functions
  called from other scripts, 28–32
  creating, 22–28
  overview of modules and, 19–22
  working with packages and, 42–46

G
geometry objects and classes, 40–41
GeoPandas, 200
geoprocessing modules and packages
  Analyze Tools For Pro, 217–22
  comma-separated value (CSV) files in, 175–79
  Excel files using openpyxl in, 177, 179–83
  file transfer protocol (FTP) sites in, 164–68
  introduction to, 163–64
  JavaScript Object Notation (JSON) in, 183–90
  Matplotlib for data visualization in, 200–205
  NumPy arrays in, 190–96
  Pandas for data analysis in, 196–200
  web pages using urllib in, 173–75
  XML files in, 171–73
  ZIP files in, 168–71
Geoprocessing Options, 65–66, 94
geoprocessing packages, 135–40
Get Count tool, 72–73

H
hard-coded values, 55
HTML files, 173

I
IDLE, 5, 65, 225, 263
  environment and, 160
  .pyt files in, 95
Illuminated Contours tool, 6–8, 34–35, 124
implicit line continuation, 102
initializing of objects, 38–39
Input Features parameter, 71
input in Python 2 versus 3, 210
instances, 37
instantiating, 37
integer division in Python 2 versus 3, 209–10
integer types in Python 2 versus 3, 209
Intersect tool, 71
iteration using next() in Python 2 versus 3, 211

J
JavaScript Object Notation (JSON) files, 183–90, 258
JupyterLab, 264–66
Jupyter Notebook, 3, 226, 263, 264
  basics of, 229
  creating and opening a notebook in, 230–36
  JupyterLab and, 264–66
  using Markdown in, 240–50
  writing code in, 238–39

L
Label property, 69, 73–74
libraries, 145–46
Licensing issues in sharing of tools, 116
Long data type, 74, 75
lossless data compression, 168

M
Manage Environments dialog box, 151–53
Markdown in Jupyter Notebook, 240–50
MATLAB, 200
Matplotlib, 200–205
messages, 81–88
  handled for stand-alone scripts and tools, 88
metadata, 127–29
methods, 36–39, 41–42
Microsoft Excel, 177, 179–83
Microsoft Office, 177
migration of scripts
  Analyze Tools For Pro and, 217–22
  changes between Python 2 and 3 and, 208–11
  changes in ArcGIS Pro and, 213
  changes in ArcPy and, 214–17
  introduction to, 207
  overview of, 207
  Python 2to3 program for, 211–13
ModelBuilder, 50, 73
modules, 145–46
  geoprocessing (See geoprocessing modules and packages)
  organizing code into, 32–36
  overview of, 20–22
Multiple Ring Buffer tool, 61–68

N
Name property, 73–74
nodes, XML, 171
notebooks
  ArcGIS Notebooks, 230–36, 268–69
  creating and opening, 230–36
  using Markdown in, 240–50
  writing code in, 237–40
Notepad, 177, 187
NumPy, 43–46, 49–50
  arrays in, 190–96

O
object-oriented programming (OOP), 36
opening files in Python 2 versus 3, 210
openpyxl, 179–83
Outside Polygons Only parameter, 63

P
package managers, 146
  Python, 147, 148–51
packages
  creating geoprocessing, 135–40
  geoprocessing (See geoprocessing modules and packages)
  introduction to, 145
  managed using conda, 154–55
  managing environments using conda in, 151–53
  modules, packages, and libraries of, 145–46
  Python distributions and, 146
  Python environments and, 147–51
  Python package manager, 147
  using conda with command line in, 156–60
  working with, 42–46
Pandas, 196–200, 256
Pandas DataFrame, 196–98
Pandas Series, 198–99
Parameter Data type, 71
parameters, tool, 60–69
  defining tools and, 101–8
  editing tool code to receive, 78–81
  setting of, 69–78
password-protected tools, 125–27
paths, 57, 119–22
PIP, 147
platform agnostic, API as, 227
printing in Python 2 versus 3, 209
pristine state of environment, 151
progressors, 88–90
properties, 36
PyCharm, 5, 66, 225, 263
  environment and, 160
  .pyt files in, 95–97
Python
  ArcGIS and versions of, 2–3
  ArcGIS API for (See ArcGIS API)
  creating custom functions in (See custom functions, Python)
  distributions of, 146
  environments with, 147–51
  example scripts, tools, and notebooks in, 5–17
  geoprocessing in (See geoprocessing modules and packages)
  migrating scripts from Python 2 to 3 in (See migration of scripts)
  overview of functions and modules in, 19–22
  package manager in, 147
  as scripting language, 1–2
  toolboxes in (See toolboxes, Python)
  working with editors in, 4–5
  working with packages of, 42–46
Python 2 to 3 differences. See migration of scripts
Python 2to3 program, 211–13
pythonic wrapper, 226
Python Package Index (PyPI), 147, 163
Python Package Manager, 147, 148–51
  Manage Environments dialog box, 151–53
  managing packages using conda, 154–55
PYTHONPATH, 29–31

R
Random Sample tool, 9–10, 69–70, 72, 74–75, 98
  editing code to receive parameters, 78–81
  messages and, 82–88
  setting parameters in, 103–6
  source code and, 109–13
Range Filter tool, 74
ranges in Python 2 versus 3, 210
relative paths, 57, 119–21
representational state transfer (REST), 226
root folder, 117

S
scalar values, 72
scratch GDB, 123–24
scratch workspace, 123
scripts, 1–2
  calling functions from separate, 28–32
  embedded, 125–27
  examples of, 5–17
  migration of (See migration of scripts)
  organizing code into modules, 32–36
  stand-alone, 50
script tools
  comparing Python toolboxes and, 113
  custom behavior of, 81
  customizing tool progress information in, 88–90
  edited to receive parameters, 78–81
  exploring tool parameters for, 60–69
  handling messages for stand-alone, 88
  introduction to, 49
  setting tool parameters for, 69–78
  steps to creating, 51–60
  versus Python toolboxes, 49–50
  why create your own, 50
  working with messages, 81–88
sharing of tools
  choosing a method for, 115–16
  creating geoprocessing package for, 135–40
  creating web tool for, 140–42
  documenting tools and, 127–30
  embedding scripts, password-protecting tools in, 125–27
  finding data and workspaces for, 122–25
  handling licensing issues in, 116
  introduction to, 115
  Terrain Tools example of, 130–35
  using a standard folder structure for, 116–19
  working with paths in, 119–22
sinuosity index, 26–27
slice, data, 10–11
SmartFTP, 165
source code, 109–13
spreadsheets, Excel, 177, 179–83
Spyder, 5, 66, 225, 263
  environment and, 160
  .pyt files in, 95
Stack Exchange, 9
stand-alone scripts, 50
  handling messages for, 88
standard libraries, 146, 163
Style Guide for Python Code, 4
Symbology property, 76, 78

T
Table To Table, 178
Terrain Tools, 5–8, 118, 124, 130–35 third-party libraries, 145, 146 3D Fences toolbox, 10–13 toolboxes, Python, 49–50 comparing script tools and, 113 creating and editing, 93–101 defining tools and tool parameters for, 101–8 introduction to, 93 working with source code in, 109–13 tool dialog box, 50 tool parameters, 60–69 setting of, 69–78 tools, sharing of. See sharing of tools tree structure, XML files, 171–72 Type parameter, 73–74 U unicode strings and encoding in Python 2 versus 3, 209 uniform resource locators (URLs), 173–75 Union tool, 71 urllib, 173–75 V Validation tab, 81 virtual environments, 148 W web pages using uniform resource locators (URLs), 173–75 web tools, 140–42 X XML files, 129–30, 171–73 XY Table To Point, 178 Z ZIP files for geoprocessing, 168–71 for sharing of tools, 116 Transcription