
Paul A. Zandbergen - Advanced Python Scripting for ArcGIS Pro-Esri Press (2020)

Esri Press, 380 New York Street, Redlands, California 92373-8100
Copyright © 2020 Esri
All rights reserved.
Printed in the United States of America
24 23 22 21 20 1 2 3 4 5 6 7 8 9 10
e-ISBN: 9781589486195
The Library of Congress has cataloged the print edition as follows:
Library of Congress Control Number: 2020936496
The information contained in this document is the exclusive property of Esri unless otherwise noted.
This work is protected under United States copyright law and the copyright laws of the given
countries of origin and applicable international laws, treaties, and/or conventions. No part of this
work may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying or recording, or by any information storage or retrieval system, except as
expressly permitted in writing by Esri. All requests should be sent to Attention: Contracts and Legal
Services Manager, Esri, 380 New York Street, Redlands, California 92373-8100, USA.
The information contained in this document is subject to change without notice.
US Government Restricted/Limited Rights: Any software, documentation, and/or data delivered
hereunder is subject to the terms of the License Agreement. The commercial license rights in the
License Agreement strictly govern Licensee’s use, reproduction, or disclosure of the software, data,
and documentation. In no event shall the US Government acquire greater than
RESTRICTED/LIMITED RIGHTS. At a minimum, use, duplication, or disclosure by the US
Government is subject to restrictions as set forth in FAR §52.227-14 Alternates I, II, and III (DEC
2007); FAR §52.227-19(b) (DEC 2007) and/or FAR §12.211/12.212 (Commercial Technical
Data/Computer Software); and DFARS §252.227-7015 (DEC 2011) (Technical Data – Commercial
Items) and/or DFARS §227.7202 (Commercial Computer Software and Commercial Computer
Software Documentation), as applicable. Contractor/Manufacturer is Esri, 380 New York Street,
Redlands, CA 92373-8100, USA.
@esri.com, 3D Analyst, ACORN, Address Coder, ADF, AML, ArcAtlas, ArcCAD, ArcCatalog,
ArcCOGO, ArcData, ArcDoc, ArcEdit, ArcEditor, ArcEurope, ArcExplorer, ArcExpress, ArcGIS,
arcgis.com, ArcGlobe, ArcGrid, ArcIMS, ARC/INFO, ArcInfo, ArcInfo Librarian, ArcLessons,
ArcLocation, ArcLogistics, ArcMap, ArcNetwork, ArcNews, ArcObjects, ArcOpen, ArcPad, ArcPlot,
ArcPress, ArcPy, ArcReader, ArcScan, ArcScene, ArcSchool, ArcScripts, ArcSDE, ArcSdl,
ArcSketch, ArcStorm, ArcSurvey, ArcTIN, ArcToolbox, ArcTools, ArcUSA, ArcUser, ArcView,
ArcVoyager, ArcWatch, ArcWeb, ArcWorld, ArcXML, Atlas GIS, AtlasWare, Avenue, BAO,
Business Analyst, Business Analyst Online, BusinessMAP, CityEngine, CommunityInfo, Database
Integrator, DBI Kit, EDN, Esri, Esri CityEngine, esri.com, Esri — Team GIS, Esri — The GIS
Company, Esri — The GIS People, Esri — The GIS Software Leader, FormEdit, GeoCollector,
Geographic Design System, Geography Matters, Geography Network, geographynetwork.com,
Geoloqi, Geotrigger, GIS by Esri, gis.com, GISData Server, GIS Day, gisday.com, GIS for Everyone,
JTX, MapIt, Maplex, MapObjects, MapStudio, ModelBuilder, MOLE, MPS—Atlas, PLTS, Rent-a-Tech, SDE, See What Others Can’t, SML, Sourcebook·America, SpatiaLABS, Spatial Database
Engine, StreetMap, Tapestry, the ARC/INFO logo, the ArcGIS Explorer logo, the ArcGIS logo, the
ArcPad logo, the Esri globe logo, the Esri Press logo, The Geographic Advantage, The Geographic
Approach, the GIS Day logo, the MapIt logo, The World’s Leading Desktop GIS, Water Writes, and
Your Personal Geographic Information System are trademarks, service marks, or registered marks of
Esri in the United States, the European Community, or certain other jurisdictions. Other companies
and products or services mentioned herein may be trademarks, service marks, or registered marks of
their respective mark owners.
Contents
Preface
Acknowledgments
Chapter 1  Introducing advanced Python scripting
Chapter 2  Creating Python functions and classes
Chapter 3  Creating Python script tools
Chapter 4  Python toolboxes
Chapter 5  Sharing tools
Chapter 6  Managing Python packages and environments
Chapter 7  Essential Python modules and packages for geoprocessing
Chapter 8  Migrating scripts from Python 2 to 3
Chapter 9  ArcGIS API for Python
Index
Preface
Programming has become an increasingly important aspect of the skillset of
GIS professionals in many fields. Most GIS jobs require at least some
experience in programming, and Python is often at the top of the list.
Python scripting allows you to automate tasks in ArcGIS® Pro that would
be cumbersome using the regular menu-driven interface. Python Scripting
for ArcGIS Pro, also published by Esri Press (2020), covers the
fundamentals of learning Python to write scripts but does not get into the
more advanced skills to develop tools to be shared with others. This is
where the current book, Advanced Python Scripting for ArcGIS Pro, comes
in. If you are looking to take your GIS programming skills to the next level,
this book is for you.
Before getting further into the contents of the book, a bit of history is in
order. In 2013, Esri Press published Python Scripting for ArcGIS. I wrote
the book to serve as an easy-to-understand introduction to Python for
creating scripts for ArcGIS Desktop using Python 2. The book quickly
became popular among students and professionals, but several years later
the book was no longer current.
ArcGIS Pro was released in 2015 and further established Python as the
preferred scripting language within the ArcGIS platform. ArcGIS Pro uses
Python version 3, which is significantly different from version 2. As the
industry started to shift from ArcGIS Desktop to ArcGIS Pro, interest grew
in an updated version of the book. Both the changes in ArcGIS and the
differences in Python versions necessitated a completely new book—not
just a second edition of the existing book with minor code updates. That
new book is Python Scripting for ArcGIS Pro.
In addition, the interest in using Python in the geospatial community
continues to grow. This has led to an increasing interest in developing
Python tools to share with others, using third-party packages created by the
open-source geospatial community, and applying Python to new areas such
as web GIS. The current book, Advanced Python Scripting for ArcGIS Pro,
covers these topics while at the same time teaching best practices in Python
coding.
The current book is written for ArcGIS Pro 2.5, which uses Python 3.6.9.
As new functionality is added to future releases of ArcGIS Pro, the code in
this book will continue to work for the foreseeable future. However, much
of the code will not work in ArcGIS Desktop 10.x, although sometimes
only minor changes are needed. One chapter in this book is specifically
dedicated to explaining the differences between the versions of Python and
ArcGIS and how to migrate existing scripts and tools from ArcGIS Desktop
10.x to ArcGIS Pro. In addition, some of the code in this book uses the
ArcGIS API for Python 1.7.0, which installs with ArcGIS Pro 2.5.
This book is designed to enhance the skills of those who already have a
good foundation in Python to write scripts for ArcGIS. The book covers
how to take those scripts and develop them into tools and notebooks to
share with others, as well as several other more advanced tasks. A good
familiarity with ArcGIS Pro is assumed, including managing data, creating
cartographic output, and running tools. You should be familiar with the
basic concepts of GIS, including coordinate systems, data formats, table
operations, and basic spatial analysis methods. You also need a good
foundation in Python and the use of ArcPy for basic tasks, including all the
topics covered in Python Scripting for ArcGIS Pro.
The primary audience for this book is experienced ArcGIS Pro users who
have already been using Python for some time to write scripts to automate
their workflows. If you are already familiar with writing scripts in Python
for ArcGIS Desktop 10.x, you may still want to consider the Python
Scripting for ArcGIS Pro book, which contains several chapters on topics
that have changed significantly between working with Python in ArcGIS
Desktop 10.x and ArcGIS Pro. This includes setting up your Python editor,
working with rasters, and map scripting.
This book also is intended for upper-division undergraduate and graduate
courses in GIS. Many colleges and universities teach courses in GIS
programming, which has become one of the core skills in GIS degrees and
specializations. Students who are just starting out with learning Python
should first use the Python Scripting for ArcGIS Pro book. By the end of that
book, students should be able to write Python scripts to automate tasks for
ArcGIS Pro. The topics in the current book follow logically from the topics in
Python Scripting for ArcGIS Pro. Therefore, Advanced Python Scripting for
ArcGIS Pro could be used as a second textbook in a first course on GIS
programming, or as the main textbook for the second course in a two-course
sequence.
This book contains nine chapters. Following the introductory chapter,
chapter 2 covers creating Python functions and classes, which is an
essential part of developing more organized and reusable code. The next
three chapters cover the development of Python script tools and Python
toolboxes, which make it easier to share the functionality of Python scripts
with others. The next two chapters cover how to manage Python packages
and environments and illustrate how to work with some of the most widely
used third-party packages. The next chapter shows how to migrate scripts
and tools from ArcGIS Desktop 10.x to ArcGIS Pro, which includes
migrating from Python 2 to 3. And the final chapter covers the ArcGIS API
for Python, which expands the use of Python scripting to web GIS using
Jupyter Notebook.
This book does not cover more introductory topics, including Python
fundamentals, setting up a Python editor, using ArcPy to write scripts to
work with spatial and tabular data, working with geometries, raster analysis,
and map scripting. Those topics are covered in Python Scripting for ArcGIS
Pro and are not repeated in this book.
The chapters in this book are accompanied by exercises that reinforce the
concepts covered in the chapters. These exercises and data are located in the
Learn organization’s ArcGIS Online group Advanced Python Scripting for
ArcGIS Pro (Esri Press), at https://go.esri.com/PythonAdvData. For general
book information, go to https://go.esri.com/PythonProAdv. You should
first read each chapter and then complete the accompanying exercise before
moving on to the next chapter. Depending on your learning style and
familiarity with coding, you can try out some of the code in the chapters as
you read them, but you also can read the entire chapter first, and then start
the exercise. To complete the exercises, you must have ArcGIS Pro 2.5 or
later installed on your computer.
This book will teach you how to develop tools and notebooks in Python for
ArcGIS Pro. My hope is that the book will contribute to increasing your
confidence in writing more advanced scripts and to develop those scripts
into tools and notebooks to share with others. I look forward to learning
about your contributions to the Python and GIS community. I sincerely
hope this book will allow you to experience the versatility and power of
Python coding.
Paul A. Zandbergen
Vancouver, BC, Canada
Acknowledgments
A book of this scope materializes only with the support of many
individuals.
First, I would like to recognize the numerous students in my courses over
the years at several institutions. You learn something best by teaching it to
others, and I’m fortunate to have worked with many aspiring GIS
professionals interested in learning Python. Much of what I know about
what needs to go in a book like this, I have learned from them.
The contributions of the staff at Esri Press cannot be overstated. Their
ongoing feedback throughout the writing and editing of the manuscript has
been invaluable. Other Esri staff members also have left their mark on the
book, especially David Wynne and Atma Mani. Their insider perspectives
have made the book more accurate and more complete.
Since the publication of the first book, Python Scripting for ArcGIS, I have
received a lot of feedback from numerous students, instructors, GIS
professionals, and anonymous reviewers. I’ve done my best to incorporate
all that I’ve learned from them into this new book.
I also would like to thank my parents, who always encouraged me to seek a
career path that would allow me to fulfill my curiosity about the world
while at the same time trying to make it a better place.
Most importantly, this book would not be possible without the continued
support of my family. Marcia, Daniel, and Sofia, thank you for believing in
me and allowing me to pursue my passions.
Paul A. Zandbergen
Vancouver, BC, Canada
Chapter 1
Introducing advanced Python
scripting
1.1
Introduction
Python has become one of the most widely used programming languages,
and its growth extends to geospatial applications. Python is employed for
many different tasks, from automating data processing using desktop
software, to web scraping for downloading structured data, to developing
machine-learning algorithms for classifying imagery hosted in the cloud.
Python is a versatile, open-source programming language supported on
different platforms. These features contribute to its growing popularity in
the geospatial community. Python is also the preferred scripting language
for working with ArcGIS Pro.
This book represents the logical follow-up to Python Scripting for ArcGIS
Pro, also published by Esri Press (2020), which introduces the
fundamentals of Python and teaches you how to write basic scripts to
automate workflows. Advanced Python Scripting for ArcGIS Pro picks up
where Python Scripting for ArcGIS Pro left off by focusing on more
advanced scripting techniques and the development of tools and notebooks
to be shared with others. This book also includes working with third-party
packages and the ArcGIS API for Python, which opens new and exciting
possibilities to use Python for geospatial applications.
This book is written for ArcGIS Pro and Python 3. The topics covered in
this book require substantial previous experience in writing Python scripts
for ArcGIS. The fundamentals of Python and ArcPy, including setting up a
Python editor and writing basic scripts for data processing using ArcPy, are
covered in Python Scripting for ArcGIS Pro.
1.2
Python scripting in ArcGIS Pro using ArcPy
ArcGIS Pro provides support for the use of Python as a scripting language,
including the ArcPy package installed as part of ArcGIS Pro. ArcPy
provides access to all the tools available in ArcGIS Pro, including those that
are part of ArcGIS Pro extensions. This feature makes Python scripting an
attractive and efficient method for automating tasks in ArcGIS Pro.
Python scripting has become a fundamental tool for GIS professionals to
extend the functionality of ArcGIS Pro and automate workflows. Python is
the scripting language of choice to work with ArcGIS Pro and is included in
every ArcGIS Pro installation. Python is also directly embedded in many
tools in ArcGIS Pro. For example, Python is one of the standard expression
types for field calculations. As another example, several geoprocessing
tools in ArcGIS Pro consist of Python scripts, even though the casual user
does not necessarily notice it (or need to).
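To illustrate the field-calculation case: the Python expression type lets you supply a small function in the tool's code block. The function and field name below are made up for illustration; in a Calculate Field dialog box, the expression would be `classify(!POP!)`:

```python
# A hypothetical code-block function for a field calculation.
# The field name POP is an assumption for this example.
def classify(value):
    """Return a made-up size class for a population value."""
    if value is None:
        return "Unknown"
    elif value < 10000:
        return "Small"
    elif value < 100000:
        return "Medium"
    else:
        return "Large"

print(classify(5000))    # Small
print(classify(250000))  # Large
```

The same function could be pasted directly into the Calculate Field code block, which is one of the ways Python is embedded in everyday ArcGIS Pro workflows.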
One of the goals for using the current book is to learn how to develop new
geoprocessing tools that expand the functionality of ArcGIS Pro but that
look and feel like regular tools that are part of the software. This is
accomplished using Python script tools and Python toolboxes. A secondary
goal is to become familiar with the ArcGIS API for Python to expand the
use of Python to working with web GIS. This is accomplished using
notebooks.
1.3
Python versions and ArcGIS
Compared with other programming languages, Python has gone through a
limited number of versions, reflecting a philosophy of incremental change
and backward compatibility. Python 3 was released in 2008 as a major
overhaul, with the primary goal to clean up the code base and remove
redundancy. The most recent version, at the time of writing, is 3.8, with 3.9
under development.
Some of the changes in Python 3 are fundamental and break with
Python’s backward compatibility philosophy. As a result, not all code
written in Python 2 works in Python 3. Some of the new functionality
added in Python 3 was also added to Python 2, a process known as
backporting. With careful attention to detail, it is therefore possible to write
code that works in both versions.
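As a generic illustration (not specific to ArcPy), the following lines rely on backported behavior so that they behave identically under Python 2.7 and Python 3:

```python
# These __future__ imports are no-ops in Python 3 but backport Python 3
# behavior to Python 2.7, so the same script behaves the same in both.
from __future__ import print_function, division

print("Average:", 7 / 2)   # true division in both versions: 3.5
print("Floor:", 7 // 2)    # explicit floor division in both versions: 3
```

Without the `division` import, Python 2 would evaluate `7 / 2` as `3`, which is exactly the kind of subtle difference that careful cross-version code must anticipate.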
The two versions of Python will continue to coexist for some time, but
officially Python 2 will no longer be maintained past 2020. This means that
any existing code will continue to work, but there will be no further
improvements to version 2.
ArcGIS Desktop 10.x uses Python 2 whereas ArcGIS Pro uses Python 3,
which has several implications. If you are going to write scripts for both
versions of ArcGIS or are planning to migrate scripts and tools from
ArcGIS Desktop 10.x to ArcGIS Pro, you must learn some of the
differences between the two versions of Python. Resources and utilities
exist to assist with this conversion, which are covered in chapter 8.
The purpose of this book is to focus on writing scripts and developing tools
for ArcGIS Pro using Python 3. Although Python code is not 100 percent
backward compatible between versions 3 and 2, it is, in principle, possible
to write Python code that works for both versions. However, because of
fundamental differences between ArcGIS Desktop 10.x and ArcGIS Pro,
many scripts and tools written for one version are unlikely to work in the
other. Nonetheless, sometimes the differences are small, and strategies to
identify and correct for these differences are covered in chapter 8.
Note: Many GIS users will continue to use both ArcGIS Desktop 10.x
and ArcGIS Pro for some time. At the time of writing, the most current
versions are ArcMap 10.7.1 and ArcGIS Pro 2.5. The installation of
ArcMap 10.7.1 includes the installation of Python 2.7.16, and the
installation of ArcGIS Pro 2.5 includes the installation of Python 3.6.9.
These two versions can run on the same computer. When working with
ArcGIS Pro 2.5, you should use only version 3.6.9.
1.4
ArcGIS API for Python and Jupyter Notebook
Python and ArcPy make it possible to extend the functionality of ArcGIS
Pro using scripts and tools. ArcGIS Pro is a software application that runs
on desktop computers and is primarily designed to work with local datasets.
Increasingly, however, geospatial data and their applications reside on the
web, referred to as web GIS. Web GIS is a type of distributed information
system that allows you to store, manage, visualize, and analyze geographic
data. ArcPy has limited functionality to work directly with web GIS. The
ArcGIS API for Python is a different Python package from Esri to work
directly with web GIS. This API complements the use of ArcPy for desktop
GIS.
Code that uses the ArcGIS API for Python is typically written in Jupyter
Notebook, an open-source web application that works like a Python editor
and provides built-in visualization capabilities. Notebooks can also be used
directly within ArcGIS Pro. Details on using the ArcGIS API for Python are
covered in chapter 9.
1.5
The structure of this book
Advanced Python Scripting for ArcGIS Pro consists of nine chapters that
focus on developing tools for ArcGIS Pro and writing more advanced
scripts. Sample code is provided throughout the text.
Chapter 1 introduces Python scripting for ArcGIS Pro and illustrates several
example scripts, tools, and notebooks that were developed using Python.
Chapter 2 demonstrates how to create custom functions and classes in
Python. Custom functions and classes make it easier to organize more
complex code and use parts of your code in multiple scripts. Custom
functions and classes are widely used in script tools and Python toolboxes.
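As a minimal preview of that material, a custom function and a custom class might look like this (all names here are made up for illustration):

```python
def buffer_distance(feature_count):
    """A made-up helper: pick a buffer distance from a feature count."""
    return 100 if feature_count < 50 else 250

class FeatureSummary:
    """A made-up class bundling a dataset name with its feature count."""
    def __init__(self, name, count):
        self.name = name
        self.count = count

    def describe(self):
        return "{0}: {1} features".format(self.name, self.count)

summary = FeatureSummary("roads", 120)
print(summary.describe())              # roads: 120 features
print(buffer_distance(summary.count))  # 250
```

Once defined, the function and class can be imported into multiple scripts, which is what makes them useful building blocks for script tools and Python toolboxes.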
Chapter 3 explains how to create custom script tools, which make Python
scripts available as regular geoprocessing tools with a familiar tool dialog
box. Script tools are one of the preferred methods for sharing Python scripts
with other users and make it easier to add a Python script as a tool to a
larger sequence of operations.
Chapter 4 covers how to create Python toolboxes as an alternative to Python
script tools. In a Python toolbox, the tool dialog box is written in Python
itself, which is often more robust.
Chapter 5 outlines strategies for sharing tools with others, including how to
organize your files, work with paths, and provide documentation for tools.
Chapter 6 covers managing Python packages using conda. Packages
allow you to add functionality to Python, and conda is a convenient way to
install and manage these packages as well as Python environments, which
control which packages are available.
Chapter 7 describes the use of selected built-in modules and third-party
packages other than ArcPy, which can greatly enhance the functionality of
your scripts. The modules and packages include ftplib, urllib, openpyxl,
json, NumPy, Pandas, and Matplotlib.
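As a small taste of that chapter, the built-in json module converts between Python objects and JSON text (the record below is made up):

```python
import json

# A made-up point record; json handles nested dictionaries and lists.
record = {"name": "Sample Point", "coords": [-117.19, 34.06]}

text = json.dumps(record)    # Python object -> JSON string
restored = json.loads(text)  # JSON string -> Python object

print(restored["name"])       # Sample Point
print(restored["coords"][0])  # -117.19
```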
Chapter 8 explains the key steps in migrating scripts and tools from ArcGIS
Desktop 10.x to ArcGIS Pro, including the use of several utilities to
facilitate this process.
Chapter 9 introduces ArcGIS API for Python, which makes it possible to
use Python to work with web GIS. This chapter also introduces Jupyter
Notebook as the preferred way to write and document Python code using
this API. The resulting notebooks can also be shared with others.
1.6
A note about code in this book
Most of the code in this book is written for ArcGIS Pro 2.5, which uses
Python 3.6.9. Most of the code will work in earlier versions of ArcGIS Pro,
except for the most recently added functionality. As new functionality is
added to future releases of ArcGIS Pro, the code in this book will continue
to work for the foreseeable future. However, much of the code will not
work in ArcGIS Desktop 10.x. Some of the code in this book also uses the
ArcGIS API for Python version 1.7.0. This is the version that is installed
with ArcGIS Pro 2.5, but the ArcGIS API for Python can also be installed
separately. If installed separately, Python 3.5 or later is required to use
the ArcGIS API for Python.
Note: The update cycle of the ArcGIS API for Python does not follow
the same schedule as ArcGIS Pro. For example, at the time of writing,
version 1.8.0 of the ArcGIS API for Python has been released, whereas
ArcGIS Pro 2.5 installs with version 1.7.0. This version will be updated
with future releases of the ArcGIS Pro software. The differences in these
versions are typically small.
The code in this book employs the coding conventions of the official Style
Guide for Python Code, also referred to as PEP 8. The complete style guide
can be found at http://www.python.org/dev/peps/pep-0008/. Although not
required, following coding guidelines improves the consistency and
readability of your code.
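A few of those conventions are illustrated in this short made-up function: lowercase names with underscores, four-space indentation, spaces around operators, and a docstring.

```python
# PEP 8 conventions illustrated in a small, made-up example.
def mean_value(values):
    """Return the arithmetic mean of a sequence of numbers."""
    total = sum(values)
    return total / len(values)

elevation_samples = [120.5, 98.2, 143.7]
print(mean_value(elevation_samples))  # close to 120.8
```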
1.7
Working with Python editors
Writing scripts and developing tools requires a Python editor. You are
expected to be already familiar with using a Python editor and configuring
it to use the correct environment. Details on working with Python editors
are covered in Python Scripting for ArcGIS Pro.
The code in this book is not specific to one Python editor. IDLE is installed
by default with every Python installation, and therefore most code
illustrations in this book use IDLE as the Python editor of choice. Other
recommended editors include PyCharm and Spyder, and some code
illustrations use these editors as well. You are free to use the Python editor
of your choice. Regardless of which editor is used for code illustrations, the
Python code is the same for any Python editor. To use a Python editor with
the code in this book, however, it must be configured to work with the
default environment arcgispro-py3 or a cloned environment. Chapter 6
provides details on using conda to manage environments, but the
configuration of Python editors is covered in Python Scripting for ArcGIS
Pro.
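One quick way to verify which interpreter and environment an editor is configured to use is to print a few values from the sys module (the path shown in the comment is illustrative, not exact):

```python
import sys

# The interpreter path reveals which environment is active; for the
# default ArcGIS Pro environment it typically ends in something like
# ...\envs\arcgispro-py3\python.exe (illustrative path).
print(sys.executable)
print(sys.version)
print(sys.version_info.major)  # 3 for any ArcGIS Pro installation
```

If the printed path points at a different environment than expected, the editor's interpreter setting needs to be changed before any ArcPy code will import correctly.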
You can also use the Python window in ArcGIS Pro to write and test Python
code. However, the Python window is most suited to running short snippets
of code for testing purposes. The more complicated and longer scripts
developed in this book require the use of a dedicated Python editor, such as
IDLE, Spyder, or PyCharm.
1.8
Exploring example scripts, tools, and notebooks
This section uses several examples to illustrate how Python is used to create
scripts, tools, and notebooks. The examples were obtained from Esri and
the ArcGIS user community. One of the reasons for presenting these
examples is for you to become more familiar with looking at Python code
and tools developed by others. One of the best ways to learn how to write
code and develop tools is to work with existing examples. You are not
expected to fully understand all the code at this point, but the examples will
give you a flavor of what is to come in this book.
Example 1: Terrain Tools
The Terrain Tools were developed by Esri and extend what is available in
ArcGIS Pro by providing capabilities for creating alternative terrain
representations. These representations include different types of hillshade
surfaces and contours, which can greatly enhance the cartographic display
of terrain data.
The tools are made available as a collection of tools in a toolbox. Each tool
consists of a tool dialog box and has a corresponding Python script.
Although these scripts are written in Python, the functionality of the script
can be accessed the same way as any other geoprocessing tool. The figure
illustrates what the toolbox looks like in ArcGIS Pro.
The “scroll” icon indicates that these tools are written in Python, also
referred to as Python script tools.
The tool dialog boxes look like those of regular geoprocessing tools in
ArcGIS Pro. As an example, consider the Illuminated Contours tool. The
tool provides an analytical version of the hand-drawn Tanaka method of
symbolizing contours that includes coloring and varying the thickness of
contour lines. Assuming a certain lighting direction, contours are drawn
lighter on parts of the terrain that are illuminated and darker on parts of the
terrain that are not illuminated.
The tool dialog box looks much like the regular Contour tool available in
ArcGIS Pro.
The Illuminated Contours tool has five parameters, two of which are
optional. The required parameters include the input raster, which is a digital
elevation model or DEM, as well as the contour interval to be used and the
output contour feature class. The optional parameters include the base
contour and z-factor to be used. The result of the tool is a new polyline
feature class, in which each contour is broken up into segments with new
attributes for the color (grayscale, from white to black) and the appropriate
thickness.
An example of the resulting illuminated contours is shown, with the
contours overlaid on top of the regular DEM shown in grayscale from dark
(low elevation) to light (high elevation). The assumed lighting direction is
from the northwest, as revealed in the different levels of illumination of the
contours.
The Illuminated Contours tool effectively carries out a series of steps,
which can be accomplished by running regular geoprocessing tools and
applying symbology. Some of these steps include creating contours from a
DEM, creating a default hillshade, converting hillshade brightness values,
reclassifying this grid of values into five-degree intervals, converting the
reclassified grid to polygons, intersecting the contour polylines with these
polygons, and assigning symbology on the basis of the new attributes of the
polylines. The purpose of the script tool is to automate these steps and
provide a user-friendly interface.
A single Python script is used in this tool, and the script can be opened to
get an inside look at what the tool does. When you open the script in a
Python editor, it looks like the figure.
When scrolling through the script, you will find the equivalent of the tasks
you would need to carry out in ArcGIS Pro using existing tools. For
example, making sure you have a license for the Spatial Analyst extension;
running geoprocessing tools such as Hillshade, Contour, Reclassify, and
Intersect; and applying symbology using a layer file. You could complete
these steps using existing tools and a few manual manipulations, but the
Python script tool includes all of them in a single easy-to-use tool dialog
box.
One of the nice things about working with Python script tools is that you
can view the underlying code. Not only can you learn from code written by
others, you can also copy it and make a modified version for your own
work.
Detailed documentation, all the source code, and example datasets to
experiment with these tools can be found in ArcGIS Online at
www.arcgis.com by searching for Terrain Tools Sample.
Example 2: Random Sample
The Random Sample tool was developed by the author and is discussed in
more detail in chapter 3. This tool creates a random sample based on an
input feature class and a user-defined number of features. The output is
saved as a new feature class. The tool is created as a Python script tool. The
tool dialog box is shown in the figure.
The tool provides functionality not available in ArcGIS Pro. Several online
resources such as Stack Exchange (http://stackexchange.com) and Esri
Support (http://support.esri.com) provide various code solutions to select
features at random from a feature class, but employing these solutions
requires substantial coding skills. By developing a Python script tool, the
script becomes more user-friendly. As a Python developer, you can write
this type of script, develop and test the Python script tool, and then make
the tool available to other users who can use the tool without having to
learn Python.
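The core idea of such a tool, stripped of the ArcPy parts, can be sketched with the standard library's random module; the object IDs below are made up and simply stand in for the features of an input feature class:

```python
import random

# Made-up object IDs standing in for the features of an input feature
# class; random.sample draws n unique IDs without replacement.
object_ids = list(range(1, 101))  # IDs 1 through 100
sample_size = 10
sampled = random.sample(object_ids, sample_size)

print(len(sampled))                               # 10
print(all(oid in object_ids for oid in sampled))  # True
```

In the actual script tool, a selection based on IDs like these would then be exported to a new feature class; chapter 3 walks through that implementation in detail.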
The Python script for this tool is shown in the figure.
The code for this tool is explained in detail in chapter 3, including the steps
to develop and test the tool dialog box. The strategy for sharing this type of
tool is covered in chapter 5. By the end of this book, you will be able to
develop tools like this Python script tool.
Example 3: 3D Fences toolbox
The 3D Fences toolbox was developed by Esri’s Applications Prototype
Lab. This toolbox makes it possible to create 3D fence diagrams on the
basis of point data with a z-dimension field and at least one value field. An
example application of this tool is to use sampling points of measurements
of oil in seawater after an oil spill. Not only does each sampling point have
x,y coordinates, it also has a z-dimension (depth) and a measurement (oil
concentration).
The tool creates a vertical subset of the 3D data—i.e., a slice—and
transforms this subset onto a 2D plane. The value of interest, for example,
oil concentration, is interpolated using Empirical Bayesian Kriging (EBK).
The results are transformed as points into the original coordinate space as a
“fence.” This transformation allows for a closer examination of 3D data,
which is more difficult to do using the original point cloud of
measurements.
This tool is relatively sophisticated, but it is also written entirely in Python.
The tool is made available in two different versions in a Python toolbox as
shown in the figure.
Details on Python script tools and Python toolboxes are covered in chapters
3 and 4, respectively. At this stage, it is enough to know that, in both cases,
all the code is written in Python, and the tool dialog boxes look just like
regular geoprocessing tools in ArcGIS Pro.
The tool dialog box has a lot of options for inputs, outputs, and analysis
settings, reflecting the relatively complex nature of the interpolation. Like
regular geoprocessing tools, some of the parameters have suggested
defaults.
Some of the key parameters of the tool are the input point features with a z-dimension and a field for the measurement of interest, the interpolation
settings, and the preexisting 2D linear features along which the fence will
be created.
The output is a 3D point feature class. The following example (courtesy of
the tool’s author) shows the results from the Feature-based Fences tool as a
scene in ArcGIS Pro, with the original points used in the interpolation
shown in red, and the resulting 3D fence as a color ramp.
The Python code associated with these tools is relatively long and complex,
as could be expected for a sophisticated tool. The entire code is more than
1,000 lines, although the script also includes notes, comments, and blank
lines to facilitate reading.
Even though the code appears complex at first, you probably recognize
some existing tools, such as Copy Features, Add Field, and the Empirical
Bayesian Kriging tool. The entire workflow is elaborate and would be
cumbersome to complete step by step in ArcGIS Pro. Developing a tool of
this complexity requires advanced coding skills and a significant time
investment. Once created, however, the tool can be used many times, and it
can be shared with other users.
Documentation and all the source code for this tool can be found in ArcGIS
Online at www.arcgis.com by searching for 3D Fences toolbox.
Example 4: Notebook for wildfire analysis
Python and ArcPy make it possible to extend the functionality of ArcGIS
Pro, as illustrated in the previous examples. The following example uses the
ArcGIS API for Python to work with web GIS. The example is one of the
sample notebooks provided with the documentation of the ArcGIS API for
Python. The specific notebook illustrates an analysis of the Thomas Fire in
2017 in California. The figure shows the top portion of the notebook as part
of the online documentation.
A notebook shows Python code combined with text, graphics, and other
elements. As you run part of the code, the results update interactively. A
notebook provides a different interface to working with Python code
compared with more traditional Python editors, and it does not produce a
tool for use in ArcGIS Pro with the familiar interface of a tool dialog box.
Instead, users interact directly with the code and display the results within
the notebook.
Notebooks can be opened directly in ArcGIS Pro. The figure shows the
same notebook opened in ArcGIS Pro.
A user can inspect the code, update datasets or analysis parameters, and run
the code to view the updated results within the notebook. The following
example shows a side-by-side comparison of imagery before and after the
wildfire. A user can enter a new address to be geocoded, run the code, and
view the updated imagery.
You can also use the ArcGIS API for Python to perform many different
types of analysis, similar to the geoprocessing tools in ArcGIS Pro. The
figure shows an example of the use of map algebra to calculate a
normalized burn ratio to determine the impacts of the fire by using the
before-and-after imagery.
The results are symbolized and added to a map display in the notebook. The
maps show the burned areas (in red) on top of a background satellite image.
This notebook can be found under the sample notebooks at
https://developers.arcgis.com/python/sample-notebooks by searching for
Thomas Fire.
Although a notebook does not have the familiar interface of a
geoprocessing tool, it provides a more interactive approach to working with
code and the results. Notebooks can be shared with others and hosted in an
ArcGIS Enterprise portal. The use of notebooks makes it easier to
document and share workflows. Creating notebooks using the ArcGIS API
for Python is covered in detail in chapter 9.
You can benefit from these examples because they provide insight into why
you would develop these tools and notebooks in the first place. Perhaps you
have been using ArcGIS Pro for a while and have wondered why there is
not a tool for a certain task. Or you have established a workflow that
requires many repetitive tasks, and you are looking for a way to automate
the steps. Or you want to document your workflow and share not only the
code, but also the data and the results. Having a strong motivation to
develop a certain script, tool, or notebook will make it easier as you embark
on strengthening your Python skills.
Once you learn how to use Python for developing scripts, tools, and
notebooks, you will find that one of the best ways to keep learning Python
scripting is to work with existing code written by others. Using example
code can also speed up the process of creating your own scripts, tools, and
notebooks.
Points to remember
ArcGIS Pro supports the use of scripting to automate workflows.
Python is the preferred scripting language for working with ArcGIS
Pro. There is a large user community, and a growing set of third-party
packages for use in Python that provide additional functionality.
This book focuses on more advanced scripting techniques, and the
development of scripts, tools, and notebooks to be shared. The topics
covered require substantial previous experience in writing Python
scripts for ArcGIS. The fundamentals of Python and ArcPy, including
setting up a Python editor and writing basic scripts for data
processing using ArcPy, are covered in Python Scripting for ArcGIS
Pro.
ArcGIS Pro works with Python 3, whereas ArcGIS Desktop 10.x
works with Python 2. This book focuses on the use of ArcGIS Pro
and Python 3, but migrating scripts and tools from ArcGIS Desktop
10.x to ArcGIS Pro is covered in chapter 8.
In addition to using ArcPy to write scripts and develop tools for
ArcGIS Pro, the book also covers the use of the ArcGIS API for
Python to work with web GIS. This includes the use of notebooks,
which provide an interactive approach to working with Python code,
geospatial datasets, and analysis results. The use of notebooks makes
it easier to document and share workflows.
One of the best ways to continue learning Python scripting is to
examine the work published by others. There are many published
examples of Python scripts and tools developed for ArcGIS Pro using
ArcPy, as well as sample notebooks developed for web GIS using the
ArcGIS API for Python.
Key terms
backporting
backward compatibility
conda
environment
Jupyter Notebook
notebook
Python editor
Python script tool
Python toolbox
script
scripting language
web GIS
Review questions
Which version of Python is used with ArcGIS Pro?
What is the main goal of developing scripts and tools for ArcGIS Pro
using Python?
Which Python editors are recommended to write scripts and develop
tools for ArcGIS Pro?
What are some of the similarities and differences between Python
tools for ArcGIS Pro and notebooks?
What are some of the similarities and differences between ArcPy and
the ArcGIS API for Python?
Discuss one of the examples presented in this chapter and explain
how it adds functionality not available in ArcGIS Pro.
Chapter 2
Creating Python functions and
classes
2.1
Introduction
This chapter describes how to create custom functions in Python that can be
called from elsewhere in the same script or from another script. Functions
are organized into modules, and modules can be organized into a Python
package. ArcPy itself is a collection of modules organized into a package.
By creating custom functions, you can organize your code into logical parts
and reuse frequently needed procedures. This chapter also describes how to
create custom classes in Python, which makes it easier to group together
functions and variables.
Custom functions and classes are important for writing longer and more
complex scripts. They allow you to better organize your code as well as
reuse important elements of your code. A good understanding of functions
and classes is also important because they are used frequently in other
chapters in the book. Many example scripts and tools published by others
also contain custom functions and classes.
2.2
Functions and modules
Before getting into creating custom functions, a quick review of functions is
in order. Functions are blocks of code that perform a specific task. Python
includes many built-in functions, such as help(), int(), print(), and str().
Most functions require one or more arguments, which serve as the input for
the function.
Using a function is referred to as calling the function. When you call a
function, you supply it with arguments. Consider the print() function:
name = "Paul"
print(name)
The result is
Paul
In this example, the argument of the print() function is a variable, and this
variable has a value. The print() function outputs the value to the console.
The general syntax of a function is:
<function>(<arguments>)
In this syntax, <function> stands for the name of the function, followed by
the parameters of the function in parentheses. Function arguments are also
called parameters, and these terms are often used interchangeably.
Python has several dozen built-in functions. For a complete list of built-in
functions, see https://docs.python.org/3/library/functions.html. You will use
several built-in functions in a typical Python script.
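As a quick illustration, the following lines combine a few of these built-in functions; the variable names are arbitrary:

```python
# A few built-in functions: len(), int(), str(), and round()
count = len("streams")        # number of characters in the string: 7
number = int("42")            # convert a string to an integer
text = str(3.14159)           # convert a number to a string
rounded = round(3.14159, 2)   # round to two decimal places
print(count, number, text, rounded)
```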
You can also import additional functionality from other modules. A module
is like an extension that can be imported into Python to extend its
capabilities. Typically, a module consists of several specialized functions.
Modules are imported using a special statement called import. The general
syntax for the import statement is
import <module>
Once you import a module in a script, all functions in that module are
available to use in that script. Consider the random module, for example.
You can import this module to access several different functions. The
following code generates a random number from 1 to 99 using the
randrange() function of the random module.
import random
random_number = random.randrange(1, 100)
print(random_number)
The code to generate a random number has already been written and is
shared with the Python user community. This code can be used freely by
anyone who needs it. The random module contains several different
functions, and many of them are closely related. Whenever your script
needs a random number, you don’t have to write the code yourself. You can
import the random module and use any of its functions.
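For example, the sample() function of the random module selects a specified number of unique items from a list, which is conceptually similar to selecting features at random, as described in the earlier example. The list of feature IDs here is purely illustrative:

```python
import random

# A hypothetical list of feature IDs to sample from
feature_ids = list(range(1, 101))

# Select 5 unique IDs at random; seeding makes the result repeatable
random.seed(1)
selected = random.sample(feature_ids, 5)
print(selected)
```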
One of the most widely used modules is the os module, which includes
several functions related to the operating system. For example, the
os.mkdir() function creates a new folder in the current working directory,
as follows:
import os
os.mkdir("test")
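Note that os.mkdir() raises an error if the folder already exists. A common defensive pattern, sketched here using a temporary location rather than the current working directory, is to check first:

```python
import os
import tempfile

# Create the folder only if it does not already exist
folder = os.path.join(tempfile.gettempdir(), "test_folder")
if not os.path.exists(folder):
    os.mkdir(folder)
print(os.path.isdir(folder))
```

Python also provides os.makedirs(folder, exist_ok=True), which creates intermediate folders as needed and does not complain if the folder is already there.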
The general syntax to use a function from a module that is not one of the
built-in functions is as follows:
import <module>
<module>.<function>(<arguments>)
In other words, you first must import the module using import <module>, and
then reference the module when calling the function using
<module>.<function>(<arguments>).
When writing scripts for ArcGIS Pro, you can use the ArcPy package to
access the functionality of ArcGIS Pro within a script. ArcPy is referred to
as a package because it consists of several modules, functions, and classes,
but to work with ArcPy, you import it just like a module. That is why most
geoprocessing scripts start off as follows:
import arcpy
Once you import ArcPy, you can use one of its many functions. For
example, the arcpy.Exists() function determines whether a dataset exists
and returns a Boolean value of True or False. The following code
determines whether a shapefile exists:
import arcpy
print(arcpy.Exists("C:/Data/streams.shp"))
This code follows the regular Python syntax
<module>.<function>(<parameters>), where arcpy is the module, and
Exists() is the function, even though ArcPy is technically considered a
package.
ArcPy includes several modules, including the data access module
arcpy.da. This module is used for describing data, performing editing tasks,
and following database workflows. The da.Describe() function determines
the type of dataset, as well as several properties of the dataset. For example,
the following code determines the geometry shape type of a shapefile:
import arcpy
desc = arcpy.da.Describe("C:/Data/streams.shp")
print(desc["shapeType"])
For a polyline shapefile, the result is Polyline.
The general syntax for using a function of an ArcPy module is
arcpy.<module>.<function>(<arguments>)
In the preceding example code, Describe() is a function of the arcpy.da
module.
When referring to a function, it is important to refer to the module that it is
part of. For example, ArcPy also includes a Describe() function. So both
arcpy.Describe() and arcpy.da.Describe() are valid functions, but they
work in different ways.
Now that you’ve reviewed the use of functions and modules, the next
section introduces creating your own custom functions.
2.3
Creating functions
In addition to using existing functions, you can create your own custom
functions that can be called from within the same script or from other
scripts. Once you write your own functions, you can reuse them whenever
needed. This capability makes code more efficient because there is no need
to write code for the same task over and over.
Python functions are defined using the def keyword. The def statement
contains the name of the function, followed by any arguments in
parentheses. The syntax of the def statement is
def <functionname>(<arguments>):
There is a colon at the end of the statement, and the code following a def
statement is indented the same as any block of code. This indented block of
code is the function definition.
For example, consider the script helloworld.py as follows:
def printmessage():
    print("Hello world")
In this example, the function printmessage() has no arguments, but many
functions use parameters to pass values. Elsewhere in the same script, you
can call this function directly, as follows:
printmessage()
The complete script is as follows:
def printmessage():
    print("Hello world")
printmessage()
When the script runs, the function definition is not executed. In other
words, the line of code starting with def and the block of code that follows
don’t do anything. In the third line of code, the function is called, and then
it is executed. The result of the script is
Hello world
This is a simple example, but it illustrates the basic structure of a custom
function. Typically, functions are more elaborate. Consider the following
example: you want to create a list of the names of all the fields in a table or
feature class. There is no function in ArcPy that does this. However, the
ListFields() function allows you to create a list of the fields in a table, and
you can then use a for loop to iterate over the items in the list to get the
names of the fields. The list of names can be stored in a list object. The
code is as follows:
import arcpy
arcpy.env.workspace = "C:/Data"
fields = arcpy.ListFields("streams.shp")
namelist = []
for field in fields:
    namelist.append(field.name)
Now, say you anticipate that you will be using these lines of code often—in
the same script or other scripts. You can simply copy the lines of code,
paste them where they are needed, and make any necessary changes. For
example, you will need to replace the argument "streams.shp" with the
feature class or table of interest.
Instead of copying and pasting the entire code, you can define a custom
function to carry out the same steps. First, you must give the function a
name—for example, listfieldnames(). The following code defines the
function:
def listfieldnames():
You can now call the function from elsewhere in the script by name. In this
example, when calling the function, you want to pass a value to the function
—that is, the name of a table or feature class. To make this possible, the
function must include an argument to receive these values. The argument
must be included in the definition of the function, as follows:
def listfieldnames(table):
Following the def statement is an indented block of code that contains what
the function does. This block of code is identical to the previous lines of
code, but now the hard-coded value of the feature class is replaced by the
argument of the function, as follows:
def listfieldnames(table):
    fields = arcpy.ListFields(table)
    namelist = []
    for field in fields:
        namelist.append(field.name)
Notice how there are no hard-coded values left in the function. The lack of
hard coding is typical for custom functions because you want a function to
be reusable in other scripts.
The last thing needed is a way for the function to pass values back, also
referred to as returning values. Returning values ensures that the function
not only creates the list of names, but also returns the list so it can be used
by any code that calls the function. This is accomplished using a return
statement. The completed description of the function is as follows:
def listfieldnames(table):
    fields = arcpy.ListFields(table)
    namelist = []
    for field in fields:
        namelist.append(field.name)
    return namelist
Once a function is defined, it can be called directly from within the same
script, as follows:
fieldnames = listfieldnames("C:/Data/hospitals.shp")
Running the code returns a list of the field names in a table or feature class
using the function previously defined. Notice that the new function
listfieldnames() can be called directly because it is defined in the same
script.
One important aspect is the placement of the function definition relative to
the code that calls the function. The custom function can be called only
after it is defined. The correct organization of the code is as follows:
import arcpy
arcpy.env.workspace = "C:/Data"
def listfieldnames(table):
    fields = arcpy.ListFields(table)
    namelist = []
    for field in fields:
        namelist.append(field.name)
    return namelist
fieldnames = listfieldnames("hospitals.shp")
print(fieldnames)
If the function is called before the function definition, the following error is
returned:
NameError: name 'listfieldnames' is not defined
Complex scripts with several functions therefore often start with defining
several functions (and classes), followed by the code that calls these
functions later in the script.
In addition, it is common to add empty lines around the blocks of code that
define functions to improve readability, as shown in the figure.
The example function uses an argument, called table, which makes it
possible to pass a value to the function. A function can use more than one
argument, and arguments can be made optional. The arguments should be
ordered so that the required ones are listed first, followed by the optional
ones. Arguments are made optional by specifying default values.
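As a sketch of this pattern (the function and its default value are hypothetical, not taken from the book's examples), an optional argument with a default value is listed after the required one:

```python
def describe_segment(length, units="meters"):
    # units is optional; callers can omit it to use the default
    return f"Segment length: {length} {units}"

print(describe_segment(125.7))           # uses the default units
print(describe_segment(125.7, "feet"))   # overrides the default
```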
Custom functions can be used for many other tasks, including working with
geometry objects. Next, an example script is explained, and then it will be
converted to a custom function. The example script calculates the sinuosity
index for each polyline feature representing a river segment. Sinuosity, in
this context, is defined as the length of the polyline representing the river
segment divided by the straight-line distance between the first and last
vertex of the polyline. Segments that are relatively straight have a sinuosity
index of close to 1, whereas meandering segments have higher values, up to
1.5 or 2. The calculation can be accomplished by using properties of a
Polyline object—i.e., length, firstPoint, and lastPoint. The script to
print the sinuosity index for every polyline feature in a feature class is as
follows:
import arcpy
import math
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        channel = shape.length
        deltaX = shape.firstPoint.X - shape.lastPoint.X
        deltaY = shape.firstPoint.Y - shape.lastPoint.Y
        valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
        si = round(channel / valley, 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")
A brief explanation of how the script works is in order. A search cursor is
used to obtain the unique ID and the geometry of each polyline. The length
property of the geometry represents the length of the polyline. The straight-line distance between the first and last vertex of the polyline is calculated
using the firstPoint and lastPoint properties of the geometry, which
return a Point object. The x,y coordinates of these vertices are used to
calculate the distance on the basis of the Pythagorean theorem. The two
distances are divided to obtain the sinuosity index, and for display purposes,
the values are rounded to three decimal places.
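The distance calculation itself can be verified with plain Python. In this sketch, the channel length and the two endpoints are hypothetical values rather than values read from a feature class:

```python
import math

# Hypothetical values: channel length and the first/last vertex coordinates
channel = 1250.0
first_x, first_y = 0.0, 0.0
last_x, last_y = 600.0, 800.0

# Straight-line (valley) distance from the Pythagorean theorem
deltaX = first_x - last_x
deltaY = first_y - last_y
valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))

si = round(channel / valley, 3)
print(si)  # 1250 / 1000 = 1.25
```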
Consider the stream network that’s shown in the figure.
The result of the script is a printout of the sinuosity index of each segment.
The calculation of the sinuosity index requires several lines of code that
may be useful in other places, and this code lends itself to a custom
function. This custom function receives a geometry object and returns the
sinuosity index. The script using a custom function is as follows:
import arcpy
import math
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
def sinuosity(shape):
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        si = round(sinuosity(shape), 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")
The custom function is called sinuosity(), and the only argument is a
geometry object referred to as shape. When calling the function, the
geometry object is passed to the function, and the index is returned as a
value.
The script uses the round() function to return a floating-point number
rounded to the specified number of decimal places. The only issue with
rounding in this manner is that any trailing zeros are dropped—e.g., 1.300
is printed as 1.3. An alternative is to use format codes to customize the print
formatting. The final two lines of the script using a format code are as
follows:
si = sinuosity(shape)
print(f"Stream ID {oid} has a sinuosity index of {si:.3f}")
The format code .3f means the output is formatted using a floating-point
number with three decimal places. This type of formatting also applies
rounding—e.g., the number 1.4567 is formatted as 1.457.
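The difference between round() and a format code can be shown with plain Python:

```python
value = 1.3
print(round(value, 3))    # trailing zeros dropped: 1.3
print(f"{value:.3f}")     # trailing zeros kept: 1.300

value = 1.4567
print(f"{value:.3f}")     # formatting also applies rounding: 1.457
```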
Again, it is common to add empty lines around the blocks of code that
define a function to improve readability.
Creating functions can be beneficial in several ways:
If a task is to be used many times, creating a function can reduce the
amount of code you must write and manage. The actual code that
carries out the task is written only once as a function; from that point
on, you can call this custom function as needed.
Creating functions can reduce the clutter caused by multiple
iterations. For example, if you wanted to create lists of the field
names for all the feature classes in all the geodatabases in a list of
workspaces, you would quickly end up with a relatively complicated set of
nested for loops. Using a function for creating a list of field names
removes one of these for loops and places it in a separate block of
code.
Complex tasks can be broken into smaller steps. By defining each
step as a function, the complex task does not appear so complex
anymore. Well-defined functions are a good way to organize longer
scripts.
Custom functions can be called not only directly from the same script but
also from other scripts, which the next section addresses.
2.4 Calling functions from other scripts
Once functions are created in a script, they can be called from another script
by importing the script that contains the function. For relatively complex
functions, it is worthwhile to consider making them into separate scripts,
especially if they are needed on a regular basis. Rather than defining a
function within a script, the function becomes part of a separate script that
can be called from other scripts.
Consider the earlier example of the helloworld.py script:
def printmessage():
print("Hello world")
The printmessage() function can be called from another script by importing
the helloworld.py script. For example, the script print.py imports this script
as follows:
import helloworld
helloworld.printmessage()
The script print.py imports the helloworld.py script as a module—
helloworld. A module name is equal to the name of the script minus the .py
extension. The function is called using the regular syntax to call a function
—that is, <module>.<function>.
In the example script, the helloworld module is imported into the print.py
script. The import statement causes Python to look for a file named
helloworld.py. No paths can be used in the import statement, and thus it is
important to recognize where Python looks for modules.
The first place Python looks for modules is the current folder, which is the
folder in which the print.py script is located. The current folder can be
obtained using the following code, in which sys.path is a list of system
paths:
import sys
print(sys.path[0])
The current folder also can be obtained using the os module, as follows:
import os
print(os.getcwd())
Next, Python looks at all the other system paths that have been set during
the installation or subsequent configuration of Python itself. These paths are
contained in an environment variable called PYTHONPATH. Note that this
is not a geoprocessing environment setting, but a variable of the Python
environment. To view a complete list of these paths, use the following code:
import sys
for path in sys.path:
    print(path)
sys.path is a list of paths, and iterating over it makes the
printout easier to read. In a typical scenario, the list will include the paths as
shown in the figure.
The first element in the list is the path of the current script (i.e., C:\Testing),
which is returned using sys.path[0]. The rest of the paths will vary
depending on how ArcGIS is installed and on the environment being used.
In the list of paths, you will notice two types (beyond the current folder).
First, there are paths tied to the core installation of ArcGIS Pro—i.e.,
\ArcGIS\Pro\bin, \ArcGIS\Pro\Resources\ArcPy, and
\ArcGIS\Pro\Resources\ArcToolbox\Scripts. These paths make it possible
to do things such as import arcpy. Second, there are paths tied to the
specific Python environment being used—in this case, the default
environment arcgispro-py3. This is the location where Python itself is
installed, including any additional packages. If you are using a different
environment, the list of paths will have the same structure, but arcgispro-py3 would be replaced by the name and location of that environment.
What if the module (i.e., script) you want to import is in a different folder—
that is, not in the current folder of the script or in any of the folders in
sys.path? You have two options, as follows:
Option 1: Append the path using code.
You can temporarily add a path to your script. For example, if the scripts
you want to call are in the folder C:\Myscripts, you can use the following
code before calling the function:
import sys
sys.path.append("C:/Myscripts")
The sys.path.append() statement is a temporary solution so a script can
call a function in another script in the current session.
Option 2: Use a path configuration (.pth) file.
You can access a module in a different folder by adding a path
configuration file to a folder that is already part of sys.path. It is common
to use the site-packages folder—for example, C:\Program
Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages. A path
configuration file has a .pth extension and contains the path(s) that will be
appended to sys.path. This file can be created using a basic text editor. As
part of the ArcGIS Pro installation, a path configuration file called
ArcGISPro.pth is placed in the site-packages folder of Python. The file itself
looks like the example in the figure.
The path configuration file makes all the modules located in the specific
folders available. You should not be making any changes to this default .pth
file. You can create a .pth file yourself if you commonly work with scripts
that are in different folders. For example, if the modules you want to import
are in the folder C:\Myscripts, you can create a .pth file and place it in the
Python site-packages folder. One complication is that the default
environment arcgispro-py3 cannot be modified, so any additional .pth files
are not recognized. This option is available only when working with a
cloned environment. Therefore, for many typical users, adding the path
within the script itself is more convenient.
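For example, a custom path configuration file, named something like myscripts.pth (the name and folder are hypothetical), would contain nothing more than the path itself, one path per line:

```
C:\Myscripts
```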
Note: A third alternative is to modify the PYTHONPATH variable directly
from within the operating system. However, this option is cumbersome
and error-prone, and therefore not recommended.
The earlier example for calculating the sinuosity index for polylines is
revisited to illustrate how custom functions can be called from other scripts.
The script that contains the custom function is called rivers.py and is as
follows:
import math
def sinuosity(shape):
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley
This script must import the math module because it is being used in the
script. There is no need to import ArcPy because that is done in the other
script in which the geometry objects are created using a search cursor.
Notice that the rivers.py script does not include any hard coding. This lack
of hard coding is typical for custom functions because you want the code to
be reusable without modification.
The script that calls the custom function is called river_calculations.py and
is as follows:
import arcpy
import rivers
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        si = round(rivers.sinuosity(shape), 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")
The script must import ArcPy to create the geometry objects using a search
cursor. It also must import the rivers.py script as a module using import
rivers.
Importing the module makes the custom function available to the
current script. When calling the custom function, the module must be
included—i.e., rivers.sinuosity() instead of just sinuosity().
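As an aside, Python also supports importing a function directly so that the module prefix is not needed when calling it. The following sketch shows the pattern with the built-in math module; the same form, from rivers import sinuosity, would work for the custom module:

```python
# Import a single function directly from a module
from math import sqrt

# The function can now be called without the module prefix
print(sqrt(16))  # 4.0
```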
By writing the custom function in a separate script, you have created a
module that can be used by other scripts, which significantly increases the
usability of your code.
2.5 Organizing code into modules
By creating a script that defines a custom function, you are using the script
as a module. All Python script files are, in fact, modules. That’s why you
can call the function by first importing the script (module), and then using a
statement such as <module>.<function>. Recall the example:
import random
random_number = random.randrange(1, 100)
print(random_number)
The random module consists of the random.py file and is in one of the
folders that Python automatically recognizes—i.e., C:\Program
Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib. The random.py script
(module) contains several functions, including randrange().
The rivers.py script in the previous section represents another example.
After importing the module using import rivers, the custom function can
be called using rivers.sinuosity(). Although the example code employs
only a single function, it can easily be expanded to include other relevant
functions pertaining to rivers.
This approach makes it easy to create new functions in a script and call
them from another script. However, it also introduces a complication: How
do you distinguish between running a script by itself and calling it from
another script? What is needed is a structure that provides control of the
execution of the script. If the script is run by itself, the function is executed.
If the module is imported into another script, the function is not executed
until it is specifically called.
Consider the example hello.py script, which contains a function as well as
some test code to make sure the function works:
def printmessage():
    print("Hello world")
printmessage()
This type of testing is reasonable, because when you run the script by itself,
it confirms that the function works. Without the last print message, you
would not be able to see that the function works correctly. However, when
you import this module to use the function as follows:
import hello
The test code runs immediately and prints the message:
"Hello world"
When you import the script file as a module, you don’t want the test code to
run automatically, but only when you call the specific function. You want to
be able to differentiate between running the script by itself and importing it
as a module into another script. This is where the variable __name__ comes
in (there are two underscores on each side). When a script is run directly,
this variable has the value "__main__". When the script is imported as a
module, the variable is set to the name of the module. Using an if
statement in the script that contains the
function makes it possible to distinguish between a script and a module, as
follows:
def printmessage():
    print("Hello world")
if __name__ == "__main__":
    printmessage()
In this case, the test of the module (i.e., printmessage()) will run only if the
script is run by itself. If you import the module into another script, no code
will run until you call the function.
This structure is not limited to testing. In some geoprocessing scripts,
almost the entire script consists of one or more functions, and only the very
last lines of code call the function if, indeed, the script is run by itself. The
structure is as follows:
import arcpy
<import other modules as needed>
def mycooltool(<arguments>):
    <lines of code>
    ...
if __name__ == "__main__":
    mycooltool(<arguments>)
This structure provides control over running the script and makes it possible
to use the same script in two different ways—running it by itself or calling
it from another script.
Consider the script associated with the Illuminated Contours tool discussed
in chapter 1. The script starts by importing several modules, followed by a
single function called illuminatedContours().
The entire script is, in fact, written as a function, whereas the last few lines
of the script call the function as shown in the image.
The block of code following if __name__ == "__main__": is executed only
if the script is run by itself. The script normally runs when the Illuminated
Contours tool is run in ArcGIS Pro. Running the script in ArcGIS Pro is not
considered a call from another script because the script is not imported as a
module, but called by running a tool in ArcGIS Pro. As a result, the block
of code executes. The code uses the arcpy.GetParameterAsText() function
to receive the parameters from the tool, which chapter 3 explains in more
detail. The final line of code is a call to the function illuminatedContours()
to carry out the task at hand.
The benefit of this structure is that the function can be used in other ways.
For example, you could write a script that imports the IllumContours.py
script as a module, and then you can call the illuminatedContours()
function without having to make any changes to the script or using the tool
dialog box.
Many scripts that contain custom functions provide only a supporting role
and are not normally run by themselves. Consider again the rivers.py script
that calculates the sinuosity index for polylines:
import math
def sinuosity(shape):
    channel = shape.length
    deltaX = shape.firstPoint.X - shape.lastPoint.X
    deltaY = shape.firstPoint.Y - shape.lastPoint.Y
    valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
    return channel / valley
Without polylines as inputs, there is nothing to calculate. Running the script
does not produce an error, but it also does not do anything because the
function is never called. To make it clear that the script requires specific
inputs that are not part of the script, the following code could be added to
the script:
if __name__ == "__main__":
    print("This script requires geometry objects as inputs.")
In this case, when the script is run on its own, the message is printed, and it
is clear the script does not perform any calculations without specific inputs.
2.6 Creating classes
In the previous sections, you saw how to create your own custom functions
and organize your code into modules. This approach substantially increases
code reusability because you can write a section of code and use it many
times by calling it from within the same script or from another script.
However, these functions and modules have their limitations. The principal
limitation is that a function does not store information the way a variable
does. Every time a function runs, it starts from scratch.
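The difference can be illustrated with a small sketch. The Counter class here is hypothetical, used only to contrast a stateless function with an object that stores state in a property:

```python
def add_one(total, amount):
    # A function remembers nothing between calls; the caller must pass
    # the running total in and capture the result again.
    return total + amount

class Counter(object):
    def __init__(self):
        self.total = 0          # state lives inside the object

    def add(self, amount):
        self.total += amount    # the object updates its own property

total = 0
total = add_one(total, 5)       # the function itself keeps no state

counter = Counter()
counter.add(5)
counter.add(3)
print(counter.total)            # the object remembers: prints 8
```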
In some cases, functions and variables are closely related. For example,
consider a land parcel with several attributes, such as the land-use type,
total assessed value, and total area. The parcel also may have procedures
associated with it, such as how to estimate the property taxes on the basis of
land-use type and total assessed value. These functions require the value of
the attributes. These values can be passed to a function as variables. What if
a function must change the variables? The values could be returned by the
function. However, the passing and returning of variables can become
cumbersome.
A better solution is to use a class. A class provides a way to group together
functions and variables that are closely related so they can interact with
each other. A class also makes it possible to work with multiple objects of
the same type. For example, each land parcel is likely to have the same
attributes. The concept of grouping together functions and variables related
to a type of data is an essential aspect of object-oriented programming
(OOP). Classes are the container for these related functions and variables.
Classes make it possible to create objects as defined by these functions and
variables. Functions that are part of a class are called methods, and
variables that are part of a class are called properties. Examples that follow
will more clearly explain the use of methods and properties.
ArcPy includes many classes, such as the env class, which can access and
set environment settings, and the Result class, which defines the properties
and methods of Result objects that are returned by geoprocessing tools.
Creating your own custom classes in Python, however, opens many new
possibilities.
Python classes are defined using the class keyword as follows:
class <classname>(object):
The reference to object in parentheses means that the custom class being
created is based on a general type of class in Python. Since the reference is
implicit, it can be left out, as follows:
class <classname>():
Note: Although the preceding code is correct, it is better to include the
object to ensure compatibility with Python 2.
There is a colon at the end of the statement, which means the code
following a class statement is indented the same as any block of code. This
indented block of code is the class definition. A class typically consists of
one or more functions, which means the general structure of a class is as
follows:
class <class>(object):
    def <function1>(<arguments>):
        <code>
    def <function2>(<arguments>):
        <code>
Consider a simple example:
class Person(object):
    def setname(self, name):
        self.name = name
    def greeting(self):
        print("My name is {0}.".format(self.name))
The class keyword is used to create a Python class called Person. The class
contains two method definitions—these are essentially function definitions,
except that they are written inside a class statement and are therefore
referred to as “methods.” The self argument refers to the object itself. You
can call it whatever you like, but it is almost always called “self” by
convention.
Note: The Style Guide for Python Code recommends using the
CapitalizedWords, or CapWords, convention for class names—for
example, MyClass. By contrast, the recommended style for variables,
functions, and scripts is all lowercase. This style is not required,
however, and many developers follow different conventions. The names
of functions in ArcPy, for example, do not follow the Style Guide.
A class can be thought of as a blueprint. It describes how to make
something, and you can create many instances from this blueprint. Each
object created from a class is called an instance of the class. Creating an
instance of a class is sometimes referred to as instantiating the class.
Next, you will see how this class can be used. You start by creating an
object:
me = Person()
Using an assignment statement creates an instance of the Person class.
Creating this instance looks like calling a function, but you are creating an
object of type Person. Once an instance is created, you can use the
properties and methods of the class, as follows:
me.setname("Abraham Lincoln")
me.greeting()
Running this code prints the following:
My name is Abraham Lincoln.
This example is relatively simple, but it illustrates some key concepts. First,
a class is created using the class keyword. Second, variables of the class
are called properties, which can store values as all variables can. When you
create an instance of the class, you can pass the values for these properties.
Third, functions of the class are called methods, which can carry out tasks
as all functions can. When you create an instance of the class, you can call
the function as a method of the class. A single class can contain many
properties and methods.
Now return to the example of a parcel of land. You want to create a class
called Parcel that has two properties (land-use type and total assessed
value) and a method (calculating tax) associated with the parcel. For the
purpose of this example, assume the property tax is calculated as follows:
for single-family residential, tax = 0.05 * value; for multifamily residential,
tax = 0.04 * value; and for all other land uses, tax = 0.02 * value. In other
words, the tax calculation is based on the land-use type.
Creating the Parcel class is coded as follows:
class Parcel(object):
    def __init__(self, landuse, value):
        self.landuse = landuse
        self.value = value
    def assessment(self):
        if self.landuse == "SFR":
            rate = 0.05
        elif self.landuse == "MFR":
            rate = 0.04
        else:
            rate = 0.02
        assessment = self.value * rate
        return assessment
The class called Parcel is created using the class keyword. The class
contains two methods: __init__() and assessment(). The __init__()
method is a special method reserved for initializing objects. This method
must have at least one argument in addition to self. In the example, this
method has three arguments: self, landuse, and value. When the class is
used to create objects, however, the first argument (self) is not used
because it represents the object itself and is provided implicitly by using the
class. The __init__() method is used to initialize (or specify) an object
with its initial properties by giving the properties a value. The class can
now be used to create Parcel objects, which have properties called landuse
and value.
Next, look at how to use this class. The following code creates an instance
of the class:
myparcel = Parcel("SFR", 200000)
This code creates a single Parcel object, and the two properties are assigned
values. You can now use these properties. For example, the following code
prints the values of both properties:
print(myparcel.landuse)
print(myparcel.value)
The result is as follows:
SFR
200000
This part of the code serves only to confirm that the Parcel object is created
and that the properties have values. You also can check the type of the
object, as follows:
print(type(myparcel))
The result is
<class '__main__.Parcel'>
This check confirms that the type of object is Parcel. The __main__ portion
means that the class definition resides in the current script.
So far, the code has served only to create the instance of the Parcel class
and confirm the properties of the object. The assessment() method is where
the actual calculation is done. With the Parcel object created, you can use
the properties and methods, as follows:
print("Land use: ", myparcel.landuse)
mytax = myparcel.assessment()
print("Tax: ", mytax)
Running this code results in the following:
Land use: SFR
Tax: 10000.0
The assessment() method is used to calculate the tax for this one parcel for
which the land use is single-family residential (SFR) and the value is
$200,000. In the example, the values for the properties of the object are
hard-coded into the script, but in a real-world scenario, the values would
reside in a database. You could run the property tax calculation for every
parcel in the database, creating a new instance for each parcel to carry out
the calculation.
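As a sketch of that batch workflow, the loop below creates one instance per record. The records list is a hypothetical stand-in for rows read from a database table, and the Parcel class is repeated from the text so the example is self-contained:

```python
# The Parcel class, repeated from the text so this sketch is self-contained.
class Parcel(object):
    def __init__(self, landuse, value):
        self.landuse = landuse
        self.value = value
    def assessment(self):
        if self.landuse == "SFR":
            rate = 0.05
        elif self.landuse == "MFR":
            rate = 0.04
        else:
            rate = 0.02
        return self.value * rate

# Hypothetical (land-use code, assessed value) pairs standing in for
# rows from a parcel database.
records = [("SFR", 200000), ("MFR", 350000), ("COM", 500000)]

for landuse, value in records:
    parcel = Parcel(landuse, value)      # a new instance for each record
    print(landuse, parcel.assessment())  # SFR 10000.0, MFR 14000.0, COM 10000.0
```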
In many cases, you may want to use the class in more than one script. This
can be accomplished by putting the class in a module—that is, creating a
separate script with the definition of the class, which can then be called
from another script. This approach is analogous to creating a separate script
for a function, which can be called from other scripts, as described earlier in
this chapter.
In this example, the existing code for the Parcel class is copied to a
separate script called parcelclass.py and is coded as follows:
class Parcel(object):
    def __init__(self, landuse, value):
        self.landuse = landuse
        self.value = value
    def assessment(self):
        if self.landuse == "SFR":
            rate = 0.05
        elif self.landuse == "MFR":
            rate = 0.04
        else:
            rate = 0.02
        assessment = self.value * rate
        return assessment
In this example, the script that uses the class is called parceltax.py and is
coded as follows:
import parcelclass
myparcel = parcelclass.Parcel("SFR", 200000)
print("Land use: ", myparcel.landuse)
mytax = myparcel.assessment()
print(mytax)
A few notes are in order. The module called parcelclass is imported to
make it possible to use the Parcel class. This approach works only if the
parcelclass.py script resides in the same folder as the parceltax.py script, or
in one of the well-known locations in which Python looks for modules. A
Parcel object is created using the <module>.<class> structure—i.e.,
parcelclass.Parcel.
The example creates only a single Parcel object, but
this process could be repeated for any number of parcels in a database table.
Creating a class in a separate script allows you to organize your code better
and makes it possible to reuse the class in many different scripts.
Classes also can be used to work with geometry objects. Recall the custom
function called sinuosity(), which used geometry objects, from an earlier
section. Instead of using a custom function, you can use a custom class to
create objects, and the function becomes a method of this class.
Creating the River class is coded as follows:
import math
class River(object):
    def __init__(self, shape):
        self.shape = shape
    def sinuosity(self):
        channel = self.shape.length
        deltaX = self.shape.firstPoint.X - self.shape.lastPoint.X
        deltaY = self.shape.firstPoint.Y - self.shape.lastPoint.Y
        valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
        return channel / valley
The only property of the River class is the geometry object, although
additional properties can be used. The sinuosity() method is identical to
the custom function sinuosity() used in an earlier section, but now the
River object is referenced instead of the geometry object. The argument of
the method is the object (i.e., self), whereas the argument of the custom
function is the geometry object being passed to the function (i.e., shape),
and the block of code for the method uses self.shape instead of shape.
Empty lines are typically added around the blocks of code to improve
readability, as shown in the figure.
Using the class to calculate the sinuosity index is coded as follows:
import arcpy
import rivers
arcpy.env.workspace = "C:/Data/Hydro.gdb"
fc = "streams"
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    for row in cursor:
        oid = row[0]
        shape = row[1]
        segment = rivers.River(row[1])
        si = round(segment.sinuosity(), 3)
        print(f"Stream ID {oid} has a sinuosity index of {si}")
Because the class is created in a separate script (i.e., rivers.py), this script
must be imported as a module. In the for loop, a new River object is
created for every row in the input feature class using rivers.River(),
where rivers is the module and River is the class. The object is assigned to a
variable (i.e., segment), and then the function to calculate the sinuosity
index can be called as a method of this object using segment.sinuosity().
The River class example uses only a single method, in which case the class
does not provide additional functionality relative to a custom function.
However, the class could be expanded with additional properties and
methods, which effectively would group multiple functions together in a
meaningful manner. For example, you could have methods related to the
slope of a stream segment by using elevation data or methods related to the
channel properties or flow direction or other hydrologically relevant
information.
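A hedged sketch of such an expansion might look as follows. The gradient() method and its elevation arguments are hypothetical additions, not part of the original rivers.py; they assume the caller supplies endpoint elevations, since elevation is not stored on a polyline geometry object:

```python
import math

class River(object):
    def __init__(self, shape):
        self.shape = shape

    def sinuosity(self):
        # Ratio of channel length to straight-line (valley) length.
        channel = self.shape.length
        deltaX = self.shape.firstPoint.X - self.shape.lastPoint.X
        deltaY = self.shape.firstPoint.Y - self.shape.lastPoint.Y
        valley = math.sqrt(pow(deltaX, 2) + pow(deltaY, 2))
        return channel / valley

    def gradient(self, elev_start, elev_end):
        # Hypothetical method: average slope as the elevation drop of the
        # segment divided by its channel length.
        return abs(elev_start - elev_end) / self.shape.length
```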
2.7 Working with packages
When you have several different functions and classes, it often makes sense
to put them in separate modules (scripts). As your collection of modules
grows, you can consider grouping them into packages. A package is
essentially another type of module, but it contains multiple modules that are
closely related to (and may depend on) each other. A regular module is
stored as a .py file, but a package is stored as a folder (or directory).
Technically speaking, a package is a folder with a file called __init__.py in
it. This file defines the general attributes of the package. This script does
not need to define anything; it can be just an empty file, but it must exist. If
__init__.py does not exist, the directory is just a directory, and not a
package, and it can’t be imported. The __init__.py file makes it possible to
import a package as a module. For example, to import ArcPy, you use the
import arcpy statement, but you are not referring to a script file called
arcpy.py. Instead, you are referring to a folder called arcpy containing a file
called __init__.py.
ArcPy is an example of a Python package because it resides in a folder
called arcpy and contains a file called __init__.py, in addition to many other
files. To make the functionality of ArcPy available in your script, you use
the statement import arcpy. The same syntax is used to import a module
that consists of only a single Python file (.py). From the practical standpoint
of writing your own scripts, therefore, ArcPy looks and feels like a module,
but from a code organization perspective, ArcPy is a package. As a result,
you sometimes will see ArcPy referred to as a module. Although this
characterization is not correct in terms of how the code is organized, the
terms “module” and “package” are often used interchangeably in the
Python community.
Consider how this applies to creating your own package. For example, if
you have a package you want to call mytools, you must have a folder called
mytools, and inside this folder must be a file called __init__.py. The
structure of a package called mytools with two modules (analysis and
model) would be as follows:
C:\Myfolder                          a system path
C:\Myfolder\mytools                  directory for the mytools package
C:\Myfolder\mytools\__init__.py      package code
C:\Myfolder\mytools\analysis.py      analysis module
C:\Myfolder\mytools\model.py         model module
To use the package, your code would be as follows:
import sys
sys.path.append("C:/Myfolder")
import mytools
output = mytools.analysis.<function>(<arguments>)
This is a simplified example, but it illustrates how modules and packages
are organized, including ArcPy.
In addition to the term “package,” you also may have come across the term
site package. A site package is a locally installed package that is available
to all users of that computer. The “site” is the local computer. What makes a
package a site package has to do with how it is installed, not its actual
contents. Because the term “site package” is related to how a package is
installed on a local computer and not its contents, the distinction between
package and site packages is not important from a practical standpoint of
writing code.
The default Python environment includes many packages, which can be
found in the Lib\site-packages folder. For example, a commonly used
package is NumPy, which manipulates large arrays of data. The path for
NumPy as part of the default environment is C:\Program
Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages\numpy.
The contents of this folder consist of several folders and files, including an
__init__.py file.
The __init__.py file of NumPy includes a basic description of the package
and its subpackages, as well as several testing routines. Typically, there is
no need to look at the contents of these files, but examining the structure of
packages will give you a better idea of how Python is organized as a
programming language.
The installation of ArcGIS Pro includes both ArcPy and Python, and the
folder in which ArcPy is located is automatically recognized by Python.
Where exactly is ArcPy installed? Typically, the location is
C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy
Note that ArcPy is not installed in Python’s Lib\site-packages folder, as are
all the other packages including, for example, NumPy. One of the reasons is
that for every new Python environment, all the packages are copied, and
each environment can have different packages. Because ArcPy must be part
of every environment, it is kept separately in the Pro\Resources folder
instead of the Pro\bin\Python folder. Another reason is that ArcPy is
proprietary—i.e., not open source, as are the packages in the Lib\site-packages folder.
When you explore the contents of the folder in which ArcPy resides, you
will notice a subfolder called arcpy (which gives ArcPy its name as a
Python package you can import). This folder contains a file called
__init__.py, which makes it a Python package, instead of a module
referencing a file called arcpy.py. You also will see many other files whose
names sound familiar, including analysis.py, cartography.py, geocoding.py,
and several others.
The arcpy folder also includes several subfolders, including the sa folder,
which contains scripts that are part of the arcpy.sa module. For example,
the Neighborhood.py script contains the implementation of the
Neighborhood classes of arcpy.sa.
Normally, there is no need to work with these files directly, but for
educational purposes, it is okay to examine them. Just don’t make any
changes!
Points to remember
Custom functions and classes allow you to organize your code into
meaningful elements and to reuse these elements whenever needed.
There are many benefits to using custom functions and classes,
including limiting repetitive code, reducing clutter in your code, and
being able to break complex tasks into smaller steps.
Custom functions can be created using the def keyword. The block of
code that follows the def statement defines what the function does.
Custom functions can contain arguments, although they are not
required.
Custom functions can be called from within the same script or from
another script. When creating a custom function in the same script,
the function definition must come before the code that calls the
function. When calling a function from another script, you first need
to import the script that contains the function as a module. A module
is therefore a regular .py file that contains at least one function (or
class).
To distinguish between running a script by itself and importing it as a
module into another script, you can use the if __name__ ==
"__main__":
statement.
When importing modules, you cannot use paths, and therefore
modules (i.e., scripts) must be in the same folder as the script
importing the module(s) or in one of the system paths. You can also
add a path to your script using sys.path.append().
Custom classes can be created to make it easier to group together
related functions and variables. Custom classes can be created using
the class keyword. Functions that are part of a class are called
“methods,” and variables that are part of a class are called
“properties.” Classes can be called from within the same script or
from another script.
As your collection of custom functions and classes grows, you can
consider making it a package, like the ArcPy package.
Key terms
argument
call (a function)
class
class definition
custom class
custom function
format code
function
function definition
hard-coded
initializing
instance
instantiating
method
module
object
object-oriented programming (OOP)
package
parameter
pass (a value)
property
returning (a value)
site package
Review questions
What are some of the benefits of creating custom functions and
classes in your scripts?
Explain where the code that defines a custom function or custom
class can be located relative to the code where the function or class is
used.
Describe the steps to create a custom function.
Describe the steps to create a custom class.
How do you import a module (with your custom function or class)
that is not located in the same folder as the script you are currently
running?
What is a Python package, and how is it different from a Python
module?
Explain how the code of the ArcPy package is organized and where it
is installed.
Chapter 3
Creating Python script tools
3.1 Introduction
This chapter describes the process of turning a Python script into a script
tool. Script tools make it possible to integrate your scripts into workflows
and extend the functionality of ArcGIS Pro. Script tools can be run as
stand-alone tools using their tool dialog box, but they also can be used
within a model or called by other scripts. Script tools have a tool dialog
box, which contains the parameters that are passed to the script. Developing
script tools is relatively easy and greatly enhances the experience of using a
script. Tool dialog boxes reduce user error because parameters can be
specified using drop-down lists, check boxes, combo boxes, and other
mechanisms. This use of a tool dialog box provides substantial control over
user input, greatly reducing the need to write a lot of code for
error-checking. Creating script tools also makes it easier to share scripts with
others.
3.2 Script tools versus Python toolboxes
Before getting into how script tools are created, it is important to
distinguish between two types of tools that can be developed for use in
ArcGIS Pro using Python. The focus of this chapter is on how to create a
script tool, sometimes referred to as a “Python script tool.” The code for
these tools is written as a Python script, and this script is called when the
tool is run. The tool dialog box for the script tool is created from within
ArcGIS Pro. Tool properties and parameters are created manually using the
interface options of the ArcGIS Pro application. This approach provides an
intuitive and easy-to-learn approach to creating and testing a script tool.
Although most script tools use Python, it is possible to use other scripting
languages that accept arguments. For example, you could use a .com, .bat,
.exe, or .r file instead of a .py file. A script tool calls a single script file,
although other scripts can be called from the main script when the tool is
run.
The second approach is to create a tool using a Python toolbox. In this
approach, the entire tool dialog box is written in Python, and the script is
saved as a .pyt file that is recognized as a Python toolbox in ArcGIS Pro.
Creating a Python toolbox does not use any of the interface options in
ArcGIS Pro, and the toolbox is created entirely in a Python editor. Python
toolboxes can be written only in Python, and a single Python toolbox can
contain multiple tools, all written in the same script file. Chapter 4 covers
creating a Python toolbox in detail.
When you are first learning how to create a tool for ArcGIS Pro using
Python, it is recommended that you start with a script tool because the
process is more intuitive. Once you gain some experience in creating script
tools, you can start using Python toolboxes as well. The same task can be
accomplished using both approaches, and the choice is largely a matter of
preference and experience.
The end of chapter 4 revisits some of the pros and cons of script tools and
Python toolboxes.
3.3 Why create your own tools?
Many ArcGIS Pro workflows consist of a sequence of operations in which
the output of one tool becomes the input of another tool. ModelBuilder and
scripting can be used to automatically run these tools in a sequence. Any
model created and saved using ModelBuilder is a tool because it is in a
toolbox (.tbx file) or a geodatabase. A model, therefore, is typically run
from within ArcGIS Pro. A Python script (.py file), however, can be run in
two ways:
Option 1: As a stand-alone script. The script is run from the operating
system or from within a Python editor. For a script to use ArcPy, ArcGIS
Pro must be installed and licensed, but ArcGIS Pro does not need to be
open for the script to run. For example, you can schedule a script to run at a
prescribed time directly from the operating system.
Option 2: As a tool within ArcGIS Pro. The script is turned into a tool to be
run from within ArcGIS Pro. Such a tool is like any other tool: it is in a
toolbox, can be run from a tool dialog box, and can be called from other
scripts, models, and tools.
There are several advantages to using tools instead of stand-alone scripts.
These benefits apply to both script tools and Python toolboxes.
A tool includes a tool dialog box, which makes it easier for users to
enter the parameters using built-in validation and error checking.
A tool becomes an integral part of geoprocessing. Therefore, it is
possible to access the tool from within ArcGIS Pro. It is also possible
to use the tool in ModelBuilder and in the Python window, and to call
it from another script.
A tool is fully integrated with the application it was called from. So,
any environment settings are passed from ArcGIS Pro to the tool.
The use of tools makes it possible to write tool messages.
Documentation can be provided for tools, which can be accessed like
the documentation for system tools.
Sharing a tool makes it easier to share the functionality of a script
with others.
A well-designed tool means a user requires no knowledge of Python
to use the tool.
Despite the many benefits, developing a robust tool requires effort. If the
primary purpose of the script is to automate tasks that are carried out only
by the script’s author, the extra effort to develop a script tool may not be
warranted. On the other hand, scripts that are going to be shared with others
typically benefit from being made available as a tool.
3.4 Steps to creating a script tool
A script tool is created using the following steps:
1. Create a Python script that carries out the intended tasks, and save it
as a .py file.
2. Create a custom toolbox (.tbx file) where the script tool can be stored.
3. Add a script tool to the custom toolbox.
4. Configure the tool properties and parameters.
5. Modify the script so that it can receive the tool parameters.
6. Test that your script tool works as intended. Modify the script and/or
the tool’s parameters as needed for the script tool to work correctly.
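Step 5, modifying the script to receive the tool parameters, is covered in detail in chapter 3. The positional pattern can be sketched here; the get_parameter() helper is a hypothetical stand-in backed by sys.argv so the sketch runs outside ArcGIS, whereas a real script tool would call arcpy.GetParameterAsText(index) instead:

```python
import sys

def get_parameter(index):
    # Hypothetical stand-in for arcpy.GetParameterAsText(index):
    # parameters reach the script positionally, by index.
    return sys.argv[index + 1]

def mycooltool(in_fc, out_fc):
    # <lines of code carrying out the actual geoprocessing>
    print(f"Processing {in_fc} into {out_fc}")

if __name__ == "__main__" and len(sys.argv) > 2:
    # Guarded so the sketch can be imported, or run without arguments.
    mycooltool(get_parameter(0), get_parameter(1))
```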
You can create a new custom toolbox on the Project tab of the Catalog pane
in ArcGIS Pro. Navigate to Toolboxes, right-click it, and click New
Toolbox. Select the folder where you want to save the toolbox, and give the
toolbox a name.
You also can create a new toolbox directly inside a folder or a geodatabase.
In that case, you right-click on the folder or geodatabase, and click New >
Toolbox. Although you can create a toolbox inside a geodatabase, the only
way to share the toolbox is to share the entire geodatabase. A stand-alone
toolbox is saved as a separate .tbx file and can more easily be shared.
A stand-alone toolbox can be located anywhere on your computer. For a
given project, it makes sense to locate the toolbox in the same folder in
which datasets and other files for a project are organized, but you also can
have a separate folder for your custom toolboxes, especially if they are used
in multiple projects. Stand-alone toolboxes have a file extension—e.g.,
C:\Demo\MyCoolTools.tbx. A toolbox inside a geodatabase, like other
geodatabase elements, does not have a file extension—e.g.,
C:\Demo\Study.gdb\MyCoolTools.
To create a script tool, right-click a custom toolbox, and click New > Script.
Write access to the toolbox is needed to add a new script tool. As a result,
you cannot add tools to any of the system toolboxes in ArcGIS Pro.
The New Script dialog box has three panels: General, Parameters, and
Validation.
The General panel is used to specify the tool name, label, script file (.py),
and several options. The Parameters panel is used to specify the tool
parameters, which will control how a user interacts with the tool. The
Validation panel provides further options to control the tool’s behavior and
appearance. Not all this information must be completed in one step. You
can enter some of the basic information, save it, and then return to edit the
tool properties later.
All the information needed to create a script tool is reviewed in detail in this
chapter. First, however, it is important to consider the example script used
for illustration. That way, the information has a context and is more
meaningful.
The example script has been created as a stand-alone script. The script
creates a random sample of features from an input feature class on the basis
of a user-specified count and saves the resulting sample as a new feature
class. The complete code is shown next, followed by a figure of the same
script. The syntax highlighting in the figure assists with reading the script.
# Python script: random_sample.py
# Author: Paul Zandbergen
# This script creates a random sample of input features based on
# a specified count and saves the results as a new feature class.

# Import modules.
import arcpy
import random

# Set inputs and outputs. Inputfc can be a shapefile or geodatabase
# feature class. Outcount cannot exceed the feature count of inputfc.
inputfc = "C:/Random/Data.gdb/points"
outputfc = "C:/Random/Data.gdb/random"
outcount = 5

# Create a list of all the IDs of the input features.
inlist = []
with arcpy.da.SearchCursor(inputfc, "OID@") as cursor:
    for row in cursor:
        id = row[0]
        inlist.append(id)

# Create a random sample of IDs from the list of all IDs.
randomlist = random.sample(inlist, outcount)

# Use the random sample of IDs to create a new feature class.
desc = arcpy.da.Describe(inputfc)
fldname = desc["OIDFieldName"]
sqlfield = arcpy.AddFieldDelimiters(inputfc, fldname)
sqlexp = f"{sqlfield} IN {tuple(randomlist)}"
arcpy.Select_analysis(inputfc, outputfc, sqlexp)
A few points of explanation about the script are in order. First, it is
important to understand the general logic of the script. The script creates a
list of all the unique IDs of the input features and uses the sample()
function of the random module to create a new list with the unique IDs of
the random sample. This new list is used to select features from the input
features, and the result is saved as a new feature class. Second, the input
feature class, the output feature class, and the number of features to be
selected are hard-coded in the script. Third, the script works for both
shapefiles and geodatabase feature classes. This is accomplished by using
OID@ when setting the search cursor, by using the OIDFieldName property to
read the name of the field that stores the unique IDs, and by using the
AddFieldDelimiters() function to ensure correct SQL syntax regardless of
the type of feature class.
The SQL syntax also warrants a bit of discussion. The WHERE clause uses
the IN operator to compare the unique ID of the features to the list of
randomly selected IDs, as follows:
sqlexp = f"{sqlfield} IN {tuple(randomlist)}"
In SQL, this list must be in a pair of parentheses, which is equivalent to a
tuple in Python. To ensure proper string formatting, f-strings are used, but
this formatting also can be accomplished using .format(). When testing the
script, the following can be added to check what the WHERE clause looks
like:
print(sqlexp)
For a geodatabase feature class, the WHERE clause looks something like
this:
OBJECTID IN (1302, 236, 951, 913, 837)
For a shapefile, the WHERE clause looks something like this:
"FID" IN (820, 1095, 7, 409, 145)
The actual ID values to be selected change with every run of the script
because the sample() function creates a new random selection regardless of
any previous result.
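The sampling logic and the WHERE clause construction can be tried outside ArcGIS, because only the built-in random module is involved. A quick stand-alone sketch (the ID values are simulated here rather than read from a feature class):

```python
import random

# Stand-in for the list of IDs collected by the search cursor.
inlist = list(range(1, 1001))

# Draw 5 unique IDs; each run produces a different sample.
randomlist = random.sample(inlist, 5)

# Build the WHERE clause; tuple() yields the parenthesized list SQL expects.
sqlexp = f"OBJECTID IN {tuple(randomlist)}"
print(sqlexp)  # e.g., OBJECTID IN (821, 43, 977, 310, 566)
```

Note that a one-element Python tuple formats as (821,), with a trailing comma that is not valid SQL, so a sample count of 1 would need special handling.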
The WHERE clause is used as the third parameter of the Select tool, which
creates the output feature class. The script does not work on stand-alone
tables because the Select tool works only on feature classes. To work with
stand-alone tables, the Select Layer By Attribute tool is used instead.
Although the script works correctly, making changes to the inputs requires
opening the script in a Python editor, typing the inputs, and running the
script. A tool dialog box makes it a lot easier to share the functionality of
this script. The goal of the script tool is for a user to be able to specify the
input feature class to be used for sampling, the output feature class to save
the result, and the count of features to be included in the random sample. In
other words, the goal is to have a script tool in which the tool dialog box
incorporates these features, as shown in the figure.
It is important to have an expectation or vision as to what the final tool
dialog box should look like because it facilitates preparing your script, and
it guides decisions during the creation of the script tool. Simply drawing out
on a piece of paper what you expect the final tool dialog box to look like
can be a great help. Certain details may change along the way as you
develop and test your script tool, but it helps to have a goal.
Returning to the code example, there are a few things to notice about the
script. First, the script is broken into sections, each preceded with
comments, so the logic of the script is easier to follow. Comments are not
required for a script tool to work, but if users of the script tool are likely to
view the underlying code, comments will make it easier to understand.
Second, the script uses several hard-coded values, including an input
feature class, an output feature class, and a count of features. This type of
hard coding is typical for stand-alone scripts. To facilitate using the script as
a script tool, hard coding is limited to variables that will become tool
parameters. No hard coding is used anywhere else in the script.
To prepare your script for use as a script tool, follow these guidelines:
Make sure your script works correctly as a stand-alone script first.
Working correctly will require the use of hard-coded values.
Identify which values will become parameters in your script tool.
Create variables for these values near the top of your script, and make
sure that the hard-coded values are used only once to assign values to
the variables. The rest of your script should not contain any hard-coded values and should use only variables.
Although the stand-alone script will run correctly provided the input feature
class exists, the script will require some changes to be used as part of a
script tool. These changes are facilitated by limiting the hard coding of
values to the variables that are going to be used as tool parameters.
Returning to the New Script dialog box, it’s time to complete the basic
information about the script tool in the General panel.
The following information should be included:
The name of a tool is used when you want to run a tool from Python.
The name cannot contain any spaces and follows most of the same
rules that apply to a Python function name.
The label of the tool is the display name of the tool in the toolbox.
The label name can have spaces. Consider the example of the Get
Count tool. In ArcGIS Pro, the tool appears with its label, Get Count
(with a space), but for the tool to be called from Python, its name,
GetCount (without a space), is used.
Note: Get Count is not a script tool but a system tool, and therefore
there is no script. You can view the properties of any of the built-in
geoprocessing tools, including system tools and script tools. Viewing the
properties of existing tools is a good way to learn about tool design and
get ideas for your own tools.
The script file is the complete path of the Python script file to be run
when the tool is executed. You can browse to an existing file or type
the path of a file. The script file does not need to exist at this point
and can be created later.
There are three different options to decide on. The first option is
whether you want to import the script. You should not import the
script when you are still creating the script tool, but you can import it
before sharing the tool with others. The option to set a password
becomes active only if you check the option to import the script. The
default settings—i.e., don’t import the script (and consequently don’t
use a password)—are used for most script tools.
The third option is to “store tool with relative path.” This option
typically should be checked and is checked by default. When this
option is checked, relative paths are used instead of absolute paths to
reference the location of the script file in relation to the location of
the custom toolbox file. Absolute paths start with a drive letter,
followed by a colon, and then the folder and file name—for example,
C:\Scripts\streams.py. Relative paths refer to a location that is relative
to a current folder and do not use a full path. Only the path to the
script file can be stored as a relative path; paths within the script itself
will not be converted. Typically, your Python script and toolbox are
in the same folder, and working with relative paths ensures the
Python script can still be located if the folder is moved or renamed. If
you are going to share the tool with others, which is often the goal of
developing a script tool, make sure this option is checked. Chapter 5
revisits working with paths in script tools in more depth.
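The absolute-versus-relative distinction can be checked with the standard pathlib module; PureWindowsPath applies Windows path rules on any platform (the paths shown are illustrative):

```python
from pathlib import PureWindowsPath

# An absolute path starts with a drive letter, followed by a colon.
full = PureWindowsPath(r"C:\Scripts\streams.py")
print(full.is_absolute())  # True

# A relative path is resolved against a current folder.
rel = PureWindowsPath(r"..\Scripts\streams.py")
print(rel.is_absolute())  # False
```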
For the example script tool, the dialog box is shown in the figure.
Although you can continue with the tool parameters and validation, at this
point you also can save the information and return to it later. Click OK to
save the script tool properties. The script tool now appears in the custom
toolbox.
Although it looks like a finished script tool, the tool is far from completed.
When you double-click on the script tool, the tool dialog box opens, but
there are no parameters.
When you click Run, the tool executes and runs the Python script associated
with the script tool. At this stage, the script still uses the hard-coded values.
If the Python script does not produce any errors, the script tool produces the
desired outputs, which in this case consist of a new feature class with a
sample of five features.
Because the script tool executed the same as any regular geoprocessing
tool, it resulted in geoprocessing messages and a new entry in the
geoprocessing history.
Even though the script tool ran correctly and produced the desired output,
running the tool in this manner is not very meaningful. You are running the
stand-alone script with the hard-coded values by calling it from within
ArcGIS Pro, but the script tool is not user friendly. A user would have to
edit the script to change the hard-coded values. Still missing are the tool
parameters so a user can set those values using the tool dialog box instead
of editing the hard-coded values in the script.
Time to return to the tool parameters. Right-click on the script tool, and
click Properties. This step brings up the script tool properties with the same
three panels as the New Script dialog box, but this time the window reads
Tools Properties: Random Sample. In the General panel, you can edit the
properties set in an earlier step.
Click on the Parameters tab to view the parameters for the script tool.
By default, no parameters are listed, but most tools need at least one input
parameter and one output parameter. The script tool parameters are
organized as a table. Each row in the table is a parameter, and each column
in the table is a property of the parameter. The next sections examine tool
parameters in detail.
3.5 Exploring tool parameters
All geoprocessing tools have parameters. Users enter parameter values in
the tool dialog box. In contrast, for a stand-alone script, these values are
often hard-coded in the script. To create a script tool, the parameters must
be defined so that users can set them in the tool dialog box. When a script tool runs, the parameter
values from the tool dialog box are passed to the script. The script reads
these values and uses them in the code. Creating and exposing parameters
requires the following steps:
Including code in the script to receive the parameter values
Setting up the parameters in the script tool properties
These steps do not need to be completed in order, but they both must be
completed for the script tool to work as intended.
Next, you will examine how these steps are implemented by using one of
the built-in tools, the Multiple Ring Buffer tool. This tool is a good example
because it contains several different types of parameters, and because it is a
script tool whose code can be viewed. The tool dialog box is shown in the
figure.
The Multiple Ring Buffer tool has seven parameters total, the first three of
which are required. You can view details about these parameters under the
tool properties. To bring up the tool properties, in the Geoprocessing pane,
navigate to the Multiple Ring Buffer tool under Toolboxes > Analysis Tools
> Proximity. Right-click on the Multiple Ring Buffer tool, and click
Properties.
Note: You cannot open the tool properties from the tool dialog box
itself. The properties can be opened only by right-clicking on the tool in
a list of tools—for example, by browsing to the tool using the Toolboxes
tab or by searching for the tool by name.
In the tool properties window, review the properties in the General panel.
Notice the location of the Python script: C:\Program
Files\ArcGIS\Pro\Resources\ArcToolBox\Scripts \MultiRingBuffer.py. The
options are dimmed because you cannot make changes to this script tool.
Click on the Parameters panel to view the properties of the seven
parameters. The order in which the parameters are listed controls the order
in which they are shown in the tool dialog box. Also notice the parameters
are numbered starting with the number zero. When the tool is executed, the
values for the parameters are passed to the script and can be accessed using
an index starting at zero.
The information also shows details about each parameter, including the
label, name, data type, whether the parameter is required or optional, the
direction (input or output), and several others. For example, the input
features parameter consists of a feature layer, which is a required input
parameter.
Reviewing the parameters of an existing script tool is helpful, especially
when you compare the properties with the tool dialog box. For example, the
Outside Polygons Only parameter is a Boolean parameter, which means it
shows up as a check box in the tool dialog box. The default value is false,
which means it is not checked by default when the tool dialog box opens.
Note: Because the Multiple Ring Buffer tool is a built-in tool, you can
see the list of parameters and their properties, but you cannot make any
changes. You can view the parameters for any geoprocessing tool (e.g.,
Clip, Buffer, and so on), but because standard tools are not written in
Python, you cannot view the code. The Multiple Ring Buffer tool was
chosen as an example, because it is a script tool (as are several other
built-in tools), and you can view the code.
Now that you have a better understanding of the parameters of this script
tool, it is time to examine the Python script to see how the parameters are
handled. When a user specifies the parameter values in the Multiple Ring
Buffer tool dialog box, the tool can be run. Once the tool is run, the user-specified parameter values are passed to the script. Review the script to see
how these parameter values are received by the script. To open the script,
right-click on the Multiple Ring Buffer tool, and click Edit.
The script opens in your default Python editor, which is often IDLE.
You can change the default editor to be used from within ArcGIS Pro under
Geoprocessing Options. In ArcGIS Pro, click Project > Options >
Geoprocessing. You also can click on the Analysis tab and then on the
Geoprocessing Options icon (right below Ready To Use Tools). You can
select your script editor by navigating to the application.
For example, the path to IDLE for the default environment is C:\Program
Files\ArcGIS\Pro \bin\Python\envs\arcgispro-py3\Scripts\idle.exe. The path
will be different when running a different environment.
The path for Spyder is typically something like this: C:\Users\
<YourName>\AppData\Local\ESRI\conda\envs\
<YourEnvironment>\Scripts\spyder.exe. The path for PyCharm is typically
something like this: C:\Program Files\JetBrains\PyCharm Community
Edition 2019.3.4\bin\pycharm64.exe. The path for PyCharm is the same
regardless of which environment you are using, but to work with a script,
you must configure the environment from within PyCharm. Because, in this
case, you only are viewing the script and not making any changes to it or
running it from the Python editor, IDLE will suffice.
As an alternative, you can use your Python editor and open the script
directly by navigating to its location: C:\Program
Files\ArcGIS\Pro\Resources\ArcToolbox\Scripts\MultiRingBuffer.py.
The example MultiRingBuffer.py script includes introductory comments,
the import of several modules, and a dictionary for unit conversions. The
section of interest here is a bit farther down, starting at line 42.
The tool parameters are received by the script using the
GetParameterAsText() and GetParameter() functions of ArcPy. Notice how
there are seven parameters total with index values 0 through 6. These
parameters in the script match exactly with the parameters in the tool dialog
box. When a user specifies tool parameters using the tool dialog box and
then clicks Run, the values of the parameters are passed to the script as a
list. The values are read on the basis of their index value. A combination of
the GetParameterAsText() and GetParameter() functions is used, depending
on the nature of each parameter. These two functions are examined in this
section in a bit more detail.
The syntax of the GetParameterAsText() function is
<variable> = arcpy.GetParameterAsText(<index>)
The only argument of this function is an index number on the tool dialog
box, which indicates the numeric position of the parameter. The parameters
set on the tool dialog box are sent to the script as a list, and the
GetParameterAsText() function assigns these parameter values to variables
in the script. The GetParameterAsText() function receives parameters as a
text string, even if the parameter on the tool dialog box is a different data
type. Numerical values, Boolean values, and other data types are all
converted to strings, and additional code is included to correctly interpret
these strings to their correct type. For example, the code for the Outside
Polygons Only parameter is as follows:
outsidePolygons = arcpy.GetParameterAsText(6)
if outsidePolygons.lower() == 'true':
    sideType = 'OUTSIDE_ONLY'
else:
    sideType = ' '
The Outside Polygons Only parameter on the tool dialog box is a Boolean
value of true or false. These values are converted to strings, and as a result,
the conditional statement uses the string value “true” instead of the Boolean
value True.
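Because every parameter arrives as text, a common pitfall is worth noting: any nonempty string, including 'false', is truthy in Python, so the text itself must be compared explicitly. A minimal sketch of the conversion (the helper name is ours, not from the tool's script):

```python
def to_bool(value):
    # Compare the text itself; bool("false") would wrongly give True.
    return value.lower() == "true"

print(to_bool("true"))   # True
print(to_bool("False"))  # False
```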
Another example of the formatting of the parameter values is illustrated by
the Buffer Unit parameter, called “unit” in the script. In the tool dialog box,
a user selects the unit from a drop-down list (e.g., Feet or Kilometers). In
the script, the value is received as follows:
unit = arcpy.GetParameterAsText(3).upper()
The string method upper() is used to convert the input string to uppercase,
resulting in FEET or KILOMETERS.
The Distances parameter on the tool dialog box is received by the script
using the GetParameter() function. This function is used because the
parameter consists of multiple values (doubles, in this case) instead of a
single value. The GetParameter() function reads this parameter as a list of
floats.
An alternative to using the GetParameterAsText() and GetParameter()
functions is to use sys.argv, or system arguments. The index number for
sys.argv starts at 1, so sys.argv[1] is equivalent to GetParameterAsText(0).
The use of sys.argv has its limitations, including the fact that it accepts
only a limited number of characters, so the GetParameterAsText() and
GetParameter() functions are preferred.
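The off-by-one indexing of sys.argv can be illustrated with a small stand-alone sketch, in which the argument list is simulated here rather than supplied by ArcGIS:

```python
import sys

# Simulate what would be passed at run time: the script name comes first,
# followed by the tool parameters, all as text.
sys.argv = ["random_sample.py", "C:/Random/Data.gdb/points",
            "C:/Random/Data.gdb/random", "5"]

inputfc = sys.argv[1]        # first tool parameter (index 0 in GetParameterAsText)
outcount = int(sys.argv[3])  # arguments arrive as strings, so convert
print(inputfc, outcount)
```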
The example script MultiRingBuffer.py uses a custom function
get_parameter_values() to receive the parameter values, and this function
is called later in the script. The use of a custom function is not required to
receive parameter values, and many simpler scripts do not apply this
practice.
Every tool parameter has an associated data type. One of the benefits of
data types is that the tool dialog box will not send values to the script unless
they are the correct data type. User entries for parameters are checked
against the parameter data types before they are sent to the script. This is
one advantage of using tools over stand-alone scripts, because the script
does not have to check for invalid parameters.
The data types of the parameters of the Multiple Ring Buffer tool include a
feature layer, a feature class, a list of doubles, three strings, and a Boolean.
Many other data types are possible for the parameters of a script tool, from
address locator to z-domain. Data types for parameters should be selected
carefully because they control the interaction between the tool dialog box
and the script.
After parameters are assigned a data type, the tool dialog box uses this data
type to check the parameter value. For example, if you enter a path to an
element of a different data type, the tool dialog box will generate an error.
In the example, the Input Features parameter is a feature layer, so typing the
path for a raster, such as C:\Raster\elevation, generates an error and
prevents the tool from running.
This built-in error-checking mechanism prevents users from using incorrect
parameters to run a tool. When the tool runs, the dialog box has already
validated the parameter Input Features as a feature layer, and no additional
code is needed in the script to verify this is the case.
The data type property is also used when using the drop-down menu in the
tool dialog box to select datasets and when browsing through folders for
data. Only data that matches the parameter’s data type is shown. This
prevents entering incorrect paths to data. In the previous example, if the
raster dataset C:\Raster\elevation is added as a layer to the current map, this
raster layer will not be shown in the drop-down options for the Input
Features parameter.
3.6 Setting tool parameters
Tool parameters for a script tool can be set when creating a new script tool.
They can also be edited after the script tool is created by accessing the
script tool properties. Setting parameters works the same, no matter at
which stage they are set.
The example used for illustration here is the Random Sample script tool
created earlier for which no parameters were set. Right-click on the script
tool, and click Properties to bring up the script tool properties.
Note: Recall that double-clicking on the script tool brings up the tool
dialog box itself, which does not allow you to set the parameters. This is
the same as right-clicking on the script tool and clicking Open. Right-click > Edit brings up the script associated with the script tool, but that
is for a later step.
You can complete the properties for each parameter by clicking on the cells
in the table. Some properties must be typed (such as the label), whereas
others must be selected from a list of options (such as the data type). The
first parameter of the tool is the input features. For the Label property, enter
Input Features. The label is the display name that shows up in the tool
dialog box, so the label should be meaningful. Spaces are allowed for the
label.
To move to the next cell, click on that cell, or press Tab. The Name property
is filled out with a default name based on the Label property without
spaces. This default name typically is enough.
Also, notice that a new row is added for the next parameter as soon as the
minimally required properties for the first parameter are specified. First,
however, you must complete the properties for the first parameter.
Next, click on the cell for the Data Type property. This step brings up the
Parameter Data type dialog box. The default is set to String, but you can
select the appropriate type from the drop-down list of options.
Select Feature Layer as the type, and click OK to close the Parameter Data
type dialog box.
A feature layer means that, on the tool dialog box, a user can use a drop-down list to select from the feature layers in the current map but also can
browse to a feature class on disk. One of the other options for data type is
Feature Class, which sounds similar to Feature Layer but allows only for
the use of feature classes on disk and does not allow for the use of feature
layers in the current map.
The Parameter Data type has several other options. For example, you can
select multiple data types (e.g., Feature Layer and Table View), which
means all those types become valid entries for this parameter in the tool
dialog box. There is also an option to use Multiple values, which makes it
possible to enter multiple values of the same data type as a single
parameter. These values are passed to the script as a list of values. The
option to use a table of values makes it possible to enter multiple values but
in a table format. These options allow for more complicated tool
parameters. Familiarity with other geoprocessing tools helps explain the
possibilities these options provide. Tools such as Intersect and Union use
this table of values format, also referred to as a value table. For example,
the first parameter of the Union tool is Input Features, but it allows you to
select multiple feature layers and their associated ranks. Combined, the
feature layers and ranks represent only a single tool parameter.
Parameters with multiple values are passed to the script as a string, with the
individual list elements separated by semicolons. The Python split()
method can create a list of the elements from the string. The syntax is as
follows:
import arcpy
input = arcpy.GetParameterAsText(0)
input_list = input.split(";")
As an alternative, parameters with multiple values also can be handled
using arcpy .GetParameter(). In that case, the result is a list of values, and
the individual values can be obtained using an index or by iterating over the
list.
For parameters that consist of a table of values, you can use GetParameter()
to obtain a ValueTable object instead of a string or list. In a ValueTable
object, the values are stored in a virtual table of rows and columns.
ValueTable is an ArcPy class that is specifically designed for this type of
parameter.
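As a plain-Python approximation of how such strings can be unpacked (the layer names and the exact string formats here are illustrative, not taken from a specific tool):

```python
# Multivalue parameter as received via GetParameterAsText():
# individual values separated by semicolons.
multivalue = "streets;rivers;parcels"
layers = multivalue.split(";")
print(layers)  # ['streets', 'rivers', 'parcels']

# A value-table-style string: each semicolon-separated entry is a row,
# and the columns within a row are split further.
table_string = "streets 1;rivers 2"
rows = [entry.split(" ") for entry in table_string.split(";")]
print(rows)  # [['streets', '1'], ['rivers', '2']]
```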
Because of these various options, it is important when writing the script to
be aware of the data type of the parameters being passed to the script from
the tool dialog box.
Returning to the Random Sample script tool example, the parameter
properties so far are shown in the figure.
For the remaining parameter properties, the default values are enough.
These default values include Required for Type and Input for Direction.
There are three choices for Type: Required, Optional, and Derived.
Required means that a parameter value must be specified for a tool to run.
Optional means that a value is not required for a script to run. Typically,
when setting the Type property to Optional, a default value for the
parameter is specified in the tool properties or in the script. Derived
parameters are used for output parameters only and do not appear on the
tool dialog box. Derived parameters are used in several cases, including the
following:
When a tool outputs a single value instead of a dataset. Such a single
value is often referred to as a scalar.
When a tool creates outputs using information from other parameters
When a tool modifies an input without creating a new output
All tools should have outputs so that the tool can be used in a model and be
called from a script. Sometimes the only way to ensure that a tool has an
output is by using a derived parameter. Examples of tools with derived
parameters include the Get Count and Add Field tools. The input parameter
of the Get Count tool is a feature class, table, or raster, and the output is a
count of the number of rows. This count is a scalar variable and is returned
as a Result object. The count is the tool's output parameter and is a
derived parameter that does not appear on the tool dialog box. The tool
properties confirm that the Get Count tool has two parameters, but because
the Row Count variable is a derived output parameter, it does not appear in
the tool dialog box.
Note: Running the Get Count tool as a single tool is not common.
Although the count is printed to the Results window, this tool typically is
used within a model or script in which the output is used as the input to
another step. The Get Count tool is also commonly used in conditional
statements. For example, a procedure can be stopped if the count of
rows is zero.
The Direction property defines whether the parameter is an input of the tool
or an output. There are only two choices for Direction: Input and Output.
For derived parameters, the parameter direction is automatically set to
Output. Every tool should have at least one output parameter. Having an
output parameter makes it possible to use the tool in ModelBuilder.
Although technically a script can run without output parameters, for
ModelBuilder to work, every tool needs an output so it can be used as the
input to another tool in the model.
Returning to the Random Sample script tool, the tool dialog box so far
consists of a single parameter.
The second parameter consists of the output features to be created by the
tool. The parameter properties are as follows:
Label: Output Features
Name: Output_Features
Data Type: Feature Layer
Type: Required
Direction: Output
The third parameter consists of the number of features to be selected at
random. The parameter properties are as follows:
Label: Number of Features
Name: Number_of_Features
Data Type: Long (for long integer)
Type: Required
Direction: Input
The use of the Long data type ensures that only integer values are entered.
The tool will not run if text or decimal numbers are entered. There is one
additional parameter property to consider, which is the Filter property.
Logically, the number of features should be a positive number, not a
negative number or zero. The Filter property can be used to set the range of
allowable values. The options for this property are None, Range, and Value.
Clicking on the cell and selecting the Range option brings up the Range
Filter dialog box. Enter the value of 1 for the minimum and a very large
value for the maximum—e.g., 1000000000. Leaving the maximum blank is
not an option, which means you must pick a somewhat arbitrary maximum
number in this case.
Click OK to close the Range Filter dialog box. The parameter properties are
now as shown in the figure.
These settings complete the tool parameters for the Random Sample script
tool. If the order of the parameters must be modified, right-click on a row,
and click Move Up or Move Down. If you must remove a parameter, right-click on a row, and click Delete. For the Random Sample script tool, the
order and number of parameters is correct, and no further changes are
necessary. Click OK to save the tool properties.
The tool dialog box is now as shown in the figure.
Of the parameter properties reviewed so far, the Filter property requires a
bit more discussion. The Filter property allows you to limit the values to be
entered for a parameter. There are several filter types, and the type depends
on the data type of the parameter. For example, for the Long and Double
data types, the types are Range and Value List. The Range filter allows you
to specify minimum and maximum values, and the Value List filter allows
you to specify a list of specific values. These filters are like the range and
coded value domains for a geodatabase. When the data type is a feature
layer or feature class, there is only one filter, called Feature Type. This
allows you to filter the valid entries for the parameter on the basis of
geometry type, including point, polyline, and polygon features.
The available filter types consist of the following:
Areal units—acres, hectares, and so on
Feature type—point, polyline, polygon, and so on
Field—short, long, float, double, text, and so on
File—custom file extensions (e.g., .csv, .txt, and so on)
Linear units—inches, feet, meters, and so on
Range—values between a specified minimum and maximum value
Time units—seconds, minutes, hours, and so on
Value list—a set of specific custom values
Workspace—file system, local database, or remote database
For most data types, there is only one filter type. For example, if the data
type of a parameter is set to Feature Class, the only possible filter type is
Feature Type. Many data types have no filter type at all. The different filter
types exert specific control over which values are valid inputs. Carefully
setting the filter type improves the robustness of the tool.
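The effect of a Range filter can be illustrated with a short sketch. The following is a hypothetical pure-Python equivalent of the range set for the Number of Features parameter, not ArcGIS code; the helper name in_range is invented for illustration:

```python
# Hypothetical stand-in for the Range filter: 1 to 1000000000,
# matching the minimum and maximum entered in the Range Filter dialog box.
MIN_COUNT = 1
MAX_COUNT = 1000000000

def in_range(n):
    """Return True if n would be accepted by the Range filter."""
    return MIN_COUNT <= n <= MAX_COUNT

print(in_range(25))   # True
print(in_range(0))    # False: zero and negative counts are rejected
```

Values outside the range are rejected by the tool dialog box before the script ever runs, which is why the filter improves robustness.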
There are several parameter properties that have not been covered yet,
including Category, Dependency, Default, Environment, and Symbology.
These properties were not used in the Random Sample example script tool,
but they can be important for certain tools. Each property is reviewed
briefly in this section.
The Category property allows you to organize the tool parameters in the
tool dialog box. You create your own categories by typing a name. After
you use a category once, the name appears as a drop-down option for the
other parameters. Parameters within the same category are organized in an
expandable group in the script tool dialog box. This grouping is sometimes
used for tools with many parameters. Consider the example of the
Empirical Bayesian Kriging tool, which has no less than 15 tool parameters.
Several of the optional parameters are grouped into categories to make the
tool dialog box easier to read.
The parameters in a category can be collapsed or expanded in the tool
dialog box as needed.
The Dependency property can be used for input and derived output
parameters. In many cases, a tool parameter is closely related to another
one. For example, consider the Delete Field tool.
The first parameter is an input table, and the second parameter, Drop Field,
is a list of fields. The list of fields is populated only when the input table is
selected.
This dependency of one parameter on another parameter in the same tool is
controlled using the Dependency property. In the example of the Delete
Field tool, the Dependency property of the Drop Field parameter is set to
the input table.
A second reason to use the Dependency property is to work with derived
output parameters. For example, when an input parameter is modified by a
tool, the Dependency property of the derived output parameter is set to the
input parameter. In the case of the Delete Field tool, the Dependency
property of the output parameter is set to the input table.
Note: Remember that the derived output parameter is not visible in the
tool dialog box.
The Default property allows you to specify the value of the parameter when
the script tool dialog box is open. If no default value is specified, the
parameter value will be blank on the tool dialog box. Default values are
commonly used for Boolean parameters.
The Environments property provides another option to set default values.
This property provides a drop-down list with environment settings. You
select a specific setting, and when this property is set, the default value for
the parameter is obtained from the environments of the geoprocessing
framework.
The Symbology property allows you to specify the path to a layer file. By
default, the output of a geoprocessing tool is added to the current map. This
behavior can be set as part of Geoprocessing Options by checking the box
for “Add output datasets to an open map.” The symbology of a layer added
in this way follows the regular rules for adding data to a map in ArcGIS Pro
—in other words, there is no customized symbology. The Symbology
property can be set to a custom layer file (.lyrx). This option is available
only for outputs for which layer files make sense, such as feature layers,
rasters, TINs, and so on. Setting the Symbology property does not control
whether the output is added to an open map because this option is
controlled by Geoprocessing Options.
3.7 Editing tool code to receive parameters
Although the tool dialog box for the Random Sample script tool is created,
the tool is not ready to be used. Still missing are changes to the Python
script to receive the tool parameters when the tool is executed.
When testing a script tool, you will alternate between running the tool and
editing the script until the tool works as desired. You can leave the Python
editor open while you carry out this testing. You can open a script from
within the Python editor, but there is a shortcut in ArcGIS Pro. Right-click
the script tool in the toolbox, and click Edit, which opens the Python script
associated with the script tool in a Python editor. You can configure which
editor is used under Geoprocessing Options, as discussed earlier in this
chapter.
Editing a script in a Python editor does not prevent the script from being
used by a script tool. Therefore, you can leave your script open in a Python
editor during the testing of the script tool. Logically, you must save your
changes to the script for them to take effect. When executing a script tool,
the tool calls the associated script. This call is independent from using a
Python editor. In other words, whether your Python editor is open or not
and whether you have the script open or not has no effect on the execution
of the script tool. Also, when a script tool calls a script, no messages are
printed to the interactive interpreter of your Python editor.
Time to consider the changes necessary to the script. The following code
shows the portion of the original script that must be modified:
inputfc = "C:/Random/Data.gdb/points"
outputfc = "C:/Random/Data.gdb/random"
outcount = 5
The three variables that are hard-coded into the script must be converted to parameters using the GetParameterAsText() or GetParameter() function.
The first two parameters are both strings, and the GetParameterAsText()
function will suffice for these values. The third parameter is a number,
which means the GetParameter() function can be used. The modified code
is as follows:
inputfc = arcpy.GetParameterAsText(0)
outputfc = arcpy.GetParameterAsText(1)
outcount = arcpy.GetParameter(2)
The GetParameterAsText() function also can be used for parameters that are
not strings, but the value must be cast to the appropriate type. For example,
the third parameter could also be received as follows:
outcount = int(arcpy.GetParameterAsText(2))
It is important to recognize that GetParameterAsText() returns the values as
a string, whereas GetParameter() returns the values as an object. Thus, a
decision between using GetParameter() and GetParameterAsText() must be
based on a good understanding of the values being passed to the script.
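Because GetParameterAsText() always returns a string, using its value in a numeric comparison without casting can quietly produce the wrong answer. A minimal sketch, with a literal string standing in for the value arcpy would return:

```python
# "5" stands in for what arcpy.GetParameterAsText(2) would return
# for a Long parameter when the user enters 5.
raw = "5"

print(raw > "10")     # True: this compares strings character by character
print(int(raw) > 10)  # False: the intended numeric comparison after casting
```

This is why GetParameter(), which returns the value as an object, or an explicit int() cast is the safer choice for numeric parameters.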
Once you make these changes to the script and save the script file, the tool
is ready to run, and it can be used like any regular geoprocessing tool.
The tool creates a random selection of the input features on the basis of the
specified number and saves the result to a new feature class. The new
feature class is added as a feature layer to the active map.
The execution of the tool results in messages and a new entry to this
geoprocessing history.
Typically, a tool does not work perfectly the first time it is tested, and you
may find yourself tweaking both the Python script and the tool parameters
in an iterative manner until the tool performs as desired.
You also may test your tool for robustness by trying to enter incorrect or
invalid parameter values. For example, what happens if you enter a negative
number for the number of features? Or what happens if the number you
enter is larger than the number of features in the input feature class?
Note: The answer to the first question is that an error appears in the tool dialog box because a filter was used, and the tool will run only if the number of features is at least 1. The answer to the second question is that the tool executes but produces an error generated by the line randomlist = random.sample(inlist, outcount). The error is ValueError: Sample larger than population or is negative. At this stage, therefore, the Random Sample tool works but is not very robust.
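The failure can be reproduced outside ArcGIS Pro, because random.sample() is part of the Python standard library. In this sketch, the list and count are stand-ins for the tool's variables:

```python
import random

# Stand-ins for the tool's variables: five features, ten requested.
inlist = list(range(5))
outcount = 10

try:
    randomlist = random.sample(inlist, outcount)
except ValueError as e:
    print(e)  # e.g., "Sample larger than population or is negative"
```

Catching the ValueError like this is one option, but the next section takes a different approach: checking the count up front so the error never occurs.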
3.8 Customizing tool behavior
Once the parameters of a script tool are specified, you can add custom
behavior. Examples of custom behavior include the following:
Certain parameters may need to be enabled or disabled on the basis of
the values contained in other parameters.
Some parameters may benefit from having a default value specified
on the basis of the values in other parameters.
Warning and error messages may need to be customized.
Tool behavior can be set on the Validation tab on the Tool Properties dialog
box. In the Validation panel, you can use Python code that uses a Python
class called ToolValidator. The ToolValidator class controls how the tool
dialog box is changed on the basis of user input. It also is used to describe
the output data the tool produces, which is important for using tools in
ModelBuilder.
The ToolValidator class makes it possible to create more robust tools. A
detailed description of customizing tool behavior is not provided here.
Details on the ToolValidator class can be found in ArcGIS Pro help, under
the topics “Customizing Script Tool Behavior” and “Programming a
ToolValidator Class.” The ToolValidator class is used only on the
Validation panel of the script tool properties. The code for validation is
written in Python and can be edited using a Python editor, but the code is
embedded in the toolbox instead of as a separate script file.
3.9 Working with messages
One of the advantages of running a script as a script tool is being able to
write messages that appear in the tool dialog box and in the geoprocessing
history. Tools and scripts that call a tool also have access to these messages.
When scripts are run as stand-alone scripts, messages are printed only to the
interactive interpreter—there is no tool dialog box and no geoprocessing
history in which messages can be retrieved later. Also, there is no sharing of
messages between stand-alone scripts.
However, because script tools work the same as any other geoprocessing
tool in ArcGIS Pro, they automatically create messages. For example, when
the Random Sample tool is run, it prints simple messages that indicate when
the script started running and when it was completed. Several ArcPy
functions are available for writing additional messages. These functions
include the following:
AddMessage()—for general information messages (severity = 0)
AddWarning()—for warning messages (severity = 1)
AddError()—for error messages (severity = 2)
AddIDMessage()—for both warning and error messages
AddReturnMessage()—for all messages, independent of severity
The AddReturnMessage() function can be used to retrieve all messages
returned from a previously run tool, regardless of severity. The original
severity of the geoprocessing messages is preserved—for example, an error
message is printed as an error message. Some of the other message
functions create a custom message. The use of these message functions is
illustrated in this section using the Random Sample script tool.
When a user of the Random Sample tool enters a value that exceeds the
number of input features, the script fails, and an error is reported, as shown
in the figure.
The error results from the random.sample() function when the value of outcount is greater than the number of features in inlist. The error message is
somewhat informative (“Sample larger than population or is negative”) but
not user friendly. Especially for a user who is not familiar with Python
code, the message is not very helpful. The message is potentially
misleading because it appears to refer to a potential issue with the
random.py script in the default environment (arcgispro-py3). For a typical
ArcGIS Pro user without scripting experience, such messages are confusing
and frustrating.
To make the tool more robust, a check can be added to the script to compare
the number of features to be selected with the number of input features. The
number of input features is determined using the Get Count tool. This
additional code should follow the block of code in which the tool
parameters are received, as follows:
fcount = arcpy.GetCount_management(inputfc)[0]
if outcount > int(fcount):
When the number of features to be selected exceeds the number of input
features, the AddError() function is used to return a custom message:
arcpy.AddError("The number of features to be selected "
               "exceeds the number of input features.")
When this error happens, the script should end. The script can be ended
using the following code:
sys.exit(1)
Exit code 1 means there was a problem, and that is why the script ended.
Exit code 0 is used when the script ends without any problems. This line of
code also requires adding import sys at the top of the script.
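Exit codes can be demonstrated with plain Python, without ArcGIS Pro. Here, a one-line child process stands in for the script's failure path:

```python
import subprocess
import sys

# A one-line child script that mimics the failure path: sys.exit(1).
failing = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
print(failing.returncode)  # 1: the script ended because of a problem

# The normal path: a script that runs to completion exits with code 0.
passing = subprocess.run([sys.executable, "-c", "pass"])
print(passing.returncode)  # 0: no problems
```

The calling environment, whether ArcGIS Pro or a shell, can inspect this return code to tell success from failure.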
When the number of features to be selected does not exceed the number of
input features, the script should continue as usual, as follows:
else:
    <rest of script>
The updated script is now as shown in the figure.
When these changes are made to the script, the tool fails when the number
of features to be selected exceeds the number of input features.
The error message provides specific feedback to the user about why the tool
failed.
Even though the tool fails, the feedback to the user is more specific and
does not bring up potentially misleading messages related to the Python
code.
Different scenarios may not result in an error but warrant a warning or
another type of message. For example, in the case of the Random Sample
tool, what if the number of features to be selected is the same as the number
of input features? This number would not be a reason for the tool to fail, but
the tool has effectively copied the input features without making any
changes. The AddWarning() function can be used to report a warning
message:
if outcount == int(fcount):
    arcpy.AddWarning("The number of features to be selected "
                     "is the same as the number of input features. "
                     "This means the tool created a copy of the "
                     "input features without creating a sample.")
This code is added to the end of the script, but inside the else block.
A warning message does not prevent the tool from finishing, but the tool
dialog box reports that the tool completed with warnings, as shown in the
figure.
The View Details link brings up the custom message, as shown in the
figure.
Another level of control can be accomplished using the AddIDMessage()
function. This function makes it possible to use system messages within a
script tool. The syntax of the function is as follows:
AddIDMessage(message_type, message_ID, {add_argument1}, {add_argument2})
The message type can be set to ERROR, INFORMATIVE, or WARNING. The
message ID number indicates the specific Esri system message. Depending
on the message, additional arguments may be necessary. In the following
example code, an error message with the message ID number 12 is
produced if the output feature class already exists:
import arcpy
infc = arcpy.GetParameterAsText(0)
outfc = arcpy.GetParameterAsText(1)
if arcpy.Exists(outfc):
    arcpy.AddIDMessage("ERROR", 12, outfc)
else:
    arcpy.CopyFeatures_management(infc, outfc)
The syntax of error message 12 is
000012: <value> already exists
This message has one argument, which in this case is the name of a feature
class.
There are more than one thousand message codes, and there is no single list
of all of them in the ArcGIS Pro help pages. You can enter a message code
in the help pages to view the description, but you cannot browse through
and search the message descriptions. In Python, you can use the
arcpy.GetIDMessage() function to get the description associated with a
specific message ID, as follows:
import arcpy
m_id = 12
print(arcpy.GetIDMessage(m_id))
The result is
%s already exists
The %s code is a string-formatting placeholder for the name of the dataset.
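The placeholder follows Python's old-style (%) string formatting, so the template can be filled in directly:

```python
# The message template for error 000012, filled in with a dataset name.
template = "%s already exists"
print(template % "C:/Random/Data.gdb/random")
# C:/Random/Data.gdb/random already exists
```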
To make this information more useful, you can build a dictionary of all the
message IDs and their associated string values, as follows:
import arcpy
dict_id = {}
for k in range(1000000):
    v = arcpy.GetIDMessage(k)
    if v:
        dict_id[k] = v
The error codes consist of six digits, from 1 through 999999, which is why
the argument of the range function is set to 1000000. Not all possible
numbers are valid error codes, which is why a message is added to the
dictionary only if it has a value. Once the dictionary is created, you can
search through it for system messages of interest. For example, the
following code prints all system messages that have JSON as part of the
string:
for k, v in dict_id.items():
    if "JSON" in v:
        print(k, v)
The result prints as follows:
1303 Invalid JSON in WebMap.
1451 Unable to parse service configuration JSON.
2092 Failed to export diagram layer definition to JSON.
...
Working with system messages in your script tool provides additional
integration of your tool with the geoprocessing framework of ArcGIS Pro,
although it can be cumbersome to sift through the specific message codes.
3.10 Handling messages for stand-alone scripts and
tools
Python scripts can be run as stand-alone scripts or as tools. Messaging
works a bit differently for each one. However, a script can be designed to
handle both scenarios. For a stand-alone script, there is no way to view
messages, and they must be printed to the interactive interpreter. For a tool,
functions such as AddError() are used instead of printing messages to
ensure messages appear in the geoprocessing environment, including the
geoprocessing history. Standard practice is to write a message-handling
routine that writes messages to both the interactive interpreter and the
geoprocessing environment, using the print() function for the former and
ArcPy functions such as AddError(), AddWarning(), and AddMessage() for
the latter.
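Such a routine can be sketched as follows. The function name, report, is hypothetical; the severities follow the ArcPy convention described earlier (0 = message, 1 = warning, 2 = error), and the try/except makes the script usable on a machine without ArcGIS Pro:

```python
try:
    import arcpy  # available when running inside ArcGIS Pro
except ImportError:
    arcpy = None  # stand-alone run on a machine without ArcGIS Pro

def report(msg, severity=0):
    """Write msg to the interactive interpreter and, when ArcPy is
    available, to the geoprocessing messages as well."""
    print(msg)
    if arcpy is not None:
        if severity == 2:
            arcpy.AddError(msg)
        elif severity == 1:
            arcpy.AddWarning(msg)
        else:
            arcpy.AddMessage(msg)

report("Random sample created.")
report("Sample size equals input size.", severity=1)
```

Calling report() throughout the script then produces output in both contexts without duplicating every message line.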
3.11 Customizing tool progress information
When a tool runs, information on its progress can take several forms. The
appearance of the progress dialog box can be controlled using the ArcPy
progressor, or progress indicator, functions. The ArcPy progressor functions
include the following:
SetProgressor()—sets the type of progressor
SetProgressorLabel()—changes the label of the progressor
SetProgressorPosition()—moves the step progressor by an increment
ResetProgressor()—resets the progressor
There are two types of progressors: default and step. In the default type, the
progressor moves back and forth continuously but doesn’t provide a clear
indication of how much progress is being made. The label above the
progressor provides information on the current geoprocessing operation.
In the step progressor, the percentage completed is shown. This information
can be useful when processing large datasets.
The type of progressor is set using the SetProgressor() function. This
function establishes a progressor object, which allows progress information
to be passed to the tool dialog box. The appearance of the tool dialog box
can be controlled using either the default progressor or the step progressor.
The syntax of this function is as follows:
SetProgressor(type, {message}, {min_range}, {max_range}, {step_value})
The progressor type is either default or step. The message is the progressor
label that appears at the beginning of the tool execution. The three
remaining parameters are for step progressors only and indicate the start
value, end value, and step interval. In a typical step progressor, the start
value is set to 0, the end value to however many steps are completed in the
geoprocessing operations, and the step interval to 1.
The SetProgressorLabel() function is used to update the label of the
progressor, which is typically a unique string specific to each step. The
SetProgressorPosition() function is used to move the step progressor by
an increment on the basis of the percentage of steps completed. These
functions are commonly used in combination so that the label is updated at
every increment. Once tool execution is completed, the progressor can be
reset to its original position using the ResetProgressor() function. This
function is used only if there is a second series of steps to complete for
which the progress should be shown separately. There is no need to reset
the position of the progressor when a tool is completed.
The following example script illustrates the use of a step progressor. The
script is associated with a script tool that copies all the shapefiles from one
workspace to a geodatabase. A step progressor is used, and the number of
steps is derived from the number of feature classes in the list. In the for
loop, the label is changed to the name of the shapefile being copied, and
after the shapefile is copied, the step progressor is moved by an increment.
The script is as follows:
import arcpy
import os
arcpy.env.workspace = arcpy.GetParameterAsText(0)
outworkspace = arcpy.GetParameterAsText(1)
fclist = arcpy.ListFeatureClasses()
fcount = len(fclist)
arcpy.SetProgressor("step", "Copying shapefiles to geodatabase...",
                    0, fcount, 1)
for fc in fclist:
    arcpy.SetProgressorLabel("Copying " + fc + "...")
    fcdesc = arcpy.Describe(fc)
    outfc = os.path.join(outworkspace, fcdesc.baseName)
    arcpy.CopyFeatures_management(fc, outfc)
    arcpy.SetProgressorPosition()
Running the script brings up a step progressor that shows the percentage
completed. This percentage is calculated from the step progressor
parameters—that is, the steps are automatically converted to a percentage as
they are completed.
An important consideration is the number of steps being used in a step
progressor. In many scripts, it is not known in advance how many features,
feature classes, fields, or records must be processed. A script that uses a
search cursor, for example, may iterate over millions of records. If each
iteration is one step, the progress dialog box would need to be updated
millions of times, which could severely reduce performance. It may
therefore be necessary to include a section in the script that determines the
number of iterations (features, feature classes, rows, or whatever the case
may be), and then determines an appropriate number of steps on the basis of
the number of iterations. Code examples for determining the number of
steps are provided in the ArcGIS Pro help topic “Controlling a Script Tool’s
Progressor.”
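One common throttling approach, shown here as a sketch rather than the code from the help topic, is to process a fixed number of records between progressor increments. The helper name, progress_interval, is hypothetical:

```python
def progress_interval(total, max_updates=100):
    """Return how many records to process between progressor updates so
    the progress dialog box is refreshed at most max_updates times."""
    return max(1, total // max_updates)

print(progress_interval(5_000_000))  # 50000: update every 50,000 records
print(progress_interval(40))         # 1: small jobs update on every record
```

Inside the processing loop, SetProgressorPosition() would then be called only when the record index is a multiple of this interval, keeping dialog updates cheap even for millions of records.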
Points to remember
Although Python scripts can be run as stand-alone scripts outside
ArcGIS Pro, there are many benefits to creating custom tools within
ArcGIS Pro. Tools allow a closer integration of scripts in the ArcGIS
Pro geoprocessing framework. Tools also make it easier to share the
workflows with others who may not have experience using Python.
There are two ways to develop tools for use in ArcGIS Pro using
Python: script tools and Python toolboxes. Script tools are created
using elements of the ArcGIS user interface, whereas Python
toolboxes are created entirely in Python.
A script tool can be created in a toolbox (.tbx) and reference a single
Python script file (.py) that is called when the tool is run.
For tools to be usable and effective, script tool parameters must be
created. Creating tool parameters includes setting parameters in the
script tool properties, as well as including code in the script to receive
the parameter values. Script tool parameters define what the tool
dialog box looks like.
Effective tools have carefully designed parameters. Each parameter
has several properties, including a data type, such as feature class,
table, value, field, or other. The parameter properties provide detailed
control of the allowable inputs for each parameter. This control
ensures that the parameters passed from the script tool dialog box to
the script are as expected.
All script tools should have outputs so that the tool can be used in
ModelBuilder and other workflows. Sometimes the only way to
achieve outputs is to use derived parameters, which do not appear on
the tool dialog box.
Tool behavior can be further customized using a ToolValidator class.
Various message functions can be used to write messages that appear
in the tool dialog box and in the geoprocessing history. The
appearance of the progressor also can be modified. This progress
indicator is particularly relevant if the tool is likely to carry out many
iterations.
Key terms
absolute path
comments
custom behavior
dependency
derived parameter
hard-coded value
progressor
Python toolbox
relative path
scalar
script tool
stand-alone script
tool dialog box
Review questions
What are some of the benefits of using custom tools compared with
using stand-alone Python scripts?
Describe the steps to create a script tool.
What are some of the critical tool parameters to be aware of during
the creation of a script tool?
What changes must be made to the code of a stand-alone script for it
to be used as part of a script tool?
Chapter 4
Python toolboxes
4.1 Introduction
This chapter describes how to create a Python toolbox. Python toolboxes
provide an alternative to creating a Python script tool. A Python toolbox can
contain one or more tools. From a user perspective, tools inside a Python
toolbox work just like regular geoprocessing tools. Many of the benefits of
script tools also apply to tools inside a Python toolbox. The biggest
difference from a developer perspective is that a Python toolbox is written
entirely in Python. This characteristic makes Python toolboxes a preferred
approach for those with more experience in developing tools and writing
Python scripts. This chapter describes the steps to create tools inside a
Python toolbox, including how to define the parameters.
4.2 Creating and editing a Python toolbox
A Python toolbox is a Python file with a .pyt extension. The use of the .pyt
file extension means that ArcGIS Pro automatically recognizes the file as a
Python toolbox. One Python toolbox defines one or more tools. This means
that if your Python toolbox contains multiple tools, they are all part of the
same Python code in a single .pyt file.
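The single-file structure can be sketched as follows. The tool names (RandomSample, CopyShapefiles) and the alias are hypothetical, and real tool classes would also define methods such as getParameterInfo() and execute(), covered later in this chapter:

```python
# A minimal sketch of a .pyt file holding two tools in one toolbox.
class RandomSample(object):
    def __init__(self):
        self.label = "Random Sample"

class CopyShapefiles(object):
    def __init__(self):
        self.label = "Copy Shapefiles"

class Toolbox(object):
    def __init__(self):
        self.label = "Sample Tools"
        self.alias = "sampletools"
        # Every tool in the file is registered in this one list:
        self.tools = [RandomSample, CopyShapefiles]

print([tool().label for tool in Toolbox().tools])
# ['Random Sample', 'Copy Shapefiles']
```

Adding a tool to the toolbox is therefore a matter of defining another class in the same file and appending it to the tools list.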
To create a new Python toolbox in ArcGIS Pro, right-click on the folder in
Catalog where you want to create it, and click New > Python Toolbox.
This step creates a new Python toolbox with a default name.
The symbol for a Python toolbox is like that for a regular toolbox in
ArcGIS Pro but has a small script icon showing in front. A new tool has
also been added with a default name. You also can create a new Python
toolbox by right-clicking on the Toolboxes folder and clicking New Python
Toolbox. You can rename the Python toolbox in the Catalog pane.
A default tool named Tool has been added, but this default tool is just a
temporary placeholder. Double-clicking on the tool brings up the tool dialog
box, but the tool has no parameters at this point.
When a new Python toolbox is created, a basic template is created from
scratch by ArcGIS Pro. This is the reason why you typically should create a
new Python toolbox from within ArcGIS Pro instead of starting with a new
empty script file in your Python editor. Alternatively, you can copy the code
from the template to your own script file.
To view the template, and to start modifying the code, right-click on the
Python toolbox, and click Edit. This step brings up the contents of the .pyt
file in the default script editor configured in ArcGIS Pro. Note that you are
editing the Python toolbox itself and all the tools that are part of the toolbox
—not the code for an individual tool. Recall that when creating Python
script tools, each script tool has its own associated script, and therefore you
edit the code for each tool separately, and not for the custom toolbox. In a
Python toolbox, the code for all the tools resides in a single .pyt file.
If no script editor is specified under Geoprocessing Options, the default
application to work with .py files in your operating system is used. This
default depends on how you configure your applications and file
associations. Therefore, it could be Notepad, IDLE, PyCharm, or something
else. Your operating system typically does not have a default application
configured for .pyt files, but when you edit a Python toolbox from within
ArcGIS Pro, it uses the default application associated with .py files. To
specify a script editor to be used from within ArcGIS Pro, enter the path for
the script editor under Geoprocessing Options. The typical path for IDLE
for the default environment is C:\Program
Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Scripts\idle.exe.
The template for a new Python toolbox is shown in the figure.
The details for working with this template are covered later in this chapter.
At this point, it is important to recognize that the code resides in a .pyt file,
not a regular script file with a .py extension. A .pyt file is a text file, just
like a .py file, and therefore it can be opened in a regular text editor. In a
Python editor, however, files with a .pyt extension are not always
recognized as a Python file type. Although IDLE recognizes a .pyt file as
Python code, many other IDEs do not, including Spyder and PyCharm.
When an IDE does not recognize the contents of a .pyt file as regular
Python code, there is no syntax highlighting or other functionality to work
with code. Notice the lack of syntax highlighting of a .pyt file open in
Spyder as shown in the figure.
You could temporarily save your .pyt file as a .py file to overcome this
limitation, but to test the Python toolbox in ArcGIS you would have to
change it back to .pyt, which is cumbersome. Some Python editors,
however, allow you to associate .pyt files with a Python file. In PyCharm,
with the .pyt file open, click File > Associate with File Types. In the
Register New File Types Association dialog box, make sure the file pattern
is set to .pyt, and choose Python for the option to Open matching files in
PyCharm.
Once you click OK, the .pyt file is recognized as Python code, and you can
use some of the functionality of PyCharm, including syntax highlighting
and error checking. In PyCharm, you also can review and modify the file
type associated by clicking File > Settings > Editor > File Types.
Note: The typical file extension for Python scripts is .py. This extension
is associated with the python.exe program, which opens a terminal
window when run. In Windows, the extension can be .pyw. This
extension is associated with the pythonw.exe program, which suppresses
the terminal window. When scripts are run from an application with a
GUI, you don’t want a separate terminal window to open.
Regardless of the IDE, keep in mind that you cannot run a .pyt file from a
Python editor. You can test the code only by using the tools inside the
Python toolbox from within ArcGIS Pro.
Time for a closer look at the template code. The Python toolbox code
includes a Python class named Toolbox that defines the characteristics of
the toolbox. This part of the code is as follows:
class Toolbox(object):
    def __init__(self):
        """Define the toolbox (the name of the toolbox is the
        name of the .pyt file)."""
        self.label = "Toolbox"
        self.alias = ""

        # List of tool classes associated with this toolbox
        self.tools = [Tool]
Note: This class always should be called Toolbox. If you change this
name, the Python toolbox will not be recognized correctly. The name of
the Python toolbox is the name of the .pyt file. There is a comment to
this effect in the code to remind you of this rule, and you cannot change
the name of the Python toolbox in the code itself.
The Toolbox class contains one method called __init__(), which defines
the properties of the Python toolbox. These properties include an alias and a
label. An alias can consist of only letters and numbers, with no spaces or
other characters.
For the purpose of this chapter, the same Random Sample tool example
from chapter 3 is used for illustration. Typically, you would not develop a
script tool and a Python toolbox for the same tool, but using the same
example script facilitates a comparison. The first part of the Toolbox class is
as follows:
class Toolbox(object):
    def __init__(self):
        self.label = "Random Sampling Tools"
        self.alias = "randomsampling"
Note: Comments and empty lines in the template are removed for the
most part in the code examples here for legibility purposes.
As you modify the .pyt file, you can check in ArcGIS Pro that the changes
are taking effect. Make sure to save your .pyt file in your Python editor, and
refresh your Catalog view. For example, after the previous changes are
made, right-click on the Python toolbox, and click Properties.
You cannot make changes to these properties directly in ArcGIS Pro
because changes are made only by editing the .pyt file. There is no need to
close the .pyt file in your Python editor, but be sure to save your edits
before you check the results in ArcGIS Pro.
Note: To see the effects of your code changes, you must refresh both the
folder where the Python toolbox resides and the Python toolbox itself.
You refresh in Catalog view by right-clicking on the folder or Python
toolbox and clicking Refresh.
If your code contains an error—for example, using an invalid character for
the alias—you will see the results in the toolbox properties, as shown in the
figure. Many other errors are not as easily identified.
The next property of the Toolbox class is critical. The template code is
self.tools = [Tool]
The self.tools property consists of a list containing all the tools defined in
the toolbox. The template contains only one tool, and it is called Tool by
default. Each tool is created as a class in the .pyt file, the first portion of
which reads as follows:
class Tool(object):
    def __init__(self):
        """Define the tool (tool name is the name of the class)."""
        self.label = "Tool"
        self.description = ""
        self.canRunInBackground = False
The name of the class must correspond to the name of the tool used in the
self.tools property of the Toolbox class. Naming of these classes follows
the regular naming conventions for classes in Python—i.e., CapWords, with
no spaces. The tool class contains a method called __init__(), which
defines the properties of the tool. These properties include self.label and
self.description.
The name of the tool is established by the name of the
class itself. The label is what appears in ArcGIS Pro as the display name of
the tool, whereas the name of the tool is what allows you to call the tool
from another script or tool.
The __init__() method has several other properties. The
self.canRunInBackground property is a Boolean property to specify
background processing of the tool in ArcGIS Desktop 10.x and has no
effect in ArcGIS Pro. The self.category property makes it possible to
organize tools into toolsets within a Python toolbox. The self.stylesheet
property allows you to change the stylesheet, but the default suffices in
most cases. When setting the properties for a tool, you include only those
properties that are given a value. The others can be left out.
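To illustrate, here is a minimal sketch of a hypothetical tool class that sets the optional self.category property. The tool name and category are invented for illustration, and no arcpy is needed just to show the class structure:

```python
class BufferPoints(object):
    # Hypothetical tool used only to illustrate the optional properties
    def __init__(self):
        self.label = "Buffer Points"
        self.description = "A hypothetical tool for illustration."
        # Optional: groups the tool into a toolset within the toolbox
        self.category = "Vector Toolset"
```

When ArcGIS Pro reads the toolbox, a tool defined this way would appear under a "Vector Toolset" heading rather than at the top level of the toolbox.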
For the Random Sample tool, the code so far is as follows:
class Toolbox(object):
    def __init__(self):
        self.label = "Random Sampling Tools"
        self.alias = "randomsampling"
        self.tools = [RandomSample]

class RandomSample(object):
    def __init__(self):
        self.label = "Random Sample"
This code updates the display of the tool in ArcGIS Pro, as shown in the
figure.
The tool class contains several other methods in addition to the __init__()
method. Even though they are all listed in the template, not all of them are
required. The following briefly describes each of these methods and states
whether they are required or not. Later sections in this chapter examine
these methods in more detail.
getParameterInfo()—optional. Defines the tool parameters, similar to the Parameters panel for a script tool.
isLicensed()—optional. Allows you to set whether the tool is licensed to execute.
updateParameters()—optional. Used for internal validation of tool parameters.
updateMessages()—optional. Used for messages created by validation of tool parameters.
execute()—required. This is the source code of the tool in which the actual task is being carried out, like the script file in a script tool.
To create a functional tool inside a Python toolbox requires the following
three methods: (1) __init__(), to define the properties of the tool; (2)
getParameterInfo(), to define the tool parameters; and (3) execute(), to
carry out the actual task of the tool. Technically speaking, a tool can run
without the getParameterInfo() method (i.e., it is optional), but not using
this method means the tool dialog box has no parameters, and that is not
very meaningful. The next sections provide details on how to set up the
getParameterInfo() and execute() methods. The other functions provide
additional functionality but are not required for a tool to work.
The example so far uses only one tool. A single Python toolbox can contain
multiple tools. To create multiple tools, the self.tools property is assigned
a list of tools, and the Tool class is repeated for each tool. The basic
structure is as follows:
class Toolbox(object):
    def __init__(self):
        self.label = "My Cool Tools"
        self.alias = "mycooltools"
        self.tools = [CoolTool1, CoolTool2]

class CoolTool1(object):
    def __init__(self):
        self.label = "Cool Tool 1"
    def getParameterInfo(self):
        # Parameter definition
    def execute(self, parameters, messages):
        # Source code

class CoolTool2(object):
    def __init__(self):
        self.label = "Cool Tool 2"
    def getParameterInfo(self):
        # Parameter definition
    def execute(self, parameters, messages):
        # Source code
This structure illustrates another key difference with script tools. In a
Python toolbox, the code for all tools resides inside the same .pyt file,
whereas in a custom toolbox with multiple script tools, each script tool is
associated with its own Python script file.
4.3 Defining a tool and tool parameters
Tools in a Python toolbox must have parameters to be useful, as with script
tools. In a Python toolbox, tool parameters are defined using the
getParameterInfo() method. Each parameter is created as a Parameter
object. The syntax for the Parameter class is as follows:
Parameter({name}, {displayName}, {direction}, {datatype},
          {parameterType}, {enabled}, {category}, {symbology},
          {multiValue})
Each parameter of this class corresponds to a property of the tool parameter.
Notice that none of the class parameters is required, but logically several of
them are needed to create a meaningful tool parameter. All these class
parameters are strings except for enabled and multiValue, which are
Booleans. Because a typical tool parameter may require several of these
class parameters and their values (often strings) may be long, an alternative
notation typically is used in which each class parameter is on a separate
line. A generalized example for a single-input tool parameter looks
something like the following:
def getParameterInfo(self):
    param0 = arcpy.Parameter(
        displayName="Input Features",
        name="in_features",
        datatype="GPFeatureLayer",
        parameterType="Required",
        direction="Input")
Because all class parameters are named, there is no need to use the same
order as in the syntax, and there is no need to include class parameters if
they are not being used. The preceding notation relies on implicit line
continuation. Because the class parameters are surrounded by a pair of
parentheses, all the lines following the opening parenthesis are read as a
single line of code until the closing parenthesis. Effectively, the preceding
code is the same as the following:
def getParameterInfo(self):
    param0 = arcpy.Parameter(displayName="Input Features",
        name="in_features", datatype="GPFeatureLayer",
        parameterType="Required", direction="Input")
Breaking down the syntax using a new line for each class parameter makes
the code easier to read and edit, but this style is not required. Almost all
published examples of Python toolboxes, however, use this style.
The Parameter class is used to create a Parameter object, and this object is
assigned to a variable. Variables can be called anything (within the rules for
variable names), but many examples use the names param0, param1,
param2, and so on. The numbering starts at zero because this system
facilitates the use of index numbers, as becomes clear later in this section. A
commonly used alternative is to use a variable that is similar to or the same
as the name property of the parameter. The variable naming style you use is
a matter of preference.
Once the parameter objects are created, additional properties are defined.
For example, recall the use of the Filter property when creating a script tool.
When applied to a feature layer, the Filter property makes it possible to set
the allowable feature types—e.g., point, polyline, or polygon. A similar
approach can be used when defining properties of tool parameters in a
Python toolbox. Filters are set using the filter property of the Parameter
class. This property uses the Filter class in ArcPy, which allows you to use
the same filters used when creating a script tool. The two options for filters
are list and type. For example, the following code sets a filter on the tool
parameter so that only polylines can be chosen:
param0.filter.value = ["Polyline"]
In summary, a Parameter object is created by providing values for some of
the class parameters that are part of the syntax for using the class.
Additional properties can be assigned a value once the object is created
using properties of the object. Those properties are not assigned a value
when the object is first created.
In addition to filter, some of these properties include displayOrder (the
order in which the parameter is displayed in the tool dialog box),
parameterDependencies (to indicate dependencies between parameters), and
value (the value of the parameter). In addition to properties, Parameter
objects also have several methods, which are mostly used for messages.
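As a hedged sketch of parameterDependencies (hypothetical parameter names; this runs only inside ArcGIS Pro, where arcpy is available), a field parameter can be tied to an input feature layer so that the field list is populated from whichever layer the user selects:

```python
# Hypothetical sketch; requires arcpy inside ArcGIS Pro
import arcpy

in_features = arcpy.Parameter(
    name="in_features",
    displayName="Input Features",
    datatype="GPFeatureLayer",
    parameterType="Required",
    direction="Input")

sample_field = arcpy.Parameter(
    name="sample_field",
    displayName="Sample Field",
    datatype="Field",
    parameterType="Optional",
    direction="Input")

# Populate the field list from whatever layer the user picks for in_features
sample_field.parameterDependencies = [in_features.name]
```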
Once tool parameters are defined, the final step in the getParameterInfo()
method is to return the parameters as a list, as follows:
parameters = [param0, param1, …]
return parameters
Returning to the example of the Random Sample tool, here is what the first
parameter looks like:
def getParameterInfo(self):
    input_features = arcpy.Parameter(
        name="input_features",
        displayName="Input Features",
        datatype="GPFeatureLayer",
        parameterType="Required",
        direction="Input")
    parameters = [input_features]
    return parameters
Note: Defining the parameter properties is not enough for the
parameter to show in the tool dialog box. The parameters also must be
returned, so be sure to include the last two lines of code to test the
parameter definitions.
This is just the first parameter, but it is useful to confirm that the code so far
is working. Return to ArcGIS Pro, and double-click on the Random Sample
tool to bring up the tool dialog box.
You also can check the parameter properties by opening the tool properties.
Right-click on the Random Sample tool in the Python toolbox, and click
Properties. Click on the Parameters panel to bring up the parameter
properties.
The results look identical compared with the results when creating a script
tool. The difference is that all the properties are created using Python code
instead of using the Tool Properties dialog box. You cannot make any
changes to these properties from within the dialog box because the Python
toolbox file is read-only from within ArcGIS Pro.
The second parameter of the Random Sample tool is the output feature
layer or feature class. The parameter definition is as follows:
output_features = arcpy.Parameter(
    name="output_features",
    displayName="Output Features",
    datatype="GPFeatureLayer",
    parameterType="Required",
    direction="Output")
The third parameter of the Random Sample tool is the number of features to
be chosen. The parameter definition is as follows:
no_of_features = arcpy.Parameter(
    name="number_of_features",
    displayName="Number of Features",
    datatype="GPLong",
    parameterType="Required",
    direction="Input")
For the tool parameters to be recognized, their values must be returned.
Update the last two lines of code as follows:
parameters = [input_features, output_features,
              no_of_features]
return parameters
The tool dialog box now starts to look like a finished tool, as shown in the
figure.
You may recall from chapter 3 that one parameter property is still missing.
The number of features must be a positive integer greater than 1, which
requires the use of a filter. Filters are accessed using the filter property of
the Parameter object. The data type of the parameter constrains which
filters are possible. For integer, both Range and ValueList are valid filter
types, so you must first set the type, followed by the values to be used for
this type. The code is as follows:
no_of_features.filter.type = "Range"
no_of_features.filter.list = [1, 1000000000]
You can check the properties of the tool parameters to confirm that a range
filter is applied, although you won’t be able to review the values of the
filter.
The complete code for the Python toolbox so far follows. Notice how the
code for the filter property comes after the line of code in which the
parameter for the number of features is created and has the same
indentation.
import arcpy

class Toolbox(object):
    def __init__(self):
        self.label = "Random Sampling Tools"
        self.alias = "randomsampling"
        self.tools = [RandomSample]

class RandomSample(object):
    def __init__(self):
        self.label = "Random Sample"
    def getParameterInfo(self):
        input_features = arcpy.Parameter(
            name="input_features",
            displayName="Input Features",
            datatype="GPFeatureLayer",
            parameterType="Required",
            direction="Input")
        output_features = arcpy.Parameter(
            name="output_features",
            displayName="Output Features",
            datatype="GPFeatureLayer",
            parameterType="Required",
            direction="Output")
        no_of_features = arcpy.Parameter(
            name="number_of_features",
            displayName="Number of Features",
            datatype="GPLong",
            parameterType="Required",
            direction="Input")
        no_of_features.filter.type = "Range"
        no_of_features.filter.list = [1, 1000000000]
        parameters = [input_features, output_features,
                      no_of_features]
        return parameters
The code also is provided as a figure to serve as a reference for the proper
code organization and indentation. The code in the figure includes empty
lines to improve legibility.
One critical aspect of creating tool parameters has been overlooked so far.
Notice how the datatype properties are set to GPFeatureLayer and GPLong.
Although it is intuitive what these data types mean (i.e., a feature layer and
a long integer, respectively), their names are different from those used in
the list of options for the Data Type property when creating a script tool.
This disparity can lead to some confusion. When creating a script tool, you
can scroll through the list until you find the data type of interest without
having to worry about the exact name of the type. When defining the data
type property for a tool parameter in a Python toolbox, you must type the
correct value for the data type. There are more than 150 different data types.
Table 4.1 shows a small sample of some of the most commonly used data
types. The first column shows the values that show up in the drop-down list
for the data type when creating a script tool, whereas the second column
shows the values that should be used when specifying the data type
property for a tool parameter in a Python toolbox.
Table 4.1. Parameter data types in a Python toolbox

Data type | Keyword for datatype property | Description
Address Locator | DEAddressLocator | A dataset used for geocoding that stores the address attributes, associated indexes, and rules that define the process for translating nonspatial descriptions of places to spatial data.
Areal Unit | GPArealUnit | An areal unit type and value, such as square meter or acre.
Boolean | GPBoolean | A Boolean value.
Coordinate System | GPCoordinateSystem | A reference framework, such as the UTM system, consisting of a set of points, lines, and/or surfaces, and a set of rules used to define the positions of points in two- and three-dimensional space.
Dataset | DEDatasetType | A collection of related data, usually grouped or stored together.
Date | GPDate | A date value.
Double | GPDouble | Any floating-point number stored as a double-precision, 64-bit value.
Feature Class | DEFeatureClass | A collection of spatial data with the same shape type: point, multipoint, polyline, and polygon.
Feature Dataset | DEFeatureDataset | A collection of feature classes that share a common geographic area and the same spatial reference system.
Feature Layer | GPFeatureLayer | A reference to a feature class, including symbology and rendering properties.
Field | Field | A column in a table that stores the values for a single attribute.
File | DEFile | A file on disk.
Folder | DEFolder | Specifies a location on disk where data is stored.
Layer | GPLayer | A reference to a data source, such as a shapefile, coverage, geodatabase feature class, or raster, including symbology and rendering properties.
Layer File | DELayer | A layer file stores a layer definition, including symbology and rendering properties.
Linear Unit | GPLinearUnit | A linear unit type and value, such as meter or feet.
Long | GPLong | An integer number value.
Map | GPMap | An ArcGIS Pro map.
Raster Band | DERasterBand | A layer in a raster dataset.
Raster Data Layer | GPRasterDataLayer | A raster data layer.
Raster Dataset | DERasterDataset | A single dataset built from one or more rasters.
Raster Layer | GPRasterLayer | A reference to a raster, including symbology and rendering properties.
Shapefile | DEShapefile | Spatial data in a shapefile format.
Spatial Reference | GPSpatialReference | The coordinate system used to store a spatial dataset, including the spatial domain.
SQL Expression | GPSQLExpression | A syntax for defining and manipulating data from a relational database.
String | GPString | A text value.
Table | DETable | Tabular data.
Table View | GPTableView | A representation of tabular data for viewing and editing purposes, stored in memory or on disk.
Text File | DETextfile | A text file.
Value Table | GPValueTable | A collection of columns of values.
Workspace | DEWorkspace | A container such as a geodatabase or folder.
Most of the datatype properties are like the terms in the drop-down list for
the data type when creating a script tool, but without the spaces and
preceded by the prefix DE (data element) or GP (geoprocessing). The
complete list can be found on the ArcGIS Pro help page “Defining
Parameter Data Types in a Python Toolbox.”
4.4 Working with source code
The steps completed so far created the tool dialog box, but the tool is not
ready to run. The code to carry out the task must still be added. This code is
referred to as the source code or the main body of the tool—the rest of the
code is used to define the tool and its parameters, and to customize tool
behavior. Recall that in the case of a Python script tool, the script must be
modified using the GetParameterAsText() and GetParameter() functions to
receive the values of the parameters entered in the tool dialog box by a user.
The source code for a tool inside a Python toolbox is found in the execute()
method. This method has arguments to work with parameters and messages:
def execute(self, parameters, messages):
The argument parameters refers to the list of parameters defined in the
getParameterInfo() method. The value of each parameter is obtained from
this list using the valueAsText or value property of the Parameter
object. The following code illustrates a generic example with two
parameters:
def execute(self, parameters, messages):
    in_fc = parameters[0].valueAsText
    out_fc = parameters[1].valueAsText
The variable names do not have to be the same as the names assigned to the
parameters in the getParameterInfo method.
Consider how this works for the example Random Sample tool. Recall the
original stand-alone script discussed in chapter 3 that carries out the tasks of
interest, as shown in the figure.
This entire script represents the source code and is copied in the execute()
method. The hard-coded values for the variables must be modified.
Specifically, the following lines of code must be modified so the values are
obtained using the valueAsText and value properties:
inputfc = "C:/Random/Data.gdb/points"
outputfc = "C:/Random/Data.gdb/random"
outcount = 5
The modified code is as follows:
inputfc = parameters[0].valueAsText
outputfc = parameters[1].valueAsText
outcount = parameters[2].value
The first two parameter values are received as strings using valueAsText,
and the third parameter is received as an integer using value.
A few elements from the original script can be placed at the top of the
Python toolbox file, including importing modules and setting environment
properties. The top of the .pyt file now is as follows:
import arcpy
import random
The rest of the original script, including the modifications already
discussed, becomes the source code for the tool in the execute() method.
After removing comments and empty lines for display purposes, this part of
the code is as follows:
def execute(self, parameters, messages):
    inputfc = parameters[0].valueAsText
    outputfc = parameters[1].valueAsText
    outcount = parameters[2].value
    inlist = []
    with arcpy.da.SearchCursor(inputfc, "OID@") as cursor:
        for row in cursor:
            id = row[0]
            inlist.append(id)
    randomlist = random.sample(inlist, outcount)
    desc = arcpy.da.Describe(inputfc)
    fldname = desc["OIDFieldName"]
    sqlfield = arcpy.AddFieldDelimiters(inputfc, fldname)
    sqlexp = f"{sqlfield} IN {tuple(randomlist)}"
    arcpy.Select_analysis(inputfc, outputfc, sqlexp)
There is no need for a return at the end of the method because the result is
implicit in the output parameter—i.e., the tool saves the results to a new
feature class and returns it as a feature layer in the active map. Some tools
may need to specify a value to be returned if they do not save anything to
disk or return an object for use in the active map.
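The SQL expression assembled in the execute() method can be previewed with plain Python. This sketch uses hypothetical stand-ins for the values arcpy would supply at run time (the field name and ID list are invented, and the quoted delimiters mimic what AddFieldDelimiters might return for a shapefile):

```python
import random

# Hypothetical stand-ins for the values arcpy would provide at run time
fldname = "OBJECTID"
sqlfield = f'"{fldname}"'        # delimited field name (shapefile-style quoting)
inlist = list(range(1, 101))     # pretend object IDs 1 through 100
outcount = 5

# Same logic as the tool: draw a random sample and build the WHERE clause
randomlist = random.sample(inlist, outcount)
sqlexp = f"{sqlfield} IN {tuple(randomlist)}"
print(sqlexp)  # e.g.: "OBJECTID" IN (12, 87, 3, 55, 41)
```

Running this confirms that the expression is a valid IN clause listing exactly outcount object IDs, which Select_analysis then uses to extract the sampled features.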
At this point, the tool is complete and ready to use. You can test the tool in
ArcGIS Pro to confirm the tool dialog box works as expected and that the
outputs are correct. The tool dialog box looks like any other geoprocessing
tool.
Tool execution creates an entry in the geoprocessing history with the
associated messages.
The message refers to running a script, which is identical to the messages
reported when running a script tool. Keep in mind, however, that the
“script” being run is not a separate .py file, but the code inside the
execute() method of this tool in the Python toolbox file. From a user
perspective, however, the experience of using the tool is the same.
The Tool class includes several other methods that have not been used yet,
including isLicensed(), updateParameters(), and updateMessages(). All
these methods are optional, which means you can leave them in the
template unmodified, or you can remove them from the code entirely. Of
these three, the updateParameters() method is the most important because
it provides additional control over the behavior of the tool parameters and
how the parameters interact with each other. Details on how to use this
method are explained in more depth in the ArcGIS Pro help topic
“Customizing Tool Behavior in a Python Toolbox.”
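The updateParameters() method receives the same parameters list as execute(). As a hedged sketch of the idea (the behavior shown is hypothetical, not part of the Random Sample tool as written), the method might disable the count parameter until input features are chosen:

```python
def updateParameters(self, parameters):
    # Hypothetical sketch: in a Python toolbox this method lives inside
    # the tool class. It runs whenever a parameter value changes.
    # Disable the third parameter until the user picks input features.
    parameters[2].enabled = parameters[0].value is not None
    return
```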
4.5 Comparing script tools and Python toolboxes
Both script tools and Python toolboxes can be used to create custom tools
using Python. Which approach you use is largely a matter of preference. It
is important, however, to be aware of some of the similarities and
differences.
First, both approaches result in tools that are integrated into the
geoprocessing framework. The tool dialog boxes work just like standard
tools, and the tools can be used in scripts and models.
Second, in terms of organization, a script tool is part of a custom toolbox
(.tbx), and each script tool has an associated Python script file (.py). The
design of the tool dialog boxes is accomplished using interface elements of
ArcGIS Pro, and this information is stored in the .tbx file. In contrast, a
single Python toolbox can contain several tools, but all the code is stored in
a single .pyt file. In addition, the design of the tool dialog boxes is coded
entirely in Python.
Third, both script files and Python toolboxes can be edited using a Python
editor, but .pyt files are not recognized by default as Python files, and
therefore require custom configuration of your IDE. This difference also
impacts debugging procedures because IDEs can only debug .py files.
Fourth, the code associated with both script tools and Python toolboxes can
be secured using a password.
Fifth, both script tools and Python toolboxes are documented in a similar
manner. Chapter 5 discusses this topic.
Points to remember
Python toolboxes provide an alternative to creating a Python script
tool. From a user perspective, tools inside a Python toolbox work just
like regular geoprocessing tools. The biggest difference from a
developer perspective is that a Python toolbox is written entirely in
Python. This feature makes Python toolboxes a preferred approach
for those with more experience in developing tools.
A Python toolbox is a Python file with a .pyt extension, which is
recognized automatically by ArcGIS Pro. A single Python toolbox
can contain one or more tools. The name of the Python toolbox in
ArcGIS Pro is the name of the .pyt file.
The basic template provided with a new Python toolbox is a helpful
way to start writing your code. In a Python toolbox, the code for all
the tools resides in a single .pyt file.
Some Python IDEs, including PyCharm, can be configured to
recognize a .pyt file as Python code, which assists with writing proper
syntax. However, you cannot run a .pyt file from an IDE, and you can
test the code only by using the tools inside the Python toolbox from
within ArcGIS Pro.
The code for a Python toolbox includes a Toolbox class, which
defines the characteristics of the toolbox, including a list of tools. The
code also contains a Tool class for each tool, with the name of the
class corresponding to the specific tool.
Each Tool class includes several methods, including __init__() to
define the properties of the tool, getParameterInfo() to set up the
tool parameters, and execute() to carry out the actual task of the tool.
Setting up tool parameters requires careful consideration, especially
of the data types for each parameter.
Both script tools and Python toolboxes can be used to create custom
tools using Python. The approach you use is largely a matter of
preference. It is important, however, to be aware of some of the
similarities and differences.
Key terms
implicit line continuation
Python toolbox
source code
Review questions
What is a Python toolbox?
Describe the steps to create a custom tool using a Python toolbox.
Which classes are part of the code of a Python toolbox, and what
purpose do they serve?
Where in the code of a Python toolbox is the code to carry out the
actual task of each tool located?
What are some of the similarities and differences between script tools
and Python toolboxes?
Chapter 5
Sharing tools
5.1 Introduction
The ArcGIS Pro geoprocessing framework is designed to facilitate the
sharing of tools. Custom toolboxes and Python toolboxes can be added to
projects and integrated into regular workflows. Custom toolboxes can
contain any number of tools, consisting of both model tools and script tools.
Script tools can be shared by distributing a toolbox file (.tbx) and the
associated Python scripts (.py). However, there are several obstacles to
sharing script tools.
One of the principal obstacles is that the resources available to the creator
of the script likely will be different from those available to the user of the
shared script tools. These resources include projects, datasets, scripts, layer
files, and any other files used by the tools. Another obstacle is the
organization of these resources on a local computer or network. Paths
present a persistent problem when sharing tools. Although sharing Python
toolboxes is facilitated by the fact that all code can reside in a single .pyt
file, some of these same obstacles apply to Python toolboxes as well.
This chapter provides guidelines on how to distribute tools, including how
to structure the files that are commonly distributed with shared tools.
Alternative approaches to sharing tools are also discussed, including the use
of geoprocessing packages and web tools. Geoprocessing packages make it
relatively easy to share a custom tool by automatically consolidating data,
tools, and supporting files into a single file for sharing.
5.2 Choosing a method for distributing tools
Tools that are developed to share with others can vary from the simple to
the complex. The simplest case is a single custom toolbox file with one or
more script tools, or a single Python toolbox file, and no additional files. In
a more typical example, a shared tool can consist of a toolbox file with
several scripts, or a Python toolbox file, and some documentation. A more
complex example contains a toolbox file, several scripts, documentation,
layer files for symbology, sample data, and other resources. A
recommended folder structure for these files is presented in section 5.4.
One of the most common ways to share tools is simply to make all the files
available in their original folder structure. This typically involves the use of
a file compression utility to create a single ZIP file of the folders and their
contents. This ZIP file can then be posted online or emailed. The recipient
can download the file and extract the contents to access the individual
folders and files. The toolbox is then added to a project to access the tools.
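The ZIP file can be produced with the Python standard library. This sketch builds a tiny stand-in folder structure first so it is self-contained; in practice, the root folder and its contents already exist, and the names used here are hypothetical:

```python
import os
import shutil
import tempfile

# Build a tiny stand-in folder structure (hypothetical; in practice this
# is the existing root folder that holds the toolbox, scripts, and data)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Tools", "Scripts"))
with open(os.path.join(root, "Tools", "Scripts", "random_sample.py"), "w") as f:
    f.write("# script tool code\n")

# Consolidate the whole folder into a single ZIP file for sharing
archive = shutil.make_archive(
    os.path.join(root, "Tools"),  # base name of the archive: .../Tools.zip
    "zip",
    root_dir=root,
    base_dir="Tools",             # keep the Tools folder at the ZIP root
)
print(archive)
```

Setting base_dir preserves the folder structure inside the archive, so the recipient extracts a single Tools folder with everything in place.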
There are several other ways to share tools. If users have access to the same
local network, the folder containing the tools can be copied to a folder that
is accessible to all users. A toolbox can be added directly from the network,
and no files need to be copied to the user’s computer. Another option is to
publish a tool as a geoprocessing package or web tool, which can then be
shared through a local network or ArcGIS Online portal.
The method depends largely on the relationship between the creator of the
tool and the intended users, as well as the software and the skills of the user.
For example, if tools are developed primarily for use by others within the
same organization, making tools available on a local network may be the
most efficient method. To make tools available to a broad community of
users, the use of a ZIP file is likely the most convenient.
Several other considerations influence how to share tools, including where
the input and output data are located and what licensed products and
extensions the tools require. In the ZIP file method, for example, any tool
data must be packaged with the tool because a typical user will not have
access to the data on the local network.
5.3 Handling licensing issues
Tools distributed using the ZIP file method will run on a user’s computer,
which may not have the necessary products or licenses to run the tools.
Scripts therefore should include logic to check for the necessary product
levels (ArcGIS Pro Basic, Standard, or Advanced) and extension licenses
(3D Analyst, Spatial Analyst, and others). To facilitate the use of shared
tools, the necessary product level and extensions must be described in the
tool’s documentation.
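A minimal sketch of such a check, built on ArcPy's CheckExtension() and CheckOutExtension() functions. The checker callables are injectable here only so the logic can be illustrated (and tested) without an ArcGIS installation; in a real script tool the defaults are used:

```python
def check_requirements(required_extensions, check=None, checkout=None):
    """Verify that all required extension licenses are available, then
    check them out. Raises RuntimeError listing any missing licenses."""
    if check is None or checkout is None:
        import arcpy  # deferred import: only needed for real use
        check = check or arcpy.CheckExtension
        checkout = checkout or arcpy.CheckOutExtension
    # CheckExtension returns the string "Available" for a usable license
    missing = [ext for ext in required_extensions if check(ext) != "Available"]
    if missing:
        raise RuntimeError("Missing extension licenses: " + ", ".join(missing))
    for ext in required_extensions:
        checkout(ext)
```

A script tool could call `check_requirements(["Spatial", "3D"])` near the top of the script, so a user without the necessary licenses gets one clear error message instead of a failure partway through execution.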
5.4 Using a standard folder structure for sharing tools
A standard folder structure, such as the example in the figure, is
recommended for easy sharing of custom tools. There is no requirement to
use this specific structure, but it provides a good starting point.
Note: The folder structure is shown using the Catalog view in ArcGIS
Pro, which does not show all file types.
The root folder (i.e., Tools, in this example) contains one or more custom
toolboxes (.tbx files), which may include model tools and script tools, or
one or more Python toolboxes. Custom toolboxes also can reside inside a
geodatabase, but a .tbx file directly under the Tools folder is easier to find.
Script tools should have the “Store tool with relative path” option checked.
The next section covers working with paths.
The Data folder contains sample datasets that a user can work with to learn
about the functionality of tools before trying the tools out on their own data.
The tools also may require certain data as part of tool execution, such as
lookup tables, and these are also included in this folder. Many model tools
and script tools use a workspace, and a default file geodatabase for scratch
data (scratch.gdb) can be provided in the Data folder or in a separate
Scratch folder. Distributing an ArcGIS Pro document (.aprx file) is optional
but may be helpful if example datasets are part of the shared tool.
The Doc folder is used for tool documentation, which should clearly state
which product level and extensions are required for the tools to run. A
README file (readme.txt) often is included in the root folder to explain
how the tool works, typically including special instructions on how the tool
must be installed, contact information for the tool’s creators, version
number, and the like. A more detailed user manual can be provided in the
Doc folder as a Microsoft Word or PDF file. Experienced Python coders are
likely to open the actual scripts and learn from both the comments and the
code in the scripts. Many other users, however, may never look at the
scripts and instead use only the tool dialog boxes. Good documentation
ensures that users get the most out of a tool and understand what it will
accomplish, as well as its limitations, without having to open the actual
scripts.
The Layers folder contains separate layer files (.lyr or .lyrx) to assign
symbology to outputs for use in a project in ArcGIS Pro. Users can be
instructed to apply this symbology themselves, or its use can be coded into
the scripts.
The Scripts folder contains the Python scripts used in the script tools. For
relatively simple tools with only one or more Python scripts, a separate
Scripts folder may not be necessary, and the .py files are placed directly in
the root folder. Scripts also can be embedded in a toolbox, in which case
there are no separate script files. This is not common, because often the
purpose of sharing the tools is for users to use and learn from the scripts and
contribute to their continued improvement.
Other related files may include script libraries, dynamic-link libraries
(DLL), text files, XML files, images, and executable files, such as .exe and
.bat (batch) files.
The structure discussed here is only one of many possible structures. A few
examples of published tools are used in this section to show some of the
typical structures used by tool authors. All the examples are downloaded
from www.arcgis.com. You can obtain the actual tools by searching for the
tools by name.
The Create Points on Lines tool by Ian Broad, a GIS analyst who runs the
Thinking Spatially blog (http://ianbroad.com/), provides a basic file
structure. All files are provided in the root folder and consist of a toolbox
file (.tbx), a Python script file (.py), and a readme.txt file with the tool
documentation.
The Distributive Flow Lines tool by Esri’s Applications Prototype Lab
consists of a Python toolbox with a single tool, a file geodatabase with
sample data in the root folder, a readme.txt file with a basic description of
the tool, and links to online resources to explain what the tool does. Several
support files are provided in a separate Index folder.
The Terrain Mapping tools, discussed in chapter 1, represent a more
complex file structure because of the many different files used by these
tools.
A single custom toolbox file (.tbx) in the root folder consists of 14 different
script tools. A readme.txt file in the root folder provides a basic description
of the tools, including a reference to the documentation and sample
datasets. A comprehensive user manual is provided as a PDF file in the Doc
folder. The Samples folder includes datasets to practice the use of the tools,
including map documents (.mxd) to get started with the samples. These
.mxd files can be imported into ArcGIS Pro, and the tools are designed to
work with ArcGIS Desktop 10.x and ArcGIS Pro. The Python scripts
associated with each of the tools reside in the Scripts folder. To assist in
creating the symbology for the outputs, a set of custom color ramps and
layer files are also provided in separate folders. Additional support files
reside in the SkyLuminance folder.
These examples illustrate that published tools employ a wide range of
folder and file structures. There is no single required structure, and the most
appropriate file structure depends largely on the nature of the files that must
accompany a specific tool. As a rule, the necessary files should be easy to
locate, which can be accomplished using meaningful names for files and
folders.
5.5 Working with paths
Paths are an integral part of working with data and tools. When tools are
shared, paths become particularly important, because without proper
documentation of where files are located, the tools will not run.
If you have worked with ArcGIS Pro to create projects or tools, you are
probably familiar with absolute and relative paths. Absolute paths are also
referred to as “full paths.” They start with a drive letter, followed by a
colon, and then the folder and file name—for example,
C:\Data\streams.shp. Relative paths refer to a location that is relative to a
current folder.
Consider the following example with two shapefiles located in the
C:\AllData\Shapefiles\Final folder: boundary.shp and locations.shp.
Relative to each other, there is no need to know the path other than the file
names. Now consider an example in which you want to run a tool that uses
the shapefiles locations.shp and floodzone.shp. These files are in two
different folders, and therefore their relative paths are Final\locations.shp
and Project\floodzone.shp. The higher-level folders—that is,
AllData\Shapefiles—are not needed to locate one file relative to the other.
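This relationship can be verified with Python's ntpath module, the Windows flavor of os.path that works on any platform:

```python
import ntpath

loc = r"C:\AllData\Shapefiles\Final\locations.shp"
flood = r"C:\AllData\Shapefiles\Project\floodzone.shp"

# relative to the common Shapefiles folder
print(ntpath.relpath(loc, start=r"C:\AllData\Shapefiles"))  # Final\locations.shp

# locations.shp relative to the folder containing floodzone.shp
print(ntpath.relpath(loc, start=ntpath.dirname(flood)))  # ..\Final\locations.shp
```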
The use of relative paths makes it possible to move or rename folders. For
example, if the AllData folder was renamed Data, all relative paths remain
intact. Similarly, if the drive was modified from C:\ to E:\, all relative paths
also remain intact.
One limitation of relative paths is that they cannot span multiple disk
drives. If some files are located on the C drive and some on the E drive,
only absolute paths preserve the correct locations of all files.
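The ntpath module makes this limitation easy to demonstrate: asking for a path relative across two drives raises a ValueError:

```python
import ntpath

try:
    ntpath.relpath(r"E:\Project\floodzone.shp", start=r"C:\AllData")
except ValueError:
    print("No relative path exists between different drives")
```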
Both absolute paths and relative paths can be used in model tools and script
tools, but in general, shared tools should rely on relative paths. Relative
paths for models are enabled on the model properties dialog box.
For script tools, relative paths are enabled on the script tool properties
dialog box.
Relative paths for model tools and script tools are relative to the current
folder in which the toolbox file is located. When relative paths are enabled,
it applies to the script files, datasets used for default values for parameters,
files referenced in the tool documentation, layer files used for the
symbology properties, and style sheets.
It is important to recognize that paths within the script are not converted
because ArcGIS Pro does not examine and modify the code. Therefore, if a
script uses absolute paths, they are not converted to relative paths when
relative paths are enabled for the script tool using the settings in the script
tool properties.
Note: In general, Python code must be written so that files can be found
relative to a known location, which typically is the location of the script
itself.
After this review of working with paths, it is worthwhile to revisit relative
paths in the context of sharing tools. For the purpose of this discussion, the
same example folder structure discussed earlier in this chapter, as shown in
the figure, is used.
To share tools, relative paths must be enabled in the script tool properties.
In this example, the script tool will reference a script in the Scripts folder. It
also may reference tool documentation in the Doc folder. The script itself
may reference data in the Data folder. These references will continue to be
valid when the script tool is shared with another user if the standard folder
structure is maintained. If the toolbox file (Toolbox.tbx) containing the
script tool was moved to a different location separate from the other folders
and files, the script files called by the script tool would not be found and the
script will not work. The tool dialog box will open, but upon tool execution,
the following error message will appear:
ERROR 000576: Script associated with this tool does not exist.
Failed to execute (<toolname>).
Therefore, for a script tool to work correctly, the folder structure must be
maintained.
5.6 Finding data and workspaces
In general, it is best to avoid hard-coded paths in your script if it is going to
be shared with others as a script tool or Python toolbox. Instead, the paths
are derived from the parameters on the tool dialog box, and these paths are
passed to the script. The script reads these parameters using the
GetParameterAsText() and GetParameter() functions in the case of a script
tool.
Sometimes, however, it is necessary to use hard-coded paths to the location
of a file. For example, an existing layer file may be necessary to set the
symbology for an output parameter. Or a tool may require the use of a
lookup table. Depending on the nature of the information, it already may be
incorporated into the script (for example, a lookup table can be coded as a
Python dictionary), but this may not always be possible. Therefore, some
files may be necessary for the tool to run, even though they are not provided
as parameters for a user to specify. Instead, these files are provided by the
author of the script and distributed as part of the shared tool. Following the
suggested folder structure presented earlier, these files can be placed in the
Data folder, making it possible for the data files to be found relative to the
location of the script. If the necessary files consist of layer files for the
purpose of symbology, a separate Layer folder can be used as well. The
exact location of the files is not important—what is important is that they
can be located from within the script.
The path of the script can be found using the following code (after importing the sys or os module, respectively):
scriptpath = sys.path[0]
or:
scriptpath = os.getcwd()
Running this code results in a string with the complete path of the script but
without the name of the script itself. If the files necessary for the script to
run are in the Data folder, per the suggested folder structure, the Python
module os.path can be used to create a path to the data.
The folder structure used thus far can serve as an example. The Tools folder
contains the shared tool, including the toolbox in the root folder, the script
in the Scripts folder, and the data files in the Data folder. Relative paths are
enabled for the script tool, so the Tools folder can be moved, or even
renamed, and the script tool will still work. In order to run, the script needs
a geodatabase table called “lookup,” located in a file geodatabase called
TestData.gdb in the Data folder. The name of the table and the geodatabase
can be hard-coded into the script because the author of the script is also the
author of the table and the creator of the Data folder. However, the absolute
path should not be hard-coded into the script, but the relative path should be
used instead: Data\TestData.gdb\lookup. This will make it possible for the
Tools folder to be moved to any location without the user of the script tool
being limited to the absolute path originally used by the author of the script.
The code that references the lookup table in the script is as follows:
import arcpy
import os
scriptpath = os.getcwd()  # folder containing the script (the Scripts folder)
toolpath = os.path.dirname(scriptpath)  # one level up (the Tools folder)
tooldatapath = os.path.join(toolpath, "Data")
datapath = os.path.join(tooldatapath, "TestData.gdb", "lookup")
Notice that three elements are hard-coded into the script: the actual file
name of the geodatabase table, the file geodatabase, and the folder in which
the data is located. These elements are created by the author of the tool and
therefore can be hard-coded into the script because they do not depend on
user input.
Some tools may require a scratch workspace to write intermediate data. Although it is possible to hard-code a specific scratch workspace in a script, such a workspace is unreliable because it may not exist on the user's computer, and a shared script tool should not have to include its own geodatabase to write results. A robust solution is to use the
scratch GDB environment setting, which points to the location of a file
geodatabase. This location can be accessed using the scratchGDB property
of the arcpy.env class. This property is read-only, and the primary purpose
of this environment setting is for use in scripts and models. The use of the
scratch GDB is reliable because this geodatabase is guaranteed to exist
when a script tool is run. A user can specify a scratch workspace in ArcGIS
Pro, but if no scratch workspace is set, the scratch GDB defaults to the
current user’s folder.
You can check the location of the scratch GDB by running the following
code in the interactive window of your Python IDE:
>>> import arcpy
>>> print(arcpy.env.scratchGDB)
The result looks as follows:
C:\Users\<username>\AppData\Local\Temp\scratch.gdb
The same code in the Python window brings up the default scratch GDB
associated with the current project and looks as follows:
C:\Users\<username>\AppData\Local\Temp\ArcGISProTemp………\scratch.gdb
Regardless of the location, this file geodatabase is guaranteed to exist. The
location and name of this geodatabase should not be hard-coded in a script
but can be obtained using arcpy.env.scratchGDB if necessary. Writing
output to the scratch GDB makes your script portable because you don’t
need to validate whether it exists at runtime.
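As a small sketch of this pattern, the helper below builds a path for intermediate data inside the scratch GDB; scratch_output is a hypothetical helper (not part of ArcPy), and the commented Buffer call shows how it might be used in a script tool, assuming arcpy is available:

```python
import os

def scratch_output(name, scratch_gdb):
    """Return a path for an intermediate dataset inside the scratch GDB.
    (Hypothetical helper; name must be a valid geodatabase table name.)"""
    return os.path.join(scratch_gdb, name)

# In a script tool (sketch):
#   import arcpy
#   temp_fc = scratch_output("temp_buffers", arcpy.env.scratchGDB)
#   arcpy.analysis.Buffer(in_features, temp_fc, "100 Meters")
```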
Another scenario in which hard-coded paths are commonly employed is the
use of layer files to symbolize output. Consider the Terrain Tools referenced
earlier in this chapter. The file structure is as shown in the figure.
The original tool includes 14 script tools and several dozen layer files, but
only one script tool and one layer file are shown here for illustration. The
script IllumContours.py resides in the Scripts folder, while the layer file
referenced in the script resides in the LayerFiles folder. In the script, the
reference to the layer file is set as follows:
# set the symbology
scriptPath = sys.path[0]
one_folder_up = os.path.dirname(scriptPath)
toolLayerPath = os.path.join(one_folder_up, "LayerFiles")
lyrFile = os.path.join(toolLayerPath, "Illuminated Contours.lyr")
Layer files are one of the most common reasons for using hard-coded folder
and file names in a script. It works well if the author of the script tool has
carefully considered the use of relative paths, and the user of the script tool
does not change the folder structure and the names of folders and files.
Although the previous examples used script tools, the same approach can be
employed in Python toolboxes.
5.7 Embedding scripts, password-protecting tools
The most common way to share script tools is to reference the Python script
file in the script tool properties and provide the script file separately,
typically in the root folder or in a separate Scripts subfolder. Providing the
script files allows users to clearly see which scripts are being used, and the
scripts can be opened to view the code. Similarly, by sharing a Python
toolbox, a user can open the .pyt file to view the code.
Scripts also can be embedded in a custom toolbox. The code is then
contained within the toolbox, and a separate script file is no longer needed.
This approach can make it easier to manage and share tools.
To import a script to be embedded, right-click the script tool, and click
Properties. In the tool properties dialog box, click on the General tab, and
locate the Options section in the lower part of the panel. When you check
the Import script option, the .py file becomes embedded in the toolbox.
Once a script is imported into a tool, the toolbox can be shared without
including the script file. In other words, just sharing the .tbx file is enough,
and no separate .py files need to be provided for the script tool to run. When
a script is imported, however, the original script file is not deleted—it is
simply copied and embedded in the toolbox.
Embedding scripts does not mean they can no longer be viewed or edited.
Say, for example, you imported a script and shared a toolbox with another
user. The recipient can go into the tool properties and uncheck the Import
script option to obtain a copy of the original script. The recipient also can
right-click on the script tool and click Edit, and a temporary script file
opens in the default editor. Both these options make it possible to view and
edit the script. Although embedding scripts is a useful way to reduce the
number of files to manage and share, it can lead to some confusion. For
example, some script tools use multiple scripts—e.g., a script that is
referenced by the script tool and additional scripts that are called by the first
script. Embedding multiple scripts can be confusing to users because it
becomes less transparent how the scripts work.
Regular script files cannot be password protected. If you share your tools
including individual .py files, any user can open these scripts with a Python
editor or a text editor. Users can modify the code or copy it for use in their
own scripts. This feature is, in fact, one of the reasons why working with
Python is so appealing. Sometimes, however, there may be a need to hide
the contents of a script, such as login credentials and other sensitive
information. If you need password protection for your script files, you can
embed the script first, and then check the Set password option.
Setting a password does not affect execution of the script tool, but any attempt to uncheck the Import script option prompts for the password.
Because a Python toolbox consists of a single .pyt file, there is no need to
embed separate script files to reduce the number of files. To prevent a user
from viewing the contents of a Python toolbox, you can encrypt the file.
From within the Catalog pane of ArcGIS Pro, right-click on the Python
toolbox, and click Encrypt.
Clicking Encrypt brings up the Set Password dialog box. As per the
warning message, setting a password overwrites the existing unencrypted
file so you should make a backup copy of the Python toolbox first. Because
a Python toolbox consists of a single file, you cannot set a password for
individual tools as you can for script tools, and you can encrypt only the
Python toolbox. On the other hand, you cannot encrypt a custom toolbox,
only individual script tools inside the toolbox.
To decrypt an encrypted Python toolbox, right-click on the Python toolbox,
and click Decrypt. Encrypting and decrypting also can be accomplished
using the EncryptPYT() and DecryptPYT() functions of ArcPy, respectively.
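As a sketch, EncryptPYT() can be applied in bulk to every Python toolbox in a folder. The encrypt callable defaults to arcpy.EncryptPYT but is injectable here only so the folder-walking logic can be illustrated (and tested) without an ArcGIS installation:

```python
import os

def encrypt_all_pyts(folder, password, encrypt=None):
    """Encrypt every Python toolbox (.pyt) found under folder.
    Returns the list of encrypted file paths."""
    if encrypt is None:
        import arcpy  # deferred import: only needed for real use
        encrypt = arcpy.EncryptPYT
    encrypted = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".pyt"):
                path = os.path.join(root, name)
                encrypt(path, password)  # overwrites the unencrypted file
                encrypted.append(path)
    return encrypted
```

Remember the warning above: encryption overwrites the unencrypted file, so back up the toolboxes before running anything like this.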
5.8 Documenting tools
Good documentation is important when sharing tools. Documentation
includes background information on how the tool was developed as well as
specifics on how the tool works. Documentation also can explain specific
concepts, which may be new to other users.
Many coders provide detailed comments inside the script itself about how a
Python script works. Although this is good practice, keep in mind that users
of shared tools may not have experience with Python. One of the benefits of
developing tools in Python is that a finished tool looks and feels the same as
any other geoprocessing tool. Therefore, you should not rely on only
comments inside your script to explain the use of the tool.
Tool documentation is created using the same metadata creation tools used
for datasets and other items. You can create documentation for both the
toolbox and individual tools, which applies to both custom tools with script
tools and Python toolboxes.
To edit the metadata for a toolbox or tool, right-click it, and click Edit
Metadata, which brings up the metadata for the toolbox or tool.
You can upload a thumbnail image, enter tags, provide a summary, and
enter several other pieces of descriptive information. Click the Save button
at the top of the screen to save your edits.
The options for documenting toolboxes and tools are similar, with several
important distinctions. The metadata for a tool includes sections for syntax
and code samples.
The syntax portion is particularly relevant, because it provides an
opportunity to make the tool dialog box more informative. Consider the
example of the Random Sample tool. A dialog explanation can be entered
as shown in the figure.
When the metadata is saved, it becomes part of the tool’s documentation
and is used as part of the tool dialog box. When the tool dialog box is open,
hovering over the blue info icon to the left of a tool parameter brings up the
dialog box explanation.
Many users may not take the time to review detailed documentation in
separate files or in the source code, but they may appreciate getting dialog
explanation directly from the tool dialog box.
For the most part, documentation for script tools and Python toolboxes
works the same. There is one important difference, related to where the
metadata is stored. For a custom toolbox with one or more script tools, the
metadata is stored as part of the .tbx file. Storing metadata there means no
additional files are created when you document the toolbox and/or tool by
editing the metadata. A Python toolbox, on the other hand, consists of a text
format with the .pyt file extension, and this format does not allow for saving
metadata. Instead, metadata is stored in separate XML files with the same
names as the Python toolbox and/or the individual tools.
Consider the following example of a Python toolbox with a single tool.
Editing the documentation as part of the metadata results in one XML file
for the Python toolbox called <ToolboxName>.pyt.xml and one XML file
for each tool called <ToolboxName>.<ToolName>.pyt.xml. These files are
created automatically when you start editing the metadata.
Note: XML files do not show up in the Catalog pane in ArcGIS Pro, but
you can check the file names using File Explorer.
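The naming convention for these side-car XML files can be sketched as a small helper (illustrative only; the names follow the pattern just described):

```python
def metadata_files(toolbox, tools):
    """Return the XML metadata file names ArcGIS Pro creates for a
    Python toolbox (name without the .pyt extension) and its tools."""
    names = [toolbox + ".pyt.xml"]
    names.extend(toolbox + "." + tool + ".pyt.xml" for tool in tools)
    return names

print(metadata_files("SampleTools", ["RandomSample"]))
# ['SampleTools.pyt.xml', 'SampleTools.RandomSample.pyt.xml']
```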
This approach to storing metadata can lead to long file names. It also can
lead to some confusion. For example, in File Explorer in the Windows
operating system, you can choose to show file extensions or not. When file
extensions are not shown, the XML files look like .pyt files, even though
the default file associations recognize the file type correctly.
Therefore, it is recommended to show the file extensions. In File Explorer,
you can check the option for File name extensions under the View tab.
When sharing a Python toolbox with documentation, the XML files must be
shared as well. If the XML files are left out, the Python toolbox will
continue to work as before, but there will be no documentation as part of
the metadata.
There are other ways to provide documentation as well, within the script
itself or on disk, as follows:
- By commenting code. Good scripts contain detailed comments, which explain how a script works. Not all users of a script tool may look at the code, but for those who do, comments can be informative. Comments are located inside the actual script files.
- Through separate documentation located on disk—for example, in the Doc folder. Documentation files can be provided as Microsoft Word, PDF, or other file types. This documentation typically includes a more detailed explanation of the tools and any relevant background concepts.
Creating tool documentation takes extra effort but contributes to a user-friendly tool that others will benefit from.
5.9 Example tool: Terrain Tools
This section looks at an example to review the organization of the files that
are part of shared tools, as well as the documentation. The example, Terrain
Tools, was introduced in chapter 1 and referenced earlier in this chapter.
The Terrain tools are a collection of script tools that provide improved
methods for representing terrain and surfaces in ArcGIS. The tools are
distributed as a custom toolbox file. They can be used in ArcGIS Desktop
10.3 and higher and imported into ArcGIS Pro 1.0 and higher.
The tools can be downloaded as a ZIP file from www.arcgis.com by
searching for Terrain Tools Sample. Once extracted, the organization of the
files closely follows the suggested folder structure.
There is a single .tbx file, which contains 14 script tools. The associated
Python script files reside in a Scripts folder. The Samples folder contains
datasets for testing the tools, as well as examples of the outputs of each
tool, and map document (.mxd) files to facilitate viewing these examples.
Map documents can be imported into ArcGIS Pro. Most of the tools add the
outputs to the current map, and the LayerFiles folder contains layer files
(.lyr) to give these feature layers or raster layers symbology. Additional
color ramps are provided in a separate folder.
A detailed manual is provided as a PDF file in the Doc folder. The manual
includes background on the tools, an explanation of the files provided with
the tools, and a detailed explanation of each tool.
Tool documentation also is provided with the metadata for each tool. For
example, the Illuminated Contours tool includes several tool parameters,
and clicking on the info icon for each parameter brings up a short
description.
Clicking on the help icon in the upper-right corner of the tool dialog box
brings up a detailed description of the tool, which replicates the information
in the PDF document.
Finally, the script itself contains documentation in the form of comments.
The Terrain Tools are relatively sophisticated. Each of the 14 tools has an associated script of typically around 100 lines of code. Some of the
underlying algorithms are also advanced and rely on techniques published
by a community of researchers and cartographers. The user manual is 66
pages long. Despite this level of sophistication, the consistent
documentation and examples make the tools easy to use.
In addition, because all the original source code is provided, you can learn
from the code and modify it for your own purposes.
5.10 Creating a geoprocessing package
The approach for distributing shared tools as described so far is robust but
also cumbersome. It typically requires that you manually consolidate data,
tools, and supporting files into a single folder. As an alternative, ArcGIS
Pro uses geoprocessing packages, which are a more convenient way to
distribute all the tools and files related to geoprocessing workflows. This
section describes what a geoprocessing package is and how to create it.
A geoprocessing package is a single compressed file with a .gpkx
extension. This single file contains all the files necessary to run a
geoprocessing workflow, including custom tools, input datasets, and other
supporting files. This file can be posted online, emailed, or shared through a
local network. Although this file sounds identical to the use of a ZIP file, as
described earlier in this chapter, geoprocessing packages are created
differently and have additional functionality.
A geoprocessing package is created from one or more entries in the
geoprocessing history, which have been created by successfully running one
or more tools, including custom tools. A basic workflow to create and share
a geoprocessing package is as follows:
1. Add data and custom tools to a project in ArcGIS Pro.
2. Create a geoprocessing workflow by running one or more tools.
3. In the History pane, select one or more entries, right-click the
selection, and click Share As > Geoprocessing Package.
4. Create a .gpkx file by completing the entries in the Geoprocessing
Package pane, which includes several options to configure how the
package is created and shared.
5. Share the resulting .gpkx file.
An alternative to step 3 is to use the Share tab in ArcGIS Pro. Click Share >
Geoprocessing, and choose the tool(s) of interest. This step also brings up
the Geoprocessing Package pane.
Note: The Geoprocessing Package interface element is called a
“pane,” just like the Catalog pane. It is not a geoprocessing tool.
When creating a geoprocessing package, you not only are sharing the
tool(s), you also are sharing how each tool was run. These settings include
the following:
The parameter settings of each tool
The input and output data used by each tool
The environment settings in effect when running the tool
Any additional files you choose to add to the package
The Geoprocessing Package pane includes several details, which require
some careful consideration. For example, you can create a local .gpkx file,
or you can upload the package directly to your ArcGIS Online portal. You
also must provide a basic description of your package and tags when
sharing the package.
Checks also are performed to ensure individual tools have a minimum level
of documentation, including a description. In the Geoprocessing Package
pane, you can click the Analyze button to identify potential issues with the
package before it is created. The analysis of the package brings up a new
tab called Messages. This tab includes any warning or error messages.
The analysis checks, among other things, whether tool parameters have a description. If no documentation has been created, missing descriptions are reported as error messages.
Consider the result when using the same script tool without any
documentation, as shown in the figure.
Each tool parameter requires a minimum description. You cannot create a
geoprocessing package unless all parameters in all tools that are part of the
package have an item description. This requirement applies only to model
tools and script tools because all standard tools have this description
already. Once descriptions are filled in and any other errors are addressed,
you can create the geoprocessing package.
The Tools tab on the Geoprocessing Package pane allows you to add
additional tools from the geoprocessing history, whereas the Attachments
tab allows you to add other files, such as documentation in PDF format or
additional datasets that were not part of the tool execution.
The Geoprocessing Package pane has the look and feel of a tool, but it is
not a tool, and you will not find it when you search for it in the
Geoprocessing pane. As an alternative to using the Geoprocessing Package
pane to create a geoprocessing package, you can use the tools in the
Package Toolset in the Data Management Tools toolbox. The tools there
give you a finer degree of control over how geoprocessing packages are
created and how tools are shared. The Package Result tool allows you to
create a geoprocessing package on the basis of entries in the geoprocessing
history and includes several advanced options not available in the
Geoprocessing Package pane. On the other hand, the Geoprocessing
Package pane includes capabilities to analyze the tool(s) before sharing
them as a geoprocessing package.
The Package Result tool creates only a local .gpkx file. To share the
package to ArcGIS Online, you can use the Share Package tool. Note that
both the Geoprocessing Package pane and the Share Package tool
automatically recognize your ArcGIS Online credentials if you are using a
Named User license for ArcGIS Pro.
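The same packaging workflow can be scripted. The sketch below is a minimal, hedged example of calling the Package Result and Share Package tools from ArcPy; the result object, output path, credentials, summary, and tags are all placeholders, and the exact parameters accepted may vary by ArcGIS Pro version.

```python
def package_and_share(result, gpkx_path, username, password):
    """Sketch: package a geoprocessing result and share it to ArcGIS Online.

    Assumes ArcGIS Pro is installed. `result` is a Result object from a
    previously run tool; the path and credentials are placeholders.
    """
    import arcpy  # deferred so the sketch can be read without ArcGIS Pro

    # Package Result writes a local .gpkx file from a geoprocessing result.
    arcpy.management.PackageResult(result, gpkx_path)

    # Share Package uploads the .gpkx file to ArcGIS Online; summary and
    # tags are required when sharing, just as in the pane.
    arcpy.management.SharePackage(gpkx_path, username, password,
                                  "Example workflow", "analysis, example")
```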
A .gpkx file also can be shared using email, FTP, a USB drive, or other file-sharing mechanisms. A recipient of the geoprocessing package can open the
contents in ArcGIS Pro to examine the datasets and workflows used. Once
copied to a local folder, a .gpkx file shows up on the Project tab of the
Catalog pane.
A single .gpkx file contains all the resources needed to run the
geoprocessing workflow again, including tools, parameter settings, datasets,
and other files. Tools can include system tools as well as custom tools.
Therefore, if a geoprocessing result was created using a script tool, the
toolbox in which the script tool resides and the underlying .py files
necessary for the tool to run are all included in the geoprocessing package.
Other supporting files also can be part of a .gpkx file, including
documentation, text files, images, and so on.
To use the tools and data inside a geoprocessing package, right-click on the
.gpkx file in the Catalog pane, and click Add To Project. Data layers are
added to the open map, and the executed workflow is added to the
geoprocessing history. Adding the geoprocessing package makes it look as
if the workflow ran on your computer in the current ArcGIS Pro session,
even though the geoprocessing package was created on someone else’s
computer.
When a geoprocessing package is added to a project, the actual files are
extracted to a local folder: C:\Users\<Your
Name>\Documents\ArcGIS\Packages. A new folder is created on the basis
of the name of the package, and this folder contains all the original files,
plus several additional ones created as part of the package. If you download
a geoprocessing package and are looking for the underlying source code (in
the .py or .pyt files), you can locate the files in this folder.
An alternative to adding a geoprocessing package to a project is to use the
Extract Package geoprocessing tool, which allows you to extract the
contents of a geoprocessing package to a folder of your choice without
adding it to a project.
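On the receiving end, extraction can also be scripted. This is a minimal sketch of calling the Extract Package tool from ArcPy; both paths are placeholders.

```python
def extract_gpkx(gpkx_path, target_folder):
    """Sketch: extract a geoprocessing package to a folder of your choice,
    without adding it to a project. Paths are placeholders; assumes
    ArcGIS Pro is installed."""
    import arcpy  # deferred so the sketch can be read without ArcGIS Pro

    # Extract Package unpacks the .gpkx contents into the target folder.
    arcpy.management.ExtractPackage(gpkx_path, target_folder)
```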
The single greatest benefit of using geoprocessing packages is that all the
necessary resources are automatically combined in a single file, no matter
where they are located. There is no need to manually consolidate all the
resources into a single folder as required by the more traditional approach
using ZIP files.
Although the examples in this section used a custom toolbox with a script
tool, the same steps can be used to create a geoprocessing package for a tool
in a Python toolbox. In addition, if a custom toolbox or Python toolbox
contains more than one tool, the entire toolbox becomes part of the
geoprocessing package, even if only one of the tools was executed. This is
because tools do not exist separately from their Python toolbox. When
adding a geoprocessing package to a project, however, only the tools that
were executed before creating the geoprocessing package are added to the
geoprocessing history of the project.
5.11 Creating a web tool
An alternative to creating a geoprocessing package is to share a tool as a
web tool. This approach allows you to share a tool that can be accessed in
several different applications through your organization’s ArcGIS
Enterprise portal. Creating a web tool is similar to creating a geoprocessing
package because it also relies on entries in the geoprocessing history. In
other words, you run one or more tools in ArcGIS Pro, and then you create
a web tool that replicates this workflow.
After you successfully run one or more tools, navigate to the History pane,
right-click a tool, and click Share As > Web Tool. You can also use the
Share tab. Click Share > Web Tool, and choose the tool(s) of interest. The
Share As Web Tool pane opens. This pane is not a regular geoprocessing
tool, but a pane, like the Geoprocessing Package pane.
There are many details to consider here, and they require careful attention.
These details include several options under the General tab and the
Configuration tab. The Content tab allows you to add additional tools. Once
you provide the minimally required information, you can click the Analyze
button to perform several checks, similar to those performed when creating a
geoprocessing package.
Sharing web tools through your organization’s ArcGIS Enterprise portal
requires administrative or web tool publisher permissions. Typical users
may not be assigned these permissions. Because the tools recognize login
credentials for Named User licenses, users without permission may see the
error message as shown in the figure.
This issue cannot be addressed from within ArcGIS Pro but requires a
change to your permissions through an administrator of your ArcGIS
Online account.
Once a tool is shared as a web tool, it can be used by any user connected to
the ArcGIS Enterprise portal. A web tool runs on an ArcGIS server as a
geoprocessing service. Web tools can be used in a variety of ways,
including custom web apps built using the ArcGIS API for JavaScript or Web AppBuilder
in ArcGIS Enterprise. Web tools can be used in Python scripts by
referencing the URL of the web tool using ArcPy’s ImportToolbox()
function. Web tools also can be used within ArcGIS Pro. In the Catalog
pane, click on the Portal tab to search for the web tool of interest. You also
can connect directly to the URL of the web tool by clicking Insert >
Connections > New ArcGIS Server.
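Referencing a web tool from a script can be sketched as follows. This is a hedged example only: the service URL is a placeholder, and the names under which the service's tools become available depend on how the service was published.

```python
def use_web_tool(service_url):
    """Sketch: reference a web tool by URL using ArcPy's ImportToolbox().

    The URL is hypothetical. After importing, the tools in the service
    are exposed as functions on the arcpy module (names vary by service).
    """
    import arcpy  # deferred so the sketch can be read without ArcGIS Pro

    # ImportToolbox accepts the URL of a geoprocessing service, making
    # its web tools callable like any other geoprocessing tool.
    toolbox = arcpy.ImportToolbox(service_url)
    return toolbox
```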
A detailed discussion of web tools is beyond the scope of this book. You
can find details on how to author, publish, and use web tools on the ArcGIS
Pro help page “Share Analysis with Web Tools.”
Points to remember
The ArcGIS Pro geoprocessing framework is designed to facilitate
the sharing of tools. Script tools and Python toolboxes can be added
to a project and integrated into regular workflows. Toolboxes can
contain any number of tools, consisting of both model tools and script
tools. Script tools can be shared by distributing a toolbox file (.tbx)
and the accompanying Python scripts (.py), together with any other
resources needed to run the tools. Python toolboxes consist of a
single .pyt file and can be shared by distributing this .pyt file, plus
any other resources.
To ensure custom tools work properly, the resources needed to run the
tools should be made available in a standard folder structure. This
structure includes folders for scripts, data, and documentation. There
is no single required structure, but the organization of files is strongly
influenced by the complexity of the tool(s), as well as the number and
type of files being distributed with the tools.
Absolute paths work only when files are not moved and when folders
are not renamed. To share tools, relative paths should be enabled for
each script tool. Relative paths are relative to the current folder,
which, for script tools, is where the toolbox is located. Relative paths
cannot span multiple disk drives.
Custom tools can be documented by editing the metadata of the
toolbox and/or tool(s). Metadata includes a basic description, a
summary of what each tool does, and specific explanations of tool
parameters. Separate documentation also can be provided in the form
of a manual or user guide in Microsoft Word or PDF format.
Geoprocessing packages and web tools provide an alternative way to
distribute custom tools. A geoprocessing package is a single,
compressed file with a .gpkx extension that contains all the files
necessary to run a geoprocessing workflow, including custom tools,
input datasets, and other supporting files. A web tool is like a
geoprocessing package, but the tool is being shared to a portal as a
geoprocessing service. Both geoprocessing packages and web tools
are created by first running a custom tool in ArcGIS Pro, and then
sharing the entries in the geoprocessing history.
Key terms
absolute path
embedded
encrypt
geoprocessing package
relative path
root folder
scratch workspace
web tool
Review questions
What are some of the differences between script tools and Python
toolboxes in terms of how they are shared?
Describe a typical folder structure used for sharing custom tools and
the resources needed to run the tools.
Describe some of the ways in which tool documentation can be
created and where this information is stored.
What is a geoprocessing package, and what are the benefits of
sharing a geoprocessing package compared with sharing a custom
tool?
What are the steps to create a web tool from a custom tool?
Chapter 6
Managing Python packages and
environments
6.1 Introduction
One of the strengths of Python is that, in addition to the standard library of
built-in modules, a large collection of third-party packages exist to expand
its functionality. A large user community develops and supports these
packages. ArcGIS Pro is installed with many of the most important
packages that are used in GIS and spatial data analysis workflows. To
install, maintain, and keep track of these packages, ArcGIS Pro uses a
package manager called “conda.” In addition to managing Python packages,
conda also manages different Python environments, which allows you to
manage different collections of packages for different projects. This chapter
explains the use of packages and how conda is used in ArcGIS Pro to
manage environments and packages.
6.2 Modules, packages, and libraries
The core Python installation comes with many modules that are referred to
as “built-in modules.” There are about 200 of these built-in modules, and
they are available to you regardless of how Python is installed. Earlier
chapters used several of these modules, including math, os, random, and
time.
A complete list of all the built-in modules can be found in the Python
documentation in the section called “Python Module Index.” These
modules significantly add to the functionality of Python, reflecting the
“batteries included” philosophy of Python.
In addition to the built-in modules, Python functionality can be expanded
by using third-party libraries, more correctly referred to as packages. A bit
of clarification of the terminology is in order. A module in Python consists
of a single file with a .py extension. The name of the module is the same as
the name of the file without the .py extension. A package in Python is a
collection of modules under a common folder. The folder is created by
placing all the modules in a directory with a special __init__.py file. When
you import a module, you use the syntax import <module>. The same
syntax is used when importing a package—for example, import arcpy.
ArcPy is a package and consists of many modules and other elements, but
when you want to bring the functionality of ArcPy into a script, you treat
it like a module. Therefore, when you are importing modules into your
script, you are referring to both modules and packages using the import
statement. So, the terms module and package are often used interchangeably,
but they are not the same in terms of how their code is organized.
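The distinction can be demonstrated with a small, self-contained example. The package and module names here (demopkg, tools) are purely illustrative; the example builds a package on disk at runtime so it can be imported with the same syntax as a module.

```python
import os
import sys
import tempfile

# Build a tiny package on disk: a folder with an __init__.py file
# plus one module file, then import it like any other package.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demopkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # the presence of this file marks the folder as a package

with open(os.path.join(pkg_dir, "tools.py"), "w") as f:
    f.write("def double(x):\n    return x * 2\n")

sys.path.insert(0, root)       # make the package importable
import demopkg                 # same import syntax as a single module
from demopkg import tools      # a module that lives inside the package

print(tools.double(21))        # 42
```

The point of the sketch is that import demopkg and import math look identical to the caller, even though one names a folder of modules and the other a single module.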
In addition to the terms module and package, you also will see the term
library. In some programming languages, the term library has a specific
meaning, but this is not the case in Python. When used in the context of
Python, a library loosely refers to a collection of modules. The term
standard library in Python refers to the collection of modules and packages
that comes bundled with the core Python installation, including
the built-in modules. The term third-party library is used to refer to
components that can be added to Python beyond what is available in the
standard library. These components are typically in the form of packages,
and this term will be used here instead of the term libraries, which is much
looser in its meaning.
6.3 Python distributions
When you go to the Python home page to download and install the
software, you are getting Python, including the standard library. This
installation, however, is only one of many different Python distributions. A
Python distribution refers not only to the version of the software (e.g.,
3.6.9), but also to the operating system that it is intended for (e.g.,
Windows, Linux, Mac) and the packages that may have been added.
Different distributions target different audiences and usage scenarios.
ArcGIS Pro includes a custom distribution that installs the Python version
that works with ArcGIS Pro (e.g., version 3.6.9 for ArcGIS Pro 2.5) as well
as the relevant packages, including ArcPy. This is referred to as the
“ArcGIS Pro Python distribution.” This distribution includes a package
manager called conda, which the next section explains in more detail.
Conda is tightly coupled with the Anaconda distribution of Python.
Anaconda is a private company (formerly called Continuum Analytics) that
specializes in developing distributions for Python and R with a focus on
data science applications. Therefore, when you launch your Python IDE, the
first line of code reads:
Python 3.6.9 |Anaconda, Inc.|
The Anaconda distribution is a large distribution with over 1,500 packages
for Python and R. The ArcGIS Pro Python distribution does not use the
Anaconda distribution, but because it uses conda, the Anaconda name
appears.
Note: Although Anaconda is a private company, its focus is on the
development of open-source software. The Anaconda distribution is
free, and conda is open source. Anaconda provides additional services,
support, and training for a fee, but there is no requirement to use any of
these services to use conda or the Anaconda distribution.
6.4 Python package manager
Considering the many different packages that are available to Python, it is
important to be able to manage them effectively. Managing means being
able to add a package, update a package, remove a package, and check
which packages are installed. These tasks are carried out by a Python
package manager.
Python has a built-in package manager called PIP, which is available as a
module called pip. PIP is part of the core Python distribution and can be
used to perform package management tasks from the command line. For
example, to install a package, you use the following command:
pip install <name of package>
PIP is widely used by Python developers, but it can be cumbersome. One of
the challenges is the sheer volume of packages available. The Python
Package Index, or PyPI, is an online resource that contains tens of
thousands of different packages. PIP is designed to find the packages you
are looking for in PyPI and install them. However, sorting through which
packages you need, managing different versions of these packages, and
keeping track of which packages are installed can become challenging. In
addition, PIP primarily is intended to handle pure Python packages. A pure
Python package contains only Python code and does not include compiled
extensions or code in other languages. Many Python packages include
compiled extensions, which PIP does not handle well.
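A common pattern, sketched below, is to invoke PIP through the current interpreter with python -m pip, which guarantees the packages are managed for that specific Python installation rather than whichever pip happens to be first on the system path. The --version call is used here only as a safe demonstration.

```python
import subprocess
import sys

# Run PIP through the running interpreter so the correct installation
# is targeted; "--version" simply reports which pip would be used.
out = subprocess.run(
    [sys.executable, "-m", "pip", "--version"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # e.g., "pip 21.1 from ... (python 3.9)"
```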
As an alternative, the Anaconda distribution comes with its own package
manager called conda, which is the preferred way to manage packages for
ArcGIS Pro. Not only is conda used as a package manager, it also is used to
manage Python environments. The next section explains what Python
environments are, followed by more details on the use of conda.
Although PIP is a capable package manager, conda is the recommended
package manager for ArcGIS Pro. This recommendation comes, in part,
because conda can be used to manage both environments and packages, and
because conda handles packages with compiled extensions more reliably.
PIP can still be useful for installing packages that are not available
through conda. Conda also benefits from being part of the Anaconda
distribution, which is widely used in the Python community, and Esri has
created a user-friendly interface to conda that is integrated in the
ArcGIS Pro software.
6.5 Python environments
The ability to add packages to the core Python installation makes it possible
to accomplish more sophisticated tasks with your scripts. It also introduces
additional complexity because different projects or tasks may require
different packages. Many projects may require no additional packages at all,
whereas others may require a substantial number. Different projects may
require different versions of the same package. One specific package may
require another package for it to run. In other words, each of your scripts
may require a different set of packages to run successfully, and these
requirements can include packages beyond the ones you import directly into
your scripts. The package requirements for a specific project are referred to
as dependencies. To manage these dependencies, Python uses so-called
virtual environments. A virtual environment, in this context, is a unique
installation of Python and any packages that have been added. Instead of
installing a different version of Python with a specific set of packages on a
different computer, you can create many virtual environments on the same
computer. The environments are called “virtual” because they replicate
what a different installation on a different computer would look like but, in
fact, reside on a single computer. Instead of virtual environments, these
isolated configurations often are referred to as Python environments or
simply environments. Because ArcGIS Pro uses conda as the package
manager, you also will see the term “conda environments.” Note that
Python environments are not the same as geoprocessing environments in
ArcGIS Pro. These two types are completely unrelated, even though they
use the same term.
Note: Just to revisit the difference between a Python distribution and a
virtual environment, a distribution is the version of Python that is
installed on a computer and all the packages that it comes with. A
typical user often needs only a single distribution. A virtual environment
controls which packages are available at runtime, which is usually a
small subset of all the packages that are installed. A typical user must
have at least one virtual environment but often will use several different
ones, and these can be switched relatively easily, using the same
distribution.
ArcGIS Pro has a default environment. When you first install ArcGIS Pro
and start writing scripts in Python, you are using the default environment.
The default environment is called arcgispro-py3. This default environment
logically includes the Python standard library, but it also includes many
other commonly used packages. You can view the default environment by
going into the Python Package Manager. In ArcGIS Pro, click on the
Project tab, and then click Python, which brings up the Python Package
Manager, also referred to as “Python backstage.”
The Python Package Manager provides a user interface to conda, even
though there is no direct reference to conda here. This interface was
developed as part of ArcGIS Pro to make it easier to use conda. Section 6.8
explores using conda from the command line as an alternative to the
Python Package Manager.
The project environment chosen by default is called arcgispro-py3. This
name is a reference to the folder in which the files for this environment are
installed, and it is typically located here: C:\Program
Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3. As an aside, the files for
ArcPy are in a slightly different location: C:\Program
Files\ArcGIS\Pro\Resources\ArcPy.
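A quick way to confirm which environment a script is actually running in is to inspect the interpreter's own paths. The sketch below uses only the standard library; the arcgispro-py3 path shown in the comment is for illustration and depends on where ArcGIS Pro is installed.

```python
import sys

# The interpreter path reveals which environment a script runs in; for
# the ArcGIS Pro default environment it would typically end in
# ...\envs\arcgispro-py3\python.exe (illustrative path).
print(sys.executable)  # full path to the running interpreter
print(sys.prefix)      # root folder of the active environment
```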
The Python Package Manager shows which packages are installed with the
default environment. The default environment includes all the packages
needed to support Python-related functionality in ArcGIS Pro, as well as
some other packages to support typical GIS workflows. You will not see
ArcPy on this list. Because you are using ArcGIS Pro, ArcPy is installed by
default, and this installation cannot be modified. You will also not see any
of the modules from the standard library, such as math or os. What you do
see on the list are widely used third-party packages such as NumPy and
SciPy. You can scroll through the list of packages, and click on an entry to
read a description.
The side panel includes the version number of the package, as well as a link
to the home page of the package. Notice that the Uninstall button is dimmed
because you cannot remove the package from the default environment.
The list of packages can be broken down into several major categories,
including the following:
Jupyter Notebooks and tools necessary to support them (IPython,
Jupyter console, JupyterLab, nbconvert for notebook conversion)
data analysis (Pandas, openpyxl to work with Excel files)
handling dates (pytz, python-dateutil)
visualization (Matplotlib)
handling web data (urllib3, Requests)
scientific data (h5py to work with HDF5 data, netCDF4 to work with
netCDF data, NumPy to work with arrays)
scientific routines and statistics (SciPy)
general Python utilities (future, pip)
Note: Package names in Python code always use lowercase, but the display
names of the packages sometimes use uppercase characters. For example, the
numpy package is commonly referred to as NumPy, and the requests
module is referred to as Requests.
When a package is installed as part of the default environment, you can use
the package immediately in your scripts. For example, you can run import
numpy, and it will not result in an error.
The 100 or so packages that are installed with the default environment are
only a small subset of all the potentially available packages in the ArcGIS
Pro Python distribution. To view the rest of the packages, click Add
Packages in the Python Package Manager. You can scroll through the list of
packages, or search for a specific one by name. As an example, one of these
packages is named scikit-learn.
The scikit-learn package includes many different algorithms for machine
learning. This package is not included in the default environment. Notice
that the Install button is dimmed because it cannot be added to the default
environment. Therefore, when you run import sklearn, it results in an
error. Even though the package is referred to as “scikit-learn,” it is called
sklearn when importing it.
Notice how the error is No module named 'sklearn'. To use this package, it
first must be added to the environment. Adding a package requires a new
environment separate from the default environment.
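A script can also check for a package's availability before importing it, which avoids the ModuleNotFoundError entirely. This is a minimal standard-library sketch; the helper name is illustrative.

```python
import importlib.util

def is_available(package_name):
    """Return True if a package can be imported in the active environment."""
    return importlib.util.find_spec(package_name) is not None

# Standard-library modules are always present; sklearn is only present
# if scikit-learn has been added to the active environment.
print(is_available("math"))     # True
print(is_available("sklearn"))  # True only after installing scikit-learn
```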
6.6 Manage environments using conda
The default environment arcgispro-py3 cannot be modified. The
environment is kept in a pristine state. By keeping the default environment
pristine, you can switch back to the default if a certain environment no
longer works.
As a result, if you want to add a package for use in your scripts, you first
must create a new environment. You can create one using the Python
Package Manager. Click the Manage Environments button.
In the Manage Environments dialog box, you can add previously created
environments, clone the default environment, and remove environments
you no longer need. If you have not previously worked with environments,
the only environment listed is the default arcgispro-py3 environment.
You can make a copy of the default environment by clicking the Clone
Default button at the top or the Clone icon to the right of the default
environment. When you click Clone Default, the name and path of the
environment are chosen for you. When you click the Clone icon to the right
of the default environment, the Clone Environment dialog box appears. The
name and path are filled in with their default names, but you can change
these names if you want.
The default location for new and cloned environments is as follows:
%LocalAppData%\ESRI\conda\envs
The notation %LocalAppData% is used to indicate a general location in the
user profile. For a typical user, this location corresponds to the following
folder on the local computer:
C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs
After you confirm the name and path of the new environment, click Clone
to proceed. The new environment is added to the list of environments in the
Manage Environments dialog box. The installation may take several
minutes as the packages are being copied.
Once you create a new environment, it must be activated before it can be
used. In the Manage Environments dialog box, choose the environment you
want, and click OK. When you choose a different environment, you must
restart ArcGIS Pro before the new environment changes take effect. There
is a warning message at the bottom of the Manage Environments dialog box
to remind you to restart.
You also can remove an environment in the Manage Environments dialog
box by clicking the Remove button to the right of a specific environment.
You cannot remove the default environment. You also cannot remove the
active environment of the current project, so to remove an environment, you
must first activate a different environment.
The Manage Environments dialog box shows all the environments that
reside in two well-known locations. The first location is where the default
arcgispro-py3 environment is located:
C:\Program Files\ArcGIS\Pro\bin\Python\envs
The second location is where new environments are typically located:
C:\Users\<Your Name>\AppData\Local\ESRI\conda\envs
If you have created an environment in a different location, it can be added
using the Add button in the Manage Environments dialog box.
Cloning an environment using the Python Package Manager may not
always work. If cloning fails, an error message appears below the
environment where normally the location is shown.
Hovering with your cursor over the exclamation symbol brings up the
specific conda error. When these errors persist, it is recommended to use
conda from the command line as explained later in this chapter.
6.7 Manage packages using conda
The Python Package Manager in ArcGIS Pro makes it easy to add a
package to a specific environment. To add a package to the active
environment, click the Add Packages button in the Python Package
Manager. You can scroll to the package of interest, or search for it by name.
Consider the example of the scikit-learn package used earlier.
When the active environment can be modified, the Install button can be
clicked. Once you click Install, you are prompted with the Install Package
dialog box to confirm the installation. In the case of scikit-learn, the
package depends on several other packages, which are listed. This list of
packages is an example of the dependencies discussed earlier.
Note: The specific packages that must be installed or updated because
of dependencies will vary over time as new versions are released.
Confirm that you agree to the terms and conditions, and then click Install.
For some packages, the installation can take a considerable amount of time
(i.e., several minutes or more), especially if there are many dependencies.
Not all packages have dependencies, and as a result, the Install Package
dialog box may show only the license agreement.
Once the new package is installed, it can be used in the Python window. In
the case of scikit-learn, following the installation of the package, you can
run import sklearn in the Python window.
By default, the most recent version of a package is shown under Add
Packages. Certain projects, however, may require a specific version of a
package, so there is a drop-down option under the Versions column to
choose an older version. Only those versions compatible with the current
version of Python are shown.
Packages may become out of date over time. Click the Update Packages
button in the Python Package Manager to see a list of packages installed
with the active environment, and review which updates are available. You
can choose a specific package to update, or you can click the Update All
button to update all the packages for the active environment. You cannot
update packages for the default environment.
Packages can be removed from the active environment by clicking the
Installed Packages button and viewing the list of installed packages,
choosing a package from the list, and clicking the Uninstall button. It is
possible to remove packages, including those required by ArcGIS Pro, so
you must be careful with removing packages. For example, it is possible to
remove the Python package, which would essentially make your
environment useless. If this happened, you would see an error message in
the Python window such as that shown in the figure.
Typically, it is difficult, if not impossible, to fix these issues. Instead of
trying to fix the active environment, switch back to the default arcgispro-py3 environment and proceed with removing the environment that went
bad. Then start over by cloning the default environment. This is one reason
why you cannot remove packages from the default environment. Keeping
the default environment pristine prevents having to reinstall ArcGIS Pro
software because of a bad environment.
6.8 Using conda from the command line
The Python Package Manager provides a user-friendly way to manage
environments and packages. It is important to realize, however, that this is
simply a user interface developed as part of ArcGIS Pro to run the conda
package manager. Sometimes you may want to use conda directly from the
command line. Some Python developers prefer working from the command line,
and it is good to know how to use it.
The full command reference can be found in the online conda
documentation. What follows are some of the key commands that are used
to manage environments and packages for ArcGIS Pro.
To start the command prompt in Windows, search for the application called
Python Command Prompt. This application resides in the ArcGIS program
group and brings up the command prompt using the default environment.
A few notes about what the command prompt shows are in order. First, the
portion (arcgispro-py3) shows the active environment, which controls
which version of Python is running and which packages are available.
There is no reference to the current version of Python because it is implicit
in the environment being used. Second, the portion C:\Program
Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3 shows where the current
environment is located as a starting point, but this path can easily be
changed. For example, the cd\ command jumps back to the root, and cd
<dir> changes to an existing directory.
Note: This interface uses the regular Windows command prompt
initialized with the arcgispro-py3 environment. When you enter
commands here, you are not using Python, but you are using DOS
commands. Because you are working with a virtual environment using
arcgispro-py3, other commands can be used that are not part of the
DOS commands, including those for conda.
To work with conda, you don’t need to worry about the folder because the
conda commands work regardless of which folder is being used. This
flexibility is one of the benefits of working with a conda environment.
You can now type your conda commands. It is helpful to start with listing
all the environments that are available using the info command in conda.
Type the following at the prompt, and press Enter:
conda info --envs
In this example, info is the conda command, and --envs is a named
argument. This command is similar to using a function with an argument,
although the syntax is a little different. Unless you have already created
new environments, this command will show only the default environment
arcgispro-py3 located in C:\Program Files\ArcGIS\Pro\bin\Python\envs.
You can use the create command in conda to clone the default
environment, as follows:
conda create --clone arcgispro-py3 --name testing-env
In this example, create is the conda command, and --clone and --name are
the named arguments. The --clone argument indicates which environment
must be cloned, and the --name argument gives the new environment a
name. There is no need to specify the location of the default environment. If
you don’t specify the location of the new environment to be created, it will
be created in the default location C:\Users\<Your
Name>\AppData\Local\ESRI\conda\envs. The name of the new
environment can be anything of your choosing, but it should not contain
any spaces.
When the command is executed, you will receive a series of messages,
including a possible warning message that your path contains spaces. Once
the execution is finished, you can navigate to C:\Users\<Your
Name>\AppData\Local\ESRI\conda\envs to confirm that a new folder with
the name of your environment has been created. When you start ArcGIS
Pro and navigate to the Python Package Manager, you will see a new entry
in the Manage Environments dialog box.
You also can clone an existing environment other than the default by
specifying the name, as follows:
conda create --clone testing-env --name testing2-env
Again, there is no need to provide the path for the environments when you
are using the default location C:\Users\<Your
Name>\AppData\Local\ESRI\conda\envs.
The conda remove command can be used to remove an existing
environment:
conda remove --name testing2-env --all
The conda command remove also can remove specific packages or features,
but the --all argument indicates that the entire environment is to be
removed—i.e., all packages.
You cannot remove the default arcgispro-py3 environment.
Once an environment is created, it must be activated. To activate the
environment and make it the default for future sessions of ArcGIS Pro, you
can use the proswap command:
proswap testing-env
This command returns a message that the active ArcGIS Pro environment is
changed to your new environment. The proswap command is not part of the
standard conda commands but is unique to the ArcGIS Pro installation.
Notice how the proswap command is not preceded by conda. You can test
the result by launching ArcGIS Pro and checking the active environment in
the Python Package Manager.
You can use the same command to swap back to arcgispro-py3:
proswap arcgispro-py3
Using proswap does not change the environment of the command prompt
because it changes only the default for future sessions of the ArcGIS Pro
application. You can activate an environment for the current command
prompt session using the activate command:
activate testing-env
Note that the activate command is not preceded by conda. The result is that
the next line in the command prompt starts with the name of the new active
environment.
The new active environment is for only the current command prompt
session, and it has no effect on ArcGIS Pro.
Managing environments (and their packages) using conda in the command
prompt can be carried out while ArcGIS Pro is running because it does not
impact the current session of ArcGIS Pro and the environment being used.
For any changes to take effect, you must restart ArcGIS Pro. It is
recommended that you don’t make changes in the command prompt using
conda to an environment that is being used in the current session of ArcGIS
Pro.
In addition to managing environments, you can use conda in the command prompt to manage packages. First, you must make sure that the proper environment is activated for the current command prompt session:
activate testing-env
Then you can use the install command in conda to add a specific package to the active environment. For example, here is the code to add the scikit-learn package:
conda install scikit-learn
This command prompts you with a confirmation message about what will
be installed and updated. Note that the name of the package to be installed
is scikit-learn, which is not the same as how the package is referenced when
importing it into a script (i.e., sklearn). Trying to use conda install sklearn results in a message that no package by that name can be located. For many packages, however, these names are identical.
Type y (for yes) to proceed with the installation. Once the installation is
completed, you can confirm the addition of the package by using the list
command in conda:
conda list
This command produces the same list you see in the Python Package
Manager for the active environment. When installing a new package, the
version number is not required, and by default, the most recent, compatible
version is installed. The version number is needed only when you are
installing an earlier version of the package. For example, the most recent
version of scikit-learn at the time of writing is version 0.21.3. To install an
earlier version, you can specify the version number, as follows:
conda install scikit-learn=0.20.3
Note that there are no spaces around the equal sign. This is not Python code, and the use of spaces would result in an error.
Note: The ArcGIS Pro Python distribution includes the most recent
versions of the packages that are compatible with the Python
environment. Newer versions of the packages may already be released
but have not been tested for compatibility. For example, the most recent
version of the scikit-learn package for ArcGIS Pro 2.5 is version 0.21.3,
but the most recent release of scikit-learn is version 0.22.2 at the time of
writing. You should not try to install more recent versions of the
package than those that are part of the ArcGIS Pro Python distribution
to avoid potential conflicts.
You can remove a package in a similar manner using the uninstall
command in conda:
conda uninstall scikit-learn
There is no need to include a version number here, regardless of which
version was installed, because there can be only one version of a package in
an environment.
It is not uncommon for Python packages to depend on each other. For
example, the scikit-learn package works correctly only if several other
packages are also installed in the same environment. In addition, some
packages may need to be updated to work with the newly added packages.
These dependencies of packages on each other are one of the main reasons
to use a package manager such as conda because it keeps track of the
dependencies and manages all the packages at the same time for a given
environment.
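As a recap, the commands covered in this section form a single workflow that can be run at the Python Command Prompt, from listing environments to making a clone the default for ArcGIS Pro. The environment name testing-env is arbitrary; replace it with a name of your choosing:

```shell
:: List the available environments
conda info --envs

:: Clone the default environment
conda create --clone arcgispro-py3 --name testing-env

:: Activate the clone for the current command prompt session
activate testing-env

:: Install a package into the active environment and verify the result
conda install scikit-learn
conda list

:: Make the new environment the default for future ArcGIS Pro sessions
proswap testing-env
```

Remember that changes made here take effect in ArcGIS Pro only after the application is restarted.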
6.9 Environment and IDEs
When you create and activate a new environment for ArcGIS Pro, the
current session uses this new environment. As a result, the packages you
added are available immediately in the Python window. When using an IDE
such as PyCharm or Spyder, you must configure your IDE to use the
specific environment. This is similar to setting up your IDE to use the
default environment, but you must point to the location of the new
environment. Both IDLE and Spyder require a separate application for each
environment, whereas a single installation of PyCharm can be used for any
number of environments.
Points to remember
The core Python installation comes with about 200 built-in modules.
You can import these modules into a script to make their functionality
available.
In addition to the built-in modules, the functionality of Python can be
expanded by using third-party packages. These packages are
organized in the online Python Package Index, or PyPI, which
contains numerous packages.
Python is installed as part of the standard ArcGIS Pro installation.
ArcGIS Pro uses a specific Python distribution, which includes some
of the most important packages used in GIS and data analysis
workflows.
To install, maintain, and keep track of packages, ArcGIS Pro uses a
package manager called conda. You can work with conda using
command line, but the Python Package Manager in ArcGIS Pro also
provides a user interface to some of the most important functionality
of conda.
In addition to managing Python packages, conda is used to manage
different Python environments, which allows you to create and use
different collections of packages for different projects.
The default environment in ArcGIS Pro is called arcgispro-py3,
which includes more than 100 packages, including many that are
required for Python-related tasks in ArcGIS Pro. This default
environment cannot be modified. To install additional packages, you
can use conda to clone the default environment, and then make
changes to this cloned environment.
Because ArcGIS Pro can use more than one environment, it is
important to configure your Python IDE to use a specific
environment that includes the necessary packages for a given task.
Key terms
Anaconda distribution
conda
dependency
distribution
environment
library
module
package
package manager
Python Package Manager
standard library
third-party library
virtual environment
Review questions
What is a Python package?
What is the name of the default Python environment when running
ArcGIS Pro?
What are dependencies in Python?
Why are Python environments referred to as “virtual environments”?
Describe the process to create a new Python environment for ArcGIS
Pro and install packages using the Python Package Manager or conda
command line.
What steps are necessary to use a different Python environment for
ArcGIS Pro in IDLE, Spyder, and PyCharm?
Chapter 7
Essential Python modules and
packages for geoprocessing
7.1 Introduction
This chapter looks at some of the many modules and packages that
commonly are used to support GIS workflows using Python. There are
many thousands of different Python modules and packages, reflecting the
popularity and versatility of Python. The focus here is on a select few that
complement the functionality of ArcPy to support geoprocessing
workflows. They include a handful of Python’s built-in modules, which
have not been covered in earlier chapters, as well as several third-party
packages.
The Python standard library includes around 200 built-in modules. A
complete list of all the modules can be found in the Python Module Index of
the online Python documentation, including fileinput, math, os, random, sys, and time. These modules are part of any Python installation. You can
use the functionality of these modules directly by importing the module—
e.g., import os. This chapter looks at a few additional built-in modules to
carry out specialized tasks. When reviewing the modules in the Python
Module Index, be aware that some modules are deprecated, which means
they remain in the software but have been replaced with better alternatives.
They are maintained to improve backward compatibility but should not be
used for new projects.
In addition, the Python Package Index (PyPI) is an online repository of
more than 100,000 packages that can be added to your Python installation.
The best way to manage these packages is through conda. ArcGIS Pro
installs with the ArcGIS Pro Python distribution, which includes a small
subset of all the packages available in PyPI. Only a subset of those
packages are part of the default environment arcgispro-py3 when ArcGIS
Pro is installed. Additional packages can be added to a cloned environment
by using conda. If you are interested in using a specific Python package,
first check if it is part of the default environment. If it is, you can proceed
using the default environment. If not, review the steps in chapter 6 on
creating a new conda environment and adding a package. All the packages
in this chapter are part of the default environment.
Table 7.1 summarizes the modules and packages covered in this chapter.
The type is indicated as “standard library,” which means it is part of the
standard Python installation, or arcgispro-py3, which means the package
has been added to the default environment in ArcGIS Pro. The distinction is
important because when you review the installed packages using the Python
Package Manager in ArcGIS, you will not see the standard modules listed,
but you can import them into your script.
Table 7.1. Standard modules and third-party packages covered in this chapter

Task                     Module or package   Type
Working with FTP         ftplib              standard library
ZIP files                zipfile             standard library
XML files                xml.dom             standard library
Working with web pages   urllib.request      standard library
CSV files                csv                 standard library
Excel files              openpyxl            arcgispro-py3
JSON files               json                standard library
NumPy arrays             numpy               arcgispro-py3
Pandas data frames       pandas              arcgispro-py3
Plotting 2D graphs       matplotlib          arcgispro-py3
There are many other packages of potential interest, but the table covers
some of the most widely used ones.
7.2 Working with FTP using ftplib
One of the earliest protocols to transfer files between computers is file
transfer protocol (FTP). FTP is designed for transfers between a client and
a server. FTP was in widespread use before other protocols, such as HTTP,
became popular. FTP has several security weaknesses, but it continues to be
used by many organizations to share their data publicly. One of the
advantages of FTP is that it allows you to transfer many files and folders,
and it maintains the folder structure. Many GIS portals no longer use FTP,
but being able to work with FTP continues to be an important skill.
Most FTP sites can be accessed using a regular web browser. For example,
the figure illustrates the FTP site ftp.hillsboroughcounty.org. The URL
starts with ftp:// instead of http://, used for websites.
A typical FTP site contains various folders or directories. Using a web
browser, you can click on these folders to navigate the directory structure
and locate the files of interest. For the example FTP site, many of the GIS
datasets of interest are located in the directory
ftp://ftp.hillsboroughcounty.org/gis/pub/corporate_data/.
Using a web browser, you can click on these files and download them, one
by one, to a local computer. In addition to using a web browser, you also
can use FTP client software to transfer files to and from an FTP site.
Popular applications include FileZilla and SmartFTP. FTP client software
makes it easier to transfer entire folders between a local computer and a
server.
Working with FTP sites using Python is accomplished using the ftplib
module, which is part of the standard library. A common scenario is to use a
script to download one or more specific files, or all the files in a folder. FTP
sites often are used to post frequent updates, and you can run the same
script repeatedly to obtain those updates.
Downloading files using FTP in Python requires a few steps, as follows: (1)
establishing a connection to an FTP site, (2) logging into the FTP site, (3)
navigating to a specific folder, and (4) retrieving the file(s) of interest. The
first two steps can be accomplished using the following lines of code:
import ftplib
server = "ftp.hillsboroughcounty.org"
ftp = ftplib.FTP(server)
ftp.login()
After importing the ftplib module, an FTP object is created by specifying
the FTP address. Note that the address does not include ftp:// because it is
implicit in working with FTP sites. The code example uses an anonymous
login. If a user name and password are needed, they are provided as the
second and third parameters of the FTP object in the form of strings, as
follows:
ftp = ftplib.FTP(server, "username", "password")
Once you log into an FTP server, you can start exploring its contents and
navigating through the folders. For example, you can use the dir() method
to examine the contents of the current directory:
ftp.dir()
This code prints a list of the folders and files in the root of the FTP site.
The root typically does not contain the files of interest. The next step,
therefore, is to navigate to the folder of interest by using the cwd() method
and providing the subfolder as a string:
ftp.cwd("gis/pub/corporate_data")
Once inside the correct folder, you can list all the files inside a directory
using the nlst() method:
ftp.nlst()
The files are returned as a list:
['Acq_elapp_1000_Buffer.zip', 'Airports.zip', ...]
As an alternative, the mlsd() method can be used to list all the files. This method provides more control (including being able to better separate folders from files), but not all FTP sites support this method.
Next, the specific file of interest must be specified. The retrbinary()
method of the FTP object obtains the file, but to save the file, a local copy of
the file first must be opened, and then written when it is retrieved, as
follows:
filename = "Airports.zip"
localfile = open(filename, "wb")
ftp.retrbinary("RETR " + filename, localfile.write)
The argument "wb" means you are writing the file contents in binary mode.
The first parameter of the retrbinary() method consists of a string starting
with RETR followed by a space and the name of the file of interest. The
second parameter writes the file locally. The retrbinary() method is used
for binary file transfer, which is appropriate for most files. To work with
plain text files, use retrlines() instead.
To finish the script, the local copy of the file must be closed, and you must
disconnect from the server, as follows:
localfile.close()
ftp.quit()
Finally, you must be aware of where the downloaded files end up. By
default, they are saved to the current working directory of the script, which
is where the script itself is located. Depending on your IDE, however, they
may end up in a different location. To control where the files are saved,
change the working directory in the script, as follows:
import os
os.chdir(<yourworkspace>)
The complete script now is as follows:
import ftplib
import os
os.chdir("C:/Demo/Downloads")
server = "ftp.hillsboroughcounty.org"
ftp = ftplib.FTP(server)
ftp.login()
ftp.cwd("gis/pub/corporate_data")
filename = "Airports.zip"
localfile = open(filename, "wb")
ftp.retrbinary("RETR " + filename, localfile.write)
localfile.close()
ftp.quit()
This script downloads only a single file. To download more files, create a
list of the files of interest, and then iterate over this list in a for loop. In the
example script, only lines 8–11 must be changed. It also is possible to
download all the files to a folder using the nlst() method to generate a list
of all the files. Before doing so, you may need to check the contents of a
folder in terms of the total number of files and their file size because FTP
sites often are used to host large datasets.
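The loop version can be sketched as a small helper function. The function name download_files and the file names in the usage note are illustrative, not from the chapter; the download logic itself mirrors the single-file script above:

```python
import os

def download_files(ftp, filenames, folder):
    """Download each named file from the FTP server's current
    directory into a local folder, using binary transfer mode."""
    for name in filenames:
        localpath = os.path.join(folder, name)
        with open(localpath, "wb") as localfile:
            # RETR is the FTP command to retrieve a file
            ftp.retrbinary("RETR " + name, localfile.write)

# Usage with the chapter's example server (file names are hypothetical):
# import ftplib
# ftp = ftplib.FTP("ftp.hillsboroughcounty.org")
# ftp.login()
# ftp.cwd("gis/pub/corporate_data")
# download_files(ftp, ["Airports.zip", "Parks.zip"], "C:/Demo/Downloads")
# ftp.quit()
```

Using a with statement to open each local file ensures the file is closed even if the transfer fails partway.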
For those with administrative privileges, the ftplib module provides
additional functionality to create a new folder using mkd(), delete a file
using delete(), and rename a file using rename(), among other tasks. You
also can upload files using the storbinary() method for binary files and storlines() for plain text files.
The file type used in the example consists of ZIP files, which are common
in GIS. You can use ftplib to transfer any file type, not only ZIP files. The
next section looks at how to work with ZIP files.
7.3 Working with ZIP files using zipfile
Many GIS datasets are large and consist of many individual files. Consider
a shapefile that includes several files with the same name and different file
extensions (e.g., .shp, .shx, .dbf, .prj, and so on). Or consider a file
geodatabase, which consists of a separate folder with numerous files.
Transferring individual files is cumbersome and may corrupt the folder
structure. A widely used approach to facilitate file transfer is to use ZIP
files.
ZIP files were originally developed as a format to support lossless data
compression, which reduces the size of a file without sacrificing the quality
of the data. In addition, ZIP files make it possible to combine many files,
including the underlying folder structure, into a single file. This ability
makes ZIP files a preferred format to package GIS datasets and supporting
files into a single file for the purpose of transferring. Transferring files is
not limited to FTP, and ZIP files can be shared by email, HTTP, and other
mechanisms.
A ZIP file generally uses the .zip file extension. Operating systems
recognize this file type and have built-in tools to create and extract ZIP
files. Many utilities also work with ZIP files, including WinZIP and 7-Zip
on the Windows platform, which provide additional control over the process
of creating and extracting ZIP files. There are several other formats, which
have similar compression and archiving abilities, including .7z, .dmg, .gz,
and .tar. For the purpose of this section, the focus is on ZIP files only, but
similar steps can be accomplished using these other formats.
Working with ZIP files using Python is accomplished using the zipfile
module, which is part of the standard library.
Note: Python also has a built-in zip() function, which works with
iterators and has nothing to do with ZIP files. Make sure to use the
zipfile module when working with ZIP files.
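To see the difference, the built-in zip() function pairs up items from iterables and has nothing to do with ZIP archives:

```python
# Built-in zip() combines iterables element by element
pairs = list(zip(["a", "b"], [1, 2]))
print(pairs)   # [('a', 1), ('b', 2)]
```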
The main class of the zipfile module is ZipFile. In a typical script, you
create a ZipFile object by pointing to an existing ZIP file or by creating a
new one, and then you carry out specific tasks using the methods of this
object. Creating a ZipFile object by pointing to an existing ZIP file works
as follows:
import zipfile
zfile = open("C:/Demo/test.zip", "rb")
myzip = zipfile.ZipFile(zfile)
The argument "rb" means you are reading the file contents in binary mode.
You can check the contents of the ZIP file using the namelist() method, as
follows:
for file in myzip.namelist():
    print(file)
You can use the same method to iterate over all the files in the ZIP archive
and extract them one by one:
for file in myzip.namelist():
    out = open(file, "wb")
    out.write(myzip.read(file))
    out.close()
This code extracts each file to the current working directory, which by
default is the location of the script. You can change this directory using
os.chdir(). As an alternative, you can specify a path when saving each
local file:
out = open("C:/Temp/" + file, "wb")
In most cases, however, you don’t need to iterate over the files in a ZIP file
one by one because you simply want to extract all files, which can be done
more easily using the extractall() method. The entire script is shown for
clarity, as follows:
import zipfile
zfile = open("C:/Demo/test.zip", "rb")
myzip = zipfile.ZipFile(zfile)
myzip.extractall()
You can specify a folder to extract the files to, using the following:
myzip.extractall("C:/Temp")
You can use the zipfile module to create a new ZIP archive for one or
more files. You create a ZipFile object by specifying a new .zip file that
does not exist yet and using the "w" argument to have write access. You also
can specify the compression type. If the compression type is left out, the
default ZIP_STORED is used. The code to add a single file to the ZIP
archive is as follows:
import zipfile
zfile = zipfile.ZipFile("mytiff.zip", "w", zipfile.ZIP_DEFLATED)
zfile.write("landcover.tif")
zfile.close()
The same approach can be used to iterate over a list of files in a directory.
The script iterates over the list and adds each file to the same ZIP archive
using the write() method. For example, the following script creates a list of
all the files in a folder using os.listdir, and then adds each file if they use
a particular file extension:
import os
import zipfile
zfile = zipfile.ZipFile("shapefiles.zip", "w")
files = os.listdir("C:/Project")
for f in files:
    if f.endswith(("shp", "dbf", "shx")):
        zfile.write("C:/Project/" + f)
zfile.close()
Note that most shapefiles include additional file extensions. A more robust
approach is to use an ArcPy function such as ListFeatureClasses() to
create a list of all shapefiles in a folder, and then add all the files with the
same base name, regardless of their file extension.
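Without ArcPy, the same idea can be sketched in pure Python: take the base name of the .shp file and add every sibling file that shares it, whatever the extension. The function name zip_shapefile and the paths in the usage note are invented for illustration:

```python
import glob
import os
import zipfile

def zip_shapefile(shp_path, zip_path):
    """Add every file sharing the shapefile's base name to a new ZIP archive."""
    base = os.path.splitext(shp_path)[0]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in glob.glob(base + ".*"):
            # Skip the archive itself in case it shares the base name
            if os.path.abspath(f) == os.path.abspath(zip_path):
                continue
            # arcname stores only the file name, not the full path
            zf.write(f, arcname=os.path.basename(f))

# Usage (hypothetical paths):
# zip_shapefile("C:/Project/roads.shp", "C:/Project/roads.zip")
```

Passing arcname avoids storing the full local path inside the archive, so the files extract cleanly on another machine.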
It also is possible to work with entire folders and their contents. To do so,
use os.walk() to create a list of the paths of all the files inside a folder. This
approach preserves the folder structure as well. The following script creates
one ZIP archive of the entire contents of one specific folder, including all
subfolders:
import os
import zipfile
mydir = "C:/Demo/Project"
zfile = zipfile.ZipFile("newzip.zip", "w")
for root, dirs, files in os.walk(mydir):
    for file in files:
        filepath = os.path.join(root, file)
        zfile.write(filepath)
zfile.close()
This approach can be used to create a ZIP archive for a folder that contains
one or more file geodatabases, because, from a file management
perspective, a file geodatabase is a folder with many files. It is impractical
to work with those individual files so you can add the entire folder to a ZIP
archive instead.
7.4 Working with XML files using xml.dom
Extensible Markup Language (XML) is a common file format used for a
variety of applications. XML is used to structure, store, and transfer data
between different systems, including GIS. XML is comparable to HTML,
but HTML is used mostly for display purposes, and XML is used mostly to
store data. In its simplest form, XML consists of text with carefully placed
tags. These tags make it possible to identify specific elements within the
XML file. Formats such as KML (used in Google Earth) and GPX (used by
GPS devices) are specialized versions of XML. Microsoft Office documents
(e.g., .docx, .xlsx) are also XML-based.
XML files use tags, but they are organized in a hierarchical structure
referred to as a tree structure. This structure consists of a root element, child
elements, and element attributes. The root element is the parent to all other
elements. Elements in XML also are called nodes. The basic structure of
XML is as follows:
<root>
  <child>
    <subchild> … </subchild>
  </child>
</root>
Notice how the tags come in pairs—e.g., <child> and </child>. The first of
these tags is the starting tag, and the second one that uses a forward slash (/)
is the ending tag. The example tags illustrate the tree structure, but when
opened as text, the XML looks as follows:
<root> <child> <subchild> … </subchild> </child> </root>
Because of the tree structure, tags appear “nested,” which makes it
cumbersome to use simple string manipulation to work with XML files. It is
possible to read an XML file as text and use the tags to find the information
you are looking for. However, this approach is prone to errors, and therefore
specialized Python modules are used to facilitate the process of reading
XML files. To read the contents of the XML file, the document first must be
parsed. Parsing an XML file breaks down the file into its tree structure.
Once the tree structure is created, you can navigate up and down this
structure to locate the elements or nodes of interest.
Python has several built-in modules to work with XML files, including
xml.dom, xml.sax, and xml.etree (called ElementTree). Third-party
packages for working with XML files are available, including Beautiful
Soup. Each of these modules and packages have their strengths and
weaknesses.
Consider the following example of a simple KML file, shown as simple
text:
<Placemark>\n <TimeStamp>\n <when>2020-01-14T21:05:02Z</when>\n </TimeStamp>\n <styleUrl>#paddle-a</styleUrl>\n <Point>\n <coordinates>-122.536226,37.86047,0</coordinates>\n </Point>\n</Placemark>
This example is essentially a point location with a time stamp and a pair of
coordinates.
The values of interest in this case are the coordinates, and the processed
result should look something like this:
x,y,z (-122.536226, 37.86047, 0.0)
It is easy enough to manually copy and paste these values from a text file,
but the process must be automated to work with XML files containing
thousands of point locations, as well as more complex data.
The following code illustrates the use of the xml.dom.minidom module to
obtain the coordinates. The script parses the XML file, and then reads the
values of the element of interest, as follows:
from xml.dom import minidom
kml = minidom.parse("example.kml")
placemarks = kml.getElementsByTagName("Placemark")
coords = placemarks[0].getElementsByTagName("coordinates")
point = coords[0].firstChild.data
x,y,z = [float(c) for c in point.split(",")]
This script is broken down to explain how to use the xml.dom.minidom
module. The minidom module is a simplified implementation of the
Document Object Model (DOM). DOM is an API used by many different
programming languages to work with XML files. A typical script starts by
parsing an XML file (or KML file, in the example) into a Document object,
which is essentially the tree structure of elements. For parsing, use the
parse() function. Once the tree structure of elements inside the XML file is
obtained, various methods of the Document object can work with these
elements. The getElementsByTagName() method obtains a list of Element
objects on the basis of a specific tag. In the example, the elements of
interest are called Placemark. Within a given element, the same
getElementsByTagName() method is used to move on to the next tag, which
represents the next level in the tree structure. Once the element of interest is
located, the firstChild.data property returns the values as a string. Finally,
to obtain the correct output, the string is split, and the values are cast as a
float.
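The same pattern scales to a file with many placemarks by looping over the list that getElementsByTagName() returns. The sketch below uses parseString() with a small inline KML string (invented for illustration) so it is self-contained; with a file on disk you would use parse() as before:

```python
from xml.dom import minidom

kml_text = """<kml>
  <Document>
    <Placemark><Point><coordinates>-122.5,37.8,0</coordinates></Point></Placemark>
    <Placemark><Point><coordinates>-121.9,36.6,0</coordinates></Point></Placemark>
  </Document>
</kml>"""

doc = minidom.parseString(kml_text)
points = []
for pm in doc.getElementsByTagName("Placemark"):
    # Each Placemark contains one coordinates element
    coords = pm.getElementsByTagName("coordinates")[0]
    x, y, z = [float(c) for c in coords.firstChild.data.split(",")]
    points.append((x, y, z))

print(points)
```

The result is a list of (x, y, z) tuples, one per placemark, ready for further processing.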
Several alternatives are possible. For example, the childNodes property can obtain a list of the child nodes of an element, and the getAttribute() method can obtain the value of an attribute of an element.
As the example illustrates, working with XML files is a bit like using smart
text processing—i.e., working through the hierarchy of tags, searching for
specific tags, and breaking up strings into pieces of information for further
use.
There are several challenges to working with XML files, including the
existence of several different “flavors” of XML, and incomplete or missing
tags. These characteristics of XML partly explain why there are several
modules and packages to work with XML files, each with their own
strengths and weaknesses relative to a given task.
HTML files and XML files have a lot in common, including the use of tags.
Therefore, some of the techniques used to process XML files also can be
used to work with HTML files. For example, the popular package Beautiful
Soup is used to work with XML files and also for web scraping of pages in
HTML format.
7.5 Working with web pages using urllib
Web pages often are used as a source of information in GIS workflows. A
web address is a uniform resource locator, or URL. Python’s standard
library includes the urllib package, which consists of several modules to
work with URLs. This section focuses on the use of the urllib.request module for opening and reading URLs.
Note: Python 2 includes the modules urllib, urllib2, and urlparse to
work with web pages. These modules are replaced in Python 3 by a
single package called urllib, which is different from the urllib module
in Python 2. Any code written for Python 2 that employs the urllib, urllib2, or urlparse modules should be carefully reviewed and updated.
To open a web page, start by importing the urllib.request module, and
then use the urlopen() function. Once you open a web page, you can start
reading its contents using the read() method, as follows:
import urllib.request
url = urllib.request.urlopen("https://www.esri.com/")
html = url.read()
Reading an entire web page is not common. A more typical scenario is to
download one or more files from a web page. To do so, use the
urlretrieve() function. The arguments of this function are a URL and a
local file name. The following example downloads a ZIP file and saves it as
a local file:
import urllib.request
url = "http://opendata.toronto.ca/gcc/bikeways_wgs84.zip"
file = "bikeways.zip"
urllib.request.urlretrieve(url, file)
The local file name can be identical to the file being downloaded, but it also
can have a different name if it keeps the same file extension. The local file
is saved to the current working directory, which is typically the location of
the script. Similar to downloading files using FTP, this directory can be
changed using os.chdir().
Some files, such as TXT or CSV, can be read directly using the urlopen()
function, as follows:
import urllib.request
url = "https://wordpress.org/plugins/readme.txt"
content = urllib.request.urlopen(url)
for line in content:
    print(line)
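One detail to keep in mind is that urlopen() returns its content as bytes, so each line typically must be decoded into a regular string before further processing. The following sketch uses io.BytesIO to stand in for the response object (so it runs without a network connection); the decoding step is the same for a real response:

```python
import io

# io.BytesIO stands in for the object returned by urllib.request.urlopen(),
# which also yields its lines as bytes.
content = io.BytesIO(b"id,x,y\n1,471316,5130448\n2,470402,5130249\n")

lines = []
for line in content:
    lines.append(line.decode("utf-8").strip())  # bytes -> str, drop the newline
print(lines)
```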
The urllib.request module includes functionality for related tasks,
including authentication, working with proxies, and handling cookies.
Additional modules of the urllib package include functionality for error
handling and parsing URLs. Although urllib provides solid functionality to
open and read web pages, a widely recommended alternative is the
requests package. This package is not part of the standard library and must
be installed as a package, but it is part of the arcgispro-py3 default
environment. The requests package is one of the most popular Python
packages and has become the de facto standard for working with web
pages.
The following example illustrates how to use the requests package to
download a ZIP file:
import requests
url = "http://opendata.toronto.ca/gcc/bikeways_wgs84.zip"
file = "bikeways.zip"
response = requests.get(url)
open(file, "wb").write(response.content)
The get() function of the requests module creates a Response object. This
object gives you access to the contents of the web page using the content
property. To save the contents of the web page, a local file is opened, and
the contents are written. This process may appear slightly more elaborate
relative to using urllib, but the Response object provides great versatility.
For example, it can work with JSON objects, which have become a popular
way to share tabular and spatial data.
7.6 Working with CSV files using csv
Plain text files are widely used to store and transfer data, but their lack of
formatting can present difficulties. Hence, the usefulness of comma-separated values (CSV). A CSV file is a plain text file in which the values
are separated by commas. CSV files are more robust than other forms of
plain text because the separator between values is predictable—i.e., a
comma.
Before looking at the use of CSV files, a brief review of working with text
files is in order to show the similarities and differences. One common
approach to working with text files is to use the open() function, as follows:
f = open("C:/Data/mytext.txt")
for line in f:
    <some task>
f.close()
This approach commonly is used for small files, but reading an entire file
at once with the read() method becomes an issue for larger files because the
complete contents are loaded into memory. A good alternative is
the fileinput module, which creates an object to iterate over the lines in a
for loop, as follows:
import fileinput
infile = "C:/Data/mytext.txt"
for line in fileinput.input(infile):
    <some task>
A more complete example working with a text file follows. The example is
relevant because the same script is used to illustrate how to work with CSV
files (in this section) and Excel files (in the next section). The example
reads coordinates from a text file to create point features. Each line starts
with an ID number, followed by an x-coordinate and a y-coordinate. The
three values are separated by a space. These coordinates are stored in a
plain text file named points.txt, which is shown in the figure.
These coordinates are used to create new point features. The script reads the
contents of the text file and uses the split() method to parse each line of
text into separate values for the point ID number, the x-coordinate, and the
y-coordinate. The script iterates over the lines of the input text file and
creates a Point object for every line. The final line of code creates the point
features. The use of with statements ensures proper closure and avoids any
data locks. The script is as follows:
import arcpy
fgdb = "C:/Demo/Data.gdb"
infile = "C:/Data/points.txt"
fc = "trees"
sr = arcpy.SpatialReference(26910)
arcpy.env.workspace = fgdb
arcpy.CreateFeatureclass_management(fgdb, fc, "Point", "", "", "", sr)
with arcpy.da.InsertCursor(fc, ["SHAPE@"]) as cursor:
    point = arcpy.Point()
    with open(infile) as f:
        for line in f:
            point.ID, point.X, point.Y = line.split()
            cursor.insertRow([point])
The result of the script is a new feature class called trees with several point
features.
The same coordinates stored in a CSV file are used in a different script that
follows the same general steps. The CSV version of the same coordinates is
shown in the figure. As expected, the separator between values is a comma,
and there are no spaces.
The CSV file is shown in Notepad, which does not apply any formatting.
Keep in mind, however, that if you have Microsoft Office installed, the
default application to open a CSV file is Microsoft Excel, even though a
CSV file is not a spreadsheet format.
Working with CSV files in Python is accomplished using the csv module,
which is part of the standard library. You start by importing the csv module,
opening the CSV file, and then using the csv.reader() function to read the
contents of the file, as follows:
import csv
f = open("C:/Data/test.csv")
for row in csv.reader(f):
    <some task>
f.close()
This code looks similar to working with regular text files but uses the csv
module. One key difference is how the contents of each line (or row) are
read. For a text file, you must know what the separator is between the
values in a single line of text and use the split() method accordingly. In
contrast, for a CSV file, the delimiter is expected to be a comma, so the
separator does not need to be specified. Instead, when using the
csv.reader() function, the values are returned as a list. Following is the
complete script, with the changes highlighted:
import arcpy
import csv
fgdb = "C:/Data/Demo.gdb"
infile = "C:/Data/points.csv"
fc = "trees"
sr = arcpy.SpatialReference(26910)
arcpy.env.workspace = fgdb
arcpy.CreateFeatureclass_management(fgdb, fc, "Point", "", "", "", sr)
with arcpy.da.InsertCursor(fc, ["SHAPE@"]) as cursor:
    point = arcpy.Point()
    with open(infile) as f:
        for line in csv.reader(f):
            point.ID, point.X, point.Y = line[0], line[1], line[2]
            cursor.insertRow([point])
The assignments in the for loop can be written as follows for clarity:
point.ID = line[0]
point.X = line[1]
point.Y = line[2]
As the code example illustrates, the differences between the two scripts are
subtle. Generally, working with CSV files is more robust compared with
text files because their formatting is more predictable.
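CSV files also can be written from Python using the same csv module. The following sketch uses the csv.writer() function; io.StringIO stands in here for a file opened with open("points.csv", "w", newline=""), and the file name and values are hypothetical:

```python
import csv
import io

# io.StringIO stands in for a file opened for writing with newline="",
# which prevents extra blank lines on Windows.
f = io.StringIO()
writer = csv.writer(f)
writer.writerow(["id", "x", "y"])          # optional header row with field names
writer.writerow([1, 471316, 5130448])
writer.writerow([2, 470402, 5130249])
print(f.getvalue())
```

The writerow() method inserts the commas and line endings automatically, so there is no need to build the strings by hand.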
ArcGIS Pro recognizes CSV files as a data format. When viewed in the
Catalog pane, a CSV file appears as an entry with an icon that indicates
text, as shown in the figure.
When added to a map, a CSV file appears as a table, as indicated in the
figure.
CSV files can be used directly in many geoprocessing tools that use tables
as input, including XY Table To Point and Table To Table. When a CSV file
does not include a header row with field names, default field names are
added—i.e., Field1, Field2, and so on. The figure shows an example of the
use of a CSV file as the source for the input rows of a table.
Note: Properly formatted text files (i.e., using spaces or tabs as
separators) are recognized by ArcGIS Pro, and also can be used in
geoprocessing tools, but they are less robust.
Although CSV files can be used directly in ArcGIS Pro, using them in
Python gives you more flexibility.
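As one illustration of that flexibility, the csv.DictReader() class uses the header row to return each row as a dictionary keyed on the field names, which avoids hard-coding index positions such as line[0]. A short sketch, using io.StringIO in place of an opened CSV file with a header row:

```python
import csv
import io

# io.StringIO stands in for an opened CSV file with a header row.
f = io.StringIO("id,x,y\n1,471316,5130448\n2,470402,5130249\n")

points = []
for row in csv.DictReader(f):
    # Values are read as strings and cast as needed, keyed on the field names.
    points.append((row["id"], float(row["x"]), float(row["y"])))
print(points)
```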
7.7 Working with Excel files using openpyxl
Another widely used format for data manipulation is Excel spreadsheets. In
addition to tabular information, Excel files can contain other elements,
including formulas, graphs, and images. Although Excel files often are used
to manipulate tabular information, they are not a database format. Typical
rules that apply to database tables do not apply to spreadsheets. For
example, spreadsheets can contain empty rows and empty columns, which
is not supported in database tables.
Even though Excel spreadsheets do not represent a database format, they
are so widely used for data entry and manipulation that being able to work
with them in Python is a good skill. Many GIS workflows also use Excel
files, either as data input or as an output for use in other applications.
Several modules work with Excel files in Python, but the most widely used
one is openpyxl. This package is not part of the standard library in Python,
but it is installed as part of the arcgispro-py3 default environment in
ArcGIS Pro. The openpyxl package works with Excel files in the format
.xlsx. To work with files in the older .xls format, you can use the xlrd package as an
alternative. You also can use Pandas (discussed in section 7.10) to work
with either Excel format.
A typical Excel file is a bit different from tabular data in plain text or CSV
format. First, an Excel file can contain more than one worksheet, with each
worksheet representing a separate table. In addition to opening an Excel
file, you also must point to a specific worksheet. Second, data in Excel
worksheets is entered in cells, which are organized in rows and columns.
You must reference a specific cell, and then obtain its value. The openpyxl
module has functions and classes for these tasks.
The first step is to open an Excel file, also referred to as a “workbook.” You
can use the openpyxl.load_workbook() function, as follows:
import openpyxl
book = openpyxl.load_workbook("C:/Data/Example.xlsx")
This function returns a workbook object.
Next, methods of the workbook object can be used to obtain the worksheets
in the Excel file. You can get a list of all the worksheets using the
sheetnames property:
sheets = book.sheetnames
You can obtain a specific worksheet using the worksheets property and
specifying the index number:
sheet = book.worksheets[0]
Once you have a single worksheet, you can start working with the cells and
their values by referencing the rows and columns. You must first reference a
cell, and then you can use the cell’s value. For example, a single cell can be
referenced as follows:
b3 = sheet["B3"]
print(b3.value)
The first line of code returns a Cell object, and the value property returns
the cell value. An alternative way to obtain a cell is to write out the column
and row number:
b3 = sheet.cell(column=2, row=3)
print(b3.value)
Note that the first row or column integer is 1, not zero. In other words, the
number is not an index number but an argument of the column and row
keywords.
A more typical scenario is to read through all the cells instead of one or
more specific cells. You can read through all the cells by iterating over the
columns using the iter_cols() method or by iterating over the rows using
the iter_rows() method. The following example reads all cell values in a
worksheet using iter_cols():
for col in sheet.iter_cols():
    for cell in col:
        print(cell.value)
Iterating over the rows using the iter_rows() method works as follows:
for row in sheet.iter_rows():
    for cell in row:
        print(cell.value)
Both approaches print all the values of all the cells but in a different order.
The second approach (i.e., reading the values row by row) is more typical
because it is similar to reading the lines in a text or CSV file.
By default, the iter_cols() and iter_rows() methods continue until there
are no more columns or rows left, respectively, with valid cell values. The
methods do not stop reading when there is an empty column but instead
continue reading rows and columns until there are no more cells left with
values.
An alternative to using iter_cols() and iter_rows() is to iterate over the
columns or rows from their starting value (number 1) until the maximum
number of columns or rows with valid cell values are read. This maximum
number can be obtained using the max_column or max_row properties of a
worksheet.
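The following sketch illustrates this alternative, assuming openpyxl is installed (it is part of the arcgispro-py3 default environment). To keep the example self-contained, a small workbook is built in memory with the append() method rather than loaded from disk:

```python
import openpyxl

# Build a small worksheet in memory instead of loading one from disk.
book = openpyxl.Workbook()
sheet = book.active
sheet.append([1, 471316, 5130448])
sheet.append([2, 470402, 5130249])

values = []
for r in range(1, sheet.max_row + 1):          # row and column numbers start at 1
    for c in range(1, sheet.max_column + 1):
        values.append(sheet.cell(row=r, column=c).value)
print(values)
```

Note that range() excludes its stop value, so 1 must be added to max_row and max_column to include the last row and column.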
Because cell values can be empty, you may need to check whether a cell
contains a value or not. The code for checking whether a cell contains a
value is
if cell.value is not None:
This code can be rewritten more simply as
if cell.value:
Keep in mind that the shorter version also treats values such as 0 and empty strings as missing.
The same script used earlier to work with a CSV file is adapted as follows
to work with an Excel file. As a reminder, the script reads coordinates from
a file to create point features. The Excel version of this file is shown in the
figure.
The script to work with the Excel file in .xlsx format using the openpyxl
module follows. As before, changes relative to working with text files are
highlighted.
import arcpy
import openpyxl
fgdb = "C:/Data/Demo.gdb"
infile = "C:/Data/points.xlsx"
fc = "points"
sr = arcpy.SpatialReference(26910)
arcpy.env.workspace = fgdb
arcpy.CreateFeatureclass_management(fgdb, fc, "Point", "", "", "", sr)
with arcpy.da.InsertCursor(fc, ["SHAPE@"]) as cursor:
    point = arcpy.Point()
    book = openpyxl.load_workbook(infile)
    sheet = book.worksheets[0]
    for i in range(1, sheet.max_row + 1):
        point.ID = sheet.cell(row=i, column=1).value
        point.X = sheet.cell(row=i, column=2).value
        point.Y = sheet.cell(row=i, column=3).value
        cursor.insertRow([point])
In this script solution, the load_workbook() function opens the Excel file.
The sheet of interest is selected by using worksheets[0], which selects the
first (and only) worksheet. Next, the script iterates over the rows of the
worksheet from the first row (row 1) until there are no more rows left
(max_row). An alternative iteration is to use sheet.iter_rows(). Because
there are only three columns total, the column numbers are specified in the
script. For many columns, an alternative solution is to iterate over the rows
and columns using for row in sheet.iter_rows(): and for cell in row:.
Finally, the cell values are obtained using the value property, and these
values are assigned to the ID, X, and Y properties of the Point object.
The openpyxl module can be used for many other tasks, including working
with cell formatting and styles; modifying colors, fonts, patterns, and
borders; working with formulas; and working with charts. In short, most
tasks carried out in Excel can be automated to some degree using Python.
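As a small illustration of writing an Excel file, the following sketch creates a new workbook, renames the default worksheet, writes a few cells, and saves the result with the save() method. The output path is hypothetical; a temporary directory is used here only so the example runs anywhere:

```python
import os
import tempfile

import openpyxl

book = openpyxl.Workbook()              # a new workbook with one default worksheet
sheet = book.active
sheet.title = "Points"                  # rename the default worksheet
sheet["A1"] = "id"
sheet["B1"] = "x"
sheet["C1"] = "y"
sheet.append([1, 471316, 5130448])      # append a full row below the header

# Hypothetical output location; any path ending in .xlsx works.
outfile = os.path.join(tempfile.gettempdir(), "points_demo.xlsx")
book.save(outfile)
```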
In addition, it should be noted that ArcGIS Pro has two standard tools to
work with Excel files. These tools are Excel To Table, which converts a
single worksheet in an Excel file to an ArcGIS Pro–compatible tabular
format, and Table To Excel, which converts a table in ArcGIS Pro to an
Excel file with a single worksheet. Both tools support .xls and .xlsx
formats. These tools make assumptions about how tabular data is organized
in Excel, and they may not work on all tabular datasets. Modules such as
openpyxl give you more control over how to manage the data in an Excel
file (including the ability to work with multiple worksheets), but for well-formatted Excel files using a single worksheet, the standard tools in ArcGIS
Pro may suffice.
7.8 Working with JSON using json
JavaScript Object Notation (JSON) is a text-based data format used to share
data between applications. JSON has its origins in the JavaScript
programming language. However, JSON has become its own standard and
is considered language agnostic, which means it is independent of a specific
programming language. As a result, it is widely used in many different
programming languages on different platforms, and it has become a de facto
standard for information sharing. This usage includes spatial datasets.
Consider a simple example of what a JSON file looks like. The following
example describes a person by using a name, hobbies, age, and children.
Each child also has a name and age.
{
    "firstName": "Jennifer",
    "lastName": "Smith",
    "hobbies": ["dancing", "tattoos", "geocaching"],
    "age": 42,
    "children": [
        {
            "firstName": "Mark",
            "age": 7
        },
        {
            "firstName": "Ashley",
            "age": 11
        }
    ]
}
Note that the indentation and use of brackets is different from Python
because it is based on JavaScript. The example illustrates that JSON
supports data types such as numbers and strings, as well as lists and objects.
Also note that the structure looks a bit like a Python dictionary. JSON is
built on two types of structures: (1) a collection of name/value pairs, and (2)
an ordered list of values. These types are universal data structures in
programming, which makes JSON interchangeable with many
programming languages.
Working directly with JSON objects is facilitated using the json module,
which is part of the standard library. This module can be used to convert
between JSON and Python. A JSON object in Python is created by entering
the entire object as a string. The examples that follow use this simplified
JSON:
{
    "name": "Joe",
    "languages": ["Python", "Java"]
}
Because JSON is a text-based format, JSON objects are created as a string,
as follows:
import json
person = '{"name": "Joe", "languages": ["Python", "Java"]}'
The JSON object can be converted to a Python dictionary using the
loads() function of the json module, as follows:
py_person = json.loads(person)
print(py_person["languages"])
The result prints as follows:
['Python', 'Java']
JSON objects also can be stored as text files with the .json file extension.
The load() function of the json module can read this file and convert it to a
Python dictionary. In the following example, the person.json file contains
the same text as the JSON object referenced earlier and reads as follows:
import json
person = open("person.json")
py_person = json.load(person)
print(py_person["languages"])
A Python dictionary can be converted to a JSON object using the dumps()
function of the json module, as follows:
import json
person = {"name": "Joe", "languages": ["Python", "Java"]}
json_person = json.dumps(person)
print(json_person)
The result prints the entire JSON object as a string.
The dump() function can be used to write a Python object to a file as JSON, as follows:
import json
person = {"name": "Al", "languages": ["Python", "C"]}
json_file = open("newperson.json", "w")
json.dump(person, json_file)
json_file.close()
To improve readability of JSON files, it is useful to use pretty print JSON,
also referred to as Pretty JSON or PJSON. For example, the earlier example
prints a JSON object as a simple string, as follows:
import json
person = {"name": "Joe", "languages": ["Python", "Java"]}
json_person = json.dumps(person)
print(json_person)
The result is a regular string:
{"name": "Joe", "languages": ["Python", "Java"]}
The formatting can be modified by using additional parameters for the
dumps() function, including indentation:
json_person = json.dumps(person, indent=4)
The result is a format that illustrates the organization of the JSON object
more clearly, as follows:
{
    "name": "Joe",
    "languages": [
        "Python",
        "Java"
    ]
}
Additional sorting can be accomplished by adding the argument
sort_keys=True. The use of PJSON has no impact on the actual data, and
when saving to a file, the file extension is the same.
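A brief sketch of the sort_keys argument, continuing the earlier example:

```python
import json

person = {"name": "Joe", "languages": ["Python", "Java"]}
# sort_keys=True orders the name/value pairs alphabetically by name.
print(json.dumps(person, indent=4, sort_keys=True))
```

In the result, "languages" appears before "name" because the pairs are sorted alphabetically.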
JSON is widely used to share data and has become a popular format in the
geospatial community. As one illustration of this wide acceptance, ArcGIS
Pro has standard tools to convert to and from JSON—i.e., Features To
JSON and JSON To Features. JSON is also used in services created using
the ArcGIS REST API. In addition, the GeoJSON format has been
developed as a file format to represent geographic data as JSON. Both
formats are in widespread use, and many applications can work with both
formats. The file extension for JSON is .json, whereas the file extension for
GeoJSON is .geojson.
Note: A detailed review of the differences between JSON and GeoJSON
is beyond the scope of this book. Both the JSON and GeoJSON formats can
be used to store spatial data, but the internal organization is slightly
different between the two. The remainder of this section focuses on
JSON only.
There are several ways to work with JSON objects in Python scripts. First,
JSON is widely used as an alternative file format when downloading data
from online resources—for example, using urllib or requests. Second, you
can convert existing spatial data using a standard tool in ArcGIS Pro such
as Features to JSON. Third, you can work with JSON objects directly in a
script—for example, by using a cursor of the arcpy.da module or by
creating JSON objects in the script itself (as illustrated in earlier examples).
A few additional examples illustrate some of these scenarios.
The earlier examples of a JSON file did not include geographic data, so it is
helpful to continue with an example that does. Consider a feature class of
parcels with a single polygon feature and a handful of attributes.
When this polygon feature is converted to JSON, the spatial data is
represented as text only, and it can be viewed using a simple text editor,
such as Notepad. The first portion of the JSON file includes information
about the fields in the attribute table (FID and PARCEL_ID), the geometry
type (polygon), and the spatial reference (factory code 2277). Note that the
Shape field is not included because this information is captured through the
geometry type, the coordinate system, and the actual coordinates.
The second portion of the JSON file includes the information on the
features—in this case, only a single polygon. This information consists of
the values of the two attribute fields, as well as the coordinates of the
vertices. There are five vertices total, but the first and last ones have
identical coordinate values and are coincident. The “rings” reference
indicates that JSON supports the use of exterior and interior rings to
represent polygons with holes, but only a single ring is needed in this
example.
It is not common to have to review in detail the contents of a JSON file, but
it illustrates how the entire spatial dataset, including the coordinate system,
the attribute structure, the attribute values, and the features, are represented
as text only.
The formatting employed here uses PJSON to improve legibility. When
using no additional formatting, the entire JSON file is one long line of text,
which is much more difficult to interpret. Following is an illustration of
what the unformatted JSON file looks like.
{"displayFieldName":"","fieldAliases":
{"FID":"FID","PARCEL_ID":"PARCEL_ID"},"geometryType":"esriGeo
metryPolygon","spatialReference":
{"wkid":102739,"latestWkid":2277},"fields":
[{"name":"FID","type":"esriFieldTypeOID","alias":"FID"},
{"name":"PARCEL_ID","type":"esriFieldTypeString","alias":"PARCEL_
ID","length":15}],"features":[{"attributes":
{"FID":0,"PARCEL_ID":"0206042001"},"geometry":{"rings":
[[[3116036.110156253,10071403.570008084],
[3115768.3600355834,10071482.069851086],
[3115847.3598775864,10071747.569976255],
[3116114.2300787568,10071667.570136249],
[3116036.110156253,10071403.570008084]]]}}]}
As discussed earlier in this section, Python’s json module can convert
between JSON and Python objects, and geoprocessing tools in ArcGIS Pro
can convert between JSON objects stored as files and feature classes. In
addition, the ArcPy function AsShape() can convert between JSON objects
and ArcPy geometry objects. This capability makes it possible to work with
JSON objects to store and create spatial data without having to save the data
to a file. The syntax of the arcpy.AsShape() function is as follows:
AsShape(geojson_struct, {esri_json})
The first parameter is a JSON object represented as a Python dictionary.
The second parameter specifies whether the object is an Esri JSON (True) or
GeoJSON (False) object. The following example creates a JSON object for
a single point feature with a coordinate system and converts it to an ArcPy
Point object:
import arcpy
geo = {"x": -124.7548, "y": 46.5783,
       "spatialReference": {"wkid": 4326}}
point = arcpy.AsShape(geo, True)
The AsShape() function returns a geometry object on the basis of the input
JSON object. Points are created by using "x" and "y". Polylines are created
by using "paths", and polygons are created by using "rings". For example,
the following example creates a single Polyline object on the basis of a list
of coordinates:
import arcpy
geo = {
    "paths": [
        [[166.4359, 19.5043], [166.4699, 19.5098],
         [166.5086, 19.4887], [166.5097, 19.4668],
         [166.4933, 19.4504], [166.4617, 19.4410]]],
    "spatialReference": {"wkid": 4326}}
polyline = arcpy.AsShape(geo, True)
Note that PJSON formatting is not entirely maintained in this last example
to reduce the number of lines for display purposes.
The JSON objects used so far have been relatively simple because they
consist of only a single feature. An example of a JSON object using
multiple point features is as follows:
{"features":[{"geometry":{"x":3116036,"y":10071403}},
{"geometry":{"x":3115768,"y":10071482}},
{"geometry":{"x":3115847,"y":10071747}}]}
This JSON object can also be converted to a Python dictionary using the
load() or loads() function, and converting each point to a geometry object
requires an iteration over the “features” key.
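A sketch of that iteration follows. The loop extracts each geometry dictionary from the "features" list; in an ArcGIS Pro environment, each of these dictionaries could then be passed to arcpy.AsShape(geometry, True) to obtain a Point geometry object. The arcpy step is shown only as a comment here so the sketch runs on its own:

```python
import json

# The multiple-point JSON object from the text, as a string.
json_points = ('{"features":[{"geometry":{"x":3116036,"y":10071403}},'
               '{"geometry":{"x":3115768,"y":10071482}},'
               '{"geometry":{"x":3115847,"y":10071747}}]}')

py_points = json.loads(json_points)
geometries = []
for feature in py_points["features"]:
    geometry = feature["geometry"]
    # In ArcGIS Pro: point = arcpy.AsShape(geometry, True)
    geometries.append((geometry["x"], geometry["y"]))
print(geometries)
```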
The examples in this section illustrate how to create geometry objects by
writing out the JSON object as a Python dictionary. More complex JSON
objects can be read from a text file and converted to a Python dictionary
using the load() function of the json module.
7.9 Working with NumPy arrays
NumPy is a Python package to facilitate fast processing of large datasets.
NumPy is short for “numerical Python,” and the package is designed to
support scientific computing. It is typically pronounced as NUM-pie, but
sometimes as NUM-pee. It is part of a larger collection of Python packages
called the “SciPy (pronounced Sigh Pie) stack,” which also includes
Matplotlib, IPython, and Pandas. NumPy uses a data structure called
NumPy arrays, also referred to as multidimensional arrays or n-dimensional
arrays.
Why use NumPy? First, it often works fast relative to using other modules
or packages for similar tasks. Second, it includes many different algorithms
for processing and analysis not found in other packages. And third, the
NumPy array data structure is employed by many other packages, and
NumPy therefore is used to exchange data. In the context of GIS, NumPy is
often used for processing large raster datasets as part of remote sensing or
spatial analysis workflows.
Python uses several different data structures, which generally fall into two
categories. First, there are numbers, including integers and floating points.
Second, there are sequences, including strings, lists, and tuples. NumPy
uses a different data structure called an array. By design, this structure is
closer to how computer hardware works, which is one reason why it is
faster. Arrays are designed for scientific computing. They are like a Python
list, but n dimensional, whereas a list has only one dimension.
Note: Recall that ArcPy includes a class called Array. Although this
class shares some common elements with NumPy arrays, it is used only
for a specific purpose in the context of working with geometry objects
when using ArcPy. Therefore, Array objects in ArcPy and NumPy arrays
are not interchangeable.
NumPy arrays often consist of numbers, and these values are indexed by a
tuple of positive integers. Arrays are multidimensional, and each dimension
of an array is called an axis. The number of axes in an array is also called
rank. Each axis has a length, which is the number of elements in that axis,
just like a list. The numbers in a single dimension are all the same type.
Arrays support the use of slicing and indexing, like lists. An example of a
simple array is [0, 1, 2, 3]. The dimension or rank of this array is one
because there is only one axis. The length of the first (and only) axis is four.
A one-dimensional (or 1D) NumPy array is like a list.
Consider how this array is created, as follows:
import numpy as np
a = np.array([0, 1, 2, 3])
The NumPy package is imported, and by convention, it is imported as np,
but importing it in this manner is not required. The NumPy array object is
created using the array() function, and the argument of the function is a
Python list. This example effectively converts a list to an array.
Once an array is created, you can check some of its properties. The ndim
property returns the dimension of the array, as follows:
print(a.ndim)
This property returns a value of 1.
You can determine the length (or size) of the array using the len() function:
print(len(a))
This function returns the length of the first dimension—in this case, 4.
Next, consider a two-dimensional, or 2D, array:
b = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
print(b.ndim)
print(len(b))
The argument for the array() function in the example is a list of lists. The
code returns the value of 2 for the dimension of the array and a value of 3
for the size of the first dimension. The shape property returns the size of all
the dimensions:
print(b.shape)
This property returns a value of (3, 4). The example array is also referred to
as a 3 × 4 (three by four) array. A 2D NumPy array is like a table or matrix.
In this example, the table has three rows and four columns, although the
concepts of rows and columns don’t apply in the same way to arrays.
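Indexing and slicing a 2D array works much like indexing nested lists, with one index (or slice) per axis. A short sketch, reusing the 3 × 4 array from the earlier example:

```python
import numpy as np

b = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

first_row = b[0]          # the first row: [0, 1, 2, 3]
single_value = b[1, 2]    # the element at row index 1, column index 2
first_column = b[:, 0]    # all rows, column index 0: [0, 4, 8]
```

Note that, unlike the cell numbering in openpyxl, NumPy indexing starts at zero, just as it does for Python lists.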
A three-dimensional, or 3D, array can be created as follows:
import numpy as np
c = np.array([[[1], [2], [3]], [[4], [5], [6]]])
This code creates an array with three dimensions, and the shape property is
(2, 3, 1), or a 2 × 3 × 1 array. A 3D array does not have a familiar equivalent such as a list or table,
although it is sometimes referred to as a “three-dimensional matrix.” The
same approach can be continued to create 4D, 5D, and higher-dimensional
arrays. This feature is why NumPy arrays are referred to as n dimensional,
where n is a positive integer.
It is important to recognize the meaning of dimensions here. When creating
a NumPy array, dimensions are data dimensions, not dimensions of
coordinate space. In GIS, it is common to think of 2D as horizontal space
(x,y coordinates) and 3D as horizontal plus vertical space (x,y,z
coordinates). In a NumPy array, location (x,y or x,y,z) is only one
dimension. Coordinates are typically stored as a tuple in this one dimension.
For example, the coordinates of a point in 2D space (e.g., (1512768,
3201482)) by itself is an array of rank 1, because it has one axis. Therefore,
dimension in NumPy is not what is commonly thought of when considering
coordinates in GIS. Basically, the coordinates represent one dimension, and
other dimensions can be such things as attribute values and time.
The following example represents two points as a NumPy array, each point
with an ID and a pair of x,y coordinates:
import numpy as np
newarray = np.array([(1, (471316, 5130448)), (2, (470402, 5130249))])
This code represents a 2D array, with the first dimension representing the
attribute values (ID, in this case) and the second dimension representing the
coordinates (tuples of x,y values in this case, but this could also be x,y,z).
NumPy arrays are created in a variety of ways. Values can be entered
directly into the code, as illustrated in the previous examples. Python
sequences such as lists and tuples can be converted to arrays. Existing data
sources also can be converted, including tables, feature classes, and raster
datasets. And finally, there are several NumPy functions to create arrays
from scratch. These include the zeros() function to create an array with
only values of zero, the ones() function for values of one, and arange()
for a numeric sequence. The shape argument of these functions sets the
dimensions of the array and the size of each dimension. For example, the
following code creates a two-dimensional 3 × 5 array with values of zero:
import numpy as np
zeroarray = np.zeros((3, 5))
This code creates an array as follows:
[[0, 0, 0, 0, 0],[0, 0, 0, 0, 0],[0, 0, 0, 0, 0]]
The arange() function generates a numeric sequence as an array, similar to
Python's built-in range() function. The following
example creates a one-dimensional array:
import numpy as np
newarray = np.arange(1, 10)
This code creates an array as follows:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
The reshape() method of the array object can modify the numeric sequence
into the desired array:
import numpy as np
array3x3 = np.arange(1, 10).reshape((3, 3))
This code creates a 3 × 3 array on the basis of the values in the original
array, as follows:
[[1, 2, 3],[4, 5, 6],[7, 8, 9]]
The examples here use only one- and two-dimensional arrays because they
are easy to visualize, but the same functions and methods can be used for
any multidimensional array.
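The rank and shape terminology can be confirmed directly, because every array exposes these properties as attributes. A minimal sketch with arbitrary values:

```python
import numpy as np

# Build a 3D array with shape (2, 3, 1) from a numeric sequence.
arr = np.arange(6).reshape((2, 3, 1))

print(arr.ndim)   # 3 -- the number of dimensions (rank)
print(arr.shape)  # (2, 3, 1) -- the size of each dimension
print(arr.size)   # 6 -- the total number of elements
```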
As is common in Python, the data type is derived from the value. Consider
the following array:
import numpy as np
a = np.array([2, 3, 4])
print(a.dtype)
This code returns int32 as the data type—i.e., a 32-bit integer. Instead of
relying on dynamic assignment, you can specify the data type explicitly
when the array is created, as follows:
import numpy as np
b = np.array([2, 3, 4], dtype="float32")
This code specifies the data type as a 32-bit float, even though the values
are integers.
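The difference between inferred and explicit data types can be checked interactively. A small sketch (the values are arbitrary; note that the default integer width varies by platform, which is why the check below uses the dtype kind rather than a specific size):

```python
import numpy as np

a = np.array([2, 3, 4])                   # dtype inferred from the values
b = np.array([2, 3, 4], dtype="float32")  # dtype specified explicitly

print(a.dtype.kind)  # "i" -- some integer type (width is platform dependent)
print(b.dtype)       # float32
print(b[0])          # 2.0 -- the integer values were cast to floats
```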
Creating NumPy arrays from scratch is not so common, although they do
allow you to practice array manipulation with easy-to-visualize values. A
more common scenario in GIS workflows is to convert an existing dataset
to a NumPy array for processing. ArcPy includes several functions for this
conversion. To convert between raster data and NumPy arrays, ArcPy has
two functions. They are regular functions of ArcPy, not functions of the
arcpy.sa module:
NumPyArrayToRaster()
RasterToNumPyArray()
To convert between feature classes and tables, and NumPy arrays, the
arcpy.da module has four functions:
FeatureClassToNumPyArray()
NumPyArrayToFeatureClass()
TableToNumPyArray()
NumPyArrayToTable()
There is one additional function, called ExtendTable(), which joins the
contents of the NumPy array to an existing table on the basis of a common
attribute field. This function is the equivalent of a table join, but between a
table and a NumPy array.
The use of these functions is illustrated with an example of the
FeatureClassToNumPyArray() function of the arcpy.da module. The syntax
of this function is as follows:
FeatureClassToNumPyArray(in_table, field_names, {where_clause},
                         {spatial_reference}, {explode_to_points},
                         {skip_nulls}, {null_value})
The first required parameter of this function is an input feature class, layer,
table, or table view. The second required parameter is a list or tuple of field
names. A string can be used for a single field. You can use an asterisk (*) to
access all fields, but this usage is typically not recommended. For faster
performance, you should narrow the fields to the ones that are needed for
the task at hand. Geometry tokens can be used (e.g., SHAPE@XY), but not full
geometry objects (i.e., SHAPE@). The parameters here are like those used in
the cursor classes of the arcpy.da module. When creating a cursor object,
the first parameter is a feature class, layer, table, or table view, and the
second parameter is a list or tuple of field names, including geometry
tokens.
The following example script converts a field of a feature class to a NumPy
array and determines a simple statistic:
import arcpy
import numpy
input = "C:/Data/Usa.gdb/Counties"
c_array = arcpy.da.FeatureClassToNumPyArray(input, "POP2010")
print(c_array["POP2010"].sum())
This calculation is easy enough to do with a tool in ArcGIS Pro or a simple
Python script without NumPy, but NumPy is often much faster and contains
other functionality not available in ArcPy. The following example uses the
corrcoef() function to determine the bivariate correlation coefficients
between two variables in a database table:
import arcpy
import numpy
input = "C:/Project/health.dbf"
field1 = "VAR3"
field2 = "VAR4"
h_array = arcpy.da.TableToNumPyArray(input, (field1, field2))
print(numpy.corrcoef((h_array[field1], h_array[field2]))[0][1])
In this script, the TableToNumPyArray() function of ArcPy is used to create a
two-dimensional NumPy array on the basis of the two fields of interest, and
the corrcoef() function of NumPy is used to create a bivariate correlation
matrix between the two axes of the array. Because the function returns a 2 ×
2 matrix (in the form of an array), the two indices at the end are used to
obtain just the one value of interest, which is a single correlation
coefficient.
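The indexing into the 2 × 2 correlation matrix can be illustrated without ArcPy, using plain sequences in place of table fields. The values below are made up for illustration and are perfectly correlated:

```python
import numpy as np

var1 = [2.0, 4.0, 6.0, 8.0]
var2 = [1.0, 2.0, 3.0, 4.0]  # a linear multiple of var1

# corrcoef() returns the full 2 x 2 correlation matrix as an array.
matrix = np.corrcoef((var1, var2))
print(matrix.shape)   # (2, 2)
print(matrix[0][1])   # 1.0 (up to floating-point rounding)
```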
To work with geographic data, we often need a structured array, which
includes fields, or structs, to map the data to fields in tables. Following is a
simple example of a structured array:
import numpy
a = numpy.array([(1, 2.4, "Hello"), (2, 3.7, "World")],
                dtype=[("field0", "int32"), ("field1", "float32"),
                       ("field2", (str, 10))])
To break this down, each element in this array is a record that contains three
elements. These elements have been given default field names field0,
field1, and field2. The data types are a 32-bit integer, a 32-bit float, and a
string with 10 characters or less. Notice how the values are entered as a
tuple, so it is only a single dimension. The result is a one-dimensional array
with a length of two.
Note: There are numerous data types in NumPy, sometimes with
confusing notation. For example, int32 is often written as i4, float64 is
often written as f8, and so on.
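The shorthand and long-form notations are interchangeable, and the fields of a structured array can be accessed by name, which returns a whole column as a one-dimensional array. A small sketch (the field names are made up):

```python
import numpy as np

# "i4" is shorthand for a 32-bit integer, "f8" for a 64-bit float.
a = np.array([(1, 2.4), (2, 3.7)],
             dtype=[("idfield", "i4"), ("value", "f8")])

print(a.dtype.names)     # ("idfield", "value")
print(a["value"])        # the "value" column as a 1D array
print(a["value"].sum())  # 6.1 (up to floating-point rounding)
```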
You should now be able to follow along with the example from the ArcGIS
Pro help pages on “Working with NumPy in ArcGIS.” The example
illustrates the use of a structured array to create geographic data as a
NumPy array first, which is then converted to a feature class.
import arcpy
import numpy
outfc = "C:/Data/Texas.gdb/fd/pointlocations"
pt_array = numpy.array([(1, (471316.38, 5000448.78)),
(2, (470402.49, 5000049.21))],
numpy.dtype([("idfield", numpy.int32),
("XY", "<f8", 2)]))
sr = arcpy.Describe("C:/Data/Texas.gdb/fd").spatialReference
arcpy.da.NumPyArrayToFeatureClass(pt_array, outfc, ["XY"], sr)
The NumPy array consists of a two-dimensional array containing values for
the ID field and tuples with x,y coordinates. The
NumPyArrayToFeatureClass() function is used to create the feature class
from the array. Although converting an array to a feature class is not nearly
as common as vice versa, the example syntax is helpful to understand how
to create a structured array for geographic data.
In addition to using NumPy on its own, one of the main reasons to work
with NumPy is because NumPy arrays are used by other Python packages,
including GDAL (raster and vector data processing), Matplotlib (creating
graphs), Pandas (data processing and analysis), and SciPy (scientific
computing). All these packages are part of the arcgispro-py3 default
environment in ArcGIS Pro.
7.10 Using Pandas for data analysis
Tabular data is widely used in GIS workflows, including the use of a variety
of formats, such as text files, CSV files, Excel files, geodatabase tables, and
so on. Python can work with all these formats, but it often requires the use
of a specific module or package for a specific format. Pandas has become
one of the most widely used packages to work with tabular data in Python,
and one of its strengths is that it can work with many different formats.
Pandas is installed as part of the arcgispro-py3 default environment. The
name Pandas is derived from the term panel data, which is used in statistics
to describe datasets with observations over multiple time periods.
As with any other package, Pandas must be imported in a script. It typically
is imported as follows:
import pandas as pd
There is no requirement to use import-as, but many scripts that employ
Pandas use this notation.
One of the most important data structures in Pandas is the DataFrame. A
Pandas DataFrame is a two-dimensional structure to store values. It
basically is a table with rows and columns. The columns have names (like
fields), and the rows have index values. You can create a DataFrame from
scratch, or you can create a DataFrame by converting from another format,
such as a NumPy array or CSV file. The following example uses a CSV
file.
Note: The term “DataFrame” in Pandas is not related to the term
“data frame” in ArcGIS Desktop 10.x (which is like a map in ArcGIS
Pro). Pandas was developed separately from ArcGIS, and the similarity
of the terms is coincidental. Also, some writings refer to a DataFrame
in Pandas as “Data Frame,” “data frame,” or simply “frame,” but the
only correct term is DataFrame.
Reading a CSV file using Pandas can be accomplished using the read_csv()
function. This function returns a DataFrame object, as follows:
import pandas as pd
df = pd.read_csv("health.csv")
You can check the contents of the DataFrame by printing it. A common
approach is to print only the first few lines using the head() method of the
DataFrame object. By default, this method prints the first five rows, but you
can specify a different integer value, as follows:
print(df.head(10))
The result is the first 10 rows of a Pandas DataFrame, as shown in the
figure.
You also can print the last few lines using the tail() method or a random
sample of lines using the sample() method.
The example illustrates the basic structure of a DataFrame. A DataFrame
consists of rows and columns. Column names are obtained from the header
row of the input file. Rows have an index number starting with the value
zero. This basic structure is similar to how other applications organize
tabular data, including ArcGIS Pro and Excel. What makes Pandas so
powerful is how easy it is to load data into a usable structure and how easy
it is to manipulate the data.
Choosing specific columns to work with can be accomplished by specifying
the names of the columns as a list, as follows:
df[["FIPS", "Diabetes"]]
Double brackets are needed here: the outer brackets indicate you want to
select columns, and the inner brackets specify a list of column names.
To work with the results, you can assign the result to a new DataFrame. You
can print the result to confirm the contents, as follows:
small_df = df[["FIPS", "Diabetes"]]
print(small_df.head())
The order of the columns can be changed by changing the order of the list
of column names, as follows:
new_df = df[["FIPS", "Median_hh_income", "Diabetes"]]
print(new_df.head())
The result prints with the new specified order of column names as shown in
the figure.
A DataFrame is a two-dimensional data structure, but if the number of
columns is reduced to one, it effectively becomes a one-dimensional data
structure. In Pandas, this structure is known as a Pandas Series. You can
create a Pandas Series from scratch, by reading data from another source or
by choosing only a single column from an existing DataFrame. The
following example illustrates the latter:
import pandas as pd
df = pd.read_csv("health.csv")
s = df["Diabetes"]
print(s.head())
The result prints with a single column, as shown in the figure.
A Pandas Series is a one-dimensional data structure to store values, like a
list or a one-dimensional NumPy array. As the printout illustrates, the rows
have an index and the column has a name, but because there is only a single
column, it is not shown as a header.
Returning to the manipulation of DataFrames, selecting and reordering
columns is only one of many tasks that can be accomplished using Pandas.
Another common task is to filter for specific values. For example, the
original data contains a record for every US county. To obtain a DataFrame
with the rows for only one state, use an expression like the following:
STATE_NAME == "Florida"
This code returns a value of True or False, like a SQL statement. The
following code uses this expression to filter the data and stores the result as
a new DataFrame object. The entire code is shown for clarity, as follows:
import pandas as pd
df = pd.read_csv("health.csv")
fl_df = df[df.STATE_NAME == "Florida"]
print(fl_df.head())
The result is a new DataFrame object with only the rows corresponding to
the state of interest.
This last example is a good illustration of how effective Pandas is at
organizing data in a usable structure and manipulating the data. Only a few
lines of code are needed to read a CSV file and filter the data on the basis of
the value.
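The same filter works on a DataFrame built directly in code, which makes the pattern easy to try without the CSV file. The column names below mirror the health dataset, but the values are invented:

```python
import pandas as pd

# A small stand-in for the health.csv data (values are made up).
df = pd.DataFrame({
    "STATE_NAME": ["Florida", "Georgia", "Florida", "Alabama"],
    "Diabetes": [11.2, 10.5, 12.0, 13.1],
})

# Boolean indexing: keep only the rows where the expression is True.
fl_df = df[df.STATE_NAME == "Florida"]
print(len(fl_df))  # 2
```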
Many other data manipulations are possible, including counting, descriptive
statistics, and aggregation. The following examples represent only a few of
the possibilities.
The following script filters for the records with the maximum value for a
specific column:
import pandas as pd
df = pd.read_csv("health.csv")
print(df.loc[df["Obese"].idxmax()])
In this example, df["Obese"].idxmax() returns the index of the row in
which the column name Obese has the maximum value, and df.loc[]
returns the row of that index.
The result prints the maximum obese rate, as shown in the figure.
The following script determines the median value of the column Obese by
state:
import pandas as pd
df = pd.read_csv("health.csv")
new_df = df.groupby("STATE_NAME").median()["Obese"]
print(new_df.head(10))
In this example, groupby("STATE_NAME") aggregates the data on the basis of
the column name STATE_NAME, the median() method determines the
median value for each column on the basis of the aggregation, and
["Obese"] selects only the column of interest for printing purposes.
The result prints the median value for obesity for each state as shown in the
figure.
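The groupby pattern can likewise be sketched with an in-memory DataFrame so the result is easy to verify (the state names and values are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "STATE_NAME": ["Florida", "Florida", "Georgia", "Georgia"],
    "Obese": [20.0, 30.0, 25.0, 35.0],
})

# Aggregate the rows by state, then take the median of the numeric column.
medians = df.groupby("STATE_NAME").median()["Obese"]
print(medians["Florida"])  # 25.0
print(medians["Georgia"])  # 30.0
```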
These examples illustrate the versatility of Pandas to manipulate data using
Python. Pandas can accomplish many other tasks, including handling
missing data, reshaping data (pivoting, appending rows or columns,
sorting), creating subsets (removing duplicates, filtering), summarizing
data, aggregating data, combining datasets (table associations), and basic
plotting.
In addition, Pandas can read data in many different formats, including
NumPy arrays, text files, CSV files, Excel files (both .xls and .xlsx), JSON
strings, HTML files, SQL tables, and several others. This versatility,
combined with the effective data manipulation techniques using
DataFrames, has made Pandas one of the most popular Python packages in
the data science community. It does not make other packages obsolete, but
programmers who have learned Pandas tend to make much less use of
separate modules and packages such as csv and openpyxl, among others.
The functionality of Pandas is expanded by the GeoPandas package. This
package extends the data types used by Pandas to allow for geometric
operations. The two main data structures of GeoPandas are GeoSeries and
GeoDataFrames to store the geometry and attributes of vector objects,
respectively. The result is a powerful set of tools for vector data
manipulation. Under the hood, GeoPandas uses several other Python
packages to work with geospatial data, including Shapely for geometric
operations, Fiona for file access, and Matplotlib for plotting. Once you start
using open-source Python packages, you realize how they are frequently
used together.
7.11 Using Matplotlib for data visualization
Python’s standard library includes limited capabilities for data visualization,
but many packages are available to support the creation of 2D and 3D
graphics. The most widely used of these packages is matplotlib. Matplotlib
makes it possible to create a variety of different graphics, which enhance
those of ArcGIS Pro. The functionality of Matplotlib is somewhat
comparable to the plotting capabilities of MATLAB, and users of that
software will find it relatively easy to learn Matplotlib in Python. MATLAB
is a proprietary programming language and numerical computing
environment.
The Matplotlib package includes the pyplot module for relatively simple
tasks, but many other modules exist for more sophisticated tasks, such as
control of line styles, colors, fonts, and others. Matplotlib is part of the
arcgispro-py3 default environment. This section illustrates the use of
matplotlib.pyplot to create basic graphs.
The first step in using the pyplot module is to import it as follows:
import matplotlib.pyplot as plt
There is no requirement to use import-as, but using it is common practice to
shorten code. Similarly, the use of the name plt is not required, but many
scripts use this notation.
Matplotlib relies heavily on NumPy arrays, and NumPy is imported as
follows:
import numpy as np
Basic plotting, however, does not require NumPy, so not all scripts need this
code.
Creating a basic graph can be accomplished using the plot() function of the
pyplot module. The basic syntax of a 2D plot is
plot(x, y, <format>)
Values for x and y can be obtained from existing data, but they can also be
entered directly in the function as lists. The format argument is a string that
uses codes for color, marker style, and line style. The format argument is
optional, and the default is "b-", which means a blue line. The following
example creates a scatter plot of five points using green ("g") circles ("o").
plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")
You can use Python lists to enter values for a figure, but internally all
sequences are converted to NumPy arrays. There is no need to import
NumPy, however, unless you are specifically working with NumPy arrays
as inputs.
The final step is to make the graph appear using show():
plt.show()
The entire script so far is as follows:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")
plt.show()
Running the script brings up the figure in a simple viewer, as shown.
The nature of the viewer varies somewhat among Python IDEs, but the plot
itself is the same. Instead of displaying the figure, you can also save it to a
local file, as follows:
plt.savefig("demoplot.png")
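When a script runs without an interactive display (for example, as a scheduled task), a non-interactive backend can be selected before pyplot is imported so that the figure is rendered straight to a file. A minimal sketch; the backend choice and file name are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt

# Same scatter plot as before, written to disk instead of displayed.
plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")
plt.savefig("demoplot.png")
```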
Many details can be added to control the display of the figure. The axis()
function controls the values of the x and y axes as a list, as follows:
[xmin, xmax, ymin, ymax].
plt.axis([0, 6, 0, 6])
These and other lines of code are used after the figure is created using
plot(), but before the figure is displayed or saved.
Labels for the axis can be added using the xlabel() and ylabel() functions:
plt.xlabel("variable x")
plt.ylabel("variable y")
The complete script now is as follows:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4, 5], [2, 4, 3, 5, 4], "go")
plt.axis([0, 6, 0, 6])
plt.xlabel("variable x")
plt.ylabel("variable y")
plt.savefig("demoplot.png")
And the resulting figure is a scatter plot with axes and labels, as shown.
Although data values can be entered directly into a script as a list or a
NumPy array, it is more typical to read the values from existing data
sources. Consider an example CSV file with a time series of global mean
sea level rise (Source: CSIRO, 2015). The data includes values for the year
and the sea level in inches.
In the following code example, Pandas is used to read the contents of the
CSV file. Each variable of interest is created as a Pandas Series, which is a
sequence of values like a list. Matplotlib is used to create a scatter plot
between the two variables, including labels for the axes and a title. The
markersize parameter of the plot() function sets an appropriate size
considering the number of observations. The code is as follows:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("sealevel.csv")
year = df["Year"]
sea_levels = df["CSIRO Adjusted Sea Level"]
plt.plot(year, sea_levels, "ro", markersize=2.0)
plt.xlabel("Year")
plt.ylabel("Sea Level (inches)")
plt.title("Rise in Sea Level")
plt.savefig("sealevel.png")
The result is a professional-quality graph, generated using only a few lines
of code.
Matplotlib includes functionality to create many other types of graphs. One
of the best ways to become familiar with the possibilities is to review the
gallery of examples on the Matplotlib home page.
Each example links to the complete code to create the specific example, and
the source code can be downloaded as a Python script file (.py) or Jupyter
Notebook file (.ipynb).
Points to remember
The Python standard library includes several additional modules that
are widely used in geoprocessing scripts. There are many other third-party
packages that can be added to support GIS workflows in
Python.
Transferring files using FTP can be automated in Python using the
ftplib module. Common tasks include navigating to the correct
directory, reading the files in a directory, and downloading files to a
local computer.
ZIP files are widely used to compress, organize, and transfer GIS
datasets. The zipfile module in Python can examine the contents of
a ZIP file, extract files from a ZIP file, and create new ZIP files.
XML files are widely used to store datasets. Several different
modules and packages can parse the contents of an XML file and
read the contents of the elements of interest.
Accessing the contents of web pages can be accomplished using the
urllib module. Typical tasks include reading the contents of a
specific page or downloading a file using a URL. A commonly used
alternative is the requests package.
Working with tabular data in CSV format can be accomplished using
the csv module, whereas the openpyxl package is widely used to work
with Excel files.
JSON has become a widely used format to share data, and the json
module can convert between JSON objects and Python. Several tools
in ArcGIS and functions in ArcPy are also available to work with
JSON objects.
Fast processing of large datasets is facilitated using the NumPy array
data structure, which is supported by the NumPy package. NumPy
arrays also are widely used in other packages for data analysis and
visualization. Several tools are available to convert between spatial
datasets in ArcGIS and NumPy arrays.
Pandas has become a widely used package to work with tabular data
in Python. The DataFrame data structure in Pandas is effective for
reading data from different formats and manipulating data for further
analysis.
Matplotlib provides a complete set of tools to create professional-quality
graphics using Python.
Many other packages can expand the functionality of your scripts.
Being able to utilize some of these packages will make you a more
effective coder.
Key terms
array
axis (of NumPy array)
built-in module
child element (XML)
comma-separated values (CSV)
deprecated
directory
Extensible Markup Language (XML)
extract
file transfer protocol (FTP)
GeoJSON
JavaScript Object Notation (JSON)
lossless data compression
node (XML)
NumPy
NumPy array
parsing
Pandas
Pandas DataFrame
Pandas Series
Pretty JSON (PJSON)
Python Module Index
Python Package Index (PyPI)
rank (of NumPy array)
root element (XML)
spreadsheet
struct
structured array
tag (XML)
tree structure (XML)
uniform resource locator (URL)
ZIP file
Review questions
What is the typical workflow to obtain datasets from an FTP site
using a Python script?
Explain the structure of an XML file and the approach used to read
data stored in XML using Python.
What are some of the challenges working with Excel files in Python?
Describe the structure of a JSON file, and explain how this structure
influences the use of JSON objects in Python.
What makes Pandas so effective in reading and manipulating
datasets?
Describe some of the functionality of the Matplotlib package.
Chapter 8
Migrating scripts from Python 2 to 3
8.1 Introduction
This chapter examines how to migrate scripts developed for ArcGIS
Desktop 10.x to ArcGIS Pro. Migration requires addressing changes in the
Python language, changes in the ArcGIS software, and specific changes in
ArcPy. Each of these changes is discussed in detail, including strategies to
facilitate the migration.
8.2 Overview of migrating scripts
The focus of this book is on developing scripts and tools using Python for
ArcGIS Pro. However, sometimes you may be tasked to take an older script
or tool and migrate it to ArcGIS Pro. In some cases, the older script or tool
may run perfectly in ArcGIS Pro without any changes. In other cases, you
may encounter several errors and other issues. To effectively migrate scripts
and tools developed with Python for ArcGIS Desktop 10.x to ArcGIS Pro,
you must be aware of several changes. There are three types of changes to
consider.
First, ArcGIS Desktop 10.x uses Python 2, whereas ArcGIS Pro uses
Python 3. Although much of the Python language remains the same
between Python 2 and 3, there are several important differences to be aware
of. For typical scripts used in GIS workflows, these differences often are
limited to a few different types, which are reviewed in this chapter. There
are several utilities to facilitate the Python elements associated with
migrating scripts.
Second, ArcGIS Pro is different from ArcGIS Desktop 10.x in several ways.
The look and feel of the software interface are different, and many of the
typical workflows are different, too. In addition, not all tools in Desktop
10.x are available in ArcGIS Pro, and the supported data formats have also
changed.
Third, changes have been made to the ArcPy package. Most notable is that
the arcpy.mapping module in ArcGIS Desktop 10.x is replaced with the
arcpy.mp module, but there are several other, more subtle changes as well.
The following sections discuss each of these types of changes in more
detail.
8.3 Changes between Python 2 and 3
Python as a programming language was conceived in the late 1980s. The
first version was released in 1991. Since then, the language has evolved
tremendously, although the structural changes have been gradual. The
relative stability of Python is one of its strengths. Python 2 was released in
2000, and its most recent update, 2.7.17, was released in October 2019.
Over time, a lot of functionality was added, but the many gradual changes
resulted in several issues, including inconsistencies, ambiguities, and errors.
Python 3 was created, in part, to address these issues and “clean up” the
language. Python 3 was released in 2008, and the most recent version is
Python 3.8, with 3.9 under development. Python 2 and 3 have coexisted for
quite some time, but in recent years, the preference has shifted strongly
toward Python 3. The consensus is that any new projects should be created
using Python 3.
One of the fundamental tenets of Python is backward compatibility. Few
features are ever removed, which is one of the reasons for the issues in
Python 2. Backward compatibility is generally a good thing because scripts
written for an older version (e.g., 2.5) will continue to work when running a
newer version of Python (e.g., 2.7). However, the nature of the changes
necessary to fix some of the issues in Python 2 means that Python 3 is not
compatible with Python 2. In other words, not all code written in Python 2
will run correctly in Python 3, and vice versa. Python 3 itself will be
backward compatible, meaning, for example, that scripts written in Python
3.4 will run correctly in a newer version of Python 3 (e.g., 3.8).
Nonetheless, some aspects of Python 3 have been backported to Python 2.6
and 2.7. In backporting, some functionality that is new or different in
Python 3 also is added to those versions. This measure is not quite enough
to ensure full backward compatibility, but with some careful planning, it is
possible to write code that runs correctly in Python 2 and Python 3. The
backporting of functionality also facilitates the migration of scripts from
Python 2 to Python 3. Backporting applies to the Python language itself, but
not to all its packages, including ArcPy.
Maintenance of Python 2 ended on January 1, 2020. This date is considered
the official “end of life,” or sunset date. There will be no further additions
or improvements to Python 2.7 from the Python Software Foundation, and
there will be no version 2.8. Python 2.7 will continue to exist, and code will
continue to work, but no new functionality will be added. For some
applications, Python 2.7 will remain the version for many years to come.
The larger Python community, however, has moved toward Python 3,
including the geospatial community. If you are relatively new to Python,
you should focus your efforts on learning Python 3, and learn only about
Python 2 as required when the need arises to migrate scripts from Python 2
to 3, or when certain projects require support for both versions. That said,
the two versions are much more similar than they are different.
The rest of this section reviews some of the key differences between Python
2 and 3 that are most relevant for writing scripts for ArcGIS Pro. Numerous
other resources document the differences in much greater detail, including
the official Python documentation. For example, search for “Porting Python
2 Code to Python 3” under the Python how-tos at
https://docs.python.org/3/howto/.
Printing
The print statement in Python 2 is a built-in function in Python 3, as
follows:
Python 2:
print "Hello World"
Python 3:
print("Hello World")
Although the change consists of only a pair of parentheses, it is significant
because of how often printing is used, especially when testing scripts.
Because this code is so widely used in scripts, most IDEs have a built-in
check. For example, running the Python 2 code in Python 3 in IDLE
produces a syntax error message, as shown in the figure.
The print() function is a good example of functionality that has been
backported. In other words, the Python 3 code runs fine in Python 2.7.
Unicode strings and encoding
All strings (type str) in Python 3 are unicode strings, whereas Python 2
uses both ASCII (by default) and unicode strings (by adding a u in front of
the string—e.g., u"text"). Unicode is more versatile than ASCII. So, there
is less confusion over different types of strings in Python 3, especially when
working with character sets in different languages.
Python 3, by default, uses a character encoding system referred to as UTF-8. To ensure your scripts use this encoding system, especially scripts that
also may be used in older Python versions, you can add the following line at
the top of your script:
# coding: utf-8
This comment ensures that Python is aware of the encoding system of the
script, regardless of the version being used to run the script. You will see
this line of code at the top of scripts that are exported from the Python
window in ArcGIS Pro.
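The practical consequence in Python 3 is that str always holds Unicode text, while a separate bytes type holds encoded data; converting between them requires an explicit encode or decode step. A short sketch:

```python
# In Python 3, str is Unicode text; bytes holds the encoded form.
text = "café"
data = text.encode("utf-8")  # str -> bytes

print(type(text).__name__)   # str
print(type(data).__name__)   # bytes
print(len(text))             # 4 characters
print(len(data))             # 5 bytes -- "é" takes two bytes in UTF-8
print(data.decode("utf-8") == text)  # True
```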
Integer types
All integers in Python 3 are long, and their type is referred to as int. Python
2 used both short integers (type int) and long integers (type long).
True integer division
Integer division is somewhat confusing in Python 2. Consider the following
example:
Python 2:
7/2 # returns 3
Integer division returns an integer, and any fractional part is ignored.
Integer division is also referred to as floor division. To retain the
fractional part, referred to as floating-point or true division, at least
one of the operands must be a floating-point number.
Python 2:
7/2.0 # returns 3.5
In Python 3, division using the / operator always returns a floating-point
number (type float), even when both operands and the result are integers.
Python 3:
7/2 # returns 3.5
Python 3:
8/2 # returns 4.0
Floor division can be accomplished in Python 3 using the floor division
operator (//):
Python 3:
7//2 # returns 3
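These rules can be confirmed in any Python 3 interpreter. Note that floor division rounds toward negative infinity, which matters for negative operands:

```python
# Python 3 division semantics.
print(7 / 2)    # 3.5 -- true division always returns a float
print(8 / 2)    # 4.0 -- a float even when the result is whole
print(7 // 2)   # 3   -- floor division keeps the integer type
print(-7 // 2)  # -4  -- floor division rounds toward negative infinity
```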
Input
The raw_input() function in Python 2, to obtain user input, is equivalent to
the input() function in Python 3.
Python 2:
newvar = raw_input()
Python 3:
newvar = input()
Opening files
The file() function in Python 2 is replaced by the open() function in
Python 3.
Python 2:
myfile = file("newtext.txt")
Python 3:
myfile = open("newtext.txt")
The open() function has been backported to Python 2.7.
Working with ranges
Python 2 includes two functions, range() and xrange(). Python 3 includes
only the range() function, which works the same as the xrange() function
in Python 2. The range() function in Python 2 returns a list,
whereas the range() function in Python 3 returns a range object instead of a
list. For a large range of values, the use of the range object has the
advantage that not all values are stored simultaneously. To obtain the values
as a list, you can use the list() function in Python 3.
Python 2:
mylist = range(3) # mylist is [0, 1, 2]
Python 3:
mylist = list(range(3)) # mylist is [0, 1, 2]
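The memory advantage of the range object can be illustrated briefly; indexing and membership tests work without materializing the full list:

```python
big = range(10_000_000)      # a range object, not a ten-million-item list
print(big[5])                # 5    -> supports indexing without building the list
print(9_999_999 in big)      # True -> computed arithmetically, not by scanning

mylist = list(range(3))      # convert to a real list only when one is needed
print(mylist)                # [0, 1, 2]
```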
Iteration using next()
The next() method in Python 2 is replaced with the next() function in
Python 3.
Python 2:
row = rows.next()
Python 3:
row = next(rows)
The next() function returns the next item in an iterator. This change is significant because iteration is widely used in geoprocessing tasks, including the use of cursor objects.
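The next() function works on any iterator, not only cursors, and an optional second argument returns a default value instead of raising StopIteration at the end. A sketch with a plain list iterator standing in for a cursor:

```python
rows = iter(["row1", "row2"])   # any iterator; an arcpy cursor behaves the same way
first = next(rows)              # 'row1'
second = next(rows)             # 'row2'
done = next(rows, None)         # None -> the default avoids a StopIteration error
print(first, second, done)
```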
Working with dictionaries
To iterate over the items in a dictionary, Python 2 uses both the iteritems()
and items() methods, but the former is removed from Python 3.
Python 2:
mydictionary.iteritems()
Python 3:
mydictionary.items()
Similarly, the methods iterkeys() and itervalues() also are removed, and
Python 3 uses only keys() and values() instead.
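In Python 3, items(), keys(), and values() return lightweight view objects rather than lists, so they can be iterated directly, much as iteritems() was in Python 2. A short sketch (the dictionary contents are illustrative):

```python
mydictionary = {"parks": 12, "trails": 30}

# items() returns a view; iterate over it directly, as iteritems() did in Python 2
pairs = []
for key, value in mydictionary.items():
    pairs.append((key, value))
print(pairs)                          # [('parks', 12), ('trails', 30)]

print(list(mydictionary.keys()))      # ['parks', 'trails']
print(list(mydictionary.values()))    # [12, 30]
```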
8.4 Python 2to3 program
The Python 2to3 program greatly facilitates the migration of scripts. This
utility reads code in Python 2 and applies a series of fixers to transform the
code to Python 3. The 2to3 program is installed as part of the standard
library and can be run from the command line. For example, consider the
following script called myscript.py:
def greet(name):
    print "Hello, {0}!".format(name)
print "What's your name?"
name = raw_input()
greet(name)
This code is a standard testing script from the Python documentation. To
migrate this script to Python 3, run the following from the command line:
2to3 myscript.py
The implicit assumption is that the script is in the current directory.
Otherwise, you must navigate to the correct directory or provide the full
path. For example:
2to3 C:\Testing\myscript.py
The result prints to the screen, as shown in the figure.
The results are organized as a series of differences, referred to as diffs. A
diff consists of all the lines that must be removed as indicated by a minus
sign (-) and all the lines that must be added as indicated by a plus sign (+).
As the result illustrates, only lines 1 (def greet(name):) and 5
(greet(name)) remain unchanged. The necessary changes include changing
the print statement to the print() function and changing the raw_input
function to the input() function.
You also can have the changes written directly to the source file by adding
the -w flag, as follows:
2to3 -w myscript.py
The new script is as follows:
def greet(name):
    print("Hello, {0}!".format(name))
print("What's your name?")
name = input()
greet(name)
Note: When changes are written to the source file, a backup is made
automatically unless this option is declined using the -n flag.
The Python 2to3 program also is used in the Analyze Tools For Pro
geoprocessing tool, which section 8.7 discusses.
There are many other resources to assist with learning Python 3 and with
migrating your code. These resources include several online books and
tutorials, including the well-regarded online book Supporting Python 3: An
In-depth Guide by Lennart Regebro (http://python3porting.com/).
8.5 Changes in ArcGIS Pro
ArcGIS Pro was completely redesigned to modernize desktop GIS. The user
interface and typical workflows are different from ArcGIS Desktop 10.x.
Some of the changes that are most relevant to writing Python scripts,
however, are not related to the user interface. Specific changes to the
functionality of ArcPy are discussed in a later section of this chapter,
whereas this section focuses on two other important changes: supported
data formats and availability of geoprocessing tools.
ArcGIS Pro supports many different data formats for both spatial and
nonspatial data. These formats include local datasets as well as web
services. Some of these formats are native to Esri (e.g., shapefiles and file
geodatabase), whereas others are developed by different companies (e.g.,
Microsoft Excel) or are open-source efforts (e.g., geopackages by the Open
Geospatial Consortium).
Several data formats, however, are no longer supported. The most relevant
are personal geodatabases (.mdb files). This format was widely used in
ArcGIS Desktop 10.x, but its use has been discouraged in recent years, and
you can no longer read data in this format in ArcGIS Pro. Similarly,
coverages are no longer supported in ArcGIS Pro. This format dates to
much older versions of Esri software (i.e., ArcInfo), and its use has been in
steady decline. If some of your data is still in these formats, or if you
encounter these formats when locating data from other sources, you will
need to use ArcGIS Desktop 10.x software to process it.
Several other types of datasets are read-only in ArcGIS Pro. These datasets
include geometric networks, which are replaced by utility networks. Raster
catalogs are also read-only and are replaced by raster mosaics. You can still
read these types of datasets, which allows you to copy or convert them to
their newer equivalents within ArcGIS Pro.
Several geoprocessing tools from ArcGIS Desktop 10.x are no longer
available in ArcGIS Pro. These tools include all the tools that were
specifically designed to work with data formats that are no longer supported
in ArcGIS Pro, including personal geodatabases and coverages. For
example, the entire Coverages toolbox is removed for this reason.
In addition, some tools are replaced with other tools that have similar
functionality. For example, geometric networks are a read-only dataset in
ArcGIS Pro because this functionality is replaced by utility networks. This
change includes a new set of tools. As one example, the approximate
equivalent of the Create Geometric Network tool in ArcGIS Desktop 10.x is
the Create Utility Network tool in ArcGIS Pro.
Finally, some tools from ArcGIS Desktop 10.x have not been implemented
yet in ArcGIS Pro but are scheduled. You can expect many of these tools to
be released in future releases of ArcGIS Pro, although the functionality may
be somewhat modified.
8.6 ArcPy changes
Much of ArcPy has remained the same between ArcGIS Desktop 10.x and
ArcGIS Pro. Most of the modules, classes, and functions have not changed.
Nevertheless, there are several significant changes to ArcPy. What follows
is a detailed look at the most important changes.
The most obvious, and perhaps most significant, change is that the
arcpy.mapping module is replaced by the arcpy.mp module. Thus, any
reference to arcpy.mapping will produce an error. Consider the following
example script:
import arcpy
mxd = "C:/Project/Demo.mxd"
mapdoc = arcpy.mapping.MapDocument(mxd)
This script creates a MapDocument object on the basis of an existing .mxd
file. Running the script using a Python environment for ArcGIS Pro
produces the following error:
AttributeError: module 'arcpy' has no attribute 'mapping'
It is important to recognize that the version of ArcPy to be used cannot be
set in the script because it is controlled by the environment. The default
environment in ArcGIS Pro is arcgispro-py3, which includes the current
version of ArcPy that works with ArcGIS Pro. The example script may run
fine in IDLE or PyCharm using the Python environment for ArcGIS
Desktop 10.x, but it will produce errors in the same editor using an ArcGIS
Pro environment.
The differences between arcpy.mapping and arcpy.mp are substantial, and
simply doing a search and replace for the name of the module is not
enough. First, there are changes to terminology. For example, projects and
maps in ArcGIS Pro were called “map documents” and “data frames,”
respectively, in ArcGIS Desktop 10.x. However, many other elements are
identical, or at least very similar, in terms of terminology. For example, a
layer is still a layer. Second, there are changes in general functionality, such
as the ability to support multiple layouts in ArcGIS Pro, whereas only a
single layout was supported in ArcGIS Desktop 10.x.
Combined, these differences make it relatively cumbersome to update older
scripts that employ the arcpy.mapping module. There are no utilities to
facilitate the automated migration of mapping scripts. Migrating a script requires a good understanding of the arcpy.mp module, and then selectively identifying in arcpy.mp the (approximate) equivalents of the arcpy.mapping functionality the script uses.
Despite the many differences, however, the general logic for most scripts to
carry out mapping-related tasks remains the same. Consider the following
example of a script using the arcpy.mapping module. The script points to a
map document and lists all the layers in a data frame called City. If the layer
name is Parks, the visibility property is set to True, and the transparency is
set to 50%. The script is as follows:
import arcpy
mxd = "C:/Project/Demo.mxd"
mapdoc = arcpy.mapping.MapDocument(mxd)
df = arcpy.mapping.ListDataFrames(mapdoc, "City")[0]
lyrlist = arcpy.mapping.ListLayers(mapdoc, "", df)
for lyr in lyrlist:
    if lyr.name == "Parks":
        lyr.visible = True
        lyr.transparency = 50
mapdoc.save()
del mapdoc
Now consider the same script using the arcpy.mp module:
import arcpy
aprx = "C:/Project/NewDemo.aprx"
project = arcpy.mp.ArcGISProject(aprx)
maps = project.listMaps("City")[0]
lyrlist = maps.listLayers()
for lyr in lyrlist:
    if lyr.name == "Parks":
        lyr.visible = True
        lyr.transparency = 50
project.save()
del project
Some of the most obvious differences are that ArcGIS Pro uses projects and
maps instead of map documents and data frames. A more subtle but
important difference is that the list functions are replaced with methods on
the appropriate objects. The basic structure and logic of the script, however,
remain the same. The logic consists of pointing to a project (map
document), identifying a single map (data frame) by using its name, listing
all the layers in the map (data frame) of interest, iterating over the list of
layers, and updating the properties of a layer on the basis of its name. As a
result, migrating scripts requires a lot of small changes, but the basic
structure of the script mostly stays the same.
There are several other changes to ArcPy beyond the arcpy.mp module.
ArcGIS Pro has introduced four new modules: arcpy.ia, arcpy.metadata, arcpy.sharing, and arcpy.wmx. The new Image Analyst module arcpy.ia gives access to the functionality of the ArcGIS Image Analyst extension for
the management and analysis of raster data. Some of these tools are
duplicates of those found in the ArcGIS Spatial Analyst extension, but there
are some unique tools for imagery, such as deep learning tools for feature
detection and tools for motion imagery analysis. The new metadata module
arcpy.metadata accesses an item’s metadata and exports it to a standard
metadata format. The new sharing module arcpy.sharing makes it possible
to create a sharing draft from a map in ArcGIS Pro as a web layer. This
sharing draft can be converted to a service definition file. The module
facilitates configuring web layers on the basis of maps created in ArcGIS
Pro. The workflow manager module arcpy.wmx includes the geoprocessing
tools of the Workflow Manager toolbox. These tools provide an integration
framework for multiuser geodatabase environments. They help streamline
and standardize business processes in large organizations. All these
modules provide specialized functionality that many typical users will not need.
There have been other, more subtle changes. The new arcpy.da.Describe()
function provides several benefits over the existing arcpy.Describe()
function (which continues to exist in ArcGIS Pro as well).
More generally, several ArcPy functions are removed, and a few new ones
are added. Table 8.1 lists all the changes to the general ArcPy functions but
does not include functions of the ArcPy modules.
Table 8.1 Changes in ArcPy functions
ArcPy functions no longer available in ArcGIS Pro: GetImageEXIFProperties, GetPackageInfo, GetUTMFromLocation, LoadSettings, RefreshActiveView, RefreshCatalog, RefreshTOC, SaveSettings
New ArcPy functions in ArcGIS Pro: ClearCredentials, FromGeohash, GetPortalDescription, GetPortalInfo, ImportCredentials, SignInToPortal
Considering that there are more than 100 ArcPy functions, the changes are
modest and mostly refer to specialized tasks. A total of eight ArcPy
functions available in ArcGIS Desktop 10.x are no longer available in
ArcGIS Pro. Three of these functions are related to refreshing the display of
views in map documents open in ArcMap, which no longer applies. Several
new functions are added, some of which are related to ArcGIS Online,
reflecting the increased importance of web services in ArcGIS Pro.
There have been fewer changes to the classes of ArcPy. Two classes are no
longer available: Graph and Graph Template. One class is added: Chart.
Although conceptually similar, the charting functionality of ArcGIS Pro is
different from the graphs in ArcGIS Desktop 10.x. This change requires a
new class with support for the different types of charts introduced in
ArcGIS Pro.
Finally, as discussed in the previous section, some geoprocessing tools are
no longer available in ArcGIS Pro. The following toolboxes and all their
tools are not available in ArcGIS Pro: Coverage (arcpy.arc), Schematics
(arcpy.schematics), and Tracking Analyst (arcpy.ta). In addition, the
Parcel Fabric toolbox (arcpy.fabric) is replaced by the Parcel toolbox (arcpy.parcel).
A complete list is provided in the ArcGIS Pro help topic
“Tools That Are Not Available in ArcGIS Pro” under Tool References >
Appendices.
Because of these changes, ArcPy is not backward compatible between
ArcGIS Pro and Desktop 10.x. A script written for ArcGIS Desktop 10.x
may not run correctly in ArcGIS Pro, and vice versa. General Python code
can be written so that it runs correctly in both Python 2 and 3. However, if
your code involves any of the changes in ArcPy discussed in this section,
the script you wrote for one application will not run correctly in the other.
On the other hand, if your script does not use the mapping module and does
not use any of the handful of ArcPy functions and classes listed in this
section, then with some careful attention to detail, it is possible to write a
single script or tool that works correctly in both versions of Python. Certain
projects may require support for both ArcGIS Desktop 10.x and ArcGIS
Pro. Depending on the specific requirements of the script or tool, a project
may require two different versions with slightly modified code.
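One common pattern for such dual-version code is to normalize the differences near the top of the script, so the rest of the code can be written once. The sketch below (variable names are illustrative) runs unchanged under Python 2 and 3; adding from __future__ import print_function as the first line of the script would also make print() behave consistently across versions:

```python
import sys

# Normalize Python 2 / Python 3 differences once, near the top of the script
if sys.version_info[0] == 2:
    get_input = raw_input       # noqa: F821 -- raw_input only exists in Python 2
else:
    get_input = input           # Python 3: input() replaced raw_input()

major = sys.version_info[0]
print("Running under Python", major)
```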
8.7 Analyze Tools For Pro
ArcGIS Pro includes a useful geoprocessing tool to assist with migrating
scripts and tools, Analyze Tools For Pro. This tool uses the Python 2to3
utility to report potential issues in migrating a script or tool from Python 2
to 3, and it evaluates the code for any differences in the use of ArcPy
between ArcGIS Desktop 10.x and ArcGIS Pro. Effectively, it provides a
GUI (or “wrapper”) for the 2to3 utility, conveniently designed as a
geoprocessing tool, as well as checks for the use of ArcPy. You also can use
the 2to3 utility at the command prompt, which provides functionality not available through the Analyze Tools For Pro tool, including the ability to apply a set of predefined fixes to the code automatically. In contrast, the Analyze Tools For Pro tool does not change any code but produces a report with suggested changes for review.
In addition to using the Python 2to3 utility, the Analyze Tools For Pro tool
looks for several other changes, including geoprocessing tools and data
formats that are no longer supported in ArcGIS Pro. The Analyze Tools For
Pro tool is also available as a geoprocessing tool in ArcGIS Desktop 10.x.
Because Analyze Tools For Pro is a regular geoprocessing tool, it can be
run from Python. The general syntax of the tool is
AnalyzeToolsForPro_management(input, {report})
And for comparison, the Analyze Tools For Pro tool dialog box is shown in
the figure.
The only required parameter is the input, which can consist of a Python file
(.py), a Python toolbox (.pyt), a toolbox (.tbx), or a tool name. Because a
tool name is not an actual file, the toolbox that the tool is part of must first
be loaded using arcpy.ImportToolbox. You can only import a toolbox when
calling the tool from Python, not when using the tool dialog box. The most
common way to use the Analyze Tools For Pro tool is to test an existing .py,
.pyt, or .tbx file. When using a toolbox as input, what is being analyzed is
not the tool dialog box(es), but the underlying Python script(s). When using
a Python toolbox or regular toolbox, all tools and scripts are analyzed at the
same time.
The second parameter is an optional output text file that records the issues
identified. When running the tool using the tool dialog box, the issues are
also printed as part of the messages that result from executing the tool.
Consider an example tool from the Python Scripting for ArcGIS book (Esri
Press, 2013) that was written for ArcGIS Desktop 10.x. The updated
version of this tool for use in ArcGIS Pro is explained in chapter 3. The
toolbox is specified as the input, and the output is left blank, as shown in
the figure.
When the tool runs, the result is a warning, as shown in the figure.
The View Details link brings up the specific issues identified in the script as
part of the messages. Notice that the messages reference the Python file
(i.e., random_percent.py), not the toolbox file. The issues are identified
only for the script, not for the design of the tool dialog box or its validation.
The formatting of the messages can make them cumbersome to read, but the
specific issues are listed with their line numbers, as follows:
Line 15: row = rows.next() -> row = next(rows)
Line 19: row = rows.next() -> row = next(rows)
As discussed in section 8.3, Python's next() method is replaced with the next() function. This fix is relatively easy.
Note: It is important to recognize what happens here. Because
rows.next() is no longer supported, the script does not iterate correctly.
The result is that the random_percent.py script produces an output
feature class with only a single new feature, regardless of how many
features should be created on the basis of the tool parameters. When the
Random Features tool runs, no errors are reported, and the tool
appears to be working correctly. However, it does not produce the
correct output because the iteration does not work. Simply running
older tools in ArcGIS Pro to see if they work therefore can be a bit
misleading. The strength of the 2to3 utility is that it finds most of the
issues associated with a script, including some that don’t produce errors
at runtime.
You can copy text directly from the messages. Alternatively, you can
specify an output text file to obtain the same information, which makes for
easier editing. For this same toolbox, the raw text file is shown in the figure.
A few important points about running the Analyze Tools For Pro tool:
When using the tool dialog box to run the tool, the output file is
optional because the results are also printed to the geoprocessing
history as part of the tool’s messages for viewing. When running the
tool from Python, however, you must specify an output text file to
view the results, or print the messages using
print(arcpy.GetMessages()) following tool execution.
Although a toolbox file can be used as input, the tool examines only
the associated script file(s). In other words, a tool dialog box may
contain errors (e.g., have no parameters at all), but these issues are
not identified. If the script cannot be found, the tool produces a
Failed to execute error. Therefore, use the tool to test your code, not
to test the robustness of your tool dialog box design.
The tool does not make any changes to the underlying script file(s),
and only reports issues for review.
Next are some typical lines of code that present problems and how they are
addressed by the Analyze Tools For Pro tool.
Consider the following script that uses the mapping module:
import arcpy
mxd = "C:/Project/Demo.mxd"
mapdoc = arcpy.mapping.MapDocument(mxd)
This script produces the following error:
Found REMOVED Python method mapping.MapDocument
The tool correctly identifies an issue with the mapping module because it
no longer exists (even though MapDocument was a class of the mapping
module, not a method).
Consider the following script that lists and then prints the feature classes in
a workspace:
import arcpy
arcpy.env.workspace = "C:/Project"
fcs = arcpy.ListFeatureClasses()
for fc in fcs:
print fc
This script produces the following error:
Line 5: print fc -> print(fc)
This error is easy to address because the print statement is replaced by the print() function.
Consider the following script that uses a personal geodatabase, which no
longer is supported in ArcGIS Pro:
import arcpy
arcpy.env.workspace = "C:/Testing/Study.mdb"
fcs = arcpy.ListFeatureClasses()
The Analyze Tools For Pro tool identifies an issue with this script and
produces the following warning:
WARNING 001682: Found NOTYETIMPLEMENTED Personal GeoDatabase
C:/Testing/Study.mdb within script C:\Scripts\test.py@2
If you ran the script, it would not produce any errors, but the list would be
empty because the personal geodatabase is not supported as a format, and
therefore it is not a valid workspace. This error does not prevent the script
from running, but the result is not as intended.
Finally, consider a script with a tool that no longer exists in ArcGIS Pro:
import arcpy
arcpy.env.workspace = "C:/Project"
infc = "streets"
outfc = "centerlines"
width = 50
arcpy.CollapseDualLinesToCenterline_arc(infc, outfc, width)
This script produces the following error:
Found REMOVED tool CollapseDualLinesToCenterline_arc
The Collapse Dual Lines To Centerline tool was designed to work with
coverages, and this format is no longer supported. As a result, all the tools
that work with coverages are removed, and the Analyze Tools For Pro tool
correctly reports the missing tool.
These examples illustrate that the Analyze Tools For Pro tool identifies
many of the issues associated with migrating scripts. It provides a good
starting point for migrating scripts, but it may not correctly identify all
issues. Importantly, it does not identify any logical issues with your script,
or with the design of your tool dialog box.
When running scripts and models, there is a built-in option to check for
compatibility with ArcGIS Pro. It is under Project > Options >
Geoprocessing.
By default, this option is unchecked. When checked, any geoprocessing tool
or model that is run is checked for issues. However, the checks are limited
to the use of unsupported geoprocessing tools or data formats. It does not
run the Python 2to3 utility to check the contents of scripts. As a result, this
option is much less informative. When migrating a specific script or tool, it
therefore is recommended that you manually run the Analyze Tools For Pro
tool for a more comprehensive set of checks.
Points to remember
Python scripts, script tools, and Python toolboxes developed for
ArcGIS Desktop 10.x may not work correctly in ArcGIS Pro.
Migrating scripts requires addressing changes in the Python language,
changes in the ArcGIS software, and specific changes in ArcPy.
ArcGIS Desktop 10.x uses Python 2, whereas ArcGIS Pro uses
Python 3. Although much of the Python language remains the same
between Python 2 and 3, there are several important differences to be
aware of. Some of the most relevant differences include: (1) the print
statement in Python 2 is replaced with the print() function in Python
3; (2) all strings in Python 3 are unicode strings; (3) all integers in
Python 3 are long integers, and integer division returns a floating-point number; (4) the file() function is replaced by the open()
function; and (5) the next() method is replaced by the next()
function.
The Python 2to3 program facilitates the Python-specific elements
associated with migrating scripts by reading code in Python 2 and
applying a series of fixers to transform the code to Python 3. The
Analyze Tools For Pro geoprocessing tool in ArcGIS Pro provides a
GUI to this utility.
ArcGIS Pro is different from ArcGIS Desktop 10.x in terms of the
look and feel of the software interface, which in turn impacts how
certain elements are referred to in ArcPy. In addition, not all tools in
Desktop 10.x are available in ArcGIS Pro, and the supported data
formats also have changed. The most relevant of the formats no
longer supported is the personal geodatabase (.mdb files).
Many specific changes have been made to the ArcPy package. Most
notable is that the arcpy.mapping module in ArcGIS Desktop 10.x is
replaced by the arcpy.mp module, but there are several other, more
subtle changes as well. ArcGIS Pro has introduced four new modules:
arcpy.ia (for imagery processing and analysis), arcpy.metadata (for
managing metadata content), arcpy.sharing (for creating a sharing
draft from a map in ArcGIS Pro as a web layer), and arcpy.wmx (for
workflow management). ArcGIS Pro has introduced the
arcpy.da.Describe() function, which provides certain benefits over
arcpy.Describe().
ArcGIS Pro also has introduced several new functions related to ArcGIS Online. In terms of classes, the Chart class replaces the Graph and Graph Template classes, reflecting the enhancements in charting functionality in ArcGIS Pro.
Python 3 code is not backward compatible with Python 2, but with
some careful planning, it is possible to write code that runs correctly
in Python 2 and Python 3. However, ArcPy is not backward
compatible between ArcGIS Pro and Desktop 10.x. Some scripts you
write for ArcGIS Pro may work correctly in Desktop 10.x, but this
compatibility may be challenging, if not impossible, to achieve
because of the changes in ArcPy. Some scripts may require a different
version for each application, depending on the nature of the workflow
and the data formats and tools being used.
The Analyze Tools For Pro geoprocessing tool in ArcGIS Desktop
10.x and ArcGIS Pro identifies many of the issues associated with
migrating scripts. Importantly, it does not identify any logical issues
with your script or with the design of your tool dialog box.
Key terms
backporting
backward compatibility
diff
fixer
floor division
true division
unicode string
Review questions
What are some of the specific reasons that not all code written in
Python 3 is compatible with version 2?
What is “backporting,” and how does it impact writing scripts for
Python 2 and 3?
What are some of the key differences between Python 2 and 3 that
impact the migration of geoprocessing scripts for ArcGIS Pro?
What are some of the most relevant changes in ArcPy between
ArcGIS Desktop 10.x and ArcGIS Pro?
What typical issues in scripts are identified by the Analyze Tools For
Pro tool?
Chapter 9
ArcGIS API for Python
9.1 Introduction
Python and ArcPy make it possible to extend the functionality of ArcGIS
Pro. ArcGIS Pro is a software application that runs on desktop computers
and primarily is designed to work with local datasets. Increasingly,
however, geospatial data and its applications reside on the web, referred to
as web GIS. Web GIS is a type of distributed information system that allows
you to store, manage, visualize, and analyze geographic data. Examples of
web GIS that employ Esri technology are ArcGIS Online and ArcGIS
Enterprise.
Typically, web GIS includes datasets hosted in ArcGIS Online or ArcGIS
Enterprise, as well as other online resources. You can work with these
datasets in ArcGIS Pro by bringing hosted datasets into a map, and then
overlaying local datasets. However, ArcPy has limited functionality to work
directly with such web layers. This is where the ArcGIS API for Python
comes in.
Note: When the API was in the early stages of development, it was
referred to as the ArcGIS Python API, but it was renamed to the ArcGIS
API for Python upon release.
The ArcGIS API for Python is a Python package for working directly with
web GIS independent of ArcGIS Pro. It provides tools for tasks such as
creating maps, geocoding, vector and raster analysis, and managing data.
These tasks are comparable to the functionality in ArcPy but are
specifically designed for web GIS. In addition, the ArcGIS API for Python
provides tools to manage the organization of web GIS, such as managing
users, groups, and items. These tasks have no equivalent in ArcPy.
When writing scripts and creating tools with ArcPy, you use a Python editor
such as IDLE, Spyder, or PyCharm. For example, you write a script in your
Python editor and run it as a stand-alone script or as a script tool in ArcGIS
Pro. Even though you are running Python, ArcGIS Pro provides the user
interface to run the script tool and examine the results. The ArcGIS API for
Python is designed to work with web GIS independent of ArcGIS Pro.
Although a more traditional Python editor such as PyCharm might be
adequate for certain tasks, it does not provide data visualization to the level
of desktop software such as ArcGIS Pro. To work effectively with web GIS,
you need an IDE that has built-in tools for visualization. This is where
Jupyter Notebook comes in. Jupyter Notebook has its roots in IPython,
which stands for “interactive Python” and provides useful features over the
default Python interpreter. Several IDEs, including Spyder and PyCharm,
employ IPython as their interpreter. In addition, Jupyter Notebook has tools
for visualization and other interactive components, which makes it an
interactive coding environment.
This chapter describes the functionality of the ArcGIS API for Python and
how to get started writing code using Jupyter Notebook.
9.2 What is the ArcGIS API for Python?
The ArcGIS API for Python is a Python package for carrying out
visualization, spatial data management, spatial analysis, and system
administration of web GIS. It employs Python’s best practices in its design
and in how it uses data structures. This package makes it easier for Python
programmers to start using ArcGIS without necessarily becoming skilled in
a desktop application such as ArcGIS Pro.
The ArcGIS API for Python is implemented using the ArcGIS REST API
powered by ArcGIS Online and ArcGIS Enterprise. This implementation
means you typically are working with datasets that are hosted in your
organization or that are available publicly. You also can use the ArcGIS API
for Python to add new content, and you can combine local and online
datasets for visualization or analysis.
Like any typical Python package, the ArcGIS API for Python consists of
modules, classes, and functions. The general organization of this
functionality is somewhat comparable to ArcPy, but the specific modules,
classes, and functions have different names and carry out their tasks slightly
differently. This organization will become easier to understand with some
examples later in this chapter.
As the name implies, the ArcGIS API for Python is not only a Python
package but also an application programming interface (API). An API is a
collection of tools that allows two applications to interact with each other.
Another real-world example of an API is the ArcGIS REST API, which
consists of tools that allow applications to make requests of ArcGIS Online
or ArcGIS Enterprise. Representational state transfer (REST) is an architectural style that organizes a site so that its resources can be addressed through readable URLs. ArcGIS uses the REST architectural style to create sites that can be navigated much as you navigate through folders on a computer. The ArcGIS API for Python
interacts with the ArcGIS REST API. The ArcGIS API for Python can be
considered a pythonic wrapper around the ArcGIS REST API, and both APIs
work together as the interface between Python code and the web GIS portal.
The ArcGIS API for Python wraps the construction of ArcGIS REST API
URLs in pythonic functions, so instead of constructing a URL and
authenticating it manually against the server, you can call on prebuilt
functions to carry out these tasks.
In summary, the ArcGIS API for Python is both an API and a Python
package. It includes tools that make it possible for a Python script to use the
ArcGIS REST API, which in turn makes requests to ArcGIS Online or
ArcGIS Enterprise services.
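To make the wrapping concrete, here is a minimal sketch. Building a REST request by hand means assembling a URL; the service path below is a made-up placeholder, not a real endpoint, and the arcgis calls are shown only as comments because they require the package and a live connection:

```python
# Constructing an ArcGIS REST API request URL by hand
# (the service path is a made-up placeholder, not a real endpoint):
from urllib.parse import urlencode

base = "https://services.arcgis.com/example/ArcGIS/rest/services/Parcels/FeatureServer/0/query"
params = {"where": "1=1", "outFields": "*", "f": "json"}
url = base + "?" + urlencode(params)
print(url)

# The ArcGIS API for Python hides this construction behind objects and
# methods, roughly along these lines (requires the arcgis package):
#   from arcgis.gis import GIS
#   gis = GIS()                # anonymous connection to ArcGIS Online
#   layer.query(where="1=1")   # the query URL is built and sent for you
```

The pythonic call also handles authentication tokens and response parsing, which the manual approach would require you to code yourself.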
9.3 Installation of ArcGIS API for Python
The ArcGIS API for Python is distributed as a Python package called
arcgis. The arcgis package is installed as part of the arcgispro-py3 default
environment of ArcGIS Pro, which makes it easy to get started using the
API.
Note: In older versions of ArcGIS Pro, you were required to install the
arcgis package using either the Python Package Manager or conda
at the command prompt. Starting with ArcGIS Pro 2.1, the arcgis
package is installed as part of the default environment.
You can confirm the installed version of the arcgis package with the
Python Package Manager.
The version of the arcgis package that installs with ArcGIS Pro 2.5 is
1.7.0, but this version is likely to be updated frequently.
It is important to realize that you don’t need to use the ArcGIS Pro
application to work with the ArcGIS API for Python, but installation can be
accomplished through ArcGIS Pro. You can install the ArcGIS API for
Python in a stand-alone conda environment that is different from the
environment used by ArcGIS Pro.
The API is not open source, but it is a free library that you can install on a
computer, with or without ArcGIS Pro. The API is platform agnostic, which
means you can install it on Windows, Linux, or macOS operating systems.
To take full advantage of the API, however, it is beneficial to have Esri user
credentials. Without these credentials, your use will be limited to public
data sources.
Installation on a computer without ArcGIS Pro
In some cases, you may want to install the ArcGIS API for Python on a
computer that does not have ArcGIS Pro installed. This includes any
computer running Linux or macOS, which are not supported by ArcGIS
Pro. To install the API without ArcGIS Pro, you must have Python 3.5 or
higher installed, including a package manager. The Anaconda distribution is
recommended when working on a computer that does not have ArcGIS Pro
installed. This distribution includes conda as a package manager, but you
also can use PIP or Pipenv.
When using conda, the ArcGIS API for Python can be installed by entering
the following conda command in the Python command prompt:
conda install -c esri arcgis
This confirms that the arcgis package will be installed, and you will be
prompted to enter y (yes) to proceed. Note that the -c flag is short for --channel.
Channels are locations where conda looks for packages. This flag
ensures that the package to be installed is obtained from the correct source. Esri
maintains its own channel on the Anaconda Cloud to share its public
packages with the broader user community. To install the API using PIP,
enter the following command:
pip install arcgis
Finally, you can use Pipenv to install the API. Pipenv is used to install
packages and manage environments, like conda. It combines the
functionality of PIP (to install packages) and virtualenv (to manage
environments). To install the API using Pipenv, enter the following
command:
pipenv install arcgis
Experienced Python developers can use any of these three options to install
the ArcGIS API for Python on a computer that does not have ArcGIS Pro
installed, but those with less experience in managing environments and
installing packages are encouraged to use the Anaconda distribution of
Python and conda as the package manager.
A few important notes are in order when installing and running the ArcGIS
API for Python on a computer that does not have ArcGIS Pro:
The Anaconda distribution of Python is strongly recommended
because it includes many useful packages as well as utilities such as
conda. The Anaconda distribution does not include the arcgis
package. Even though the ArcGIS API for Python is free, it is not
open source. As a result, you must always install the arcgis package
as a separate step. The Anaconda distribution is available for
Windows, macOS, and Linux.
The recommended package manager is conda, which is part of the
Anaconda distribution. Without ArcGIS Pro, however, you cannot
use conda through the GUI specifically developed for ArcGIS Pro.
Instead, you must use conda through the command prompt, or use the
graphical interface included with Anaconda, called Anaconda Navigator. It is not recommended
to mix the use of conda and PIP/Pipenv. If you use the Anaconda
distribution, conda is your best option. If you are using a different
distribution, you can use PIP or Pipenv instead.
The Anaconda distribution includes Jupyter Notebook and Spyder by
default, so it is relatively easy to start writing code without a lot of
additional configuration. PyCharm is available for Windows, macOS,
and Linux but requires some additional configuration to use the
correct environment.
Finally, a typical installation of the arcgis package includes all the
dependencies. These are documented in detail in the system requirements of
the ArcGIS API for Python and include packages such as Pandas, NumPy,
Matplotlib, Jupyter Notebook, and PyShp. If the current environment does
not already include these packages, they will be installed automatically
when the arcgis package is installed. If ArcPy is available in the current
environment, it may be used for certain tasks; if ArcPy is not available, the
PyShp package will be used instead. The flag --no-deps can be added to
install the API with no dependencies, as follows:
conda install -c esri arcgis --no-deps
Any dependencies for a task can then be added manually. Consult the
documentation of the ArcGIS API for Python to determine which
dependencies apply to a feature of the API.
9.4 Basics of Jupyter Notebook
Before describing the ArcGIS API for Python in more detail and looking at
some code examples, you need a place to write your code. You can use the
ArcGIS API for Python in any regular Python IDE, including IDLE,
Spyder, or PyCharm. However, since most IDEs do not have strong built-in
visualization capabilities, it makes sense to use a specialized IDE that has
this functionality. Time to introduce Jupyter Notebook, which is the easiest
IDE to use with the ArcGIS API for Python.
Jupyter Notebook is an open-source web application to create documents
that contain code, text, images, and other elements. The fact that it is a web
application means that you create your documents in a web browser, such as
Chrome or Firefox. Your web browser is, in effect, your IDE, but you are
writing your code in a notebook, not in a regular HTML page.
Jupyter Notebook supports many programming languages. The most
important ones are Python, R, Julia, and Scala. In case you are wondering,
the name Jupyter is derived from three of these languages:
Ju(lia) + Pyt(hon) + (e)R = Jupyter. Even though there is support for several
languages, Python is required for Jupyter Notebook to run. A notebook is
stored as a file with extension .ipynb. This extension reflects the fact that
Jupyter Notebook has its origins in IPython. The .ipynb format is an open-source format based on JSON (JavaScript Object Notation). A single
notebook document contains all the code, text, images, external links, and
other elements created by the user. This characteristic of notebooks makes it
relatively easy to share your work, because all you must do is send someone
your .ipynb file.
More details about working with notebooks are covered after you see how
to create a notebook.
9.5 Creating and opening a notebook
Because Jupyter Notebook is the recommended IDE to work with the
ArcGIS API for Python, it is installed as part of the arcgispro-py3 default
environment of ArcGIS Pro. In the Python Package Manager, you can view
several packages related to the classic Jupyter Notebook and the next-generation interface JupyterLab.
There are several ways to work with notebooks: (1) open a notebook in the
ArcGIS Pro application; (2) open a notebook in stand-alone Jupyter
Notebook or JupyterLab; or (3) open a notebook hosted by ArcGIS
Enterprise, referred to as ArcGIS Notebooks. This section describes the first
two of these approaches, whereas section 9.15 explains ArcGIS Notebooks.
The easiest and most convenient approach is to open a notebook directly in
ArcGIS Pro. You can create, edit, and run a notebook directly within the
ArcGIS Pro application.
Note: The ability to work with notebooks directly within ArcGIS Pro
was introduced in version 2.5. When running a previous version of
ArcGIS Pro, or when ArcGIS Pro is not installed on your computer, you
can use the classic Jupyter Notebook approach discussed later in this
section.
You can create a new empty notebook using the Catalog pane in ArcGIS
Pro. Right-click on the folder where you want to create a new notebook,
and click Notebook.
Enter the name of the notebook file, and press Enter. A new file with the
.ipynb file extension is created in the folder. This format is recognized by
ArcGIS Pro so you will see a new entry in the folder in the Catalog pane.
You also can create a new notebook from the Insert tab by clicking New
Notebook. This allows you to save a notebook file in a folder of your
choice, and the new notebook is added under the Notebooks node in the
Catalog pane.
To open a notebook, double-click on the file in the Catalog pane, or right-click the file, and click Open Notebook. The notebook opens in the main
viewer window of ArcGIS Pro.
Opening a notebook brings up the Notebook tab with buttons for creating a
new notebook, saving edits to the current notebook, and interrupting the
kernel. A kernel is a program that runs and reviews the code in the
notebook. Jupyter Notebook has a kernel for Python, but there are kernels
for other programming languages as well. When working with Python code
in a notebook, you are not running lines of code as in an interactive
interpreter or running a script file (.py), but instead you run code snippets.
The kernel is like a program running in the background so that the code in
the notebook can be executed. The kernel is specific to the environment,
which in this case is the active environment of the ArcGIS Pro application.
The Notebook view in ArcGIS Pro is like other views and can be moved
and resized. However, the Notebook view does not interact with the rest of
the ArcGIS Pro application. For example, you cannot drag datasets from the
Catalog pane into the code of the notebook as you can with the Python
window. Similarly, running code in the notebook does not generate entries
in the History pane.
The Notebook view includes several menus and tools, which will be
discussed shortly. First, however, it is important to review how to run a
notebook outside ArcGIS Pro. Although using a notebook directly inside
ArcGIS Pro is convenient, ArcGIS Pro is not required to use Jupyter
Notebook or the ArcGIS API for Python. The following steps show you
how to run a notebook outside ArcGIS Pro. In Windows, search for the
application called Python Command Prompt. This step brings up the
command prompt window.
Notice how the command prompt uses the arcgispro-py3 environment or a
clone. This is important, because this environment ensures that both the
arcgis package and Jupyter Notebook are available.
Note: Although you can use the ArcGIS API for Python within the
interactive Python window of ArcGIS Pro, to experience the Jupyter
Notebook environment, you must run a conda environment that has the
correct packages installed. If you have installed ArcGIS Desktop 10.x,
you will have another Python command prompt, typically called Python
(command line). That prompt cannot be used to work with the ArcGIS
API for Python or Jupyter Notebook.
Next, you must navigate to an existing folder on your hard drive where you
want to store your notebook. If you have existing notebooks you want to
use, navigate to the folder where those are located. If you are not familiar
with command prompt, here are a few useful commands, in which you type
the command, and press Enter to run it. The commands are as follows:
To go to the root folder: cd\
To go down one folder: cd <name of folder>
To go up one folder: cd..
To change drives: <drive letter>:
To go to a specific folder: cd <drive letter>:\path\<name of folder>
As you may have guessed, cd stands for “change directory.” In this
example, it is assumed that the folder of interest is C:\Demo. Therefore, you
first must run the cd\ command and then the cd demo command.
Alternatively, you can run cd c:\demo in one step. In the command line, drive
letters and directory names are not case sensitive. In this example, it is
assumed that the folder C:\Demo already exists. You also can create a new
folder using the mkdir command, which stands for “make directory.”
These commands bring you to the folder of interest, also referred to as the
“workspace,” and now you can start Jupyter Notebook with the command
jupyter notebook.
This command results in several messages being printed, including a URL
for your notebook. The URL starts with http://localhost:8888/ and is
followed by a token. This token is necessary because it includes the
information on the specific Python environment being used (i.e., arcgispro-py3 or a clone) as well as the location of the notebooks (i.e., C:\Demo). A
token looks something like the following:
http://localhost:8888/?token=e2d3d028255a303a2df06cddcfc5fcd2114ee7af95b8b32c
Also note that the current session of the command prompt is now labeled
“jupyter notebook.”
Although the command prompt window remains open, your default web
browser opens automatically to the notebook URL. This URL is typically
http://localhost:8888/tree, which means it is pointing to your desktop
computer as the local server. The “tree” portion of the URL means that it is
showing the folder and files inside your working folder—in this case,
C:\Demo. The location of the working folder itself does not appear in the
interface, at this point. If you have existing notebooks in this folder, they
show up as a list. If you have not created any notebooks yet, the page shows
the message, “The notebook list is empty.”
Note: Jupyter Notebook opens in the default web browser on your
computer. You can change it in your Windows operating system under
Settings > Apps > Default apps > Web browser. It is important to keep
the Python command prompt running in the background because it is
running the kernel necessary to run the code in your notebook. As you
carry out certain tasks, messages will be printed here. You can view
them to understand what is going on, but generally you do not need to
look at these messages. You can minimize the command prompt window
with the kernel running, but if you close the window, it will end your
Jupyter Notebook session.
Now you are ready to create your notebook file. In the upper-right corner on
the Files tab, click New > Python 3.
A new tab opens in the browser window.
Click on the Untitled tag next to the Jupyter logo, which brings up the
Rename Notebook dialog box. You can enter a meaningful name. Spaces
are allowed, and there is no need to specify a file extension. Click the
Rename button to apply the change.
The name appears at the top of the page. Changes to notebooks are saved
automatically, as indicated by the (autosaved) tag next to the name. You
also can click the Save button. The name of the web browser tab has
changed, and the URL includes the name—for example,
http://localhost:8888/notebooks/demo_notebook.ipynb.
Click the browser tab labeled Home, and you will see a new entry for the
notebook file that was just created. The file extension .ipynb is shown. If
you close your notebook tab, you can open it again by double-clicking on
the entry for the notebook on the Home tab.
If you navigate using File Explorer to the folder of interest on your
computer, you will see the new .ipynb file created. However, you cannot
open the .ipynb file by double-clicking it because the notebook can be
opened only from within an application that has the appropriate Python
kernel running.
You have now seen two different ways to work with notebooks: (1) directly
from within ArcGIS Pro and (2) using the command prompt to launch
Jupyter Notebook in a web browser. The latter is referred to as the “classic”
Jupyter Notebook. Both approaches can be used to create a new notebook,
edit an existing notebook, and run code in a notebook. Generally, the two
approaches provide the same functionality, and they can be used
interchangeably. For example, you can start working on a new notebook in
ArcGIS Pro, and then open it later in a web browser, or vice versa.
However, there are some differences to be aware of. First, as the previous
explanation of steps shows, working with notebooks directly in ArcGIS Pro
includes fewer steps and is more convenient. Second, both approaches
require a conda environment that includes the necessary packages. When
working directly within ArcGIS Pro, the environment being used is the
active environment in the current session of the application. When
launching Jupyter Notebook from the command prompt, the environment is
set from the command prompt. Third, the menus and tools are similar, but
not identical. When working with a notebook directly in ArcGIS Pro, some
elements are controlled by the application and therefore are removed from
the notebook interface. These elements include options to save the notebook
and interrupt the kernel, which are part of the Notebook tab in ArcGIS Pro
but are regular menu and tool options in the classic Jupyter Notebook.
Finally, and perhaps most importantly, you can launch Jupyter Notebook
using a web browser on a computer that does not have ArcGIS Pro
installed, regardless of how the notebook was created.
Despite some of these differences, once you have created a new notebook
and are writing code, both approaches feel the same and provide the same
functionality.
Note: The remainder of this chapter employs the classic Jupyter
Notebook approach, which means some functionality will be slightly
different when using a notebook directly in ArcGIS Pro. The code,
however, works the same regardless of the approach.
Recall that the third way to work with notebooks is to host them using
ArcGIS Enterprise. The interface of these hosted notebooks is nearly
identical to the Notebook view in ArcGIS Pro.
9.6 Writing code in a notebook
Now that you have created a new notebook, you can start writing code.
Python code in a notebook is organized in cells. Cells are like small blocks
of code. You enter code in a cell line by line, and then run the code for a
cell. A cell can consist of only a single line of code, but it also can contain
many lines. A single notebook typically contains more than one cell.
You type your line of code into a cell. To run the code, press Ctrl + Enter on
your keyboard, or click on the Run icon in front of the cell. The result is
printed immediately below the cell.
Note: You also can click the Run button in the top menu of the
notebook. Clicking Run runs the current cell and shows the result, but it
also adds a new empty cell below. Which option you use to run a cell is
a matter of preference.
This simple example illustrates a few key points about Jupyter Notebook.
First, code is organized into cells, and code is run cell by cell. Second, the
results are printed below the cell for immediate inspection. As later
examples illustrate, results are not limited to printing text, but can include
graphs, maps, and other visualizations.
You can add multiple lines of code for a single cell by using the Enter key
after each line of code. Recall that in a typical interactive window in a
Python IDE, the Enter key results in the line of code being run. In a Jupyter
Notebook, when you click Run or Ctrl + Enter, all the lines of code for a
single cell are run. The following example shows a cell with four lines of
code, and the resulting string is printed when the cell is run.
Results are returned not only when printing messages. Consider the
following example code, which counts the number of occurrences of a
specific character in a string.
When the cell is run, the result is shown as output. Therefore it is not
necessary in a Jupyter Notebook to print messages using the print()
function.
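A minimal sketch of such a cell (the sample string and variable names are chosen here for illustration):

```python
# A multi-line cell: count how often a character occurs in a string.
quote = "Jupyter Notebook"
character = "o"
count = quote.count(character)
count   # the last expression is echoed as the cell's Out[n]; here, 3
```

Because the cell ends with a bare expression rather than a print() call, the value 3 is still displayed as output when the cell is run.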
As you may have noticed, Jupyter Notebook uses elements of the IPython
interface, as illustrated by the use of the input prompt In[n], where n is a
positive integer. This number starts at 1 and increases for additional cells.
However, the number also increases every time you run the same cell again.
For example, you can make a change to the code of a cell and run the cell
again. The numbers for the input prompt and the output prompt are updated
as a result.
The numbers for the input and output have no influence on code execution.
They simply keep track of the order in which the code was run. The
numbers can be reset by restarting the kernel by clicking Kernel > Restart.
As previously discussed, the kernel is the execution backend for Jupyter
Notebook, which you can see running in the command prompt window.
Examples in this chapter often start every new example with the number
one, but starting at number one is not necessary for the code to run.
New cells can be added by clicking on the “insert cell below” button (a plus
sign) on the top menu of the notebook. When you click the Run button on
the top menu, the current cell is executed, and a new cell is added below
automatically. The same can be triggered when you execute a cell using the
Shift + Enter keyboard shortcut, instead of Ctrl + Enter. You also can add a
cell by pressing the b key on your keyboard, but it requires that your cursor
is not inside a cell—otherwise, you are simply entering the character b.
The following example illustrates the use of two cells, each with its own
output.
Even though code in a notebook is entered cell by cell, any previously used
variables are stored in memory and can be used. In that sense, a notebook is
like a regular Python script, but it is organized into blocks of code (called
“cells”), and these can be run separately from each other.
The organization of code into cells presents some issues. For example, if
you updated a variable, the cell in which that variable is assigned a value
must be run before that variable can be used in a different cell. Consider the
following example in which the string in the first cell is modified (and a
typo is introduced by changing “Notebook” to “Notebok”), but the cell is
not run. Running the second cell in which the variable is used results in an
incorrect answer because it is still using the earlier value of the variable. In
the example code, the result prints True even though the string “book” does
not appear in the string “Jupyter Notebok.”
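The sequence can be sketched as follows, with comments standing in for the separate cells (the variable name is an assumption):

```python
# The kernel still holds the value from the last time Cell 1 was RUN:
quote = "Jupyter Notebook"          # value currently in kernel memory

# Cell 1's source has since been edited to the misspelled
# "Jupyter Notebok", but because the edited cell was not re-run, the
# membership test in Cell 2 still sees the old value:
print("book" in quote)              # prints True (the stale result)

# Re-running the edited Cell 1 would update the variable:
quote = "Jupyter Notebok"
print("book" in quote)              # prints False
```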
The solution is to first run the cell in which the change to the variable has
been made, and then run the cell in which the variable is being used. This is
a bit counterintuitive when you are used to running Python scripts because
the entire script is run by default.
An alternative to running each cell manually is to use the Run All option
from the Cell menu. This option is useful when you have many cells in a
notebook and made a change at the top. The Run All command runs all the
cells in the sequence in which they are listed in the notebook and updates
the results.
There are many tools to manage cells in a notebook. You can select a cell
and use one of the tools from the toolbar, including inserting a cell (plus
sign), deleting a cell (scissors), copying a cell, pasting a cell, moving a cell
up (Up Arrow), and moving a cell down (Down Arrow). There are several
additional options under the Edit menu, including merging cells and
splitting cells.
Navigating around the cells in a notebook, you may have noticed how they
change color between green and blue. When a cell is green, your cursor is
inside the cell, and you can type your code or text. When a cell is blue, the
cell is selected, but your cursor is not inside the cell to write code. This
color-coding of cells is particularly helpful when using keyboard shortcuts
because they may not work when your cursor is inside a cell.
Working with cells in a notebook takes a bit of getting used to if you have
been writing regular Python scripts in a more traditional IDE. However,
there are many advantages to organizing your code in cells. You can fix
errors and run the code again without having to use multiple windows.
Consider the following example, which uses the print statement from
Python 2 instead of the print() function as shown in the figure.
You can update the line of code and run the cell again. There is no need to
check the print results in a different window (typical when running a stand-alone script) or to copy and paste the line of code to a new line (typical
when using an interactive interpreter). There also is no need to run all the
cells in the notebook, because you can make the change, and then run only
the cell(s) of interest.
9.7 Using Markdown in Jupyter Notebook
One of the advantages of working with Jupyter Notebook is that you can
add elements other than code to your notebook. These elements include
formatted text, URLs, graphics, mathematical notations, HTML pages, and
various types of multimedia. These elements can be added using a special
type of cell called a Markdown cell. Markdown is a lightweight markup
language that is widely used in the data science community. It is a text-to-HTML conversion tool that allows you to write in plain text, and then apply
some formatting to HTML so the results can be viewed in a web browser.
Markdown is similar to HTML and also accepts the same opening and closing
tags (i.e., <tagname> </tagname>) in addition to its own special formatting symbols.
When you add a new cell to a notebook, by default the cell type is set to
Code, but you can change it by selecting the cell (i.e., the color of the cell is
blue or green) and doing one of the following: (1) changing the type of cell
from Code to Markdown using the drop-down options on the toolbar; (2) on
the top menu, clicking Cell > Cell Type > Markdown; or (3) using the m
keyboard shortcut.
Once the cell type is changed to Markdown, you can enter text and apply
formatting. Formatting includes the use of headings, block quotes,
numbered or bulleted lists, and italic or bold text. In addition to text, you
can add the following types of elements:
Line breaks and horizontal lines
Python code used for illustration instead of execution
URLs and other external links
Mathematical symbols and equations
Tables
Images, videos, and animations
Headings are created using the hash mark symbol (#), also called the number sign, followed
by a space. Additional hash marks can be added for other heading
levels. Regular text does not use any special symbols. When you enter the
headings, some formatting is immediately visible, as shown in the figure.
The final rendering of the formatting, however, is applied only when you
run the cell, as shown.
To return to editing the contents of the Markdown cell, double-click on the
cell.
The same formatting can also be accomplished using Markup tags. For
example, instead of using a single hash mark symbol for heading 1, you can
use the <h1>tag—i.e., <h1>Heading</h1>.
When you use these tags, no formatting is immediately visible, but the tags
appear as a different color. Again, the rendering is applied when you run the
cell, as shown in the figure.
Note: When using special symbols (such as # and several others), a
space must follow for the symbol to be recognized. Without the space,
the special symbol would be considered part of regular text. When using
tags (such as <h1>), no space is used.
Bold text uses a double asterisk (**), double underscore (__), or the <b>
tag. Italic text uses a single asterisk (*), a single underscore (_), or the <i>
tag.
Again, you can render the text by running the cells, as shown in the figure.
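Putting these elements together, a Markdown cell combining headings, bold, and italic text could contain source like this (the wording is illustrative):

```markdown
# Heading 1

## Heading 2

Regular text with **bold**, __also bold__, *italic*, and _also italic_.

The same formatting with tags: <h1>Heading 1</h1>, <b>bold</b>, <i>italic</i>.
```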
A bulleted list can be created by using dashes (-), plus signs (+), or asterisks
(*), followed by a space. You can create nested lists by using tabs or two
spaces.
The rendered result is shown in the figure.
A numbered list can be created by starting with the number 1, followed by a
dot and a space, as shown in the figure.
The rendered result is shown in the figure.
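As a sketch, the source for both list types might look like this (the item text is illustrative):

```markdown
- A bulleted item (a + or * also works)
- Another item
  - A nested item, indented two spaces

1. First numbered item
2. Second numbered item
```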
Regular text formatting can be a bit tricky. For example, simply pressing
Enter following a line of text does not produce a line break. Using Enter
twice, however, starts a new paragraph. You can use two spaces or the <br>
tag for a line break. The <br> tag is preferred because the two spaces are
not clearly visible.
The rendered result is shown in the figure.
A block quote is accomplished using the right angle bracket (>), also
referred to as the "greater than" symbol, followed by an optional space. The
symbol must be used at the start of every line of the block quote.
The rendered result is shown in the figure.
Horizontal lines can be added using either three hyphens (---), three
asterisks (***), or three underscores (___).
The rendered result is shown in the figure.
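A Markdown cell combining line breaks, a block quote, and a horizontal line might contain source like this (the text is illustrative):

```markdown
First line of a paragraph.<br>
Second line, after a forced line break.

> A block quote,
> continued on a second line.

---
```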
An external link in Markdown uses a set of square brackets for the link text,
followed by the URL in parentheses, as shown in the figure.
The rendered result is shown in the figure.
Local images can be inserted by clicking Edit > Insert Image in the top
menu and browsing to a locally stored file. After the image is inserted, the
file name is preceded by an exclamation point (!).
The rendered result is shown in the figure.
You can insert external images using the same syntax used for external links
such as a URL, except prefixed with an exclamation point.
The rendered result is shown in the figure.
You also can insert images using the <img> tag. Additional properties can
be specified, such as width and height.
The rendered result is shown in the figure.
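A sketch combining the link and image syntax; the URLs here are placeholders, not real resources:

```markdown
[Link text](https://www.example.com)

![Alt text](https://www.example.com/image.png)

<img src="https://www.example.com/image.png" width="200" height="100">
```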
Sometimes you may want to include example code for illustration, but the
code itself should not be executed. Example code can be included using
Markdown by enclosing a block of code in three back ticks (```), which
supports multiple lines. You can add syntax highlighting by adding the
programming language after the opening back ticks. For inline code, you
can use a single back tick or the <code> tag.
The rendered result is shown in the figure.
Note that using code in Markdown is just a way to illustrate a concept or
explain something about Python code. The code does not actually run, and
there is no syntax checking.
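As a sketch, the Markdown source for inline and block code might look like this (the Python snippet inside is purely illustrative and is never run):

````markdown
Inline code such as `print("Hello")` uses a single back tick.

```python
# Displayed with Python syntax highlighting, but never executed
x = 1 + 2
```
````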
Mathematical symbols and expressions can be created by surrounding text
with a dollar sign ($) on each side, as follows: $mathematical symbol or
expression$. The syntax for mathematical symbols and expressions uses
LaTeX, which is a typesetting language for producing technical and
scientific documentation. Jupyter Notebook recognizes LaTeX code written
in Markdown cells and renders the symbols using the MathJax JavaScript
library. The ability to use LaTeX inside Jupyter Notebook is one of the
reasons that notebooks have become popular in fields such as mathematics
and physics.
For example, the following expression creates a simple inline formula, as
shown in the figure.
The rendered result is shown in the figure.
Mathematical expressions on their own line are surrounded on either side
by a double dollar sign ($$), as shown in the figure.
The rendered result is shown in the figure.
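As an illustration, the formulas below are arbitrary examples, not taken from the text:

```markdown
An inline formula such as $E = mc^2$ renders within the sentence.

$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
```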
Because Markdown uses many special symbols for formatting, what do you
do when you need those characters? You can use a backslash to generate
literal characters. For example, the following code creates the asterisk and
dollar symbols, as shown in the figure.
The rendered result is shown in the figure.
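A short sketch of escaping with the backslash:

```markdown
The characters \* and \$ appear literally instead of triggering formatting.
```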
The examples shown cover only some of the key elements of using
Markdown cells. There are many online resources that cover the use of
Markdown for Jupyter Notebook in greater detail. Markdown provides a
great way to enhance your notebooks because you can add explanations to
your code, provide background information, and include visualizations, in
addition to the Python code itself.
There are several different “flavors” of Markdown, which vary slightly in
terms of syntax and functionality. For example, the GitHub platform
employs a dialect of Markdown called GitHub Flavored Markdown (GFM),
which is different from the “standard” Markdown. Jupyter largely follows
GFM but with minor differences.
The following images illustrate one of the sample notebooks published on
the help pages of the ArcGIS API for Python called “Chennai Floods 2015
—A Geographic Analysis.” The notebook starts off with some background
information on the flood event that took place in 2015, followed by various
visualizations and analysis of the rainfall and flooding. The example
illustrates the use of Markdown cells and code cells in a single notebook.
One of the benefits of using Jupyter Notebook is that you can provide the
code to run a workflow and the documentation side by side in a single
document. Users can read the documentation, written using Markdown,
while interacting with the code at the same time.
9.8 Starting the ArcGIS API for Python
Now that you have seen how to start Jupyter Notebook, it is time to start
using the ArcGIS API for Python. First, you must import the arcgis
package, as shown in the figure.
Note that for the remaining examples, the input prompt (In[ ]) and output
prompt (Out[ ]) of the interface are omitted from the figures.
You can perform a quick check to confirm the current versions of Python
and the ArcGIS API for Python, as shown in the figure.
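A check along these lines (a minimal sketch, assuming the arcgis package is installed in the active conda environment) could be:

```python
import sys
import arcgis

print(sys.version)         # Python version of the active environment
print(arcgis.__version__)  # version of the ArcGIS API for Python
```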
The ArcGIS API for Python includes several modules, the most important
being the gis module. The gis module allows you to manage the contents
of your GIS, as well as users and their roles. The main class of the gis
module is GIS. A GIS object represents the GIS you are working with
through ArcGIS Online or through an instance of ArcGIS Enterprise. This
object becomes your entry point for using the ArcGIS API for Python.
To get started, import the GIS class as shown in the figure.
When the GIS class is imported using from-import, there is no need to use
import arcgis first. Next, a GIS object is created, as shown in the figure.
The GIS class has several optional parameters, including a URL, a user
name, and a password. The URL can be a web address to ArcGIS Online or
a local portal in the form: https://gis.example.com/portal. If these
parameters are left blank, you are using an anonymous login to ArcGIS
Online. Example code for providing user credentials is shown in the figure.
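A sketch of both connection styles follows; the URL and credentials are placeholders that you would replace with your own:

```python
from arcgis.gis import GIS

# Anonymous connection to ArcGIS Online
mygis = GIS()

# Authenticated connection to a portal (placeholder credentials)
mygis = GIS("https://gis.example.com/portal", "username", "password")
```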
To create a Jupyter Notebook and work with the ArcGIS API for Python,
you do not need to provide user credentials, but your use will be limited to
public datasets. To work with datasets hosted within an organization, you
must authenticate with your user credentials. When working with the
ArcGIS Online portal, some functionality of the API uses credits, which is
another reason why authentication may be necessary.
The complete syntax for the GIS class of the arcgis.gis module can be
found in the online API reference and is as follows:
class arcgis.gis.GIS(url=None, username=None, password=None,
                     key_file=None, cert_file=None, verify_cert=True,
                     set_active=True, client_id=None, profile=None,
                     **kwargs)
The notation of the syntax in the documentation of the ArcGIS API for
Python is slightly different from the notation used in the documentation of
ArcPy. Recall that optional parameters for classes and functions of ArcPy
are enclosed in braces { }, whereas in the preceding example, any optional
parameters are given a default value or initialized to None. Any required
parameters are listed without a default, but none of the parameters of the
GIS class are required. Also notice that the syntax in the documentation
starts with the keyword class, but it is not used in actual code.
Note: The **kwargs parameter at the end stands for “keyword
arguments,” which are separate from the preceding explicitly
named parameters. The use of **kwargs makes it possible to pass one
or more additional parameters. In the case of the GIS class, these
arguments consist of proxy server and token settings.
Alternative ways to authenticate your user credentials are described in
detail in the help pages under the topic “Working with Different
Authentication Schemes.” One useful alternative is to connect using the
active portal in the ArcGIS Pro application, known as the pro authentication
scheme, as follows:
from arcgis.gis import GIS
mygis = GIS("pro")
This authentication works only when ArcGIS Pro is installed locally and
running concurrently. The credentials in ArcGIS Pro are used for
authentication without specifying those credentials in the code.
The next step is to create a basic map display to visualize your spatial data.
The GIS object includes a map widget for this purpose. A widget is like a
miniapplication that runs inside a notebook. Jupyter Notebook includes
several widgets for information display and user interaction, but the map
widget is added as part of the arcgis package.
The widget creates a map centered on the location provided. You also can
provide a tuple of coordinates for latitude and longitude. When providing a
general location description, an address, or a landmark, these locations are
geocoded using a default geocoder. If you do not specify a location, the map
widget returns a map of the world.
You can bring up a display by typing the name of the map widget.
The display, in this case, uses the default ArcGIS Online basemap. The
zoom level is based on the nature of the geocoded location.
One of the key benefits of using Jupyter Notebook is that you can display a
map of your GIS directly within the notebook, and the results update
interactively with your code. For example, you can change the basemap
using the basemap property of the map widget.
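The pattern can be sketched as follows; the location string is an illustrative example, and the code assumes an anonymous connection:

```python
from arcgis.gis import GIS

mygis = GIS()
mymap = mygis.map("Redlands, CA")  # geocoded to center the map

mymap                 # typing the widget name displays the map
mymap.basemap = "topo"  # switch to a different basemap
```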
In the preceding example, several lines of code are merged into a single
cell. Using one cell is not required but allows you to make several changes
at once. It also makes sure the entire block of code is run instead of running
several cells individually.
Although Jupyter Notebook is relatively intuitive to use, it is common to
experience issues getting it up and running at first. One common
issue when using a map widget is getting no response at all. The code
appears to have run, but no map display comes up, and there is no error
message. This scenario typically means there is an issue with your browser.
You can change the default browser on your operating system and restart
Jupyter Notebook, or you can copy the URL from the command prompt
window into a different browser. Just copying http://localhost:8888/tree is
not enough—you must copy the token as well. A complete URL looks
something like the following:
http://localhost:8888/?token=e2d3d028255a303a2df06cddcfc5fcd2114ee7af95b8b32c
Other common issues are difficulties with authenticating user credentials
because there are several different authentication schemes. If you have
ArcGIS Pro installed locally, the pro authentication is a convenient
workaround if you keep ArcGIS Pro running concurrently.
9.9 Adding content
Now that you have a Jupyter Notebook up and running and can use the
arcgis package to bring up a simple map display, it is time to add new
content. The example uses a new notebook and starts with a map for a
different study area, as shown in the figure.
You can use the content property of the GIS object to search for content.
The content property returns an instance of the ContentManager class. The
search() method of this class locates items of interest. So far, the line of
code is
search_result = mygis.content.search()
The search() method has arguments for the query (as a string), the type of
item, the maximum number of items, and several others. In the following
example, the search is for feature layers related to NYC taxi data and
returns a maximum of five items. The result is a list of Item objects.
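A sketch of such a search, assuming an existing GIS object named mygis and an illustrative query string:

```python
search_result = mygis.content.search(query="NYC taxi",
                                     item_type="Feature Layer",
                                     max_items=5)
search_result  # a list of up to five Item objects
```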
An anonymous login was used, so the result consists of items in ArcGIS
Online that meet the search criteria, and which are publicly available. The
items published in ArcGIS Online are dynamic, and contents can change
quickly. Therefore, the same search may produce different results a few
months later.
Items can be many different things, including web maps, feature services,
image services, and so on. Each item has a unique identifier and a well-known
URL. By restricting the search to a specific item type, only those
types are returned as part of the list. Because the items are returned as a
Python list, you can use an index to obtain a specific item. When you query
a single item, a rich representation of the item is displayed with a thumbnail
image, brief description, type, modification date, and a hyperlink to the item
page on the portal.
The item, in this case, is a feature layer collection, which can contain one or
more feature layers. This specific feature layer collection includes four
feature layers. When adding this item to a map widget, all these layers will
be added. You can also choose a specific layer to work with by using an
index, as shown in the figure.
The result is the URL of a single feature layer. Note that the second line of
code, referencing the feature layer variable, is not necessary and is used in
this example only to confirm the results. The layer can now be added to the
map display using the add_layer() method, as shown in the figure.
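Continuing the sketch (the variable names search_result and mymap are carried over from the earlier examples and are assumptions):

```python
item = search_result[0]          # first Item in the search results
feature_layer = item.layers[0]   # first feature layer of the collection
print(feature_layer.url)         # confirm which layer was selected
mymap.add_layer(feature_layer)   # add the layer to the map widget
```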
You can customize the symbology of the layer that you add to the map
widget using the functionality of the arcgis.mapping module. A detailed
description of this functionality can be found in the documentation of the
ArcGIS API for Python.
9.10 Creating new content
In addition to working with data that is already available in your web GIS,
you can create new content by publishing the data to your GIS. The
following example, shown in the figure, uses a new notebook and starts
with a map for a different study area, Vancouver, British Columbia, Canada.
Because the task involves publishing a new item to ArcGIS Online, an
anonymous login is not enough. For the code example to work, you must
use ArcGIS Online credentials with publisher permissions or employ the
pro authentication scheme. The URL, user name, and password in the
second line of code must be replaced with your credentials.
Note: For the next example, you must be logged in to ArcGIS Online or
a portal, both to publish items and because geocoding the results
requires credits.
The data to be added in this example resides in a CSV file on the local
computer. The data consists of the locations of more than 100,000 street
trees in the City of Vancouver. The purpose of the notebook is to read this
dataset using Pandas, and then to publish this data as a new item to ArcGIS
Online.
After importing Pandas, a DataFrame object is created by reading the local
CSV file. To facilitate the process of publishing the data, a random sample
of 100 records is created for illustration purposes. The sample() method is
used to create this sample as a new DataFrame object. To examine the data
inside the notebook, the first five rows are displayed.
The dataset includes fields for unique ID, street address, latitude, and
longitude, as well as descriptive details for each tree. Because there are too
many columns to show, the display of the first five rows includes a sliding
bar to scroll through the columns.
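The workflow can be sketched as follows. The CSV path is hypothetical, and a small inline DataFrame stands in for the real file so the sketch is self-contained; the column names are illustrative, not the dataset's actual field names:

```python
import pandas as pd

# In the real workflow the data would come from the local CSV:
# df = pd.read_csv("C:/Demo/trees.csv")
# A stand-in DataFrame keeps this sketch self-contained.
df = pd.DataFrame({
    "tree_id": range(1, 11),
    "std_street": ["W 10TH AV"] * 10,
    "latitude": [49.26] * 10,
    "longitude": [-123.15] * 10,
})

sample_df = df.sample(n=5, random_state=0)  # random sample of records
print(sample_df.head())                     # inspect the first rows
```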
This example also illustrates the versatility of Pandas when used in Jupyter
Notebook. Not only is it easy to read datasets from local and online
resources, a sample of the data can be displayed directly inside the
notebook. This feature is different from other IDEs, in which the results are
typically viewed in a separate window with less convenient formatting.
To make the attributes more manageable, the most pertinent fields are
selected and organized. The results are returned as a new DataFrame object,
as follows.
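A minimal sketch of the selection step; the stand-in DataFrame and its column names are illustrative assumptions:

```python
import pandas as pd

# Stand-in for the trees DataFrame read earlier
df = pd.DataFrame({
    "tree_id": [1, 2],
    "std_street": ["W 10TH AV", "OAK ST"],
    "latitude": [49.26, 49.25],
    "longitude": [-123.15, -123.12],
    "diameter": [10.0, 12.5],
})

# Select and order only the pertinent fields into a new DataFrame
df_clean = df[["tree_id", "std_street", "latitude", "longitude"]].copy()
print(df_clean.shape)
```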
Once the desired dataset is obtained in the form of a DataFrame object, the
next step is to import the data as a feature collection. Importing can be
accomplished using the import_data() method of the ContentManager class.
An instance of this class called content is available as a property of the GIS
object. The DataFrame object is the argument of the import_data() method.
The second line of code here is not required. It is added to confirm that the
result is a feature collection.
Note: The import_data() method works with a Pandas DataFrame or a
Spatially Enabled DataFrame (SEDF). SEDF is a class of the
arcgis.features module to give the Pandas DataFrame spatial
abilities. When using import_data(), there is a limit of 1,000 rows when
importing a Pandas DataFrame as a feature collection Item. No such
limit applies when using an SEDF. The arcgis.features module also
includes functionality to convert a Pandas DataFrame to an SEDF.
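A sketch of the import step, assuming an authenticated GIS object named mygis and a prepared DataFrame named df_clean:

```python
# Convert the DataFrame to a feature collection in your GIS
fc = mygis.content.import_data(df_clean)
print(type(fc))  # confirm the result is a FeatureCollection
```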
To publish the feature collection as an item in ArcGIS Online, the feature
collection must be converted to a JSON object. To make this possible, the
properties of the feature collection are converted to a Python dictionary, and
this dictionary is used in the json.dumps() function to create the JSON
object.
The final step is to publish the JSON object as an item using the add()
method of the ContentManager class. To make the published item more
usable, several item properties are provided as a dictionary, including a title,
a description, several tags, and a type. None of these properties is required,
but providing them is good practice. The only required key, in this case, is the text
key to specify the JSON object.
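The publishing step can be sketched as follows; the JSON wrapper structure, titles, and tags are illustrative assumptions, and the variables mygis and fc are carried over from the earlier steps:

```python
import json

# Convert the feature collection's properties to a dictionary,
# then serialize it to a JSON string
fc_dict = dict(fc.properties)
fc_json = json.dumps({"featureCollection": {"layers": [fc_dict]}})

item_properties = {
    "title": "Vancouver street trees",
    "description": "Random sample of street trees in Vancouver",
    "tags": "trees, Vancouver, sample",
    "type": "Feature Collection",
    "text": fc_json,  # the only required key in this case
}
trees_item = mygis.content.add(item_properties)
trees_item  # displays a snapshot of the published item
```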
The final line of code in this cell brings up a snapshot of the published item.
Once the item is published, it can be added to the map display as a layer.
An alternative solution is to first publish the CSV file as an item in ArcGIS
Online, and then publish it as a feature layer using the publish() method of
the GIS object. This solution does not require the use of Pandas or
converting the data to a feature collection using a JSON object. The
complete code solution is shown as a single cell in the figure.
This alternative solution publishes the entire CSV file, including all the
records and all the fields. The use of Pandas provides more flexibility
because it allows for data cleaning and filtering before publishing the data.
In both cases, the result is a hosted feature layer, which can be added to the
display.
Many additional methods are available in the ContentManager class of the
arcgis.gis module. In addition to adding items, there are methods for bulk
updates, cloning, creating services, deleting items, and sharing.
9.11 Performing analysis
Many other tasks can be performed using the ArcGIS API for Python,
including geocoding, working with imagery, and performing network
analysis. Many of the workflows that employ ArcGIS Pro and ArcPy on
local data can be replicated for web GIS using the ArcGIS API for Python
without having to perform these tasks manually using the interface of
ArcGIS Online or Portal for ArcGIS. This section focuses on performing
spatial analysis tasks, which are comparable to using geoprocessing tools in
ArcGIS Pro.
The ArcGIS API for Python includes several modules to carry out
specialized analysis tasks. These modules include the arcgis.raster
module for raster and imagery analysis, the arcgis.network module for
network analysis, and the arcgis.geoanalytics module for the distributed
analysis of large datasets. Some of the more “basic” analysis tools are part
of the arcgis.features module to work with vector datasets.
Note: The arcgis.geoprocessing module is primarily used to import
web tools published to your web GIS as Python modules. It does not
include the geoprocessing tools that are available in ArcGIS Pro as the
name might suggest.
The arcgis.features module allows you to perform spatial analysis tasks
on feature layers. These tasks are organized in several submodules. This
organization is similar to how geoprocessing tools in ArcGIS Pro are
organized into toolboxes and toolsets. However, there is no direct
correspondence between the functions in the ArcGIS API for Python and
the geoprocessing tools in ArcGIS Pro. Recall that every geoprocessing tool
in ArcGIS is a function in ArcPy, but this is not the case for the ArcGIS
API for Python. On the other hand, many geoprocessing tools in ArcGIS
Pro have a similar function in the ArcGIS API for Python, but they are
organized into different modules, may have slightly different names, and
their syntax is often somewhat different, too. These similarities and
differences are illustrated here with an example on buffering.
Buffering is one of the most widely used examples of spatial analysis and is
used to illustrate the nature of spatial analysis functions in the ArcGIS API
for Python. In ArcGIS Pro, the Buffer tool uses an input feature class to
create an output polygon feature class on the basis of a buffer distance. The
same procedure can be accomplished using the ArcGIS API for Python with
the create_buffers() function of the arcgis.features.use_proximity
submodule. The general syntax of this function is as follows:
use_proximity.create_buffers(input_layer, distances=[], field=None,
                             units='Meters', dissolve_type='None',
                             ring_type='Disks', side_type='Full',
                             end_type='Round', output_name=None,
                             context=None, gis=None, estimate=False,
                             future=False)
Note that the syntax in the ArcGIS API for Python is different from the
syntax employed in ArcPy. In the syntax example for create_buffers(), the
input_layer argument is required because no default value is specified. On
the other hand, the field parameter is optional because a default value is
shown in the syntax. Sometimes the default value is None. Recall that the
syntax notation for functions in ArcPy uses braces to indicate optional
parameters, and no default values are shown in the syntax itself. These
differences are mostly a result of choices made in the creation of the
documentation and don’t reflect differences in what the functions do.
If you carefully compare the syntax of the create_buffers() function with
that of the Buffer() function of ArcPy, you will notice several additional
arguments for the create_buffers() function. These arguments include the
gis argument to specify the GIS on which the function is run and the
estimate parameter to return the number of credits to run the operation. The
differences in function name, syntax notation, and several arguments aside,
both functions accomplish the same task—i.e., create buffer polygons
around input features.
Note: For the next example, even though the result is created in memory
and not published, you must be logged into ArcGIS Online or a portal
because running the analysis requires credits.
The example notebook uses the create_buffers() function to create
polygon features around point features. The input is a feature layer hosted
in ArcGIS Online. The code starts with creating a GIS object. A feature
layer collection is obtained using the get() method of the ContentManager
class. The argument of the get() method is a unique item ID. The same
item can also be obtained by searching for “USA Airports” and filtering the
results.
This specific item consists of multiple layers, so an index is used to obtain
the first layer representing only the major airports. The URL is printed for
confirmation. Next, the use_proximity submodule is imported from the
arcgis.features module. The create_buffers() function creates the buffer
polygons. The function uses three arguments: the input layer, the distance
value as a list, and the units. The output of the function is assigned to a
variable, and the type of this variable is printed for confirmation.
The result is an in-memory feature collection and is not published as an
item. If you prefer to store the output as a feature layer, specify the
output_name parameter of the function.
The final step is to create a map display, and to add the airports and the
buffer polygons.
The map display is set to a single state, Florida, to show a close-up of the
buffers instead of showing the full extent of the data.
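The workflow can be sketched as follows. The item ID placeholder, the credentials, and the distance value are assumptions for illustration; the actual item ID can be found by searching ArcGIS Online for "USA Airports":

```python
from arcgis.gis import GIS
from arcgis.features import use_proximity

mygis = GIS("https://gis.example.com/portal", "username", "password")

# <item-id> is a placeholder for the unique item ID
airports_item = mygis.content.get("<item-id>")
major_airports = airports_item.layers[0]  # first layer: major airports
print(major_airports.url)                 # confirm the layer

buffers = use_proximity.create_buffers(major_airports,
                                       distances=[50],
                                       units="Miles")
print(type(buffers))  # in-memory feature collection, not a published item

mymap = mygis.map("Florida")  # close-up instead of the full extent
mymap.add_layer(major_airports)
mymap.add_layer(buffers)
mymap
```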
The use_proximity submodule includes several additional functions to carry
out proximity analysis, including create_drive_time_areas() and
find_nearest().
The arcgis.features module includes several other
submodules for analysis: analyze_patterns, elevation, enrich_data,
find_locations, hydrology, manage_data, and summarize_data. All the
functions in these submodules are also organized under the analysis
submodule for convenience. Identifying a specific function of interest is
somewhat complicated by the fact that the function names don’t match
exactly with the names of standard tools in ArcGIS Pro. In addition, the
organization of functions into modules and submodules does not match the
organization of geoprocessing tools in toolboxes and toolsets. The easiest
way to view all the available functions is to scroll through the
documentation of the ArcGIS API for Python hosted on GitHub.
Although the number of analysis functions in the ArcGIS API for Python is
substantial, there are many geoprocessing tools in ArcGIS Pro that do not
have an equivalent function in the API. On the other hand, there are some
functions in the ArcGIS API for Python that do not have an equivalent in
ArcGIS Pro. For example, the arcgis.learn module includes several tools
to support artificial intelligence (AI)–based deep learning tools, including
the use of computer vision tools for object identification and pixel
classification.
The ArcGIS API for Python is relatively new, and it is anticipated that
additional modules and functions will be added in future releases.
9.12 ArcPy versus ArcGIS API for Python
The examples so far have illustrated how the ArcGIS API for Python allows
you to automate tasks for web GIS, similar to how ArcPy automates
workflows in ArcGIS Pro. Jupyter Notebook is a natural fit for writing code
using the ArcGIS API for Python because of its ability to interact with both
local and online resources, and to visualize tabular data, maps, graphs, and
other elements without having to use a separate application for display
purposes. On the other hand, Python scripts that employ ArcPy are often
written in a more traditional IDE such as Spyder or PyCharm, and ArcGIS
Pro is used to visualize the results or to obtain user input through a tool
dialog box.
It is important to recognize that both arcpy and arcgis are Python packages
created by Esri, and both are installed as part of the default arcgispro-py3
environment. Both can be used in any IDE that is configured to use this
environment (or another conda environment with the same packages). For
example, you can import the arcgis package in the Python window in
ArcGIS Pro or in IDLE, Spyder, or PyCharm.
All the code in the earlier examples in this chapter runs correctly in the
Python window or a Python IDE, but visualization is different. Consider the
example code that imports the arcgis package, creates a GIS object, and
then creates a map display.
When the map display is called, the result is a reference to the MapView
object instead of a graphical display of the map. This result is less
informative, but the code works, and the object reference confirms that the
MapView object was created. Consider the earlier example of adding a CSV
file as an item in ArcGIS Online and publishing it as a feature layer. When
the code is stripped from the interactive map display elements, the code is
as follows:
from arcgis.gis import GIS
mygis = GIS(URL, username, password)
csv = "C:/Demo/trees.csv"
data_prop = {"title": "Vancouver trees",
             "description": "CSV file of street trees in the City",
             "tags": "trees, csv"}
trees_csv_item = mygis.content.add(item_properties=data_prop, data=csv)
tree_feature_layer = trees_csv_item.publish()
This script can be run using a regular Python IDE and carries out the same
task with the same results. Being able to display intermediate steps and
results inside a notebook is helpful, especially when troubleshooting code,
but the code works regardless of whether it is run as a stand-alone script or
inside a notebook. Similarly, you can use ArcPy in a Jupyter Notebook
without using the ArcGIS API for Python. Consider the example of a
geoprocessing script to run the Clip tool. ArcPy is imported, a local
workspace is set, and the Clip() function is used to carry out the task.
Running this code in a Jupyter Notebook produces the same result as
running the script in another IDE. The last line of code is not typically part
of a stand-alone script but is added to the notebook to confirm that an
output file was created, similar to printing a message to the interactive
window of an IDE.
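The steps described can be sketched as follows; the workspace path and dataset names are illustrative assumptions, not files from the book:

```python
import arcpy

# Set a local workspace (hypothetical path)
arcpy.env.workspace = "C:/Demo"

# Clip the input features to the boundary polygon
arcpy.analysis.Clip("roads.shp", "boundary.shp", "roads_clip.shp")

# Not typical in a stand-alone script; added in a notebook to
# confirm that the output was created
print(arcpy.Exists("roads_clip.shp"))
```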
In summary, both the arcpy and arcgis packages can be used in any Python
IDE that runs a conda environment with both packages installed, including
the arcgispro-py3 default environment. A more traditional IDE is a more
natural fit for scripts using ArcPy, whereas Jupyter Notebook is a good
match for the functionality of the ArcGIS API for Python. Some tasks may
require both packages in the same script or notebook, and then the choice
for which IDE to use is largely a matter of preference.
9.13 Working with JupyterLab
Jupyter Notebook is still relatively new but has quickly become popular
because of its versatility and functionality. Being able to write code, inspect
the results, and get rich output is appealing to the data science community,
as well as to educators and application developers.
Developments in this field take place rapidly. Project Jupyter is a nonprofit
open-source project that started in 2014. This project developed Jupyter
Notebook by building on the earlier IPython project. There are many
aspects to Project Jupyter beyond the user interface, but the Jupyter
Notebook interface is the most visible result. The next version of the
interface is called JupyterLab. Development of JupyterLab started in 2017,
and this new interface will eventually replace the “classic” Jupyter
Notebook. Both versions of the interface support the same notebook
document format.
JupyterLab is a next-generation web-based user interface for working with
documents, writing code, and developing workflows for interactive
computing. The new interface maintains much of the functionality of the
Jupyter Notebook but adds many other features found in a typical IDE.
To start JupyterLab, enter the following command in the Python command
prompt while running a conda environment. Note that there is a space in the
command, even though the interface name does not have a space:
jupyter lab
This command launches a browser window, just like the classic Jupyter
Notebook. The URL is typically http://localhost:8888/lab. The interface
consists of a main work area, a collapsible left sidebar, and a menu bar at
the top. The main work area is used to arrange documents and perform
other activities.
When the interface first opens, the left sidebar shows the File Browser tab,
which allows you to explore the files inside the current workspace. In the
example in the figure, several example .ipynb files are shown, as well as a
Python script and a CSV file. A geodatabase is recognized as a folder.
The Launcher panel allows you to start a new activity—for example, a
notebook or a Python script. You also can open an existing notebook by
double-clicking on the file in the File Browser window. Once a notebook is
open, many of the controls and the display are like the classic Jupyter
Notebook. You can write and interact with your code in much the same
manner.
JupyterLab represents a modern IDE to work with Python code in a
notebook format. You also can run stand-alone Python scripts. Although
JupyterLab will eventually replace Jupyter Notebook, both interfaces will
coexist for some time, and the experience of creating and using notebooks
is similar between the two.
Note: Although JupyterLab is launched by simply running jupyter lab,
the map widget used in the latest version of the ArcGIS API for Python
(1.7.0 at the time of writing) requires some additional configuration.
These steps are included on the documentation page of the ArcGIS API
for Python, under Guide > Get Started > Using the JupyterLab
environment > Installation. The steps are not included here because
they are likely to change in upcoming releases.
9.14 Documentation and help files
Extensive online resources are available to assist with learning more about
the ArcGIS API for Python. Resources include the official guide, the API
reference, and the sample notebooks.
The help pages are referred to as the “guide” and are located at the
following URL: https://developers.arcgis.com/python/guide/.
The guide follows the same organization as the help pages of other elements
of the ArcGIS platform. It provides an overview of the API and a detailed
look at the various modules. The guide includes explanations of the
functionality of each module with detailed code examples. However, there
is no complete inventory of all the modules, classes, and functions of the
ArcGIS API for Python, and no syntax is provided.
The complete documentation is referred to as the “API reference” and is
hosted on GitHub at the following URL:
https://developers.arcgis.com/python/api-reference/.
Here you will find a complete listing of all the modules, classes, and
functions, with their syntax.
The organization of the documentation is like the style employed by other
Python packages on GitHub, which is different from the style employed by
typical ArcGIS help pages, including those for ArcPy.
When working with the ArcGIS API for Python, you therefore will
typically need to consult two sets of resources: the guide for general
explanations and the API reference for the complete functionality and
syntax. Typically, when you are just starting to use the ArcGIS API for
Python, you likely will rely more on the guide for ideas about what is
possible. Once you gain some experience, the API reference will become
more important when you need to look up the syntax for specific classes
and functions.
Finally, there is a growing library of sample notebooks at the following
URL:
https://developers.arcgis.com/python/sample-notebooks
You can preview the notebooks online or download them to use on a local
computer. As with any Python code, you can reuse some of the code for
your own scripts and notebooks.
9.15 ArcGIS Notebooks
A recent addition to the functionality of ArcGIS is the use of ArcGIS
Notebooks. ArcGIS Notebooks use the same approach as Jupyter
Notebook, but the notebook files are hosted by ArcGIS Enterprise. ArcGIS
Notebooks are hosted just like other items in a portal, such as maps, tools,
and feature layers, and users can be assigned roles to create and edit
notebooks.
ArcGIS Notebooks are hosted in an ArcGIS Enterprise portal using ArcGIS
Notebook Server. Hosting notebooks is implemented using Docker
containers, which provide a virtualized operating system to run the
notebook. All the resources necessary to run the notebook are made
available without installing anything locally. For example, when using
ArcGIS Notebooks, you do not need to have ArcGIS Pro or Python
installed locally, but you can still use all the functionality of Python in a
notebook. This includes both ArcPy and the ArcGIS API for Python.
Note: Docker is a software company that has developed an industry
standard to deliver software in packages called containers. Docker
containers are widely used to distribute applications with complex
dependencies to many users in an organization. Docker software is not
created by Esri, but ArcGIS Notebook Server uses Docker software to
create and provide a separate container for each user.
The use of ArcGIS Notebooks takes away some of the cumbersome
installation and configuration of software by individual users in an
organization. An individual user does not need to set up a specific conda
environment because it is already part of the Docker container. In addition
to ArcPy and the ArcGIS API for Python, several hundred Python packages
are available, which substantially overlap with those that are part of the
Python distribution for ArcGIS Pro. Additional packages can be installed
during a notebook session.
Installation and configuration of ArcGIS Notebooks builds upon a base
deployment of ArcGIS Enterprise. The detailed steps are on the help pages
of ArcGIS Enterprise at the following URL:
https://enterprise.arcgis.com/en/notebook/.
Once ArcGIS Notebooks is up and running, creating and editing notebooks
is similar to working with a notebook inside ArcGIS Pro or in a stand-alone
Jupyter Notebook, and the Python code is identical across the various
approaches. The ArcGIS Notebooks interface includes the same
elements as the classic Jupyter Notebook. Existing notebooks can be added,
notebooks can be shared, and hosted notebooks can be saved locally as
.ipynb files. These features provide many new possibilities to share and
collaborate on workflows using the notebook format.
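Because a hosted notebook downloads as a standard .ipynb file, it can be reopened in ArcGIS Pro, Jupyter Notebook, or JupyterLab. An .ipynb file is plain JSON, so the notebook format itself can be inspected with nothing but the Python standard library. The following is a minimal sketch of that structure; the cell contents are made-up examples, not a notebook from the book:

```python
import json

# A minimal .ipynb file: JSON with a list of cells plus metadata.
# The cell contents below are hypothetical, for illustration only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Buffer analysis\n"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [],
         "source": ["print('Hello from a notebook cell')\n"]},
    ],
}

# Serialize and reload to confirm it round-trips as ordinary JSON,
# which is why .ipynb files move so easily between environments.
text = json.dumps(notebook, indent=1)
reloaded = json.loads(text)
print(len(reloaded["cells"]))
```

This portability of the underlying format is what makes sharing the same notebook between ArcGIS Notebooks, ArcGIS Pro, and Jupyter possible.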
Points to remember
The ArcGIS API for Python is a Python package to work with web
GIS without using ArcGIS Pro. It provides tools for tasks such as
creating maps, geocoding, vector and raster analysis, and managing
data, which are comparable to the tools in ArcPy for desktop GIS but
are specifically designed for web GIS.
The ArcGIS API for Python is not only a Python package but also an
application programming interface: it gives a Python script the tools to
use the ArcGIS REST API, which in turn makes requests to ArcGIS
Enterprise services.
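As a rough illustration of this layering, a content search made through the arcgis package ultimately becomes an HTTP request against a REST endpoint. The sketch below only constructs such a request URL with the standard library; the search endpoint path follows the ArcGIS REST API, but the query string shown is a simplified, hypothetical example:

```python
from urllib.parse import urlencode

# The ArcGIS REST API is driven by plain HTTP requests. A search that the
# arcgis package performs for you corresponds, roughly, to a URL like this.
base = "https://www.arcgis.com/sharing/rest/search"
params = {
    "q": "owner:esri AND type:Feature Service",  # hypothetical query
    "f": "json",   # request a JSON response
    "num": 10,     # limit the number of results
}
url = base + "?" + urlencode(params)
print(url)
```

The package hides this plumbing, handling request construction, authentication tokens, and parsing of the JSON response on your behalf.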
The recommended IDE to use the ArcGIS API for Python is Jupyter
Notebook, which is an open-source web application. Jupyter
Notebook is a natural fit to write code using the ArcGIS API for
Python because of its ability to interact with both local and online
resources, and to visualize tabular data, maps, graphs, and other
elements.
Python code in a notebook is organized into cells, which can contain
one or more lines of code. Results are printed directly below each
cell. The actual Python code is identical to the code you would use in
a different IDE.
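For example, a code cell such as the following behaves exactly as it would in any Python environment; in a notebook, the result appears directly below the cell. The buffer distances are made-up sample values:

```python
# A typical notebook cell: ordinary Python, one or more lines.
distances_m = [100, 250, 500]  # hypothetical buffer distances in meters
distances_km = [d / 1000 for d in distances_m]

# In a notebook, ending a cell with a bare expression displays its value
# below the cell; in a plain script, you use print() instead.
print(distances_km)  # [0.1, 0.25, 0.5]
```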
In addition to Python code, a notebook can include many other
elements by using Markdown cells. You can add headings, formatted
text, block quotes, example code, external links, images, and
multimedia files, among others. Using Markdown greatly enhances
the ability of notebooks to document and share workflows.
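A single Markdown cell combining several of these elements might look as follows; the heading, link target, and image path are placeholder examples:

```markdown
## Study area

This notebook maps *sample* locations in the study area.

> Note: results are preliminary.

See the [ArcGIS API for Python](https://developers.arcgis.com/python/) guide.

![Overview map](images/overview.png)
```

When the cell is run, the Markdown is rendered as formatted text in place, interleaved with the code cells that follow.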
You can start using the ArcGIS API for Python by importing the
arcgis package, which is installed as part of the arcgispro-py3
environment. The arcgis package includes several modules, the most
important being the gis module, which allows you to manage the
contents of your GIS, as well as users and their roles. The main class
of the gis module is the GIS class, and this object represents the GIS
you are working with through ArcGIS Online or an instance of
ArcGIS Enterprise. The GIS object includes a map widget, which
allows you to visualize your GIS. You can add contents to a notebook
by searching for items in ArcGIS Online. You can also create new
content by publishing items. Many of the workflows that employ
ArcGIS Pro and ArcPy on local data can be replicated for web GIS
using the ArcGIS API for Python. The ArcGIS API for Python
includes several modules to carry out specialized analysis tasks,
including raster and imagery analysis, network analysis, and
distributed analysis of large datasets. Some of the more “basic”
analysis tools are part of the arcgis.features module.
The Jupyter Notebook interface is well suited to employ other Python
packages, including Pandas for data manipulation and analysis. The
JupyterLab interface provides additional functionality and will
eventually replace the classic Jupyter Notebook interface. The same
notebook format is supported in both versions.
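As a minimal sketch of the kind of data manipulation pandas enables in a notebook, consider the following; the point coordinates are invented sample data, not taken from the book's exercises:

```python
import pandas as pd

# A small, made-up table of point locations, as might be loaded from a CSV.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "x": [-117.19, -117.25, -117.10],
    "y": [34.05, 34.11, 34.02],
})

# Simple manipulation: filter rows and add a derived column.
north = df[df["y"] > 34.04].copy()
north["label"] = north["name"] + " (north)"
print(north[["name", "label"]])
```

In a notebook, the DataFrame renders as a formatted table below the cell, which is one reason Jupyter Notebook and pandas pair so well.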
ArcGIS Notebooks use the same approach as Jupyter Notebook, but
the notebook files are hosted by ArcGIS Enterprise. ArcGIS
Notebooks provide many new possibilities to share and collaborate
on workflows using the notebook format within the ArcGIS platform.
Key terms
application programming interface (API)
cell (in a notebook)
container (Docker)
IPython
JupyterLab
Jupyter Notebook
kernel
map widget
Markdown
notebook
platform agnostic
pythonic
representational state transfer (REST)
token
web GIS
Review questions
What are some of the key similarities and differences between ArcPy
and the ArcGIS API for Python?
What are some of the features of Jupyter Notebook that make it well
suited to write Python code using the ArcGIS API for Python?
What types of content can you add to a notebook using Markdown?
Describe the process of publishing a local dataset as an item in
ArcGIS Online.
When can you use an anonymous login to use the ArcGIS API for
Python, and when do you need to provide user credentials?
What are some of the advantages of writing Python code as a Jupyter
Notebook instead of a stand-alone Python script or script tool? What
are some of the limitations?
Index
A
absolute paths, 57, 119, 120–21
Add Packages button, 154–55
Anaconda distribution, 146, 147, 228–29
Analyze Tools For Pro, 217–22
application programming interface (API), 226
ArcGIS API, 3, 4
basics of Jupyter Notebook and, 229
creating and opening a notebook for, 230–36
creating new content with, 256–59
documentation and help files for, 266–68
features of, 226
installation of, 227–29
introduction to, 225–26
JupyterLab and, 264–66
performing analysis in, 259–62
starting, for Python, 250–53
using Markdown in Jupyter Notebook and, 240–50
web GIS and, 14–17
writing code in notebook in, 237–40
versus ArcPy, 263–64
ArcGIS Desktop 10.x, 2, 4
migrating scripts from (See migration of scripts)
ArcGIS Notebooks, 230–36, 268–69
ArcGIS Pro
2.5 code, 4
Analyze Tools For Pro in, 217–22
changes to, 213
geoprocessing packages in, 135–40
migrating scripts from (See migration of scripts)
Python versions and, 2–3
sharing of tools in (See sharing of tools)
using Python scripting in, 1–2
ArcPy, 14
changes to, 214–17
creating functions in, 22–28
JSON objects in, 189–90
messages in, 82–88
migrating scripts and (See migration of scripts)
modules, functions, and classes in, 21–22
package tools, 42–46, 145–46
progressor functions in, 88–90
tool parameters in, 66–67, 71–72
zip files in, 170–71
versus ArcGIS API for Python, 263–64
arguments, 20
arrays, NumPy, 190–96
B
backporting, 2
backward compatibility, 2, 208
Booleans, 101
built-in modules, 163
C
Category property, 76
cells, 237
class definition, 37
classes
creating, 36–42
working with packages and, 42–46
Clone Environments dialog box, 152–53
command line, 156–60
comma-separated value (CSV) files, 175–79, 256–57, 258–59
comments, 55
conda
installation of ArcGIS API and, 228–29
managing environments using, 151–53
managing packages using, 154–55
used with command line, 156–60
content creation in ArcGIS API, 256–59
Create Points on Lines tool, 118
CSV files, 175–79, 256–57, 258–59
custom classes, 36
custom functions, Python
creating, 22–28
introduction to, 19
overview of functions and modules and, 19–22
customized tool progress information, 88–90
D
data analysis using Pandas, 196–200
data and workspaces in sharing tools, 122–25
Data Type property, 70, 75–76
data visualization using Matplotlib, 200–205
Default property, 76, 78
Delete Field tool, 77–78
dependencies, 148
Dependency property, 76, 77–78
deprecated modules, 163
derived parameter, 72
dictionaries in Python 2 versus 3, 211
digital elevation model (DEM), 6–7
Direction property, 73–74
directories, FTP, 165
distributions, Python, 146
Distributive Flow Lines tool, 118
documentation
ArcGIS API, 266–68
tools, 127–30
Document Object Model (DOM), 173
Double data type, 75
Drop Field parameter, 77
E
editors, Python, 4–5
embedded scripts, 125–27
Empirical Bayesian Kriging (EBK) tool, 11, 76
Environment property, 76, 78
environments, Python, 147–51
managed using conda, 151–53
Esri Support, 9
Excel To Table, 183
Extensible Markup Language (XML), 129–30, 171–73
extraction, ZIP file, 169
F
Feature Class, 70–71, 76, 80–81
Feature Layer, 70, 73–74, 75, 80
Feature Type parameter, 75–76
Fences toolbox, 10–13
file transfer protocol (FTP) sites, 164–68
FileZilla, 165
Filter property, 74, 75, 102–6
Filter type, 75–76
FTP files, 164–68
functions
called from other scripts, 28–32
creating, 22–28
overview of modules and, 19–22
working with packages and, 42–46
G
geometry objects and classes, 40–41
GeoPandas, 200
geoprocessing modules and packages
Analyze Tools For Pro, 217–22
comma-separated value (CSV) files in, 175–79
Excel files using openpyxl in, 177, 179–83
file transfer protocol (FTP) sites in, 164–68
introduction to, 163–64
JavaScript Object Notation (JSON) in, 183–90
Matplotlib for data visualization in, 200–205
NumPy arrays in, 190–96
Pandas for data analysis in, 196–200
web pages using urllib in, 173–75
XML files in, 171–73
ZIP files in, 168–71
Geoprocessing Options, 65–66, 94
geoprocessing packages, 135–40
Get Count tool, 72–73
H
hard-coded values, 55
HTML files, 173
I
IDLE, 5, 65, 225, 263
environment and, 160
.pyt files in, 95
Illuminated Contours tool, 6–8, 34–35, 124
implicit line continuation, 102
initializing of objects, 38–39
Input Features parameter, 71
input in Python 2 versus 3, 210
instances, 37
instantiating, 37
integer division in Python 2 versus 3, 209–10
integer types in Python 2 versus 3, 209
Intersect tool, 71
iteration using next() in Python 2 versus 3, 211
J
JavaScript Object Notation (JSON) files, 183–90, 258
JupyterLab, 264–66
Jupyter Notebook, 3, 226, 263, 264
basics of, 229
creating and opening a notebook in, 230–36
JupyterLab and, 264–66
using Markdown in, 240–50
writing code in, 238–39
L
Label property, 69, 73–74
libraries, 145–46
licensing issues in sharing of tools, 116
Long data type, 74, 75
lossless data compression, 168
M
Manage Environments dialog box, 151–53
Markdown in Jupyter Notebook, 240–50
MATLAB, 200
Matplotlib, 200–205
messages, 81–88
handled for stand-alone scripts and tools, 88
metadata, 127–29
methods, 36–39, 41–42
Microsoft Excel, 177, 179–83
Microsoft Office, 177
migration of scripts
Analyze Tools For Pro and, 217–22
changes between Python 2 and 3 and, 208–11
changes in ArcGIS Pro and, 213
changes in ArcPy and, 214–17
introduction to, 207
overview of, 207
Python 2to3 program for, 211–13
ModelBuilder, 50, 73
modules, 145–46
geoprocessing (See geoprocessing modules and packages)
organizing code into, 32–36
overview of, 20–22
Multiple Ring Buffer tool, 61–68
N
Name property, 73–74
nodes, XML, 171
notebooks
ArcGIS Notebooks, 230–36, 268–69
creating and opening, 230–36
using Markdown in, 240–50
writing code in, 237–40
Notepad, 177, 187
NumPy, 43–46, 49–50
arrays in, 190–96
O
object-oriented programming (OOP), 36
opening files in Python 2 versus 3, 210
openpyxl, 179–83
Outside Polygons Only parameter, 63
P
package managers, 146
Python, 147, 148–51
packages
creating geoprocessing, 135–40
geoprocessing (See geoprocessing modules and packages)
introduction to, 145
managed using conda, 154–55
managing environments using conda in, 151–53
modules, packages, and libraries of, 145–46
Python distributions and, 146
Python environments and, 147–51
Python package manager, 147
using conda with command line in, 156–60
working with, 42–46
Pandas, 196–200, 256
Pandas DataFrame, 196–98
Pandas Series, 198–99
Parameter Data type, 71
parameters, tool, 60–69
defining tools and, 101–8
editing tool code to receive, 78–81
setting of, 69–78
password-protected tools, 125–27
paths, 57, 119–22
PIP, 147
platform agnostic, API as, 227
printing in Python 2 versus 3, 209
pristine state of environment, 151
progressors, 88–90
properties, 36
PyCharm, 5, 66, 225, 263
environment and, 160
.pyt files in, 95–97
Python
ArcGIS and versions of, 2–3
ArcGIS API for (See ArcGIS API)
creating custom functions in (See custom functions, Python)
distributions of, 146
environments with, 147–51
example scripts, tools, and notebooks in, 5–17
geoprocessing in (See geoprocessing modules and packages)
migrating scripts from Python 2 to 3 in (See migration of scripts)
overview of functions and modules in, 19–22
package manager in, 147
as scripting language, 1–2
toolboxes in (See toolboxes, Python)
working with editors in, 4–5
working with packages of, 42–46
Python 2 to 3 differences. See migration of scripts
Python 2to3 program, 211–13
pythonic wrapper, 226
Python Package Index (PyPI), 147, 163
Python Package Manager, 147, 148–51
Manage Environments dialog box, 151–53
managing packages using conda, 154–55
PYTHONPATH, 29–31
R
Random Sample tool, 9–10, 69–70, 72, 74–75, 98
editing code to receive parameters, 78–81
messages and, 82–88
setting parameters in, 103–6
source code and, 109–13
Range Filter tool, 74
ranges in Python 2 versus 3, 210
relative paths, 57, 119–21
representational state transfer (REST), 226
root folder, 117
S
scalar values, 72
scratch GDB, 123–24
scratch workspace, 123
scripts, 1–2
calling functions from separate, 28–32
embedded, 125–27
examples of, 5–17
migration of (See migration of scripts)
organizing code into modules, 32–36
stand-alone, 50
script tools
comparing Python toolboxes and, 113
custom behavior of, 81
customizing tool progress information in, 88–90
edited to receive parameters, 78–81
exploring tool parameters for, 60–69
handling messages for stand-alone, 88
introduction to, 49
setting tool parameters for, 69–78
steps to creating, 51–60
versus Python toolboxes, 49–50
why create your own, 50
working with messages, 81–88
sharing of tools
choosing a method for, 115–16
creating geoprocessing package for, 135–40
creating web tool for, 140–42
documenting tools and, 127–30
embedding scripts, password-protecting tools in, 125–27
finding data and workspaces for, 122–25
handling licensing issues in, 116
introduction to, 115
Terrain Tools example of, 130–35
using a standard folder structure for, 116–19
working with paths in, 119–22
sinuosity index, 26–27
slice, data, 10–11
SmartFTP, 165
source code, 109–13
spreadsheets, Excel, 177, 179–83
Spyder, 5, 66, 225, 263
environment and, 160
.pyt files in, 95
Stack Exchange, 9
stand-alone scripts, 50
handling messages for, 88
standard folder structure for sharing tools, 116–19
standard libraries, 146, 163
Style Guide for Python Code, 4
Symbology property, 76, 78
T
Table To Table, 178
Terrain Tools, 5–8, 118, 124, 130–35
third-party libraries, 145, 146
3D Fences toolbox, 10–13
toolboxes, Python, 49–50
comparing script tools and, 113
creating and editing, 93–101
defining tools and tool parameters for, 101–8
introduction to, 93
working with source code in, 109–13
tool dialog box, 50
tool parameters, 60–69
setting of, 69–78
tools, sharing of. See sharing of tools
tree structure, XML files, 171–72
Type parameter, 73–74
U
unicode strings and encoding in Python 2 versus 3, 209
uniform resource locators (URLs), 173–75
Union tool, 71
urllib, 173–75
V
Validation tab, 81
virtual environments, 148
W
web pages using uniform resource locators (URLs), 173–75
web tools, 140–42
X
XML files, 129–30, 171–73
XY Table To Point, 178
Z
ZIP files
for geoprocessing, 168–71
for sharing of tools, 116