Lab 4: Python and Model Builder
Using ArcGIS 10.8.1 Python Shell and Model Builder Interface
Eidan W. Willis
Using ArcGIS 10.8.1 Python Shell and Model Builder Interface
Eidan W. Willis
Introduction
Section 1: Preparing our Data
1. Importing our Raster Data
2. Open and Prepare Model Builder
3. Importing a Tool
4. Exporting the Model to Python
5. Opening the Script
Section B: The Basics of Python
6. Creating Variables
7. Hello, World
8. Lists in Python
9. Loops in Python
10. Branches in Logic
Section C: Editing an Existing Script
11. Opening and Writing to a File
12. Clean Up the Script
Section D: The Python Shell Challenge!
13. Implementing the Script
14. Troubleshooting
15. Our Results
Conclusion
In the 4th lab of ENVB 530, we explored how to combine our previous knowledge of the Model Builder with the Python shell interface available in ArcGIS's ArcMap (version 10.8.1). We revisited several important data types (e.g., variables, strings, lists, functions) and basic Python functionality was explored through for loops, if statements, the with...as structure, and more. We also tested our knowledge by undertaking a challenge activity using the Python shell to reclassify several raster images, calculate some statistical values, and write our results to a new .txt file – all solely with the Python Shell. Let's begin!
We start the same as we always do with ArcMap: download our data – this time containing a set of raster .TIF files, each representing the same image but viewed in different bands. We then setup our connection to the destination folder in ArcCatalog and import our band layers to the ArcMap Table of Contents, beginning with the blue band (i.e., the file with the suffix "B10"). Take a quick look at the raster layer itself, thinking of what areas in the image you'd like to take a closer look at. Keep this region in mind as we move along, we'll be coming back to this!
A view of the B10 image in the Table of Contents.
Navigate to your folder destination in the Catalog, right-click and add a new toolbox titled Lab4.Toolbox.1.tbx . Within this toolbox, right-click and add a new model. This should feel relatively similar to Lab 2, where we had an in-depth look at the ArcMap Model Builder.
Next, right-click the model and click Edit to view the Model Editor window. Navigate to the Model Properties >> Environments and set the Processing Extent >> Extent environment setting to be Same as Display in the Values... tab.
Navigate to the ArcToolbox and find the Times tool found in the Spatial Analyst Tools >> Math tab; drag and drop the tool into the Model Builder canvas. Double-click on the Times box that appears in your canvas and set the parameters according to the following steps:
Set your Input raster or constant value 1 to the blue-band B10 raster layer we've imported.
Set your Input raster or constant value 2 equal to 1
Set the Output raster to be somewhere accessible, preferably within a Geodatabase .gdb file (I created a new .gdb file within the folder that contains the original raster data for this lab).
Doing so will create the model design as seen below. Take note and be sure to remember that the name of the second input raster/constant value parameter does not explicitly show that it is a constant value of 1, but it is.
Once you've adjusted the parameters as described above, right-click the Output raster box in the model canvas and check the box to add it to the display. Then, run the model and note that a new layer has been added to the dataframe in your Table of Contents delineating the region you chose (Note: you may need to change the color gradient of the new layer to be able to distinguish it from the original B10 layer).
We will now export our model to a python script. In the drop-down window tab at the top of the model canvas, navigate to Model >> Export >> To Python Script. Save the new Times script you've made one path-step before the folder containing all of your imported raster images (note: this also happens to be where I saved the geodatabase I created for this lab). Once you've done this, you're ready to open your new script! We won't be needing the Model Builder for the rest of this lab, but feel free to save your changes if you want.
To open your new script, navigate to the file outside of ArcMap, right-click and you should see an option to open the script in a text editor. For me in Windows, this option showed up as Edit with IDLE. It should look something like the left-half of the screenshot below when you open it. Then, open the Python shell in ArcGIS and stage the two new windows side-by-side. This shell serves as an environment within which we can run our script and enter Python commands; when you type or paste a new command in the shell, it'll act as if you're writing the script line-by-line and running after each complete edit made.
A look at the script we made in the first section staged next to ArcMap with the Python shell open in the top-right corner of the screen.
6. Creating Variables
We'll begin by learning how to create variables in Python. These variables can be written either in the script you opened in IDLE and then copy+pasted into the ArcMap Python shell, or they can be entered directly into the shell environment. Commands are executed by hitting return or enter. Simple calculations can be performed with your variables obeying standard syntax of grade school arithmetic, and the value of a new variable can be inquired by adding print statements, as seen on the right.
We can also manipulate and print strings – an ordered chain of characters (e.g., "A", "4", "&") – in Python in a similar way. As we can see, Python does not require variables to be delineated with the var indicator, which we've seen before in JavaScript with Google Earth Engine; instead, the variable name can simply be written out and set equal to the data type you want to store in it. For instance, we can see from a print statement on the right that the variable sentientString contains the String, "I am a String.".
Another important way that we can manipulate variables in Python (as well as in most programming languages) is by "casting" them to a new data type. For instance, we can cast our previously calculated Integer, z, to be of type String simply by wrapping the variable name in the str() command. Note that spaces can be added to a line of Strings simply by encapsulating them in quotation marks " ". As well, comments in Python can be written with a hash symbol # placed at the beginning.
7. Hello, World
Whenever you learn a new programming language, it is absolutely imperative that you initiate yourself by learning to write and print the phrase "Hello, World". I thought it'd be fun to start another tradition of writing the phrase in another spoken language of the user's choosing; for instance, in Japanese, we could print the iconic statement as " こんにちは、世界!" (read as "konnichiwa, sekai!"). But, it seems that we're limited to alphanumeric languages (this is at least the case with Python)...what a shame! Nevertheless, the same "Hello, World" output can be printed in a few different ways, as we can see below:
8. Lists in Python
As we saw when we explored the JavaScript API of GEE in Lab 1, lists are an extremely flexible data type that can be used to organize other data types, like Strings, Integers, Doubles, Characters, and so on. Because lists store their contents at specific indices increasing from 0, we can access individual contents of a list by printing the variable with a bracketed path to the index of our choice. We can see this below, where we access the first index of the sentientList, where the String "I" is stored.
We can also access segments of a list by providing a range of index values, for instance from index 0 to index 4. We can also get the length of a list (i.e., the total number of indices contained within the list) by using the len() function.
9. Loops in Python
Loops help us perform a given function, action, or command several times. It's often employed when it is less efficient to manually work through a given set of data step-by-step than it is to write out a short loop statement that can carry out the iteration for you. For instance, we can print each word in our sentientList variable – one after the other – by writing a short loop statement. In Python, iterative loops are mainly written with the for statement as seen below.
The for statement needs a very particular syntax to run correctly. First, for is written to indicate the start of a new for loop command. A variable unique to the for loop itself is specified (above, this variable is each_word). This variable represents the index at a given iteration through the list. the in statement tells the computer that the following variable is the list over which the loop will be executed, which happens to be the sentientList variable. A colon : is placed at the end of the statement, indicating the end of the line and the beginning of the code that tells the computer what to do at each iteration in the loop. In this case, the computer is told to print the word at the given index (i.e., each_word) for all words in the sentientList List. We can see that the list is printing each variable on its own line, but what would happen if we were to put another print statement immediately following the one already specified in the for loop?
We can see that the second print statement prints immediately following each of the words in the sentientList. This means that the order of your code within a for loop matters, because a line higher up in the script will be printed prior to any other lines below it. It also means that the for loop ends each time the final print statement is reached, at which point it will increment to the next index in the list (e.g., once it's done with index sentientList[0], it will move to index sentientList[1], and so on).
We can also iterate numerous times over by specifying a max number of iterations over which the program should run. One should take care to consider the computational power of your computer with this, however, because it could easily crash if you accidentally create infinite loops (or just set the max number of iterations to a number too high for the computer to handle). Below, we can see the result of printing 100 iterations of 1 times 2 (which could also be viewed as 2^n).
Note: unfortunately my remote computer crashed before I could get the last screenshot of the print statement, so what you're looking at is more like 90ish iterations rather than 100.
(...)
(...)
10. Branches in Logic
We can also use logic statements that will perform different actions depending on whether a given if statement is true or false. The conditional statement is placed after typing "if" in the Python shell. A colon : is used to separate the conditional statement from the code to be executed in the case that the condition is found to be true. Alternatively, "else" statements can be used to catch any conditions cases where the conditional statement returns false.
It is important to pay mind to your indentation when using the Python shell, and this is still very much the case when dealing with conditional statements. When writing conditional statements, ensure that you don't indent any if or else statements, as the code will produce an error otherwise. For instance, in the code on the right, the code breaks when the else statement is indented, but works just fine when it is not.
The Python shell in ArcMap can also be used to open and write in .py and .txt script files. This can be used in conjunction with for loops to do a number of handy things. For instance, a 2-line for loop can be used to print a copy of your script to the Python console within ArcMap, as seen below. To open a script within the shell, use the open() function and specify the path to your file of choice. Personally, I like to specify the Absolute Path (the longest path possible, starting from the drive containing your data) – I tend to encounter less unexpected errors this way. It's also much easier to keep track of where your data is in the long run when you can see the Absolute Path in your code. A template for the open() function could look something like this:
f = open("path\\to\\file", mode)
where
f --> a variable pointing to the opened file
mode (either "w" or "a"): defaults the file to either write "w" to the file, or append "a" new information to the file
We can also write to script files by using the .write() command, which, when attached to the variable pointing to an opened script, will tell ArcMap to write whatever is contained within the parentheses of the write() command to the script. After we've specified an opened script using the open() command, we can use the with and as statements in conjunction with the open() and write() functions to specify what should be written/appended to the script as follows:
with open("script.txt", "w") as f:
f.write('Hello World')
In these ways, the open() and write() functions are extremely useful tools for directly accessing files from a specified path and adding data from computational processes in ArcMap to separate text files. We'll be using this functionality of the shell with variables, lists, if statements, loops, and more as we explore Python programming in ArcMap at a more advanced level.
One of the most important habits to keep while programming is maintaining a neat and tidy script. If you can't read your own script, the likelihood of you understanding what it does, trying to find help from others, or explaining what it does to others will be immensely more difficult. As much as you can, maintain neat indentations and simple, easy-to-understand variable names. That way, both you and others will be able to understand what your code does!
It's also extremely important to save new changes in to your script in new files so as not to overwrite a script you know works with one that may not. If you are constantly overwriting your old files with new changes, you may encounter an impasse where you need to backtrack to an earlier version to save your progress – it's an absolute nightmare if you have nothing to backtrack to!
A screenshot showing a section of cleaned-up script, where variable names were changed to be more human readable.
Receipt of running our script in the Python shell. Don't forget to save!
Now that we've learned the basics of using the Python Shell in ArcMap, we can test our understanding in practice with a challenging exercise. For this, we imagine a scenario in which we want to:
Reclassify each of the bands of our first image (i.e., each band is a raster file with the ending in B10, B20, and so on until B70) into five levels using the Slice command from the Spatial Analyst toolbox. We want to employ the Natural Breaks algorithm and number the new classes from 1 to 5. Doing so will render 7 raster layers (i.e., 7 bands) each with class values ranging from 1 to 5, inclusive.
Specify that the slicing should be done for the entire full extent of the image.
Record the mean and standard deviation of each of our 7 bands in the resulting reclassified raster layers in a text file. This will be done using the GetRasterProperties() function in the Data Management toolbox (though we will be accessing it directly from the Python Shell) and our new skills of writing directly to text files from ArcMap.
To be able to do this efficiently for all raster layers, we can employ our new knowledge of the Python Shell to write a script that will iterate over all of our layers with a for loop and execute the necessary changes. To do this, we will first make a simple model using the Slice command from the Spatial Analyst toolbox and export our model as a Python Script to a text editor. From here, we will modify the script to be able to do everything above. We can break the problem down into the following steps:
1) If you haven't already, load all 7 of the band layers into the Table of Contents
2) Create a new toolbox in the Catalog to store your new Slice model
3) Create a new model and save it as Lab4.Models.Slice.1
4) Open the Model Editor and set the Input Raster of the Slice function to be equal to the B10 band layer
5) Set the Number of output zones to 5 and the Slice method to NATURAL_BREAKS
6) In Model >> Model Properties, check the Processing Extent >> Extent checkmark and set the extent to be the same as the B10 layer within the Values... tab
7) Save your changes to the model and export it as a Python Script
8) Change the variable names to be easier to understand
9) Load your script into the Python Shell and test your script
10) SAVE CHECKPOINT: Save your script!
11) Visit the help text for the GetRasterProperties() function, which can be found by navigating to Help >> ArcGIS 10.8.1 Desktop Help and searching for GetRasterProperties in the search tab
12) Write a set of code to calculate the mean and standard deviation of a given band
13) The function prints the Output Raster to the display and returns our desired set of statistical values. Use the getOutput method (as described at the bottom of the Help page) to access these values
14) Use our new knowledge of variables, lists, for loops, if statements, and accessing files to write a script that will iterate over all of our band layers. To do this, think of wrapping all of the lines of code we've written so far in a for loop. How can we use our new knowledge of lists to access all 7 of our bands?
A look at the Model Builder as we made changes to the Slice function.
Be sure to navigate to the help pages for information on the GetRasterProperties function.
13. Implementing the script
I will now cover my process of implementing the script described above. To begin, I implemented the model as described and exported it as a Python Script. I was able to make changes to the script as I moved along in the challenge by keeping it open in IDLE and loading it into the Python Shell whenever I wanted to test it. I also made sure to save new changes as new files, each incremented in ascending numerical order. My complete script, as well as a breakthrough of each section of code, can be seen below:
# Import arcpy module
A default import statement in any Python script used in ArcGIS.
# Set Geoprocessing environments
This code ensures that the important parameters (i.e., Processing Extent, Snap Raster, Cell Size) of the input raster remain the same in the output raster.
# Global variables
In order access and iterate over all 7 bands within the for loop without having to refer to the bands' long, clunky original names, I created 7 global shorthand variables, one for each band. Then, I created a list called aListOfBands that contains each of these shorthand variabes; this list essentially acts like the ee.Dictionary variables we saw in the GEE lab, where we can point to the original data without having to use their original names. Next, I created a similar list containing String names for each of these bands called aListOfBandNames. This list comes in handy within the for loop by ensuring that the Output Raster is named correctly according to its corresponding band. Lastly, the f variable represents the text file into which we will be writing the band statistics (i.e., mean and standard deviation) of each of our 7 bands. Note that the open() function contains 1) the Absolute Path to the pre-created .txt file and 2) sets the mode of the function to "w" for writing. I also initiated the text file by writing a line of blank code using the f.write("\n") line.
The for loop states that the index each_band in aListOfBands will carry out the following commands:
# Conditional statements for naming the output raster
Initially, I had nested the first for loop within a second that said "for each band name in a list of band names, perform the for loop for each of the indices in the aListOfBands list. However, this ended up doubling the amount of work that the computer had to do, because it then had to reach the first iteration of the aListOfBandNames (i.e., get the name of the first band), then go through all 7 iterations of the aListOfBands (i.e., calculate the slice for all 7 bands), then go back to the first for loop and move onto the second iteration (i.e., get the name of the second band) and repeat. This effectively squared the number of iterations and, therefore, doubled the amount of work that the computer had to do before exiting. Instead, I added several if statements to select and set the correct bandName uniquely depending on what iteration level the program is on. We exchange our 7 bands by setting theOutputRaster equal to the addition of the Absolute Path (up to the file of our choosing) and the variable bandName. In this way, we can also avoid having to hard-code 7 different output raster variables.
# Local variables
We set theInputRaster equal to the index of the aListOfBands at each given iteration level. In essence, at the first iteration, this is setting our input raster value equal to Band_1; at the second iteration, its Band_2, and so on. This is the main way in which the for loop can vastly improve the efficiency of our code!
As we mentioned above, we set theOutputRaster equal to the addition of the String representing the Absolute Path of the band raster file and the bandName variable. This effectively sets the output raster value equal to the new name of an output layer at each level of the aListOfBands iteration.
#Process: Slice
This command executes the Slice from the Spatial Analyst toolbox. The function itself takes theInputRaster, theOutputRaster, the number of slices as a string (i.e., "5"), and the name of the algorithm to be run (i.e., "NATURAL_BREAKS")
# Get the mean object
This section calculates the mean value of the Output Raster layer, which is the first of the two band statistics we're interested in, using the GetRasterProperties() function from the Data Management toolbox. The result is set equal to the mean_sliceResult variable.
# Get the mean value from the geoprocessing result object
The .getOutput() function is applied to the mean_sliceResult to extract the mean value of the band, which is set as the variable meanValue.
# Get the standard deviation object
This section employs the same GetRasterProperties() function, this time calculating the standard deviation of our band layer of choice – this is the second band statistic we're interested in. The result is then set equal to the std_sliceResult variable.
# Get the standard deviation value from the geoprocessing result object
The same .getOutput() function is applied to the std_sliceResult variable to extract the standard deviation value of the band; this value is then accessible via the stdValue variable.
# Print to shell
Self-explanatory. Both the meanValue and stdValue variables are printed to the shell; this allows us to check the values are being calculated correctly without having to access the resulting .txt file.
# Write to .txt file
Using the with open() as .txt_file: structure we described earlier, the code in this section appends the values stored within the meanValue and the stdValue variables in the .txt file we've specified, in this case named bandStatistics.txt. It is important in this step to append ("a") rather than write ("w"); the former will add the statistics of the following bands to the file, while the latter will overwrite previous additions at each step of the iteration.
14. Troubleshooting
I ended up encountering a bump in the road when I realized that the mean and standard deviation values of the band did not match up with those being posted by other students. I began sending print statements to the shell to also check the BANDCOUNT property of the image (i.e., the property that indicates how many bands there are in a given image). To my confusion, there was only one band, leading me to think that I had performed my previous steps incorrectly. Finally, after wasting lots of time tinkering with the Model Properties (i.e., I thought the issue had to do with miscalculating the Processing Extent of the B10 image), I finally realized that I was forgetting about the other 6 raster layers, each of which represented a different band! This explains why there was only one band available in the GetRasterProperties tab, because each layer only contains 1 band.
The solution might be right under your nose...don't give up!
When we run the script in the Python Shell in ArcMap, the program will continue to run until all 7 bands are saved to the Catalog and displayed in the Table of Contents. Band statistics are sequentially printed in the shell, whereupon we can assume that they are added to the text file for later verification. We can see in each of the bands displayed in the Table of Contents that the slice function created 5 separate classes within each of the bands, which is exactly what we set out to do in the beginning.
a sneak peek of the bandStatistics.txt file after running our code.
A screenshot of the sliced Band 1 raster layer.
A screenshot of the sliced Band 2 raster layer.
A screenshot of the sliced Band 3 raster layer.
A screenshot of the sliced Band 4 raster layer.
A screenshot of the sliced Band 5 raster layer.
A screenshot of the sliced Band 6 raster layer.
A screenshot of the sliced Band 7 raster layer.
In this lab, we explored how to intertwine our previous knowledge of the Model Builder with new applications using the Python Shell available within ArcMap. We revisited several important data types (e.g., variables, strings, lists, functions) and basic Python functionality was explored through for loops, if statements, the with...as structure, and more.
For the challenge section, we successfully met our initial goals, which included striving to:
Reclassify each of the bands of our first image (i.e., each band is a raster file with the ending in B10, B20, and so on until B70) into five levels using the Slice command from the Spatial Analyst toolbox. DONE
Employ the Natural Breaks algorithm and number the new classes from 1 to 5. Doing so will render 7 raster layers (i.e., 7 bands) each with class values ranging from 1 to 5, inclusive. DONE
Specify that the slicing should be done for the entire full extent of the image. DONE
Record the mean and standard deviation of each of our 7 bands in the resulting reclassified raster layers in a text file. This will be done using the GetRasterProperties() function in the Data Management toolbox (though we will be accessing it directly from the Python Shell) and our new skills of writing directly to text files from ArcMap. DONE