Before tackling this tutorial, you will need to download and install a dataset following these instructions:
Create a folder called joining_tables somewhere under your personal directory (e.g. C:\Users\jdoe\Documents\Tutorials\joining_tables\).
Download the data for this exercise then extract the contents of joining_tables.zip into your newly created joining_tables folder.
The project folder consists of an .aprx file, a shapefile of the Maine counties and two data tables: a CSV formatted data table and an Excel formatted data table. You will first learn how to load and view a data table in ArcGIS Pro, then you will learn how to join the table to the shapefile.
Open the Joining tables.aprx project file.
The project consists of a map of the Maine counties (USA).
Open the Catalog pane (if it's not already open).
The project folder should already be mapped as a folder connection.
Expand the Joining_Tables project folder.
ArcGIS Pro will list all files in a folder that it recognizes as being valid files for use in your project. This includes non-spatial data files like Alcohol_use_2008.csv and Estimated_median_income_2009.xlsx.
In the next step, you will load the CSV file.
Drag and drop the Alcohol_use_2008.csv file from the Catalog pane into the Map or Contents pane.
The file consists of the percentage of young people between the grades 6 and 12 who have consumed alcohol at least once. The file should appear in the contents pane (data were originally pulled from https://www.maine.gov/maineosa/survey/).
Recall that this is not a spatial dataset, hence, its contents will not be drawn in the map.
Before attempting a join, you should check the contents of the table.
Right-click on the Alcohol_use_2008.csv layer and select Open to bring up its table.
The data file consists of three fields (or attributes in ArcGIS lingo): The county names and the percentage of youth having and not having consumed alcohol at least once.
A join can only be performed when both the spatial object being joined to, and the data table to be joined have a common field that will be used to properly assign the data table rows to the spatial features in the spatial object.
Let's now open the attributes table associated with the Counties layer to help identify the matching field.
Right-click the Counties layer and bring up its Attribute table.
The Counties attribute table will appear as a separate tab in the Tables pane.
You can toggle back and forth between both tables to compare their contents.
Alternatively, you can split the Tables pane into two side-by-side tables to facilitate comparison as shown in the next step.
Click on the Counties table tab and undock it by dragging it away from its location but keeping the mouse somewhere inside the Tables pane. At some point, you will see a cross shaped docking indicator unto which you can drop the table tab. Select the right-most indicator (this will dock it to the right of the other table).
Note that you can always dock the table back to its original location, or dock it somewhere else in the project window.
The common fields that will be used to join the records are County name (in the CSV table) and COUNTY (in the Counties shapefile).
Note that the number of records do not need to match between the tables since we are not appending one table to the other by row order. In this example, the CSV file has just 16 records (one record for each Maine county) whereas the shapefile has 6,251 records. The large number of records reflects the number of individual polygons making up each county in the shapefile. Many counties in Maine cannot be represented by a single polygon--this is especially true for coastal counties that are made up of hundreds of island polygons.
Right-click the Counties layer and select Joins and Relates >> Add Join.
Make sure that you right-click on the layer you are joining a table to since this is a unidirectional operation. (I.e. you are NOT joining the shapefile table to the CSV table).
Set the Input Table to Counties, the Input Join Field to COUNTY, the Join Table to Alcohol_use_2008.csv, and the Join Table Field to County name.
Click OK to run the geoprocess.
If you expand the Counties Attribute table, you should now see three additional columns from the CSV file added to the table.
At this point, you can symbolize the Counties layer using any one of the newly joined columns.
In the accompanying figure, the Percent_use field is symbolized.
The join you created in the last step will only persist in the current ArcGIS Pro .aprx document. In other words, if you were to load the Counties shapefile into a new map document, the join would no longer be present.
To make the join permanent, you need to export the layer (with the join) to a new GIS data file. In the next example, you will export the layer to a dedicated shapefile.
Make sure that none of the features (polygons) are selected. If a selection is present, only the selected features will be exported.
Right-click the Counties layer and select Data >> Export Features.
Set Counties as the Input Features.
Name the output file Alcohol_consumption.shp and save it in the Joining_tables project folder.
Click OK.
Note that shapefiles limit the field names to 10 characters. This means that longer field names will be truncated. For example, the column Percent_no_use in the CSV file will be truncated to Percent_no in the output shapefile.
If you need to preserve the full column names that go beyond 10 characters, you can either export the layer to a geodatabase or you can export the layer to a geopackage.
You now have a dedicated shapefile with a permanently joined data table. This will probably be automatically added to the Contents pane. If so, you can remove it from the the map document since it will not be needed for the remainder of the tutorial.
For our next exercise, we will repeat the join operation with an Excel file. But first, we will remove the current join from the Counties layer.
Right-click the Counties layer and select Joins and Relates >> Remove All Joins.
This will remove all joins performed on this layer. In doing so, it will generate an empty map if you symbolized the features using one of the joined columns. You might need to revert the symbology back to its original state (i.e. a single symbol Symbology).
Next, we'll load the Excel file. Note that an Excel file can consist of more than one spreadsheet. ArcGIS Pro will only allow you to load one sheet at a time. So when opening the file, you will be asked to select the sheet of interest. In this example, we only have one sheet in the Excel file.
In the catalog pane, expand the Estimated_median_income_2009.xlsx file and drag and drop the Estimated_median_income_2009$ sheet into the Map pane or the Content pane.
Following steps outlined earlier in this exercise, open the Excel file's attribute table and generate a side-by-side table layout showing the Counties table on the left and the Excel table on the right. Note that you will need to close any other table no longer needed in this step.
This time, we will join by county code ID and not by county name (you'll note that this excel file does not have a county name column). US counties are assigned a numeric code by the federal government (FIPS). These codes are stored in the CNTYCODE field for the Counties layer and COUNTY ID for the Excel file. These columns will be used to perform the join.
But before we do, let's take a look at the data types associated with these fields.
For a GIS file layer (such as a shapefile or geodatabase) , click on the table's upper right-hand icon (the three horizontal lines) and select Fields View from the pull-down menu.
For a non-GIS file (aka Standalone Table) such as a CSV or Excel file, you might find the Fields View option greyed out.
The workaround is to open the Fields tool from the Standalone Table tab. If you don't see the Standalone Table tab, you probably do not have the Excel table selected.
We are comparing data types for both joining columns. Note that the shapefile's CNTYCODE column is stored as a Text and not as a number (even though the values appear numeric) and Excel's County_ID column is stored as a Double (a numeric data type). The difference in data type may influence the join. So let's see what happens if we ignore this discrepancy in data type.
Following the example outlined earlier, join the Excel file to the Counties layer. Don't forget to click OK to run the geoprocess.
Bring up the Counties' layer attribute table. (You might need to click on the Counties tab to get out of the Fields tab).
At first glance, the join seems to have performed as expected. We see the 2009 Estimated Income column from the Excel file joined to the Counties layer. This allows us to symbolize the layer using that column. However, in doing so, you will note that some counties are blank.
It's possible that the Excel file is missing records for those counties, but after further examination, you will note that the Excel file has 16 records, one for each of the 16 Maine counties.
So missing records is not the problem.
After a bit of detective work, you might have noted that the CNTYCODE values that start with 0 do not have matching Excel records (i.e. 01, 03, 05, 07 and 09). Yet, those values do exist in the Excel file without leading 0s as shown in the following figure.
So why the NULL matches? Recall that the data types in both tables do not match. When ArcGIS Pro is asked to combine or compare different data types, it will convert one data type to match that of the other data type. It does so by applying a data type hierarchy whereby anytime numbers are mixed with text, the numbers will be changed to text data type. In our example, County ID values are converted to text. So for the mismatched records, ArcGIS Pro is comparing the text string "01" to "1" , the text string "02" to "2", etc...
There are several workarounds to this problem. One could involve converting the County ID column to text (with padded "0"s) inside of an Excel session, then re-importing the file into ArcGIS Pro. Another solution is to convert the County ID text column to a numeric column in Pro. In this exercise, we'll adopt the latter solution.
At this point, it's probably a good idea to remove the join from the Counties layer before proceeding with the creation and computation of the new field. Revert to earlier steps in this tutorial for a refresher on how to remove a join.
Click on the Fields: Counties tab in the Table pane.
At the bottom of the Fields table, click on the "Click here to add a new field" link.
This will add a new row to the Fields table.
Name the new field cnty_id.
For Data Type, you can choose anyone of the numeric options (ArcGIS Pro will usually properly join any numeric data type). But for sake of consistency, we'll make this a Double since this is the data type adopted by the Excel file.
If this was a very large file and memory management was an issue, you would probably want to be a bit more judicious in your choice of data type. In such a scenario, you would want to adopt the smallest data type that can accommodate the full range of values in that field. In our example, you could choose a Short (integer) data type.
Once you've defined the new field, you need to save the changes to the shapefile.
Navigate to the top of the ArcGIS Pro window and select Save under the Fields tab.
Navigate back to the Counties attribute table by clicking its tab.
Right-click on the cnty_id column and select Calculate Field.
If you do not see the newly created cnty_id column, you probably did not properly save the changes as outlined in the previous step.
Next, you will assign the text values in the CNTYCODE field to the cnt_id column using the Python 3 syntax. Note that you could do the same using the Arcade syntax.
In the Calculate Field window, double-click on the CNTYCODE field in the Fields box. This will automatically add it to the Expression box at the bottom of the window.
Alternatively, you could manually type the expression !CNTYCODE! in the expression box, but be sure to respect all cases since the operation is case sensitive. If you use the Arcade syntax, the expression box will look like $feature.CNTYCODE.
Click OK to run the operation.
The expression we just created copies the values in the CNTYCODE to the cnty_id column. It will automatically convert the data type as needed (here, it converts the text representation of numbers to a numeric data which we defined for the cnty_id column).
Scan the newly computed field. You should now see numbers without the "0" padding.
Now, on your own, attempt another join using the cnty_id column instead of the CNTYCODE column.
Symbolize the income column to check that all counties have a matching Excel record.
Right-click on the Counties layer and select Data >> Export Features.
Name the output income.shp
NOTE:
With early version of ArcGIS Pro (pre 3.3), using the export features option by right-clicking the layer and selecting Data >> Export Features would have exported the layer with all joins by default. In version 3.3. of the software, if an earlier join with a different table was exported, any subsequent join and export operations may require that one explicitly adds the joined tables' fields to the exported file. These steps are shown next.
Expand the Fields tab.
You will note that the new joined fields are not part of the export . Instead, the fields from the previous join seem to persist in this export despite the earlier join having been cleared from the layer.
Delete Percent_use and Percent_no_use by howevering of the field names with the cursor and clicking the Remove button.
Click on the Add Fields button.
Add the the income field. Given that the window is truncating the full path to the field names, you may need to hover your cursor over the three Estimated_median_income fields until the correct one is identified.
After checking the field column that is to be added to the output file, click on the First button under the Select Actions section.
This last step tells the tool to grab the first match in the join if more than one record in the CSV file had a matching county feature in the shapefile. This should not be an issue here given that each county has a unique record in the Excel file.
Click Add.
You'll note that the tool adds the Excel filename to the attribute field name making for a longer than needed variable name. This will be a problem if we export to a shapefile given that this file format limits column names to 10 characters. Next, we will change the names of these fields.
Right-click on the field whose name you want to edit and select Rename.
Rename the field to Income.
Once renamed, click OK to export the layer.
The new income.shp file should now include the income attribute.
Save your project.
This wraps up this tutorial.