How to geocode addresses?

1. Getting the data

For this exercise, we are going to download the Toxics Release Inventory (TRI) Facilities data from 2014 using the TOXMAP web mapping service. The TRI database contains information about releases of specific toxic chemicals from industrial or federal facilities to the land, water, and air. You can download the data here. We will geocode the data into a set of point features. Refer to that exercise if you need to download the data and format the table.

2. Preparing Your Data

Before you can import tabular data into ArcGIS, you must make sure that it is saved in a file type and format that ArcGIS can recognize, and that your field names are clean and match the geocoding categories. First we will format the data fields, and then we will save the file correctly.

2.1 Formatting Tabular Files

Addresses must be formatted correctly to display properly in ArcMap, and field names cannot start with a number or contain special characters other than the underscore (_).

    • Open the TRI_Facilities_Clean.csv in Microsoft Excel. Save the file as an Excel workbook with the same name, i.e., TRI_Facilities_Clean.xlsx.

      • Inspect your data and see if it contains the following columns: Address, City, State, Zip. Enter the corresponding data into each column, if needed.

      • If you are missing data from one of these categories, you do not need to include that header.

      • You will need to ensure that your data contain the City and State since the same address can be repeated in several cities and states.

      • Read the rules for field names carefully:

    • When creating spreadsheets, make sure no field holds more than 255 characters. ArcGIS reads only the first 255 characters of a field. Fields with more than 255 characters are converted to BLOB fields and are not readable. Abbreviate, manually truncate, or split any fields longer than 255 characters.

    • Check the numeric field type before and after importing Excel data. ArcGIS typically converts spreadsheet numeric fields to double precision (Double), which may not meet your needs. If necessary, create new fields of the desired type and calculate values into them.

    • Check the format for date fields. ArcGIS uses the Lotus date/time format. In this format, the calendar date is represented by a whole number that counts the days since January 1, 1900, plus one day (due to a bug in Lotus 1-2-3 that was carried over to Excel). Time is represented as the decimal portion of a 24-hour day. If date/time data is important, format the input spreadsheet using a standard Excel date/time format. Our data contain only the year, so we can skip this step.

    • Follow ArcGIS field naming rules when creating Excel column names. The first row of an Excel worksheet sets the name for each column. Column names become field names when an Excel worksheet is imported into ArcGIS. Always follow these naming rules:

      • Column/Field names must begin with a letter.

      • Column/Field names must contain only letters, numbers, and the underscore character.

      • Column/Field names must be no more than 64 characters. If a name is longer than 64 characters, ArcGIS retains the first 63 characters.

      • Column/Field names may not consist solely of reserved words (date, value, name, text, and year). Do not use these words in field names. See the list of reserved words. ArcGIS typically adds a trailing underscore to reserved word field names added by copying and pasting from other sources.

    • Make sure the field names are not too long and do not have spaces or other problematic characters (e.g., *, &, !, #, etc.). Rename any columns as necessary.
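
The date rule above can be checked outside ArcGIS. The following is a minimal Python sketch, assuming serial values of 61 or greater (March 1, 1900, onward), where the extra day from the Lotus 1-2-3 bug is already included in the count. The function name excel_serial_to_datetime is hypothetical and is not part of ArcGIS or Excel.

```python
from datetime import datetime, timedelta

# Day 0 for serials >= 61; shifted one day to absorb the fictitious Feb 29, 1900.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert a Lotus/Excel serial date (days, with time as a fraction) to a datetime."""
    if serial < 61:
        raise ValueError("serials before March 1, 1900 are not handled by this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_datetime(41640))     # January 1, 2014
print(excel_serial_to_datetime(41640.75))  # the same day at 18:00
```

The whole-number part selects the calendar day and the decimal part (0.75 of a 24-hour day) becomes the time of day.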

What problems do you see with the fields?

    • Rename the NAME field to FACILITY.

    • Change the FAC-ID field to FAC_ID.

    • Remove the asterisk (*) in the *OTHER_AMT field.
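
For larger tables, the naming rules can also be applied with a script before the file is opened in ArcGIS. The sketch below is one possible approach in Python, not an ArcGIS tool: the helper clean_field_name, the placeholder name FIELD, and the sample column "2014 Total (lbs)" are hypothetical, and the reserved-word set contains only the five words listed in this exercise.

```python
import re

# Reserved words listed in this tutorial; the full ArcGIS list is longer.
RESERVED = {"date", "value", "name", "text", "year"}
MAX_LEN = 64

def clean_field_name(raw):
    """Return a field name that follows the naming rules above (hypothetical helper)."""
    # Replace each run of disallowed characters with one underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", raw.strip())
    # Names must begin with a letter: drop leading digits and underscores.
    name = re.sub(r"^[^A-Za-z]+", "", name)
    # Tidy trailing underscores, fall back to a placeholder, enforce the length limit.
    name = name.rstrip("_") or "FIELD"
    name = name[:MAX_LEN]
    # Mimic the trailing underscore that ArcGIS adds to reserved words.
    if name.lower() in RESERVED:
        name += "_"
    return name

for raw in ["FAC-ID", "*OTHER_AMT", "NAME", "2014 Total (lbs)"]:
    print(raw, "->", clean_field_name(raw))
```

A script can only make a name legal. A descriptive rename, such as NAME to FACILITY, is still a manual decision.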

2.2 Useable File Extensions

Once you have formatted your data, you will save it using a file type that ArcMap can recognize. The following file types can be used in ArcMap. All of these file types can be read by Microsoft Excel:

    • .csv

    • .txt

    • .xls

Click on File, Save As, Name: TRI_Facilities_2014_Clean_2.xlsx. Click Save.

3. Importing Your Data in ArcGIS

Before you attempt to geocode or geolocate your tabular data, make sure that ArcGIS can read all of your columns without any errors.

Open ArcCatalog and navigate to your TRI_Facilities_2014_Clean.xlsx file. Click on the Preview tab. Inspect the table and make sure all fields are displayed.

Close ArcCatalog and open ArcMap.

In ArcMap, open a new map document. In the ArcCatalog side window, navigate to your file and drag it to the Table of Contents. Notice that the view changes from List by Drawing Order to List by Source (note: this is the only view in ArcMap that lists tables).

Open the attribute table and confirm once again that all fields display correctly. Close the attribute table view.
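
A similar check can be run on the .csv version of the table before opening ArcCatalog. The following Python sketch uses only the standard library and assumes the four column names from section 2.1 and the 255-character limit from the field rules. The function precheck and the sample rows are made up for illustration and are not taken from the TRI data.

```python
import csv
import io

REQUIRED = ("Address", "City", "State", "Zip")  # columns named in section 2.1
MAX_CHARS = 255

def precheck(csv_file):
    """Report missing address columns and cell values longer than 255 characters."""
    reader = csv.DictReader(csv_file)
    header = [h.strip().lower() for h in (reader.fieldnames or [])]
    missing = [c for c in REQUIRED if c.lower() not in header]
    too_long = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for field, value in row.items():
            if isinstance(value, str) and len(value) > MAX_CHARS:
                too_long.append((row_number, field))
    return {"missing": missing, "too_long": too_long}

# Made-up sample rows, not taken from the TRI data.
sample = io.StringIO(
    "FACILITY,Address,City,State\n"
    "Example Plant,1 Example Rd,Exampleville,XX\n"
    "Long Notes Inc," + "x" * 300 + ",Exampleville,XX\n"
)
print(precheck(sample))
```

To run it on a real file, pass an open file object instead of the sample, for example one opened with open("TRI_Facilities_Clean.csv", newline="").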

4. Geocoding addresses in ArcGIS

Geocoding is the process of transforming a description of a location, such as an address or the name of a place, to a location on the earth’s surface. The resulting locations are output as geographic features with attributes, which can be used for mapping or spatial analysis.

To geocode addresses, we need an address reference dataset and an address locator. The reference dataset contains a database with the location of addresses for a particular region or locality. The address locator is the entity that specifies the method to interpret a particular type of address input, relate it with the reference dataset and deliver a matching option back to the user interface. Here is an example of how the process works:
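
As a rough illustration of these two roles only, and not of how ArcGIS implements them, the toy Python sketch below treats a small dictionary as the reference dataset and a function as the address locator. All addresses and coordinates are made-up placeholders, the names standardize and locate are hypothetical, and the similarity score comes from Python's difflib rather than from any ArcGIS scoring method. The status codes follow the U (unmatched) convention used in this exercise, with M assumed here for matched records.

```python
from difflib import SequenceMatcher

# Hypothetical reference dataset: standardized address -> (x, y) placeholder coordinates.
REFERENCE = {
    "100 MAIN STREET, SPRINGFIELD, XX": (1.0, 1.0),
    "200 OAK AVENUE, SPRINGFIELD, XX": (2.0, 2.0),
}
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}

def standardize(address):
    """Uppercase the input and expand a few street-type abbreviations."""
    parts = []
    for chunk in address.upper().split(","):
        words = [ABBREVIATIONS.get(w.strip("."), w.strip(".")) for w in chunk.split()]
        parts.append(" ".join(words))
    return ", ".join(parts)

def locate(address, min_match_score=85):
    """Toy locator: return the best reference candidate if it scores high enough."""
    query = standardize(address)
    best_key, best_score = None, 0.0
    for key in REFERENCE:
        score = 100 * SequenceMatcher(None, query, key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < min_match_score:
        return {"status": "U", "score": round(best_score), "match": None, "xy": None}
    return {"status": "M", "score": round(best_score), "match": best_key,
            "xy": REFERENCE[best_key]}

print(locate("100 main st., Springfield, xx"))    # exact match after standardizing
print(locate("100 MAIN STRET, SPRINGFIELD, XX"))  # tolerated misspelling
print(locate("12 ED NEEDHAM, MANSFIELD, GA"))     # no close candidate
```

In this sketch, lowering min_match_score lets weaker candidates through, which yields more matches at the cost of more wrong ones.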

We will use an address locator already created for us that contains addresses for the entire US.

    • In the ArcCatalog window, create a connection with GIS (\\group.clemson.edu\group\Apps\CCGT).

    • Navigate to Geocoding_data_2014 – Geocoding data.

In this folder, you will see different types of address locators. Which one matches your data depends on the attributes you have (zipcodes, CityState, etc.).

    • In ArcMap, go to Customize, Toolbars, and check the Geocoding toolbar. Drag the Street_Address locator to your Table Of Contents. Notice how this is the default geocoder in your Geocoding toolbar.

    • Click the Geocode Addresses button. Select the Street_Addresses as your Address Locator.

    • In the following window, select your TRI_Facilities_2014_Clean table. Use the dropdowns to make sure the input fields required for the locator to work match the fields from the table.

    • Save your results in your working folder. Make sure to create a new file geodatabase (Geocoding_Results) and save your new file as TRI_Facilities_2014_Geocoded.

    • Click the Geocoding Options button and change the Spelling sensitivity to 10 and the Minimum candidate and match scores to 10 as well.

    • Click OK to start the geocoding process. Once it is done, you should see a results window that looks similar to this:

    • Click the Rematch button to manually inspect the addresses that didn’t match. Right-click the Status column and choose Sort Descending to bring all the records with a U (for unmatched) to the top of the column.

    • Select the first one: 12 ED NEEDHAM, MANSFIELD, GA 30055. Let’s inspect why this address didn’t match. Open Google Maps and search for this address.

    • In ArcMap, minimize the Interactive Rematch window. Add the Streets basemap to the map and zoom to Mansfield, GA. Notice that according to this map, Ed Needham St is known as Pine St.

    • Maximize the Interactive Rematch window and click on Pick Address from Map. Minimize the window once again and click on the approximate location of Beaver Manufacturing. Notice the address 56 N Pine St.

    • Open the Interactive Rematch window and type the address 56 N Pine St into the Street or Intersection box. Click Search and notice the results, with a 100% match at the top of the list.

    • Click the Match button and notice the address is now entered in the Match_Addr field of the table.

You can repeat the same process with the rest of the unmatched addresses. Close the Interactive Rematch window.

Open the attribute table of your new geocoded layer and review the matching fields.
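
If the attribute table is exported, the same review can be summarized with a short script. The Python sketch below assumes rows carrying the Status and Match_Addr fields mentioned above, with U for unmatched as in the Rematch step and M assumed for matched. The function summarize and the sample rows are made up for illustration.

```python
from collections import Counter

# Made-up rows that mimic the Status and Match_Addr fields of a geocoded table.
rows = [
    {"FACILITY": "Example Plant A", "Status": "M", "Match_Addr": "1 Example Rd"},
    {"FACILITY": "Example Plant B", "Status": "U", "Match_Addr": ""},
    {"FACILITY": "Example Plant C", "Status": "M", "Match_Addr": "3 Example Rd"},
]

def summarize(rows):
    """Count records per status and list the facilities that still need a rematch."""
    counts = Counter(row["Status"] for row in rows)
    total = len(rows)
    percent_matched = 100 * counts.get("M", 0) / total if total else 0.0
    # Sort descending on Status so "U" rows come first, as in the Rematch step.
    ordered = sorted(rows, key=lambda row: row["Status"], reverse=True)
    unmatched = [row["FACILITY"] for row in ordered if row["Status"] == "U"]
    return counts, percent_matched, unmatched

counts, percent_matched, unmatched = summarize(rows)
print(dict(counts))
print(f"{percent_matched:.1f}% matched")
print("Still unmatched:", unmatched)
```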

Congratulations, you are a tiger of a geocoder!
