Census Data Acquisition

Instructions for Creating a Google Map from Census Data

Collection of Data

Data can be collected from various sources; this instruction set focuses on census data.  The U.S. Census Bureau makes abundant resources freely available.  Boundary files can be found at: http://www.census.gov/geo/www/cob/bdy_files.html

The desired files are in ASCII format, so make sure to download the ASCII version of each boundary file.  An example is the county boundary file for the state of Arizona: http://www.census.gov/geo/cob/bdy/co/co00ascii/co04_d00_ascii.zip

This file is used as the example throughout the following instruction set.

 

Unzip the file using a program such as ZipGenius.  After unzipping you will have two files, co04_d00.dat and co04_d00a.dat.  The larger of the two, co04_d00.dat, contains the actual latitude and longitude coordinates of the points that make up the boundary of each polygon; each polygon represents a county.[1]  The smaller file, co04_d00a.dat, contains metadata about the individual polygons.[2]

 

Manipulation of Data

The data files from the census are in a format that cannot be used directly to create a Google map.  To open and manipulate these files you will need a text editor, such as Notepad++, and a copy of Microsoft Excel.

The first step is to remove the unwanted whitespace, line breaks, and ‘END’ and ‘-99999’ statements from the dataset.  This is most easily accomplished using Notepad++.  Right-click on the larger file, co04_d00.dat, and choose ‘Edit with Notepad++.’  You will see data in this format:

1     -0.110473879809927E+03       0.361211189259259E+02
       -0.110000677000000E+03       0.369979680000000E+02
       .....                        .....
       -0.110000677000000E+03       0.369979680000000E+02
END
2      -0.111824993748873E+03      0.362597283950617E+02
        -0.110750690000000E+03      0.370031970000000E+02
..... 
END
..... 
END

After removing the unwanted characters, the data should be in this format:

1      -0.110473879809927E+03     …   …    0.361211189259259E+02 
2      -0.111824993748873E+03    …     …   0.362597283950617E+02 
…     …
15     -0.110880330523380E+03    …    …    0.315235068518519E+02 
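
For readers who would rather script this cleanup than do it by hand in Notepad++, the following is a minimal JavaScript sketch of the same step, not part of the original tool set.  The function name flattenBoundaryFile is hypothetical.  The sketch assumes that a line with three values starts a new polygon, that a line with two values continues the current polygon, and that the values on a ‘-99999’ line are simply appended to the current polygon once the marker itself is dropped.

```javascript
// Sketch: flatten a Census ASCII boundary file so each polygon is on one line.
// Assumption: three values on a line = ID line, two values = coordinate line.
function flattenBoundaryFile(text) {
  const records = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const tokens = rawLine.trim().split(/\s+/).filter(Boolean);
    // Skip blank lines and END statements.
    if (tokens.length === 0 || tokens[0] === 'END') continue;
    if (tokens.length === 3 && tokens[0] !== '-99999') {
      // A new polygon begins: keep the ID and the pair that follows it.
      records.push(tokens);
    } else if (records.length > 0) {
      // Drop a leading -99999 marker, keep only the coordinate values.
      const coords = tokens.length === 3 ? tokens.slice(1) : tokens;
      records[records.length - 1].push(...coords);
    }
  }
  // One space-separated line per polygon.
  return records.map((record) => record.join(' '));
}
```

The returned array can be joined with line breaks and saved as the cleaned file.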

Each individual polygon, a county in this case, is now on its own line.  For Arizona there are 15 polygons, one for each of the state's 15 counties.  Save the file so that it can be manipulated with Excel.  Before continuing, the metadata file, co04_d00a.dat, also needs to be edited in Notepad++.  It starts out with the data in this format:
 
1
"04"
"017"
"Navajo"
"06"
"County"

After processing, each record should be on a single line in this format:

1     04017     Navajo
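
The same metadata cleanup can be sketched in JavaScript.  This is an illustration under stated assumptions rather than the procedure used above: the function name flattenMetadataFile is hypothetical, every record is assumed to consist of exactly six non-empty lines in the order shown, and the column widths (6 and 10 characters) are chosen only to reproduce the spacing of the sample line.

```javascript
// Sketch: collapse each six-line metadata record into one fixed-width line.
// Assumption: every record has exactly six non-empty lines:
// ID, state FIPS, county FIPS, name, LSAD, LSAD translation.
function flattenMetadataFile(text) {
  const values = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.replace(/^"|"$/g, '')); // strip surrounding quotes
  const rows = [];
  for (let i = 0; i + 6 <= values.length; i += 6) {
    const [id, stateFips, countyFips, name] = values.slice(i, i + 6);
    // Join the two FIPS parts into one code, e.g. "04" + "017" = "04017".
    rows.push(id.padEnd(6) + (stateFips + countyFips).padEnd(10) + name);
  }
  return rows;
}
```

Fixed-width columns keep multi-word county names intact when the file is later opened in a spreadsheet.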

After saving the file there are two datasets that need to be merged in Excel or some other spreadsheet software.  The leading number of each line in one dataset corresponds to the line in the other dataset that begins with the same number.

The final dataset will be stored in a character-separated value format.  If any of the data to be displayed on the map contains commas, the preferred separator character is a |.  To allow Excel, or any other spreadsheet software, to use a | as the separator character, a setting must be changed on the computer.[3]

Open Excel first and then open the larger dataset.  A Text Import Wizard should open.  Import the dataset using the delimited option and choose ‘Space’ as your delimiter.  This should open the dataset with the individual values in separate cells.

The latitude/longitude values need to be converted from scientific notation to a decimal format.  Highlight all of the latitude and longitude data, right-click, choose ‘Format Cells…’, click the ‘Number’ tab, choose the ‘Number’ category, and select six decimal places.  Then insert three empty columns: one for the county name to be merged in, one for the FIPS code, and one for a formula that determines the number of latitude/longitude points in each polygon.
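
As a small illustration of what the number formatting does, the conversion can be written in JavaScript.  The helper name toSixDecimals is hypothetical; it relies only on the standard Number and toFixed functions.

```javascript
// Sketch: convert a value in scientific notation to six decimal places.
function toSixDecimals(value) {
  return Number(value).toFixed(6);
}

console.log(toSixDecimals('-0.110473879809927E+03')); // -110.473880
console.log(toSixDecimals('0.361211189259259E+02'));  // 36.121119
```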

Open the smaller dataset using fixed width, instead of delimited, to account for counties whose names contain more than one word, e.g. La Paz.  Merge the data by copying the columns from the smaller dataset and pasting them into the empty columns of the larger dataset.  In the remaining empty column, insert a formula to count the total number of points.  The count of values must be divided by two, because each point is a combination of a latitude and a longitude.  The simplest formula to use is ‘COUNTA’, which counts the number of cells in a range that are not empty.  The format is =COUNTA(first column with data:last column with data)/2.
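
The merge and the point count can also be sketched in code.  The following JavaScript is a hypothetical illustration, not the spreadsheet procedure itself: the function name mergeDatasets and the object field names are assumptions, the boundary lines are assumed to be the one-polygon-per-line strings produced earlier, and the metadata is assumed to be already split into id, fips, and name.

```javascript
// Sketch: join boundary lines to metadata by the leading ID and count points.
// boundaryLines: ["1 lon lat lon lat ...", ...]
// metadataRows:  [{ id: "1", fips: "04017", name: "Navajo" }, ...]
function mergeDatasets(boundaryLines, metadataRows) {
  const metadataById = new Map(metadataRows.map((row) => [row.id, row]));
  return boundaryLines.map((line) => {
    const [id, ...coords] = line.trim().split(/\s+/);
    const meta = metadataById.get(id);
    if (!meta) throw new Error('No metadata found for polygon ' + id);
    return {
      id,
      fips: meta.fips,
      name: meta.name,
      // Same idea as =COUNTA(range)/2: two values make one point.
      pointCount: coords.length / 2,
      coords,
    };
  });
}
```

Matching on the ID rather than on row position guards against the two files being sorted differently.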

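The dataset is now ready to be saved in character-separated value format.  Choose ‘Save As’, select ‘CSV (Comma delimited) (*.csv)’ as the ‘Save as type:’, and choose a filename.  Although the type is labeled comma delimited, Excel writes the file using the list separator character set on the computer.[3]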
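
For reference, a row of the saved file can be sketched in JavaScript as follows.  The function name toDelimitedRow and the record fields are hypothetical.  The column order is an assumption as well: the FIPS code in the second position and the county name in the third follow the a[1] and a[2] references used with the map script, while placing the ID first and the point count fourth is only a guess made for this illustration.

```javascript
// Sketch: write one merged record as a |-separated row.
// Assumed column order: ID | FIPS | name | point count | coordinates...
function toDelimitedRow(record, separator = '|') {
  const fields = [
    record.id,
    record.fips,
    record.name,
    record.pointCount,
    // Coordinates are written with six decimal places.
    ...record.coords.map((value) => Number(value).toFixed(6)),
  ];
  return fields.join(separator);
}
```

Using | instead of a comma means that names or other text containing commas do not need to be quoted.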

Changing Displayed Data

To change the information displayed on the map, e.g. the text within the window that appears when a county is clicked, the JavaScript file needs to be changed.  Open ‘create_map.js’ in Notepad++.  As an example, the FIPS code will be added to the informational window that appears when a county is clicked.  On line 72, the text assigns a variable, ‘countyName,’ a value.  This variable is added to a polygon/county on the map.  In this case the third value from a row in the dataset, a[2] (2 denoting the third position in the row, counting from zero), is assigned to the variable ‘countyName.’  To make this variable actually appear in an informational window, a second piece of text is needed on line 83.  This text adds the variable to the content string that is placed in the informational window.

To add the FIPS code these two lines need to be copied and changed.  The first line can be copied, pasted after the original line, and changed to ‘countyPolygon[j-1].set("FIPSCode", a[1]);’  This denotes that the value from a[1], the second value in the row from the dataset, is to be placed in the variable ‘FIPSCode’ and added to countyPolygon[j-1], which is the dataset’s row that is currently being created.  The second line can be copied, pasted after the original line, and changed to ‘contentString += "<b>FIPS Code: <\/b>" + this.get("FIPSCode") + "<br />";’  Notice in particular that this new line is lacking the text ‘var’ and that the = has been changed to +=.  This command keeps the previous information, the county name, and adds new information afterwards.
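
The pattern described above can be tried outside the map with the following self-contained sketch.  It is not the contents of ‘create_map.js’: the FakePolygon class is a hypothetical stand-in that only mimics the set and get methods, the function names are invented for the illustration, and the ‘County:’ label on the first content line is an assumption, since the original line is not reproduced here.

```javascript
// Hypothetical stand-in for a map polygon; it only stores key/value pairs.
class FakePolygon {
  constructor() { this.values = new Map(); }
  set(key, value) { this.values.set(key, value); }
  get(key) { return this.values.get(key); }
}

// Attach values from one dataset row to a polygon.
// a[1] is the FIPS code and a[2] is the county name, as in the text.
function attachCountyData(polygon, a) {
  polygon.set("countyName", a[2]);
  polygon.set("FIPSCode", a[1]);
}

// Build the window content; called with the clicked polygon as `this`.
function buildContentString() {
  var contentString = "<b>County: <\/b>" + this.get("countyName") + "<br />";
  // No 'var' and += instead of =, so the county name is kept.
  contentString += "<b>FIPS Code: <\/b>" + this.get("FIPSCode") + "<br />";
  return contentString;
}

var polygon = new FakePolygon();
attachCountyData(polygon, ["1", "04017", "Navajo"]);
console.log(buildContentString.call(polygon));
```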

The informational window data is created using HTML, and because of this, anything that can be created in an HTML page can be created in the informational window.
 
Changing Map Legend

The map’s legend is statically created using JavaScript.  Open ‘legend.js’ in Notepad++.  The font style and size can be easily altered, as well as the alignment.  Text is placed with ‘ctx.fillText('text', x, y);’ where the x, y pair gives the bottom-left corner of the text string, measured from the top-left corner of the bounding box.  The colored rectangles’ location and size are set with ‘ctx.fillRect(x, y, width, height);’.  The rectangles’ outlines are drawn with ‘ctx.strokeRect(x, y, width, height);’; this canvas method takes only those four arguments, and the stroke width is set separately with ‘ctx.lineWidth’.  Finally, the rectangles’ colors are set with ‘ctx.fillStyle = "rgba(r, g, b, a)";’ where ‘r’, ‘g’, and ‘b’ are numbers from 0 to 255 forming the RGB triplet and ‘a’ is the opacity of the rectangle, from 0.0 (completely transparent) to 1.0 (completely opaque).
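
A single legend entry built from these canvas calls might look like the sketch below.  It is not taken from ‘legend.js’: the function name drawLegendEntry, the fields of the entry object, the font, and the 5-pixel gap between the rectangle and its label are all assumptions made for the illustration.  In the standard canvas 2D interface, strokeRect takes four arguments and the outline width comes from the lineWidth property.

```javascript
// Sketch: draw one colored rectangle with an outline and a text label.
// ctx is a canvas 2D drawing context, e.g. canvas.getContext('2d').
function drawLegendEntry(ctx, entry) {
  // The fill color must be set before the rectangle is filled.
  ctx.fillStyle = "rgba(" + entry.r + ", " + entry.g + ", " + entry.b + ", " + entry.a + ")";
  ctx.fillRect(entry.x, entry.y, entry.width, entry.height);

  // Outline: the width is a property, not an argument of strokeRect.
  ctx.lineWidth = entry.strokeWidth;
  ctx.strokeRect(entry.x, entry.y, entry.width, entry.height);

  // Label in opaque black, to the right of the rectangle, with the
  // text's y coordinate aligned to the rectangle's bottom edge.
  ctx.fillStyle = "rgba(0, 0, 0, 1.0)";
  ctx.font = "12px sans-serif";
  ctx.fillText(entry.label, entry.x + entry.width + 5, entry.y + entry.height);
}
```

Calling the function once per category, with a larger y value each time, stacks the entries vertically.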

Placement of Files 

The files should be placed in the same directory on a web server.  You will need to include the following:

    HTML file (content)
    CSS file (style)
    JavaScript files (logic)
    CSV file (dataset)

[1] Polygon coordinate files follow the format:
ID LON1       LAT1
LON2       LAT2
...
...
LONx       LATx
END
ID LON1       LAT1
LON2       LAT2
...
...
LONx       LATx
END
 
END
 
where ID = a unique polygon identification number, the longitude/latitude pair on the ID line (LON1/LAT1 above) = a coordinate internal to the polygon, and the pairs on the following lines (LON2/LAT2 through LONx/LATx) = a sequence of longitude/latitude coordinate pairs defining the polygon vertices, with matching first and last vertices. An END statement indicates the termination of each polygon, and a final END statement indicates the termination of the polygon file. Islands or exclusions within a polygon are flagged with an ID number of -99999. The outer cycle of a complex polygon is listed first, followed by any islands. For example:
31 LON1       LAT1
LON2       LAT2
...
...
LONx       LATx
END
32 LON1       LAT1
LON2       LAT2
...
...
LONx       LATx
END
-99999 LON1       LAT1
LON2       LAT2
...
...
LONx       LATx
END
 
END
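
A reader for this format, including the -99999 islands, could be sketched as follows.  The function name parsePolygonFile and the shape of the returned objects are hypothetical.  The sketch assumes that a line with three values starts a new ring, that the pair on a -99999 line plays the same role as the pair on an ordinary ID line, and that each island belongs to the most recent polygon listed before it.

```javascript
// Sketch: parse a polygon coordinate file into objects.
// Each polygon: { id, idLinePoint, vertices, islands }.
function parsePolygonFile(text) {
  const polygons = [];
  let ring = null; // vertex list currently being filled
  for (const rawLine of text.split(/\r?\n/)) {
    const tokens = rawLine.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    if (tokens[0] === 'END') { ring = null; continue; }
    if (tokens.length === 3) {
      // ID line: an identifier followed by one coordinate pair.
      ring = [];
      const idLinePoint = [Number(tokens[1]), Number(tokens[2])];
      if (tokens[0] === '-99999') {
        if (polygons.length === 0) throw new Error('Island listed before any polygon');
        polygons[polygons.length - 1].islands.push({ idLinePoint, vertices: ring });
      } else {
        polygons.push({ id: tokens[0], idLinePoint, vertices: ring, islands: [] });
      }
    } else if (tokens.length === 2 && ring) {
      // Vertex line: longitude first, then latitude.
      ring.push([Number(tokens[0]), Number(tokens[1])]);
    }
  }
  return polygons;
}
```

Keeping islands separate from the outer ring preserves information that is lost when the -99999 statements are simply deleted.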

[2] Each record in the metadata file contains the following fields, in this order:
ID
FIPS CODE(S)
NAME
LSAD
LSAD TRANSLATION

[3] To change the list separator character:
1.  In Microsoft Windows, click the Start button, and then click Control Panel.
2.  Open the Regional and Language Options dialog box.
3.  Do one of the following:
·   In Windows Vista/7, click the Formats tab, and then click Customize this format.
·   In Windows XP, click the Regional Options tab, and then click Customize.
4.  Type a new separator in the List separator box.
5.  Click OK twice.
NOTE   After you change the list separator character for your computer, all programs use the new character as a list separator. You can change the character back to the default character by following the same procedure.