Homework3: Spatial Queries

(Updated on April 3: the nyprovoider.csv file updated as original longitude/latitude representation was in scientific format, which had limited accuracy. They are now represented in decimal format with original resolution. It won't affect your codes but the results could be different.)

Opioid Use Disorder (OUD) is a condition where people become dependent on opioids like prescription painkillers or heroin, leading to cravings, inability to stop using, and withdrawal symptoms. Buprenorphine is a medication used to treat OUD by reducing cravings and withdrawal without causing a high. A Buprenorphine Practitioner is a licensed healthcare provider who can prescribe this medication to help people recover from opioid addiction.

References:

Please refer to the examples in IBM DB2 Spatial Tutorial I (word)and II (word). You can also download the sample data and sql scripts. DB2 Spatial Extender documentation PDF can be downloaded here, and the online reference is here.

Please go through DB2 Spatial Tutorial (I, II ) before you begin this homework.
Please also read spatial query examples.
DB2 Spatial Online Reference (V12)

Datasets:

There are two datasets to be used in this homework:

1. Buprenorphine Practitioner Locator (Download the nyprovoider.csv for this homework). This dataset includes the address and geolocation of each buprenorphine practitioner in New York State, and latitude, longitude represents the geolocation of the provider:

(addressId, namePrefix, nameFirst, nameMiddle, nameLast, nameSuffix, addressLine1, addressLine2, city, state, zipCode, county, latitude, longitude, telephone, fax, reachedPatientLimit, certifedFor100, NPSubtype)

3. The US zip code tabulation areas from US Census Bureau, which contains the boundary of each zip code (download tl_2019_us_zcta510.zip). Once you unzip the file, you can run the following command under db2 command line to find metadata for the format. (Documentation of TIGER datasets can be found here.)

db2se shape_info -fileName tl_2019_us_zcta510.shp

The shapefile has a multipolygon object to represent the boundary of each zip code.

Tasks (10 + 2 extra credit points)

1. (3 points) Setup the database.

a. Enable the sample database (or your own database) for spatial support:

db2se enable_db sample

b. Load the zip code area dataset by adapting the import SQL file:

db2 -tf import_zip.sql

c. Create two tables for providers using the createprovidertables.sql (we create two tables, cse532.provideroriginal for original data, and cse532.provider with a spatial column).

db2 -tf createprovidertables.sql

d. Load nyprovoider.csv (all providers of New York State) into cse532.provideroriginal using script:

db2 load from "C:\myfolder\nyprovoider.csv" of del MESSAGES load.msg INSERT INTO cse532.provideroriginal

e. Write a SQL script providerinsert.sql to insert data into cse532.provider by selecting data from cse532.provideroriginal table and converting (Latitude, Longitude) attributes into DB2GSE.ST_POINT type with srs_id 1 for geolocation attribute in cse532.provider.

f. Run the createindexes.sql to create indexes.

db2 -tf createindexes.sql

(Note: you can adjust the grid cell sizes in the index file to tune it for your query. You can check this spatial index advisor.

You can also try to give hint on selectivity to enforce the spatial index to be used, e.g.,

SELECT ... FROM <user_table> WHERE ST_Intersects(<indexed_shape_column>, ...) = 1 SELECTIVITY 0.00001;

)

2. (3 points) Write a query nearestprovider.sql to find closest provider (Buprenorphine Practitioner who could prescribe Buprenorphine to treat OUD patients ) from "1710 NY-112, Coram, NY 11727" (40.891720, -73.016479) (latitude, longitude). Please return location and distance in your result. You can use unit 'KILOMETER', 'METER', or 'STATUTE MILE' for distance measurement.

Nearest neighbor search is not directed supported by Db2. You can use ST_BUFFER to create a buffered area (polygon/circle) from a point within a certain distance and search only stores within the buffer. Note that 0.25 degree is roughly 10 miles. For all the datasets, we use spatial reference nad83_srs_1 with srs ID as 1.

You can find information here on functions such as ST_POINT, ST_BUFFER, ST_WITHIN or ST_CONTAINS, and ST_DISTANCE.

3. (3 points) Write a query noproviderzips.sql to find zip codes without any provider, neither in their neighboring zip codes (with intersecting boundaries), if any.

4. (1 point) Drop all indexes and perform the two queries again, and compare the query performance in terms of execution time for above two queries.

Show your time difference with and without indexes in your README file.

You can refer to how to estimate query time.

5. (Extra credit: 2 points). Write SQL queries or stored procedure mergezip.sql to merge zip code areas into large ones with neighboring zip code areas, so that the new population in each merged region (combined zipcodes) is large than the current average population, using the zip code population table in Homework1. For simplicity, you can remove the duplicates from the population table.

Submission.

Please zip your SQL scripts, results, and a README file. A result file can be a text file or a screenshot.

Please go to BrightSpace, and submit it under homework 3.

Report abuse