Geospatial Data Gathering, Cleaning and Conversion

For this tutorial, we'll be using a NASA dataset of roughly 50k rows with information about fallen meteorites in the last 40 years. The dataset can be found here. However, we'll download it with a script. Copy the following code into a file named "api.py":

import csv
import json
import math
import os.path
import pickle

import requests

# The ten fields a row must have (and no others) to be written to the CSV.
EXPECTED_KEYS = {"name", "id", "nametype", "recclass", "mass", "fall",
                 "year", "reclat", "reclong", "geolocation"}
RADIUS = 10  # radius of the sphere the points are mapped onto

# Reuse the cached download if there is one; otherwise query the API.
if os.path.exists("saved_data"):
    print("Found saved data.")
    with open("saved_data", "rb") as f:
        rows = pickle.load(f)
else:
    print("No saved data found. Pulling data from NASA API.")
    r = requests.get("https://data.nasa.gov/resource/y77d-th95.json?$limit=50000")
    r.raise_for_status()
    rows = r.json()
    with open("saved_data", "wb") as f:
        pickle.dump(rows, f)

with open("json_data.json", "w") as f:
    json.dump(rows, f)

with open("csv_data.csv", "w", newline="", encoding="utf-8") as f:
    csvwriter = csv.writer(f, delimiter=",")
    header = None
    for emp in rows:
        # Skip rows with missing or unexpected fields.
        if set(emp.keys()) != EXPECTED_KEYS:
            continue
        if header is None:
            # Take the column order from the first complete row.
            header = list(emp.keys()) + ["X", "Y", "Z"]
            print(header)
            csvwriter.writerow(header)
        lat = float(emp["reclat"])
        lon = float(emp["reclong"])
        emp["reclat"] = lat
        emp["reclong"] = lon
        # Spherical-to-Cartesian conversion (angles converted to radians).
        emp["X"] = RADIUS * math.cos(math.radians(lat)) * math.cos(math.radians(lon))
        emp["Y"] = RADIUS * math.cos(math.radians(lat)) * math.sin(math.radians(lon))
        emp["Z"] = RADIUS * math.sin(math.radians(lat))
        # Write values in header order so the columns always line up.
        csvwriter.writerow([emp[key] for key in header])

Make sure that you have the "requests" package installed. To install it, run:

pip install requests

Now, let's pull the data! Run the script with:

python api.py

Check your folder: you should now have the data saved in three formats (JSON, CSV, and a binary pickle file). The script also converts latitude and longitude to X, Y, Z coordinates on a sphere of radius 10, since neither of the programs we're using natively supports geospatial coordinates. Rows that are missing any of the ten expected fields are left out of the CSV.
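To make the conversion step easier to follow, here is a minimal standalone sketch of the same spherical-to-Cartesian formula. The function name `latlon_to_xyz` is only illustrative and does not appear in api.py, and the sample coordinates are approximate values chosen for the demonstration.

```python
import math


def latlon_to_xyz(lat_deg, lon_deg, radius=10.0):
    """Map latitude/longitude in degrees to a point on a sphere of the given radius."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)  # Z runs through the poles in this convention
    return x, y, z


# Approximate coordinates of Aachen, Germany, used here only as sample input.
x, y, z = latlon_to_xyz(50.775, 6.08333)
# Every converted point should sit at distance `radius` from the origin.
print(round(math.sqrt(x * x + y * y + z * z), 6))
```

Because every point ends up at the same distance from the origin, checking that distance is a quick way to confirm the conversion ran on degrees rather than radians.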

We'll only be using the CSV for this tutorial. Next, choose between the ParaView and Unity tutorials to visualize the data.
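Before moving on, it can be worth checking that the CSV loads and that its X, Y, Z columns look sane. The sketch below assumes the header contains columns named X, Y, and Z, as written by api.py; the helper name `max_radius_error` and the two sample rows are made up for illustration.

```python
import csv
import io
import math


def max_radius_error(csv_file, radius=10.0):
    """Return the largest deviation of sqrt(X^2 + Y^2 + Z^2) from `radius` over all rows."""
    worst = 0.0
    for row in csv.DictReader(csv_file):
        x, y, z = float(row["X"]), float(row["Y"]), float(row["Z"])
        worst = max(worst, abs(math.sqrt(x * x + y * y + z * z) - radius))
    return worst


# Tiny in-memory stand-in for csv_data.csv; both rows are made-up sample values.
sample = io.StringIO("name,X,Y,Z\nA,10.0,0.0,0.0\nB,0.0,6.0,8.0\n")
print(max_radius_error(sample) < 1e-9)
```

To run the same check on the real file, open it with `open("csv_data.csv", newline="", encoding="utf-8")` and pass the file object to the function.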

Adapting the Data Pipeline for Urban‑Transit Projects

For the Boston AR Transit Futures case study we extended the generic meteorite‑gathering pipeline to transit. Key steps included:

- Pull GTFS/MBTA v3 feeds: Use `requests` to download the MBTA's GTFS feed or to query its `v3` API for route, stop, and schedule tables. Extract fields such as `route_id`, `route_short_name`, `direction_id`, `stop_id`, `stop_name`, `stop_lat`, and `stop_lon`.

- Clean and merge tables: Filter out inactive routes, join stops to their routes and directions, and calculate derived metrics (e.g., headways, ridership estimates) in plain Python or with pandas.

- Convert lat/lon to Unity coordinates: Apply the spherical‑to‑Cartesian conversion from the base tutorial to each stop's latitude/longitude, or project onto a local planar coordinate system. Normalize positions so the network fits comfortably on the tabletop or AR map.

- Export to CSV/JSON: Write the resulting dataset to a transit‑specific CSV or JSON with columns for position (X,Y,Z), route colour, and metadata (stop name, line name, ridership score). This file is then parsed by the Unity script described in the next tutorial.
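The "project onto a local planar coordinate system" and "normalize positions" steps can be sketched as follows. This is one possible approach, not necessarily the one used in the case study: it assumes an equirectangular approximation around the centre of the network, which is reasonable at city scale, and the function names, stop names, and coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def project_local(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg):
    """Equirectangular approximation: metres east and north of the origin."""
    east = (math.radians(lon_deg - origin_lon_deg)
            * math.cos(math.radians(origin_lat_deg)) * EARTH_RADIUS_M)
    north = math.radians(lat_deg - origin_lat_deg) * EARTH_RADIUS_M
    return east, north


def normalize(points, half_extent=0.5):
    """Centre points on their bounding box and scale uniformly so the
    longest side of the box spans 2 * half_extent."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0  # avoid dividing by zero
    scale = 2 * half_extent / span
    return [((x - cx) * scale, (y - cy) * scale) for x, y in points]


# Hypothetical stops with approximate Boston-area coordinates (illustrative only).
stops = [
    {"stop_name": "Stop A", "stop_lat": 42.3564, "stop_lon": -71.0624},
    {"stop_name": "Stop B", "stop_lat": 42.3653, "stop_lon": -71.1037},
    {"stop_name": "Stop C", "stop_lat": 42.3520, "stop_lon": -71.0552},
]
origin_lat = sum(s["stop_lat"] for s in stops) / len(stops)
origin_lon = sum(s["stop_lon"] for s in stops) / len(stops)
planar = [project_local(s["stop_lat"], s["stop_lon"], origin_lat, origin_lon) for s in stops]
for stop, (x, z) in zip(stops, normalize(planar)):
    print(stop["stop_name"], round(x, 3), round(z, 3))
```

Scaling both axes by the same factor keeps the network's shape intact; `half_extent` would be set to half the width of the tabletop or AR map in scene units.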

This adaptation demonstrates how the same pipeline used for meteorite data can be repurposed for complex network datasets by changing the source API and column mappings.
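One way to picture "changing the column mappings" is a single export function that takes the latitude and longitude column names as parameters. The sketch below is hypothetical (the function `write_points` and the sample rows are not part of the original scripts) and reuses the sphere conversion from the base tutorial for both datasets.

```python
import csv
import io
import math


def write_points(rows, lat_key, lon_key, keep, out, radius=10.0):
    """Write the `keep` columns plus X, Y, Z for every row with usable coordinates.

    Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=list(keep) + ["X", "Y", "Z"])
    writer.writeheader()
    written = 0
    for row in rows:
        try:
            lat = math.radians(float(row[lat_key]))
            lon = math.radians(float(row[lon_key]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        record = {key: row.get(key, "") for key in keep}
        record["X"] = radius * math.cos(lat) * math.cos(lon)
        record["Y"] = radius * math.cos(lat) * math.sin(lon)
        record["Z"] = radius * math.sin(lat)
        writer.writerow(record)
        written += 1
    return written


# Made-up sample rows: one meteorite-style record and one transit-style record.
meteorites = [{"name": "Sample", "reclat": "50.775", "reclong": "6.08333"}]
stops = [{"stop_name": "Stop A", "stop_lat": 42.3564, "stop_lon": -71.0624}]

out = io.StringIO()
write_points(meteorites, "reclat", "reclong", ["name"], out)
write_points(stops, "stop_lat", "stop_lon", ["stop_name"], out)
print(out.getvalue())
```

Only the key names and the list of columns to keep differ between the two calls; in practice each call would write to its own file rather than a shared buffer.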

Contributed by Korey
