The following is provided both for historical preservation and as input into current (today's) decisions related to designing and implementing tiled vector data stores.
NOTE: There have been recent statements that vector tiling is a recent GIS development - perhaps dating back to the early 2000s. This perception is patently wrong! What most in the profession agree was the first production GIS - the Canada Geographic Information System (CGIS) - used a tiled vector data store. This work was performed in the mid to late 1960s. In the late 1970s, Autometric designed and delivered an operational system (WAMS) that implemented a vector tile data store.
The integrated vector/raster tile structure described below was based on this early work. A bibliography provides some of the key references.
GenaMap (earlier named DeltaMap) utilized a highly performant and efficient tiled and layered topological data structure. This page provides a description and diagrams of that data structure. In a GenaMap spatial data store, continuous topology is maintained over tiled source data sets and extensive support is provided for projections (CRS), ellipsoids, and datums with on-the-fly reprojection. Data is stored at 32-bit integer precision, allowing a full survey accuracy database to be maintained. Storage is independent of projection, units, or scale, all of which can be defined at the time of analysis or display.

As a side note, a couple of years ago I spoke with a GenaMap user who is currently using various open source and commercial GIS software. He also has GenaMap running and uses that software for various back-end server functions. He stated categorically that for a variety of server functions - including serving data - GenaMap is still faster than many of the newer GIS products!!
We begin with the design principles. The design principles were based on lessons learned and research performed during the evolution of MOSS, the MOSS Arc/Node study, work by the UK Mil Survey, Mark 90, WAMS, and community/user input.
FAST - The spatial database structure had to be really, really fast in terms of all CRUD operations. This includes search and retrieval.
Flexible - No limits on the type of spatial data that can be stored, including geographic area, units of measure, data type (raster and vector), coordinate reference systems, scale, and more.
Scalable - Search and retrieval times should be as independent as possible of the number of maps/images/features in the data store. A performance target was the same response time for a point-and-query operation against 1,000 features or 10,000,000 features.
Adaptable - Total user control of the content and the structure of that content, based on their community requirements.
Topology - Had to support topologically structured geometry, including on-the-fly node and edge snapping and validation.
Networked - Multiple data stores could be distributed across multiple servers connected via LAN or WAN.
Metadata - Metadata for every map in the data store including CRS, units of measure, publisher, and so forth. (Interesting side note: GenaMap map metadata is consistent with today's Dublin Core, ISO 19115, and so forth.)
Attributes - Attributes had to be supported, including the ability to access attributes held in some other data store (such as a DBMS/RDBMS).
One interesting note is that back in the 1980s, clients were pretty "dumb", so all processing, caching, transformations, symbolization, and so on had to be done on the server side. Further, memory and disk were not cheap. Efficient algorithm design and implementation was mandatory! UNIX workstations were relatively new. The initial DeltaMap/GenaMap development was on a "Bobcat" Series 200 UNIX workstation (https://en.wikipedia.org/wiki/HP_9000#Series_200). Once the HP 310 was available we migrated the work to that workstation.
The above and other requirements, such as dealing with symbology, made it clear that the data store had to be tiled and had to support as many map layers (vector or raster) as required by the user.
This decision led to discussion on tiling of the data store and how the design principles could be honored. Tiling in the Wetlands Analytical Mapping System (WAMS - later AMS) was the initial starting point. After much discussion, the following was decided regarding layering and tiles in the GenaMap data store:
The same tiling capabilities would be applicable for all supported geographic data types: 2d vector, 3d vector, discrete raster (e.g. satellite data), continuous raster (e.g. terrain as in a DEM), and annotation (graphics text).
Tiles could be any rectangular shape; they did not have to be square.
The tiling scheme had to be totally under user control.
The tiling scheme could be in any supported CRS (projection). The caveat is that all tiles for a specific layer had to be in the same CRS.
Different layers in a data store could be tiled using different CRSs (projections). So one layer could be in lat/long, another in State Plane, and another in UTM - all in the same data store. The obvious implications were that CRS metadata needed to be stored with the layer and that the software needed to support on the fly projection transformations.
Different layers could use different tile sizes. This allowed sparse data to be stored in a small number of large (in extent) tiles and dense data to be stored in smaller (in extent) tiles.
Any size geographic area could be tiled. This meant that if the user were a City, a tile structure for just the city could be identified. For other users, such as the US Navy, the tiling scheme could tessellate the entire globe.
Topology and feature construction across tile boundaries needed to be supported.
Geometries would be broken across tile boundaries, but with no special interaction required from the user.
Tiles are invisible to the user (unless they wish to "play" with them).
Features/geometries crossing (and hence broken) across tiles appear as a single feature/geometry when visualized or used in analysis by users or applications.
For vector data, the data had to be stored as topology (the node, edge, face model was used; a sketch of this model follows the list below).
A single vector tile could contain point, line, and polygon features based on the edge/node geometry that was topologically structured.
Had to work across the 180° meridian (antimeridian).
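As a point of reference, a minimal sketch of the node/edge/face topological primitives mentioned above might look like the following. The field names and types are illustrative assumptions only; the actual GenaMap record layouts are those described in the attached file structure document.

    /* Illustrative node/edge/face topology primitives (assumed layout). */
    typedef struct {
        long id;            /* unique node ID within the tile             */
        long x, y;          /* tile-relative integer coordinates          */
    } Node;

    typedef struct {
        long id;            /* unique edge ID within the tile             */
        long from_node;     /* starting node ID                           */
        long to_node;       /* ending node ID                             */
        long left_face;     /* face (polygon) ID to the left of the edge  */
        long right_face;    /* face (polygon) ID to the right             */
        long n_coords;      /* number of intermediate vertices, which are
                               held in the edge coordinate file (ZF04)    */
    } Edge;

    typedef struct {
        long id;            /* unique face ID within the tile             */
        long first_edge;    /* entry point into the chain (edge ID list)  */
    } Face;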
The original DeltaMap (GenaMap) file structure is provided in the attached file. This structure evolved in 1985 and 1986 to accommodate additional requirements such as symbology and portrayal independence. The structure as documented is for topologically structured vector geometry and features within a given tile. The document does not detail the tiling structure although edge nodes are a critical element required to construct features across tiles and so forth. The attached document also does not address indexing. Notice how heavily linked the various files are. Very much like linked data today - but very, very fast.
A consistent naming and file structure system was used for the tiled structure. When a map layer was first created in a GenaMap "Project", all tiles and related files were created on disk. Initially, other than key metadata, all of the files in a tiled map layer were empty. See the creation of a map layer in a project (below).
NOTE: Tile pyramids were not used. There was no need for tile pyramids as visualization was not the driving use case; performance requirements for query, retrieval, update, and analysis also had to be considered. Not having tile pyramids simplified the structure and its implementation. (See the LoD discussion below.) The file naming convention was simple and was computed based on the "map" name, such as parcels, and the centroid coordinate of a tile.
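As an illustration of the kind of name computation involved, the following sketch builds a per-tile file name from the map name and the tile centroid. The exact GenaMap format is not documented on this page, so the format string and ZF suffix handling below are assumptions.

    /* Hypothetical tile file-name construction: map name plus the tile
     * centroid coordinate, with the ZF file number as a suffix. */
    #include <stdio.h>

    static void tile_file_name(char *buf, size_t buflen, const char *map_name,
                               long centroid_x, long centroid_y, int zf_number)
    {
        /* e.g. "parcels_0512500_0417500.ZF04" */
        snprintf(buf, buflen, "%s_%07ld_%07ld.ZF%02d",
                 map_name, centroid_x, centroid_y, zf_number);
    }

    int main(void)
    {
        char name[128];
        tile_file_name(name, sizeof name, "parcels", 512500L, 417500L, 4);
        printf("%s\n", name);   /* prints parcels_0512500_0417500.ZF04 */
        return 0;
    }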
Within each individual tile in a map layer, the following files existed (from my memory store!):
ZF01 – Header (the metadata)
ZF02 – Node file
ZF03 – Feature file/index. Updated any time a point, line, or polygon feature was created. Could be 2d or 3d.
ZF04 – Edge File with coordinates.
ZF05 – Node file for vector data.
ZF06 – Chain file (list of edge IDs for each polygon feature). Created by a polygon verification and formation app.
ZF07 – Discrete raster data (satellite image for example)
ZF08 – Continuous raster data (DEM for example)
ZF09 – Edge Index/MBR file
ZF10 – Text/labels
One or more of these files could remain empty based on the data type. For example, if discrete raster data, such as classified satellite imagery, were added into tiles in a map layer, then the ZF02, ZF03, ZF04, ZF05, ZF06, and ZF09 files would always remain empty. When actual data were added into a tile, each vector feature, node, and edge had a unique ID within the tile. This allowed for direct pointers between records in the files. Having a consistent naming and file structure really simplified creating and using file names for specific tiles. NOTE: An interesting issue at the time was that the slowest part of any query or display workflow was opening the files! So, one of our engineers developed an assembly-language package for opening and caching initial content from a tile. This significantly improved performance. Probably not an issue today, but it sure was back in the 1980s and 1990s!
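The following sketch illustrates what a within-tile ID acting as a direct pointer can look like, assuming fixed-length binary records so that a record ID maps straight to a file offset. The record size and layout are assumptions for illustration; the actual ZF record formats are in the attached document.

    /* Direct record access by within-tile ID, assuming fixed-length
     * binary records: record N lives at offset (N - 1) * record size. */
    #include <stdio.h>

    #define NODE_REC_SIZE 16            /* assumed record length in bytes */

    /* Read node record 'node_id' (1-based) directly, with no scanning. */
    static int read_node_record(FILE *zf05, long node_id,
                                unsigned char rec[NODE_REC_SIZE])
    {
        long offset = (node_id - 1) * NODE_REC_SIZE;
        if (fseek(zf05, offset, SEEK_SET) != 0)
            return -1;
        return fread(rec, 1, NODE_REC_SIZE, zf05) == NODE_REC_SIZE ? 0 : -1;
    }

    int main(void)
    {
        unsigned char rec[NODE_REC_SIZE];
        FILE *zf05 = fopen("ZF05", "rb");   /* ideally opened once and cached */
        if (zf05 && read_node_record(zf05, 42, rec) == 0)
            printf("read node record 42\n");
        if (zf05) fclose(zf05);
        return 0;
    }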
Below is a diagram of what a very simple GenaMap project could look like. The Roads layer has two tiles and the Parcels layer has 4 tiles. Each tile had the 10 ZF files.
The first three levels are actually components of a directory tree. So, the GenaMap Project would have two sub-directories, one for Roads and one for Parcels. The Roads sub-directory would in our example have two sub-directories, one for each tile.
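A plain-text rendering of this example directory tree might look like the following (the tile directory names are placeholders; the actual names were computed from the map name and tile centroid as described earlier):

    PROJECT/
        ROADS/
            TILE_1/    (ZF01 ... ZF10)
            TILE_2/    (ZF01 ... ZF10)
        PARCELS/
            TILE_1/    (ZF01 ... ZF10)
            TILE_2/    (ZF01 ... ZF10)
            TILE_3/    (ZF01 ... ZF10)
            TILE_4/    (ZF01 ... ZF10)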
To ensure excellent query, retrieval, and display performance, an early decision in the design and development of GenaMap was the need for a variety of search indices. The following describes the various indices and how they were used.
Map name index: This was a simple binary tree index containing all the map names and some related metadata for all maps accessible in a given GenaMap Project. Anytime a user typed in a map name (e.g. ROADS), this map name index was searched for the name. If the name was found, then key metadata were returned to the server software (and to the user if requested).
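A minimal in-memory sketch of that kind of binary-tree name lookup is shown below. The real index lived on disk and carried more metadata; the node fields here are assumptions for illustration.

    /* Illustrative binary search tree keyed by map name. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct MapNode {
        char name[32];                 /* map name, e.g. "ROADS"        */
        char crs[32];                  /* one piece of the key metadata */
        struct MapNode *left, *right;
    } MapNode;

    static MapNode *insert(MapNode *root, const char *name, const char *crs)
    {
        if (root == NULL) {
            MapNode *n = calloc(1, sizeof *n);
            strncpy(n->name, name, sizeof n->name - 1);
            strncpy(n->crs, crs, sizeof n->crs - 1);
            return n;
        }
        int cmp = strcmp(name, root->name);
        if (cmp < 0)      root->left  = insert(root->left,  name, crs);
        else if (cmp > 0) root->right = insert(root->right, name, crs);
        return root;
    }

    static MapNode *find(MapNode *root, const char *name)
    {
        while (root != NULL) {
            int cmp = strcmp(name, root->name);
            if (cmp == 0) return root;
            root = (cmp < 0) ? root->left : root->right;
        }
        return NULL;
    }

    int main(void)
    {
        MapNode *index = NULL;
        index = insert(index, "ROADS",   "UTM zone 55");
        index = insert(index, "PARCELS", "State Plane");
        MapNode *hit = find(index, "ROADS");
        if (hit) printf("%s -> %s\n", hit->name, hit->crs);
        return 0;
    }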
Tile index: For performance, each map (collection of tiles) required a spatial index. After some research, R-Trees were selected. Bruce Blackwell (now at Oracle) designed and wrote the initial GenaMap R-Tree code. The initial code worked for a small number of tiles (fewer than a few hundred) but suffered from performance issues when hundreds of tiles were required to cover the geographic extent of the map. This performance issue was easily fixed by using memory caching for the first few levels of the index. As far as I know, this was the first use of R-Trees in a GIS (1986).
Anytime a user issued a spatial query against a map (feature data around a point, within an area of interest, etc.) or wished to perform an edit function, the R-Tree index was accessed. A key aspect of GenaMap was that only the map data tiles in the current geographic window, as specified by the user, were accessed and processed. This constant spatial query (filter) requirement required a very high speed indexing mechanism. The R-Tree fulfilled this requirement.
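The heart of that window filter is an MBR intersection test. The sketch below applies the test over a handful of tile extents with a simple linear scan; the R-Tree's job was to avoid that scan as the tile count grew. All extents and values are illustrative.

    /* MBR intersection test of the kind used to select tiles that fall
     * within the user's current geographic window. */
    #include <stdio.h>

    typedef struct { long xmin, ymin, xmax, ymax; } MBR;

    static int mbr_intersects(const MBR *a, const MBR *b)
    {
        return a->xmin <= b->xmax && a->xmax >= b->xmin &&
               a->ymin <= b->ymax && a->ymax >= b->ymin;
    }

    int main(void)
    {
        MBR tiles[] = {                        /* illustrative tile extents */
            {     0,     0, 10000, 10000 },
            { 10000,     0, 20000, 10000 },
            {     0, 10000, 10000, 20000 },
        };
        MBR window = { 8000, 9000, 12000, 12000 };  /* current display window */

        for (size_t i = 0; i < sizeof tiles / sizeof tiles[0]; i++)
            if (mbr_intersects(&tiles[i], &window))
                printf("tile %zu intersects the window\n", i);
        return 0;
    }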
Once a tile was identified for further processing (display, analysis, edit, etc), then index files within the individual tile were used to ensure high speed access to the geometry and/or features of interest.
Edge index: Each tile had an edge index. The edge index contained a sorted list of the minimum bounding boxes and a physical link to the edge record for every edge in a tile. As edge geometries were added, deleted, or updated in the tile, the edge index was updated. The edge index allowed for very rapid discovery of an individual edge in a tile. Having an edge index was critical for real time snapping while digitizing or editing, for ETL operations, and for queries.
Feature index: Each tile had a feature index for the vector features stored in that tile. For each feature, an MBR was computed and the index record was added, modified, or deleted as features were created, updated, or deleted.
Node index: The node file in a tile was automatically indexed whenever a node was added, deleted, or updated. This indexing was internal to the file and did not require a separate external index.
When used in combination, these indices allowed GenaMap to support multi-user real time edits, updates, and a variety of highly performant queries. The indices also allowed for huge performance improvements in analytics because a set of candidate features could be quickly identified before any physical data had to be accessed and processed.
One topic that comes up in vector tiles discussions is level of detail (LoD). Previously, I made the statement that GenaMap did not support levels of detail in the sense of how LoDs and tile pyramids are discussed and implemented in today's geospatial products.
LoDs were never really an issue for many GenaMap users simply because their total data volumes for an entire data store (Project) might be only one or two million features. That said, we had customers, such as the DoD, that did require multiple LoDs.
This was easily accomplished by simply defining "maps" at scales that were even multiples of each other. The geographic extent of the project area did not need to be identical. For example, a map of a rural area might use tiles of 1:24000 for the project area, but for urban areas a map with a different geographic extent might use 1:12000 tiles. The simple rule is that four of the 1:12000 tiles need to "fit" inside one 1:24000 tile. There was no limit on the number of map scales and tiling schemes used. Essentially, for example, the OGC CDB standard concept of LoD could have been implemented in a GenaMap data store - even though LoDs were not part of the GenaMap spatial database conceptual model.
The software did not care. Everything worked the same. The only difference was the ability to manage how much physical data was stored in each tile.
GenaMap implemented a special type of node called an "edge node". This concept came from the Analytical Mapping System (AMS). Edge nodes were nodes created where an edge crossed a tile boundary. These nodes were automatically created as data were entered/loaded into a tiled map. Entry could be by digitizing, entry of survey information, or by ETL (e.g., importing data from an Intergraph system). The key is that these special nodes were created automatically with no user interaction. Further, the process by which GenaMap transformed data being entered into the data store ensured that there was no fuzzy creep. Finally, when an edge crossed a tile boundary, an edge node record was created in both tiles (or in four tiles for the corner case where an edge crossed a tile boundary exactly at a corner!). Having the identical edge node in both tiles made traversal from one tile to the next a trivial operation. This capability was critical for "building" and returning features that were contained in multiple tiles. Please note that for most symbolization/visualization functions, building features that crossed multiple tiles was not necessary. This ensured that the majority of visualization processes were very fast and efficient.
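A minimal sketch of the automatic edge-node creation, for the simple case of a segment crossing a vertical tile boundary, is shown below. The real loader also handled horizontal boundaries, the exact-corner case, and the integer, no-creep coordinate rules; the coordinates and function here are illustrative assumptions.

    /* Compute the point where a segment crosses a vertical tile boundary;
     * that point becomes the edge-node coordinate recorded in both tiles. */
    #include <stdio.h>

    typedef struct { double x, y; } Pt;

    /* Returns 1 and writes the crossing point if the segment p0->p1 crosses
     * the vertical boundary x == bx, otherwise returns 0. */
    static int split_at_vertical_boundary(Pt p0, Pt p1, double bx, Pt *cross)
    {
        if ((p0.x - bx) * (p1.x - bx) >= 0.0)
            return 0;      /* both ends on one side, or an end exactly on it */
        double t = (bx - p0.x) / (p1.x - p0.x);
        cross->x = bx;
        cross->y = p0.y + t * (p1.y - p0.y);
        return 1;
    }

    int main(void)
    {
        Pt a = { 9950.0, 120.0 }, b = { 10120.0, 260.0 };
        Pt edge_node;
        if (split_at_vertical_boundary(a, b, 10000.0, &edge_node))
            /* the same edge-node record would be written into both tiles */
            printf("edge node at (%.1f, %.1f)\n", edge_node.x, edge_node.y);
        return 0;
    }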
By way of summary and additional design considerations, an edge node had additional information that "told" the software that the geometry/feature continued in an adjacent tile. There were other special considerations for edge nodes:
During topology validation and polygon formation, when an edge node was encountered the software knew to follow the edge of the tile in a clockwise (or counter-clockwise for islands) order until the next edge node was encountered. The adjacent tile did not need to have been entered into the data store yet.
When the adjacent tile was being populated, then attribute consistency was checked across tiles. This was critical for features, such as a parcel or a forestry polygon that crossed multiple tiles.
Special case of an edge crossing exactly through the corner of a tile.
Special case in which the entire tile was contained by a feature (possible for lakes, oceans, etc).
Special case when an edge node was created because a geometry ended right on the edge of a tile and did not continue into another tile.
A requirement was that any coordinate stored in a GenaMap data store would never lose precision - no creep or changing of digits even to the 5th decimal place (a requirement in Australia for survey data!).
So, the following algorithm was used for coordinate storage:
Int_X = (X_coordinateValue - XTileCenterCoordinate) * ScaleFactor
Int_Y = (Y_coordinateValue - YTileCenterCoordinate) * ScaleFactor
If the Int_X and Int_Y values were in the 1 to 32767 range, then an (x,y) pair could be stored in one 32-bit word. Otherwise, two words of storage were used. For coordinates, reals (floats) and doubles were never stored. In the calculation above, subtracting the center value of the tile reduced the coordinate values from the millions or hundreds of thousands down to the tens of thousands or thousands. The scale factor was then used to create an integer number with the proper precision. The scale factor is an integer power of ten (10, 100, 1000, etc.). So, to keep three decimal places, the scale factor would be 1000. The reverse calculation returned the original coordinate. These simple conversions reduced the size of the coordinates in a tile - often by an order of magnitude. This approach also reduced the size of the "stream" that needed to be transmitted to a "smart" client once the web became available. The conversions could happen in the client or on the server.
An added benefit of the above approach is that a variety of internal operations could be implemented using integer arithmetic, which is computationally much faster than floating point operations (at least back then!).
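A sketch of the encode/decode and 16-bit packing described above follows. The packing layout, rounding, and scale factor are assumptions chosen to match the description; they are not the exact GenaMap implementation.

    /* Tile-relative integer coordinate storage: subtract the tile centre,
     * multiply by a power-of-ten scale factor, and pack small (x, y) pairs
     * into a single 32-bit word. */
    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    #define SCALE 1000L                  /* keep three decimal places */

    static int32_t encode(double coord, double tile_centre)
    {
        return (int32_t)lround((coord - tile_centre) * SCALE);
    }

    static double decode(int32_t stored, double tile_centre)
    {
        return (double)stored / SCALE + tile_centre;
    }

    int main(void)
    {
        double tile_cx = 512500.0, tile_cy = 417500.0;
        double x = 512512.345,     y = 417488.678;

        int32_t ix = encode(x, tile_cx);     /*  12345 */
        int32_t iy = encode(y, tile_cy);     /* -11322 */

        /* If both values fit in 16 bits they can share one 32-bit word. */
        if (ix >= -32768 && ix <= 32767 && iy >= -32768 && iy <= 32767) {
            uint32_t packed = ((uint32_t)(uint16_t)ix << 16) | (uint16_t)iy;
            printf("packed word: 0x%08X\n", packed);
        }

        printf("round trip: %.3f %.3f\n",
               decode(ix, tile_cx), decode(iy, tile_cy));
        return 0;
    }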
Originally, the design of the GenaMap spatial database included "slots" for storing a symbology pointer. The "pointer" was the address of symbology information stored in a symbology definition table. However, we quickly discovered that this approach was quite limiting. The GenaMap software quickly evolved into providing a very flexible symbology capability for presentation/visualization of map content stored in the GenaMap spatial data store.
The first step in the evolution was the capability to define and maintain "bundle tables". The concept for bundle tables came from the Graphical Kernel System (GKS), although in GenaMap we abstracted the table to a hardware-independent set of symbology metadata. Each GenaMap bundle table had one or more entries. Each entry had a unique ID along with attributes defining color, thickness, area fill, line pattern, and so on. Any given GenaMap project could have one to many bundle tables. A bundle table entry could also specify rules, such as how to style a feature based on its attributes, min and max scales, and so on.
Symbology could be assigned by inserting a symbology ID into the feature record (static) or at run time based on user specified criteria or other symbolization rules (dynamic). Setting symbology IDs in the feature records was easily accomplished by running a set of rules. A rule could be "for the map Census, shade any polygon feature with a population greater than 5000 using symbology 22". Symbology 22 could be a red fill.
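The sketch below shows one way such a bundle-table entry and a static population rule could be represented. All field names and the rule encoding are assumptions; GenaMap's actual bundle tables carried a richer, device-independent set of symbology metadata.

    /* Illustrative bundle-table entry and a static symbology rule. */
    #include <stdio.h>

    typedef struct {
        int  id;              /* unique symbology ID, e.g. 22        */
        char colour[16];      /* e.g. "red"                          */
        char fill[16];        /* area fill pattern                   */
        int  line_weight;     /* line thickness                      */
    } BundleEntry;

    typedef struct {
        char name[32];        /* feature name                        */
        long population;      /* attribute used by the rule          */
        int  symbology_id;    /* 0 = unset (resolved at run time)    */
    } Feature;

    /* Static assignment: run the rule once and store the symbology ID. */
    static void apply_population_rule(Feature *f, long threshold, int symb_id)
    {
        if (f->population > threshold)
            f->symbology_id = symb_id;
    }

    int main(void)
    {
        BundleEntry red_fill = { 22, "red", "solid", 1 };
        Feature tract = { "Census tract 12", 7200, 0 };

        apply_population_rule(&tract, 5000, red_fill.id);
        printf("%s -> symbology %d (%s %s)\n",
               tract.name, tract.symbology_id, red_fill.colour, red_fill.fill);
        return 0;
    }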
At run time, a user could specify which bundle table they wished to use. This capability allowed, for example, the utilities department to symbolize utility maps according to their mapping requirements while planning could symbolize the same data completely differently. There was no need to have multiple versions of the same data - which is what CAD systems did at the time.
Any GenaMap user could create a map. However, projects were typically created by an administrator. The map creation tool asked some basic questions that were used to create the vector/raster tile structure for a map:
The user was asked if they wished to use an existing map as a template. If not, the following questions were asked.
Select the CRS for the map. A list of CRSs was provided. The user selected the CRS for the map. All tiles in a map would use the same CRS. However, different maps in a project could use different CRSs. This question also included selection of the units of measure (feet, meters, decimal degrees, etc.). Units of measure were necessary metadata for many operations, including specifying tile size.
Given the CRS, the user was then asked for the geographic extent of the map. The extent could range from very small (a town) to the entire globe. Note: Different maps in a GenaMap project could have different geographic extents.
The user was then asked for the tile size in x and in y. This was easy to do for most organizations as they had standard map series anyway and wished to continue using the same map series concept and scales. For example, many cities mapped at 1:2000 and/or 1:5000 scales. The tile extent was specified in the map's units of measure (see above).
The user was provided a list of data types and had to select one for the map being created. Allowed data types were discrete raster, continuous raster, 2d vector, 3d vector, and annotation.
For vector data, the user was asked for the coordinate scale factor (see the coordinate storage discussion above).
The user was asked for other pieces of metadata. An interesting piece of metadata was the min-scale to max-scale at which the data would be displayed. This metadata rule allowed the user to control at what scales the map would be displayed.
The above information was used to create a map entry in the Project and then to generate a set of empty tiles for the map. The project map names file (a searchable file of map metadata) was updated with the new map information. For each tile, a set of empty files was created. The exception was the header file for each tile: each tile's header file contained the same metadata as the entire map as well as additional metadata, such as the date/time of the last update of the tile and the name of the person who updated it.
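A sketch of that tile-creation step, under the assumption of one directory per tile and ten empty ZF files with only the header populated, might look like the following (POSIX mkdir is used; the directory names and header string are illustrative):

    /* Create a tile directory and its ten ZF files, writing only the header. */
    #include <stdio.h>
    #include <sys/stat.h>

    static void create_tile(const char *map_dir, int tile_no,
                            const char *header_metadata)
    {
        char path[256];

        snprintf(path, sizeof path, "%s/TILE_%d", map_dir, tile_no);
        mkdir(path, 0755);                       /* one directory per tile */

        for (int zf = 1; zf <= 10; zf++) {
            snprintf(path, sizeof path, "%s/TILE_%d/ZF%02d",
                     map_dir, tile_no, zf);
            FILE *f = fopen(path, "w");
            if (f == NULL) continue;
            if (zf == 1)                         /* only the header is filled */
                fputs(header_metadata, f);
            fclose(f);
        }
    }

    int main(void)
    {
        mkdir("PARCELS", 0755);                  /* map layer directory */
        for (int t = 1; t <= 4; t++)
            create_tile("PARCELS", t, "CRS=UTM55;UNITS=metres;SCALE=1000\n");
        return 0;
    }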
Requirements and various approaches were reviewed and discussed from 1983 until 1985. Some of the research and operational experience was gathered from:
Wetlands Analytical Mapping System (WAMS). Used a concept called GeoUnits – regular sized tiles initially based on USGS 1:24000. A number of concepts, such as “edge nodes” and real time snapping, came from WAMS operational experience. WAMS was used to digitize (analytical stereo plotters as well as table digitizers) wetlands for the entire US at 1:24000.
Mark 90 (DMA Modernization) – Topology studies, attribute consistency studies, high speed indexing of feature data, and so forth.
UK MilSurvey "Military Survey: Data Structures Study - A Discussion Paper". 1984 Document on a file structure for storing, managing, and using topologically structured data. I have a copy of this document - maybe the only one in existence!
USGS Study on evolving the Map Overlay and Statistical System from a full polygon structure to a topological data structure. MOSS also used a primitive vector/raster tile structure.
Computer Graphics Programming: GKS — The Graphics Standard. Springer, 1987, p. 21.
Knowledge gained from the above coupled with ongoing experiments allowed the GenaMap team to design and then implement a highly performant, integrated vector and raster tile structure.