Hybrid approach (proposed by Geoffrey)
Data spec:
Client-side (Desktop BAGIS): Follows BAGIS V2 file spec, i.e., ESRI File Geodatabase for both raster and vector layers
Server-side (eBAGIS): Version-supported shapefiles for vectors and ERDAS .img files for rasters
Uploader:
User Interface
Will be developed as a tool in BAGIS Tools toolbar (esriaddin).
Add-in needs to manage the file transfer session and call the server script (Python?) to convert the files and put them into the repository.
Packaging - zip (whole AOI vs. updated layers based on version control rules, see version control below)
Transferring - ftp, ArcGIS, Python, .NET (?)
Unpackaging - unzip (based on version control rules, see version control below)
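A minimal sketch of the packaging step, assuming the server script or a helper on the client is written in Python. The function name `package_aoi` and the `only` parameter are made up for illustration; `only` stands in for whatever list of layers the version control rules select.

```python
import zipfile
from pathlib import Path


def package_aoi(aoi_dir, zip_path, only=None):
    """Zip an AOI folder and return the relative paths that were written.

    `only` is a hypothetical filter: a collection of relative paths
    (files or folders such as "layers.gdb") to include. None means the
    whole AOI is packaged.
    """
    aoi_dir = Path(aoi_dir)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(aoi_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(aoi_dir).as_posix()
            # Keep the file if it is listed itself or sits under a listed folder.
            if only is not None and not any(
                rel == p or rel.startswith(p + "/") for p in only
            ):
                continue
            zf.write(path, arcname=rel)
            written.append(rel)
    return written
```

The zip file should be written outside the AOI folder, otherwise the archive would try to include itself.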
Downloader:
User Interface
Will be developed as a tool in BAGIS Tools toolbar and as a web app.
Add-in calls the server script to prepare the download files.
Packaging - zip (whole AOI, individual layer download is not allowed)
Transferring - ftp, ArcGIS, Python, .NET (?)
Unpackaging - unzip (overwrite any existing file on the client computer)
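A sketch of the unpackaging step in Python, assuming the standard `zipfile` module is used on the client. `zipfile` overwrites existing files on extraction but stamps them with the extraction time, so the sketch also restores each file's modified time from the archive entry. The function name `unpack_aoi` is hypothetical.

```python
import os
import time
import zipfile
from pathlib import Path


def unpack_aoi(zip_path, dest_dir):
    """Extract a zip over dest_dir, overwriting files and restoring mtimes."""
    dest_dir = Path(dest_dir)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # extract() replaces any existing file with the same name.
            target = Path(zf.extract(info, dest_dir))
            if info.is_dir():
                continue
            # ZipInfo.date_time is a (Y, M, D, h, m, s) tuple with no time
            # zone and 2-second resolution; it is treated as local time here.
            mtime = time.mktime(info.date_time + (0, 0, -1))
            os.utime(target, (mtime, mtime))
            extracted.append(info.filename)
    return extracted
```

Because zip timestamps carry no time zone, a client and server in different zones would see shifted times, which is one more reason not to rely on file dates alone.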
Version control:
No additional metadata is created for versioning purposes
The versioning logic is based on the layer archiving protocol and the date/time discrepancies between the client-side and server-side data
Layer Archiving Protocol: There are 5 types of archiving rules:
Not managed in eBAGIS - these data will not be uploaded to eBAGIS
Cannot be altered - duplicate data will prevent AOI from being uploaded
No archiving - duplicate data will be overwritten
Individual archiving - duplicate data will be archived individually
Group archiving - the whole group of data will be archived if any of its components is updated
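The five rules could be encoded roughly as follows. This is only a sketch: the enum, the function `upload_action`, and the action names it returns are all invented for illustration, and the date/time check is left out (it only looks at whether the layer already exists on the server).

```python
from enum import Enum


class ArchiveRule(Enum):
    NOT_MANAGED = "not managed in eBAGIS"
    CANNOT_BE_ALTERED = "cannot be altered"
    NO_ARCHIVING = "no archiving"
    INDIVIDUAL = "individual archiving"
    GROUP = "group archiving"


def upload_action(rule, exists_on_server):
    """Return a hypothetical action name for one layer during upload."""
    if rule is ArchiveRule.NOT_MANAGED:
        return "skip"                      # never uploaded to eBAGIS
    if not exists_on_server:
        return "add"                       # nothing to overwrite or archive
    if rule is ArchiveRule.CANNOT_BE_ALTERED:
        return "reject_upload"             # duplicate blocks the whole AOI
    if rule is ArchiveRule.NO_ARCHIVING:
        return "overwrite"
    if rule is ArchiveRule.INDIVIDUAL:
        return "archive_layer_then_replace"
    return "archive_group_then_replace"    # GROUP: archive every member
```

A real uploader would first collect the action for every layer, since a single "reject_upload" has to stop the whole AOI before anything is written.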
Date/Time Discrepancy: When archiving is needed and there is a date/time discrepancy between the client and server layer:
Date/Time are the created (or modified?) dates of the .shp and .img files on the server, and the created (or modified?) dates of the vectors and rasters in the FGDB on the client.
ArcGIS .NET API methods for accessing the file modified date
IGxObjectProperties.GetProperty("ESRI_GxObject_FileTime") returns a POSIX date, i.e., seconds since 01/01/1970 00:00:00 UTC, not counting leap seconds. Lesley could not find a way to use IGxObjectProperties in a non-interactive mode. This is an ArcCatalog API.
IDatasetNameFileStat.get_StatTime(esriDatasetFileStatTimeMode.esriDatasetFileStatTimeLastModification). Same return format as IGxObjectProperties
But will these be accurate? Our assumption as of Dec 2014 is that the File Geodatabases will be created on the server side and sent to the client in a zipped file. These files should maintain their original modify date when they are unzipped. Geoffrey suggests also supplying an XML file listing each file name and modify date within the geodatabases. I completely agree: the metadata likely will not be propagated to the client correctly, so simply comparing the modified date is not a good solution.
Does Python have access to either of these interfaces? Lesley doesn't think so. Python can access ArcObjects directly using the comtypes package (though I have not done this).
Update January 2, 2015: ArcGIS does not store any date details for raster datasets in a FGDB :-(
Lesley was able to get a sample going with IDatasetNameFileStat for feature classes. The good news is, it is very fast. The bad news is that it seems to "touch" the parent FGDB so that date is always bumped forward to the most recent date the feature class modify dates were checked. The "touch" is fine because we will not use the FGDB folder date/time for checking archiving condition.
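For reference, a small Python sketch of how the POSIX values returned by these calls could be turned into readable dates and compared. Both function names and the 2-second tolerance are assumptions; the tolerance is chosen only because zip archives store times at 2-second resolution.

```python
from datetime import datetime, timezone


def posix_to_utc(seconds):
    """Convert seconds since 1970-01-01 00:00:00 UTC to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_discrepant(client_seconds, server_seconds, tolerance=2):
    """True when two modified times differ by more than `tolerance` seconds."""
    return abs(client_seconds - server_seconds) > tolerance
```

For example, `posix_to_utc(1420156800)` gives midnight UTC on January 2, 2015.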
Storing a modified date in the metadata
The metadata is embedded with the data layer so it can't get separated. We need to determine what actions in PC-BAGIS will result in an update that needs to be communicated to the central database. We could easily miss types of changes that should be communicated. So you are worried users could do something through BAGIS that modifies files in a way we would not expect?
Metadata is awkward to access with .NET API.
Lesley is not finding a straightforward API for Python to read metadata either. Again, this should be possible using comtypes in Python.
Maintaining an external file specifying layers that have changed
It is hit-or-miss to code which actions in PC-BAGIS will result in an update that needs to be communicated to the central database. Same as the metadata issues.
The external file could easily be tampered with or lost on the remote PC. This is a valid point: it would be bad to have a file managed solely by BAGIS that cannot be reproduced if it is lost, making it impossible to upload any changes a user may have made to the repo on the server. I don't think external files are bad if they can be generated at any point and compared to the current status of the AOI on the user's machine (like a diff).
Advantage: the uploader only has to load the file and does not have to make the list for itself.
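A rough Python sketch of the "generate at any point and diff" idea. The manifest here is just a dictionary of relative path to modified time; the function names and the choice of whole-second modified times are assumptions, not a decided format.

```python
from pathlib import Path


def build_manifest(aoi_dir):
    """Map each file's path (relative to the AOI) to its modified time in seconds."""
    aoi_dir = Path(aoi_dir)
    return {
        p.relative_to(aoi_dir).as_posix(): int(p.stat().st_mtime)
        for p in aoi_dir.rglob("*")
        if p.is_file()
    }


def diff_manifests(old, new):
    """Return sorted lists of (added, removed, changed) paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed
```

Because the manifest can be rebuilt from the AOI folder at any time, losing the stored copy only costs the comparison baseline, which could be fetched again from the server.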
Data Management tools
Use Feature Compare and Raster Compare to compare each file when an AOI is uploaded to the central database. There are Python examples of using these tools. This is the least error-prone method because we aren't relying on PC-BAGIS to set a flag. Will this be too slow? You are correct: this is probably the most robust method. However, I think it would be too slow, as every file would have to be uploaded, changed or otherwise, and then each would have to be compared, likely a lengthy process using the arc functions.
How do we deal with layer adds/deletes? Perhaps the upload tool could check both sides and ask the user to confirm any adds/deletes? But we cannot ask the user if they want to keep their changes on a modified layer, as the upload/comparison process would be lengthy, and the compare would be server-side. This seems like bad UI to me.
Use Git to track the changes
I will throw out my idea again, if we are saying BAGIS is not capable of tracking changes to the AOI files. My solution would be to implement a git-like approach to tracking changes (please see http://stackoverflow.com/questions/15765366/how-does-git-track-file-changes-internally for an explanation). We could see exactly what files have been changed, removed, or added. The key problem with this approach is knowing what the name of the layer is in a geodatabase, given that the layer files within the geodatabase folder structure have names like a0000000c.gdbtable. This can be looked up in the a00000001.gdbtable file in the database, where the row number in that file is the hexadecimal number at the end of the filename: a0000000c is thus row 12 in the a00000001 table.
Lesley is correct that not going through the geodatabase API is a bit dangerous, as ESRI could change the file format at any time. However, I could also argue that changes to the API are just as likely, given that we are talking about ESRI. I have been digging through documentation to see if I can find some object property somewhere that would provide the actual file name in the folder structure of the geodatabase, so the lookup could use a more official means, but I am unfamiliar with arcobjects, so it is slow going.
I will also point out that if we were to switch from gdb format to shp and img files within the AOI, as Lesley suggested as an option, a git-like approach to file tracking would not have to do any strange lookups: the layer name is just the shapefile's name. Additionally, we would not have to change the file from gdb format to shapefile format on the client side before upload, which I think is a significant advantage. Overall, while I don't like the shapefile format, I think it would be a much better solution to switch. The only downside I can see is that it is an aging format, and that file sizes will increase, but only on the client side, and the data transmitted between client and server would only increase on downloads.
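To make the git-like idea concrete, here is a short Python sketch of the two pieces involved: hashing file contents the way git hashes a blob, and turning a .gdbtable file name into the row number described above. The function names are made up, and the row-number relationship is taken from the note above rather than from any Esri documentation.

```python
import hashlib
import re


def git_blob_hash(data):
    """SHA-1 of the content prefixed with git's blob header."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def gdbtable_row(filename):
    """Row number encoded in a name like 'a0000000c.gdbtable' (hex digits)."""
    match = re.fullmatch(r"a([0-9a-fA-F]{8})\.gdbtable", filename)
    if match is None:
        raise ValueError("not a gdbtable file name: " + filename)
    return int(match.group(1), 16)
```

Two files with the same hash have the same content, so comparing hashes between client and server would show which files changed without relying on dates.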
(Red comments are from Jarrett)
More information about the ESRI/BAGIS Metadata XML tag excerpted from here:
The XPath to the section where it will be stored is: /metadata/dataIdInfo/searchKeys/keyword
The tag is displayed in the "Tags" section of the ArcMap Item Description if the default Item Description template is used
The format for the tag is BAGIS Tag < Please do not modify: ZUnitCategory|Slope; ZUnit|% Slope; > End Tag. Additional elements may be added to the tag separated by a semi-colon. Name|value pairs are associated with the pipe "|" symbol
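The tag format above could be parsed in Python along these lines. This is a sketch only: the function name `parse_bagis_tag` is hypothetical, and it assumes values never contain ">" or ";" themselves.

```python
import re

# Captures everything between the opening phrase and the closing "> End Tag".
TAG_PATTERN = re.compile(
    r"BAGIS Tag <\s*Please do not modify:(.*?)>\s*End Tag", re.S
)


def parse_bagis_tag(text):
    """Return the name|value pairs of a BAGIS tag as a dict ({} if absent)."""
    match = TAG_PATTERN.search(text)
    if match is None:
        return {}
    pairs = {}
    for element in match.group(1).split(";"):
        element = element.strip()
        if not element:
            continue  # trailing semi-colon leaves an empty element
        name, _, value = element.partition("|")
        pairs[name.strip()] = value.strip()
    return pairs
```

Additional elements added to the tag later would show up as extra keys without any change to the parser.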
Sample Item Description from ArcMap
Actual XML for Esri and dataIdInfo nodes in metadata for a raster in a FGDB
<metadata xml:lang="en">
<Esri>
<CreaDate>20140524</CreaDate>
<CreaTime>10465400</CreaTime>
<ArcGISFormat>1.0</ArcGISFormat>
<SyncOnce>TRUE</SyncOnce>
<ModDate>20150109</ModDate>
<ModTime>10181600</ModTime>
<ArcGISProfile>ItemDescription</ArcGISProfile>
</Esri>
<dataIdInfo>
<searchKeys>
<keyword>BAGIS Tag &lt; Please do not modify: ZUnitCategory|Slope; ZUnit|% Slope; &gt; End Tag</keyword>
</searchKeys>
<idPurp>This is a summary</idPurp>
</dataIdInfo>
</metadata>
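If the metadata is exported to an XML string, the keyword and modified date could be read with Python's standard library roughly as below. This is a sketch: the function names are made up, the embedded sample is a trimmed copy of the XML above with the angle brackets in the keyword escaped so it parses, and ModDate is assumed to be in YYYYMMDD form as in the sample.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SAMPLE = """<metadata xml:lang="en">
  <Esri><ModDate>20150109</ModDate></Esri>
  <dataIdInfo><searchKeys>
    <keyword>BAGIS Tag &lt; Please do not modify: ZUnitCategory|Slope; ZUnit|% Slope; &gt; End Tag</keyword>
  </searchKeys></dataIdInfo>
</metadata>"""


def read_bagis_keyword(xml_text):
    """Return the first keyword that starts with 'BAGIS Tag', or None."""
    root = ET.fromstring(xml_text)  # root element is <metadata>
    for keyword in root.findall("dataIdInfo/searchKeys/keyword"):
        if keyword.text and keyword.text.startswith("BAGIS Tag"):
            return keyword.text
    return None


def read_mod_date(xml_text):
    """Return Esri/ModDate as a date, or None when the element is missing."""
    text = ET.fromstring(xml_text).findtext("Esri/ModDate")
    return datetime.strptime(text, "%Y%m%d").date() if text else None
```

Getting the XML out of the geodatabase in the first place is the awkward part noted earlier; this only covers reading it once it is available as text.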