Today's session will primarily be a hackathon comparing approaches to creating and operating embedded spaces. There are (at least) 4 models being presented:
Approach
The TensorFlow Projector model consists of a JSON config structure, typically hosted as a gist, that brings together a large tensor file and a medium-sized metadata file.
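As a sketch, a minimal config of this kind could look like the following. The tensor name, shape, and URLs are placeholders; the field names follow the standalone Embedding Projector's demo configs:

```json
{
  "embeddings": [
    {
      "tensorName": "Publication abstracts",
      "tensorShape": [10000, 512],
      "tensorPath": "https://example.org/tensors.tsv",
      "metadataPath": "https://example.org/metadata.tsv"
    }
  ]
}
```

Note that the large tensor file and the metadata file are only referenced by URL, so the same pair can be reused by many small configs.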
Live examples
https://observablehq.com/@almeidajonas/pubcloud
Wiki
In this model, multiple combinations of the same tensor and annotation files can be aggregated by very small JSON structures.
It also decouples analytical tooling for the embedded space from the annotations.
TSV vs. JSON: JSON's advantages are obvious, but TSV is not without merit; perhaps its most important advantage is that it yields the smallest volume to compress.
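A quick way to see the size difference is to serialize the same toy matrix both ways and compare raw and gzip-compressed sizes (the matrix and its dimensions here are purely illustrative):

```python
import gzip
import json
import random

# Toy embedding matrix: 100 vectors x 8 dimensions (illustrative only).
random.seed(0)
vectors = [[round(random.random(), 4) for _ in range(8)] for _ in range(100)]

# TSV: one row per vector, tab-separated, no repeated keys or brackets.
tsv = "\n".join("\t".join(str(x) for x in row) for row in vectors)

# JSON: array of arrays (the most compact JSON layout for this data).
js = json.dumps(vectors)

tsv_gz = gzip.compress(tsv.encode())
json_gz = gzip.compress(js.encode())

print(f"TSV:  raw={len(tsv)} bytes, gzip={len(tsv_gz)} bytes")
print(f"JSON: raw={len(js)} bytes, gzip={len(json_gz)} bytes")
```

Even in the most compact JSON layout, the brackets and comma-space separators make the raw JSON strictly larger than the equivalent TSV.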
As illustrated by https://observablehq.com/@almeidajonas/pubcloud, there is also the choice between an object of arrays (columnar, keys stored once) and an array of objects (row-oriented, one object per record).
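The two layouts carry the same information and convert mechanically between each other; a minimal sketch (the field names `id`, `x`, `y` are made up for illustration):

```python
# Row-oriented: array of objects, one object per record.
array_of_objects = [
    {"id": "doc1", "x": 0.1, "y": 0.9},
    {"id": "doc2", "x": 0.4, "y": 0.2},
]

# Columnar: object of arrays, one array per field; keys stored once.
object_of_arrays = {
    "id": ["doc1", "doc2"],
    "x": [0.1, 0.4],
    "y": [0.9, 0.2],
}

def to_columnar(rows):
    """Convert an array of objects into an object of arrays."""
    return {key: [row[key] for row in rows] for key in rows[0]}
```

The columnar form repeats each key once instead of once per record, which is part of why it serializes smaller for large collections.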
Need a way to identify the data modality, along with any structured "tiling" or "partitioning" needed to embed large data.
(URL) References to the original data if possible.
(URL) Reference to the encoder model used if possible. (Maybe the script used to encode it too, if on-device?)
Should the dimensionality-reduced version be a separate embedding? In my view, yes.
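The requirements above could be folded into the config as extra fields. The following is purely a hypothetical sketch: `modality`, `tiling`, `sourceDataUrl`, and `encoderModelUrl` are not part of the Projector format, the URLs are placeholders, and the second entry shows the dimensionality-reduced version (here assumed to come from a 2-d reduction) stored as its own embedding:

```json
{
  "embeddings": [
    {
      "tensorName": "Abstracts (full, 512-d)",
      "tensorShape": [10000, 512],
      "tensorPath": "https://example.org/tensors.tsv",
      "metadataPath": "https://example.org/metadata.tsv",
      "modality": "text",
      "tiling": {"rowsPerTile": 1000, "tileCount": 10},
      "sourceDataUrl": "https://example.org/corpus/",
      "encoderModelUrl": "https://example.org/models/encoder"
    },
    {
      "tensorName": "Abstracts (reduced, 2-d)",
      "tensorShape": [10000, 2],
      "tensorPath": "https://example.org/tensors_2d.tsv",
      "metadataPath": "https://example.org/metadata.tsv"
    }
  ]
}
```

Keeping the reduced version as a second entry lets both share the same metadata file while remaining independently loadable.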
...
...
...