Today's session will primarily be a hackathon comparing approaches to creating and operating embedded spaces. There are (at least) 4 models being presented:
Approach
The TensorFlow Projector model consists of a JSON config structure, typically hosted as a gist, that brings together a large tensor file and a medium-sized metadata file.
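As a sketch, a minimal config of this kind could look like the following. The tensor name, shape, and URLs are placeholders; the field names follow the standalone Embedding Projector's demo configs:

```json
{
  "embeddings": [
    {
      "tensorName": "Publication abstracts",
      "tensorShape": [10000, 512],
      "tensorPath": "https://example.org/tensors.tsv",
      "metadataPath": "https://example.org/metadata.tsv"
    }
  ]
}
```

Note that the large tensor file and the metadata file are only referenced by URL, so the same pair can be reused by many small configs.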
Live examples
https://observablehq.com/@almeidajonas/pubcloud
Wiki
In this model, multiple combinations of the same tensor and annotation files can be aggregated by very small JSON structures.
It also decouples analytical tooling for the embedded space from the annotations.
TSV vs. JSON: JSON's advantages are obvious, but TSV is not without merit; perhaps its most important advantage is that it yields the smallest volume to compress.
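A quick way to see the size difference is to serialize the same toy matrix both ways and compare raw and gzip-compressed sizes (the matrix and its dimensions here are purely illustrative):

```python
import gzip
import json
import random

# Toy embedding matrix: 100 vectors x 8 dimensions (illustrative only).
random.seed(0)
vectors = [[round(random.random(), 4) for _ in range(8)] for _ in range(100)]

# TSV: one row per vector, tab-separated, no repeated keys or brackets.
tsv = "\n".join("\t".join(str(x) for x in row) for row in vectors)

# JSON: array of arrays (the most compact JSON layout for this data).
js = json.dumps(vectors)

tsv_gz = gzip.compress(tsv.encode())
json_gz = gzip.compress(js.encode())

print(f"TSV:  raw={len(tsv)} bytes, gzip={len(tsv_gz)} bytes")
print(f"JSON: raw={len(js)} bytes, gzip={len(json_gz)} bytes")
```

Even in the most compact JSON layout, the brackets and comma-space separators make the raw JSON strictly larger than the equivalent TSV.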
As illustrated by https://observablehq.com/@almeidajonas/pubcloud, there is also the choice between an object of arrays (columnar, keys stored once) and an array of objects (row-oriented, one object per record).
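The two layouts carry the same information and convert mechanically between each other; a minimal sketch (the field names `id`, `x`, `y` are made up for illustration):

```python
# Row-oriented: array of objects, one object per record.
array_of_objects = [
    {"id": "doc1", "x": 0.1, "y": 0.9},
    {"id": "doc2", "x": 0.4, "y": 0.2},
]

# Columnar: object of arrays, one array per field; keys stored once.
object_of_arrays = {
    "id": ["doc1", "doc2"],
    "x": [0.1, 0.4],
    "y": [0.9, 0.2],
}

def to_columnar(rows):
    """Convert an array of objects into an object of arrays."""
    return {key: [row[key] for row in rows] for key in rows[0]}
```

The columnar form repeats each key once instead of once per record, which is part of why it serializes smaller for large collections.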
Need a way to identify the data modality, along with any structured "tiling" or "partitioning" needed to embed large data.
(URL) References to the original data if possible.
(URL) Reference to the encoder model used if possible. (Maybe the script used to encode it too, if on-device?)
Should the dimensionality-reduced version be a separate embedding? In my view, yes.
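The requirements above could be folded into the config as extra fields. The following is purely a hypothetical sketch: `modality`, `tiling`, `sourceDataUrl`, and `encoderModelUrl` are not part of the Projector format, the URLs are placeholders, and the second entry shows the dimensionality-reduced version (here assumed to come from a 2-d reduction) stored as its own embedding:

```json
{
  "embeddings": [
    {
      "tensorName": "Abstracts (full, 512-d)",
      "tensorShape": [10000, 512],
      "tensorPath": "https://example.org/tensors.tsv",
      "metadataPath": "https://example.org/metadata.tsv",
      "modality": "text",
      "tiling": {"rowsPerTile": 1000, "tileCount": 10},
      "sourceDataUrl": "https://example.org/corpus/",
      "encoderModelUrl": "https://example.org/models/encoder"
    },
    {
      "tensorName": "Abstracts (reduced, 2-d)",
      "tensorShape": [10000, 2],
      "tensorPath": "https://example.org/tensors_2d.tsv",
      "metadataPath": "https://example.org/metadata.tsv"
    }
  ]
}
```

Keeping the reduced version as a second entry lets both share the same metadata file while remaining independently loadable.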
...
...
...