Multiple DSERG participants (DCEG Data Science and Research Group) are away at AACR 2025, April 25-30. You are welcome to pursue any topic you want in this C4B session, potentially with a structured follow-up at the end of the week during FAIR Fridays, 11:00-12:00.
https://github.com/epiverse/embedStats
Let's use the AACR pause to finalize data structures for embedded spaces.
Resuming conversation:
I am claiming to have written the first ever TSV flattener for MASON files :-D. Here's an example:
tsv = await (await import('https://epiverse.github.io/embedStats/embedStats.mjs')).mason2tsv(['id','journalTitle','year','keywords'])
You can save the file to test it, say in Excel, by doing:
await (await import('https://epiverse.github.io/embedStats/embedStats.mjs')).saveFile(tsv,'firstFlatMason.tsv')
see code at https://github.com/epiverse/embedStats/blob/main/embedStats.mjs#L38
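For anyone who wants to discuss the flattening idea without reading the module first, here is a rough sketch of what a TSV flattener might do. This is not the mason2tsv implementation (that is at the link above). flattenToTsv is a hypothetical name, the records are made up, and it assumes each record is a plain object whose selected fields may be scalars, arrays, or nested objects:

```javascript
// Hypothetical helper, not part of embedStats: flatten selected fields of
// an array of records into tab-separated text.
function flattenToTsv(records, fields) {
  // Turn one field value into a single TSV-safe cell.
  const cell = (v) => {
    if (v === undefined || v === null) return '';
    // Arrays are joined with '; ', other objects are serialized as JSON.
    const s = Array.isArray(v)
      ? v.join('; ')
      : typeof v === 'object' ? JSON.stringify(v) : String(v);
    // Tabs and newlines would break the row/column layout, so replace them.
    return s.replace(/[\t\r\n]+/g, ' ');
  };
  const header = fields.join('\t');
  const rows = records.map((r) => fields.map((f) => cell(r[f])).join('\t'));
  return [header, ...rows].join('\n');
}

// Made-up records, for illustration only.
const sample = [
  { id: 1, journalTitle: 'Journal A', year: 2024, keywords: ['x', 'y'] },
  { id: 2, journalTitle: 'Journal\tB', year: 2025 },
];
const flat = flattenToTsv(sample, ['id', 'journalTitle', 'year', 'keywords']);
```

Joining arrays with '; ' and blanking missing fields are just one possible convention. The point worth settling as a group is how nested values should map to cells.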
docs = await (await fetch('https://raw.githubusercontent.com/epiverse/pubCloud/refs/heads/main/dceg_publications_title-abstract_gemini.json')).json()
https://observablehq.com/@jonasalmeida/embeddstats
...
...
docs.map(d=>d.embeddings[0].vector)
...
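One possible next step with the vectors pulled out by docs.map(d=>d.embeddings[0].vector) is a pairwise cosine similarity matrix. The sketch below is only an illustration, not something from embedStats. toyDocs is a made-up stand-in that assumes the same shape (each document has an embeddings array whose first entry carries a numeric vector), and cosine is a hypothetical helper:

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Made-up stand-in for docs, using the shape assumed above.
const toyDocs = [
  { embeddings: [{ vector: [1, 0, 0] }] },
  { embeddings: [{ vector: [0, 2, 0] }] },
  { embeddings: [{ vector: [3, 0, 0] }] },
];
const vectors = toyDocs.map(d => d.embeddings[0].vector);
// Pairwise similarity matrix: sim[i][j] compares document i with document j.
const sim = vectors.map(a => vectors.map(b => cosine(a, b)));
```

With the real docs array, replacing toyDocs with docs should give an N x N matrix. For large N that may be too big to hold in the browser, so sampling or computing rows on demand might be needed.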