Language Models for Interpreting Embeddings
Demystifying Embedding Spaces Using Large Language Models
Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier (Google Research)
This research introduces the Embedding Language Model (ELM), a novel framework that enables large language models (LLMs) to interpret and interact directly with abstract embedding vectors. Unlike traditional methods requiring visualization or specialized tools, ELM translates these complex numerical representations into understandable narratives. The paper details ELM's architecture, which uses adapter layers to integrate domain embeddings into an LLM's token space, allowing natural language queries of embedded data. Evaluated on movie and user preference datasets, ELM demonstrates high semantic and behavioral consistency, even for hypothetical entities, outperforming text-only LLMs in generating meaningful descriptions and extrapolations from embedding spaces.
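A minimal sketch of the adapter idea (the layer sizes, module names, and splicing interface here are my own assumptions, not the paper's implementation): a small MLP projects a domain embedding to the width of the LLM's token embeddings so it can be inserted into the prompt as a "soft token" alongside the embedded text.

```python
# Hypothetical ELM-style adapter sketch: dimensions, the two-layer MLP design,
# and the splicing interface are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class EmbeddingAdapter(nn.Module):
    """Maps a domain embedding (e.g., a movie or user vector) into the LLM token space."""
    def __init__(self, domain_dim: int, llm_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(domain_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, llm_dim),
        )

    def forward(self, domain_emb: torch.Tensor) -> torch.Tensor:
        # Output has the same width as the LLM's token embeddings, so it can be
        # spliced into the prompt sequence as a single soft token.
        return self.net(domain_emb)

# Usage (assumed interface; real LLM APIs differ): prepend the adapted vector
# to the embedded text prompt before the transformer layers.
adapter = EmbeddingAdapter(domain_dim=64, llm_dim=4096)
movie_vec = torch.randn(1, 64)              # domain embedding of one movie
soft_token = adapter(movie_vec)             # shape (1, 4096)
prompt_embs = torch.randn(1, 12, 4096)      # embedded text tokens of the query
inputs_embeds = torch.cat([soft_token.unsqueeze(1), prompt_embs], dim=1)
```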
Raw Data vs LLM Embeddings in Medical ML
When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?
Yanjun Gao1,2, Skatje Myers2, Shan Chen3,5, Dmitriy Dligach4, Timothy A. Miller5, Danielle Bitterman3,6, Matthew Churpek2, Majid Afshar2
1University of Colorado, 2University of Wisconsin-Madison, 3Mass General Brigham, Harvard Medical School, 4Loyola University Chicago, 5Boston Children's Hospital, Harvard Medical School, 6Dana-Farber Cancer Institute
This research explores how effectively Large Language Models (LLMs) can represent numerical patient data, such as vital signs and lab results, for medical machine learning tasks. The study investigates whether converting Electronic Health Record (EHR) data into text and then generating LLM embeddings can be a viable alternative to using raw numerical data in traditional machine learning algorithms. While LLM embeddings show promising, competitive results in some medical prediction scenarios, the findings generally indicate that raw data features still prevail, offering more robust and accurate predictions for tasks like diagnosing clinical deterioration or predicting patient mortality and length of hospital stay. The paper also delves into factors influencing LLM performance, such as different data formats and embedding extraction methods, while highlighting the need for further methodological advancements to fully leverage LLMs in medical applications.
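To make the setup concrete, here is a toy sketch of the comparison (synthetic data, an illustrative serialization template, and a hashing vectorizer standing in for an actual LLM embedder are all assumptions on my part): serialize each patient's numbers into a sentence, embed the text, and train the same classifier on both representations.

```python
# Toy sketch of the raw-vs-text comparison the paper studies. Feature names,
# the serialization template, and the stand-in embedder are illustrative
# assumptions; the study uses real EHR data and actual LLM embeddings.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
# Toy "vital signs / labs" table: heart rate, systolic BP, lactate.
X_raw = np.column_stack([
    rng.normal(85, 15, n),    # heart rate (bpm)
    rng.normal(120, 20, n),   # systolic blood pressure (mmHg)
    rng.normal(1.5, 0.8, n),  # lactate (mmol/L)
])
y = (X_raw[:, 2] + 0.02 * X_raw[:, 0] + rng.normal(0, 1, n) > 3.5).astype(int)

def serialize(row):
    """Turn one patient's numbers into a sentence (template is illustrative)."""
    return (f"Heart rate {row[0]:.0f} bpm, systolic blood pressure {row[1]:.0f} mmHg, "
            f"lactate {row[2]:.1f} mmol/L.")

texts = [serialize(r) for r in X_raw]
# Stand-in for an LLM embedding model; swap in a real encoder to mirror the paper's setup.
X_text = HashingVectorizer(n_features=256).fit_transform(texts)

clf = LogisticRegression(max_iter=1000)
print("raw numeric features  :", cross_val_score(clf, X_raw, y, cv=5).mean())
print("text-embedded features:", cross_val_score(clf, X_text, y, cv=5).mean())
```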
...
...
why tensors? ...
characterize embedding spaces using capacity dimension ...
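One way to make this note concrete: estimate the capacity (box-counting) dimension of a set of embeddings as the slope of log N(eps) against log(1/eps), where N(eps) counts the grid cells of size eps occupied by the points. The sketch below uses synthetic data and hand-picked scales, both of which are assumptions rather than a reference implementation.

```python
# Hedged sketch of a capacity (box-counting) dimension estimate for embeddings;
# the grid scales and synthetic data are illustrative choices.
import numpy as np

def capacity_dimension(points: np.ndarray, scales=(0.5, 0.25, 0.125, 0.0625)):
    """Estimate dim as the slope of log N(eps) vs. log(1/eps),
    where N(eps) is the number of occupied grid cells at cell size eps."""
    # Normalize to the unit cube so the grid scales are comparable across axes.
    pts = (points - points.min(0)) / (np.ptp(points, axis=0) + 1e-12)
    counts = []
    for eps in scales:
        cells = np.floor(pts / eps).astype(int)
        counts.append(len({tuple(c) for c in cells}))
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope

# Embeddings lying near a 2-D manifold inside a 32-D ambient space should give
# an estimate of roughly 2, regardless of the ambient dimensionality.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(5000, 2))
basis = rng.normal(size=(2, 32))
emb = latent @ basis
print(capacity_dimension(emb))
```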
coding with snippets ...