Gabriel Rosenbaum
My research aims to better understand the biomedical causal reasoning abilities of AI language systems known as Large Language Models (LLMs). My work provides insight into the capabilities and inner workings of LLMs.
Understanding cause and effect (causal reasoning) between medical concepts is crucial for many medical applications. Recently, there have been major advances in artificial intelligence systems for language, and researchers are asking how generative AI systems known as Large Language Models (LLMs), such as ChatGPT, can be used in a medical context. LLMs have been shown to display an understanding of causality in the context of medicine, but a deeper understanding of how they relate specific concepts is lacking. One way to map the cause-effect relationships in a system is the causal graph, a flowchart-like structure in which "nodes" represent medical concepts and "edges" represent links between them. There has been prior research on the generation, or "discovery," of causal graphs with LLMs, but with a key caveat: to the best of our knowledge, all prior work has LLMs discovering only the links (edges) between a predefined set of concepts (nodes), not both the links and the concepts themselves.
We propose an algorithm for the full discovery of causal graphs, both edges and nodes included, using LLMs. We believe our algorithm can map what we dub the internal causal graphs of LLMs: representations of how LLMs themselves relate medical concepts. The algorithm has two main stages. First, the graph is "expanded" by asking an LLM for concepts caused by, and that cause, a "root node," the concept at the center of the graph; we repeat this process for each newly generated node a given number of times. Second, we perform a final sweep, iterating over all possible node pairs and querying GPT about the existence of an edge between the two current nodes. We find that the produced graphs have consistently high accuracy, but that their comprehensiveness is mediocre.
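The two-stage procedure described above could be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `ask_neighbors` and `ask_edge` functions stand in for real LLM queries and are mocked here with a small hand-made example so the sketch is self-contained; all concept names and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical stand-in for LLM answers: maps a concept to
# (concepts that cause it, concepts it causes).
MOCK_LLM = {
    "hypertension": (["obesity"], ["stroke"]),
    "obesity": (["overeating"], ["hypertension"]),
}

def ask_neighbors(concept):
    """Stage 1 query (mocked): return (causes, effects) for a concept."""
    return MOCK_LLM.get(concept, ([], []))

def ask_edge(a, b):
    """Stage 2 query (mocked): does concept a cause concept b?"""
    known = {("obesity", "hypertension"), ("hypertension", "stroke"),
             ("overeating", "obesity")}
    return (a, b) in known

def discover_graph(root, depth=2):
    """Discover both nodes and edges of a causal graph around a root node."""
    # Stage 1: expand the node set outward from the root, depth times.
    nodes, frontier = {root}, [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            causes, effects = ask_neighbors(node)
            for concept in causes + effects:
                if concept not in nodes:
                    nodes.add(concept)
                    next_frontier.append(concept)
        frontier = next_frontier
    # Stage 2: final sweep over every ordered pair of discovered nodes.
    edges = {(a, b) for a, b in permutations(nodes, 2) if ask_edge(a, b)}
    return nodes, edges

nodes, edges = discover_graph("hypertension")
```

With the mock answers above, the expansion stage discovers four concepts (hypertension, obesity, stroke, overeating), and the sweep recovers three directed edges among them.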
In addition, we observe what appears to be a considerable influence of general public knowledge on the graphs, as opposed to strictly medical knowledge. The graphs produced give further insight into the medical capabilities and applications of LLMs.