Abstracts.
Arya Ketabchi Haghighat and Evelyn Kim
From Micro-Actions to Macro-Understanding: A Language Model Approach to Task Recognition
We propose a fully automated framework that models user micro-actions within software interfaces to infer ongoing high-level tasks, enabling machines to better understand user intent and provide context-aware assistance. The approach employs stacked Large Language Model (LLM) agents that reason collaboratively over low-level interaction sequences. With Retrieval-Augmented Generation (RAG), the system adapts to new environments with minimal engineering, while an LLM-based data generator supports scalable fine-tuning from textual instructions. Achieving 92.1% task classification accuracy and 93.2% recall, the framework demonstrates robust task understanding that enhances human–AI coordination. This work bridges LLM advancements with system-level interaction modeling, illustrating how combining LLMs, RAG, and interface-level reasoning enables more adaptive and synergistic human–machine collaboration.
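To give a rough flavor of the kind of pipeline this abstract describes, the sketch below is hypothetical and is not the authors' implementation: the MicroAction fields, the toy overlap-based retriever standing in for RAG, and the generic llm callable are all our own illustrative assumptions. One stage flattens raw interaction events into text, and a second stage labels the task, grounded by retrieved examples.

    from dataclasses import dataclass

    @dataclass
    class MicroAction:
        timestamp: float
        widget: str   # UI element the user touched
        action: str   # e.g., "click", "type", "scroll"

    def retrieve_examples(actions, corpus, k=2):
        # Toy stand-in for the RAG step: rank stored, labeled sequences
        # by how many (widget, action) pairs they share with the input.
        seen = {(a.widget, a.action) for a in actions}
        def overlap(ex):
            return len(seen & {(a.widget, a.action) for a in ex["actions"]})
        return sorted(corpus, key=overlap, reverse=True)[:k]

    def classify_task(actions, corpus, llm):
        # First "agent": flatten the raw event stream into text.
        trace = "; ".join(f"{a.action} on {a.widget}" for a in actions)
        # Retrieved examples ground the second "agent's" label decision.
        shots = "\n".join(f"{e['label']}: {e['summary']}"
                          for e in retrieve_examples(actions, corpus))
        prompt = (f"Known tasks:\n{shots}\n"
                  f"Observed actions: {trace}\n"
                  "Which high-level task is the user performing?")
        return llm(prompt)

    # Usage with a stub in place of a real LLM call:
    corpus = [{"label": "compose email",
               "summary": "opens composer, then types into the body",
               "actions": [MicroAction(0, "compose_button", "click"),
                           MicroAction(1, "body_field", "type")]}]
    observed = [MicroAction(0, "compose_button", "click"),
                MicroAction(2, "body_field", "type")]
    print(classify_task(observed, corpus, llm=lambda p: "compose email"))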
Steve Russell
Un-Grounded Agency and the Stationarity Trap in Agentic Workflows
Modern agentic frameworks operate under the fallacy of a Stationarity Trap: the dangerous assumption that meaning in underlying knowledge bases is static and user contexts are invariant. This talk presents evidence from two distinct research streams to demonstrate that this assumption guarantees system failure. Applying our EchoCodes framework, we demonstrate that "ground truth" content (e.g., medical ontologies, intelligence signals, and software APIs) can undergo rate-independent semantic drift, and that this drift can be detected before agents act on meanings that have shifted. In the second stream, as part of our work on the AiVisor system, we illustrate a "Personalization Paradox," where agents optimizing for user utility incur statistically significant penalties in semantic consistency, effectively decoupling reasoning from lexically grounded baselines. Together, these findings define a non-stationary agent tradespace in which contextual drift (user alignment) and semantic dynamics (meaning discontinuity) converge: shifting ground-truth concepts and expanding contextual liabilities create a state of semantic insolvency. When such insolvent systems are granted the power to act, the result is un-grounded agency, high-confidence execution untethered from reality, and agentic deployments risk devolving into blind autonomy. We conclude by proposing a novel Semantic Guidance System to resolve the Stationarity Trap, moving the field beyond static evaluation toward dynamic semantic grounding for long-lifecycle agents.
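The abstract does not detail how EchoCodes detects drift, but the flavor of a pre-execution check can be sketched generically. Everything below, from the embedding-distance test to the 0.15 threshold, is an illustrative assumption rather than the EchoCodes method.

    import numpy as np

    def cosine_distance(u, v):
        # 0 when two embeddings point the same way, up to 2 when opposed.
        return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def drifted_terms(baseline, current, threshold=0.15):
        # Compare each concept's current embedding against the snapshot
        # taken when the knowledge base was last validated; flag terms
        # whose meaning has moved before any agent is allowed to act.
        return {t: cosine_distance(v, current[t])
                for t, v in baseline.items()
                if cosine_distance(v, current[t]) > threshold}

    baseline = {"remission": np.array([0.9, 0.1])}
    current = {"remission": np.array([0.2, 0.9])}
    print(drifted_terms(baseline, current))  # flagged: meaning has shifted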
Ram D. Sriram
AI Metrology: Ensuring the Reliability of Artificial Intelligence
Currently, we are witnessing the "Age of AI," in which artificial intelligence (AI) plays a major role in nearly every aspect of our lives. As AI permeates daily life, the need for rigorous measurement and evaluation of AI systems becomes increasingly imperative. This talk delves into the emerging field of AI metrology, which aims to establish standardized methods for assessing the accuracy, reliability, and trustworthiness of AI models. We explore various perspectives on AI and discuss the challenges and opportunities associated with quantifying its effectiveness. By examining specific applications in healthcare and manufacturing, we highlight how AI can enhance metrology practices and contribute to more reliable and trustworthy AI-driven solutions. Key topics include uncertainty quantification, neuro-symbolic computing, and applications of AI for metrology in healthcare.
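To make one of the listed topics concrete: a standard uncertainty-quantification device is the split conformal interval, which turns held-out residuals into a prediction interval with a stated coverage level. The sketch below is a generic textbook construction, not a method from the talk.

    import numpy as np

    def conformal_interval(calib_residuals, y_pred, alpha=0.1):
        # Split conformal prediction: the adjusted (1 - alpha) quantile of
        # held-out absolute residuals widens a point prediction into an
        # interval with roughly 90% coverage on future data.
        n = len(calib_residuals)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        q = np.quantile(np.abs(calib_residuals), level)
        return y_pred - q, y_pred + q

    residuals = np.random.default_rng(0).normal(scale=2.0, size=200)
    print(conformal_interval(residuals, y_pred=10.0))  # ~ (6.7, 13.3)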
Krishna R. Pattipati, Rajat Rai, Christopher Norton, Jordan Thurston, Deepak Haste, Sudipto Ghoshal, Somnath Deb, and William Lawless
Human–AI Collaboration for Digital Twins: Automating Knowledge Acquisition and SHM Across the Lifecycle
This talk presents a human–AI collaborative framework that uses specialized LLM-driven agents to automatically extract, structure, and validate engineering knowledge into RAAML-compliant digital twins for diagnostics, prognostics, and maintenance training. By unifying knowledge acquisition, real-time inference, guided troubleshooting, and AI-enabled training, the approach enables continuously learning digital twins that improve SHM performance, serviceability, and workforce proficiency.
H T Goranson, Beth Cardier, and Matt Garcia
Categorical Type Systems for Salient Influence
We adopt a category-theoretical approach to reasoning that is separate from, but integrates with, conventional logical systems and the linear-algebraic techniques employed by large language models and reinforcement learning. The agenda is driven by the ability to reason topologically over an expanded vocabulary of types not motivated by conventional ontological needs. One high-value application is reasoning over open-world situations and causally dynamic systems where essential facts are missing, situated influence is substantial, and significant implicit influence is present. We are particularly interested in dynamic, unstable, possible outcomes and in situations where outcome engineering is desired. The program collects open influences as salients and focuses on type systems that support this concept and admit topological induction. The focus of this paper is the nature of a type system in which salience is a first-class citizen and category-theoretical, topology-centric induction is enabled. While the research is not motivated by a specific use case, we focus on central nervous system modelling, where fear memory extinction is modelled, so that we can leverage prior work; but applications are expected to be rather broad, characterised by open-world insights and frangible possibilities. Our group builds systems in Haskell frameworks, so our type system needs to be consistent with what can be supported programmatically. Because our group is particularly interested in human/machine navigation of structures, a consistent visual grammar sensitive to these type definitions is desirable. This paper includes a brief survey of the literature, with an emphasis on situation theory as a framework for salience and influence, and considers potential sensor technologies.
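The authors build in Haskell; purely to illustrate what "salience as a first-class citizen" of a type system might mean, here is a minimal wrapper type sketched in Python. The names and the functor-style map are our own illustrative assumptions, not the paper's type system.

    from dataclasses import dataclass
    from typing import Callable, Generic, TypeVar

    A = TypeVar("A")
    B = TypeVar("B")

    @dataclass(frozen=True)
    class Salient(Generic[A]):
        # A value packaged with its situated influence, so salience is
        # part of the type rather than metadata bolted on afterwards.
        value: A
        influence: float

        def fmap(self, f: Callable[[A], B]) -> "Salient[B]":
            # Functor-style map: transform the payload while the
            # salience annotation travels along unchanged.
            return Salient(f(self.value), self.influence)

    cue = Salient(value="conditioned fear cue", influence=0.8)
    print(cue.fmap(str.upper))  # Salient(value='CONDITIONED FEAR CUE', influence=0.8)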
Mito Akiyoshi
The Hidden Black-Box: Expert Labor and the Case for Participatory AI
Maintaining human oversight and broad stakeholder participation remains a key challenge for trust management in human-AI teams. With the increasing deployment of agentic AI and large language models, the risk of unpredictable yet highly consequential outcomes could, if left unaddressed, undermine public and expert confidence in such teams. This study revisits the debate over participation to articulate a framework that secures fairness, legitimacy, and trust. Drawing on theoretical and empirical analyses of citizen participation in fields adjacent or parallel to AI, it shows that the weaknesses of black-box AI have analogous counterparts in the isomorphic processes of black-box value elicitation and implementation decisions. Specifically, this study focuses on circumstances in which experts operate outside their domain of expertise and develop ad-hoc theories of reality. Examples of such "inexpert expert judgments" include situations where judges must evaluate complex statistical evidence in court, and where algorithm specialists must select features for models based not on relevance to the goal but on availability or intuition. In such cases, even something as seemingly trivial as including or excluding an interaction term in a regression-based model can lead to highly problematic outcomes, as the sketch below illustrates. Bypassing domain-specialist expertise and substituting stand-in knowledge is unnecessary when such expertise could be incorporated with modest effort.
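A minimal simulation (illustrative data, not from the study) shows why the interaction-term example is not trivial: when the effect of a predictor genuinely differs between two groups, a model that omits the interaction reports a single averaged slope that is wrong for both groups.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)            # e.g., a risk score
    g = rng.integers(0, 2, size=n)    # e.g., a demographic group
    # Ground truth: the effect of x differs by group (interaction = -1.0),
    # so the slope is 1.0 for group 0 and 0.0 for group 1.
    y = 1.0 * x + 0.5 * g - 1.0 * x * g + rng.normal(scale=0.5, size=n)

    # Misspecified model: interaction term excluded.
    X_bad = np.column_stack([np.ones(n), x, g])
    beta_bad, *_ = np.linalg.lstsq(X_bad, y, rcond=None)

    # Correct model: interaction term included.
    X_ok = np.column_stack([np.ones(n), x, g, x * g])
    beta_ok, *_ = np.linalg.lstsq(X_ok, y, rcond=None)

    print("no interaction:  ", beta_bad.round(2))  # one averaged slope (~0.5), wrong for both groups
    print("with interaction:", beta_ok.round(2))   # recovers ~1.0 and ~-1.0, the per-group effects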
Michael Mylrea and Brian Singer
Aurora SOCBench: A HITL-Aware Benchmark for Evaluating LLM Autonomy, Uncertainty, and Robustness in Cyber Defense
AI systems are rapidly becoming central to Security Operations Center (SOC) operations, yet organizations lack rigorous frameworks to evaluate when these systems can act autonomously versus when human oversight is required. Without empirical methods to assess LLM performance, uncertainty, and robustness in adversarial environments, AI adoption remains constrained by reliability concerns and the absence of safety standards. This paper introduces Aurora SOCBench, the first Human-in-the-Loop (HITL)-aware benchmark for evaluating LLM autonomy in cyber defense. Aurora SOCBench integrates three foundational research streams: (1) entropy-based AI trust and uncertainty quantification frameworks for measuring predictable autonomy, (2) human–machine teaming theory emphasizing adaptive autonomy under uncertainty, and (3) empirical demonstrations of LLM brittleness in adversarial cyber operations. Through controlled experiments across multiple state-of-the-art LLMs and realistic SOC tasks, Aurora SOCBench establishes five empirically derived HITL levels (L0–L4) that quantify autonomy boundaries based on predictive entropy, reasoning fidelity, adversarial robustness, and output variance. Our evaluation reveals that LLM autonomy capabilities vary dramatically by task complexity: high-volume triage tasks achieve safe automation (L0–L1), while multi-cloud identity correlation and supply-chain compromise workflows require human decision authority (L3–L4). Adversarial stress testing exposes significant fragility, with reasoning fidelity degrading substantially under log obfuscation and conflicting intelligence. Aurora SOCBench provides the first reproducible, empirically grounded methodology for determining when LLMs can safely operate within SOC workflows and when human judgment remains critical, enabling organizations to deploy AI in cyber defense with quantifiable safety guarantees.
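The pairing of predictive entropy with HITL levels can be sketched in miniature. The entropy computation below is standard; the thresholds are invented for illustration and are not Aurora SOCBench's calibrated boundaries.

    import math
    from collections import Counter

    def predictive_entropy(samples):
        # Shannon entropy over repeated LLM answers to the same prompt:
        # agreement -> low entropy, disagreement -> high entropy.
        counts = Counter(samples)
        n = len(samples)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def hitl_level(entropy, thresholds=(0.3, 0.8, 1.3, 1.8)):
        # Map uncertainty to an autonomy level; the cutoffs here are
        # placeholders, not the benchmark's empirically derived values.
        for level, t in enumerate(thresholds):
            if entropy <= t:
                return f"L{level}"
        return "L4"  # human decision authority required

    answers = ["benign", "benign", "benign", "malicious", "benign"]
    h = predictive_entropy(answers)
    print(round(h, 3), hitl_level(h))  # 0.722 L1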
Ali K. Raz
Systems Engineering Foundations for Accelerating AI/ML Transition in Complex Systems
Artificial Intelligence and Machine Learning (AI/ML) are rapidly transforming the technological landscape, introducing unprecedented capabilities ranging from large language and vision models and image recognition to reinforcement learning and automated situational awareness. Underlying these AI/ML advancements are the Deep Neural Networks (DNNs) that provide the means to accomplish these capabilities. Despite rapid development, advancement, and capability demonstrations, DNNs are fraught with serious challenges and impediments when applied in practical and safety-critical systems, a challenge further exacerbated by agentic AI and humans in the loop. These shortcomings have led regulatory organizations to bar AI/ML applications in safety-critical systems (for example, the United States Federal Aviation Administration has banned the use of DNNs in flight systems) and companies to face embarrassing outcomes in real-world testing (for example, the image recognition system of the Cruise self-driving car failing to recognize a City of San Francisco bus, or reported bias in many trained AI algorithms). Transforming technological advancements into rigorous, reliable, and resilient engineered systems with humans as stakeholders falls under the purview of Systems Engineering (SE). The challenges for AI/ML adoption in complex systems can be viewed along two dimensions: first, the unique challenges introduced by the opaque, black-box nature of DNNs and by training and validation data sets that lack edge cases; and second, the challenges that arise when AI/ML becomes an integrated element of a complex system that must interface, interact, and inter-operate with other technological components and with humans. This presentation will discuss these AI/ML challenges from an SE perspective and survey state-of-the-art academic research (for example, explainable AI, counterfactual reasoning, and rigorous systems-engineering-based test and evaluation) for accelerating AI/ML adoption in practical and safety-critical systems.
Bios.
Dr. Stephen Russell is a Professor in the Department of Intelligent Systems and Robotics at the University of West Florida. He received a B.Sc. in Computer Science and M.S. and Ph.D. degrees in Information Systems from the University of Maryland. Before joining UWF, Dr. Russell was the Chief Data Scientist and Director of the Data Science Department for Jackson Health System, and prior to that he was the Information Sciences Division Chief at the Army Research Laboratory (ARL). During his time at ARL, amongst other efforts, he conceived, established, and led the Army's Internet of Battlefield Things research program, which focused on the challenges of incorporating AI and IoT concepts and capabilities for warfighters in battlefield environments. Before working at ARL, he was a Section Head at the US Naval Research Laboratory. Prior to his government service, Dr. Russell was on the faculty of the George Washington University, where he received notable accolades including a National Academies of Science Fellowship and multiple research grant awards. He holds two patents and has published multiple books and numerous papers in his primary research areas of decision support and intelligent systems, AI, and machine learning. Dr. Russell is also a serial entrepreneur, having owned companies specializing in software engineering, information resource management services, and telecommunications equipment manufacturing.
Dr. Ted Goranson previously (from 1971) managed research in the intelligence community, with an extended stint at ARPA/DARPA working on enterprise integration. His focus areas are modelling to support reasoning over unknowns and unexpected futures; the foundations are applied category theory, modern situation theory, and intuitionistic narrative dynamics. He currently supports a mid-stage startup, Sirius-beta Labs, focused on the national security space, and holds a research position at George Mason University. His conference report addresses functorially enabled type systems for speculative courses of action, as informed by the perception-to-CNS-to-ANS chain in PTSD.
Dr. Kristin E. Schaefer-Lay is a Senior Engineer at the U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory (ARL). In this position, she is the Modeling & Simulation Live, Virtual, Constructive Toolkit team lead and the proponent for the Research, Development, and Engineering Network (RDENet) Robot Operating System (ROS) Enclave. She has an extensive background in AI/ML for context-aware processing, autonomy integration, and data fusion, developed across programs including the Army's Applied Robotics for Installations and Base Operations (ARIBO), the Wingman Joint Capability Technology Demonstration (JCTD), and Army Soldier Operational Experiments for the Robotic Combat Vehicle program. She has a B.A. in Psychology from Susquehanna University and an M.S. and Ph.D. in Modeling & Simulation from the Institute for Simulation & Training, University of Central Florida. She completed the Carnegie Mellon University Data-Driven Leadership Program and is recognized among the World's Top 2% Scientists in Information Sciences and the Top 1% in Artificial Intelligence.
Dr. Ram D. Sriram is currently the chief of the Software and Systems Division, Information Technology Laboratory, at the National Institute of Standards and Technology. Before joining the Software and Systems Division, Sriram was the leader of the Design and Process group in the Manufacturing Systems Integration Division, Manufacturing Engineering Laboratory, where he conducted research on standards for interoperability of computer-aided design systems. Prior to joining NIST, he was on the engineering faculty (1986-1994) at the Massachusetts Institute of Technology (MIT) and was instrumental in setting up the Intelligent Engineering Systems Laboratory. Sriram has co-authored or authored more than 300 publications, including several books, and was a founding co-editor of the International Journal for AI in Engineering. His awards include an NSF Presidential Young Investigator Award (1989); the ASME Design Automation Award (2011); the ASME CIE Distinguished Service Award (2014); the Washington Academy of Sciences' Distinguished Career in Engineering Sciences Award (2015); the ASME CIE division's Lifetime Achievement Award (2016); the CMU CEE Lt. Col. Christopher Raible Distinguished Public Service Award (2018); the IIT Madras Distinguished Alumni Award (2021); the IEEE Reliability Society's Lifetime Achievement Award (2023); and the 2024 Product Lifecycle Management Pioneers Award from IFIP TC5/WG5.1 for his groundbreaking work on computers and information modeling for design and manufacturing. Sriram is a Fellow of AAIA, ACM, AIMBE, ASME, AAAS, IEEE, IET, INCOSE, SMA, SME, and the Washington Academy of Sciences, and a Senior Member (Life) of AAAI. In 2023, Sriram was elected an honorary member of the Institute of Industrial and Systems Engineers, the highest honor IISE grants to an individual of acknowledged professional eminence who is not a member of IISE. Sriram has a B.Tech. from IIT Madras, India, and an M.S. and a Ph.D. from Carnegie Mellon University, Pittsburgh, USA.
Dr. Ali Raz is an Assistant Professor in the Systems Engineering and Operations Research Department at George Mason University and an Assistant Director of Intelligent Systems and Integration at the C5I Center. His research and teaching interests are in understanding collaborative autonomy and developing system-of-systems engineering methodologies for integrating autonomous systems. Prior to joining Mason, he worked with Purdue University, the Naval Postgraduate School, the Johns Hopkins University Applied Physics Laboratory (JHU-APL), the United States Missile Defense Agency, and Honeywell Aerospace. He holds a B.Sc. and an M.Sc. in Electrical Engineering from Iowa State University and a Ph.D. in Aeronautics and Astronautics from Purdue University. He is a senior member of the American Institute of Aeronautics and Astronautics (AIAA) and the Institute of Electrical and Electronics Engineers (IEEE). At AIAA, he chairs the Information Command and Control Systems Technical Committee, and at the International Council on Systems Engineering (INCOSE), he leads the AI Systems Working Group, which has more than 1,000 members.