While large language models (LLMs) have shown impressive capabilities across a wide range of domains, they still encounter significant challenges in reasoning tasks that require gathering evidence over multiple turns and drawing logical conclusions from that evidence. These challenges pose obstacles for LLM chat user interfaces, which rely on multi-turn interactions to facilitate effective collaboration. This limitation leads to real-world issues; for example, service chatbots must gather necessary information from customers over multiple turns to diagnose and resolve problems effectively. Despite the multi-turn nature of many real-world LLM use cases, most existing benchmarks rely on carefully curated single-turn tests, which often blur the line between memorization and genuine reasoning. To address this, we introduce the Wason Inductive Logic Test (WILT), a simple yet challenging multi-turn reasoning benchmark designed to resist memorization. WILT is inspired by the Wason 2-4-6 task, where participants must infer a basic boolean function involving three variables (e.g., x < y < z) by proposing test cases (such as (2, 4, 6)). In WILT, each test starts from a clean slate, with only the initial instructions provided, preventing models from relying on pre-learned responses. Over several turns, models must interact with the environment by suggesting test cases to narrow the possible hypotheses and ultimately infer the hidden function based on the outcomes. Our findings reveal that LLMs struggle with this task, exhibiting distinct strengths and weaknesses: some are better at narrowing down the hypothesis space by proposing valuable test cases, while others are more adept at deducing the hidden function from observed cases. Despite these variations, the best-performing model achieves only 28% accuracy, highlighting a significant gap in LLM performance on complex multi-turn reasoning tasks.
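To make the setup concrete, below is a minimal sketch of the kind of interaction loop WILT describes: a hidden boolean rule over three numbers, a fixed budget of proposed test triples, and a final guess scored against the rule. The function names, the example rule, and the probe-based scoring are illustrative assumptions, not details taken from the paper.

```python
import random

# Illustrative hidden rule in the style of the Wason 2-4-6 task:
# a triple satisfies the rule iff it is strictly increasing.
def hidden_rule(x, y, z):
    return x < y < z

def run_episode(propose_test, guess_rule, max_turns=10):
    """Run one WILT-style episode against `hidden_rule`.

    `propose_test(history)` returns the next triple to try and
    `guess_rule(history)` returns a callable guess of the rule;
    both stand in for the model under evaluation.
    """
    history = []  # list of ((x, y, z), bool) observations
    for _ in range(max_turns):
        triple = propose_test(history)
        history.append((triple, hidden_rule(*triple)))
    guess = guess_rule(history)
    # Score the guess by checking agreement with the hidden rule on random probes
    # (an assumed scoring scheme; the paper may check equivalence differently).
    probes = [tuple(random.randint(-10, 10) for _ in range(3)) for _ in range(1000)]
    return sum(guess(*p) == hidden_rule(*p) for p in probes) / len(probes)

# Trivial baseline "model": random test cases, then guess the constant-True rule.
if __name__ == "__main__":
    agreement = run_episode(
        propose_test=lambda h: tuple(random.randint(-10, 10) for _ in range(3)),
        guess_rule=lambda h: (lambda x, y, z: True),
    )
    print(f"agreement with hidden rule: {agreement:.2f}")
```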
When a predator chases its prey, a mind game ensues, requiring both predator and prey to predict what the other will do next. These elements of uncertainty and opponency are also seen in analyses of real-world tasks and games. For instance, one way to define an optimal solution of a non-cooperative game is to find the Nash equilibrium, a state in which each agent in a game has optimized its strategy given the strategies of others. The Regularized Nash Dynamics (R-NaD) algorithm guarantees that policies will converge to the Nash equilibrium, creating AIs that beat top human players in tasks with hidden information. Our research compares the performance of deep reinforcement learning agents trained with and without R-NaD in a simple hide-and-seek game, aiming to see how well the agents process unknowns in the environment. We then apply explainable AI (XAI) techniques to the trained model to examine the kinds of information that trained policies encode about opponent strategies. We find that policies trained with R-NaD outperform policies trained in regular self-play when there is hidden information. Furthermore, R-NaD policies use their opponent’s past positions to decide which actions to take, more so than policies trained with regular self-play do. These findings yield insights into how animals and artificial agents operate under spatial uncertainty.
https://link.springer.com/chapter/10.1007/978-3-031-71533-4_25
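As a toy illustration of the Nash equilibrium concept in a hide-and-seek-like setting, the sketch below estimates the equilibrium of matching pennies (one hider, one seeker, two hiding spots) via fictitious play, where each player repeatedly best-responds to the empirical mixture of the other's past actions. This only illustrates the solution concept; it is not the R-NaD algorithm or the training setup used in the paper.

```python
import numpy as np

# Matching pennies as a toy hide-and-seek: the hider picks a spot (0 or 1), the
# seeker picks a spot; the seeker wins (+1) on a match, otherwise the hider wins.
# The payoff matrix is written from the seeker's perspective (rows = seeker).
payoff_seeker = np.array([[1.0, -1.0],
                          [-1.0, 1.0]])

def fictitious_play(payoff, iterations=10000):
    """Estimate a Nash equilibrium of a zero-sum 2x2 game via fictitious play."""
    counts_row = np.ones(2)  # seeker's action counts
    counts_col = np.ones(2)  # hider's action counts
    for _ in range(iterations):
        hider_mix = counts_col / counts_col.sum()
        seeker_mix = counts_row / counts_row.sum()
        # Best responses: the seeker maximizes, the hider minimizes, the seeker's payoff.
        counts_row[np.argmax(payoff @ hider_mix)] += 1
        counts_col[np.argmin(seeker_mix @ payoff)] += 1
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()

seeker_strategy, hider_strategy = fictitious_play(payoff_seeker)
print("seeker strategy:", seeker_strategy)  # both converge toward (0.5, 0.5),
print("hider strategy: ", hider_strategy)   # the Nash equilibrium of this game
```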
An introduction to neurorobotics that presents approaches and design principles for developing intelligent autonomous systems grounded in biology and neuroscience.
Neurorobotics is an interdisciplinary field that draws on artificial intelligence, cognitive sciences, computer science, engineering, psychology, neuroscience, and robotics. Because the brain is closely coupled to the body and situated in the environment, neurorobots—autonomous systems modeled after some aspect of the brain—offer a powerful tool for studying neural function and may also be a means for developing autonomous systems with intelligence that rivals that of biological organisms. This textbook introduces approaches and design principles for developing intelligent autonomous systems grounded in biology and neuroscience. It is written for anyone interested in learning about this topic and can be used in cognitive robotics courses for students in psychology, cognitive science, and computer science.
Neurorobotics covers the background and foundations of the field, with information on early neurorobots, relevant principles of neuroscience, learning rules and mechanisms, and reinforcement learning and prediction; neurorobot design principles grounded in neuroscience and principles of neuroscience research; and examples of neurorobots for navigation, developmental robotics, and social robots, presented with the cognitive science and neuroscience background that inspired them. A supplementary website offers videos, robot simulations, and links to software repositories with neurorobot examples.
Rapid non-verbal communication of task-based stimuli is a challenge in human-machine teaming, particularly in closed-loop interactions such as driving. To achieve this, we must understand the representations of information for both the human and the machine, and determine a basis for bridging these representations. Techniques of explainable artificial intelligence (XAI) such as layer-wise relevance propagation (LRP) provide visual heatmap explanations for high-dimensional machine learning techniques such as deep neural networks. On the side of human cognition, visual attention is driven by the bottom-up and top-down processing of sensory input related to the current task. Since both XAI and human cognition should focus on task-related stimuli, there may be overlaps between their representations of visual attention, potentially providing a means of nonverbal communication between the human and machine. In this work, we examine the correlations between LRP heatmap explanations of a neural network trained to predict driving behavior and eye gaze heatmaps of human drivers. The analysis is used to determine the feasibility of using such a technique for enhancing driving performance. We find that LRP heatmaps show greater similarity to eye gaze as the task specificity of the neural network increases. We then propose how these findings may assist humans by visually directing attention towards relevant areas. To our knowledge, our work provides the first analysis of LRP and eye gaze for driving tasks.
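As a rough illustration of how such a comparison can be quantified, the sketch below computes two common similarity measures between an LRP relevance heatmap and a gaze heatmap over the same frame. The metrics and the synthetic arrays are stand-ins; the paper's actual preprocessing and similarity analysis may differ.

```python
import numpy as np

def normalize(heatmap):
    """Scale a non-negative heatmap so it sums to 1 (treat it as a distribution)."""
    h = np.clip(heatmap, 0, None).astype(float)
    return h / h.sum() if h.sum() > 0 else h

def heatmap_correlation(lrp, gaze):
    """Pearson correlation between flattened, normalized heatmaps."""
    a, b = normalize(lrp).ravel(), normalize(gaze).ravel()
    return np.corrcoef(a, b)[0, 1]

def histogram_intersection(lrp, gaze):
    """Histogram intersection: 1.0 means identical distributions of attention."""
    return np.minimum(normalize(lrp), normalize(gaze)).sum()

# Synthetic placeholders standing in for an LRP relevance map of a driving frame
# and a driver's eye-gaze fixation density over the same frame.
rng = np.random.default_rng(0)
lrp_map = rng.random((72, 128))
gaze_map = rng.random((72, 128))
print(heatmap_correlation(lrp_map, gaze_map), histogram_intersection(lrp_map, gaze_map))
```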
The ability to behave differently according to the situation is essential for survival in a dynamic environment. This requires past experiences to be encoded and retrieved alongside the contextual schemas in which they occurred. The complementary learning systems theory suggests that these schemas are acquired through gradual learning via the neocortex and rapid learning via the hippocampus. However, it has also been shown that new information matching a preexisting schema can bypass the gradual learning process and be acquired rapidly, suggesting that the separation of memories into schemas is useful for flexible learning. While there are theories of the role of schemas in memory consolidation, we lack a full understanding of the mechanisms underlying this function. For this reason, we created a biologically plausible neural network model of schema consolidation that incorporates several brain areas and their interactions. We believe that this model will have an important impact on the study of memory consolidation, providing fresh and testable hypotheses that will further motivate experiments on the interaction between neuromodulation, schemas, and indexing. Moreover, the topic is far-reaching, as a systems-level understanding of how the brain separates the learning of tasks is valuable for machine learning researchers solving the challenges of transfer learning and reducing catastrophic forgetting in artificial neural networks.
A manuscript of this work is on BioRxiv at https://www.biorxiv.org/content/early/2018/10/04/434696
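For readers unfamiliar with the complementary learning systems framing, the toy sketch below contrasts a slow, neocortex-like learner with a fast, hippocampus-like learner by training the same delta-rule associator at two different learning rates. It is only a didactic illustration of gradual versus rapid learning, not the biologically plausible model described in the manuscript.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_delta_rule(lr, patterns, targets, epochs=50):
    """One-layer delta-rule learner; `lr` controls how quickly associations form."""
    w = np.zeros((targets.shape[1], patterns.shape[1]))
    errors = []
    for _ in range(epochs):
        out = patterns @ w.T
        err = targets - out
        w += lr * err.T @ patterns / len(patterns)
        errors.append(np.mean(err ** 2))
    return errors

# Random input patterns paired with target outputs (a stand-in for item-context associations).
patterns = rng.standard_normal((20, 10))
targets = rng.standard_normal((20, 5))

slow = train_delta_rule(lr=0.02, patterns=patterns, targets=targets)  # neocortex-like
fast = train_delta_rule(lr=0.4, patterns=patterns, targets=targets)   # hippocampus-like
print("error after 10 epochs  slow: %.3f  fast: %.3f" % (slow[9], fast[9]))
```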
Recent developments in neuromorphic engineering have enabled low-powered processing and sensing in robotics, leading to more efficient brain-like computation for many robotic tasks such as motion planning and navigation. However, experiments with neuromorphic robotic systems have mostly been performed in controlled indoor settings, often with an unlimited power supply. While this may be suitable for many applications, these algorithms often fail in dynamic outdoor environments, which could benefit the most from the low size, weight, and power of neuromorphic devices. I am interested in the challenges of outdoor robotics, how current neuromorphic solutions can address these problems, our approaches to the task, and what remains to be achieved to create a complete neuromorphic solution to outdoor navigation and path planning.
T. Hwu, A. Y. Wang, N. Oros, and J. L. Krichmar. (2018). Adaptive robot path planning using a spiking neuron algorithm with axonal delays. IEEE Transactions on Cognitive and Developmental Systems. [pdf]
T. Hwu, J. Isbell, N. Oros, and J. L. Krichmar. (2017). A self-driving robot using deep convolutional neural networks on neuromorphic hardware. IEEE International Joint Conference on Neural Networks (IJCNN), Anchorage, AK. [pdf]
T. Hwu, J. L. Krichmar, and X. Zou. (2017). A complete neuromorphic solution to outdoor navigation and path planning. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD. [pdf]
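In the spirit of the spiking path-planning work cited above, the sketch below shows how a wavefront of spikes with axonal delays proportional to traversal cost can recover a low-cost route on a grid: the first spike to reach each cell defines the best path, which is read out by backtracking. This is a simplified, Dijkstra-like rendering of the idea, not the implementation from the TCDS paper.

```python
import heapq

def spiking_wavefront_plan(cost_map, start, goal):
    """Wavefront planner where each grid cell acts as a 'neuron' and the axonal
    delay to a neighbor equals that neighbor's traversal cost. A spike injected
    at the start spreads outward; the earliest arrival times define the path."""
    rows, cols = len(cost_map), len(cost_map[0])
    arrival = {start: 0}
    parent = {}
    queue = [(0, start)]  # (spike arrival time, cell)
    while queue:
        t, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            break
        if t > arrival.get((r, c), float("inf")):
            continue  # a later duplicate spike; ignore it
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                t_arrive = t + cost_map[nr][nc]  # axonal delay = traversal cost
                if t_arrive < arrival.get((nr, nc), float("inf")):
                    arrival[(nr, nc)] = t_arrive
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(queue, (t_arrive, (nr, nc)))
    path, cell = [goal], goal
    while cell != start:
        cell = parent[cell]
        path.append(cell)
    return path[::-1]

cost_map = [[1, 1, 1, 1],
            [1, 5, 5, 1],
            [1, 1, 1, 1]]
print(spiking_wavefront_plan(cost_map, start=(0, 0), goal=(2, 3)))
```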
EEG-based brain-computer interfaces (BCIs) are becoming available as consumer-grade devices, used in applications from gaming to learning programs with neuro-feedback loops. While enabling attractive applications, their proliferation introduces novel privacy concerns and security threats. One example is an attack in which an adversary compromises an EEG-based BCI device and analyzes the user's brain activity to infer private information, such as their bank or the area where they live. However, a key limitation of such attacks is that they require user cooperation, and are thus easily detectable and rendered ineffective after discovery. We propose and analyze a more serious threat: a subliminal attack in which the visual probing lasts for less than 13.3 milliseconds, keeping the stimuli below the threshold of conscious perception. We show that, even under such strong limitations, attackers can still analyze subliminal brain activity in response to the rapid visual stimuli and consequently infer private information about the user.
M. Frank, T. Hwu, S. Jain, R. Knight, I. Martinovic, P. Mittal, D. Perito, I. Sluganovic, D. Song. (2017). Using EEG-Based BCI Devices to Subliminally Probe for Private Information. Proceedings of the 2017 on Workshop on Privacy in the Electronic Society (pp. 133-136). ACM. [pdf]
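To illustrate the general shape of such an analysis, the sketch below epochs a synthetic single-channel EEG trace around rapid probe onsets and compares the average response in a P300-like window for "familiar" versus "unfamiliar" probes. The sampling rate, window choices, and evoked-response marker are assumptions made for illustration and do not reproduce the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 256                      # sampling rate in Hz (assumed)
window = int(0.6 * fs)        # 600 ms post-stimulus epoch

def epoch(eeg, onsets, length):
    """Cut fixed-length epochs out of a single-channel EEG trace at stimulus onsets."""
    return np.stack([eeg[o:o + length] for o in onsets if o + length <= len(eeg)])

def mean_late_amplitude(epochs, fs):
    """Average amplitude in a 250-500 ms window, a crude proxy for a P300-like response."""
    lo, hi = int(0.25 * fs), int(0.5 * fs)
    return epochs[:, lo:hi].mean()

# Synthetic EEG with a small evoked deflection injected after "familiar" probes only.
eeg = rng.standard_normal(fs * 120)
familiar_onsets = rng.choice(len(eeg) - window, 40, replace=False)
unfamiliar_onsets = rng.choice(len(eeg) - window, 40, replace=False)
for o in familiar_onsets:
    eeg[o + int(0.3 * fs): o + int(0.45 * fs)] += 0.5  # injected evoked response

fam = mean_late_amplitude(epoch(eeg, familiar_onsets, window), fs)
unf = mean_late_amplitude(epoch(eeg, unfamiliar_onsets, window), fs)
print(f"familiar probes: {fam:.3f}   unfamiliar probes: {unf:.3f}")
```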
Recommender systems are integrated into many internet-based services, including movie recommendations, shopping suggestions, and automatic playlist generation. Although there are a number of effective recommender models inspired by traditional machine learning methods, the use of psychological models in recommendation is far less explored. We propose that the process of finding similar music tracks parallels the cognitive task of generalization, which could potentially be used to aid in music playlist recommendation. In generalization, stimuli are defined within a psychological space, in which previously experienced stimuli are used to create generalizations about newly presented stimuli. Similarly, a person who is trying to construct a playlist will use their prior musical knowledge or intuitions to find songs that match their taste. The main objective of this work is to evaluate the effectiveness of applying psychological models to large-scale online datasets that contain the listening histories of users. The models are tested both qualitatively and quantitatively by holding out portions of the dataset and evaluating how well they can predict the missing information. Using common metrics from information retrieval, we explore the advantages and differences of using psychological models over traditional machine learning models in recommender systems. Additionally, we provide an example of how large existing databases of human behavior can be used to conduct psychology experiments in a robust and affordable manner.
T. Hwu. Exploring the Uses of Psychological Models of Generalization in Music Recommendation Systems. Undergraduate Thesis. University of California, Berkeley, 2014. [pdf]
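One candidate psychological model of the kind explored in the thesis is Shepard-style generalization, in which similarity decays exponentially with distance in a psychological feature space. The sketch below ranks candidate tracks by their summed generalization from a user's listening history; the two-dimensional feature space, the catalog, and the scoring are placeholders, and the thesis may use different models and features.

```python
import numpy as np

def shepard_similarity(x, y, sensitivity=1.0):
    """Shepard-style generalization gradient: similarity decays exponentially
    with distance in a psychological feature space."""
    return np.exp(-sensitivity * np.linalg.norm(x - y))

def recommend(history, catalog, k=5):
    """Rank catalog tracks by their summed generalization from the user's history."""
    scores = {
        track: sum(shepard_similarity(feat, h) for h in history.values())
        for track, feat in catalog.items()
        if track not in history
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Placeholder 2-D psychological space (e.g., tempo and energy, both standardized).
rng = np.random.default_rng(3)
catalog = {f"track_{i}": rng.standard_normal(2) for i in range(100)}
history = {t: catalog[t] for t in list(catalog)[:5]}  # tracks the user has played
print(recommend(history, catalog))
```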